In the two short years since its launch, ChatGPT has almost single-handedly ushered in the era of artificial intelligence. This powerful AI chatbot can assist us in many ways: you can have it research a product you’re planning to buy, pull together background material for a project you’re writing, or even do the work for you outright.
As content created by ChatGPT has become more pervasive, so have the methods we’ve developed for detecting it. But are these tools, such as AI Detector, effective? Short answer: yes.
However, the longer answer is that there are a lot of variables at play with anything ChatGPT generates. For starters, every response is different, and ChatGPT’s model is updated quite often, sometimes multiple times a year. On top of that, the prompt you give can completely change the output.
So, to reiterate, yes, ChatGPT can be detected. And while AI detectors aren’t entirely foolproof, their overall accuracy is quite high. To help explain why and how, let’s dive into both ChatGPT and the AI detectors themselves—starting with everyone's favorite chatbot.
Exploring ChatGPT and Its Versions in More Detail
OpenAI initially unveiled ChatGPT in November of 2022, although the company itself had been working on AI since 2015. You might remember DALL-E, which came out before ChatGPT; this AI model focused on creating images.
As for GPT, the large language model (LLM) we’ll explore a bit later, OpenAI first created GPT-1 back in 2018. Although this early version was capable of some fairly impressive natural language generation, it was a far cry from where we are today, and from where we’ll likely be in the future.
One purpose of exploring each subsequent version of ChatGPT (as well as the GPT model powering the chatbot), other than it simply being interesting, is that it helps illustrate just how difficult AI detection is. No matter how accurate an AI detector is at any given moment, a new version of ChatGPT is right around the corner. And each time ChatGPT changes, the detectors have to change alongside it.
GPT-3.5 and 3.5 Turbo
By the time ChatGPT came out, OpenAI had been working with AI for a number of years; that’s why it was running on GPT-3.5. That’s also why ChatGPT came out of the gate exceedingly skilled at holding conversations, writing essays, and even performing some technical tasks.
However, as impressive as GPT-3.5 was, its limitations quickly became apparent. The model sometimes got tripped up trying to provide longer responses, and would often repeat itself.
And while ChatGPT can still be wrong today, back then it was “confidently” wrong and would stick to its guns no matter how incorrect it was. It was also prone to “hallucinate”; in other words, it would make things up entirely, almost out of thin air. As you might imagine, these sorts of issues made it less reliable when you were counting on its accuracy and truthfulness.
An incremental version of GPT-3.5, dubbed Turbo, was released in 2023; it was a little more efficient and held better conversations than its predecessor.
GPT-4
GPT-4, also released in 2023, directly addressed many of GPT-3.5’s issues, refining both accuracy and context retention to help avoid those pesky issues of repeating itself and answering prompts with hallucinations.
The jump from GPT-3.5 to GPT-4 wasn’t just incremental; it introduced a more nuanced understanding of language: GPT-4 handled longer conversations with greater coherence, allowing it to keep track of complex discussions over extended prompts without losing consistency.
While GPT-3.5 was already great, GPT-4 helped to iron out a lot of the issues that were present in the launch version of the model. It’s not flawless, but GPT-4 is a major step forward for OpenAI and ChatGPT.
GPT-4o and o1
It took over a year, but in May of 2024, OpenAI released GPT-4o. While not a major new version, 4o brought a lot of changes and improvements to help make ChatGPT better at what it did.
It refined both ChatGPT’s efficiency and responsiveness and introduced optimizations that reduce processing lag, allowing it to deliver more rapid responses without sacrificing quality or coherence. Interactions with GPT-4o feel seamless and immediate, creating a more conversational flow that doesn’t stall or disrupt momentum.
While GPT-4 was already fast and efficient, GPT-4o brought an enhanced memory. This most recent version was originally just for paid subscribers, but now everyone can benefit from ChatGPT’s memory. GPT-4o can remember aspects of your conversation, allowing subsequent responses to become more and more personalized.
Just a few months later, in September 2024, OpenAI updated ChatGPT again with a preview of o1. This new model is intended to spend more time thinking before it responds, which is especially useful for tackling complex problems in science, coding, math, and similar fields.
GPT-5 and Beyond
For anyone hoping for a new version of ChatGPT anytime soon, don’t get too excited. OpenAI’s CEO Sam Altman has said that although they’re working on a lot of new models, for the moment, their focus is on OpenAI o1.
However, even though very little is known about the next major version of ChatGPT, it’s sure to be a powerful tool that AI detectors will initially have difficulty detecting until they catch up.
A Quick Look at How ChatGPT Works
At the core of each ChatGPT model is an LLM, with the specific kind used by ChatGPT being part of its namesake: the generative pre-trained transformer (GPT).
At a high level, an LLM is a type of artificial intelligence specifically designed to understand and generate human-like text based on vast amounts of data. These models are built on deep learning architectures.
By analyzing billions of sentences, phrases, and word associations, LLMs begin to grasp language structure, tone, and meaning, enabling them to respond in surprisingly coherent and contextually relevant ways. At their core, LLMs are designed to predict the next word in a sequence based on what came before. But given their scale and depth, they’re capable of doing far more than simple word prediction.
What makes an LLM “large” is the sheer scale of its parameters—essentially adjustable variables that fine-tune its language understanding. It’s these variables and other features that get changed and tweaked with each subsequent version of ChatGPT.
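To make that next-word prediction idea concrete, here’s a deliberately tiny Python sketch. It uses a toy bigram lookup table built from a few sentences, whereas a real LLM relies on billions of learned parameters and transformer layers, but the core loop is the same: look at what came before and pick a plausible next word.

```python
import random
from collections import defaultdict

# A toy "training corpus" standing in for the billions of sentences a real LLM sees.
corpus = (
    "the cat sat on the mat . the dog sat on the rug . the cat chased the dog ."
).split()

# Count which words tend to follow which word: a crude stand-in for learned parameters.
next_words = defaultdict(list)
for current, following in zip(corpus, corpus[1:]):
    next_words[current].append(following)

def generate(start: str, length: int = 8) -> str:
    """Repeatedly predict a plausible next word based on the previous one."""
    word, output = start, [start]
    for _ in range(length):
        candidates = next_words.get(word)
        if not candidates:
            break
        word = random.choice(candidates)  # sample from the observed continuations
        output.append(word)
    return " ".join(output)

print(generate("the"))  # e.g. "the cat sat on the rug . the dog sat"
```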
What Is a GPT?
A GPT is the specific architecture behind some of the most advanced LLMs. "Pre-trained" refers to the training process, which occurs before the model has any interaction with users.
The “transformer” part refers to the model’s architecture. It’s built to analyze language by processing all words in a sequence simultaneously, understanding not only individual word meanings but also the relationships between them in context. This parallel processing allows it to generate highly contextual and coherent responses, whether it’s writing an essay, crafting code, or engaging in a conversation.
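As a rough illustration of that parallel processing, here’s a minimal NumPy sketch of the scaled dot-product attention at the heart of a transformer. Every token is compared against every other token in a single matrix operation rather than word by word; the sizes and random vectors below are placeholders, not values from any real GPT model.

```python
import numpy as np

def attention(Q, K, V):
    """Each row (one token) attends to every other token in one matrix multiply."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of every token to every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                              # context-aware blend of all token values

# Five "tokens", each represented by an 8-dimensional vector (toy numbers).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))

# In a real transformer, Q, K, and V come from learned projections of the token embeddings.
output = attention(tokens, tokens, tokens)
print(output.shape)  # (5, 8): one context-aware vector per token
```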
As a subset of LLMs, GPT models take the idea of language modeling and push it further. Each generation of GPT improves its ability to mimic human-like reasoning and conversation flow, making it one of the most powerful tools in natural language processing today.
So, while not all LLMs use the GPT architecture, ChatGPT and other notable AI technologies are grounded in this particular approach, blending enormous datasets with fine-tuned predictive power.
How Do AI Detectors Work?
It most likely won’t come as much of a surprise, but AI detectors work similarly to ChatGPT itself, as deep down, they’re both powered by LLMs. While the two obviously put those models to very different uses, they’re built on the same foundation.
Rather than creating natural-sounding responses to whatever you type into the prompt, the LLMs behind AI detectors are trained to, in essence, “spot the difference.” This is done by training the LLM on text written by ChatGPT (and other AI models, for that matter) as well as on “real” human writing.
The better each facet of the LLM is trained, the better the results. So, the more varied the human writing data set is, drawn from as many writers as possible, the better the AI detector will be at telling whether something was written by a human.
Similarly, the more content written by ChatGPT that the detectors are trained on, the more accurately they’ll be able to tell when something is AI-written. These tools just need to make sure they’re updated each time a new version of a chatbot like ChatGPT comes out.
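To illustrate the “spot the difference” training idea, here’s a heavily simplified Python sketch using scikit-learn. Commercial detectors rely on much larger transformer-based models trained on millions of samples; the tiny labeled lists below are purely hypothetical placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: label 1 = AI-written, label 0 = human-written.
texts = [
    "Artificial intelligence is a transformative technology with numerous benefits.",
    "In conclusion, it is important to consider the various factors discussed above.",
    "My cat knocked my coffee onto the keyboard again this morning. Fantastic.",
    "I still remember the smell of my grandmother's kitchen on rainy afternoons.",
]
labels = [1, 1, 0, 0]

# Turn each text into word-frequency features, then fit a simple binary classifier.
detector = make_pipeline(TfidfVectorizer(), LogisticRegression())
detector.fit(texts, labels)

sample = "It is essential to note that AI offers numerous advantages."
prob_ai = detector.predict_proba([sample])[0][1]
print(f"Estimated probability of being AI-written: {prob_ai:.2f}")
```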
What Makes ChatGPT Detectable?
If you’ve read anything produced by ChatGPT, you’ll have definitely noticed that it often takes on a dull, robotic tone. After all, ChatGPT is a machine making statistical predictions, and that quality is readily apparent in AI-generated writing. Tools like AI Detector pick up on these differences when comparing it to “real” writing.
When it comes to writing fundamentals, ChatGPT’s output can best be described with two concepts: burstiness and perplexity. Burstiness refers to the variation in sentence and paragraph length and structure, while perplexity refers to how unpredictable the word choices are.
In the most reductive terms, ChatGPT is like an exceedingly advanced auto-complete system, which works by predicting the most likely strings of words that correctly respond to the prompt you entered. With that in mind, you can see why AI writing has very little perplexity and burstiness—and that’s something detectors can pick up on.
Humans aren’t machines; we create writing based on our wealth of knowledge and experience, which results in something unique and unpredictable. That’s why our words are much higher in both perplexity and burstiness than what AI can generate.
While AI writing, at a glance, appears similar to what we produce, it’s ultimately just a facsimile—an artificially created copy. That’s why most machine-generated content pales in comparison to what’s created by skilled writers. ChatGPT provides excellent information—and performs really helpful tasks—but it does so like a machine.
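For a rough sense of how these signals can be measured, here’s a small Python sketch. Real detectors score perplexity under a trained language model; this version uses a crude word-frequency proxy and sentence-length variance purely to show the shape of the calculation.

```python
import math
import re
from collections import Counter

def burstiness(text: str) -> float:
    """Variance in sentence length: higher values mean more varied sentences."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    mean = sum(lengths) / len(lengths)
    return sum((n - mean) ** 2 for n in lengths) / len(lengths)

def crude_perplexity(text: str) -> float:
    """Perplexity under a unigram model of the text itself: a very rough proxy."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    avg_neg_log_prob = -sum(math.log(counts[w] / total) for w in words) / total
    return math.exp(avg_neg_log_prob)

sample = (
    "AI is useful. AI is efficient. AI is widely used in many industries today. "
    "Last Tuesday, though, my neighbour's parrot recited half a weather forecast!"
)
print(f"burstiness: {burstiness(sample):.1f}, perplexity: {crude_perplexity(sample):.1f}")
```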
Where AI Detectors Fall Short With ChatGPT and in General
While the best AI detectors are accurate and reliable—at least as it stands now—they’re not entirely foolproof, and there are some things that can easily decrease the likelihood of AI-generated text being detected.
For starters, as powerful as ChatGPT is, every output is different. So, on rare occasions, you may receive a response that your AI detector can’t spot.
On top of that, because ChatGPT is a generative AI model, the prompt you give it has almost limitless potential to influence the output. A simple prompt, like “Write me a blog post about artificial intelligence,” will produce a bog-standard output that will almost certainly be detected.
Compare that to a prompt with more specific instructions, such as: “Write me a blog post targeted at a general audience about an overview of artificial intelligence technology. Don’t use two words when one would suffice. Use a friendly, conversational tone, as though you were speaking to a close colleague, and inject just a bit of humor.”
Output generated from a prompt like this is far less likely to be flagged by AI detectors. The more specific the instructions, and the more of them there are, the less your ChatGPT response will look like typical AI-generated content. While it might still have those hallmarks of AI, low perplexity and burstiness, it will most likely have more variety than most other ChatGPT responses.
That can be the difference between an AI detector being 100% certain that something was AI-generated and only 50% certain. Chances are, the more you work on the prompt, adding more creativity and specifics, the less the result will look like typical ChatGPT text.
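As a practical illustration, here’s how a detailed prompt like that might be sent programmatically through OpenAI’s Python SDK. This is only a sketch: it assumes a current v1-style client, an OPENAI_API_KEY set in your environment, and an illustrative model name.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

detailed_prompt = (
    "Write me a blog post targeted at a general audience about an overview of "
    "artificial intelligence technology. Don't use two words when one would "
    "suffice. Use a friendly, conversational tone, as though you were speaking "
    "to a close colleague, and inject just a bit of humor."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative; use whichever model you have access to
    messages=[{"role": "user", "content": detailed_prompt}],
    temperature=1.0,  # higher values encourage more varied, less formulaic wording
)

print(response.choices[0].message.content)
```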
How ChatGPT’s Updates Affect Detectability
As you can tell from the opening of this article, ChatGPT has been updated multiple times throughout the last two years, with even more versions on the near horizon.
This constant evolution and iteration of ChatGPT can make detection difficult. That’s because the detectors must be trained on vast amounts of AI-written data based on each subsequent version of ChatGPT.
When ChatGPT first came out with GPT-3.5, the first detectors were trained using that version. Then, when GPT-4 came out, the detectors needed to be retrained using new data, and again when 4o hit the scene. You get the picture.
So, detectors will be least effective right when a new version of ChatGPT, or any chatbot for that matter, is released. Then, once the tools are retrained with new data, they’ll be able to maintain their expected levels of accuracy. Granted, with each new version of ChatGPT, detector developers get better at upgrading their models efficiently.
Final Words – Why Would You Need to Detect ChatGPT?
As it stands in 2024, ChatGPT is easily detectable, which is great news for any person or organization seeking to double-check that content has been written by a person. For educational institutions, tools like AI Detector can be a great help in discouraging academic dishonesty, although a trained pair of eyes is still valuable, since these sorts of AI tools shouldn’t be the be-all and end-all.
The other side of the coin includes publishers and writers who want to uphold their reputation by ensuring that every piece of content they publish reads as though a talented human wrote it, not ChatGPT, especially since a time may come when ChatGPT’s writing is almost indistinguishable from a human’s.
Frequently Asked Questions
Can colleges tell if I use ChatGPT?
Yes, a lot of college professors rely on tools like AI Detector—and their own discretion—to detect when students use ChatGPT. While these tools have grown increasingly good at analyzing sentence patterns and word choice, they aren't perfect, especially as ChatGPT evolves. While you might slip through undetected now, it's a risk, especially as universities continuously update their detection software to keep up.
Can I get in trouble for using ChatGPT at work?
Potentially, yes. Some workplaces have strict policies about using AI, especially when dealing with sensitive information. It’s also worth mentioning that some companies welcome AI to boost productivity, while others see it as cutting corners. Always check your company’s policy before using ChatGPT on the job.
How can I make ChatGPT undetectable?
To avoid detection, you’ll need to heavily edit ChatGPT’s output. Fortunately, AI detectors can help immensely here, as they’ll highlight which sentences need to be changed. Then, just edit the word choices and sentence structure, and add personal touches like anecdotes or slight imperfections. However, keep in mind that intentionally trying to hide AI use might lead to consequences, especially in academic or professional settings where trust is key.