OpenAI has declared GPT-4o its flagship model. According to the company, new training and updates make it the go-to option for complex, multi-step tasks. Currently, GPT-4 and GPT-4 Turbo are still very popular, with free users getting limited access to GPT-4o.
Amidst this change and with the backdrop of GPT-4o1, let’s review these two leading models of ChatGPT.
What is ChatGPT?
ChatGPT is a Large Language Model (LLM), developed by the American research organization, OpenAI. It's an artificial intelligence chatbot that can produce advanced outputs with a series of simple instructions from the user.
OpenAI developed ChatGPT so it could understand human language (text) and respond to it.
How ChatGPT Works
ChatGPT is designed to process language and be able to engage with it in the way that a human would. To do that, ChatGPT uses a neural network, which is an AI model that can process language by:
Scanning through text
Analyzing the relationships between words and sentences
Breaking text into smaller chunks and focusing on essential words
Processing all of the above at once
This approach, which gained massive traction with the release of GPT-3.5 in 2022, was much faster than older models. GPT-3.5 was the more refined version of GPT-3 and was more effective because it could clearly understand context and nuance. Based on the same conversation, ChatGPT users could watch the chatbot learn from their inputs and steer the discussion.
The main tasks that ChatGPT was performing for users were text-based. ChatGPT takes and interprets the user’s input and then produces a coherent result. It could have conversations, answer questions, and even be directed to more intellectual tasks. For example, ChatGPT could look at an argument's pros and cons, or write an essay about it.
ChatGPT Uses
ChatGPT has many uses. Some people use it as a fun conversation partner, but others have used it for tasks related to:
Creative writing
Problem-solving
Deeply exploring new topics
Ideation
Brainstorming
Reading and summarizing long articles or studies
Writing new code
Then, there were commercial use cases that most internet users were already familiar with. For example, ChatGPT can function as an AI customer service chatbot. It looks and is interacted with in a similar way, but is far more capable of processing information over the course of a long conversation.
If you’re just using ChatGPT for casual purposes, you may not quickly see the enormous differences between GPT-3.5 and GPT-4, but understanding each model's capacities is important if you want a virtual assistant or SaaS-based tools.
GPT-4 & GPT-4 Turbo: What Changed?
GPT-4 came out in March 2023, offering a much greater capacity for professional and business-level tasks. It offered a basic version but also powered the ChatGPT Team and ChatGPT Enterprise subscription packages. You can still use GPT-4 for these tasks with a Pro subscription for $20 per month.
Image credit: Oh, Namkee & Choi, Gyu-Seong & Lee, Woo. (2023). ChatGPT goes to Operating Room: Evaluating GPT-4 Performance and the Future Direction of Surgical Education and Training in the Era of Large Language Models. 10.1101/2023.03.16.23287340.
GPT-4 was a big change for businesses and professionals thanks to some new improvements.
Enhanced Understanding & Accuracy
While the underlying technology of GPT-4 is the same as that used in GPT-3.5, it was trained more precisely. A quick test will reveal that GPT-4 has a better understanding of the subtleties of language, including nuanced context. In practice, this makes it more suitable for analytical tasks, such as text summarization.
GPT-4 can and does still make mistakes, but its release represented a major improvement with a greatly reduced chance of misunderstandings or AI hallucinations. While these improvements were great for casual users, they were especially relevant to those looking for customer service or text analysis assistance.
The other significant change was that GPT-4 has a better memory and contextual understanding of long conversations than GPT-3.5, reinforcing its ability to perform complex tasks. AI customer support chatbots have been around for a long time, but if you’ve used the older ones, you know that they are almost absurdly limited in what they can remember. With GPT-4, the more a GPT chatbot interacts with the customer, the better it gets because it can remember all previous interactions and apply that context in its next response.
Multimodality
GPT-3.5 was always impressive, but its capabilities were limited to text. GPT-4 however, can process both text and images, meaning that it's able to scan and contextualize images. The result is that it can be used for a wide range of entertainment or professional purposes. In the same way you could feed ChatGPT a large block of text to analyze, you could now get it to do the same with:
Artistic images
Documents
Graphs and charts
This addition makes GPT-4 even more useful for professionals in marketing, finance, and business management, as its multimodal capability means that it can generate text based on image inputs. For example, it can describe what it sees in a painting or summarize a chart. It can also generate captions or descriptions.
These capabilities have quickly made GPT-4 a time-saving tool and many organizations have adopted these specific use cases to improve their workflows.
API
GPT-4 and GPT-4 Turbo were designed to streamline API integrations, with improved performance and overall cost-efficiency. OpenAI started offering businesses faster application response times and scalability.
Advanced GPT-4 customization has also made it possible to more closely fine-tune application functionality, so the same customer support roles they were already offering can be highly adapted to meet specific needs. This has been a major move toward 'brand tone', where AI models can be trained to reflect a brand’s values and tone.
All of this means that GPT-4 can be integrated with other platforms more efficiently.
Speed
GPT-4 Turbo can process more information at a more efficient rate than GPT-3.5, with businesses running API requests at high volumes benefitting the most. But even casual users with a Pro account can see the improvement, with tasks such as writing a story or summarizing a book becoming much faster to complete.
GPT-4o
GPT-4o (GPT-4 Omni) is the latest model of ChatGPT from OpenAI, and has even greater capacity, offering real-time conversations and better accuracy across subjects. In short, it’s one of the most capable AI models available in 2024.
GPT-4o Performance
The speed that GPT models offer can vary, but GPT-4o is consistently faster than its predecessors at complex tasks. GPT-4 Turbo is also fast and streamlined, but it was built for cost-efficiency. For example, GPT-4o can generate tokens at a rate of over 100 per second, compared to about one-fifth of that from GPT-4 Turbo.
Perhaps a more important question is how accurate each model is. Across the board, GPT-4o provides among the highest levels of accuracy against other OpenAI models and against the industry as a whole.
GPT-4 | GPT-4o | |
MMLU | 86.4% | 88.7% |
HumanEval | 76.5% | 90.2% |
MGSM | 74.5% | 90.5% |
In the table we can see some overall improvements. In the MMLU benchmark (Massive Multitask Language Understanding), there is apparently less progress, but this is a comprehensive benchmark which tests general knowledge in 57 academic subjects, including STEM, the humanities, and social sciences. GPT models generally perform among the best in general knowledge accuracy.
One place we can see significant improvement is in coding. The HumanEval benchmark tests AI models on programming problem-solving. It’s a mix of language comprehension, some simple math, and the ability to solve a problem that a programmer would have when developing code. If coding assistance is a big part of what you’re looking for with ChatGPT, GPT-4o is clearly better.
MGSM (Multilingual Grade School Math) tests mathematics, another area where we see a big improvement with GPT-4o.
ChatGPT vs Human Experts
Soon, we will see 'Humanity’s Last Exam' put AI models to a true test. The project will see each AI model tested on 1,000 crowd-sourced questions, testing general knowledge and reasoning. So far, it’s difficult to find comprehensive data on how well AI models perform against human experts, but what we do know is that in most cases AI loses out when compared against specific subject matter experts.
While there are cases where models like Gemini score slightly higher, GPT-4o doesn’t normally beat humanity, scoring 56.1% on a test of PhD level science questions where human experts averaged 69.7%. That said, OpenAI’s GPT-4o1 model did beat human experts on GPQA Diamond (a graduate-level Google-proof Q&A benchmark), as explained on OpenAI’s website.
There are a few problems with comparing how ChatGPT performs against human experts. First, ChatGPT doesn’t claim to be a subject matter expert and can’t be treated as one. If you cite a research paper authored by a subject matter expert, your argument can be considered more authoritative. This is never the case if you cite an answer from ChatGPT. If the work you present is incorrect, you are the one responsible for it.
GPT-4 vs GPT-4o: Originality
In other cases, ChatGPT may get an answer right, even on the level of a postgraduate essay, but if its training included copyrighted material, and that comes out in their answer, only you can take responsibility for plagiarism.
Currently, ChatGPT is one of the most widely used and factually accurate AI chatbots. Not surprisingly though, given its popularity, it’s also the one with the most attention on its misuse. University professors and employers have become accustomed to sorting through the essays and resumes written with AI, in addition to which, all ChatGPT models will produce content that can be detected by an AI detector.
Increasingly, professionals are using AI detection tools to help them catch AI-produced content. These can point out the parts of text that are likely AI-generated, highlighting the overall likelihood of it not having been produced by a human. The same tools normally come with plagiarism detectors built in, as the two issues commonly exist together.
Different AI detectors have already experimented with the differences between GPT-4 and GPT-4o, finding that GPT-4o produces text that is slightly less likely to be detected. However, for the time being at least, the bottom line remains that if content is copied from any ChatGPT model and pasted before being presented, it is very likely to get found out.
Neither GPT-4 nor GPT-4o can claim full originality or ability to evade AI detectors or human eyes. GPT-4o is just marginally better in both cases.
To see if content was produced by ChatGPT, all you need to do is copy it into AI Detector and wait a moment for the results.
GPT-4o vs GPT-4: Cost
Image credit: ChatGPT
A free ChatGPT account will get you access to GPT-4 mini and limited access to GPT-4o. For access to all the features of GPT-4o, including multimodality as well as other models, you will need a pro account, costing USD $20 per month.
GPT-4o vs GPT-4: Complexity
Both ChatGPT models can understand complex text and perform complex tasks, but GPT-4o is better at both.
Specifically, consider the reasons for the improvement of GPT-4o on the MMLU benchmark. If you give it more challenging text, it will demonstrate that it is better at understanding nuance and context. And while its answers are not much more factually accurate, they demonstrate a more human complexity to them.
These improvements are much more apparent in GPT-4o’s ability to solve word-based mathematical problems. It’s also easier for it to take text prompts and turn them into coherent code. Both of these tasks require a deeper and more complete understanding of meaning and nuance in language.
With regards to the above:
GPT-4 demonstrates a coherent and concise focus on key points
GPT-4o presents a more comprehensive explanation
So, what does this mean?
GPT-4o is better and quicker at learning about complex subject matter
GPT-4o is more able to translate textual explanations into mathematical expressions
GPT-4 produces faster and simpler answers
GPT-4o vs GPT-4: Last Thoughts
It’s clear that in most ways that matter, GPT-4o represents a vast improvement from GPT-4. The improvement is particularly noticeable when it comes to reasoning and answering questions or requests accurately. Any logical, mathematical, or coding tasks are significantly more likely to be answered successfully by GPT-4o.
Both GPT-4 and GPT-4o are consistently among the highest-performing AI models in the AI benchmarks. In addition, GPT-4o is able to be more accurate than human experts in some areas.
At the same time, neither GPT-4 nor GPT-4o are authoritative sources of factual information, and in some cases they can produce inaccurate answers to questions or get flagged for plagiarism. To that end, teachers and other professionals who review written work can often detect ChatGPT content as they self-report. In addition, one study by the International Journal for Educational Integrity found that detection tools overall are able to detect pasted ChatGPT content 74% of the time.
If you want to use ChatGPT for professional, educational, or entertainment purposes, both models have much to offer. In many cases, GPT-4 will more than suffice, but for complex tasks or API integrations requiring scalability, GPT-4o now offers significant improvements compared to its predecessor.