GPT-4, the latest generation of OpenAI’s foundation AI model, has been released today. The system is “multi-modal”, meaning that as well as taking text input it can understand images and produce text output based on pictures. ChatGPT-maker OpenAI claims that GPT-4 “exhibits human-level performance on various professional and academic benchmarks”.
The model can take images or text as input but can only output text. It has no video capability, as had previously been rumoured, but that could come in the future, as previous versions of GPT have been updated and improved over time. This new milestone is described as a “scale up” in deep learning that can pass a simulated bar exam with a score in the top 10% of test takers, whereas the previous generation scored in the bottom 10%.
The company, backed by a $10bn Microsoft investment and buoyed by the success of ChatGPT, says it has spent six months iteratively aligning GPT-4 using lessons from an adversarial testing program and from ChatGPT, specifically the way humans interacted with the chatbot. This led to its “best-ever results” on factuality, steerability and refusing to go outside OpenAI-defined guardrails, all of which were problems for GPT-3.5, the model on which ChatGPT is built.
Work started on GPT-4 two years ago, soon after the launch of GPT-3. This included rebuilding the deep learning stack and, alongside Microsoft Azure, building a supercomputer from the ground up that can handle the workload. GPT-3.5, which powers ChatGPT, was trained on this system as part of a test run, surfacing a number of “bugs” and issues that had to be addressed.
The GPT-4 training run was unprecedentedly stable, OpenAI claims, making it the company’s first large model whose training performance could be accurately predicted ahead of time. “As we continue to focus on reliable scaling, we aim to hone our methodology to help us predict and prepare for future capabilities increasingly far in advance—something we view as critical for safety,” a company blog post said.
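OpenAI does not spell out its method in the announcement, but this kind of predictability is usually achieved by fitting scaling laws: the final loss of much smaller training runs is extrapolated out to the full model’s compute budget. The sketch below is illustrative only, with invented data points and a simple power-law-plus-floor fit; it is not OpenAI’s procedure or data.

```python
# Minimal sketch of scaling-law extrapolation, the standard way a full run's
# training performance is predicted in advance. All numbers are invented for
# illustration; they are not OpenAI's data.
import numpy as np
from scipy.optimize import curve_fit

def power_law(compute, a, b, c):
    # Loss modelled as an irreducible floor c plus a power-law term in compute.
    return a * np.power(compute, b) + c

# Hypothetical (compute, final loss) pairs from much smaller training runs,
# with compute expressed relative to the smallest run.
compute = np.array([1.0, 10.0, 100.0, 1000.0])
loss = np.array([3.2, 2.9, 2.65, 2.45])

params, _ = curve_fit(power_law, compute, loss, p0=(1.7, -0.1, 1.5))

# Extrapolate to a hypothetical full-scale run a million times larger.
predicted = power_law(1e6, *params)
print(f"Predicted final loss at full scale: {predicted:.2f}")
```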
GPT-4 to be introduced in ChatGPT
GPT-4’s ability to understand text will initially be made available via ChatGPT and, via a waitlist, through the GPT API. The image capability will at first be available through a single partner, Be My Eyes, whose app lets users send images to an AI-powered Virtual Volunteer that provides instantaneous identification, interpretation and conversational visual assistance for a wide variety of tasks.
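For developers coming off the waitlist, a text-only GPT-4 request is expected to look much like an existing Chat Completions call. The sketch below is illustrative only, using the openai Python library with placeholder prompts; it is not taken from OpenAI’s announcement.

```python
# Illustrative sketch: what a text-only GPT-4 request via the OpenAI
# Chat Completions API might look like once API access is granted.
# The system message and prompt here are hypothetical examples.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; supply your own key

response = openai.ChatCompletion.create(
    model="gpt-4",  # GPT-4 is exposed as a chat model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarise the key changes in GPT-4."},
    ],
    temperature=0.7,
)

# The model returns text only; it cannot output images.
print(response["choices"][0]["message"]["content"])
```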
As for the text output, OpenAI says a casual conversation with GPT-4 will be only “subtly different” from one with GPT-3.5. The notable differences emerge as the task becomes more complex: GPT-4 is built to be more “reliable, creative and able to handle more nuanced instructions” than its predecessor.
“To understand the difference between the two models, we tested on a variety of benchmarks, including simulating exams that were originally designed for humans,” the company wrote in a blog post. “We proceeded by using the most recent publicly-available tests or by purchasing 2022–2023 editions of practice exams. We did no specific training for these exams. A minority of the problems in the exams were seen by the model during training, but we believe the results to be representative.”
This included the bar exam, the LSAT, the US SAT, the Medical Knowledge Self-Assessment Program and high school AP tests. In addition, it was put through sommelier exams and LeetCode programming problems, as well as traditional machine learning benchmarks. “GPT-4 considerably outperforms existing large language models, alongside most state-of-the-art models,” the company said.
“We’ve also been using GPT-4 internally, with great impact on functions like support, sales, content moderation, and programming. We also are using it to assist humans in evaluating AI outputs, starting the second phase in our alignment strategy,” the company explained.
It has some limitations seen in earlier models, including the fact that it hallucinates facts and makes reasoning errors, but OpenAI claims these have been reduced. “Great care should be taken when using language model outputs, particularly in high-stakes contexts,” the company warned. Its team has also been working to make the model safer and more aligned from the beginning of training, including through the selection and filtering of pre-training data and engagement with domain experts.
One of the first companies to incorporate GPT-4 will be the language-learning app Duolingo. It will power the new Duolingo Max product, which includes ‘Roleplay’, where users can practise their conversational language skills with the AI, and ‘Explain My Answer’, which acts like a tutor, with the app’s owl character sharing personalised explanations.