Here’s the real reason AI companies are slimming down their models
OpenAI’s new ‘mini’ version of GPT-4o is all about the economics of more complex AI-powered apps.
OpenAI on Thursday announced GPT-4o mini, a smaller and less expensive version of its GPT-4o AI model. OpenAI is one of a number of AI companies to develop a version of its best “foundation” model that trades away some intelligence for some speed and affordability. Such a trade-off could let more developers power their apps with AI, and may open the door for more complex apps like autonomous agents in the future.
The largest large language models (LLMs) use billions or even trillions of parameters (the synapse-like connection strengths a neural network tunes as it learns) to perform a wide array of reasoning and query-related tasks, and they’re trained on massive amounts of data covering a wide variety of topics. “Small language models,” or SLMs, by contrast, use far fewer parameters—millions to a few billion—to perform a narrower set of tasks, and they require less computing power and a smaller, more focused set of training data.
For developers with simpler (and perhaps less profitable) apps, an SLM may be their only viable option. OpenAI says GPT-4o mini is 60% cheaper than GPT-3.5 Turbo, formerly the most economical OpenAI model for developers.
Or, it may be a question of speed. Many applications of AI don’t require the vast general knowledge of a large AI model. They may need faster answers to easier questions. “If my kid’s writing his term paper [with the help of an AI tool], the latency isn’t a huge issue,” says Mike Intrator, CEO of CoreWeave, which hosts AI models in its cloud. Latency refers to the time needed for an AI app to get an answer from a model in the cloud. “But if you were to use it for surgery or for automated driving or something like that, the latency begins to make much more of an impact on the experience.” The models used in self-driving cars, Intrator points out, have to be small enough to run on a computer chip in the car, not up in a cloud server.
GPT-4o mini is smaller than other models, but still not small enough to run on a device like a phone or game console. So it must run on a server in the cloud like all of OpenAI’s other models. The company isn’t saying whether it’s working on on-device models (though Apple has confirmed it is).
FASTER AND CHEAPER MODELS COULD BE THE KEY TO THE NEXT GENERATION OF AI-POWERED APPS
Today most AI-powered applications involve a single query, or a few queries, to a model running in the cloud. But cutting-edge apps require many queries to many different models, says Robert Nishihara, cofounder and CEO of Anyscale, which provides a platform for putting AI models and workloads into production. For example, an app that helps you select a vacation rental might use one model to generate the selection criteria, another model to select some rental options, and still another model to score each of those options against the criteria, and so on. And directing and orchestrating all these queries is a complex business.
“When so many model invocations are composed together, cost and latency explode,” Nishihara says. “Finding ways to reduce cost and latency is an essential step in bringing these applications to production.”
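To make that chained pattern concrete, here is a minimal sketch of the vacation-rental example using the OpenAI Python SDK. The prompts, the ask() helper, and the three-step structure are illustrative assumptions for this article, not the implementation of any shipping app, and a production system would add retrieval, error handling, and orchestration tooling.

```python
# Minimal sketch of composing several model calls, as described above.
# Assumes the openai package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

def ask(prompt: str, model: str = "gpt-4o-mini") -> str:
    """One chat-completion call; each step below is a separate model invocation."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Step 1: generate selection criteria.
criteria = ask("List five criteria for choosing a beach vacation rental for a family of four.")

# Step 2: propose candidate rentals (a real app would pull these from a listings API).
options = ask(f"Suggest three hypothetical rental listings that might satisfy these criteria:\n{criteria}")

# Step 3: score each option against the criteria -- yet another call.
scores = ask(
    "Score each option against the criteria on a 1-10 scale and explain briefly.\n"
    f"Criteria:\n{criteria}\n\nOptions:\n{options}"
)

print(scores)
```

Because every step is its own round trip to a cloud-hosted model, per-call price and latency multiply quickly, which is exactly the pressure that smaller, cheaper models like GPT-4o mini are meant to relieve.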
The performance of the models is important, but their speed and cost are equally important. OpenAI knows this, as do companies like Meta and Google, both of which are creating smaller and faster open-source models. The model downsizing efforts of these companies are crucial to using AI models for more complex applications, such as personal assistants that do end-to-end tasks on behalf of a user, Nishihara says.
OpenAI doesn’t divulge the parameter counts of its models, but GPT-4o mini is likely comparable in size to Anthropic’s Claude 3 Haiku and Google’s Gemini 1.5 Flash. OpenAI says the mini model outperforms both of those rivals in benchmark tests.
OpenAI says app developers, the biggest beneficiaries of the speed and cost improvements, can access GPT-4o mini through an API starting today, and that the new model will begin handling queries in its ChatGPT app today as well.
The “o” in GPT-4o stands for “omni,” a nod to the model being multimodal: able to process and reason over imagery and sound as well as text. GPT-4o mini supports text and vision in the API, and OpenAI says it will add video and audio capabilities in the future.
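As a rough illustration of what “text and vision in the API” means for developers, here is a hedged sketch of a combined text-and-image request to GPT-4o mini. The image URL is a placeholder, and the request shape follows OpenAI’s published chat completions format, which may evolve.

```python
# Sketch of a text-plus-image query to gpt-4o-mini via the chat completions API.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                # Placeholder URL; a real app would pass its own hosted image.
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```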