
Why downsizing large language models is the future of generative AI

Smaller language models can have a billion parameters or fewer, which is still pretty large but much smaller than foundational LLMs like ChatGPT and Bard.


Businesses are keen to unlock the power of generative AI, and yet large language models like ChatGPT present obvious challenges for corporate use. A study this month found that 75% of organizations are considering or have implemented bans on generative AI applications, citing security, privacy, and other concerns. The high cost of training LLMs has also been seen as a significant barrier to adoption.

To get value from generative AI, the path forward lies in smaller language models, which require less time and fewer resources to maintain and can be operated inside a company’s existing security perimeter. Smaller language models can be faster and more accurate because they’re optimized for a narrower set of tasks than the do-it-all models that have garnered most of the attention to date.

Public LLMs like ChatGPT are known as “foundational” models, and they’re created by scraping vast amounts of information from the internet to answer questions about virtually any topic—from how to bake a cake to how to balance a stock portfolio. They do fairly well with general knowledge questions, but they are prone to errors because they try to do so much, and they take an immense amount of computing power to build and maintain.

What they’re not good at is answering questions specific to your organization, because they don’t have access to the sensitive customer, financial, and other proprietary data that businesses keep locked away behind their security perimeters. Feeding organizational data into a public LLM is simply not an option for most companies as the security and privacy risks are too high.

This is a wasted opportunity because generative AI can be an immensely powerful tool for businesses. With the right data, an account executive could ask a question like: “Show me clients who are vulnerable to churn and suggest offers to keep them engaged.” A marketer could ask: “Give me campaign ideas for our new product launch in Q4 based on what worked for similar launches in the past.”

The key to enabling these types of questions is smaller language models, which companies can operate and train within their own secure cloud environment. These models can be customized by training them on a business’s most sensitive data, because that data never has to be fed into a public LLM. And because the models are smaller, they cost significantly less to train and run.


LLMs like ChatGPT reportedly have more than 100 billion “parameters,” the values that determine how the model behaves. That makes them immensely expensive to build and operate: the estimated cost to train ChatGPT was $4 million.

Smaller language models can have a billion parameters or fewer, which is still pretty large but much smaller than foundational LLMs like ChatGPT and Bard. They are pre-trained to understand vocabulary and human speech, so the incremental cost to customize them using corporate and industry-specific data is vastly lower. There are several options for pre-trained models that can be customized internally, including models from AI21 and Reka, as well as open-source LLMs like Alpaca and Vicuna.
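To put those parameter counts in perspective, here is a rough back-of-the-envelope sketch of how much memory the model weights alone would occupy. It assumes each parameter is stored as a 16-bit number (2 bytes), which is a common choice but an assumption here, not a detail from any specific vendor. The function name `weights_memory_gb` is hypothetical, and the estimate ignores everything else a running model needs.

```python
def weights_memory_gb(num_parameters: int, bytes_per_parameter: int = 2) -> float:
    """Approximate memory for model weights only, in gigabytes (10**9 bytes).

    Assumes 2 bytes per parameter (16-bit precision) by default. Activations,
    caches, and training-time optimizer state are deliberately left out.
    """
    return num_parameters * bytes_per_parameter / 1e9


large = weights_memory_gb(100_000_000_000)  # a 100-billion-parameter model
small = weights_memory_gb(1_000_000_000)    # a 1-billion-parameter model

print(f"100B-parameter model: ~{large:.0f} GB of weights")  # ~200 GB
print(f"1B-parameter model:   ~{small:.0f} GB of weights")  # ~2 GB
```

Under these assumptions the larger model needs roughly 100 times the memory just to hold its weights, which is one reason a billion-parameter model is so much easier to host inside a company's own environment.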

Smaller language models aren’t just more cost-efficient; they’re often far more accurate. Instead of being trained on all publicly available data, the good and the bad, they are trained and optimized on carefully vetted data that addresses the exact use cases a business cares about.

That doesn’t mean they’re limited to internal corporate data. Smaller language models can incorporate third-party data about the economy, commodities pricing, the weather, or whatever data sets are needed, and combine them with their proprietary data sets. These data sources are widely available from data service providers who ensure the information is current, accurate, and clean.


Looking ahead, we may end up with only a few dozen widely used foundational LLMs in the world, operated by technology giants like Meta, Google, and Baidu. Like the search engines of today, these giant LLMs require immense resources to maintain, and the economics don’t support hundreds of chatbots like Bard and ChatGPT.

However, I do see a future with thousands of smaller language models, operating at the company or department level and providing valuable insights for employees. These smaller models can be immensely useful and are the key to unlocking the real power of generative AI for business.



Torsten Grabs is the senior director of product management at Snowflake.
