What is a large language model and how does it work?

Large language models are the foundational technology behind recent artificial intelligence advancements like ChatGPT.

With the emergence of ChatGPT and other AI-driven technologies, there’s been ongoing conversation around how the tech will usher us into a new era—one that may simultaneously destroy careers and open the door to new opportunities. There’s less discussion, however, around the technology underpinning the AI innovations: large language models (LLMs for short).

Below, a quick guide on how LLMs work.

WHAT IS A LARGE LANGUAGE MODEL?

LLMs are machine learning models that use deep learning algorithms to process and understand language. They're trained on immense amounts of text so they can learn the patterns of language and use them to perform tasks. Those tasks range from translating text to responding in chatbot conversations: basically anything that requires some kind of language analysis.

The best-known example of an LLM is ChatGPT, which users can hold conversations with or ask to perform specific language tasks. Another popular example is BERT, or Bidirectional Encoder Representations from Transformers, which was developed by Google and can interpret the meaning of a question from the context of its words, helping systems such as search engines return meaningful responses.

HOW DO LARGE LANGUAGE MODELS WORK?

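LLMs consist of many stacked layers of neural networks, which work together to analyze text and predict outputs. Most are built on an architecture called the transformer and are trained either left to right, learning to predict the next word from the words that precede it, or bidirectionally, learning to predict a hidden word from the words on both sides of it. Either way, training maximizes the probability the model assigns to the correct word in context, much as a person can reasonably predict what comes next in a sentence.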
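To make the two training setups concrete, here is a minimal Python sketch of how a single sentence could be turned into training examples for each. It is an illustration under simplifying assumptions: whitespace splitting stands in for real tokenization, the function names are hypothetical, and the model itself is omitted (during training it would be adjusted to assign high probability to each target given its context). The `[MASK]` placeholder follows the convention BERT-style models use, though those models hide a random subset of words rather than one word at a time as done here for clarity.

```python
def left_to_right_examples(tokens):
    """Pair each word with the words before it: predict the next word."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_examples(tokens, mask_token="[MASK]"):
    """Hide one word at a time and pair it with the words on both sides."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[:i] + [mask_token] + tokens[i + 1:]
        examples.append((context, target))
    return examples


# Whitespace splitting stands in for a real tokenizer.
sentence = "the cat sat on the mat".split()

for context, target in left_to_right_examples(sentence)[:2]:
    print(context, "->", target)
# ['the'] -> cat
# ['the', 'cat'] -> sat

for context, target in masked_examples(sentence)[:2]:
    print(context, "->", target)
# ['[MASK]', 'cat', 'sat', 'on', 'the', 'mat'] -> the
# ['the', '[MASK]', 'sat', 'on', 'the', 'mat'] -> cat
```

The left-to-right model only ever sees preceding words, which suits generating text one word at a time; the bidirectional setup sees both sides, which suits understanding a word in context.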

LLMs also use an attention mechanism, which lets the model weigh how relevant each part of the text is to every other part and focus on the most important sections, for example when identifying what belongs in a summary.
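A minimal Python sketch of scaled dot-product attention, the form of attention used in transformers, shows the idea of "focusing" numerically. It assumes the query, key, and value vectors are already given; in a real model they are computed from the input by learned weights, and many attention "heads" run in parallel. The toy vectors are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Each key is scored against the query, the scores become weights via
    softmax, and the output is the weighted average of the value vectors.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    output = [
        sum(w * value[j] for w, value in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return weights, output


# Three toy word vectors; the query is most similar to the second key.
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [0.0, 2.0]

weights, output = attention(query, keys, values)
print([round(w, 3) for w in weights])
```

The second word receives the largest weight, so the output vector is pulled mostly toward its value: that weighting is what "focusing selectively" means in practice.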

HOW DO YOU TRAIN AN LLM?

LLMs can be incredibly expensive to train. A 2020 study estimated that the cost of training a model with 1.5 billion parameters can be as high as $1.6 million. However, advances in software and hardware have brought those costs down in recent years.
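Why cost climbs with model size comes down to arithmetic on compute. The Python sketch below assumes the widely used rule of thumb that training takes roughly 6 floating-point operations per parameter per training token; the function name, GPU speed, utilization, and hourly price are all hypothetical inputs. It covers only the compute for a single run, not experiments, failed runs, or staff, so it should not be read as reproducing the 2020 study's estimate.

```python
def estimate_training_cost(
    parameters: float,
    tokens: float,
    peak_flops_per_gpu: float,
    utilization: float,
    dollars_per_gpu_hour: float,
) -> float:
    """Rough dollar cost of the compute for a single training run.

    Uses the common rule of thumb that training takes about 6 floating-point
    operations per model parameter per training token.
    """
    total_flops = 6 * parameters * tokens
    # Hardware rarely sustains its peak speed, so scale by a utilization fraction.
    effective_flops_per_second = peak_flops_per_gpu * utilization
    gpu_hours = total_flops / effective_flops_per_second / 3600
    return gpu_hours * dollars_per_gpu_hour


# All inputs are hypothetical placeholders, not figures from the 2020 study.
base = estimate_training_cost(1.5e9, 30e9, 100e12, 0.4, 2.0)
bigger = estimate_training_cost(3.0e9, 30e9, 100e12, 0.4, 2.0)
print(bigger / base)  # 2.0
```

Because the estimate is a product, doubling the parameter count (or the number of training tokens) doubles the cost, while faster or cheaper hardware divides it, which is how hardware and software advances bring costs down.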

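Generally, training an LLM involves four steps: identifying a data set, which likely needs to be very large for the model to perform functions the way a human would; determining the configuration of the network's layers; training the model on that data set, typically with self-supervised learning, in which the text itself supplies the answers (such as the next word) so no human labeling is needed; and finally fine-tuning, or further training on smaller, targeted data to adjust the model's behavior for a specific task or goal.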
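The pretraining-then-fine-tuning sequence can be illustrated with a deliberately tiny Python sketch. It assumes a count-based bigram model as a stand-in for a neural network (a real LLM learns its parameters with gradient descent, and the layer-configuration step has no counterpart here). The class, the corpora, and the `weight` parameter, a hypothetical knob standing in for the extra emphasis fine-tuning data receives, are all made up for illustration.

```python
from collections import Counter, defaultdict


class BigramModel:
    """A toy next-word predictor that learns by counting word pairs.

    Real LLMs learn with neural networks and gradient descent; counting is a
    stand-in that keeps the link from training data to predictions visible.
    """

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, text, weight=1):
        # Self-supervised: the "label" for each word is simply the word after it.
        words = text.lower().split()
        for current, following in zip(words, words[1:]):
            self.counts[current][following] += weight

    def predict(self, word):
        followers = self.counts.get(word.lower())
        if not followers:
            return None
        return followers.most_common(1)[0][0]


model = BigramModel()

# Step 1: pretrain on general text (a tiny placeholder corpus).
model.train("the bank of the river was muddy and the bank of the river flooded")
print(model.predict("bank"))  # of

# Step 2: fine-tune on a smaller, task-specific corpus, weighted more heavily.
model.train("the bank approved the loan", weight=5)
print(model.predict("bank"))  # approved
```

No one labeled the data: the targets came from the text itself, and the fine-tuning pass shifted the model's prediction toward the new domain.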

Task-specific training is an iterative process: you work out what you need that the model doesn't yet do, then how to get it there. Training LLMs is also difficult in general. It requires distributed software that spreads the work across many machines, long training times, and the technical expertise needed to train the model.
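What distributed training software does can be sketched with data parallelism, one common way to spread training across machines (other approaches split the model itself). In this hypothetical Python sketch, the workers are simulated in a single process and the "model" is a one-parameter linear fit rather than a neural network; real systems run each shard on its own GPU and average gradients over a network using libraries built for that purpose.

```python
def gradient(w, batch):
    """Gradient of mean squared error for the one-parameter model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def split(data, num_workers):
    """Give each simulated worker an equal shard (assumes the size divides evenly)."""
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]


def data_parallel_step(w, batch, num_workers, learning_rate):
    # Each worker computes a gradient on its own shard (in practice, on its own GPU).
    local_grads = [gradient(w, shard) for shard in split(batch, num_workers)]
    # The workers then average their gradients (an "all-reduce") so every
    # copy of the model applies the same update.
    avg_grad = sum(local_grads) / len(local_grads)
    return w - learning_rate * avg_grad


# Toy data generated from y = 3x; training should move w toward 3.
batch = [(x, 3 * x) for x in [1.0, 2.0, 3.0, 4.0]]
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, batch, num_workers=2, learning_rate=0.05)
print(round(w, 3))
```

With equal-sized shards, the averaged gradient matches what a single machine would compute on the whole batch, so the work is split without changing the result. The hard engineering lies in doing that averaging quickly across thousands of chips.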