How do Large Language Models work?

Breaking down what ChatGPT and others are doing under the hood

Large Language Models (LLMs) like ChatGPT, the new “Sydney” mode in Bing (which still exists apparently), and Google’s Bard have completely taken over the news cycle. I’ll leave the speculation about whose jobs these models are going to steal to other publications; this post is going to dive into how these models actually work, from where they get their data to the math (well, the basics you need to know) that allows them to generate such weirdly “real” text.


Creating powerful ML models from scratch is an incredibly specialized discipline. While many Data Scientists and ML Engineers do exactly that with frameworks like PyTorch and TensorFlow, others build on top of existing open source models and extend their functionality. You can even outsource the entire model development process and use someone else’s model right out of the box.

Model development is iterative: unless your data is super simple, you’ll likely need to try different algorithms, and tweak them constantly before your model begins to make any sense. This is part science and math, part art, and part plain randomness. 
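To make "try different algorithms" a bit more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy data is made up, and the two candidate "algorithms" (always predict the average, or fit a straight line) are far simpler than anything you'd use in practice. The point is only the loop: fit each candidate, score it on data it hasn't seen, keep the better one.

```python
# Toy data (made up for illustration): y is roughly 2*x + 1 plus a little noise.
train = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8), (4, 9.1)]
held_out = [(5, 11.0), (6, 12.9)]


def fit_mean(points):
    """Candidate 1: ignore x and always predict the average y."""
    avg = sum(y for _, y in points) / len(points)
    return lambda x: avg


def fit_line(points):
    """Candidate 2: least-squares straight line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sum(
        (x - mean_x) ** 2 for x, _ in points
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_squared_error(model, points):
    """Average squared gap between predictions and the true values."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


# Fit every candidate on the training data, then score it on held-out data.
scores = {
    name: mean_squared_error(fit(train), held_out)
    for name, fit in [("mean", fit_mean), ("line", fit_line)]
}
best = min(scores, key=scores.get)
print(scores, "-> best:", best)
```

On this toy data the straight line wins by a wide margin, but with messier data the answer is rarely that obvious, which is where the tweaking (and the art, and the randomness) comes in.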

Language models and generating text

When your data has a time component to it – say you want to predict stock prices in the future, or understand what’s going to happen in an upcoming election – it’s pretty easy to understand what a model is doing. It’s using the past to predict the future. But many ML models don’t work with time series data at all; language models are a great example of that.
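For a feel of what "using the past to predict the future" can look like, here is a deliberately naive sketch. The function name `forecast_next`, the `window` parameter, and the prices are all hypothetical, and real forecasting models are much more sophisticated than this: it just assumes the next value continues the average of the most recent changes.

```python
def forecast_next(history, window=3):
    """Predict the next value as the last value plus the average recent change."""
    if len(history) < 2:
        raise ValueError("need at least two past values to measure a change")
    # Keep the last `window` changes, which requires window + 1 values.
    recent = history[-(window + 1):]
    changes = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return history[-1] + sum(changes) / len(changes)


prices = [100.0, 102.0, 101.0, 104.0, 106.0]  # made-up daily prices
print(forecast_next(prices))
```

The model here only ever looks backwards in time: its entire input is the history, and its entire output is a guess about what comes next.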

Language models are just ML models that work with text data. You train them on what’s called a corpus (or just body) of text, and then use them for any number of different things, like:

Machine learning 101, a crash course

LLMs are a type of Machine Learning model like any other. So to understand how they work, we need to first understand how ML works in general. Disclaimer: there are some incredible visual resources on the web that explain how Machine Lea...