I’ve already written about how Large Language Models like ChatGPT and Claude work. But how are they made?
How do you actually build and train an AI model to do all of the amazing stuff that ChatGPT can do?
What is training in the first place?
The process of creating a model is called training, kind of like training a kid to ride a bike, or whatever those people were doing at the Pokémon gyms. In old-school Machine Learning – like the kind I went to school for – training broke down into 4 major steps:
- Acquire data on the problem: gather a dataset that you’ll use to teach your model to do what you want it to do, like classify an image or predict a stock price.
- Label your dataset: data needs context to be useful to the model, like what’s in an image or whether a stock went up or down.
- Train your model: using some standard algorithms and linear algebra, teach the model what’s going on in your nicely curated dataset.
- Test your model: make sure what your model has learned transfers well to new data (and ideally, the real world).
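To make those four steps concrete, here’s a toy sketch in plain Python: a hypothetical “will the stock go up tomorrow?” problem solved with a simple nearest-centroid classifier. The data, features, and function names are all invented for illustration, not a real trading model.

```python
# 1. Acquire data: each point is (yesterday's price change %, volume change %).
# All numbers here are made up for illustration.
data = [(1.2, 0.5), (0.8, 1.1), (2.0, 0.3),      # days before the stock went up
        (-1.5, -0.2), (-0.7, 0.9), (-2.1, 0.1)]  # days before the stock went down

# 2. Label the dataset: 1 = stock went up the next day, 0 = it went down.
labels = [1, 1, 1, 0, 0, 0]

# 3. Train: "learn" the average point (centroid) of each class.
def train(data, labels):
    centroids = {}
    for cls in set(labels):
        points = [p for p, l in zip(data, labels) if l == cls]
        centroids[cls] = tuple(sum(dim) / len(points) for dim in zip(*points))
    return centroids

def predict(centroids, point):
    # Classify a new point by whichever class centroid it sits closest to.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda cls: dist(centroids[cls], point))

model = train(data, labels)

# 4. Test: check the model on situations it hasn't seen before.
print(predict(model, (1.5, 0.4)))   # a day that looks like the "up" days → 1
print(predict(model, (-1.0, 0.3)))  # a day that looks like the "down" days → 0
```

Real systems swap in far more data and fancier algorithms, but the shape of the loop – gather, label, train, test – stays the same.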
In a sense, training a model really is like teaching a kid how to do something, like riding a bike. It’s less about telling them how to do it, and more about giving them repetitions so they can figure out what’s going on for themselves. With some well-timed guidance, of course.
In the same sense, a model is a decision-making machine. The way you train a model is by showing it many, many different situations and what the correct outcome is in those situations. The model uses some fancy math to learn the patterns in those situations and learns to apply them to new data. And like teaching a kid to do something, the way you train a model – from the method to the algorithms used – varies slightly depending on what you want the model to do.
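That “repetitions plus well-timed guidance” idea can be sketched as a tiny perceptron: every time the model guesses wrong, it nudges its internal numbers a little toward the correct answer, and after enough passes over the same examples it starts getting new ones right. The data here is invented purely to demonstrate the loop.

```python
# Made-up situations: ((feature 1, feature 2), correct outcome).
examples = [((1.0, 1.0), 1), ((2.0, 0.5), 1),
            ((-1.0, -0.5), 0), ((-0.5, -2.0), 0)]

weights = [0.0, 0.0]   # the model's "knowledge", all zeros before training
bias = 0.0
learning_rate = 0.1    # how big each nudge is

for _ in range(20):    # repetitions: show the same situations again and again
    for (x1, x2), target in examples:
        guess = 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
        error = target - guess     # the guidance: how wrong was the guess?
        # Nudge the weights toward the correct outcome.
        weights[0] += learning_rate * error * x1
        weights[1] += learning_rate * error * x2
        bias += learning_rate * error

# After training, the model applies the pattern to a situation it never saw.
print(1 if weights[0] * 1.5 + weights[1] * 0.8 + bias > 0 else 0)  # → 1
```

Nothing in the loop says *how* to decide; the model figures the pattern out from repeated examples and corrections, which is the core idea behind training at any scale.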
Let’s run through a few examples.