What is a Generative AI model doing?
Old school Machine Learning was all about prediction. You’d train a model on a dataset, and use it to predict what’s going to happen when new data enters the fray, like a new day in the stock market or a new image of some corn in your field.
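For a concrete picture of that workflow, here's a minimal sketch using scikit-learn. The features, numbers, and labels are all made up for illustration:

```python
# Classical ML in a nutshell: train on historical data, predict on new data.
from sklearn.linear_model import LogisticRegression

# Each row: [yesterday's return, trading volume]; label: did the stock go up today?
# (Invented numbers, purely for illustration.)
X_train = [[0.01, 1.2], [-0.02, 0.8], [0.03, 1.5], [-0.01, 0.9]]
y_train = [1, 0, 1, 0]

model = LogisticRegression()
model.fit(X_train, y_train)

# A "new day in the stock market": one new row of features, one scoped prediction.
print(model.predict([[0.02, 1.1]]))  # e.g. [1] -> predicted to go up
```

The output is narrow and specific: one number, one yes/no. That's the scope that GenAI blows open.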
Generative AI is also all about prediction. But instead of predicting a highly scoped, specific thing – like a number, or a yes/no answer – it predicts entire sentences, paragraphs, images, videos, or even audio. It’s trained to generate entire swaths of new data based on your prompts.
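Concretely, "predicting a sentence" usually means predicting it one piece at a time: the model picks a likely next token, appends it, and repeats. Here's a toy sketch of that loop. The vocabulary and probabilities below are invented for illustration; a real model learns them from enormous amounts of training data:

```python
import random

# Toy next-token model: maps the last word to a distribution over next words.
# These probabilities are made up; a real model learns them during training.
next_word_probs = {
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"sat": 0.4, "ran": 0.6},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

def generate(prompt_word, max_tokens=4):
    words = [prompt_word]
    for _ in range(max_tokens):
        probs = next_word_probs.get(words[-1])
        if probs is None:  # no known continuation: stop generating
            break
        choices, weights = zip(*probs.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the cat sat down"
```

It's still prediction, just run in a loop until you've got a whole sentence instead of a single number.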
So even though the techniques and style have changed quite a bit over the past 5 years, GenAI and old school ML aren't so far apart; they're both learning patterns in data, and then using those patterns to make predictions about data they've never seen.
Types of GenAI models
There are a bunch of different types of GenAI models, some of which have been around for a while. You've got Generative Adversarial Networks (GANs), where one model creates something and a sister model critiques it. You've got Variational Autoencoders (VAEs), which compress data down into a probability distribution and then sample from it to generate new examples. Then you have Recurrent Neural Networks (RNNs), a special type of neural network that predicts sequences, like words in a sentence. There are even more with even longer acronyms.
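To give a flavor of that adversarial setup, here's a toy GAN training loop in PyTorch. The network sizes and the target distribution are arbitrary choices for illustration, not any particular published model:

```python
import torch
import torch.nn as nn

# Toy GAN: the generator tries to produce numbers that look like samples from a
# target distribution; the discriminator critiques them. All sizes are arbitrary.
generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

for step in range(1000):
    real = torch.randn(32, 1) * 2 + 5     # "real" data: samples clustered around 5
    fake = generator(torch.randn(32, 8))  # the generator's attempts

    # Discriminator: learn to tell real (label 1) from fake (label 0).
    d_loss = loss_fn(discriminator(real), torch.ones(32, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(32, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator: learn to fool the discriminator into outputting 1 for fakes.
    g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```

After enough rounds of this tug-of-war, the generator's outputs drift toward the target distribution; swap the tiny networks for convolutional ones and the same idea scales up to images.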
But most of the advances over the past few years have come from two specific types of GenAI models:
- Transformers – mostly for text generation
- Diffusion models – mostly for image and video generation
Like any software system, each has things that it's good at and things that it's less good at, and in practice, many state-of-the-art models today combine the two. Let's run through each type of model: where it came from and how it works.