Technically
What is RAG?

Retrieval Augmented Generation is a way to make AI models more personalized

Last updated Jul 4, 2025
Justin Gage

TL;DR

Retrieval Augmented Generation (RAG) is a way to make LLMs like GPT-4 more accurate and personalized to your specific data.

  • LLMs are powerful as hell, but they’re also generic: they’re trained on all data on the internet ever!
  • RAG helps you get responses tailored to your data by inserting that data into your model prompts
  • RAG relies on the model’s context window, which is how much data it can take in a prompt
  • Today’s RAG pipelines are pretty complex and rely on embedding models and vector databases
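That last bullet is easier to picture with toy numbers. Below is a hypothetical sketch of what the embedding-plus-vector-database step does: every document gets turned into a vector, and retrieval means finding the stored vector closest to the query's vector. The three-dimensional "embeddings" and document names are invented for illustration; real embedding models output vectors with hundreds or thousands of dimensions.

```python
# Toy sketch of an embedding + vector database lookup. The vectors
# below are made up; in practice an embedding model produces them.
import math

# Pretend an embedding model already mapped each document to a vector.
vector_db = {
    "refund policy": [0.9, 0.1, 0.0],
    "pricing page":  [0.1, 0.8, 0.2],
    "support hours": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    # Cosine similarity: how closely two vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, db):
    # Return the stored document whose vector is most similar.
    return max(db, key=lambda name: cosine(query_vec, db[name]))

# A query whose (pretend) embedding sits near the refund vector
# should retrieve the refund policy document.
print(nearest([0.8, 0.2, 0.1], vector_db))
```

The lookup itself is just "which stored vector is most similar?" — the hard parts a real vector database adds are doing this fast over millions of vectors.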

Alongside old-school fine tuning, RAG is becoming the standard way to get better, more personalized results out of state-of-the-art LLMs.


Back to the future: training models

The funny thing about RAG is that the basic concept has been around for as long as machine learning has. Longtime readers will recall that back in the day, I studied Data Science in undergrad. “Old school” machine learning, before everyone was calling it AI, was entirely predicated on training a new model for every problem.

How old school ML worked: custom models

Imagine you’re a Data Scientist tasked with understanding and predicting customer churn (your employer has a big churn problem). Your goal is to be able to predict a brand new customer’s chances of churning, so your marketing team can give them discounts and winback offers before they leave. Here’s what you might do:

  1. You gather a curated data set

You spend time gathering all of the historical data your company has about churn: who churned, when, what characteristics they had when they did, and anything they did beforehand. Each customer gets a label: churned, or didn’t churn.

  2. You train a model on the data set

Using either simple linear regression or something as complicated as deep learning with neural networks, your model goes through the data and tries to find patterns. It eventually learns (or tries to learn) which characteristics tend to lead to a customer churning, and which don’t.

  3. You test the model on new data

To make sure the model isn’t just spitting your data back at you, you test it on new data and see how it performs. The model needs to generalize, meaning perform well on new data that doesn’t look exactly like the data you trained it on. Models that are trained too well on the training data and don’t generalize well are called “overfit.”
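The three steps above can be sketched in a few lines. This is a deliberately toy example: the data is synthetic and the "model" is just a single learned threshold on support tickets. A real project would use a library like scikit-learn, but the shape — curate labeled data, fit on a training split, score on held-out data — is the same.

```python
# Hypothetical sketch of the old-school ML workflow on synthetic data.
import random

random.seed(0)

# 1. Gather a curated, labeled data set: (support_tickets, churned)
#    per customer. Synthetic rule: many tickets -> likely to churn.
data = []
for _ in range(500):
    tickets = random.randint(0, 10)
    churned = 1 if tickets > 6 and random.random() < 0.9 else 0
    data.append((tickets, churned))

# Split: train on 80%, hold out 20% for testing.
split = int(len(data) * 0.8)
train, test = data[:split], data[split:]

def accuracy(rows, threshold):
    # Fraction of rows where "tickets > threshold" matches the label.
    return sum((t > threshold) == bool(c) for t, c in rows) / len(rows)

# 2. "Train" the simplest possible model: pick the ticket threshold
#    that best separates churners from non-churners on the training set.
best_threshold = max(range(11), key=lambda th: accuracy(train, th))

# 3. Test on data the model never saw, to check it generalizes.
test_acc = accuracy(test, best_threshold)
print(best_threshold, round(test_acc, 2))
```

If the held-out accuracy were much worse than the training accuracy, that would be the "overfit" failure mode described above.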


The important theme here is that each model – whether you trained it from scratch, or took an existing model off the shelf – needed to be customized to your data set. This was how everyone thought about machine learning when I was doing it professionally. Everyone has different problems, so everyone needs different models.

Generative AI: not customized by default

The Generative AI that we use today, like ChatGPT or Claude, isn’t like this at all. Instead, it’s trained on one colossally large data set – all of the internet – that isn’t curated and doesn’t belong to your business at all. You prompt the model to focus it on a specific problem, say, generating an outreach email, and it outputs something that represents the data it was trained on.

The broad strokes, non-customized nature of these GenAI models is fine for some use cases. But for many, especially business use cases, you need models that are aware of your data and give you tailored responses. You can’t build a customer support chatbot that doesn’t know anything about your product (though that hasn’t stopped companies from trying!).

So how do you customize GenAI models to work with your data?

The basic idea of RAG: data in context windows

The most straightforward way to customize a model like GPT-4 would be to retrain it on your unique data, updating the model itself along the way. You can do this – it’s called fine tuning (future post forthcoming) – but it’s expensive and requires a lot of infrastructure. What if there was a way to keep the models themselves the same, but somehow get them to output more customized responses that are aware of your data?
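That's the core trick, and it can be sketched in a few lines: retrieve the pieces of your data most relevant to the question, and paste them into the prompt, leaving the model untouched. The documents and the naive keyword-overlap retrieval below are made up for illustration; real RAG pipelines do the retrieval step with embedding models and a vector database.

```python
# Hypothetical sketch of the RAG idea: stuff retrieved data into the
# prompt (the context window) instead of retraining the model.
documents = [
    "Refunds are processed within 5 business days.",
    "The Pro plan includes unlimited seats.",
    "Support is available 24/7 via chat.",
]

def retrieve(question, docs, k=1):
    # Naive retrieval: score each document by shared words with the
    # question. Real pipelines use embeddings, not word overlap.
    q_words = set(question.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question, docs):
    # The retrieved text rides along inside the prompt itself.
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How fast are refunds processed?", documents)
print(prompt)
```

The model never changes; it just sees your data alongside the question, which is why this approach lives or dies by how much the context window can hold.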


© 2026 Technically.