Training

intermediate

Training is the process of creating an AI model and teaching it how to actually do something useful.

  • Training a model is a lot like training a toddler and a lot less like writing a computer program.
  • You show the model thousands (or billions) of examples until it finds the patterns on its own.
  • Different types of AI require different teaching styles, from simple "right/wrong" quizzes to complex essay writing.
  • Modern AI training can take weeks, cost millions, and require a small army of GPUs.

What is training in the first place?

If you want a computer to do something, you usually have two options: programming or training.

Programming is like writing a recipe. You give the computer explicit, step-by-step instructions: "If the user clicks this button, open this window."

Training is like teaching a kid to ride a bike. You can't write a line of code that explains "balance." You just have to put the kid on the bike, give them a push, and let them wobble until they figure it out.

In the AI world, a "model" is just a mathematical structure waiting to be taught. Training is the process of showing that model millions of examples (images, sentences, stock prices) and letting it adjust its internal math until it recognizes the patterns.
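One way to feel the difference: programming means writing the rule yourself, while training means letting the computer find the rule from examples. Here's a toy sketch of both on the same task (the Celsius-to-Fahrenheit rule and all numbers are purely illustrative):

```python
# "Programming" vs. "training" on the same tiny task: convert Celsius to
# Fahrenheit. (The task and all numbers are illustrative.)

def programmed(celsius):
    # Programming: we write the rule ourselves.
    return celsius * 1.8 + 32

# Training: the model is just two numbers (w, b) that start out knowing
# nothing, plus a loop that nudges them to fit example pairs.
examples = [(0, 32), (5, 41), (10, 50)]
w, b = 0.0, 0.0     # the model's internal parameters
lr = 0.001          # how big each nudge is

for _ in range(200_000):          # show the examples many, many times
    for c, f in examples:
        error = (w * c + b) - f   # how wrong was the guess?
        w -= lr * error * c       # nudge each parameter to be less wrong
        b -= lr * error

print(programmed(25), round(w * 25 + b, 1))   # 77.0 77.0
```

Nobody ever told the trained version that the rule was "multiply by 1.8 and add 32" — it wobbled its way there from three examples.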

What's the difference between old school ML training and generative AI training?

The difference mostly comes down to what you are asking the model to do: pass a multiple-choice test or write a creative essay.

1. Traditional Machine Learning (Old School)

This is straightforward. You train the model on clear inputs and outputs.

  • Input: A picture.
  • Goal: Is it a hot dog? Yes or No.

There is a clear right answer. The model learns to draw a line between "hot dog" and "not hot dog." Our brains make that call with intuition and experience; the model works a little differently, using math to make a statistical decision based on the pixels of the input image.
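Here's a minimal sketch of that "drawing a line" idea. The features (length in cm, "redness") are made up for illustration — a real image classifier learns from pixels — but the shape of the task is the same: clear labels, clear right answers.

```python
# Old-school classification, shrunk to a runnable sketch with made-up
# numeric features instead of pixels.
from collections import defaultdict

train = [
    ((15, 0.90), "hot dog"), ((18, 0.80), "hot dog"), ((16, 0.95), "hot dog"),
    ((5, 0.10), "not hot dog"), ((30, 0.20), "not hot dog"), ((8, 0.30), "not hot dog"),
]

# "Training" here is as simple as it gets: average the feature vectors
# for each class to find that class's center.
sums = defaultdict(lambda: [0.0, 0.0])
counts = defaultdict(int)
for (length, redness), label in train:
    sums[label][0] += length
    sums[label][1] += redness
    counts[label] += 1
centroids = {lbl: (s[0] / counts[lbl], s[1] / counts[lbl]) for lbl, s in sums.items()}

def classify(features):
    # Clear right answer: whichever class center is closer wins.
    def dist(center):
        return (features[0] - center[0]) ** 2 + (features[1] - center[1]) ** 2
    return min(centroids, key=lambda lbl: dist(centroids[lbl]))

print(classify((17, 0.85)))   # hot dog
```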

2. Generative AI (New School)

This is much trickier because there is no single "correct" answer when you want a model to generate text. If you ask ChatGPT to "write a poem about sad robots," there are infinite valid responses. Instead of learning strict rules, the model learns probability distributions. It figures out what words usually follow other words to create a sentence that sounds human.

Training this kind of model is, unsurprisingly, a very different task. Instead of a neatly labeled dataset of inputs and outputs, the model is trained to predict the next word in a sentence, over and over again…among other things (more on this later).
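Here's the flavor of next-word prediction, shrunk down to something you can run: count which word follows which in a scrap of text, then turn the counts into probabilities. Real models learn vastly richer statistics than this, but the objective has the same shape.

```python
# A tiny sketch of "predict the next word": tally which word follows
# which, then convert the tallies into a probability distribution.
from collections import Counter, defaultdict

text = "the cat sat on the mat and the cat slept"
words = text.split()

follows = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    follows[current][nxt] += 1        # after `current`, we saw `nxt`

def next_word_probs(word):
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("the"))   # "cat" is twice as likely as "mat"
```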

How long does it take to train an AI model?

It depends entirely on the kind of model.

  • Old School Models: If you want to train a model to recognize the difference between a square and a circle, you can do that on your laptop in about the time it takes to make a coffee.
  • New School Models: If you want to train something like GPT-4, you need thousands of specialized chips (GPUs) running 24/7 for months.

Older models were more opinionated and special-purpose, so they were also much smaller – you didn't need tons and tons of internal math when the task was narrow. Today's GenAI models are generalists, though, so the math is incredibly complex and varied, and we are dealing in billions (or even trillions) of little internal parameters.

😰 Don’t sweat the details 😰

You'll often hear about "parameters" (e.g., 70 billion parameters). Just think of parameters as the number of brain cells the model has. The more brain cells, the longer it takes to train, but the smarter it (usually) gets.
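If you want to see where those parameter counts come from: a dense neural-network layer going from n_in units to n_out units has n_in × n_out weights plus n_out biases, and a model is a stack of such layers. A toy counter (the layer sizes below are just examples):

```python
# Back-of-envelope parameter counting for a stack of dense layers:
# each layer from n_in to n_out units has n_in * n_out weights + n_out biases.
def count_params(layer_sizes):
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

# A toy 3-layer network already has six figures' worth of "brain cells":
print(count_params([784, 128, 10]))   # 101770
```

Scale those layer sizes up (and stack many more of them) and the count explodes — which is how frontier models end up in the billions.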

What are the steps in training an AI model?

For a classic machine learning problem, the process usually looks like this:

  1. Acquire data on the problem: gather a dataset that you’ll use to teach your model to do what you want it to do, like classify an image or predict a stock price.
  2. Label your dataset: data needs context to be useful to the model, like what’s in an image or if a stock went up or down.
  3. Train your model: using some standard algorithms and linear algebra, teach the model what’s going on in your nicely curated dataset.
  4. Test your model: make sure what your model has learned transfers well to new data (and ideally, the real world).
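The four steps above can be sketched end to end on a synthetic "did the stock go up?" task. Everything here is generated for illustration — the point is the workflow, not the model:

```python
# The classic four-step workflow on synthetic data.
import random
random.seed(0)

# 1. Acquire data: one numeric "score" per example.
data = [random.uniform(0, 1) for _ in range(200)]

# 2. Label it: ground truth is "up" above 0.6, with a bit of noise so the
#    problem isn't perfectly clean.
def true_label(x):
    return "up" if x + random.uniform(-0.05, 0.05) > 0.6 else "down"

labeled = [(x, true_label(x)) for x in data]
train_set, test_set = labeled[:150], labeled[150:]

# 3. Train: find the threshold that best separates the training labels.
def accuracy(threshold, dataset):
    hits = sum(("up" if x > threshold else "down") == y for x, y in dataset)
    return hits / len(dataset)

best = max((t / 100 for t in range(101)), key=lambda t: accuracy(t, train_set))

# 4. Test: check the learned threshold on examples the model never saw.
print(round(best, 2), round(accuracy(best, test_set), 2))
```

The held-out test in step 4 is the part people skip at their peril: a model that only works on data it has already seen hasn't really learned anything.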

For modern Generative AI (like LLMs), it’s quite different:

  1. Pre-training: You feed the model the entire internet so it learns how language works (grammar, facts, reasoning). This gives it a good base to work with.
  2. Supervised Fine-Tuning (SFT): You teach the model to actually follow instructions ("Summarize this text") rather than just blabbering. Base models without SFT will just spew words.
  3. RLHF (Reinforcement Learning from Human Feedback): You further teach the model how to be a good assistant by having humans rate the model's answers, teaching it to be helpful and focused (and not toxic).

How does training actually work under the hood?

At its core, training is just a game of "Warmer or Colder." The technical term is minimizing loss, but here is the analogy:

Imagine you are blindfolded and trying to shoot a free throw.

  1. Prediction: You take a shot.
  2. Loss Calculation: You miss by 3 feet to the left.
  3. Optimization: Your friend tells you that you missed to the left.
  4. Adjustment: You adjust your stance slightly to the right.
  5. Repeat: You do this 10 million times.

Eventually, you will stop missing. That is exactly what the computer is doing—calculating how "wrong" it was and tweaking its internal numbers slightly to be less wrong next time.
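The five steps translate to code almost literally. Here's the loop on a model that is a single number; the target plays the role of the basket, and the only feedback is which direction the shot missed:

```python
# The "warmer or colder" loop, on a one-parameter model. The model never
# sees the target directly -- only which way it missed.
target = 3.7
guess = 0.0      # the model's single internal parameter
step = 0.01

for _ in range(10_000):
    shot = guess                               # 1. prediction: take a shot
    loss = (shot - target) ** 2                # 2. loss: how badly did we miss?
    direction = 1 if shot > target else -1     # 3. optimization: which way?
    guess -= step * direction                  # 4. adjustment: nudge the stance
                                               # 5. repeat

print(round(guess, 1))   # 3.7
```

Real training uses calculus (gradients) instead of a crude "left or right" signal, and tweaks billions of numbers at once instead of one, but the rhythm is identical.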

For GenAI models, the free throws are actually sentence completion. The model is fed a few words (or thousands) and asked to generate a probability distribution of what the next word should be. We score how right or wrong it is, and run the whole thing again.

It sounds pretty simple, but in practice getting this done is a very difficult engineering feat. These digital free throws are happening at incredible speeds, distributed across hundreds or thousands of GPUs working simultaneously.

What makes training successful?

You can have the most expensive computer in the world, but your model will still be trash if you screw up the basics.

  • Quality Data: This is the golden rule. Garbage In, Garbage Out. If you train a model on Reddit comments, don't be surprised when it starts trolling you. This is why so much effort from the big labs is put into curating and improving the massive pre-training dataset.
  • Architecture: You need to choose the right mathematical structure for the problem (e.g., don't use a text model to analyze images). Most companies have standardized on the Transformer architecture, but this is changing.
  • Compute: You need enough horsepower to actually get through the data in a reasonable amount of time. Each round of the sentence guessing game is itself simple, but it needs to be distributed across tons and tons of chips to run fast.

And even then, sometimes training runs just don't work. There are myriad places where bugs can creep in, the model can get stuck in a local optimum, and so on. This stuff is hard.

Training vs. Inference: What's the difference?

"Training" is learning. "Inference" is doing.

  • Training: This happens once (usually). It is the act of studying for the exam. It is computationally expensive, takes a long time, and is where the "intelligence" is created.
  • Inference: This happens every time you use the model. When you ask ChatGPT a question, it isn't learning anything new right that second; it is just recalling what it learned during training. It's quick, cheap, and efficient.

In practice, during inference, the GenAI model is generating token by token using the mathematical weights it developed during training.
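Here's a toy version of that token-by-token loop, with a hand-written table standing in for the trained weights (real models use enormous learned matrices, not a lookup table). Note that nothing in the table changes while generating — that's the whole point of inference:

```python
# Inference sketch: repeatedly consult frozen "weights" (here a toy
# next-word table) and emit the most likely next token. No learning happens.
weights = {
    "the": {"cat": 0.7, "dog": 0.3},
    "cat": {"sat": 0.6, "ran": 0.4},
    "sat": {"down": 1.0},
}

def generate(prompt, max_tokens=3):
    out = [prompt]
    for _ in range(max_tokens):
        options = weights.get(out[-1])
        if not options:
            break                                  # nothing learned after this word
        out.append(max(options, key=options.get))  # pick the most likely next token
    return " ".join(out)

print(generate("the"))   # the cat sat down
```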

Frequently Asked Questions About Training

Can you train a model on your own data?

Absolutely. This is actually where the money is for most businesses. You take a smart model that already understands the world to some degree (like GPT-5), and you "fine-tune" it on your specific legal contracts or customer support tickets. It's like hiring a smart college grad and then training them on your specific business and constraints.

What happens if training goes wrong?

Two bad things usually happen.

  1. Underfitting: The model fails to find the pattern. Maybe the data wasn’t good enough, or the training code or algorithm was wrong.
  2. Overfitting: The model just memorizes the data instead of learning the concept. It's like a student who memorized the answers to the practice test but fails the real exam because the questions are slightly different.

And this is aside from training bugs that can derail the entire training run in the first place.
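Overfitting is easy to demo: a "model" that just memorizes its training pairs aces the data it has seen and has nothing to say about anything new. (The y = 2x rule below is a made-up stand-in for whatever pattern the data actually contains.)

```python
# Memorization vs. generalization on a toy dataset where the hidden
# rule is y = 2x.
train = {1: 2, 2: 4, 3: 6, 4: 8}

memorizer = dict(train)                  # "overfit": just store the answers
def generalizer(x): return 2 * x         # what we hoped training would learn

# Perfect score on the practice test...
print(all(memorizer[x] == y for x, y in train.items()))   # True
# ...useless on a slightly different question.
print(memorizer.get(5), generalizer(5))   # None 10
```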

How do you know when training is done?

Researchers have sophisticated monitoring setups to measure how a training run is progressing and how good the model is getting.

To simplify things, imagine you hold back a chunk of data (the "test set") that the model is never allowed to see during training. You periodically test the model on this hidden data. When the model stops getting better at predicting the hidden data – and has reached your desired level of sophistication – you pull the plug.
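That stopping rule can be sketched in a few lines: track the held-out score each evaluation round, and pull the plug once it hasn't improved for a while. (The scores below are invented to show the shape of a typical run, where held-out performance climbs, plateaus, and starts to dip.)

```python
# Early stopping sketch: stop once the held-out score hasn't improved
# for `patience` consecutive checks.
held_out_scores = [0.52, 0.61, 0.70, 0.74, 0.76, 0.76, 0.75, 0.75, 0.74]

def stop_round(scores, patience=2):
    best, best_round, since_best = -1.0, 0, 0
    for i, s in enumerate(scores):
        if s > best:
            best, best_round, since_best = s, i, 0
        else:
            since_best += 1
            if since_best >= patience:    # no improvement lately: pull the plug
                return i, best_round
    return len(scores) - 1, best_round

stopped_at, best_at = stop_round(held_out_scores)
print(stopped_at, best_at)   # stopped at round 6; best model was from round 4
```

In practice you'd then ship the checkpoint from the best round, not the last one.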

Can you pause and pick up training later?

Yes, thank god. Training big models costs thousands of dollars an hour. If the power goes out, you don't want to start over. Engineers save "checkpoints" (like save states in a video game) so if something crashes, they can resume from the last save.
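A minimal sketch of the checkpointing idea using Python's pickle (real training frameworks have their own checkpoint formats, and the "training update" below is a placeholder):

```python
# Checkpointing sketch: periodically dump the model's parameters and
# progress to disk so a crash only loses work since the last save.
import os
import pickle
import tempfile

state = {"step": 0, "weights": [0.0, 0.0]}
path = os.path.join(tempfile.gettempdir(), "checkpoint.pkl")

for step in range(1, 1001):
    state["step"] = step
    state["weights"][0] += 0.001          # stand-in for a real training update
    if step % 250 == 0:                   # save every 250 steps
        with open(path, "wb") as f:
            pickle.dump(state, f)

# ...later (or after a crash), resume from the last save:
with open(path, "rb") as f:
    restored = pickle.load(f)
print(restored["step"])   # 1000
```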

Written with 💔 by Justin in Brooklyn