
Loss Function

intermediate

When training a machine learning model, a loss function is something you design that tells the model when its answers are right and when they're wrong.

  • It's like a scoring system that gives the model feedback on how well it's doing
  • Different types of tasks need different loss functions (classification vs. prediction)
  • The model uses this feedback to gradually improve its performance
  • Think of it as the "teacher" that grades the AI's homework and helps it learn

Loss functions are where the actual learning happens - they turn the vague goal of "get better" into specific mathematical feedback.

What is a loss function?

I keep mentioning these 'fancy algorithms,' and it's now time to discuss them. The magic that happens when a model digests your data and learns a pattern is actually not magic at all; it's math. Specifically, models use something called a loss function to figure out whether they're doing well or not.

A loss function is something you design that tells the model when its answers are right and when they're wrong. There are a bunch of different types, some tailored to specific ML tasks like image classification (bugs) or regression (stock prices).

Here's the problem a loss function solves: imagine you're teaching someone to shoot free throws, but you can only communicate through a single number. If they make the shot, you say "0" (no error). If they miss, you say how far off they were. Over time, they learn to adjust their technique to minimize that number. That's essentially what a loss function does for AI models.
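The free-throw idea can be sketched in a few lines of code. This is a toy illustration, not a real training setup; the function name `free_throw_loss` is made up for this example:

```python
def free_throw_loss(miss_distance: float) -> float:
    """Toy loss: 0 for a perfect shot, larger the farther off you are.

    Squaring the miss distance (like mean squared error) punishes
    big misses much more than small ones.
    """
    return miss_distance ** 2

# A made shot scores 0; a 2-foot miss costs 4x as much as a 1-foot miss.
print(free_throw_loss(0.0))  # 0.0
print(free_throw_loss(1.0))  # 1.0
print(free_throw_loss(2.0))  # 4.0
```

The shooter's whole job becomes "make this number smaller," which is exactly the framing a model gets during training.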
How do loss functions work in practice?

In Iowa, one way this can go down is that your model starts by guessing at random whether an image has a bug in it. If it's right, it gets a point. If it's wrong, it loses a point. After enough of these iterations, it starts to learn what counts as a bug and what doesn't.

On Wall Street, your model is trained slightly differently. It plots all of the stock price fluctuations on a graph and uses a loss function to find the most likely relationship between time and price. If the line is good, it gets a few points. If it's bad, it loses a few points.

The Iowa example uses what's called a classification loss function - the model is trying to put things into categories (bug vs. no bug). The Wall Street example uses a regression loss function - the model is trying to predict a continuous number (stock price).
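Here's a minimal sketch of both flavors. The function names and numbers are illustrative assumptions: cross-entropy is a standard classification loss, and squared error is a standard regression loss:

```python
import math

def cross_entropy(p_bug: float, is_bug: bool) -> float:
    """Classification loss: how 'surprised' the model is by the true label.

    p_bug is the model's predicted probability that the image has a bug.
    Confident wrong answers (p_bug near 0 when is_bug is True) blow up.
    """
    p = p_bug if is_bug else 1.0 - p_bug
    return -math.log(max(p, 1e-12))  # clamp to avoid log(0)

def squared_error(predicted_price: float, actual_price: float) -> float:
    """Regression loss: squared distance between predicted and actual."""
    return (predicted_price - actual_price) ** 2

# Iowa: the model says 90% bug and it really is a bug -> small loss.
print(cross_entropy(0.9, True))      # ~0.105
# Wall Street: predicted $102, actual $100 -> loss of 4.
print(squared_error(102.0, 100.0))   # 4.0
```

Note how the classification loss cares about probabilities over categories, while the regression loss cares about distance along a number line.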
What are different types of loss functions?

Different tasks need different types of feedback, so there are several flavors of loss functions:

Classification Loss Functions

For tasks where you're sorting things into categories (spam vs. not spam, cat vs. dog):

  • Cross-entropy loss: Penalizes confident wrong answers more than uncertain wrong answers
  • Hinge loss: The classic "support vector machine" approach - focuses on getting the decision boundary right

Regression Loss Functions

For tasks where you're predicting numbers (stock prices, temperature, sales):

  • Mean squared error: Penalizes big mistakes way more than small ones
  • Mean absolute error: Treats all mistakes more equally

Advanced Loss Functions

For complex tasks like generating images or text:

  • Adversarial loss: Two models compete against each other to improve
  • Contrastive loss: Teaches the model to distinguish similar from different examples
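The difference between mean squared error and mean absolute error is easiest to see on toy numbers. A quick sketch, with made-up predictions:

```python
def mean_squared_error(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def mean_absolute_error(preds, targets):
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

targets = [100.0, 100.0, 100.0]
small_misses = [101.0, 99.0, 101.0]   # off by 1 three times
one_big_miss = [100.0, 100.0, 103.0]  # off by 3 once

# MAE rates both equally (total miss of 3 either way)...
print(mean_absolute_error(small_misses, targets))  # 1.0
print(mean_absolute_error(one_big_miss, targets))  # 1.0
# ...but MSE punishes the single big miss much harder.
print(mean_squared_error(small_misses, targets))   # 1.0
print(mean_squared_error(one_big_miss, targets))   # 3.0
```

So if occasional large errors are catastrophic for your application, MSE-style losses push the model to avoid them; if they're tolerable, MAE is more forgiving.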
How do loss functions "tell" models if they're right or wrong?

This is where the math gets interesting, but the concept is straightforward. During training, here's what happens:

Step 1: Make a Prediction

The model looks at an example (like a photo) and makes its best guess ("I think this is a cat").

Step 2: Calculate the Loss

The loss function compares the prediction to the correct answer and outputs a number representing how wrong the model was.

Step 3: Adjust the Model

Using calculus (specifically, gradients), the model figures out which of its internal settings to tweak to reduce the loss next time.

Step 4: Repeat

This happens millions of times until the model gets good at minimizing the loss function.

The beautiful thing is that models learn to minimize whatever you measure. Design your loss function well, and the model learns exactly what you want. Design it poorly, and the model optimizes for the wrong thing entirely.
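The four steps above can be sketched as a tiny gradient descent loop. This fits a one-parameter model (the data, learning rate, and step count are illustrative assumptions):

```python
# One-parameter "model": predict y = w * x. The true relationship in this
# made-up data is y = 3x, and we start from a deliberately bad guess.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w = 0.0
learning_rate = 0.02

for step in range(200):
    grad = 0.0
    for x, y in data:
        pred = w * x                  # Step 1: make a prediction
        error = pred - y              # Step 2: the loss is error ** 2
        grad += 2 * error * x         # Step 3: gradient of the loss w.r.t. w
    w -= learning_rate * grad / len(data)  # adjust to reduce the loss
                                           # Step 4: the loop repeats

print(round(w, 3))  # converges close to 3.0
```

Real models do exactly this, just with billions of parameters instead of one, which is why the loss function's gradient matters so much.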
Why do different tasks need different loss functions?

Because different tasks have different definitions of "getting it right." Consider these examples:

  • Medical Diagnosis: Missing a cancer case (false negative) might be way worse than a false alarm (false positive). Your loss function should penalize missed cases more heavily.
  • Spam Detection: False positives (good emails marked as spam) might be more annoying to users than false negatives (spam getting through). Adjust accordingly.
  • Stock Prediction: Being off by $1 when the stock is at $100 is different from being off by $1 when it's at $10. Percentage-based loss functions handle this better.
  • Content Generation: There's no single "right" way to complete a sentence, so you need loss functions that can handle multiple valid outputs.
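The medical diagnosis case can be sketched with a weighted cross-entropy. The 10x weight below is an illustrative assumption, not a clinical recommendation, and `weighted_cross_entropy` is a name made up for this example:

```python
import math

def weighted_cross_entropy(p_sick: float, is_sick: bool,
                           miss_weight: float = 10.0) -> float:
    """Cross-entropy where missing a real case costs `miss_weight` times
    more than a false alarm. p_sick is the model's predicted probability
    that the patient is sick.
    """
    if is_sick:
        return -miss_weight * math.log(max(p_sick, 1e-12))
    return -math.log(max(1.0 - p_sick, 1e-12))

# Same 80% confidence in the wrong direction, very different penalties:
print(weighted_cross_entropy(0.2, True))   # missed case: ~16.1
print(weighted_cross_entropy(0.8, False))  # false alarm: ~1.6
```

For spam detection you'd flip the weighting; the loss function is where that business judgment gets encoded.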
What happens when you choose the wrong loss function?

Your model optimizes for the wrong thing, often in spectacular fashion. Some famous examples:

  • The Paperclip Maximizer: A thought experiment where an AI told to maximize paperclip production eventually converts everything on Earth into paperclips because that's what it was optimized for.
  • Gaming the Metric: Real systems that were optimized for "time spent on platform" started showing users increasingly extreme content because outrage keeps people engaged longer.
  • Medical AI Bias: Models trained on loss functions that only cared about overall accuracy learned to ignore rare diseases that mostly affect minority populations.

The lesson? Your loss function is your model's definition of success, so choose carefully.
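The rare-disease failure mode is easy to demonstrate with made-up numbers. A "model" that always predicts healthy can ace a plain accuracy metric while being useless:

```python
# 1,000 patients, only 10 with a rare disease (1 = sick, 0 = healthy).
labels = [1] * 10 + [0] * 990
always_healthy = [0] * len(labels)  # the degenerate "predict healthy" model

accuracy = sum(p == y for p, y in zip(always_healthy, labels)) / len(labels)
missed_cases = sum(1 for p, y in zip(always_healthy, labels)
                   if y == 1 and p == 0)

print(accuracy)      # 0.99 -- looks great on paper
print(missed_cases)  # 10 -- every sick patient missed
```

A loss that only rewards overall accuracy declares this model a success, which is exactly how the bias problem above arises.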
How do you know if your loss function is working?

You watch what the model actually learns to do, not just whether the loss number goes down. Here are some warning signs:

  • The model is "gaming" the system: Loss goes down but real-world performance doesn't improve
  • Weird edge case behavior: The model does bizarre things in situations you didn't think to test
  • Overconfidence: The model is very sure about predictions that are obviously wrong
  • Underconfidence: The model is uncertain about things it should know

Good loss function design is part science, part art, and part careful observation of unintended consequences.
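One concrete way to spot the "gaming" warning sign is to watch training loss against validation loss (loss on data the model never trained on). A sketch with invented numbers, where a sustained divergence triggers a flag:

```python
# Made-up loss curves: training loss keeps falling while validation
# loss bottoms out and climbs -- a classic sign of trouble.
train_loss = [0.90, 0.60, 0.40, 0.25, 0.15, 0.08]
val_loss   = [0.92, 0.65, 0.50, 0.48, 0.55, 0.70]

def diverging(train, val, patience=2):
    """True if validation loss has risen for `patience` straight epochs
    while training loss kept falling."""
    streak = 0
    for i in range(1, len(val)):
        if val[i] > val[i - 1] and train[i] < train[i - 1]:
            streak += 1
            if streak >= patience:
                return True
        else:
            streak = 0
    return False

print(diverging(train_loss, val_loss))  # True -- time to stop and investigate
```

The loss number going down on training data alone tells you nothing about whether the model is learning something real.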
Frequently Asked Questions About Loss Functions

Can you change the loss function during training?

You can, but it's usually not a great idea. Changing the loss function mid-training is like changing the rules of the game while someone's learning to play - it tends to confuse the model rather than help. Most people stick with one loss function for the entire training process.

Do modern AI systems use just one loss function?

Nope, often they use combinations of multiple loss functions. ChatGPT, for example, uses different loss functions during pre-training (predict the next word), instruction tuning (be helpful), and RLHF (match human preferences). It's like having multiple teachers giving feedback on different aspects of performance.
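Combining losses usually just means taking a weighted sum so there's still one number to minimize. A minimal sketch; the 0.5 weight and the function name are illustrative assumptions, not how any particular system is actually tuned:

```python
def combined_loss(task_loss: float, preference_loss: float,
                  preference_weight: float = 0.5) -> float:
    """Blend two objectives into a single number to minimize."""
    return task_loss + preference_weight * preference_loss

# A model that nails the task but annoys humans still pays a price:
print(combined_loss(task_loss=0.1, preference_loss=2.0))  # 1.1
```

Choosing those weights is itself a design decision, with all the same "optimize the wrong thing" risks as picking a single loss.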

How do you design a custom loss function?

Very carefully, and usually by starting with existing loss functions and modifying them. You need to think about what you actually want the model to do, what failure modes you want to avoid, and how to balance different objectives. It often takes several iterations to get right.

Can loss functions be biased?

Absolutely, and this is a huge concern in AI ethics. If your loss function doesn't account for fairness across different groups, your model will learn to be unfair. For example, a hiring algorithm might optimize for "historical hiring patterns" and perpetuate existing biases.

What's the most common mistake with loss functions?

Optimizing for the wrong thing because it's easier to measure. For example, optimizing a recommendation system for "clicks" instead of "user satisfaction" often leads to clickbait. The metrics you choose to optimize become your reality, so choose wisely.

Written with πŸ’” by Justin in Brooklyn