Technically

Post-training


LLM post-training turns a model from a knowledgeable blob that produces rambling answers into a helpful assistant.

There are two primary post-training methods, instructional fine-tuning and reinforcement learning from human feedback (RLHF), although AI labs are constantly cooking up new ones.

Instructional fine-tuning

Instructional fine-tuning feeds the model a set of question/answer pairs that show it examples of how you'd want it to respond.

Here's a question/answer pair:

  • Question: how do you clean a dirty pan?
  • Answer: mix together 2 parts vinegar and 1 part baking soda, and scrub until clean.

These pairs need to be curated by humans (for now), so they are more expensive to create and maintain than pre-training data and are used in much smaller doses. But they're critical: they turn a model that acts like a child into one that acts like an adult.
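To make this concrete, here's a minimal sketch of how one question/answer pair like the one above could be turned into a training example. It assumes a made-up prompt template (`TEMPLATE`) and a toy word-level tokenizer (`to_ids`); both are hypothetical, since real models use their own chat formats and subword tokenizers. It also assumes the common practice of grading the model only on the answer, so the prompt positions are masked out of the loss.

```python
# Hypothetical prompt template; every model family defines its own format.
TEMPLATE = "### Question:\n{question}\n### Answer:\n"

# -100 matches the default ignore_index of PyTorch's CrossEntropyLoss,
# so positions labeled with it are skipped when the loss is computed.
IGNORE_INDEX = -100


def to_ids(words, vocab):
    """Toy tokenizer: one ID per whitespace-separated word (real LLMs use subword tokenizers)."""
    return [vocab.setdefault(word, len(vocab)) for word in words]


def build_example(question, answer, vocab):
    """Turn one question/answer pair into (input_ids, labels) for fine-tuning."""
    prompt_ids = to_ids(TEMPLATE.format(question=question).split(), vocab)
    answer_ids = to_ids(answer.split() + ["<eos>"], vocab)
    input_ids = prompt_ids + answer_ids
    # Mask the prompt so the model is only graded on reproducing the answer.
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids
    return input_ids, labels


vocab = {}
input_ids, labels = build_example(
    "how do you clean a dirty pan?",
    "mix together 2 parts vinegar and 1 part baking soda, and scrub until clean.",
    vocab,
)
print(len(input_ids), labels[:3])  # 26 [-100, -100, -100]
```

Many examples like this get batched together, and the model's weights are nudged so it gets better at predicting each answer token from everything that came before it.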

RLHF: reinforcement learning from human feedback

Every model provider incorporates human feedback into training its models, to varying degrees. The idea is pretty simple: tell the model whether its responses are good or not (and how) so it can get better.

Somewhat ironically, you can view this as a return to machine learning training fundamentals: humans labeling data.
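The post doesn't spell out how that feedback becomes a training signal. One widely used recipe, sketched here as an assumption, has people compare two responses to the same prompt and pick the better one. A separate reward model is then trained to score the preferred response higher, using a pairwise loss of the form -log(sigmoid(r_chosen - r_rejected)); that reward model later guides a reinforcement learning step (not shown). The function name `pairwise_loss`, the example comparison, and the scores below are all hypothetical.

```python
import math


def pairwise_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(r_chosen - r_rejected)): small when the preferred response scores higher."""
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)); avoids overflow in math.exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# One hypothetical human comparison: a labeler preferred "chosen" over "rejected".
# A reward model would score each (prompt, response) pair with a single number.
comparison = {
    "prompt": "how do you clean a dirty pan?",
    "chosen": "mix together 2 parts vinegar and 1 part baking soda, and scrub until clean.",
    "rejected": "pans are kitchen tools used for cooking.",
}

# Made-up reward scores to show how the loss behaves.
print(pairwise_loss(0.0, 0.0))   # no preference yet: about 0.693, i.e. log(2)
print(pairwise_loss(2.5, -1.0))  # agrees with the human: small loss
print(pairwise_loss(-1.0, 2.5))  # disagrees with the human: large loss
```

Note that the human's only job here is picking the better of two answers, which is exactly the kind of data labeling described above.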

Related terms

  • ChatGPT
  • LLM
  • Loss Function
  • Machine Learning
  • Pre-training
  • Training

© 2026 Technically.