
Post-training

ai · intermediate

LLM post-training turns a model from a knowledgeable blob that produces rambling answers into a helpful assistant.

There are two primary post-training methods (instructional fine-tuning and reinforcement learning from human feedback), although AI labs are cooking up new post-training methods every day.

Instructional fine-tuning

Instructional fine-tuning feeds the model a set of question/answer pairs to teach it, by example, how you'd want it to respond.

Here's an example question/answer pair:

  • Question: how do you clean a dirty pan?
  • Answer: mix together 2 parts vinegar and 1 part baking soda, and scrub until clean.

These pairs need to actually be curated by humans (for now), so they're more expensive to create and maintain and are used in much smaller doses. But they're critical – they turn a model that acts like a child into one that acts like an adult.
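To make this concrete, here's a minimal sketch of how a question/answer pair might be prepared for fine-tuning. The template and the `<|user|>` / `<|assistant|>` tokens are illustrative assumptions, not any specific lab's actual format:

```python
# A toy sketch of instruction fine-tuning data prep.
# The chat template below is a made-up example format.

def format_example(question: str, answer: str) -> str:
    """Wrap a question/answer pair in a chat-style template.

    During fine-tuning, the model learns to predict the tokens
    after '<|assistant|>' given everything that comes before.
    """
    return (
        f"<|user|>\n{question}\n"
        f"<|assistant|>\n{answer}<|end|>"
    )

dataset = [
    ("how do you clean a dirty pan?",
     "Mix together 2 parts vinegar and 1 part baking soda, "
     "and scrub until clean."),
]

training_examples = [format_example(q, a) for q, a in dataset]
print(training_examples[0])
```

The key idea is that the model only gets graded on the answer portion – the question is just context.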

RLHF: reinforcement learning from human feedback

Every model provider integrates human feedback into their model's responses to varying degrees. The idea is pretty simple: just tell the model if its responses are good or not (and how) so it can get better.

Somewhat ironically, you can view this as a return to machine learning training fundamentals: humans labeling data.
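One common way labs turn those human labels into a training signal is a reward model scored with the Bradley-Terry preference model: a human picks the better of two responses, and the probability they prefer response A over B is modeled as a logistic function of the difference in reward scores. This is a toy sketch of that idea, not any provider's actual implementation:

```python
import math

# A toy sketch of the RLHF preference step, assuming a
# Bradley-Terry model over pairwise human labels.

def preference_probability(reward_a: float, reward_b: float) -> float:
    """P(human prefers response A over B), given the reward
    model's scores for each response."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))

# If the reward model scores response A well above response B,
# it predicts humans will usually prefer A.
p = preference_probability(2.0, 0.0)
print(round(p, 2))  # → 0.88
```

The reward model is trained so these predicted preferences match the human labels, and the LLM is then tuned to produce responses the reward model scores highly.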

