Post-training

LLM post-training turns a model from a knowledgeable blob that produces rambling answers into a helpful assistant.

There are two primary post-training methods (instructional fine-tuning and reinforcement learning from human feedback), although AI labs are cooking up new ones every day.

Instructional fine-tuning

Instructional fine-tuning feeds the model a set of question/answer pairs, showing it examples of how you'd want it to respond.

Here's an example question/answer pair:

  • Question: how do you clean a dirty pan?
  • Answer: mix together 2 parts vinegar and 1 part baking soda, and scrub until clean.
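To make that concrete, here's a rough sketch of how a pair like this might be serialized into a training record. The chat-template tokens and JSONL layout below are illustrative assumptions, not any particular provider's format:

```python
import json

# A hypothetical curated question/answer pair.
pair = {
    "question": "how do you clean a dirty pan?",
    "answer": "mix together 2 parts vinegar and 1 part baking soda, and scrub until clean.",
}

# Serialize it with an (assumed) chat template so the model learns
# where the user's turn ends and the assistant's turn begins.
record = {
    "text": (
        "<|user|>\n" + pair["question"] + "\n"
        "<|assistant|>\n" + pair["answer"]
    )
}

# Fine-tuning datasets are commonly stored as JSONL: one record per line.
with open("sft_data.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```

A real dataset is just thousands of these records, each one a worked example of the behavior you want.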

These pairs need to actually be curated by humans (for now), so they're more expensive to create and maintain, and they're used in much smaller doses than pretraining data. But they're critical: they turn a model that acts like a child into one that acts like an adult.
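Mechanically, fine-tuning on these pairs is just next-token prediction, usually with the loss restricted to the answer tokens so the model learns to produce answers rather than to repeat questions. Here's a minimal single-step sketch using PyTorch and Hugging Face transformers, with gpt2 standing in as a placeholder model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; any small causal LM would do.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

question = "how do you clean a dirty pan?"
answer = "mix together 2 parts vinegar and 1 part baking soda, and scrub until clean."

# Tokenize prompt and answer separately so we know where the answer starts.
prompt_ids = tokenizer.encode("Question: " + question + "\nAnswer: ")
answer_ids = tokenizer.encode(answer + tokenizer.eos_token)

input_ids = torch.tensor([prompt_ids + answer_ids])

# Labels of -100 mask the prompt tokens, so the loss is computed
# only on the answer the human actually wrote.
labels = torch.tensor([[-100] * len(prompt_ids) + answer_ids])

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

outputs = model(input_ids=input_ids, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```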

RLHF: reinforcement learning from human feedback

Every model provider incorporates human feedback into training to varying degrees. The idea is pretty simple: tell the model whether its responses are good or not (and how) so it can get better.
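In practice this happens in stages: humans compare candidate responses to the same prompt; a separate reward model is trained to predict those preferences; and the LLM is then optimized (often with PPO) to produce responses that score highly under that reward model. Here's a minimal sketch of the reward-model stage; the pairwise Bradley-Terry loss and precomputed response embeddings are simplifying assumptions, not any lab's actual implementation:

```python
import torch
import torch.nn as nn

# A human labeler saw two candidate responses to the same prompt
# and marked which one they preferred. The reward model maps a
# response embedding to a scalar "goodness" score. (The embedding
# step is elided; assume some encoder produced these vectors.)
embedding_dim = 768
reward_model = nn.Linear(embedding_dim, 1)

chosen_emb = torch.randn(1, embedding_dim)    # embedding of the preferred response
rejected_emb = torch.randn(1, embedding_dim)  # embedding of the other response

chosen_score = reward_model(chosen_emb)
rejected_score = reward_model(rejected_emb)

# Pairwise (Bradley-Terry) loss: push the preferred response's score
# above the rejected one's. Minimizing this trains the reward model
# to agree with human judgments.
loss = -torch.nn.functional.logsigmoid(chosen_score - rejected_score).mean()
loss.backward()
```

A real reward model would typically share the LLM's backbone and score raw text directly; the linear layer here just keeps the sketch self-contained. Once trained, it stands in for a human, scoring every response the model generates during reinforcement learning.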

Somewhat ironically, you can view this as a return to machine learning training fundamentals: humans labeling data.