Post-training

LLM post-training turns a model from a knowledgeable blob that produces rambling answers into a helpful assistant.

There are two primary post-training methods (instructional fine-tuning and reinforcement learning from human feedback), although AI labs are cooking up new ones every day.

Instructional fine-tuning

Instructional fine-tuning feeds the model a set of question/answer pairs, showing it examples of how you'd want it to respond.

Here's an example question/answer pair:

  • Question: how do you clean a dirty pan?
  • Answer: mix together 2 parts vinegar and 1 part baking soda, and scrub until clean.
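To make that concrete, here's a rough sketch of how a pair like this might be serialized into a training record. The chat-template tokens and JSONL layout below are illustrative assumptions, not any particular provider's format:

```python
import json

# A hypothetical curated question/answer pair.
pair = {
    "question": "how do you clean a dirty pan?",
    "answer": "mix together 2 parts vinegar and 1 part baking soda, and scrub until clean.",
}

# Serialize it with an (assumed) chat template so the model learns
# where the user's turn ends and the assistant's turn begins.
record = {
    "text": (
        "<|user|>\n" + pair["question"] + "\n"
        "<|assistant|>\n" + pair["answer"]
    )
}

# Fine-tuning datasets are commonly stored as JSONL: one record per line.
with open("sft_data.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```

A real dataset is just thousands of these records, each one a worked example of the behavior you want.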

These pairs need to actually be curated by humans (for now), so they're more expensive to create and maintain, and they're used in much smaller doses than pretraining data. But they're critical: they turn a model that acts like a child into one that acts like an adult.
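Mechanically, fine-tuning on these pairs is just next-token prediction, usually with the loss restricted to the answer tokens so the model learns to produce answers rather than to repeat questions. Here's a minimal single-step sketch using PyTorch and Hugging Face transformers, with gpt2 standing in as a placeholder model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; any small causal LM would do.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

question = "how do you clean a dirty pan?"
answer = "mix together 2 parts vinegar and 1 part baking soda, and scrub until clean."

# Tokenize prompt and answer separately so we know where the answer starts.
prompt_ids = tokenizer.encode("Question: " + question + "\nAnswer: ")
answer_ids = tokenizer.encode(answer + tokenizer.eos_token)

input_ids = torch.tensor([prompt_ids + answer_ids])

# Labels of -100 mask the prompt tokens, so the loss is computed
# only on the answer the human actually wrote.
labels = torch.tensor([[-100] * len(prompt_ids) + answer_ids])

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

outputs = model(input_ids=input_ids, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```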

RLHF: reinforcement learning from human feedback

Every model provider incorporates human feedback into training to varying degrees. The idea is pretty simple: tell the model whether its responses are good or not (and how) so it can get better.
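In practice this happens in stages: humans compare candidate responses to the same prompt; a separate reward model is trained to predict those preferences; and the LLM is then optimized (often with PPO) to produce responses that score highly under that reward model. Here's a minimal sketch of the reward-model stage; the pairwise Bradley-Terry loss and precomputed response embeddings are simplifying assumptions, not any lab's actual implementation:

```python
import torch
import torch.nn as nn

# A human labeler saw two candidate responses to the same prompt
# and marked which one they preferred. The reward model maps a
# response embedding to a scalar "goodness" score. (The embedding
# step is elided; assume some encoder produced these vectors.)
embedding_dim = 768
reward_model = nn.Linear(embedding_dim, 1)

chosen_emb = torch.randn(1, embedding_dim)    # embedding of the preferred response
rejected_emb = torch.randn(1, embedding_dim)  # embedding of the other response

chosen_score = reward_model(chosen_emb)
rejected_score = reward_model(rejected_emb)

# Pairwise (Bradley-Terry) loss: push the preferred response's score
# above the rejected one's. Minimizing this trains the reward model
# to agree with human judgments.
loss = -torch.nn.functional.logsigmoid(chosen_score - rejected_score).mean()
loss.backward()
```

A real reward model would typically share the LLM's backbone and score raw text directly; the linear layer here just keeps the sketch self-contained. Once trained, it stands in for a human, scoring every response the model generates during reinforcement learning.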

Somewhat ironically, you can view this as a return to machine learning training fundamentals: humans labeling data.