Language Models (Classical)


Beam Search

Greedy decoding commits to the locally best token at each step and can miss higher-probability full sequences. Beam search maintains the top-k partial hypotheses at each step for a better approximate search.
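A minimal sketch of the idea, over a hypothetical toy next-token table where greedy fails: "a" is the locally best first token, but "b" leads to a much better continuation. `step_fn`, `beam_search`, and the table are all illustrative names, not from any library.

```python
import heapq
import math

def beam_search(step_fn, start, beam_width, max_len):
    """Keep the beam_width highest-scoring partial sequences at each
    step instead of only the single best (greedy). Hypotheses are
    scored by summed log-probabilities; step_fn(seq) returns a list
    of (token, prob) continuations, or [] for a finished sequence."""
    beams = [(0.0, [start])]  # (log-prob score, token sequence)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            expansions = step_fn(seq)
            if not expansions:  # finished hypothesis: carry it forward
                candidates.append((score, seq))
                continue
            for token, prob in expansions:
                candidates.append((score + math.log(prob), seq + [token]))
        # prune back down to the top beam_width hypotheses
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return max(beams, key=lambda c: c[0])

# Toy distribution: greedy picks "a" (0.6) but its best continuation
# is only 0.3; "b" (0.4) then "y" (0.9) gives 0.36 > 0.6 * 0.3 = 0.18.
table = {
    ("<s>",): [("a", 0.6), ("b", 0.4)],
    ("<s>", "a"): [("x", 0.3), ("y", 0.3)],
    ("<s>", "b"): [("y", 0.9), ("x", 0.1)],
}
best_score, best_seq = beam_search(
    lambda seq: table.get(tuple(seq), []), "<s>", beam_width=2, max_len=2
)
```

With beam width 2 the "b" hypothesis survives the first step, so the search recovers the globally better sequence that greedy would discard.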

RLHF - Reinforcement Learning from Human Feedback

Hand-specifying a reward function for complex tasks like language generation is intractable. Instead, learn a reward model from human preference comparisons between outputs, then optimize the policy against it with RL.
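The reward-model step can be sketched with a Bradley-Terry objective on pairwise preferences: given (chosen, rejected) pairs, fit a scalar reward so the preferred output scores higher. This is a pure-Python toy with a linear reward over hypothetical feature vectors; the function and feature names are illustrative assumptions, not any library's API.

```python
import math

def train_reward_model(pref_pairs, dim, epochs=200, lr=0.1):
    """Fit r(x) = w . x so that human-preferred responses score higher.
    Per-pair Bradley-Terry loss: -log sigmoid(r(chosen) - r(rejected)),
    minimized here by plain gradient descent on w."""
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pref_pairs:
            diff = sum(wi * (c - r) for wi, c, r in zip(w, chosen, rejected))
            p = 1.0 / (1.0 + math.exp(-diff))  # P(chosen beats rejected)
            # gradient of -log p w.r.t. w is -(1 - p) * (chosen - rejected)
            for i in range(dim):
                w[i] += lr * (1.0 - p) * (chosen[i] - rejected[i])
    return w

# Toy preferences: the first feature dominates what humans prefer.
pairs = [
    ([1.0, 0.2], [0.1, 0.9]),
    ([0.9, 0.5], [0.2, 0.4]),
]
w = train_reward_model(pairs, dim=2)
reward = lambda x: sum(wi * xi for wi, xi in zip(w, x))
```

In full RLHF the learned reward then scores policy samples during RL fine-tuning (e.g. with PPO), usually alongside a KL penalty keeping the policy near the pretrained model.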

BERT

Left-to-right language models condition only on left context, missing bidirectional understanding. Mask a random subset of tokens (~15%) and train the model to predict them from the full surrounding context.
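A sketch of the masking step of the pretraining data pipeline, following BERT's published recipe (of the selected ~15% of positions: 80% replaced with [MASK], 10% with a random token, 10% left unchanged). `mask_tokens` and its parameters are illustrative names, not a library API.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Choose ~mask_prob of positions as prediction targets. Of those,
    80% become [MASK], 10% a random vocab token, 10% stay unchanged.
    labels holds the original token at each target position (None
    elsewhere), so the loss is computed only at masked positions."""
    rng = random.Random(seed)
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok  # model must recover this from full context
            roll = rng.random()
            if roll < 0.8:
                inputs[i] = "[MASK]"
            elif roll < 0.9:
                inputs[i] = rng.choice(vocab)
            # else: keep the original token at a target position
    return inputs, labels
```

Keeping some target positions unchanged or randomly corrupted discourages the model from relying on [MASK] being present at inference time, since the token never appears in downstream inputs.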