Language Models (Classical)
Beam Search
Greedy decoding commits to the single locally best token at each step and can miss higher-probability sequences. Maintain the top-k partial hypotheses at each step for a better approximate search.
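A minimal sketch of the idea, assuming a hypothetical `step_logprobs(prefix)` function that returns the next-token log-probabilities for a prefix (the toy two-token model below is invented for illustration):

```python
import math

def beam_search(step_logprobs, beam_width, length):
    """Keep the top-`beam_width` scoring prefixes at each step."""
    beams = [((), 0.0)]  # (token tuple, cumulative log-prob)
    for _ in range(length):
        candidates = []
        for prefix, score in beams:
            for tok, lp in step_logprobs(prefix).items():
                candidates.append((prefix + (tok,), score + lp))
        # Prune to the best `beam_width` hypotheses by total log-prob
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

def toy_model(prefix):
    # Hypothetical distribution where greedy fails: "A" wins step 1,
    # but the sequence ("B", "A") has the highest total probability.
    if not prefix:
        return {"A": math.log(0.6), "B": math.log(0.4)}
    if prefix[0] == "A":
        return {"A": math.log(0.5), "B": math.log(0.5)}
    return {"A": math.log(0.9), "B": math.log(0.1)}

best_seq, best_score = beam_search(toy_model, beam_width=2, length=2)[0]
```

Greedy picks "A" first (0.6 > 0.4) and tops out at probability 0.3, while a beam of width 2 keeps "B" alive and recovers ("B", "A") with probability 0.36.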
RLHF - Reinforcement Learning with Human Feedback
Hand-specifying a reward function for complex tasks like language generation is infeasible. Learn a reward model from human preference comparisons, then optimize the policy against it with RL.
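The reward-modeling half can be sketched with a Bradley-Terry objective: the model should assign higher reward to the human-preferred item in each pair. This is a toy linear reward model trained by gradient ascent; the feature vectors and training pairs are invented for illustration, and real RLHF would use a neural reward model plus a policy-optimization step (e.g. PPO) that is omitted here.

```python
import math

def reward(w, x):
    """Linear reward: r(x) = w · x (stand-in for a learned network)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """Fit w so that P(better > worse) = sigmoid(r_better - r_worse)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = reward(w, better) - reward(w, worse)
            p = 1.0 / (1.0 + math.exp(-diff))   # predicted preference prob
            g = 1.0 - p                          # d log-likelihood / d diff
            for i in range(dim):
                w[i] += lr * g * (better[i] - worse[i])
    return w

# Synthetic preferences: the first feature is what humans "like".
pairs = [([1.0, 0.0], [0.0, 1.0]),
         ([2.0, 1.0], [1.0, 2.0])]
w = train_reward_model(pairs, dim=2)
```

After training, the learned reward ranks unseen items consistently with the preference data, which is exactly the signal the RL stage then maximizes.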
BERT
Left-to-right language models condition only on preceding tokens, so they miss right context. Mask random tokens and train the model to predict them from the full bidirectional context.
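The data-corruption side of masked-LM pretraining can be sketched as below, following BERT's 15% masking rate and 80/10/10 replacement rule (the vocabulary and sentence are invented examples; the transformer that consumes these inputs is omitted):

```python
import random

def mask_for_mlm(tokens, vocab, mask_prob=0.15, seed=0):
    """BERT-style corruption. Returns (inputs, labels):
    labels[i] is the original token where the model must predict, else None."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)                      # predict original token here
            r = rng.random()
            if r < 0.8:
                inputs.append("[MASK]")             # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(rng.choice(vocab))    # 10%: random token
            else:
                inputs.append(tok)                  # 10%: keep unchanged
        else:
            inputs.append(tok)
            labels.append(None)                     # position not scored
    return inputs, labels

tokens = "the cat sat on the mat".split()
vocab = ["the", "cat", "sat", "on", "mat", "dog"]
inputs, labels = mask_for_mlm(tokens, vocab)
```

Because prediction targets come from both directions of the surrounding text, the encoder is free to attend bidirectionally, unlike a left-to-right language model.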