Large Language Models (LLMs)
Topics
Notes
Linked
Better Think Thrice - Learning to Reason Causally with Double Counterfactual Consistency
How to make LLMs more causally consistent.
OMNI - Open-endedness via Models of human Notions of Interestingness
How to measure human notions of interestingness for learning agents without hand-coded formulas.
Emergent Misalignment in LLMs
Fine-tuning LLMs on narrow tasks can unexpectedly produce misaligned behavior on unrelated tasks.
Direct Preference Optimization (DPO)
Learn directly from a preference dataset without the complicated RL setup (no separate reward model or PPO loop).
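A minimal sketch of the DPO objective for a single preference pair: the loss is the negative log-sigmoid of the scaled difference between the policy-vs-reference log-ratios of the chosen and rejected responses. The function name and scalar sequence-level log-prob inputs are illustrative; real implementations work with token-level log-probs over batches.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    logp_*     : sequence log-prob under the policy being trained
    ref_logp_* : sequence log-prob under the frozen reference model
    beta       : strength of the implicit KL constraint to the reference
    """
    chosen_logratio = logp_chosen - ref_logp_chosen
    rejected_logratio = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)): small when the policy prefers the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference on both responses the margin is zero and the loss is log 2; raising the chosen response's log-prob relative to the reference drives the loss down, which is the preference signal replacing the RL reward.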
Group Relative Policy Optimization (GRPO)
Avoid learning an explicit value function in the RL alignment setup by computing advantages relative to a group of completions sampled for the same prompt.
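The core GRPO idea of replacing a learned value baseline with within-group reward statistics can be sketched as follows (function name is illustrative; a full trainer would feed these advantages into a clipped policy-gradient update):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within the group of completions sampled
    for one prompt: advantage_i = (r_i - mean) / (std + eps).

    This group baseline stands in for the value function that
    PPO-style setups would otherwise have to learn.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

Completions scoring above the group mean get positive advantages and are reinforced; below-mean completions are suppressed, with no extra value network to train.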