These are my working notes on various topics in machine learning and AI, ranging from foundational ideas to recent research. I write them for myself as a thinking tool - they are opinionated, sometimes incomplete, and updated over time. They are also heavily interlinked, so following connections is often the best way to explore.
- Activation Functions
- Adaptive Learning Rate Optimizers
- Advantage Functions
- Autoencoders
- Autoregressive Generation and KV Caching in Transformers
- Autoregressive Models
- BERT
- BLEU
- BM25
- Backpropagation
- Backpropagation Through Time (BPTT)
- Basis Functions
- Bayesian Estimation
- Bayesian Linear Regression
- Bayesian Model Selection with Model Evidence
- Beam Search
- Bellman Equation and Value Functions
- Bias vs Variance in Machine Learning
- Boltzmann Machines
- Bradley-Terry Model
- Byte Pair Encoding
- CNNs for NLP
- Calibration
- Capsule Networks (CapsNet)
- Challenges of GANs
- Challenges of optimizing deep models
- Class Imbalance
- Collaborative filtering
- Compositional semantics and sentence representations
- Compressed Sensing
- Conditional GAN
- Confident Learning - Principled Data Cleaning
- Conformal Prediction
- Contrastive Divergence
- Control Variates
- Convolution
- Convolutional Neural Networks (CNN)
- Coreference Resolution
- Counterfactual Evaluation and LTR
- Covariate Shift
- Cross Validation
- Cross entropy
- Decision Theory
- Deep Q-Learning
- Deep Supervision with Recursion
- Depth and Trainability
- Direct Preference Optimization (DPO)
- Discrete Fourier Transform
- Discriminant Functions
- Disentangled Representations
- Distant Supervision
- Distribution Shift
- Dropout
- Dyna-Q - Planning and Learning
- Dynamic Programming (RL)
- Eligibility Trace
- Emergent Misalignment in LLMs
- Energy based models
- Ensemble Methods
- Equivalent Kernel
- Expectation Maximization
- Expected Reciprocal Rank
- Fisher Information
- Focal Loss
- GRU
- Gaussian Distribution
- Gaussian Mixture Model
- Gaussian Processes
- Generalized Advantage Estimate
- Generative Adversarial Networks
- Graph Convolutional Networks (GCN)
- Group Equivariant Convolutional Neural Networks
- Grouped Query Attention (GQA)
- Harris Corner Detection
- Hierarchical Reasoning Model (HRM)
- High-Dimensional Dot Product Normalization
- Hopfield Networks
- Hough Transform
- Importance Sampling
- Incremental Implementation of Estimating Action Values
- InfoGAN
- Inverse Reinforcement Learning
- Jensen's Inequality
- Jensen–Shannon Divergence
- K-Means
- KL Divergence
- Kernel Methods
- LSTM
- Lagrange Multipliers
- LambdaRank
- Latent Variable Models
- Layer Normalization
- Learning to Defer
- Learning to Rank
- Least squares for classification
- ListNet and ListMLE
- Logistic Regression
- Loss Functions
- MAML - Model-Agnostic Meta-Learning
- MAP-Elites
- Markov Decision Processes
- Markov Reward Processes
- Maximum A Posteriori (MAP)
- Maximum Entropy Principle
- Maximum Likelihood Estimation
- Maximum Mean Discrepancy (MMD)
- Meta Learning
- Mixture of Experts
- Mixture of Experts in Transformers (MoE)
- Model Based Reinforcement Learning
- Model Complexity and Occam's Razor
- Model Free Reinforcement Learning
- Monte-Carlo Estimation
- Monte-Carlo RL Methods
- Monte-Carlo Tree Search
- Multi-Armed Bandits
- Multi-Head Latent Attention (MLA)
- Multi-Network Training with Moving Average Target
- Natural Policy Gradient
- Normalization
- Normalizing Flows
- Off-policy learning with approximation
- On-policy learning with approximation
- Online Evaluation and LTR
- PGT Actor-Critic
- PPO - Proximal Policy Optimization
- Partial Observability
- Pathwise Gradient Estimator
- Perceptron
- PixelRNN
- Policy Gradient
- Polyloss
- Positional Encoding
- Principal Component Analysis (PCA)
- Prioritized Sweeping
- Probabilistic Generative Models
- REINFORCE - Monte Carlo Policy Gradient
- REINFORCE - Score Function Estimator
- RLHF - Reinforcement Learning with Human Feedback
- RMSNorm
- RankNet
- ReLU
- Recurrent Neural Networks (RNN)
- Regularized Least Squares
- Reinforcement Learning
- Reinforcement Learning Problem Setup
- Rotary Position Embeddings (RoPE)
- Self-Attention Mechanism
- Semi-Markov Decision Processes
- SentencePiece - Unigram LM Encoding
- Similarity Measures
- Singular Value Decomposition (SVD)
- State Update Functions in Partially Observable MDP
- Stochastic Gradient Descent
- Stochastic Gradients
- Support Vector Machines (SVM)
- TRPO - Trust-Region Policy Optimization
- Temporal Difference Learning
- Tiny Reasoning Model (TRM)
- Tokenization
- Transformers
- Uncertainty in Machine Learning
- Variational Autoencoders
- Variational Inference
- Weight Initialization in Deep Neural Networks
- Why Generative Models
- Why implicit density models