Model-Agnostic Meta-Learning (MAML)

Created May 7, 2021 · Updated March 4, 2026

MAML attempts to answer the question: how can we find an initialization for the meta-learner that is not only useful for adapting to various problems, but can also be adapted quickly (in a small number of steps) and efficiently (using only a few examples)?
MAML optimizes for a set of parameters such that, when a gradient step is taken with respect to a particular task i, the resulting parameters are close to the optimal parameters for task i.
It makes no assumptions about the form of the model, beyond the model being trainable with gradient descent.
It introduces no additional parameters for meta-learning and is trained with Stochastic Gradient Descent.
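
The inner and outer updates described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the reference implementation: the task family (1-D regressions y = a * x), the scalar model f(x) = theta * x, and every function name and hyperparameter below are hypothetical choices made so that the meta-gradient can be written out by hand with the chain rule.

```python
import random

def loss(theta, xs, a):
    """Mean squared error of the model f(x) = theta * x on the task y = a * x."""
    return sum((theta * x - a * x) ** 2 for x in xs) / len(xs)

def loss_grad(theta, xs, a):
    """First derivative of the loss with respect to theta."""
    return sum(2.0 * (theta * x - a * x) * x for x in xs) / len(xs)

def loss_hess(xs):
    """Second derivative of the loss with respect to theta (constant for this model)."""
    return sum(2.0 * x * x for x in xs) / len(xs)

def inner_step(theta, xs, a, alpha):
    """Inner loop: one task-specific gradient step from the shared initialization."""
    return theta - alpha * loss_grad(theta, xs, a)

def maml_meta_grad(theta, support, query, a, alpha):
    """Derivative of the post-adaptation query loss with respect to the initialization."""
    adapted = inner_step(theta, support, a, alpha)
    # Chain rule through the inner step: d(adapted)/d(theta) = 1 - alpha * Hessian.
    d_adapted_d_theta = 1.0 - alpha * loss_hess(support)
    return loss_grad(adapted, query, a) * d_adapted_d_theta

def train(steps=500, alpha=0.1, beta=0.05, tasks_per_batch=4, seed=0):
    """Outer loop: SGD on the initialization, averaging meta-gradients over sampled tasks."""
    rng = random.Random(seed)
    theta = 5.0  # deliberately poor starting initialization
    for _ in range(steps):
        meta_grad = 0.0
        for _ in range(tasks_per_batch):
            a = rng.uniform(-1.0, 1.0)  # sample a task (its true slope)
            support = [rng.uniform(-1.0, 1.0) for _ in range(5)]
            query = [rng.uniform(-1.0, 1.0) for _ in range(5)]
            meta_grad += maml_meta_grad(theta, support, query, a, alpha)
        theta -= beta * meta_grad / tasks_per_batch
    return theta
```

Note that the only learned quantity is the initialization theta itself: the inner step reuses the same model and loss, so no extra meta-learner parameters appear. In a real setting an automatic differentiation library would compute the meta-gradient instead of the hand-derived expression used here.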

Advantages of MAML

MAML substantially outperforms a number of existing approaches on popular few-shot image classification benchmarks, Omniglot and MiniImageNet, including approaches that are much more complex or domain-specific.
When combined with policy gradient methods for reinforcement learning, MAML discovered a policy that let a simulated robot adapt its locomotion direction and speed in a single gradient update.

MAML is trained by backpropagating the loss through the within-episode gradient descent procedure. This normally requires computing second-order gradients, which can be expensive to obtain (in terms of both time and memory). For this reason, an approximation is often used in which the second-order terms that arise from differentiating through the within-episode descent steps are ignored, so the adapted parameters are treated as if they did not depend on the initialization through those gradients. This approximation is called first-order MAML.
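
The difference between the full and the first-order meta-gradient can be made concrete with a small sketch. It assumes a hypothetical scalar model f(x) = theta * x on a task y = a * x with squared-error loss, so that the Jacobian of the adapted parameter with respect to the initialization is a plain number; in a real network this factor is a product of matrices involving Hessians, which is where the cost comes from. All names and values are illustrative.

```python
def grad(theta, xs, a):
    """Gradient of the mean squared error of f(x) = theta * x on the task y = a * x."""
    return sum(2.0 * (theta - a) * x * x for x in xs) / len(xs)

def hess(xs):
    """Second derivative of the same loss with respect to theta."""
    return sum(2.0 * x * x for x in xs) / len(xs)

def meta_grads(theta, support, query, a, alpha, inner_steps):
    """Return (full, first_order) meta-gradients for one task."""
    adapted = theta
    jacobian = 1.0  # d(adapted)/d(theta), accumulated over the inner steps
    for _ in range(inner_steps):
        adapted = adapted - alpha * grad(adapted, support, a)
        jacobian *= 1.0 - alpha * hess(support)  # second-order contribution
    query_grad = grad(adapted, query, a)
    full = query_grad * jacobian   # backpropagates through the inner steps
    first_order = query_grad       # treats d(adapted)/d(theta) as 1
    return full, first_order

full, first_order = meta_grads(
    theta=2.0, support=[1.0, -0.5], query=[0.8], a=0.5, alpha=0.1, inner_steps=3
)
```

In this toy case the two gradients differ only by the scalar factor (1 - alpha * hess) raised to the number of inner steps, so they point in the same direction. That is not guaranteed for models with more than one parameter.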

Proto-MAML combines the complementary strengths of Prototypical Networks and MAML.
Allowing gradients to flow through the Prototypical Network-equivalent linear layer initialization significantly helps the optimization of this model, which outperforms vanilla first-order MAML (fo-MAML) by a large margin.
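
The "Prototypical Network-equivalent linear layer" can be illustrated with a short sketch. It relies on the identity -||x - c_k||^2 = 2 c_k . x - ||c_k||^2 - ||x||^2: because the last term is the same for every class, classifying by negative squared Euclidean distance to the class prototype c_k gives the same softmax output as a linear layer with weights W_k = 2 c_k and bias b_k = -||c_k||^2. The function names and the use of plain Python lists as embeddings are assumptions made for illustration; an actual model would compute the embeddings with a network and keep the initialization differentiable.

```python
def prototypes(embeddings, labels, num_classes):
    """Mean embedding per class, computed from the support set."""
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for vec, label in zip(embeddings, labels):
        counts[label] += 1
        for j, value in enumerate(vec):
            sums[label][j] += value
    return [[s / counts[k] for s in sums[k]] for k in range(num_classes)]

def proto_linear_init(protos):
    """Linear-layer initialization equivalent to a prototype classifier."""
    weights = [[2.0 * value for value in c] for c in protos]   # W_k = 2 * c_k
    biases = [-sum(value * value for value in c) for c in protos]  # b_k = -||c_k||^2
    return weights, biases

def linear_logits(x, weights, biases):
    """Output of the linear layer for a single embedding x."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def neg_sq_dists(x, protos):
    """Prototypical Network logits: negative squared distance to each prototype."""
    return [-sum((v - p) ** 2 for v, p in zip(x, c)) for c in protos]
```

Starting the task-specific output layer from these values means the model behaves like a Prototypical Network before any inner-loop step, and the subsequent gradient steps can then fine-tune it in the usual MAML fashion.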