Meta Learning

Created May 28, 2021 · Updated March 4, 2026

A meta-learning model is trained over a variety of learning tasks and optimized for the best performance on a distribution of tasks, including potentially unseen tasks.
In meta-learning, each dataset (task) is treated as a single data sample.
Meta-learning has shown many promising results in Computer Vision and has started to make its way into Natural Language Processing.
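To make "one dataset is one data sample" concrete, here is a minimal sketch of how a single few-shot task (often called an episode) might be drawn from a labeled pool. The function name `sample_episode`, its parameters, and the toy pool are all illustrative assumptions, not part of any particular library.

```python
import random
from collections import defaultdict

def sample_episode(examples, n_way, k_shot, n_query, rng):
    """Sample one N-way K-shot task (episode) from a labeled pool.

    `examples` is a list of (x, label) pairs. Returns (support, query),
    two lists of (x, episode_label) pairs with labels remapped to 0..n_way-1.
    """
    by_label = defaultdict(list)
    for x, label in examples:
        by_label[label].append(x)
    # Only classes with enough examples can fill both support and query sets.
    eligible = sorted(c for c, xs in by_label.items() if len(xs) >= k_shot + n_query)
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for episode_label, c in enumerate(classes):
        xs = rng.sample(by_label[c], k_shot + n_query)
        support += [(x, episode_label) for x in xs[:k_shot]]
        query += [(x, episode_label) for x in xs[k_shot:]]
    return support, query

# Toy pool (assumed data): 5 classes with 10 examples each; x is just an integer id.
pool = [(c * 100 + i, c) for c in range(5) for i in range(10)]
rng = random.Random(0)
support, query = sample_episode(pool, n_way=3, k_shot=2, n_query=4, rng=rng)
```

Each call returns one task: the model adapts on `support` and is evaluated on `query`, so a training batch is a batch of such tasks rather than a batch of individual examples.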

Approaches

Mainly categorized into three approaches: metric-based, model-based, and optimization-based.

Metric-based: the main idea is to learn a good metric space in which new examples are compared to examples already seen.
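$P_{\theta}(y \mid \mathbf{x})$ is modeled as $\sum_{(\mathbf{x}_{i}, y_{i}) \in S} k_{\theta}(\mathbf{x}, \mathbf{x}_{i})\, y_{i}$, where $S$ is the support set of labeled examples and $k_{\theta}$ is a kernel function measuring the similarity between $\mathbf{x}$ and $\mathbf{x}_{i}$.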
A common approach is to train a siamese network using gradient descent and then use a comparison scheme such as k-NN or k-Means on the learned embeddings.
Other works include Matching Networks (Vinyals et al., 2016), Relation Network (RN) (Sung et al., 2018), and Prototypical Networks (Snell, Swersky & Zemel, 2017).
Works well for few-shot classification, but it is not known how well it works for regression or RL.
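The kernel-weighted sum over the support set can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the kernel here is a softmax over negative squared Euclidean distances, and the inputs are used directly as embeddings, whereas a real metric-based model would first pass them through a learned embedding network. The name `kernel_predict` and the toy data are hypothetical.

```python
import numpy as np

def kernel_predict(x, support_x, support_y, n_classes):
    """Return P(y | x) as a kernel-weighted sum of one-hot support labels."""
    # Assumed kernel: softmax over negative squared Euclidean distances,
    # so the weights k(x, x_i) are positive and sum to 1.
    sq_dist = np.sum((support_x - x) ** 2, axis=1)
    logits = -sq_dist
    weights = np.exp(logits - logits.max())  # subtract max for numerical stability
    weights /= weights.sum()
    one_hot = np.eye(n_classes)[support_y]   # shape: (n_support, n_classes)
    return weights @ one_hot

# Toy support set (assumed data): two points per class in 2-D.
support_x = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 2.9]])
support_y = np.array([0, 0, 1, 1])
probs = kernel_predict(np.array([0.2, 0.1]), support_x, support_y, n_classes=2)
```

Because the weights sum to 1 and the labels are one-hot, `probs` is a valid distribution over classes, and a query close to the class-0 support points gets most of its mass on class 0.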

In optimization-based approaches, one network (the meta-learner) can learn to update another network (the learner).
In the LSTM Meta-Learner, an LSTM is used because it remembers how it previously updated the learner model (think of how momentum works).
MAML (Model-Agnostic Meta-Learning) does not assume any specific model and is compatible with any model that learns through gradient descent.
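As a rough illustration of the MAML idea, the sketch below meta-learns the initial weight of a one-parameter linear model `y = w * x` over toy regression tasks with different slopes. Everything here is an assumption chosen to keep the example small: the scalar model, the mean-squared-error loss, the hand-derived gradients, the task distribution, and the learning rates. With a single parameter the second-order term of the meta-gradient can be written analytically, so no autodiff library is needed.

```python
import numpy as np

def loss(w, x, y):
    """Mean squared error of the scalar linear model y_hat = w * x."""
    return np.mean((w * x - y) ** 2)

def grad(w, x, y):
    """Derivative of the MSE loss with respect to w."""
    return 2.0 * np.mean(x * (w * x - y))

def hessian(x):
    """Second derivative of the MSE loss with respect to w."""
    return 2.0 * np.mean(x ** 2)

def maml_step(w, tasks, inner_lr, outer_lr):
    """One meta-update: adapt on each support set, then differentiate the
    query loss of the adapted weight with respect to the initial weight."""
    meta_grad = 0.0
    for (xs, ys), (xq, yq) in tasks:
        w_adapted = w - inner_lr * grad(w, xs, ys)  # inner-loop step
        # Chain rule: d(w_adapted)/dw = 1 - inner_lr * hessian(support)
        meta_grad += grad(w_adapted, xq, yq) * (1.0 - inner_lr * hessian(xs))
    return w - outer_lr * meta_grad / len(tasks)

def make_task(slope, rng, n=10):
    """Toy task (assumed): y = slope * x, split into support and query sets."""
    x = rng.uniform(-1.0, 1.0, size=2 * n)
    y = slope * x
    return (x[:n], y[:n]), (x[n:], y[n:])

rng = np.random.default_rng(0)
w = 0.0
for _ in range(200):
    slopes = rng.uniform(1.0, 3.0, size=4)  # a batch of 4 tasks per meta-step
    tasks = [make_task(a, rng) for a in slopes]
    w = maml_step(w, tasks, inner_lr=0.1, outer_lr=0.1)
```

After meta-training, `w` serves as an initialization from which one inner gradient step on a new task's support set lowers that task's query loss. For real models the same structure applies, but the meta-gradient is usually obtained by differentiating through the inner step with an autodiff framework.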