Meta Learning

  • A meta-learning model is trained over a variety of learning tasks and optimized for the best performance on a distribution of tasks, including potentially unseen tasks.
  • In meta-learning, each dataset (task) is treated as a single data sample.
  • Has shown many promising results in computer vision and has started to make its way into natural language processing.
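
To make the "one dataset is one data sample" idea concrete, here is a minimal sketch of episodic sampling for few-shot classification. The function name `sample_episode`, the toy `pool`, and the N-way/K-shot sizes are illustrative assumptions, not part of any particular library or paper.

```python
import random

def sample_episode(data_by_class, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot task: a support set and a query set."""
    # Pick which classes this task is about.
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        # Sample without replacement so support and query never overlap.
        items = rng.sample(data_by_class[cls], k_shot + n_query)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return support, query

# Toy pool: 5 classes with 10 placeholder examples each (hypothetical data).
pool = {f"class_{c}": [f"c{c}_x{i}" for i in range(10)] for c in range(5)}
rng = random.Random(0)

# A meta-batch is a list of tasks; each task plays the role of one data sample.
meta_batch = [
    sample_episode(pool, n_way=3, k_shot=2, n_query=4, rng=rng)
    for _ in range(4)
]
```

Each element of `meta_batch` is a small dataset of its own, so a meta-learner that iterates over `meta_batch` sees tasks where an ordinary learner would see individual examples.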

Approaches

Mainly categorized into three approaches:

Model-based

  • Makes no assumption about the form of $P_{\theta}(y \mid \mathbf{x})$
  • Instead uses a model designed specifically for fast learning
  • Commonly used architectures are recurrent neural networks (RNNs) and Neural Turing Machines (NTMs)

Metric-based

  • The main idea is to learn a good metric space in which new examples can be compared to examples already seen.
  • $P_{\theta}(y \mid \mathbf{x})$ is modeled as $\sum_{(\mathbf{x}_{i}, y_{i}) \in S} k_{\theta}(\mathbf{x}, \mathbf{x}_{i}) \, y_{i}$, where $S$ is the support set and $k_{\theta}$ is a kernel function measuring the similarity between $\mathbf{x}$ and $\mathbf{x}_{i}$.
  • A common approach is to train a Siamese network with gradient descent and then use a comparison scheme such as k-NN or k-means.
  • Other works include Matching Networks (Vinyals et al., 2016), Relation Network (RN) (Sung et al., 2018), and Prototypical Networks (Snell, Swersky & Zemel, 2017).
  • Works well for few-shot classification, but it is not yet known how well it works for regression or reinforcement learning (RL).
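
The kernel-weighted sum over the support set can be sketched as follows. This is a minimal illustration under stated assumptions: the kernel is a softmax over negative squared Euclidean distances, and inputs are used directly rather than passed through a learned embedding network. The published methods listed above each use their own embedding and similarity function; `kernel_weights` and `predict` are hypothetical names.

```python
import math

def kernel_weights(x, support_xs):
    """Assumed kernel: softmax over negative squared Euclidean distances."""
    scores = [-sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in support_xs]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(x, support, n_classes):
    """P(y | x) = sum over (x_i, y_i) in S of k(x, x_i) * onehot(y_i)."""
    xs = [xi for xi, _ in support]
    probs = [0.0] * n_classes
    for w, (_, yi) in zip(kernel_weights(x, xs), support):
        probs[yi] += w  # each support example votes for its own label
    return probs

# Toy support set: two points per class in 2-D (hypothetical data).
support = [((0.0, 0.0), 0), ((0.2, 0.1), 0), ((3.0, 3.0), 1), ((2.8, 3.1), 1)]
probs = predict((0.1, 0.0), support, n_classes=2)
```

Because the weights are normalized, `probs` sums to one, and a query near the class-0 support points receives most of its probability mass on class 0. In a trained metric-based model, the distance would be computed between learned embeddings of the inputs.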

Optimization-based

  • One network (meta-learner) learns to update another network (the learner)
  • In the LSTM Meta-Learner, an LSTM is used because it remembers how it previously updated the learner model (think of how momentum works).
  • MAML (Model-Agnostic Meta-Learning) does not assume any specific model architecture and is compatible with any model that learns through gradient descent.
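
The inner/outer loop structure of MAML can be sketched on a deliberately tiny problem. The sketch assumes a one-parameter model `y = w * x` with squared-error loss, so the gradient and second derivative can be written by hand and the outer gradient can be propagated through one inner step exactly. The task distribution (slopes drawn from 1 to 3), the step sizes `alpha` and `beta`, and all function names are illustrative assumptions; a practical implementation would rely on an automatic differentiation framework.

```python
import random

def loss(w, data):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad(w, data):
    """Derivative of the loss with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in data) / len(data)

def hessian(data):
    """Second derivative of the loss with respect to w (independent of w)."""
    return sum(2 * x * x for x, _ in data) / len(data)

def sample_task(rng, n=10):
    """A task is a slope a; returns noise-free support and query sets."""
    a = rng.uniform(1.0, 3.0)
    pts = [(x, a * x) for x in (rng.uniform(-1, 1) for _ in range(2 * n))]
    return pts[:n], pts[n:]

def maml_train(steps=500, tasks_per_step=5, alpha=0.1, beta=0.05, seed=0):
    rng = random.Random(seed)
    w = 0.0  # the meta-learned initialization
    for _ in range(steps):
        meta_grad = 0.0
        for _ in range(tasks_per_step):
            support, query = sample_task(rng)
            # Inner step: adapt to the task using its support set.
            w_adapted = w - alpha * grad(w, support)
            # Outer gradient: query loss at the adapted weight, chained
            # through the inner step (d w_adapted / d w = 1 - alpha * H).
            meta_grad += grad(w_adapted, query) * (1 - alpha * hessian(support))
        # Outer step: update the initialization itself.
        w -= beta * meta_grad / tasks_per_step
    return w

w_meta = maml_train()
```

With these settings the learned initialization `w_meta` settles in the middle of the slope range, from which a single inner gradient step moves toward any sampled task. Dropping the `(1 - alpha * hessian(support))` factor gives the first-order approximation of the outer gradient.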

References

  1. Learning to Learn, Chelsea Finn, Jul 2017 https://bair.berkeley.edu/blog/2017/07/18/learning-to-learn/
  2. Meta-Learning: Learning to Learn Fast, Lilian Weng, Nov 2018 https://lilianweng.github.io/lil-log/2018/11/30/meta-learning.html