Meta Learning
- A meta-learning model is trained over a variety of learning tasks and optimized for the best performance on a distribution of tasks, including potentially unseen tasks.
- In meta-learning, each dataset is treated as a single data sample.
- Has shown many promising results in Computer Vision and has started to make its way into Natural Language Processing
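Treating a dataset as one data sample usually means training on small tasks, often called episodes. The sketch below assumes the common N-way K-shot setup, which these notes do not spell out: each episode picks N classes, K labeled support examples per class, and a few query examples per class to evaluate on. Labels are re-indexed per episode so every episode looks like a fresh task. `sample_episode` and the toy string data are hypothetical.

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way, k_shot, n_query, rng=random):
    """Sample one N-way K-shot episode (a "dataset as one data sample").

    `dataset` is assumed to be a list of (x, label) pairs. Returns
    (support, query) lists of (x, episode_label) pairs, with labels
    re-indexed to 0..n_way-1 so each episode is a fresh task.
    """
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)
    # Keep only classes with enough examples for both support and query sets.
    eligible = [c for c, xs in by_class.items() if len(xs) >= k_shot + n_query]
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for new_label, c in enumerate(classes):
        xs = rng.sample(by_class[c], k_shot + n_query)  # no overlap between the two sets
        support += [(x, new_label) for x in xs[:k_shot]]
        query += [(x, new_label) for x in xs[k_shot:]]
    return support, query

# Toy usage: 10 classes with 20 (hypothetical) examples each, 5-way 1-shot.
data = [(f"img_{c}_{i}", c) for c in range(10) for i in range(20)]
support, query = sample_episode(data, n_way=5, k_shot=1, n_query=3, rng=random.Random(0))
print(len(support), len(query))  # 5 15
```

A meta-learner is then trained over many such episodes, so the "training set" is a distribution of tasks rather than a single labeled dataset.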
Approaches
Meta-learning methods are mainly categorized into three approaches:
Model-based
- Makes no assumption about the form of $P_{\theta}(y \mid \mathbf{x})$
- Instead, uses a model designed specifically for fast learning
- Common choices are RNNs and Neural Turing Machines (NTMs)
Metric-based
- The main idea is to learn a good metric space in which to compare new examples to examples already seen.
- $P_{\theta}(y \mid \mathbf{x})$ is modeled as $\sum_{(\mathbf{x}_{i}, y_{i}) \in S} k_{\theta}(\mathbf{x}, \mathbf{x}_{i})\, y_{i}$, a weighted sum of the labels in the support set $S$, where the kernel $k_{\theta}$ measures the similarity between two examples
- A common approach is to train a siamese network with gradient descent and then compare examples in the learned space using a scheme such as k-NN or k-means.
- Other notable works include Matching Networks (Vinyals et al., 2016), Relation Networks (RN) (Sung et al., 2018), and Prototypical Networks (Snell, Swersky & Zemel, 2017)
- Works well for few-shot classification, but it is not yet clear how well it works for regression or reinforcement learning (RL)
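A minimal NumPy sketch of the kernel-weighted sum for $P_{\theta}(y \mid \mathbf{x})$, assuming the kernel is a softmax over cosine similarities, as in the attention kernel of Matching Networks (simplified here to a single shared embedding function). `predict_proba`, `embed`, and the toy 2-D points are hypothetical; in a real model `embed` would be a trained network rather than the identity.

```python
import numpy as np

def cosine_similarity(a, b):
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def predict_proba(x, support_x, support_y, n_classes, embed=lambda z: z):
    """P(y | x) = sum over (x_i, y_i) in S of k(x, x_i) * onehot(y_i).

    k is a softmax over cosine similarities in embedding space. `embed`
    stands in for a learned embedding; the identity default is for
    illustration only.
    """
    sims = cosine_similarity(embed(x), embed(support_x))  # (n_query, n_support)
    w = np.exp(sims - sims.max(axis=1, keepdims=True))   # stable softmax numerator
    k = w / w.sum(axis=1, keepdims=True)                  # each row sums to 1
    onehot = np.eye(n_classes)[support_y]                 # (n_support, n_classes)
    return k @ onehot                                     # (n_query, n_classes)

support_x = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
support_y = np.array([0, 0, 1])
probs = predict_proba(np.array([[1.0, 0.05]]), support_x, support_y, n_classes=2)
print(probs.argmax(axis=1))  # [0]
```

Because the kernel weights sum to 1 and the labels are one-hot, each output row is a valid probability distribution. Changing the kernel changes the method: Prototypical Networks, for instance, compare the query to per-class mean embeddings using squared Euclidean distance.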
Optimization-based
- One network (the meta-learner) learns to update another network (the learner)
- In the LSTM Meta-Learner, an LSTM is used because it can keep a history of how it previously updated the learner model (think how momentum works)
- MAML (Model-Agnostic Meta-Learning) is not tied to any specific model architecture and is compatible with any model that learns through gradient descent
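A sketch of MAML with one inner gradient step, assuming a hypothetical toy family of 1-D linear regression tasks $y = a x + b$. For each task, the inner loop adapts $\theta$ on the support set, $\theta' = \theta - \alpha \nabla L_{S}(\theta)$. The outer loop then updates $\theta$ with the gradient of the query loss $L_{Q}(\theta')$, taken through the inner step. With a linear model and MSE loss the Hessian $H_S$ is constant, so this meta-gradient has the closed form $(I - \alpha H_S)\nabla L_{Q}(\theta')$. That lets plain NumPy compute exact second-order MAML without autodiff. `sample_task`, `maml`, `adapted_loss`, and all hyperparameters are illustrative choices.

```python
import numpy as np

def mse_grad(theta, X, y):
    """Gradient of the mean squared error mean((X theta - y)^2) w.r.t. theta."""
    return 2.0 / len(y) * X.T @ (X @ theta - y)

def sample_task(rng, n=10):
    """Hypothetical task family: y = a * x + b with task-specific a, b."""
    a, b = rng.uniform(0.5, 2.0), rng.uniform(-1.0, 1.0)
    def draw(m):
        x = rng.uniform(-1.0, 1.0, size=m)
        X = np.stack([x, np.ones_like(x)], axis=1)  # features [x, 1]
        return X, a * x + b
    return draw(n), draw(n)  # (support set, query set)

def maml(n_iters=500, meta_batch=8, alpha=0.1, beta=0.05, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    for _ in range(n_iters):
        meta_grad = np.zeros_like(theta)
        for _ in range(meta_batch):
            (Xs, ys), (Xq, yq) = sample_task(rng)
            # Inner loop: one gradient step on the task's support set.
            theta_task = theta - alpha * mse_grad(theta, Xs, ys)
            # Outer gradient through the inner step (exact for a linear model):
            # d/dtheta L_q(theta - alpha * g_s(theta)) = (I - alpha * H_s) grad L_q(theta_task)
            H_s = 2.0 / len(ys) * Xs.T @ Xs
            meta_grad += (np.eye(2) - alpha * H_s) @ mse_grad(theta_task, Xq, yq)
        theta -= beta * meta_grad / meta_batch  # outer (meta) update
    return theta

def adapted_loss(theta, rng, alpha=0.1, n_tasks=200):
    """Average query-set MSE after one inner gradient step from `theta`."""
    total = 0.0
    for _ in range(n_tasks):
        (Xs, ys), (Xq, yq) = sample_task(rng)
        theta_task = theta - alpha * mse_grad(theta, Xs, ys)
        total += np.mean((Xq @ theta_task - yq) ** 2)
    return total / n_tasks

theta = maml()
# Same seed for both, so the two initializations are scored on identical tasks.
meta_loss = adapted_loss(theta, np.random.default_rng(1))
zero_loss = adapted_loss(np.zeros(2), np.random.default_rng(1))
print(meta_loss, zero_loss)  # the meta-learned init should give the lower loss
```

Nothing in `maml` depends on the model beyond its gradients, which is the sense in which the method is model-agnostic. For nonlinear models, implementations typically rely on autodiff, or use the first-order approximation that drops the $\alpha H_S$ term.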
References
- Learning to Learn, Chelsea Finn, Jul 2017 https://bair.berkeley.edu/blog/2017/07/18/learning-to-learn/
- Meta-Learning: Learning to Learn Fast, Lilian Weng, Nov 2018 https://lilianweng.github.io/lil-log/2018/11/30/meta-learning.html