Meta-Learning for Data and Processing Efficiency | Computer Science Department at Princeton University

Report ID:

TR-006-19

Authors:

Ravi, Sachin

Date:

May 10, 2019

Pages:

Abstract:

Deep learning models have shown great success in a variety of machine learning
benchmarks; however, these models still lack the efficiency and flexibility of humans.
Current deep learning methods involve training on a large amount of data to produce
a model that can then specialize to the specific task encoded by the training data.
Humans, on the other hand, are able to learn new concepts throughout their lives
with comparatively little feedback. In order to bridge this gap, previous work has
suggested the use of meta-learning. Rather than learning how to do a specific task,
meta-learning involves learning how to learn and utilizing this knowledge to learn
new tasks more effectively. This thesis focuses on using meta-learning to improve the
data and processing efficiency of deep learning models when learning new tasks.
First, we discuss a meta-learning model for the few-shot learning problem, where
the aim is to learn a new classification task having unseen classes with few labeled
examples. We use an LSTM-based meta-learner model to learn both the initialization
and the optimization algorithm used to train another neural network and show that
our method compares favorably to nearest-neighbor approaches. The second part of
the thesis deals with improving the predictive uncertainty of models in the few-shot
learning setting. Using a Bayesian perspective, we propose a meta-learning method
which efficiently amortizes hierarchical variational inference across tasks, learning a
prior distribution over neural network weights so that a few steps of gradient descent
will produce a good task-specific approximate posterior. Finally, we focus on applying
meta-learning in the context of making choices that impact processing efficiency. When
training a network on multiple tasks, we have a choice between interactive parallelism
(training on different tasks one after another) and independent parallelism (using the
network to process multiple tasks concurrently). For the simulation environment
considered, we show that there is a trade-off between these two types of processing
choices in deep neural networks. We then discuss a meta-learning algorithm for an
agent to learn how to train itself with regard to this trade-off in an environment with
unknown serialization cost.
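
To make the first contribution more concrete, here is a minimal Python sketch of why an LSTM-style cell update can serve as an optimization algorithm. It is not the thesis implementation. The update has the form theta' = f * theta - i * grad, and with the forget gate f fixed at 1 and the input gate i fixed at a small constant it reduces to ordinary gradient descent. The names gated_update and loss_and_grad, the toy linear regression task, and the fixed gate values are all assumptions made for illustration. In a meta-learning setting the gate values and the initial theta would be learned across tasks, which this sketch does not attempt.

```python
import numpy as np


def gated_update(theta, grad, forget_gate, input_gate):
    """One LSTM-cell-style parameter update: theta' = f * theta - i * grad."""
    return forget_gate * theta - input_gate * grad


def loss_and_grad(theta, x, y):
    """Mean squared error of a linear model and its gradient w.r.t. theta."""
    residual = x @ theta - y
    loss = float(np.mean(residual ** 2))
    grad = 2.0 * x.T @ residual / len(y)
    return loss, grad


# Hypothetical toy task: five labeled examples, standing in for a few-shot task.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
y = x @ np.array([1.0, -2.0, 0.5])

# Assumption: zero initialization; a meta-learner would supply a learned one.
theta = np.zeros(3)

losses = []
for _ in range(20):
    loss, grad = loss_and_grad(theta, x, y)
    losses.append(loss)
    # Fixed gates (f = 1, i = 0.05) make this plain gradient descent.
    # A learned meta-learner would instead produce the gates at every step.
    theta = gated_update(theta, grad, forget_gate=1.0, input_gate=0.05)

print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Replacing the two constants with outputs of a trained network is what would turn this fixed rule into a learned optimizer. The sketch only shows the form of the update that such a network would control.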