Game Theory and Optimization in Boosting
Boosting is a central technique of machine learning, the branch of
artificial intelligence concerned with designing computer
programs that can build increasingly better models of reality as
they are presented with more data.
The theory of boosting is based on the observation that combining
several models with low predictive power can often lead to a significant
boost in the accuracy of the combined meta-model.
This approach, introduced about twenty years ago, has been a
prolific area of research, and has proved
immensely successful in practice.
However, despite extensive work, many basic questions about boosting
remain unanswered.
In this thesis, we increase our understanding of three such
theoretical aspects of boosting.
In Chapter 2 we study the convergence
properties of the most well known boosting algorithm, AdaBoost.
Rate bounds for this important algorithm are known only for special
situations that rarely hold in practice.
Our work guarantees that fast rates hold in all situations, and
the bounds we provide are optimal.
Apart from being important for practitioners, these bounds also have
implications for the statistical properties of AdaBoost.
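
As a point of reference for the algorithm discussed above, the following is a minimal sketch of the standard textbook form of AdaBoost with one-dimensional threshold classifiers ("stumps") as the weak models. It is an illustration only, not the analysis or any code from this thesis; the toy data set and all function names are assumptions made for the example.

```python
import math

def stump_predict(x, threshold, sign):
    """Weak model: predict `sign` left of the threshold and `-sign` otherwise."""
    return sign if x < threshold else -sign

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, sign) for the lowest-error stump."""
    values = sorted(set(xs))
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    thresholds = [values[0] - 1] + midpoints + [values[-1] + 1]
    best = None
    for threshold in thresholds:
        for sign in (+1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best

def adaboost(xs, ys, rounds):
    """Run AdaBoost for `rounds` rounds; return a list of (alpha, threshold, sign)."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, sign = best_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, sign))
        # Increase the weight of misclassified points, decrease the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

def training_error(ensemble, xs, ys):
    """Fraction of training points the ensemble misclassifies."""
    wrong = sum(ensemble_predict(ensemble, x) != y for x, y in zip(xs, ys))
    return wrong / len(xs)

# Hypothetical toy data: no single stump fits these labels perfectly.
XS = list(range(10))
YS = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for rounds in (1, 2, 3):
    print(rounds, training_error(adaboost(XS, YS, rounds), XS, YS))
```

On this toy data the best single stump misclassifies three of the ten points, while the weighted vote of three stumps classifies all ten correctly. How quickly such progress is guaranteed in general is the kind of question the rate bounds address.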
Like AdaBoost, most boosting algorithms are used for classification
tasks, where the object
is to create a model that can categorize relevant input data into
one of a finite number of different classes.
The most commonly studied setting is binary classification, where
there are only two possible classes, although the tasks arising in
practice are almost always multiclass in nature.
In Chapter 3 we provide a broad and general
framework for studying boosting for multiclass classification.
Using this approach, we are able
to identify for the first time the minimum assumptions under which
boosting the accuracy is possible in the multiclass setting.
Such theory existed previously for boosting for binary
classification, but straightforward extensions of that to the
multiclass setting lead to assumptions that are either too strong or
too weak for boosting to be effectively possible.
We also design boosting algorithms using these minimal assumptions,
which work in more general situations than previous
algorithms that assumed too much.
In the final chapter, we study the problem of learning from expert
advice, which is closely related to boosting.
The goal is to extract useful advice from the opinions of a group of
experts even when there is no consensus among the experts.
Although algorithms with excellent guarantees have existed for this
task, these were only approximately optimal, and
exactly optimal strategies were known only when the experts gave
binary "yes/no" opinions.
Our work derives exactly optimal strategies when the experts provide
probabilistic opinions, which can be more nuanced than deterministic
ones.
In terms of boosting, this provides the optimal way of combining
individual models that attach confidence ratings to their predictions
indicating predictive quality.
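
To make the expert-advice setting concrete, the sketch below implements the classical exponentially weighted average forecaster for experts who report probabilities. This is one of the well-known approximately optimal strategies, not the exactly optimal strategy derived in this thesis; the data, the absolute-loss choice, the learning rate, and all names are assumptions made for the example.

```python
import math

def exponential_weights(expert_probs, outcomes, eta):
    """Exponentially weighted average forecaster under absolute loss.

    expert_probs[t][i] is the probability expert i assigns to outcome 1 in
    round t; outcomes[t] is 0 or 1. Returns the learner's cumulative loss
    and the list of cumulative losses of the individual experts.
    """
    n = len(expert_probs[0])
    weights = [1.0] * n
    learner_loss = 0.0
    expert_losses = [0.0] * n
    for probs, outcome in zip(expert_probs, outcomes):
        # Predict with the weight-averaged opinion of the experts.
        total = sum(weights)
        forecast = sum(w * p for w, p in zip(weights, probs)) / total
        learner_loss += abs(forecast - outcome)
        # Charge each expert its loss and shrink its weight accordingly.
        losses = [abs(p - outcome) for p in probs]
        expert_losses = [c + l for c, l in zip(expert_losses, losses)]
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return learner_loss, expert_losses

# Hypothetical data: a mostly reliable expert, an uninformative one that
# always says 0.5, and a mostly wrong one.
OUTCOMES = [1, 0, 1, 1, 0, 1, 0, 0]
EXPERT_PROBS = [
    (0.9, 0.5, 0.2), (0.2, 0.5, 0.8), (0.8, 0.5, 0.3), (0.7, 0.5, 0.4),
    (0.1, 0.5, 0.9), (0.9, 0.5, 0.1), (0.3, 0.5, 0.6), (0.2, 0.5, 0.7),
]
T, N = len(OUTCOMES), len(EXPERT_PROBS[0])
ETA = math.sqrt(8 * math.log(N) / T)  # standard tuning for a known horizon T

learner_loss, expert_losses = exponential_weights(EXPERT_PROBS, OUTCOMES, ETA)
regret = learner_loss - min(expert_losses)
print(learner_loss, expert_losses, regret)
```

For losses in [0, 1] that are convex in the prediction, this forecaster's regret against the best expert is known to be at most ln(N)/eta + eta*T/8, which the chosen value of eta minimizes. Guarantees of this kind are approximately optimal, in contrast to the exactly optimal strategies discussed above.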