
Leaving the Span

Date and Time
Wednesday, October 25, 2006 - 4:00pm to 5:30pm
Computer Science Small Auditorium (Room 105)
Manfred K. Warmuth, from University of California, Santa Cruz
Robert Schapire
When linear models are too simple, the following "kernel trick" is commonly used: expand the instances into a high-dimensional feature space and use any algorithm whose linear weight vector in feature space is a linear combination of the expanded instances. Linear models in feature space are typically non-linear in the original space and seemingly more powerful. Dot products can still be computed efficiently by means of a kernel function.
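As a rough illustration of the kernel trick described above, the following sketch compares an explicit feature expansion with the corresponding kernel evaluation. The quadratic feature map, the helper names (`expand`, `dot`, `kernel`), and the sample vectors are assumptions chosen for illustration; the talk does not commit to any particular kernel.

```python
from itertools import product

def expand(x):
    """Explicit degree-2 feature map: all ordered products x[i] * x[j]."""
    return [a * b for a, b in product(x, repeat=2)]

def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def kernel(x, z):
    """Quadratic kernel: the feature-space dot product, computed in the original space."""
    return dot(x, z) ** 2

x = [1.0, 2.0, 3.0]
z = [0.5, -1.0, 2.0]

explicit = dot(expand(x), expand(z))  # works with 9 expanded features
implicit = kernel(x, z)               # works with the 3 original features
```

Both values agree, but the kernel never builds the expanded vectors, which is what keeps the computation cheap when the feature space is very large.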

However, we discuss a simple sparse linear problem that is hard to learn with any algorithm that uses a linear combination of the embedded training instances as its weight vector, no matter what embedding is used. We show that these algorithms are inherently limited by the fact that after seeing k instances, only a weight space of dimension k can be spanned.
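The dimension-counting observation can be checked on a small example. The sketch below is only an illustration of that one fact, not of the hardness result itself: the three instances, the coefficient grid, and the `rank` helper are all hypothetical, and the instances stand in for already-embedded vectors.

```python
from fractions import Fraction
from itertools import product

def rank(rows):
    """Rank of a matrix (list of rows) via Gaussian elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# k = 3 hypothetical embedded instances in an 8-dimensional feature space.
instances = [
    [1, 0, 2, 0, 1, 0, 0, 3],
    [0, 1, 1, 0, 0, 2, 0, 1],
    [1, 1, 0, 1, 0, 0, 2, 0],
]

# Every weight vector the learner can produce is some linear combination of them.
weights = [
    [sum(c * inst[j] for c, inst in zip(coeffs, instances)) for j in range(8)]
    for coeffs in product((-1, 0, 1, 2), repeat=3)
]

span_rank = rank(weights)  # cannot exceed the number of instances seen
```

However many coefficient choices are tried, the resulting weight vectors never leave the subspace spanned by the three instances.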

Surprisingly, the same problem can be efficiently learned using the exponentiated gradient (EG) algorithm. Here the component-wise logarithms of the weights are essentially a linear combination of the training instances. This algorithm enforces "additional constraints" on the weights (all must be non-negative and sum to one), and in some cases these constraints alone force the rank of the weight space to grow as fast as 2^k.
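A minimal sketch of an EG-style update may make the role of the logarithms and the constraints more concrete. It assumes square loss, a fixed learning rate `eta`, and a hypothetical toy target that simply copies one component of a random sign vector; none of these choices are taken from the talk.

```python
import math
import random

def eg_step(w, x, y, eta=0.1):
    """One EG update for square loss; eta is an assumed learning rate."""
    y_hat = sum(wi * xi for wi, xi in zip(w, x))
    # Gradient of (y_hat - y)^2 with respect to w.
    grad = [2.0 * (y_hat - y) * xi for xi in x]
    # Multiplicative update: log(w_i) moves by -eta * grad_i, a multiple of x_i.
    unnorm = [wi * math.exp(-eta * g) for wi, g in zip(w, grad)]
    total = sum(unnorm)
    # Renormalizing keeps the weights non-negative and summing to one.
    return [u / total for u in unnorm]

rng = random.Random(0)
n, target = 16, 5      # hypothetical toy problem: the label copies x[target]
w = [1.0 / n] * n      # uniform start on the probability simplex
for _ in range(200):
    x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    w = eg_step(w, x, y=x[target])
```

Because each step multiplies the weights by an exponential of a multiple of the instance, the log-weights accumulate a linear combination of the instances, shifted by a common normalization term; the weights themselves are not confined to the span of the instances.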

(Joint work with S.V.N. Vishwanathan.)
