However we discuss a simple sparse linear problem that is
hard to learn with any algorithm that uses a linear
combination of the embedded training instances
as its weight vector, no matter what embedding is used.
We show that these algorithms are inherently limited
by the fact that after seeing *k* instances only a
weight space of dimension *k* can be spanned.

Surprisingly the same problem can be
efficiently learned using the exponentiated gradient (EG) algorithm:
Now the component-wise logarithms of the weights are essentially a
linear combination of the training instances. This algorithm enforces "additional constraints" on
the weights (all must be non-negative and sum to one) and in some
cases these constraints alone force the rank of the weight space to
grow as fast as *2 ^{k}*.

(Joint work with S.V.N. Vishwanathan!)