On the Compressed Sensing Properties of Word Embeddings
Distributed representations of words, or word embeddings, computed using large text corpora have become a popular way of encoding linguistic features for applications in natural language processing. However, their power, in terms of the information they encode and how this relates to performance on downstream tasks, is not theoretically understood. Drawing inspiration from results in compressed learning [7, 1], we present a remarkable empirical
property of word embeddings: they are more efficient than random matrices for sparse recovery of Bag-of-Words vectors from linear compression. We discuss how this result can be understood by introducing a new, efficiently verifiable compressed sensing property that guarantees exact recovery of nonnegative signals, drawing on geometric results connecting basis pursuit and neighborly polytopes. Finally, we analyze the extent to which different embeddings satisfy this property and connect these results to the performance of these representations on downstream tasks.
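To make the recovery setup concrete, the following is a minimal sketch of basis pursuit for nonnegative signals: recovering a sparse Bag-of-Words-style count vector x >= 0 from a linear compression y = Ax by solving a linear program. The dimensions, the Gaussian measurement matrix, and the use of `scipy.optimize.linprog` are illustrative assumptions for this sketch, not the embedding matrices studied in the paper.

```python
# Nonnegative basis pursuit sketch: recover a k-sparse vector x >= 0
# from linear measurements y = A x by solving
#     min 1^T x   s.t.   A x = y,  x >= 0,
# which is the l1-minimization (basis pursuit) problem restricted to the
# nonnegative orthant, expressed as a linear program.
# Dimensions and the Gaussian matrix below are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, k = 60, 30, 4  # ambient dimension, number of measurements, sparsity

# Ground-truth k-sparse nonnegative "Bag-of-Words" count vector.
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.integers(1, 5, size=k)

A = rng.standard_normal((m, n)) / np.sqrt(m)  # random compression matrix
y = A @ x_true                                # compressed measurements

# For x >= 0 the l1 norm is just the sum of entries, so basis pursuit
# reduces to this linear program.
res = linprog(c=np.ones(n), A_eq=A, b_eq=y, bounds=(0, None), method="highs")
x_hat = res.x

print("measurement residual:", np.linalg.norm(A @ x_hat - y))
print("recovery error:", np.linalg.norm(x_hat - x_true))
```

In the regime sketched here (k much smaller than m), the linear program typically recovers x exactly; the paper's question is for which measurement matrices A (random versus embedding-based) such exact recovery is guaranteed.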