TR-797-07
Maximum Entropy Density Estimation and Modeling Geographic Distributions of Species (thesis)
| Authors: | Dudik, Miroslav |
| Date: | August 2007 |
| Pages: | 186 |
The maximum entropy (maxent) approach, which is formally equivalent to maximum likelihood, is a widely used density-estimation method. When input datasets are small, maxent is likely to overfit. Overfitting can be eliminated by various smoothing techniques, such as regularization and constraint relaxation, but the theory explaining their properties is often missing or must be derived separately for each case. In this dissertation, we propose a unified treatment for a large and general class of smoothing techniques. We provide fully general guarantees on their statistical performance and propose optimization algorithms with complete convergence proofs. As special cases, we can easily derive performance guarantees for many known regularization types, including L1 and L2-squared regularization. Furthermore, our general approach enables us to derive entirely new regularization functions with superior statistical guarantees. The new regularization functions use information about the structure of the feature space, incorporate information about sample selection bias, and combine information across several related density-estimation tasks. We propose algorithms that solve a large and general subclass of generalized maxent problems, including all those discussed in the dissertation, and prove their convergence. Our convergence proofs generalize techniques based on information geometry and Bregman divergences, as well as those based more directly on compactness.
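To make the L1 special case concrete, here is one known formulation of L1-regularized maxent over a finite domain. It minimizes the log loss of a Gibbs distribution q(x) proportional to exp(lambda . f(x)), plus the penalty sum_j beta_j |lambda_j|. Its convex dual is maxent under the relaxed constraints |E_q[f_j] - empirical mean of f_j| <= beta_j, which is how regularization and constraint relaxation correspond. The Python sketch below rests on several assumptions: a small finite domain, a uniform prior, features with values in [0, 1], and hand-picked beta_j. It uses plain proximal gradient descent (ISTA), a standard solver chosen here for illustration, not the dissertation's own algorithms. All function names are hypothetical.

```python
import math


def gibbs(lam, features):
    """q(x) proportional to exp(lam . f(x)) over a finite domain, uniform prior."""
    scores = [sum(l * fj for l, fj in zip(lam, fx)) for fx in features]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


def expectations(q, features):
    """Model expectations E_q[f_j] for every feature j."""
    n = len(features[0])
    return [sum(qx * fx[j] for qx, fx in zip(q, features)) for j in range(n)]


def soft_threshold(v, t):
    """Proximal operator of t * |v| (shrinks v toward zero by t)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def l1_maxent(features, sample, beta, iters=5000):
    """Hypothetical helper: minimize log Z(lam) - lam . emp + sum_j beta_j |lam_j|.

    features: feature vector f(x) for each domain point x, entries in [0, 1]
    sample:   indices of observed domain points
    beta:     per-feature L1 weights beta_j
    """
    n = len(features[0])
    m = len(sample)
    emp = [sum(features[i][j] for i in sample) / m for j in range(n)]
    # The Hessian of log Z is the feature covariance under q. With features in
    # [0, 1], its largest eigenvalue is at most n / 4, so 4 / n is a safe step.
    step = 4.0 / n
    lam = [0.0] * n
    for _ in range(iters):
        q = gibbs(lam, features)
        # Gradient of the smooth part: model expectations minus empirical means.
        grad = [e - p for e, p in zip(expectations(q, features), emp)]
        lam = [soft_threshold(l - step * g, step * b)
               for l, g, b in zip(lam, grad, beta)]
    return lam, emp


if __name__ == "__main__":
    # Hypothetical toy domain: four points described by two binary features.
    features = [[0, 0], [1, 0], [0, 1], [1, 1]]
    sample = [1, 3, 3, 2, 3, 1]  # indices of observed domain points
    lam, emp = l1_maxent(features, sample, [0.05, 0.05])
    q = gibbs(lam, features)
    print("lambda:", lam)
    print("model expectations:", expectations(q, features))
    print("empirical means:", emp)
```

At convergence, each nonzero lambda_j satisfies E_q[f_j] = (empirical mean) - beta_j * sign(lambda_j), and each zero lambda_j satisfies the relaxed constraint. Choosing beta_j at least as large as the gap between the uniform model and the data keeps lambda_j at exactly zero, which illustrates the sparsity that L1 regularization induces.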