Modern datasets are rapidly growing in size and complexity, and this wealth of data holds promise for many transformational applications. Machine learning seems poised to deliver on this promise: the field has proposed and rigorously evaluated a wide range of data processing techniques over the past several decades. However, concerns over scalability and usability present major roadblocks to the wider adoption of these methods. In this talk, I will present work that addresses both concerns. In terms of scalability, my work relies on a careful application of divide-and-conquer methodology. In terms of usability, I focus on developing tools to diagnose the applicability of learning techniques and to autotune components of typical machine learning pipelines. I will discuss applications in the context of matrix factorization, estimator quality assessment, and genomic variant calling.
Ameet Talwalkar is a postdoctoral fellow in the Computer Science Division at UC Berkeley. He obtained a Ph.D. in Computer Science from the Courant Institute at New York University, and prior to that graduated summa cum laude from Yale University. His work addresses scalability and ease-of-use issues in the field of machine learning, as well as applications related to large-scale genomic sequencing analysis. He has won the Janet Fabri Prize for best doctoral dissertation and the Henning Biermann Award for exceptional service at NYU, received Yale's undergraduate prize in Computer Science, and is an NSF OCI postdoctoral scholar.