Diving into the Mysteries of Deep Learning
by Doug Hulette
Sanjeev Arora, the Charles C. Fitzmorris Professor in Computer Science, is exploring the most baffling aspects of machine learning—especially “deep learning.” His end goal is to open the door to training techniques for machines that make the right decisions, mathematically guaranteed.
Professor Sanjeev Arora
photo by Frank Wojciechowski
“Current machine learning approaches are not very well understood mathematically, and they can sometimes fail with no warning and for no apparent reason,” says Arora, who in May was elected a member of the National Academy of Sciences, the most recent of many recognitions for his work. “Theoretical machine learning is devoted to mathematical understanding of machine learning. We try to rigorously understand the performance of machine learning algorithms, and to design new algorithms that are more well-founded, or whose formal properties are better understood.”
Beginning in 2017 and continuing through 2020, Arora is dividing his working time between the CS department and the Institute for Advanced Study, leading a new program in Theoretical Machine Learning. The program includes two postdocs and visiting faculty.
Arora earned a doctorate in computer science from UC Berkeley after receiving a bachelor of science in math and computer science from MIT. He joined Princeton in 1994. Professor Andrew Appel, former chair of the CS department, said, “Sanjeev Arora sustained and rebuilt the CS department’s Theory Group to its position of strength today — not only by his own research work but by his leadership in faculty recruitment and mentoring, building the postdoc program, and building cooperative activities with the University’s math department and the Institute for Advanced Study.”
In 2001 and again in 2010, Arora shared the Gödel Prize for outstanding papers in theoretical computer science. He received the 2011 ACM Prize in Computing. He is a fellow of the ACM and of the American Academy of Arts and Sciences, in addition to being a member of the National Academy of Sciences.
Arora discussed some of his interests in an email Q&A:
Your work develops new modes of analysis, new ways of mathematical reasoning. Is this math or is it computer science?
Formal reasoning about computers and algorithms calls for a new mode of thinking. We use mathematical reasoning in the service of computer science. Many aspects of deep learning are mysterious to its practitioners, and there is a pressing need to understand it more rigorously. For instance, for which problems does a particular deep architecture work? What determines the efficiency of the training algorithm, and how much training data will it require? How well does the trained net work if the data is slightly corrupted?
For many years your research centered on computational complexity theory, and you founded Princeton’s Center for Computational Intractability. Have you shifted your focus?
Throughout my career I have enjoyed taking on new problems and venturing into new fields. That said, computationally intractable problems are actually ubiquitous in machine learning, including in deep learning, which is currently the most promising way to imbue machines with intelligence. Many of the problems in training machine learning algorithms are “theoretically intractable” (formally, “NP-hard”), and it is a big scientific mystery why they are in practice solvable at very large scale. One of our goals is to understand such issues.
What’s an example of the kind of real-world problems your work may impact?
In the past year we have developed new, simple methods to capture the meaning of pieces of text (e.g., a paragraph of English text). This is a basic task in text processing, and the best algorithms for it involve deep learning. We discovered more elementary methods whose workings can be better understood and which require less training data in many settings. This allowed our technique to be useful in analysis of fMRI data in neuroscience experiments, whereby our colleagues were trying to understand the workings of the brain by taking fMRI readings of subjects watching movies. Their supply of training data was very limited. [fMRI is a form of brain imaging that illuminates functions by detecting specific areas of activity.]
The goal of machine learning is to create algorithms and computers that can make intelligent decisions. Are we nearing the so-called singularity, when computers outsmart humans? Is that a good thing?
My best guess is that a singularity à la the “Terminator” movies is very far away. Machine learning can beat humans at Go and chess, but on most other tasks its abilities are no match for even a human toddler. But the main application of machine learning is analysis of data, and in this it already leads to vast capabilities never seen before in history, for example, the ability to store enormous amounts of data about every human on earth and process it to extract all kinds of information. This extracted information can be used in ways good and bad, and arguably is already affecting our society, economy, and politics.
A few years ago, Facebook and Twitter were seen as ways to erase tyranny from Earth; today we realize how naïve this hope was. People who worry about singularity should instead worry more about unanticipated societal changes due to technological progress, especially machine learning.