In the hallways of Princeton, a fascination with the human mind unlocked the power of deep learning

September 22, 2025
Cube of cubes depicting lots of images of airplanes all fitting into a single abstract structure.
Intellectual connections sparked at Princeton inspired the creation of ImageNet, a comprehensive computer vision database that would lead to a paradigm shift in machine learning. Illustration by Bumper DeJesus

By Julia Schwarz

When Fei-Fei Li arrived in Princeton in January 2007 as an assistant professor, she was assigned an office on the second floor of the computer science building. Her neighbor was Christiane Fellbaum.

On the surface, Li and Fellbaum didn’t appear to have much in common. Li is a computer vision expert; Fellbaum is a linguist. But they liked each other. “We talked often and became very friendly,” said Fellbaum. 

Their friendship, as it turned out, would be fortuitous for the future of computing. Their meeting sparked an intellectual connection that inspired a computer vision database — a database that would lead to a paradigm shift in machine learning.

Li and Fellbaum both had a fascination with the human mind’s ability to store and retrieve massive amounts of data. According to Li, they shared “a special interest in understanding — even mapping — the way a mind conceptualizes the world.” 

Fei-Fei Li in 2007 and Christiane Fellbaum in 2017. Photos by Frank Wojciechowski and Mark Czajk

“As humans, we’re naturally adept at recognizing things after even a single glimpse,” Li wrote in her book, The Worlds I See. Even very young children can recognize surface properties, objects, scenes, faces, and materials immediately and without touching them.

Humans are similarly adept at learning and using words, an insight that animates Fellbaum’s work. “We know tens of thousands of words and concepts,” said Fellbaum. “How do we organize them all? Our brains are very good, but we are not computers.”

When they became neighbors in 2007, Fellbaum, a senior research scholar in the computer science department, had spent the last 20 years helping to build WordNet, an enormous database of over 145,000 English words.

Li, who had completed her dissertation just two years before, was about to embark on a similarly ambitious project, creating a comprehensive image database to train computer vision systems. 

After learning about WordNet from Fellbaum, Li decided to call her project ImageNet.

Creating maps of meaning

WordNet was started by Princeton professor George A. Miller, one of the founders of cognitive psychology. His work centered on questions about how the mind processes information.

He was particularly interested in the mind’s ability to learn and recognize words. Grammar has a finite set of rules, said Fellbaum, but our vocabularies are dynamic and constantly evolving. “We keep using words in new ways, we add new words, we throw out old words,” she said. 

Miller hypothesized that the mind supports this dynamic system through a network of semantic meaning, with each word connected to related words. The word “house,” for example, is connected in the mind to words like “home,” “dwelling,” “structure.” New words are added to the network like branches on a tree. 
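Miller's hypothesized network can be sketched as a small graph, with words as nodes and edges linking each word to related words. The following is a minimal illustrative sketch in Python; the vocabulary and links are invented for the example, not drawn from WordNet itself:

```python
from collections import deque

# A toy semantic network: each word maps to the words it is linked to.
# The words and relations here are illustrative only.
semantic_net = {
    "house": {"home", "dwelling", "structure"},
    "home": {"house"},
    "dwelling": {"house"},
    "structure": {"house", "building"},
    "building": {"structure"},
}

def related(word, max_hops=2):
    """Collect every word reachable within max_hops links of `word`."""
    seen = {word}
    frontier = deque([(word, 0)])
    while frontier:
        current, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for neighbor in semantic_net.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    seen.discard(word)
    return seen

print(sorted(related("house")))  # → ['building', 'dwelling', 'home', 'structure']
```

Adding a new word to such a network means attaching one more node to existing neighbors, which is one way to picture Miller's "branches on a tree" image of vocabulary growth.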

George A. Miller, professor of psychology at Princeton and the creator of WordNet. Photo by John T. Miller

Miller’s hypothesis about this network of meaning was eventually borne out in psychological experiments, Fellbaum said, which showed that people identify a word more quickly when they have just seen a semantically related one. Subjects recalled “house” faster if they had just seen the word “dwelling.”

Miller began constructing WordNet in 1985. Fellbaum, who completed a Ph.D. in linguistics at Princeton, joined the project in 1987. Twenty years later, when Li and Fellbaum became neighbors, WordNet was nearly complete. Its 20,000 noun categories contained, essentially, all the objects that an English speaker can describe in words. Many WordNets were created for other languages as well.

Li, in awe of WordNet, decided to use this magisterial work as a scaffold to create a comprehensive map of visual meaning.

She aimed to pair each of WordNet's object categories with about a thousand annotated images, ultimately building a computer vision dataset that would reflect the fullness and complexity of the world.

By 2010, Li, her graduate students and a small army of crowdsource workers would compile millions of images and build the “largest hand-curated data set in AI’s history,” she wrote. 

Soon, it would spark a revolution in deep learning.

How can a dataset build artificial intelligence?

Current deep learning systems are built on three essential components: a neural network algorithm, powerful hardware, and enormous amounts of organized training data. 

In 2007, none of these components existed as they do now. Computers had about 2% of the processing power of current machines, neural network algorithms were not widely used, and organized datasets were small. 

Only a handful of researchers were focused on building larger datasets. Some computer scientists questioned whether large amounts of training data would even be useful for building artificial intelligence. 

“Prestige comes from building models,” said Olga Russakovsky, who worked on ImageNet as a graduate student at Stanford and is now an associate professor of computer science at Princeton. “Most people wondered why collecting a dataset would be intellectually interesting.”

Li was an exception. Her hypothesis, first developed as a graduate student at Caltech, was that, like a human child, a computer vision system needs to see many examples in a category — a thousand images of a German shepherd, for example — before it can classify an object at a single glance.

Li knew that a comprehensive database, one that accurately compiled thousands of object categories, would need to contain millions of images. She also knew that collecting and annotating millions of images would take years. As an assistant professor, she had limited resources. But “Princeton is really good at encouraging junior faculty to think big,” said Russakovsky.

The ImageNet interface in 2009, showing the entry for “wombat.” Image by Fei-Fei Li and Jia Deng

In this spirit, Li found an ally among the senior faculty: Kai Li (no relation), an expert in distributed systems and computer architecture. He was intrigued by the idea of ImageNet and provided material help by donating a set of workstations. He also recommended a graduate student, Jia Deng, as a potential collaborator on the project.

Deng, now a professor of computer science at Princeton, was similarly intrigued. He agreed to join Li in her work and would go on to play an essential role in ImageNet’s construction. He was the lead author on the first paper about ImageNet, published in 2009.

At the time, Deng said, many computer scientists were interested in studying data for the purposes of indexing and retrieving information on the internet. The idea that data could be used to build artificial intelligence, he said, “was a very bold hypothesis." 

Deep learning in everything, everywhere

In 2025, deep learning is ubiquitous. If you use image recognition to unlock your phone, you’re using a deep learning model. When Spotify suggests a song, when Amazon Alexa understands a voice command, when Instagram recommends someone to follow, and when ChatGPT answers a question, all of these tasks rely on deep learning.

But the power of deep learning gives it the potential to do far more than simply make our lives more convenient, Russakovsky said. Deep learning can be used to improve farming methods by analyzing weather patterns and automating water usage. It can mitigate climate change by optimizing energy grids and traffic flows. It can help medical experts develop new therapies for disease. It can improve patient monitoring and help assess cognitive function. It can extend and augment physical capabilities to make the world more accessible to people with disabilities.

Fei-Fei Li and Olga Russakovsky in 2015. Together they co-founded a national organization called AI4ALL, which is dedicated to increasing diversity in AI. Photo by Lauren Yang/Stanford AI4ALL

So, what happened to bring deep learning from a fringe idea incubating in the hallways of universities to global prominence as the centerpiece of a technological and economic transformation?

To encourage colleagues to use ImageNet as a benchmarking tool, in 2010 Li and her team created a competition in which research teams could build models to recognize ImageNet images and compare their results. By then the dataset was nearly complete, with more than 14 million annotated images, by far the largest such dataset in history.

At first, the challenge produced nothing exciting. But in its third year, in 2012, a team from the University of Toronto shocked the AI world by winning in a way no one expected, using a technique — neural networks — that had been largely abandoned by the broader community. The model, called AlexNet, was created by Geoffrey Hinton and his graduate students Alex Krizhevsky and Ilya Sutskever.

By that point, Li and Deng had moved to Stanford University and recruited Russakovsky to the project. As organizers of the competition, Deng and Russakovsky had front-row seats to what has been called AI’s “Big Bang moment.”

“This was the moment when people started to believe that deep learning models actually worked,” said Russakovsky.

Neural nets weren’t new. They had been successfully used since the 1980s, but only in limited, constrained settings, like reading handwritten numbers on checks. Data was the missing ingredient, and it would transform neural networks into the most successful and widely used machine learning approach. “Without the data, you can't get the results,” said Russakovsky.

Hinton would go on to win the Nobel Prize in Physics in 2024 for his contributions to artificial intelligence, a prize he shared with Princeton neuroscientist John Hopfield, who had invented a class of networks that laid the foundation for Hinton’s models.

Li, Hinton, Fellbaum, Miller and Hopfield all share a deep intellectual connection: they are fascinated by the human mind and its ability to absorb and recall vast amounts of information. Their interest in understanding human intelligence in turn gave rise to the most successful artificial intelligence paradigm to date.

But the job is not done. The human mind, Russakovsky said, remains integral to advancing AI that aligns with the public good and creates useful new forms of technology. 

In that sense, ImageNet and its forebear WordNet, in all their boldness and foresight, were only the beginning. They provided structure and, together with AlexNet, demonstrated the need for large amounts of highly organized data.

But many questions and deeper problems in artificial intelligence remain and the human mind can continue to provide inspiration. “A lot of the questions researchers are now asking,” Russakovsky said, “are calling for us to reexamine things and ask: How is the human mind actually doing this?”