Finding Needles in a 10 TB Haystack, 140M Times/Day

Date and Time
Thursday, November 15, 2001 - 4:00pm to 5:30pm
Computer Science Small Auditorium (Room 105)
Rob Shillner, from Google, Inc.
David Dobkin
Search is one of the most ubiquitous and important applications on the internet, but it is also one of the hardest to do well. Google is a search engine company that began as a research project at Stanford University and has evolved into the world's largest and most trafficked search engine in just under three years. Three main characteristics have driven this growth: search quality, index size, and speed. Addressing all three has required tackling problems in a range of computer science disciplines, including algorithm and data structure design, networking, operating systems, distributed and fault-tolerant computing, information retrieval, and user interface design. In this talk, I'll focus on Google's unique hardware platform of 10,000 commodity PCs running Linux, and on some of the challenges and benefits this platform presents. I'll also describe some of the interesting problems that arise in crawling and indexing more than a billion web pages, and in serving 140 million queries per day against this index. Finally, I'll describe some of the challenges facing search engines in the future.
