Internet availability -- the ability for any two nodes in the Internet to communicate — is essential to being able to use the Internet for delivery critical applications such as real-time health monitoring and response. Despite massive investment by ISPs worldwide, Internet availability remains poor, with literally hundreds of outages occurring daily, even in North America and Europe. Some have suggested that addressing this problem requires a complete redesign of the Internet, but in this talk I will argue that considerable progress can be made with a small set of backwardly compatible changes to the existing Internet protocols. We take a two-pronged approach. Many outages occur on a fine-grained time scale due to the convergence properties of BGP, the Internet’s interdomain routing system. We describe a novel set of additions to BGP that retains its structural properties, but applies lessons from fault tolerant distributed systems research to radically improve its availability. Other outages are longer-lasting and occur due to complex interactions between router failures and router misconfiguration. I will describe some ongoing work to build an automated system to quickly localize and repair these types of problems.
Thomas Anderson is the Robert E. Dinning Professor of Computer Science and Engineering at the University of Washington. His research interests span all aspects of building practical, robust, and efficient computer systems, including distributed systems, operating systems, computer networks, multiprocessors, and security. He is an ACM Fellow, winner of the ACM SIGOPS Mark Weiser Award, winner of the IEEE Bennett Prize, past program chair of SIGCOMM and SOSP, and he has co-authored seventeen award papers.