Improving the Reliability of Commodity Operating Systems
Date and Time
Tuesday, April 26, 2005 - 4:00pm to 5:30pm
Computer Science Small Auditorium (Room 105)
Michael Swift, University of Washington
Despite decades of research in fault tolerance, commodity operating systems, such as Windows and Linux, continue to crash. In this talk, I will describe a new reliability subsystem for operating systems that prevents the most common cause of crashes, device driver failures, without requiring changes to drivers themselves. To date, the subsystem has been used in Linux to prevent system crashes in the presence of driver failures, recover failed drivers transparently to the OS and applications, and update drivers "on the fly" without requiring a system reboot after installation. Measurements show that the system is extremely effective at protecting the OS from driver failures, while imposing little runtime overhead.