Systems and Tools for Reliable Software: Replication, Reproducibility, and Security
While this brings unprecedented opportunity, it also increases the probability of failures and the difficulty of diagnosing them. Increased scale and transience have also made management increasingly challenging. Devices can come and go for a variety of reasons, including mobility, failure and recovery, and scaling capacity to meet demand.
In this talk, I will present several systems that I built to address the resulting challenges to reliability, management, and security.
Ori is a reliable distributed file system for devices at the network edge. Ori automates many of the tasks of storage reliability and recovery through replication, taking advantage of fast LANs and low-cost local storage in edge networks.
Castor is a record/replay system for multi-core applications with predictable and consistently low overheads. This makes it practical to leave record/replay on in production systems, to reproduce difficult bugs when they occur, and to support recovery from hardware failures through fault tolerance.
Cryptographic CFI (CCFI) is a dynamic approach to control flow integrity. Unlike previous CFI systems that rely purely on static analysis, CCFI can classify pointers based on their runtime characteristics. This limits attacks to actively used code paths, resulting in a substantially smaller attack surface.
Ali is currently completing his PhD at Stanford University, where he is advised by Prof. David Mazières. His work focuses on improving reliability, ease of management, and security in operating systems and distributed systems. Previously, he was a Staff Engineer at VMware, Inc., where he was the technical lead for the live migration products. Ali received an M.Eng. in electrical engineering and computer science and a B.S. in electrical engineering from the Massachusetts Institute of Technology.