Identifying Dark Latency
By analogy with dark matter, dark latency in software is latency that cannot be detected directly but whose presence can be inferred from overall application delays. We bring four tools to bear to observe dark latency in some Google web services, then identify and fix several root causes in low-level Google libraries and in the TCP stack.
The talk makes a case for building and using very low-overhead tools to do bursty tracing of hundreds of thousands of timestamped events per second. A methodology of time-aligning multiple traces (remote procedure call, network, CPU, and lock-contention) across scores of interacting machines is the minimum needed to understand some sources of latency in real web services.
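As a rough illustration of what time-aligning traces can look like, the sketch below merges per-machine event traces onto one timeline and reports the longest stretch with no recorded event, which is where unexplained latency would hide. It is not the tooling described in the talk: the `Event` record, the machine names, the sample events, and the assumption that each machine's clock offset is already known are all hypothetical and chosen only for the example.

```python
import heapq
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class Event(NamedTuple):
    """Hypothetical trace record; field names are illustrative only."""
    t_us: int      # timestamp in microseconds, on the recording machine's clock
    machine: str   # which machine recorded the event
    source: str    # "rpc", "net", "cpu", or "lock"
    what: str      # free-form description


def shift(trace: Iterable[Event], offsets_us: Dict[str, int]) -> Iterator[Event]:
    """Move each event onto the common clock using its machine's offset."""
    for e in trace:
        yield e._replace(t_us=e.t_us + offsets_us.get(e.machine, 0))


def align(traces: List[List[Event]], offsets_us: Dict[str, int]) -> List[Event]:
    """Merge traces into one timeline ordered by corrected timestamp.

    Assumes each trace comes from a single machine and is already sorted
    by local time, so adding a constant offset keeps it sorted.
    """
    shifted = [shift(t, offsets_us) for t in traces]
    return list(heapq.merge(*shifted, key=lambda e: e.t_us))


def largest_gap(timeline: List[Event]) -> Tuple[int, Event, Event]:
    """Return (gap_us, before, after) for the longest interval with no events."""
    before, after = max(zip(timeline, timeline[1:]),
                        key=lambda pair: pair[1].t_us - pair[0].t_us)
    return after.t_us - before.t_us, before, after


# Made-up sample data: machine B's clock is assumed to run 1000 us behind A's.
rpc_a = [Event(1000, "A", "rpc", "send request"),
         Event(5200, "A", "rpc", "recv response")]
net_b = [Event(300, "B", "net", "packet in"),
         Event(3900, "B", "net", "packet out")]
cpu_b = [Event(350, "B", "cpu", "handler start"),
         Event(900, "B", "cpu", "handler done")]
offsets = {"A": 0, "B": 1000}

timeline = align([rpc_a, net_b, cpu_b], offsets)
gap_us, before, after = largest_gap(timeline)

for e in timeline:
    print(f"{e.t_us:>6} us  {e.machine}  {e.source:<4}  {e.what}")
print(f"longest silent interval: {gap_us} us, "
      f"between '{before.what}' and '{after.what}'")
```

In this made-up data the RPC takes 4200 us end to end, yet the CPU trace accounts for only 550 us of work; the merged timeline shows the remaining time sitting between the handler finishing and the reply packet leaving. In practice the clock offsets would themselves have to be estimated, which this sketch does not attempt.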
Dick Sites is a Senior Staff Engineer at Google, where he has worked for 6 years. He previously worked at Adobe Systems, Digital Equipment Corporation, Hewlett-Packard, Burroughs, and IBM. His accomplishments include co-architecting the DEC Alpha computers, advancing the art of binary translation for computer executables, adding electronic book encryption to Adobe Acrobat, decoding image metadata for Photoshop, and building various computer performance monitoring and tracing tools at the above companies. He also taught Computer Science for four years at UC San Diego. Most recently he has been working on Unicode text processing and on CPU and network performance analysis at Google. Dr. Sites holds a PhD in Computer Science from Stanford and a BS in Mathematics from MIT. He also attended the Master's program in Computer Science at the University of North Carolina in 1969-70. He holds 34 patents and was recently elected to the U.S. National Academy of Engineering.