Nanqinqin Li will present his FPO, "System Idle Time Need Not Be Wasted," on Thursday, 5/7/2026 at 11:30 AM in the 194 Nassau Street conference room.

 
Committee:
Examiners: Mike Freedman (adviser), Wyatt Lloyd, Asaf Cidon (Columbia University)
Readers: Mike Freedman and Jialin Ding
 
Abstract:
Modern computer systems routinely lose performance and availability to
time spent waiting on blocking events: CPUs stall on memory, services
block on storage and networks, and stateful applications wait through
conservative failover protocols. This dissertation studies a common
question: when a system must wait, how can it safely run other useful
work in parallel with that wait? The dissertation develops this theme
through two systems projects. The first, Speculative Recovery, targets
failover for stateful applications using recovery from disaggregated
storage (REDS). REDS is resource efficient because only one instance
runs during normal operation, but failover is slow because timeout and
recovery run sequentially. Speculative Recovery starts backup recovery
as soon as the primary appears unavailable, while letting the primary
continue in case it recovers first. The work introduces disk superposition
and the super/collapse abstractions, allowing temporary divergence of
disk state while ensuring only one version becomes externally
observable. The design includes collocated-clone for near-normal clone
performance and dirty-bit-based rules for correctness. Implemented in
Ceph and evaluated with MySQL, PostgreSQL, and MariaDB, the
approach improves failover while preserving the resource efficiency of
REDS. The second project, LiteSwitch, targets sub-microsecond CPU
stall cycles caused by CXL-attached memory. CXL expands memory
capacity but increases access latency and amplifies memory-induced
stalls. Existing harvesting techniques are mismatched: profiling-based
methods struggle with CXL latency variation, and interrupt-based
delivery is too expensive for hundreds-of-nanoseconds windows.
LiteSwitch uses a lightweight hardware-software co-design. On the
hardware side, location-dependent memory branching (LDMB) detects
long-latency accesses online and delivers control via direct branching.
On the software side, Bundled Handoff provides fast scavenger
selection, and xstate-aware context switching avoids unnecessary
SIMD/FP save/restore overhead. Evaluation shows substantial
slowdown reductions across representative workloads and CXL latency
settings. Taken together, these projects show that idle time can be an
opportunity rather than unavoidable loss. The central lesson is that
useful parallelization with waiting is effective only when systems co-design
performance mechanisms with correctness constraints. By
combining overlap with careful control over observability, ordering, and
runtime overhead, this dissertation demonstrates practical ways to
improve both availability and performance in modern systems.

 
Date and Time
Thursday May 7, 2026 11:30am - 1:30pm
Not yet determined.
Event Type
Speaker
Nanqinqin Li
Host
Nanqinqin Li

Contributions to and/or sponsorship of any event does not constitute departmental or institutional endorsement of the specific program, speakers or views presented.
