05-07
Nanqinqin Li will present his FPO, "System Idle Time Need Not Be Wasted," on Thursday, 5/7/2026 at 11:30 AM in the 194 Nassau Street conference room.

Committee:

Examiners: Mike Freedman (adviser), Wyatt Lloyd, Asaf Cidon (Columbia University)

Readers: Mike Freedman and Jialin Ding

Abstract:

Modern computer systems routinely lose performance and availability to time spent waiting on blocking events: CPUs stall on memory, services block on storage and networks, and stateful applications wait through conservative failover protocols. This dissertation studies a common question: when a system must wait, how can it safely run other useful work in parallel with that wait? The dissertation develops this theme through two systems projects.

The first, Speculative Recovery, targets failover for stateful applications that use recovery from disaggregated storage (REDS). REDS is resource efficient because only one instance runs during normal operation, but failover is slow because timeout and recovery run sequentially. Speculative Recovery starts backup recovery as soon as the primary appears unavailable, while letting the primary continue in case it recovers first. The work introduces disk superposition and the super/collapse abstractions, which allow disk state to diverge temporarily while ensuring that only one version becomes externally observable. The design includes collocated-clone for near-normal clone performance and dirty-bit-based rules for correctness. Implemented in Ceph and evaluated with MySQL, PostgreSQL, and MariaDB, the approach improves failover while preserving the resource efficiency of REDS.

The second project, LiteSwitch, targets sub-microsecond CPU stall cycles caused by CXL-attached memory. CXL expands memory capacity but increases access latency and amplifies memory-induced stalls. Existing harvesting techniques are mismatched: profiling-based methods struggle with CXL latency variation, and interrupt-based delivery is too expensive for windows of hundreds of nanoseconds. LiteSwitch uses a lightweight hardware-software co-design. On the hardware side, location-dependent memory branching (LDMB) detects long-latency accesses online and delivers control via direct branching. On the software side, Bundled Handoff provides fast scavenger selection, and xstate-aware context switching avoids unnecessary SIMD/FP save/restore overhead. Evaluation shows substantial slowdown reductions across representative workloads and CXL latency settings.

Taken together, these projects show that idle time can be an opportunity rather than an unavoidable loss. The central lesson is that overlapping useful work with waiting is effective only when systems co-design performance mechanisms with correctness constraints. By combining overlap with careful control over observability, ordering, and runtime overhead, this dissertation demonstrates practical ways to improve both availability and performance in modern systems.

Date and Time

Thursday, May 7, 2026, 11:30am - 1:30pm

Event Type

Final Public Oral

Speaker

Nanqinqin Li

Host

Nanqinqin Li

Contributions to and/or sponsorship of any event does not constitute departmental or institutional endorsement of the specific program, speakers or views presented.
