Dongsheng Yang will present his FPO "Practical Learned Cache Eviction for Modern Computing Systems" on Thursday, 5/14/26 at 9:30am in CS 302.
The members of his committee are as follows:
Examiners: Kai Li (adviser), Wyatt Lloyd (adviser), Ravi Netravali, Jialin Ding
Readers: Wyatt Lloyd, Kai Li, and Daniel S. Berger (Microsoft Research)
Everyone is invited to attend his talk.
Abstract:
This dissertation studies how to make machine-learning-based cache eviction practical across modern systems with different operating constraints. Although cache workloads vary across domains, they share a common tension: better eviction decisions can improve miss ratio and end-to-end performance, but stronger policies often add compute and metadata overhead that harms deployability.
I address this tension through three projects in CDN caching, in-memory key-value caching, and LLM serving. The first project, Machine Learning at the Tail (MAT), introduces a heuristic-filtered learned eviction framework for CDN-style workloads. By focusing predictions on high-value candidates, MAT reduces prediction overhead by roughly an order of magnitude while matching the miss ratio of the state-of-the-art ML cache and achieving throughput similar to a heuristic-only system. The second project, OpML, designs a practical learned eviction framework for in-memory caches with strict latency and throughput bounds. OpML's core insight is opportunistic ML: it exploits the burstiness of in-memory cache traffic to run ML eviction during off-peak periods and fall back to a heuristic at peak, combined with compact metadata encoding and an asynchronous ML pipeline. Implemented in CacheLib, OpML improves throughput by 10–50% over LRU while lowering miss ratio by 5–24%. The third project, Learned Prefix Caching (LPC), improves LLM prefix reuse by predicting the likelihood that a conversation will continue and combining this signal with recency-aware decisions. Across three real-world datasets, LPC reduces required cache capacity by 18–47% at equivalent hit ratios and improves prefill throughput by 11% in an emulated deployment.
Taken together, these projects show that learned eviction can deliver robust systems benefits—both lower miss ratios and preserved end-to-end throughput—when algorithmic quality and systems constraints are co-designed. This dissertation contributes a unified design perspective for deployable ML caching: restrict ML to high-impact decisions, keep inference off critical request paths, and bound metadata overhead so prediction gains are not offset by reduced effective capacity.
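To make the opportunistic-ML idea concrete, here is a minimal sketch of a cache that evicts with a learned predictor when load is low and falls back to plain LRU when load exceeds a peak threshold. All names here (`peak_qps`, `predictor`, `current_qps`) are illustrative assumptions, not the dissertation's actual interfaces, and the per-eviction scan stands in for the real system's asynchronous ML pipeline.

```python
from collections import OrderedDict

class OpportunisticCache:
    """Toy sketch of opportunistic ML eviction: use a learned
    time-to-next-access predictor off-peak, plain LRU at peak.
    Hypothetical interface for illustration only."""

    def __init__(self, capacity, peak_qps, predictor):
        self.capacity = capacity
        self.peak_qps = peak_qps      # above this rate, skip ML and use LRU
        self.predictor = predictor    # key -> predicted time until next access
        self.data = OrderedDict()     # maintains recency order for LRU

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)   # mark as most recently used
            return self.data[key]
        return None

    def put(self, key, value, current_qps):
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.capacity:
            if current_qps > self.peak_qps:
                # Peak traffic: cheap O(1) LRU eviction, no ML on the hot path.
                self.data.popitem(last=False)
            else:
                # Off-peak: evict the entry predicted to be reused furthest away.
                victim = max(self.data, key=self.predictor)
                del self.data[victim]
        self.data[key] = value
```

The key property the sketch preserves is that ML cost is paid only when the system has headroom, so peak-period throughput matches the heuristic baseline.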
Date and Time
Thursday, May 14, 2026, 9:30am–11:30am
Location
Computer Science 302
Host
Dongsheng Yang