I'm Xiaoqi, a final year PhD student at Department of Computer Science, Princeton University, advised by Prof. Jennifer Rexford. Before joining Princeton, I received my Bachelor's degree from Yao Class, Institute for Interdisciplinary Information Sciences, Tsinghua University.
I aspire to bridge theoretical insights in streaming algorithms with practical innovations in measuring real-world networks. My research primarily focuses on designing and building efficient data-plane algorithms for network measurements and beyond.News:
Network Measurement in Data Plane
I design algorithm and data structures that run directly in the data plane of high-speed programmable switches, to perform measurement telemetry and help operators analyze performance and security anomalies. Since high-speed switches impose tight memory and arithmetic constraints, I employ various approximation techniques and hardware-specific adaptations to produce algorithms that can process traffic at Terabits per second line rate.
BeauCoup: run multiple queries simultaneously in data plane
We identify limited memory access, instead of limited memory space, as the most critical resource constraint for running multiple network measurement algorithm on programmable switches.
By using Coupon Collectors, BeauCoup supports running multiple distinct counting queries simultaneously, while only making a small constant number of memory accesses per packet.
Furthermore, unlike earlier work, BeauCoup supports updating traffic queries on-the-fly without re-compiling P4 code, by separating query parameters from data-plane logic. Queries can be updated via installing TCAM rules only, minimizing network downtime.
Featured in APNIC Blog!Paper (SIGCOMM'20) Slides P4 Code
PRECISION: Heavy-hitter detection in programmable data plane
To run at Terabits-per-second line rate in programmable data plane, measurement algorithms must follow a restrictive programming model, with constraints on memory size and access.
We modified the Space-Saving algorithm to adapt to these hardware constraints, to track Heavy Hitters (network flows with the largest volume) in the data plane. By recirculating a small fraction of packets probabilistically, we greatly simplify our algorithm's memory access pattern. Our algorithm still achieves comparable accuracy to algorithms in an unrestricted programming model.
In our paper, we also summarized the programming model constraints into three easy-to-follow rules for designing future measurement algorithms.
AROMA: Routing-oblivious packet and flow sampling
Traditional traffic sampling and analysis methods require precise routing information to de-duplicate measurements collected from multiple vantage points.
However, network administrators may not have complete or accurate knowledge of how every packet is routed.
By using random hash functions, we build a routing-agnostic network measurement framework that sample flows and packets from multiple vantage points across the network. The samples can subsequently be used to perform various network-wide analysis.
Measuring RTT in data plane using multi-stage hash tables
We measure packet-level round trip time by matching a TCP packet with its corresponding acknowledgement.
We buuild multi-stage hash tables to reduce the effect of hash collisions when using hash tables in the data plane, and allows table entries to "lazily expire" by deleting them upon hash collisions.
Compared with prior work that only looks at handshake packets, our work produces multiple RTT samples per flow, which helps capturing changing network condition for long-running TCP flows such as video streaming.
Fridge: Unbiased measurement in the data plane
Hash collisions in data structures often affect the accuracy of network measurement in the data plane. For example, when we measure round-trip delays by saving request packets in a hash-indexed array and matching with subsequent response packets, those requests with higher delays are less likely to survive all the hash collisions until their responses arrive. This creates a bias in the measured delay distribution.
We propose Fridge, a novel data structure that tracks the time each request sample spent within it before being matched with the corresponding response. This allows us to precisely analize and correct for the survivorship bias caused by hash collisions. Using an entry probability and an insertion counter, the Fridge can significantly reduce the bias in delay distribution measurement, achieving the same accuracy with 2x-4x memory saving.
High-speed In-Kernel network measurement using eBPF
eBPF is an infrastructure that allows to dynamically load and run micro-programs directly in the Linux kernel's network data plane, without the need for recompiling the kernel. This provides a great opportunity to run sketch-based high-speed network measurement algorithms on servers.
However, verbatimly running existing sketch-based measurement algorithms under eBPF suffers from several performance bottlenecks. In this paper, we present a case study and discuss techniques to optimize sketch-based network measurement algorithms for best performance under eBPF.
To appear in Computer Communication Review.Draft Code
Real-time, Closed-loop Control
With data-plane measurement algorithms, programmable switches can now react to changes in traffic in real time, without the higher latency in conventional SDN control plane. The switches can now react to events in a sub-millisecond time scale to improve network performance.
ConQuest: real-time microburst analysis and mitigation
Microburst is a phenomenon where bursty ingress traffic fills up a switch's queuing buffer rapidly, causing high latency, jitter, and in the worse case, packet drops.
Due to its short timescale, conventional monitoring methods based on packet sampling or aggregated statistics provide very little insights about the dynamics or the cause of microbursts.
ConQuest analyzes each individual network flow's contribution to queuing in real time, using a round-robin snapshots data strucutre running in the programmable data plane. After identifying the very bursty flows that caused microbursts, ConQuest can immediately suppress these bursty flows to surgically mitigate microbursts, protecting other flows.
We also propose a novel tapping-based setup to analyze queuing in legacy, non-programmable switches. We have used ConQuest to perform fine-grained queuing analysis in two real-world deployments: at Princeton University's campus network, as well as in AT&T's carrier network.
Synthesize state machines for data plane execution
Many stateful network processing logic can be formulated as finite state machines; executing them in data plane allows switches to take different actions according to different states.
However, programmable switches support only a limited set of arithmetic operations between reading and writing values stored in stateful memory, making it challenging to implement complex state transitions as atomic memory updates.
We propose a novel technique to map states to numerical values and implement state transitions as simple arithmetic operations. We then formulate the requirement as SMT constraints and use an off-the-shelf SMT solver to produce such mappings. Evaluation shows our approach can successfully synthesize complex example state machines within a few minutes.
Best Paper Award at SOSR'22!Paper (SOSR'22) Code
Scalable bandwidth fairness under network slicing
Maintaining bandwidth fairness across millions of users is challenging: the switch hardware only supports dozens of queues, while software switching running on CPUs incurs extra latency. Each user’s fair share changes constantly as users arrive and leave; slicing poses further challenges as surplus bandwidth from underutilized slice need to be reallocated to users in overutilized slices.
We tackle these challenges and implement approximate slice-level bandwidth fairness, by guessing the fair rates and continually refine the guess using a real-time control loop. To make more accurate guesses, we implement approximated linear interpolation computation entire within the data plane. This allows each update to finish under a millisecond and our system converges to fair rate within 3 milliseconds, 20 times faster than prior state-of-the-art.
To appear in IEEE INFOCOM 2023.Draft
Security and Privacy
Programmable switches can play a more vital role in network security and privacy, for example via actively participating in encryption and authentication protocols. As a starting point, I implemented important cryptography building blocks for high-speed programmable switches.
Implementing AES on programmable switches
We implement the standardized and widely-used AES encryption on the Barefoot Tofino high-throughput programmable switch, using the Scrambled Lookup Tables optimization.
We observe that traditional AES implementations are often optimized for embedded devices that lacks memory space, yet switches have moderate memory size. By building many lookup tables, we trade memory space overhead for shorter execution pipeline length.
Implementing HalfSipHash on programmable switches
Hash function is a vital component of many data plane algorithms and is often used for indexing, fingerprinting, and sampling. The usage of readily available CRC32 as a hash function opens up a major attack surface when processing adversarial traffic. We implement a secure keyed hash function, HalfSiphash, on the Barefoot Tofino high-throughput programmable switch.Paper (SPIN'21 workshop) P4 Code
P4Campus: Campus Network as a P4 testbed
Many researchers find it challenging to deploy their research in real-world networks. Meanwhile, our university campus network faces the same operational challenges as other larger networks, making it an ideal local testbed for networking research projects. We collaborate with Princeton University's campus network operator to deploy various P4-based network applications in the campus network to improve the network's performance, privacy, and security, and at the same time validating novel research ideas using production-grade prototypes. By sharing our experiences, we hope to enable others to replicate a similar setup in their own campus networks.Website Slides Paper (SIGCOMM CCR)
- [SIGCOMM'20] BeauCoup: Answering Many Network Traffic Queries, One Memory Update at a Time. Xiaoqi Chen, Shir Landau-Feibish, Mark Braverman, Jennifer Rexford
- [ToN] Designing Heavy-Hitter Detection Algorithms for Programmable Switches. Ran Ben-Basat, Xiaoqi Chen, Gil Einziger, Ori Rottenstreich
- [CoNEXT'19] Fine-Grained Queue Measurement in the Data Plane. Xiaoqi Chen, Shir Landau Feibish, Yaron Koral, Jennifer Rexford, Ori Rottenstreich, Steven A Monetti, Tzuu-Yi Wang
- IEEE/ACM Transactions on Networking (ToN)
- IEEE Transactions on Network Science and Engineering (TNSE)
- Journal of Network and Computer Applications (JNCA)
- Computer Networks
Shadow Program Committee:
- EuroSys, 2022
- IMC, 2022
Artifact Evaluation Committee:
- ACM SIGCOMM, 2021 & 2022
- ACM CoNEXT, 2021
- IEEE INFOCOM 2020, External Reviewer
- ACM SIGCOMM 2020, Student Volunteer
- ACM/IEEE IPSN 2016 Demo session, Technical Program Committee
Honors & Awards
- Best Paper Award, SOSR 2022: Synthesizing State Machines for Data Planes
- Siebel Scholars, class of 2022
- Finalist, Facebook PhD Fellowship, 2020 (top 4% among 1800+ applicants)
- School of Engineering and Applied Science Award for Excellence, Princeton University, 2020
I love playing badminton and soccer, and I played in the IIIS varsity soccer team at Tsinghua's interdepartmental league.
I enjoy other sports as well, including tennis, table tennis, basketball, swimming and climbing.
Did you know
I mostly use Python in my research project. I enjoy writing in Go and C++ as well.