SCALABLE, NETWORK-WIDE TELEMETRY WITH PROGRAMMABLE SWITCHES
Managing modern networks requires collecting and analyzing network traffic from distributed switches in real time, i.e., performing network-wide telemetry. Telemetry systems
must be flexible and fine-grained to support myriad queries about the security, performance,
and reliability of networks. Yet they must also scale as the number of queries, link speeds,
and network sizes increase. Realizing these goals requires balancing the division of labor between high-speed but resource-constrained network switches and general-purpose CPUs to support flexible telemetry at scale.
First, we present Sonata, a flexible and scalable network telemetry system that uses the
compute resources of both stream-processing servers and a single Protocol Independent
Switch Architecture (PISA) switch. PISA switches offer both high-speed processing and
limited programmability. We show how to execute Sonata’s high-level queries at line rate
by first compiling them to PISA primitives. Next, we model the resource constraints of
PISA switches to solve an optimization problem that minimizes the load on the stream
processor by executing portions of queries directly in the switch. Sonata supports a
wide range of monitoring queries and reduces the stream processor’s workload by orders
of magnitude compared to existing telemetry systems.
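The effect of executing part of a query in the switch can be illustrated with a minimal sketch. It is not Sonata's query language or compiler: the query (destinations that receive more than a threshold number of TCP SYN packets), the packet representation, and the function names `run_all_on_stream_processor` and `run_partitioned` are assumptions made for illustration, and the switch is simulated in plain Python rather than compiled to PISA primitives.

```python
from collections import Counter

SYN = 0x02  # TCP flags value of a SYN-only packet


def run_all_on_stream_processor(packets, threshold):
    """Baseline: the switch forwards every packet to the stream processor."""
    tuples_sent = len(packets)
    counts = Counter(p["dst"] for p in packets if p["flags"] == SYN)
    result = {dst for dst, n in counts.items() if n > threshold}
    return result, tuples_sent


def run_partitioned(packets, threshold):
    """Filter and per-destination counting run in the (simulated) switch.

    Only destinations whose count crosses the threshold are sent to the
    stream processor, once each.
    """
    counts = Counter()
    reported = []
    for p in packets:
        if p["flags"] != SYN:
            continue  # filter stage, executed in the switch
        counts[p["dst"]] += 1  # reduce stage, executed in the switch
        if counts[p["dst"]] == threshold + 1:
            reported.append(p["dst"])  # one tuple leaves the switch
    # The stream processor only has to collect the reported keys.
    return set(reported), len(reported)


def make_example_trace():
    """Hypothetical trace: one busy destination, one quiet one, plus ACKs."""
    trace = [{"dst": "10.0.0.1", "flags": SYN} for _ in range(100)]
    trace += [{"dst": "10.0.0.2", "flags": SYN} for _ in range(3)]
    trace += [{"dst": "10.0.0.3", "flags": 0x10} for _ in range(50)]
    return trace
```

Both functions return the same set of destinations, but the partitioned version sends the stream processor one tuple per reported destination instead of one per packet. The sketch ignores the switch resource constraints that the optimization problem accounts for.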
Second, we present Herd, a system that executes a subset of Sonata queries across several switches. Herd determines network-wide heavy hitters, i.e., flows
that consist of many more packets than most others, by counting flows at the switches,
without maintaining per-flow state, and probabilistically reporting to a central coordinator.
Using these reports, the coordinator adapts the parameters at each switch according to the spatial locality of the flows. Simulations using packet traces show that our prototype can detect
network-wide heavy hitters accurately with 17% savings in communication overhead and
38% savings in switch state compared to existing approaches. We then present an algorithm
to tune system parameters in order to maximize detection accuracy under switch memory
and bandwidth constraints.
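The general idea of sampled counting with reports to a coordinator can be sketched as follows. This is a simplified illustration, not Herd's algorithm: the classes `Switch` and `Coordinator`, the parameters `sample_prob`, `report_every`, and `global_threshold`, and the fixed (non-adaptive) parameter values are all assumptions introduced here.

```python
import random
from collections import defaultdict


class Coordinator:
    """Aggregates reports from all switches (hypothetical interface)."""

    def __init__(self, global_threshold):
        self.global_threshold = global_threshold
        self.estimates = defaultdict(int)

    def report(self, flow, packets):
        # Each report stands for `packets` packets seen at one switch.
        self.estimates[flow] += packets

    def heavy_hitters(self):
        return {f for f, n in self.estimates.items()
                if n >= self.global_threshold}


class Switch:
    """Tracks only sampled flows, so state is not kept for every flow."""

    def __init__(self, coordinator, sample_prob, report_every, rng=None):
        self.coordinator = coordinator
        self.sample_prob = sample_prob    # chance to start tracking a flow
        self.report_every = report_every  # packets counted per report
        self.rng = rng or random.Random()
        self.tracked = {}

    def on_packet(self, flow):
        if flow in self.tracked:
            self.tracked[flow] += 1
            if self.tracked[flow] % self.report_every == 0:
                self.coordinator.report(flow, self.report_every)
        elif self.rng.random() < self.sample_prob:
            # Small flows are unlikely to be sampled and so use no memory.
            self.tracked[flow] = 1
```

With a small `sample_prob`, a flow with few packets is rarely tracked, while a flow with many packets is sampled early with high probability. In this sketch the coordinator only sums reports; it does not adapt switch parameters or tune them under memory and bandwidth constraints.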
Together, Sonata and Herd give network operators the ability to execute a set of
network-wide telemetry queries from a single interface that combines the strengths of both
programmable data planes and general-purpose CPUs.