Privacy-Preserving Collaborative Anomaly Detection (thesis)
Unwanted traffic, which includes denial-of-service (DoS) attacks, worms, and spam, is a major concern on the Internet today; identifying and mitigating it costs businesses billions of dollars every year. The process of identifying such traffic is called anomaly detection, and Intrusion Detection Systems (IDSs) are among the most prevalent tools for performing it. IDSs such as Snort allow users to write "rules" that specify the properties of traffic to be detected and the corrective action to take in response. Unfortunately, applying these rules in an online setting can be prohibitively expensive for large networks, such as Tier-1 ISPs, which may carry many Gbps of traffic across tens of thousands of links. In the first chapter of this thesis we present a system that leverages machine learning algorithms to detect the same types of unwanted traffic as Snort, but operates on summarized data for faster processing. Our results demonstrate that this system can learn to classify traffic matching many Snort rules with a high degree of accuracy.
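The core idea above, classifying summarized data rather than raw packets, can be illustrated with a minimal sketch. The packet fields, flow features, and threshold classifier below are purely illustrative assumptions, not the thesis's actual schema or learned model:

```python
from collections import defaultdict

# Hypothetical packet records: (src_ip, dst_ip, dst_port, payload_bytes).
packets = [
    ("10.0.0.1", "10.0.0.9", 80, 1400),
    ("10.0.0.1", "10.0.0.9", 80, 1400),
    ("10.0.0.2", "10.0.0.9", 25, 200),
]

# Summarize raw packets into per-flow feature vectors: [packet count, total bytes].
flows = defaultdict(lambda: [0, 0])
for src, dst, port, size in packets:
    feats = flows[(src, dst, port)]
    feats[0] += 1
    feats[1] += size

def classify(features):
    # Stand-in for a learned model: flag high-volume flows as suspect.
    pkt_count, total_bytes = features
    return "suspect" if total_bytes > 1000 else "benign"

for key, feats in flows.items():
    print(key, classify(feats))
```

The point of the summarization step is that the classifier touches one small feature vector per flow instead of every packet, which is what makes online operation feasible at high line rates.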
Unfortunately, distinguishing good traffic from unwanted traffic is challenging even in an offline setting, because many types of unwanted traffic, such as network attacks, deliberately mimic the behavior of normal traffic. We therefore propose that the targets of unwanted traffic collaborate by correlating their attack data, under the assumption that a given malicious host is likely to affect more than one victim over time. That is, the senders of unwanted traffic reuse individual computers (i.e., malicious hosts) for various nefarious purposes in order to maximize their profits, and this repeated use leaves traces across networks. In the second chapter of this thesis we present a measurement study that quantifies the potential gain from such collaborative anomaly detection. Specifically, using traces from operational networks, we calculate the fraction of detected network anomalies (viz., IP scans, port scans, and DoS attacks) that could have been mitigated if some subset of the victims had collaborated by sharing information about past perpetrators.
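The measurement described above reduces to a simple replay over time-ordered attack events: an attack is counted as preventable if some other victim had already reported the same attacker. The event format, names, and data below are illustrative assumptions, not the thesis's traces:

```python
# Hypothetical attack log: (timestamp, victim_network, attacker_ip).
events = [
    (1, "net_a", "203.0.113.7"),
    (2, "net_b", "203.0.113.7"),   # repeat offender hits a second victim
    (3, "net_c", "198.51.100.4"),
    (4, "net_b", "203.0.113.7"),
]

def collaboration_gain(events):
    """Fraction of attacks preventable because some *other* victim had
    already reported the same attacker at an earlier time."""
    seen_by = {}          # attacker_ip -> set of victims that reported it
    preventable = 0
    for _, victim, ip in sorted(events):
        prior_victims = seen_by.setdefault(ip, set())
        if prior_victims - {victim}:   # another network saw this IP first
            preventable += 1
        prior_victims.add(victim)
    return preventable / len(events)

print(collaboration_gain(events))
```

On this toy log the gain is 0.5: the second and fourth events involve an attacker already reported by a different victim, so sharing past-perpetrator information would have let those victims block it in advance.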
One major challenge with the proposed collaborative anomaly detection is that the operators of participating networks are often hesitant to openly share information about the hosts (customers) that use their services. In the third chapter of this thesis we address this problem by proposing, and evaluating the efficiency of, a novel cryptographic protocol that allows victims to collaborate in a manner that protects their privacy. Our protocol allows participants to submit a set of IP addresses that they suspect of engaging in unwanted activity, and it returns the set of IP addresses that appear in some fraction of all suspect sets (i.e., threshold set-intersection). The protocol preserves privacy because it never reveals who suspected whom, and a submitted IP address is revealed only when more than n participating networks suspect it. Our implementation of this protocol can correlate millions of suspect IP addresses per hour when running on two quad-core machines.
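The output of the protocol, though not its cryptographic machinery, can be captured in a plaintext reference sketch. The function and data below are illustrative; the real protocol computes this result without any party learning who suspected whom:

```python
from collections import Counter

def threshold_set_intersection(suspect_sets, threshold):
    """Return the IP addresses that appear in at least `threshold` of the
    submitted suspect sets. Plaintext reference for the protocol's output
    only; the cryptographic protocol reveals nothing else."""
    counts = Counter(ip for s in suspect_sets for ip in set(s))
    return {ip for ip, c in counts.items() if c >= threshold}

# Hypothetical suspect sets from three participating networks.
suspect_sets = [
    {"203.0.113.7", "198.51.100.4"},
    {"203.0.113.7"},
    {"203.0.113.7", "192.0.2.1"},
]
print(threshold_set_intersection(suspect_sets, 2))
```

Here only the IP suspected by at least two networks is revealed; addresses suspected by a single network stay hidden, which is exactly the privacy property the protocol enforces cryptographically.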