I develop network measurement systems and techniques to empirically study security and privacy threats that millions of consumers face every day.
As many of these threats are hidden from the public, I build systems to engage with them directly from the perspectives of both users and adversaries, in the lab and in the wild. Based on my data-driven insights, I develop countermeasures, either directly for consumers or together with industry and government partners.
My research covers two broad areas:
I have been contacted by various government agencies (e.g., the FBI, the FTC, and the New York State Attorney General) to help investigate a number of security and privacy threats related to my research. My work has also been covered by multiple media outlets. Examples include:
Abstract: The proliferation of smart home devices has created new opportunities for empirical research in ubiquitous computing, ranging from security and privacy to personal health. Yet, data from smart home deployments are hard to come by, and existing empirical studies of smart home devices typically involve only a small number of devices in lab settings. To contribute to data-driven smart home research, we crowdsource the largest known dataset of labeled network traffic from smart home devices from within real-world home networks. To do so, we developed and released IoT Inspector, an open-source tool that allows users to observe the traffic from smart home devices on their own home networks. Between April 10, 2019 and January 21, 2020, 5,404 users installed IoT Inspector, allowing us to collect labeled network traffic from 54,094 smart home devices. At the time of publication, IoT Inspector is still gaining users and collecting data from more devices. We demonstrate how this data enables new research into smart homes through two case studies focused on security and privacy. First, we find that many device vendors, including Amazon and Google, use outdated TLS versions and send unencrypted traffic, sometimes to advertising and tracking services. Second, we discover that smart TVs from at least 10 vendors communicated with advertising and tracking services. Finally, we find widespread cross-border communications, sometimes unencrypted, between devices and Internet services that are located in countries with potentially poor privacy practices. To facilitate future reproducible research in smart homes, we will release the IoT Inspector data to the public.
Abstract: The number of Internet-connected TV devices has grown significantly in recent years, especially Over-the-Top ("OTT") streaming devices such as Roku TV and Amazon Fire TV. OTT devices offer an alternative to multi-channel television subscription services and are often monetized through behavioral advertising. To shed light on the privacy practices of such platforms, we developed a system that can automatically download OTT apps (also known as channels) and interact with them while capturing their network traffic and performing best-effort TLS interception. We used this smart crawler to visit more than 2,000 channels on two popular OTT platforms, namely Roku and Amazon Fire TV. Our results show that tracking is pervasive on both OTT platforms: traffic to known trackers is present on 69% of Roku channels and 89% of Amazon Fire TV channels. We also discover a widespread practice of collecting and transmitting unique identifiers, including WiFi MAC addresses and SSIDs. Moreover, a large number of trackers send data over unencrypted channels, potentially exposing it to malicious eavesdroppers. Finally, we show that the countermeasures available for these devices, such as limiting ad tracking options and ad blocking, are practically ineffective. Based on our findings, we make recommendations for researchers, regulators, policy makers, and platform and app developers.
See also: Blog Post
Abstract: The proliferation of smart home Internet of Things (IoT) devices presents unprecedented challenges for preserving privacy within the home. In this paper, we demonstrate that a passive network observer (e.g., an Internet service provider) can infer private in-home activities by analyzing Internet traffic from commercially available smart home devices even when the devices use end-to-end transport-layer encryption. We evaluate common approaches for defending against these types of traffic analysis attacks, including firewalls, virtual private networks, and independent link padding, and find that none sufficiently conceal user activities with reasonable data overhead. We develop a new defense, "stochastic traffic padding" (STP), that makes it difficult for a passive network adversary to reliably distinguish genuine user activities from generated traffic patterns designed to look like user interactions. Our analysis provides a theoretical bound on an adversary's ability to accurately detect genuine user activities as a function of the amount of additional cover traffic generated by the defense technique.
See also: Blog Post
Abstract: We consider the problem of regulating products with negative externalities to a third party that is neither the buyer nor the seller, but where both the buyer and the seller can take steps to mitigate the externality. The motivating example to have in mind is the sale of Internet-of-Things (IoT) devices, many of which have historically been compromised for DDoS attacks that disrupted Internet-wide services such as Twitter (Brian Krebs, 2017; Nicky Woolf, 2016). Neither the buyer (i.e., consumers) nor the seller (i.e., IoT manufacturers) was known to suffer from these attacks, but both have the power to expend effort to secure their devices. We consider a regulator who regulates payments (via fines if the device is compromised, or market prices directly), or the product directly via mandatory security requirements.
Both regulations come at a cost: implementing security requirements increases production costs, and the existence of fines decreases consumers' values, thereby reducing the seller's profits. The focus of this paper is to understand the efficiency of various regulatory policies. That is, policy A is more efficient than policy B if A more successfully minimizes negative externalities while both A and B reduce the seller's profits equally.
We develop a simple model to capture the impact of regulatory policies on a buyer's behavior. In this model, we show that for homogeneous markets, where the buyer's ability to follow security practices is always high or always low, the optimal (externality-minimizing for a given profit constraint) regulatory policy need regulate only payments or production. In arbitrary markets, by contrast, we show that while the optimal policy may require regulating both aspects, there is always an approximately optimal policy that regulates just one.
See also: Project Website
Abstract: Many Internet of Things (IoT) devices have voice user interfaces (VUIs). One of the most popular VUIs is Amazon's Alexa, which supports more than 47,000 third-party applications ("skills"). We study how Alexa's integration of these skills may confuse users. Our survey of 237 participants found that users do not understand that skills are often operated by third parties, that they often confuse third-party skills with native Alexa functions, and that they are unaware of the functions that the native Alexa system supports. Surprisingly, users who interact with Alexa more frequently are more likely to conclude that a third-party skill is native Alexa functionality. The potential for misunderstanding creates new security and privacy risks: attackers can develop third-party skills that operate without users' knowledge or masquerade as native Alexa functions. To mitigate this threat, we make design recommendations to help users distinguish native and third-party skills.
Abstract: In this paper, we present two web-based attacks against local IoT devices that any malicious web page or third-party script can perform, even when the devices are behind NATs. In our attack scenario, a victim visits the attacker's website, which contains a malicious script that communicates with IoT devices on the local network that have open HTTP servers. We show how the malicious script can circumvent the same-origin policy by exploiting error messages on the HTML5 MediaError interface or by carrying out DNS rebinding attacks. We demonstrate that the attacker can gather sensitive information from the devices (e.g., unique device identifiers and precise geolocation), track and profile the owners to serve ads, or control the devices by playing arbitrary videos and rebooting them. We propose potential countermeasures to our attacks that users, browsers, DNS providers, and IoT vendors can implement.
Abstract: Ransomware is a type of malware that encrypts the files of infected hosts and demands payment, often in a cryptocurrency such as Bitcoin. In this paper, we create a measurement framework that we use to perform a large-scale, two-year, end-to-end measurement of ransomware payments, victims, and operators. By combining an array of data sources, including ransomware binaries, seed ransom payments, victim telemetry from infections, and a large database of Bitcoin addresses annotated with their owners, we sketch the outlines of this burgeoning ecosystem and associated third-party infrastructure. In particular, we trace the financial transactions from the moment victims acquire bitcoins to when ransomware operators cash them out. We find that many ransomware operators cashed out using BTC-e, a now-defunct Bitcoin exchange. In total, we are able to track over $16 million in likely ransom payments made by 19,750 potential victims during a two-year period. While our study focuses on ransomware, our methods are potentially applicable to other cybercriminal operations that have similarly adopted Bitcoin as their payment channel.
Abstract: Digital currencies have flourished in recent years, buoyed by the tremendous success of Bitcoin. These blockchain-based currencies, called altcoins, have market capitalizations ranging from a few thousand to millions of dollars. Altcoins have attracted enthusiasts who enter the market by mining or buying them, but the risks and rewards could potentially be significant, especially when the market is volatile. In this work, we estimate the potential profitability of mining and speculating on 18 altcoins using real-world blockchain and trade data. Using opportunity cost as a metric, we estimate the mining cost for an altcoin with respect to a more popular and more stable coin. For every dollar invested in mining or buying a coin, we compute the potential returns under various conditions, such as the time of market entry and hold positions. While some coins offer the potential for spectacular returns, many follow a simple bubble-and-crash scenario, which highlights the extreme risks, and potential gains, in altcoin markets.
Abstract: Sites for online classified ads selling sex are widely used by human traffickers to support their pernicious business. The sheer quantity of ads makes manual exploration and analysis unscalable. In addition, discerning whether an ad is advertising a trafficked victim or an independent sex worker is a very difficult task. Very little concrete ground truth (i.e., ads definitively known to be posted by a trafficker) exists in this space. In this work, we develop tools and techniques that can be used separately and in conjunction to group sex ads by their true owner (and not the claimed author in the ad). Specifically, we develop a machine learning classifier that uses stylometry to distinguish between ads posted by the same vs. different authors with 96% accuracy. We also design a linking technique that takes advantage of leakage from the Bitcoin mempool, the blockchain, and the sex ad site to link a subset of sex ads to Bitcoin public wallets and transactions. Finally, through a four-week proof of concept using Backpage as the sex ad site, we demonstrate how an analyst can use these automated approaches to potentially find human traffickers.
Abstract: In this paper, we investigate a new form of blackhat search engine optimization that targets local listing services like Google Maps. Miscreants register abusive business listings in an attempt to siphon search traffic away from legitimate businesses and funnel it to deceptive service industries (such as unaccredited locksmiths) or to traffic-referral scams, often for the restaurant and hotel industry. To understand the prevalence and scope of this threat, we obtain access to over a hundred thousand business listings on Google Maps that were suspended for abuse. We categorize the types of abuse affecting Google Maps; analyze how miscreants circumvented the protections against fraudulent business registration, such as postcard mail verification; identify the volume of search queries affected; and ultimately explore how miscreants generated a profit from traffic that necessitates physical proximity to the victim. This physical requirement leads to unique abusive behaviors that are distinct from other online fraud such as pharmaceutical and luxury product scams.
See also: Slides
Abstract: In this paper, we present an empirical study of a recent spam campaign (a "stress test") that resulted in a DoS attack on Bitcoin. The goal of our investigation is to understand the methods the spammers used and the impact on Bitcoin users. To this end, we used a clustering-based method to detect spam transactions. We then validated the clustering results and generated a conservative estimate that 385,256 (23.41%) out of 1,645,667 total transactions were spam during the 10-day period at the peak of the campaign. We show the impact of the campaign: average non-spam transaction fees increased from 45 to 68 satoshis/byte (from $0.11 to $0.17 USD per kilobyte of transaction), and average delays in processing non-spam transactions increased from 0.33 to 2.67 hours. We also estimate the cost of this spam attack at 201 BTC (or $49,000 USD). We conclude by pointing out changes to Bitcoin transaction fees that would mitigate some of the spam techniques used to effectively DoS Bitcoin.
Abstract: At the current stratospheric value of Bitcoin, miners with access to significant computational horsepower are literally printing money. For example, the first operator of a $1,500 (USD) custom ASIC mining platform claims to have recouped his investment in less than three weeks in early February 2013, and the value of a bitcoin has more than tripled since then. Not surprisingly, cybercriminals have also been drawn to this potentially lucrative endeavor, but instead are leveraging the resources available to them: stolen CPU hours in the form of botnets. We conduct the first comprehensive study of Bitcoin mining malware, and describe the infrastructure and mechanisms deployed by several major players. By carefully reconstructing the Bitcoin transaction records, we are able to deduce the amount of money a number of mining botnets have made.
I am currently a postdoctoral fellow at Princeton University, advised by Prof. Nick Feamster (who recently moved to the University of Chicago). I am affiliated with Princeton's Center for Information Technology Policy and its Department of Computer Science.
I obtained my PhD in Computer Science from the University of California, San Diego, advised by Prof. Alex C. Snoeren and Prof. Kirill Levchenko (who recently moved to UIUC). My PhD dissertation uses cryptocurrencies to measure the financial activities of malicious actors and to uncover the potential identities of these actors.
I graduated from Williams College (Massachusetts) with a BA in Computer Science, advised by Prof. Jeannie Albrecht. At Williams, I also directed a series of Chinese cooking shows on Williamstown Community Television.
One of my long-term collaborators is Momo (pictured below), who constantly travels with me for work and for leisure.