Data-Intensive Experimentation in Cyber Security

At Symantec Research Labs, I built the Worldwide Intelligence Network Environment (WINE), a platform for experimenting with Big Data techniques in cyber security. Trying out new Big Data ideas is challenging for many researchers and engineers, because it requires specialized computing infrastructures and large data sets that are representative of real-world problems. This is particularly true in security, which deals with sensitive artifacts (e.g., malware, private information) that are difficult to share publicly, and where the data collected last month may not be relevant for today's attacks because of the ongoing arms race with cyber criminals. To fill this gap, WINE provides a platform for conducting data analysis at scale, using field data collected at Symantec (e.g., anti-virus telemetry, file downloads), and promotes rigorous experimental methods [BADGERS 2011, CSET 2011, EDCC 2012]. WINE loads, samples, and aggregates multiple data feeds, originating from millions of hosts around the world, and keeps them up to date. This allows researchers to conduct open-ended and reproducible experiments, e.g., for validating new ideas on real-world data, for conducting empirical studies, or for comparing the performance of different algorithms against reference data sets archived in WINE. The research opportunities provided by this approach are best illustrated through two examples: measuring the duration of zero-day attacks [CCS 2012] and evaluating the real-world impact of security technologies [LEET 2012].

Empirical Study of Zero-Day Attacks

A zero-day attack exploits one or more vulnerabilities that have not been disclosed publicly. Knowledge of such vulnerabilities gives cyber criminals a free pass to attack any target, from Fortune 500 companies to millions of consumer PCs around the world, while remaining undetected (recent examples include Stuxnet and the Elderwood project). The impact of zero-day attacks has been debated for more than a decade, but their duration and prevalence in the real world had remained unknown, because zero-day attacks are rare events that are unlikely to be observed in honeypots or in lab experiments. Instead, studying zero-day attacks requires the analysis of Internet-scale data. With Leyla Bilge, I used WINE to measure the duration of 18 zero-day attacks, from field data collected on 11 million hosts worldwide [CCS 2012]. These attacks lasted between 19 days and 30 months, with a median of 8 months and an average of approximately 10 months. Because we take the first exploit of the vulnerability that was recorded in the field and is observable in WINE as the starting point of the attack, these numbers represent lower bounds rather than precise estimates. This study also identified 11 vulnerabilities that were not previously known to have been employed in zero-day attacks.
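The duration measurement described above can be illustrated with a minimal sketch. It is not the analysis code from [CCS 2012]: it assumes that an attack ends on the public disclosure date of the vulnerability, and the vulnerability labels, dates, and the helper `attack_duration_days` are all hypothetical, chosen only to show the arithmetic.

```python
from datetime import date
from statistics import mean, median


def attack_duration_days(first_seen: date, disclosed: date) -> int:
    """Days from the first exploit observed in the field to public disclosure.

    Assumption: the zero-day window closes at disclosure. Since the first
    *observed* exploit may postdate the true start, this is a lower bound.
    """
    if first_seen >= disclosed:
        raise ValueError("not a zero-day attack: first exploit is not before disclosure")
    return (disclosed - first_seen).days


# Hypothetical records (made-up labels and dates, NOT data from the study):
# label -> (first exploit seen in field data, public disclosure date)
observations = {
    "VULN-A": (date(2010, 1, 1), date(2010, 1, 20)),
    "VULN-B": (date(2009, 3, 1), date(2009, 11, 1)),
    "VULN-C": (date(2008, 6, 1), date(2010, 12, 1)),
}

durations = {
    label: attack_duration_days(first_seen, disclosed)
    for label, (first_seen, disclosed) in observations.items()
}

for label, days in sorted(durations.items()):
    print(f"{label}: at least {days} days")

print(f"median: {median(durations.values())} days")
print(f"mean:   {mean(durations.values()):.1f} days")
```

With skewed durations like these, the mean exceeds the median, which is the same pattern as the 8-month median and roughly 10-month average reported above.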

Paper: [CCS 2012]

News features:

Evaluating Operating-System Security

While the volume of malware is growing (over 1 million new malware variants were detected each day in 2011), over the past ten years operating-system vendors have introduced a number of security technologies that aim to make exploits harder and to reduce the attack surface of the platform. WINE also provides clues about the impact of these security technologies on the cyber-security arms race. With Petros Efstathopoulos, I analyzed anti-virus telemetry received from 5 million hosts to evaluate the factors that influence the production of malware [LEET 2012]. Preliminary results suggest that the number of distinct virus families observed in the field is correlated with factors that describe the target of opportunity for cyber criminals, such as a platform's deployment size. This means that most viruses target recent operating-system versions, which run on many hosts around the world, in spite of the state-of-the-art security technologies included in these versions (this trend is visible for both Windows and Mac OS). These data can also help measure the effectiveness of security technologies in the field.
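To make the notion of correlation concrete, here is a minimal sketch that computes a Pearson correlation coefficient between deployment size and the number of distinct virus families per platform. The choice of Pearson's coefficient is an assumption (the text above does not name the statistic used in [LEET 2012]), and the platform names and numbers are hypothetical, not measurements from WINE.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-platform figures (made up for illustration only):
# platform -> (deployment size in millions of hosts, distinct virus families)
platforms = {
    "os-v1": (2.0, 40),
    "os-v2": (10.0, 180),
    "os-v3": (25.0, 410),
    "os-v4": (5.0, 95),
}

deployment = [size for size, _ in platforms.values()]
families = [count for _, count in platforms.values()]

r = pearson(deployment, families)
print(f"Pearson r between deployment size and virus families: {r:.3f}")
```

A coefficient close to 1 would be consistent with widely deployed platforms attracting more virus families, but a correlation alone does not establish why attackers prefer those platforms.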

Paper: [LEET 2012]

News features:

The WINE platform and data sets are available to the research community for additional experiments. For more information about accessing WINE, please visit
Tutorial: [CCS 2011].


Conference and Workshop Publications