Vyas Sekar | Project DataFuel

Our conversations with leading enterprise AI vendors across market verticals (e.g., security, telemetry, finance) tell us that, at every step along the way, lack of access to realistic and diverse data from multiple deployments hampers innovation. For example, products are trained on data that is not representative of the customer environment, there is no way to quantitatively assess products, machine learning workflows experience data drift, and product feedback is not quantitative. The result today is poor products, a lack of transparency, substantial effort spent on debugging, reproduction, and resolution, and no way to share insights across customers.

As part of the DataFuel project, we have been leading research that demonstrates the feasibility of using synthetic data generated by Generative Adversarial Networks (GANs) to address these pain points for various tasks (e.g., telemetry, anomaly detection, model training). We have identified and addressed key fidelity, scalability, and privacy challenges and tradeoffs in existing GAN-based approaches. By synthesizing domain-specific insights with recent advances in machine learning and privacy, we identify design choices that tackle these challenges.

Code on GitHub

SIGCOMM

Practical GAN-based Synthetic IP Header Trace Generation using NetShare

Yin, Yucheng, Lin, Zinan, Jin, Minhao, Fanti, Giulia, and Sekar, Vyas

In SIGCOMM 2022

AAAI

RareGAN: Generating Samples for Rare Classes

Lin, Zinan, Liang, Hao, Fanti, Giulia, and Sekar, Vyas

In AAAI 2022

ICML

Pareto GAN: Extending the Representational Power of GANs to Heavy-Tailed Distributions

Huster, Todd, Cohen, Jeremy, Lin, Zinan, Chan, Kevin, Kamhoua, Charles, Leslie, Nandi O., Chiang, Cho-Yu, and Sekar, Vyas

In Proc. ICML 2021

PDF

AISTATS

On the Privacy Properties of GAN-generated Samples

Lin, Zinan, Fanti, Giulia, and Sekar, Vyas

In Proc. AISTATS 2021

PDF

IMC

Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions

Lin, Zinan, Jain, Alankar, Wang, Chen, Fanti, Giulia C., and Sekar, Vyas

In IMC ’20: ACM Internet Measurement Conference, Virtual Event, USA, October 27-29, 2020

PDF

arxiv

Why Spectral Normalization Stabilizes GANs: Analysis and Improvements

Lin, Zinan, Sekar, Vyas, and Fanti, Giulia C.

CoRR 2020