About me

My name is Soila Pertet Kavulya. I graduated with my PhD in Electrical and Computer Engineering from Carnegie Mellon University in May 2013. My advisor was Professor Priya Narasimhan.

My research focuses on diagnosis of problems in distributed systems. I have applied my diagnosis algorithms to MapReduce systems, Voice-over-IP systems, Internet Services, automotive systems, and group communication systems.



Google+
Google Scholar Citations
Twitter
Email: spertet@ece.cmu.edu

Thesis: Automated diagnosis of chronic problems in production systems

[Thesis document] [Slides]

Large distributed systems are susceptible to chronic performance problems where the system still works, but with degraded performance. Chronic performance problems occur intermittently or affect a subset of end-users.

This dissertation presents a top-down framework for diagnosing chronic performance problems. The framework comprises four components. First, an extensible log-analysis framework extracts end-to-end causal flows from common white-box (application) logs in the production system; these end-to-end flows capture the user's experience with the system. Second, anomaly-detection tools use heuristics and a peer-comparison approach to label each end-to-end flow as successful or failed. Third, a statistical diagnostic tool combines white-box metrics with black-box metrics (e.g., CPU usage) to localize the source of the problem by identifying attributes that are more strongly correlated with failed flows than with successful ones. Fourth, a visualization tool uses peer comparison to highlight anomalous nodes in a parallel-computing cluster.
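To make the third step more concrete, here is a minimal sketch of the general idea of ranking attributes by how much more often they occur in failed flows than in successful ones. It is an illustration under stated assumptions, not the dissertation's actual algorithm: the attribute encoding (strings such as "node=host3") and the scoring rule (the difference between an attribute's occurrence rate in failed flows and in successful flows) are hypothetical choices made for this example, standing in for a proper statistical correlation measure.

```python
from collections import Counter


def rank_attributes(flows):
    """Rank attributes by how much more often they appear in failed flows.

    Hypothetical input format (an assumption for illustration): `flows` is a
    list of (attributes, failed) pairs, where `attributes` is a set of strings
    such as "node=host7" and `failed` is a bool from the anomaly-detection step.
    """
    failed_counts = Counter()
    ok_counts = Counter()
    n_failed = n_ok = 0
    for attrs, failed in flows:
        if failed:
            n_failed += 1
            failed_counts.update(attrs)
        else:
            n_ok += 1
            ok_counts.update(attrs)

    scores = {}
    for attr in set(failed_counts) | set(ok_counts):
        # Fraction of failed (resp. successful) flows that carry this attribute.
        p_fail = failed_counts[attr] / n_failed if n_failed else 0.0
        p_ok = ok_counts[attr] / n_ok if n_ok else 0.0
        # Simple stand-in score: positive means over-represented in failures.
        scores[attr] = p_fail - p_ok

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, with five flows in which both failures pass through a hypothetical node "host3", `rank_attributes` places `("node=host3", 1.0)` at the top of the list, pointing at that node as the likely source of the problem. A real diagnostic tool would also need to handle black-box metrics and attribute combinations, which this sketch does not attempt.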

The diagnostic framework has been used to localize real incidents in an academic cloud-computing cluster that runs the Hadoop parallel-processing framework and in a production Voice-over-IP system at a major Internet service provider.

Selected Publications