Rajeev Gandhi
Systems Faculty
ECE Department
Carnegie Mellon University
PA 15213

Tel: 412-268-4922
Office: HH B210




I joined Carnegie Mellon University in 2003 as Systems Faculty with the Electrical and Computer Engineering Department and the Information Networking Institute. I was previously a Research Staff Member with Motorola's Broadband Communications Division in San Diego, CA, where I was involved in the H.264 video-compression standardization activity. I received a Motorola Outstanding Performance award in 2002 in recognition of my contributions to global standardization activities. Before joining Motorola, I received my Ph.D. from the University of California, Santa Barbara in March 2000 and my B.Tech. degree from IIT Bombay in 1994.


My research interests are in the area of problem diagnosis, or fingerpointing, in large-scale distributed systems. Problem diagnosis involves instrumenting a given system to gather meaningful data, and analyzing the collected data to detect the source, or even the root cause, of problems in the system. Fingerpointing is challenging because the distributed nature of the processing/computation can cause a single problem to affect the behavior of every node in the system. We are currently working on identifying performance problems in MapReduce systems such as Hadoop, and in file systems such as PVFS, Lustre, BFS and CoreFS. Our current fingerpointing algorithms use black-box data and/or white-box data to fingerpoint a faulty node in Hadoop and in these file systems. My current research projects include the following:

  • Problem Diagnosis in PVFS/Lustre: Automatically diagnosing performance problems in parallel file systems by identifying, gathering and analyzing either OS-level black-box performance metrics or system call attributes across parallel file systems.
  • Kahuna: Diagnosing performance problems in Hadoop by comparing OS-level performance metrics and Hadoop's log statistics across all the nodes of a cluster to fingerpoint a faulty node.
  • SALSA: Analyzing Logs as StAte machines: SALSA examines Hadoop logs to derive a state-machine view of the system's execution, along with control-flow and data-flow models and related statistics. The state-machine view of Hadoop is then used for failure diagnosis and for visualizing Hadoop's distributed behavior.
  • Gumshoe: Failure diagnosis in distributed systems through the application of statistical anomaly-detection algorithms, machine-learning techniques such as clustering, etc.
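
The peer-comparison idea behind several of the projects above (comparing the same metric across all nodes and flagging the one that disagrees with its peers) can be illustrated with a minimal sketch. This is not the algorithm used in Kahuna or in the PVFS/Lustre work; it is a generic median-based outlier test, and the function name `fingerpoint`, the threshold of 3.0, and the sample CPU-utilization values are all assumptions made up for illustration.

```python
from statistics import median

def fingerpoint(metrics, threshold=3.0):
    """Return the nodes whose metric deviates from the peer median by more
    than `threshold` times the median absolute deviation (MAD).

    `metrics` maps a (hypothetical) node name to one black-box measurement,
    e.g. average CPU utilization over a time window.
    """
    values = list(metrics.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half of the peers sit exactly on the median;
        # flag any node that differs from it at all.
        return sorted(n for n, v in metrics.items() if v != center)
    return sorted(
        n for n, v in metrics.items() if abs(v - center) / mad > threshold
    )

# Illustrative data only: five nodes running the same workload.
cpu_util = {"node1": 41.0, "node2": 39.5, "node3": 40.2,
            "node4": 93.7, "node5": 40.8}
suspects = fingerpoint(cpu_util)
print(suspects)  # ['node4']
```

The median and MAD are used here instead of the mean and standard deviation because a single badly behaving node would otherwise skew the very baseline it is compared against. The sketch also assumes that healthy nodes behave alike under the same workload, which is the premise of peer comparison.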

I am fortunate to work with talented students such as Jiaqi Tan, Soila Kavulya, Michael Kasick and Xinghao Pan. I am also affiliated with the Center for Sensed Critical Infrastructure Research (CenSCIR) and the Parallel Data Lab (PDL) at CMU.


  • Visual, Log-based Causal Tracing for Performance Debugging of MapReduce Systems
    Jiaqi Tan, Soila Kavulya, Rajeev Gandhi, Priya Narasimhan, to be presented at IEEE International Conference on Distributed Computing Systems (ICDCS), Genoa, Italy, June 2010
  • An Analysis of Traces from a Production MapReduce Cluster
    Soila Kavulya, Jiaqi Tan, Rajeev Gandhi, Priya Narasimhan, to be presented at IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Melbourne, Australia, May 2010
  • Kahuna: Problem Diagnosis for MapReduce-Based Cloud Computing Environments
    Jiaqi Tan, Xinghao Pan, Soila Kavulya, Rajeev Gandhi and Priya Narasimhan, to be presented at IEEE/IFIP Network Operations and Management Symposium (NOMS), Osaka, Japan, April 2010
  • Black-Box Diagnosis in Parallel File Systems
    Michael P. Kasick, Jiaqi Tan, Rajeev Gandhi and Priya Narasimhan, to be presented at USENIX Conference on File and Storage Technologies (FAST), San Jose, CA, February 2010
  • System-Call Based Problem Diagnosis for PVFS
    Michael Kasick, Keith Bare, Eugene Marinelli, Rajeev Gandhi and Priya Narasimhan, Fifth Workshop on Hot Topics in System Dependability (HotDep), Lisbon, Portugal, June 2009

The list of all my publications can be found here.


I teach the Fundamentals of Embedded Systems (18-342/14-642) course at Carnegie Mellon University. This practical, hands-on course introduces students to the basic building blocks and the underlying scientific principles of embedded systems. The course covers both the hardware and software aspects of embedded processor architectures, along with operating system fundamentals such as virtual memory, concurrency, task scheduling and synchronization. Through a series of laboratory projects involving state-of-the-art processors, students learn to understand implementation details and to write assembly-language and C programs that implement core embedded OS functionality and that control and debug features such as timers, interrupts, serial communications, flash memory, device drivers and other components used in typical embedded applications. Related topics, such as optimization, profiling, and real-time operating systems, are also covered.


  • Co-inventor, Frequency coefficient scanning paths for coding digital video content. United States Patent: 7088867. August 2006.
  • Co-inventor, Macroblock level adaptive frame/field coding for digital video content. United States Patent: 6980596. December 2005.