Transactional Workload Characterization in PostgreSQL and in Oracle

 
Nikos Hardavellas (hardav@cs.cmu.edu)
Stephen Somogyi (ssomogyi@ece.cmu.edu)

Semester Project for 15-721 (Database Management Systems), Spring 2004


Proposal - March 31, 2004
Milestone - April 2, 2004
Report - due April 30, 2004



Database workloads are an important class of applications, responsible for one third of the symmetric multiprocessor (SMP) server market (as of 1999). However, they are rarely used by the computer architecture community for performance evaluations of new algorithms, protocols, and hardware platforms. Full-scale database performance evaluation requires large hardware configurations and poses complex hardware and software tuning challenges. Commercially available database servers routinely employ 70 to 200 initialization parameters to control the runtime database management issues. In addition, the operating system presents numerous configuration alternatives. Although default values are provided for most of these parameters, often they do not match the requirements of the intended workloads or they result in non-representative behavior. Even worse, modern database systems are very sensitive to hardware configuration choices such as the number of disks used, data striping and the degree of multithreading allowed. Few researchers have the resources to construct such large hardware configurations, or the expertise to tune such complex workload-specific parameters. The inherent difficulties of evaluating performance using database workloads have resulted in a flurry of different and sometimes conflicting results in literature.

In this project we will characterize the performance of OLTP workloads using PostgreSQL and Oracle. Prior to performing the characterization, we must install and tune (individually) both DBMSs for TPC-C on the target platforms. Some investigation steps will require detailed hardware simulation; thus, we must also install and tune through Simics, so that we can leverage the SimFlex simulator to model the desired hardware. The characterization will be conducted over four distinct axes. First, we will generate a breakdown of time spent in different database components (e.g. buffer management, hash join, index scan) on a per-transaction basis. The second axis of characterization will be a breakdown of processor time (e.g. computation, idle, memory stall, I/O), with non-computation time further characterized by hardware component (e.g. caches, hardware queues, protocol engines). Third, on a per-transaction basis, we will characterize the breakdown of time spent in different hardware components. The final axis of characterization will be a breakdown of time spent in different database components, but further broken down by hardware component.