Transactional Workload Characterization in PostgreSQL and in Oracle
Nikos Hardavellas (hardav@cs.cmu.edu)
Stephen Somogyi (ssomogyi@ece.cmu.edu)
Semester Project for 15-721 (Database Management Systems), Spring
2004
Proposal -
March 31, 2004
Milestone -
April 2, 2004
Report - due April 30, 2004
Database workloads are an important class of applications, responsible
for one third of the symmetric multiprocessor (SMP) server market (as
of 1999). However, they are rarely used by the computer architecture
community for performance evaluations of new algorithms, protocols, and
hardware platforms. Full-scale database performance evaluation requires
large hardware configurations and poses complex hardware and software
tuning challenges. Commercially available database servers routinely
employ 70 to 200 initialization parameters to control the runtime
database management issues. In addition, the operating system presents
numerous configuration alternatives. Although default values are
provided for most of these parameters, often they do not match the
requirements of the intended workloads or they result in
non-representative behavior. Even worse, modern database systems are
very sensitive to hardware configuration choices such as the number of
disks used, data striping and the degree of multithreading allowed. Few
researchers have the resources to construct such large hardware
configurations, or the expertise to tune such complex workload-specific
parameters. The inherent difficulties of evaluating performance using
database workloads have resulted in a flurry of different and sometimes
conflicting results in literature.
In this project we will characterize the performance of OLTP
workloads using PostgreSQL and
Oracle. Prior to performing the
characterization, we must install and tune (individually) both DBMSs
for TPC-C on the target
platforms. Some investigation steps will require detailed hardware
simulation; thus, we must also install and tune through Simics, so that we can leverage the SimFlex simulator to
model the desired hardware. The characterization will be conducted over
four distinct axes. First, we will generate a breakdown of time
spent in different database components (e.g. buffer management, hash
join, index scan) on a per-transaction basis. The second axis of
characterization will be a
breakdown of processor time (e.g.
computation, idle, memory stall, I/O), with non-computation time
further
characterized by hardware component (e.g. caches, hardware
queues, protocol engines). Third, on a per-transaction basis, we will
characterize the breakdown of
time spent in different hardware components. The final axis of
characterization will be a
breakdown of time spent in different database components, but further
broken down by hardware component.