Project Proposal:

Transactional Workload Characterization in PostgreSQL and in Oracle

 
Nikos Hardavellas (hardav@cs.cmu.edu)
Stephen Somogyi (ssomogyi@ece.cmu.edu)

March 31, 2004

Project Home Page
 

Overview

Database workloads are an important class of applications, responsible for one third of the symmetric multiprocessor (SMP) server market (as of 1999). However, they are rarely used by the computer architecture community for performance evaluations of new algorithms, protocols, and hardware platforms. Full-scale database performance evaluation requires large hardware configurations and poses complex hardware and software tuning challenges. Commercially available database servers routinely employ 70 to 200 initialization parameters to control the runtime database management issues. In addition, the operating system presents numerous configuration alternatives. Although default values are provided for most of these parameters, often they do not match the requirements of the intended workloads or they result in non-representative behavior. Even worse, modern database systems are very sensitive to hardware configuration choices such as the number of disks used, data striping and the degree of multithreading allowed. Few researchers have the resources to construct such large hardware configurations, or the expertise to tune such complex workload-specific parameters. The inherent difficulties of evaluating performance using database workloads have resulted in a flurry of different and sometimes conflicting results in literature.

In this project we aim to characterize the performance of OLTP workloads using PostgreSQL and Oracle. The characterization will be divided into four distinct stages.

Characterization Methodology

The first axis of (evil) characterization will be a breakdown of time spent in different database components on a per-transaction basis (e.g. buffer pool management, hashing, index traversal, joins, waiting on locks). With this we will assess the relative importance of different DMBS components for an OLTP workload. Already existing statistics may give us some of the required information.

The second axis of (even more evil) characterization will be a breakdown of time from the point of view of the processor (e.g. computation, idle, memory stall, I/O). Miss time will be further characterized by hardware component (e.g. L1 cache, L2 cache, hardware queues, protocol engines, remote memory access). This will characterize the overall elapsed time of the workload in the presence of latency-hiding techniques used by the processor or the operating system (e.g. context switching).

The third axis of (very evil) characterization will be a breakdown of time spent in hardware components, but on a per-transaction basis. The challenge in this will be to meaningfully break down transaction elapsed time – that is, to characterize the critical path of transactions. To be accurate, this must include time for memory accesses and I/O, even if these latencies are overlapped with other computation (i.e. due to other transactions). We intend to investigate the behavior of the DBMS on a per-transaction basis because it has been shown that OLTP is latency sensitive. Assessing the bottlenecks of a single transaction will provide intuition on how to improve its performance and allow us to achieve a much higher OLTP throughput, without relying on latency-hiding techniques.

The fourth axis of (extremely evil!) characterization will be a breakdown of time spent in different database components, but further broken down by hardware component. This will allow us to identify the bottlenecks specific to each database operation. For example, sequential scan might be I/O-bound, while aggregation over a clustered index might be memory- or network-bound.

These axes of characterization have different requirements. Detailed hardware modeling requires simulation (through Simics/SimFlex). Access to source code of the DBMS is required to identify transactions and to annotate which database component is active at any given time. The following table summarizes the proposed characterizations:

Axis
Time Breakdown
Platorm
DBMS
1
Per-transaction DBMS component breakdown real hardware PostgreSQL, Oracle
2
Overall hardware breakdown simulation PostgreSQL, Oracle
3
Per-transaction hardware breakdown simulation PostgreSQL
4
Per-DBMS-component hardware breakdown simulation PostgreSQL

Project Goals

The goals of this project are:
  1. Port the TPCC kit’s backend to interface with eSql/C for Oracle.
  2. Tune TPCC on Oracle running on Solaris.
  3. Tune TPCC on PostgreSQL running on Solaris.
  4. Characterize the performance of both DBMSs from the point of view of the DBMS (Axis 1).
  5. Characterize the performance of both DBMSs from the point of view of the processor (Axis 2).
  6. Compare and contrast the behavior of Oracle and PostgreSQL.

The secondary goals (i.e. wish list, if time allows us) of this project are:
  1. Compare the performance of DB2 to Oracle and PostgreSQL.
  2. Compare the performance of PostgreSQL on Solaris and Linux. PostgreSQL is not optimized for any particular system, and the comparison will provide us useful insights into the dependence of OLTP performance on operating systems.
  3. Characterize the performance of PostgreSQL from the point of view of the processor, on a per-transaction basis (Axis 3).
  4. Characterize performance from the point of view of the processor, with knowledge of what the DBMS is doing (Axis 4).
  5. Characterize performance from the point of view of the operating system (e.g. TLB misses, kernel time, kernel synchronization, idle time due to context switches, user time).

Schedule

Week
Tasks
3/29
Obtain, install, and test Oracle. Install and tune TPC-C on PostgreSQL running on Solaris/Linux.
4/5
Adapt TPC-C kit for Oracle. Tune TPC-C on Oracle. Finalize tuning TPC-C on PostgreSQL running on Solaris. Axis 1 characterization of PostgreSQL.
4/12
Finalize tuning TPC-C on Oracle. Axis 1 characterization of Oracle. Install PostgreSQL on Simics/SimFlex. Warmup and obtain checkpoints. Axis 2 characterization of PostgreSQL.
4/19
Install Oracle on Simics/SimFlex. Warmup and obtain checkpoints. Axis 2 characterization of Oracle.
4/26
Oracle vs. PostgreSQL comparison. Final report and poster presentation.

Milestone

By Friday, April 2, we plan to have installed and tuned TPC-C on PostgreSQL running on Solaris and Linux.

Resources

We already have PostgreSQL 7.4.2. We ordered Oracle 8i and we will have it by Friday April 2 (it is out of stock at the CMU computer store/ACIS and had to contact the vendor). We will run both DBMSs on a Sparc/Solaris platform. We have such a system in the scotch cluster. We will place the database data and log on different disks, which are available.

Getting Started

We already have PostgreSQL 7.4.2 installed and somewhat tuned on Linux. We also have some knowledge of its internals. The TPC-C kit has been ported to 7.4.2 (the original kit was adapted to PostgreSQL 7.3). This experience will help us speed through the installation. We also have some experience on the behavior and tuning process for Oracle.