Dinero Man Page (modified for 18-742)
NAME
dineroIII - cache simulator, version III
SYNOPSIS
dineroIII -b block_size -u unified_cache_size -i
instruction_cache_size -d data_cache_size [ other_options ]
DESCRIPTION
dineroIII is a trace-driven cache simulator that supports
sub-block placement. Simulation results are determined by
the input trace and the cache parameters. A trace is a fin-
ite sequence of memory references usually obtained by the
interpretive execution of a program or set of programs.
Trace input is read by the simulator in din format
(described later). Cache parameters, e.g. block size and
associativity, are set with command line options (also
described later). dineroIII uses the priority stack method
of memory hierarchy simulation to increase flexibility and
improve simulator performance in highly associative caches.
One can simulate either a unified cache (mixed, data and
instructions cached together) or separate instruction and
data caches. This version of dineroIII does not permit the
simultaneous simulation of multiple alternative caches.
dineroIII differs from most other cache simulators because
it supports sub-block placement (also known as sector place-
ment) in which address tags are still associated with cache
blocks but data is transferred to and from the cache in
smaller sub-blocks. This organization is especially useful
for on-chip microprocessor caches which have to load data on
cache misses over a limited number of pins. In traditional
cache design, this constraint leads to small blocks. Unfor-
tunately, a cache with small block devotes much more on-chip
RAM to address tags than does one with large blocks. Sub-
block placement allows a cache to have small sub-blocks for
fast data transfer and large blocks to associate with
address tags for efficient use of on-chip RAM.
Trace-driven simulation is frequently used to evaluating
memory hierarchy performance. These simulations are repeat-
able and allow cache design parameters to be varied so that
effects can be isolated. They are cheaper than hardware
monitoring and do not require access to or the existence of
the machine being studied. Simulation results can be
obtained in many situations where analytic model solutions
are intractable without questionable simplifying assump-
tions. Further, there does not currently exist any gen-
erally accepted model for program behavior, let alone one
that is suitable for cache evaluation; workloads in trace-
driven simulation are represented by samples of real work-
loads and contain complex embedded correlations that syn-
thetic workloads often lack. Lastly, a trace-driven
simulation is guaranteed to be representative of at least
one program in execution.
dineroIII reads trace input in din format from stdin. A din
record is two-tuple label address. Each line of the trace
file must contain one din record. The rest of the line is
ignored so that comments can be included in the trace file.
The label gives the access type of a reference.
0 read data.
1 write data.
2 instruction fetch.
3 escape record (treated as unknown access type).
4 escape record (causes cache flush).
The address is a hexadecimal byte-address between 0 and
ffffffff inclusively.
Cache parameters are set by command line options. Parame-
ters block_size and either unified_cache_size or both
data_cache_size and instruction_cache_size must be speci-
fied. Other parameters are optional. The suffixes K, M and
G multiply numbers by 1024, 1024^2 and 1024^3, respectively.
The following command line options are available:
-b block_size
sets the cache block size in bytes. Must be explicitly
set (e.g. -b16).
-u unified_cache_size
sets the unified cache size in bytes (e.g., -u16K). A
unified cache, also called a mixed cache, caches both
data and instructions. If unified_cache_size is posi-
tive, both instruction_cache_size and data_cache_size
must be zero. If zero, implying separate instruction
and data caches will be simulated, both
instruction_cache_size and data_cache_size must be set
to positive values. Defaults to 0.
-i instruction_cache_size
sets the instruction cache size in bytes (e.g.
-i16384). Defaults to 0 indicating a unified cache
simulation. If positive, the data_cache_size must be
positive as well.
-d data_cache_size
sets the data cache size in bytes (e.g. -d1M).
Defaults to 0 indicating a unified cache simulation.
If positive, the instruction_cache_size must be posi-
tive as well.
-S subblock_size
sets the cache sub-block size in bytes. Defaults to 0
indicating that sub-block placement is not being used
(i.e. -S0).
-a associativity
sets the cache associativity. A direct-mapped cache
has associativity 1. A two-way set-associative cache
has associativity 2. A fully associative cache has
associativity data_cache_size block_size. Defaults to
direct-mapped placement (i.e. -a1).
-r replacement_policy
sets the cache replacement policy. Valid replacement
policies are l (LRU), f (FIFO), and r (RANDOM).
Defaults to LRU (i.e. -rl).
-f fetch_policy
sets the cache fetch policy. Demand-fetch (d), which
fetches blocks that are needed to service a cache
reference, is the most common fetch policy. All other
fetch policies are methods of prefetching. Prefetching
is never done after writes. The prefetch target is
determined by the -p option and whether sub-block
placement is enabled.
d demand-fetch which never prefetches.
a always-prefetch which prefetches after every
demand reference.
m miss-prefetch which prefetches after every
demand miss.
t tagged-prefetch which prefetches after the first
demand miss to a (sub)-block. The next two prefetch
options work only with sub-block placement.
l load-forward-prefetch (sub-block placement only)
works like prefetch-always within a block, but it will
not attempt to prefetch sub-blocks in other blocks.
S sub-block-prefetch (sub-block placement only)
works like prefetch-always within a block except when
references near the end of a block. At this point
sub-block-prefetches references will wrap around within
the current block.
Defaults to demand-fetch (i.e. -fd).
-p prefetchdistance
sets the prefetch distance in sub-blocks if sub-block
placement is enabled or in blocks if it is not. A
prefetch_distance of 1 means that the next sequential
(sub)-block is the potential target of a prefetch.
Defaults to 1 (i.e. -p1).
-P abort_prefetch_percent
sets the percentage of prefetches that are aborted.
This can be used to examine the effects of data refer-
ences blocking prefetch references from reaching a
shared cache. Defaults to no prefetches aborted (i.e.
-P0).
-w write_policy
selects one of two the cache write policies. Write-
through (w) updates main memory on all writes. Copy-
back (c) updates main memory only when a dirty block is
replaced or the cache is flushed. Defaults to copy-
back (i.e. -wc)
-A write_allocation_policy
selects whether a (sub)-block is loaded on a write
reference. Write-allocate (w) causes (sub)-blocks to
be loaded on all references that miss. Non-write-
allocate (n) causes (sub)-blocks to be loaded only on
non-write references that miss. Defaults to write-
allocate (i.e. -Aw).
-D debug_flag
used by implementor to debug simulator. A debug_flag
of 0 disables debugging; 1 prints the priority stacks
after every reference; and 2 prints the priority
stacks and performance metrics after every reference.
Debugging information may be useful to the user to
understand the precise meaning of all cache parameter
settings. Defaults to no-debug (i.e. -D0).
-o output_style
sets the output style. Terse-output (0) prints results
only at the end of the simulation run. Verbose-output
(1) prints results at half-million reference increments
and at the end of the simulation run. Bus-output (2)
prints an output record for every memory bus transfer.
Bus_and_snoop-output (3) prints an output record for
every memory bus transfer and clean sub-block that is
replaced. Defaults to terse-output (i.e. -o0). For
bus-output, each bus record is a six-tuple:
BUS2 are four literal characters to start bus record
access is the access type ( r for a bus-read, w for a
bus-write, p for a bus-prefetch, s for snoop activity
(output style 3 only).
size is the transfer size in bytes
address is a hexadecimal byte-address between 0 and
ffffffff inclusively
reference_count is the number of demand references
since the last bus transfer
instruction_count is the number of demand instruction
fetches since the last bus transfer
-Z skip_count
sets the number of trace references to be skipped
before beginning cache simulation. Defaults to none
(i.e. -Z0).
-z maximum_count
sets the maximum number of trace references to be pro-
cessed after skipping the trace references specified by
skip_count. Note, references generated by the simula-
tor not read from the trace (e.g. prefetch references)
are not included in this count. Defaults to 10 million
(i.e. -z10000000).
-Q flush_count
sets the number of references between cache flushes.
Can be used to crudely simulate multiprogramming.
Defaults to no flushing (i.e. -Q0).
-L mult-level cache implementation
sets the flag to indicate that dinero is to be run with
the multi-level cache implementation. Dinero will out-
put the addresses and labels of cache misses along with
the results (needs to be sorted). Enables piping of
multiple levels of caches. Defaults to single-level
cache.
-W address_length
lets dinero know the input address length for purposes
of calculating bus traffic. Must be explicitly set
(e.g. -W16). Address_length must be a factor of block_
size and a multiple of bytes (i.e. 8, 16, 32, etc.)
FILES
doc.h contains additional programmer documentation.
SEE ALSO
Mark D. Hill and Alan Jay Smith, Experimental Evaluation of
On-Chip Microprocessor Cache Memories, Proc. Eleventh Inter-
national Symposium on Computer Architecture, June 1984, Ann
Arbor, MI.
Alan Jay Smith, Cache Memories, Computing Surveys , 14-3,
September 1982.
BUGS
Not all combination of options have been thoroughly tested.
AUTHOR
Mark D. Hill
Computer Sciences Dept.
1210 West Dayton St.
Univ. of Wisconsin
Madison, WI 53706
markhill@cs.wisc.edu
CHANGES
Modified by Mo-Hsi Edwin Su (8/5/97)
For 18-742 (Fall 97) Advanced Computer Architecture
Professor Philip Koopman