Homework 2
Due Wednesday September 9, 1998
Problem 1:
1) You are a DARPA program manager and someone submits a proposal for a
multi-chip module for radar signal processing. To provide sufficient
computational power, you need to process a 2048 x 2048 (= 4
Mega-data-element) array of 64-bit data elements once every 100 msec.
a) Draw a "plumbing diagram" for this system and label
bandwidths for each piece of "pipe" under the following
assumptions:
- There is one multi-chip module, with 8 identical CPUs bonded to it.
- Each CPU processes exactly one-eighth of the data array every 100
msec period, with no data shared among CPUs. (So, each CPU touches all
words in its one-eighth of the array, and only those words.) Assume
instruction accesses have a 0% cache miss ratio.
- Each CPU has a cache with a 10% miss rate, and accesses each 8-byte
word of data within its one-eighth of the array 20 times per 100 msec
interval.
- There is a 64-bit data bus going from the multi-chip module to main
memory, which can sustain a transfer rate of one piece of 64-bit data
every clock cycle, operating at 50 MHz.
- There are two banks of ideal memory (no inter-bank conflicts), each
of which can complete a 64-bit word transfer every 20 ns.
b) What is the bottleneck to this system in terms of bandwidth, and how "big"
should it be to just barely eliminate the bottleneck?
c) What is the maximum acceptable cache miss rate to eliminate the
problem of the bandwidth bottleneck observed in part (b) of this question?
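The pipe capacities and traffic for the plumbing diagram can be tabulated with a short script. The following is only a sketch under stated assumptions: it assumes each cache miss moves exactly one 8-byte word over the bus (the problem does not give a cache line size) and ignores any write-back traffic. All variable names are illustrative and not part of the assignment.

```python
# Sketch: bandwidth bookkeeping for Problem 1 (all names are illustrative).
# Assumptions not stated in the problem: one 8-byte word is transferred
# per cache miss, and there is no write-back traffic.

WORD_BYTES = 8                 # 64-bit data element
ARRAY_WORDS = 2048 * 2048      # 4 Mega-data-elements
NUM_CPUS = 8
PERIODS_PER_SEC = 10           # one 100 msec period is 1/10 of a second
ACCESSES_PER_WORD = 20         # per 100 msec period
MISS_PERCENT = 10              # data cache miss rate
BUS_HZ = 50_000_000            # one 64-bit transfer per clock at 50 MHz
NUM_BANKS = 2
BANK_CYCLE_NS = 20             # one 64-bit transfer per 20 ns per bank

# Traffic generated by the CPUs (bytes per second).
words_per_cpu = ARRAY_WORDS // NUM_CPUS
cpu_cache_bytes_per_sec = (
    words_per_cpu * ACCESSES_PER_WORD * PERIODS_PER_SEC * WORD_BYTES
)
miss_bytes_per_sec_per_cpu = cpu_cache_bytes_per_sec * MISS_PERCENT // 100
module_miss_bytes_per_sec = miss_bytes_per_sec_per_cpu * NUM_CPUS

# Capacity of the shared pipes (bytes per second).
bus_bytes_per_sec = BUS_HZ * WORD_BYTES
memory_bytes_per_sec = NUM_BANKS * (1_000_000_000 // BANK_CYCLE_NS) * WORD_BYTES

def mb_per_sec(bytes_per_sec):
    """Convert bytes/second to megabytes/second (1 MB = 10**6 bytes)."""
    return bytes_per_sec / 1_000_000

for label, value in [
    ("CPU <-> cache demand (per CPU)", cpu_cache_bytes_per_sec),
    ("cache miss traffic (per CPU)", miss_bytes_per_sec_per_cpu),
    ("cache miss traffic (all CPUs)", module_miss_bytes_per_sec),
    ("bus capacity", bus_bytes_per_sec),
    ("memory capacity (both banks)", memory_bytes_per_sec),
]:
    print(f"{label:32s} {mb_per_sec(value):10.1f} MB/s")
```

Comparing each demand figure with the capacity of the pipe that carries it is one way to approach parts (b) and (c). Changing MISS_PERCENT shows how the miss traffic scales.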
Problem 2:
This is an exploration of memory bandwidth versus latency. Assume that
you have a processor which takes 3 clock cycles to access cache.
- Calculate and plot curves as follows. Show a table with all
calculated values and an example calculation (for example, if you use
Excel, include the formula for a representative cell in the
spreadsheet):
- X axis is number of total clock cycles to access main memory.
Plot points for every 3 clock cycles on a linear scale up to 48
clocks. (i.e., 3, 6, 9, ... , 45, 48). Note that these
numbers include the time accessing cache, so a time of 3 means that
all of memory is cache memory.
- Y axis is program execution time in clock cycles. Assume that the
program has a total of 1 million accesses made to memory (most to
cache, but some miss in cache and end up referencing main memory).
- Plot five curves assuming for each curve a different percentage
of accesses miss in cache and go to main memory: 1%, 5%, 15%, 25%,
35%. For example, the 1% curve would assume that 99% of the 1
million accesses take 3 clock cycles (for cache), and 10,000
accesses (1% of 1 million) take the number of clock cycles for each
point plotted on the X axis.
- Example: for the 1% curve and 15 clock cycles, the total number
of clock cycles would be:
990000 * 3 + 10000 * 15 = 3120000
You are encouraged to compute these numbers and plot them with a
program such as MATLAB or a spreadsheet such as Excel (an example
spreadsheet for this problem is provided for your convenience).
- At about how many main memory access clocks (what X value) does the
program run half as fast as it does for a memory access time of 3 clocks (X
value of 3 -- which is equivalent to all memory being as fast as cache)?
Show how you compute all five answers. Check your work by looking at the
graph for those points that appear on it.
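As an alternative to a spreadsheet, the table of values can be generated with a few lines of Python. This is a minimal sketch that follows the example calculation given above (990000 * 3 + 10000 * 15). The function and variable names are illustrative only, and plotting is left out.

```python
# Sketch: execution-time table for Problem 2 (names are illustrative).

CACHE_CLOCKS = 3                 # clocks per cache access
TOTAL_ACCESSES = 1_000_000       # total memory accesses in the program
MISS_PERCENTS = [1, 5, 15, 25, 35]
MEMORY_CLOCKS = list(range(3, 49, 3))   # X axis: 3, 6, 9, ..., 45, 48

def execution_clocks(miss_percent, memory_clocks):
    """Total clocks when miss_percent of accesses go to main memory.

    memory_clocks is the total time for a main memory access and already
    includes the time spent accessing the cache, as the problem states.
    """
    misses = TOTAL_ACCESSES * miss_percent // 100
    hits = TOTAL_ACCESSES - misses
    return hits * CACHE_CLOCKS + misses * memory_clocks

# One row per X value, one column per miss-rate curve.
print("clocks " + " ".join(f"{m:>9d}%" for m in MISS_PERCENTS))
for x in MEMORY_CLOCKS:
    row = " ".join(f"{execution_clocks(m, x):>10d}" for m in MISS_PERCENTS)
    print(f"{x:>6d} {row}")
```

Each printed column corresponds to one of the five curves. The values can be pasted into a plotting tool or checked against a spreadsheet.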
Problem 3:
Let's say that you have a choice between spending money on cache
bandwidth or bus bandwidth. You must choose between the following two
design options:
Design Option 1:
  - Off-chip cache memory access takes 4 clocks (by providing 256
    pins for data)
  - Main memory access takes 24 clocks (by using a 32-bit data bus
    and cycling 4 times)

Design Option 2:
  - Off-chip cache memory access takes 6 clocks (by providing only
    128 pins for data and cycling twice for each transfer)
  - Main memory access takes 16 clocks (by using a 128-bit data bus)

- Assume a 4% cache miss rate (i.e., 96% of accesses are to the
off-chip cache memory, and 4% are to main memory). If you have to choose
more pins for the cache or more bits on the data bus, which of the above
two options will be faster and by how much in terms of clocks per
average access?
- Which case would be faster with a 20% cache miss rate and by how
much?
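The comparison between the two options reduces to a weighted average of the cache and main memory access times. The sketch below assumes, as in Problem 2, that an access that misses costs exactly the stated main memory clocks (not cache clocks plus memory clocks). The function name and the dictionary of options are illustrative only.

```python
# Sketch: average clocks per access for Problem 3 (names are illustrative).
# Assumption: a miss costs exactly the stated main memory clocks.

def average_access_clocks(miss_percent, cache_clocks, memory_clocks):
    """Weighted average access time in clocks for a given miss percentage."""
    hit_percent = 100 - miss_percent
    return (hit_percent * cache_clocks + miss_percent * memory_clocks) / 100

# (cache clocks, main memory clocks) for each design option in the problem.
OPTIONS = {
    "Design Option 1": (4, 24),
    "Design Option 2": (6, 16),
}

for miss_percent in (4, 20):
    print(f"Miss rate {miss_percent}%:")
    for name, (cache_clocks, memory_clocks) in OPTIONS.items():
        avg = average_access_clocks(miss_percent, cache_clocks, memory_clocks)
        print(f"  {name}: {avg:.2f} clocks per average access")
```

Subtracting the two averages at each miss rate gives the difference asked for in clocks per average access.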
18-548/15-548 home page.