18-548/15-548 Fall 1998

Homework 9: Disks & Vector Architecture

Due November 11, 1998

Problem 1: SCSI Bus Performance

The SCSI protocol consists of several phases for each data request and reply. The table below gives a breakdown of bus activity by phase (measured using a SCSI bus analyser) on a system with a number of fast disks transferring sequential blocks to a single host. Each individual request is for a 64K block of data, and in the multiple-disk cases requests are issued in round-robin order (disk 1, disk 2, disk 3, disk 1, disk 2, disk 3, disk 1, etc.).

phase	1 disk	2 disks	3 disks
ARBITRATE	1%	1%	1%
SELECT	1%	1%	5%
MESSAGE	3%	10%	12%
COMMAND	1%	1%	1%
DATA	27%	55%	79%
STATUS	1%	1%	1%
BUS FREE	66%	31%	1%

a) Given a maximum bus throughput of 20 MB/s at 64K requests, what is the data transfer rate into the host in each case (with 1 disk, with 2 disks, with 3 disks)?
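One possible way to set up the arithmetic for part (a) is sketched below. It assumes, and this is an interpretation rather than something the problem states, that the delivered data rate equals the fraction of bus time spent in the DATA phase multiplied by the 20 MB/s peak. The names `DATA_PERCENT`, `PEAK_MB_PER_S`, and `data_rate_mb_per_s` are made up for illustration.

```python
# DATA-phase percentages copied from the table above (percent of bus time).
DATA_PERCENT = {1: 27, 2: 55, 3: 79}
PEAK_MB_PER_S = 20.0  # stated maximum bus throughput at 64K requests


def data_rate_mb_per_s(data_percent, peak=PEAK_MB_PER_S):
    """ASSUMED model: delivered rate = (fraction of time in DATA) * peak rate."""
    return peak * data_percent / 100.0


rates = {disks: data_rate_mb_per_s(pct) for disks, pct in DATA_PERCENT.items()}
for disks, rate in rates.items():
    print(f"{disks} disk(s): {rate:.1f} MB/s")
```

Under a different reading of "maximum bus throughput" (for example, one that already includes protocol overhead) the scaling factor would change, so the model should be stated explicitly in any answer.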

b) What would the transfer rate be if we added a fourth disk (remember that a single SCSI bus can hold up to 7 devices)?

If we reduce the request size to 8K each instead of 64K, we achieve the following utilization:

phase	1 disk (8K)
ARBITRATE	1%
SELECT	15%
MESSAGE	14%
COMMAND	1%
DATA	27%
STATUS	1%
BUS FREE	41%
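To compare the 8K and 64K single-disk cases, it may help to collapse each column into three buckets: data, protocol overhead, and idle. The sketch below does this using only the percentages from the two tables; treating every phase other than DATA and BUS FREE as "overhead" is an assumption made for illustration, and `summarize` is a hypothetical helper name.

```python
# Percent of bus time per phase, 1-disk columns copied from the two tables above.
ONE_DISK_64K = {"ARBITRATE": 1, "SELECT": 1, "MESSAGE": 3, "COMMAND": 1,
                "DATA": 27, "STATUS": 1, "BUS FREE": 66}
ONE_DISK_8K = {"ARBITRATE": 1, "SELECT": 15, "MESSAGE": 14, "COMMAND": 1,
               "DATA": 27, "STATUS": 1, "BUS FREE": 41}


def summarize(phases):
    """Split bus time into data, protocol overhead, and idle percentages.

    ASSUMPTION: everything that is neither DATA nor BUS FREE counts as overhead.
    """
    assert sum(phases.values()) == 100, "phase percentages should total 100"
    data = phases["DATA"]
    idle = phases["BUS FREE"]
    overhead = 100 - data - idle
    return {"data": data, "overhead": overhead, "idle": idle}


summary_64k = summarize(ONE_DISK_64K)
summary_8k = summarize(ONE_DISK_8K)
print("64K:", summary_64k)
print("8K: ", summary_8k)
```

The idle bucket gives a rough sense of how much bus time is left for additional disks, while the overhead bucket shows how much of each extra disk's traffic would not be data.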

c) Assuming the same sequential workload, but with 8K requests, does it make sense for me to add a second disk to this system? A third disk?

d) What if I changed my workload to random requests (where prefetching would no longer work, and seek time becomes an issue)? Would I benefit from a second disk? A third disk?

Problem 2: Vector Architecture

A particular vector computer design has the characteristics & assumptions given below. Some of the assumptions, such as ignoring bank conflicts, are made to simplify the problem and are obviously not realistic.

3 memory pipes (two VAGs for loading, one VAG for storing)
Bus can carry one 8-byte word per clock tick (all numbers are in 8-byte words) and operates at 60 MHz
No cache is involved
There are no memory bank conflicts; ignore cycle time of DRAM after data is written or read
The VRF holds at least 3 vectors of length 8
VDS can transfer two words concurrently on each clock cycle from any point to any other point (except the bus, which can only handle one word at a time)
There are appropriate buffers at the bus interface so data waits until a bus is free. Reads are given priority over writes (so writes wait if there is any read pending)
Vector instructions are dispatched by a scalar processor at one clock per instruction.
Vector chaining as described in class is supported and should be used in your solutions; data is moved in more-or-less "fair" fashion, in which the VAGs attempt to keep at about the same vector element number, but don't let resources go idle in order to do this.
All resources in the system are fully pipelined at one pipeline stage per clock tick.
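The bus figures in the list above (one 8-byte word per tick at 60 MHz) can be turned into a tick duration and a peak bandwidth, which is handy for converting tick counts into wall-clock time. This is a small sketch; the constant and function names are illustrative, not part of the assignment.

```python
BUS_CLOCK_HZ = 60e6   # bus clock from the list above
WORD_BYTES = 8        # one 8-byte word per clock tick

tick_ns = 1e9 / BUS_CLOCK_HZ                  # duration of one clock tick in ns
peak_bytes_per_s = WORD_BYTES * BUS_CLOCK_HZ  # one word moved every tick


def ticks_to_ns(ticks):
    """Convert a count of clock ticks into nanoseconds."""
    return ticks * tick_ns


print(f"tick = {tick_ns:.2f} ns, peak = {peak_bytes_per_s / 1e6:.0f} MB/s")
```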

Operation	Latency (clock ticks)
Vector instruction dispatch	1
VAG setup	1
Address reaches memory bank	3
DRAM read latency (ignore time to complete cycle)	4
Data returns from memory bank after access via bus	3
VDS delay	1
Adder delay (starting when both operands available)	4
VDS delay	1
Result sent to memory bank via bus (address & data)	3
Data written into DRAM (ignore time to complete cycle)	4

A 4-element vector addition takes 4 clock cycles to issue ("vload", "vload", "vadd", "vstore"). What is the elapsed time for a 4-element vector addition in clock ticks? Provide a spreadsheet printout or other table diagram illustrating how you got this solution (similar to the spreadsheets in lecture 16, but using columns and latencies appropriate to the table above).
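As a starting point for the table, the sketch below walks a single element through every row of the latency table once, in order, and reports the tick at which each stage completes. This is only a sanity check for one column of a spreadsheet: it assumes the element meets no contention, so it ignores the staggered dispatch of the four instructions, the second load sharing the one-word-per-tick bus, and the remaining elements of the vector. The variable names are illustrative.

```python
from itertools import accumulate

# (stage, latency in clock ticks), copied in order from the latency table above.
STAGES = [
    ("Vector instruction dispatch", 1),
    ("VAG setup", 1),
    ("Address reaches memory bank", 3),
    ("DRAM read latency", 4),
    ("Data returns from memory bank via bus", 3),
    ("VDS delay (to adder)", 1),
    ("Adder delay", 4),
    ("VDS delay (from adder)", 1),
    ("Result sent to memory bank via bus", 3),
    ("Data written into DRAM", 4),
]

# Tick at which each stage finishes for one element, with NO contention (assumed).
finish_ticks = list(accumulate(latency for _, latency in STAGES))

for (name, latency), done in zip(STAGES, finish_ticks):
    print(f"{name:40s} +{latency}  done at tick {done}")

single_element_ticks = finish_ticks[-1]
```

A full solution still needs one row per clock tick and one column per resource, so that bus conflicts and chaining between the loads, the add, and the store show up explicitly.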

