Due Wednesday September 23, 1998
Problem 1:
Consider an L1 uniprocessor-architecture cache with the following characteristics:
1a) How many total bits are within each cache block? Support this answer with a sketch showing all the bits in one block (not only data bits, but all overhead bits as well).
1b) How many total bits are within each cache sector? Support this answer with a sketch showing all the bits in on sector (not only data bits, but all overhead bits as well -- you can "point" to the sketch of a block in (1a) instead of having to draw each block out in complete detail).
1c) How many total bits are required to implement this cache in order to get the benefit of 8 KB of cached memory locations?
Problem 2:
Consider a cache memory with the following characteristics:
2a) Assume the cache starts out completely invalidated. For each access indicate whether it is a hit, a compulsory miss, a capacity miss, or a conflict miss. Assume cache starts out completely invalidated. State in ten or fewer words why you selected each category.
read 0x0010 read 0x0014 read 0x001C read 0x0110 read 0x001C
2b) Assume the cache starts out completely invalidated. For each access below state whether it is a "hit" or a "miss", and indicate the number of words of memory traffic generated by the access.
write 0x0024 write 0x0020 write 0x0124 read 0x0024
Problem 3:
Note: this is representative of a "hard" test question you might see on Test #1 or Test #2. Therefore it is not as obvious/straightforward as previous homework questions. You are encouraged to discuss this question with other students and the course staff to ensure that you thoroughly understand what is going on here.
You have instrumented only data references from the subroutine "sum_array" in the following program on an Alpha workstation ("long" values are 64 bits). The resultant data reads and writes have counted with a particular cache configuration. In this question you'll deduce the cache configuration used.
#include <stdio.h> #include <stdlib.h> void sum_array(int N, long *a, long *b, long *c, long *d, long *e) { int i; for (i = 0; i < N; i++) { *(a++) = *(b++) + *(c++) + *(d++) + *(e++); } } int main(int argc, char *argv[]) { int N, offset, i; long *a, *b, *c, *d, *e; /* 64-bit elements in each array */ if (argc != 3) { fprintf(stderr, "\nUsage: test <size> <offset>\n"); exit(-1); } sscanf(argv[1], "%d", &N); sscanf(argv[2], "%d", &offset); a = (long *) malloc(5 * ( (N*sizeof(long)) + offset) ); b = a + N; c = b + N; d = c + N; e = d + N; sum_array(N, a, b, c, d, e); }
The program was executed with a command line having successively higher values of N from 1 to 2100. The below graph shows the number of combined data misses for each value of N. · Bus size and word size are both 8 bytes.
Some data that might be of interest from the above chart ar:
N # Misses Total Traffic (words) 500 626 2516 501 628 2533 502 629 2534 503 630 2535 504 630 2520 505 632 2537 506 633 2538 507 696 3531 508 1020 8700 509 1277 12797 510 1597 17902 511 1853 21983 512 2079 25584 513 1862 22097 514 1606 17986 515 1288 12883 516 1032 8772 517 717 3717 518 651 2646 519 653 2663 520 652 2632 521 655 2665 522 655 2650 523 657 2667 524 658 2668
Answer the following questions, giving brief support for your answer of (unsupported answers will not receive full credit).
3a) Ignoring overhead for the subroutine call, what is the theoretical minimum possible data total traffic (in 8-byte words) that this program has to move (combined into and out of the cache) for N = 500?
3b) Does the cache perform write allocation?
3c) What is the actual associativity of the cache that produced the data given? (and how did you figure that out?)
3d) How many bytes does the cache hold (data only, not counting control+tag bits)?
18-548/15-548 home page.