Teaching Cache Coherence Protocols
with Model Checking
Brian T. Gold
bgold at cmu.edu
January, 2009
About this Lab
Designing your own cache coherence protocol is a great way to learn about
this challenging topic in computer architecture. This lab has you design
and verify your protocol using
Murphi, a model checking tool from David
Dill's group at Stanford. The material on this page is meant to
supplement lecture notes explaining the basics of cache coherence.
History and Usage
This lab assignment grew out of the Multiprocessor Computer Architecture
course taught at Carnegie Mellon in the ECE department (course: 18-742). I
believe Prof. Andreas Nowatzyk wrote the original version of this when he
taught the course in Spring 2004; I was the TA that semester. Originally, we asked
students to implement a baseline 3-hop protocol along with an optimization
(e.g., speculative reads) of their choosing. We gave students the
documentation for Murphi but didn't give them any other starting point for
writing their code. The assignment typically ran two weeks.
I found that many students were stumbling on the model checker usage,
especially given that we need to use some of Murphi's more advanced features
(e.g., symmetry reduction) to get tractable state spaces. In the version on
this page, I'm giving students a minimal framework that handles the
messaging aspects of the lab. All they have to do is implement the
actual coherence protocol.
We've used this assignment in courses with great success, but I've also
given it to new graduate students to help them learn the coherence protocols
we use in our research. Eric
Chung took this quite far: he learned the
protocols using this method and within a semester had implemented an
industrial-strength coherence protocol and controller in an FPGA. This led
to a WARFP workshop
paper.
Why Model Checking?
The real question is, "What is so hard about cache coherence protocols?"
Answer: The corner cases. It is relatively easy to think about the
common-case operations, but thinking through
all possible races is
very tough, especially for aggressive protocols that attempt to minimize the
number of messages (hops) per transaction.
Model checking aids protocol designers by forcing them to handle all
possible races. If you forget to handle a race or handle it incorrectly,
the checker will show you a counter-example from which you can work to fix
your design. The complexity of coherence-protocol races is hard to convey in
lecture notes; instead, we've found that model checking is a good way
to get students designing their own protocols and learning about common
mistakes along the way.
Comments, complaints?
Please send me email (bgold at cmu.edu). Thanks!
Cache Coherence Protocol Lab
Overview
In this assignment, you will design and verify a cache coherence protocol
for a multiprocessor system. To help get you started, we'll give you a
basic framework for implementing your protocol, but you will have to
work out the interesting parts yourself. You will be working in
the Murphi model-checking framework, which will allow you to specify
your protocol and verify its correctness. We'll provide some notes on
Murphi for this assignment and give you pointers to the complete
documentation.
Protocol Design
You will design and verify an invalidation-based cache coherence protocol.
The protocol you develop will have a number of characteristics:
- It uses an interconnect network that supports only point-to-point
communication. All communication is done by sending and receiving messages.
The interconnect network may reorder messages arbitrarily. It may delay
messages, but it will always deliver them eventually. Messages are never
lost, corrupted, or replicated. Messages cannot be assumed to arrive in the
order they were sent, even for the same sender and receiver pair. (One way to
model such a network in Murphi is sketched just after this list.)
- At the receiving side of the interconnect system, messages are delivered
to a receive port. Once a message has been delivered to the receive port, it
will block all subsequent messages to this port until the message is read.
Consider this behavior equivalent to that of a mail-box with room for only
one letter: you have to remove the letter from the mailbox before you can
receive the next one. On the sending side, there is no such restriction: you
can always send messages. The interconnect system has enough buffer space
to queue messages.
- For the purpose of this assignment, you may assume that there is no
limit on the buffer space in the interconnect system. However, your protocol
will be considered broken if there is a way to generate an infinite number
of undelivered messages. In any case, you would not be able to verify such a
protocol, because its set of reachable states would grow without bound.
- You may assume that the interconnect network supports multiple lanes
(i.e., virtual channels). For each lane, you have a separate set of send-
and receive-ports for each unit. Traffic on one lane is independent of
traffic on the other lanes. Messages will never switch lanes. Note that
using fewer lanes is better.
- Each processor has a dedicated cache that is not shared with any other
processor. All caches must be kept coherent by your cache coherence
protocol. Processors may issue load and store operations only. Because this
assignment only deals with cache coherence and not with consistency issues,
you will be concerned with only one storage location (address). However, you
need to model cache conflicts. To do this, you need to model a third
operation besides load and store: a cache write-back. Write-backs normally
arise from a cache conflict if the old line is dirty. Write-back operations
may occur at any time between any pair of load/store operations. If the
cache is in a clean state, you may simply set it to be invalid or take the
appropriate action according to your coherence protocol. Cache replacements
of dirty lines must obviously write the line back to memory.
- You should assume that the coherence unit is equal to one word and that
all loads and stores read or write the entire word.
- Besides processors with their caches, there is one memory unit in your
system. The memory unit has a directory-based cache-coherence controller
which ensures that only one processor can write to the memory block at a time
(exclusive-ownership style protocol). The directory representation is
unimportant for this assignment. You should assume that you have a full
directory (bit vector) that can keep track of all sharers.
- The interconnect system can send messages from any unit to any other
unit. It is OK if your protocol requires a cache controller to send
a message directly to another cache controller.
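
To make the network model concrete, the sketch below shows one way such a
reordering interconnect with per-lane, one-slot receive ports might be modeled
in Murphi. It is only an illustration: the names (Net, Port, SendMsg), the tiny
constants, and the unconstrained "send anything" rule are my assumptions for a
standalone example, not the framework distributed with this lab.

-- Illustrative sketch only; the names (Net, Port, SendMsg), the tiny
-- constants, and the unconstrained "send" rule are assumptions for a
-- standalone example, not the framework distributed with this lab.

const
  ProcCount: 3;   -- caching nodes (a real model also has the home/memory node)
  LaneCount: 2;   -- virtual channels ("lanes")
  NetDepth:  1;   -- bound on in-flight messages per destination and lane,
                  -- kept artificially small for this standalone sketch

type
  Node:    scalarset(ProcCount);    -- symmetric IDs enable symmetry reduction
  Lane:    0 .. LaneCount - 1;
  MsgType: enum { Request, Reply }; -- placeholders; real messages also carry
                                    -- the sender, the requester, data, etc.
  Message: record
    mtype: MsgType;
  end;

var
  -- Undelivered messages per destination and lane, kept as an unordered pool:
  -- any of them may be delivered next, modeling arbitrary delay and reordering.
  Net: array [Node] of array [Lane] of record
    count: 0 .. NetDepth;
    buf:   array [0 .. NetDepth - 1] of Message;
  end;

  -- One-slot receive port per node and lane: the mailbox with room for one letter.
  Port: array [Node] of array [Lane] of record
    full: boolean;
    msg:  Message;
  end;

procedure SendMsg(dst: Node; ln: Lane; t: MsgType);
begin
  -- A protocol that can exceed any finite bound here is considered broken.
  assert Net[dst][ln].count < NetDepth "too many undelivered messages";
  Net[dst][ln].buf[Net[dst][ln].count].mtype := t;
  Net[dst][ln].count := Net[dst][ln].count + 1;
end;

-- Stand-in for protocol activity: anything may be sent at any time. Your
-- protocol rules and message handlers would call SendMsg instead.
ruleset d: Node; ln: Lane; t: MsgType do
  rule "send a message"
    Net[d][ln].count < NetDepth
  ==>
  begin
    SendMsg(d, ln, t);
  end;
endruleset;

-- Deliver any undelivered message, in any order, but only when the
-- destination's port on that lane is free.
ruleset d: Node; ln: Lane; i: 0 .. NetDepth - 1 do
  rule "deliver a message"
    i < Net[d][ln].count & !Port[d][ln].full
  ==>
  begin
    Port[d][ln].msg  := Net[d][ln].buf[i];
    Port[d][ln].full := true;
    -- remove slot i by moving the last undelivered message into it
    Net[d][ln].buf[i] := Net[d][ln].buf[Net[d][ln].count - 1];
    undefine Net[d][ln].buf[Net[d][ln].count - 1];
    Net[d][ln].count := Net[d][ln].count - 1;
  end;
endruleset;

-- Reading the port frees it for the next delivery; protocol-specific handling
-- of Port[d][ln].msg would go here.
ruleset d: Node; ln: Lane do
  rule "read a delivered message"
    Port[d][ln].full
  ==>
  begin
    Port[d][ln].full := false;
    undefine Port[d][ln].msg;
  end;
endruleset;

startstate
begin
  for d: Node do
    for ln: Lane do
      Net[d][ln].count := 0;
      Port[d][ln].full := false;
    endfor;
  endfor;
end;

Because delivery picks an arbitrary undelivered message, the verifier explores
every possible ordering; the one-slot Port is what forces a protocol to cope
with messages that are blocked behind an unread one.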
For this assignment, your cache coherence protocol should not worry about
consistency issues. Because of that, you may assume that the memory of this
machine has only one word. Your protocol has to make sure that loads from
up to three (3) processors always return the value of the most recent
store. In this context, this means that loads and stores issued by one
processor are seen by that processor in program order.
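
One common way to state this requirement in Murphi is to track, in an
auxiliary variable, the value written by the most recent store anywhere in the
system, and to assert on every load that the cache supplies exactly that
value. The sketch below shows only that bookkeeping pattern: the names are my
own, and the real protocol transitions are replaced by two crude stand-ins, so
treat it as a pattern to borrow rather than part of the lab framework.

-- Illustrative sketch of the "most recent store" bookkeeping only; all
-- names are assumptions, and the real protocol is replaced by crude
-- stand-in transitions.

const
  ProcCount:  3;
  ValueCount: 2;   -- two distinct data values suffice to expose stale copies

type
  Proc:       scalarset(ProcCount);
  Value:      1 .. ValueCount;
  CacheState: enum { I, S, M };

var
  LastWrite: Value;   -- auxiliary: value written by the most recent store
  Mem:       Value;   -- the single memory word
  Caches: array [Proc] of record
    state: CacheState;
    val:   Value;
  end;

startstate
begin
  LastWrite := 1;
  Mem       := 1;
  for p: Proc do
    Caches[p].state := I;
    undefine Caches[p].val;
  endfor;
end;

-- Stand-ins for the protocol: a processor obtains M only when every cache is
-- invalid, and a dirty line is written back to memory on replacement. In
-- your protocol these transitions are the result of message exchanges.
ruleset p: Proc do
  rule "obtain M (protocol abstracted away)"
    forall q: Proc do Caches[q].state = I endforall
  ==>
  begin
    Caches[p].state := M;
    Caches[p].val   := Mem;
  end;

  rule "write back and replace a dirty line"
    Caches[p].state = M
  ==>
  begin
    Mem := Caches[p].val;
    Caches[p].state := I;
    undefine Caches[p].val;
  end;

  -- A load is legal in S or M; it must observe the most recent store.
  rule "load hit"
    Caches[p].state = S | Caches[p].state = M
  ==>
  begin
    assert Caches[p].val = LastWrite "load returned a stale value";
  end;
endruleset;

-- A store is legal only in M; it updates the cached copy and the auxiliary
-- variable together.
ruleset p: Proc; v: Value do
  rule "store hit"
    Caches[p].state = M
  ==>
  begin
    Caches[p].val := v;
    LastWrite     := v;
  end;
endruleset;

With two distinct data values, any protocol bug that lets a processor read a
stale copy shows up as a failure of this assertion (or of a corresponding
invariant), and the verifier prints the sequence of events that caused it.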
You are supposed to write a plain, directory-based cache-coherence baseline
protocol without any optimizations other than forwarding of invalidations
and exclusive ownership. In other words, your baseline protocol should use
no more than 3 hops for any transaction. For example, if processor 3 (P3)
has exclusive ownership and processor 1 (P1) issues a load, then P3 is supposed
to send the cache line both to the memory and to P1 (forwarding). This
minimizes latency.
The baseline protocol should always deliver data in the state needed by the
requesting processor. In other words, do not bother with speculating on
supplying data in an exclusive state for a normal load; exclusivity is always
a consequence of a store. Therefore, you need only 3 cache states:
I = invalid, S = shared (read-only), and M = modified (exclusive and dirty).
The memory unit could be regarded as a home-node without a processor, so it
will never do anything on its own. For example, it will never issue an
unsolicited recall-request.
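
To make the baseline concrete, here is a rough Murphi sketch of the three
stable cache states, one possible message vocabulary for 3-hop transactions, a
full-bit-vector directory, and the single-writer invariant the verifier should
check. Everything here (the names, the exact message set, and the stand-in
rules that skip the actual message exchange) is my assumption for illustration,
not the lab framework or a complete protocol.

-- Illustrative sketch only; names, the message set, and the stand-in rules
-- are assumptions, not the lab framework or a complete protocol.

const
  ProcCount: 3;

type
  Proc: scalarset(ProcCount);   -- symmetric processor IDs

  -- The three stable cache states of the baseline protocol.
  CacheState: enum { I, S, M };

  -- One possible message vocabulary for 3-hop transactions (unused in this
  -- sketch; your handlers would send and receive these). For the read
  -- example above: P1 sends GetS to the home node (hop 1), the home node
  -- forwards it to the owner P3 as FwdGetS (hop 2), and P3 sends the line
  -- to P1 and back to memory (hop 3).
  MsgType: enum { GetS, GetM, FwdGetS, FwdGetM, Inv, InvAck, Data, PutM, PutAck };

var
  Caches: array [Proc] of CacheState;

  -- Directory at the home node: a full bit vector of sharers plus an owner.
  Dir: record
    sharers:    array [Proc] of boolean;
    owner:      Proc;           -- meaningful only when ownerValid is true
    ownerValid: boolean;
  end;

startstate
begin
  Dir.ownerValid := false;
  for p: Proc do
    Caches[p] := I;
    Dir.sharers[p] := false;
  endfor;
end;

-- Stand-in rules that skip the message exchange entirely; in a real protocol
-- each of these transitions is the outcome of a sequence of messages.
ruleset p: Proc do
  rule "read miss: obtain S while there is no owner"
    Caches[p] = I & !Dir.ownerValid
  ==>
  begin
    Caches[p] := S;
    Dir.sharers[p] := true;
  end;

  rule "write miss or upgrade: obtain M when no other cache holds the line"
    forall q: Proc do q = p | Caches[q] = I endforall
  ==>
  begin
    Caches[p] := M;
    Dir.owner := p;
    Dir.ownerValid := true;
    Dir.sharers[p] := false;
  end;

  rule "replacement of a clean line"
    Caches[p] = S
  ==>
  begin
    -- The directory may still list p as a sharer afterwards; a real protocol
    -- must either tolerate or clean up such stale sharer bits.
    Caches[p] := I;
  end;

  rule "replacement of a dirty line (write-back)"
    Caches[p] = M
  ==>
  begin
    Caches[p] := I;
    Dir.ownerValid := false;
  end;
endruleset;

-- Coherence safety: at most one writer, and never a writer alongside readers.
invariant "single writer or multiple readers"
  forall p: Proc do
    forall q: Proc do
      (p != q & Caches[p] = M) -> Caches[q] = I
    endforall
  endforall;

In a real model the stand-in rules become handlers for the messages above, and
the interesting work is getting every race between those messages right.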
Getting Started
To complete this lab, you will need to download Murphi 3.1. Instructions
for building Murphi can be found in the tar-ball, along with a complete
language and user manual.
In addition to Murphi itself, we are also distributing a minimal framework
that handles the messaging aspects of this assignment. Comments in the code
explain what's there and where to insert your protocol code.
Deliverables and Deadlines
(redacted from web version... usually 2 weeks, turn in all code and a short writeup)
Sample Solutions
No peeking! But just in case, here are sample protocols.
Basic 3-hop protocol
The first example is a basic 3-hop protocol designed according to the
description above.
Adding Home Node optimizations
The second example adds the following: the home node (directory) can now
issue read and write requests, but as an optimization, its state is not kept
in the directory. This allows the home node to write to the address without
updating the coherence state, so long as the state is Invalid. For private
data this can be a big improvement in latency, as we don't have to wait for
the coherence controller just to access local data.
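
As a rough illustration of that idea (and only an illustration: the names and
structure below are my assumptions and do not match the distributed sample
code), the optimization boils down to a rule of the following shape, which
lets the home node touch the word without any coherence messages whenever the
directory shows no cached copies:

-- Illustrative sketch only; names and structure are assumptions and do not
-- match the distributed sample solution.

const
  ValueCount: 2;

type
  Value:    1 .. ValueCount;
  DirState: enum { DirInvalid, DirShared, DirModified };  -- directory's view

var
  Mem:   Value;    -- the single memory word, held at the home node
  DirSt: DirState;

startstate
begin
  Mem   := 1;
  DirSt := DirInvalid;
end;

-- The home-node optimization: while the directory shows that no cache holds
-- the line, the home node may read or write the word locally without sending
-- any coherence messages or changing any coherence state.
ruleset v: Value do
  rule "home node writes locally while the line is uncached"
    DirSt = DirInvalid
  ==>
  begin
    Mem := v;
  end;
endruleset;

The interesting verification question is then whether every path that leaves
the directory state Invalid really guarantees that no cache still holds the
line.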
The code for this second example models the protocol used in Flexus, the
simulator used in our research group. The protocol itself was originally
designed by Nikos Hardavellas, with input from Andreas Nowatzyk and others.
I took Nikos' design and modeled it in Murphi using a framework similar to
the one in this lab. The protocol diagrams PDF illustrates
just how complex the transitions get in these protocols.
last updated: January 2009