712 Project Ideas (Fall 2000)
This page lists potential project ideas. Feel free to use one,
refine one into your own idea, or propose something completely
different.
- Resource logging: Systems like Abacus could potentially benefit
from historical resource availability and usage databases. For example,
application profiles (resources used over time) could allow Abacus to
(a) proactively relocate objects if there's high confidence that the
application will change behavior and (b) avoid migrating objects
belonging to applications that will terminate shortly. Monitoring
resource availability is similarly useful. Construct and evaluate a
distributed database of significant system attributes. Part of such a
project is determining what data is useful to monitor and determining
how to monitor it.
- Study agility vs. stability. Systems like Odyssey and Abacus try to
adapt quickly to environmental changes, because reacting slowly could
squander some of the benefit of adaptation. However, adapting to
transient conditions can lead to instability (frequent adaptations).
Create a model and simulator to study the conditions under which
instability occurs and ways in which one can adaptively determine how
agile to be while maintaining a certain degree of stability. For
example, if network latencies are highly variable, perhaps one needs to
be less agile.
(Done well, such a project would draw on an understanding of control systems.)
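As a starting point for the agility vs. stability idea above, here is a minimal simulator sketch. It is only an illustration under stated assumptions: the synthetic latency trace, the exponentially weighted moving average (EWMA) estimator, the threshold policy, and all names (`count_adaptations`, `spiky_latency`, `alpha`) are hypothetical and are not taken from Odyssey or Abacus.

```python
def count_adaptations(samples, alpha, threshold):
    """Count mode switches of a threshold policy driven by an EWMA estimate.

    alpha close to 1.0 means very agile (trust the newest sample);
    alpha close to 0.0 means very stable (trust history).
    """
    estimate = samples[0]
    degraded = estimate > threshold
    switches = 0
    for sample in samples[1:]:
        estimate = (1 - alpha) * estimate + alpha * sample
        now_degraded = estimate > threshold
        if now_degraded != degraded:
            switches += 1          # the system adapts (e.g., migrates an object)
            degraded = now_degraded
    return switches


def spiky_latency(n, base=10.0, spike=100.0, period=5):
    """Synthetic trace (an assumption): steady latency with a transient spike
    on every `period`-th sample."""
    return [spike if i % period == period - 1 else base for i in range(n)]


samples = spiky_latency(100)
agile = count_adaptations(samples, alpha=1.0, threshold=50.0)
stable = count_adaptations(samples, alpha=0.1, threshold=50.0)
print("adaptations with alpha=1.0:", agile)
print("adaptations with alpha=0.1:", stable)
```

On this trace the fully agile policy reacts to every transient spike, while the heavily smoothed one never adapts. A real project would replace the synthetic trace with measured data and also account for the benefit lost by reacting late.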
- Learning of disk mappings. The firmware of disk drives uses fairly
regular schemes for mapping logical block numbers (from 0 to N-1) to
physical disk locations, but almost every disk uses a slightly
different scheme. DIXtrac includes a simple expert system to determine
the mapping schemes for modern SCSI disks, but it still frequently
encounters schemes its authors have never seen (so the needed expertise
is not yet built in). Extend DIXtrac with a simple
learning system to discover minimized
representations/algorithms for expressing a particular disk's mapping,
and evaluate it on real disks.
- Study centralized vs. decentralized resource allocation.
Intuitively, centralized resource allocators can make better use of
cluster resources because they are aware of all the workloads and all
the resources. However, centralization introduces a bottleneck and a
central point of failure. Validate these common beliefs. Explore
approaches to decentralizing resource management while still
finding stable (if not globally optimal) resource allocation points.
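One way to start validating the centralized vs. decentralized intuition above is a toy load-placement experiment. The sketch below is an assumption-laden illustration, not a prescribed method: it compares a central allocator with a global view against a decentralized policy that probes two random machines per job (the "power of two choices" heuristic, chosen here only as an example). The function names and the unit-sized jobs are hypothetical.

```python
import random


def centralized(num_jobs, num_machines):
    """Global view: always place the next job on the least-loaded machine."""
    load = [0] * num_machines
    for _ in range(num_jobs):
        target = min(range(num_machines), key=load.__getitem__)
        load[target] += 1
    return max(load)


def two_choices(num_jobs, num_machines, seed=0):
    """Local view: probe two distinct random machines, pick the less loaded."""
    rng = random.Random(seed)
    load = [0] * num_machines
    for _ in range(num_jobs):
        a, b = rng.sample(range(num_machines), 2)
        target = a if load[a] <= load[b] else b
        load[target] += 1
    return max(load)


central_max = centralized(1000, 50)
local_max = two_choices(1000, 50)
print("max load, centralized:", central_max)
print("max load, two random probes:", local_max)
```

The central allocator balances unit jobs perfectly, so its maximum load is a lower bound for any other policy. The interesting questions for a project are how close decentralized schemes get and what the central allocator costs in messages, latency, and failure exposure, none of which this sketch models.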
- Collective I/O: Parallel file system designs sometimes add a collective
synchronization operation to give the file system maximal opportunity
for reordering. Construct such a system, perhaps using a modified NFS
service, and measure the benefit of these optimizations. Implement support
for tolerating multiple failures.
- File system clustering: Clustering or coalescing multiple disk reads into
one larger request can be a big performance win. C-FFS clustered the objects
of a directory into a contiguous unit for prefetching. Implement and test.
NFS loses a lot of clustering potential through excessive serialization
through the cache. Propose corrections and evaluate.
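To make the coalescing idea in the file system clustering item concrete, here is a minimal sketch that merges individual block reads into contiguous runs. The function name `coalesce` and the (start, length) request format are assumptions for illustration; they do not describe C-FFS or any NFS implementation.

```python
def coalesce(blocks):
    """Merge block numbers into sorted (start_block, length) runs.

    Duplicate requests are dropped; adjacent blocks are combined so that
    each run could be issued as a single larger disk read.
    """
    runs = []
    for block in sorted(set(blocks)):
        if runs and block == runs[-1][0] + runs[-1][1]:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)   # extend the current run
        else:
            runs.append((block, 1))          # start a new run
    return runs


print(coalesce([7, 3, 4, 5, 9, 8]))  # [(3, 3), (7, 3)]
```

Six separate requests become two. An evaluation would also have to decide how long to wait for neighbors to arrive before issuing a run, which this sketch ignores.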
- Process migration requires identical processors (or a transformation
system), identical operating system support (e.g., for open files and
sockets), redirected input and output, and distributed shared memory. Build
a simple implementation on a system like Linux and evaluate it in contrast
to alternative load balancing tools, such as Condor.
- Distributed transactions for NFS: Quicksilver (discussed later in the term)
tried to provide a fully
general system for transactions in the OS, but its principal clients were
storage based. Perhaps a much simpler implementation, restricted to the
file system, will provide the right tradeoff between utility and
simplicity. Construct an NFS variant that provides abortable
transactions to its clients and evaluate its overhead for non-aborting
clients and aborting clients.
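The abortable-transaction interface in the NFS item might look roughly like the sketch below. It is a single-process, in-memory illustration under strong assumptions: there is no concurrency control, no durability, and no distribution, and the names `TxnStore` and `Txn` are hypothetical, not part of NFS or Quicksilver. Writes are buffered privately and only become visible on commit.

```python
class Txn:
    """One transaction: buffers writes until commit or abort."""

    def __init__(self, store):
        self.store = store
        self.writes = {}     # path -> bytes, private to this transaction
        self.done = False

    def _check(self):
        if self.done:
            raise RuntimeError("transaction already finished")

    def read(self, path):
        self._check()
        # A transaction sees its own uncommitted writes first.
        if path in self.writes:
            return self.writes[path]
        return self.store.files.get(path)

    def write(self, path, data):
        self._check()
        self.writes[path] = data

    def commit(self):
        self._check()
        self.store.files.update(self.writes)   # publish all writes together
        self.done = True

    def abort(self):
        self._check()
        self.writes.clear()                    # discard; committed state untouched
        self.done = True


class TxnStore:
    """Committed file contents, keyed by path (stand-in for a file server)."""

    def __init__(self):
        self.files = {}

    def begin(self):
        return Txn(self)


store = TxnStore()
t1 = store.begin()
t1.write("/a", b"v1")
t1.commit()
t2 = store.begin()
t2.write("/a", b"v2")
t2.abort()
print(store.files["/a"])  # b'v1'
```

Buffering makes abort cheap and keeps non-aborting clients on a simple path, which is one of the overhead tradeoffs the project would measure.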
- Serializability service in a multi-threaded RVM:
CMU's RVM tool (part of the Coda project)
provides only atomicity and durability under a software crash fault
model. It does not provide serializability and argues against sharing
the service across different address spaces (processes). However,
multi-threaded applications are increasingly common, especially in
servers, so perhaps RVM should offer serializability. Add
serializability to RVM and evaluate its overhead with a multi-threaded
application (file server, mail server, web server, etc.).
- Data race warnings are useful in local systems. Consider debugging tools for
distributed systems. One useful tool would be a "replay" system that reruns
a "trace" of a distributed system with identical asynchronous events. Such a
system needs to control when signals reach applications and control the outcome
of synchronization. Build such a tool for collections of Linux (or Java)
systems and evaluate the slowdown during recording and playback.
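The core record/replay idea in the last item can be sketched in a few lines. This is a toy model under stated assumptions: the "distributed system" is a single loop that delivers pending messages, the only source of nondeterminism is delivery order, and the names `run`, `record`, and `replay` are hypothetical. A real tool would also have to capture signals, timing, and synchronization outcomes.

```python
import random


def run(messages, choose):
    """Deliver pending messages in the order picked by `choose`.

    Returns the final state and the list of choices made (the trace).
    """
    pending = list(messages)
    state = 0
    trace = []
    while pending:
        index = choose(len(pending))
        op, value = pending.pop(index)
        state = state + value if op == "add" else state * value
        trace.append(index)
    return state, trace


def record(messages, seed):
    """Recording run: delivery order is nondeterministic (random here)."""
    rng = random.Random(seed)
    return run(messages, lambda n: rng.randrange(n))


def replay(messages, trace):
    """Replay run: force the delivery order captured in the trace."""
    choices = iter(trace)
    return run(messages, lambda n: next(choices))


# The result depends on delivery order, so replay must reproduce it exactly.
messages = [("add", 1), ("mul", 3), ("add", 2), ("mul", 5)]
recorded_state, trace = record(messages, seed=7)
replayed_state, _ = replay(messages, trace)
print(recorded_state == replayed_state)  # True
```

Logging only the scheduling decisions, rather than full message contents, keeps the trace small; the cost of intercepting those decisions is the recording slowdown the project would evaluate.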