PREVIOUS PROJECTS
Perspective
Perspective is a storage system designed for the home. It offers the
decentralization and flexibility sought by home users, along with a new
semantic filesystem construct, the view, that simplifies management.
A view pairs a semantic description of a set of files, specified as a
query on file attributes, with the ID of the device on which those files are stored.
By examining and modifying the views associated with a device, a user
can identify and control the files stored on it.
This approach lets users reason about what is stored where using the same
scheme (semantic naming) they use to navigate their digital content.
Thus, in serving as their own administrators, users do not have to deal
with a second data organization scheme (hierarchical naming) to
perform replica management tasks, such as specifying redundancy to increase
reliability and partitioning data across devices when one runs out of capacity.
Experiences with Perspective deployments and user studies confirm the
efficacy of view-based data management. I am continuing to explore how
other aspects of data management, such as file location and security,
can be addressed in a user-comprehensible, semantic fashion.
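To make the view construct concrete, here is a minimal Python sketch of how views determine what each device stores. It assumes, purely for illustration, that a query is a conjunction of attribute-equals-value tests; the names `View`, `files_on_device`, and `replica_devices`, and the example files, are hypothetical and not Perspective's actual interfaces.

```python
from dataclasses import dataclass


@dataclass
class View:
    """Hypothetical view: a query on file attributes plus a device ID."""
    query: dict      # attribute -> required value (assumed: AND of equality tests)
    device_id: str

    def matches(self, attrs: dict) -> bool:
        # A file matches when every queried attribute has the required value.
        return all(attrs.get(key) == value for key, value in self.query.items())


def files_on_device(views, files, device_id):
    """Return the sorted names of files that some view on device_id covers."""
    device_views = [v for v in views if v.device_id == device_id]
    return sorted(name for name, attrs in files.items()
                  if any(v.matches(attrs) for v in device_views))


def replica_devices(views, attrs):
    """Return the set of devices whose views cover a file, i.e. where copies live."""
    return {v.device_id for v in views if v.matches(attrs)}


# Example household: file name -> attributes.
files = {
    "beach.jpg": {"type": "photo", "owner": "alice"},
    "song.mp3": {"type": "music", "owner": "bob"},
    "taxes.pdf": {"type": "document", "owner": "alice"},
}
views = [
    View({"type": "photo"}, "laptop"),
    View({"owner": "alice"}, "desktop"),
    View({"type": "music"}, "desktop"),
]
```

Here the photo is covered by a view on each device, so in this model it is stored twice; adding or removing a view is how a user would raise or lower redundancy, and `files_on_device(views, files, "laptop")` lists only `beach.jpg`.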
Self-* Storage
Administration currently accounts for a large part of the total cost of
ownership of enterprise storage. The Self-* Storage project focuses on
designing storage systems that reduce the administrator effort required
for data management.
We have built a versatile brick-based storage
infrastructure to support automation of various administrative tasks.
My work in the project included designing and building a component that
automatically chooses an appropriate encoding when an object is first
created in the system. I also researched methods for administrators to
specify high-level policy objectives to the system, and ways to provide
efficient request-flow tracing in this kind of distributed system.
Continuous Reorganization
The way data is laid out on disk has a large impact on system
performance. For this reason, researchers have proposed various
heuristics to optimize layout for particular data access patterns. Each
of these heuristics works well for some data but may be useless or even
detrimental for other data, so few are used in practice. To enable
their safe use for data-specific layout optimization, we proposed a
two-tiered system that takes a variety of heuristics and adaptively combines
them, using simulation and optimization methods to place each piece of data
with the most appropriate heuristic.
Robustly combining heuristics proved difficult, which gave us insight
into the complexity of even relatively small automation problems such as
this one.
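The per-data selection step can be illustrated with a toy Python sketch. It swaps in a deliberately simple stand-in for the project's simulation and optimization methods: each candidate heuristic (three hypothetical ones here) proposes a block ordering for a data item, a crude simulator scores that ordering by total head travel over the item's recorded access trace, and the cheapest layout wins. All names and the cost model are assumptions made for illustration.

```python
from collections import Counter


def seek_cost(layout, trace):
    """Simulated cost: total distance (in block slots) the head travels between
    consecutive accesses. Assumes every traced block appears in the layout."""
    pos = {blk: i for i, blk in enumerate(layout)}
    return sum(abs(pos[a] - pos[b]) for a, b in zip(trace, trace[1:]))


# Hypothetical candidate heuristics: each maps (blocks, trace) -> new ordering.
def keep_original(blocks, trace):
    return list(blocks)


def by_first_access(blocks, trace):
    # Place blocks in the order they are first touched; untouched blocks go last.
    order = list(dict.fromkeys(b for b in trace if b in blocks))
    seen = set(order)
    return order + [b for b in blocks if b not in seen]


def by_frequency(blocks, trace):
    # Hottest blocks first; sorted() is stable, so ties keep their original order.
    counts = Counter(trace)
    return sorted(blocks, key=lambda b: -counts[b])


# keep_original is listed first so it wins ties: leaving data in place
# avoids migration work.
HEURISTICS = {
    "keep_original": keep_original,
    "by_first_access": by_first_access,
    "by_frequency": by_frequency,
}


def choose_heuristics(items, traces):
    """For each data item, simulate every heuristic and keep the cheapest one."""
    plan = {}
    for name, blocks in items.items():
        trace = traces.get(name, [])
        scored = [(seek_cost(h(blocks, trace), trace), hname)
                  for hname, h in HEURISTICS.items()]
        # min() returns the first minimal entry, so ties favor earlier heuristics.
        plan[name] = min(scored, key=lambda s: s[0])[1]
    return plan
```

For a sequentially read item every heuristic ties, so the data stays put; for an item whose trace alternates between two distant blocks, clustering them cuts the simulated travel from 12 slots to 4. In this toy, the outcome hinges entirely on how well the simulated cost reflects the real disk.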
Freeblock Scheduling
To perform adaptation such as layout optimization, a system must be able to
efficiently service background tasks. Freeblock scheduling allows a
system to perform background disk operations without cost to a foreground
workload, even when the disk is 100% busy. It obtains this extra bandwidth
by scheduling background operations during the rotational latency periods
of foreground requests, time that would otherwise be wasted. I worked
specifically on modifying background tasks, such as data migration, to
work well with the freeblock system.
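The core freeblock test can be sketched in Python under a deliberately simplified disk model. The model is assumed: a 6 ms rotation (10,000 RPM), a linear seek-time formula, angles expressed as fractions of a rotation, and a single pending foreground request. A background block counts as free only if the head can seek to it, read it, and still reach the foreground sector no later than it would have by going there directly; among such blocks the sketch picks the largest. All function names and parameters are hypothetical, not taken from the freeblock implementation.

```python
ROTATION_MS = 6.0  # assumed: 10,000 RPM


def seek_ms(from_cyl, to_cyl):
    """Assumed seek model: fixed settle time plus a per-cylinder cost."""
    dist = abs(from_cyl - to_cyl)
    return 0.0 if dist == 0 else 1.0 + 0.01 * dist


def wait_ms(target_angle, t, start_angle):
    """Rotational latency from time t until target_angle passes under the head.
    Angles are fractions of a rotation; start_angle is under the head at t = 0."""
    current = (start_angle + t / ROTATION_MS) % 1.0
    return ((target_angle - current) % 1.0) * ROTATION_MS


def arrive_ms(t, from_cyl, to_cyl, angle, start_angle):
    """Time the head reaches (to_cyl, angle) if it leaves from_cyl at time t."""
    t += seek_ms(from_cyl, to_cyl)
    return t + wait_ms(angle, t, start_angle)


def pick_free_block(head_cyl, start_angle, foreground, candidates):
    """Return the largest background block (cyl, angle, length as a fraction of
    a rotation) that does not delay the foreground request, or None."""
    fg_cyl, fg_angle = foreground
    deadline = arrive_ms(0.0, head_cyl, fg_cyl, fg_angle, start_angle)
    best = None
    for cyl, angle, length in candidates:
        # Seek to the block, wait for it to rotate under the head, then read it.
        done = arrive_ms(0.0, head_cyl, cyl, angle, start_angle) + length * ROTATION_MS
        # Free only if the foreground sector is still reached on time.
        if arrive_ms(done, cyl, fg_cyl, fg_angle, start_angle) <= deadline + 1e-9:
            if best is None or length > best[2]:
                best = (cyl, angle, length)
    return best
```

In this model, reading one fifth of a track that lies just ahead of the foreground sector on the same cylinder fits exactly into the latency gap, whereas a nearby block on another cylinder does not: the two extra seeks carry the head past the foreground sector, which would then cost almost a full rotation.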