PREVIOUS PROJECTS

Perspective
Perspective is a storage system designed for the home. It offers the decentralization and flexibility that home users want, together with a new semantic filesystem construct, the view, that simplifies management. A view is a semantic description of a set of files: a query on file attributes, paired with the ID of the device on which the matching files are stored. By examining and modifying the views associated with a device, a user can identify and control the files stored on it. This approach lets users reason about what is stored where using the same scheme (semantic naming) they already use to navigate their digital content. Thus, when acting as their own administrators, users do not have to deal with a second data organization scheme (hierarchical naming) to perform replica management tasks, such as specifying redundancy to increase reliability or partitioning data to cope with exhausted device capacity. Experience with Perspective deployments and user studies confirms the efficacy of view-based data management. I am continuing to explore how other aspects of data management, such as file location and security, can be addressed in a semantic fashion that users can understand.
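To make the view abstraction concrete, here is a minimal Python sketch; it is an illustration, not Perspective's implementation. It assumes a hypothetical `View` class whose query is simplified to a conjunction of attribute-equality tests, plus hypothetical helpers `files_on_device` and `replica_count`. It shows how both a device's contents and a file's redundancy follow directly from the set of views.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical representation: a file's semantic attributes as string key/value pairs.
Attrs = Dict[str, str]


@dataclass
class View:
    """A view: a query on file attributes plus the ID of the device storing the matches.

    Simplifying assumption: the query is a conjunction of attribute == value tests.
    An empty query matches every file.
    """
    device_id: str
    query: Attrs

    def matches(self, attrs: Attrs) -> bool:
        return all(attrs.get(key) == value for key, value in self.query.items())


def files_on_device(device_id: str, views: List[View], files: Dict[str, Attrs]) -> Set[str]:
    """Files a device should hold: the union of the matches of all its views."""
    device_views = [v for v in views if v.device_id == device_id]
    return {name for name, attrs in files.items()
            if any(v.matches(attrs) for v in device_views)}


def replica_count(name: str, views: List[View], files: Dict[str, Attrs]) -> int:
    """Number of distinct devices with at least one view covering the file."""
    attrs = files[name]
    return len({v.device_id for v in views if v.matches(attrs)})


if __name__ == "__main__":
    files = {
        "song.mp3": {"type": "music", "owner": "alice"},
        "tax.pdf": {"type": "document", "owner": "bob"},
    }
    views = [
        View("laptop", {"owner": "alice"}),
        View("desktop", {"type": "music"}),
        View("desktop", {"type": "document"}),
    ]
    print(sorted(files_on_device("desktop", views, files)))  # ['song.mp3', 'tax.pdf']
    print(sorted(files_on_device("laptop", views, files)))   # ['song.mp3']
    print(replica_count("song.mp3", views, files))           # 2
    print(replica_count("tax.pdf", views, files))            # 1
```

In this example `song.mp3` has two replicas because both the laptop's view (`owner == alice`) and the desktop's view (`type == music`) cover it, so a user raises redundancy by adding a view rather than by copying files between directory paths.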

Self-* Storage
Administration currently accounts for a large part of the total cost of ownership of enterprise storage. The Self-* Storage project focuses on designing storage systems that reduce the administrator effort required for data management. We have built a versatile, brick-based storage infrastructure to support the automation of various administrative tasks. My work on the project included designing and building a component that automatically chooses an appropriate data encoding when an object is first created in the system. I also researched methods for administrators to specify high-level policy objectives to the system, and ways to provide efficient request-flow tracing in this kind of distributed system.

Continuous Reorganization
The way data is laid out on disk has a large impact on system performance. For this reason, researchers have proposed various heuristics that optimize layout for particular data access patterns. Each heuristic works well for some data but may be useless or even detrimental for other data; as a result, few are used in practice. To enable their safe use for data-specific layout optimization, we proposed a two-tiered system that takes a variety of heuristics and adaptively combines them, using simulation and optimization methods to place each piece of data with the most appropriate heuristic. Robustly combining heuristics proved difficult, which gave us insight into the complexity of even relatively small automation problems such as this one.
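The per-data selection step can be illustrated with a minimal Python sketch. It is not the system we built: it substitutes a toy simulator (total head movement when replaying an access trace) for a real disk model, and the candidate heuristics (`keep_current`, `by_first_access`, `organ_pipe`) and the selector `choose_layout` are hypothetical stand-ins. For each region of data, it simulates every heuristic on that region's trace and keeps the cheapest, falling back to the current layout on ties.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Placement = Dict[int, int]  # block ID -> position on disk
Heuristic = Callable[[List[int], List[int]], Placement]


def seek_cost(placement: Placement, trace: List[int]) -> int:
    """Toy simulator: total head movement (in positions) when replaying the trace."""
    return sum(abs(placement[b] - placement[a]) for a, b in zip(trace, trace[1:]))


def keep_current(blocks: List[int], trace: List[int]) -> Placement:
    """Baseline: leave blocks where they are (in list order)."""
    return {b: i for i, b in enumerate(blocks)}


def by_first_access(blocks: List[int], trace: List[int]) -> Placement:
    """Lay blocks out in first-access order, which favors repeated sequential scans."""
    accessed = set(trace)
    order = list(dict.fromkeys(trace)) + [b for b in blocks if b not in accessed]
    return {b: i for i, b in enumerate(order)}


def organ_pipe(blocks: List[int], trace: List[int]) -> Placement:
    """Put the hottest block in the middle and alternate cooler blocks outward."""
    counts = Counter(trace)
    ranked = sorted(blocks, key=lambda b: -counts[b])  # stable for ties
    mid = len(blocks) // 2
    placement = {}
    for i, b in enumerate(ranked):
        offset = (i + 1) // 2
        placement[b] = mid + offset if i % 2 else mid - offset
    return placement


# keep_current is listed first so that, on a tie, the existing layout wins
# and no data is moved needlessly.
HEURISTICS: Dict[str, Heuristic] = {
    "keep_current": keep_current,
    "by_first_access": by_first_access,
    "organ_pipe": organ_pipe,
}


def choose_layout(blocks: List[int], trace: List[int]) -> Tuple[str, int]:
    """Simulate each heuristic on one region's trace and return the cheapest."""
    best_name, best_cost = "", -1
    for name, heuristic in HEURISTICS.items():
        cost = seek_cost(heuristic(blocks, trace), trace)
        if best_cost < 0 or cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost


if __name__ == "__main__":
    # Two regions with different access patterns get different heuristics.
    print(choose_layout([0, 1, 2, 3], [0, 3, 1, 2]))        # ('by_first_access', 3)
    print(choose_layout([0, 1, 2, 5], [5, 0, 5, 1, 5, 2]))  # ('organ_pipe', 6)
```

Even in this toy form, the choice depends entirely on how faithful the simulator is to the real workload, which hints at why combining heuristics robustly is hard.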

Freeblock Scheduling
To perform adaptations such as layout optimization, a system must be able to service background tasks efficiently. Freeblock scheduling allows a system to perform background disk operations at no cost to the foreground workload, even when the disk is 100% busy. It obtains this extra bandwidth by scheduling background operations during the disk's rotational latency, time that would otherwise be wasted. My work focused on adapting background tasks, such as data migration, to work well with the freeblock system.
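A minimal Python sketch of the core idea follows, restricted (as an assumption) to the simplest case: background reads on the foreground request's destination track, with the head arriving exactly at a sector boundary. A full freeblock scheduler also considers detours to other tracks and accounts for seek and settle times, which are omitted here. `free_sectors` and `rotational_wait_ms` are hypothetical helpers, not an existing API.

```python
from typing import List, Set


def free_sectors(arrival: int, target: int, sectors_per_track: int,
                 pending: Set[int]) -> List[int]:
    """Pending background sectors that can be read while waiting for `target`.

    Simplified model: after the seek for a foreground request, the head sits at the
    start of sector `arrival` on the destination track. Sectors arrival, arrival+1, ...,
    target-1 (mod sectors_per_track) rotate past before the foreground sector does,
    so reading background sectors among them does not delay the foreground request.
    """
    wait = (target - arrival) % sectors_per_track
    passing = [(arrival + i) % sectors_per_track for i in range(wait)]
    return [s for s in passing if s in pending]


def rotational_wait_ms(arrival: int, target: int, sectors_per_track: int, rpm: int) -> float:
    """Rotational latency (ms) that would be wasted if no background work were scheduled."""
    ms_per_rotation = 60_000 / rpm
    wait = (target - arrival) % sectors_per_track
    return wait * ms_per_rotation / sectors_per_track


if __name__ == "__main__":
    print(free_sectors(2, 6, 8, {1, 3, 5, 7}))   # [3, 5]
    print(free_sectors(6, 1, 8, {0, 3, 7}))      # [7, 0]  (wraps around the track)
    print(rotational_wait_ms(2, 6, 8, 7200))     # about 4.17 ms of otherwise idle time
```

A background task such as data migration fits this model when it can tolerate having its blocks read in whatever order they happen to pass under the head.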