reliable_processors_and_systems [2017/09/29 14:11] – external edit 127.0.0.1
====== Reliable Processors and Systems ======

This research investigates soft-error tolerance in future deep-submicron microprocessor designs and explores different options for achieving the desired level of protection against soft errors. This work is supported in part by an NSF CAREER Award.

The [[http://www.ece.cmu.edu/~truss | TRUSS Project]] (Total Reliability Using Scalable Servers) develops a reliable, available, and serviceable (RAS) hardware platform based on a distributed cluster of commodity blade servers. The project's goal is to leverage the cost-effectiveness of commodity processor and memory modules in a reliable server design that achieves both performance and cost scalability. The project is supported in part by an NSF ITR Award and by Intel. (Go to the [[http://www.ece.cmu.edu/~truss | TRUSS Project Page]].)

  * **Students**
    * Jared Smolens ([[http://www.ece.cmu.edu/~jhoe/distribution/2008/jsmolens.pdf |PhD Thesis]])
    * Brian Gold (advised by Babak Falsafi)
    * Jangwoo Kim (advised by Babak Falsafi)
  * **Publications**
    * **Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors**. B. T. Gold, B. Falsafi, and J. C. Hoe. Pacific Rim International Symposium on Dependable Computing (PRDC), November 2009.
    * **OpenSPARC: An Open Platform for Hardware Reliability Experimentation**. I. Parulkar, A. Wood, J. C. Hoe, B. Falsafi, S. V. Adve, and J. Torrellas. Fourth Workshop on Silicon Errors in Logic - System Effects (SELSE), April 2008. ([[http://www.ece.cmu.edu/~jhoe/distribution/2008/selse08.pdf |pdf]])
    * **Multi-bit Error Tolerant Caches Using Two-Dimensional Error Coding**. J. Kim, N. Hardavellas, K. Mai, B. Falsafi, and J. C. Hoe. International Symposium on Microarchitecture (MICRO), December 2007. ([[http://www.ece.cmu.edu/~jhoe/distribution/2007/micro07.pdf |pdf]])
    * **PAI: A Lightweight Mechanism for Single-Node Memory Recovery in DSM Servers**. J. Kim, J. C. Smolens, B. Falsafi, and J. C. Hoe. Pacific Rim International Symposium on Dependable Computing (PRDC), December 2007. ([[http://www.ece.cmu.edu/~jhoe/distribution/2007/prdc07.pdf |pdf]])
    * **Detecting Emerging Wearout Faults**. J. C. Smolens, B. T. Gold, J. C. Hoe, B. Falsafi, and K. Mai. Third Workshop on Silicon Errors in Logic - System Effects (SELSE), April 2007. ([[http://www.ece.cmu.edu/~jhoe/distribution/2007/selse07.pdf |pdf]])
    * **Reunion: Complexity-Effective Multicore Redundancy**. J. C. Smolens, B. T. Gold, B. Falsafi, and J. C. Hoe. International Symposium on Microarchitecture (MICRO), December 2006. ([[http://www.ece.cmu.edu/~jhoe/distribution/2006/micro06.pdf |pdf]])
    * **TRUSS: Reliable, Scalable Server Architecture**. B. T. Gold, J. C. Smolens, J. Kim, E. S. Chung, V. Liaskovitis, E. Nurvitadhi, B. Falsafi, J. C. Hoe, and A. G. Nowatzyk. IEEE Micro, Volume 25, Number 6, November/December 2005. ([[http://ieeexplore.ieee.org/iel5/40/33228/01566557.pdf?tp=&arnumber=1566557&isnumber=33228 |pdf]])
    * **Fingerprinting: Bounding Soft-Error-Detection Latency and Bandwidth**. J. C. Smolens, B. T. Gold, J. Kim, B. Falsafi, J. C. Hoe, and A. G. Nowatzyk. IEEE Micro, Volume 24, Number 6, November/December 2004. ([[http://ieeexplore.ieee.org/iel5/40/30203/01388154.pdf?tp=&arnumber=1388154&isnumber=30203&arSt=22&ared=29&arAuthor=Smolens%2C+J.C.%3B++Gold%2C+B.T.%3B++J.+Kim%3B++Falsafi%2C+B.%3B++Hoe%2C+J.C.%3B++Nowatzyk%2C+A.G.%3B |pdf]]) //(Note: IEEE Micro Top Picks version of the ASPLOS 2004 paper.)//
    * **Efficient Resource Sharing in Concurrent Error Detecting Superscalar Microarchitectures**. J. C. Smolens, J. Kim, J. C. Hoe, and B. Falsafi. International Symposium on Microarchitecture (MICRO), November 2004. ([[http://www.ece.cmu.edu/~jhoe/distribution/2004/micro04.pdf |pdf]])
    * **Fingerprinting: Bounding Soft-Error Detection Latency and Bandwidth**. J. C. Smolens, B. T. Gold, J. Kim, B. Falsafi, J. C. Hoe, and A. G. Nowatzyk. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 2004. ([[http://www.ece.cmu.edu/~jhoe/distribution/2004/asplos04.pdf |pdf]])
    * **Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery**. J. Ray, J. C. Hoe, and B. Falsafi. International Symposium on Microarchitecture (MICRO), December 2001. ([[http://www.ece.cmu.edu/~jhoe/distribution/2001/micro01.pdf |pdf]])

  * **Thesis**
    * **Fingerprinting: Hash-Based Error Detection in Microprocessors**. Jared Smolens, PhD, ECE/CMU, December 2007. ([[http://www.ece.cmu.edu/~jhoe/distribution/2008/jsmolens.pdf |pdf]])