reliable_processors_and_systems [2017/09/29 10:11] (current)
====== Reliable Processors and Systems ======

This research investigates the impact of soft-error tolerance on future deep-submicron microprocessor designs and examines different options for achieving the desired level of protection against soft errors. This research effort is supported in part by NSF through a CAREER Award.

The [[http://www.ece.cmu.edu/~truss | TRUSS Project]] (Total Reliability Using Scalable Servers) develops a reliable, available, and serviceable (RAS) hardware platform based on a distributed cluster of commodity blade servers. The goal of the project is to leverage the cost-effectiveness of commodity processor and memory modules in a reliable server design that achieves both performance and cost scalability. This research effort is supported in part by NSF through an ITR Award and by Intel. (Go to the [[http://www.ece.cmu.edu/~truss | TRUSS Project Page]].)

  * **Students**
    * Jared Smolens ([[http://www.ece.cmu.edu/~jhoe/distribution/2008/jsmolens.pdf |PhD Thesis]])
    * Brian Gold (advised by Babak Falsafi)
    * Jangwoo Kim (advised by Babak Falsafi)
  * **Publications**
    * **Chip-Level Redundancy in Distributed Shared-Memory Multiprocessors**. B. T. Gold, B. Falsafi, and J. C. Hoe. Pacific Rim International Symposium on Dependable Computing (PRDC), November 2009.
    * **OpenSPARC: An Open Platform for Hardware Reliability Experimentation**. I. Parulkar, A. Wood, J. C. Hoe, B. Falsafi, S. V. Adve, and J. Torrellas. Fourth Workshop on Silicon Errors in Logic - System Effects (SELSE), April 2008. ([[http://www.ece.cmu.edu/~jhoe/distribution/2008/selse08.pdf |pdf]])
    * **Multi-bit Error Tolerant Caches Using Two-Dimensional Error Coding**. J. Kim, N. Hardavellas, K. Mai, B. Falsafi, and J. C. Hoe. International Symposium on Microarchitecture (MICRO), December 2007. ([[http://www.ece.cmu.edu/~jhoe/distribution/2007/micro07.pdf |pdf]])
    * **PAI: A Lightweight Mechanism for Single-Node Memory Recovery in DSM Servers**. J. Kim, J. C. Smolens, B. Falsafi, and J. C. Hoe. Pacific Rim International Symposium on Dependable Computing (PRDC), December 2007. ([[http://www.ece.cmu.edu/~jhoe/distribution/2007/prdc07.pdf |pdf]])
    * **Detecting Emerging Wearout Faults**. J. C. Smolens, B. T. Gold, J. C. Hoe, B. Falsafi, and K. Mai. Third Workshop on Silicon Errors in Logic - System Effects (SELSE), April 2007. ([[http://www.ece.cmu.edu/~jhoe/distribution/2007/selse07.pdf |pdf]])
    * **Reunion: Complexity-Effective Multicore Redundancy**. J. C. Smolens, B. T. Gold, B. Falsafi, and J. C. Hoe. International Symposium on Microarchitecture (MICRO), December 2006. ([[http://www.ece.cmu.edu/~jhoe/distribution/2006/micro06.pdf |pdf]])
    * **TRUSS: Reliable, Scalable Server Architecture**. B. T. Gold, J. C. Smolens, J. Kim, E. S. Chung, V. Liaskovitis, E. Nurvitadhi, B. Falsafi, J. C. Hoe, and A. G. Nowatzyk. IEEE Micro, Volume 25, Number 6, November/December 2005. ([[http://ieeexplore.ieee.org/iel5/40/33228/01566557.pdf?tp=&arnumber=1566557&isnumber=33228 |pdf]])
    * **Fingerprinting: Bounding Soft-Error-Detection Latency and Bandwidth**. J. C. Smolens, B. T. Gold, J. Kim, B. Falsafi, J. C. Hoe, and A. G. Nowatzyk. IEEE Micro, Volume 24, Number 6, November/December 2004. ([[http://ieeexplore.ieee.org/iel5/40/30203/01388154.pdf?tp=&arnumber=1388154&isnumber=30203&arSt=22&ared=29&arAuthor=Smolens%2C+J.C.%3B++Gold%2C+B.T.%3B++J.+Kim%3B++Falsafi%2C+B.%3B++Hoe%2C+J.C.%3B++Nowatzyk%2C+A.G.%3B |pdf]]) //(Note: Top Picks version of the ASPLOS 2004 paper.)//
    * **Efficient Resource Sharing in Concurrent Error Detecting Superscalar Microarchitectures**. J. C. Smolens, J. Kim, J. C. Hoe, and B. Falsafi. International Symposium on Microarchitecture (MICRO), November 2004. ([[http://www.ece.cmu.edu/~jhoe/distribution/2004/micro04.pdf |pdf]])
    * **Fingerprinting: Bounding Soft-Error Detection Latency and Bandwidth**. J. C. Smolens, B. T. Gold, J. Kim, B. Falsafi, J. C. Hoe, and A. G. Nowatzyk. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 2004. ([[http://www.ece.cmu.edu/~jhoe/distribution/2004/asplos04.pdf |pdf]])
    * **Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery**. J. Ray, J. C. Hoe, and B. Falsafi. International Symposium on Microarchitecture (MICRO), December 2001. ([[http://www.ece.cmu.edu/~jhoe/distribution/2001/micro01.pdf |pdf]])

  * **Thesis**
    * **Fingerprinting: Hash-Based Error Detection in Microprocessors**. Jared Smolens, PhD, ECE/CMU, December 2007. ([[http://www.ece.cmu.edu/~jhoe/distribution/2008/jsmolens.pdf |pdf]])