Fault-Tolerant Communication in Networks-on-Chip
The network-on-chip (NoC) architecture connects multiple
heterogeneous cores through an on-chip network instead of a shared bus,
and it requires network protocols with end-to-end reliability guarantees.
The design of NoC protocols must revisit the core assumptions of large-scale
networking: because high bandwidth is available and computational resources
are scarce, NoC communication can utilize excess network capacity rather
than implement sophisticated fault-tolerance schemes [ASP-DAC 2003].
We introduced the first pragmatic approach for fault-tolerant communication
in NoC, stochastic communication, based on randomized gossip protocols.
Stochastic communication provides sustainable throughput and gracefully
degrading latency even when up to 70% of network packets are corrupted by
soft errors [DATE 2003; VLSI Design 2007].
Stochastic communication advocated a fundamental paradigm shift away from
traditional chip-design approaches, which guarantee the correctness of
devices and interconnects, toward tolerating network-on-chip faults at the
system level.
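The idea of gossip-based dissemination under packet corruption can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the protocol from the cited papers: the square mesh topology, the per-link forwarding probability `p_forward`, the corruption probability `p_corrupt`, and the rule that corrupted packets are detected and silently dropped are all hypothetical choices made for illustration, as are the function names.

```python
import random


def mesh_neighbors(x, y, n):
    """Return the tiles adjacent to (x, y) in an n-by-n mesh (assumed topology)."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < n and 0 <= j < n]


def gossip_rounds(n, source, p_forward, p_corrupt, max_rounds, rng):
    """Count the rounds until every tile has received the message.

    Each round, every informed tile forwards the message to each neighbor
    with probability p_forward. Each transmission is corrupted with
    probability p_corrupt; corrupted packets are assumed to be detected
    and dropped by the receiver. Returns None if the message has not
    reached all tiles within max_rounds.
    """
    informed = {source}
    for round_no in range(1, max_rounds + 1):
        newly_informed = set()
        for (x, y) in informed:
            for neighbor in mesh_neighbors(x, y, n):
                if rng.random() >= p_forward:
                    continue  # sender chose not to gossip on this link
                if rng.random() < p_corrupt:
                    continue  # packet corrupted in transit and discarded
                newly_informed.add(neighbor)
        informed |= newly_informed
        if len(informed) == n * n:
            return round_no
    return None


if __name__ == "__main__":
    rng = random.Random(2003)
    for p_corrupt in (0.0, 0.3, 0.7):
        rounds = gossip_rounds(4, (0, 0), 0.5, p_corrupt, 1000, rng)
        print(f"p_corrupt={p_corrupt}: all tiles informed after {rounds} rounds")
```

Because informed tiles keep retransmitting in later rounds, a dropped packet only delays delivery instead of preventing it. In this toy model, a higher corruption rate therefore shows up as more rounds (higher latency) rather than as a failed broadcast, which is the kind of graceful degradation described above.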
Publications
Journal Articles and Book Chapters
- P. Bogdan, T. Dumitraş and R. Mărculescu
Stochastic Communication: A New Paradigm for Fault-Tolerant Networks-on-Chip
VLSI Design, special issue on Networks-on-Chip, Hindawi, 2007
- T. Dumitraş and R. Mărculescu
On-Chip Stochastic Communication
Embedded Software for SoC, A. Jerraya et al., eds., Kluwer, 2003
Conference Papers
- T. Dumitraş, S. Kerner and R. Mărculescu
Enabling On-Chip Diversity through Architectural Communication Design
IEEE/ACM ASP-DAC, Jan. 2004
- T. Dumitraş and R. Mărculescu
On-Chip Stochastic Communication
EDAA/IEEE/ACM DATE Conference, Mar. 2003
- T. Dumitraş, S. Kerner and R. Mărculescu
Toward On-Chip Fault-Tolerant Communication
IEEE/ACM ASP-DAC, Jan. 2003
Best Paper Award
Theses
- T. Dumitraş
On-Chip Stochastic Communication
MS Thesis, Carnegie Mellon University, May 2003