Fault-Tolerant Communication in Networks-on-Chip

The network-on-chip (NoC) architecture connects multiple heterogeneous cores through an on-chip network instead of a shared bus, and it requires network protocols with end-to-end reliability guarantees. Designing NoC protocols means revisiting the core assumptions of large-scale networking: because on-chip bandwidth is plentiful while computational resources are scarce, NoC communication can exploit excess network capacity rather than implement sophisticated fault-tolerance schemes [ASP-DAC 2003]. We introduced the first pragmatic approach for fault-tolerant communication in NoC, stochastic communication, which is based on randomized gossip protocols. Stochastic communication sustains throughput, with gracefully degrading latency, even when up to 70% of network packets are corrupted by soft errors [DATE 2003; VLSI Design 2007]. It advocated a fundamental paradigm shift: rather than guaranteeing the correctness of devices and interconnects, as traditional chip-design approaches do, it tolerates network-on-chip faults at the system level.
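
The gossip idea can be illustrated with a small simulation. The sketch below is not the protocol from the cited papers; it is a minimal toy model under stated assumptions: tiles sit on a 2D mesh, each tile holding the message forwards it to each neighbor independently with probability `p_forward` in every round, and each transmission is upset with probability `p_upset`, in which case the receiver is assumed to detect the corruption and discard the packet. The function name `gossip_rounds` and all parameters are hypothetical.

```python
import random

def gossip_rounds(width, height, source, dest,
                  p_forward=0.5, p_upset=0.3, max_rounds=200, seed=0):
    """Return the round in which `dest` first holds the message, or None.

    Toy model (assumption, not the published protocol): every informed
    tile gossips to each mesh neighbor with probability `p_forward`;
    each transmission is corrupted with probability `p_upset` and a
    corrupted packet is assumed to be detected and dropped.
    """
    if source == dest:
        return 0
    rng = random.Random(seed)          # seeded for reproducible runs
    informed = {source}                # tiles that hold a clean copy
    for round_no in range(1, max_rounds + 1):
        newly_informed = set()
        # sorted() keeps the iteration order, and so the run, deterministic
        for x, y in sorted(informed):
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < width and 0 <= ny < height):
                    continue           # neighbor would fall off the mesh
                if rng.random() >= p_forward:
                    continue           # tile chose not to gossip this way
                if rng.random() < p_upset:
                    continue           # packet upset in transit, dropped
                newly_informed.add((nx, ny))
        informed |= newly_informed
        if dest in informed:
            return round_no
    return None                        # not delivered within max_rounds

if __name__ == "__main__":
    for upset in (0.0, 0.3, 0.7):
        rounds = gossip_rounds(4, 4, (0, 0), (3, 3), p_upset=upset)
        print(f"p_upset={upset}: delivered in round {rounds}")
```

With `p_forward=1.0` and `p_upset=0.0` the model reduces to plain flooding, so on a 4x4 mesh a message travels corner to corner in exactly 6 rounds, the Manhattan distance. Raising `p_upset` in this toy model tends to add rounds instead of causing outright failure, because redundant copies keep spreading along other paths; this is the kind of graceful latency degradation the paragraph above describes, although the numbers from such a sketch say nothing about the published results.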

Publications

Journal Articles and Book Chapters

Conference Papers

Theses