@eberle18

Transient and Permanent Error Control for High-End Multiprocessor Systems-on-Chip

Networks on Chip (NoCS), 2012 Sixth IEEE/ACM International Symposium on, pages 169-176. (May 2012)
DOI: 10.1109/NOCS.2012.27

Abstract

High-end MPSoC systems with built-in high-radix topologies achieve good performance because of the improved connectivity and the reduced network diameter. In high-end MPSoC systems, fault tolerance support is becoming a compulsory feature. In this work, we propose a combined method to address permanent and transient link and router failures in those systems. The LBDRhr mechanism is proposed to tolerate permanent link failures in some popular high-radix topologies. The increased router complexity may lead to more transient router errors than in routers using a simple XY routing algorithm. We exploit the inherent information redundancy (IIR) in LBDRhr logic to manage transient errors in the network routers. Thorough analyses are provided to discover the appropriate internal nodes and the forbidden signal patterns for transient error detection. Simulation results show that LBDRhr logic can tolerate all of the permanent failure combinations of long-range links and 80% of link failures on short-range links. Case studies show that the error detection method based on the new IIR extraction method reduces the power consumption by 33% and the residual error rate by up to two orders of magnitude, compared to triple modular redundancy. The impact of network topologies on the efficiency of the detection mechanism is also examined in this work.

Description

IEEE Xplore Abstract - Transient and Permanent Error Control for High-End Multiprocessor Systems-on-Chip
