Communication in Multicomputers with Nonconvex Faults

Published: 01 May 1997

Abstract

A technique to enhance multicomputer routers for fault-tolerant routing with a modest increase in routing complexity and resource requirements is described. This method handles solid faults in meshes, which include all convex faults and many practical nonconvex faults, for example, faults in the shape of an L or a T. As examples of the proposed method, adaptive and nonadaptive fault-tolerant routing algorithms using four virtual channels per physical channel are described.
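
The abstract does not define the fault classes it names, so the following is only an illustrative sketch under stated assumptions. It assumes that a convex fault is a filled rectangular block of faulty nodes in a 2D mesh, and that a solid fault is a connected set of faulty nodes whose cross section along every row and every column is contiguous. The 4-neighbour notion of connectivity and all function names are hypothetical choices made for this example and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Node = Tuple[int, int]  # (x, y) coordinates of a node in a 2D mesh


def is_contiguous(values: Iterable[int]) -> bool:
    """True if the distinct integers form one unbroken run such as 3, 4, 5."""
    vals = sorted(set(values))
    return all(b - a == 1 for a, b in zip(vals, vals[1:]))


def is_connected(faults: Set[Node]) -> bool:
    """True if the faulty nodes form one region (assumed 4-neighbour adjacency)."""
    if not faults:
        return False
    start = next(iter(faults))
    seen = {start}
    stack = [start]
    while stack:
        x, y = stack.pop()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in faults and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == faults


def is_block_fault(faults: Set[Node]) -> bool:
    """Assumed convex case: the faults exactly fill their bounding rectangle."""
    if not faults:
        return False
    xs = [x for x, _ in faults]
    ys = [y for _, y in faults]
    area = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
    return len(faults) == area


def is_solid_fault(faults: Set[Node]) -> bool:
    """Assumed solid case: connected, with contiguous row and column sections."""
    if not is_connected(faults):
        return False
    rows: Dict[int, List[int]] = defaultdict(list)  # y -> x values in that row
    cols: Dict[int, List[int]] = defaultdict(list)  # x -> y values in that column
    for x, y in faults:
        rows[y].append(x)
        cols[x].append(y)
    return all(is_contiguous(v) for v in rows.values()) and all(
        is_contiguous(v) for v in cols.values()
    )


# Example fault shapes (hypothetical coordinates).
BLOCK = {(0, 0), (1, 0), (0, 1), (1, 1)}
L_SHAPE = {(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)}
T_SHAPE = {(0, 2), (1, 2), (2, 2), (1, 1), (1, 0)}
U_SHAPE = {(0, 0), (1, 0), (2, 0), (0, 1), (2, 1)}
```

Under these assumptions the block, L, and T shapes all qualify as solid, which matches the abstract's statement that solid faults include every convex fault plus shapes such as L and T. The U shape does not qualify, because the row through its two arms contains a gap of healthy nodes.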


Published In

IEEE Transactions on Computers, Volume 46, Issue 5
May 1997
128 pages

Publisher

IEEE Computer Society

United States

Author Tags

  1. solid faults
  2. deadlocks
  3. mesh networks
  4. multicomputers
  5. routing algorithms
  6. wormhole routing

Qualifiers

  • Research-article
