Employing transport layer multi-railing in cluster networks

Published: 01 March 2010

Abstract

Building clusters from commodity off-the-shelf parts is a well-established technique for obtaining inexpensive medium- to large-size computing clusters. Many commodity mid-range motherboards come with multiple Gigabit Ethernet interfaces, and the low cost per port of Gigabit Ethernet makes switches inexpensive as well. Our objective in this work is to take advantage of multiple inexpensive Gigabit network cards and Ethernet switches to improve both the communication performance and the reliability of a cluster. Unlike previous approaches that use multiple network connections for multi-railing, we consider CMT (Concurrent Multipath Transfer), an extension to SCTP (Stream Control Transmission Protocol), a transport protocol developed by the IETF, that makes use of the multiple paths that exist between two hosts. In this work, we explore the applicability of CMT, which operates in the transport layer of the network stack, to high-performance computing environments. We develop SCTP-based MPI (Message Passing Interface) middleware for MPICH2 and Open MPI, and evaluate the reliability and communication performance of the system. Using Open MPI with support for message striping over multiple paths at the middleware level, we compare supporting multi-railing in the middleware with supporting it at the transport layer.
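To make the idea of message striping concrete, the following is a minimal sketch of how a sender might split one message into chunks, assign them round-robin to several rails, and how a receiver might reassemble them by sequence number. It is an illustration only, not the striping logic of Open MPI or the scheduling used by CMT; the names `stripe` and `reassemble`, the round-robin policy, and the fixed chunk size are all assumptions made for the example.

```python
from typing import List, Tuple

# A fragment is (sequence number, payload). Hypothetical type for this sketch.
Fragment = Tuple[int, bytes]


def stripe(message: bytes, num_rails: int, chunk_size: int) -> List[List[Fragment]]:
    """Split `message` into chunks and assign them round-robin to rails.

    Assumed policy: chunk i goes to rail i % num_rails. Real middleware or
    a transport protocol may instead schedule by per-path bandwidth or
    congestion state.
    """
    if num_rails < 1 or chunk_size < 1:
        raise ValueError("num_rails and chunk_size must be positive")
    rails: List[List[Fragment]] = [[] for _ in range(num_rails)]
    for seq, offset in enumerate(range(0, len(message), chunk_size)):
        rails[seq % num_rails].append((seq, message[offset:offset + chunk_size]))
    return rails


def reassemble(rails: List[List[Fragment]]) -> bytes:
    """Merge fragments from all rails back into the original byte order."""
    fragments = [frag for rail in rails for frag in rail]
    fragments.sort(key=lambda frag: frag[0])  # order by sequence number
    return b"".join(payload for _, payload in fragments)


if __name__ == "__main__":
    msg = bytes(range(10))
    per_rail = stripe(msg, num_rails=2, chunk_size=3)
    for index, rail in enumerate(per_rail):
        print(f"rail {index}: sequence numbers {[seq for seq, _ in rail]}")
    print("round trip ok:", reassemble(per_rail) == msg)
```

In this sketch the reordering work sits in `reassemble`. Where that logic lives is the question the abstract raises: middleware-level striping places it in the MPI library, while CMT places it inside the transport protocol.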

Published In

Journal of Parallel and Distributed Computing, Volume 70, Issue 3
March, 2010
135 pages

Publisher

Academic Press, Inc.

United States

Author Tags

  1. Cluster
  2. Concurrent Multipath Transfer
  3. MPI
  4. Middleware
  5. Multi-railing
  6. Network interfaces
  7. SCTP
  8. TCP
  9. Transport protocol

Qualifiers

  • Article
