
DOI: 10.1109/IPDPS.2005.30
Article

A Hardware Acceleration Unit for MPI Queue Processing

Published: 04 April 2005

Abstract

With the heavy reliance of modern scientific applications upon the MPI Standard, it has become critical for the implementation of MPI to be as capable and as fast as possible. This has led some of the fastest modern networks to introduce the capability to offload aspects of MPI processing to an embedded processor on the network interface. This important capability has significant performance implications. Most notably, the time to process long queues of posted receives or unexpected messages is substantially longer on embedded processors. This paper presents an associative list matching structure to accelerate the processing of moderate-length queues in MPI. Simulations are used to compare the performance of an embedded processor augmented with this capability to a baseline implementation. The proposed enhancement significantly reduces latency for moderate-length queues while adding virtually no overhead for extremely short queues.
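The abstract contrasts a baseline implementation that walks a list of queue entries with an associative matching structure. The Python sketch below is a software model of that contrast under stated assumptions, not the paper's hardware design. The 64-bit key layout (context, source, tag), the mask-based encoding of MPI_ANY_SOURCE and MPI_ANY_TAG, and every name in it (`pack`, `make_recv`, `match_linear`, `match_associative`, `match_unexpected`) are hypothetical. Both matchers must return the oldest matching entry, because MPI requires messages to match posted receives in the order they were posted.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical 64-bit match key: 16-bit context id | 16-bit source rank | 32-bit tag.
CTX_SHIFT, SRC_SHIFT = 48, 32
CTX_MASK = 0xFFFF << CTX_SHIFT
SRC_MASK = 0xFFFF << SRC_SHIFT
TAG_MASK = 0xFFFFFFFF
ANY_SOURCE = -1  # stand-in for MPI_ANY_SOURCE
ANY_TAG = -1     # stand-in for MPI_ANY_TAG


def pack(ctx: int, src: int, tag: int) -> int:
    """Pack a message envelope into one match key."""
    return (ctx << CTX_SHIFT) | (src << SRC_SHIFT) | tag


@dataclass
class PostedRecv:
    bits: int  # envelope values to compare against
    mask: int  # 1 = bit must match, 0 = wildcard ("don't care")
    name: str


def make_recv(ctx: int, src: int, tag: int, name: str) -> PostedRecv:
    """Build a posted-receive entry; wildcard fields are simply masked out."""
    mask = CTX_MASK  # the communicator context is never a wildcard
    if src != ANY_SOURCE:
        mask |= SRC_MASK
    if tag != ANY_TAG:
        mask |= TAG_MASK
    return PostedRecv(pack(ctx, max(src, 0), max(tag, 0)) & mask, mask, name)


def matches(entry: PostedRecv, key: int) -> bool:
    # Only the bits selected by the entry's mask have to agree.
    return ((entry.bits ^ key) & entry.mask) == 0


def match_linear(queue: List[PostedRecv], key: int) -> Optional[int]:
    """Baseline: walk the queue in posting order, one comparison per entry."""
    for i, entry in enumerate(queue):
        if matches(entry, key):
            return i
    return None


def match_associative(queue: List[PostedRecv], key: int) -> Optional[int]:
    """Model of an associative unit: every cell compares at once (a loop here),
    producing a hit vector; a priority encoder then selects the lowest index,
    i.e. the oldest matching receive, to preserve MPI ordering."""
    hits = 0
    for i, entry in enumerate(queue):  # parallel comparators in hardware
        if matches(entry, key):
            hits |= 1 << i
    if hits == 0:
        return None
    return (hits & -hits).bit_length() - 1  # position of the lowest set bit


def match_unexpected(unexpected: List[int], recv: PostedRecv) -> Optional[int]:
    """When a receive is posted, search the unexpected-message queue (stored
    keys) for the oldest message it accepts; the wildcards come from the receive."""
    for i, key in enumerate(unexpected):
        if matches(recv, key):
            return i
    return None


if __name__ == "__main__":
    queue = [
        make_recv(1, 3, 7, "r0"),
        make_recv(1, ANY_SOURCE, 9, "r1"),
        make_recv(1, 5, ANY_TAG, "r2"),
    ]
    key = pack(1, 5, 9)  # message from rank 5, tag 9, context 1
    i = match_associative(queue, key)
    print(match_linear(queue, key), i, queue[i].name)  # prints: 1 1 r1
```

In this model, both `r1` and `r2` accept the message, and both matchers pick `r1` because it was posted first. The linear walk costs time proportional to the position of the match (or to the whole queue on a miss), which is what becomes expensive on a slow embedded processor. The associative model performs all comparisons in one step and resolves multiple hits with a priority encoder, so in hardware its cost is roughly independent of where the match sits, up to the unit's capacity.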




Information

Published In

IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
April 2005
ISBN: 0769523129

Publisher

IEEE Computer Society

United States



Cited By

  • (2019) MPI tag matching performance on ConnectX and ARM. Proceedings of the 26th European MPI Users' Group Meeting, pp. 1-10. DOI: 10.1145/3343211.3343224. Online publication date: 11-Sep-2019.
  • (2019) Evaluating tradeoffs between MPI message matching offload hardware capacity and performance. Proceedings of the 26th European MPI Users' Group Meeting, pp. 1-11. DOI: 10.1145/3343211.3343223. Online publication date: 11-Sep-2019.
  • (2019) INCA. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-13. DOI: 10.1145/3295500.3356153. Online publication date: 17-Nov-2019.
  • (2018) Using Simulation to Examine the Effect of MPI Message Matching Costs on Application Performance. Proceedings of the 25th European MPI Users' Group Meeting, pp. 1-11. DOI: 10.1145/3236367.3236375. Online publication date: 23-Sep-2018.
  • (2018) Improving Performance Models for Irregular Point-to-Point Communication. Proceedings of the 25th European MPI Users' Group Meeting, pp. 1-8. DOI: 10.1145/3236367.3236368. Online publication date: 23-Sep-2018.
  • (2018) A Dedicated Message Matching Mechanism for Collective Communications. Workshop Proceedings of the 47th International Conference on Parallel Processing, pp. 1-10. DOI: 10.1145/3229710.3229712. Online publication date: 13-Aug-2018.
  • (2018) The Case for Semi-Permanent Cache Occupancy. Proceedings of the 47th International Conference on Parallel Processing, pp. 1-11. DOI: 10.1145/3225058.3225130. Online publication date: 13-Aug-2018.
  • (2017) Characterizing MPI matching via trace-based simulation. Proceedings of the 24th European MPI Users' Group Meeting, pp. 1-11. DOI: 10.1145/3127024.3127040. Online publication date: 25-Sep-2017.
  • (2014) Abstract machine models and proxy architectures for exascale computing. Proceedings of the 1st International Workshop on Hardware-Software Co-Design for High Performance Computing, pp. 25-32. DOI: 10.1109/Co-HPC.2014.4. Online publication date: 16-Nov-2014.
  • (2014) A fast and resource-conscious MPI message queue mechanism for large-scale jobs. Future Generation Computer Systems, 30:C, pp. 265-290. DOI: 10.1016/j.future.2013.07.003. Online publication date: 1-Jan-2014.
