Abstract
As InfiniBand (IB) clusters grow in size, predicting the behavior of the IB network in terms of link usage and performance becomes increasingly challenging. There is currently no open source tool that allows users to dynamically analyze and visualize the communication pattern and link usage in an IB network. In this context, we design and develop INAM, a scalable InfiniBand Network Analysis and Monitoring tool. INAM monitors IB clusters in real time and queries the subnet management entities in the IB network to gather the performance counters specified by the IB standard. We provide an easy-to-use web-based interface to visualize the performance counters and subnet management attributes of a cluster on demand. INAM is also capable of capturing the communication characteristics of a subset of links in the network. Our experimental results show that INAM is able to accurately visualize the link utilization as well as the communication pattern of target applications.
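To make the notion of link utilization concrete, the following is a minimal sketch of how a utilization figure could be derived from two samples of a port's transmit data counter. It is not INAM's implementation: the function name, its parameters, and the two-sample scheme are assumptions made for illustration. The one detail taken from the IB standard is that the PortXmitData counter is expressed in units of 4 octets.

```python
# Illustrative sketch only; not taken from INAM. The function name,
# parameters, and sampling scheme are hypothetical.

def link_utilization(prev_xmit_data, curr_xmit_data, interval_s, link_rate_gbps):
    """Return the fraction of link data bandwidth used between two samples.

    prev_xmit_data, curr_xmit_data: two readings of PortXmitData, which
        the IB standard counts in units of 4 octets (32-bit words).
    interval_s: time between the two readings, in seconds.
    link_rate_gbps: assumed usable data rate of the link, in Gbit/s.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta_words = curr_xmit_data - prev_xmit_data
    if delta_words < 0:
        # A smaller later reading suggests the counter was reset.
        raise ValueError("counter decreased between samples")
    bits_sent = delta_words * 4 * 8          # words -> octets -> bits
    capacity_bits = link_rate_gbps * 1e9 * interval_s
    return bits_sent / capacity_bits


if __name__ == "__main__":
    # 125,000,000 words in one second on a link assumed to carry 8 Gbit/s.
    print(link_utilization(0, 125_000_000, 1.0, 8.0))
```

A monitoring tool would typically repeat this calculation per port at a fixed polling interval and plot the resulting series. The sketch leaves counter saturation and the choice of 32-bit versus extended 64-bit counters to the caller.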
This research is supported in part by Sandia Laboratories grant #1024384; U.S. Department of Energy grants #DE-FC02-06ER25749 and #DE-FC02-06ER25755 and contract #DE-AC02-06CH11357; National Science Foundation grants #CCF-0621484, #CCF-0702675, #CCF-0833169, #CCF-0916302 and #OCI-0926691; a grant from the Wright Center for Innovation, #WCI04-010-OSU-0; grants from Intel, Mellanox, Cisco, QLogic, and Sun Microsystems; and equipment donations from Intel, Mellanox, AMD, Obsidian, Advanced Clustering, Appro, QLogic, and Sun Microsystems.
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Dandapanthula, N. et al. (2012). INAM - A Scalable InfiniBand Network Analysis and Monitoring Tool. In: Alexander, M., et al. Euro-Par 2011: Parallel Processing Workshops. Euro-Par 2011. Lecture Notes in Computer Science, vol 7156. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29740-3_20
DOI: https://doi.org/10.1007/978-3-642-29740-3_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29739-7
Online ISBN: 978-3-642-29740-3
eBook Packages: Computer Science, Computer Science (R0)