DOI: 10.5555/762761.762796

Monitoring data archives for grid environments

Published: 16 November 2002

Abstract

Developers and users of high-performance distributed systems often observe performance problems such as unexpectedly low throughput or high latency. To determine the source of these performance problems, detailed end-to-end monitoring data from applications, networks, operating systems, and hardware must be correlated across time and space. Researchers need to be able to view and compare this very detailed monitoring data from a variety of angles. To address this problem, we propose a relational monitoring data archive that is designed to efficiently handle high-volume streams of monitoring data. In this paper we present an instrumentation and monitoring event archive service that can be used to collect and aggregate detailed end-to-end monitoring information from distributed applications. This archive service is designed to be scalable and fault tolerant. We also show how the archive is based on the "Grid Monitoring Architecture" defined by the Global Grid Forum.

Published In

SC '02: Proceedings of the 2002 ACM/IEEE conference on Supercomputing
November 2002
952 pages
ISBN: 076951524X

Publisher

IEEE Computer Society Press

Washington, DC, United States

Qualifiers

  • Article

Conference

SC '02

Acceptance Rates

SC '02 Paper Acceptance Rate 67 of 230 submissions, 29%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Cited By

  • (2007) Functional architecture of performance measurement system based on grid monitoring architecture. Proceedings of the 13th International conference on Multimedia Modeling - Volume Part II, pp. 576-583. DOI: 10.1007/978-3-540-69429-8_61. Online publication date: 9-Jan-2007
  • (2003) Enabling Network Measurement Portability Through a Hierarchy of Characteristics. Proceedings of the 4th International Workshop on Grid Computing. DOI: 10.5555/951948.952043. Online publication date: 17-Nov-2003
  • (2003) Web100. ACM SIGCOMM Computer Communication Review 33:3, pp. 69-79. DOI: 10.1145/956993.957002. Online publication date: 1-Jul-2003
  • (2002) A TCP tuning daemon. Proceedings of the 2002 ACM/IEEE conference on Supercomputing, pp. 1-16. DOI: 10.5555/762761.762777. Online publication date: 16-Nov-2002
