HPCVIEW: A Tool for Top-down Analysis of Node Performance

Published: 01 August 2002

Abstract

It is increasingly difficult for complex scientific programs to attain a significant fraction of peak performance on systems that are based on microprocessors with substantial instruction-level parallelism and deep memory hierarchies. Despite this trend, performance analysis and tuning tools are still not used regularly by algorithm and application designers. To a large extent, existing performance tools fail to meet many user needs and are cumbersome to use. To address these issues, we developed HPCVIEW—a toolkit for combining multiple sets of program profile data, correlating the data with source code, and generating a database that can be analyzed anywhere with a commodity Web browser. We argue that HPCVIEW addresses many of the issues that have limited the usability and the utility of most existing tools. We originally built HPCVIEW to facilitate our own work on data layout and optimizing compilers. Now, in addition to daily use within our group, HPCVIEW is being used by several code development teams in DoD and DoE laboratories as well as at NCSA.

Published In

The Journal of Supercomputing, Volume 23, Issue 1
August 2002
124 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. binary analysis
  2. performance evaluation
  3. software performance
  4. software tools

Qualifiers

  • Article
