Architectural support for performance tuning: a case study on the SPARCcenter 2000

Published: 01 April 1994

Abstract

Latency-hiding techniques such as multilevel cache hierarchies yield high performance when applications map well onto hierarchy implementations, but performance can suffer drastically when they do not. Identifying and reducing mismatches between an application and the memory hierarchy is difficult without insight into the actual behavior of the hardware implementation. We advocate the use of hardware event counters as a cheap, effective, and practical way to tune applications for a given hardware platform. We take a case study approach, focusing on the counters available on the SPARCcenter 2000, a 20-processor, shared-bus-based multiprocessor. We describe the tools we built to relate hardware event counts to user applications and give examples to illustrate how these tools are useful in practice. We conclude with a critique of the current hardware counters, offering a user's perspective on how they could be redesigned to be more effective.

Published In

ACM SIGARCH Computer Architecture News, Volume 22, Issue 2
Special Issue: Proceedings of the 21st annual international symposium on Computer architecture (ISCA '94)
April 1994
386 pages
ISSN: 0163-5964
DOI: 10.1145/192007

ISCA '94: Proceedings of the 21st annual international symposium on Computer architecture
April 1994
394 pages
ISBN: 0818655100

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States
