DOI: 10.1145/165123.165163

Limitations of cache prefetching on a bus-based multiprocessor

Published: 01 May 1993

Abstract

Compiler-directed cache prefetching has the potential to hide much of the high memory latency seen by current and future high-performance processors. However, prefetching is not without costs, particularly on a multiprocessor. Prefetching can negatively affect bus utilization, overall cache miss rates, memory latencies and data sharing. We simulated the effects of a particular compiler-directed prefetching algorithm, running on a bus-based multiprocessor. We showed that, despite a high memory latency, this architecture is not very well-suited for prefetching. For several variations on the architecture, speedups for five parallel programs were no greater than 39%, and degradations were as high as 7%, when prefetching was added to the workload. We examined the sources of cache misses, in light of several different prefetching strategies, and pinpointed the causes of the performance changes. Invalidation misses pose a particular problem for current compiler-directed prefetchers. We applied two techniques that reduced their impact: a special prefetching heuristic tailored to write-shared data, and restructuring shared data to reduce false sharing, thus allowing traditional prefetching algorithms to work well.
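
The link between false sharing and invalidation misses can be illustrated with a toy model. The sketch below is not the simulator used in the paper; it assumes a write-invalidate protocol, infinitely large caches, a hypothetical 32-byte cache block, and processors that take turns writing one private variable each. The function name `count_misses` and all parameters are illustrative assumptions.

```python
def count_misses(addresses_by_cpu, block_size, rounds):
    """Toy write-invalidate model (an assumption, not the paper's simulator).

    Each CPU writes its own address once per round. Caches are treated as
    infinite, so a miss is either a cold miss (first touch of a block) or an
    invalidation miss (the block was cached but another CPU's write removed it).
    Returns (cold_misses, invalidation_misses).
    """
    num_cpus = len(addresses_by_cpu)
    cached = [set() for _ in range(num_cpus)]  # blocks currently valid per CPU
    seen = [set() for _ in range(num_cpus)]    # blocks each CPU has ever loaded
    cold = 0
    invalidation = 0
    for _ in range(rounds):
        for cpu, addr in enumerate(addresses_by_cpu):
            block = addr // block_size
            if block not in cached[cpu]:
                if block in seen[cpu]:
                    invalidation += 1
                else:
                    cold += 1
                cached[cpu].add(block)
                seen[cpu].add(block)
            # A write invalidates every other CPU's copy of the block.
            for other in range(num_cpus):
                if other != cpu:
                    cached[other].discard(block)
    return cold, invalidation


# Two 4-byte counters packed into one block: every write evicts the other copy.
print(count_misses([0, 4], block_size=32, rounds=100))   # (2, 198)

# The same counters restructured onto separate blocks: only cold misses remain.
print(count_misses([0, 32], block_size=32, rounds=100))  # (2, 0)
```

In the packed layout the two processors never touch the same variable, yet nearly every access misses because the block is invalidated between uses. A prefetch issued far ahead of the use would be subject to the same invalidation, which is one way to read the abstract's point that restructuring shared data lets traditional prefetching work well.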


Published In

ISCA '93: Proceedings of the 20th annual international symposium on computer architecture
June 1993
361 pages
ISBN: 0818638109
DOI: 10.1145/165123
  • ACM SIGARCH Computer Architecture News, Volume 21, Issue 2
    Special Issue: Proceedings of the 20th annual international symposium on Computer architecture (ISCA '93)
    May 1993
    348 pages
    ISSN: 0163-5964
    DOI: 10.1145/173682

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

20ISCA93: 20th International Symposium on Computer Architecture
May 16-19, 1993
San Diego, California, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Cited By

  • (2009) Coordinated control of multiple prefetchers in multi-core systems. Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 316-326. DOI: 10.1145/1669112.1669154. Online publication date: 12-Dec-2009
  • (2008) Analyzing memory access intensity in parallel programs on multicore. Proceedings of the 22nd annual international conference on Supercomputing, pages 359-367. DOI: 10.1145/1375527.1375579. Online publication date: 7-Jun-2008
  • (2006) Friendly fire: understanding the effects of multiprocessor prefetches. 2006 IEEE International Symposium on Performance Analysis of Systems and Software, pages 177-188. DOI: 10.1109/ISPASS.2006.1620802. Online publication date: 2006
  • (2005) Computing per-process summary side-effect information. Languages and Compilers for Parallel Computing, pages 175-191. DOI: 10.1007/3-540-57502-2_47. Online publication date: 3-Jun-2005
  • (2004) CQoS. Proceedings of the 18th annual international conference on Supercomputing, pages 257-266. DOI: 10.1145/1006209.1006246. Online publication date: 26-Jun-2004
  • (2004) Managing Wire Delay in Large Chip-Multiprocessor Caches. Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, pages 319-330. DOI: 10.1109/MICRO.2004.21. Online publication date: 4-Dec-2004
  • (2002) Avoiding initialization misses to the heap. Proceedings of the 29th annual international symposium on Computer architecture, pages 183-194. DOI: 10.5555/545215.545236. Online publication date: 25-May-2002
  • (2002) Avoiding initialization misses to the heap. ACM SIGARCH Computer Architecture News, 30(2), pages 183-194. DOI: 10.1145/545214.545236. Online publication date: 1-May-2002
  • (2001) Handling long-latency loads in a simultaneous multithreading processor. Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, pages 318-327. DOI: 10.5555/563998.564038. Online publication date: 1-Dec-2001
  • (2001) Hardware prefetching in bus-based multiprocessors: pattern characterization and cost-effective hardware. Proceedings Ninth Euromicro Workshop on Parallel and Distributed Processing, pages 345-354. DOI: 10.1109/EMPDP.2001.905061. Online publication date: 2001
