research-article

Evaluation of Hardware-Based Stride and Sequential Prefetching in Shared-Memory Multiprocessors

Published: 01 April 1996

Abstract

We study the efficiency of previously proposed stride and sequential prefetching, two promising hardware-based prefetching schemes for reducing read-miss penalties in shared-memory multiprocessors. Although stride accesses dominate in four of the six applications we study, we find that sequential prefetching does as well as, and in some cases even better than, stride prefetching for five applications. This is because 1) most strides are shorter than the block size (we assume 32-byte blocks), which means that sequential prefetching is as effective for these stride accesses, and 2) sequential prefetching also exploits the locality of read misses with nonstride accesses. However, since stride prefetching in general results in fewer useless prefetches, it offers the extra advantage of consuming less memory-system bandwidth.
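The first reason given in the abstract (strides shorter than the block size) can be illustrated with a toy model. The sketch below is not the paper's simulation methodology: it assumes an infinite cache, prefetches that arrive instantly, a single access stream, and a prefetch issued on every access. All function names are hypothetical; only the 32-byte block size comes from the abstract.

```python
BLOCK_SIZE = 32  # bytes, the block size assumed in the abstract


def count_misses(addresses, prefetch_targets):
    """Count read misses in an idealized, infinite cache.

    prefetch_targets(addr, prev) returns the byte addresses to prefetch
    after each access. Prefetched blocks are assumed to arrive instantly.
    """
    cached = set()  # block numbers currently held in the cache
    misses = 0
    prev = None
    for addr in addresses:
        block = addr // BLOCK_SIZE
        if block not in cached:
            misses += 1
            cached.add(block)
        for target in prefetch_targets(addr, prev):
            cached.add(target // BLOCK_SIZE)
        prev = addr
    return misses


def no_prefetch(addr, prev):
    return []


def sequential(addr, prev):
    # Prefetch the block that follows the one just accessed.
    return [addr + BLOCK_SIZE]


def stride(addr, prev):
    # Prefetch one stride ahead once two accesses have revealed the stride.
    return [] if prev is None else [addr + (addr - prev)]


def strided_reads(stride_bytes, n=64):
    # n reads starting at address 0, spaced stride_bytes apart.
    return [i * stride_bytes for i in range(n)]


for s in (8, 128):
    reads = strided_reads(s)
    print(s, count_misses(reads, no_prefetch),
          count_misses(reads, sequential),
          count_misses(reads, stride))
```

In this model an 8-byte stride leaves one miss under both schemes, because the next stride target almost always lies in the current or the following block. A 128-byte stride skips over the sequentially prefetched block every time, so only the stride scheme removes misses, and every sequential prefetch is useless.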




Information & Contributors

Information

Published In

IEEE Transactions on Parallel and Distributed Systems  Volume 7, Issue 4
April 1996
128 pages
ISSN:1045-9219

Publisher

IEEE Press


Author Tags

  1. Hardware-controlled prefetching
  2. latency tolerance
  3. performance evaluation
  4. relaxed memory consistency
  5. shared-memory multiprocessors

Qualifiers

  • Research-article


Cited By

  • (2018) SelSMaP. ACM Transactions on Architecture and Code Optimization 15:4 (1-21). doi:10.1145/3274650. Online publication date: 10-Oct-2018
  • (2018) Criticality aware tiered cache hierarchy. Proceedings of the 45th Annual International Symposium on Computer Architecture (96-109). doi:10.1109/ISCA.2018.00019. Online publication date: 2-Jun-2018
  • (2015) Efficiently prefetching complex address patterns. Proceedings of the 48th International Symposium on Microarchitecture (141-152). doi:10.1145/2830772.2830793. Online publication date: 5-Dec-2015
  • (2013) Dual-addressing memory architecture for two-dimensional memory access patterns. Proceedings of the Conference on Design, Automation and Test in Europe (71-76). doi:10.5555/2485288.2485308. Online publication date: 18-Mar-2013
  • (2010) Multi-level hardware prefetching using low complexity delta correlating prediction tables with partial matching. Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers (247-261). doi:10.1007/978-3-642-11515-8_19. Online publication date: 25-Jan-2010
  • (2009) Memory resource allocation for file system prefetching. Proceedings of the 4th ACM European conference on Computer systems (75-88). doi:10.1145/1519065.1519075. Online publication date: 1-Apr-2009
  • (2008) Low-Cost Adaptive Data Prefetching. Proceedings of the 14th international Euro-Par conference on Parallel Processing (327-336). doi:10.1007/978-3-540-85451-7_36. Online publication date: 26-Aug-2008
  • (2007) Optimal multistream sequential prefetching in a shared cache. ACM Transactions on Storage 3:3 (10-es). doi:10.1145/1288783.1288789. Online publication date: 1-Oct-2007
  • (2006) Conjugate gradient sparse solvers. Proceedings of the 20th international conference on Parallel and distributed processing (297-297). doi:10.5555/1898699.1898822. Online publication date: 25-Apr-2006
  • (2004) Fighting the memory wall with assisted execution. Proceedings of the 1st conference on Computing frontiers (168-180). doi:10.1145/977091.977116. Online publication date: 14-Apr-2004
