DOI: 10.1145/237090.237142

An evaluation of memory consistency models for shared-memory systems with ILP processors

Published: 01 September 1996

Abstract

Relaxed consistency models have been shown to significantly outperform sequential consistency for single-issue, statically scheduled processors with blocking reads. However, current microprocessors aggressively exploit instruction-level parallelism (ILP) using methods such as multiple issue, dynamic scheduling, and non-blocking reads. Researchers have conjectured that two techniques, hardware-controlled non-binding prefetching and speculative loads, have the potential to equalize the hardware performance of memory consistency models on such processors.

This paper performs the first detailed quantitative comparison of several implementations of sequential consistency and release consistency optimized for aggressive ILP processors. Our results indicate that hardware prefetching and speculative loads dramatically improve the performance of sequential consistency. However, the gap between sequential consistency and release consistency depends on the cache write policy and the complexity of the cache-coherence protocol implementation. In most cases, release consistency significantly outperforms sequential consistency, but for two applications, the use of a write-back primary cache and a more complex cache-coherence protocol nearly equalizes the performance of the two models.

We also observe that the existing techniques, which require on-chip hardware modifications, enhance the performance of release consistency only to a small extent. We propose two new software techniques, fuzzy acquires and selective acquires, to achieve more overlap than allowed by the previous implementations of release consistency. To enhance methods for overlapping acquires, we also propose a technique to eliminate control dependences caused by an acquire loop, using a small amount of off-chip hardware called the synchronization buffer.
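The difference between the two models is easiest to see on a flag-based handoff between two processors. The following is a minimal sketch that uses C++11 atomics as a stand-in for the hardware models evaluated in the paper; C++ `memory_order_release`/`memory_order_acquire` only approximates release consistency and is not the implementation studied here, and `Channel`, `producer`, `consumer`, and `message_passing` are hypothetical names chosen for illustration. Under release consistency, only the synchronization accesses (the release store and the acquire load) carry ordering constraints, so ordinary accesses between them may be reordered and overlapped with one another. Under sequential consistency, every memory operation, not just the synchronization accesses, would have to appear in a single total order consistent with each thread's program order. The spin loop in `consumer` is an acquire loop in the abstract's sense: the read of `payload` is control-dependent on the branch that exits the loop.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared state for a message-passing handoff.
// `payload` is ordinary data; `flag` is the synchronization variable.
struct Channel {
    int payload = 0;
    std::atomic<bool> flag{false};
};

// Producer: ordinary data write, then a release on the flag.
// The release ensures the payload write is visible to any thread
// whose acquire load observes flag == true.
void producer(Channel& ch, int value) {
    ch.payload = value;                              // ordinary write
    ch.flag.store(true, std::memory_order_release);  // release
}

// Consumer: an acquire loop that spins until the flag is set,
// followed by an ordinary read. The read of `payload` sits behind
// the loop's exit branch (a control dependence).
int consumer(Channel& ch) {
    while (!ch.flag.load(std::memory_order_acquire)) {  // acquire
        std::this_thread::yield();
    }
    return ch.payload;  // ordered after the successful acquire
}

// Runs one producer/consumer pair and returns the value the
// consumer observed.
int message_passing(int value) {
    Channel ch;
    int observed = -1;
    std::thread c([&] { observed = consumer(ch); });
    std::thread p([&] { producer(ch, value); });
    p.join();
    c.join();
    return observed;
}
```

Compiled as C++11 or later with thread support (for example `-pthread` with GCC or Clang), `message_passing(42)` returns 42 on every run: the acquire load that reads `true` synchronizes with the release store, so the earlier write to `payload` is guaranteed to be visible.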

Cited By

  • (2020) Speculative Enforcement of Store Atomicity. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 555-567. DOI: 10.1109/MICRO50266.2020.00053. Online publication date: Oct-2020.
  • (2017) Hiding the Long Latency of Persist Barriers Using Speculative Execution. ACM SIGARCH Computer Architecture News 45:2, pp. 175-186. DOI: 10.1145/3140659.3080240. Online publication date: 24-Jun-2017.
  • (2017) Using Thermal Stimuli to Enhance Photo-Sharing in Social Media. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1:2, pp. 1-21. DOI: 10.1145/3090050. Online publication date: 30-Jun-2017.

Published In

ASPLOS VII: Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
October 1996
290 pages
ISBN:0897917677
DOI:10.1145/237090
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

ASPLOS96

Acceptance Rates

ASPLOS VII Paper Acceptance Rate 25 of 109 submissions, 23%;
Overall Acceptance Rate 535 of 2,713 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 120
  • Downloads (last 6 weeks): 21
Reflects downloads up to 21 Nov 2024
