Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/782814.782854acmconferencesArticle/Chapter ViewAbstractPublication PagesicsConference Proceedingsconference-collections
Article

Automatic fence insertion for shared memory multiprocessing

Published: 23 June 2003 Publication History

Abstract

In general, the hardware memory consistency model in a multiprocessor system is not identical to the memory model at the programming language level. Consequently, the programming language memory model must be mapped onto the hardware memory model. Memory fence instructions can be inserted by the compiler where needed to accomplish this mapping. We have developed and implemented several fence insertion and optimization algorithms in our Pensieve compiler project. We present the different fence insertion optimization techniques that were used in this system to guarantee sequential consistency at the language level, and compare them using performance data. Our techniques target two hardware relaxed memory consistency models provided by SMPs based on IBM Power 3 and Intel Pentium 4. Our fence insertion optimization shows up to 17.2% and 32.7% performance improvement on average, with the IBM PowerPC and Intel Pentium 4 (Xeon) multiprocessors respectively.

References

[1]
Sarita V. Adve and Mark D. Hill. Weak ordering - a new definition. In Proceedings of The 17th Annual International Symposium on Computer Architecture (ISCA), pages 2--14, May 1990.]]
[2]
Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison Wesley, 1986.]]
[3]
Andrew W. Appel. Modern Compiler Implementation in Java. Cambridge University Press, New York, 1998.]]
[4]
Apple Computer, IBM, and Motorola. PowerPC Microprocessor Common Hardware Reference Platform. Morgan Kaufmann Publishers, Inc., 1995.]]
[5]
Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. The MIT Press, 1990.]]
[6]
Intel Corporation, 2002. The IA-32 Intel® Architecture Software Developer's Manual.]]
[7]
Michel Dubois, Christoph Scheurich, and Faye Briggs. Memory access buffering in multiprocessors. In Proceedings of The 13th Annual International Symposium on Computer Architecture (ISCA), pages 434--442, June 1986.]]
[8]
Xing Fang. Inserting fences to guarantee sequential consistency. Master's thesis, Department of Computer Science and Engineering, Michigan State University, August 2002. Technical Report MSU-CSE-02-27.]]
[9]
Michael R. Garey and David S. Johnson. Computers and Intractability. W. H. Freeman and Company, 1979.]]
[10]
Kourosh Gharachorloo, Daniel Lenoski, James Laudon, Phillip Gibbons, Anoop Gupta, and John Hennessy. Memory consistency and event ordering in scalable shared-memory multiprocessors. In Proceedings of The 17th Annual International Symposium on Computer Architecture (ISCA), pages 15--26, May 1990.]]
[11]
James R. Goodman. Cache consistency and sequential consistency. Technical Report CS-TR-91-1006, Department of Computer Science, University of Wisconsin, February 1991.]]
[12]
James Gosling, Bill Joy, and Guy Steele. The Java Language Specification. Addison Wesley, 1996.]]
[13]
Leslie Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Transactions on Computers, C-28(9):690--691, September 1979.]]
[14]
Jaejin Lee. Compilation Techniques for Explicitly Parallel Programs. PhD thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, October 1999. Technical Report UIUCDCS-R-99-2112.]]
[15]
Jaejin Lee and David A. Padua. Hiding relaxed memory consistency with compilers. In Proceedings of The 2000 International Conference on Parallel Architectures and Compilation Techniques, October 2000.]]
[16]
Jaejin Lee and David A. Padua. Hiding relaxed memory consistency with a compiler. IEEE Transactions on Computers: Special Issue on Parallel Architectures and Compilation Techniques, 50(8):824--833, August 2001.]]
[17]
Zhiyuan Li and Walid Abu-sufah. A technique for reducing synchronization overhead in large scale multiprocessors. In Proceedings of The 12th Annual International Symposium on Computer Architecture (ISCA), pages 284--291, 1985.]]
[18]
Zhiyuan Li and Walid Abu-sufah. On reducing data synchronization in multiprocessed loops. IEEE Transactions on Computers, C-36(1):105--109, January 1987.]]
[19]
Samuel P. Midkiff and David A. Padua. Compiler generated synchronization for do loops. In the 1986 International Conference on Parallel Processing, pages 19--22, August 1986.]]
[20]
Samuel P. Midkiff and David A. Padua. Compiler algorithms for synchronization. IEEE Transactions on Computers, C-36(12):1485--1495, December 1987.]]
[21]
William Pugh. Fixing the Java memory model. In Proceedings of the ACM 1999 Java Grande Conference, June 1999.]]
[22]
Dennis Shasha and Marc Snir. Efficient and correct execution of parallel programs that share memory. ACM Transactions on Programming Languages and Systems, 10(2):282--312, April 1988.]]
[23]
Zehra Sura, Chi-Leung Wong, Xing Fang, Jaejin Lee, Samuel P. Midkiff, and David Padua. Automatic implementation of programming language consistency models. In Proceedings of The 15th International Workshop on Languages and Compilers for Parallel Computing (LCPC), July 2002.]]

Cited By

View all
  • (2022)Speculative Load Forwarding Attack on Modern ProcessorsProceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design10.1145/3508352.3549417(1-9)Online publication date: 30-Oct-2022
  • (2022)Mixed-proxy extensions for the NVIDIA PTX memory consistency modelProceedings of the 49th Annual International Symposium on Computer Architecture10.1145/3470496.3533045(1058-1070)Online publication date: 18-Jun-2022
  • (2022)Fence Synthesis Under the C11 Memory ModelAutomated Technology for Verification and Analysis10.1007/978-3-031-19992-9_6(83-99)Online publication date: 21-Oct-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ICS '03: Proceedings of the 17th annual international conference on Supercomputing
June 2003
380 pages
ISBN:1581137338
DOI:10.1145/782814
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 June 2003

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Java
  2. compilers
  3. consistency models

Qualifiers

  • Article

Conference

ICS03
Sponsor:
ICS03: International Conference on Supercomputing 2003
June 23 - 26, 2003
CA, San Francisco, USA

Acceptance Rates

ICS '03 Paper Acceptance Rate 36 of 171 submissions, 21%;
Overall Acceptance Rate 629 of 2,180 submissions, 29%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)8
  • Downloads (Last 6 weeks)1
Reflects downloads up to 14 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2022)Speculative Load Forwarding Attack on Modern ProcessorsProceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design10.1145/3508352.3549417(1-9)Online publication date: 30-Oct-2022
  • (2022)Mixed-proxy extensions for the NVIDIA PTX memory consistency modelProceedings of the 49th Annual International Symposium on Computer Architecture10.1145/3470496.3533045(1058-1070)Online publication date: 18-Jun-2022
  • (2022)Fence Synthesis Under the C11 Memory ModelAutomated Technology for Verification and Analysis10.1007/978-3-031-19992-9_6(83-99)Online publication date: 21-Oct-2022
  • (2021)VSync: push-button verification and optimization for synchronization primitives on weak memory modelsProceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3445814.3446748(530-545)Online publication date: 19-Apr-2021
  • (2020)Extracting safe thread schedules from incomplete model checking resultsInternational Journal on Software Tools for Technology Transfer10.1007/s10009-020-00575-yOnline publication date: 26-Jun-2020
  • (2020)Lock and Fence When Needed: State Space Exploration + Static Analysis = Improved Fence and Lock InsertionIntegrated Formal Methods10.1007/978-3-030-63461-2_16(297-317)Online publication date: 13-Nov-2020
  • (2019)A Formal Model of Data Access for Multicore Architectures with Multilevel CachesScience of Computer Programming10.1016/j.scico.2019.04.003Online publication date: Apr-2019
  • (2018)Static serializability analysis for causal consistencyACM SIGPLAN Notices10.1145/3296979.319241553:4(90-104)Online publication date: 11-Jun-2018
  • (2018)Static serializability analysis for causal consistencyProceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3192366.3192415(90-104)Online publication date: 11-Jun-2018
  • (2017)Quick verification of concurrent programs by iteratively relaxed schedulingProceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering10.5555/3155562.3155658(776-781)Online publication date: 30-Oct-2017
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media