DOI: 10.1145/2749469.2750378
research-article

ArMOR: defending against memory consistency model mismatches in heterogeneous architectures

Published: 13 June 2015

Abstract

Architectural heterogeneity is increasing: numerous products and studies have proven the benefits of combining cores and accelerators with varying ISAs into a single system. However, an underappreciated barrier to unlocking the full potential of heterogeneity is the need to specify and to reconcile differences in memory consistency models across layers of the hardware-software stack and among on-chip components.
This paper presents ArMOR, a framework for specifying, comparing, and translating between memory consistency models. ArMOR defines MOSTs, an architecture-independent and precise format for specifying the semantics of memory ordering requirements such as preserved program order or explicit fences. MOSTs allow any two consistency models to be directly and algorithmically compared, and they help avoid many of the pitfalls of traditional consistency model analysis. As a case study, we use ArMOR to automatically generate translation modules called shims that dynamically translate code compiled for one memory model to execute on hardware implementing a different model.
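The idea of comparing two consistency models algorithmically can be illustrated with a deliberately simplified sketch. It is not the paper's MOST format: it assumes each model is reduced to a boolean table recording which (earlier access, later access) pairs keep their program order, and every name below (`SC`, `TSO`, `PSO`, `is_at_least_as_strong`, `missing_orderings`) is hypothetical. The only facts relied on are the textbook ones that TSO relaxes store-to-load ordering and that PSO additionally relaxes store-to-store ordering.

```python
from itertools import product

ACCESS_TYPES = ("load", "store")

# Every (earlier, later) combination of access types.
ALL_PAIRS = frozenset(product(ACCESS_TYPES, repeat=2))

# Hypothetical boolean tables: a pair is present when program order between
# an earlier access of the first type and a later access of the second type
# is preserved. This is a simplification, not the paper's MOST encoding.
SC = ALL_PAIRS                          # sequential consistency keeps all four
TSO = ALL_PAIRS - {("store", "load")}   # TSO relaxes store -> load
PSO = TSO - {("store", "store")}        # PSO also relaxes store -> store


def is_at_least_as_strong(a, b):
    """Return True if model a preserves every ordering that model b preserves."""
    return b <= a


def missing_orderings(source, target):
    """List orderings that code written for `source` assumes but `target`
    does not preserve; a translation layer would have to restore these."""
    return sorted(source - target)


if __name__ == "__main__":
    print(is_at_least_as_strong(SC, TSO))
    print(missing_orderings(SC, TSO))
```

Under these assumptions, the result of `missing_orderings` is the set of orderings that a translation layer such as a shim would need to enforce, for example by inserting a fence, when code compiled for the stronger model runs on hardware implementing the weaker one.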


          Published In

          ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
          June 2015
          768 pages
          ISBN:9781450334020
          DOI:10.1145/2749469
          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate 543 of 3,203 submissions, 17%

          Cited By

          • (2024) PipeGen: Automated Transformation of a Single-Core Pipeline into a Multicore Pipeline for a Given Memory Consistency Model. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, pp. 1-13. DOI: 10.1145/3656019.3676889. Online publication date: 14-Oct-2024.
          • (2023) AtoMig: Automatically Migrating Millions Lines of Code from TSO to WMM. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, pp. 61-73. DOI: 10.1145/3575693.3579849. Online publication date: 27-Jan-2023.
          • (2023) Risotto: A Dynamic Binary Translator for Weak Memory Model Architectures. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pp. 107-122. DOI: 10.1145/3567955.3567962. Online publication date: 25-Mar-2023.
          • (2023) HeteroGen: Automatic Synthesis of Heterogeneous Cache Coherence Protocols. IEEE Micro 43:4, pp. 62-70. DOI: 10.1109/MM.2023.3274993. Online publication date: 1-Jul-2023.
          • (2023) Parallelizing Stream Compression for IoT Applications on Asymmetric Multicores. 2023 IEEE 39th International Conference on Data Engineering (ICDE), pp. 950-964. DOI: 10.1109/ICDE55515.2023.00078. Online publication date: Apr-2023.
          • (2022) Lasagne: a static binary translator for weak memory model architectures. Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, pp. 888-902. DOI: 10.1145/3519939.3523719. Online publication date: 9-Jun-2022.
          • (2022) HeteroGen: Automatic Synthesis of Heterogeneous Cache Coherence Protocols. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 756-771. DOI: 10.1109/HPCA53966.2022.00061. Online publication date: Apr-2022.
          • (2022) Consistency and Coherence for Heterogeneous Systems. A Primer on Memory Consistency and Cache Coherence, pp. 211-251. DOI: 10.1007/978-3-031-01764-3_10. Online publication date: 28-Mar-2022.
          • (2020) A Primer on Memory Consistency and Cache Coherence, Second Edition. Synthesis Lectures on Computer Architecture 15:1, pp. 1-294. DOI: 10.2200/S00962ED2V01Y201910CAC049. Online publication date: 4-Feb-2020.
          • (2020) DQEMU: A Scalable Emulator with Retargetable DBT on Distributed Platforms. Proceedings of the 49th International Conference on Parallel Processing, pp. 1-11. DOI: 10.1145/3404397.3404403. Online publication date: 17-Aug-2020.
