Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3243176.3243193acmconferencesArticle/Chapter ViewAbstractPublication PagespactConference Proceedingsconference-collections
research-article

MemoDyn: exploiting weakly consistent data structures for dynamic parallel memoization

Published: 01 November 2018 Publication History

Abstract

Several classes of algorithms for combinatorial search and optimization problems employ memoization data structures to speed up their serial convergence. However, accesses to these data structures impose dependences that obstruct program parallelization. Such programs often continue to function correctly even when queries into these data structures return a partial view of their contents. Weakening the consistency of these data structures can unleash new parallelism opportunities, potentially at the cost of additional computation. These opportunities must, therefore, be carefully exploited for overall speedup. This paper presents MemoDyn, a framework for parallelizing loops that access data structures with weakly consistent semantics. MemoDyn provides programming abstractions to express weak semantics, and consists of a parallelizing compiler and a runtime system that automatically and adaptively exploit the semantics for optimized parallel execution. Evaluation of MemoDyn shows that it achieves efficient parallelization, providing significant improvements over competing techniques in terms of both runtime performance and solution quality.

References

[1]
S.V. Adve and K. Gharachorloo. 1996. Shared Memory Consistency Models: A Tutorial. Computer 29, 12 (1996), 66--76.
[2]
Sarita V Adve and Hans-J Boehm. 2010. Memory models: a case for rethinking parallel languages and hardware. Commun. ACM 53, 8 (2010), 90--101.
[3]
Vikram Adve, Vinh Vi Lam, and Brian Ensink. 2001. Language and Compiler Support for Adaptive Distributed Applications. In Proceedings of the ACM SIGPLAN workshop on Languages, compilers and tools for embedded systems (LCTES '01). ACM, New York, NY, USA, 238--246.
[4]
Ankur Agrawal, Jungwook Choi, Kailash Gopalakrishnan, Suyog Gupta, Ravi Nair, Jinwook Oh, Daniel A Prener, Sunil Shukla, Vijayalakshmi Srinivasan, and Zehra Sura. 2016. Approximate computing: Challenges and opportunities. In Rebooting Computing (ICRC), IEEE International Conference on. IEEE, 1--8.
[5]
C. Blum and M. Dorigo. 2004. The Hyper-Cube Framework for Ant Colony Optimization. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on 34, 2 (2004), 1161--1172.
[6]
Hans-J. Boehm and Sarita V. Adve. 2008. Foundations of the C++ Concurrency Memory Model. In Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation (PLDI '08). ACM, New York, NY, USA, 68--78.
[7]
Matthew Bridges, Neil Vachharajani, Yun Zhang, Thomas Jablin, and David August. 2007. Revisiting the Sequential Programming Model for Multi-Core. In MICRO '07: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Computer Society, Washington, DC, USA, 69--84.
[8]
Soumen Chakrabarti and Katherine A. Yelick. 1994. Distributed Data Structures and Algorithms for Gröbner Basis Computation. Lisp and Symbolic Computation 7, 2--3 (1994), 147--172.
[9]
Romain Cledat, Tushar Kumar, and Santhosh Pande. 2011. Efficiently Speeding up Sequential Computation through N-way Programming Model. In Proceedings of the ACM international conference on Object oriented programming systems languages and applications (OOPSLA '11). ACM, New York, NY, USA.
[10]
Romain Cledat, Kaushik Ravichandran, and Santosh Pande. 2011. Leveraging Data-structure Semantics for Efficient Algorithmic Parallelism. In Proceedings of the 8th ACM International Conference on Computing Frontiers (CF '11). ACM, New York, NY, USA, Article 28, 10 pages.
[11]
Luca Della Toffola, Michael Pradel, and Thomas R. Gross. 2015. Performance Problems You Can Fix: A Dynamic Analysis of Memoization Opportunities. SIGPLAN Not. 50, 10 (Oct. 2015), 607--622.
[12]
Joseph Devietti, Jacob Nelson, Tom Bergan, Luis Ceze, and Dan Grossman. 2011. RCDC: a relaxed consistency deterministic computer. In Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '11). ACM, New York, NY, USA, 67--78.
[13]
Kevin Donnelly, J. J. Hallett, and Assaf Kfoury. 2006. Formal Semantics of Weak References. In Proceedings of the 5th international symposium on Memory management (ISMM '06). ACM, New York, NY, USA, 126--137.
[14]
Jonathan Eastep, David Wingate, and Anant Agarwal. 2011. Smart Data Structures: An Online Machine Learning Approach to Multicore Data Structures. In Proceedings of the 8th ACM international conference on Autonomic computing (ICAC '11). ACM, New York, NY, USA, 11--20.
[15]
Niklas Eén and Niklas Sörensson. 2004. An Extensible SAT-Solver. In Theory and Applications of Satisfiability Testing, Enrico Giunchiglia and Armando Tacchella (Eds.). Lecture Notes in Computer Science, Vol. 2919. Springer Berlin / Heidelberg, 333--336.
[16]
Shaked Flur, Kathryn E. Gray, Christopher Pulte, Susmit Sarkar, Ali Sezgin, Luc Maranget, Will Deacon, and Peter Sewell. 2016. Modelling the ARMv8 Architecture, Operationally: Concurrency and ISA. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '16). ACM, New York, NY, USA, 608--621.
[17]
Matteo Frigo, Pablo Halpern, Charles E. Leiserson, and Stephen Lewin-Berlin. 2009. Reducers and Other Cilk++ Hyperobjects. In Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures (SPAA '09). ACM, New York, NY, USA, 79--90.
[18]
Luís Gil, Paulo F. Flores, and Luis Miguel Silveira. 2009. PMSat: A Parallel Version of MiniSAT. JSAT 6, 1--3 (2009), 71--98.
[19]
Harm Gunnar. 2011. Bobcat. http://github.com/Bobcat/bobcat.
[20]
Youssef Hamadi, Saïd Jabbour, and Lakhdar Sais. 2009. ManySAT: A Parallel SAT Solver. JSAT 6, 4 (2009), 245--262.
[21]
R. Haney, T. Meuse, J. Kepner, and J. Lebak. 2005. The HPEC challenge benchmark suite. In HPEC 2005 Workshop.
[22]
E.C.R. Hehner. 1993. A practical theory of programming. Springer.
[23]
T Joachims. 1999. Making large-Scale SVM Learning Practical. Advances in Kernel Methods Support Vector Learning (1999), 169--184. http://www-ai.cs.uni-dortmund.de/DOKUMENTE/joachims_99a.ps.gz
[24]
Laxmikant V. Kale and Sanjeev Krishnan. 1993. CHARM++: A Portable Concurrent Object Oriented System Based on C++. In Proceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications (OOPSLA '93). ACM, New York, NY, USA, 91--108.
[25]
Jeehoon Kang, Chung-Kil Hur, Ori Lahav, Viktor Vafeiadis, and Derek Dreyer. 2017. A Promising Semantics for Relaxed-memory Concurrency. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL 2017). ACM, New York, NY, USA, 175--189.
[26]
M. Koshimura, T. Zhang, H. Fujita, and R. Hasegawa. 2012. QMaxSAT: A Partial Max-SAT Solver system description. Journal on Satisfiability, Boolean Modeling and Computation 8 (2012), 95--100.
[27]
Milind Kulkarni, Keshav Pingali, Bruce Walter, Ganesh Ramanarayanan, Kavita Bala, and L. Paul Chew. 2007. Optimistic parallelism requires abstractions. In Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation (PLDI '07). ACM, New York, NY, USA, 211--222.
[28]
Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In CGO '04: Proceedings of the International Symposium on Code Generation and Optimization. IEEE Computer Society, Washington, DC, USA, 75.
[29]
Adam Letchford and Andrea Lodi. 2003. An Augment-and-Branch-and-Cut Framework for Mixed 0-1 Programming. In Combinatorial Optimization - Eureka, You Shrink!, Michael Jünger, Gerhard Reinelt, and Giovanni Rinaldi (Eds.). Lecture Notes in Computer Science, Vol. 2570. Springer Berlin / Heidelberg, 119--133.
[30]
S. Luke. 2010. Essentials of Metaheuristics.
[31]
Donald Michie. 1968. Memo Functions and Machine Learning. Nature 218, 5136 (1968), 19--22.
[32]
Kyndylan Nienhuis, Kayvan Memarian, and Peter Sewell. 2016. An Operational Semantics for C/C++11 Concurrency. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2016). ACM, New York, NY, USA, 111--128.
[33]
Arun Raman, Ayal Zaks, Jae W. Lee, and David I. August. 2012. Parcae: a system for flexible parallel execution. In Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI '12). ACM, New York, NY, USA, 133--144.
[34]
Lawrence Rauchwerger, Francisco Arzu, and Koji Ouchi. 1998. Standard Templates Adaptive Parallel Library (STAPL). In LCR. 402--409.
[35]
Lakshminarayanan Renganarayana, Vijayalakshmi Srinivasan, Ravi Nair, and Daniel Prener. 2012. Programming with Relaxed Synchronization. In Proceedings of the 2012 ACM Workshop on Relaxing Synchronization for Multicore and Manycore Scalability (RACES '12). ACM, New York, NY, USA, 41--50.
[36]
Martin Rinard. 2013. Parallel Synchronization-Free Approximate Data Structure Construction. In Presented as part of the 5th USENIX Workshop on Hot Topics in Parallelism. USENIX, San Jose, CA. https://www.usenix.org/conference/hotpar13/workshop-program/presentation/Rinard
[37]
Martin C. Rinard. 1994. The design, implementation and evaluation of Jade, a portable, implicitly parallel programming language. Ph.D. Dissertation.
[38]
Martin C Rinard. 2012. Unsynchronized Techniques for Approximate Parallel Computing. RACES (2012).
[39]
Martin C. Rinard and Pedro C. Diniz. 2003. Eliminating synchronization bottlenecks using adaptive replication. ACM Trans. Program. Lang. Syst. 25, 3 (May 2003), 316--359.
[40]
Susmit Sarkar, Kayvan Memarian, Scott Owens, Mark Batty, Peter Sewell, Luc Maranget, Jade Alglave, and Derek Williams. 2012. Synchronising C/C++ and POWER. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '12). ACM, New York, NY, USA, 311--322.
[41]
D.J. Slate. 1987. A chess program that uses transposition table to learn from experience. International Computer Chess Association Journal 10, 2 (1987), 59--71.
[42]
M. Süß and C. Leopold. 2006. Implementing Irregular Parallel Algorithms with OpenMP. Euro-Par 2006 Parallel Processing (2006), 635--644.
[43]
V. Tabatabaee, A. Tiwari, and J.K. Hollingsworth. 2005. Parallel Parameter Tuning for Applications with Performance Variability. In Supercomputing, 2005. Proceedings of the ACM/IEEE SC 2005 Conference. 57.
[44]
E.G. Talbi. 2006. Parallel combinatorial optimization. Vol. 58. Wiley-Blackwell.
[45]
George Teodoro and Alan Sussman. 2011. AARTS: low overhead online adaptive auto-tuning. In Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era (EXADAPT '11). ACM, New York, NY, USA, 1--11.
[46]
Abhishek Udupa, Kaushik Rajan, and William Thies. 2011. ALTER: exploiting breakable dependences for parallelization. In Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation (PLDI '11). ACM, New York, NY, USA, 480--491.
[47]
Weixiong Zhang. 2001. Phase Transitions and Backbones of 3-SAT and Maximum 3-SAT. In Principles and Practice of Constraint Programming - CP 2001, Toby Walsh (Ed.). Lecture Notes in Computer Science, Vol. 2239. Springer Berlin / Heidelberg, 153--167.

Cited By

View all
  • (2023)Efficient Parallel Functional Programming with EffectsProceedings of the ACM on Programming Languages10.1145/35912847:PLDI(1558-1583)Online publication date: 6-Jun-2023

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques
November 2018
494 pages
ISBN:9781450359863
DOI:10.1145/3243176
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

  • IFIP WG 10.3: IFIP WG 10.3
  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 November 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. adaptivity
  2. automatic parallelization
  3. implicit parallelism
  4. memoization
  5. programming model
  6. runtime
  7. tuning
  8. weakly consistent data structures

Qualifiers

  • Research-article

Conference

PACT '18
Sponsor:

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Upcoming Conference

PACT '24

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)8
  • Downloads (Last 6 weeks)0
Reflects downloads up to 01 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Efficient Parallel Functional Programming with EffectsProceedings of the ACM on Programming Languages10.1145/35912847:PLDI(1558-1583)Online publication date: 6-Jun-2023

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media