research-article

ASPIRE: exploiting asynchronous parallelism in iterative algorithms using a relaxed consistency based DSM

Authors:

Sai Charan Koduru,

Rajiv GuptaAuthors Info & Claims

OOPSLA '14: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications

Pages 861 - 878

https://doi.org/10.1145/2660193.2660227

Published: 15 October 2014 Publication History

Abstract

Many vertex-centric graph algorithms can be expressed using asynchronous parallelism by relaxing certain read-after-write data dependences and allowing threads to compute vertex values using stale (i.e., not the most recent) values of their neighboring vertices. We observe that on distributed shared memory systems, by converting synchronous algorithms into their asynchronous counterparts, algorithms can be made tolerant to high inter-node communication latency. However, high inter-node communication latency can lead to excessive use of stale values causing an increase in the number of iterations required by the algorithms to converge. Although by using bounded staleness we can restrict the slowdown in the rate of convergence, this also restricts the ability to tolerate communication latency. In this paper we design a relaxed memory consistency model and consistency protocol that simultaneously tolerate communication latency and minimize the use of stale values. This is achieved via a coordinated use of best effort refresh policy and bounded staleness. We demonstrate that for a range of asynchronous graph algorithms and PDE solvers, on an average, our approach outperforms algorithms based upon: prior relaxed memory models that allow stale values by at least 2.27x; and Bulk Synchronous Parallel (BSP) model by 4.2x. We also show that our approach frequently outperforms GraphLab, a popular distributed graph processing framework.

References

[1]

Apache Giraph. http://giraph.apache.org/.

[2]

M. Ahamad, G. Neiger, J.E. Burns, P. Kohli, and P.W. Hutto. Causal Memory: Definitions, Implementation and Programming. phDistributed Computing, 9 (1): 37--49, 1995.

[3]

C. Amza, A.L. Cox, W. Zwaenepoel, and S. Dwarkadas. Software DSM Protocols That Adapt Between Single Writer and Multiple Writer. phHPCA, pages 261--271, 1997.

Digital Library

[4]

A. Bourchtein. Atmospheric models. http://www.cise.ufl.edu/research/sparse/matrices/Bourchtein/atmosmodl.html, 2009.

[5]

H.E. Bal, M.F. Kaashoek, and A.S. Tanenbaum. Orca: A Language for Parallel Programming of Distributed Systems. phIEEE TSE, 18 (3): 190--205, 1992.

Digital Library

[6]

G.M. Baudet. Asynchronous Iterative Methods for Multiprocessors. phJACM, 25 (2): 226--244, 1978.

Digital Library

[7]

B.N. Bershad and M.J. Zekauskas. Midway: Shared Memory Parallel Programming with Entry Consistency for Distributed Memory Multiprocessors. phTR, Carnegie Mellon University-CS-91--170, 1991.

[8]

J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Implementation and Performance of Munin. phSOSP, pages 152--164, 1991.

Digital Library

[9]

J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for Reducing Consistency-related Communication in Distributed Shared-memory Systems. phACM TOCS, 13 (3): 205--243, 1995.

Digital Library

[10]

D. Chaiken, C. Fields, K. Kurihara, and A. Agarwal. Directory-Based Cache Coherence in Large-Scale Multiprocessors. phComputer, 23 (6): 49--58, 1990.

Digital Library

[11]

D. Chen, C. Tang, B. Sanders, S. Dwarkadas, and M.L. Scott. Exploiting High-level Coherence Information to Optimize Distributed Shared State. phPPoPP, pages 131--142, 2003.

Digital Library

[12]

Y-S. Cheng, M. Neely, and K. M. Chugg. Iterative Message Passing Algorithm for Bipartite Maximum Weighted Matching. In phIEEE International Symposium on Information Theory, pages 1934--1938. 2006.

[13]

J. Cipar, Q. Ho, J. K. Kim, S. Lee, G. R. Ganger, G. Gibson, K. Keeton, and E. Xing. Solving the Straggler Problem with Bounded Staleness. phHotOS, pages 22--22, 2013.

Digital Library

[14]

W. L. M. D. Chazan. Chaotic relaxation. In phLinear Algebra and Its Application, pages 2:199--222, 1969.

[15]

J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. phCACM, 51 (1): 107--113, 2008.

Digital Library

[16]

A. Dziekonski, A. Lamecki, and M. Mrozowski. High-order vector finite element method in EM. http://www.cise.ufl.edu/research/sparse/matrices/Dziekonski/dielFilterV3real.html, 2011.

[17]

C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang, and C. Zhang. Software Behavior Oriented Parallelization. phPLDI, pages 223--234, 2007.

Digital Library

[18]

C. Janna, and M. Ferronato. 3D model of a steel flange, hexahedral finite elements. http://www.cise.ufl.edu/research/sparse/matrices/Janna/Flan\_1565.html, 2011.

[19]

K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy. Memory Consistency and Event Ordering in Scalable Shared-memory Multiprocessors. phISCA, pages 15--26, 1990.

Digital Library

[20]

J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed Graph-parallel Computation on Natural Graphs. phOSDI, pages 17--30, 2012.

Digital Library

[21]

J. R. Goodman. phCache Consistency and Sequential Consistency. Univ. of Wisconsin-Madison, CS Department, 1991.

[22]

A. Heddaya and H. Sinha. An overview of Mermera: A system and formalism for non-coherent distributed parallel memory. phHawaii International Conf. on System Sciences, vol. 2, pages 164--173, 1993.

[23]

A. Heddaya and H. Sinha. phCoherence, Non-coherence and Local Consistency in Distributed Shared Memory for Parallel Computing. TR BU-CS-92-004, Boston Univ., 1992.

[24]

A. Heddaya and H. Sinha. phAn Implementation of Mermera: A Shared Memory System that Mixes Coherence with Non-coherence. TR BUCS-TR-1993-006, Boston Univ., 1993.

Digital Library

[25]

M. De Domenico, A. Lima, P. Mougel, and M. Musolesi. The Anatomy of a Scientific Rumor. Scientific Reports, 2013.

[26]

P. W. Hutto and M. Ahamad. Slow memory: Weakening consistency to enhance concurrency in distributed shared memories. phICDCS, pages 302--309, 1990.

[27]

L. Iftode, J. P. Singh, and K. Li. Scope Consistency: A Bridge Between Release Consistency and Entry Consistency. phSPAA, pages 277--287, 1996.

Digital Library

[28]

V. Iosevich and A. Schuster. Distributed Shared Memory: To Relax or Not to Relax? In M. Danelutto, M. Vanneschi, and D. Laforenza, editors, phEuro-Par, phLNCS 3149, pages 198--205, Springer, 2004.

[29]

U. Kang, D. Horng, et al. Inference of Beliefs on Billion-Scale Graphs. phKDD-LDMTA, 2010.

[30]

G. Karypis, V. Kumar. A Fast and Highly Quality Multilevel Scheme for Partitioning Irregular Graphs. phSIAM Journal on Scientific Computing, Vol. 20, pp. 359--392, 1999.

Digital Library

[31]

P. Keleher, A.L. Cox, and W. Zwaenepoel. phLazy Release Consistency for Software Distributed Shared Memory, phISCA, pages 13--21, 1992.

Digital Library

[32]

P. Keleher, A. L. Cox, S. Dwarkadas, and W. Zwaenepoel. TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems. phWTEC.pages 10--10, 1994.

Digital Library

[33]

SC. Koduru, M. Feng, and R. Gupta. Programming Large Dynamic Data Structures on a DSM Cluster of Multicores. phPGAS Programming Models, 2013.

[34]

L. Kontothanassis, R. Stets, G. Hunt, U. Rencuzogullari, G. Altekar, S. Dwarkadas, and M.L. Scott. Shared Memory Computing on Clusters with Symmetric Multiprocessors and System Area Networks. phTOCS, 23 (3): 301--335, 2005.

Digital Library

[35]

A. Kristensen and C. Low. Problem-oriented Object Memory: Customizing Consistency. phOOPSLA, pages 399--413, 1995.

Digital Library

[36]

M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L.P. Chew. Optimistic Parallelism Requires Abstractions. phPLDI, pages 211--222, 2007.

Digital Library

[37]

A. Kyrola, G. Blelloch, C. Guestrin GraphChi: Large-scale Graph Computation on Just a PC. phOSDI, pages 31--46, 2012.

Digital Library

[38]

L. Lamport. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs. phIEEE TC, C-28 (9): 690--691, 1979.

Digital Library

[39]

L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. phCACM, 21 (7): 558--565, 1978.

Digital Library

[40]

D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. phThe Directory-based Cache Coherence Protocol for the DASH Multiprocessor, phISCA, pages 148--159, 1990.

Digital Library

[41]

J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. Internet Mathematics, pages 29--123, 2009.

[42]

I. Lipkind, I. Pechtchanski, and V. Karamcheti. Object Views: Language Support for Intelligent Object Caching in Parallel and Distributed Computations. phOOPSLA, pages 447--460, 1999.

Digital Library

[43]

R. Lipton and J. Sandberg. phPRAM: A Scalable Shared Memory. Princeton University, Department of Computer Science, TR-180--88, 1988.

[44]

L. Liu and Z. Li. Improving Parallelism and Locality with Asynchronous Algorithms. phPPoPP, pages 213--222, 2010.

Digital Library

[45]

X. Liu and T. Murata. Advanced modularity-specialized label propagation algorithm for detecting communities in networks. phPhysica A: Statistical Mechanics and its Applications, 389 (7): 1493--1500, 2010.

[46]

Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud. phProc. VLDB Endow., 5 (8): 716--727, 2012.

Digital Library

[47]

G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A System for Large-scale Graph Processing.phSIGMOD, pages 135--146, 2010.

Digital Library

[48]

R. Meyers and Z. Li. ASYNC Loop Constructs for Relaxed Synchronization. In phLCPC,phLanguages and Compilers for Parallel Computing, pages 292--303, 2008.

Digital Library

[49]

D. Mosberger. Memory Consistency Models. phSIGOPS Oper. Syst. Rev., 27 (1): 18--26, 1993.

Digital Library

[50]

D. Nguyen, L. Andrew and K. Pingali. A Lightweight Infrastructure for Graph Analytics. phSOSP, 2013.

Digital Library

[51]

W-Y. Liang, C-T. King, and F. Lai. Adsmith: An Efficient Object-Based Distributed Shared Memory System on PVM. phInternational Symposium on Parallel Architectures, Algorithms, and Networks, pages 173--179, 1996.

Digital Library

[52]

J. Yang and J. Leskovec. Defining and Evaluating Network Communities based on Ground-truth. ICDM, 2012.

Digital Library

[53]

L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. 1999.

[54]

L. Takac and M. Zabovsky. Data analysis in public social networks. International Scientific Conference and International Workshop Present Day Trends of Innovations, 2012.

[55]

L. Rauchwerger and D. A. Padua. The LRPD Test: Speculative Run-time Parallelization of Loops with Privatization and Reduction Parallelization. PLDI, 218--232, 1995.

Digital Library

[56]

D. J. Scales and K. Gharachorloo. Design and Performance of the Shasta Distributed Shared Memory Protocol. phICS, pages 245--252, 1997.

Digital Library

[57]

M. Schulz, J. Tao, and W. Karl. Improving the Scalability of Shared Memory Systems through Relaxed Consistency. phWC3, 2002.

[58]

X. Shen, Arvind, and L. Rudolph. CACHET: An Adaptive Cache Coherence Protocol for Distributed Shared-memory Systems. phICS, pages 135--144, 1999.

Digital Library

[59]

J. Shun, and G. Blelloch. Ligra: A Lightweight Graph Processing Framework for Shared Memory. phPPoPP, pages 135--146, 2013.

Digital Library

[60]

A. Singla, U. Ramachandran, and J. Hodgins. Temporal Notions of Synchronization and Consistency in Beehive. phSPAA, pages 211--220, 1997.

Digital Library

[61]

J. Leskovec. Stanford Large Network Dataset Collection. http://snap.stanford.edu/data/index.html, 2011.

[62]

C. Sinclair. 3-D spectral-element elastic wave modeling in freq. domain. http://www.cise.ufl.edu/research/sparse/matrices/Sinclair/3Dspectralwave.html, 2007.

[63]

T. A. Davis and Y. Hu. The University of Florida Sparse Matrix Collection. phACM Transactions on Mathematical Software, Vol 38, pages 1:1 - 1:25, 2011.

Digital Library

[64]

L. G. Valiant. A Bridging Model for Parallel Computation. phCACM, 33 (8): 103--111, 1990.

Digital Library

[65]

J. Leskovec, L. A. Adamic, and B. A. Huberman. The Dynamics of Viral Marketing. ACM Trans. Web, 2007.

Digital Library

[66]

G. Wang, W. Xie, A. Demers, and J. Gehrke. Asynchronous Large-Scale Graph Processing Made Easy. phCIDR, 2013.

[67]

B.-H. Yu, Z. Huang, S. Cranefield, and M. Purvis. Homeless and Home-based Lazy Release Consistency Protocols on Distributed Shared Memory. phAustralasian Conf. on Computer Science-Vol 26, pages 117--123, 2004.

Digital Library

[68]

Y. Zhou, L. Iftode, J. P. Sing, K. Li, B. R. Toonen, I. Schoinas, M. D. Hill, and D. A. Wood. Relaxed Consistency and Coherence Granularity in DSM Systems: A Performance Evaluation. phPPoPP, pages 193--205, 1997.

Digital Library

[69]

X. Zhu and Z. Ghahramani. Learning from Labeled and Unlabeled Data with Label Propagation. Technical Report Carnegie Mellon University-CALD-02--107,Carnegie Mellon University, 2002.

Cited By

Concessao KCheramangalath UDev RNasre R(2024)Meerkat: A Framework for Dynamic Graph Algorithms on GPUsInternational Journal of Parallel Programming10.1007/s10766-024-00774-z52:5-6(400-453)Online publication date: 1-Dec-2024
https://dl.acm.org/doi/10.1007/s10766-024-00774-z
Mazloumi AAfarin MGupta R(2023)Expressway: Prioritizing Edges for Distributed Evaluation of Graph Queries2023 IEEE International Conference on Big Data (BigData)10.1109/BigData59044.2023.10386860(4362-4371)Online publication date: 15-Dec-2023
https://doi.org/10.1109/BigData59044.2023.10386860
Sun XWang WQiu SYang RHuang SXu JWang Z(2022)STRONGHOLD: Fast and Affordable Billion-Scale Deep Learning Model TrainingSC22: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41404.2022.00076(1-17)Online publication date: Nov-2022
https://doi.org/10.1109/SC41404.2022.00076
Show More Cited By

Index Terms

ASPIRE: exploiting asynchronous parallelism in iterative algorithms using a relaxed consistency based DSM
1. Computing methodologies
  1. Distributed computing methodologies
    1. Distributed programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Concurrent programming structures
      2. Language types
        Distributed programming languages

Recommendations

ASPIRE: exploiting asynchronous parallelism in iterative algorithms using a relaxed consistency based DSM
OOPSLA '14

Many vertex-centric graph algorithms can be expressed using asynchronous parallelism by relaxing certain read-after-write data dependences and allowing threads to compute vertex values using stale (i.e., not the most recent) values of their neighboring ...
Moving Address Translation Closer to Memory in Distributed Shared-Memory Multiprocessors

To support a global virtual memory space, an architecture must translate virtual addresses dynamically. In current processors, the translation is done in a TLB (Translation Lookaside Buffer), before or in parallel with the first-level cache access. As ...
Minnow: Lightweight Offload Engines for Worklist Management and Worklist-Directed Prefetching
ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems

The importance of irregular applications such as graph analytics is rapidly growing with the rise of Big Data. However, parallel graph workloads tend to perform poorly on general-purpose chip multiprocessors (CMPs) due to poor cache locality, low ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

OOPSLA '14: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications

October 2014

946 pages

ISBN:9781450325851

DOI:10.1145/2660193

General Chair:
Andrew Black
Portland State University, USA
,
Program Chair:
Todd Millstein
University of California, Los Angeles, USA

ACM SIGPLAN Notices Volume 49, Issue 10
OOPSLA '14
October 2014
907 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2714064
Editor:
Andy Gill
University of Kansas, Lawrence, KS
Issue’s Table of Contents

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

In-Cooperation

SIGAda: ACM Special Interest Group on Ada Programming Language

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 October 2014

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Conference

SPLASH '14

Sponsor:

SIGPLAN

SPLASH '14: Conference on Systems, Programming, and Applications: Software for Humanity

October 20 - 24, 2014

Oregon, Portland, USA

Acceptance Rates

OOPSLA '14 Paper Acceptance Rate 52 of 186 submissions, 28%;

Overall Acceptance Rate 268 of 1,244 submissions, 22%

Upcoming Conference

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

56
Total Citations
View Citations
882
Total Downloads

Downloads (Last 12 months)19
Downloads (Last 6 weeks)2

Reflects downloads up to 29 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Concessao KCheramangalath UDev RNasre R(2024)Meerkat: A Framework for Dynamic Graph Algorithms on GPUsInternational Journal of Parallel Programming10.1007/s10766-024-00774-z52:5-6(400-453)Online publication date: 1-Dec-2024
https://dl.acm.org/doi/10.1007/s10766-024-00774-z
Mazloumi AAfarin MGupta R(2023)Expressway: Prioritizing Edges for Distributed Evaluation of Graph Queries2023 IEEE International Conference on Big Data (BigData)10.1109/BigData59044.2023.10386860(4362-4371)Online publication date: 15-Dec-2023
https://doi.org/10.1109/BigData59044.2023.10386860
Sun XWang WQiu SYang RHuang SXu JWang Z(2022)STRONGHOLD: Fast and Affordable Billion-Scale Deep Learning Model TrainingSC22: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41404.2022.00076(1-17)Online publication date: Nov-2022
https://doi.org/10.1109/SC41404.2022.00076
Jiang XXu CGupta R(2021)VRGQACM SIGOPS Operating Systems Review10.1145/3469379.346938255:1(11-20)Online publication date: 6-Jun-2021
https://dl.acm.org/doi/10.1145/3469379.3469382
Zhuo YChen JRao GLuo QWang YYang HQian DQian X(2021)Distributed Graph Processing System and Processing-in-memory Architecture with Precise Loop-carried Dependency GuaranteeACM Transactions on Computer Systems10.1145/345368137:1-4(1-37)Online publication date: 1-Jul-2021
https://dl.acm.org/doi/10.1145/3453681
Zhuo YChen JLuo QWang YYang HQian DQian XDonaldson ATorlak E(2020)SympleGraph: distributed graph processing with precise loop-carried dependency guaranteeProceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3385412.3385961(592-607)Online publication date: 11-Jun-2020
https://dl.acm.org/doi/10.1145/3385412.3385961
Zhang YBrahmakshatriya AChen XDhulipala LKamil SAmarasinghe SShun JMars JTang LXue JWu P(2020)Optimizing ordered graph algorithms with GraphItProceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization10.1145/3368826.3377909(158-170)Online publication date: 22-Feb-2020
https://dl.acm.org/doi/10.1145/3368826.3377909
Rahman SAbu-Ghazaleh NGupta R(2020)GraphPulse: An Event-Driven Hardware Accelerator for Asynchronous Graph Processing2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO50266.2020.00078(908-921)Online publication date: Oct-2020
https://doi.org/10.1109/MICRO50266.2020.00078
Xu CMazloumi AJiang XGupta R(2020)SimGQ: Simultaneously Evaluating Iterative Graph Queries2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)10.1109/HiPC50609.2020.00014(1-10)Online publication date: Dec-2020
https://doi.org/10.1109/HiPC50609.2020.00014
Mazloumi AXu CZhao ZGupta R(2020)BEAD: Batched Evaluation of Iterative Graph Queries with Evolving Analytics Demands2020 IEEE International Conference on Big Data (Big Data)10.1109/BigData50022.2020.9378211(461-468)Online publication date: 10-Dec-2020
https://doi.org/10.1109/BigData50022.2020.9378211
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents