Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2660193.2660227acmconferencesArticle/Chapter ViewAbstractPublication PagessplashConference Proceedingsconference-collections
research-article

ASPIRE: exploiting asynchronous parallelism in iterative algorithms using a relaxed consistency based DSM

Published: 15 October 2014 Publication History

Abstract

Many vertex-centric graph algorithms can be expressed using asynchronous parallelism by relaxing certain read-after-write data dependences and allowing threads to compute vertex values using stale (i.e., not the most recent) values of their neighboring vertices. We observe that on distributed shared memory systems, by converting synchronous algorithms into their asynchronous counterparts, algorithms can be made tolerant to high inter-node communication latency. However, high inter-node communication latency can lead to excessive use of stale values causing an increase in the number of iterations required by the algorithms to converge. Although by using bounded staleness we can restrict the slowdown in the rate of convergence, this also restricts the ability to tolerate communication latency. In this paper we design a relaxed memory consistency model and consistency protocol that simultaneously tolerate communication latency and minimize the use of stale values. This is achieved via a coordinated use of best effort refresh policy and bounded staleness. We demonstrate that for a range of asynchronous graph algorithms and PDE solvers, on an average, our approach outperforms algorithms based upon: prior relaxed memory models that allow stale values by at least 2.27x; and Bulk Synchronous Parallel (BSP) model by 4.2x. We also show that our approach frequently outperforms GraphLab, a popular distributed graph processing framework.

References

[1]
Apache Giraph. http://giraph.apache.org/.
[2]
M. Ahamad, G. Neiger, J.E. Burns, P. Kohli, and P.W. Hutto. Causal Memory: Definitions, Implementation and Programming. phDistributed Computing, 9 (1): 37--49, 1995.
[3]
C. Amza, A.L. Cox, W. Zwaenepoel, and S. Dwarkadas. Software DSM Protocols That Adapt Between Single Writer and Multiple Writer. phHPCA, pages 261--271, 1997.
[4]
A. Bourchtein. Atmospheric models. http://www.cise.ufl.edu/research/sparse/matrices/Bourchtein/atmosmodl.html, 2009.
[5]
H.E. Bal, M.F. Kaashoek, and A.S. Tanenbaum. Orca: A Language for Parallel Programming of Distributed Systems. phIEEE TSE, 18 (3): 190--205, 1992.
[6]
G.M. Baudet. Asynchronous Iterative Methods for Multiprocessors. phJACM, 25 (2): 226--244, 1978.
[7]
B.N. Bershad and M.J. Zekauskas. Midway: Shared Memory Parallel Programming with Entry Consistency for Distributed Memory Multiprocessors. phTR, Carnegie Mellon University-CS-91--170, 1991.
[8]
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Implementation and Performance of Munin. phSOSP, pages 152--164, 1991.
[9]
J.B. Carter, J.K. Bennett, and W. Zwaenepoel. Techniques for Reducing Consistency-related Communication in Distributed Shared-memory Systems. phACM TOCS, 13 (3): 205--243, 1995.
[10]
D. Chaiken, C. Fields, K. Kurihara, and A. Agarwal. Directory-Based Cache Coherence in Large-Scale Multiprocessors. phComputer, 23 (6): 49--58, 1990.
[11]
D. Chen, C. Tang, B. Sanders, S. Dwarkadas, and M.L. Scott. Exploiting High-level Coherence Information to Optimize Distributed Shared State. phPPoPP, pages 131--142, 2003.
[12]
Y-S. Cheng, M. Neely, and K. M. Chugg. Iterative Message Passing Algorithm for Bipartite Maximum Weighted Matching. In phIEEE International Symposium on Information Theory, pages 1934--1938. 2006.
[13]
J. Cipar, Q. Ho, J. K. Kim, S. Lee, G. R. Ganger, G. Gibson, K. Keeton, and E. Xing. Solving the Straggler Problem with Bounded Staleness. phHotOS, pages 22--22, 2013.
[14]
W. L. M. D. Chazan. Chaotic relaxation. In phLinear Algebra and Its Application, pages 2:199--222, 1969.
[15]
J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. phCACM, 51 (1): 107--113, 2008.
[16]
A. Dziekonski, A. Lamecki, and M. Mrozowski. High-order vector finite element method in EM. http://www.cise.ufl.edu/research/sparse/matrices/Dziekonski/dielFilterV3real.html, 2011.
[17]
C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang, and C. Zhang. Software Behavior Oriented Parallelization. phPLDI, pages 223--234, 2007.
[18]
C. Janna, and M. Ferronato. 3D model of a steel flange, hexahedral finite elements. http://www.cise.ufl.edu/research/sparse/matrices/Janna/Flan\_1565.html, 2011.
[19]
K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy. Memory Consistency and Event Ordering in Scalable Shared-memory Multiprocessors. phISCA, pages 15--26, 1990.
[20]
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed Graph-parallel Computation on Natural Graphs. phOSDI, pages 17--30, 2012.
[21]
J. R. Goodman. phCache Consistency and Sequential Consistency. Univ. of Wisconsin-Madison, CS Department, 1991.
[22]
A. Heddaya and H. Sinha. An overview of Mermera: A system and formalism for non-coherent distributed parallel memory. phHawaii International Conf. on System Sciences, vol. 2, pages 164--173, 1993.
[23]
A. Heddaya and H. Sinha. phCoherence, Non-coherence and Local Consistency in Distributed Shared Memory for Parallel Computing. TR BU-CS-92-004, Boston Univ., 1992.
[24]
A. Heddaya and H. Sinha. phAn Implementation of Mermera: A Shared Memory System that Mixes Coherence with Non-coherence. TR BUCS-TR-1993-006, Boston Univ., 1993.
[25]
M. De Domenico, A. Lima, P. Mougel, and M. Musolesi. The Anatomy of a Scientific Rumor. Scientific Reports, 2013.
[26]
P. W. Hutto and M. Ahamad. Slow memory: Weakening consistency to enhance concurrency in distributed shared memories. phICDCS, pages 302--309, 1990.
[27]
L. Iftode, J. P. Singh, and K. Li. Scope Consistency: A Bridge Between Release Consistency and Entry Consistency. phSPAA, pages 277--287, 1996.
[28]
V. Iosevich and A. Schuster. Distributed Shared Memory: To Relax or Not to Relax? In M. Danelutto, M. Vanneschi, and D. Laforenza, editors, phEuro-Par, phLNCS 3149, pages 198--205, Springer, 2004.
[29]
U. Kang, D. Horng, et al. Inference of Beliefs on Billion-Scale Graphs. phKDD-LDMTA, 2010.
[30]
G. Karypis, V. Kumar. A Fast and Highly Quality Multilevel Scheme for Partitioning Irregular Graphs. phSIAM Journal on Scientific Computing, Vol. 20, pp. 359--392, 1999.
[31]
P. Keleher, A.L. Cox, and W. Zwaenepoel. phLazy Release Consistency for Software Distributed Shared Memory, phISCA, pages 13--21, 1992.
[32]
P. Keleher, A. L. Cox, S. Dwarkadas, and W. Zwaenepoel. TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems. phWTEC.pages 10--10, 1994.
[33]
SC. Koduru, M. Feng, and R. Gupta. Programming Large Dynamic Data Structures on a DSM Cluster of Multicores. phPGAS Programming Models, 2013.
[34]
L. Kontothanassis, R. Stets, G. Hunt, U. Rencuzogullari, G. Altekar, S. Dwarkadas, and M.L. Scott. Shared Memory Computing on Clusters with Symmetric Multiprocessors and System Area Networks. phTOCS, 23 (3): 301--335, 2005.
[35]
A. Kristensen and C. Low. Problem-oriented Object Memory: Customizing Consistency. phOOPSLA, pages 399--413, 1995.
[36]
M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L.P. Chew. Optimistic Parallelism Requires Abstractions. phPLDI, pages 211--222, 2007.
[37]
A. Kyrola, G. Blelloch, C. Guestrin GraphChi: Large-scale Graph Computation on Just a PC. phOSDI, pages 31--46, 2012.
[38]
L. Lamport. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs. phIEEE TC, C-28 (9): 690--691, 1979.
[39]
L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. phCACM, 21 (7): 558--565, 1978.
[40]
D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta, and J. Hennessy. phThe Directory-based Cache Coherence Protocol for the DASH Multiprocessor, phISCA, pages 148--159, 1990.
[41]
J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. Internet Mathematics, pages 29--123, 2009.
[42]
I. Lipkind, I. Pechtchanski, and V. Karamcheti. Object Views: Language Support for Intelligent Object Caching in Parallel and Distributed Computations. phOOPSLA, pages 447--460, 1999.
[43]
R. Lipton and J. Sandberg. phPRAM: A Scalable Shared Memory. Princeton University, Department of Computer Science, TR-180--88, 1988.
[44]
L. Liu and Z. Li. Improving Parallelism and Locality with Asynchronous Algorithms. phPPoPP, pages 213--222, 2010.
[45]
X. Liu and T. Murata. Advanced modularity-specialized label propagation algorithm for detecting communities in networks. phPhysica A: Statistical Mechanics and its Applications, 389 (7): 1493--1500, 2010.
[46]
Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud. phProc. VLDB Endow., 5 (8): 716--727, 2012.
[47]
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A System for Large-scale Graph Processing.phSIGMOD, pages 135--146, 2010.
[48]
R. Meyers and Z. Li. ASYNC Loop Constructs for Relaxed Synchronization. In phLCPC,phLanguages and Compilers for Parallel Computing, pages 292--303, 2008.
[49]
D. Mosberger. Memory Consistency Models. phSIGOPS Oper. Syst. Rev., 27 (1): 18--26, 1993.
[50]
D. Nguyen, L. Andrew and K. Pingali. A Lightweight Infrastructure for Graph Analytics. phSOSP, 2013.
[51]
W-Y. Liang, C-T. King, and F. Lai. Adsmith: An Efficient Object-Based Distributed Shared Memory System on PVM. phInternational Symposium on Parallel Architectures, Algorithms, and Networks, pages 173--179, 1996.
[52]
J. Yang and J. Leskovec. Defining and Evaluating Network Communities based on Ground-truth. ICDM, 2012.
[53]
L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. 1999.
[54]
L. Takac and M. Zabovsky. Data analysis in public social networks. International Scientific Conference and International Workshop Present Day Trends of Innovations, 2012.
[55]
L. Rauchwerger and D. A. Padua. The LRPD Test: Speculative Run-time Parallelization of Loops with Privatization and Reduction Parallelization. PLDI, 218--232, 1995.
[56]
D. J. Scales and K. Gharachorloo. Design and Performance of the Shasta Distributed Shared Memory Protocol. phICS, pages 245--252, 1997.
[57]
M. Schulz, J. Tao, and W. Karl. Improving the Scalability of Shared Memory Systems through Relaxed Consistency. phWC3, 2002.
[58]
X. Shen, Arvind, and L. Rudolph. CACHET: An Adaptive Cache Coherence Protocol for Distributed Shared-memory Systems. phICS, pages 135--144, 1999.
[59]
J. Shun, and G. Blelloch. Ligra: A Lightweight Graph Processing Framework for Shared Memory. phPPoPP, pages 135--146, 2013.
[60]
A. Singla, U. Ramachandran, and J. Hodgins. Temporal Notions of Synchronization and Consistency in Beehive. phSPAA, pages 211--220, 1997.
[61]
J. Leskovec. Stanford Large Network Dataset Collection. http://snap.stanford.edu/data/index.html, 2011.
[62]
C. Sinclair. 3-D spectral-element elastic wave modeling in freq. domain. http://www.cise.ufl.edu/research/sparse/matrices/Sinclair/3Dspectralwave.html, 2007.
[63]
T. A. Davis and Y. Hu. The University of Florida Sparse Matrix Collection. phACM Transactions on Mathematical Software, Vol 38, pages 1:1 - 1:25, 2011.
[64]
L. G. Valiant. A Bridging Model for Parallel Computation. phCACM, 33 (8): 103--111, 1990.
[65]
J. Leskovec, L. A. Adamic, and B. A. Huberman. The Dynamics of Viral Marketing. ACM Trans. Web, 2007.
[66]
G. Wang, W. Xie, A. Demers, and J. Gehrke. Asynchronous Large-Scale Graph Processing Made Easy. phCIDR, 2013.
[67]
B.-H. Yu, Z. Huang, S. Cranefield, and M. Purvis. Homeless and Home-based Lazy Release Consistency Protocols on Distributed Shared Memory. phAustralasian Conf. on Computer Science-Vol 26, pages 117--123, 2004.
[68]
Y. Zhou, L. Iftode, J. P. Sing, K. Li, B. R. Toonen, I. Schoinas, M. D. Hill, and D. A. Wood. Relaxed Consistency and Coherence Granularity in DSM Systems: A Performance Evaluation. phPPoPP, pages 193--205, 1997.
[69]
X. Zhu and Z. Ghahramani. Learning from Labeled and Unlabeled Data with Label Propagation. Technical Report Carnegie Mellon University-CALD-02--107,Carnegie Mellon University, 2002.

Cited By

View all
  • (2024)Meerkat: A Framework for Dynamic Graph Algorithms on GPUsInternational Journal of Parallel Programming10.1007/s10766-024-00774-z52:5-6(400-453)Online publication date: 1-Dec-2024
  • (2023)Expressway: Prioritizing Edges for Distributed Evaluation of Graph Queries2023 IEEE International Conference on Big Data (BigData)10.1109/BigData59044.2023.10386860(4362-4371)Online publication date: 15-Dec-2023
  • (2022)STRONGHOLD: Fast and Affordable Billion-Scale Deep Learning Model TrainingSC22: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41404.2022.00076(1-17)Online publication date: Nov-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
OOPSLA '14: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications
October 2014
946 pages
ISBN:9781450325851
DOI:10.1145/2660193
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 49, Issue 10
    OOPSLA '14
    October 2014
    907 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2714064
    • Editor:
    • Andy Gill
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 October 2014

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. best effort refresh
  2. bounded staleness
  3. communication latency
  4. distributed shared memory
  5. graph analytics
  6. graph mining
  7. pde solvers

Qualifiers

  • Research-article

Funding Sources

Conference

SPLASH '14
Sponsor:

Acceptance Rates

OOPSLA '14 Paper Acceptance Rate 52 of 186 submissions, 28%;
Overall Acceptance Rate 268 of 1,244 submissions, 22%

Upcoming Conference

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)19
  • Downloads (Last 6 weeks)2
Reflects downloads up to 29 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Meerkat: A Framework for Dynamic Graph Algorithms on GPUsInternational Journal of Parallel Programming10.1007/s10766-024-00774-z52:5-6(400-453)Online publication date: 1-Dec-2024
  • (2023)Expressway: Prioritizing Edges for Distributed Evaluation of Graph Queries2023 IEEE International Conference on Big Data (BigData)10.1109/BigData59044.2023.10386860(4362-4371)Online publication date: 15-Dec-2023
  • (2022)STRONGHOLD: Fast and Affordable Billion-Scale Deep Learning Model TrainingSC22: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41404.2022.00076(1-17)Online publication date: Nov-2022
  • (2021)VRGQACM SIGOPS Operating Systems Review10.1145/3469379.346938255:1(11-20)Online publication date: 6-Jun-2021
  • (2021)Distributed Graph Processing System and Processing-in-memory Architecture with Precise Loop-carried Dependency GuaranteeACM Transactions on Computer Systems10.1145/345368137:1-4(1-37)Online publication date: 1-Jul-2021
  • (2020)SympleGraph: distributed graph processing with precise loop-carried dependency guaranteeProceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3385412.3385961(592-607)Online publication date: 11-Jun-2020
  • (2020)Optimizing ordered graph algorithms with GraphItProceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization10.1145/3368826.3377909(158-170)Online publication date: 22-Feb-2020
  • (2020)GraphPulse: An Event-Driven Hardware Accelerator for Asynchronous Graph Processing2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO50266.2020.00078(908-921)Online publication date: Oct-2020
  • (2020)SimGQ: Simultaneously Evaluating Iterative Graph Queries2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC)10.1109/HiPC50609.2020.00014(1-10)Online publication date: Dec-2020
  • (2020)BEAD: Batched Evaluation of Iterative Graph Queries with Evolving Analytics Demands2020 IEEE International Conference on Big Data (Big Data)10.1109/BigData50022.2020.9378211(461-468)Online publication date: 10-Dec-2020
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media