Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1109/ISCA45697.2020.00043acmconferencesArticle/Chapter ViewAbstractPublication PagesiscaConference Proceedingsconference-collections
research-article

GraphABCD: scaling out graph analytics with asynchronous block coordinate descent

Published: 23 September 2020 Publication History

Abstract

It is of vital importance to efficiently process large graphs for many data-intensive applications. As a result, a large collection of graph analytic frameworks has been proposed to improve the per-iteration performance on a single kind of computation resource. However, heavy coordination and synchronization overhead make it hard to scale out graph analytic frameworks from single platform to heterogeneous platforms. Furthermore, increasing the convergence rate, i.e. reducing the number of iterations, which is equally vital for improving the overall performance of iterative graph algorithms, receives much less attention.
In this paper, we introduce the Block Coordinate Descent (BCD) view of graph algorithms and propose an asynchronous heterogeneous graph analytic framework, GraphABCD, using the BCD view. The BCD view offers key insights and trade-offs on achieving high convergence rate of iterative graph algorithms. GraphABCD features fast convergence under the algorithm design options suggested by BCD. GraphABCD offers algorithm and architectural supports for asynchronous execution, without undermining its fast convergence properties. With minimum synchronization overhead, GraphABCD is able to scale out to heterogeneous and distributed accelerators efficiently. To demonstrate GraphABCD, we prototype its whole system on Intel HARPv2 CPU-FPGA heterogeneous platform. Evaluations on HARPv2 show that GraphABCD achieves geo-mean speedups of 4.8x and 2.0x over GraphMat, a state-of-the-art framework in terms of convergence rate and execution time, respectively.

References

[1]
J. Ahn, S. Hong, S. Yoo, O. Mutlu, and K. Choi, "A scalable processing-in-memory accelerator for parallel graph processing," in Proceedings of the 42nd Annual International Symposium on Computer Architecture, Portland, OR, USA, June 13-17, 2015, 2015, pp. 105--117.
[2]
Z. Ai, M. Zhang, Y. Wu, X. Qian, K. Chen, and W. Zheng, "Clip: A disk I/O focused parallel out-of-core graph processing system," IEEE Trans. Parallel Distrib. Syst., vol. 30, no. 1, pp. 45--62, 2019.
[3]
M. S. B. Altaf and D. A. Wood, "Logca: A high-level performance model for hardware accelerators," in Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA 2017, Toronto, ON, Canada, June 24-28, 2017, 2017, pp. 375--388.
[4]
L. Backstrom, D. P. Huttenlocher, J. M. Kleinberg, and X. Lan, "Group formation in large social networks: membership, growth, and evolution," in Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20-23, 2006, 2006, pp. 44--54.
[5]
J. Bennett, S. Lanning et al., "The netflix prize," in Proceedings of KDD cup and workshop, vol. 2007. New York, NY, USA., 2007, p. 35.
[6]
A. Ching, H. Choi, J. Homan, C. Kunz, O. O'Malley, J. Mannix, D. Ryaboy, C. Martella, R. Shaposhnik, S. Schelter, E. Koontz, A. Presta, E. Reisman, M. Kabiljo, N. Joffe, S. Edunov, P. Kumar, I. Kabiljo, and H. Eslami, "Apache giraph," 2013. [Online]. Available: http://giraph.apache.org/
[7]
G. Dai, T. Huang, Y. Chi, N. Xu, Y. Wang, and H. Yang, "Foregraph: Exploring large-scale graph processing on multi-fpga architecture," in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA 2017, Monterey, CA, USA, February 22-24, 2017, 2017, pp. 217--226.
[8]
R. Dathathri, G. Gill, L. Hoang, H. Dang, A. Brooks, N. Dryden, M. Snir, and K. Pingali, "Gluon: a communication-optimizing substrate for distributed heterogeneous graph analytics," in Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018, 2018, pp. 752--768.
[9]
N. Engelhardt and H. K. So, "Gravf: A vertex-centric distributed graph processing framework on fpgas," in 26th International Conference on Field Programmable Logic and Applications, FPL 2016, Lausanne, Switzerland, August 29 - September 2, 2016, 2016, pp. 1--4.
[10]
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin, "Powergraph: Distributed graph-parallel computation on natural graphs," in 10th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2012, Hollywood, CA, USA, October 8-10, 2012, 2012, pp. 17--30.
[11]
S. Grossman, H. Litz, and C. Kozyrakis, "Making pull-based graph processing performant," in Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2018, Vienna, Austria, February 24-28, 2018, A. Krall and T. R. Gross, Eds. ACM, 2018, pp. 246--260.
[12]
T. J. Ham, L. Wu, N. Sundaram, N. Satish, and M. Martonosi, "Graphicionado: A high-performance and energy-efficient accelerator for graph analytics," in 49th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2016, Taipei, Taiwan, October 15-19, 2016, 2016, pp. 56:1--56:13.
[13]
F. M. Harper and J. A. Konstan, "The movielens datasets: History and context," TiiS, vol. 5, no. 4, pp. 19:1--19:19, 2016.
[14]
S. Hong, S. K. Kim, T. Oguntebi, and K. Olukotun, "Accelerating CUDA graph algorithms at maximum warp," in Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2011, San Antonio, TX, USA, February 12-16, 2011, 2011, pp. 267--276.
[15]
F. Khorasani, K. Vora, R. Gupta, and L. N. Bhuyan, "Cusha: vertex-centric graph processing on gpus," in The 23rd International Symposium on High-Performance Parallel and Distributed Computing, HPDC'14, Vancouver, BC, Canada - June 23 - 27, 2014, 2014, pp. 239--252.
[16]
D. Kotkov, J. A. Konstan, Q. Zhao, and J. Veijalainen, "Investigating serendipity in recommender systems based on real user feedback," in Proceedings of the 33rd Annual ACM Symposium on Applied Computing, SAC 2018, Pau, France, April 09-13, 2018, 2018, pp. 1341--1350.
[17]
H. Kwak, C. Lee, H. Park, and S. B. Moon, "What is twitter, a social network or a news media?" in Proceedings of the 19th International Conference on World Wide Web, WWW 2010, Raleigh, North Carolina, USA, April 26-30, 2010, M. Rappa, P. Jones, J. Freire, and S. Chakrabarti, Eds. ACM, 2010, pp. 591--600.
[18]
A. Kyrola, G. E. Blelloch, and C. Guestrin, "Graphchi: Large-scale graph computation on just a PC," in 10th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2012, Hollywood, CA, USA, October 8-10, 2012, 2012, pp. 31--46.
[19]
K. Lakhotia, R. Kannan, S. Pati, and V. K. Prasanna, "GPOP: a cache and memory-efficient framework for graph processing over partitions," in Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2019, Washington, DC, USA, February 16-20, 2019, 2019, pp. 393--394.
[20]
A. Lenharth, D. Nguyen, and K. Pingali, "Parallel graph analytics," Commun. ACM, vol. 59, no. 5, pp. 78--87, Apr. 2016.
[21]
J. Leskovec, D. P. Huttenlocher, and J. M. Kleinberg, "Predicting positive and negative links in online social networks," in Proceedings of the 19th International Conference on World Wide Web, WWW 2010, Raleigh, North Carolina, USA, April 26-30, 2010, 2010, pp. 641--650.
[22]
J. Leskovec, D. P. Huttenlocher, and J. M. Kleinberg, "Signed networks in social media," in Proceedings of the 28th International Conference on Human Factors in Computing Systems, CHI 2010, Atlanta, Georgia, USA, April 10-15, 2010. ACM, 2010, pp. 1361--1370.
[23]
J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney, "Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters," arXiv e-prints, p. arXiv:0810.1355, Oct. 2008.
[24]
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein, "Distributed graphlab: A framework for machine learning in the cloud," PVLDB, vol. 5, no. 8, pp. 716--727, 2012.
[25]
L. Luo and Y. Liu, "Improving parallel efficiency for asynchronous graph analytics using gauss-seidel-based matrix computation," Concurrency and Computation: Practice and Experience, vol. 31, no. 17, 2019.
[26]
G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, "Pregel: a system for large-scale graph processing," in Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, June 6-10, 2010, 2010, pp. 135--146.
[27]
D. Nguyen, A. Lenharth, and K. Pingali, "A lightweight infrastructure for graph analytics," in Proceedings of ACM Symposium on Operating Systems Principles, ser. SOSP '13, 2013, pp. 456--471.
[28]
E. Nurvitadhi, G. Weisz, Y. Wang, S. Hurkat, M. Nguyen, J. C. Hoe, J. F. Martínez, and C. Guestrin, "Graphgen: An FPGA framework for vertex-centric graph computation," in 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014, Boston, MA, USA, May 11-13, 2014, 2014, pp. 25--28.
[29]
J. Nutini, I. Laradji, and M. Schmidt, "Let's Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence," ArXiv e-prints, Dec. 2017.
[30]
J. Nutini, M. W. Schmidt, I. H. Laradji, M. P. Friedlander, and H. A. Koepke, "Coordinate descent converges faster with the gauss-southwell rule than random selection," in Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, 2015, pp. 1632--1641.
[31]
T. Oguntebi and K. Olukotun, "Graphops: A dataflow library for graph analytics acceleration," in Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, February 21--23, 2016, 2016, pp. 111--117.
[32]
N. Oliver, R. R. Sharma, S. Chang, B. Chitlur, E. Garcia, J. Grecco, A. Grier, N. Ijih, Y. Liu, P. Marolia, H. Mitchel, S. Subhaschandra, A. Sheiman, T. Whisonant, and P. Gupta, "A reconfigurable computing system based on a cache-coherent fabric," in 2011 International Conference on Reconfigurable Computing and FPGAs, ReConFig 2011, Cancun, Mexico, November 30 - December 2, 2011, 2011, pp. 80--85.
[33]
M. M. Ozdal, S. Yesil, T. Kim, A. Ayupov, J. Greth, S. M. Burns, and Ö. Özturk, "Energy efficient architecture for graph analytics accelerators," in 43rd ACM/IEEE Annual International Symposium on Computer Architecture, ISCA 2016, Seoul, South Korea, June 18-22, 2016, 2016, pp. 166--177.
[34]
B. Recht, C. Ré, S. J. Wright, and F. Niu, "Hogwild: A lock-free approach to parallelizing stochastic gradient descent," in Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, Granada, Spain, J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. C. N. Pereira, and K. Q. Weinberger, Eds., 2011, pp. 693--701.
[35]
A. H. N. Sabet, J. Qiu, and Z. Zhao, "Tigr: Transforming irregular graphs for gpu-friendly graph processing," in Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2018, Williamsburg, VA, USA, March 24-28, 2018, 2018, pp. 622--636.
[36]
J. Shun and G. E. Blelloch, "Ligra: a lightweight graph processing framework for shared memory," in ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, Shenzhen, China, February 23-27, 2013, 2013, pp. 135--146.
[37]
J. Shun, L. Dhulipala, and G. E. Blelloch, "Smaller and faster: Parallel processing of compressed graphs with ligra+," in 2015 Data Compression Conference, DCC 2015, Snowbird, UT, USA, April 7-9, 2015, A. Bilgin, M. W. Marcellin, J. Serra-Sagristà, and J. A. Storer, Eds. IEEE, 2015, pp. 403--412.
[38]
P. Stutz, A. Bernstein, and W. W. Cohen, "Signal/collect: Graph algorithms for the (semantic) web," in The Semantic Web - ISWC 2010 - 9th International Semantic Web Conference, ISWC 2010, Shanghai, China, November 7-11, 2010, Revised Selected Papers, Part I, 2010, pp. 764--780.
[39]
T. Sun, R. Hannah, and W. Yin, "Asynchronous Coordinate Descent Under More Realistic Assumption," in Proceedings of the 31st International Conference on Neural Information Processing Systems, ser. NIPS'17. USA: Curran Associates Inc., 2017, pp. 6183--6191, event-place: Long Beach, California, USA.
[40]
N. Sundaram, N. Satish, M. M. A. Patwary, S. Dulloor, M. J. Anderson, S. G. Vadlamudi, D. Das, and P. Dubey, "Graphmat: High performance graph analytics made productive," PVLDB, vol. 8, no. 11, pp. 1214--1225, 2015.
[41]
L. Takac and M. Zabovsky, "Data analysis in public social networks," International Scientific Conference and International Workshop Present Day Trends of Innovations, pp. 1--6, 01 2012.
[42]
S. Tasci and M. Demirbas, "Giraphx: Parallel yet serializable large-scale graph processing," in Euro-Par 2013 Parallel Processing - 19th International Conference, Aachen, Germany, August 26-30, 2013. Proceedings, 2013, pp. 458--469.
[43]
P. Tseng, "Convergence of a block coordinate descent method for nondifferentiable minimization," Journal of optimization theory and applications, vol. 109, no. 3, pp. 475--494, 2001.
[44]
D. Voitsechov and Y. Etsion, "Single-graph multiple flows: Energy efficient design alternative for gpgpus," in ACM/IEEE 41st International Symposium on Computer Architecture, ISCA 2014, Minneapolis, MN, USA, June 14-18, 2014, 2014, pp. 205--216.
[45]
Y. Wang, S. Baxter, and J. D. Owens, "Mini-gunrock: A lightweight graph analytics framework on the GPU," in 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPS Workshops 2017, Orlando / Buena Vista, FL, USA, May 29 - June 2, 2017, 2017, pp. 616--626.
[46]
J. J. Whang, A. Lenharth, I. S. Dhillon, and K. Pingali, "Scalable Data-Driven PageRank: Algorithms, System Issues, and Lessons Learned," in Euro-Par 2015: Parallel Processing, J. L. Träff, S. Hunold, and F. Versaci, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2015, vol. 9233, pp. 438--450.
[47]
D. Yan, J. Cheng, K. Xing, Y. Lu, W. Ng, and Y. Bu, "Pregel algorithms for graph connectivity problems with performance guarantees," PVLDB, vol. 7, no. 14, pp. 1821--1832, 2014.
[48]
M. Yan, L. Deng, X. Hu, L. Liang, Y. Feng, X. Ye, Z. Zhang, D. Fan, and Y. Xie, "Hygcn: A GCN accelerator with hybrid architecture," CoRR, vol. abs/2001.02514, 2020.
[49]
M. Yan, X. Hu, S. Li, A. Basak, H. Li, X. Ma, I. Akgun, Y. Feng, P. Gu, L. Deng, X. Ye, Z. Zhang, D. Fan, and Y. Xie, "Alleviating irregularity in graph analytics acceleration: a hardware/software co-design approach," in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2019, Columbus, OH, USA, October 12-16, 2019, 2019, pp. 615--628.
[50]
E. Yoneki and A. Roy, "Scale-up graph processing: a storage-centric view," in First International Workshop on Graph Data Management Experiences and Systems, GRADES 2013, co-loated with SIGMOD/PODS 2013, New York, NY, USA, June 24, 2013, 2013, p. 8.
[51]
M. Zhang, Y. Wu, K. Chen, X. Qian, X. Li, and W. Zheng, "Exploring the hidden dimension in graph processing," in 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, USA, November 2-4, 2016., 2016, pp. 285--300.
[52]
M. Zhang, Y. Wu, Y. Zhuo, X. Qian, C. Huan, and K. Chen, "Wonderland: A novel abstraction-based out-of-core graph processing system," in Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2018, Williamsburg, VA, USA, March 24-28, 2018, 2018, pp. 608--621.
[53]
M. Zhang, Y. Zhuo, C. Wang, M. Gao, Y. Wu, K. Chen, C. Kozyrakis, and X. Qian, "Graphp: Reducing communication for pim-based graph processing with efficient data partition," in IEEE International Symposium on High Performance Computer Architecture, HPCA 2018, Vienna, Austria, February 24-28, 2018, 2018, pp. 544--557.
[54]
Y. Zhang, M. Yang, R. Baghdadi, S. Kamil, J. Shun, and S. P. Amarasinghe, "Graphit: a high-performance graph DSL," PACMPL, vol. 2, no. OOPSLA, pp. 121:1--121:30, 2018.
[55]
S. Zhou, C. Chelmis, and V. K. Prasanna, "High-throughput and energy-efficient graph processing on FPGA," in 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2016, Washington, DC, USA, May 1-3, 2016, 2016, pp. 103--110.
[56]
S. Zhou, R. Kannan, Y. Min, and V. K. Prasanna, "FASTCF: fpga-based accelerator for stochastic-gradient-descent-based collaborative filtering," in Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA 2018, Monterey, CA, USA, February 25-27, 2018, 2018, pp. 259--268.
[57]
S. Zhou, R. Kannan, H. Zeng, and V. K. Prasanna, "An FPGA framework for edge-centric graph processing," in Proceedings of the 15th ACM International Conference on Computing Frontiers, CF 2018, Ischia, Italy, May 08-10, 2018, 2018, pp. 69--77.
[58]
S. Zhou and V. K. Prasanna, "Accelerating graph analytics on CPU-FPGA heterogeneous platform," in 29th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2017, Campinas, Brazil, October 17-20, 2017, 2017, pp. 137--144.
[59]
X. Zhu, W. Chen, W. Zheng, and X. Ma, "Gemini: A computation-centric distributed graph processing system," in 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, USA, November 2-4, 2016., 2016, pp. 301--316.
[60]
Y. Zhuo, C. Wang, M. Zhang, R. Wang, D. Niu, Y. Wang, and X. Qian, "Graphq: Scalable pim-based graph processing," in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2019, Columbus, OH, USA, October 12-16, 2019, 2019, pp. 712--725.

Cited By

View all
  • (2023)GGPAProceedings of the 20th ACM International Conference on Computing Frontiers10.1145/3587135.3592198(33-41)Online publication date: 9-May-2023
  • (2022)HetGraph: A High Performance CPU-CGRA Architecture for Matrix-based Graph AnalyticsProceedings of the Great Lakes Symposium on VLSI 202210.1145/3526241.3530382(387-391)Online publication date: 6-Jun-2022
  • (2022)Hyperscale FPGA-as-a-service architecture for large-scale distributed graph neural networkProceedings of the 49th Annual International Symposium on Computer Architecture10.1145/3470496.3527439(946-961)Online publication date: 18-Jun-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
ISCA '20: Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture
May 2020
1152 pages
ISBN:9781728146614

Sponsors

In-Cooperation

  • IEEE

Publisher

IEEE Press

Publication History

Published: 23 September 2020

Check for updates

Author Tags

  1. FPGA
  2. graph analytics
  3. hardware accelerator
  4. heterogeneous architecture

Qualifiers

  • Research-article

Conference

ISCA '20
Sponsor:

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Upcoming Conference

ISCA '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)10
  • Downloads (Last 6 weeks)0
Reflects downloads up to 16 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)GGPAProceedings of the 20th ACM International Conference on Computing Frontiers10.1145/3587135.3592198(33-41)Online publication date: 9-May-2023
  • (2022)HetGraph: A High Performance CPU-CGRA Architecture for Matrix-based Graph AnalyticsProceedings of the Great Lakes Symposium on VLSI 202210.1145/3526241.3530382(387-391)Online publication date: 6-Jun-2022
  • (2022)Hyperscale FPGA-as-a-service architecture for large-scale distributed graph neural networkProceedings of the 49th Annual International Symposium on Computer Architecture10.1145/3470496.3527439(946-961)Online publication date: 18-Jun-2022
  • (2022)TDGraphProceedings of the 49th Annual International Symposium on Computer Architecture10.1145/3470496.3527409(116-129)Online publication date: 18-Jun-2022
  • (2021)LCCGProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3458817.3480854(1-14)Online publication date: 14-Nov-2021
  • (2021)SpZipProceedings of the 48th Annual International Symposium on Computer Architecture10.1109/ISCA52012.2021.00087(1069-1082)Online publication date: 14-Jun-2021
  • (2021)PolyGraphProceedings of the 48th Annual International Symposium on Computer Architecture10.1109/ISCA52012.2021.00053(595-608)Online publication date: 14-Jun-2021
  • (2021)FlexMinerProceedings of the 48th Annual International Symposium on Computer Architecture10.1109/ISCA52012.2021.00052(581-594)Online publication date: 14-Jun-2021

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media