Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

MITra: A Framework for Multi-Instance Graph Traversal

Published: 01 June 2023 Publication History

Abstract

This paper presents MITra, a framework for composing multi-instance graph algorithms that traverse from multiple source vertices simultaneously over a single thread. Underlying MITra is a model of multi-instance traversal that uniformly captures traversal sharing across instances. Based on this, MITra provides a programming model that allows users to express traversals by declaring vertex ranks and specify computation logic via an edge function. It synthesizes multi-instance traversal algorithms from declared vertex ranks and edge functions adopted from classic single-instance algorithms, automatically sharing computation across instances and benefiting from SIMD. We show that MITra can generate multi-instance algorithms provably better than existing ones, while being more expressive than traditional frameworks. In addition to the ease of programming, we experimentally verify that MITra is on average an order of magnitude faster than approaches based on existing frameworks for common graph algorithms, and is comparable to the state-of-the-art highly optimized one-off algorithms.

References

[1]
Giraph. https://giraph.apache.org/.
[2]
Intel® intrinsics guide. https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html.
[3]
LiveJournal. http://snap.stanford.edu/data/soc-LiveJournal1.html.
[4]
Pokec. http://snap.stanford.edu/data/soc-Pokec.html.
[5]
Snap Library. https://snap.stanford.edu/snap/description.html.
[6]
The source code for Ligra framework. https://github.com/jshun/ligra.
[7]
The source code for MS-BFS algorithm. https://github.com/mtodat/ms-bfs.
[8]
Twitter. http://konect.cc/networks/twitter/.
[9]
UK domain. http://konect.cc/networks/dimacs10-uk-2007-05/.
[10]
UK/DE/EU Traffic. https://www.cc.gatech.edu/dimacs10/archive/streets.shtml.
[11]
US Traffic. http://www.diag.uniroma1.it//challenge9/download.shtml.
[12]
Z. Abul-Basher. Multiple-query optimization of regular path queries. In 33rd IEEE International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, April 19--22, 2017, pages 1426--1430. IEEE Computer Society, 2017.
[13]
T. Akiba, Y. Iwata, and Y. Yoshida. Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In SIGMOD. ACM, 2013.
[14]
N. Bruno, L. Gravano, N. Koudas, and D. Srivastava. Navigation- vs. index-based XML multi-query processing. In U. Dayal, K. Ramamritham, and T. M. Vijayaraman, editors, Proceedings of the 19th International Conference on Data Engineering, March 5--8, 2003, Bangalore, India, pages 139--150. IEEE Computer Society, 2003.
[15]
D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-MAT: A recursive model for graph mining. In Proceedings of the 2004 SIAM International Conference on Data Mining, pages 442--446. SIAM, 2004.
[16]
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to algorithms. MIT press, 2009.
[17]
M. K. Esfahani, P. Kilpatrick, and H. Vandierendonck. Exploiting in-hub temporal locality in spmv-based graph processing. In X. Sun, S. Shende, L. V. Kalé, and Y. Chen, editors, ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9 - 12, 2021, pages 42:1--42:10. ACM, 2021.
[18]
J. L. Gersting. Mathematical structures for computer science (3. ed.). Computer Science Press, 1993.
[19]
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed graph-parallel computation on natural graphs. In C. Thekkath and A. Vahdat, editors, 10th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2012, Hollywood, CA, USA, October 8--10, 2012, pages 17--30. USENIX Association, 2012.
[20]
W. Han, S. Lee, K. Park, J. Lee, M. Kim, J. Kim, and H. Yu. TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC. In The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, Chicago, IL, USA, August 11--14, 2013, pages 77--85. ACM, 2013.
[21]
G. Jeh and J. Widom. Scaling personalized web search. In G. Hencsey, B. White, Y. R. Chen, L. Kovács, and S. Lawrence, editors, Proceedings of the Twelfth International World Wide Web Conference, WWW 2003, Budapest, Hungary, May 20--24, 2003, pages 271--279. ACM, 2003.
[22]
U. Kang, D. H. Chau, and C. Faloutsos. Mining large graphs: Algorithms, inference, and discoveries. In ICDE.
[23]
U. Kang, C. E. Tsourakakis, and C. Faloutsos. PEGASUS: mining peta-scale graphs. Knowl. Inf. Syst., 27(2):303--325, 2011.
[24]
M. Kaufmann, M. Then, A. Kemper, and T. Neumann. Parallel array-based single- and multi-source breadth first searches on large dense graphs. In V. Markl, S. Orlando, B. Mitschang, P. Andritsos, K. Sattler, and S. Breß, editors, Proceedings of the 20th International Conference on Extending Database Technology, EDBT 2017, Venice, Italy, March 21--24, 2017, pages 1--12. OpenProceedings.org, 2017.
[25]
J. M. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM, 46(5):604--632, 1999.
[26]
A. Koschmieder and U. Leser. Regular path queries on large graphs. In A. Ailamaki and S. Bowers, editors, SSDBM, volume 7338, pages 177--194. Springer, 2012.
[27]
A. Kyrola, G. E. Blelloch, and C. Guestrin. GraphChi: Large-scale graph computation on just a PC. In C. Thekkath and A. Vahdat, editors, 10th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2012, Hollywood, CA, USA, October 8--10, 2012, pages 31--46. USENIX Association, 2012.
[28]
W. Le, A. Kementsietsidis, S. Duan, and F. Li. Scalable multi-query optimization for SPARQL. In A. Kementsietsidis and M. A. V. Salles, editors, IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, DC, USA (Arlington, Virginia), 1--5 April, 2012, pages 666--677. IEEE Computer Society, 2012.
[29]
X. Liu, Z. Wang, N. Wang, X. Li, B. Zhang, J. Qiao, Z. Wei, and J. Nie. An adaptive sharing framework for efficient multi-source shortest path computation. In C. Xing, X. Fu, Y. Zhang, G. Zhang, and C. Borjigin, editors, Web Information Systems and Applications - 18th International Conference, WISA 2021, Kaifeng, China, September 24--26, 2021, Proceedings, volume 12999 of Lecture Notes in Computer Science, pages 635--646. Springer, 2021.
[30]
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed GraphLab: A framework for machine learning in the cloud. Proc. VLDB Endow., 5(8):716--727, 2012.
[31]
S. Lu, S. Sun, J. Paul, Y. Li, and B. He. Cache-efficient fork-processing patterns on large graphs. In G. Li, Z. Li, S. Idreos, and D. Srivastava, editors, SIGMOD '21: International Conference on Management of Data, Virtual Event, China, June 20--25, 2021, pages 1208--1221. ACM, 2021.
[32]
G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In A. K. Elmagarmid and D. Agrawal, editors, Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, June 6--10, 2010, pages 135--146. ACM, 2010.
[33]
A. Mazloumi, X. Jiang, and R. Gupta. MultiLyra: Scalable distributed evaluation of batches of iterative graph queries. In C. Baru, J. Huan, L. Khan, X. Hu, R. Ak, Y. Tian, R. S. Barga, C. Zaniolo, K. Lee, and Y. F. Ye, editors, 2019 IEEE International Conference on Big Data (IEEE BigData), Los Angeles, CA, USA, December 9--12, 2019, pages 349--358. IEEE, 2019.
[34]
U. Meyer and P. Sanders. Delta-stepping: a parallelizable shortest path algorithm. J. Algorithms, 49(1):114--152, 2003.
[35]
D. Nguyen, A. Lenharth, and K. Pingali. A lightweight infrastructure for graph analytics. In M. Kaminsky and M. Dahlin, editors, ACM SIGOPS 24th Symposium on Operating Systems Principles, SOSP'13, Farmington, PA, USA, November 3--6, 2013, pages 456--471. ACM, 2013.
[36]
L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.
[37]
P. Pan and C. Li. Congra: Towards efficient processing of concurrent graph queries on shared-memory machines. In 2017 IEEE International Conference on Computer Design, ICCD 2017, Boston, MA, USA, November 5--8, 2017, pages 217--224. IEEE Computer Society, 2017.
[38]
A. Parravicini, F. Sgherzi, and M. D. Santambrogio. A reduced-precision streaming spmv architecture for personalized pagerank on FPGA. In ASPDAC, pages 378--383. ACM, 2021.
[39]
P. Peng, Q. Ge, L. Zou, M. T. Özsu, Z. Xu, and D. Zhao. Optimizing multi-query evaluation in federated RDF systems. IEEE Trans. Knowl. Data Eng., 33(4):1692--1707, 2021.
[40]
S. Qiu, L. You, and Z. Wang. Optimizing sparse matrix multiplications for graph neural networks. In International Workshop on Languages and Compilers for Parallel Computing, pages 101--117. Springer, 2022.
[41]
X. Ren and J. Wang. Multi-query optimization for subgraph isomorphism search. Proc. VLDB Endow., 10(3):121--132, 2016.
[42]
A. Roy, I. Mihailovic, and W. Zwaenepoel. X-stream: Edge-centric graph processing using streaming partitions. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 472--488, 2013.
[43]
T. K. Sellis. Multiple-query optimization. ACM Trans. Database Syst., 13(1):23--52, 1988.
[44]
T. K. Sellis and S. Ghosh. On the multiple-query optimization problem. IEEE Trans. Knowl. Data Eng., 2(2):262--266, 1990.
[45]
J. Shun and G. E. Blelloch. Ligra: a lightweight graph processing framework for shared memory. In A. Nicolau, X. Shen, S. P. Amarasinghe, and R. W. Vuduc, editors, ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, Shenzhen, China, February 23--27, 2013.
[46]
S. Skiena. The Algorithm Design Manual, Second Edition. Springer, 2008.
[47]
J. Sun, H. Vandierendonck, and D. S. Nikolopoulos. Accelerating graph analytics by utilising the memory locality of graph partitioning. In 2017 46th International Conference on Parallel Processing (ICPP), pages 181--190. IEEE, 2017.
[48]
M. Then, M. Kaufmann, F. Chirigati, T. Hoang-Vu, K. Pham, A. Kemper, T. Neumann, and H. T. Vo. The more the merrier: Efficient multi-source graph traversal. Proc. VLDB Endow., 8(4):449--460, 2014.
[49]
Y. Tian, A. Balmin, S. A. Corsten, S. Tatikonda, and J. McPherson. From "think like a vertex" to "think like a graph". Proc. VLDB Endow., 7(3):193--204, 2013.
[50]
L. G. Valiant. General purpose parallel architectures. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, pages 943--972. Elsevier and MIT Press, 1990.
[51]
S. Wang, R. Yang, X. Xiao, Z. Wei, and Y. Yang. FORA: simple and effective approximate single-source personalized pagerank. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13 - 17, 2017, pages 505--514. ACM, 2017.
[52]
Z. Wei, X. He, X. Xiao, S. Wang, S. Shang, and J. Wen. TopPPR: Top-k personalized pagerank queries with precision guarantees on large graphs. In G. Das, C. M. Jermaine, and P. A. Bernstein, editors, Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10--15, 2018, pages 441--456. ACM, 2018.
[53]
C. Xu, A. Mazloumi, X. Jiang, and R. Gupta. SimGQ: Simultaneously evaluating iterative graph queries. In 27th IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC 2020, Pune, India, December 16--19, 2020, pages 1--10. IEEE, 2020.
[54]
C. Xu, A. Mazloumi, X. Jiang, and R. Gupta. SimGQ+: Simultaneously evaluating iterative point-to-all and point-to-point graph queries. Journal of Parallel and Distributed Computing, 164:12--27, 2022.
[55]
D. Yan, J. Cheng, Y. Lu, and W. Ng. Blogel: A block-centric framework for distributed computation on real-world graphs. Proc. VLDB Endow., 7(14):1981--1992, 2014.
[56]
D. Yan, J. Cheng, M. T. Özsu, F. Yang, Y. Lu, J. C. S. Lui, Q. Zhang, and W. Ng. A general-purpose query-centric framework for querying big graphs. Proc. VLDB Endow., 9(7):564--575, 2016.
[57]
H. Yanagisawa. A multi-source label-correcting algorithm for the all-pairs shortest paths problem. In 24th IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2010, Atlanta, Georgia, USA, 19--23 April 2010 - Conference Proceedings, pages 1--10, 2010.
[58]
Y. Zhang, X. Liao, H. Jin, L. Gu, L. He, B. He, and H. Liu. CGraph: A correlations-aware approach for efficient concurrent iterative graph processing. In H. S. Gunawi and B. Reed, editors, 2018 USENIX Annual Technical Conference, USENIX ATC 2018, Boston, MA, USA, July 11--13, 2018, pages 441--452. USENIX Association, 2018.
[59]
Y. Zhang, M. Yang, R. Baghdadi, S. Kamil, J. Shun, and S. P. Amarasinghe. GraphIt: a high-performance graph DSL. Proc. ACM Program. Lang., 2(OOPSLA):121:1--121:30, 2018.
[60]
J. Zhao, Y. Zhang, X. Liao, L. He, B. He, H. Jin, H. Liu, and Y. Chen. GraphM: an efficient storage system for high throughput of concurrent graph processing. In M. Taufer, P. Balaji, and A. J. Peña, editors, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019, Denver, Colorado, USA, November 17--19, 2019, pages 3:1--3:14. ACM, 2019.
[61]
X. Zhu, W. Chen, W. Zheng, and X. Ma. Gemini: A computation-centric distributed graph processing system. In K. Keeton and T. Roscoe, editors, 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, USA, November 2--4, 2016, pages 301--316. USENIX Association, 2016.

Cited By

View all
  • (2024)Enabling Window-Based Monotonic Graph Analytics with Reusable Transitional Results for Pattern-Consistent QueriesProceedings of the VLDB Endowment10.14778/3681954.368197917:11(3003-3016)Online publication date: 30-Aug-2024

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 16, Issue 10
June 2023
295 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 June 2023
Published in PVLDB Volume 16, Issue 10

Check for updates

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)99
  • Downloads (Last 6 weeks)10
Reflects downloads up to 17 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Enabling Window-Based Monotonic Graph Analytics with Reusable Transitional Results for Pattern-Consistent QueriesProceedings of the VLDB Endowment10.14778/3681954.368197917:11(3003-3016)Online publication date: 30-Aug-2024

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media