Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Efficient Distributed Hop-Constrained Path Enumeration on Large-Scale Graphs

Published: 26 March 2024 Publication History

Abstract

The enumeration of hop-constrained simple paths is a building block in many graph-based areas. Due to the enormous search spaces in large-scale graphs, a single machine can hardly satisfy the requirements of both efficiency and memory, which causes an urgent need for efficient distributed methods. In practice, it is inevitable to produce plenty of intermediate results when directly extending centralized methods to the distributed environment, thereby causing a memory crisis and weakening the query performance. The state-of-the-art distributed method HybridEnum designed a hybrid search paradigm to enumerate simple paths. However, it makes massive exploration for the redundant vertices not located in any simple path, thereby resulting in poor query performance. To alleviate this problem, we design a distributed approach DistriEnum to optimize query performance and scalability with well-bound memory consumption. Firstly, DistriEnum adopts a graph reduction strategy to rule out the redundant vertices without satisfying the constraint of hop number. Then, a core search paradigm is designed to simultaneously reduce the traversal of shared subpaths and the storage of intermediate results. Moreover, DistriEnum is equipped with a task division strategy to theoretically achieve workload balance. Finally, a vertex migration strategy is devised to reduce the communication cost during the enumeration. The comprehensive experimental results on 10 real-world graphs demonstrate that DistriEnum achieves up to 3 orders of magnitude speedup than HybridEnum in query performance and exhibits superior performances on scalability, communication cost, and memory consumption.

References

[1]
Takuya Akiba, Yoichi Iwata, and Yuichi Yoshida. 2013. Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In SIGMOD. ACM, 349--360.
[2]
Michael A. Bender, Jeremy T. Fineman, Seth Gilbert, and Robert E. Tarjan. 2016. A New Approach to Incremental Cycle Detection and Related Problems. ACM Trans. Algorithms, Vol. 12, 2 (2016), 14:1--14:22.
[3]
Sayan Bhattacharya and Janardhan Kulkarni. 2020. An Improved Algorithm for Incremental Cycle Detection and Topological Ordering in Sparse Graphs. In SODA. SIAM, 2509--2521.
[4]
Etienne Birmelé, Rui A. Ferreira, Roberto Grossi, Andrea Marino, Nadia Pisanti, Romeo Rizzi, and Gustavo Sacomoto. 2013. Optimal Listing of Cycles and st-Paths in Undirected Graphs. In SODA. SIAM, 1884--1896.
[5]
Katerina Böhmová, Luca Hafliger, Matús Mihalák, Tobias Prö ger, Gustavo Sacomoto, and Marie-France Sagot. 2018. Computing and Listing st-Paths in Public Transportation Networks. Theory Comput. Syst., Vol. 62, 3 (2018), 600--621.
[6]
Yuzheng Cai, Siyuan Liu, Weiguo Zheng, and Xuemin Lin. 2023. Towards Generating Hop-constrained s-t Simple Path Graphs. CoRR, Vol. abs/2304.12656 (2023). https://doi.org/10.48550/arXiv.2304.12656
[7]
Lijun Chang, Xuemin Lin, Lu Qin, Jeffrey Xu Yu, and Jian Pei. 2015. Efficiently Computing Top-K Shortest Path Join. In EDBT. OpenProceedings.org, 133--144.
[8]
Xiaoshuang Chen, Kai Wang, Xuemin Lin, Wenjie Zhang, Lu Qin, and Ying Zhang. 2021. Efficiently Answering Reachability and Path Queries on Temporal Bipartite Graphs. Proc. VLDB Endow., Vol. 14, 10 (2021), 1845--1858.
[9]
Edith Cohen, Eran Halperin, Haim Kaplan, and Uri Zwick. 2003. Reachability and Distance Queries via 2-Hop Labels. SIAM J. Comput., Vol. 32, 5 (2003), 1338--1355.
[10]
Yixiang Fang, Reynold Cheng, Siqiang Luo, and Jiafeng Hu. 2016. Effective Community Search for Large Attributed Graphs. Proc. VLDB Endow., Vol. 9, 12 (2016), 1233--1244.
[11]
Roberto Grossi, Andrea Marino, and Luca Versari. 2018. Efficient Algorithms for Listing k Disjoint st-Paths in Graphs. In LATIN (Lecture Notes in Computer Science, Vol. 10807). Springer, 544--557.
[12]
Sairam Gurajada and Martin Theobald. 2016. Distributed Set Reachability. In SIGMOD. ACM, 1247--1261.
[13]
Bernhard Haeupler, Telikepalli Kavitha, Rogers Mathew, Siddhartha Sen, and Robert Endre Tarjan. 2012. Incremental Cycle Detection, Topological Ordering, and Strong Component Maintenance. ACM Trans. Algorithms, Vol. 8, 1 (2012), 3:1--3:33.
[14]
Kongzhang Hao, Long Yuan, and Wenjie Zhang. 2021. Distributed Hop-Constrained s-t Simple Path Enumeration at Billion Scale. Proc. VLDB Endow., Vol. 15, 2 (2021), 169--182.
[15]
Mahmoud Abo Khamis, Hung Q. Ngo, and Atri Rudra. 2016. FAQ: Questions Asked Frequently. In PODS, 2016. ACM, 13--28.
[16]
Larkshmi Krishnamurthy, Joseph H. Nadeau, Gultekin Özsoyoglu, Z. Meral Ö zsoyoglu, Greg Schaeffer, Murat Tasan, and Wanhong Xu. 2003. Pathways Database System: An Integrated System for Biological Pathways. Bioinform., Vol. 19, 8 (2003), 930--937.
[17]
Rohit Kumar and Toon Calders. 2018. 2SCENT: An Efficient Algorithm to Enumerate All Simple Temporal Cycles. Proc. VLDB Endow., Vol. 11, 11 (2018), 1441--1453.
[18]
Kisung Lee and Ling Liu. 2013. Scaling Queries over Big RDF Graphs with Semantic Hash Partitioning. Proc. VLDB Endow., Vol. 6, 14 (2013), 1894--1905.
[19]
Wentao Li, Miao Qiao, Lu Qin, Ying Zhang, Lijun Chang, and Xuemin Lin. 2019. Scaling Distance Labeling on Small-World Networks. In SIGMOD. ACM, 1060--1077.
[20]
Wentao Li, Miao Qiao, Lu Qin, Ying Zhang, Lijun Chang, and Xuemin Lin. 2020. Scaling Up Distance Labeling on Graphs with Core-Periphery Properties. In SIGMOD. ACM, 1367--1381.
[21]
Masaaki Nishino, Norihito Yasuda, Shin-ichi Minato, and Masaaki Nagata. 2017. Compiling Graph Substructures into Sentential Decision Diagrams. In AAAI. 1213--1221.
[22]
You Peng, Xuemin Lin, Ying Zhang, Wenjie Zhang, and Lu Qin. 2022. Answering reachability and K-reach queries on large graphs with label constraints. VLDB J., Vol. 31, 1 (2022), 101--127.
[23]
You Peng, Ying Zhang, Xuemin Lin, Wenjie Zhang, Lu Qin, and Jingren Zhou. 2019. Hop-constrained s-t Simple Path Enumeration: Towards Bridging Theory and Practice. Proc. VLDB Endow., Vol. 13, 4 (2019), 463--476.
[24]
Michalis Potamias, Francesco Bonchi, Carlos Castillo, and Aristides Gionis. 2009. Fast shortest path distance estimation in large networks. In CIKM. ACM, 867--876.
[25]
Xiafei Qiu, Wubin Cen, Zhengping Qian, You Peng, Ying Zhang, Xuemin Lin, and Jingren Zhou. 2018. Real-time Constrained Cycle Detection in Large Dynamic Graphs. Proc. VLDB Endow., Vol. 11, 12 (2018), 1876--1888.
[26]
Baoxu Shi and Tim Weninger. 2016. Discriminative predicate path mining for fact checking in knowledge graphs. Knowl. Based Syst., Vol. 104 (2016), 123--133.
[27]
Prashant Shiralkar, Alessandro Flammini, Filippo Menczer, and Giovanni Luca Ciampaglia. 2017. Finding Streams in Knowledge Graphs to Support Fact Checking. In ICDM. IEEE Computer Society, 859--864.
[28]
Shixuan Sun, Yuhang Chen, Bingsheng He, and Bryan Hooi. 2021. PathEnum: Towards Real-Time Hop-Constrained s-t Path Enumeration. In SIGMOD. ACM, 1758--1770.
[29]
Lucien D. J. Valstar, George H. L. Fletcher, and Yuichi Yoshida. 2017. Landmark Indexing for Evaluation of Label-Constrained Reachability Queries. In SIGMOD. ACM, 345--358.
[30]
Dong Wen, Yilun Huang, Ying Zhang, Lu Qin, Wenjie Zhang, and Xuemin Lin. 2020. Efficiently Answering Span-Reachability Queries in Large Temporal Graphs. In ICDE. IEEE, 1153--1164.
[31]
Da Yan, James Cheng, Yi Lu, and Wilfred Ng. 2014. Blogel: A Block-Centric Framework for Distributed Computation on Real-World Graphs. Proc. VLDB Endow., Vol. 7, 14 (2014), 1981--1992.
[32]
Norihito Yasuda, Teruji Sugaya, and Shin-ichi Minato. 2017. Fast Compilation of s-t Paths on a Graph for Counting and Enumeration. In AMBN (Proceedings of Machine Learning Research, Vol. 73). PMLR, 129--140.
[33]
Yuanyuan Zeng, Kenli Li, Xu Zhou, Wensheng Luo, and Yunjun Gao. 2022a. An Efficient Index-Based Approach to Distributed Set Reachability on Small-World Graphs. IEEE Trans. Parallel Distributed Syst., Vol. 33, 10 (2022), 2358--2371.
[34]
Yuanyuan Zeng, Wangdong Yang, Xu Zhou, Guoqin Xiao, Yunjun Gao, and Kenli Li. 2022b. Distributed Set Label-Constrained Reachability Queries over Billion-Scale Graphs. In ICDE. IEEE.
[35]
Chao Zhang, Angela Bonifati, and M. Tamer Özsu. 2023. An Overview of Reachability Indexes on Graphs. In SIGMOD, 2023. ACM, 61--68.

Cited By

View all
  • (2024)Distributed Shortest Distance Labeling on Large-Scale GraphsProceedings of the VLDB Endowment10.14778/3675034.367505317:10(2641-2653)Online publication date: 1-Jun-2024

Index Terms

  1. Efficient Distributed Hop-Constrained Path Enumeration on Large-Scale Graphs

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the ACM on Management of Data
    Proceedings of the ACM on Management of Data  Volume 2, Issue 1
    SIGMOD
    February 2024
    1874 pages
    EISSN:2836-6573
    DOI:10.1145/3654807
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 26 March 2024
    Published in PACMMOD Volume 2, Issue 1

    Permissions

    Request permissions for this article.

    Author Tags

    1. distributed computing
    2. graph traversal
    3. path enumeration

    Qualifiers

    • Research-article

    Funding Sources

    • National Key R$\&$D Programs of China
    • Shenzhen Science and Technology Program and Guangdong Key Lab of Mathematical Foundations for Artificial Intelligence, National Natural Science Foundation of China Innovation Research Group Project
    • Shenzhen Science and Technology Program
    • Basic and Applied Basic Research Fund in Guangdong Province
    • Guangdong Talent Program
    • National Natural Science Foundation of China

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)250
    • Downloads (Last 6 weeks)21
    Reflects downloads up to 13 Feb 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Distributed Shortest Distance Labeling on Large-Scale GraphsProceedings of the VLDB Endowment10.14778/3675034.367505317:10(2641-2653)Online publication date: 1-Jun-2024

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media