Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

RapidMatch: a holistic approach to subgraph query processing

Published: 01 October 2020 Publication History

Abstract

A subgraph query searches for all embeddings in a data graph that are identical to a query graph. Two kinds of algorithms, either graph exploration based or join based, have been developed for processing subgraph queries. Due to algorithmic and implementational differences, join-based systems can handle query graphs of a few vertices efficiently whereas exploration-based approaches typically process up to several tens of vertices in the query graph. In this paper, we first compare these two kinds of methods and prove that the complexity of result enumeration in state-of-the-art exploration-based methods matches that of the worst-case optimal join. Furthermore, we propose RapidMatch, a holistic subgraph query processing framework integrating the two approaches. Specifically, RapidMatch not only runs relational operators such as selections and joins, but also utilizes graph structural information, as in graph exploration, for filtering and join plan generation. Consequently, it outperforms the state of the art in both approaches on a wide range of query workloads.

References

[1]
Christopher R Aberger, Andrew Lamb, Susan Tu, Andres Nötzli, Kunle Olukotun, and Christopher Ré. 2017. Emptyheaded: A relational engine for graph processing. ACM Transactions on Database Systems (TODS) 42, 4 (2017), 1--44.
[2]
Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of databases. Vol. 8. Addison-Wesley Reading.
[3]
Khaled Ammar, Frank McSherry, Semih Salihoglu, and Manas Joglekar. 2018. Distributed Evaluation of Subgraph Queries Using Worst-case Optimal Low-Memory Dataflows. Proceedings of the VLDB Endowment 11, 6 (2018).
[4]
Molham Aref, Balder ten Cate, Todd J Green, Benny Kimelfeld, Dan Olteanu, Emir Pasalic, Todd L Veldhuizen, and Geoffrey Washburn. 2015. Design and implementation of the LogicBlox system. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. 1371--1382.
[5]
Albert Atserias, Martin Grohe, and Dániel Marx. 2008. Size bounds and query plans for relational joins. In 2008 49th Annual IEEE Symposium on Foundations of Computer Science. IEEE, 739--748.
[6]
Bibek Bhattarai, Hang Liu, and H Howie Huang. 2019. Ceci: Compact embedding cluster index for scalable subgraph matching. In Proceedings of the 2019 International Conference on Management of Data. 1447--1462.
[7]
Fei Bi, Lijun Chang, Xuemin Lin, Lu Qin, and Wenjie Zhang. 2016. Efficient subgraph matching by postponing cartesian products. In Proceedings of the 2016 International Conference on Management of Data. 1199--1214.
[8]
Jonathan Cohen. 2008. Trusses: Cohesive subgraphs for social network analysis. National security agency technical report 16 (2008), 3--1.
[9]
Stephen A Cook. 1971. The complexity of theorem-proving procedures. In Proceedings of the third annual ACM symposium on Theory of computing. 151--158.
[10]
Brian Gallagher. 2006. Matching Structure and Semantics: A Survey on Graph-Based Pattern Matching. In AAAI Fall Symposium: Capturing and Using Patterns for Evidence Detection. 45--53.
[11]
Wentian Guo, Yuchen Li, Mo Sha, Bingsheng He, Xiaokui Xiao, and Kian-Lee Tan. 2020. GPU-Accelerated Subgraph Enumeration on Partitioned Graphs. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 1067--1082.
[12]
Myoungji Han, Hyunjoon Kim, Geonmo Gu, Kunsoo Park, and Wook-Shin Han. 2019. Efficient subgraph matching: Harmonizing dynamic programming, adaptive matching order, and failing set together. In Proceedings of the 2019 International Conference on Management of Data. 1429--1446.
[13]
Shuo Han, Lei Zou, and Jeffrey Xu Yu. 2018. Speeding up set intersections in graph algorithms using simd instructions. In Proceedings of the 2018 International Conference on Management of Data. 1587--1602.
[14]
Wook-Shin Han, Jinsoo Lee, and Jeong-Hoon Lee. 2013. Turboiso: towards ultrafast and robust subgraph isomorphism search in large graph databases. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. 337--348.
[15]
Huahai He and Ambuj K Singh. 2008. Graphs-at-a-time: query language and access methods for graph databases. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data. 405--418.
[16]
Xin Huang, Hong Cheng, Lu Qin, Wentao Tian, and Jeffrey Xu Yu. 2014. Querying k-truss community in large and dynamic graphs. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data. 1311--1322.
[17]
Chathura Kankanamge, Siddhartha Sahu, Amine Mhedbhi, Jeremy Chen, and Semih Salihoglu. 2017. Graphflow: An active graph database. In Proceedings of the 2017 ACM International Conference on Management of Data. 1695--1698.
[18]
Foteini Katsarou, Nikos Ntarmos, and Peter Triantafillou. 2015. Performance and scalability of indexed subgraph query processing methods. Proceedings of the VLDB Endowment 8, 12 (2015), 1566--1577.
[19]
Foteini Katsarou, Nikos Ntarmos, and Peter Triantafillou. 2017. Subgraph querying with parallel use of query rewritings and alternative algorithms. (2017).
[20]
Hyeonji Kim, Juneyoung Lee, Sourav S Bhowmick, Wook-Shin Han, JeongHoon Lee, Seongyun Ko, and Moath HA Jarrah. 2016. DUALSIM: Parallel subgraph enumeration in a massive graph on a single machine. In Proceedings of the 2016 International Conference on Management of Data. 1231--1245.
[21]
Longbin Lai, Zhu Qing, Zhengyi Yang, Xin Jin, Zhengmin Lai, Ran Wang, Kongzhang Hao, Xuemin Lin, Lu Qin, Wenjie Zhang, et al. 2019. Distributed subgraph matching on timely dataflow. Proceedings of the VLDB Endowment 12, 10 (2019), 1099--1112.
[22]
Jinsoo Lee, Wook-Shin Han, Romans Kasperovics, and Jeong-Hoon Lee. 2012. An in-depth comparison of subgraph isomorphism algorithms in graph databases. Proceedings of the VLDB Endowment 6, 2 (2012), 133--144.
[23]
Amine Mhedhbi and Semih Salihoglu. 2019. Optimizing subgraph queries by combining binary and worst-case optimal joins. Proceedings of the VLDB Endowment 12, 11 (2019), 1692--1704.
[24]
Hung Q Ngo. 2018. Worst-case optimal join algorithms: Techniques, results, and open problems. In Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 111--124.
[25]
Hung Q Ngo, Christopher Ré, and Atri Rudra. 2014. Skew strikes back: New developments in the theory of join algorithms. ACM SIGMOD Record 42, 4 (2014), 5--16.
[26]
Dung Nguyen, Molham Aref, Martin Bravenboer, George Kollias, Hung Q Ngo, Christopher Ré, and Atri Rudra. 2015. Join processing for graph patterns: An old dog with new tricks. In Proceedings of the GRADES'15. 1--8.
[27]
Yeonsu Park, Seongyun Ko, Sourav S Bhowmick, Kyoungmin Kim, Kijae Hong, and Wook-Shin Han. 2020. G-CARE: A Framework for Performance Benchmarking of Cardinality Estimation Techniques for Subgraph Matching. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 1099--1114.
[28]
Carlos R Rivero and Hasan M Jamil. 2017. Efficient and scalable labeled subgraph matching using SGMatch. Knowledge and Information Systems 51, 1 (2017), 61--87.
[29]
Siddhartha Sahu, Amine Mhedhbi, Semih Salihoglu, Jimmy Lin, and M Tamer Özsu. 2017. The ubiquity of large graphs and surprising challenges of graph processing. Proceedings of the VLDB Endowment 11, 4 (2017), 420--431.
[30]
Ahmet Erdem Sariyüce, C Seshadhri, and Ali Pinar. 2018. Local algorithms for hierarchical dense subgraph discovery. Proceedings of the VLDB Endowment 12, 1 (2018), 43--56.
[31]
Ahmet Erdem Sariyuce, C Seshadhri, Ali Pinar, and Umit V Catalyurek. 2015. Finding the hierarchy of dense subgraphs using nucleus decompositions. In Proceedings of the 24th International Conference on World Wide Web. 927--937.
[32]
Stephen B Seidman. 1983. Network structure and minimum degree. Social networks 5, 3 (1983), 269--287.
[33]
Haichuan Shang, Ying Zhang, Xuemin Lin, and Jeffrey Xu Yu. 2008. Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. Proceedings of the VLDB Endowment 1, 1 (2008), 364--375.
[34]
Shixuan Sun, Yulin Che, Lipeng Wang, and Qiong Luo. 2019. Efficient parallel subgraph enumeration on a single machine. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE, 232--243.
[35]
Shixuan Sun and Qiong Luo. 2019. Scaling up subgraph query processing with efficient subgraph matching. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE, 220--231.
[36]
Shixuan Sun and Qiong Luo. 2020. In-Memory Subgraph Matching: An In-depth Study. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 1083--1098.
[37]
Julian R Ullmann. 1976. An algorithm for subgraph isomorphism. Journal of the ACM (JACM) 23, 1 (1976), 31--42.
[38]
Todd L Veldhuizen. 2012. Leapfrog triejoin: A simple, worst-case optimal join algorithm. arXiv preprint arXiv:1210.0481 (2012).
[39]
Mihalis Yannakakis. 1981. Algorithms for acyclic database schemes. In VLDB, Vol. 81. 82--94.
[40]
Shijie Zhang, Shirong Li, and Jiong Yang. 2009. GADDI: distance index based subgraph matching in biological networks. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. 192--203.
[41]
Shijie Zhang, Shirong Li, and Jiong Yang. 2010. SUMMA: subgraph matching in massive graphs. In Proceedings of the 19th ACM international conference on Information and knowledge management. 1285--1288.
[42]
Peixiang Zhao and Jiawei Han. 2010. On graph query optimization in large networks. Proceedings of the VLDB Endowment 3, 1--2 (2010), 340--351.

Cited By

View all
  • (2024)TC-Match: Fast Time-Constrained Continuous Subgraph MatchingProceedings of the VLDB Endowment10.14778/3681954.368196317:11(2791-2804)Online publication date: 30-Aug-2024
  • (2024)Fast Local Subgraph CountingProceedings of the VLDB Endowment10.14778/3659437.365945117:8(1967-1980)Online publication date: 1-Apr-2024
  • (2024)In-depth Analysis of Continuous Subgraph Matching in a Common Delta Query Compilation FrameworkProceedings of the ACM on Management of Data10.1145/36549502:3(1-27)Online publication date: 30-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 14, Issue 2
October 2020
167 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 October 2020
Published in PVLDB Volume 14, Issue 2

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)139
  • Downloads (Last 6 weeks)15
Reflects downloads up to 09 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)TC-Match: Fast Time-Constrained Continuous Subgraph MatchingProceedings of the VLDB Endowment10.14778/3681954.368196317:11(2791-2804)Online publication date: 30-Aug-2024
  • (2024)Fast Local Subgraph CountingProceedings of the VLDB Endowment10.14778/3659437.365945117:8(1967-1980)Online publication date: 1-Apr-2024
  • (2024)In-depth Analysis of Continuous Subgraph Matching in a Common Delta Query Compilation FrameworkProceedings of the ACM on Management of Data10.1145/36549502:3(1-27)Online publication date: 30-May-2024
  • (2024)A Comprehensive Survey and Experimental Study of Subgraph Matching: Trends, Unbiasedness, and InteractionProceedings of the ACM on Management of Data10.1145/36393152:1(1-29)Online publication date: 26-Mar-2024
  • (2024)Efficient Pruned Top-K Subgraph Matching with Topology-Aware BoundsProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679790(2848-2857)Online publication date: 21-Oct-2024
  • (2024)eGrass: An Encrypted Attributed Subgraph Matching System With Malicious SecurityIEEE Transactions on Information Forensics and Security10.1109/TIFS.2024.340908919(5999-6014)Online publication date: 3-Jun-2024
  • (2024)SIMComputer Networks: The International Journal of Computer and Telecommunications Networking10.1016/j.comnet.2024.110502248:COnline publication date: 1-Jun-2024
  • (2024)Towards efficient simulation-based constrained temporal graph pattern matchingWorld Wide Web10.1007/s11280-024-01259-227:3Online publication date: 3-Apr-2024
  • (2024)Optimizing subgraph retrieval and matching with an efficient indexing schemeKnowledge and Information Systems10.1007/s10115-024-02175-766:11(6815-6843)Online publication date: 1-Nov-2024
  • (2024)StepTC: Stepwise Triangle Counting on GPU with Two Efficient Set Intersection MethodsDatabase Systems for Advanced Applications10.1007/978-981-97-5562-2_28(441-451)Online publication date: 2-Jul-2024
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media