Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Subgraph matching over graph federation

Published: 01 November 2021 Publication History

Abstract

Many real-life applications require processing graph data across heterogeneous sources. In this paper, we define the graph federation that indicates that the graph data sources are temporarily federated and offer their data for users. Next, we propose a new framework FedGraph to efficiently and effectively perform subgraph matching, which is a crucial application in graph federation. FedGraph consists of three phases, including query decomposition, distributed matching, and distributed joining. We also develop new efficient approximation algorithms and apply them in each phase to attack the NP-hard problem. The evaluations are conducted in a real test bed using both real-life and synthetic graph datasets. FedGraph outperforms the state-of-the-art methods, reducing the execution time and communication cost by 37.3 × and 61.8 ×, respectively.

References

[1]
2020. The Linked Open Data Cloud. https://lod-cloud.net/.
[2]
Maribel Acosta, Maria-Esther Vidal, Tomas Lampo, Julio Castillo, and Edna Ruckhaus. 2011. ANAPSID: an adaptive query processing engine for SPARQL endpoints. In International Semantic Web Conference. Springer, 18--34.
[3]
Foto N Afrati, Dimitris Fotakis, and Jeffrey D Ullman. 2013. Enumerating subgraph instances using map-reduce. In 2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, 62--73.
[4]
Khaled Ammar, Frank McSherry, Semih Salihoglu, and Manas Joglekar. 2018. Distributed evaluation of subgraph queries using worst-case optimal low-memory dataflows. Proceedings of the VLDB Endowment 11, 6 (2018), 691--704.
[5]
Bibek Bhattarai, Hang Liu, and H Howie Huang. 2019. Ceci: Compact embedding cluster index for scalable subgraph matching. In Proceedings of the 2019 International Conference on Management of Data. 1447--1462.
[6]
Fei Bi, Lijun Chang, Xuemin Lin, Lu Qin, and Wenjie Zhang. 2016. Efficient subgraph matching by postponing cartesian products. In Proceedings of the 2016 International Conference on Management of Data. 1199--1214.
[7]
David C. Blair. 1979. Information Retrieval, 2nd ed. C.J. Van Rijsbergen. London: Butterworths. Journal of the American Society for Information Science 30, 6 (1979), 374--375.
[8]
Ali Cakmak and Gultekin Ozsoyoglu. 2008. Taxonomy-superimposed graph mining. In Proceedings of the 11th international conference on Extending database technology: Advances in database technology. 217--228.
[9]
Yang Cao, Wenfei Fan, Yanghao Wang, and Ke Yi. 2020. Querying shared data with security heterogeneity. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 575--585.
[10]
Olivier Corby, Rose Dieng-Kuntz, Fabien Gandon, and Catherine Faron-Zucker. 2006. Searching the semantic web: Approximate query processing based on ontologies. IEEE Intelligent Systems 21, 1 (2006), 20--27.
[11]
Luigi P Cordella, Pasquale Foggia, Carlo Sansone, and Mario Vento. 2004. A (sub) graph isomorphism algorithm for matching large graphs. IEEE transactions on pattern analysis and machine intelligence 26, 10 (2004), 1367--1372.
[12]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[13]
Kemele M Endris, Philipp D Rohde, Maria-Esther Vidal, and Sören Auer. 2019. Ontario: Federated query processing against a semantic data lake. In International Conference on Database and Expert Systems Applications. Springer, 379--395.
[14]
Hector Garcia-Molina, Jeffrey D Ullman, and Jennifer Widom. 2000. Database system implementation. Vol. 654. Prentice Hall Englewood Cliffs.
[15]
Michael R. Garey and David S. Johnson. 1979. Computers and intractability: a guide to the theory of NP-completeness. W.H.Freeman.
[16]
Minos N Garofalakis and Yannis E Ioannidis. 1996. Multi-dimensional resource scheduling for parallel queries. ACM SIGMOD Record 25, 2 (1996), 365--376.
[17]
Olaf Görlitz and Steffen Staab. 2011. SPLENDID: SPARQL endpoint federation exploiting VOID descriptions. In Proceedings of the Second International Conference on Consuming Linked Data-Volume 782. 13--24.
[18]
Glen L Gray and Roger S Debreceny. 2014. A taxonomy to guide research on the application of data mining to fraud detection in financial statement audits. International Journal of Accounting Information Systems 15, 4 (2014), 357--380.
[19]
Myoungji Han, Hyunjoon Kim, Geonmo Gu, Kunsoo Park, and Wook-Shin Han. 2019. Efficient subgraph matching: Harmonizing dynamic programming, adaptive matching order, and failing set together. In Proceedings of the 2019 International Conference on Management of Data. 1429--1446.
[20]
Wook-Shin Han, Jinsoo Lee, and Jeong-Hoon Lee. 2013. Turboiso: towards ultrafast and robust subgraph isomorphism search in large graph databases. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. 337--348.
[21]
Andreas Harth, Katja Hose, Marcel Karnstedt, Axel Polleres, Kai-Uwe Sattler, and Jürgen Umbrich. 2010. Data summaries for on-demand queries over linked data. In Proceedings of the 19th international conference on World wide web. 411--420.
[22]
Huahai He and Ambuj K Singh. 2008. Graphs-at-a-time: query language and access methods for graph databases. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data. 405--418.
[23]
Hanh Huu Hoang and A Min Tjoa. 2006. The state of the art of ontology-based query systems: A comparison of existing approaches. Citeseer.
[24]
Dorit S Hochba. 1997. Approximation algorithms for NP-hard problems. ACM Sigact News 28, 2 (1997), 40--52.
[25]
Johannes Hoffart, Fabian M Suchanek, Klaus Berberich, and Gerhard Weikum. 2013. YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. Artificial Intelligence 194 (2013), 28--61.
[26]
https://neo4j.com/blog/rdf-triple-store-vs-labeled-property-graphdifference/. [n.d.]. ([n. d.]).
[27]
Yannis E Ioannidis and Younkyung Cha Kang. 1991. Left-deep vs. bushy trees: An analysis of strategy spaces and its implications for query optimization. In Proceedings of the 1991 ACM SIGMOD international conference on Management of data. 168--177.
[28]
Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.).
[29]
Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Skip-Thought Vectors. In Advances in Neural Information Processing Systems. 3294--3302.
[30]
Rasmus Knappe, Henrik Bulskov, and Troels Andreasen. 2007. Perspectives on ontology-based querying. International Journal of Intelligent Systems 22, 7 (2007), 739--761.
[31]
Longbin Lai, Lu Qin, Xuemin Lin, and Lijun Chang. 2015. Scalable subgraph enumeration in mapreduce. Proceedings of the VLDB Endowment 8, 10 (2015), 974--985.
[32]
Longbin Lai, Lu Qin, Xuemin Lin, Ying Zhang, Lijun Chang, and Shiyu Yang. 2016. Scalable distributed subgraph enumeration. Proceedings of the VLDB Endowment 10, 3 (2016), 217--228.
[33]
Longbin Lai, Zhu Qing, Zhengyi Yang, Xin Jin, Zhengmin Lai, Ran Wang, Kongzhang Hao, Xuemin Lin, Lu Qin, Wenjie Zhang, et al. 2019. Distributed subgraph matching on timely dataflow. Proceedings of the VLDB Endowment 12, 10 (2019), 1099--1112.
[34]
Quoc V. Le and Tomás Mikolov. [n.d.]. Distributed Representations of Sentences and Documents. In Proceedings of the 31th International Conference on Machine Learning, ICML 2014, Beijing, China, 21--26 June 2014 (JMLR Workshop and Conference Proceedings), Vol. 32. 1188--1196.
[35]
Eric Little, Kedar Sambhoos, and James Llinas. 2008. Enhancing graph matching techniques with ontologies. In 2008 11th International Conference on Information Fusion. IEEE, 1--8.
[36]
Hanchao Ma, Morteza Alipourlangouri, Yinghui Wu, Fei Chiang, and Jiaxing Pi. 2019. Ontology-based entity matching in attributed graphs. Proceedings of the VLDB Endowment 12, 10 (2019), 1195--1207.
[37]
Eetu Mäkelä, Eero Hyvönen, and Samppa Saarela. 2006. Ontogator¡aa semantic view-based search engine service for web applications. In International Semantic Web Conference. Springer, 847--860.
[38]
Grzegorz Malewicz, Matthew H Austern, Aart JC Bik, James C Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. 135--146.
[39]
Fatemeh Nargesian, Erkang Zhu, Renée J Miller, Ken Q Pu, and Patricia C Arocena. 2019. Data lake management: challenges and opportunities. Proceedings of the VLDB Endowment 12, 12 (2019), 1986--1989.
[40]
Xiaofeng Ouyang, Liang Hong, and Lujia Zhang. 2018. Query Associations Over Big Financial Knowledge Graph. In International Conference on Big Scientific Data Management. Springer, 199--211.
[41]
Peng Peng, Lei Zou, M. Tamer Ozsu, Lei Chen, and Dongyan Zhao. 2016. Processing SPARQL queries over distributed RDF graphs. The VLDB journal 25, 2 (2016), 243--268.
[42]
Fabian Prasser, Alfons Kemper, and Klaus A Kuhn. 2012. Efficient distributed query processing for autonomous RDF databases. In Proceedings of the 15th International Conference on Extending Database Technology. 372--383.
[43]
Miao Qiao, Hao Zhang, and Hong Cheng. 2017. Subgraph matching: on compression and computation. Proceedings of the VLDB Endowment 11, 2 (2017), 176--188.
[44]
Muhammad Saleem and Axel-Cyrille Ngonga Ngomo. 2014. Hibiscus: Hypergraph-based source selection for sparql endpoint federation. In European semantic web conference. Springer, 176--191.
[45]
Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel, and Michael Schmidt. 2011. Fedx: Optimization techniques for federated query processing on linked data. In International semantic web conference. Springer, 601--616.
[46]
Haichuan Shang, Ying Zhang, Xuemin Lin, and Jeffrey Xu Yu. 2008. Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. Proceedings of the VLDB Endowment 1, 1 (2008), 364--375.
[47]
Patrick Stünkel, Ole von Bargen, Adrian Rutle, and Yngve Lamo. 2020. GraphQL Federation: A Model-Based Approach. (2020).
[48]
Shixuan Sun and Qiong Luo. 2019. Scaling up subgraph query processing with efficient subgraph matching. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE, 220--231.
[49]
Zhao Sun, Hongzhi Wang, Haixun Wang, Bin Shao, and Jianzhong Li. 2012. Efficient subgraph matching on billion node graphs. Proceedings of the VLDB Endowment 5, 9 (2012), 788--799.
[50]
Julian R Ullmann. 1976. An algorithm for subgraph isomorphism. Journal of the ACM (JACM) 23, 1 (1976), 31--42.
[51]
Vijay V. Vazirani. 2001. Approximation algorithms. Springer.
[52]
Chihping Wang and Ming-Syan Chen. 1996. On the complexity of distributed query optimization. IEEE Transactions on Knowledge and Data Engineering 8, 4 (1996), 650--662.
[53]
Haixun Wang, Hao He, Jun Yang, Philip S Yu, and Jeffrey Xu Yu. 2006. Dual labeling: Answering graph reachability queries in constant time. In 22nd International Conference on Data Engineering (ICDE'06). IEEE, 75--75.
[54]
Zhaokang Wang, Rong Gu, Weiwei Hu, Chunfeng Yuan, and Yihua Huang. 2019. BENU: Distributed subgraph enumeration with backtracking-based framework. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE, 136--147.
[55]
Hao Wei, Jeffrey Xu Yu, Can Lu, and Xuemin Lin. 2016. Speedup graph processing by graph ordering. In Proceedings of the 2016 International Conference on Management of Data. 1813--1828.
[56]
Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Q Zhu. 2012. Probase: A probabilistic taxonomy for text understanding. In Proceedings of the 2012 ACM SIGMOD International Conference on Management ofData. 481--492.
[57]
Yinghui Wu, Shengqi Yang, and Xifeng Yan. 2013. Ontology-based subgraph querying. In 2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, 697--708.
[58]
Fanjin Zhang, Xiao Liu, Jie Tang, Yuxiao Dong, Peiran Yao, Jie Zhang, Xiaotao Gu, Yan Wang, Bin Shao, Rui Li, et al. 2019. OAG: Toward linking large-scale heterogeneous entity graphs. In Proceedings of the 25th ACM SIGKDD. 2585--2595.
[59]
Amal Zouaq and Felix Martel. 2020. What is the schema of your knowledge graph? leveraging knowledge graph embeddings and clustering for expressive taxonomy learning. In Proceedings of The International Workshop on Semantic Big Data. 1--6.

Cited By

View all
  • (2024)FusionQuery: On-demand Fusion Queries over Multi-source Heterogeneous DataProceedings of the VLDB Endowment10.14778/3648160.364817417:6(1337-1349)Online publication date: 3-May-2024
  • (2023)Inductive graph unlearningProceedings of the 32nd USENIX Conference on Security Symposium10.5555/3620237.3620417(3205-3222)Online publication date: 9-Aug-2023
  • (2023)RAGraph: A Region-Aware Framework for Geo-Distributed Graph ProcessingProceedings of the VLDB Endowment10.14778/3632093.363209417:3(264-277)Online publication date: 1-Nov-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 15, Issue 3
November 2021
364 pages
ISSN:2150-8097
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 November 2021
Published in PVLDB Volume 15, Issue 3

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)80
  • Downloads (Last 6 weeks)11
Reflects downloads up to 22 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)FusionQuery: On-demand Fusion Queries over Multi-source Heterogeneous DataProceedings of the VLDB Endowment10.14778/3648160.364817417:6(1337-1349)Online publication date: 3-May-2024
  • (2023)Inductive graph unlearningProceedings of the 32nd USENIX Conference on Security Symposium10.5555/3620237.3620417(3205-3222)Online publication date: 9-Aug-2023
  • (2023)RAGraph: A Region-Aware Framework for Geo-Distributed Graph ProcessingProceedings of the VLDB Endowment10.14778/3632093.363209417:3(264-277)Online publication date: 1-Nov-2023
  • (2023)CommunityAF: An Example-Based Community Search Method via Autoregressive FlowProceedings of the VLDB Endowment10.14778/3603581.360359516:10(2565-2577)Online publication date: 1-Jun-2023
  • (2023)Approximate k-Nearest Neighbor Query over Spatial Data FederationDatabase Systems for Advanced Applications10.1007/978-3-031-30637-2_23(351-368)Online publication date: 17-Apr-2023

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media