Article

Fg-index: towards verification-free query processing on graph databases

Authors:

An LuAuthors Info & Claims

SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data

Pages 857 - 872

https://doi.org/10.1145/1247480.1247574

Published: 11 June 2007 Publication History

Abstract

Graphs are prevalently used to model the relationships between objects in various domains. With the increasing usage of graph databases, it has become more and more demanding to efficiently process graph queries. Querying graph databases is costly since it involves subgraph isomorphism testing, which is an NP-complete problem. In recent years, some effective graph indexes have been proposed to first obtain a candidate answer set by filtering part of the false results and then perform verification on each candidate by checking subgraph isomorphism. Query performance is improved since the number of subgraph isomorphism tests is reduced. However, candidate verification is still inevitable, which can be expensive when the size of the candidate answer set is large.

In this paper, we propose a novel indexing technique that constructs a nested inverted-index, called FG-index, based on the set of Frequent subGraphs (FGs). Given a graph query that is an FG in the database, FG-index returns the exact set of query answers without performing candidate verification. When the query is an infrequent graph, FG-index produces a candidate answer set which is close to the exact answer set. Since an infrequent graph means the graph occurs in only a small number of graphs in the database, the number of subgraph isomorphism tests is small. To ensure that the index fits into the main memory, we propose a new notion of δ-Tolerance Closed Frequent Graphs (δ-TCFGs), which allows us to flexibly tune the size of the index in a parameterized way. Our extensive experiments verify that query processing using FG-index is orders of magnitude more efficient than using the state-of-the-art graph index.

References

[1]

IBM Synthetic Data Generation Code for Associations and Sequential Patterns. http://www.almaden.ibm.com/cs/projects/iis/hdb/Projects/data mining/mining.shtml, 1996.

[2]

Daylight theory manual-daylight version 4.9. Daylight Chemical Information Systems, Inc. www.daylight.com, 2006.

[3]

Q. Chen, A. Lim, and K. W. Ong. D(k)-index: an adaptive structural summary for graph-structured data. In Proc. of SIGMOD, pages 134--144, 2003.

Digital Library

[4]

J. Cheng, Y. Ke, and W. Ng. δ-tolerance closed frequent itemsets. In Proc. of ICDM, pages 139--148, 2006.

Digital Library

[5]

C. W. Chung, J. K. Min, and K. Shim. Apex: an adaptive path index for xml data. In Proc. of SIGMOD, pages 121--132, 2002.

Digital Library

[6]

D. J. Cook and L. B. Holder. Substructure discovery using minimum description length and background knowledge. Journal of Artificial Intelligence Research, 1:231--255, 1994.

Digital Library

[7]

S. A. Cook. The complexity of theorem-proving procedures. In Proc. of STOC '71: Proceedings of the third annual ACM symposium on Theory of computing, pages 151--158, 1971.

Digital Library

[8]

B. Cooper, N. Sample, M. J. Franklin, G. R. Hjaltason, and M. Shadmon. A fast index for semistructured data. In Proc. of VLDB, pages 341--350, 2001.

Digital Library

[9]

R. H. Güting. GraphDB: Modeling and querying graphs in databases. In Proc. of VLDB, pages 297--308, 1994.

Digital Library

[10]

R. Goldman and J. Widom. DataGuides: Enabling query formulation and optimization in semistructured databases. In Proc. of VLDB, pages 436--445, 1997.

Digital Library

[11]

L. Holder, D. Cook, and S. Djoko. Substructure discovery in the subdue system. In Proc. of KDD, pages 169--180, 1994.

[12]

J. Huan, W. Wang, J. Prins, and J. Yang. Spin: mining maximal frequent subgraphs from graph databases. In Proc. of KDD, pages 581--586, 2004.

Digital Library

[13]

A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In Proc. of PKDD, pages 13--23, 2000.

Digital Library

[14]

R. Kaushik, P. Bohannon, J. F. Naughton, and H. F.Korth. Covering indexes for branching path queries. In Proc. of SIGMOD, pages 133--144, 2002.

Digital Library

[15]

M. Kuramochi and G. Karypis. Frequent subgraph discovery. In Proc. of ICDM, pages 313--320, 2001.

Digital Library

[16]

T. Milo and D. Suciu. Index structures for path expressions. In Proc. of ICDT, pages 277--295, 1999.

Digital Library

[17]

D. Shasha, J. T. L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In Proc. of PODS, pages 39--52, 2002.

Digital Library

[18]

S. Srinivasa and S. Kumar. A platform based on the multi-dimensional data model for analysis of bio-molecular structures. In Proc. of VLDB, pages 975--986, 2003.

[19]

J. R. Ullmann. An algorithm for subgraph isomorphism. J. ACM, 23(1):31--42, 1976.

Digital Library

[20]

C. Wang, W. Wang, J. Pei, Y. Zhu, and B. Shi. Scalable mining of large disk-based graph databases. In Proc. of KDD, pages 316--325, 2004.

Digital Library

[21]

D. Xin, J. Han, X. Yan, and H. Cheng. Mining compressed frequent-pattern sets. In Proc. of VLDB, pages 709--720, 2005.

Digital Library

[22]

X. Yan and J. Han. gspan: Graph-based substructure pattern mining. In Proc. of ICDM, pages 721--724, 2002.

Digital Library

[23]

X. Yan and J. Han. Closegraph: mining closed frequent graph patterns. In Proc. of KDD, pages 286--295, 2003.

Digital Library

[24]

X. Yan, P. S. Yu, and J. Han. Graph indexing based on discriminative frequent structure analysis. ACM Trans. Database Syst., 30(4):960--993, 2005.

Digital Library

Cited By

Ma YXu BYin H(2024)Tps: A new way to find good vertex-search order for exact subgraph matchingMultimedia Tools and Applications10.1007/s11042-024-18328-383:27(69875-69896)Online publication date: 3-Feb-2024
https://doi.org/10.1007/s11042-024-18328-3
Qin YWang XHao W(2024)MHNA: Multi-Hop Neighbors Aware Index for Accelerating Subgraph MatchingWeb and Big Data10.1007/978-981-97-2303-4_19(283-298)Online publication date: 29-May-2024
https://doi.org/10.1007/978-981-97-2303-4_19
Funamoto NInokuchi A(2024)Fast Subgraph Search with Graph Code IndicesDatabase and Expert Systems Applications10.1007/978-3-031-68309-1_4(43-58)Online publication date: 18-Aug-2024
https://doi.org/10.1007/978-3-031-68309-1_4
Show More Cited By

Index Terms

Fg-index: towards verification-free query processing on graph databases
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Recommendations

Efficient query processing on graph databases

We study the problem of processing subgraph queries on a database that consists of a set of graphs. The answer to a subgraph query is the set of graphs in the database that are supergraphs of the query. In this article, we propose an efficient index, FG*...
ECTree: an extended tree index for attributed subgraph queries
IDEAS '12: Proceedings of the 16th International Database Engineering & Applications Sysmposium

Graphs are popular data structures for modeling complex data types. There is a need for managing such graph data and providing efficient querying tools. In the graph mining realm, the problem lies in indexing a large number of graphs for fast retrieval. ...
GBLENDER: towards blending visual query formulation and query processing in graph databases
SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data

Given a graph database D and a query graph g, an exact subgraph matching query asks for the set S of graphs in D that contain g as a subgraph. This type of queries find important applications in several domains such as bioinformatics and ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data

June 2007

1210 pages

ISBN:9781595936868

DOI:10.1145/1247480

General Chairs:
Lizhu Zhou
Tsinghua University, China
,
Tok Wang Ling
National University of Singapore, Singapore
,
Program Chair:
Beng Chin Ooi
National University of Singapore, Singapore

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2007

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Article

Conference

SIGMOD/PODS07

Sponsor:

SIGMOD/PODS07: International Conference on Management of Data

June 11 - 14, 2007

Beijing, China

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

241
Total Citations
View Citations
195
Total Downloads

Downloads (Last 12 months)42
Downloads (Last 6 weeks)2

Reflects downloads up to 14 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Ma YXu BYin H(2024)Tps: A new way to find good vertex-search order for exact subgraph matchingMultimedia Tools and Applications10.1007/s11042-024-18328-383:27(69875-69896)Online publication date: 3-Feb-2024
https://doi.org/10.1007/s11042-024-18328-3
Qin YWang XHao W(2024)MHNA: Multi-Hop Neighbors Aware Index for Accelerating Subgraph MatchingWeb and Big Data10.1007/978-981-97-2303-4_19(283-298)Online publication date: 29-May-2024
https://doi.org/10.1007/978-981-97-2303-4_19
Funamoto NInokuchi A(2024)Fast Subgraph Search with Graph Code IndicesDatabase and Expert Systems Applications10.1007/978-3-031-68309-1_4(43-58)Online publication date: 18-Aug-2024
https://doi.org/10.1007/978-3-031-68309-1_4
Arai JFujiwara YOnizuka M(2023)GuP: Fast Subgraph Matching by Guard-based PruningProceedings of the ACM on Management of Data10.1145/35893121:2(1-26)Online publication date: 20-Jun-2023
https://dl.acm.org/doi/10.1145/3589312
Zeng JU LYan XLi YHan MTang B(2023)Extracting Top-$k$ Frequent and Diversified Patterns in Knowledge GraphsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.3233594(1-18)Online publication date: 2023
https://doi.org/10.1109/TKDE.2022.3233594
Cong LYu JGe X(2023)Privacy preserving subgraph isomorphism query for dynamic graph databaseJournal of Network and Computer Applications10.1016/j.jnca.2022.103562211(103562)Online publication date: Feb-2023
https://doi.org/10.1016/j.jnca.2022.103562
Zou QLu C(2022)An Accurate Matching Query Method of Natural Language Knowledge Graph Based on Hierarchical Graph Topological SequenceIEEE Access10.1109/ACCESS.2022.315552010(24080-24094)Online publication date: 2022
https://doi.org/10.1109/ACCESS.2022.3155520
Cong LYu JGe X(2022)Enabling efficient privacy-preserving subgraph isomorphic query over graphsFuture Generation Computer Systems10.1016/j.future.2022.01.027132:C(1-10)Online publication date: 18-May-2022
https://dl.acm.org/doi/10.1016/j.future.2022.01.027
Wang XZhang QGuo DZhao X(2022)A survey of continuous subgraph matching for dynamic graphsKnowledge and Information Systems10.1007/s10115-022-01753-x65:3(945-989)Online publication date: 19-Oct-2022
https://doi.org/10.1007/s10115-022-01753-x
Su YSong YLi XLi FGu Y(2022)Efficient Subhypergraph Containment Queries on Hypergraph DatabasesWeb Information Systems and Applications10.1007/978-3-031-20309-1_44(497-509)Online publication date: 16-Sep-2022
https://dl.acm.org/doi/10.1007/978-3-031-20309-1_44
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents