Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1247480.1247574acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

Fg-index: towards verification-free query processing on graph databases

Published: 11 June 2007 Publication History

Abstract

Graphs are prevalently used to model the relationships between objects in various domains. With the increasing usage of graph databases, it has become more and more demanding to efficiently process graph queries. Querying graph databases is costly since it involves subgraph isomorphism testing, which is an NP-complete problem. In recent years, some effective graph indexes have been proposed to first obtain a candidate answer set by filtering part of the false results and then perform verification on each candidate by checking subgraph isomorphism. Query performance is improved since the number of subgraph isomorphism tests is reduced. However, candidate verification is still inevitable, which can be expensive when the size of the candidate answer set is large.
In this paper, we propose a novel indexing technique that constructs a nested inverted-index, called FG-index, based on the set of Frequent subGraphs (FGs). Given a graph query that is an FG in the database, FG-index returns the exact set of query answers without performing candidate verification. When the query is an infrequent graph, FG-index produces a candidate answer set which is close to the exact answer set. Since an infrequent graph means the graph occurs in only a small number of graphs in the database, the number of subgraph isomorphism tests is small. To ensure that the index fits into the main memory, we propose a new notion of δ-Tolerance Closed Frequent Graphs (δ-TCFGs), which allows us to flexibly tune the size of the index in a parameterized way. Our extensive experiments verify that query processing using FG-index is orders of magnitude more efficient than using the state-of-the-art graph index.

References

[1]
IBM Synthetic Data Generation Code for Associations and Sequential Patterns. http://www.almaden.ibm.com/cs/projects/iis/hdb/Projects/data mining/mining.shtml, 1996.
[2]
Daylight theory manual-daylight version 4.9. Daylight Chemical Information Systems, Inc. www.daylight.com, 2006.
[3]
Q. Chen, A. Lim, and K. W. Ong. D(k)-index: an adaptive structural summary for graph-structured data. In Proc. of SIGMOD, pages 134--144, 2003.
[4]
J. Cheng, Y. Ke, and W. Ng. δ-tolerance closed frequent itemsets. In Proc. of ICDM, pages 139--148, 2006.
[5]
C. W. Chung, J. K. Min, and K. Shim. Apex: an adaptive path index for xml data. In Proc. of SIGMOD, pages 121--132, 2002.
[6]
D. J. Cook and L. B. Holder. Substructure discovery using minimum description length and background knowledge. Journal of Artificial Intelligence Research, 1:231--255, 1994.
[7]
S. A. Cook. The complexity of theorem-proving procedures. In Proc. of STOC '71: Proceedings of the third annual ACM symposium on Theory of computing, pages 151--158, 1971.
[8]
B. Cooper, N. Sample, M. J. Franklin, G. R. Hjaltason, and M. Shadmon. A fast index for semistructured data. In Proc. of VLDB, pages 341--350, 2001.
[9]
R. H. Güting. GraphDB: Modeling and querying graphs in databases. In Proc. of VLDB, pages 297--308, 1994.
[10]
R. Goldman and J. Widom. DataGuides: Enabling query formulation and optimization in semistructured databases. In Proc. of VLDB, pages 436--445, 1997.
[11]
L. Holder, D. Cook, and S. Djoko. Substructure discovery in the subdue system. In Proc. of KDD, pages 169--180, 1994.
[12]
J. Huan, W. Wang, J. Prins, and J. Yang. Spin: mining maximal frequent subgraphs from graph databases. In Proc. of KDD, pages 581--586, 2004.
[13]
A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In Proc. of PKDD, pages 13--23, 2000.
[14]
R. Kaushik, P. Bohannon, J. F. Naughton, and H. F.Korth. Covering indexes for branching path queries. In Proc. of SIGMOD, pages 133--144, 2002.
[15]
M. Kuramochi and G. Karypis. Frequent subgraph discovery. In Proc. of ICDM, pages 313--320, 2001.
[16]
T. Milo and D. Suciu. Index structures for path expressions. In Proc. of ICDT, pages 277--295, 1999.
[17]
D. Shasha, J. T. L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In Proc. of PODS, pages 39--52, 2002.
[18]
S. Srinivasa and S. Kumar. A platform based on the multi-dimensional data model for analysis of bio-molecular structures. In Proc. of VLDB, pages 975--986, 2003.
[19]
J. R. Ullmann. An algorithm for subgraph isomorphism. J. ACM, 23(1):31--42, 1976.
[20]
C. Wang, W. Wang, J. Pei, Y. Zhu, and B. Shi. Scalable mining of large disk-based graph databases. In Proc. of KDD, pages 316--325, 2004.
[21]
D. Xin, J. Han, X. Yan, and H. Cheng. Mining compressed frequent-pattern sets. In Proc. of VLDB, pages 709--720, 2005.
[22]
X. Yan and J. Han. gspan: Graph-based substructure pattern mining. In Proc. of ICDM, pages 721--724, 2002.
[23]
X. Yan and J. Han. Closegraph: mining closed frequent graph patterns. In Proc. of KDD, pages 286--295, 2003.
[24]
X. Yan, P. S. Yu, and J. Han. Graph indexing based on discriminative frequent structure analysis. ACM Trans. Database Syst., 30(4):960--993, 2005.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
June 2007
1210 pages
ISBN:9781595936868
DOI:10.1145/1247480
  • General Chairs:
  • Lizhu Zhou,
  • Tok Wang Ling,
  • Program Chair:
  • Beng Chin Ooi
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2007

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. frequent subgraphs
  2. graph databases
  3. graph indexing
  4. graph querying

Qualifiers

  • Article

Conference

SIGMOD/PODS07
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)42
  • Downloads (Last 6 weeks)2
Reflects downloads up to 14 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Tps: A new way to find good vertex-search order for exact subgraph matchingMultimedia Tools and Applications10.1007/s11042-024-18328-383:27(69875-69896)Online publication date: 3-Feb-2024
  • (2024)MHNA: Multi-Hop Neighbors Aware Index for Accelerating Subgraph MatchingWeb and Big Data10.1007/978-981-97-2303-4_19(283-298)Online publication date: 29-May-2024
  • (2024)Fast Subgraph Search with Graph Code IndicesDatabase and Expert Systems Applications10.1007/978-3-031-68309-1_4(43-58)Online publication date: 18-Aug-2024
  • (2023)GuP: Fast Subgraph Matching by Guard-based PruningProceedings of the ACM on Management of Data10.1145/35893121:2(1-26)Online publication date: 20-Jun-2023
  • (2023)Extracting Top-$k$ Frequent and Diversified Patterns in Knowledge GraphsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.3233594(1-18)Online publication date: 2023
  • (2023)Privacy preserving subgraph isomorphism query for dynamic graph databaseJournal of Network and Computer Applications10.1016/j.jnca.2022.103562211(103562)Online publication date: Feb-2023
  • (2022)An Accurate Matching Query Method of Natural Language Knowledge Graph Based on Hierarchical Graph Topological SequenceIEEE Access10.1109/ACCESS.2022.315552010(24080-24094)Online publication date: 2022
  • (2022)Enabling efficient privacy-preserving subgraph isomorphic query over graphsFuture Generation Computer Systems10.1016/j.future.2022.01.027132:C(1-10)Online publication date: 18-May-2022
  • (2022)A survey of continuous subgraph matching for dynamic graphsKnowledge and Information Systems10.1007/s10115-022-01753-x65:3(945-989)Online publication date: 19-Oct-2022
  • (2022)Efficient Subhypergraph Containment Queries on Hypergraph DatabasesWeb Information Systems and Applications10.1007/978-3-031-20309-1_44(497-509)Online publication date: 16-Sep-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media