Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1516360.1516421acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article
Free access

FOGGER: an algorithm for graph generator discovery

Published: 24 March 2009 Publication History

Abstract

To our best knowledge, all existing graph pattern mining algorithms can only mine either closed, maximal or the complete set of frequent subgraphs instead of graph generators which are preferable to the closed subgraphs according to the Minimum Description Length principle in some applications. In this paper, we study a new problem of frequent subgraph mining, called frequent connected graph generator mining, which poses significant challenges due to the underlying complexity associated with frequent subgraph mining as well as the absence of Apriori property for graph generators. Whereas, we still present an efficient solution FOGGER for this new problem. By exploring some properties of graph generators, two effective pruning techniques, backward edge pruning and forward edge pruning, are proposed to prune the branches of the well-known DFS code enumeration tree that do not contain graph generators. To further improve the efficiency, an effective index structure, ADI++, is also devised to facilitate the subgraph isomorphism checking. We experimentally evaluate various aspects of FOGGER using both real and synthetic datasets. Our results demonstrate that the two pruning techniques are effective in pruning the unpromising parts of search space, and FOGGER is efficient and scalable in terms of the base size of input databases. Meanwhile, the performance study for graph generator-based classification model shows that generator-based model is much simpler and can achieve almost the same accuracy for classifying chemical compounds in comparison with closed subgraph-based model.

References

[1]
Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, and L. Lakhal. Mining minimal non-redundant association rules using frequent closed itemsets. In CL '00: Proceedings of the First International Conference on Computational Logic, pages 972--986, London, UK, 2000. Springer-Verlag.
[2]
J.-F. Boulicaut, A. Bykowski, and C. Rigotti. Approximation of frequency queris by means of free-sets. In PKDD '00: Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, pages 75--85, London, UK, 2000. Springer-Verlag.
[3]
J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: towards verification-free query processing on graph databases. In SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 857--872, Beijing, China, 2007. ACM.
[4]
B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1997. Translator-C. Franzke.
[5]
Q. Gao, M. Li, and P. Vitányi. Applying mdl to learn best model granularity. Artificial Intelligence, 121(1--2):1--29, 2000.
[6]
P. D. Grünwald, I. J. Myung, and M. A. Pitt. Advances in Minimum Description Length: Theory and Applications (Neural Information Processing). The MIT Press, 2005.
[7]
P. Grünwald. A tutorial introduction to the minimum description length principle. The Computing Research Repository, math.ST/0406077, 2004.
[8]
H. Hu, X. Yan, Y. Huang, J. Han, and X. J. Zhou. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics, 21(1):213--221, 2005.
[9]
J. Huan, W. Wang, and J. Prins. Efficient mining of frequent subgraphs in the presence of isomorphism. In ICDM '03: Proceedings of the Third IEEE International Conference on Data Mining, page 549, Washington, DC, USA, 2003. IEEE Computer Society.
[10]
J. Huan, W. Wang, J. Prins, and J. Yang. Spin: mining maximal frequent subgraphs from graph databases. In KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 581--586, Seattle, WA, USA, 2004. ACM.
[11]
A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In PKDD '00: Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, pages 13--23, London, UK, 2000. Springer-Verlag.
[12]
M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM '01: Proceedings of the 2001 IEEE International Conference on Data Mining, pages 313--320, Washington, DC, USA, 2001. IEEE Computer Society.
[13]
M. Kuramochi and G. Karypis. Finding frequent patterns in a large sparse graph*. Data Mining and Knowledge Discovery, 11(3):243--271, 2005.
[14]
J. Li, H. Li, L. Wong, J. Pei, and G. Dong. Minimum description length principle: Generators are preferable to closed patterns. In AAAI '06, Proceedings of the Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Application of Artificial Intelligence. AAAI Press, 2006.
[15]
M. Li and P. Vitányi. An introduction to Kolmogorov complexity and its applications (2nd ed.). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1997.
[16]
J. Rissanen. Modelling by the shortest data description. Automatica, 14:465--471, 1978.
[17]
K. Sim, J. Li, V. Gopalkrishnan, and G. Liu. Mining maximal quasi-bicliques to co-cluster stocks and financial ratios for value investment. In ICDM '06: Proceedings of the Sixth International Conference on Data Mining, pages 1059--1063, Washington, DC, USA, 2006. IEEE Computer Society.
[18]
J. Sun, C. Faloutsos, S. Papadimitriou, and P. S. Yu. Graphscope: parameter-free mining of large time-evolving graphs. In KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 687--696, San Jose, California, USA, 2007. ACM.
[19]
L. T. Thomas, S. R. Valluri, and K. Karlapalem. Margin: Maximal frequent subgraph mining. In ICDM '06: Proceedings of the Sixth International Conference on Data Mining, pages 1097--1101, Washington, DC, USA, 2006. IEEE Computer Society.
[20]
R. M. H. Ting and J. Bailey. Mining minimal contrast subgraph patterns. In SDM'06: Proceedings of the Sixth SIAM International Conference on Data Mining, Bethesda, MD, USA, 2006. SIAM.
[21]
H. Tong, C. Faloutsos, and Y. Koren. Fast direction-aware proximity for graph mining. In KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 747--756, San Jose, California, USA, 2007. ACM.
[22]
C. Wang, W. Wang, J. Pei, Y. Zhu, and B. Shi. Scalable mining of large disk-based graph databases. In KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 581--586, Seattle, WA, USA, 2004. ACM.
[23]
J. Wang, W. Hsu, M. L. Lee, and C. Sheng. A partition-based approach to graph mining. In ICDE '06: Proceedings of the 22nd International Conference on Data Engineering, page 74, Atlanta, USA, 2006. IEEE Computer Society.
[24]
J. Wang, Z. Zeng, and L. Zhou. Clan:an algorithm for mining closed cliques from large dense graph databases. In ICDE '06: Proceedings of the 22nd International Conference on Data Engineering, page 73, Atlanta, USA, 2006. IEEE Computer Society.
[25]
D. Williams, J. Huan, and W. Wang. Graph database indexing using structured graph decomposition. In ICDE '07, IEEE 23rd International Conference on Data Engineering, pages 976--985, Istanbul, Turkey, 2007.
[26]
X. Yan and J. H. and. Closegraph: mining closed frequent graph patterns. In KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 286--295, Washington, D.C., 2003. ACM.
[27]
X. Yan and J. Han. gspan: Graph-based substructure pattern mining. In ICDM '02: Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM'02), page 721, Washington, DC, USA, 2002. IEEE Computer Society.
[28]
X. Yan, P. S. Yu, and J. Han. Graph indexing: a frequent structure-based approach. In SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data, pages 335--346, Paris, France, 2004. ACM.
[29]
X. Yan, X. J. Zhou, and J. Han. Mining closed relational graphs with connectivity constraints. In KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pages 324--333, Chicago, Illinois, USA, 2005. ACM.
[30]
M. J. Zaki. Efficiently mining frequent trees in a forest. In KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 71--80, Edmonton, Alberta, Canada, 2002. ACM.
[31]
Z. Zeng, J. Wang, L. Zhou, and G. Karypis. Coherent closed quasi-clique discovery from large dense graph databases. In KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 797--802, Philadelphia, PA, USA, 2006. ACM.
[32]
S. Zhang, J. Yang, and V. Cheedella. Monkey: Approximate graph mining based on spanning trees. In ICDE '07, IEEE 23rd International Conference on Data Engineering, pages 1247--1249, Istanbul, Turkey, 2007.
[33]
P. Zhao, J. X. Yu, and P. S. Yu. Graph indexing: tree + delta <= graph. In VLDB '07: Proceedings of the 33rd international conference on Very large data bases, pages 938--949, Vienna, Austria, 2007. VLDB Endowment.

Cited By

View all
  • (2021)Heuristic extraction of co-occurrence patterns for effective construction of graph-based interpretable decision sets2021 Ninth International Symposium on Computing and Networking Workshops (CANDARW)10.1109/CANDARW53999.2021.00033(159-165)Online publication date: Nov-2021
  • (2020)Extraction of Interpretable Decision Sets in Graph Databases2020 Eighth International Symposium on Computing and Networking (CANDAR)10.1109/CANDAR51075.2020.00028(153-159)Online publication date: Nov-2020
  • (2020)A survey of pattern mining in dynamic graphsWIREs Data Mining and Knowledge Discovery10.1002/widm.1372Online publication date: 20-May-2020
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
March 2009
1180 pages
ISBN:9781605584225
DOI:10.1145/1516360
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 March 2009

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. graph classification
  2. graph generator
  3. graph mining

Qualifiers

  • Research-article

Funding Sources

Conference

EDBT/ICDT '09
EDBT/ICDT '09: EDBT/ICDT '09 joint conference
March 24 - 26, 2009
Saint Petersburg, Russia

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)58
  • Downloads (Last 6 weeks)7
Reflects downloads up to 10 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2021)Heuristic extraction of co-occurrence patterns for effective construction of graph-based interpretable decision sets2021 Ninth International Symposium on Computing and Networking Workshops (CANDARW)10.1109/CANDARW53999.2021.00033(159-165)Online publication date: Nov-2021
  • (2020)Extraction of Interpretable Decision Sets in Graph Databases2020 Eighth International Symposium on Computing and Networking (CANDAR)10.1109/CANDAR51075.2020.00028(153-159)Online publication date: Nov-2020
  • (2020)A survey of pattern mining in dynamic graphsWIREs Data Mining and Knowledge Discovery10.1002/widm.1372Online publication date: 20-May-2020
  • (2019)Extraction of Characteristic Subgraph Patterns with Support Threshold from Databases of Floor Plans2019 Seventh International Symposium on Computing and Networking (CANDAR)10.1109/CANDAR.2019.00033(197-203)Online publication date: Nov-2019
  • (2019)CCSpanKnowledge-Based Systems10.1016/j.knosys.2015.06.01489:C(1-13)Online publication date: 1-Jan-2019
  • (2018)Key roles of closed sets and minimal generators in concise representations of frequent patternsIntelligent Data Analysis10.5555/2595513.259551716:4(581-631)Online publication date: 27-Dec-2018
  • (2017)Efficient Mining of Subsample-Stable Graph Patterns2017 IEEE International Conference on Data Mining (ICDM)10.1109/ICDM.2017.88(757-762)Online publication date: Nov-2017
  • (2017)Discovery of δ-Tolerance Closed Subgraphs on GPGPU2017 Fifth International Symposium on Computing and Networking (CANDAR)10.1109/CANDAR.2017.34(173-179)Online publication date: Nov-2017
  • (2016)Mining Contiguous Sequential Generators in Biological SequencesIEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)10.1109/TCBB.2015.249513213:5(855-867)Online publication date: 1-Sep-2016
  • (2016)Experimental Evaluation of a GPU-based Frequent Subgraph Miner Using Synthetic Databases2016 Fourth International Symposium on Computing and Networking (CANDAR)10.1109/CANDAR.2016.0093(504-507)Online publication date: Nov-2016
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media