research-article

Free access

FOGGER: an algorithm for graph generator discovery

Authors:

Lizhu ZhouAuthors Info & Claims

EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology

Pages 517 - 528

https://doi.org/10.1145/1516360.1516421

Published: 24 March 2009 Publication History

Abstract

To our best knowledge, all existing graph pattern mining algorithms can only mine either closed, maximal or the complete set of frequent subgraphs instead of graph generators which are preferable to the closed subgraphs according to the Minimum Description Length principle in some applications. In this paper, we study a new problem of frequent subgraph mining, called frequent connected graph generator mining, which poses significant challenges due to the underlying complexity associated with frequent subgraph mining as well as the absence of Apriori property for graph generators. Whereas, we still present an efficient solution FOGGER for this new problem. By exploring some properties of graph generators, two effective pruning techniques, backward edge pruning and forward edge pruning, are proposed to prune the branches of the well-known DFS code enumeration tree that do not contain graph generators. To further improve the efficiency, an effective index structure, ADI++, is also devised to facilitate the subgraph isomorphism checking. We experimentally evaluate various aspects of FOGGER using both real and synthetic datasets. Our results demonstrate that the two pruning techniques are effective in pruning the unpromising parts of search space, and FOGGER is efficient and scalable in terms of the base size of input databases. Meanwhile, the performance study for graph generator-based classification model shows that generator-based model is much simpler and can achieve almost the same accuracy for classifying chemical compounds in comparison with closed subgraph-based model.

References

[1]

Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, and L. Lakhal. Mining minimal non-redundant association rules using frequent closed itemsets. In CL '00: Proceedings of the First International Conference on Computational Logic, pages 972--986, London, UK, 2000. Springer-Verlag.

Digital Library

[2]

J.-F. Boulicaut, A. Bykowski, and C. Rigotti. Approximation of frequency queris by means of free-sets. In PKDD '00: Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, pages 75--85, London, UK, 2000. Springer-Verlag.

Digital Library

[3]

J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: towards verification-free query processing on graph databases. In SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 857--872, Beijing, China, 2007. ACM.

Digital Library

[4]

B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1997. Translator-C. Franzke.

Digital Library

[5]

Q. Gao, M. Li, and P. Vitányi. Applying mdl to learn best model granularity. Artificial Intelligence, 121(1--2):1--29, 2000.

Digital Library

[6]

P. D. Grünwald, I. J. Myung, and M. A. Pitt. Advances in Minimum Description Length: Theory and Applications (Neural Information Processing). The MIT Press, 2005.

Digital Library

[7]

P. Grünwald. A tutorial introduction to the minimum description length principle. The Computing Research Repository, math.ST/0406077, 2004.

[8]

H. Hu, X. Yan, Y. Huang, J. Han, and X. J. Zhou. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics, 21(1):213--221, 2005.

Digital Library

[9]

J. Huan, W. Wang, and J. Prins. Efficient mining of frequent subgraphs in the presence of isomorphism. In ICDM '03: Proceedings of the Third IEEE International Conference on Data Mining, page 549, Washington, DC, USA, 2003. IEEE Computer Society.

Digital Library

[10]

J. Huan, W. Wang, J. Prins, and J. Yang. Spin: mining maximal frequent subgraphs from graph databases. In KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 581--586, Seattle, WA, USA, 2004. ACM.

Digital Library

[11]

A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In PKDD '00: Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, pages 13--23, London, UK, 2000. Springer-Verlag.

Digital Library

[12]

M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM '01: Proceedings of the 2001 IEEE International Conference on Data Mining, pages 313--320, Washington, DC, USA, 2001. IEEE Computer Society.

Digital Library

[13]

M. Kuramochi and G. Karypis. Finding frequent patterns in a large sparse graph*. Data Mining and Knowledge Discovery, 11(3):243--271, 2005.

Digital Library

[14]

J. Li, H. Li, L. Wong, J. Pei, and G. Dong. Minimum description length principle: Generators are preferable to closed patterns. In AAAI '06, Proceedings of the Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Application of Artificial Intelligence. AAAI Press, 2006.

Digital Library

[15]

M. Li and P. Vitányi. An introduction to Kolmogorov complexity and its applications (2nd ed.). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1997.

Digital Library

[16]

J. Rissanen. Modelling by the shortest data description. Automatica, 14:465--471, 1978.

Digital Library

[17]

K. Sim, J. Li, V. Gopalkrishnan, and G. Liu. Mining maximal quasi-bicliques to co-cluster stocks and financial ratios for value investment. In ICDM '06: Proceedings of the Sixth International Conference on Data Mining, pages 1059--1063, Washington, DC, USA, 2006. IEEE Computer Society.

Digital Library

[18]

J. Sun, C. Faloutsos, S. Papadimitriou, and P. S. Yu. Graphscope: parameter-free mining of large time-evolving graphs. In KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 687--696, San Jose, California, USA, 2007. ACM.

Digital Library

[19]

L. T. Thomas, S. R. Valluri, and K. Karlapalem. Margin: Maximal frequent subgraph mining. In ICDM '06: Proceedings of the Sixth International Conference on Data Mining, pages 1097--1101, Washington, DC, USA, 2006. IEEE Computer Society.

Digital Library

[20]

R. M. H. Ting and J. Bailey. Mining minimal contrast subgraph patterns. In SDM'06: Proceedings of the Sixth SIAM International Conference on Data Mining, Bethesda, MD, USA, 2006. SIAM.

[21]

H. Tong, C. Faloutsos, and Y. Koren. Fast direction-aware proximity for graph mining. In KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 747--756, San Jose, California, USA, 2007. ACM.

Digital Library

[22]

C. Wang, W. Wang, J. Pei, Y. Zhu, and B. Shi. Scalable mining of large disk-based graph databases. In KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 581--586, Seattle, WA, USA, 2004. ACM.

Digital Library

[23]

J. Wang, W. Hsu, M. L. Lee, and C. Sheng. A partition-based approach to graph mining. In ICDE '06: Proceedings of the 22nd International Conference on Data Engineering, page 74, Atlanta, USA, 2006. IEEE Computer Society.

Digital Library

[24]

J. Wang, Z. Zeng, and L. Zhou. Clan:an algorithm for mining closed cliques from large dense graph databases. In ICDE '06: Proceedings of the 22nd International Conference on Data Engineering, page 73, Atlanta, USA, 2006. IEEE Computer Society.

Digital Library

[25]

D. Williams, J. Huan, and W. Wang. Graph database indexing using structured graph decomposition. In ICDE '07, IEEE 23rd International Conference on Data Engineering, pages 976--985, Istanbul, Turkey, 2007.

[26]

X. Yan and J. H. and. Closegraph: mining closed frequent graph patterns. In KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 286--295, Washington, D.C., 2003. ACM.

Digital Library

[27]

X. Yan and J. Han. gspan: Graph-based substructure pattern mining. In ICDM '02: Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM'02), page 721, Washington, DC, USA, 2002. IEEE Computer Society.

Digital Library

[28]

X. Yan, P. S. Yu, and J. Han. Graph indexing: a frequent structure-based approach. In SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data, pages 335--346, Paris, France, 2004. ACM.

Digital Library

[29]

X. Yan, X. J. Zhou, and J. Han. Mining closed relational graphs with connectivity constraints. In KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pages 324--333, Chicago, Illinois, USA, 2005. ACM.

Digital Library

[30]

M. J. Zaki. Efficiently mining frequent trees in a forest. In KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 71--80, Edmonton, Alberta, Canada, 2002. ACM.

Digital Library

[31]

Z. Zeng, J. Wang, L. Zhou, and G. Karypis. Coherent closed quasi-clique discovery from large dense graph databases. In KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 797--802, Philadelphia, PA, USA, 2006. ACM.

Digital Library

[32]

S. Zhang, J. Yang, and V. Cheedella. Monkey: Approximate graph mining based on spanning trees. In ICDE '07, IEEE 23rd International Conference on Data Engineering, pages 1247--1249, Istanbul, Turkey, 2007.

[33]

P. Zhao, J. X. Yu, and P. S. Yu. Graph indexing: tree + delta <= graph. In VLDB '07: Proceedings of the 33rd international conference on Very large data bases, pages 938--949, Vienna, Austria, 2007. VLDB Endowment.

Digital Library

Cited By

Matsuyama KOzaki T(2021)Heuristic extraction of co-occurrence patterns for effective construction of graph-based interpretable decision sets2021 Ninth International Symposium on Computing and Networking Workshops (CANDARW)10.1109/CANDARW53999.2021.00033(159-165)Online publication date: Nov-2021
https://doi.org/10.1109/CANDARW53999.2021.00033
Matsuyama KOzaki T(2020)Extraction of Interpretable Decision Sets in Graph Databases2020 Eighth International Symposium on Computing and Networking (CANDAR)10.1109/CANDAR51075.2020.00028(153-159)Online publication date: Nov-2020
https://doi.org/10.1109/CANDAR51075.2020.00028
Fournier‐Viger PHe GCheng CLi JZhou MLin JYun U(2020)A survey of pattern mining in dynamic graphsWIREs Data Mining and Knowledge Discovery10.1002/widm.1372Online publication date: 20-May-2020
https://doi.org/10.1002/widm.1372
Show More Cited By

Recommendations

Graph Classification via Graph Structure Learning
Intelligent Information and Database Systems
Abstract
With the ability of representing structures and complex relationships between data, graph learning is widely applied in many fields. The problem of graph classification is important in graph analysis and learning. There are many popular graph ...
A Truss-Based Framework for Graph Similarity Computation

The study of graph kernels has been an important area of graph analysis, which is widely used to solve the similarity problems between graphs. Most of the existing graph kernels consider either local or global properties of the graph, and there are ...
Graph partitioning and visualization in graph mining: a survey
Abstract
Graph mining is a process of obtaining one or more sub-graphs and has been a very attractive research topic over the last two decades. It has found many practical applications dealing with real world problems in variety of domains like Social ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences

EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology

March 2009

1180 pages

ISBN:9781605584225

DOI:10.1145/1516360

Editors:
Martin Kersten
CWI, The Netherlands
,
Boris Novikov
University of Saint Petersburg, Russia
,
Jens Teubner
ETH Zurich, Switzerland
,
Vladimir Polutin
HP Labs, Russia
,
Stefan Manegold
CWI, The Netherlands

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 March 2009

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Conference

EDBT/ICDT '09

EDBT/ICDT '09: EDBT/ICDT '09 joint conference

March 24 - 26, 2009

Saint Petersburg, Russia

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

15
Total Citations
View Citations
366
Total Downloads

Downloads (Last 12 months)58
Downloads (Last 6 weeks)7

Reflects downloads up to 10 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Matsuyama KOzaki T(2021)Heuristic extraction of co-occurrence patterns for effective construction of graph-based interpretable decision sets2021 Ninth International Symposium on Computing and Networking Workshops (CANDARW)10.1109/CANDARW53999.2021.00033(159-165)Online publication date: Nov-2021
https://doi.org/10.1109/CANDARW53999.2021.00033
Matsuyama KOzaki T(2020)Extraction of Interpretable Decision Sets in Graph Databases2020 Eighth International Symposium on Computing and Networking (CANDAR)10.1109/CANDAR51075.2020.00028(153-159)Online publication date: Nov-2020
https://doi.org/10.1109/CANDAR51075.2020.00028
Fournier‐Viger PHe GCheng CLi JZhou MLin JYun U(2020)A survey of pattern mining in dynamic graphsWIREs Data Mining and Knowledge Discovery10.1002/widm.1372Online publication date: 20-May-2020
https://doi.org/10.1002/widm.1372
Ozaki T(2019)Extraction of Characteristic Subgraph Patterns with Support Threshold from Databases of Floor Plans2019 Seventh International Symposium on Computing and Networking (CANDAR)10.1109/CANDAR.2019.00033(197-203)Online publication date: Nov-2019
https://doi.org/10.1109/CANDAR.2019.00033
(2019)CCSpanKnowledge-Based Systems10.1016/j.knosys.2015.06.01489:C(1-13)Online publication date: 1-Jan-2019
https://dl.acm.org/doi/10.1016/j.knosys.2015.06.014
Hamrouni T(2018)Key roles of closed sets and minimal generators in concise representations of frequent patternsIntelligent Data Analysis10.5555/2595513.259551716:4(581-631)Online publication date: 27-Dec-2018
https://dl.acm.org/doi/10.5555/2595513.2595517
Buzmakov AKuznetsov SNapoli A(2017)Efficient Mining of Subsample-Stable Graph Patterns2017 IEEE International Conference on Data Mining (ICDM)10.1109/ICDM.2017.88(757-762)Online publication date: Nov-2017
https://doi.org/10.1109/ICDM.2017.88
Toki TOzaki T(2017)Discovery of δ-Tolerance Closed Subgraphs on GPGPU2017 Fifth International Symposium on Computing and Networking (CANDAR)10.1109/CANDAR.2017.34(173-179)Online publication date: Nov-2017
https://doi.org/10.1109/CANDAR.2017.34
Zhang JWang YZhang CShi Y(2016)Mining Contiguous Sequential Generators in Biological SequencesIEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)10.1109/TCBB.2015.249513213:5(855-867)Online publication date: 1-Sep-2016
https://dl.acm.org/doi/10.1109/TCBB.2015.2495132
Toki TOzaki T(2016)Experimental Evaluation of a GPU-based Frequent Subgraph Miner Using Synthetic Databases2016 Fourth International Symposium on Computing and Networking (CANDAR)10.1109/CANDAR.2016.0093(504-507)Online publication date: Nov-2016
https://doi.org/10.1109/CANDAR.2016.0093
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Media

Figures

Other

Tables

View Table of Contents