research-article

Output space sampling for graph patterns

Authors: Mohammad Al Hasan, Mohammed J. Zaki

Proceedings of the VLDB Endowment, Volume 2, Issue 1

Pages 730 - 741

https://doi.org/10.14778/1687627.1687710

Published: 01 August 2009

Abstract

Recent interest in graph pattern mining has shifted from finding all frequent subgraphs to obtaining a small subset of frequent subgraphs that are representative, discriminative, or significant. The main motivation is to cope with the scalability problem that graph mining algorithms suffer from when mining databases of large graphs. Another motivation is to obtain a succinct output set that is informative and useful. In the same spirit, researchers have also proposed sampling-based algorithms that sample the output space of frequent patterns to obtain representative subgraphs. In this work, we propose a generic sampling framework, based on the Metropolis-Hastings algorithm, to sample the output space of frequent subgraphs. Our experiments on various sampling strategies show the versatility, utility, and efficiency of the proposed sampling approach.
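
To illustrate the general idea of Metropolis-Hastings sampling over a space of patterns, the sketch below runs a random walk on a tiny hand-made graph of states. It is not the authors' implementation: the abstract does not say how neighboring patterns are defined, so `TOY_NEIGHBORS`, `mh_walk`, and the `weight` function are all hypothetical stand-ins. The walk proposes a uniformly chosen neighbor and accepts it with the standard Metropolis-Hastings ratio, so the visit frequencies should approach the chosen target distribution (uniform in this example) even though the states have different degrees.

```python
import random
from collections import Counter

# Hypothetical toy state space: each key stands in for a pattern, each list
# holds its neighbors. Adjacency is symmetric and degrees are unequal.
TOY_NEIGHBORS = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A", "D"],
    "D": ["A", "C"],
}


def mh_walk(neighbors, weight, start, steps, rng):
    """Run a Metropolis-Hastings random walk and count visits per state.

    Proposal: pick a neighbor of the current state uniformly at random.
    Target: probability proportional to weight(state).
    """
    visits = Counter()
    current = start
    for _ in range(steps):
        candidate = rng.choice(neighbors[current])
        # Ratio pi(v) q(v->u) / (pi(u) q(u->v)), where q(u->v) = 1 / deg(u).
        ratio = (weight(candidate) * len(neighbors[current])) / (
            weight(current) * len(neighbors[candidate])
        )
        if rng.random() < min(1.0, ratio):
            current = candidate  # accept the proposed move
        visits[current] += 1  # a rejected move counts the current state again
    return visits


if __name__ == "__main__":
    steps = 100_000
    counts = mh_walk(TOY_NEIGHBORS, lambda s: 1.0, "A", steps, random.Random(0))
    for state in sorted(TOY_NEIGHBORS):
        print(state, round(counts[state] / steps, 3))
```

Passing a different `weight` function (for example, one that scores a state by some interestingness measure) changes the target distribution without changing the walk itself, which is the sense in which such a framework can be generic.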



Published In

Proceedings of the VLDB Endowment, Volume 2, Issue 1, August 2009, 1293 pages

ISSN: 2150-8097

Publisher: VLDB Endowment
