Abstract
Transactional data mining (association rules, decision trees etc.) has been effectively used to find non-trivial patterns in categorical and unstructured data. For applications that have an inherent structure (e.g., social networks, proteins), graph mining is useful since mapping the structured data into a transactional representation will lead to loss of information. Graph mining is used for identifying interesting or frequent subgraphs. Database mining uses SQL and relational representation to overcome limitations of main memory algorithms and to achieve scalability.
This paper presents a scalable, SQL-based approach to graph mining – specifically, interesting substructure discovery. The most general form of graphs including directed edges, multiple edges between nodes, and cycles are handled by our approach. Our primary goal in this work has been to address scalability, and map difficult and computationally expensive problems such as pseudo duplicate elimination, canonical labeling, and isomorphism checking into SQL-based counterparts. The notion of minimum description length (MDL) has been cast into corresponding metric for relational representation. Our experimental analysis shows that graphs with Millions of nodes and edges can be handled by the algorithm and the approach presented in this paper.
This work was supported, in part, by NSF grant IIS 0534611.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Balachandran, R., Padmanabhan, S., Chakravarthy, S.: Enhanced DB-subdue: Supporting subtle aspects of graph mining using a relational approach. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS, vol. 3918, pp. 673–678. Springer, Heidelberg (2006)
Cook, D.J., Holder, L.B.: Graph-based data mining. IEEE Intelligent Systems 15(2), 32–41 (2000)
Inokuchi, A., Washio, T., Motoda, H.: Complete mining of frequent patterns from graphs: Mining graph data. Mach. Learn. 50(3), 321–354 (2003)
Kuramochi, M., Karypis, G.: Frequent subgraph discovery. In: ICDM 2001: Proceedings of the 2001 IEEE International Conference on Data Mining, Washington, DC, USA, pp. 313–320. IEEE Computer Society Press, Los Alamitos (2001)
Mishra, P., Chakravarthy, S.: Performance evaluation and analysis of k-way join variants for association rule mining. In: BNCOD, pp. 95–114 (2003)
Padmanabhan, S.: HDB-Subdue: A relational database approach to graph mining and hierarchical reduction. Master’s thesis, Department of Computer Science and Engineering, University of Texas at Arlington /Students/sharma/theses/Pad05MS.pdf (December 2005), http://itlab.uta.edu/ITLABWEB
Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.: Prefixspan: Mining sequential patterns by prefix-projected growth. In: Proceedings of the 17th International Conference on Data Engineering, Washington, DC, USA, pp. 215–224. IEEE Computer Society Press, Los Alamitos (2001)
Washio, T., Motoda, H.: State of the art of graph-based data mining. SIGKDD Explor. Newsl. 5(1), 59–68 (2003)
Yan, X., Han, J.: gspan: Graph-based substructure pattern mining. In: ICDM 2002: Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), Washington, DC, USA, p. 721. IEEE Computer Society Press, Los Alamitos (2002)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Padmanabhan, S., Chakravarthy, S. (2009). HDB-Subdue: A Scalable Approach to Graph Mining. In: Pedersen, T.B., Mohania, M.K., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2009. Lecture Notes in Computer Science, vol 5691. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03730-6_26
Download citation
DOI: https://doi.org/10.1007/978-3-642-03730-6_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03729-0
Online ISBN: 978-3-642-03730-6
eBook Packages: Computer ScienceComputer Science (R0)