Abstract
Frequent pattern mining (FPM) is an important data mining paradigm to extract informative patterns like itemsets, sequences, trees, and graphs. However, no practical framework for integrating the FPM tasks has been attempted. In this paper, we describe the design and implementation of the Data Mining Template Library (DMTL) for FPM. DMTL utilizes a generic data mining approach, where all aspects of mining are controlled via a set of properties. It uses a novel pattern property hierarchy to define and mine different pattern types. This property hierarchy can be thought of as a systematic characterization of the pattern space, i.e., a meta-pattern specification that allows the analyst to specify new pattern types, by extending this hierarchy. Furthermore, in DMTL all aspects of mining are controlled by a set of different mining properties. For example, the kind of mining approach to use, the kind of data types and formats to mine over, the kind of back-end storage manager to use, are all specified as a list of properties. This provides tremendous flexibility to customize the toolkit for various applications. Flexibility of the toolkit is exemplified by the ease with which support for a new pattern can be added. Experiments on synthetic and public dataset are conducted to demonstrate the scalability provided by the persistent back-end in the library. DMTL been publicly released as open-source software (http://dmtl.sourceforge.net/), and has been downloaded by numerous researchers from all over the world.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: ACM SIGMOD conference on management of data
Agrawal R, Mannila H, Srikant R, Toivonen H, Verkamo AI (1996) Fast discovery of association rules. In: Advances in knowledge discovery and data mining. AAAI Press, Menlo Park, CA, pp 307–328
Agrawal R, Srikant R (1995) Mining sequential patterns. In: 11th International conference on data engineering
Antunes C, Oliveira AL (2004) Sequential pattern mining algorithms: Trade-offs between speed and memory. In: 2nd International workshop on mining graphs, trees and sequences with ECML/PKDD
Asai T, Abe K, Kawasoe S, Arimura H, Satamoto H, Arikawa S (2002) Efficient substructure discovery from large semi-structured data. In: 2nd SIAM international conference on data mining
Asai T, Arimura H, Uno T, Nakano S (2003) Discovering frequent substructures in large unordered trees. In: 6th International conference on discovery science
Ayres J, Flannick J, Gehrke JE, Yiu T (2002) Sequential pattern mining using a bitmap representation. In: ACM SIGKDD international conference on knowledge discovery and data mining
Balcazar JL, Casas-Garriga G (2005) On horn axiomatizations for sequential data. In: 10th International confererence on database theory
Bayardo RJ Jr (1998) Efficiently mining long patterns from databases. In: SIGMOD ’98: Proceedings of the 1998 ACM SIGMOD international conference on Management of data. ACM, New York, USA, pp 85–93
Brin S, Motwani R, Ullman J, Tsur S (1997) Dynamic itemset counting and implication rules for market basket data. In: ACM SIGMOD conference on management of data
Buehrer G, Parthasarathy S, Ghoting A (2006). Out-of-core frequent pattern mining on a commodity PC. In: ACM SIGKDD international conference on knowledge discovery and data mining
Burdick D, Calimlim M, Gehrke J (2001a) MAFIA: a maximal frequent itemset algorithm for transactional databases. In: IEEE international conference on data engineering
Burdick D, Calimlim M, Gehrke J (2001b) MAFIA: a maximal frequent itemset algorithm for transactional databases. In: 17th International conference on data engineering
Chi Y, Yang Y, Muntz RR (2003) Indexing and mining free trees. In: 3rd IEEE international conference on data mining
Chi Y, Yang Y, Muntz RR (2004a) HybridTreeMiner: an efficient algorihtm for mining frequent rooted trees and free trees using canonical forms. In: 16th International conference on scientific and statistical database management
Chi Y, Yang Y, Xia Y, Muntz RR (2004b) CMTreeMiner: mining both closed and maximal frequent subtrees. In: 8th Pacific-Asia conferernce on knowledge discovery and data mining
Cook D, Holder L (1994) Substructure discovery using minimal description length and background knowledge. J Arti Intell Res 1: 231–255
Dehaspe L, Toivonen H, King R (1998) Finding frequent substructures in chemical compounds. In: 4th ACM SIGKDD international conference knowledge discovery and data mining
Ganter B, Wille R (1999) Formal concept analysis: mathematical foundations. Springer-Verlag
Garofalakis M, Rastogi R, Shim K (1999) SPIRIT: sequential pattern mining with regular expression constraints. In: 25th International conference on very large data bases
Ghoting A, Buehrer G, Parthasarathy S, Kim D, Nguyen A, Chen Y-K, Dubey P (2005) Cache-conscious frequent pattern mining on a modern processor. In: 31st International conference on very large data bases
Goethals B, Zaki MJ (2003) Advances in frequent itemset mining implementations: report on FIMI’03. SIGKDD Explor 6: 109–117
Gschwind T (2001) PSTL—A C++ Persistent Standard Template Library. In: 6th USENIX conference on object-oriented technologies and systems
Han J, Pei J, Yin Y (2000a) Mining frequent patterns without candidate generation. In: ACM SIGMOD conference on management of data
Han J, Pei J, Yin Y (2000b). Mining frequent patterns without candidate generation. In: ACM SIGMOD conference on management of data
Hasan MA, Chaoji V, Salem S, Zaki M (2005) DMTL: A generic Data Mining Template Library. In: 1st Workshop on library-centric software design (with OOPSLA)
Horváth T, Ramon J, Wrobel S (2006) Frequent subgraph mining in outerplanar graphs. In: 12th ACM SIGKDD international conference on knowledge discovery and data mining
Huan J, Wang W, Prins J (2003a) Efficient mining of frequent subgraphs in the presence of isomorphism. In: IEEE international conference on data mining
Huan J, Wang W, Prins J (2003b) Efficient mining of frequent subgraphs in the presence of isomorphism (Technical report TR03-021). University of North Carolina
Inokuchi A, Washio T, Motoda H (2000) An apriori-based algorithm for mining frequent substructures from graph data. In: 4th European conference on principles of knowledge discovery and data mining
Inokuchi A, Washio T, Motoda H (2003) Complete mining of frequent patterns from graphs: Mining graph data. Machine Learn 50: 321–354
Kramer S, Raedt LD, Helma C (2001) Molecular feature mining in HIV data. In: ACM SIGKDD international conference on knowledge discovery and data mining
Kuramochi M, Karypis G (2001). Frequent subgraph discovery. In: 1st IEEE international conference on data mining
Kuramochi M, Karypis G (2004) An efficient algorithm for discovering frequent subgraphs. IEEE Transactions on Knowledge and Data Engineering 16: 1038–1051
Mannila H, Toivonen H. (1996) Discovering generalized episodes using minimal occurences. In: 2nd International conference knowledge discovery and data mining
Mannila H, Toivonen H. (1997) Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery 1: 241–258
Mannila H, Toivonen H, Verkamo I (1995) Discovering frequent episodes in sequences. In: 1st International conference knowledge discovery and data mining
Musser D, Derge G, Saini A. (2001) STL tutorial and reference guide, 2nd edition. Addison-Wesley
Nijssen S, Kok J (2003) Efficient discovery of frequent unordered trees. In: 1st Internationall workshop on mining graphs, trees and sequences
Nijssen S, Kok J (2004) A quickstart in frequent structure mining can make a difference. In: ACM SIGKDD international conference on knowledge discovery and data mining
Oates T, Schmill MD, Jensen D, Cohen PR (1997) A family of algorithms for finding temporal structure in data. In: 6th International workshop on AI and statistics
Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: 7th International conference on database theory
Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M-C (2001). PrefixSpan: mining sequential patterns efficiently by prefixprojected pattern growth. In: IEEE international conference on data engineering
Savasere A, Omiecinski E, Navathe S (1995) An efficient algorithm for mining association rules in large databases. In: 21st International conference on very large data bases
Shasha D, Wang J, Zhang S (2004) Unordered tree mining with applications to phylogeny. In: IEEE international conference on data engineering
Siek J, Lee L, Lumsdaine A (2002). The boost graph library. Addison-Wesley
Srikant R, Agrawal R (1996) Mining sequential patterns: Generalizations and performance improvements. In: 5th International conference extending database technology
Termier A, Rousset M-C, Sebag M (2002) TreeFinder: a first step towards xml data mining. In: IEEE international conference on data mining
Termier A, Rousset M-C, Sebag M (2004) Dryade: a new approach for discovering closed frequent trees in heterogeneous tree databases. In: IEEE international conference on data mining
Ullmann JR (1976) An algorithm for subgraph isomorphism. J ACM 23: 31–
Wang C, Hong M, Pei J, Zhou H, Wang W, Shi B (2004) Efficient pattern-growth methods for frequent tree pattern mining. In: Pacific-Asia conference on knowledge discovery and data mining
Wang J, Han J (2004) BIDE: efficient mining of frequent closed sequences. In: IEEE international conference on data engineering
Wang J, Han J, Pei J. (2003). CLOSET+: searching for the best strategies for mining frequent closed itemsets. In: ACM SIGKDD international conference on knowledge discovery and data mining
Wang K, Liu H (1998) Discovering typical structures of documents: a road map approach. In: ACM SIGIR international conference on information retrieval
Witten I, Frank E (1999) Data mining: practical machine learning tools and techniques with java implementations. Morgan Kauffman
Xiao Y, Yao J-F, Li Z, Dunham MH (2003) Efficient data mining for maximal frequent subtrees. In: IEEE international conference on data mining
Yan X, Han J (2002a) gSpan: graph-based substructure pattern mining. In: IEEE international conference on data mining
Yan X, Han J (2002b) gSpan: graph-based substructure pattern mining (Technical report UIUCDCS-R-2002-2296). University of Illinois at Urbana-Champaign
Yan X, Han J (2003) CloseGraph: mining closed frequent graph patterns In: ACM SIGKDD international conference on knowledge discovery and data mining
Yoshida K, Motoda H (1995) CLIP: concept learning from inference patterns. Artif Intel 75: 63–92
Zaki MJ (2000a) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12: 372–390
Zaki MJ (2000b) Sequences mining in categorical domains: Incorporating constraints. In: 9th International conference on information and knowledge management
Zaki MJ (2001) SPADE: An efficient algorithm for mining frequent sequences. Machine Learn J 42: 31–60
Zaki MJ (2002) Efficiently mining frequent trees in a forest. In: 8th ACM SIGKDD international conference on knowledge discovery and data mining
Zaki MJ (2005a) Efficiently mining frequent embedded unordered trees. Fundamenta Informaticae 66: 33–52
Zaki MJ (2005b) Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Trans Knowledge Data Eng 17: 1021–1035
Zaki MJ, Gouda K (2003) Fast vertical mining using Diffsets. In: 9th ACM SIGKDD international conference on knowledge discovery and data mining, pp 326–335
Zaki MJ, Hsiao C-J (2002) ChARM: an efficient algorithm for closed itemset mining. In: 2nd SIAM international conference on data mining
Zaki MJ, Hsiao C-J (2005) Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans Knowl Data Eng 17: 462–478
Zaki MJ, Parimi N, De N, Gao F, Phoophakdee B, Urban J, Chaoji V, Hasan M, Salem S (2004) Towards generic pattern mining. In: International conference on formal concept analysis (Invited paper)
Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997) New algorithms for fast discovery of association rules. In: 3rd International conference on knowledge discovery and data mining
Zou B, Ma X, Kemme B, Newton G, Precu D (2006) Data mining using relational database management systems. In: Pacific-asia conference on knowledge discovery and data mining
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: Hannu Toivonen.
Rights and permissions
About this article
Cite this article
Chaoji, V., Al Hasan, M., Salem, S. et al. An integrated, generic approach to pattern mining: data mining template library. Data Min Knowl Disc 17, 457–495 (2008). https://doi.org/10.1007/s10618-008-0098-x
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-008-0098-x