Abstract
Mining frequent patterns from datasets is one of the key successes of data mining research. Most current studies focus on datasets whose elements are independent, such as the items in a market basket. In the real world, however, objects are often closely related to one another. This paper addresses how to extract frequent patterns from such relations. The authors model the relations as graphs and select a simple type of graph for analysis. Combining graph theory with algorithms for generating frequent patterns, they propose a new algorithm, Topology, that mines these graphs efficiently. The algorithm's performance is evaluated through experiments on synthetic and real datasets, and the results show that Topology performs well. The paper concludes by outlining potential improvements.
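To make the graph setting concrete, the following is a minimal sketch of the support-counting step that frequent subgraph miners generally start from: each graph in the set is treated as one "transaction", and a pattern is frequent if it occurs in at least `min_support` graphs. It counts only single-edge patterns. It is not the Topology algorithm, whose details are not given here. The graph representation and the names `edge_patterns` and `frequent_edges` are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# Assumed representation (illustrative only): vertex labels plus undirected
# edges given as (u, v, edge_label).
Graph = Tuple[Dict[int, str], List[Tuple[int, int, str]]]
# One-edge pattern in canonical form: (smaller vertex label, edge label, larger vertex label).
EdgePattern = Tuple[str, str, str]


def edge_patterns(graph: Graph) -> FrozenSet[EdgePattern]:
    """Return the distinct one-edge patterns that occur in a single graph."""
    labels, edges = graph
    patterns = set()
    for u, v, elabel in edges:
        # Sort endpoint labels so C-O and O-C map to the same undirected pattern.
        a, b = sorted((labels[u], labels[v]))
        patterns.add((a, elabel, b))
    return frozenset(patterns)


def frequent_edges(graphs: List[Graph], min_support: int) -> Dict[EdgePattern, int]:
    """Keep one-edge patterns contained in at least min_support graphs.

    Support is counted per graph: a pattern occurring several times
    in the same graph contributes only once.
    """
    support: Counter = Counter()
    for g in graphs:
        support.update(edge_patterns(g))
    return {p: c for p, c in support.items() if c >= min_support}


# Tiny hypothetical graph set (labels chosen to resemble molecules).
g1 = ({0: "C", 1: "O", 2: "C"}, [(0, 1, "single"), (1, 2, "single")])
g2 = ({0: "C", 1: "O"}, [(0, 1, "double")])
g3 = ({0: "O", 1: "C", 2: "N"}, [(0, 1, "single"), (1, 2, "single")])

print(frequent_edges([g1, g2, g3], min_support=2))
# → {('C', 'single', 'O'): 2}
```

A complete miner would grow these frequent edges into larger connected candidates and, for each candidate, would have to decide whether it is contained in each graph. That containment test is a subgraph isomorphism problem, and it is where most of the cost of graph pattern mining lies.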
Additional information
This work was supported by the National Natural Science Foundation of China (Grant Nos. 69933030 and 60303008) and the National High-Technology Development 863 Program of China (Grant No. 2002AA4Z3430).
Wei Wang received the B.S. degree in computer science from Shandong University in 1992 and the Ph.D. degree in computer science from Fudan University in 1998. He is now a professor in the Department of Computing and Information Technology, Fudan University. His research interests include databases, data warehousing, and data mining.
Qing-Qing Yuan received the B.S. and M.S. degrees in computer science from Fudan University in 2000 and 2003, respectively. She is now a Ph.D. candidate in the Department of Computer Science, University of California, Santa Barbara. Her research interests include databases and data mining.
Hao-Feng Zhou received the B.S. degree in computer science from Shanghai University in 1997, and the M.S. and Ph.D. degrees in computer science from Fudan University in 2000 and 2003, respectively. His research interests include databases and data mining.
Ming-Sheng Hong received the B.S. degree in computer science from Fudan University in 2002 and is now a Ph.D. candidate in the Department of Computer Science, Cornell University, with research interests in databases and data mining.
Bai-Le Shi received the B.S. degree in mathematics from Peking University in 1957. He is a professor in the Department of Computing and Information Technology, Fudan University, and director of the Shanghai (International) Database Research Center. His research interests include databases, data warehousing, and digital libraries.
About this article
Cite this article
Wang, W., Yuan, QQ., Zhou, HF. et al. Extracting frequent connected subgraphs from large graph sets. J. Comput. Sci. & Technol. 19, 867–875 (2004). https://doi.org/10.1007/BF02973450
DOI: https://doi.org/10.1007/BF02973450