Abstract
Frequent-pattern mining has been studied extensively and has many useful applications. However, frequent-pattern mining often generates too many patterns to be truly efficient or effective. In many applications, it is sufficient to generate and examine frequent patterns with a sufficiently good approximation of the support frequency instead of in full precision. Such a compact but “close-enough” frequent-pattern base is called a condensed frequent-pattern base.
In this paper, we propose and examine several alternatives for the design, representation, and implementation of such condensed frequent-pattern bases. Several algorithms for computing such pattern bases are proposed. Their effectiveness at pattern compression and methods for efficiently computing them are investigated. A systematic performance study is conducted on different kinds of databases, and demonstrates the effectiveness and efficiency of our approach in handling frequent-pattern mining in large databases.
Similar content being viewed by others
References
Agarwal R, Aggarwal C, Prasad VVV (2001) A tree projection algorithm for generation of frequent itemsets. J Parallel Distrib Comput 61:350–371
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proc 1994 Int Conf Very Large Data Bases (VLDB’94), Santiago, Chile, pp 487–499
Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proc 1995 Int Conf Data Engineering (ICDE’95), Taipei, Taiwan, pp 3–14
Bayardo RJ (1998) Efficiently mining long patterns from databases. In: Proc 1998 ACM-SIGMOD Int Conf Management of Data (SIGMOD’98), Seattle, USA, pp 85–93
Beil F, Ester M, Xu X (2002) Frequent term-based text clustering. In: Proc 2002 ACM SIGKDD Int Conf Knowledge Discovery in Databases (KDD’02), Edmonton, Canada, pp 436–442
Beyer K, Ramakrishnan R (1999) Bottom-up computation of sparse and iceberg cubes. In: Proc 1999 ACM-SIGMOD Int Conf Management of Data (SIGMOD’99), Philadelphia, USA, pp 359–370
Boulicaut JF, Bykowski A, Rigotti C (2000) Approximation of frequency queries by means of free-sets. In: Principles of Data Mining and Knowledge Discovery, pp 75–85
Brin S, Motwani R, Silverstein C (1997) Beyond market basket: generalizing association rules to correlations. In: Proc 1997 ACM-SIGMOD Int Conf Management of Data (SIGMOD’97), Tucson, USA, pp 265–276
Burdick D, Calimlim M, Gehrke J (2001) MAFIA: a maximal frequent itemset algorithm for transactional databases. In: Proc 2001 Int Conf Data Engineering (ICDE’01), Heidelberg, Germany, pp 443–452
Dong G, Han J, Lam J, Pei J, Wang K (2001) Mining multi-dimensional constrained gradients in data cubes. In: Proc 2001 Int Conf Very Large Data Bases (VLDB’01), Rome, Italy, pp 321–330
Dong G, Li J (1999) Efficient mining of emerging patterns: discovering trends and differences. In: Proc 1999 Int Conf Knowledge Discovery and Data Mining (KDD’99), San Diego, USA, pp 43–52
Gouda K, Zaki MJ (2001) Efficiently mining maximal frequent itemsets. In: ICDM, pp 163–170
Han J, Dong G, Yin Y (1999) Efficient mining of partial periodic patterns in time series database. In: Proc 1999 Int Conf Data Engineering (ICDE’99), Sydney, Australia, pp 106–115
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proc 2000 ACM-SIGMOD Int Conf Management of Data (SIGMOD’00), Dallas, USA, pp 1–12
Imielinski T, Khachiyan L, Abdulghani A (2002) Cubegrades: generalizing association rules. Data Min Knowl Discovery 6:219–258
Kamber M, Han J, Chiang JY (1997) Metarule-guided mining of multi-dimensional association rules using data cubes. In: Proc 1997 Int Conf Knowledge Discovery and Data Mining (KDD’97), Newport Beach, USA, pp 207–210
Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: Proc 1998 Int Conf Knowledge Discovery and Data Mining (KDD’98), New York, USA, pp 80–86
Li W, Han J, Pei J (2001) CMAR: accurate and efficient classification based on multiple class-association rules. In: Proc 2001 Int Conf Data Mining (ICDM’01), San Jose, USA, pp 369–376
Lakshmanan LVS, Ng R, Han J, and Pang A (1999) Optimization of constrained frequent set queries with 2-variable constraints. In: Proc 1999 ACM-SIGMOD Int Conf Management of Data (SIGMOD’99), Philadelphia, USA, pp 157–168
Lent B, Swami A, Widom J (1997) Clustering association rules. In: Proc 1997 Int Conf Data Engineering (ICDE’97), Birmingham, England, pp 220–231,
Mannila H, Toivonen H (1996) Multiple uses of frequent sets and condensed representations (extended abstract). In: Knowledge Discovery and Data Mining, pp 189–194
Mannila H, Toivonen H, Verkamo AI (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discovery 1:259–289
Margaritis D, Faloutsos C, Thrun S (2001) Netcube: a scalable tool for fast data mining and compression. In: Proc 2001 Int Conf Very Large Data Bases (VLDB’01), pp 311–320
Ng R, Lakshmanan LVS, Han J, Pang A (1998) Exploratory mining and pruning optimizations of constrained associations rules. In: Proc 1998 ACM-SIGMOD Int Conf Management of Data (SIGMOD’98), Seattle, USA, pp 13–24
Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Proc 7th Int Conf Database Theory (ICDT’99), Jerusalem, Israel, pp 398–416
Pei J, Han J, Lakshmanan LVS (2001) Mining frequent itemsets with convertible constraints. In: Proc 2001 Int Conf Data Engineering (ICDE’01), Heidelberg, Germany, pp 433–332
Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: Proc 2000 ACM-SIGMOD Int Workshop Data Mining and Knowledge Discovery (DMKD’00), Dallas, USA, pp 11–20
Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M-C (2001) PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proc 2001 Int Conf Data Engineering (ICDE’01), Heidelberg, Germany, pp 215–224
Silverstein C, Brin S, Motwani R, Ullman J (1998) Scalable techniques for mining causal structures. In: Proc 1998 Int Conf Very Large Data Bases (VLDB’98), New York, USA, pp 594–605
Zaki M (2000) Generating non-redundant association rules. In: Proc 2000 ACM SIGKDD Int Conf Knowledge Discovery in Databases (KDD’00), Boston, USA, pp 34–43
Zaki MJ, Hsiao CJ (2002) CHARM: an efficient algorithm for closed itemset mining. In: Proc 2002 SIAM Int Conf Data Mining, Arlington, USA, pp 457–473
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Pei, J., Dong, G., Zou, W. et al. Mining Condensed Frequent-Pattern Bases. Know. Inf. Sys. 6, 570–594 (2004). https://doi.org/10.1007/s10115-003-0133-6
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-003-0133-6