Mining Condensed Frequent-Pattern Bases

Jian Pei¹,
Guozhu Dong²,
Wei Zou³ &
…
Jiawei Han³

119 Accesses
Explore all metrics

Abstract

Frequent-pattern mining has been studied extensively and has many useful applications. However, frequent-pattern mining often generates too many patterns to be truly efficient or effective. In many applications, it is sufficient to generate and examine frequent patterns with a sufficiently good approximation of the support frequency instead of in full precision. Such a compact but “close-enough” frequent-pattern base is called a condensed frequent-pattern base.

In this paper, we propose and examine several alternatives for the design, representation, and implementation of such condensed frequent-pattern bases. Several algorithms for computing such pattern bases are proposed. Their effectiveness at pattern compression and methods for efficiently computing them are investigated. A systematic performance study is conducted on different kinds of databases, and demonstrates the effectiveness and efficiency of our approach in handling frequent-pattern mining in large databases.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Agarwal R, Aggarwal C, Prasad VVV (2001) A tree projection algorithm for generation of frequent itemsets. J Parallel Distrib Comput 61:350–371
Article MATH Google Scholar
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proc 1994 Int Conf Very Large Data Bases (VLDB’94), Santiago, Chile, pp 487–499
Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proc 1995 Int Conf Data Engineering (ICDE’95), Taipei, Taiwan, pp 3–14
Bayardo RJ (1998) Efficiently mining long patterns from databases. In: Proc 1998 ACM-SIGMOD Int Conf Management of Data (SIGMOD’98), Seattle, USA, pp 85–93
Beil F, Ester M, Xu X (2002) Frequent term-based text clustering. In: Proc 2002 ACM SIGKDD Int Conf Knowledge Discovery in Databases (KDD’02), Edmonton, Canada, pp 436–442
Beyer K, Ramakrishnan R (1999) Bottom-up computation of sparse and iceberg cubes. In: Proc 1999 ACM-SIGMOD Int Conf Management of Data (SIGMOD’99), Philadelphia, USA, pp 359–370
Boulicaut JF, Bykowski A, Rigotti C (2000) Approximation of frequency queries by means of free-sets. In: Principles of Data Mining and Knowledge Discovery, pp 75–85
Brin S, Motwani R, Silverstein C (1997) Beyond market basket: generalizing association rules to correlations. In: Proc 1997 ACM-SIGMOD Int Conf Management of Data (SIGMOD’97), Tucson, USA, pp 265–276
Burdick D, Calimlim M, Gehrke J (2001) MAFIA: a maximal frequent itemset algorithm for transactional databases. In: Proc 2001 Int Conf Data Engineering (ICDE’01), Heidelberg, Germany, pp 443–452
Dong G, Han J, Lam J, Pei J, Wang K (2001) Mining multi-dimensional constrained gradients in data cubes. In: Proc 2001 Int Conf Very Large Data Bases (VLDB’01), Rome, Italy, pp 321–330
Dong G, Li J (1999) Efficient mining of emerging patterns: discovering trends and differences. In: Proc 1999 Int Conf Knowledge Discovery and Data Mining (KDD’99), San Diego, USA, pp 43–52
Gouda K, Zaki MJ (2001) Efficiently mining maximal frequent itemsets. In: ICDM, pp 163–170
Han J, Dong G, Yin Y (1999) Efficient mining of partial periodic patterns in time series database. In: Proc 1999 Int Conf Data Engineering (ICDE’99), Sydney, Australia, pp 106–115
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proc 2000 ACM-SIGMOD Int Conf Management of Data (SIGMOD’00), Dallas, USA, pp 1–12
Imielinski T, Khachiyan L, Abdulghani A (2002) Cubegrades: generalizing association rules. Data Min Knowl Discovery 6:219–258
Article Google Scholar
Kamber M, Han J, Chiang JY (1997) Metarule-guided mining of multi-dimensional association rules using data cubes. In: Proc 1997 Int Conf Knowledge Discovery and Data Mining (KDD’97), Newport Beach, USA, pp 207–210
Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: Proc 1998 Int Conf Knowledge Discovery and Data Mining (KDD’98), New York, USA, pp 80–86
Li W, Han J, Pei J (2001) CMAR: accurate and efficient classification based on multiple class-association rules. In: Proc 2001 Int Conf Data Mining (ICDM’01), San Jose, USA, pp 369–376
Lakshmanan LVS, Ng R, Han J, and Pang A (1999) Optimization of constrained frequent set queries with 2-variable constraints. In: Proc 1999 ACM-SIGMOD Int Conf Management of Data (SIGMOD’99), Philadelphia, USA, pp 157–168
Lent B, Swami A, Widom J (1997) Clustering association rules. In: Proc 1997 Int Conf Data Engineering (ICDE’97), Birmingham, England, pp 220–231,
Mannila H, Toivonen H (1996) Multiple uses of frequent sets and condensed representations (extended abstract). In: Knowledge Discovery and Data Mining, pp 189–194
Mannila H, Toivonen H, Verkamo AI (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discovery 1:259–289
Article Google Scholar
Margaritis D, Faloutsos C, Thrun S (2001) Netcube: a scalable tool for fast data mining and compression. In: Proc 2001 Int Conf Very Large Data Bases (VLDB’01), pp 311–320
Ng R, Lakshmanan LVS, Han J, Pang A (1998) Exploratory mining and pruning optimizations of constrained associations rules. In: Proc 1998 ACM-SIGMOD Int Conf Management of Data (SIGMOD’98), Seattle, USA, pp 13–24
Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Proc 7th Int Conf Database Theory (ICDT’99), Jerusalem, Israel, pp 398–416
Pei J, Han J, Lakshmanan LVS (2001) Mining frequent itemsets with convertible constraints. In: Proc 2001 Int Conf Data Engineering (ICDE’01), Heidelberg, Germany, pp 433–332
Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: Proc 2000 ACM-SIGMOD Int Workshop Data Mining and Knowledge Discovery (DMKD’00), Dallas, USA, pp 11–20
Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M-C (2001) PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proc 2001 Int Conf Data Engineering (ICDE’01), Heidelberg, Germany, pp 215–224
Silverstein C, Brin S, Motwani R, Ullman J (1998) Scalable techniques for mining causal structures. In: Proc 1998 Int Conf Very Large Data Bases (VLDB’98), New York, USA, pp 594–605
Zaki M (2000) Generating non-redundant association rules. In: Proc 2000 ACM SIGKDD Int Conf Knowledge Discovery in Databases (KDD’00), Boston, USA, pp 34–43
Zaki MJ, Hsiao CJ (2002) CHARM: an efficient algorithm for closed itemset mining. In: Proc 2002 SIAM Int Conf Data Mining, Arlington, USA, pp 457–473

Download references

Author information

Authors and Affiliations

Department of Computer Science and Engineering, State University of New York at Buffalo, 201 Bell Hall, Buffalo, NY, 14260-2000, USA
Jian Pei
Wright State University, USA
Guozhu Dong
University of Illinois at Urbana-Champaign, Urbana-Champaign, USA
Wei Zou & Jiawei Han

Authors

Jian Pei
View author publications
You can also search for this author in PubMed Google Scholar
Guozhu Dong
View author publications
You can also search for this author in PubMed Google Scholar
Wei Zou
View author publications
You can also search for this author in PubMed Google Scholar
Jiawei Han
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jian Pei.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Pei, J., Dong, G., Zou, W. et al. Mining Condensed Frequent-Pattern Bases. Know. Inf. Sys. 6, 570–594 (2004). https://doi.org/10.1007/s10115-003-0133-6

Download citation

Received: 15 September 2002
Revised: 20 December 2002
Accepted: 24 January 2003
Published: 09 February 2004
Issue Date: September 2004
DOI: https://doi.org/10.1007/s10115-003-0133-6

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Study of Effective Mining Algorithms for Frequent Itemsets

Abstract Representations and Generalized Frequent Pattern Discovery

A Survey on Representation for Itemsets in Association Rule Mining

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Mining Condensed Frequent-Pattern Bases

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Study of Effective Mining Algorithms for Frequent Itemsets

Abstract Representations and Generalized Frequent Pattern Discovery

A Survey on Representation for Itemsets in Association Rule Mining

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now