Article

Dense itemsets

Authors:

Jouni K. Seppänen,

Heikki MannilaAuthors Info & Claims

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 683 - 688

https://doi.org/10.1145/1014052.1014140

Published: 22 August 2004 Publication History

Abstract

Frequent itemset mining has been the subject of a lot of work in data mining research ever since association rules were introduced. In this paper we address a problem with frequent itemsets: that they only count rows where all their attributes are present, and do not allow for any noise. We show that generalizing the concept of frequency while preserving the performance of mining algorithms is nontrivial, and introduce a generalization of frequent itemsets, dense itemsets. Dense itemsets do not require all attributes to be present at the same time; instead, the itemset needs to define a sufficiently large submatrix that exceeds a given density threshold of attributes present.We consider the problem of computing all dense itemsets in a database. We give a levelwise algorithm for this problem, and also study the top-$k$ variations, i.e., finding the k densest sets with a given support, or the k best-supported sets with a given density. These algorithms select the other parameter automatically, which simplifies mining dense itemsets in an explorative way. We show that the concept captures natural facets of data sets, and give extensive empirical results on the performance of the algorithms. Combining the concept of dense itemsets with set cover ideas, we also show that dense itemsets can be used to obtain succinct descriptions of large datasets. We also discuss some variations of dense itemsets.

References

[1]

R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD '93, pages 207--216, 1993.]]

Digital Library

[2]

R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast discovery of association rules. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, chapter 12, pages 307--328. AAAI Press, 1996.]]

Digital Library

[3]

E. Bingham, H. Mannila, and J. K. Seppänen. Topics in 0-1 data. In D. Hand, D. Keim, and R. Ng, editors, Proc. KDD--2002, pages 450--455, 2002.]]

Digital Library

[4]

J.-F. Boulicaut, A. Bykowski, and C. Rigotti. Approximation of frequency queries by means of free-sets. In PKDD--2000, volume 1910 of LNCS, pages 75--85. Springer, 2000.]]

Digital Library

[5]

T. Brijs, G. Swinnen, K. Vanhoof, and G. Wets. Using association rules for product assortment decisions: A case study. In Knowledge Discovery and Data Mining, pages 254--260, 1999.]]

Digital Library

[6]

A. Bykowski, J. K. Seppänen, and J. Hollmén. Model-independent bounding of Boolean formulae in binary data. In M. Klemettinen, R. Meo, F. Giannotti, and L. De Raedt, editors, Knowledge Discovery in Inductive Databases (KDID--02), First International Workshop, pages 20--31, Helsinki, Finland, 2002. University of Helsinki Department of Computer Science Report B--2002--7.]]

[7]

T. Calders and B. Goethals. Mining all non-derivable frequent itemsets. In T. Elomaa, H. Mannila, and H. Toivonen, editors, Proc. PKDD--2002, volume 2431 of LNCS, pages 74--85. Springer-Verlag, 2002.]]

Digital Library

[8]

R. Duda and P. Hart. Pattern Classification and Scene Analysis. Wiley, New York, 1973.]]

Digital Library

[9]

F. Geerts, B. Goethals, and T. Mielikäinen. Mining tiles and tilings. Manuscript in preparation.]]

[10]

B. Goethals and M. J. Zaki, editors. Proc. Workshop on Frequent Itemset Mining Implementations (FIMI--03), volume 90 of CEUR-WS, Melbourne, Florida, 2003. http://CEUR-WS.org/Vol-90/.]]

[11]

J. Han, J. Wang, Y. Lu, and P. Tzvetkov. Mining top-k frequent closed patterns without minimum support. In ICDM 2002, pages 211--218, 2002.]]

Digital Library

[12]

S. Khuller, A. Moss, and J. Naor. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39--45, 1999.

Digital Library

[13]

J. Kleinberg, C. Papadimitriou, and P. Raghavan. Segmentation problems. J. ACM, 51(2):263--280, 2004.]]

Digital Library

[14]

D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, 2000.]]

[15]

T. Mielikäinen and H. Mannila. The pattern ordering problem. In N. Lavrac, D. Gamberger, L. Todorovski, and H. Blockeel, editors, Proc. PKDD--2003, volume 2383 of LNAI, pages 327--338. Springer, 2003.]]

[16]

N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. LNCS, 1540:398--416, 1999.]]

Digital Library

[17]

J. Pei, A. K. Tung, and J. Han. Fault-tolerant frequent pattern mining: Problems and challenges. In Workshop on Research Issues in Data Mining and Knowledge Discovery, 2001.]]

[18]

J. A. Swets. Measuring the accuracy of diagnostic systems. Science, 240(4857):1285--93, June 1988.]]

[19]

P. Tzvetkov, X. Yan, and J. Han. TSP: Mining top-k closed sequential patterns. In ICDM 2003, pages 347--354, 2003.]]

Digital Library

[20]

C. Yang, U. Fayyad, and P. S. Bradley. Efficient discovery of error-tolerant frequent itemsets in high dimensions. In Proc. KDD--2001, pages 194--203. ACM Press, 2001.]]

Digital Library

Cited By

Biswas SSaha DPandit R(2024)A Prefix-Based Key-Value Pair Approach for Rare Itemsets Mining Without FP Tree Generation Under Fixed and Incremental Support ConstraintsJournal of Circuits, Systems and Computers10.1142/S021812662450301833:17Online publication date: 19-Jul-2024
https://doi.org/10.1142/S0218126624503018
(2018)Mining top-k approximate closed patterns in an imprecise databaseInternational Journal of Grid and Utility Computing10.1504/IJGUC.2018.0916969:2(97-107)Online publication date: 1-Jan-2018
https://dl.acm.org/doi/10.1504/IJGUC.2018.091696
Shin KHooi BFaloutsos C(2018)Fast, Accurate, and Flexible Algorithms for Dense Subtensor MiningACM Transactions on Knowledge Discovery from Data10.1145/315441412:3(1-30)Online publication date: 23-Jan-2018
https://dl.acm.org/doi/10.1145/3154414
Show More Cited By

Index Terms

Dense itemsets
1. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

An adaptive approach to mining frequent itemsets efficiently

The mining of frequent itemsets is a fundamental and important task of data mining. To improve the efficiency in mining frequent itemsets, many researchers developed smart data structures to represent the database, and designed divide-and-conquers ...
Discovery of maximum length frequent itemsets

The use of frequent itemsets has been limited by the high computational cost as well as the large number of resulting itemsets. In many real-world scenarios, however, it is often sufficient to mine a small representative subset of frequent itemsets with ...
An efficient method for mining frequent itemsets with double constraints

Constraint-based frequent itemset mining is necessary when the needs and interests of users are the top priority. In this task, two opposite types of constraint are studied, namely anti-monotone and monotone constraints. Previous approaches have mainly ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

August 2004

874 pages

ISBN:1581138881

DOI:10.1145/1014052

General Chairs:
Won Kim
Cyber Database Solutions
,
Ronny Kohavi
Amazon.com
,
Program Chairs:
Johannes Gehrke
Cornell University
,
William DuMouchel
AT&T Labs Research

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 August 2004

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Article

Conference

KDD04

Sponsor:

KDD04: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 22 - 25, 2004

WA, Seattle, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Upcoming Conference

KDD '25

Sponsor:
sigkdd
sigkdd

The 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 3 - 7, 2025

Toronto , ON , Canada

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

46
Total Citations
View Citations
550
Total Downloads

Downloads (Last 12 months)5
Downloads (Last 6 weeks)1

Reflects downloads up to 03 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Biswas SSaha DPandit R(2024)A Prefix-Based Key-Value Pair Approach for Rare Itemsets Mining Without FP Tree Generation Under Fixed and Incremental Support ConstraintsJournal of Circuits, Systems and Computers10.1142/S021812662450301833:17Online publication date: 19-Jul-2024
https://doi.org/10.1142/S0218126624503018
(2018)Mining top-k approximate closed patterns in an imprecise databaseInternational Journal of Grid and Utility Computing10.1504/IJGUC.2018.0916969:2(97-107)Online publication date: 1-Jan-2018
https://dl.acm.org/doi/10.1504/IJGUC.2018.091696
Shin KHooi BFaloutsos C(2018)Fast, Accurate, and Flexible Algorithms for Dense Subtensor MiningACM Transactions on Knowledge Discovery from Data10.1145/315441412:3(1-30)Online publication date: 23-Jan-2018
https://dl.acm.org/doi/10.1145/3154414
Liu SPoon C(2018)On mining approximate and exact fault-tolerant frequent itemsetsKnowledge and Information Systems10.1007/s10115-017-1079-455:2(361-391)Online publication date: 1-May-2018
https://dl.acm.org/doi/10.1007/s10115-017-1079-4
Liu J(2018)Approximation of Frequent ItemsetsEncyclopedia of Database Systems10.1007/978-1-4614-8265-9_22(147-151)Online publication date: 7-Dec-2018
https://doi.org/10.1007/978-1-4614-8265-9_22
Liu J(2017)Approximation of Frequent ItemsetsEncyclopedia of Database Systems10.1007/978-1-4899-7993-3_22-2(1-5)Online publication date: 18-Sep-2017
https://doi.org/10.1007/978-1-4899-7993-3_22-2
Shin KHooi BFaloutsos C(2016)M-ZoomEuropean Conference on Machine Learning and Knowledge Discovery in Databases - Volume 985110.1007/978-3-319-46128-1_17(264-280)Online publication date: 19-Sep-2016
https://dl.acm.org/doi/10.1007/978-3-319-46128-1_17
Johnson NZhang HFang GKumar VKuang R(2015)SubPatCNV: approximate subspace pattern mining for mapping copy-number variationsBMC Bioinformatics10.1186/s12859-014-0426-716:1Online publication date: 16-Jan-2015
https://doi.org/10.1186/s12859-014-0426-7
Yu XLi YWang H(2015)Mining Approximate Frequent Patterns from Noisy DatabasesProceedings of the 2015 10th International Conference on Broadband and Wireless Computing, Communication and Applications (BWCCA)10.1109/BWCCA.2015.29(400-403)Online publication date: 4-Nov-2015
https://dl.acm.org/doi/10.1109/BWCCA.2015.29
Vreeken JTatti N(2014)Interesting PatternsFrequent Pattern Mining10.1007/978-3-319-07821-2_5(105-134)Online publication date: 30-Aug-2014
https://doi.org/10.1007/978-3-319-07821-2_5
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten