Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1014052.1014140acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
Article

Dense itemsets

Published: 22 August 2004 Publication History

Abstract

Frequent itemset mining has been the subject of a lot of work in data mining research ever since association rules were introduced. In this paper we address a problem with frequent itemsets: that they only count rows where all their attributes are present, and do not allow for any noise. We show that generalizing the concept of frequency while preserving the performance of mining algorithms is nontrivial, and introduce a generalization of frequent itemsets, dense itemsets. Dense itemsets do not require all attributes to be present at the same time; instead, the itemset needs to define a sufficiently large submatrix that exceeds a given density threshold of attributes present.We consider the problem of computing all dense itemsets in a database. We give a levelwise algorithm for this problem, and also study the top-$k$ variations, i.e., finding the k densest sets with a given support, or the k best-supported sets with a given density. These algorithms select the other parameter automatically, which simplifies mining dense itemsets in an explorative way. We show that the concept captures natural facets of data sets, and give extensive empirical results on the performance of the algorithms. Combining the concept of dense itemsets with set cover ideas, we also show that dense itemsets can be used to obtain succinct descriptions of large datasets. We also discuss some variations of dense itemsets.

References

[1]
R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD '93, pages 207--216, 1993.]]
[2]
R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast discovery of association rules. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, chapter 12, pages 307--328. AAAI Press, 1996.]]
[3]
E. Bingham, H. Mannila, and J. K. Seppänen. Topics in 0-1 data. In D. Hand, D. Keim, and R. Ng, editors, Proc. KDD--2002, pages 450--455, 2002.]]
[4]
J.-F. Boulicaut, A. Bykowski, and C. Rigotti. Approximation of frequency queries by means of free-sets. In PKDD--2000, volume 1910 of LNCS, pages 75--85. Springer, 2000.]]
[5]
T. Brijs, G. Swinnen, K. Vanhoof, and G. Wets. Using association rules for product assortment decisions: A case study. In Knowledge Discovery and Data Mining, pages 254--260, 1999.]]
[6]
A. Bykowski, J. K. Seppänen, and J. Hollmén. Model-independent bounding of Boolean formulae in binary data. In M. Klemettinen, R. Meo, F. Giannotti, and L. De Raedt, editors, Knowledge Discovery in Inductive Databases (KDID--02), First International Workshop, pages 20--31, Helsinki, Finland, 2002. University of Helsinki Department of Computer Science Report B--2002--7.]]
[7]
T. Calders and B. Goethals. Mining all non-derivable frequent itemsets. In T. Elomaa, H. Mannila, and H. Toivonen, editors, Proc. PKDD--2002, volume 2431 of LNCS, pages 74--85. Springer-Verlag, 2002.]]
[8]
R. Duda and P. Hart. Pattern Classification and Scene Analysis. Wiley, New York, 1973.]]
[9]
F. Geerts, B. Goethals, and T. Mielikäinen. Mining tiles and tilings. Manuscript in preparation.]]
[10]
B. Goethals and M. J. Zaki, editors. Proc. Workshop on Frequent Itemset Mining Implementations (FIMI--03), volume 90 of CEUR-WS, Melbourne, Florida, 2003. http://CEUR-WS.org/Vol-90/.]]
[11]
J. Han, J. Wang, Y. Lu, and P. Tzvetkov. Mining top-k frequent closed patterns without minimum support. In ICDM 2002, pages 211--218, 2002.]]
[12]
S. Khuller, A. Moss, and J. Naor. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39--45, 1999.
[13]
J. Kleinberg, C. Papadimitriou, and P. Raghavan. Segmentation problems. J. ACM, 51(2):263--280, 2004.]]
[14]
D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, 2000.]]
[15]
T. Mielikäinen and H. Mannila. The pattern ordering problem. In N. Lavrac, D. Gamberger, L. Todorovski, and H. Blockeel, editors, Proc. PKDD--2003, volume 2383 of LNAI, pages 327--338. Springer, 2003.]]
[16]
N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. LNCS, 1540:398--416, 1999.]]
[17]
J. Pei, A. K. Tung, and J. Han. Fault-tolerant frequent pattern mining: Problems and challenges. In Workshop on Research Issues in Data Mining and Knowledge Discovery, 2001.]]
[18]
J. A. Swets. Measuring the accuracy of diagnostic systems. Science, 240(4857):1285--93, June 1988.]]
[19]
P. Tzvetkov, X. Yan, and J. Han. TSP: Mining top-k closed sequential patterns. In ICDM 2003, pages 347--354, 2003.]]
[20]
C. Yang, U. Fayyad, and P. S. Bradley. Efficient discovery of error-tolerant frequent itemsets in high dimensions. In Proc. KDD--2001, pages 194--203. ACM Press, 2001.]]

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
August 2004
874 pages
ISBN:1581138881
DOI:10.1145/1014052
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 August 2004

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. error tolerance
  2. frequent itemsets

Qualifiers

  • Article

Conference

KDD04

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)4
  • Downloads (Last 6 weeks)0
Reflects downloads up to 16 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2018)Mining top-k approximate closed patterns in an imprecise databaseInternational Journal of Grid and Utility Computing10.1504/IJGUC.2018.0916969:2(97-107)Online publication date: 1-Jan-2018
  • (2018)Fast, Accurate, and Flexible Algorithms for Dense Subtensor MiningACM Transactions on Knowledge Discovery from Data10.1145/315441412:3(1-30)Online publication date: 23-Jan-2018
  • (2018)On mining approximate and exact fault-tolerant frequent itemsetsKnowledge and Information Systems10.1007/s10115-017-1079-455:2(361-391)Online publication date: 1-May-2018
  • (2018)Approximation of Frequent ItemsetsEncyclopedia of Database Systems10.1007/978-1-4614-8265-9_22(147-151)Online publication date: 7-Dec-2018
  • (2017)Approximation of Frequent ItemsetsEncyclopedia of Database Systems10.1007/978-1-4899-7993-3_22-2(1-5)Online publication date: 18-Sep-2017
  • (2016)M-ZoomEuropean Conference on Machine Learning and Knowledge Discovery in Databases - Volume 985110.1007/978-3-319-46128-1_17(264-280)Online publication date: 19-Sep-2016
  • (2015)SubPatCNV: approximate subspace pattern mining for mapping copy-number variationsBMC Bioinformatics10.1186/s12859-014-0426-716:1Online publication date: 16-Jan-2015
  • (2015)Mining Approximate Frequent Patterns from Noisy DatabasesProceedings of the 2015 10th International Conference on Broadband and Wireless Computing, Communication and Applications (BWCCA)10.1109/BWCCA.2015.29(400-403)Online publication date: 4-Nov-2015
  • (2014)Interesting PatternsFrequent Pattern Mining10.1007/978-3-319-07821-2_5(105-134)Online publication date: 30-Aug-2014
  • (2014)On Mining Proportional Fault-Tolerant Frequent ItemsetsDatabase Systems for Advanced Applications10.1007/978-3-319-05810-8_23(342-356)Online publication date: 2014
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media