Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/773153.773181acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

Feasible itemset distributions in data mining: theory and application

Published: 09 June 2003 Publication History

Abstract

Computing frequent itemsets and maximally frequent item-sets in a database are classic problems in data mining. The resource requirements of all extant algorithms for both problems depend on the distribution of frequent patterns, a topic that has not been formally investigated. In this paper, we study properties of length distributions of frequent and maximal frequent itemset collections and provide novel solutions for computing tight lower bounds for feasible distributions. We show how these bounding distributions can help in generating realistic synthetic datasets, which can be used for algorithm benchmarking.

References

[1]
R. Agrawal, C. Aggarwal, and V. Prasad. Depth First Generation of Long Patterns. In 7th Int'l Conference on Knowledge Discovery and Data Mining, Aug. 2000.
[2]
R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast discovery of association rules. In U. Fayyad and et al, editors, Advances in Knowledge Discovery and Data Mining, pages 307--328. AAAI Press, Menlo Park, CA, 1996.
[3]
S. Bay. The UCI KDD Archive (kdd.ics.uci.edu). University of California, Irvine. Department of Information and Computer Science.
[4]
R. J. Bayardo. Efficiently mining long patterns from databases. In ACM SIGMOD Conf. Management of Data, June 1998.
[5]
B. Bollobás. Combinatorics. Cambridge University Press, 1986.
[6]
S. Brin, R. Motwani, J. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In ACM SIGMOD Conf. Management of Data, May 1997.
[7]
D. Burdick, M. Calimlim, and J. Gehrke. MAFIA: a maximal frequent itemset algorithm for transactional databases. In Intl. Conf. on Data Engineering, Apr. 2001.
[8]
B. Goethals, F. Geerts, and J. V. den Bussche. A tight upper bound on the number of candidate patterns. In 1st IEEE International Conference on Data Mining, Nov. 2001.
[9]
K. Gouda and M. J. Zaki. Efficiently mining maximal frequent itemsets. In 1st IEEE Int'l Conf. on Data Mining, Nov. 2001.
[10]
D. Gunopulos, R. Khardon, H. Mannila, and H. Toivonen. Data mining, hypergraph transversals, and machine learning. In 16th ACM Symp. Principles of Database Systems, May 1997.
[11]
J. Han and M. Kamber. Data Mining: Concepts and Techniuqes. Morgan Kaufmann Publishers, San Francisco, CA, 2001.
[12]
J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In ACM SIGMOD Conf. Management of Data, May 2000.
[13]
G. Katona. A theorem of finite sets. In P. Erdos and G. Katona, editors, Theory of Graphs, pages 187--207. Akademiai Kiado, Budapest, 1968.
[14]
D.-I. Lin and Z. M. Kedem. Pincer-search: A new algorithm for discovering the maximum frequent set. In 6th Intl. Conf. Extending Database Technology, Mar. 1998.
[15]
A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. In 21st VLDB Conf., 1995.
[16]
M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In 3rd Intl. Conf. on Knowledge Discovery and Data Mining, Aug. 1997.
[17]
Z. Zheng, R. Kohavi, and L. Mason. Real world performance of association rule algorithms. In 7th Intl. Conf. on Knowledge Discovery and Data Mining, Aug. 2001.

Cited By

View all
  • (2022)△-Closure Structure for Studying Data Distribution2022 IEEE International Conference on Data Mining (ICDM)10.1109/ICDM54844.2022.00099(867-872)Online publication date: Nov-2022
  • (2019)Extending inverse frequent itemsets mining to generate realistic datasets: complexity, accuracy and emerging applicationsData Mining and Knowledge Discovery10.1007/s10618-019-00643-133:6(1736-1774)Online publication date: 1-Nov-2019
  • (2019)Method evaluation, parameterization, and result validation in unsupervised data mining: A critical surveyWIREs Data Mining and Knowledge Discovery10.1002/widm.133010:2Online publication date: 29-Jul-2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
PODS '03: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2003
291 pages
ISBN:1581136706
DOI:10.1145/773153
  • Conference Chair:
  • Frank Neven,
  • General Chair:
  • Catriel Beeri,
  • Program Chair:
  • Tova Milo
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 June 2003

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

SIGMOD/PODS03

Acceptance Rates

PODS '03 Paper Acceptance Rate 27 of 136 submissions, 20%;
Overall Acceptance Rate 642 of 2,707 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)4
  • Downloads (Last 6 weeks)0
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2022)△-Closure Structure for Studying Data Distribution2022 IEEE International Conference on Data Mining (ICDM)10.1109/ICDM54844.2022.00099(867-872)Online publication date: Nov-2022
  • (2019)Extending inverse frequent itemsets mining to generate realistic datasets: complexity, accuracy and emerging applicationsData Mining and Knowledge Discovery10.1007/s10618-019-00643-133:6(1736-1774)Online publication date: 1-Nov-2019
  • (2019)Method evaluation, parameterization, and result validation in unsupervised data mining: A critical surveyWIREs Data Mining and Knowledge Discovery10.1002/widm.133010:2Online publication date: 29-Jul-2019
  • (2018)The Inverse Tree-OLAP ProblemProceedings of the 22nd International Database Engineering & Applications Symposium10.1145/3216122.3216129(148-156)Online publication date: 18-Jun-2018
  • (2017)Evaluating the Privacy Implications of Frequent Itemset DisclosureICT Systems Security and Privacy Protection10.1007/978-3-319-58469-0_34(506-519)Online publication date: 4-May-2017
  • (2016)BicNET: Flexible module discovery in large-scale biological networks using biclusteringAlgorithms for Molecular Biology10.1186/s13015-016-0074-811:1Online publication date: 20-May-2016
  • (2015)The Data Problem in Data MiningACM SIGKDD Explorations Newsletter10.1145/2783702.278370616:2(38-45)Online publication date: 21-May-2015
  • (2013)Solving inverse frequent itemset mining with infrequency constraints via large-scale linear programsACM Transactions on Knowledge Discovery from Data10.1145/2541268.25412717:4(1-39)Online publication date: 25-Dec-2013
  • (2013)Association rule hiding methodsWiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery10.1002/widm.10823:1(28-36)Online publication date: 1-Jan-2013
  • (2012)Count constraints and the inverse OLAP problemProceedings of the 7th international conference on Foundations of Information and Knowledge Systems10.1007/978-3-642-28472-4_20(352-369)Online publication date: 5-Mar-2012
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media