Feasible itemset distributions in data mining: theory and application

Authors: William A. Maniatty, Mohammed J. Zaki

PODS '03: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Pages 284 - 295

https://doi.org/10.1145/773153.773181

Published: 09 June 2003

Abstract

Computing frequent itemsets and maximal frequent itemsets in a database are classic problems in data mining. The resource requirements of all extant algorithms for both problems depend on the distribution of frequent patterns, a topic that has not been formally investigated. In this paper, we study properties of length distributions of frequent and maximal frequent itemset collections and provide novel solutions for computing tight lower bounds for feasible distributions. We show how these bounding distributions can help in generating realistic synthetic datasets, which can be used for algorithm benchmarking.
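To make the terms in the abstract concrete, the following is a minimal brute-force sketch, not the method of the paper. It enumerates frequent itemsets on a toy database, extracts the maximal ones, and tabulates both collections by itemset length. The function names, the toy transactions, and the absolute support threshold `min_support=3` are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute force: map each itemset to its support if support >= min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, k):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                result[cset] = support
                found = True
        if not found:
            # Every subset of a frequent itemset is frequent, so if no
            # k-itemset is frequent, no longer itemset can be either.
            break
    return result


def maximal_itemsets(frequent):
    """Keep the frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}


def length_distribution(itemsets):
    """Count itemsets by length, returned as {length: count}."""
    counts = Counter(len(s) for s in itemsets)
    return dict(sorted(counts.items()))


# Hypothetical toy database of five transactions over items a, b, c.
transactions = [
    frozenset(t)
    for t in ({"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"})
]
frequent = frequent_itemsets(transactions, min_support=3)
maximal = maximal_itemsets(frequent)

print(length_distribution(frequent))  # {1: 3, 2: 3}
print(length_distribution(maximal))   # {2: 3}
```

In this toy case every single item and every pair is frequent, while the full triple appears in only two transactions, so the three pairs are the maximal frequent itemsets. The two printed dictionaries are small instances of the length distributions the abstract refers to.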

Information

Published In

PODS '03: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

June 2003

291 pages

ISBN: 1581136706

DOI: 10.1145/773153

Conference Chair: Frank Neven (Limburgs Universitair Centrum)

General Chair: Catriel Beeri (Hebrew University of Jerusalem)

Program Chair: Tova Milo (Tel Aviv University & INRIA)

Copyright © 2003 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SIGMOD/PODS03: International Conference on Management of Data and Symposium on Principles of Database Systems

June 9 - 11, 2003

San Diego, California

Acceptance Rates

PODS '03 Paper Acceptance Rate: 27 of 136 submissions, 20%

Overall Acceptance Rate: 642 of 2,707 submissions, 24%

Bibliometrics

Total Citations: 32

Total Downloads: 877

Downloads (Last 12 months): 4

Downloads (Last 6 weeks): 0

Reflects downloads up to 12 Nov 2024

Cited By

Buzmakov A, Kuznetsov S, Makhalova T, Napoli A (2022). △-Closure Structure for Studying Data Distribution. 2022 IEEE International Conference on Data Mining (ICDM), pages 867-872. Online publication date: Nov-2022.
https://doi.org/10.1109/ICDM54844.2022.00099

Saccà D, Serra E, Rullo A (2019). Extending inverse frequent itemsets mining to generate realistic datasets: complexity, accuracy and emerging applications. Data Mining and Knowledge Discovery 33:6, pages 1736-1774. Online publication date: 1-Nov-2019.
https://dl.acm.org/doi/10.1007/s10618-019-00643-1

Zimmermann A (2019). Method evaluation, parameterization, and result validation in unsupervised data mining: A critical survey. WIREs Data Mining and Knowledge Discovery 10:2. Online publication date: 29-Jul-2019.
https://doi.org/10.1002/widm.1330

Saccà D, Serra E, Cuzzocrea A, Desai B, Flesca S, Zumpano E, Masciari E, Caroprese L (2018). The Inverse Tree-OLAP Problem. Proceedings of the 22nd International Database Engineering & Applications Symposium, pages 148-156. Online publication date: 18-Jun-2018.
https://dl.acm.org/doi/10.1145/3216122.3216129

Serra E, Vaidya J, Akella H, Sharma A (2017). Evaluating the Privacy Implications of Frequent Itemset Disclosure. ICT Systems Security and Privacy Protection, pages 506-519. Online publication date: 4-May-2017.
https://doi.org/10.1007/978-3-319-58469-0_34

Henriques R, Madeira S (2016). BicNET: Flexible module discovery in large-scale biological networks using biclustering. Algorithms for Molecular Biology 11:1. Online publication date: 20-May-2016.
https://doi.org/10.1186/s13015-016-0074-8

Zimmermann A (2015). The Data Problem in Data Mining. ACM SIGKDD Explorations Newsletter 16:2, pages 38-45. Online publication date: 21-May-2015.
https://dl.acm.org/doi/10.1145/2783702.2783706

Guzzo A, Moccia L, Saccà D, Serra E (2013). Solving inverse frequent itemset mining with infrequency constraints via large-scale linear programs. ACM Transactions on Knowledge Discovery from Data 7:4, pages 1-39. Online publication date: 25-Dec-2013.
https://dl.acm.org/doi/10.1145/2541268.2541271

Verykios V (2013). Association rule hiding methods. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 3:1, pages 28-36. Online publication date: 1-Jan-2013.
https://dl.acm.org/doi/10.1002/widm.1082

Saccà D, Serra E, Guzzo A (2012). Count constraints and the inverse OLAP problem. Proceedings of the 7th international conference on Foundations of Information and Knowledge Systems, pages 352-369. Online publication date: 5-Mar-2012.
https://dl.acm.org/doi/10.1007/978-3-642-28472-4_20
