Article

The complexity of mining maximal frequent itemsets and maximal frequent patterns

Author:

Guizhen YangAuthors Info & Claims

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 344 - 353

https://doi.org/10.1145/1014052.1014091

Published: 22 August 2004 Publication History

Abstract

Mining maximal frequent itemsets is one of the most fundamental problems in data mining. In this paper we study the complexity-theoretic aspects of maximal frequent itemset mining, from the perspective of counting the number of solutions. We present the first formal proof that the problem of counting the number of distinct maximal frequent itemsets in a database of transactions, given an arbitrary support threshold, is #P-complete, thereby providing strong theoretical evidence that the problem of mining maximal frequent itemsets is NP-hard. This result is of particular interest since the associated decision problem of checking the existence of a maximal frequent itemset is in P.We also extend our complexity analysis to other similar data mining problems dealing with complex data structures, such as sequences, trees, and graphs, which have attracted intensive research interests in recent years. Normally, in these problems a partial order among frequent patterns can be defined in such a way as to preserve the downward closure property, with maximal frequent patterns being those without any successor with respect to this partial order. We investigate several variants of these mining problems in which the patterns of interest are subsequences, subtrees, or subgraphs, and show that the associated problems of counting the number of maximal frequent patterns are all either #P-complete or #P-hard.

References

[1]

R. C. Agarwal, C. C. Aggarwal, and V. V. V. Prasad. Depth first generation of long patterns. In KDD, 2000.]]

Digital Library

[2]

R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In VLDB, 1994.]]

Digital Library

[3]

R. Agrawal and R. Srikant. Mining sequential patterns. In ICDE, 1995.]]

Digital Library

[4]

T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Satamoto, and S. Arikawa. Efficient substructure discovery from large semi-structured data. In SDM, 2002.]]

[5]

R. J. Bayardo Jr. Efficiently mining long patterns from databases. In SIGMOD, 1998.]]

Digital Library

[6]

E. Boros, V. Gurvich, L. Khachiyan, and K. Makino. On the complexity of generating maximal frequent and minimal infrequent sets. In STACS, 2002.]]

Digital Library

[7]

D. Burdick, M. Calimlim, and J. Gehrke. MAFIA: A maximal frequent itemset algorithm for transactional databases. In ICDE, 2001.]]

Digital Library

[8]

M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979.]]

Digital Library

[9]

K. Gouda and M. J. Zaki. Efficiently mining maximal frequent itemsets. In ICDM, 2001.]]

Digital Library

[10]

D. Gunopulos, R. Khardon, H. Mannila, S. Saluja, H. Toivonen, and R. S. Sharm. Discovering all most specific sentences. ACM Transactions on Database Systems (TODS), 28(2):140--174, 2003.]]

Digital Library

[11]

H. B. Hunt III, M. V. Marathe, V. Radhakrishnan, and R. E. Stearns. The complexity of planar counting problems. SIAM Journal on Computing, 27(4):1142--1167, 1998.]]

Digital Library

[12]

A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In PKDD, 2000.]]

Digital Library

[13]

M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM, 2001.]]

Digital Library

[14]

S. O. Kuznetsov. Interpretation on graphs and complexity characteristics of a search for specific patterns. Nauchno-Tekhnicheskaya Informatsiya, Seriya 2 (Automatic Documentation and Mathematical Linguistics), 23(1):23--27, 1989.]]

[15]

D.-I. Lin and Z. M. Kedem. Pincer-Search: An efficient algorithm for discovering the maximum frequent set. IEEE Transactions on Knowledge and Data Engineering (TKDE), 14(3):553--566, 2002.]]

Digital Library

[16]

C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.]]

[17]

C. H. Papadimitriou. NP-completeness: A retrospective. In ICALP, 1997.]]

Digital Library

[18]

J. S. Provan and M. O. Ball. The complexity of counting cuts and of computing the probability that a graph is connected. SIAM Journal on Computing, 12(4):777--788, 1983.]]

Digital Library

[19]

G. Ramesh, W. Maniatty, and M. J. Zaki. Feasible itemset distributions in data mining: Theory and application. In PODS, 2003.]]

Digital Library

[20]

E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249--260, 1995.]]

Digital Library

[21]

S. P. Vadhan. The complexity of counting in sparse, regular, and planar graphs. SIAM Journal on Computing, 31(2):398--427, 2001.]]

Digital Library

[22]

L. G. Valiant. The complexity of computing the permanent. Theoretical Computer Science, 8:189--201, 1979.]]

[23]

J. Wang, J. Han, and J. Pei. CLOSET+: Searching for the best strategies for mining frequent closed itemsets. In KDD, 2003.]]

Digital Library

[24]

Y. Xiao, J.-F. Yao, Z. Li, and M. H. Dunham. Efficient data mining for maximal frequent subtrees. In ICDM, 2003.]]

Digital Library

[25]

M. J. Zaki. SPADE: An efficient algorithm for mining frequent sequences. Machine Learning, 42(1/2):31--60, 2001.]]

Digital Library

[26]

M. J. Zaki. Efficiently mining frequent trees in a forest. In KDD, 2002.]]

Digital Library

[27]

M. J. Zaki and M. Ogihara. Theoretical foundations of association rules. In DMKD, 1998.]]

[28]

M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In KDD, 1997.]]

Digital Library

Cited By

Wu YSun RWang XWen DZhang YQin LLin X(2024)Efficient Maximal Frequent Group Enumeration in Temporal Bipartite GraphsProceedings of the VLDB Endowment10.14778/3681954.368199717:11(3243-3255)Online publication date: 30-Aug-2024
https://doi.org/10.14778/3681954.3681997
Islam MAsadi MBasu Roy S(2023)Equitable Top-k Results for Long Tail DataProceedings of the ACM on Management of Data10.1145/36267271:4(1-24)Online publication date: 12-Dec-2023
https://dl.acm.org/doi/10.1145/3626727
Likhitha PRavikumar PSaxena DKiran RWatanobe Y(2023) k -PFPMiner: Top-k Periodic Frequent Patterns in Big Temporal Databases IEEE Access10.1109/ACCESS.2023.332583911(119033-119044)Online publication date: 2023
https://doi.org/10.1109/ACCESS.2023.3325839
Show More Cited By

Recommendations

Mining frequent patterns without candidate generation

Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test ...
Mining association rules between sets of items in large databases
SIGMOD '93: Proceedings of the 1993 ACM SIGMOD international conference on Management of data

We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an efficient algorithm that generates all significant association rules between items in the database. The ...
Mining association rules between sets of items in large databases

We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an efficient algorithm that generates all significant association rules between items in the database. The ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

August 2004

874 pages

ISBN:1581138881

DOI:10.1145/1014052

General Chairs:
Won Kim
Cyber Database Solutions
,
Ronny Kohavi
Amazon.com
,
Program Chairs:
Johannes Gehrke
Cornell University
,
William DuMouchel
AT&T Labs Research

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 August 2004

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Article

Conference

KDD04

Sponsor:

KDD04: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 22 - 25, 2004

WA, Seattle, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Upcoming Conference

KDD '25

Sponsor:
sigkdd
sigkdd

The 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 3 - 7, 2025

Toronto , ON , Canada

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

136
Total Citations
View Citations
1,833
Total Downloads

Downloads (Last 12 months)54
Downloads (Last 6 weeks)6

Reflects downloads up to 05 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Wu YSun RWang XWen DZhang YQin LLin X(2024)Efficient Maximal Frequent Group Enumeration in Temporal Bipartite GraphsProceedings of the VLDB Endowment10.14778/3681954.368199717:11(3243-3255)Online publication date: 30-Aug-2024
https://doi.org/10.14778/3681954.3681997
Islam MAsadi MBasu Roy S(2023)Equitable Top-k Results for Long Tail DataProceedings of the ACM on Management of Data10.1145/36267271:4(1-24)Online publication date: 12-Dec-2023
https://dl.acm.org/doi/10.1145/3626727
Likhitha PRavikumar PSaxena DKiran RWatanobe Y(2023) k -PFPMiner: Top-k Periodic Frequent Patterns in Big Temporal Databases IEEE Access10.1109/ACCESS.2023.332583911(119033-119044)Online publication date: 2023
https://doi.org/10.1109/ACCESS.2023.3325839
Mihelčić MKurdija A(2023)On the complexity of redescription miningTheoretical Computer Science10.1016/j.tcs.2022.12.023944:COnline publication date: 25-Jan-2023
https://dl.acm.org/doi/10.1016/j.tcs.2022.12.023
Gioia DFior JCagliero L(2023)Early portfolio pruning: a scalable approach to hybrid portfolio selectionKnowledge and Information Systems10.1007/s10115-023-01832-765:6(2485-2508)Online publication date: 31-Jan-2023
https://doi.org/10.1007/s10115-023-01832-7
Chanda DDebnath N(2023)Soft Computing Framework for Digital Healthcare SystemProceedings of the 9th International Conference on Advanced Intelligent Systems and Informatics 202310.1007/978-3-031-43247-7_30(339-347)Online publication date: 18-Sep-2023
https://doi.org/10.1007/978-3-031-43247-7_30
van Leeuwen MUkkonen A(2022)Fast Estimation of the Pattern Frequency SpectrumMachine Learning and Knowledge Discovery in Databases10.1007/978-3-662-44851-9_8(114-129)Online publication date: 10-Mar-2022
https://dl.acm.org/doi/10.1007/978-3-662-44851-9_8
Subbian KSridhar CAggarwal CSrivastava J(2022)Scalable Information Flow Mining in NetworksMachine Learning and Knowledge Discovery in Databases10.1007/978-3-662-44845-8_9(130-146)Online publication date: 10-Mar-2022
https://dl.acm.org/doi/10.1007/978-3-662-44845-8_9
Maia MSantana ÍRosseti ISouza UPlastino A(2022)MineReduce-Based Metaheuristic for the Minimum Latency ProblemMetaheuristics10.1007/978-3-031-26504-4_7(88-102)Online publication date: 11-Jul-2022
https://dl.acm.org/doi/10.1007/978-3-031-26504-4_7
Pinaire JChabert EAzé JBringay SLandais P(2021)Sequential Pattern Mining to Predict Medical In-Hospital Mortality from Administrative Data: Application to Acute Coronary SyndromeJournal of Healthcare Engineering10.1155/2021/55318072021(1-12)Online publication date: 25-May-2021
https://doi.org/10.1155/2021/5531807
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten