Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

GrAFCI+ A fast generator-based algorithm for mining frequent closed itemsets

Published: 01 July 2021 Publication History

Abstract

Mining itemsets for association rule generation is a fundamental data mining task originally stemming from the traditional market basket analysis problem. However, enumerating all frequent itemsets, especially in a dense dataset, or with low support thresholds, remains costly. In this paper, a novel theorem builds the relationship between frequent closed itemsets (FCIs) and frequent generator itemsets (FGIs) and proves that the process of mining FCIs is equivalent to mining FGIs, unified with their full-support and extension items. On the basis of this theorem, a generator-based algorithm for mining FCIs, called GrAFCI+, is proposed and explained in details including its correctness. The comparative effectiveness of the algorithm in terms of scalability is first investigated, along with the compression rate—a measure of the interestingness of a given FIs representation. Extensive experiments are further undertaken on eight datasets and four state-of-the-art algorithms, namely DCI_CLOSED*, DCI_PLUS, FPClose, and NAFCP. The results show that the proposed algorithm is more efficient regarding the execution time in most cases as compared to these algorithms. Because GrAFCI+ main goal is to address the runtime issue, it paid a memory cost, especially when the support is too small. However, this cost is not high since GrAFCI+ is seconded by only one competitor out of four in memory utilization and for large support values. As an overall assessment, GrAFCI+ gives better results than most of its competitors.

References

[1]
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on very large data bases (VLDB ’94), Morgan Kaufmann Publishers Inc., pp 487–499
[2]
Alves R, Rodríguez-Baena DS, and Aguilar-Ruiz JS Gene association analysis: a survey of frequent pattern mining from gene expression data Briefings Bioinform 2010 11 2 210-224
[3]
Burdick D, Calimlim M, Flannick J, Gehrke J, and Yiu T MAFIA: a maximal frequent itemset algorithm IEEE Trans Knowl Data Eng 2005 17 11 1490-1504
[4]
Deng Z and Lv S Prepost+: an efficient n-lists-based algorithm for mining frequent itemsets via children-parent equivalence pruning Expert Syst Appl 2015 42 13 5424-5432
[5]
Deng Z and Wang Z A new fast vertical method for mining frequent patterns Int J Comput Intell Syst 2010 3 6 733-744
[6]
Deng Z, Wang Z, and Jiang J A new algorithm for fast mining frequent itemsets using n-lists Sci China Inform Sci 2012 55 9 2008-2030
[7]
Djenouri Y, Djenouri D, Belhadi A, Fournier-Viger P, and Lin JCW A new framework for metaheuristic-based frequent itemset mining Appl Intell 2018 48 12 4775-4791
[8]
Dong G, Feng M, Son NT, Lee TS, Li J, Liu G, Wong L (2002) pattern space projects. https://www.comp.nus.edu.sg/~wongls/projects/pattern-spaces/
[9]
FIMI (2003) Frequent itemset mining dataset repository. http://fimi.cs.helsinki.fi/data/
[10]
Fournier-Viger P, Lin JC, Gomariz A, Gueniche T, Soltani A, Deng Z, Lam HT (2016) The SPMF open-source data mining library version 2. In: LNCS, vol 9853, pp 36–40
[11]
Goethals B and Zaki MJ Advances in frequent itemset mining implementations: Report on FIMI03 SIGKDD Explorat 2004 6 1 109-117
[12]
Grahne G and Zhu J Fast algorithms for frequent itemset mining using fp-trees IEEE Trans Knowl Data Eng 2005 17 10 1347-1362
[13]
Han J, Pei J, and Yin Y Mining frequent patterns without candidate generation SIGMOD Rec 2000 29 2 1-12
[14]
Han J, Pei J, Yin Y, and Mao R Mining frequent patterns without candidate generation: a frequent-pattern tree approach Data Mining Knowl Disc 2004 8 1 53-87
[15]
Han J, Kamber M, and Pe J Data Mining: Concepts and Techniques, chap 6 2011 3 Burlington Morgan Kaufmann Publishers 243-278
[16]
Kryszkiewicz M (2001) Concise representation of frequent patterns based on disjunction-free generators. In: Proceedings 2001 IEEE international conference on data mining, pp 305–312
[17]
Le T and Vo B An n-list-based algorithm for mining frequent closed patterns Expert Syst Appl 2015 42 19 6648-6657
[18]
Li J, Li H, Wong L, Pei J, Dong G (2006) Minimum description length principle: generators are preferable to closed patterns. In: Proceedings of the 21st national conference on artificial intelligence - Volume 1, AAAI Press, pp 409–414
[19]
Liu G, Li J, and Wong L A new concise representation of frequent itemsets using generators and a positive border Knowl Inf Syst 2008 17 1 35-56
[20]
Lucchese C, Orlando S, and Perego R Fast and memory efficient mining of frequent closed itemsets IEEE Trans Knowl Data Eng 2006 18 1 21-36
[21]
Nam H, Yun U, Yoon E, and Lin JCW Efficient approach for incremental weighted erasable pattern mining with list structure Expert Syst Appl 2020 143 113087
[22]
Pan F, Cong G, Tung AKH, Yang J, Zaki MJ (2003) Carpenter: Finding closed patterns in long biological datasets. In: Proceedings of the 9th ACM SIGKDD conference, pp 637—-642
[23]
Pasquier N, Bastide Y, Taouil R, and Lakhal L Efficient mining of association rules using closed itemset lattices Inf Syst 1999 24 1 25-46
[24]
Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: workshop on research issues in data mining and knowledge discovery, pp 21–30
[25]
Pei J, Dong G, Zou W, and Han J Mining condensed frequent-pattern bases Knowl Inf Syst 2004 6 5 570-594
[26]
Sahoo J, Ashok KD, and Goswami A An effective association rule mining scheme using a new generic basis Knowl Inf Syst 2015 43 1 127-156
[27]
Sun J, Xun Y, Zhang J, and Li J Incremental frequent itemsets mining with FCFP tree IEEE Access 2019 7 136511-136524
[28]
Vo B, Hong TP, and Le B DBV-Miner: a dynamic bit-vector approach for fast mining frequent closed itemsets Expert Syst Appl 2012 39 8 7196-7206
[29]
Vo B, Le T, Coenen F, and Hong T Mining frequent itemsets using the n-list and subsume concepts Int J Mach Learn Cyber 2016 7 2 253-265
[30]
Vo B, Pham S, Le T, and Deng Z A novel approach for mining maximal frequent patterns Expert Syst Appl 2017 73 178-186
[31]
Wang J, Han J, Pei J (2003) Closet+: Searching for the best strategies for mining frequent closed itemsets. In: Proc of the 9th ACM SIGKDD conference, pp 236–245
[32]
Xu Y, Li Y (2007) Generating concise association rules. In: Proceedings of the sixteenth ACM conference on information and knowledge management (CIKM ’07), pp 781–790
[33]
Zaki MJ Scalable algorithms for association mining IEEE Trans Knowl Data Eng 2000 12 3 372-390
[34]
Zaki MJ and Hsiao C Efficient algorithms for mining closed itemsets and their lattice structure IEEE Trans Knowl Data Eng 2005 17 4 462-478
[35]
Zhang C, Tian P, Zhang X, Liao Q, Jiang ZL, and Wang X HashEclat: an efficient frequent itemset algorithm Int J Mach Learn Cyber 2019 10 3003-3016

Cited By

View all

Index Terms

  1. GrAFCI+ A fast generator-based algorithm for mining frequent closed itemsets
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image Knowledge and Information Systems
      Knowledge and Information Systems  Volume 63, Issue 7
      Jul 2021
      387 pages

      Publisher

      Springer-Verlag

      Berlin, Heidelberg

      Publication History

      Published: 01 July 2021
      Accepted: 26 April 2021
      Revision received: 17 April 2021
      Received: 07 January 2019

      Author Tags

      1. Frequent closed itemset mining
      2. Frequent generator itemset
      3. Generator-based itemset mining

      Qualifiers

      • Research-article

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • 0
        Total Citations
      • 0
        Total Downloads
      • Downloads (Last 12 months)0
      • Downloads (Last 6 weeks)0
      Reflects downloads up to 19 Nov 2024

      Other Metrics

      Citations

      Cited By

      View all

      View Options

      View options

      Login options

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media