Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Frequent closed itemset based algorithms: a thorough structural and analytical survey

Published: 01 June 2006 Publication History

Abstract

As a side effect of the digitalization of unprecedented amount of data, traditional retrieval tools proved to be unable to extract hidden and valuable knowledge. Data Mining, with a clear promise to provide adequate tools and/or techniques to do so, is the discovery of hidden information that can be retrieved from datasets. In this paper, we present a structural and analytical survey of <u>f</u>requent <u>c</u>losed <u>i</u>temset (FCI) based algorithms for mining association rules. Indeed, we provide a structural classification, in four categories, and a comparison of these algorithms based on criteria that we introduce. We also present an analytical comparison of FCI-based algorithms using benchmark dense and sparse datasets as well as "worst case" datasets. Aiming to stand beyond classical performance analysis, we intend to provide a focal point on performance analysis based on memory consumption and advantages and/or limitations of optimization strategies, used in the FCI-based algorithms.

References

[1]
C. C. Aggarwal. Towards long pattern generation in dense databases. In ACM-SIGKDD Explorations, volume 3(1), pages 20--26, July 2001.]]
[2]
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In J. B. Bocca, M. Jarke, and C. Zaniolo, editors, Proceedings of the 20th International Conference on Very Large Databases, Santiago, Chile, pages 478--499, June 1994.]]
[3]
Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, and L. Lakhal. Mining minimal non-redundant association rules using frequent closed itemsets. In Proceedings of the 1st International Conference on Computational Logic (DOOD 2000), Springer-Verlag, LNAI, volume 1861, London, UK, pages 972--986, July 2000.]]
[4]
Y. Bastide, R. Taouil, N. Pasquier, G. Stumme, and L. Lakhal. Mining frequent patterns with counting inference. In Proceeding of the 6th ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, Massachusetts, USA, volume 2(2), pages 66--75, 20-23 August 2000.]]
[5]
S. Ben Yahia, Y. Slimani, and J. Rezgui. A divide and conquer approach for deriving partially ordered sub-structures. In Proceedings of the International 9th Pacific-Asia Conference on Knowledge Data Discovery (PAKDD 2005), LNAI, volume 3518, Springer-Verlag, Hanoi, Vietnam, pages 91--96, 18-20 May 2005.]]
[6]
J.-F. Boulicaut, A. Bykowski, and C. Rigotti. Free-sets: A condensed representation of boolean data for the approximation of frequency queries. In Jounal of Data Mining and Knowledge Discovery (DMKD), 7 (1):5--22, 2003.]]
[7]
T. Calders and B. Goethals. Mining all non-derivable frequent itemsets. In T. Elomaa, H. Mannila, and H. Toivonen, editors, Proceedings of the 6th European Conference on Principles of Knowledge Discovery and Data Mining (PKDD 2002), LNCS, volume 2431, Springer-Verlag, Helsinki, Finland, pages 74--85, 19-23 August 2002.]]
[8]
T. Calders, C. Rigotti, and J.-F. Boulicaut. A survey on condensed representations for frequent sets. In Constraint Based Mining, Springer-Verlag, LNAI, volume 3848, pages 64--80, 2006.]]
[9]
A. Casali, R. Cicchetti, and L. Lakhal. Essential patterns: A perfect cover of frequent patterns. In A Min Tjoa and J. Trujillo, editors, Proceedings of the 7th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2005), Springer-Verlag, LNCS, volume 3589, Copenhagen, Denmark, pages 428--437, 22-26 August 2005.]]
[10]
G. Cong, K. H. Tung, X. Xu, F. Pan, and J. Yang. FARMER: finding interesting rule groups in microarray datasets. In Proceedings of the 2004 ACM SIGMOD International conference on Management of data, Paris, France, pages 143--154, 2004.]]
[11]
M. El-Hajj and O. Zaiane. Finding all frequent patterns starting from the closure. In the International Conference on Advanced Data Mining and Applications, Wuhan, China, pages 67--74, July 2005.]]
[12]
F. Flouvat, F. De Marchi, and J-M. Petit. A thorough experimental study of datasets for frequent itemsets. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM 2005), New Orleans, USA, pages 162--169, November 2005.]]
[13]
H. Fu and E. Mephu Nguifo. Partitioning large data to scale up lattice-based algorithm. In Proceedings of IEEE International Conference on Tools with Artificial Intelligence (IC-TAI 2003), Sacramento, California, USA, pages 537--541, November 2003.]]
[14]
J. Galambos and I. Simonelli. Bonferroni-type inequalities with applications. Springer-Verlag, 2000.]]
[15]
B. Ganter and R. Wille. Formal Concept Analysis. Springer-Verlag, 1999.]]
[16]
B. Goethals and M. J. Zaki. FIMI'03: Workshop on frequent itemset mining implementations. In B. Goethals and M. J. Zaki, editors, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2003), volume 90 of CEUR Workshop Proceedings, Melbourne, Florida, USA, 19 November 2003.]]
[17]
G. Grahne and J. Zhu. Efficiently using prefix-trees in mining frequent itemsets. In B. Goethals and M. J. Zaki, editors, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2003), volume 90 of CEUR Workshop Proceedings, Melbourne, Florida, USA, 19 November 2003.]]
[18]
T. Hamrouni, S. BenYahia, and Y. Slimani. Avoiding the itemset closure computation "pitfall". In Proceedings of the 3rd International Conference on Concept Lattices and their Applications (CLA 2005), Olomouc, Czech Republic, pages 46--59, 7-9 September 2005.]]
[19]
J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (SIG-MOD'00), Dallas, Texas, USA, pages 1--12, May 2000.]]
[20]
J. Hipp, U. Gntzer, and G. Nakhaeizadeh. Algorithms for association rule mining - a general survey and comparison. In ACM-SIGKDD Explorations, volume 2(1), pages 58--64, July 2000.]]
[21]
M. Kryszkiewicz. Concise representation of frequent patterns based on disjunction-free generators. In Proceedings of the 1st IEEE International Conference on Data Mining (ICDM 2001), San Jose, California, USA, pages 305--312, 2001.]]
[22]
S. O. Kuznetsov and S. A. Ob"edkov. Comparing performance of algorithms for generating concept lattices. In Journal of Experimental and Theoretical Artificial Intelligence (JETAI), volume 14(2-3), pages 189--216, 2002.]]
[23]
L. Lhote, F. Rioult, and A. Soulet. Average number of frequent and closed pattern in random databases. In Proceedings of the 7th Conference Francophone d'Apprentissage Automatique (CAp 2005), Presses Universitaires de Grenoble, Nice, France, pages 345--360, 30 May - 03 June 2005.]]
[24]
C. Lucchese, S. Orlando, P. Palmerini, R. Perego, and F. Silvestri. kDCI: A multi-strategy algorithm for mining frequent sets. In B. Goethals and M. J. Zaki, editors, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2003), volume 90 of CEUR Workshop Proceedings, Melbourne, Florida, USA, 19 November 2003.]]
[25]
C. Lucchese, S. Orlando, and R. Perego. Fast and memory efficient mining of frequent closed itemsets. In IEEE Journal Transactions on Knowledge and Data Engineering (TKDE), 18 (1):21--36, January 2006.]]
[26]
C. Lucchesse, S. Orlando, and R. Perego. DCI-CLOSED: A fast and memory efficient algorithm to mine frequent closed itemsets. In B. Goethals, M. J. Zaki, and R. Bayardo, editors, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2004), volume 126 of CEUR Workshop Proceedings, Brighton, UK, 1 November 2004.]]
[27]
E. Mephu Nguifo. Gallois lattice: A framework for concept learning, design, evaluation and refinement. In Proceedings of IEEE International Conference on Tools with Artificial Intelligence (ICTAI 1994), New-Orleans, USA, pages 461--467, November 1994.]]
[28]
F. Pan, G. Cong, K. H. Tung, J. Yang, and M. J. Zaki. CARPENTER: finding closed patterns in long biological datasets. In Proceedings of the 9th ACM SIGKDD International conference on Knowledge discovery and data mining, pages 637--642, 2003.]]
[29]
N. Pasquier. Datamining: Algorithmes d'extraction et de réduction des règles d'association dans les bases de données. Thèse de doctorat, École Doctorale Sciences pour l'Ingénieur de Clermont Ferrand, Université Clermont Ferrand II, France, Janvier 2000.]]
[30]
N. Pasquier. Mining association rules using formal concept analysis. In Proceedings of the 8th International Conference on Conceptual Structures (ICCS 2000), Springer-Verlag, LNAI, volume 1867, Darmstadt, Germany, pages 259--264, August 2000.]]
[31]
N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Efficient mining of association rules using closed itemset lattices. Journal of Information Systems, 24(1):25--46, 1999.]]
[32]
N. Pasquier, Y. Bastide, R. Touil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In C. Beeri and P. Buneman, editors, Proceedings of 7th International Conference on Database Theory (ICDT 1999), LNCS, volume 1540, Springer-Verlag, Jerusalem, Israel, pages 398--416, January 1999.]]
[33]
J. Pei, J. Han, and R. Mao. CLOSET: An efficient algorithm for mining frequent closed itemsets. In Proceedings of the ACM-SIGMOD International Workshop on Data Mining and Knowledge Discovery (DMKD 2000), Dallas, Texas, USA, pages 21--30, 2000.]]
[34]
R. Srikant. Fast algorithms for mining association rules and sequential patterns. Ph. D dissertation, University of Wisconsin, Madison, USA, 1996.]]
[35]
G. Stumme, R. Taouil, Y. Bastide, N. Pasquier, and L. Lakhal. Computing iceberg concept lattices with TITANIC. Journal on Knowledge and Data Engineering (KDE), 2(42):189--222, 2002.]]
[36]
T. Uno, T. Asai, Y. Uchida, and H. Arimura. An efficient algorithm for enumerating closed patterns in transaction databases. Proceedings of the 7th International Conference on Discovery Science, Padova, Italy, pages 16--31, 2-5 October 2004.]]
[37]
T. Uno, M. Kiyomi, and H. Arimura. LCM ver. 2: Efficient mining algorithms for frequent/closed/maximal itemsets. In B. Goethals, M. J. Zaki, and R. Bayardo, editors, Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI 2004), volume 126 of CEUR Workshop Proceedings, Brighton, UK, 1 November 2004.]]
[38]
P. Valtchev, R. Missaoui, and P. Lebrun. A fast algorithm for building the Hasse diagram of a Galois lattice. In Proceedings of the Colloque LaCIM 2000, Montréal, Canada, pages 293--306, September 2000.]]
[39]
J. Wang, J. Han, and J. Pei. CLOSET+: Searching for the best strategies for mining frequent closed itemsets. In Proceedings of the 9th ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2003), Washington D. C., USA, pages 236--245, 24-27 August 2003.]]
[40]
R. Wille. Restructuring lattices theory: An approach based on hierarchies of concepts. I. Rival, editor, Ordered Sets, Reidel, Dordrecht-Boston, p. 445--470, 1982.]]
[41]
M. J. Zaki. Parallel and distributed association mining: A survey. In IEEE Journal Concurrency, 7 (4):14--25, October 1999.]]
[42]
M. J. Zaki and C. J. Hsiao. CHARM: An efficient algorithm for closed itemset mining. In Proceedings of the 2nd SIAM International Conference on Data Mining, Arlington, Virginia, USA, pages 34--43, April 2002.]]
[43]
Z. Zheng, R. Kohavi, and L. Mason. Real world performance of association rule algorithms (long version). Available at http://ai.stanford.edu/users/ronnyk/realworldassoclong-paper.pdf. Accessed on February 15th 2006.]]
[44]
Z. Zheng, R. Kohavi, and L. Mason. Real world performance of association rule algorithms. In F. Provost and R. Srikant, editors, Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, pages 401--406, August 2001.]]
[45]
J. Zhu. Efficiently mining frequent itemsets from very large databases. Ph. D thesis, University of Concordia, Montréal, Québec, Canada, September 2004.]]

Cited By

View all
  • (2023)GRAPGT: GRAdual patterns with gradualness thresholdInternational Journal of General Systems10.1080/03081079.2022.216204952:5(525-545)Online publication date: 19-Feb-2023
  • (2022)Semi-supervised consensus clustering based on closed patternsKnowledge-Based Systems10.1016/j.knosys.2021.107599235:COnline publication date: 9-Apr-2022
  • (2021)Conceptual Coverage Driven by Essential Concepts: A Formal Concept Analysis ApproachMathematics10.3390/math92126949:21(2694)Online publication date: 23-Oct-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGKDD Explorations Newsletter
ACM SIGKDD Explorations Newsletter  Volume 8, Issue 1
June 2006
104 pages
ISSN:1931-0145
EISSN:1931-0153
DOI:10.1145/1147234
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2006
Published in SIGKDD Volume 8, Issue 1

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)4
  • Downloads (Last 6 weeks)2
Reflects downloads up to 13 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)GRAPGT: GRAdual patterns with gradualness thresholdInternational Journal of General Systems10.1080/03081079.2022.216204952:5(525-545)Online publication date: 19-Feb-2023
  • (2022)Semi-supervised consensus clustering based on closed patternsKnowledge-Based Systems10.1016/j.knosys.2021.107599235:COnline publication date: 9-Apr-2022
  • (2021)Conceptual Coverage Driven by Essential Concepts: A Formal Concept Analysis ApproachMathematics10.3390/math92126949:21(2694)Online publication date: 23-Oct-2021
  • (2021)Customer Choice Modelling: A Multi-Level Consensus Clustering ApproachAnnals of Emerging Technologies in Computing10.33166/AETiC.2021.02.0095:2(103-120)Online publication date: 1-Apr-2021
  • (2021)Extracting Frequent (Closed) Seasonal Gradual Patterns Using Closed Itemset Mining2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI)10.1109/ICTAI52525.2021.00229(1442-1448)Online publication date: Nov-2021
  • (2020)A novel algorithm for searching frequent gradual patterns from an ordered data setIntelligent Data Analysis10.3233/IDA-19464424:5(1029-1042)Online publication date: 30-Sep-2020
  • (2020)Comparative evaluation of pattern mining techniques: an empirical studyComplex & Intelligent Systems10.1007/s40747-020-00226-4Online publication date: 11-Nov-2020
  • (2020)A Multi-level Consensus Clustering Framework for Customer Choice Modelling in Travel IndustryEmerging Technologies in Computing10.1007/978-3-030-60036-5_10(142-157)Online publication date: 29-Sep-2020
  • (2020)Decision support based on optimized data mining techniques: Application to mobile telecommunication companiesConcurrency and Computation: Practice and Experience10.1002/cpe.583333:1Online publication date: 22-Jun-2020
  • (2019)Exploiting the Power of Group Differences: Using Patterns to Solve Data Analysis ProblemsSynthesis Lectures on Data Mining and Knowledge Discovery10.2200/S00897ED1V01Y201901DMK01611:1(1-146)Online publication date: 22-Feb-2019
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media