article

Free access

A data mining proxy approach for efficient frequent itemset mining

Authors:

Guimei LiuAuthors Info & Claims

The VLDB Journal — The International Journal on Very Large Data Bases, Volume 17, Issue 4

Pages 947 - 970

https://doi.org/10.1007/s00778-007-0047-0

Published: 01 July 2008 Publication History

Abstract

Data mining has attracted a lot of research efforts during the past decade. However, little work has been reported on the efficiency of supporting a large number of users who issue different data mining queries periodically when there are new needs and when data is updated. Our work is motivated by the fact that the pattern-growth method is one of the most efficient methods for frequent pattern mining which constructs an initial tree and mines frequent patterns on top of the tree. In this paper, we present a data mining proxy approach that can reduce the I/O costs to construct an initial tree by utilizing the trees that have already been resident in memory. The tree we construct is the smallest for a given data mining query. In addition, our proxy approach can also reduce CPU cost in mining patterns, because the cost of mining relies on the sizes of trees. The focus of the work is to construct an initial tree efficiently. We propose three tree operations to construct a tree. With a unique coding scheme, we can efficiently project subtrees from on-disk trees or in-memory trees. Our performance study indicated that the data mining proxy significantly reduces the I/O cost to construct trees and CPU cost to mine patterns over the trees constructed.

References

[1]

Agarwal, R.C., Aggarwal, C.C., Prasad, V.V.V.: Depth first generation of long patterns. In: Proceedings of 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2001).

Digital Library

[2]

Agarwal, R.C., Aggarwal, C.C., Prasad, V.V.V.: A tree projection algorithm for generation of frequent item sets. J. Parallel Distrib. Comput. 61 (2001).

Digital Library

[3]

Agrawal, R., Imielinski, T., Swami, A.N.: Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD Conference (1993).

Digital Library

[4]

Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of 20th International Conference on Very Large Data Bases (1994).

Digital Library

[5]

Bayardo, R.J.: Efficiently mining long patterns from databases. In: Proceedings of the 1998 ACM SIGMOD Conference (1998).

Digital Library

[6]

Brin, S., Motwani, R., Ullman, J.D., Tsur, S.: Dynamic itemset counting and implication rules for market basket data. In: Proceedings of the 1997 ACM SIGMOD Conference (1997).

Digital Library

[7]

Bucila, C., Gehrke, J., Kifer, D., White, W.M.: Dualminer: a dual-pruning algorithm for itemsets with constraints. In: Proceedings of the 8th ACM SIGKDD Conference (2002).

Digital Library

[8]

Burdick, D., Calimlim, M., Gehrke, J.: MAFIA: A maximal frequent itemset algorithm for transactional databases. In: Proceedings of 2001 International Conference on Data Engineering (2001).

Digital Library

[9]

Cong, G., Liu, B.: Speed-up iterative frequent itemset mining with constraint changes. In: Proceedings of the 2002 International Conference on Data Mining (2002).

Digital Library

[10]

Gouda, K., Zaki, M.J.: Efficiently mining maximal frequent item-sets. In: Proceedings of the 2001 IEEE International Conference on Data Mining (2001).

Digital Library

[11]

Grahne, G., Lakshmanan, L.V.S., Wang, X.: Efficient mining of constrained correlated sets. In: Proceedings of the 16th IEEE International Conference on Data Engineering (2000).

Digital Library

[12]

Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM SIGMOD Conference (2000).

Digital Library

[13]

Han, J., Wang, J., Lu, Y., Tzvethov, P.: Mining top-k frequent closed patterns without minimum. In: Proceedings of the 2002 IEEE International Conference on Data Mining (2002).

Digital Library

[14]

Lakshmanan, L.V.S., Leung, C.K.S., Ng, R.T.: Efficient dynamic mining of constrained frequent sets. ACM Trans. Database Systems (TODS) 28(4), (2003).

Digital Library

[15]

Lakshmanan, L.V.S., Ng, R.T., Han, J., Pang, A.: Optimization of constrained frequent set queries with 2-variable constraints. In: Proceedings of the 1999 ACM SIGMOD Conference (1999).

Digital Library

[16]

Lan, B., Ooi, B.C., Tan, K.L.: Efficient indexing structure for mining frequent patterns. In: Proceedings of the 18th IEEE International Conference on Data Engineering (1994).

Digital Library

[17]

Leung, C.S., Ng, R.T., Mannila, H.: A segmentation approach to optimize frequency counting. In: Proceedings of the 18th IEEE International Conference on Data Engineering (2002).

Digital Library

[18]

Li, Z., Yu, J.X., Lu, H., Xu, Y., Liu, G.: Data mining proxy: serving large number of users for efficient frequent itemset mining. In: Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (2004).

[19]

Lin, D.I., Kedem, Z.M.: Pincer search: A new algorithm for discovering the maximum frequent set. In: Proceedings of 6th International Conference on Extending Database Technology, EDBT (1998).

Digital Library

[20]

Lin, J.L., Dunham, M.H.: Mining association rules: anti-skew algorithms. In: Proceedings of the 14th IEEE International Conference on Data Engineering (1998).

Digital Library

[21]

Liu, G., Lu, H., Lou, W., Xu, Y., Yu, J.X.: Ascending frequency ordered prefix-tree: efficient mining of frequent itemsets. In: Proceedings of International Conference on Database Systems for Advanced Applications'03 (2003).

Digital Library

[22]

Liu, J., Pan, Y., Wang, K., Han, J.: Mining frequent item sets by opportunistic projection. In: Proceedings of the 8th KDD Conference (2002).

Digital Library

[23]

Mannila, H., Toivonen, H., Verkamo, A.I.: Efficient algorithms for discovering association rules. In: Proceedings of KDD Workshop (1994).

[24]

Morzy, T., Wojciechowski, M., Zakrzewicz, M.: Materialized data mining views. In: Proceedings of European Conference on Principles and Practice of Knowledge Discovery in Databases (2000).

Digital Library

[25]

Nag, B., Deshpande, P.M., DeWitt, D.J.: Using a knowledge cache for interactive discovery of association rules. In: Proceedings of the 5th ACM SIGKDD Conference (1999).

Digital Library

[26]

Ng, R.T., Lakshmanan, L.V.S., Han, J., Pang, A.: Exploratory mining and pruning optimizations of constrained association rules. In: Proceedings of the 1998 ACM SIGMOD Conference (1998).

Digital Library

[27]

Park, J.S., Chen, M.S., Yu, P.S.: An effective hash based algorithm for mining association rules. In: Proceedings of the 1995 ACM SIGMOD Conference (1995).

Digital Library

[28]

Pei, J., Han, J.: Can we push more constraints into frequent pattern mining? In: Proceedings of the 6th ACM SIGKDD Conference (2000).

Digital Library

[29]

Pei, J., Han, J., Lakshmanan, L.V.S.: Mining frequent item sets with convertible constraints. In: Proceedings of the 17th IEEE International Conference on Data Engineering (2001).

Digital Library

[30]

Pei, J., Han, J., Lu, H., Nishio, S., Shiwei Tang, D.Y.: H-mine: Hyper-structure mining of frequent patterns in large databases. In: Proceedings of 2001 IEEE Conference on Data Mining (2001).

Digital Library

[31]

Pei, J., Han, J., Mao, R.: Closet: an efficient algorithm for mining frequent closed itemsets. In: Proceedings of ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (2000).

[32]

Savasere, A., Omiecinski, E., Navathe, S.B.: An efficient algorithm for mining association rules in large databases. In: Proceedings of 21th International Conference on Very Large Data Bases. Morgan Kaufmann (1995).

Digital Library

[33]

Seno, M., Karypis, G.: Lpminer: An algorithm for finding frequent itemsets using length-decreasing support constraint. In: Proceedings of 1st IEEE Conference on Data Mining (2001).

Digital Library

[34]

Shenoy, P., Haritsa, J.R., Sudarshan, S., Bhalotia, G., Bawa, M., Shah, D.: Turbo-charging vertical mining of large databases. In: Proceedings of the 2000 ACM SIGMOD Conference (2000).

Digital Library

[35]

Srikant, R., Vu, Q., Agrawal, R.: Mining association rules with item constraints. In: Proceedings of the 3rd ACM SIGKDD Conference (1997).

[36]

Toivonen, H.: Sampling large databases for association rules. In: Proceedings of the 22th International Conference on Very Large Data Bases (1996).

Digital Library

[37]

Wang, J., Han, J., Pei, J.: CLOSET+: searching for the best strategies for mining frequent closed itemsets. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-03) (2003).

Digital Library

[38]

Webb, G.I.: Efficient search for association rules. In: Proceedings of the 6th ACMSIGKDD International Conference on Knowledge Discovery and Data Mining (2000).

Digital Library

[39]

Xu, Y., Yu, J.X., Liu, G., Lu, H.: From path tree to frequent patterns: A framework for mining frequent patterns. In: Proceedings of IEEE International Conference on Data Mining (2002).

Digital Library

[40]

Yan, X., Han, J., Afshar, R.: Clospan: Mining closed sequential patterns in large datasets. In: Proceedings of the 3rd SAIM International Conference on Data Mining (2003).

[41]

Zaki, M.J.: Scalable algorithms for association mining. Knowl. Data Eng. 12(2), (2000).

Digital Library

[42]

Zaki, M.J., Gouda, K.: Fast vertical mining using diffsets. In: Proceedings of the 9th ACM SIGKDD Conference (2003).

Digital Library

[43]

Zaki, M.J., Hsiao, C.: Charm: An efficient algorithm for closed association rule mining. Tech. rep., Rensselaer Polytechnic Institute (1999).

Cited By

Lee RWakamiya SSumiya K(2013)Urban area characterization based on crowd behavioral lifelogs over TwitterPersonal and Ubiquitous Computing10.1007/s00779-012-0510-917:4(605-620)Online publication date: 1-Apr-2013
https://dl.acm.org/doi/10.1007/s00779-012-0510-9
Wakamiya SLee RSumiya KLee SHanzo LIsmail RKim DChung MLee S(2012)Crowd-sourced urban life monitoringProceedings of the 6th International Conference on Ubiquitous Information Management and Communication10.1145/2184751.2184784(1-9)Online publication date: 20-Feb-2012
https://dl.acm.org/doi/10.1145/2184751.2184784
Wakamiya SLee RSumiya K(2011)Urban area characterization based on semantics of crowd activities in TwitterProceedings of the 4th international conference on GeoSpatial semantics10.5555/2008664.2008674(108-123)Online publication date: 12-May-2011
https://dl.acm.org/doi/10.5555/2008664.2008674
Show More Cited By

Index Terms

A data mining proxy approach for efficient frequent itemset mining
1. Information systems
  1. Information storage systems
    1. Record storage systems
      1. Directory structures
        B-trees
  2. Information systems applications
    1. Data mining
2. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Trees

Recommendations

A fast algorithm for frequent itemset mining using Patricia* structures
DaWaK'12: Proceedings of the 14th international conference on Data Warehousing and Knowledge Discovery

Efficient mining of frequent itemsets from a database plays an essential role in many data mining tasks such as association rule mining. Many algorithms use a prefix-tree to represent a database and mine frequent itemsets by constructing recursively ...
Approximate Inverse Frequent Itemset Mining: Privacy, Complexity, and Approximation
ICDM '05: Proceedings of the Fifth IEEE International Conference on Data Mining

In order to generate synthetic basket datasets for better benchmark testing, it is important to integrate characteristics from real-life databases into the synthetic basket datasets. The characteristics that could be used for this purpose include the ...
Frequent itemset mining using cellular learning automata

A core issue of the association rule extracting process in the data mining field is to find the frequent patterns in the database of operational transactions. If these patterns discovered, the decision making process and determining strategies in ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image The VLDB Journal — The International Journal on Very Large Data Bases

The VLDB Journal — The International Journal on Very Large Data Bases Volume 17, Issue 4

July 2008

361 pages

ISSN:1066-8888

Issue’s Table of Contents

Copyright © Copyright © 2008 Springer-Verlag.

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 July 2008

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

4
Total Citations
View Citations
689
Total Downloads

Downloads (Last 12 months)11
Downloads (Last 6 weeks)3

Reflects downloads up to 14 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Lee RWakamiya SSumiya K(2013)Urban area characterization based on crowd behavioral lifelogs over TwitterPersonal and Ubiquitous Computing10.1007/s00779-012-0510-917:4(605-620)Online publication date: 1-Apr-2013
https://dl.acm.org/doi/10.1007/s00779-012-0510-9
Wakamiya SLee RSumiya KLee SHanzo LIsmail RKim DChung MLee S(2012)Crowd-sourced urban life monitoringProceedings of the 6th International Conference on Ubiquitous Information Management and Communication10.1145/2184751.2184784(1-9)Online publication date: 20-Feb-2012
https://dl.acm.org/doi/10.1145/2184751.2184784
Wakamiya SLee RSumiya K(2011)Urban area characterization based on semantics of crowd activities in TwitterProceedings of the 4th international conference on GeoSpatial semantics10.5555/2008664.2008674(108-123)Online publication date: 12-May-2011
https://dl.acm.org/doi/10.5555/2008664.2008674
Wakamiya SLee RSumiya KJensen CLee WZheng YMokbel M(2011)Crowd-based urban characterizationProceedings of the 3rd ACM SIGSPATIAL International Workshop on Location-Based Social Networks10.1145/2063212.2063225(77-84)Online publication date: 1-Nov-2011
https://dl.acm.org/doi/10.1145/2063212.2063225

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents