Nothing Special   »   [go: up one dir, main page]

skip to main content
article
Free access

A data mining proxy approach for efficient frequent itemset mining

Published: 01 July 2008 Publication History

Abstract

Data mining has attracted a lot of research efforts during the past decade. However, little work has been reported on the efficiency of supporting a large number of users who issue different data mining queries periodically when there are new needs and when data is updated. Our work is motivated by the fact that the pattern-growth method is one of the most efficient methods for frequent pattern mining which constructs an initial tree and mines frequent patterns on top of the tree. In this paper, we present a data mining proxy approach that can reduce the I/O costs to construct an initial tree by utilizing the trees that have already been resident in memory. The tree we construct is the smallest for a given data mining query. In addition, our proxy approach can also reduce CPU cost in mining patterns, because the cost of mining relies on the sizes of trees. The focus of the work is to construct an initial tree efficiently. We propose three tree operations to construct a tree. With a unique coding scheme, we can efficiently project subtrees from on-disk trees or in-memory trees. Our performance study indicated that the data mining proxy significantly reduces the I/O cost to construct trees and CPU cost to mine patterns over the trees constructed.

References

[1]
Agarwal, R.C., Aggarwal, C.C., Prasad, V.V.V.: Depth first generation of long patterns. In: Proceedings of 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2001).
[2]
Agarwal, R.C., Aggarwal, C.C., Prasad, V.V.V.: A tree projection algorithm for generation of frequent item sets. J. Parallel Distrib. Comput. 61 (2001).
[3]
Agrawal, R., Imielinski, T., Swami, A.N.: Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD Conference (1993).
[4]
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of 20th International Conference on Very Large Data Bases (1994).
[5]
Bayardo, R.J.: Efficiently mining long patterns from databases. In: Proceedings of the 1998 ACM SIGMOD Conference (1998).
[6]
Brin, S., Motwani, R., Ullman, J.D., Tsur, S.: Dynamic itemset counting and implication rules for market basket data. In: Proceedings of the 1997 ACM SIGMOD Conference (1997).
[7]
Bucila, C., Gehrke, J., Kifer, D., White, W.M.: Dualminer: a dual-pruning algorithm for itemsets with constraints. In: Proceedings of the 8th ACM SIGKDD Conference (2002).
[8]
Burdick, D., Calimlim, M., Gehrke, J.: MAFIA: A maximal frequent itemset algorithm for transactional databases. In: Proceedings of 2001 International Conference on Data Engineering (2001).
[9]
Cong, G., Liu, B.: Speed-up iterative frequent itemset mining with constraint changes. In: Proceedings of the 2002 International Conference on Data Mining (2002).
[10]
Gouda, K., Zaki, M.J.: Efficiently mining maximal frequent item-sets. In: Proceedings of the 2001 IEEE International Conference on Data Mining (2001).
[11]
Grahne, G., Lakshmanan, L.V.S., Wang, X.: Efficient mining of constrained correlated sets. In: Proceedings of the 16th IEEE International Conference on Data Engineering (2000).
[12]
Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM SIGMOD Conference (2000).
[13]
Han, J., Wang, J., Lu, Y., Tzvethov, P.: Mining top-k frequent closed patterns without minimum. In: Proceedings of the 2002 IEEE International Conference on Data Mining (2002).
[14]
Lakshmanan, L.V.S., Leung, C.K.S., Ng, R.T.: Efficient dynamic mining of constrained frequent sets. ACM Trans. Database Systems (TODS) 28(4), (2003).
[15]
Lakshmanan, L.V.S., Ng, R.T., Han, J., Pang, A.: Optimization of constrained frequent set queries with 2-variable constraints. In: Proceedings of the 1999 ACM SIGMOD Conference (1999).
[16]
Lan, B., Ooi, B.C., Tan, K.L.: Efficient indexing structure for mining frequent patterns. In: Proceedings of the 18th IEEE International Conference on Data Engineering (1994).
[17]
Leung, C.S., Ng, R.T., Mannila, H.: A segmentation approach to optimize frequency counting. In: Proceedings of the 18th IEEE International Conference on Data Engineering (2002).
[18]
Li, Z., Yu, J.X., Lu, H., Xu, Y., Liu, G.: Data mining proxy: serving large number of users for efficient frequent itemset mining. In: Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining (2004).
[19]
Lin, D.I., Kedem, Z.M.: Pincer search: A new algorithm for discovering the maximum frequent set. In: Proceedings of 6th International Conference on Extending Database Technology, EDBT (1998).
[20]
Lin, J.L., Dunham, M.H.: Mining association rules: anti-skew algorithms. In: Proceedings of the 14th IEEE International Conference on Data Engineering (1998).
[21]
Liu, G., Lu, H., Lou, W., Xu, Y., Yu, J.X.: Ascending frequency ordered prefix-tree: efficient mining of frequent itemsets. In: Proceedings of International Conference on Database Systems for Advanced Applications'03 (2003).
[22]
Liu, J., Pan, Y., Wang, K., Han, J.: Mining frequent item sets by opportunistic projection. In: Proceedings of the 8th KDD Conference (2002).
[23]
Mannila, H., Toivonen, H., Verkamo, A.I.: Efficient algorithms for discovering association rules. In: Proceedings of KDD Workshop (1994).
[24]
Morzy, T., Wojciechowski, M., Zakrzewicz, M.: Materialized data mining views. In: Proceedings of European Conference on Principles and Practice of Knowledge Discovery in Databases (2000).
[25]
Nag, B., Deshpande, P.M., DeWitt, D.J.: Using a knowledge cache for interactive discovery of association rules. In: Proceedings of the 5th ACM SIGKDD Conference (1999).
[26]
Ng, R.T., Lakshmanan, L.V.S., Han, J., Pang, A.: Exploratory mining and pruning optimizations of constrained association rules. In: Proceedings of the 1998 ACM SIGMOD Conference (1998).
[27]
Park, J.S., Chen, M.S., Yu, P.S.: An effective hash based algorithm for mining association rules. In: Proceedings of the 1995 ACM SIGMOD Conference (1995).
[28]
Pei, J., Han, J.: Can we push more constraints into frequent pattern mining? In: Proceedings of the 6th ACM SIGKDD Conference (2000).
[29]
Pei, J., Han, J., Lakshmanan, L.V.S.: Mining frequent item sets with convertible constraints. In: Proceedings of the 17th IEEE International Conference on Data Engineering (2001).
[30]
Pei, J., Han, J., Lu, H., Nishio, S., Shiwei Tang, D.Y.: H-mine: Hyper-structure mining of frequent patterns in large databases. In: Proceedings of 2001 IEEE Conference on Data Mining (2001).
[31]
Pei, J., Han, J., Mao, R.: Closet: an efficient algorithm for mining frequent closed itemsets. In: Proceedings of ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (2000).
[32]
Savasere, A., Omiecinski, E., Navathe, S.B.: An efficient algorithm for mining association rules in large databases. In: Proceedings of 21th International Conference on Very Large Data Bases. Morgan Kaufmann (1995).
[33]
Seno, M., Karypis, G.: Lpminer: An algorithm for finding frequent itemsets using length-decreasing support constraint. In: Proceedings of 1st IEEE Conference on Data Mining (2001).
[34]
Shenoy, P., Haritsa, J.R., Sudarshan, S., Bhalotia, G., Bawa, M., Shah, D.: Turbo-charging vertical mining of large databases. In: Proceedings of the 2000 ACM SIGMOD Conference (2000).
[35]
Srikant, R., Vu, Q., Agrawal, R.: Mining association rules with item constraints. In: Proceedings of the 3rd ACM SIGKDD Conference (1997).
[36]
Toivonen, H.: Sampling large databases for association rules. In: Proceedings of the 22th International Conference on Very Large Data Bases (1996).
[37]
Wang, J., Han, J., Pei, J.: CLOSET+: searching for the best strategies for mining frequent closed itemsets. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-03) (2003).
[38]
Webb, G.I.: Efficient search for association rules. In: Proceedings of the 6th ACMSIGKDD International Conference on Knowledge Discovery and Data Mining (2000).
[39]
Xu, Y., Yu, J.X., Liu, G., Lu, H.: From path tree to frequent patterns: A framework for mining frequent patterns. In: Proceedings of IEEE International Conference on Data Mining (2002).
[40]
Yan, X., Han, J., Afshar, R.: Clospan: Mining closed sequential patterns in large datasets. In: Proceedings of the 3rd SAIM International Conference on Data Mining (2003).
[41]
Zaki, M.J.: Scalable algorithms for association mining. Knowl. Data Eng. 12(2), (2000).
[42]
Zaki, M.J., Gouda, K.: Fast vertical mining using diffsets. In: Proceedings of the 9th ACM SIGKDD Conference (2003).
[43]
Zaki, M.J., Hsiao, C.: Charm: An efficient algorithm for closed association rule mining. Tech. rep., Rensselaer Polytechnic Institute (1999).

Cited By

View all
  • (2013)Urban area characterization based on crowd behavioral lifelogs over TwitterPersonal and Ubiquitous Computing10.1007/s00779-012-0510-917:4(605-620)Online publication date: 1-Apr-2013
  • (2012)Crowd-sourced urban life monitoringProceedings of the 6th International Conference on Ubiquitous Information Management and Communication10.1145/2184751.2184784(1-9)Online publication date: 20-Feb-2012
  • (2011)Urban area characterization based on semantics of crowd activities in TwitterProceedings of the 4th international conference on GeoSpatial semantics10.5555/2008664.2008674(108-123)Online publication date: 12-May-2011
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image The VLDB Journal — The International Journal on Very Large Data Bases
The VLDB Journal — The International Journal on Very Large Data Bases  Volume 17, Issue 4
July 2008
361 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 July 2008

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)11
  • Downloads (Last 6 weeks)3
Reflects downloads up to 14 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2013)Urban area characterization based on crowd behavioral lifelogs over TwitterPersonal and Ubiquitous Computing10.1007/s00779-012-0510-917:4(605-620)Online publication date: 1-Apr-2013
  • (2012)Crowd-sourced urban life monitoringProceedings of the 6th International Conference on Ubiquitous Information Management and Communication10.1145/2184751.2184784(1-9)Online publication date: 20-Feb-2012
  • (2011)Urban area characterization based on semantics of crowd activities in TwitterProceedings of the 4th international conference on GeoSpatial semantics10.5555/2008664.2008674(108-123)Online publication date: 12-May-2011
  • (2011)Crowd-based urban characterizationProceedings of the 3rd ACM SIGSPATIAL International Workshop on Location-Based Social Networks10.1145/2063212.2063225(77-84)Online publication date: 1-Nov-2011

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media