Abstract
Mining high utility itemsets is one of the most important research issues in data mining owing to its ability to consider nonbinary frequency values of items in transactions and different profit values for each item. Mining such itemsets from a transaction database involves finding those itemsets with utility above a user-specified threshold. In this paper, we propose an efficient concurrent algorithm, called CHUI-Mine (Concurrent High Utility Itemsets Mine), for mining high utility itemsets by dynamically pruning the tree structure. A tree structure, called the CHUI-Tree, is introduced to capture the important utility information of the candidate itemsets. By recording changes in support counts of candidate high utility items during the tree construction process, we implement dynamic CHUI-Tree pruning, and discuss the rationality thereof. The CHUI-Mine algorithm makes use of a concurrent strategy, enabling the simultaneous construction of a CHUI-Tree and the discovery of high utility itemsets. Our algorithm reduces the problem of huge memory usage for tree construction and traversal in tree-based algorithms for mining high utility itemsets. Extensive experimental results show that the CHUI-Mine algorithm is both efficient and scalable.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Adnan M, Alhajj R (2009) DRFP-tree: disk-resident frequent pattern tree. Appl Intell 30(2):84–97
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings 20th international conference very large data bases (VLDB’94), pp 487–499
Ahmed CF, Tanbeer SK, Jeong B-S, Lee Y-K (2009) Efficient tree structures for high utility pattern. IEEE Trans Knowl Data Eng 21(12):1708–1721
Ahmed CF, Tanbeer SK, Jeong B-S, Lee Y-K (2011) HUC-Prune: an efficient candidate pruning technique to mine high utility patterns. Appl Intell 34(2):181–198
Barber B, Hamilton HJ (2003) Extracting share frequent itemsets with infrequent subsets. Data Min Knowl Discov 7(2):153–185
Chan R, Yang Q, Shen Y-D (2003) Mining high utility itemsets. In: Proceedings of the 3rd IEEE international conference on data mining (ICDM’03), pp 19–26
Erwin A, Gopalan RP, Achuthan NR (2007) CTU-Mine: an efficient high utility itemset mining algorithm using the pattern growth approach. In: Proceedings of the 7th IEEE international conference on computer and information technology (CIT’07), pp 71–76
Erwin A, Gopalan RP, Achuthan NR (2007) A bottom-up projection based algorithm for mining high utility itemsets. In: Proceedings of the 2nd international workshop on integrating artificial intelligence and data mining (AIDM’07), pp 3–10
Erwin A, Gopalan RP, Achuthan NR (2008) Efficient mining of high utility itemsets from large datasets. In: Proceedings of the 12th Pacific-Asia conference on advances in knowledge discovery and data mining (PAKDD’08), pp 554–561
Grahne G, Zhu J (2005) Fast algorithms for frequent itemset mining using FP-trees. IEEE Trans Knowl Data Eng 17(10):1347–1362
Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: current status and future directions. Data Min Knowl Discov 15(1):55–86
Han J, Kamber M (2006) Data mining: concepts and techniques, 2nd edn. Morgan Kaufmann, San Francisco
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87
Hu J, Mojsilovic A (2007) High-utility pattern mining: a method for discovery of high-utility item sets. Pattern Recognit 40(11):3317–3324
Kaya M, Alhajj R (2008) Online mining of fuzzy multidimensional weighted association rules. Appl Intell 29(1):13–34
Lee C-H (2007) IMSP: an information theoretic approach for multi-dimensional sequential pattern mining. Appl Intell 26(3):231–242
Lan G-C, Hong T-P, Tseng VS (2012) A projection-based approach for discovering high average-utility itemsets. J Inf Sci Eng 28(1):193–209
Li H-F, Huang H-Y, Lee S-Y (2011) Fast and memory efficient mining of high-utility itemsets from data streams: with and without negative item profits. Knowl Inf Syst 28(3):495–522
Li H-F (2011) MHUI-max: an efficient algorithm for discovering high-utility itemsets from data streams. J Inf Sci 37(5):532–545
Li Y-C, Yeh J-S, Chang C-C (2008) Isolated items discarding strategy for discovering high utility itemsets. Data Knowl Eng 64(1):198–217
Lin C-W, Hong T-P, Lu W-H (2011) An effective tree structure for mining high utility itemsets. Expert Syst Appl 38(6):7419–7424
Liu Y, Liao W-K, Choudhary AN (2005) A two phase algorithm for fast discovery of high utility of itemsets. In: Proceedings of the 9th Pacific-Asia conference on knowledge discovery and data mining (PAKDD’05), pp 689–695
Maunz A, Helma C, Kramer S (2011) Efficient mining for structurally diverse subgraph patterns in large molecular databases. Mach Learn 83(2):193–218
Piatetsky-Shapiro G (2007) Data mining and knowledge discovery 1996 to 2005: overcoming the hype and moving from “university” to “business” and “analytics”. Data Min Knowl Discov 15(1):99–105
Rauch J (2005) Logic of association rules. Appl Intell 22(1):9–28
Roscoe AW (2010) Understanding concurrent systems. Springer, London
Shie B-E, Yu PS, Tseng VS (2013) Mining interesting user behavior patterns in mobile commerce environments. Appl Intell 38(3):418–435
Song W, Yang BR, Xu ZY (2008) Index-BitTableFI: an improved algorithm for mining frequent itemsets. Knowl-Based Syst 21(6):507–513
Tatti N, Cule B (2012) Mining closed strict episodes. Data Min Knowl Discov 25(1):34–66
Tseng M-C, Lin Y-Y, Jeng R (2008) Updating generalized association rules with evolving taxonomies. Appl Intell 29(3):306–320
Tseng VS, Wu C-W, Shie B-E, Yu PS (2010) UP-Growth: an efficient algorithm for high utility itemset mining. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’10), pp 253–262
IBM data generator. http://www.cs.loyola.edu/~cgiannel/assoc_gen.html
Frequent Itemset mining implementations repository. http://fimi.ua.ac.be/
Wang S-L, Patel D, Jafari A, Hong T-P (2007) Hiding collaborative recommendation association rules. Appl Intell 27(1):67–77
Wang Y-T, Cheng J-T (2011) Mining periodic movement patterns of mobile phone users based on an efficient sampling approach. Appl Intell 35(1):32–40
Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: Proceedings of the 4th SIAM international conference on data mining (SDM’04), pp 482–486
Yao H, Hamilton HJ (2006) Mining itemset utilities from transaction databases. Data Knowl Eng 59(3):603–626
Yu G, Shao S, Luo B, Zeng X (2009) A hybrid method for high-utility itemsets mining in large high-dimensional data. Int J Data Warehous Min 5(1):57–73
Zaki MJ, Gouda K (2003) Fast vertical mining using diffsets. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’03), pp 326–335
Zhang S, Chen F, Wu X, Zhang C, Wang R (2012) Mining bridging rules between conceptual clusters. Appl Intell 36(1):108–118
Acknowledgements
We would like to express our deep gratitude to the anonymous reviewers of this paper. The work is partly supported by the National Natural Science Foundation of China (61105045), Funding Project for Academic Human Resources Development in Institutions of Higher Learning Under the Jurisdiction of Beijing Municipality (PHR201108057), and North China University of Technology (CCXZ201303).
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Song, W., Liu, Y. & Li, J. Mining high utility itemsets by dynamically pruning the tree structure. Appl Intell 40, 29–43 (2014). https://doi.org/10.1007/s10489-013-0443-7
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-013-0443-7