Abstract
HUIM (High utility itemsets mining) is a sub-division of data mining dealing with the task to obtain promising patterns in the quantitative datasets. A variant of HUIM is to discover the HAUIM (High average-utility itemsets mining) where average-utility measure is used to obtain the utility of itemsets. HAUIM is the refined version of FIM (Frequent itemset mining) problem and has various applications in the field of market basket analysis, bio-informatics, text mining, network traffic analysis, product recommendation and e-learning among others. In this paper, we provide a comprehensive survey of the state-of-the-art methods of HAUIM to mine the HAUIs (High average-utility itemsets) from the static and dynamic datasets since the induction of the HAUIM problem. We discuss the pros and cons of each category of mining approaches in detail. The taxonomy of HAUIM is presented according to the mining approaches. Finally,various extensions, future directions and research opportunities of HAUIM algorithms are discussed.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
Any superset of a non-frequent itemset is also non-frequent.
References
Agarwal R, Srikant R, et al. (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th VLDB conference, pp 487–499
Ahmed CF, Tanbeer SK, Jeong BS (2010) A novel approach for mining high-utility sequential patterns in sequence databases. ETRI J 32(5):676–686
Ahmed CF, Tanbeer SK, Jeong BS, Choi HJ (2012) Interactive mining of high utility patterns over data streams. Expert Syst Appl 39(15):11979–11991
Alkan OK, Karagoz P (2015) Crom and huspext: Improving efficiency of high utility sequential pattern extraction. IEEE Trans Knowl Data Eng 27(10):2645–2657
Boulicaut JF, Bykowski A, Rigotti C (2003) Free-sets: a condensed representation of boolean data for the approximation of frequency queries. Data Min Knowl Disc 7(1):5–22
Chang JH, Lee WS (2004) A sliding window method for finding recently frequent itemsets over online data streams. J Info Sci Eng 20(4):753–762
Chang JH, Lee WS (2006) Finding recently frequent itemsets adaptively over online transactional data streams. Inf Syst 31(8):849–869
Chen H, Shu L, Xia J, Deng Q (2012) Mining frequent patterns in a varying-size sliding window of online transactional data streams. Inf Sci 215:15–36
Chen L, Mei Q (2014) Mining frequent items in data stream using time fading model. Inf Sci 257:54–69
Cheung DW, Han J, Ng VT, Wong C (1996) Maintenance of discovered association rules in large databases: an incremental updating technique. In: Proceedings of the twelfth international conference on data engineering. IEEE, pp 106–114
Cheung DW, Lee SD, Kao B (1997) A general incremental technique for maintaining discovered association rules. In: Database systems for advanced applications’ 97. World Scientific, pp 185–194
Chu C, Tseng V, Liang T (2009) An efficient algorithm for mining high utility itemsets with negative item values in large databases. Appl Math Comput 215(2):767–778
Dorigo M, Gambardella LM (1997) Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Trans Evol Comput 1(1):53–66
Duong QH, Liao B, Fournier-Viger P, Dam TL (2016) An efficient algorithm for mining the top-k high utility itemsets, using novel threshold raising and pruning strategies. Knowl-Based Syst 104:106–122
Fouad MR, Elbassioni K, Bertino E (2014) A supermodularity-based differential privacy preserving algorithm for data anonymization. IEEE Trans Knowl Data Eng 26(7):1591–1601
Fournier-Viger P, Gomariz A, Campos M, Thomas R (2014) Fast vertical mining of sequential patterns using co-occurrence information. In: Pacific-asia conference on knowledge discovery and data mining. Springer, pp 40–52
Fournier-Viger P, Li Z, Lin JCW, Kiran RU, Fujita H (2019) Efficient algorithms to identify periodic patterns in multiple sequences. Inf Sci 489:205–226
Fournier-Viger P, Lin JCW, Vo B, Chi TT, Zhang J, Le HB (2017) A survey of itemset mining. Wiley Interdiscip Rev Data Min Knowl Discov 7(4):e1207
Fournier-Viger P, Wu CW, Zida S, Tseng V (2014) Fhm: Faster high-utility itemset mining using estimated utility co-occurrence pruning. In: International symposium on methodologies for intelligent systems. Springer, pp 83–92
Fournier-Viger P, Zida S (2015) Foshu: faster on-shelf high utility itemset mining–with or without negative unit profit. In: Proceedings of the 30th annual ACM symposium on applied computing, pp 857–864
Ghazikhani A, Monsefi R, Yazdi HS (2014) Online neural network model for non-stationary and imbalanced data stream classification. Int J Mach Learn Cybern 5(1):51–62
Hamrouni T, Yahia SB, Nguifo EM (2009) Sweeping the disjunctive search space towards mining new exact concise representations of frequent itemsets. Data Knowl Eng 68(10):1091–1111
Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Elsevier
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87
Hashem T, Ahmed CF, Samiullah M, Akther S, Jeong BS, Jeon S (2014) An efficient approach for mining cross-level closed itemsets and minimal association rules using closed itemset lattices. Expert Syst Appl 41(6):2914–2938
Hong TP, Lee CH, Wang SL (2009) An incremental mining algorithm for high average-utility itemsets. In: 2009 10Th international symposium on pervasive systems, algorithms, and networks. IEEE, pp 421–425
Hong TP, Lee CH, Wang SL (2009) Mining high average-utility itemsets. In: 2009 IEEE International conference on systems, man and cybernetics. IEEE, pp 2526–2530
Hong TP, Lee CH, Wang SL (2011) Effective utility mining with the measure of average utility. Expert Syst Appl 38(7):8259–8265
Hong TP, Lin CW, Wu YL (2008) Incrementally fast updated frequent pattern trees. Expert Syst Appl 34(4):2424–2435
Hong TP, Lin CW, Wu YL (2009) Maintenance of fast updated frequent pattern trees for record deletion. Comput Stat Data Anal 53(7):2485–2499
Hong TP, Wang CY, Tao YH (2001) A new incremental data mining algorithm using pre-large itemsets. Intell Data Anal 5(2):111–129
Karaboga D (2005) An idea based on honey bee swarm for numerical optimization. Technical report, Citeseer
Kim D, Yun U (2016) Mining high utility itemsets based on the time decaying model. Intell Data Anal 20(5):1157–1180
Kim D, Yun U (2017) Efficient algorithm for mining high average-utility itemsets in incremental transaction databases. Appl Intell 47(1):114–131
Kim J, Yun U, Yoon E, Lin JCW, Fournier-Viger P (2020) One scan based high average-utility pattern mining in static and dynamic databases. Future Generation Computer Systems
Kiran RU, Reddy PK (2011) Novel techniques to reduce search space in multiple minimum supports-based frequent pattern mining algorithms. In: Proceedings of the 14th international conference on extending database technology, pp 11–20
Koh JL, Shieh SF (2004) An efficient approach for maintaining association rules based on adjusting fp-tree structures. In: International conference on database systems for advanced applications. Springer, pp 417–424
Krishnamoorthy S (2015) Pruning strategies for mining high utility itemsets. Expert Syst Appl 42(5):2371–2381
Krishnamoorthy S (2018) Efficient mining of high utility itemsets with multiple minimum utility thresholds. Eng Appl Artif Intell 69:112–126
Lal K, Mahanti N (2010) Mining association rules in large database by implementing pipelining technique in partition algorithm. Int J Comput Appl 2(4):33–39
Lan GC, Hong TP, Huang JP, Tseng V (2014) On-shelf utility mining with negative item values. Expert Syst Appl 41(7):3450–3459
Lan GC, Hong TP, Tseng V (2012) Efficiently mining high average-utility itemsets with an improved upper-bound strategy. Int J Inf Technol Decis Making 11(05):1009–1030
Lan GC, Hong TP, Tseng V, Wang SL (2014) Applying the maximum utility measure in high utility sequential pattern mining. Expert Syst Appl 41(11):5071–5081
Lan GC, Hong TP, Tseng V, et al. (2012) A projection-based approach for discovering high average-utility itemsets. J Inf Sci Eng 28(1):193–209
Lan GC, Lin CW, Hong TP, Tseng V (2011) Updating high average-utility itemsets in dynamic databases. In: 2011 9Th world congress on intelligent control and automation. IEEE, pp 932–936
Lee G, Yun U, Ryang H (2015) Mining weighted erasable patterns by using underestimated constraint-based pruning technique. J Intell Fuzzy Syst 28(3):1145–1157
Lee G, Yun U, Ryu KH (2014) Sliding window based weighted maximal frequent pattern mining over data streams. Expert Syst Appl 41(2):694–708
Leung CKS, Jiang F (2011) Frequent itemset mining of uncertain data streams using the damped window model. In: Proceedings of the 2011 ACM Symposium on Applied Computing, pp 950–955
Li H (2015) On-line and dynamic time warping for time series data mining. Int J Mach Learn Cybern 6(1):145–153
Li HF (2009) Mining top-k maximal reference sequences from streaming web click-sequences with a damped sliding window. Expert Syst Appl 36(8):11304–11311
Li HF, Shan MK, Lee SY (2008) Dsm-fi: an efficient algorithm for mining frequent itemsets in data streams. Knowl Inf Syst 17(1):79–97
Li X, Zaïane O.R, Li Z (2006) Advanced data mining and applications. In: Proceedings of Second International Conference, ADMA. Springer, pp 14–16
Lin CW, Hong TP, Lan GC, Wong JW, Lin WY (2015) Efficient updating of discovered high-utility itemsets for transaction deletion in dynamic databases. Adv Eng Inform 29(1):16–27
Lin CW, Hong TP, Lu WH (2009) The pre-fufp algorithm for incremental mining. Expert Syst Appl 36(5):9498–9505
Lin CW, Hong TP, Lu WH (2010) Efficiently mining high average utility itemsets with a tree structure. In: Asian conference on intelligent information and database systems. Springer, pp 131–139
Lin CW, Lan GC, Hong TP (2012) An incremental mining algorithm for high utility itemsets. Expert Syst Appl 39(8):7173–7180
Lin JCW, Gan W, Fournier-Viger P, Hong TP (2015) Mining high-utility itemsets with multiple minimum utility thresholds. In: Proceedings of the Eighth International C* Conference on Computer Science & Software Engineering, pp 9–17
Lin JCW, Gan W, Fournier-Viger P, Hong TP, Zhan J (2016) Efficient mining of high-utility itemsets using multiple minimum utility thresholds. Knowl-Based Syst 113:100–115
Lin JCW, Li T, Fournier-Viger P, Hong TP, Su JH (2016) Efficient mining of high average-utility itemsets with multiple minimum thresholds. In: Industrial conference on data mining. Springer, pp 14–28
Lin JCW, Li T, Fournier-Viger P, Hong TP, Zhan J, Voznak M (2016) An efficient algorithm to mine high average-utility itemsets. Adv Eng Inform 30(2):233–243
Lin JCW, Li T, Pirouz M, Zhang J, Fournier-Viger P (2020) High average-utility sequential pattern mining based on uncertain databases. Knowl Inf Syst 62(3):1199–1228
Lin JCW, Ren S, Fournier-Viger P (2018) Memu: More efficient algorithm to mine high average-utility patterns with multiple minimum average-utility thresholds. IEEE Access 6:7593–7609
Lin JCW, Ren S, Fournier-Viger P, Hong TP (2017) Ehaupm: Efficient high average-utility pattern mining with tighter upper bounds. IEEE Access 5:12927–12940
Lin JCW, Ren S, Fournier-Viger P, Hong TP (2017) Mining of high average-utility itemsets with a tighter upper-bound model. In: Proceedings of the 4th Multidisciplinary International Social Networks Conference, pp 1–6
Lin JCW, Ren S, Fournier-Viger P, Hong TP, Su JH, Vo B (2017) A fast algorithm for mining high average-utility itemsets. Appl Intell 47(2):331–346
Lin JCW, Ren S, Fournier-Viger P, Pan JS, Hong TP (2018) Efficiently updating the discovered high average-utility itemsets with transaction insertion. Eng Appl Artif Intell 72:136–149
Lin JCW, Ren S, Fournier-Viger P, Su JH, Vo B (2017) More efficient algorithm to mine high average-utility patterns. In: Advances in intelligent information hiding and multimedia signal processing. Springer, pp 101–110
Lin JCW, Shao Y, Fournier-Viger P, Djenouri Y, Guo X (2018) Maintenance algorithm for high average-utility itemsets with transaction deletion. Appl Intell 48(10):3691–3706
Lin JCW, Wu JMT, Fournier-Viger P, Hong TP, Li T (2019) Efficient mining of high average-utility sequential patterns from uncertain databases. In: 2019 IEEE International conference on systems, man and cybernetics (SMC). IEEE, pp 1989–1994
Lin MY, Lee SY (1998) Incremental update on sequential patterns in large databases. In: Proceedings Tenth IEEE International Conference on Tools with Artificial Intelligence (Cat. No. 98CH36294). IEEE, pp 24–31
Liu B, Hsu W, Ma Y (1999) Mining association rules with multiple minimum supports. In: Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, pp 337–341
Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: Proceedings of the 21st ACM international conference on Information and knowledge management, pp 55–64
Liu Y, Liao WK, Choudhary A (2005) A fast high utility itemsets mining algorithm. In: Proceedings of the 1st international workshop on Utility-based data mining, pp 90–99
Liu Y, Liao WK, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: Pacific-asia conference on knowledge discovery and data mining. Springer, pp 689–695
Loukides G, Gkoulalas-Divanis A (2012) Utility-preserving transaction data anonymization with low information loss. Expert Syst Appl 39(10):9764–9777
Lu T, Vo B, Nguyen HT, Hong TP (2015) A new method for mining high average utility itemsets. In: IFIP International conference on computer information systems and industrial management. Springer, pp 33–42
Nguyen D, Nguyen LT, Vo B, Pedrycz W (2016) Efficient mining of class association rules with the itemset constraint. Knowl-Based Syst 103:73–88
Nguyen D, Vo B, Le B (2015) Ccar: an efficient method for mining class association rules with itemset constraints. Eng Appl Artif Intell 37:115–124
Nguyen LTT, Nguyen TD, Nguyen A, Tran PN, Trinh C, Huynh B, Vo B (2020) Efficient method for mining high-utility itemsets using high-average utility measure. In: International conference on computational collective intelligence. Springer, pp 305–315
Ozturk C, Hancer E, Karaboga D (2015) A novel binary artificial bee colony algorithm based on genetic operators. Inf Sci 297:154–170
Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu MC (2004) Mining sequential patterns by pattern-growth: The prefixspan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440
Phuong N, Duy ND (2017) Constructing a new algorithm for high average utility itemsets mining. In: 2017 International conference on system science and engineering (ICSSE). IEEE, pp 273–278
Ryang H, Yun U (2015) Top-k high utility pattern mining with effective threshold raising strategies. Knowl-Based Syst 76:109–126
Ryang H, Yun U (2017) Indexed list-based high utility pattern mining with utility upper-bound reduction and pattern combination techniques. Knowl Inf Syst 51(2):627–659
Ryang H, Yun U, Ryu KH (2014) Discovering high utility itemsets with multiple minimum supports. Intell Data Anal 18(6):1027–1047
Ryang H, Yun U, Ryu KH (2016) Fast algorithm for high utility pattern mining with the sum of item quantities. Intell Data Anal 20(2):395–415
Salam A, Khayal MSH (2012) Mining top- k frequent patterns without minimum support threshold. Knowl Inf Ayst 30(1):57–86
Sarda NL, Srinivas N (1998) An adaptive algorithm for incremental mining of association rules. In: Proceedings Ninth International Workshop on Database and Expert Systems Applications (Cat. No. 98EX130), pp 240–245. IEEE
Savasere A, Omiecinski ER, Navathe SB (1995) An efficient algorithm for mining association rules in large databases. Technical report, Georgia Institute of Technology
Sethi KK, Ramesh D (2020) A fast high average-utility itemset mining with efficient tighter upper bounds and novel list structure. J Supercomput:1–31
Sethi KK, Ramesh D (2020) High average-utility itemset mining with multiple minimum utility threshold: a generalized approach. Eng Appl Artif Intell 96:103933
Shao J, Meng X, Cao L (2016) Mining actionable combined high utility incremental and associated patterns. In: 2016 IEEE International conference on aircraft utility systems (AUS). IEEE, pp 1164–1169
Shie BE, Philip SY, Tseng V (2012) Efficient algorithms for mining maximal high utility itemsets from data streams with different models. Expert Syst Appl 39(17):12947–12960
Singh K, Shakya HK, Biswas B (2015) An efficient approach to discovering frequent patterns from data cube using aggregation and directed graph. In: Proceedings of the Sixth International Conference on Computer and Communication Technology 2015, pp 31–35
Singh K, Shakya HK, Singh A, Biswas B (2018) Mining of high-utility itemsets with negative utility. Expert Syst 35(6):e12296
Tanbeer SK, Ahmed CF, Jeong BS, Lee YK (2009) Efficient single-pass frequent pattern mining using a prefix-tree. Inf Sci 179(5):559–583
Tanbeer SK, Ahmed CF, Jeong BS, Lee YK (2009) Sliding window-based frequent pattern mining over data streams. Inf Sci 179(22):3843–3865
Teng WG, Chen MS, Philip SY (2003) A regression-based temporal pattern mining scheme for data streams. In: Proceedings 2003 VLDB Conference. Elsevier, pp 93–104
Thilagu M, Nadarajan R (2012) Efficiently mining of effective web traversal patterns with average utility. Procedia Technol 6:444–451
Truong T, Duong H, Le B, Fournier-Viger P (2018) Efficient vertical mining of high average-utility itemsets based on novel upper-bounds. IEEE Trans Knowl Data Eng 31(2):301–314
Truong T, Duong H, Le B, Fournier-Viger P (2020) Ehausm: an efficient algorithm for high average utility sequence mining. Inf Sci 515:302–323
Truong T, Duong H, Le B, Fournier-Viger P, Yun U (2019) Efficient high average-utility itemset mining using novel vertical weak upper-bounds. Knowl-Based Syst 183:104847
Tseng V, Chu C, Liang T (2006) An efficient method for mining temporal emerging itemsets from data streams. In: International computer symposium (ICS), workshop on software engineering, databases and knowledge discovery, Taipei
Tseng V, Shie BE, Wu CW, Philip SY (2012) Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans Knowl Data Eng 25(8):1772–1786
Tseng V, Wu CW, Fournier-Viger P, Philip SY (2015) Efficient algorithms for mining top-k high utility itemsets. IEEE Trans Knowl Data Eng 28(1):54–67
Vo B, Coenen F, Le B (2013) A new method for mining frequent weighted itemsets based on wit-trees. Expert Syst Appl 40(4):1256–1264
Vo B, Le T, Pedrycz W, Nguyen G, Baik SW (2017) Mining erasable itemsets with subset and superset itemset constraints. Expert Syst Appl 69:50–61
Wang J, Han J, Lu Y, Tzvetkov P (2005) Tfp: an efficient algorithm for mining top-k frequent closed itemsets. IEEE Trans Knowl Data Eng 17(5):652–663
Wu CW, Shie BE, Tseng V, Yu PS (2012) Mining top-k high utility itemsets. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 78–86
Wu JMT, Lin JCW, Pirouz M, Fournier-Viger P (2018) New tighter upper bounds for mining high average-utility itemsets. In: Proceedings of the 2018 International Conference on Big Data and Education, pp 27–32
Wu JMT, Lin JCW, Pirouz M, Fournier-Viger P (2018) Tub-haupm: Tighter upper bound for mining high average-utility patterns. IEEE Access 6:18655–18669
Wu JMT, Teng Q, Lin JCW, Cheng CF (2020) Incrementally updating the discovered high average-utility patterns with the pre-large concept. IEEE Access 8:66788–66798
Wu JMT, Teng Q, Lin JCW, Yun U, Chen HC (2020) Updating high average-utility itemsets with pre-large concept. J Intell Fuzzy Syst (Preprint):1–10
Wu R, He Z (2018) Top-k high average-utility itemsets mining with effective pruning strategies. Appl Intell 48(10):3429–3445
Wu TY, Lin JCW, Shao Y, Fournier-Viger P, Hong TP (2017) Updating the discovered high average-utility patterns with transaction insertion. In: International conference on genetic and evolutionary computing. Springer, pp 66–73
YILDIRIM I, CELIK M (2018) Fimhaui: Fast incremental mining of high average-utility itemsets. In: 2018 International conference on artificial intelligence and data processing (IDAP). IEEE, pp 1–9
Yildirim I, Celik M (2019) An efficient tree-based algorithm for mining high average-utility itemset. IEEE Access 7:144245–144263
Yin J, Zheng Z, Cao L (2012) Uspan: an efficient algorithm for mining high utility sequential patterns. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 660–668
Yun U, Kim D (2017) Mining of high average-utility itemsets using novel list structure and pruning strategy. Futur Gener Comput Syst 68:346–360
Yun U, Kim D, Ryang H, Lee G, Lee KM (2016) Mining recent high average utility patterns based on sliding window from stream data. J Intell Fuzzy Syst 30(6):3605–3617
Yun U, Kim D, Yoon E, Fujita H (2018) Damped window based high average utility pattern mining over data streams. Knowl-Based Syst 144:188–205
Yun U, Ryang H (2015) Incremental high utility pattern mining with static and dynamic databases. Appl Intell 42(2):323–352
Zaki MJ, Gouda K (2003) Fast vertical mining using diffsets. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp 326–335
Zhang B, Lin JCW, Fournier-Viger P, Li T (2017) Mining of high utility-probability sequential patterns from uncertain databases. Plos one 12(7):e0180931
Zhang B, Lin JCW, Shao Y, Fournier-Viger P, Djenouri Y (2018) Maintenance of discovered high average-utility itemsets in dynamic databases. Appl Sci 8(5):769
Zida S, Fournier-Viger P, Lin JCW, Wu CW, Tseng V (2017) Efim: a fast and memory efficient algorithm for high-utility itemset mining. Knowl Inf Syst 51(2):595–625
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Singh, K., Kumar, R. & Biswas, B. High average-utility itemsets mining: a survey. Appl Intell 52, 3901–3938 (2022). https://doi.org/10.1007/s10489-021-02611-z
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-021-02611-z