research-article

Efficient algorithms for mining high-utility itemsets in uncertain databases

Authors:

Jerry Chun-Wei Lin,

Philippe Fournier-Viger,

Tzung-Pei Hong,

Vincent S. TsengAuthors Info & Claims

Knowledge-Based Systems, Volume 96, Issue C

Pages 171 - 187

https://doi.org/10.1016/j.knosys.2015.12.019

Published: 15 March 2016 Publication History

Abstract

High-utility itemset mining (HUIM) is a useful set of techniques for discovering patterns in transaction databases, which considers both quantity and profit of items. However, most algorithms for mining high-utility itemsets (HUIs) assume that the information stored in databases is precise, i.e., that there is no uncertainty. But in many real-life applications, an item or itemset is not only present or absent in transactions but is also associated with an existence probability. This is especially the case for data collected experimentally or using noisy sensors. In the past, many algorithms were respectively proposed to effectively mine frequent itemsets in uncertain databases. But mining HUIs in an uncertain database has not yet been proposed, although uncertainty is commonly seen in real-world applications. In this paper, a novel framework, named potential high-utility itemset mining (PHUIM) in uncertain databases, is proposed to efficiently discover not only the itemsets with high utilities but also the itemsets with high existence probabilities in an uncertain database based on the tuple uncertainty model. The PHUI-UP algorithm (potential high-utility itemsets upper-bound-based mining algorithm) is first presented to mine potential high-utility itemsets (PHUIs) using a level-wise search. Since PHUI-UP adopts a generate-and-test approach to mine PHUIs, it suffers from the problem of repeatedly scanning the database. To address this issue, a second algorithm named PHUI-List (potential high-utility itemsets PU-list-based mining algorithm) is also proposed. This latter directly mines PHUIs without generating candidates, thanks to a novel probability-utility-list (PU-list) structure, thus greatly improving the scalability of PHUI mining. Substantial experiments were conducted on both real-life and synthetic datasets to assess the performance of the two designed algorithms in terms of runtime, number of patterns, memory consumption, and scalability.

References

[1]

Frequent itemset mining dataset repository. Available: http://fimi.ua.ac.be/data/ (2012).

[2]

C.C. Aggarwal, Managing and mining uncertain data, Manag. Min. Uncertain Data (2010).

[3]

C.C. Aggarwal, Y. Li, J. Wang, J. Wang, Frequent pattern mining with uncertain data, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009, pp. 29-38.

Digital Library

[4]

C.C. Aggarwal, P.S. Yu, A survey of uncertain data algorithms and applications, IEEE Trans. Knowl. Data Eng., 21 (2009) 609-623.

Digital Library

[5]

R. Agrawal, T. Imielinski, A. Swami, Database mining: a performance perspective, IEEE Trans. Knowl. Data Eng., 5 (1993) 914-925.

Digital Library

[6]

R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large database, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, 1993, pp. 207-216.

Digital Library

[7]

R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases, in: Proceedings of the International Conference on Very Large Data Bases, 1994, pp. 487-499.

[8]

R. Agrawal, R. Srikant. Quest synthetic data generator. Available: http://www.Almaden.ibm.com/cs/quest/syndata.html (1994).

[9]

C.F. Ahmed, S.K. Tanbeer, B.S. Jeong, Y.K. Le, Efficient tree structures for high utility pattern mining in incremental databases, IEEE Trans. Knowl. Data Eng., 21 (2009) 1708-1721.

Digital Library

[10]

T. Bernecker, H.P. Kriegel, M. Renz, F. Verhein, A. Zuefl, Probabilistic frequent itemset mining in uncertain databases, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009, pp. 119-128.

Digital Library

[11]

R. Chan, Q. Yang, Y.D. Shen, Mining high utility itemsets, in: Proceedings of the IEEE International Conference on Data Mining, 2003, pp. 19-26.

[12]

M.S. Chen, J. Han, P.S. Yu, Data mining: an overview from a database perspective, IEEE Trans. Knowl. Data Eng., 8 (1996) 866-883.

Digital Library

[13]

C.K. Chui, B. Kao, E. Hung, Mining frequent itemsets from uncertain data, in: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2007, pp. 47-58.

[14]

P. Fournier-Viger, C.W. Wu, S. Zida, V.S. Tseng, FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning, in: Proceedings of the International Symposium on Methodologies for Intelligent Systems, 8502, 2014, pp. 83-92.

[15]

G.C. Lan, T.P. Hong, V.S. Tseng, Discovery of high utility itemsets from on-shelf time periods of products, Expert Syst. Appl., 38 (2011) 5851-5857.

Digital Library

[16]

L. Geng, H.J. Hamilton, Interestingness measures for data mining: a survey, ACM Comput. Surv., 38 (2006) 1-32.

Digital Library

[17]

J. Han, J. Pei, Y. Yin, R. Mao, Mining frequent patterns without candidate generation: a frequent-pattern tree approach, Data Mining Knowl. Discov., 8 (2004) 53-87.

[18]

J.C.W. Lin, W. Gan, P. Fournier-Viger, T.P. Hong, Mining high-utility itemsets with multiple minimum utility thresholds, in: Proceedings of the ACM International Canadian Conference on Computer Science & Software Engineering, 2015, pp. 9-17.

[19]

J.C.W. Lin, W. Gan, T.P. Hong, B. Zhang, An incremental high-utility mining algorithm with transaction insertion, Sci. World J. (2015).

[20]

J.C.W. Lin, W. Gan, T.P. Hong, V.S. Tseng, Efficient algorithms for mining up-to-date high-utility patterns, Adv. Eng. Inf., 29 (2015) 648-661.

Digital Library

[21]

C.K.S. Leung, B. Hao, Mining of frequent itemsets from streams of uncertain data, IEEE Int. Conf. Data Eng. (2009) 1663-1670.

Digital Library

[22]

C.K.S. Leung, M.A.F. Mateo, D.A. Brajczuk, A tree-based approach for frequent pattern mining from uncertain data, in: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2008, pp. 653-661.

Digital Library

[23]

C.W. Lin, T.P. Hong, A new mining approach for uncertain databases using cufp trees, Expert Syst. Appl., 39 (2012) 4084-4093.

Digital Library

[24]

C.W. Lin, T.P. Hong, W.H. Lu, An effective tree structure for mining high utility itemsets, Expert Syst. Appl., 38 (2011) 7419-7424.

Digital Library

[25]

C. Liu, L. Chen, C. Zhang, Summarizing probabilistic frequent patterns: a fast approach, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013, pp. 527-535.

Digital Library

[26]

M. Liu, J. Qu, Mining high utility itemsets without candidate generation, in: Proceedings of the ACM International Conference on Information and Knowledge Management, 2012, pp. 55-64.

[27]

Y. Liu, W.K. Liao, A. Choudhary, A two-phase algorithm for fast discovery of high utility itemsets, in: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2005, pp. 689-695.

Digital Library

[28]

Microsoft. Example database foodmart of microsoft analysis services. Available: http://msdn.microsoft.com/en-us/library/aa217032(SQL.80).aspx.

[29]

D. Nilesh, S. Dan, Efficient query evaluation on probabilistic databases, VLDB J., 16 (2007) 523-544.

Digital Library

[30]

P. Fournier-Viger, A. Gomariz, T. Gueniche, A. Soltani, C.W. Wu, V.S. Tseng, SPMF: a java open-source pattern mining library, J. Mach. Learn. Res., 15 (2014) 3389-3393.

Digital Library

[31]

L. Sun, R. Cheng, D.W. Cheung, J. Cheng, Mining uncertain data with probabilistic guarantees, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010, pp. 273-282.

[32]

Y. Tong, L. Chen, Y. Cheng, P.S. Yu, Mining frequent itemsets over uncertain databases, VLDB Endow., 5 (2012) 1650-1661.

Digital Library

[33]

V.S. Tseng, B.E. Shie, C.W. Wu, P.S. Yu, Efficient algorithms for mining high utility itemsets from transactional databases, IEEE Trans. Knowl. Data Eng., 25 (2013) 1772-1786.

Digital Library

[34]

V.S. Tseng, C.W. Wu, B.E. Shie, P.S. Yu, UP-growth: An efficient algorithm for high utility itemset mining, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010, pp. 253-262.

[35]

L. Wang, R. Cheng, S.D. Lee, D. Cheung, Accelerating probabilistic frequent itemset mining: A model-based approach, in: Proceedings of the ACM International Conference on Information and Knowledge Managemen, 2010, pp. 429-438.

[36]

L. Wang, D.L. Cheung, R. Cheng, S.D. Lee, X.S. Yang, Efficient mining of frequent item sets on large uncertain databases, IEEE Trans. Knowl. Data Eng., 24 (2012) 2170-2183.

Digital Library

[37]

C.W. Wu, B.E. Shie, V.S. Tseng, P.S. Yu, Mining top-k high utility itemsets, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012, pp. 78-86.

[38]

H. Yao, H.J. Hamilton, Mining itemset utilities from transaction databases, Data Knowl. Eng., 59 (2006) 603-626.

Digital Library

[39]

H. Yao, H.J. Hamilton, C.J. Butz, A foundational approach to mining itemset utilities from databases, in: Proceedings of the SIAM International Conference on Data Mining, 2004, pp. 211-225.

[40]

M. Zihayat, A. An, Mining top-k high utility patterns over data streams, Inf. Sci., 285 (2014) 138-161.

Digital Library

Cited By

Chen ZGan WHuang GZheng YYu P(2024)Towards utility-driven contiguous sequential patterns in uncertain multi-sequencesKnowledge-Based Systems10.1016/j.knosys.2023.111314284:COnline publication date: 25-Jan-2024
https://dl.acm.org/doi/10.1016/j.knosys.2023.111314
Gui YGan WWu YYu P(2024)Privacy preserving rare itemset miningInformation Sciences: an International Journal10.1016/j.ins.2024.120262662:COnline publication date: 1-Mar-2024
https://dl.acm.org/doi/10.1016/j.ins.2024.120262
Huang SGan WMiao JHan XFournier-Viger P(2024)Targeted mining of top-k high utility itemsetsEngineering Applications of Artificial Intelligence10.1016/j.engappai.2023.107047126:PDOnline publication date: 27-Feb-2024
https://dl.acm.org/doi/10.1016/j.engappai.2023.107047
Show More Cited By

Efficient algorithms for mining high-utility itemsets in uncertain databases
1. Information systems
  1. Information systems applications

Recommendations

Mining closed high utility itemsets in uncertain databases
SoICT '16: Proceedings of the 7th Symposium on Information and Communication Technology

In order to reduce the number of high-utility itemsets (HUIs), closed high-utility itemsets (CHUIs) have been proposed. However, most techniques for mining CHUIs require certain databases; i.e., there are no probabilities. However, in many real-world ...
Mining Potential High-Utility Itemsets over Uncertain Databases
ASE BD&SI '15: Proceedings of the ASE BigData & SocialInformatics 2015

Traditional high-utility itemsets mining (HUIM) incorporates the concept of utility (e.g., profit) over certain databases. However, an item or itemset is not only present or absent in the transactions but also associated with an existing probability ...
Efficiently mining uncertain high-utility itemsets

Data mining consists of deriving implicit, potentially meaningful and useful knowledge from databases such as information about the most profitable items. High-utility itemset mining (HUIM) has thus emerged as an important research topic in data mining. ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Knowledge-Based Systems

Knowledge-Based Systems Volume 96, Issue C

March 2016

188 pages

ISSN:0950-7051

Issue’s Table of Contents

Publisher

Elsevier Science Publishers B. V.

Netherlands

Publication History

Published: 15 March 2016

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

42
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 14 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Chen ZGan WHuang GZheng YYu P(2024)Towards utility-driven contiguous sequential patterns in uncertain multi-sequencesKnowledge-Based Systems10.1016/j.knosys.2023.111314284:COnline publication date: 25-Jan-2024
https://dl.acm.org/doi/10.1016/j.knosys.2023.111314
Gui YGan WWu YYu P(2024)Privacy preserving rare itemset miningInformation Sciences: an International Journal10.1016/j.ins.2024.120262662:COnline publication date: 1-Mar-2024
https://dl.acm.org/doi/10.1016/j.ins.2024.120262
Huang SGan WMiao JHan XFournier-Viger P(2024)Targeted mining of top-k high utility itemsetsEngineering Applications of Artificial Intelligence10.1016/j.engappai.2023.107047126:PDOnline publication date: 27-Feb-2024
https://dl.acm.org/doi/10.1016/j.engappai.2023.107047
Zhang CYang YDu ZGan WYu P(2023)HUSP-SP: Faster Utility Mining on Sequence DataACM Transactions on Knowledge Discovery from Data10.1145/359793518:1(1-21)Online publication date: 10-Aug-2023
https://dl.acm.org/doi/10.1145/3597935
Zhou TQi ZGan WWan SChen G(2023)Weighted Statistically Significant Pattern MiningCompanion Proceedings of the ACM Web Conference 202310.1145/3543873.3587586(1276-1285)Online publication date: 30-Apr-2023
https://dl.acm.org/doi/10.1145/3543873.3587586
Lin QGan WWu YChen JChen C(2023)Smart System: Joint Utility and Frequency for Pattern ClassificationACM Transactions on Management Information Systems10.1145/353148013:4(1-24)Online publication date: 17-Jan-2023
https://dl.acm.org/doi/10.1145/3531480
Gan WChen GYin HFournier-Viger PChen CYu P(2022)Towards Revenue Maximization with Popular and Profitable ProductsACM/IMS Transactions on Data Science10.1145/34880582:4(1-21)Online publication date: 24-May-2022
https://dl.acm.org/doi/10.1145/3488058
Gui YGan WChen YWu Y(2022)Mining with Rarity for Web IntelligenceCompanion Proceedings of the Web Conference 202210.1145/3487553.3524708(973-981)Online publication date: 25-Apr-2022
https://dl.acm.org/doi/10.1145/3487553.3524708
Wan SChen JQi ZGan WTang L(2022)Fast RFM Model for Customer SegmentationCompanion Proceedings of the Web Conference 202210.1145/3487553.3524707(965-972)Online publication date: 25-Apr-2022
https://dl.acm.org/doi/10.1145/3487553.3524707
Nguyen HVo DBui HLe TVo B(2022)An efficient approach for mining weighted uncertain interesting patternsInformation Sciences: an International Journal10.1016/j.ins.2022.10.009615:C(1-23)Online publication date: 1-Nov-2022
https://dl.acm.org/doi/10.1016/j.ins.2022.10.009
Show More Cited By

View Options

View options

Media

Figures

Other

Tables

View Issue’s Table of Contents