Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Efficient algorithms for mining high-utility itemsets in uncertain databases

Published: 15 March 2016 Publication History

Abstract

High-utility itemset mining (HUIM) is a useful set of techniques for discovering patterns in transaction databases, which considers both quantity and profit of items. However, most algorithms for mining high-utility itemsets (HUIs) assume that the information stored in databases is precise, i.e., that there is no uncertainty. But in many real-life applications, an item or itemset is not only present or absent in transactions but is also associated with an existence probability. This is especially the case for data collected experimentally or using noisy sensors. In the past, many algorithms were respectively proposed to effectively mine frequent itemsets in uncertain databases. But mining HUIs in an uncertain database has not yet been proposed, although uncertainty is commonly seen in real-world applications. In this paper, a novel framework, named potential high-utility itemset mining (PHUIM) in uncertain databases, is proposed to efficiently discover not only the itemsets with high utilities but also the itemsets with high existence probabilities in an uncertain database based on the tuple uncertainty model. The PHUI-UP algorithm (potential high-utility itemsets upper-bound-based mining algorithm) is first presented to mine potential high-utility itemsets (PHUIs) using a level-wise search. Since PHUI-UP adopts a generate-and-test approach to mine PHUIs, it suffers from the problem of repeatedly scanning the database. To address this issue, a second algorithm named PHUI-List (potential high-utility itemsets PU-list-based mining algorithm) is also proposed. This latter directly mines PHUIs without generating candidates, thanks to a novel probability-utility-list (PU-list) structure, thus greatly improving the scalability of PHUI mining. Substantial experiments were conducted on both real-life and synthetic datasets to assess the performance of the two designed algorithms in terms of runtime, number of patterns, memory consumption, and scalability.

References

[1]
Frequent itemset mining dataset repository. Available: http://fimi.ua.ac.be/data/ (2012).
[2]
C.C. Aggarwal, Managing and mining uncertain data, Manag. Min. Uncertain Data (2010).
[3]
C.C. Aggarwal, Y. Li, J. Wang, J. Wang, Frequent pattern mining with uncertain data, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009, pp. 29-38.
[4]
C.C. Aggarwal, P.S. Yu, A survey of uncertain data algorithms and applications, IEEE Trans. Knowl. Data Eng., 21 (2009) 609-623.
[5]
R. Agrawal, T. Imielinski, A. Swami, Database mining: a performance perspective, IEEE Trans. Knowl. Data Eng., 5 (1993) 914-925.
[6]
R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large database, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, 1993, pp. 207-216.
[7]
R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases, in: Proceedings of the International Conference on Very Large Data Bases, 1994, pp. 487-499.
[8]
R. Agrawal, R. Srikant. Quest synthetic data generator. Available: http://www.Almaden.ibm.com/cs/quest/syndata.html (1994).
[9]
C.F. Ahmed, S.K. Tanbeer, B.S. Jeong, Y.K. Le, Efficient tree structures for high utility pattern mining in incremental databases, IEEE Trans. Knowl. Data Eng., 21 (2009) 1708-1721.
[10]
T. Bernecker, H.P. Kriegel, M. Renz, F. Verhein, A. Zuefl, Probabilistic frequent itemset mining in uncertain databases, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009, pp. 119-128.
[11]
R. Chan, Q. Yang, Y.D. Shen, Mining high utility itemsets, in: Proceedings of the IEEE International Conference on Data Mining, 2003, pp. 19-26.
[12]
M.S. Chen, J. Han, P.S. Yu, Data mining: an overview from a database perspective, IEEE Trans. Knowl. Data Eng., 8 (1996) 866-883.
[13]
C.K. Chui, B. Kao, E. Hung, Mining frequent itemsets from uncertain data, in: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2007, pp. 47-58.
[14]
P. Fournier-Viger, C.W. Wu, S. Zida, V.S. Tseng, FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning, in: Proceedings of the International Symposium on Methodologies for Intelligent Systems, 8502, 2014, pp. 83-92.
[15]
G.C. Lan, T.P. Hong, V.S. Tseng, Discovery of high utility itemsets from on-shelf time periods of products, Expert Syst. Appl., 38 (2011) 5851-5857.
[16]
L. Geng, H.J. Hamilton, Interestingness measures for data mining: a survey, ACM Comput. Surv., 38 (2006) 1-32.
[17]
J. Han, J. Pei, Y. Yin, R. Mao, Mining frequent patterns without candidate generation: a frequent-pattern tree approach, Data Mining Knowl. Discov., 8 (2004) 53-87.
[18]
J.C.W. Lin, W. Gan, P. Fournier-Viger, T.P. Hong, Mining high-utility itemsets with multiple minimum utility thresholds, in: Proceedings of the ACM International Canadian Conference on Computer Science & Software Engineering, 2015, pp. 9-17.
[19]
J.C.W. Lin, W. Gan, T.P. Hong, B. Zhang, An incremental high-utility mining algorithm with transaction insertion, Sci. World J. (2015).
[20]
J.C.W. Lin, W. Gan, T.P. Hong, V.S. Tseng, Efficient algorithms for mining up-to-date high-utility patterns, Adv. Eng. Inf., 29 (2015) 648-661.
[21]
C.K.S. Leung, B. Hao, Mining of frequent itemsets from streams of uncertain data, IEEE Int. Conf. Data Eng. (2009) 1663-1670.
[22]
C.K.S. Leung, M.A.F. Mateo, D.A. Brajczuk, A tree-based approach for frequent pattern mining from uncertain data, in: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2008, pp. 653-661.
[23]
C.W. Lin, T.P. Hong, A new mining approach for uncertain databases using cufp trees, Expert Syst. Appl., 39 (2012) 4084-4093.
[24]
C.W. Lin, T.P. Hong, W.H. Lu, An effective tree structure for mining high utility itemsets, Expert Syst. Appl., 38 (2011) 7419-7424.
[25]
C. Liu, L. Chen, C. Zhang, Summarizing probabilistic frequent patterns: a fast approach, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013, pp. 527-535.
[26]
M. Liu, J. Qu, Mining high utility itemsets without candidate generation, in: Proceedings of the ACM International Conference on Information and Knowledge Management, 2012, pp. 55-64.
[27]
Y. Liu, W.K. Liao, A. Choudhary, A two-phase algorithm for fast discovery of high utility itemsets, in: Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2005, pp. 689-695.
[28]
Microsoft. Example database foodmart of microsoft analysis services. Available: http://msdn.microsoft.com/en-us/library/aa217032(SQL.80).aspx.
[29]
D. Nilesh, S. Dan, Efficient query evaluation on probabilistic databases, VLDB J., 16 (2007) 523-544.
[30]
P. Fournier-Viger, A. Gomariz, T. Gueniche, A. Soltani, C.W. Wu, V.S. Tseng, SPMF: a java open-source pattern mining library, J. Mach. Learn. Res., 15 (2014) 3389-3393.
[31]
L. Sun, R. Cheng, D.W. Cheung, J. Cheng, Mining uncertain data with probabilistic guarantees, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010, pp. 273-282.
[32]
Y. Tong, L. Chen, Y. Cheng, P.S. Yu, Mining frequent itemsets over uncertain databases, VLDB Endow., 5 (2012) 1650-1661.
[33]
V.S. Tseng, B.E. Shie, C.W. Wu, P.S. Yu, Efficient algorithms for mining high utility itemsets from transactional databases, IEEE Trans. Knowl. Data Eng., 25 (2013) 1772-1786.
[34]
V.S. Tseng, C.W. Wu, B.E. Shie, P.S. Yu, UP-growth: An efficient algorithm for high utility itemset mining, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010, pp. 253-262.
[35]
L. Wang, R. Cheng, S.D. Lee, D. Cheung, Accelerating probabilistic frequent itemset mining: A model-based approach, in: Proceedings of the ACM International Conference on Information and Knowledge Managemen, 2010, pp. 429-438.
[36]
L. Wang, D.L. Cheung, R. Cheng, S.D. Lee, X.S. Yang, Efficient mining of frequent item sets on large uncertain databases, IEEE Trans. Knowl. Data Eng., 24 (2012) 2170-2183.
[37]
C.W. Wu, B.E. Shie, V.S. Tseng, P.S. Yu, Mining top-k high utility itemsets, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012, pp. 78-86.
[38]
H. Yao, H.J. Hamilton, Mining itemset utilities from transaction databases, Data Knowl. Eng., 59 (2006) 603-626.
[39]
H. Yao, H.J. Hamilton, C.J. Butz, A foundational approach to mining itemset utilities from databases, in: Proceedings of the SIAM International Conference on Data Mining, 2004, pp. 211-225.
[40]
M. Zihayat, A. An, Mining top-k high utility patterns over data streams, Inf. Sci., 285 (2014) 138-161.

Cited By

View all
  1. Efficient algorithms for mining high-utility itemsets in uncertain databases

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Knowledge-Based Systems
    Knowledge-Based Systems  Volume 96, Issue C
    March 2016
    188 pages

    Publisher

    Elsevier Science Publishers B. V.

    Netherlands

    Publication History

    Published: 15 March 2016

    Author Tags

    1. High-utility itemset
    2. PU-list structure
    3. Probabilistic-based
    4. Uncertain database
    5. Upper-bound

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 14 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Towards utility-driven contiguous sequential patterns in uncertain multi-sequencesKnowledge-Based Systems10.1016/j.knosys.2023.111314284:COnline publication date: 25-Jan-2024
    • (2024)Privacy preserving rare itemset miningInformation Sciences: an International Journal10.1016/j.ins.2024.120262662:COnline publication date: 1-Mar-2024
    • (2024)Targeted mining of top-k high utility itemsetsEngineering Applications of Artificial Intelligence10.1016/j.engappai.2023.107047126:PDOnline publication date: 27-Feb-2024
    • (2023)HUSP-SP: Faster Utility Mining on Sequence DataACM Transactions on Knowledge Discovery from Data10.1145/359793518:1(1-21)Online publication date: 10-Aug-2023
    • (2023)Weighted Statistically Significant Pattern MiningCompanion Proceedings of the ACM Web Conference 202310.1145/3543873.3587586(1276-1285)Online publication date: 30-Apr-2023
    • (2023)Smart System: Joint Utility and Frequency for Pattern ClassificationACM Transactions on Management Information Systems10.1145/353148013:4(1-24)Online publication date: 17-Jan-2023
    • (2022)Towards Revenue Maximization with Popular and Profitable ProductsACM/IMS Transactions on Data Science10.1145/34880582:4(1-21)Online publication date: 24-May-2022
    • (2022)Mining with Rarity for Web IntelligenceCompanion Proceedings of the Web Conference 202210.1145/3487553.3524708(973-981)Online publication date: 25-Apr-2022
    • (2022)Fast RFM Model for Customer SegmentationCompanion Proceedings of the Web Conference 202210.1145/3487553.3524707(965-972)Online publication date: 25-Apr-2022
    • (2022)An efficient approach for mining weighted uncertain interesting patternsInformation Sciences: an International Journal10.1016/j.ins.2022.10.009615:C(1-23)Online publication date: 1-Nov-2022
    • Show More Cited By

    View Options

    View options

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media