Abstract
In practical scenarios, an entity's presence may be described by an existence probability rather than by a binary present/absent state. This is particularly relevant for data collected in experimental settings or with imprecise instruments, devices, and faulty methods. High-utility pattern mining (HUPM) is a collection of approaches for detecting patterns in transaction records that take into account both item quantity and profit. However, HUPM algorithms can handle only precise data, whereas much of the large-scale data obtained in real-world applications through experimental observations or sensors is uncertain. Potential high-utility pattern mining (PHUPM) was developed to uncover interesting patterns in such inherently uncertain collections. This paper proposes a Spark-based solution for mining potential high-utility patterns from large amounts of uncertain data. The proposed technique discovers patterns efficiently using the probability-utility-list structure. A key priority is to reduce execution time by increasing the parallelization and distribution of all workloads. Extensive experiments on both real and synthetic databases show that the proposed method performs well in a Spark framework on large data collections.
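The abstract names the ingredients of PHUPM without defining them. The following is a minimal single-machine sketch, assuming the tuple-uncertainty model common in the PHUPM literature: each transaction carries one existence probability, an itemset's utility is the sum of quantity × unit profit of its items over the transactions containing it, and its expected support is the sum of those transactions' probabilities. An itemset is reported when both values meet their thresholds. The names `Entry`, `build_pu_lists`, `join`, `mine`, and `phupm`, the toy database, and the absolute thresholds are hypothetical. This is not the authors' UBDM implementation, and it does not use Spark.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical toy uncertain database: (transaction existence probability, {item: quantity}).
DB = [
    (0.9, {"a": 2, "b": 1, "c": 3}),
    (0.7, {"a": 1, "c": 2}),
    (0.5, {"b": 4, "c": 1}),
    (0.8, {"a": 3, "b": 2}),
]
PROFIT = {"a": 5, "b": 2, "c": 1}  # hypothetical unit profits


@dataclass
class Entry:
    """One row of a probability-utility list (PU-list)."""
    tid: int     # transaction id
    prob: float  # existence probability of the transaction (tuple-uncertainty assumption)
    iu: int      # utility of the itemset in this transaction
    ru: int      # utility of the items that follow the itemset in the processing order


def build_pu_lists(db, profit, order: List[str]) -> Dict[str, List[Entry]]:
    rank = {item: r for r, item in enumerate(order)}
    lists: Dict[str, List[Entry]] = {item: [] for item in order}
    for tid, (prob, items) in enumerate(db):
        # Keep only promising items and sort them by the global processing order.
        present = sorted((i for i in items if i in rank), key=rank.__getitem__)
        utils = [items[i] * profit[i] for i in present]
        remaining = sum(utils)
        for item, u in zip(present, utils):
            remaining -= u
            lists[item].append(Entry(tid, prob, u, remaining))
    return lists


def join(p_list, x_list, y_list) -> List[Entry]:
    """Build the PU-list of P+x+y from the PU-lists of P, P+x and P+y."""
    y_by_tid = {e.tid: e for e in y_list}
    p_by_tid = {e.tid: e for e in p_list}
    joined = []
    for ex in x_list:
        ey = y_by_tid.get(ex.tid)
        if ey is None:
            continue
        # The prefix utility is counted in both ex.iu and ey.iu, so subtract it once.
        shared = p_by_tid[ex.tid].iu if p_by_tid else 0
        joined.append(Entry(ex.tid, ex.prob, ex.iu + ey.iu - shared, ey.ru))
    return joined


def mine(prefix, prefix_list, ext, min_util, min_pro, out):
    for k, (x, x_list) in enumerate(ext):
        itemset = prefix + (x,)
        util = sum(e.iu for e in x_list)
        pro = sum(e.prob for e in x_list)  # expected support
        if util >= min_util and pro >= min_pro:
            out[itemset] = (util, round(pro, 6))
        # Extensions cannot gain probability or exceed the iu + ru upper bound.
        if pro >= min_pro and sum(e.iu + e.ru for e in x_list) >= min_util:
            new_ext = []
            for y, y_list in ext[k + 1:]:
                xy_list = join(prefix_list, x_list, y_list)
                if xy_list:
                    new_ext.append((y, xy_list))
            mine(itemset, x_list, new_ext, min_util, min_pro, out)


def phupm(db, profit, min_util: int, min_pro: float) -> Dict[Tuple[str, ...], Tuple[int, float]]:
    twu: Dict[str, int] = {}
    item_pro: Dict[str, float] = {}
    for prob, items in db:
        tu = sum(q * profit[i] for i, q in items.items())  # transaction utility
        for i in items:
            twu[i] = twu.get(i, 0) + tu
            item_pro[i] = item_pro.get(i, 0.0) + prob
    # TWU upper-bounds the utility of every superset and probability never grows
    # for a superset, so items failing either test can be dropped up front.
    order = sorted((i for i in twu if twu[i] >= min_util and item_pro[i] >= min_pro),
                   key=twu.__getitem__)
    lists = build_pu_lists(db, profit, order)
    out: Dict[Tuple[str, ...], Tuple[int, float]] = {}
    mine((), [], [(i, lists[i]) for i in order], min_util, min_pro, out)
    return out


if __name__ == "__main__":
    print(phupm(DB, PROFIT, min_util=15, min_pro=1.0))
```

With these toy thresholds the sketch reports {a}, {c, a}, and {a, b}; itemsets are written in the processing order (ascending TWU), so {a, c} appears as `("c", "a")`. The itemset {c, a, b} reaches the utility threshold of 15, but its expected support is only 0.9, so it is rejected. Both pruning checks are safe because probability can only shrink as an itemset grows and the sum of iu + ru bounds the utility of every extension. One common way to distribute such a search on Spark is to mine each first-level item's subtree in a separate task; whether UBDM partitions its workload this way is not stated here.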
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kumar, S., Mohbey, K.K. (2022). UBDM: Utility-Based Potential Pattern Mining over Uncertain Data Using Spark Framework. In: Balas, V.E., Sinha, G.R., Agarwal, B., Sharma, T.K., Dadheech, P., Mahrishi, M. (eds) Emerging Technologies in Computer Engineering: Cognitive Computing and Intelligent IoT. ICETCE 2022. Communications in Computer and Information Science, vol 1591. Springer, Cham. https://doi.org/10.1007/978-3-031-07012-9_52
DOI: https://doi.org/10.1007/978-3-031-07012-9_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-07011-2
Online ISBN: 978-3-031-07012-9
eBook Packages: Computer Science, Computer Science (R0)