UBDM: Utility-Based Potential Pattern Mining over Uncertain Data Using Spark Framework

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1591)

Included in the following conference series:

International Conference on Emerging Technologies in Computer Engineering

Abstract

In practical scenarios, an entity's presence is often described by an existence probability rather than a binary present-or-absent state. This is especially true of data collected in experimental settings or with imperfect instruments, devices, and methods. High-utility pattern mining (HUPM) is a family of approaches for detecting patterns in transaction records that take into account both item quantity and profitability. However, HUPM algorithms can only handle precise data, even though large volumes of data obtained in real-world applications through experimental observations or sensors are frequently uncertain. Potential high-utility pattern mining (PHUPM) was developed to uncover interesting patterns in such inherently uncertain collections. This paper proposes a Spark-based solution for mining potential interesting patterns from large amounts of uncertain data. The proposed technique discovers patterns using the probability-utility-list structure. A key priority is to reduce execution time by increasing the parallelization and distribution of all workloads. Extensive experiments on both real and synthetic databases show that the proposed method performs well in a Spark framework on large data collections.
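To make the terms in the abstract concrete, the following is a minimal single-machine sketch of how a potential high-utility itemset might be checked. It assumes the common tuple-uncertainty formulation, in which each transaction carries one existence probability, an itemset's utility is summed over the transactions that contain it, and its potential probability is the sum of those transactions' probabilities. The toy database, the profit table, the threshold values, and the function names are all hypothetical. The list built here is a simplified stand-in for a probability-utility list: it keeps only a transaction id, a probability, and a utility per entry, and it does not reproduce the paper's Spark implementation or any pruning fields.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: each transaction is (item -> purchased quantity,
# existence probability of the whole transaction).
Transaction = Tuple[Dict[str, int], float]

DB: List[Transaction] = [
    ({"a": 2, "b": 1}, 0.9),
    ({"a": 1, "c": 3}, 0.6),
    ({"b": 2, "c": 1}, 0.4),
    ({"a": 3, "b": 2}, 0.8),
]
PROFIT: Dict[str, int] = {"a": 5, "b": 2, "c": 1}  # assumed unit profits


def probability_utility_list(
    itemset: Set[str], db: List[Transaction], profit: Dict[str, int]
) -> List[Tuple[int, float, int]]:
    """Return (tid, probability, utility) for every transaction containing itemset."""
    entries = []
    for tid, (items, prob) in enumerate(db):
        if all(item in items for item in itemset):
            utility = sum(items[item] * profit[item] for item in itemset)
            entries.append((tid, prob, utility))
    return entries


def is_potential_high_utility(
    itemset: Set[str],
    db: List[Transaction],
    profit: Dict[str, int],
    min_util_ratio: float,
    min_prob_ratio: float,
) -> bool:
    """Check both thresholds: utility against total database utility,
    summed probability against the number of transactions."""
    entries = probability_utility_list(itemset, db, profit)
    total_utility = sum(
        qty * profit[item] for items, _ in db for item, qty in items.items()
    )
    utility = sum(u for _, _, u in entries)
    probability = sum(p for _, p, _ in entries)
    return (
        utility >= min_util_ratio * total_utility
        and probability >= min_prob_ratio * len(db)
    )


if __name__ == "__main__":
    print(probability_utility_list({"a", "b"}, DB, PROFIT))
    print(is_potential_high_utility({"a", "b"}, DB, PROFIT, 0.5, 0.4))
    print(is_potential_high_utility({"c"}, DB, PROFIT, 0.5, 0.4))
```

In a distributed setting, the per-transaction work in `probability_utility_list` is the part that would naturally be spread across partitions, with the sums combined afterwards. How the paper actually partitions the workload is not stated in the abstract.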

Notes

1. http://www.philippe-fournier-viger.com/spmf/

Author information

Authors and Affiliations

Department of Computer Science, Central University of Rajasthan, Ajmer, India
Sunil Kumar & Krishna Kumar Mohbey

Authors

Sunil Kumar
Krishna Kumar Mohbey

Corresponding author

Correspondence to Sunil Kumar.

Editor information

Editors and Affiliations

Aurel Vlaicu University of Arad, Arad, Romania
Valentina E. Balas
Myanmar Institute of Information Technology, Mandalay, Myanmar
G. R. Sinha
Indian Institute of Information Technology Kota, Jaipur, India
Basant Agarwal
Shobhit University, Gangoh, India
Tarun Kumar Sharma
SKIT, Jaipur, India
Pankaj Dadheech
SKIT, Jaipur, India
Mehul Mahrishi

About this paper

Cite this paper

Kumar, S., Mohbey, K.K. (2022). UBDM: Utility-Based Potential Pattern Mining over Uncertain Data Using Spark Framework. In: Balas, V.E., Sinha, G.R., Agarwal, B., Sharma, T.K., Dadheech, P., Mahrishi, M. (eds) Emerging Technologies in Computer Engineering: Cognitive Computing and Intelligent IoT. ICETCE 2022. Communications in Computer and Information Science, vol 1591. Springer, Cham. https://doi.org/10.1007/978-3-031-07012-9_52

DOI: https://doi.org/10.1007/978-3-031-07012-9_52
Published: 26 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-07011-2
Online ISBN: 978-3-031-07012-9
eBook Packages: Computer Science, Computer Science (R0)
