UBDM: Utility-Based Potential Pattern Mining over Uncertain Data Using Spark Framework

  • Conference paper
Emerging Technologies in Computer Engineering: Cognitive Computing and Intelligent IoT (ICETCE 2022)

Abstract

In practical scenarios, an entity's presence may be described by an existence probability rather than a binary present-or-absent state. This is particularly relevant for data collected in experimental settings or with imperfect instruments, devices, and methods. High-utility pattern mining (HUPM) is a family of approaches for detecting patterns in transaction records that take into account both item quantity and profitability. HUPM algorithms, however, can only handle precise data, even though the large volumes of data obtained in real-world applications through experimental observations or sensors are frequently uncertain. Potential high-utility pattern mining (PHUPM) was developed to uncover interesting patterns in inherently uncertain collections. This paper proposes a Spark-based solution for mining potential interesting patterns from large amounts of uncertain data. The proposed technique discovers patterns efficiently using the probability-utility-list structure. A key goal is to reduce execution time by increasing the parallelization and distribution of all workloads. Extensive experiments on both real and synthetic databases show that the proposed method performs well in a Spark framework with large data collections.
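To make the idea of a potential high-utility pattern concrete, the following is a minimal sketch in plain Python, not the paper's Spark implementation and not its probability-utility-list structure. It assumes a tuple-uncertainty model in which each transaction carries a single existence probability, and it treats an itemset as a potential high-utility pattern when both its summed utility and its summed transaction probability reach user-chosen thresholds. The data, the function names, and the brute-force enumeration are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy data: (transaction probability, {item: purchased quantity}).
TRANSACTIONS = [
    (0.9, {"a": 2, "b": 1}),
    (0.6, {"a": 1, "c": 3}),
    (0.4, {"b": 2, "c": 1}),
]
PROFIT = {"a": 5, "b": 3, "c": 1}  # hypothetical unit profit per item


def utility_and_probability(itemset, transactions, profit):
    """Sum utility and probability over the transactions that contain itemset."""
    total_utility = 0
    total_probability = 0.0
    for prob, items in transactions:
        if all(item in items for item in itemset):
            total_utility += sum(items[item] * profit[item] for item in itemset)
            total_probability += prob
    return total_utility, total_probability


def potential_high_utility_patterns(transactions, profit, min_utility, min_probability):
    """Brute-force search over all itemsets (exponential; for illustration only)."""
    all_items = sorted({item for _, items in transactions for item in items})
    patterns = {}
    for size in range(1, len(all_items) + 1):
        for itemset in combinations(all_items, size):
            utility, probability = utility_and_probability(itemset, transactions, profit)
            if utility >= min_utility and probability >= min_probability:
                patterns[itemset] = (utility, probability)
    return patterns


if __name__ == "__main__":
    found = potential_high_utility_patterns(
        TRANSACTIONS, PROFIT, min_utility=9, min_probability=0.9
    )
    for itemset, (utility, probability) in found.items():
        print(itemset, utility, round(probability, 2))
```

A list-based or distributed miner would avoid this exhaustive enumeration by pruning candidates early and by partitioning the search space across workers; the sketch only shows which quantities the two thresholds are applied to.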

Notes

  1. http://www.philippe-fournier-viger.com/spmf/

Author information

Corresponding author

Correspondence to Sunil Kumar.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kumar, S., Mohbey, K.K. (2022). UBDM: Utility-Based Potential Pattern Mining over Uncertain Data Using Spark Framework. In: Balas, V.E., Sinha, G.R., Agarwal, B., Sharma, T.K., Dadheech, P., Mahrishi, M. (eds) Emerging Technologies in Computer Engineering: Cognitive Computing and Intelligent IoT. ICETCE 2022. Communications in Computer and Information Science, vol 1591. Springer, Cham. https://doi.org/10.1007/978-3-031-07012-9_52

  • DOI: https://doi.org/10.1007/978-3-031-07012-9_52

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-07011-2

  • Online ISBN: 978-3-031-07012-9

  • eBook Packages: Computer Science, Computer Science (R0)
