An Efficient MapReduce-Based Apriori-Like Algorithm for Mining Frequent Itemsets from Big Data

Ching-Ming Chao, Po-Zung Chen, Shih-Yang Yang & Cheng-Hung Yen

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 264)

Included in the following conference series:

International Wireless Internet Conference

Abstract

Data mining discovers valuable information in large amounts of data, which individuals and organizations can use to become more competitive. Apriori is a classic algorithm for mining frequent itemsets. Recently, with the rapid growth of the Internet and the fast development of information and communications technology, the amount of data has been growing explosively, at a rate of tens of petabytes per day. These rapidly expanding data are characterized by huge volume, high speed, continuous arrival, real-time demands, and unpredictability. Traditional data mining algorithms are not applicable to such data. Therefore, big data mining has become an important research issue.

Cloud computing is a key technique for big data. In this paper, we study the issue of applying cloud computing to mining frequent itemsets from big data. We propose a MapReduce-based Apriori-like frequent itemset mining algorithm called Apriori-MapReduce (abbreviated as AMR). The salient feature of AMR is that it deletes from the transaction database the items of itemsets whose support is lower than the minimum support. In this way, it greatly reduces the generation of candidate itemsets, avoiding memory shortage and I/O and CPU overload, so that better mining efficiency can be achieved. Empirical studies show that the processing efficiency of the AMR algorithm is superior to that of another efficient MapReduce-based Apriori algorithm under various minimum supports and numbers of transactions.
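The pruning idea described above can be illustrated with a small single-process sketch. It is not the authors' AMR implementation, whose details the abstract does not give. It assumes one plausible reading of the pruning step: after each pass, any item that appears in no frequent k-itemset is removed from every transaction. The function names (`map_phase`, `reduce_phase`, `prune`, `mine`) are hypothetical, and the map and reduce steps are simulated in plain Python rather than run on a cluster.

```python
from collections import Counter
from itertools import combinations


def map_phase(transactions, k):
    """Map step: emit (itemset, 1) for every k-item subset of each transaction."""
    for items in transactions:
        for subset in combinations(sorted(items), k):
            yield subset, 1


def reduce_phase(pairs, min_support):
    """Reduce step: sum counts per itemset, keep those meeting min_support."""
    counts = Counter()
    for itemset, one in pairs:
        counts[itemset] += one
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def prune(transactions, frequent):
    """Assumed pruning rule: drop items that occur in no frequent itemset,
    then drop transactions that became empty."""
    keep = {item for itemset in frequent for item in itemset}
    pruned = [items & keep for items in transactions]
    return [items for items in pruned if items]


def mine(transactions, min_support):
    """Run level-wise passes (k = 1, 2, ...) until no frequent k-itemset remains."""
    db = [frozenset(t) for t in transactions]
    result, k = {}, 1
    while db:
        frequent = reduce_phase(map_phase(db, k), min_support)
        if not frequent:
            break
        result.update(frequent)
        db = prune(db, frequent)  # smaller transactions yield fewer candidates
        k += 1
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
    for itemset, support in sorted(mine(data, min_support=2).items()):
        print(itemset, support)
```

Pruning is safe under this reading because every item of a frequent (k+1)-itemset must also belong to some frequent k-itemset, so a removed item can never contribute to a later frequent itemset. In a real deployment the map and reduce functions would run as distributed tasks; here they are ordinary generators and dictionaries.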

Author information

Authors and Affiliations

Department of Computer Science and Information Management, Soochow University, Taipei, 100, Taiwan
Ching-Ming Chao & Cheng-Hung Yen
Department of Computer Science and Information Engineering, Tamkang University, Taipei, Taiwan
Po-Zung Chen
Department of Media Art and Management of Information System, University of Kang Ning, Taipei, Taiwan
Shih-Yang Yang

Corresponding author

Correspondence to Shih-Yang Yang.

Editor information

Editors and Affiliations

National Taiwan University of Science and Technology (NTUST), Taipei, Taiwan
Jiann-Liang Chen
National Taiwan Normal University, Taipei, Taiwan
Ai-Chun Pang
Department of Industrial Engineering and Management, National Changhua University of Education, Changhua, Taiwan
Der-Jiunn Deng
Department of Industrial Engineering and Management, National Chiao Tung University, Hsinchu, Taiwan
Chun-Cheng Lin

Cite this paper

Chao, CM., Chen, PZ., Yang, SY., Yen, CH. (2019). An Efficient MapReduce-Based Apriori-Like Algorithm for Mining Frequent Itemsets from Big Data. In: Chen, JL., Pang, AC., Deng, DJ., Lin, CC. (eds) Wireless Internet. WICON 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 264. Springer, Cham. https://doi.org/10.1007/978-3-030-06158-6_8

DOI: https://doi.org/10.1007/978-3-030-06158-6_8
Published: 05 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-06157-9
Online ISBN: 978-3-030-06158-6
eBook Packages: Computer Science, Computer Science (R0)
