Parallel Implementation of Chi2 Algorithm in MapReduce Framework

  • Conference paper
Human Centered Computing (HCC 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8944)

Abstract

The discretization of continuous attributes is an important pre-processing step for machine learning and data mining. Discretizing the continuous attributes of massive data sets efficiently has become an urgent problem. Hadoop, which has risen to prominence in recent years, can efficiently run many applications over massive data. This paper designs and implements a parallel Chi2-based discretization algorithm on the MapReduce model. To assess discretization efficiency, experiments were run on data sets of different sizes and on different numbers of nodes. The experimental results show that the proposed algorithm is efficient and scales well when discretizing the continuous attributes of massive data.
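The preview does not show how the paper splits the work between map and reduce, so the following is only a minimal sketch of one plausible decomposition, run in memory rather than on Hadoop. It assumes the map step emits one count per (attribute, value, class) triple, the reduce step sums those counts, and the chi-square statistic for two adjacent intervals is then computed from the per-class counts. The function names `map_record`, `reduce_counts`, and `chi2_adjacent` are hypothetical, and skipping cells whose expected count is zero is a simplification made here, not something taken from the paper.

```python
from collections import defaultdict


def map_record(record):
    """Map step (assumed design): emit ((attribute, value, label), 1) per attribute."""
    *values, label = record
    for attr, value in enumerate(values):
        yield (attr, value, label), 1


def reduce_counts(pairs):
    """Reduce step (assumed design): sum the counts for each emitted key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def chi2_adjacent(row_a, row_b):
    """Chi-square statistic for two adjacent intervals.

    row_a and row_b map class label -> number of records in that interval.
    Cells with an expected count of zero are skipped (a simplification).
    """
    classes = set(row_a) | set(row_b)
    n = sum(row_a.values()) + sum(row_b.values())
    chi2 = 0.0
    for row in (row_a, row_b):
        row_total = sum(row.values())
        for c in classes:
            col_total = row_a.get(c, 0) + row_b.get(c, 0)
            expected = row_total * col_total / n
            if expected > 0:
                chi2 += (row.get(c, 0) - expected) ** 2 / expected
    return chi2


# Toy data: one continuous attribute followed by a class label.
records = [(1.0, "a"), (1.0, "a"), (2.0, "b"), (2.0, "b")]
counts = reduce_counts(pair for rec in records for pair in map_record(rec))

# Per-class counts for the two initial intervals of attribute 0.
interval_1 = {"a": counts[(0, 1.0, "a")]}
interval_2 = {"b": counts[(0, 2.0, "b")]}
print(chi2_adjacent(interval_1, interval_2))
```

A low statistic suggests the two intervals have similar class distributions and are candidates for merging, while a high one suggests the boundary between them should be kept.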

Author information

Corresponding author

Correspondence to Yong Zhang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhang, Y., Yu, J., Wang, J. (2015). Parallel Implementation of Chi2 Algorithm in MapReduce Framework. In: Zu, Q., Hu, B., Gu, N., Seng, S. (eds) Human Centered Computing. HCC 2014. Lecture Notes in Computer Science, vol 8944. Springer, Cham. https://doi.org/10.1007/978-3-319-15554-8_83

  • DOI: https://doi.org/10.1007/978-3-319-15554-8_83

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-15553-1

  • Online ISBN: 978-3-319-15554-8

  • eBook Packages: Computer Science, Computer Science (R0)
