Hadoop in Action teaches readers how to use Hadoop and write MapReduce programs. The intended readers are programmers, architects, and project managers who have to process large amounts of data offline. The book leads the reader from obtaining a copy of Hadoop to setting it up in a cluster and writing data analytic programs. It begins by making the basic ideas of Hadoop and MapReduce easier to grasp, applying the default Hadoop installation to a few easy-to-follow tasks such as analyzing changes in word frequency across a body of documents. It continues through the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components, the use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action. It also presents design patterns and practices for programming MapReduce. MapReduce is a complex idea both conceptually and in its implementation, and Hadoop users are challenged to learn all the knobs and levers for running Hadoop. This book takes you beyond the mechanics of running Hadoop, teaching you to write meaningful programs in a MapReduce framework. It assumes a basic familiarity with Java, as most code examples are written in Java. Familiarity with basic statistical concepts (e.g., histogram, correlation) will help the reader appreciate the more advanced data processing examples.
Cited By
- Sagharichian M and Alipour Langouri M (2023). iPartition: a distributed partitioning algorithm for block-centric graph processing systems, The Journal of Supercomputing, 79:18, (21116-21143), Online publication date: 1-Dec-2023.
- Fan W, He T, Lai L, Li X, Li Y, Li Z, Qian Z, Tian C, Wang L, Xu J, Yao Y, Yin Q, Yu W, Zhou J, Zhu D and Zhu R (2021). GraphScope, Proceedings of the VLDB Endowment, 14:12, (2879-2892), Online publication date: 1-Jul-2021.
- Su W, Aurora A, Chen M and Zadok E Supporting Transactions for Bulk NFSv4 Compounds Proceedings of the 13th ACM International Systems and Storage Conference, (75-86)
- Yang C, Chen S, Liu J, Liu R and Chang C (2019). On construction of an energy monitoring service using big data technology for the smart campus, Cluster Computing, 23:1, (265-288), Online publication date: 1-Mar-2020.
- Saoudi E, Adiui El Ouadrhiri A, El Warrak O, Jai Andaloussi S and Sekkaki A Improving Content Based Video Retrieval Performance by Using Hadoop-MapReduce Model Proceedings of the 23rd Conference of Open Innovations Association FRUCT, (329-334)
- Siddiqa A, Karim A and Chang V (2018). Modeling SmallClient indexing framework for big data analytics, The Journal of Supercomputing, 74:10, (5241-5262), Online publication date: 1-Oct-2018.
- Íñiguez L, Galar M and Fernández A Improving Fuzzy Rule Based Classification Systems in Big Data via Support-based Filtering 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), (1-8)
- Ragmani A, Omri A, Abghour N, Moussaid K and Rida M An efficient load balancing strategy based on mapreduce for public cloud Proceedings of the Second International Conference on Internet of things, Data and Cloud Computing, (1-10)
- Wang Y, Li C, Li M and Liu Z (2017). HBase storage schemas for massive spatial vector data, Cluster Computing, 20:4, (3657-3666), Online publication date: 1-Dec-2017.
- Muhammed T and Shaikh R (2017). An analysis of fault detection strategies in wireless sensor networks, Journal of Network and Computer Applications, 78:C, (267-287), Online publication date: 15-Jan-2017.
- Bansal N, Singh R and Sharma A (2017). An Insight into State-of-the-Art Techniques for Big Data Classification, International Journal of Information System Modeling and Design, 8:3, (24-42), Online publication date: 1-Jul-2017.
- Padillo F, Luna J and Ventura S An evolutionary algorithm for mining rare association rules: A Big Data approach 2017 IEEE Congress on Evolutionary Computation (CEC), (2007-2014)
- Rodríguez Fernández M, González Alonso I and Zalama Casanova E (2016). Online identification of appliances from power consumption data collected by smart meters, Pattern Analysis & Applications, 19:2, (463-473), Online publication date: 1-May-2016.
- Wang J (2016). Extracting significant pattern histories from timestamped texts using MapReduce, The Journal of Supercomputing, 72:8, (3236-3260), Online publication date: 1-Aug-2016.
- Zhang D, Chen X, Yao H and James A (2016). Moving SWAT model calibration and uncertainty analysis to an enterprise Hadoop-based cloud, Environmental Modelling & Software, 84:C, (140-148), Online publication date: 1-Oct-2016.
- Kumar M, Rath N and Rath S (2016). Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier, Journal of Biomedical Informatics, 60:C, (395-409), Online publication date: 1-Apr-2016.
- Drozd A, Gladkova A and Matsuoka S Python, performance, and natural language processing Proceedings of the 5th Workshop on Python for High-Performance and Scientific Computing, (1-10)
- Jitkajornwanich K and Elmasri R Conceptual Analysis of Big Data Using Ontologies and EER Revised Selected Papers of the First International Workshop on Machine Learning, Optimization, and Big Data - Volume 9432, (306-317)
- Phan T, D'Orazio L and Rigaux P A Theoretical and Experimental Comparison of Filter-Based Equijoins in MapReduce Transactions on Large-Scale Data- and Knowledge-Centered Systems XXV - Volume 9620, (33-70)
- López V, del Río S, Benítez J and Herrera F (2015). Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data, Fuzzy Sets and Systems, 258:C, (5-38), Online publication date: 1-Jan-2015.
- Mohammadi M, Raahemi B, Cheraghchi F, Obidallah W and Bigdeli E Big data analytics using hadoop Proceedings of 24th Annual International Conference on Computer Science and Software Engineering, (323-325)
- Lopez-Veyna J, Sosa-Sosa V and Lopez-Arevalo I (2014). A low redundancy strategy for keyword search in structured and semi-structured data, Information Sciences: an International Journal, 288:C, (135-152), Online publication date: 20-Dec-2014.
- Phan T, d'Orazio L and Rigaux P Toward intersection filter-based optimization for joins in MapReduce Proceedings of the 2nd International Workshop on Cloud Intelligence, (1-2)
- Zheng R, Liu K, Jin H, Zhang Q and Feng X Accelerate MapReduce on GPUs with multi-level reduction Proceedings of the 5th Asia-Pacific Symposium on Internetware, (1-8)
- Lin H, Liu X, Fu W and Jia K A Study on Linear Elastic FEM by Cloud Computing Proceedings of the Second International Conference on Innovative Computing and Cloud Computing, (136-142)
- Akhmed-Zaki D, Danaev N, Matkerim B and Bektemessov A Design of Distributed Parallel Computing Using by MapReduce/MPI Technology Proceedings of the 12th International Conference on Parallel Computing Technologies - Volume 7979, (139-148)
- Lopez-Veyna J, Sosa-Sosa V and Lopez-Arevalo I KESOSD Proceedings of the Third International Workshop on Keyword Search on Structured Data, (23-31)
- Witayangkurn A, Horanont T and Shibasaki R Performance comparisons of spatial data processing techniques for a large scale mobile phone dataset Proceedings of the 3rd International Conference on Computing for Geospatial Research and Applications, (1-6)
- Wang S, Chen X, Huang J and Feng S Scalable subspace logistic regression models for high dimensional data Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications, (685-694)
- Li B, Chen X, Li M, Huang J and Feng S Scalable random forests for massive data Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I, (135-146)
- Yuan G, Zheng L, Chong-Jun W, Xin-Sheng F and Jun-Yuan X Chinese medicine formula network analysis for core herbal discovery Proceedings of the 2012 international conference on Brain Informatics, (255-264)
- Stuart J, Chen C, Ma K and Owens J Multi-GPU volume rendering using MapReduce Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, (841-848)