Abstract
The computing power of modern high-performance systems cannot be fully exploited using traditional parallel programming models. At the same time, the growing demand for processing large data volumes requires better control of workflows, efficient storage management, and a fault-tolerant runtime system. To address these problems, we designed and developed GPI-Space, a complex but flexible software development and execution platform in which the data coordination of an application is decoupled from the programming of the algorithms. This allows domain users to focus solely on implementing their own problem, while the fault-tolerant runtime framework automatically runs the application in parallel in complex environments. We discuss the advantages and disadvantages of our approach by comparison with the most popular MapReduce implementation, Hadoop. Tests performed on a multicore cluster with the wordcount use case showed that GPI-Space is almost three times faster than Hadoop when only the execution times are considered, and more than six times faster when the data loading time is also included.
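To make the wordcount use case and the separation of coordination from algorithm code concrete, the following is a minimal serial sketch in plain Python. It is not GPI-Space or Hadoop code and uses neither API; the names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical and chosen only for illustration. The domain logic lives in the first two functions, while the driver stands in for the coordination layer that a real runtime would distribute across nodes and make fault tolerant.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Domain code: knows nothing about how the work is scheduled.
def map_words(chunk: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word."""
    return [(word.lower(), 1) for word in chunk.split()]

def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Sum all partial counts collected for one word."""
    return word, sum(counts)

# Coordination code (hypothetical stand-in for a runtime): it only
# moves data between the map and reduce steps and is reusable for
# any mapper/reducer pair with matching signatures.
def run_mapreduce(
    chunks: Iterable[str],
    mapper: Callable[[str], List[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    groups: Dict[str, List[int]] = defaultdict(list)
    for chunk in chunks:                      # map phase
        for key, value in mapper(chunk):
            groups[key].append(value)         # shuffle: group by key
    # reduce phase: one reducer call per distinct key
    return dict(reducer(key, values) for key, values in groups.items())

if __name__ == "__main__":
    text_chunks = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_mapreduce(text_chunks, map_words, reduce_counts))
```

Because the driver never inspects the words themselves, the mapper and reducer could be swapped for another problem without touching it, which is the kind of decoupling the abstract describes.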
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rotaru, T., Rahn, M., Pfreundt, FJ. (2014). MapReduce in GPI-Space. In: an Mey, D., et al. Euro-Par 2013: Parallel Processing Workshops. Euro-Par 2013. Lecture Notes in Computer Science, vol 8374. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54420-0_5
DOI: https://doi.org/10.1007/978-3-642-54420-0_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54419-4
Online ISBN: 978-3-642-54420-0
eBook Packages: Computer Science (R0)