Abstract
Cloud computing introduces a novel computing paradigm that allows users to run their applications in a customized environment using on-demand resources. This paradigm is enabled by several technologies, including the Web, virtualization, distributed file systems, and parallel programming models. For parallel computing on the Cloud, MapReduce is currently the first choice of Cloud providers for delivering data analysis services, because the model is specifically designed for data-intensive applications and a Cloud centre is also a data centre hosting huge amounts of data, usually at petascale. The current deployment of MapReduce on the Cloud, however, follows the traditional MapReduce execution model, which requires the support of a cluster manager. This means that the individual virtual machines created on the Cloud have to be organized into a cluster before they can run a MapReduce application. This is not only a burden for system management but also prevents inter-Cloud computing, in which the resources of different Clouds are combined to solve large problems with big or distributed data. We developed a software framework that lets individual virtual machines execute a MapReduce application in a parallel, collaborative way without installing middleware or a specific software package for system management. A focus of this research is a Single-Sign-On (SSON) mechanism that enables remote access to the individual machines. We validated the SSON mechanism together with the entire MapReduce framework on a private Cloud. Experimental results demonstrate both the functionality and the feasibility of our approach.
Cite this article
Zhao, J., Tao, J. & Streit, A. Enabling collaborative MapReduce on the Cloud with a single-sign-on mechanism. Computing 98, 55–72 (2016). https://doi.org/10.1007/s00607-014-0390-0
DOI: https://doi.org/10.1007/s00607-014-0390-0