Adapting scientific computing problems to clouds using MapReduce

Published: 01 January 2012

Abstract

Cloud computing, with its promise of virtually infinite resources, appears well suited to solving resource-hungry scientific computing problems. To study this, we established a scientific computing cloud (SciCloud) project and environment on our internal clusters. The main goal of the project is to study the scope for establishing private clouds at universities. With these clouds, students and researchers can make efficient use of the existing resources of university computer networks to solve computationally intensive scientific, mathematical, and academic problems. However, to run scientific computing applications on the cloud infrastructure, the applications must be reduced to frameworks that can successfully exploit cloud resources, such as the MapReduce framework. This paper summarizes the challenges associated with reducing iterative algorithms to the MapReduce model. Algorithms used in scientific computing are divided into classes according to how they can be adapted to the MapReduce model; examples from each class are reduced to the MapReduce model, and their performance is measured and analyzed. The study focuses mainly on the Hadoop MapReduce framework but also compares it with an alternative MapReduce framework called Twister, which is designed specifically for iterative algorithms. The analysis shows that Hadoop MapReduce has significant trouble with iterative problems but is well suited to embarrassingly parallel problems, and that Twister handles iterative problems much more efficiently. This work shows how to adapt algorithms from each class to the MapReduce model and what affects the efficiency and scalability of algorithms in each class. By mapping the advantages and disadvantages of the two frameworks, it also allows us to judge which framework is more efficient for each class.
This study is of significant importance for scientific computing, which often uses complex iterative methods to solve critical problems; adapting such methods to cloud computing frameworks is not a trivial task.
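
The contrast the abstract draws between embarrassingly parallel and iterative problems can be illustrated with a small sketch. The code below is not taken from the paper and uses neither Hadoop nor Twister: `map_reduce` is a hypothetical in-memory stand-in for a single MapReduce job, and `kmeans_1d` is an illustrative one-dimensional k-means chosen only because it is a familiar iterative method. The point it tries to show is that an iterative algorithm becomes a chain of complete map-shuffle-reduce rounds, with the driver feeding each round's output into the next; in a framework that treats every round as an independent job, that per-round overhead is paid on every iteration.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one MapReduce job."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record may emit any number of (key, value) pairs.
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    # Reduce phase: one call per distinct key.
    return {key: reducer(key, values) for key, values in groups.items()}


def kmeans_1d(points, centers, max_iters=20, tol=1e-9):
    """Illustrative 1-D k-means: every iteration is a full MapReduce round."""
    centers = list(centers)
    rounds = 0
    for rounds in range(1, max_iters + 1):

        def mapper(x):
            # Emit (index of the nearest current center, point).
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            yield nearest, x

        def reducer(_key, values):
            # The new center is the mean of the points assigned to it.
            return sum(values) / len(values)

        means = map_reduce(points, mapper, reducer)
        # A center that received no points keeps its previous position.
        new_centers = [means.get(i, c) for i, c in enumerate(centers)]
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift <= tol:  # the driver decides whether another round is needed
            break
    return centers, rounds


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    print(kmeans_1d(data, [0.0, 5.0]))
```

An embarrassingly parallel problem, by contrast, needs only a single call to `map_reduce`, so the cost of starting a round is paid once rather than once per iteration.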

Published In

Future Generation Computer Systems  Volume 28, Issue 1
January, 2012
338 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Cloud computing
  2. Hadoop
  3. Iterative algorithm
  4. MapReduce
  5. Scientific computing
  6. Twister

Qualifiers

  • Article
