Abstract
The volume of data generated has grown on a massive scale over the last couple of decades, and the challenges associated with big data have grown with it. The issues raised by this avalanche of data are immense and span a variety of challenges that need careful consideration. The use of High Performance Data Analytics (HPDA) is increasing rapidly in many industries, which has expanded the HPC market into these new territories. HPC and big data are different systems, not only at the technical level but also in their ecosystems. Workloads are diverse enough, and performance-sensitive enough, that no single set of solutions to the issues of big data and HPC convergence can be globally optimal without being highly sub-optimal locally. As we head towards exascale systems, the integration of big data and HPC has become a hot research topic, but it is still in its infancy. The two systems have different architectures, and integrating them brings many challenges. The main aim of this paper is to identify the driving forces, challenges, and current and future trends associated with the integration of HPC and big data. We also propose an architecture for big data and HPC convergence based on design patterns.
Acknowledgments
The authors gratefully acknowledge the technical and financial support of the Deanship of Scientific Research (DSR) at King Abdul-Aziz University (KAU), Jeddah, Saudi Arabia, under grant number G-661-611-38. The work carried out in this paper is supported by the HPC Center at King Abdul-Aziz University.
Copyright information
© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Usman, S., Mehmood, R., Katib, I. (2018). Big Data and HPC Convergence: The Cutting Edge and Outlook. In: Mehmood, R., Bhaduri, B., Katib, I., Chlamtac, I. (eds) Smart Societies, Infrastructure, Technologies and Applications. SCITA 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 224. Springer, Cham. https://doi.org/10.1007/978-3-319-94180-6_4
DOI: https://doi.org/10.1007/978-3-319-94180-6_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94179-0
Online ISBN: 978-3-319-94180-6
eBook Packages: Computer Science, Computer Science (R0)