Abstract
The dawn of exascale computing and its convergence with big data analytics has greatly spurred research interest, for straightforward reasons. Traditionally, high performance computing (HPC) systems have been used for scientific applications dominated by compute-intensive tasks, while the proliferation of big data has driven the design of data-intensive processing paradigms such as the Apache big data stack. Because big data is generated at a rapid pace, faster processing mechanisms are needed to extract insights in real time, and HPC systems are well positioned to address this need. Although HPC systems show great promise for big data workloads, integrating them directly with existing data-intensive frameworks such as the Apache big data stack is not straightforward owing to the challenges involved. This has triggered research on seamlessly integrating the two paradigms through interoperable frameworks, programming models, and system architectures. The aim of this paper is to assess the progress made in the HPC world toward augmenting it with big data analytics support. As an outcome, we put forth a taxonomy of the factors to be considered when augmenting HPC systems with big data support. The paper sheds light on how big data frameworks can be ported to HPC platforms as a preliminary step toward the convergence of the big data and exascale computing ecosystems, focusing on the research issues that arise when augmenting HPC paradigms with big data frameworks and the approaches proposed to address them. It also discusses data-intensive and compute-intensive processing paradigms, benchmark suites and workloads, and future directions in the integration of HPC with big data analytics.