Cooperative Preprocessing at Petabytes on High Performance Computing System

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11335)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

With the explosion of data, there is an urgent demand for data throughput in high performance computing (HPC) systems. Data-intensive applications are becoming increasingly common in HPC environments. As data scale grows faster than system capability, it is time to fully utilize resources in every aspect, including computing power, storage capacity and data throughput. Data preprocessing can no longer be ignored, since it is an important procedure, especially when dealing with large amounts of data. How can data preprocessing be performed efficiently on current HPC systems? How can system resources be fully used by data-intensive applications? What should be valued when designing new HPC architectures? All these questions need answers. In this paper, we sketch the procedure of data-intensive applications, which leads to an adaptive resource allocation scheme driven by the requirements of that procedure. We analyze the characteristics of preprocessing and design a preprocessing model for data-intensive applications in HPC systems. The model not only fulfills the demand for computing but also meets the need for throughput, through cooperation between the storage system and the storage management system. Experiments were conducted on Sunway TaihuLight, one of the world’s fastest supercomputers. The whole preprocessing procedure at petabyte scale can be completed in hours without interfering with other ongoing applications.

Author information

Authors and Affiliations

State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi, China
Rujun Sun & Lufei Zhang
National Super Computing Wuxi Center, Wuxi, China
Xiyang Wang

Authors

Rujun Sun
Lufei Zhang
Xiyang Wang

Corresponding author

Correspondence to Rujun Sun.

Editor information

Editors and Affiliations

Rutgers University, Newark, NJ, USA
Jaideep Vaidya
Guangzhou University, Guangzhou, China
Jin Li

About this paper

Cite this paper

Sun, R., Zhang, L., Wang, X. (2018). Cooperative Preprocessing at Petabytes on High Performance Computing System. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11335. Springer, Cham. https://doi.org/10.1007/978-3-030-05054-2_16

DOI: https://doi.org/10.1007/978-3-030-05054-2_16
Published: 07 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05053-5
Online ISBN: 978-3-030-05054-2
eBook Packages: Computer Science, Computer Science (R0)
