
Efficient AES implementation on Sunway TaihuLight supercomputer: A systematic approach

Published: 01 April 2020

Abstract

Encryption is an important technique for improving information security in many real-world applications. The Advanced Encryption Standard (AES) is a widely used, efficient cryptographic algorithm. Although AES is fast in both software and hardware, encrypting data is still time-consuming, especially for large amounts of data, so accelerating AES operations remains an ongoing effort. This paper presents SW-AES, a parallel AES implementation on the Sunway TaihuLight, one of the fastest supercomputers in the world, which uses the SW26010 processor as its basic building block. Following the architectural features of the SW26010, SW-AES exploits parallelism at three levels: (1) inter-CPE (Computing Processing Element) data parallelism, which distributes tasks among the 256 on-chip CPEs; (2) intra-CPE data parallelism, enabled by the Single-Instruction Multiple-Data (SIMD) instructions inside each CPE; and (3) instruction-level parallelism, which pipelines memory access and computation. In addition, corresponding to its two application scenarios, SW-AES presents scalable ways to run AES efficiently on many nodes. As a result, SW-AES reaches a maximum throughput of 13.50 GB/s on a single SW26010 node, which is 216.23× higher than the latest parallel AES implementation on the Sunway TaihuLight and about 37.3% higher than the latest AES implementation on the GTX 480 GPU. When running on 1024 computing nodes, each processing 1 GB of data, SW-AES achieves a throughput of 13819.25 GB/s. In contrast, the latest related work on the Sunway TaihuLight achieves a throughput of only 63.91 GB/s.
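The throughput figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a reading aid: the variable names are illustrative, and it assumes that "216.23× higher" means a plain throughput ratio and that the 63.91 GB/s figure for the related work also refers to 1024 nodes, neither of which the abstract states explicitly.

```python
# Figures quoted in the abstract, in GB/s.
single_node = 13.50          # SW-AES, one SW26010 node
multi_node_total = 13819.25  # SW-AES, 1024 nodes with 1 GB each
prior_total = 63.91          # latest related work (assumed: also 1024 nodes)
nodes = 1024
ratio_vs_prior = 216.23      # assumed to be a plain throughput ratio

# Average per-node throughput when running at scale.
per_node_at_scale = multi_node_total / nodes

# Fraction of single-node throughput retained per node at 1024 nodes.
scaling_efficiency = per_node_at_scale / single_node

# Two independent estimates of the related work's per-node throughput.
prior_per_node = prior_total / nodes
prior_implied_by_ratio = single_node / ratio_vs_prior

print(f"per-node throughput at scale: {per_node_at_scale:.4f} GB/s")
print(f"scaling efficiency:           {scaling_efficiency:.2%}")
print(f"related work per node:        {prior_per_node:.4f} GB/s")
print(f"related work via ratio:       {prior_implied_by_ratio:.4f} GB/s")
```

Under these assumptions the per-node throughput at 1024 nodes stays very close to the single-node figure, and the two estimates for the related work agree to within rounding, which suggests the quoted numbers are mutually consistent.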

Highlights

A data layout enabling SIMD operations inside one Computing Processing Element.
An S-Box lookup strategy improving the performance of nonvectorized memory access.
A data-parallel scheme that fully uses the 256 Computing Processing Elements on a chip.
New mechanisms to scale AES operations in different scenarios to many nodes.
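The first and third highlights can be illustrated with a small sketch. It is not the layout or scheduling used by SW-AES, which this page does not describe. It shows one generic way to split independent 16-byte AES blocks evenly across 256 workers, and one generic "lane" layout in which byte j of several blocks is stored contiguously so that a single vector operation could touch the same state byte of every block. The function names and the lane count `SIMD_LANES` are hypothetical, and the sketch assumes blocks can be processed independently, as in modes such as ECB or CTR.

```python
AES_BLOCK_BYTES = 16
NUM_CPES = 256   # number of on-chip CPEs, as stated in the abstract
SIMD_LANES = 8   # hypothetical lane count, chosen only for illustration


def partition_blocks(num_blocks, num_workers=NUM_CPES):
    """Split num_blocks into contiguous, near-equal (start, end) ranges."""
    base, extra = divmod(num_blocks, num_workers)
    ranges = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional block each.
        count = base + (1 if worker < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges


def to_lane_layout(data, lanes=SIMD_LANES):
    """Regroup `lanes` consecutive blocks so row j holds byte j of each block."""
    assert len(data) == lanes * AES_BLOCK_BYTES
    return [
        bytes(data[b * AES_BLOCK_BYTES + j] for b in range(lanes))
        for j in range(AES_BLOCK_BYTES)
    ]


def from_lane_layout(rows):
    """Inverse of to_lane_layout: restore the original block order."""
    lanes = len(rows[0])
    return bytes(
        rows[j][b] for b in range(lanes) for j in range(AES_BLOCK_BYTES)
    )


# Usage: 2**30 bytes of input split across 256 workers.
ranges = partition_blocks(2**30 // AES_BLOCK_BYTES)
sample = bytes(range(SIMD_LANES * AES_BLOCK_BYTES))
rows = to_lane_layout(sample)
```

With this layout, `rows[j]` contains one byte from each of the `SIMD_LANES` blocks, so an operation applied to a whole row acts on all blocks at once. Table lookups such as the S-Box remain per-byte accesses, which is why the second highlight treats them separately.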

Cited By

  • (2025) An efficient heterogeneous parallel password recovery system on MT-3000. The Journal of Supercomputing 81:1. DOI: 10.1007/s11227-024-06532-9. Online publication date: 1-Jan-2025
  • (2023) Parallel SHA-256 on SW26010 many-core processor for hashing of multiple messages. The Journal of Supercomputing 79:2 (2332-2355). DOI: 10.1007/s11227-022-04750-7. Online publication date: 1-Feb-2023

Published In

Journal of Parallel and Distributed Computing, Volume 138, Issue C
Apr 2020
231 pages

Publisher

Academic Press, Inc.

United States

Author Tags

  1. High-performance computing
  2. Supercomputer
  3. AES algorithm
  4. Vectorization
  5. Parallelism

Qualifiers

  • Research-article
