research-article

Efficient AES implementation on Sunway TaihuLight supercomputer: A systematic approach

Published: 01 April 2020

Abstract

Encryption is an important technique for improving information security in many real-world applications. The Advanced Encryption Standard (AES) is a widely used, efficient cryptographic algorithm. Although AES is fast in both software and hardware, encryption is still time-consuming for large amounts of data, so accelerating AES operations remains an ongoing effort. This paper presents SW-AES, a parallel AES implementation on the Sunway TaihuLight, one of the fastest supercomputers in the world, which uses the SW26010 processor as its basic building block. Following the architectural features of the SW26010, SW-AES exploits parallelism at three levels: (1) inter-CPE (Computing Processing Element) data parallelism, which distributes tasks among the 256 on-chip CPEs; (2) intra-CPE data parallelism, enabled by the Single-Instruction Multiple-Data (SIMD) instructions inside each CPE; and (3) instruction-level parallelism, which pipelines memory access and computation. In addition, for each of two application scenarios, SW-AES presents a scalable way to run AES efficiently on many nodes. As a result, SW-AES reaches a maximum throughput of 13.50 GB/s on a single SW26010 node, which is 216.23× higher than the latest parallel AES implementation on the Sunway TaihuLight and about 37.3% higher than the latest AES implementation on the GTX 480 GPU. When running on 1024 computing nodes, each processing 1 GB of data, SW-AES achieves a throughput of 13819.25 GB/s. In contrast, the latest related work on the Sunway TaihuLight achieves only 63.91 GB/s.
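The inter-CPE level and the reported scaling figures can be illustrated with a short Python sketch. It is not the authors' code. It assumes the AES blocks can be processed independently (as in ECB or CTR mode, which the abstract does not specify), and `partition_blocks` is a hypothetical helper that gives each of the 256 CPEs a contiguous range of 16-byte blocks. The second half only re-derives ratios from the numbers quoted in the abstract.

```python
AES_BLOCK_BYTES = 16   # AES block size fixed by the standard
NUM_CPES = 256         # CPE count stated in the abstract


def partition_blocks(total_bytes, num_workers=NUM_CPES):
    """Return one (start_block, block_count) pair per worker.

    Hypothetical helper: assumes blocks are independent, so each worker
    can take a contiguous range of whole 16-byte blocks.
    """
    if total_bytes % AES_BLOCK_BYTES != 0:
        raise ValueError("input must be padded to a multiple of 16 bytes")
    total_blocks = total_bytes // AES_BLOCK_BYTES
    base, extra = divmod(total_blocks, num_workers)
    ranges = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional block each.
        count = base + (1 if worker < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges


# Figures quoted in the abstract (GB/s).
single_node = 13.50
multi_node = 13819.25
nodes = 1024
prior_work_multi_node = 63.91

per_node = multi_node / nodes                  # throughput per node at 1024 nodes
scaling_efficiency = per_node / single_node    # relative to the single-node peak
ratio_vs_prior = multi_node / prior_work_multi_node

if __name__ == "__main__":
    ranges = partition_blocks(1 << 30)         # 1 GB of input on one node
    print("blocks per CPE:", ranges[0][1])
    print("per-node throughput: %.3f GB/s" % per_node)
    print("scaling efficiency: %.4f" % scaling_efficiency)
    print("ratio vs. prior work: %.2f" % ratio_vs_prior)
```

With 1 GB per node, every CPE receives the same number of blocks, and the 1024-node figure works out to roughly the single-node peak per node. The ratio of the two 1024-node throughputs is also close to the 216.23× factor reported for a single node.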

Highlights

A data layout that enables SIMD operations inside one Computing Processing Element (CPE).
An S-Box lookup strategy that improves the performance of non-vectorized memory access.
A data-parallel scheme that fully uses the 256 CPEs on a chip.
New mechanisms to scale AES operations to many nodes in different scenarios.
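As a rough illustration of what a SIMD-friendly data layout can look like, the sketch below transposes a batch of 16-byte AES states so that the same byte position of every block sits in one row. A vector instruction operating on a row then processes that byte for several blocks at once. This is a generic lane-major layout chosen for illustration, not necessarily the layout used in the paper. The function names are hypothetical, and plain Python loops stand in for vector instructions.

```python
def to_lane_major(blocks):
    """Transpose N 16-byte blocks into 16 rows of N bytes each.

    Row i holds byte i of every block, so one vector operation on a row
    touches the same state byte of N independent blocks.
    """
    if any(len(b) != 16 for b in blocks):
        raise ValueError("every block must be exactly 16 bytes")
    return [bytes(b[i] for b in blocks) for i in range(16)]


def from_lane_major(rows):
    """Inverse of to_lane_major: rebuild the original 16-byte blocks."""
    n = len(rows[0])
    return [bytes(rows[i][j] for i in range(16)) for j in range(n)]


def add_round_key_lane_major(rows, round_key):
    """AES AddRoundKey on the transposed layout.

    Key byte i is XORed into every lane of row i; on SIMD hardware this
    would be one broadcast plus one vector XOR per row.
    """
    return [bytes(x ^ round_key[i] for x in row) for i, row in enumerate(rows)]


if __name__ == "__main__":
    blocks = [bytes(range(16)), bytes(range(16, 32))]
    rows = to_lane_major(blocks)
    key = bytes([0xFF] * 16)
    mixed = from_lane_major(add_round_key_lane_major(rows, key))
    print(mixed[0].hex())
```

The trade-off of such a layout is the transpose cost on input and output, which pays off only when enough independent blocks are processed together.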

Cited By

  • (2025) An efficient heterogeneous parallel password recovery system on MT-3000. The Journal of Supercomputing 81:1. DOI: 10.1007/s11227-024-06532-9. Online publication date: 1-Jan-2025
  • (2023) Parallel SHA-256 on SW26010 many-core processor for hashing of multiple messages. The Journal of Supercomputing 79:2 (2332-2355). DOI: 10.1007/s11227-022-04750-7. Online publication date: 1-Feb-2023

    Published In

    Journal of Parallel and Distributed Computing  Volume 138, Issue C
    Apr 2020
    231 pages

    Publisher

    Academic Press, Inc.

    United States

    Author Tags

    1. High-performance computing
    2. Supercomputer
    3. AES algorithm
    4. Vectorization
    5. Parallelism

    Qualifiers

    • Research-article
