Abstract
We show how to implement the Montgomery reduction algorithm for isogeny-based cryptography such that it can utilize the unsigned multiply accumulate accumulate long instruction present on modern ARM architectures. This results in a practical speedup of a factor 1.34 compared to the approach used by SIKE: the supersingular isogeny-based submission to the ongoing post-quantum standardization effort. Moreover, motivated by the recent work of Costello and Hisil (A simple and compact algorithm for SIDH with arbitrary degree isogenies. In: ASIACRYPT 2017, Part II, LNCS. Springer, Heidelberg 2017), which shows that there is only a moderate degradation in performance when evaluating large odd-degree isogenies, we search for more general supersingular isogeny friendly moduli. Using graphics processing units to accelerate this search, we find many such moduli which allow for faster implementations on embedded devices. By combining these two approaches, we manage to make the modular reduction 1.5 times as fast on a 32-bit ARM platform.
Similar content being viewed by others
References
Azarderakhsh, R., Campagna, M., Costello, C., Feo, L.D., Hess, B., Jalali, A., Jao, D., Koziel, B., LaMacchia, B., Longa, P., Naehrig, M., Renes, J., Soukharev, V., Urbanik, D.: Supersingular isogeny key encapsulation. Submission to the NIST Post-Quantum Standardization project (2017)
Azarderakhsh, R., Fishbein, D., Jao, D.: Efficient implementations of a quantum-resistant key-exchange protocol on embedded systems. Technical report (2014). http://cacr.uwaterloo.ca/techreports/2014/cacr2014-20.pdf
Barrett, P.: Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor. In: Odlyzko, A.M. (ed.) CRYPTO’86, LNCS, vol. 263, pp. 311–323. Springer, Heidelberg (1987)
Bernstein, D.J., Chen, T.R., Cheng, C.M., Lange, T., Yang, B.Y.: ECM on graphics cards. In: Joux, A. (ed.) EUROCRYPT 2009, LNCS, vol. 5479, pp. 483–501. Springer, Heidelberg (2009)
Bos, J.W., Friedberger, S.: Fast arithmetic modulo \(2^xp^y\pm 1\). In: Burgess, N., Bruguera, J.D., de Dinechin, F. (eds.) IEEE Symposium on Computer Arithmetic—ARITH 2017, pp. 148–155. IEEE Computer Society (2017). https://doi.org/10.1109/ARITH.2017.15
Chen, L., Jordan, S., Liu, Y., Moody, D., Peralta, R., Perlner, R., Smith-Tone, D.: Report on post-quantum cryptography. NISTIR 8105, National Institute of Standards and Technology (2016). http://csrc.nist.gov/publications/drafts/nistir-8105/nistir_8105_draft.pdf
Costello, C., Hisil, H.: A simple and compact algorithm for SIDH with arbitrary degree isogenies. In: Takagi, T., Peyrin, T. (eds.) ASIACRYPT 2017, Part II, LNCS, vol. 10625, pp. 303–329. Springer, Heidelberg (2017)
Costello, C., Longa, P., Naehrig, M.: Efficient algorithms for supersingular isogeny Diffie–Hellman. In: Robshaw, M., Katz, J. (eds.) CRYPTO 2016, Part I, LNCS, vol. 9814, pp. 572–601. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53018-4_21
De Bruijn, N.G.: On the number of positive integers \(\le x\) and free of prime factors \(> y\). In: Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen Series A, vol. 54, pp. 50–60 (1951)
Devoret, M.H., Schoelkopf, R.J.: Superconducting circuits for quantum information: an outlook. Science 339(6124), 1169–1174 (2013)
Dickman, K.: On the frequency of numbers containing prime factors of a certain relative magnitude. Arkiv for matematik, astronomi och fysik 22(10), 1–14 (1930)
Dussé, S.R., Kaliski Jr., B.S.: A cryptographic library for the Motorola DSP56000. In: Damgård, I. (ed.) EUROCRYPT’90, LNCS, vol. 473, pp. 230–244. Springer, Heidelberg (1991)
Faz-Hernández, A., López, J., Ochoa-Jiménez, E., Rodríguez-Henríquez, F.: A faster software implementation of the supersingular isogeny Diffie-Hellman key exchange protocol. IEEE Trans. Comput. 67(11), 1622–1636 (2017)
Gouvêa, C.P.L., López, J.: Software implementation of pairing-based cryptography on sensor networks using the MSP430 microcontroller. In: Roy, B.K., Sendrier, N. (eds.) INDOCRYPT 2009, LNCS, vol. 5922, pp. 248–262. Springer, Heidelberg (2009)
Großschädl, J., Oswald, E., Page, D., Tunstall, M.: Side-channel analysis of cryptographic software via early-terminating multiplications. In: Lee, D., Hong, S. (eds.) ICISC 09, LNCS, vol. 5984, pp. 176–192. Springer, Heidelberg (2010)
Hermans, J., Schneider, M., Buchmann, J., Vercauteren, F., Preneel, B.: Parallel shortest lattice vector enumeration on graphics cards. In: Bernstein, D.J., Lange, T. (eds.) AFRICACRYPT 10, LNCS, vol. 6055, pp. 52–68. Springer, Heidelberg (2010)
Jalali, A., Azarderakhsh, R., Kermani, M.M., Jao, D.: Supersingular isogeny Diffie–Hellman key exchange on 64-bit arm. IEEE Trans. Dependable Secure Comput. (2017). https://doi.org/10.1109/TDSC.2017.2723891
Jao, D., Feo, L.D.: Towards quantum-resistant cryptosystems from supersingular elliptic curve isogenies. In: Yang, B. (ed.) Post-Quantum Cryptography, LNCS, vol. 7071, pp. 19–34. Springer (2011). https://doi.org/10.1007/978-3-642-25405-5_2
Joux, A., Vitse, V.: A crossbred algorithm for solving Boolean polynomial systems. In: Kaczorowski, J., Pieprzyk, J., Pomykala, J. (eds.) Number-Theoretic Methods in Cryptology 2017, LNCS, vol. 10737, pp. 3–21. Springer (2018)
Kaliski Jr., B.: The Z80180 and big-number arithmetic. Dr. Dobb’s J. 18(9), 50–58 (1993)
Karatsuba, A.A., Ofman, Y.: Multiplication of many-digital numbers by automatic computers. In: Doklady Akad. Nauk SSSR, no. 145 in Proceedings of the USSR Academy of Science, pp. 293–294 (1962)
Karmakar, A., Roy, S.S., Vercauteren, F., Verbauwhede, I.: Efficient finite field multiplication for isogeny based post quantum cryptography. In: Duquesne, S., Petkova-Nikova, S. (eds.) Arithmetic of Finite Fields—WAIFI, LNCS, vol. 10064, pp. 193–207 (2016)
Kelly, J., Barends, R., Fowler, A.G., Megrant, A., Jeffrey, E., White, T.C., Sank, D., Mutus, J.Y., Campbell, B., Chen, Y., Chen, Z., Chiaro, B., Dunsworth, A., Hoi, I.C., Neill, C., O’Malley, P.J.J., Quintana, C., Roushan, P., Vainsencher, A., Wenner, J., Cleland, A.N., Martinis, J.M.: State preservation by repetitive error detection in a superconducting quantum circuit. Nature 519, 66–69 (2015)
Koziel, B., Jalali, A., Azarderakhsh, R., Jao, D., Mozaffari-Kermani, M.: NEON-SIDH: efficient implementation of supersingular isogeny Diffie–Hellman key exchange protocol on ARM. In: Foresti, S., Persiano, G. (eds.) Cryptology and Network Security, pp. 88–103. Springer, Berlin (2016)
Kuo, P.C., Schneider, M., Dagdelen, Ö., Reichelt, J., Buchmann, J., Cheng, C.M., Yang, B.Y.: Extreme enumeration on GPU and in clouds—how many dollars you need to break SVP challenges. In: Preneel, B., Takagi, T. (eds.) CHES 2011, LNCS, vol. 6917, pp. 176–191. Springer, Heidelberg (2011)
Lindholm, E., Nickolls, J., Oberman, S., Montrym, J.: NVIDIA tesla: a unified graphics and computing architecture. IEEE Micro 28(2), 39–55 (2008)
Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44(170), 519–521 (1985)
Moody, D.: Post-quantum cryptography: NIST’s plans for the future. Presentation at PKC 2016 (2016). http://csrc.nist.gov/groups/ST/post-quantum-crypto/documents/pqcrypto-2016-presentation.pdf
Mosca, M.: Cybersecurity in an era with quantum computers: will we be ready? Cryptology ePrint Archive, Report 2015/1075 (2015). http://eprint.iacr.org/2015/1075
Moss, A., Page, D., Smart, N.P.: Toward acceleration of RSA using 3D graphics hardware. In: Galbraith, S.D. (ed.) Proceedings of the 11th IMA International Conference on Cryptography and Coding, Cryptography and Coding 2007, pp. 364–383. Springer (2007)
NVIDIA Corporation: NVIDIA CUDA C Programming Guide v9.1 (2018). http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf
Ramaswami, V.: On the number of positive integers less than \(x\) and free of prime divisors greater than \(x^c\). Bull. AMS 55(12), 1122–1127 (1949)
Scott, M.: Fast machine code for modular multiplication (1995). ftp://ftp.computing.dcu.ie/pub/crypto/fast mod mult2.ps
Seo, H., Liu, Z., Longa, P., Hu, Z.: SIDH on ARM: faster modular multiplications for faster post-quantum supersingular isogeny key exchange. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2018(3), 1–20 (2018). https://doi.org/10.13154/tches.v2018.i3.1-20
Shand, M., Vuillemin, J.: Fast implementations of RSA cryptography. In: Swartzlander Jr., E.E., Irwin, M.J., Jullien, G.A. (eds.) 11th Symposium on Computer Arithmetic, pp. 252–259. IEEE Computer Society (1993). https://doi.org/10.1109/ARITH.1993.378085
Szerwinski, R., Güneysu, T.: Exploiting the power of GPUs for asymmetric cryptography. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008, LNCS, vol. 5154, pp. 79–99. Springer, Heidelberg (2008)
Walter, C.D.: Montgomery exponentiation needs no final subtractions. Electron. Lett. 35, 1831–1832 (1999)
Acknowledgements
The research leading to these results has received funding from the European Union’s Horizon 2020 research and innovation program Marie Skłodowska-Curie ITN ECRYPT-NET (Project Reference 643161) and Horizon 2020 project PQCRYPTO (Project Reference 645622). We would like to thank the reviewers for their useful suggestions to improve the quality of this work.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Bos, J.W., Friedberger, S.J. Faster modular arithmetic for isogeny-based crypto on embedded devices. J Cryptogr Eng 10, 97–109 (2020). https://doi.org/10.1007/s13389-019-00214-6
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13389-019-00214-6