Abstract
Multi-precision multiplication is one of the most fundamental operations on microprocessors to allow public-key cryptography such as RSA and elliptic curve cryptography. In this paper, we present a novel multiplication technique that increases the performance of multiplication by sophisticated caching of operands. Our method significantly reduces the number of needed load instructions which is usually one of the most expensive operations on modern processors. We evaluate our new technique on an 8-b ATmega128 and a 32-b ARM7TDMI microcontroller and compare the results with existing solutions. For the ATmega128, our implementation needs only 2395 clock cycles for a 160-b multiplication. The number of required load instructions is reduced from 167 (needed for the best known hybrid multiplication) to only 80. On the ARM7TDMI, our implementation needs only 281 clock cycles as opposed to 357. For both platforms, the proposed technique outperforms related work by a factor of about 10–23%. We also show that the method scales very well even for larger integer sizes (required for RSA) and limited register sets. It fully complies with existing multiply–accumulate instructions that are integrated in most of the available processors.
Similar content being viewed by others
Notes
Note that we do not consider multiplication methods such as Karatsuba–Ofman or FFT in this paper since they are considered to require more resources and memory accesses on common microcontrollers than the given methods [11].
We assume the allocation of three registers for the accumulator register, whereas \(2+\lceil \log _2 (n)/W\rceil \) registers are actually needed to maintain the sum of partial products.
The code generator is available from http://www.iaik.tugraz.at/content/research/sesys/tools/mulopcache/.
Note that a fully unrolled implementation using such large integer multiplications might be impractical due to the huge amount of code.
The early-terminating multiplier of the ARM7TDMI is susceptible to side-channel attacks that exploit the data-dependent runtime of the multiplier as shown in [6].
References
Atmel Corporation, 8-bit AVR Microcontroller with 128K Bytes In-System Programmable Flash (2007). http://www.atmel.com/dyn/resources/prod_documents/doc2467.pdf
Atmel Corporation, 8-bit AVR Instruction Set (2008). http://www.atmel.com/dyn/resources/prod_documents/doc0856.pdf
M. Aydos, T. Yanik, Ç .K. Koç, An high-speed ECC-based wireless authentication protocol on an ARM microprocessor, in 16th Annual Computer Security Applications Conference (ACSAC 2000), 11–15 December 2000, New Orleans, Louisiana, USA (IEEE, 2000), pp. 401–410
Certicom Research, Standards for Efficient Cryptography, SEC 2: Recommended Elliptic Curve Domain Parameters, Version 2.0 (2010). http://www.secg.org/
P. Comba, Exponentiation cryptosystems on the IBM PC. IBM Syst. J.29(4), 526–538 (1990)
J. Großschädl, E. Oswald, D. Page, M. Tunstall, Side channel analysis of cryptographic software via early-terminating multiplications, in Information, Security and Cryptology ? ICISC 2009, 12th International Conference, Seoul, Korea, December 2–4, 2009, Proceedings, Lecture Notes in Computer Science, vol. 5984, Seoul, Korea (Springer, 2010), pp. 176–192
J. Großschädl, E. Savaş, Instruction set extensions for fast arithmetic in finite fields GF(\(p\)) and GF(\(2^m\)), in CHES 2004, 6th International Workshop, Cambridge, MA, USA, August 11–13, 2004, LNCS, vol. 3156 (Springer, 2004), pp. 133–147
N. Gura, A. Patel, A. Wander, H. Eberle, S. C. Shantz, Comparing elliptic curve cryptography and RSA on 8-Bit CPUs, in CHES 2004, 6th International Workshop, Cambridge, USA, August 11–13, 2004 (Springer, 2004), pp. 119–132
IAR Systems, IAR Embedded Workbench (2012). http://www.iar.com/
A. Kargl, S. Pyka, H. Seuschek, Fast arithmetic on ATmega128 for elliptic curve cryptography (2008). Cryptology ePrint Archive http://eprint.iacr.org/. Report 2008/442
Ç. K. Koç, High speed RSA implementation. Technical Report, RSA Laboratories, RSA Data Security, Inc., Redwood City (1994)
C. Lederer, R. Mader, M. Koschuch, J. Großschädl, A. Szekely, S. Tillich, Energy-efficient implementation of ECDH key exchange for wireless sensor networks, in 3rd International Workshop in Information Security Theory and Practices—WISTP 2009, Brussels, Belgium, September 1–4, 2009, LNCS, vol. 5746 (Springer, 2009), pp. 112–127
A. Lenstra, E. Verheul, Selecting cryptographic key sizes. J. Cryptol.14(4), 255–293 (2001)
A. Liu, P. Ning, TinyECC: a configurable library for elliptic curve cryptography in wireless sensor networks, in International Conference on Information Processing in Sensor Networks-IPSN 2008, April 22–24, 2008, St. Louis, Missouri, USA, pp. 245–256 (2008)
Z. Liu, J. Großschädl, I. Kizhvatov, Efficient and side-channel resistant RSA implementation for 8-bit AVR microcontrollers, in Workshop on the Security of the Internet of Things-SOCIOT 2010, 1st International Workshop, November 29, 2010, Tokyo, Japan. IEEE Computer Society (2010)
M. Medwed, E. Oswald, Template attacks on ECDSA, in 9th International Workshop on Information Security Applications (WISA 2008), Jeju Island, Korea, September 23–25, 2008, Pre-Proceedings, ed. by K.-I. Chung, M. Yung, K. Sohn, pp. 14–27 (2008)
National Institute of Standards and Technology (NIST), SP800-57 Part 1: DRAFT Recommendation for Key Management: Part 1: General (2011). http://csrc.nist.gov/publications/drafts/800-57/Draft_SP800-57-Part1-Rev3_May2011.pdf
nongnu.org. AVR Libc Home Page (2012). http://www.nongnu.org/avr-libc/
J. Pelzl, T. Wollinger, J. Guajardo, C. Paar, Hyperelliptic curve cryptosystems: closing the performance gap to elliptic curves, in Cryptographic Hardware and Embedded Systems–CHES 2003, 5th International Workshop, Cologne, Germany, September 8-10, 2003, Proceedings, ed. by C.D. Walter, Ç.K. Koç, C. Paar. Lecture Notes in Computer Science, vol. 2779 (Springer, 2003), pp. 351–365
Rowley. Crossworks for AVR (2012). http://www.rowley.co.uk/avr/
M. Scott, P. Szczechowiak, Optimizing multiprecision multiplication for public key cryptography (2007). Cryptology ePrint Archive http://eprint.iacr.org/. Report 2007/299
P. Szczechowiak, L.B. Oliveira, M. Scott, M. Collier, R. Dahab, NanoECC: testing the limits of elliptic curve cryptography in sensor networks, in Wireless Sensor Networks 5th European Conference, EWSN 2008, Bologna, Italy, January 30-February 1, 2008, ed. by R. Verdone. LNCS, vol. 4913 (Springer, 2008), pp. 305–320
O. Ugus, A. Hessler, D. Westhoff, Performance of additive homomorphic EC-ElGamal encryption for TinyPEDS, in GI/ITG KuVS Fachgespr?ch ?Drahtlose Sensornetze?, RWTH Aachen, 2007. UbiSec (2007)
L. Uhsadel, A. Poschmann, C. Paar, Enabling full-size public-key algorithms on 8-bit sensor nodes, In Security and Privacy in Ad-hoc and Sensor Networks 4th European Workshop, ESAS 2007, Cambridge, UK, July 2–3 (2007)
H. Wang, Q. Li, Efficient implementation of public key cryptosystems on mote sensors, in Information and Communications Security 8th International Conference, ICICS 2006, Raleigh, NC, USA, December 4–7, 2006. LNCS, vol. 4307 (Springer, 2006), pp. 519–528
S.-B. Xu, L. Batina, Efficient implementation of elliptic curve cryptosystems on an ARM7 with hardware accelerator, in Information Security 4th International Conference, ISC 2001 Malaga, Spain, October 1–3, 2001 Proceedings, ed. by G.I. Davida, Y. Frankel. Lecture Notes in Computer Science, vol. 2200 (Springer, 2001), pp. 266–279
Acknowledgements
The work has been supported by the European Commission through the ICT program under Contract ICT-2007-216646 (European Network of Excellence in Cryptology—ECRYPT II) and under Contract ICT-SEC-2009-5-258754 (Tamper Resistant Sensor Node—TAMPRES).
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by Mitsuru Matsui
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was done while the authors were at Graz University of Technology (IAIK)
Appendix: Algorithm for Operand-Caching Multiplication
Appendix: Algorithm for Operand-Caching Multiplication
The following pseudocode shows the algorithm for multi-precision multiplication using the operand-caching method. Variables that are located in data memory are denoted by \(M_x\) where x represents the name of the integer a or b. The parameter e describes the number of locally usable registers \(R_a[e-1,\dots ,0]\) and \(R_b[e-1,\dots ,0]\). The triple-word accumulator is denoted by \(ACC = (ACC_2,ACC_1,ACC_0)\).
Rights and permissions
About this article
Cite this article
Hutter, M., Wenger, E. Fast Multi-precision Multiplication for Public-Key Cryptography on Embedded Microprocessors. J Cryptol 33, 1442–1460 (2020). https://doi.org/10.1007/s00145-020-09351-2
Received:
Revised:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00145-020-09351-2