US20010010077A1 - Computationally efficient modular multiplication method and apparatus - Google Patents
Computationally efficient modular multiplication method and apparatus Download PDFInfo
- Publication number
- US20010010077A1 US20010010077A1 US09/758,782 US75878201A US2001010077A1 US 20010010077 A1 US20010010077 A1 US 20010010077A1 US 75878201 A US75878201 A US 75878201A US 2001010077 A1 US2001010077 A1 US 2001010077A1
- Authority
- US
- United States
- Prior art keywords
- bits
- multiplier
- memory
- output port
- preload register
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 27
- 230000036316 preload Effects 0.000 claims abstract description 55
- 230000008878 coupling Effects 0.000 claims 2
- 238000010168 coupling process Methods 0.000 claims 2
- 238000005859 coupling reaction Methods 0.000 claims 2
- 238000012545 processing Methods 0.000 description 18
- 238000004364 calculation method Methods 0.000 description 16
- 230000009467 reduction Effects 0.000 description 14
- 238000004891 communication Methods 0.000 description 11
- 238000010586 diagram Methods 0.000 description 11
- 230000006870 function Effects 0.000 description 11
- 230000008569 process Effects 0.000 description 5
- 238000012546 transfer Methods 0.000 description 3
- 230000003466 anti-cipated effect Effects 0.000 description 2
- 230000006399 behavior Effects 0.000 description 2
- 230000006978 adaptation Effects 0.000 description 1
- 230000002457 bidirectional effect Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 239000000872 buffer Substances 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000003116 impacting effect Effects 0.000 description 1
- 230000000977 initiatory effect Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
- G06F7/53—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
- G06F7/5324—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel partitioned, i.e. using repetitively a smaller parallel parallel multiplier or using an array of such smaller multipliers
-
- A—HUMAN NECESSITIES
- A47—FURNITURE; DOMESTIC ARTICLES OR APPLIANCES; COFFEE MILLS; SPICE MILLS; SUCTION CLEANERS IN GENERAL
- A47B—TABLES; DESKS; OFFICE FURNITURE; CABINETS; DRAWERS; GENERAL DETAILS OF FURNITURE
- A47B21/00—Tables or desks for office equipment, e.g. typewriters, keyboards
- A47B21/03—Tables or desks for office equipment, e.g. typewriters, keyboards with substantially horizontally extensible or adjustable parts other than drawers, e.g. leaves
- A47B21/0371—Platforms for supporting wrists
-
- A—HUMAN NECESSITIES
- A47—FURNITURE; DOMESTIC ARTICLES OR APPLIANCES; COFFEE MILLS; SPICE MILLS; SUCTION CLEANERS IN GENERAL
- A47C—CHAIRS; SOFAS; BEDS
- A47C16/00—Stand-alone rests or supports for feet, legs, arms, back or head
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B43—WRITING OR DRAWING IMPLEMENTS; BUREAU ACCESSORIES
- B43L—ARTICLES FOR WRITING OR DRAWING UPON; WRITING OR DRAWING AIDS; ACCESSORIES FOR WRITING OR DRAWING
- B43L15/00—Supports for attachment to hands or arms for facilitating writing or drawing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/60—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
- G06F7/72—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
- G06F7/722—Modular multiplication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/60—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
- G06F7/72—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
- G06F7/728—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic using Montgomery reduction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/60—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
- G06F7/72—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
- G06F7/723—Modular exponentiation
Definitions
- the present invention relates to cryptographic systems, and more particularly, to a highly efficient multiplier for performing modular reduction operations integral to cryptographic key calculations.
- Cryptographic systems are commonly used to restrict unauthorized access to messages communicated over otherwise insecure channels.
- cryptographic systems use a unique key, such as a series of numbers, to control an algorithm used to encrypt a message before it is transmitted over an insecure communication channel to a receiver.
- the receiver must have access to the same key in order to decode the encrypted message.
- it is essential that the key be communicated in advance by the sender to the receiver over a secure channel in order to maintain the security of the cryptographic system; however, secure communication of the key is hampered by the unavailability and expense of secure communication channels.
- the spontaneity of most business communications is impeded by the need to communicate the key in advance.
- a public key cryptographic system utilizes a pair of keys in which one is publicly communicated, i.e., the public key, and the other is kept secret by the receiver, i.e., the private key. While the private key is mathematically related to the public key, it is practically impossible to derive the private key from the public key alone. In this way, the public key is used to encrypt a message, and the private key is used to decrypt the message.
- a drawback of such cryptographic systems is that calculation of the modular exponentiation remains a daunting mathematical task even to an authorized receiver using a high speed computer. With the prevalence of public computer networks used to transmit confidential data for personal, business and governmental purposes, it is anticipated that most computer users will want cryptographic systems to control access to their data. Despite the increased security, the difficulty of the modular exponentiation calculation will substantially drain computer resources and degrade data throughput rates, and thus represents a major impediment to the widespread adoption of commercial cryptographic systems.
- the efficient multiplier architecture uses a preload register, coupled to a multiplier at a second input port via a KN bit bus to load the value of the “a” multiplicand in the multiplier in a single clock pulse.
- the “b” multiplicand (which is also KN bits long) is supplied to the multiplier N bits at a time from a memory via an N bit bus coupled to a multiplier.
- the multiplier multiplies the N bits of the “b” multiplicand by the KN bits of the “a” multiplicand and provides that product at a multiplier output N bits at a time, where it can be supplied to the memory.
- the efficient multiplication method using the foregoing architecture begins by providing KN bits of the multiplicand “a” from a preload register to a second multiplier input port in a single clock pulse. Then, N bits of the multiplicand “b” are provided to a first multiplier input port, also in a single clock pulse. The KN bits of the number “a” are multiplied by the K bits of the number “b” until all of the KN bits of the “b” multiplicand are provided to the first multiplier input port and multiplied by the KN bits of the “a” multiplicand. When completed, these operations result in an output number, which is then transmitted to the memory, where it can be made available for further processing.
- one embodiment of the present invention loads a predicted (future) value for multiplicand “a” into the preload register while multiplication operations on the current “a” and “b” multiplicands are being performed. This technique further reduces the clock cycles necessary to load and multiply the parameters.
- FIG. 1 is a block diagram of an exemplary application of a modular exponentiator within a cryptographic system
- FIG. 2 is a block diagram of the modular exponentiator
- FIG. 3 is a system level flow diagram of the functions performed by the modular exponentiator
- FIG. 4 is a flow chart showing an exponent bit scanning operation performed by the modular exponentiator
- FIG. 5 a - c are block diagrams of an exponent register within various stages of the exponent bit scanning operation of FIG. 4;
- FIG. 6 is a flow chart showing a multiplication operation performed by the modular exponentiator
- FIG. 7 is a flow chart showing a squaring operation performed in conjunction with the multiplication operation of FIG. 6;
- FIG. 8 is a chart showing an exemplary exponent bit scanning operation in accordance with the flow chart of FIG. 4;
- FIG. 9 is a chart showing an exemplary multiplication and squaring operation in accordance with the flow charts of FIGS. 6 and 7;
- FIG. 10 is a block diagram showing a system architecture which can be employed to practice the present invention.
- FIG. 11 is a block diagram showing one embodiment of the multiplier and associated modules
- FIG. 12 is a timing diagram showing the pre-loading of predictive multiplicands.
- FIGS. 13 and 14 are flow charts depicting the multiplication operations.
- the present invention satisfies the need for a high speed modular exponentiation method and apparatus which provides a sufficient level of communication security while minimizing the impact to computer system performance and data throughput rates.
- like element numerals are used to describe like elements in one or more of the figures.
- FIG. 1 a block diagram of an application of a modular exponentiator 20 within an exemplary cryptographic system 10 is illustrated.
- the exemplary cryptographic system 10 includes a central processing unit (CPU) 12 , a random access memory (RAM) 14 , a read only memory (ROM) 16 , and modular exponentiator 20 .
- CPU central processing unit
- RAM random access memory
- ROM read only memory
- modular exponentiator 20 Each of the elements of the cryptographic system 10 are coupled together by a bidirectional data and control bus 18 , over which data and control messages are transmitted.
- the CPU 12 controls the operation of the cryptographic system 10 , and may be provided by a conventional microprocessor or digital signal processor circuit.
- the RAM 14 provides temporary data storage for operation of the CPU 12
- the ROM 16 provides for non-volatile storage of an instruction set, i.e., software, that is executed in a sequential manner by the CPU 12 to control the overall operation of the cryptographic system 10 .
- the modular exponentiator 20 may comprise a special function device, such as an application specific integrated circuit (ASIC) or field programmable gate array (FPGA), that is accessed by the CPU 12 to perform modular exponentiation operations.
- ASIC application specific integrated circuit
- FPGA field programmable gate array
- the elements of the cryptographic system 10 may all be contained within a single ASIC or FPGA in which the modular exponentiator 20 is provided as an embedded core process.
- the cryptographic system provides an interface between a non-secure communication channel and a data user.
- the cryptographic system receives encrypted data from an external source, such as a remote transmitter (not shown) which is communicating with the cryptographic system over the communication channel.
- the encrypted data is decrypted by the cryptographic system, and the decrypted data is provided to the data user.
- the data user provides decrypted data to the cryptographic system for encryption and subsequent transmission across the communication channel.
- the cryptographic system also receives and transmits various non-encrypted messages, such as control data and the public key information. It should be apparent that all communications with the cryptographic system occur via the data and control bus 18 .
- the modular exponentiator 20 is illustrated in greater detail in FIG. 2.
- the modular exponentiator 20 comprises an interface logic unit 22 , a pair of parallel processing units 24 a , 24 b , and a RAM 25 , which all communicate internally over a data and control bus 27 .
- the interface logic unit 22 controls communications between the modular exponentiator 20 and the data and control bus 18 of the cryptographic system 10 described above.
- the processing units 24 a , 24 b comprise respective control units 26 a , 26 b and multiplier units 28 a , 28 b , which further comprise internal circuit elements that execute a modular exponentiation process, as will be further described below.
- the RAM 25 provides for temporary storage of data values generated by the control units 26 a , 26 b and multiplier units 28 a , 28 b while executing a modular exponentiation operation.
- k is 1,024 bits.
- b e mod n ((( q ⁇ 1 mod p* ( b r e q mod p+p ⁇ b r e q mod q ))mod p )* q )+ b r e q mod q
- q ⁇ 1 mod p is a special value called an inverse which is derived from the Chinese remainder theorem, as known in the art.
- q ⁇ 1 mod p is the inverse of q mod p. Since the inverse represents a modular exponentiation of the same order as b e p mod p, the inverse may be pre-calculated in advance, and stored in the RAM 25 at step 108 .
- the values e p and e q are k/2 bit values equal to e mod (p ⁇ 1) and e mod (q ⁇ 1), respectively.
- a reduced base term b r for each of b r e p mod p and b r e q mod q is provided by taking a modular reduction of b with respect to p and q, respectively.
- the reduced base terms b r thus have a k/2 bit length as well.
- the b r e p mod q term is subtracted from b r e p mod p and the result is added to p at step 106 .
- the resulting sum is multiplied by the inverse q ⁇ 1 modp which was pre-calculated at step 108 .
- This step may be performed by one of the multipliers 28 a , 28 b , which are optimized for modular operations as will be further described below.
- the resulting product is modularly reduced with respect to p at step 109 , and further multiplied by q at step 110 to produce a k-bit value.
- step 111 the product of that final multiplication is added to b r e q mod q at step 111 , which was previously calculated at step 105 . It should be appreciated that the modular reduction that occurs at step 109 is much easier than the original modular exponentiation in view of the substantial reduction in size of the original b e term. This final solution to the modular exponentiation is provided to the data and control bus 18 for further use by the CPU 12 .
- FIG. 4 illustrates a flow chart describing a routine referred to herein as exponent bit-scanning, which is used to reduce the number of multiplications necessary to perform an exponentiation.
- the exponent bit-scanning routine factors the exponentials b r e p and b r e q into a product of precomputed powers of the reduced base b r modularly reduced with respect to p or q.
- the routine may be coded in firmware and executed sequentially by the respective processing units 24 a , 24 b described above in the form of a software program.
- the routine may be hardwired as discrete logic circuits that are optimized to perform the various functions of the exponent bit-scanning routine.
- the description that follows will refer only to the operation of the exponent bit scanning routine with respect to the exponential b r e p , but it should be appreciated that a similar operation must be performed with respect to the exponential b r e q .
- the exponent bit-scanning routine is called at step 200 , and a running total is initialized to one at step 201 .
- An exponent e p to be bit-scanned is loaded into a register at step 202 .
- FIGS. 5 a - c illustrate a k-bit exponent e (i.e., e k ⁇ 1 -e 0 ) loaded into a register 32 .
- the register 32 may comprise a predefined memory space within the RAM 25 .
- a window 34 is defined through which a limited number of bits of the exponent e are accessed.
- a window size of three bits is used in an exemplary embodiment of the present invention, though it should be appreciated that a different number could also be advantageously utilized.
- the window 34 is shifted from the left of the register 32 until a one appears in the most significant bit (MSB) of the 3-bit window, as shown by a loop defined at steps 203 and 204 .
- MSB most significant bit
- step 203 the MSB is checked for presence of a one, and if a one is not detected, the window 34 is shifted by one bit to the right at step 204 .
- FIG. 5 b illustrates the window 34 shifted one bit to the right. It should be apparent that steps 203 and 204 will be repeated until a one is detected.
- a one has been detected a the MSB, and the value of the three-bit binary number in the window 34 is a read.
- the number is necessarily a 4, 5, 6 or 7 (i.e., binary 100, 101, 110 or 111, respectively) since the MSB is one.
- a pre-computed value for the reduced base b r raised to the number read from the window 34 i.e., b r 4 , b r 5 , b r 6 or b r 7 , respectively
- This pre-computed value is multiplied by a running total of the exponentiation at step 207 . It should be appreciated that in the first pass through the routine the running total is set to one as a default.
- a loop begins at step 209 in which the register 32 is checked to see if the least significant bit (LSB) of the exponent e p has entered the window 34 .
- step 209 checks for the LSB of the entire exponent e p , in contrast with step 203 which reads the MSB of the window 34 . If the LSB has not yet entered the window 34 , the loop continues to step 212 at which the window 34 is successively shifted to the right, and step 213 in which the running total is modular squared with each such shift. The loop is repeated three times until the previous three bits are no longer in the window 34 , i.e., three shifts of the window.
- LSB least significant bit
- the routing determines at step 216 whether the MSB is one. If so, the routine returns to step 205 , and the value in the window 34 is read once again. Alternatively, if the MSB is zero, then the register 32 is again checked at step 217 to see if the LSB of the exponent e p has entered the window 34 . If the LSB is not in the window 34 , the loop including steps 212 and 213 is again repeated with the window again shifted one bit to the right and the running total modular squared with the shift.
- the LSB has entered the window 34 , this indicates that the end of the exponent e p has been reached and the exponent bit-scanning routine is almost completed.
- the last two bits in the window 34 are read, and at step 223 the running total is multiplied by the reduced base b r the number of times the value read in the window. For example, if the value of the lower tow bits is a one, two, or three (i.e., binary 01, 10 or 11, respectively), then the previous running total is multiplied by the reduced base b r one, two or three times, respectively. If the value of the lower two bits is a 0, then the running total is not changed (i.e., multiplied by one). Then, the exponent bit-scanning routine ends at step 224 .
- the register 32 is checked to see if the LSB of the exponent e p has entered the window 34 . If the LSB has entered the window 34 , a series of step are performed in which the count value is checked. The count value keeps track of the number of passes through the above-described loop that have taken place. If the count value is three, indicating that all of the bits in the window 34 have been previously scanned, then the exponent bit-scanning routine ends at step 224 . If the count value is two, then all but the last bit in the window 34 has been previously scanned, and at step 221 , the value of the last bit is read.
- the count value is one, then only the first bit in the window 34 has been previously scanned, and at step 222 , the value of the last two bits is read (as already described above). Once again, at step 223 the running total is multiplied by the reduced base b r the number of times the value read in the window. Then, the exponent bit-scanning routine ends at step 224 .
- the successive shifts reduce the exemplary term b 1011010011 to (((((((((b 5 ) 2 ) 2 )*b 5 ) 2 ) 2 ) 2 *b 3 . Since the term b 5 was precalculated and fetched from memory, processing time is saved by not having to calculate that term. In addition, there are additional processing time savings that are achieved in performing a modular reduction of the exemplary term with respect to n due to the distributive nature of modular reduction. Rather than a huge number of multiplications followed by an equally huge modular reduction, only nine multiplications and modular reductions are required, and the modular reductions are smaller in magnitude since the intermediate values are smaller.
- the modular squaring step that occurs with each shift is necessary since the exponent bit-scanning begins at the MSB of the exponent e p where the window value is not really 4, 5, 6 or 7, but is actually 4, 5, 6 or 7 times 2 k where k is the exponent bit position for the window's LSB bit. Since the value of the exponent e p is interpreted as a power of the base b r , a factor of 2 k implies squaring k times. Multiplying by a precalculated value when the window MSB is one is used to insure that all ones in the exponent e p are taken into account and to reduce the total number of pre-calculated values that are needed.
- the modular exponentiator 20 utilizes an efficient multiplication algorithm for modular terms, referred to in the art as Montgomery multiplication.
- FIG. 6 illustrates a flow chart describing a Montgomery multiplication operation executed by the modular exponentiator 20 .
- the Montgomery multiplication operation may be coded in firmware and executed sequentially within the respective processing units 24 a , 24 b by the control units 26 a , 26 b which access the multipliers 28 a , 28 b for particular aspects of the operation, as will be further described below.
- the Montgomery multiplication routine may be hardwired as discrete logic circuits that are optimized to perform the various functions of the routine.
- the Montgomery multiplication routine includes a major loop and two minor loops.
- a distinct word of a multiplicand b i is multiplied by each of the words of a multiplicand a j , where j is the number of words in multiplicand a j and i is the number of words in multiplicand b i .
- the Montgomery multiplication routine is called at step 301 .
- the two multiplicands a j and b i are loaded into respective registers at step 302 , along with a square flag.
- the square flag is set to one so that a squaring speed-up subroutine may be called at step 400 .
- the squaring speed-up subroutine will be described in greater detail below. If the two multiplicands a j and b i are not equal, then the square flag is set to zero.
- i is set to be equal to one at step 305 so that the first word of multiplicand b 1 is accessed.
- the square flag is checked at step 306 to determine whether the squaring speed-up subroutine should be called, and if not, j is set equal to one at step 307 .
- the two words a j and b i are multiplied together within the first minor loop at step 308 , and the product added to the previous carry and previous c j . It should be appreciated that in the first pass through the routine, the carry and c j values are zero. The lower word of the result is stored as c j and the higher word of the result is used as the next carry.
- the first minor loop is repeated by incrementing j at step 310 until the last word of a is detected at step 309 , which ends the first minor loop. Before starting the second minor loop, a special reduction value is calculated that produces all “0”s for the lowest word of c j when multiplied with c j , and j is set to two at step 311 .
- the special reduction value is multiplied by the modulus n j , added to the previous carry and c j .
- the lower word of the result is stored as c j ⁇ 1 and the higher word of the result is used as the next carry.
- the second minor loop is repeated by incrementing j at step 314 until the last word of c j is detected at step 313 , which ends the second minor loop.
- i is incremented at step 316 and the major loop is repeated until the last word of b i has passed through the major loop.
- the modular reduction of the final result of c j with respect to n is obtained at step 317 , and the Montgomery multiplication routine ends at step 318 .
- An example of a Montgomery multiplication of a j with b i in which both multiplicands are four words long is provided at FIG. 9.
- the symbol ⁇ is used to denote the combination of all previous values.
- the Montgomery multiplication routine of FIG. 6 can be speeded up when used to square a number by recognizing that some of the partial products of the multiplication are equal.
- multiplicand a j is equal to multiplicand b 1 , i.e., a squaring operation
- the partial products of various components of the multiplication would ordinarily be repeated, e.g., the partial product of a 2 with b 3 is equal to the partial product of a 3 with b 2 .
- both of these partial products occur during the third major loop iteration.
- the first time the partial product is encountered it can be multiplied by two to account for the second occurrence, and a full multiplication of the second partial product can be skipped.
- Multiplication by two constitutes a single left shift for a binary number, and is significantly faster than a full multiplication operation. It should be appreciated that a great number of squaring operations are performed by the modular exponentiator 20 due to the operation of the exponent bit-scanning routing described above, and an increase in speed of the squaring operations would have a significant effect on the overall processing time for a particular modular exponentiation.
- FIG. 7 illustrates a flow chart describing the squaring speed-up subroutine, which is called at step 401 .
- j is set to be equal to i at step 402 , which, in the first iteration of the major loop of FIG. 6, will be equal to one. In subsequent iterations of the major loop, however, it should be apparent that j will begin with the latest value of i and will thus skip formation of partial products that have already been encountered.
- i is compared to j. If i is equal to j, then at step 405 a factor is set to one, and if i and j are not equal, then at step 404 the factor is set to two.
- step 406 a j and b i and the factor are multiplied together the product added to the previous carry and c j .
- the lower word of the result is stored as c j and the higher word of the result is used as the next carry.
- j is incremented at step 408 and the loop is repeated until the last word of b j has passed through the loop, at which time the squaring speed-up subroutine ends at step 409 .
- step 410 of FIG. 6 the Montgomery multiplication routine resumes just after the first minor loop. It should be appreciated that the squaring speed-up subroutine will operate in place of the first minor loop for every iteration of the major loop of the Montgomery multiplication routine when the squaring flag is set.
- the multipliers 28 a , 28 b are tailored to perform specific operations.
- the multipliers 28 a , 28 b include specific functions for multiplying by two (used by the squaring speed-up routine), executing an a*b+c function, and performing the mod 2 n function on a 2n-bit result while leaving the higher n bits in a carry register.
- FIG. 10 is a chart showing a block diagram of a system architecture which can be employed to practice the present invention.
- the architecture is implemented on an ASIC 500 .
- ASIC 500 comprises a CPU 12 with a processor 502 , which performs operations required to implement the present invention.
- processor 502 comprises a reduced instruction set (RISC) POWERPCTM 401 core processor available from the IBMTM Corporation.
- RISC reduced instruction set
- Processor 502 provides a trace interface 504 and a watch interface 506 , and obtains instructions via an external FLASH/SRAM memory interface module 520 and a 32 bit external memory interface 522 .
- the trace interface 504 and the watch interface 506 provide for error detection and debugging.
- processor 502 interfaces with the ASIC module bus 524 via a selectable data cache 508 and an instruction cache 510 .
- the ASIC 500 interface logic 22 comprises a general I/O module 516 with a 4 bit external interface 518 , an external memory interface module 520 and associated interface 522 , and a PCI interface module 512 and associated PCI interface 514 .
- the PCI interface 514 provides a 32 bit data channel nominally operating at 33 MHz.
- the PCI interface module 512 provides the operations necessary for compliance with the PCI interface I/O and command protocol, including built-in input and output first input first output (FIFO) buffers for efficient data transfer. Data transfer among other modules in the ASIC 500 is provided by the ASIC module bus 524 .
- the ASIC 500 also optionally comprises a high speed dedicated random number generator 526 , for key generation and padding.
- the ASIC 500 also comprises a modular exponentiator 20 , which includes pair of parallel processing units 24 a 24 b , each associated with a RAM 25 .
- FIG. 11 presents a more detailed view of the processing units 24 a ,b, the associated RAM 25 , and control units 26 a, b .
- the processing unit 24 a, b comprises a multiplier 602 , a preload register 604 , a memory 25 , and a multiplexer 606 .
- a control unit 26 a, b operatively coupled to the multiplier 602 , preload register 604 , memory 25 and multiplexer 606 controls the operation of these respective devices, in accordance with a clock signal provided by clock 608 .
- multiplier 602 would comprise a 64 bit bus for each input number to be multiplied.
- the number of clock pulses necessary to input both values from a 64 bit bus would be too large to support a 5 ms calculation speed with a 33 MHz clock.
- the present invention provides this high speed capability with a unique architecture that includes a 512 bit multiplier input port coupled to a preload register, and a control unit that enforces an efficient computation protocol to efficiently perform 512 by 512 bit multiplications. Further, because of the predictable nature of the computations required in performing Montgomery multiplications, the control unit 26 a, b enforces a computation protocol that minimizes the clock cycles to input a new number into the preload register.
- the multiplier comprises a first input port 610 with N bit capacity, where N is an integer greater than one, and a second input port 612 with a K*N (hereinafter KN) bit capacity, where K is an integer greater than one.
- KN K*N
- multiplicand “a” At port 612 in each successive multiplication, inputs to the preload register 604 (representing the multiplicand “a”) can be provided by the multiplier 602 (from a multiplier output port 614 ) or the memory 25 (from a memory output port 616 ) under selectable control of the multiplexer 606 and the control unit 26 a, b .
- the Montgomery algorithm dictates that the desired value for “a” in the next calculation is often the same as the value for “a” in the preceding multiplication (see for example, FIG. 9).
- the preload register 604 does not require a new value for “a”, and the control unit 26 a ,b will retain the previous value for “a” in the preload register, and provide it to the multiplier 602 when necessary.
- a data path is also provided from the multiplier output port 614 to the preload register 604 to allow immediately needed results to bypass the memory 25 , thereby reducing memory bus traffic.
- the multiplication of a*b takes place as follows.
- the full 512 bit value for the second number (“a”) is input from the preload register 604 to the multiplier 602 .
- the first 64 bits of the first number (“b”) is loaded into the multiplier 602 .
- the 64 bit first number (“b”) is multiplied by the 512 bit second number (“a”).
- the next 64 bits of the first number (“b”) are then loaded into the multiplier 602 , and that portion of “b” is multiplied by the 512 bit second number (“a”).
- the control unit 26 a, b of the present invention invokes a different command protocol when a new “a” value is expected.
- This protocol makes use of the 64 bit input bus during the three clocks after each 64 bit “b” value is supplied to the multiplier 602 .
- the predicted, future value needed for the next multiplication is fetched from the memory 25 and directed to the preload register 604 during the clock period following the input of the “b” value to the multiplier 602 .
- the predicted future value for “a” is ascertainable due to the deterministic nature of the Montgomery multiplication routine, which frequently uses the same “a” value while varying only the “b” value.
- FIG. 12 is a timing diagram illustrating the foregoing logic.
- Trace 702 represents the signal from the clock 608 .
- Trace 704 indicates the clock cycles where values for “b” are supplied to the multiplier 602 from the memory 25 . Since the bus connecting the memory 25 output port 616 and the multiplier first input port 610 is a 64 bit bus, values for the 512 bit number “b” are supplied to the multiplier 602 in 64 bit increments. Accordingly, location 708 on trace 704 indicates where the first 64 bits of the 512 bit number “b” are transferred to the multiplier 602 via the multiplier first input port 610 .
- FIGS. 13 and 14 are flow charts depicting the multiplication operations of one embodiment of the present invention.
- KN bits of “A” are provided from the preload register 602 to the multiplier 602 in the multiplier second input port 612 . This is accomplished in a single clock pulse.
- N bits of “b” are provided to the first input port 610 of the multiplier 602 in a single clock pulse. This is shown in block 804 .
- the operand “a” is often used in successive calculations, and can also be predicted from past values. Because of this deterministic nature, the value for “a” for in successive calculations may be predicted. If a new value is predicted for “a” in following computations, N bits of the predicted “a” value is provided from the memory 25 to the preload register 604 in a single clock. This can be performed in a clock pulse following the pulse providing the N bits of “b” to the multiplier, and is depicted in blocks 806 and 814 . By providing the predicted “a” value from the memory 25 at this time, a potential bottleneck in data flow for new “a” values is minimized, as described above with reference to FIG. 12. If a new value for “a” is not anticipated, the logic from block 806 proceeds to block 808 , where the KN bits of “a” are multiplied by the N bits of “b.”
- the multiplier 602 is capable of efficiently performing a number of operations on “a” and “b,” in addition to multiplication. These operations are described in Table 1 below: TABLE 1 Address Instruction Control Word Description 0000 a + b Read value “a” from the memory 25, and add it to an accumulator in the multiplier 602. This can be accomplished by performing the operation [(a * b) + acc] where either “a” or “b” are set to one. 0001 a * b + acc Read “a” and “b” from the memory 25 and execute a multiply and accumulate function. The LSB of the result is stored back in the memory 25.
- All data transfer between the multiplier 602 and the memory 25 occurs with the LSBs first, with the memory address decremented by one after each memory read or memory write operation.
- the LSBs of the result are stored back in the memory 25.
- 0011 Save acc Store accumulator value in the memory 25, and clear accumulator.
- 0100 Save acc and overflow Store accumulator and overflow in the memory 25. Clear accumulator and overflow.
- the LSBs of the result are stored back in the memory 25.
- the apparatus uses a preload register, coupled to a multiplier via a KN bit bus to load the value of the “a” multiplicand in the multiplier in a single clock pulse.
- the “b” multiplicand (which is also KN bits long) is supplied to the multiplier N bits at a time from a memory via an N bit bus.
- the multiplier multiplies the N bits of the “b” multiplicand by the KN bits of the “a” multiplicand until all KN bits of “b” are multiplied by the KN bits of “a.”
- the method provides KN bits of the multiplicand “a” from a preload register to the multiplier in a single clock pulse. Then, N bits of the multiplicand “b” are provided to the multiplier, also in a single clock pulse. Next, the KN bits of the number “a” are repeatedly multiplied by the N bits of the number b until all of the KN bits of the “b” multiplicand are provided to the first multiplier input port and multiplied by the KN bits of the “a” multiplicand. When completed, these operations result in an output number, which is then transmitted N bits at a time to the memory, where it can be made available for further processing.
- one embodiment of the present invention loads a predicted (future) value for multiplicand “a” into the preload register while multiplication operations on the current “a” and “b” multiplicands are being performed. This technique further reduces the clock cycles necessary to load and multiply the parameters.
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Complex Calculations (AREA)
Abstract
Description
- This application is related to co-pending and commonly assigned application Ser. No. 08/828,368, entitled “High-Speed Modular Exponentiator,” by Gregory A. Powell, Mark W. Wilson, Kevin Q. Truong, and Christopher P. Curren, filed Mar. 28, 1997, which application is hereby incorporated by reference herein.
- This application is also related to co-pending and commonly assigned application Ser. No. __/___,___, entitled “High Speed Montgomery Value Calculation,” by Matthew S. McGregor, filed on same date herewith, which application is also hereby incorporated by reference herein.
- 1. Field of the Invention
- The present invention relates to cryptographic systems, and more particularly, to a highly efficient multiplier for performing modular reduction operations integral to cryptographic key calculations.
- 2. Description of Related Art
- Cryptographic systems are commonly used to restrict unauthorized access to messages communicated over otherwise insecure channels. In general, cryptographic systems use a unique key, such as a series of numbers, to control an algorithm used to encrypt a message before it is transmitted over an insecure communication channel to a receiver. The receiver must have access to the same key in order to decode the encrypted message. Thus, it is essential that the key be communicated in advance by the sender to the receiver over a secure channel in order to maintain the security of the cryptographic system; however, secure communication of the key is hampered by the unavailability and expense of secure communication channels. Moreover, the spontaneity of most business communications is impeded by the need to communicate the key in advance.
- In view of the difficulty and inconvenience of communicating the key over a secure channel, so-called public key cryptographic systems are proposed in which a key may be communicated over an insecure channel without jeopardizing the security of the system. A public key cryptographic system utilizes a pair of keys in which one is publicly communicated, i.e., the public key, and the other is kept secret by the receiver, i.e., the private key. While the private key is mathematically related to the public key, it is practically impossible to derive the private key from the public key alone. In this way, the public key is used to encrypt a message, and the private key is used to decrypt the message.
- Such cryptographic systems often require computation of modular exponentiations of the form y=be mod n, in which the base b, exponent e and modulus n are extremely large numbers, e.g., having a length of 1,024 binary digits or bits. If, for example, the exponent e were transmitted as a public key, and the base b and modulus n were known to the receiver in advance, a private key y could be derived by computing the modular exponentiation. It would require such a extremely large amount of computing power and time to factor the private key y from the exponent e without knowledge of the base b and modulus n, that unauthorized access to the decrypted message is virtually precluded as a practical matter.
- A drawback of such cryptographic systems is that calculation of the modular exponentiation remains a daunting mathematical task even to an authorized receiver using a high speed computer. With the prevalence of public computer networks used to transmit confidential data for personal, business and governmental purposes, it is anticipated that most computer users will want cryptographic systems to control access to their data. Despite the increased security, the difficulty of the modular exponentiation calculation will substantially drain computer resources and degrade data throughput rates, and thus represents a major impediment to the widespread adoption of commercial cryptographic systems.
- Accordingly, a critical need exists for a high speed modular exponentiation method and apparatus to provide a sufficient level of communication security while minimizing the impact to computer system performance and data throughput rates.
- In accordance with the teachings of the present invention, a highly efficient method and apparatus is disclosed for performing operations required for modular exponentiation. The apparatus is especially well suited for implementing multiplications using the Montgomery algorithm.
- The efficient multiplier architecture uses a preload register, coupled to a multiplier at a second input port via a KN bit bus to load the value of the “a” multiplicand in the multiplier in a single clock pulse. The “b” multiplicand (which is also KN bits long) is supplied to the multiplier N bits at a time from a memory via an N bit bus coupled to a multiplier. The multiplier multiplies the N bits of the “b” multiplicand by the KN bits of the “a” multiplicand and provides that product at a multiplier output N bits at a time, where it can be supplied to the memory.
- The efficient multiplication method using the foregoing architecture is also described. The method begins by providing KN bits of the multiplicand “a” from a preload register to a second multiplier input port in a single clock pulse. Then, N bits of the multiplicand “b” are provided to a first multiplier input port, also in a single clock pulse. The KN bits of the number “a” are multiplied by the K bits of the number “b” until all of the KN bits of the “b” multiplicand are provided to the first multiplier input port and multiplied by the KN bits of the “a” multiplicand. When completed, these operations result in an output number, which is then transmitted to the memory, where it can be made available for further processing.
- In accordance with the deterministic behavior of the Montgomery algorithm, one embodiment of the present invention loads a predicted (future) value for multiplicand “a” into the preload register while multiplication operations on the current “a” and “b” multiplicands are being performed. This technique further reduces the clock cycles necessary to load and multiply the parameters.
- A more complete understanding of the computationally efficient multiplier will be afforded to those skilled in the art, as well as a realization of additional advantages and objects thereof, by a consideration of the following detailed description of the preferred embodiment. Reference will be made to the appended sheets of drawings which will first be described briefly.
- FIG. 1 is a block diagram of an exemplary application of a modular exponentiator within a cryptographic system;
- FIG. 2 is a block diagram of the modular exponentiator;
- FIG. 3 is a system level flow diagram of the functions performed by the modular exponentiator;
- FIG. 4 is a flow chart showing an exponent bit scanning operation performed by the modular exponentiator;
- FIG. 5a-c are block diagrams of an exponent register within various stages of the exponent bit scanning operation of FIG. 4;
- FIG. 6 is a flow chart showing a multiplication operation performed by the modular exponentiator;
- FIG. 7 is a flow chart showing a squaring operation performed in conjunction with the multiplication operation of FIG. 6;
- FIG. 8 is a chart showing an exemplary exponent bit scanning operation in accordance with the flow chart of FIG. 4;
- FIG. 9 is a chart showing an exemplary multiplication and squaring operation in accordance with the flow charts of FIGS. 6 and 7;
- FIG. 10 is a block diagram showing a system architecture which can be employed to practice the present invention;
- FIG. 11 is a block diagram showing one embodiment of the multiplier and associated modules;
- FIG. 12 is a timing diagram showing the pre-loading of predictive multiplicands; and
- FIGS. 13 and 14 are flow charts depicting the multiplication operations.
- The present invention satisfies the need for a high speed modular exponentiation method and apparatus which provides a sufficient level of communication security while minimizing the impact to computer system performance and data throughput rates. In the detailed description that follows, like element numerals are used to describe like elements in one or more of the figures.
- Referring first to FIG. 1, a block diagram of an application of a
modular exponentiator 20 within an exemplarycryptographic system 10 is illustrated. The exemplarycryptographic system 10 includes a central processing unit (CPU) 12, a random access memory (RAM) 14, a read only memory (ROM) 16, andmodular exponentiator 20. Each of the elements of thecryptographic system 10 are coupled together by a bidirectional data andcontrol bus 18, over which data and control messages are transmitted. TheCPU 12 controls the operation of thecryptographic system 10, and may be provided by a conventional microprocessor or digital signal processor circuit. TheRAM 14 provides temporary data storage for operation of theCPU 12, and theROM 16 provides for non-volatile storage of an instruction set, i.e., software, that is executed in a sequential manner by theCPU 12 to control the overall operation of thecryptographic system 10. Themodular exponentiator 20 may comprise a special function device, such as an application specific integrated circuit (ASIC) or field programmable gate array (FPGA), that is accessed by theCPU 12 to perform modular exponentiation operations. Alternatively, the elements of thecryptographic system 10 may all be contained within a single ASIC or FPGA in which themodular exponentiator 20 is provided as an embedded core process. - As known in the art, the cryptographic system provides an interface between a non-secure communication channel and a data user. The cryptographic system receives encrypted data from an external source, such as a remote transmitter (not shown) which is communicating with the cryptographic system over the communication channel. The encrypted data is decrypted by the cryptographic system, and the decrypted data is provided to the data user. Conversely, the data user provides decrypted data to the cryptographic system for encryption and subsequent transmission across the communication channel. The cryptographic system also receives and transmits various non-encrypted messages, such as control data and the public key information. It should be apparent that all communications with the cryptographic system occur via the data and
control bus 18. - The
modular exponentiator 20 is illustrated in greater detail in FIG. 2. Themodular exponentiator 20 comprises aninterface logic unit 22, a pair ofparallel processing units RAM 25, which all communicate internally over a data andcontrol bus 27. Theinterface logic unit 22 controls communications between themodular exponentiator 20 and the data andcontrol bus 18 of thecryptographic system 10 described above. Theprocessing units respective control units multiplier units 28 a, 28 b, which further comprise internal circuit elements that execute a modular exponentiation process, as will be further described below. TheRAM 25 provides for temporary storage of data values generated by thecontrol units multiplier units 28 a, 28 b while executing a modular exponentiation operation. - Referring now to FIG. 3 in conjunction with FIG. 2 described above, a system level flow diagram of the functions performed by the
modular exponentiator 20 is illustrated. As shown atstep 101, themodular exponentiator 20 will compute a modular exponentiation of the form y=be mod n, in which the modulus n, base b and exponent e are each k bits long. In a preferred embodiment of the present invention, k is 1,024 bits. Using conventional methods, solving such a modular exponentiation would require a tremendous amount of computing power due to the large number and size of the multiplications and modular reductions that must be performed. In the present invention, the modular exponentiation is solved in a highly efficient manner by reducing the size of the problem and by reducing the number of multiplications that are performed. - As a first step in solving the modular exponentiation, the original exponentiation is split into components, as follows:
- b emodn=(((q −1modp*(b r e q modp+p−b r e q mod q))modp)*q)+b r e q modq
- in which p and q are large prime numbers whereby n=p*q. For maximum security, p and q should be roughly the same size. The term q−1 mod p is a special value called an inverse which is derived from the Chinese remainder theorem, as known in the art. In particular, q−1 mod p is the inverse of q mod p. Since the inverse represents a modular exponentiation of the same order as be p mod p, the inverse may be pre-calculated in advance, and stored in the
RAM 25 atstep 108. The values ep and eq are k/2 bit values equal to e mod (p−1) and e mod (q−1), respectively. A reduced base term br for each of br e p mod p and br e q mod q is provided by taking a modular reduction of b with respect to p and q, respectively. The reduced base terms br thus have a k/2 bit length as well. - Splitting the modular exponentiation permits its solution in two parallel paths, as illustrated in FIG. 3, which are processed separately by the
respective processing units steps RAM 25 atsteps - Since p and q are each respectively k/2 bits in length, the magnitude of the respective problems is thus reduced substantially from its original form. Moreover, the parallel calculation of two reduced-size modular exponentiations requires substantially less computer processing time than a corresponding calculation of the original modular exponentiation within a single processing unit. The reduction in processing time results from the fact that the number of multiplies needed to perform an exponentiation with an efficient algorithm (such as described below) is proportional to 2s2+s, where s is equal to k divided by the multiplication operand size in bits. If an s word problem was treated as two separate s/2 word problems, the number of multiply operations per exponentiation is reduced to a value proportional to
- For example, if k were 1,024 bits and the multiplication operand were 128 bits, s would be equal to 8. Accordingly, an s word problem would require a number of multiply operations proportional to 136, while the two separate s/2 word problems would respectively require a number of multiply operations proportional to 36. Thus, the number of multiply operations is reduced by 3.778 times.
- Following the calculations of
steps step 106. Atstep 107, the resulting sum is multiplied by the inverse q−1 modp which was pre-calculated atstep 108. This step may be performed by one of themultipliers 28 a, 28 b, which are optimized for modular operations as will be further described below. The resulting product is modularly reduced with respect to p atstep 109, and further multiplied by q atstep 110 to produce a k-bit value. Lastly, the product of that final multiplication is added to br e q mod q atstep 111, which was previously calculated atstep 105. It should be appreciated that the modular reduction that occurs atstep 109 is much easier than the original modular exponentiation in view of the substantial reduction in size of the original be term. This final solution to the modular exponentiation is provided to the data andcontrol bus 18 for further use by theCPU 12. - Referring now to FIGS. 4 and 5a-c, the modular exponentiations of br e p mod p and br e q mod q from
steps respective processing units - The exponent bit-scanning routine is called at
step 200, and a running total is initialized to one atstep 201. An exponent ep to be bit-scanned is loaded into a register atstep 202. FIGS. 5a-c illustrate a k-bit exponent e (i.e., ek−1-e0) loaded into aregister 32. Theregister 32 may comprise a predefined memory space within theRAM 25. First, awindow 34 is defined through which a limited number of bits of the exponent e are accessed. A window size of three bits is used in an exemplary embodiment of the present invention, though it should be appreciated that a different number could also be advantageously utilized. Thewindow 34 is shifted from the left of theregister 32 until a one appears in the most significant bit (MSB) of the 3-bit window, as shown by a loop defined atsteps step 203, the MSB is checked for presence of a one, and if a one is not detected, thewindow 34 is shifted by one bit to the right atstep 204. FIG. 5b illustrates thewindow 34 shifted one bit to the right. It should be apparent thatsteps - At
step 205, a one has been detected a the MSB, and the value of the three-bit binary number in thewindow 34 is a read. The number is necessarily a 4, 5, 6 or 7 (i.e., binary 100, 101, 110 or 111, respectively) since the MSB is one. Atstep 206, a pre-computed value for the reduced base br raised to the number read from the window 34 (i.e., br 4, br 5, br 6 or br 7, respectively) is fetched from memory. This pre-computed value is multiplied by a running total of the exponentiation atstep 207. It should be appreciated that in the first pass through the routine the running total is set to one as a default. - Thereafter, a loop begins at
step 209 in which theregister 32 is checked to see if the least significant bit (LSB) of the exponent ep has entered thewindow 34. Significantly, step 209 checks for the LSB of the entire exponent ep, in contrast withstep 203 which reads the MSB of thewindow 34. If the LSB has not yet entered thewindow 34, the loop continues to step 212 at which thewindow 34 is successively shifted to the right, and step 213 in which the running total is modular squared with each such shift. The loop is repeated three times until the previous three bits are no longer in thewindow 34, i.e., three shifts of the window. Once three shifts have occurred, the routing determines atstep 216 whether the MSB is one. If so, the routine returns to step 205, and the value in thewindow 34 is read once again. Alternatively, if the MSB is zero, then theregister 32 is again checked atstep 217 to see if the LSB of the exponent ep has entered thewindow 34. If the LSB is not in thewindow 34, theloop including steps - If, at
step 217, the LSB has entered thewindow 34, this indicates that the end of the exponent ep has been reached and the exponent bit-scanning routine is almost completed. Atstep 222, the last two bits in thewindow 34 are read, and atstep 223 the running total is multiplied by the reduced base br the number of times the value read in the window. For example, if the value of the lower tow bits is a one, two, or three (i.e., binary 01, 10 or 11, respectively), then the previous running total is multiplied by the reduced base br one, two or three times, respectively. If the value of the lower two bits is a 0, then the running total is not changed (i.e., multiplied by one). Then, the exponent bit-scanning routine ends atstep 224. - Returning to step209 discussed above, before the loop begins, the
register 32 is checked to see if the LSB of the exponent ep has entered thewindow 34. If the LSB has entered thewindow 34, a series of step are performed in which the count value is checked. The count value keeps track of the number of passes through the above-described loop that have taken place. If the count value is three, indicating that all of the bits in thewindow 34 have been previously scanned, then the exponent bit-scanning routine ends atstep 224. If the count value is two, then all but the last bit in thewindow 34 has been previously scanned, and atstep 221, the value of the last bit is read. If the count value is one, then only the first bit in thewindow 34 has been previously scanned, and atstep 222, the value of the last two bits is read (as already described above). Once again, atstep 223 the running total is multiplied by the reduced base br the number of times the value read in the window. Then, the exponent bit-scanning routine ends atstep 224. - An example of the exponent bit-scanning technique is illustrated in FIG. 8 with respect to a modular exponentiation of a base b raised to a ten-bit exponent e, in which e=1011010011. The successive shifts reduce the exemplary term b1011010011 to ((((((((b5)2)2)2)*b5)2)2)2)2*b3. Since the term b5 was precalculated and fetched from memory, processing time is saved by not having to calculate that term. In addition, there are additional processing time savings that are achieved in performing a modular reduction of the exemplary term with respect to n due to the distributive nature of modular reduction. Rather than a huge number of multiplications followed by an equally huge modular reduction, only nine multiplications and modular reductions are required, and the modular reductions are smaller in magnitude since the intermediate values are smaller.
- It should be appreciated that the modular squaring step that occurs with each shift is necessary since the exponent bit-scanning begins at the MSB of the exponent ep where the window value is not really 4, 5, 6 or 7, but is actually 4, 5, 6 or 7
times 2k where k is the exponent bit position for the window's LSB bit. Since the value of the exponent ep is interpreted as a power of the base br, a factor of 2k implies squaring k times. Multiplying by a precalculated value when the window MSB is one is used to insure that all ones in the exponent ep are taken into account and to reduce the total number of pre-calculated values that are needed. - Even though the exponent bit-scanning routine has reduced the number of multiplications that have to be performed in the respective calculations of br e p mod p and br e q mod q, there still are a number of multiplications that need to be performed. The
modular exponentiator 20 utilizes an efficient multiplication algorithm for modular terms, referred to in the art as Montgomery multiplication. The Montgomery algorithm provides that: - where k is the number of bits in the modulus n, n is relatively prime to 2k, and n>a, n>b. In order to use the algorithm for repeated multiplies, the values of a and b must be put into Montgomery form prior to performing the Montgomery multiply, where:
- x*2kmod n=x Mont
- If the two values to the Montgomery multiplied are in Montgomery form, then the result will also be in Montgomery form.
- FIG. 6 illustrates a flow chart describing a Montgomery multiplication operation executed by the
modular exponentiator 20. As with the exponent bit-scanning routine described above with respect to FIG. 4, the Montgomery multiplication operation may be coded in firmware and executed sequentially within therespective processing units control units multipliers 28 a, 28 b for particular aspects of the operation, as will be further described below. Alternatively, the Montgomery multiplication routine may be hardwired as discrete logic circuits that are optimized to perform the various functions of the routine. - As illustrated in FIG. 6, the Montgomery multiplication routine includes a major loop and two minor loops. In each major loop, a distinct word of a multiplicand bi is multiplied by each of the words of a multiplicand aj, where j is the number of words in multiplicand aj and i is the number of words in multiplicand bi. The Montgomery multiplication routine is called at
step 301. The two multiplicands aj and bi are loaded into respective registers atstep 302, along with a square flag. If the two multiplicands aj and bi are equal, the square flag is set to one so that a squaring speed-up subroutine may be called at step 400. The squaring speed-up subroutine will be described in greater detail below. If the two multiplicands aj and bi are not equal, then the square flag is set to zero. - Before initiating the first major loop, i is set to be equal to one at
step 305 so that the first word of multiplicand b1 is accessed. The square flag is checked at step 306 to determine whether the squaring speed-up subroutine should be called, and if not, j is set equal to one atstep 307. - The two words aj and bi are multiplied together within the first minor loop at
step 308, and the product added to the previous carry and previous cj. It should be appreciated that in the first pass through the routine, the carry and cj values are zero. The lower word of the result is stored as cj and the higher word of the result is used as the next carry. The first minor loop is repeated by incrementing j atstep 310 until the last word of a is detected atstep 309, which ends the first minor loop. Before starting the second minor loop, a special reduction value is calculated that produces all “0”s for the lowest word of cj when multiplied with cj, and j is set to two atstep 311. Thereafter, atstep 312, the special reduction value is multiplied by the modulus nj, added to the previous carry and cj. The lower word of the result is stored as cj−1 and the higher word of the result is used as the next carry. The second minor loop is repeated by incrementing j atstep 314 until the last word of cj is detected atstep 313, which ends the second minor loop. Once the second minor loop ends, i is incremented atstep 316 and the major loop is repeated until the last word of bi has passed through the major loop. Then, the modular reduction of the final result of cj with respect to n is obtained atstep 317, and the Montgomery multiplication routine ends atstep 318. An example of a Montgomery multiplication of aj with bi in which both multiplicands are four words long is provided at FIG. 9. In the example, the symbol Σ is used to denote the combination of all previous values. - The Montgomery multiplication routine of FIG. 6 can be speeded up when used to square a number by recognizing that some of the partial products of the multiplication are equal. In particular, when multiplicand aj is equal to multiplicand b1, i.e., a squaring operation, then the partial products of various components of the multiplication would ordinarily be repeated, e.g., the partial product of a2 with b3 is equal to the partial product of a3 with b2. As illustrated in FIG. 9, both of these partial products occur during the third major loop iteration. Thus, the first time the partial product is encountered it can be multiplied by two to account for the second occurrence, and a full multiplication of the second partial product can be skipped. Multiplication by two constitutes a single left shift for a binary number, and is significantly faster than a full multiplication operation. It should be appreciated that a great number of squaring operations are performed by the
modular exponentiator 20 due to the operation of the exponent bit-scanning routing described above, and an increase in speed of the squaring operations would have a significant effect on the overall processing time for a particular modular exponentiation. - FIG. 7 illustrates a flow chart describing the squaring speed-up subroutine, which is called at
step 401. Initially, j is set to be equal to i atstep 402, which, in the first iteration of the major loop of FIG. 6, will be equal to one. In subsequent iterations of the major loop, however, it should be apparent that j will begin with the latest value of i and will thus skip formation of partial products that have already been encountered. Atstep 403, i is compared to j. If i is equal to j, then at step 405 a factor is set to one, and if i and j are not equal, then atstep 404 the factor is set to two. Thereafter, instep 406, aj and bi and the factor are multiplied together the product added to the previous carry and cj. As instep 308 of FIG. 6, the lower word of the result is stored as cj and the higher word of the result is used as the next carry. After completing themultiplication step 406, j is incremented atstep 408 and the loop is repeated until the last word of bj has passed through the loop, at which time the squaring speed-up subroutine ends atstep 409. Atstep 410 of FIG. 6, the Montgomery multiplication routine resumes just after the first minor loop. It should be appreciated that the squaring speed-up subroutine will operate in place of the first minor loop for every iteration of the major loop of the Montgomery multiplication routine when the squaring flag is set. - In order to perform the Montgomery multiplication routine more efficiently, the
multipliers 28 a, 28 b are tailored to perform specific operations. In particular, themultipliers 28 a, 28 b include specific functions for multiplying by two (used by the squaring speed-up routine), executing an a*b+c function, and performing themod 2n function on a 2n-bit result while leaving the higher n bits in a carry register. - FIG. 10 is a chart showing a block diagram of a system architecture which can be employed to practice the present invention. In this embodiment, the architecture is implemented on an
ASIC 500.ASIC 500 comprises aCPU 12 with aprocessor 502, which performs operations required to implement the present invention. In one embodiment,processor 502 comprises a reduced instruction set (RISC)POWERPC™ 401 core processor available from the IBM™ Corporation.Processor 502 provides atrace interface 504 and awatch interface 506, and obtains instructions via an external FLASH/SRAMmemory interface module 520 and a 32 bitexternal memory interface 522. Thetrace interface 504 and thewatch interface 506 provide for error detection and debugging. To enhance performance,processor 502 interfaces with theASIC module bus 524 via aselectable data cache 508 and aninstruction cache 510. TheASIC 500interface logic 22 comprises a general I/O module 516 with a 4 bit external interface 518, an externalmemory interface module 520 and associatedinterface 522, and aPCI interface module 512 and associated PCI interface 514. The PCI interface 514 provides a 32 bit data channel nominally operating at 33 MHz. ThePCI interface module 512 provides the operations necessary for compliance with the PCI interface I/O and command protocol, including built-in input and output first input first output (FIFO) buffers for efficient data transfer. Data transfer among other modules in theASIC 500 is provided by theASIC module bus 524. TheASIC 500 also optionally comprises a high speed dedicatedrandom number generator 526, for key generation and padding. In accordance with the principles described herein, theASIC 500 also comprises amodular exponentiator 20, which includes pair ofparallel processing units 24 a 24 b, each associated with aRAM 25. - FIG. 11 presents a more detailed view of the
processing units 24 a,b, the associatedRAM 25, andcontrol units 26 a, b. Theprocessing unit 24 a, b comprises amultiplier 602, apreload register 604, amemory 25, and a multiplexer 606. Acontrol unit 26 a, b, operatively coupled to themultiplier 602,preload register 604,memory 25 and multiplexer 606 controls the operation of these respective devices, in accordance with a clock signal provided byclock 608. - It is desirable to perform 1024 bit RSA calculations such as modular exponentiations as quickly as possible, preferably in less than 5 ms at a 33 MHz clock speed. Although the 1024 bit RSA calculations can be reduced to 512 bit calculations using the above teaching, this still leaves the problem of performing two 512 bit calculations within the 5 ms interval.
- Ordinarily,
multiplier 602 would comprise a 64 bit bus for each input number to be multiplied. However, with such a design, the number of clock pulses necessary to input both values from a 64 bit bus would be too large to support a 5 ms calculation speed with a 33 MHz clock. The present invention provides this high speed capability with a unique architecture that includes a 512 bit multiplier input port coupled to a preload register, and a control unit that enforces an efficient computation protocol to efficiently perform 512 by 512 bit multiplications. Further, because of the predictable nature of the computations required in performing Montgomery multiplications, thecontrol unit 26 a, b enforces a computation protocol that minimizes the clock cycles to input a new number into the preload register. - In accordance with the foregoing, the multiplier comprises a
first input port 610 with N bit capacity, where N is an integer greater than one, and asecond input port 612 with a K*N (hereinafter KN) bit capacity, where K is an integer greater than one. The illustrated embodiment depicts a system wherein N=64, and K=8, representing the situation where the first input port is a 64 bit parallel input port, and the second input port is a 512 bit parallel port. Selecting the capacity of the multiplierfirst input port 610 to be less than that of the multipliersecond port 612 minimizes system resource requirements without substantially impacting the throughput of themultiplier 602. That is because themultiplier 602 only operates on 64 bits of the number at the first input port 610 (the “b” multiplicand) every four clocks as the multiplication is taking place. - To control the value of multiplicand “a” at
port 612 in each successive multiplication, inputs to the preload register 604 (representing the multiplicand “a”) can be provided by the multiplier 602 (from a multiplier output port 614) or the memory 25 (from a memory output port 616) under selectable control of the multiplexer 606 and thecontrol unit 26 a, b. For example, the Montgomery algorithm dictates that the desired value for “a” in the next calculation is often the same as the value for “a” in the preceding multiplication (see for example, FIG. 9). In such cases, thepreload register 604 does not require a new value for “a”, and thecontrol unit 26 a,b will retain the previous value for “a” in the preload register, and provide it to themultiplier 602 when necessary. A data path is also provided from themultiplier output port 614 to thepreload register 604 to allow immediately needed results to bypass thememory 25, thereby reducing memory bus traffic. - Presuming that there is a first number (“b”) stored in the
memory 25, and a second number (“a”) loaded into thepreload register 604, the multiplication of a*b takes place as follows. In the first clock cycle, the full 512 bit value for the second number (“a”) is input from thepreload register 604 to themultiplier 602. Next, the first 64 bits of the first number (“b”) is loaded into themultiplier 602. Then, over the next 3 clock cycles, the 64 bit first number (“b”) is multiplied by the 512 bit second number (“a”). The next 64 bits of the first number (“b”) are then loaded into themultiplier 602, and that portion of “b” is multiplied by the 512 bit second number (“a”). This process is repeated until all bits of the “b” multiplicand are multiplied by all bits of the “a” multiplicand. Loading and multiplying all of the bits of “b” by those of “a” takes 8*4=32 clock cycles. After 4 clock cycles ofmultiplier 602 internal processing, the output, representing the least significant 512 bits of the product of the first number (“b”) and the second number (“a”), is outputted over the next 8 clock cycles. The most significant 512 bits of the product remain in themultiplier 602, and are used for further carry operations. Accordingly, 45 clock cycles are required to determine the product of “a” and “b.” - Although the data channel622 from the
preload register 604 to themultiplier 602 is 512 bits, the bus capacity to all other input and output ports, including thememory 25 is only 64 bits. Therefore, in cases where a preload value is required (a new “a” value), an additional 8 clock cycles would ordinarily be required to load the value from thememory 25 to the preload register 604 from the 64 bit data channel. This would mean that for any multiplication requiring a new “a” value, the number of required clock cycles to complete the operation would be 45+8=53. To avoid this problem thecontrol unit 26 a, b of the present invention invokes a different command protocol when a new “a” value is expected. This protocol makes use of the 64 bit input bus during the three clocks after each 64 bit “b” value is supplied to themultiplier 602. In particular, the predicted, future value needed for the next multiplication is fetched from thememory 25 and directed to thepreload register 604 during the clock period following the input of the “b” value to themultiplier 602. The predicted future value for “a” is ascertainable due to the deterministic nature of the Montgomery multiplication routine, which frequently uses the same “a” value while varying only the “b” value. - FIG. 12 is a timing diagram illustrating the foregoing logic.
Trace 702 represents the signal from theclock 608.Trace 704 indicates the clock cycles where values for “b” are supplied to themultiplier 602 from thememory 25. Since the bus connecting thememory 25 output port 616 and the multiplierfirst input port 610 is a 64 bit bus, values for the 512 bit number “b” are supplied to themultiplier 602 in 64 bit increments. Accordingly,location 708 ontrace 704 indicates where the first 64 bits of the 512 bit number “b” are transferred to themultiplier 602 via the multiplierfirst input port 610. At a clock pulse after the clock pulse in which the first 64 bits of the “b” value was transferred to themultiplier memory 25 to thepreload register 604. This is indicated at the pulse 710 ontrace 706. The foregoing can also be implemented with pulse 710 occurring two or more cycles after the cycle loading the “b” information as well. This process is repeated until all bits representing “b” have been loaded into themultiplier 602 and all bits representing the new “a” value have been pre-loaded into thepreload register 604. - FIGS. 13 and 14 are flow charts depicting the multiplication operations of one embodiment of the present invention. First, as shown in
block 802, KN bits of “A” are provided from thepreload register 602 to themultiplier 602 in the multipliersecond input port 612. This is accomplished in a single clock pulse. Next, N bits of “b” are provided to thefirst input port 610 of themultiplier 602 in a single clock pulse. This is shown inblock 804. - In the Montgomery algorithm, the operand “a” is often used in successive calculations, and can also be predicted from past values. Because of this deterministic nature, the value for “a” for in successive calculations may be predicted. If a new value is predicted for “a” in following computations, N bits of the predicted “a” value is provided from the
memory 25 to thepreload register 604 in a single clock. This can be performed in a clock pulse following the pulse providing the N bits of “b” to the multiplier, and is depicted inblocks memory 25 at this time, a potential bottleneck in data flow for new “a” values is minimized, as described above with reference to FIG. 12. If a new value for “a” is not anticipated, the logic fromblock 806 proceeds to block 808, where the KN bits of “a” are multiplied by the N bits of “b.” - This process is completed until all KN bits of “b” have been multiplied by all KN bits of “a,” as illustrated in
block 810, resulting in the output number provided in block 812. Then, as shown inblock 814, N bits of the output number are provided to the multiplier output port in a single clock pulse. If the current output value from themultiplier 602 is required for the next multiplication, N bits of the output number are provided to thepreload register 604. This is illustrated inblocks memory 25. As depicted in block 822, the operations performed inblocks 814 through 822 are repeated until all KN bits of the output number are provided to thememory 25. - Using the foregoing techniques, the
multiplier 602 is capable of efficiently performing a number of operations on “a” and “b,” in addition to multiplication. These operations are described in Table 1 below:TABLE 1 Address Instruction Control Word Description 0000 a + b Read value “a” from the memory 25, and add it to anaccumulator in the multiplier 602. This can beaccomplished by performing the operation [(a * b) + acc] where either “a” or “b” are set to one. 0001 a * b + acc Read “a” and “b” from the memory 25 and execute amultiply and accumulate function. The LSB of the result is stored back in the memory 25. All datatransfer between the multiplier 602 and thememory 25occurs with the LSBs first, with the memory address decremented by one after each memory read or memory write operation. 0010 a * b + acc Use previous value of “a,” read “b” from memory 25and execute multiply and accumulate function. The LSBs of the result are stored back in the memory 25.0011 Save acc Store accumulator value in the memory 25, and clearaccumulator. 0100 Save acc and overflow Store accumulator and overflow in the memory 25.Clear accumulator and overflow. 0110 ((a * b) *2) + acc Use previous value of “a,” read “b” from the memory 25, and execute multiply and accumulate * 2 function. The LSBs of the result are stored back in the memory 25. 0111 Clear acc and overflow Clear the accumulator and overflow. - A computationally efficient multiplication apparatus and method especially well suited to modular exponentiation has been described. The apparatus uses a preload register, coupled to a multiplier via a KN bit bus to load the value of the “a” multiplicand in the multiplier in a single clock pulse. The “b” multiplicand (which is also KN bits long) is supplied to the multiplier N bits at a time from a memory via an N bit bus. The multiplier multiplies the N bits of the “b” multiplicand by the KN bits of the “a” multiplicand until all KN bits of “b” are multiplied by the KN bits of “a.”
- The method provides KN bits of the multiplicand “a” from a preload register to the multiplier in a single clock pulse. Then, N bits of the multiplicand “b” are provided to the multiplier, also in a single clock pulse. Next, the KN bits of the number “a” are repeatedly multiplied by the N bits of the number b until all of the KN bits of the “b” multiplicand are provided to the first multiplier input port and multiplied by the KN bits of the “a” multiplicand. When completed, these operations result in an output number, which is then transmitted N bits at a time to the memory, where it can be made available for further processing.
- In accordance with the deterministic behavior of the Montgomery algorithm, one embodiment of the present invention loads a predicted (future) value for multiplicand “a” into the preload register while multiplication operations on the current “a” and “b” multiplicands are being performed. This technique further reduces the clock cycles necessary to load and multiply the parameters.
- It should also be appreciated that various modifications, adaptations, and alternative embodiments of the computationally efficient multiplier may be made within the scope and spirit of the present invention. For example, while the present invention is well suited to cryptographic systems implemented with special purpose processors, it is also useful in non-cyrptographic systems and may be implemented in general purpose processors as well. In such cases, one or more computer-executable programs of instructions implementing the invention may be tangibly embodied in a computer-readable program storage device such as a floppy disk or other storage media.
Claims (20)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/758,782 US6434585B2 (en) | 1998-03-30 | 2001-01-11 | Computationally efficient modular multiplication method and apparatus |
US10/043,580 US20020103843A1 (en) | 1998-03-30 | 2002-01-11 | Computationally efficient modular multiplication method and apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US5037998A | 1998-03-30 | 1998-03-30 | |
US09/758,782 US6434585B2 (en) | 1998-03-30 | 2001-01-11 | Computationally efficient modular multiplication method and apparatus |
Related Parent Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US5037998A Continuation | 1998-03-30 | 1998-03-30 | |
US09/350,379 Continuation US6247672B1 (en) | 1999-07-09 | 1999-07-09 | Wrist rest for stenotypists |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/043,580 Division US20020103843A1 (en) | 1998-03-30 | 2002-01-11 | Computationally efficient modular multiplication method and apparatus |
Publications (2)
Publication Number | Publication Date |
---|---|
US20010010077A1 true US20010010077A1 (en) | 2001-07-26 |
US6434585B2 US6434585B2 (en) | 2002-08-13 |
Family
ID=21964917
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/758,782 Expired - Lifetime US6434585B2 (en) | 1998-03-30 | 2001-01-11 | Computationally efficient modular multiplication method and apparatus |
US10/043,580 Abandoned US20020103843A1 (en) | 1998-03-30 | 2002-01-11 | Computationally efficient modular multiplication method and apparatus |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/043,580 Abandoned US20020103843A1 (en) | 1998-03-30 | 2002-01-11 | Computationally efficient modular multiplication method and apparatus |
Country Status (5)
Country | Link |
---|---|
US (2) | US6434585B2 (en) |
EP (1) | EP0947914B1 (en) |
JP (1) | JPH11305996A (en) |
CA (1) | CA2251178A1 (en) |
DE (1) | DE69828150T2 (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020010730A1 (en) * | 2000-05-11 | 2002-01-24 | Blaker David M. | Accelerated montgomery exponentiation using plural multipliers |
US20020061104A1 (en) * | 2000-10-19 | 2002-05-23 | Oberthur Card Systems Sa | Calculation unit for executing a cryptographic protocol |
US20030031316A1 (en) * | 2001-06-08 | 2003-02-13 | Langston R. Vaughn | Method and system for a full-adder post processor for modulo arithmetic |
US20030074382A1 (en) * | 2000-05-05 | 2003-04-17 | Bernd Schmandt | Method and device for modulo calculation |
EP1306748A2 (en) * | 2001-10-24 | 2003-05-02 | Matsushita Electric Industrial Co., Ltd. | Modular squaring circuit, modular squaring method, and modular squaring program |
US20030086564A1 (en) * | 2001-09-05 | 2003-05-08 | Kuhlman Douglas A. | Method and apparatus for cipher encryption and decryption using an s-box |
US20040210613A1 (en) * | 2001-08-29 | 2004-10-21 | Infineon Technologies Ag | Method and apparatus for modular multiplication |
US20040267859A1 (en) * | 2001-10-17 | 2004-12-30 | Infineon Technologies Ag | Method and device for calculating a result of an exponentiation |
US20060010191A1 (en) * | 2001-06-13 | 2006-01-12 | Takahashi Richard J | Circuit and method for performing multiple modulo mathematic operations |
US7136484B1 (en) * | 2001-10-01 | 2006-11-14 | Silicon Image, Inc. | Cryptosystems using commuting pairs in a monoid |
US7233663B2 (en) * | 2001-10-29 | 2007-06-19 | Safenet, Inc. | Key generation performance improvement |
US20090067617A1 (en) * | 2007-09-10 | 2009-03-12 | Spansion Llc | Secure modular exponentiation by randomization of exponent scanning |
CN109710308A (en) * | 2017-10-25 | 2019-05-03 | 阿里巴巴集团控股有限公司 | Processing method, the device and system of task |
CN110088727A (en) * | 2016-12-12 | 2019-08-02 | 皇家飞利浦有限公司 | It is arranged to calculate the electronic computing device of the product of integer |
Families Citing this family (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2352309B (en) * | 1999-07-21 | 2004-02-11 | Advanced Risc Mach Ltd | A system and method for performing modular multiplication |
DE10061998A1 (en) * | 2000-12-13 | 2002-07-18 | Infineon Technologies Ag | The cryptographic processor |
JP3709553B2 (en) | 2000-12-19 | 2005-10-26 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Arithmetic circuit and arithmetic method |
JP3532860B2 (en) * | 2001-01-22 | 2004-05-31 | 株式会社東芝 | Arithmetic device, method, and program using remainder representation |
US6748412B2 (en) | 2001-09-26 | 2004-06-08 | Intel Corporation | Square-and-multiply exponent processor |
US20030065696A1 (en) * | 2001-09-28 | 2003-04-03 | Ruehle Michael D. | Method and apparatus for performing modular exponentiation |
US7194089B2 (en) * | 2001-10-24 | 2007-03-20 | International Business Machines Corporation | Method for reducing a value modulo a shared secret |
GB2383435A (en) * | 2001-12-18 | 2003-06-25 | Automatic Parallel Designs Ltd | Logic circuit for performing modular multiplication and exponentiation |
KR100436814B1 (en) * | 2001-12-20 | 2004-06-23 | 한국전자통신연구원 | apparatus for RSA Crypto Processing of IC card |
US7240084B2 (en) * | 2002-05-01 | 2007-07-03 | Sun Microsystems, Inc. | Generic implementations of elliptic curve cryptography using partial reduction |
US20040010530A1 (en) * | 2002-07-10 | 2004-01-15 | Freking William L. | Systolic high radix modular multiplier |
US7260595B2 (en) | 2002-12-23 | 2007-08-21 | Arithmatica Limited | Logic circuit and method for carry and sum generation and method of designing such a logic circuit |
US8194855B2 (en) * | 2003-06-30 | 2012-06-05 | Oracle America, Inc. | Method and apparatus for implementing processor instructions for accelerating public-key cryptography |
US7650374B1 (en) | 2004-03-02 | 2010-01-19 | Sun Microsystems, Inc. | Hybrid multi-precision multiplication |
US7519644B2 (en) * | 2004-05-27 | 2009-04-14 | King Fahd University Of Petroleum And Minerals | Finite field serial-serial multiplication/reduction structure and method |
US7953814B1 (en) | 2005-02-28 | 2011-05-31 | Mcafee, Inc. | Stopping and remediating outbound messaging abuse |
US9160755B2 (en) | 2004-12-21 | 2015-10-13 | Mcafee, Inc. | Trusted communication network |
US9015472B1 (en) | 2005-03-10 | 2015-04-21 | Mcafee, Inc. | Marking electronic messages to indicate human origination |
US20070150530A1 (en) * | 2005-12-13 | 2007-06-28 | Intel Corporation | Resisting cache timing based attacks |
US8028015B2 (en) * | 2007-08-10 | 2011-09-27 | Inside Contactless S.A. | Method and system for large number multiplication |
US20090234866A1 (en) * | 2008-03-17 | 2009-09-17 | Paul Caprioli | Floating Point Unit and Cryptographic Unit Having a Shared Multiplier Tree |
US10354229B2 (en) | 2008-08-04 | 2019-07-16 | Mcafee, Llc | Method and system for centralized contact management |
US8356185B2 (en) * | 2009-10-08 | 2013-01-15 | Oracle America, Inc. | Apparatus and method for local operand bypassing for cryptographic instructions |
WO2020146284A1 (en) * | 2019-01-07 | 2020-07-16 | Cryptography Research, Inc. | Efficient squaring with loop equalization in arithmetic logic units |
DE102020102453A1 (en) | 2020-01-31 | 2021-08-05 | Infineon Technologies Ag | Integrated circuit for the modular multiplication of two whole numbers for a cryptographic method and method for the cryptographic processing of data based on modular multiplication |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5121431A (en) | 1990-07-02 | 1992-06-09 | Northern Telecom Limited | Processor method of multiplying large numbers |
US5420815A (en) | 1991-10-29 | 1995-05-30 | Advanced Micro Devices, Inc. | Digital multiplication and accumulation system |
US5274707A (en) | 1991-12-06 | 1993-12-28 | Roger Schlafly | Modular exponentiation and reduction device and method |
US5513133A (en) * | 1992-11-30 | 1996-04-30 | Fortress U&T Ltd. | Compact microelectronic device for performing modular multiplication and exponentiation over large numbers |
JPH0720778A (en) * | 1993-07-02 | 1995-01-24 | Fujitsu Ltd | Residue calculation device, table creation device and multiplication remainder calculation device |
US5794028A (en) | 1996-10-17 | 1998-08-11 | Advanced Micro Devices, Inc. | Shared branch prediction structure |
-
1998
- 1998-10-08 DE DE69828150T patent/DE69828150T2/en not_active Expired - Fee Related
- 1998-10-08 EP EP98308206A patent/EP0947914B1/en not_active Expired - Lifetime
- 1998-10-19 CA CA002251178A patent/CA2251178A1/en not_active Abandoned
-
1999
- 1999-01-28 JP JP11020245A patent/JPH11305996A/en active Pending
-
2001
- 2001-01-11 US US09/758,782 patent/US6434585B2/en not_active Expired - Lifetime
-
2002
- 2002-01-11 US US10/043,580 patent/US20020103843A1/en not_active Abandoned
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7031995B2 (en) * | 2000-05-05 | 2006-04-18 | Infineon Technologies Ag | Method and device for modulo calculation |
US20030074382A1 (en) * | 2000-05-05 | 2003-04-17 | Bernd Schmandt | Method and device for modulo calculation |
US20020010730A1 (en) * | 2000-05-11 | 2002-01-24 | Blaker David M. | Accelerated montgomery exponentiation using plural multipliers |
US6820105B2 (en) * | 2000-05-11 | 2004-11-16 | Cyberguard Corporation | Accelerated montgomery exponentiation using plural multipliers |
US7457408B2 (en) * | 2000-10-19 | 2008-11-25 | Oberthur Technologies | Calculation unit for executing a cryptographic protocol |
US20020061104A1 (en) * | 2000-10-19 | 2002-05-23 | Oberthur Card Systems Sa | Calculation unit for executing a cryptographic protocol |
US20030031316A1 (en) * | 2001-06-08 | 2003-02-13 | Langston R. Vaughn | Method and system for a full-adder post processor for modulo arithmetic |
US7194088B2 (en) * | 2001-06-08 | 2007-03-20 | Corrent Corporation | Method and system for a full-adder post processor for modulo arithmetic |
US20090106342A1 (en) * | 2001-06-13 | 2009-04-23 | Itt Manufacturing Enterprises, Inc. | Circuit and method for performing multiple modulo mathematic operations |
US8090757B2 (en) | 2001-06-13 | 2012-01-03 | Itt Manufacturing Enterprises, Inc. | Circuit and method for performing multiple modulo mathematic operations |
US7320015B2 (en) * | 2001-06-13 | 2008-01-15 | Itt Manufacturing Enterprises, Inc. | Circuit and method for performing multiple modulo mathematic operations |
US20060010191A1 (en) * | 2001-06-13 | 2006-01-12 | Takahashi Richard J | Circuit and method for performing multiple modulo mathematic operations |
US20060015553A1 (en) * | 2001-06-13 | 2006-01-19 | Takahashi Richard J | Circuit and method for performing multiple modulo mathematic operations |
US7016927B2 (en) * | 2001-08-29 | 2006-03-21 | Infineon Technologies Ag | Method and apparatus for modular multiplication |
US20040210613A1 (en) * | 2001-08-29 | 2004-10-21 | Infineon Technologies Ag | Method and apparatus for modular multiplication |
US20030086564A1 (en) * | 2001-09-05 | 2003-05-08 | Kuhlman Douglas A. | Method and apparatus for cipher encryption and decryption using an s-box |
US7136484B1 (en) * | 2001-10-01 | 2006-11-14 | Silicon Image, Inc. | Cryptosystems using commuting pairs in a monoid |
US20040267859A1 (en) * | 2001-10-17 | 2004-12-30 | Infineon Technologies Ag | Method and device for calculating a result of an exponentiation |
US7016929B2 (en) * | 2001-10-17 | 2006-03-21 | Infineon Technologies Ag | Method and device for calculating a result of an exponentiation |
US20030128842A1 (en) * | 2001-10-24 | 2003-07-10 | Toshihisa Nakano | Modular squaring circuit, modular squaring method, and modular squaring program |
EP1306748A3 (en) * | 2001-10-24 | 2005-10-12 | Matsushita Electric Industrial Co., Ltd. | Modular squaring circuit, modular squaring method, and modular squaring program |
EP1306748A2 (en) * | 2001-10-24 | 2003-05-02 | Matsushita Electric Industrial Co., Ltd. | Modular squaring circuit, modular squaring method, and modular squaring program |
US7233663B2 (en) * | 2001-10-29 | 2007-06-19 | Safenet, Inc. | Key generation performance improvement |
US20090067617A1 (en) * | 2007-09-10 | 2009-03-12 | Spansion Llc | Secure modular exponentiation by randomization of exponent scanning |
US8670557B2 (en) * | 2007-09-10 | 2014-03-11 | Spansion Llc | Cryptographic system with modular randomization of exponentiation |
CN110088727A (en) * | 2016-12-12 | 2019-08-02 | 皇家飞利浦有限公司 | It is arranged to calculate the electronic computing device of the product of integer |
CN109710308A (en) * | 2017-10-25 | 2019-05-03 | 阿里巴巴集团控股有限公司 | Processing method, the device and system of task |
US11018864B2 (en) * | 2017-10-25 | 2021-05-25 | Alibaba Group Holding Limited | Method, device, and system for task processing |
Also Published As
Publication number | Publication date |
---|---|
EP0947914A1 (en) | 1999-10-06 |
EP0947914B1 (en) | 2004-12-15 |
JPH11305996A (en) | 1999-11-05 |
CA2251178A1 (en) | 1999-09-30 |
US6434585B2 (en) | 2002-08-13 |
US20020103843A1 (en) | 2002-08-01 |
DE69828150D1 (en) | 2005-01-20 |
DE69828150T2 (en) | 2005-12-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6434585B2 (en) | Computationally efficient modular multiplication method and apparatus | |
US6282290B1 (en) | High speed modular exponentiator | |
US6240436B1 (en) | High speed montgomery value calculation | |
EP0801345B1 (en) | Circuit for modulo multiplication and exponentiation arithmetic | |
US4658094A (en) | Encryption apparatus and methods for raising a large unsigned integer to a large unsigned integer power modulo a large unsigned integer | |
US6182104B1 (en) | Circuit and method of modulo multiplication | |
US6671709B2 (en) | Multiplier cell and method of computing | |
JP2001527673A (en) | Apparatus and method for modular multiplication and exponentiation based on Montgomery multiplication | |
KR100436814B1 (en) | apparatus for RSA Crypto Processing of IC card | |
EP0938790B1 (en) | A method and device for executing a decrypting mechanism through calculating a standardized modular exponentiation for thwarting timing attacks | |
Rankine | Thomas—a complete single chip RSA device | |
US8458492B2 (en) | Crypto-engine for cryptographic processing of data | |
EP1068565B1 (en) | Acceleration and security enhancements for elliptic curve and rsa coprocessors | |
US8781112B2 (en) | Signed montgomery arithmetic | |
US20060008080A1 (en) | Modular-multiplication computing unit and information processing unit | |
EP1547302B1 (en) | Method and apparatus for processing arbitrary key bit length encryption operations with similar efficiencies | |
EP1818810B1 (en) | Circuit and method for multiplying long integer values | |
US20040064274A1 (en) | Residue calculating unit immune to power analysis | |
KR100564599B1 (en) | Inverse calculation circuit, inverse calculation method, and storage medium encoded with computer-readable computer program code | |
KR100399048B1 (en) | Apparatus of Elliptic Curve Cryptosystem | |
KR20030048631A (en) | Crypto Processing apparatus for Elliptic Curve Cryptosystem | |
Großschädl | A new serial/parallel architecture for a low power modular multiplier | |
KR20020071327A (en) | High-radix Modular Exponentiator for RSA using CRT |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
AS | Assignment |
Owner name: SAFENET, INC., MARYLAND Free format text: MERGER;ASSIGNOR:RAINBOW TECHNOLOGIES, INC.;REEL/FRAME:017537/0070 Effective date: 20040315 |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
AS | Assignment |
Owner name: DEUTSCHE BANK TRUST COMPANY AMERICAS, AS COLLATERA Free format text: FIRST LIEN PATENT SECURITY AGREEMENT;ASSIGNOR:SAFENET, INC.;REEL/FRAME:019161/0506 Effective date: 20070412 |
|
AS | Assignment |
Owner name: DEUTSCHE BANK TRUST COMPANY AMERICAS, AS COLLATERA Free format text: SECOND LIEN PATENT SECURITY AGREEMENT;ASSIGNOR:SAFENET, INC.;REEL/FRAME:019181/0012 Effective date: 20070412 |
|
FEPP | Fee payment procedure |
Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
FPAY | Fee payment |
Year of fee payment: 8 |
|
FPAY | Fee payment |
Year of fee payment: 12 |
|
AS | Assignment |
Owner name: SAFENET, INC., MARYLAND Free format text: FIRST LIEN PATENT SECURITY AGREEMENT RELEASE;ASSIGNOR:DEUTSCHE BANK TRUST COMPANY AMERICAS, AS COLLATERAL AGENT;REEL/FRAME:032436/0871 Effective date: 20140305 Owner name: SAFENET, INC., MARYLAND Free format text: SECOND LIEN PATENT SECURITY AGREEMENT RELEASE;ASSIGNOR:DEUTSCHE BANK TRUST COMPANY AMERICAS, AS COLLATERAL AGENT;REEL/FRAME:032469/0359 Effective date: 20140305 |
|
AS | Assignment |
Owner name: BANK OF AMERICA, N.A., AS COLLATERAL AGENT, NORTH Free format text: FIRST LIEN PATENT SECURITY AGREEMENT;ASSIGNOR:SAFENET, INC.;REEL/FRAME:032441/0015 Effective date: 20140305 |
|
AS | Assignment |
Owner name: BANK OF AMERICA, N.A. AS COLLATERAL AGENT, NORTH C Free format text: SECOND LIEN PATENT SECURITY AGREEMENT;ASSIGNOR:SAFENET, INC.;REEL/FRAME:032448/0677 Effective date: 20140305 |
|
AS | Assignment |
Owner name: SAFENET, INC., MARYLAND Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (FIRST LIEN);ASSIGNOR:BANK OF AMERICA, N.A.;REEL/FRAME:034862/0366 Effective date: 20150106 Owner name: SAFENET, INC., MARYLAND Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (SECOND LIEN);ASSIGNOR:BANK OF AMERICA, N.A.;REEL/FRAME:034862/0394 Effective date: 20150106 |