US20020178196A1 - Modular arithmetic coprocessor comprising two multiplication circuits working in parallel - Google Patents
Modular arithmetic coprocessor comprising two multiplication circuits working in parallel Download PDFInfo
- Publication number
- US20020178196A1 US20020178196A1 US09/991,494 US99149401A US2002178196A1 US 20020178196 A1 US20020178196 A1 US 20020178196A1 US 99149401 A US99149401 A US 99149401A US 2002178196 A1 US2002178196 A1 US 2002178196A1
- Authority
- US
- United States
- Prior art keywords
- register
- input
- output
- circuit
- multiplication
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
- G06F7/527—Multiplying only in serial-parallel fashion, i.e. one operand being entered serially and the other in parallel
- G06F7/5272—Multiplying only in serial-parallel fashion, i.e. one operand being entered serially and the other in parallel with row wise addition of partial products
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/60—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers
- G06F7/72—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic
- G06F7/728—Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers using residue arithmetic using Montgomery reduction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2207/00—Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F2207/38—Indexing scheme relating to groups G06F7/38 - G06F7/575
- G06F2207/3804—Details
- G06F2207/3808—Details concerning the type of numbers or the way they are handled
- G06F2207/3812—Devices capable of handling different types of numbers
- G06F2207/382—Reconfigurable for different fixed word lengths
Definitions
- the invention relates to a modular arithmetic coprocessor comprising two multiplication circuits working in parallel. More specifically, the invention relates to the improvement of a known arithmetic coprocessor enabling the performance of modular operations according to the Montgomery method in order to extend the applications of this coprocessor.
- the Montgomery method performs modular computations in a finite field denoted GF(2 n ) without the performance of divisions.
- the words of the pieces of data A and B are given to a multiplication circuit 19 having a series input to receive B, a parallel input to receive the k-bit blocks of A, and a series output.
- the coprocessor 4 a illustrated in FIG. 1 comprises:
- multiplexers 13 , 14 and 15 placed respectively before the registers 10 , 11 and 12 .
- two multiplication circuits 19 and 20 comprising one series input, one parallel input and one series output;
- a demultiplexer 39 [0017] a demultiplexer 39 ;
- delay cells 32 , 33 and 34 to delay the propagation of pieces of binary data by k cycle periods
- A,B P field (A,B) N operation where A, B, and N are encoded on n bits in m words of k bits, N is an odd number and A is subdivided into m words Ai ⁇ 1 (with i as an integer index varying from 1 to m), includes the following steps:
- This approach may raise problems in applications such as smart card applications for which the size of the circuits is physically limited because of the differences in flexibility between the cards and the silicon substrates. Furthermore, there is a demand for the integration of increasing numbers of different functional elements on a card of this kind, and the place available for an encryption circuit is accordingly further reduced. It is therefore necessary to find solutions with which to limit the increase in the size of this circuit while at the same time enabling optimum operation for pieces of data whose size is greater than the size of the originally planned registers.
- This problem is not limited to modular arithmetic coprocessors that process pieces of data with a fixed size of 256 or 512 bits. It can also be transposed more generally to data-handling coprocessors that need to be used for operations on data whose size exceeds their processing capacity.
- the coprocessor can be used to carry out standard operations of multiplication, it is possible to perform modular operations on operands encoded on a number m′*k bits with m′>m.
- the operands A, B and N are manipulated by being divided into q (q as an integer) sub-operands of n bits: A[q ⁇ 1], A[q ⁇ 2] . . . A[ 0 ], B[q ⁇ 1], B[q ⁇ 2] . . . B[ 0 ], N[q ⁇ 1], N[q ⁇ 2]. . . N[ 0 ].
- the coprocessor is used to perform standard operations of multiplication on the sub-operands.
- [0051] 1 —A[ 0 ]*B[ 0 ] is computed.
- the result has the form R[ 1 ] 0 R[ 1 ] 0 with R[ 1 ] 0 and R[ 0 ] 0 being pieces of binary data encoded on n bits.
- R[ 1 ] 0 and R[ 0 ] 0 are output from the registers 11 and 12 and they are stored in a memory.
- [0052] 2 A[ 0 ]*B[ 1 ] is computed.
- the result has the form R[ 1 ] 1 R[ 0 ] 1 with R[ 1 ] 1 and R[ 0 ] 1 being pieces of binary data encoded on n bits.
- R[ 1 ] 1 and R[ 0 ] 1 are output from the registers 11 and 12 and they are stored in a memory.
- T[ 1 ] 0 T[ 0 ] 0 T[ 1 ] 0 and T[ 0 ] 0 being binary pieces of data encoded on n bits.
- T[ 1 ] 0 and T[ 0 ] 0 are output from the registers 11 and 12 and they are stored in a memory.
- [0069] 2 Y[ 0 ]*N[ 1 ] is computed.
- the result has the form T[ 1 ] 1 T[ 0 ] 1 with T[ 1 ] 1 and T[ 0 ] 1 encoded on n bits.
- T[ 1 ] 1 and T[ 0 ] 1 are output from the registers 11 and 12 and they are stored in a memory.
- the result Z of the addition has the form (c) Z[q] Z[q ⁇ 1] . . . Z[ 1 ] Z[ 0 ].
- the method requires a certain number of exchanges of data with the exterior. These exchanges entail penalties in terms of computation time and memory space to store the results extracted from the coprocessor.
- the value of the coprocessors is that they use a faster clock frequency than that of the other elements that are connected to them.
- the value of using a coprocessor is minimized if the processing operations for which it is designed involve exchanges with circuits (standard processors, memories, etc.) that work more slowly, namely circuits to whose speeds they have to adapt during the exchanges.
- the inventor has sought to modify the coprocessor illustrated in FIG. 1 so as to improve the processing of the above operations, and more particularly so as to reduce the processing time. To do this, the inventor proposes to modify the existing device so that it makes parallel use of the multiplication circuits 19 and 20 , making it possible to perform operations, both modular and non-modular, at higher speed.
- the device according to the invention makes it possible to implement new methods of computation which are faster than the methods that can be implemented by the device of FIG. 1.
- the invention relates to a device comprising:
- At least one input terminal to receive binary pieces of data to be stored in these registers
- a first addition circuit that performs operations of addition between a piece of data stored in the second register and a piece of data produced by the first multiplication circuit
- a second addition circuit that performs an operation of addition between a piece of data produced by the first addition circuit and a piece of data given to the second addition circuit by the second multiplication circuit
- multiplexing means that selectively supplies, to inputs of the first addition circuit, the contents of the second register or a permanent logic state, the connection of an input of the second multiplication circuit to an output of the first register, the connection of the output of the first multiplication circuit to one of the registers and the supply to the second addition circuit of a piece of data produced by the first addition circuit or a permanent logic state.
- the multiplexing means include a first multiplexer comprising two series inputs and one series output, a first input of said multiplexer being connected to an output of the second register, a second input of the multiplexer receiving a permanent logic state and the output of the multiplexer being connected to an input of the first addition circuit.
- the device further comprises a subtraction circuit, placed between the second register and the first addition circuit, that performs a subtraction operation between a piece of data stored in the second register and a piece of data stored in the fifth register, the first multiplexer comprises a third series input, said multiplexer being placed between the subtraction circuit and the first addition circuit and the third input of said multiplexer being connected to an output of the subtraction circuit.
- the device furthermore comprises a third addition circuit, series-connected with the first addition circuit, that performs addition operations between pieces of data stored in the second and fifth registers and a piece of data produced by the first multiplication circuit and multiplexing means that selectively supplies, to an input of the third addition circuit, the contents of the fifth register or a permanent logic state.
- the multiplexing means comprise a second multiplexer having a first input, this first input enabling the connection of the output of the first or third addition circuit to one of the registers to store all or a part of the pieces of data produced by addition between the pieces of data stored in the second and fifth registers and a piece of data produced by the first multiplication circuit.
- the second multiplexer comprises a second input connected to the output of the second addition circuit for the storage, in one or more of the registers, of the data produced by this second multiplication circuit.
- the device comprises means to connect the output of either one of the second or fifth registers to inputs of these third and fourth registers.
- the device comprises a sixth register with series input and series output and multiplexing means to connect the output of this sixth register to inputs of the third and fourth registers.
- the device comprises a multiplexer to selectively connect the input of the third register to the output of the sixth register or to an input terminal.
- the device comprises a multiplexer having two inputs and one output, a first input of the multiplexer being connected to an input terminal to receive pieces of data from outside the device, a second input of the multiplexer being connected to the output of the sixth register for reintroducing, into said register, the pieces of data given at its output.
- the device further comprises a delay cell, placed between an output of the first addition circuit and an input of the second addition circuit, comprising multiplexing means to directly connect said first and second addition circuits, thus preventing the introduction of a delay between said circuits.
- the invention also relates to a device comprising a processor, a memory, a communications bus and a device as defined here above.
- FIG. 1 shows a coprocessor 4 a according to the prior art
- FIG. 2 shows an example of a structure of a circuit comprising a coprocessor
- FIG. 3 shows an exemplary embodiment of a coprocessor 4 b according to the invention.
- FIG. 2 shows an encryption circuit 1 comprising a processor 2 (for example an 8-bit, 16-bit or 32 -bit processor), a memory 3 , a coprocessor 4 and a communications bus 5 .
- the bus 5 is used to connect the elements 2 , 3 and 4 together and/or to the exterior of the circuit 1 in order to exchange pieces of data and/or control signals.
- the memory is used to store pieces of binary data having different formats.
- the processor 2 is used to manage the information exchanges between the memory 3 , the coprocessor 4 and the exterior.
- the coprocessor is a modified version of the coprocessor 4 a shown in FIG. 1, as illustrated in FIG. 3.
- FIG. 3 shows an exemplary coprocessor 4 b according to the invention.
- the circuit shown in FIG. 3 comprises:
- a multiplexer 41 comprising two series inputs and one series output.
- the series output of the multiplexer 41 is connected to the input of the register 40 .
- a first input of the multiplexer 41 is connected to a first input terminal 42 and a second input of the multiplexer 41 is connected to the output of the register 40 .
- a multiplexer 13 comprising three series inputs and one series output.
- the series output of the multiplexer 13 is connected to the input of the register 10 .
- a first input of the multiplexer 13 is connected to a second input terminal 43 and a second input of the multiplexer 13 is connected to the output of the register 10 .
- a multiplexer 14 comprising two series inputs and one series output.
- the series output of the multiplexer 14 is connected to the input of the register 11 and a first input of the multiplexer 14 is connected to a third input terminal 44 ,
- a multiplexer 15 comprising three series inputs and one series output.
- the series output of the multiplexer 15 is connected to the input of the register 12 .
- a first input of the multiplexer 15 is connected to a fourth input terminal 45 and a second input of the multiplexer, 15 is connected to the output of the register 12 ,
- a multiplexer 46 comprising two series inputs and one series output.
- the series output of the multiplexer 46 is connected to the input of the register 17 .
- a first input of the multiplexer 46 is connected to a fifth input terminal 47 , and a second input of the multiplexer 46 is connected to the output of the register 40 .
- two multiplication circuits 19 and 20 comprising a series input, a parallel input to receive k bits and a series output,
- two k-bit storage registers 21 and 22 comprising a parallel input and output.
- the input of the register 21 is connected to the output of the register 16 .
- the output of the register 21 is connected to the parallel input of the multiplication circuit 19 and the output of the register 22 is connected to the parallel input of the multiplication circuit 20 ,
- a multiplexer 23 comprising two parallel inputs and one parallel output.
- the first input of the multiplexer 23 is connected to the output of the register 17 and a second input of the multiplexer 23 is connected to the output of the register 18 .
- the output of the multiplexer 23 is connected to the input of the register 22 .
- two multiplexers 24 and 25 each having two series inputs and one series output.
- the output of the multiplexer 24 is connected to the input of the register 16 .
- a first input of the multiplexer 24 is connected to the output of the register 40 .
- the output of the multiplexer 25 is connected to the series input of the multiplication circuit 19 .
- a first input of the multiplexer 25 receives a logic zero.
- a multiplexer 48 comprising four series inputs and one series output.
- the output of the multiplexer 48 is connected to the series input of the multiplication circuit 20 and a first input of this multiplexer receives a logic zero,
- subtraction circuits 27 , 28 and 29 each comprising two series inputs and one series output.
- a first input of the circuit 27 is connected to the output of the register 10 .
- the output of the circuit 27 is connected to each of the second inputs of the multiplexers 24 and 25 , to an output terminal 49 and to a fourth input of the multiplexer 48 .
- a multiplexer 50 comprising two series inputs and one series output.
- the output of the multiplexer 50 is connected to a first input of the circuit 28 .
- a first input of the multiplexer 50 is connected to the output of the register 11 and a second input of this multiplexer receives a logic zero.
- three addition circuits 30 , 31 and 51 each comprising two series inputs and one series output.
- a first input of this circuit 30 is connected to the output of the multiplication circuit 19 and a second input of this circuit is connected to the output of the subtraction circuit 28 .
- the output of the circuit 30 is connected firstly to a second input of the multiplexer 48 and, secondly, to a first input of the circuit 51 .
- the output of the circuit 31 is connected to a first input of the circuit 29 .
- a multiplexer 52 comprising two series inputs and one series output.
- the output of the multiplexer 52 is connected to a second input of the addition circuit 51 .
- a first input of the multiplexer 52 is connected to the output of the register 12 and a second input of this multiplexer receives a logic zero.
- a multiplexer 53 comprising three series inputs and one series output.
- the series output of the multiplexer 53 is connected to a first input of the addition circuit 31 and a first input of this multiplexer is connected to the output of the addition circuit 51 .
- the third input of the multiplexer receives a logic zero.
- delay cells 32 , 33 and 34 for the delaying, by k cycle periods, of the propagation of pieces of binary data (these cells being typically k-bit shift registers, namely registers having the size of the registers 16 , 17 and 18 ), these cells comprising one series input and one series output.
- the output of the cell 32 is connected firstly to the third input of the multiplexer 48 and secondly to the input of the cell 33 .
- the output of the cell 33 is connected to a second input of the circuit 29 .
- the input of the cell 34 is connected to the output of the addition circuit 51 and the output of the cell 34 is connected to the second input of the multiplexer 53 ,
- a comparison circuit 35 comprising two series inputs and two outputs. A first input of the circuit 35 is connected to the output of the circuit 31 and a second input of the circuit 35 is connected to the output of the circuit 29 .
- two multiplexers 36 and 37 each comprising two series inputs, one selection input and one series output.
- Each of the first series inputs of the multiplexers 36 and 37 receives a logic zero.
- Each of the selection inputs is connected to one of the outputs of the circuit 35 .
- the output of the multiplexer 36 is connected to a second input of the circuit 27 and the output of the multiplexer 37 is connected to a second input of the circuit 28 ,
- a multiplexer 38 comprising two series inputs and one series output.
- a first input of the multiplexer 38 receives a logic “one”.
- a second input of the multiplexer 38 is connected to the output of the register 12 .
- the output of the multiplexer 38 is connected, firstly, to the input of the cell 32 and, secondly, to second inputs of the multiplexers 36 and 37 ,
- a demultiplexer 39 comprising a series input and two series outputs.
- the input of the demultiplexer 39 is connected to the output of the circuit 20 and a first output of the demultiplexer 39 is connected to the input of the register 18 ,
- a delay cell 54 for the delaying, by k cycle periods, the propagation of pieces of binary data (these cells being typically k-bit shift registers), this cell comprising one series input and one series output.
- the input of the cell 54 is connected to a second output of the demultiplexer 39 .
- a multiplexer 55 comprising two series inputs and one series output.
- a first input of the multiplexer 55 is connected to the second output of the demultiplexer 39 .
- a second input of the multiplexer 55 is connected to the output of the cell 54 and the output of the multiplexer 55 is connected to a second input of the addition circuit 31 ,
- a multiplexer 56 comprising two series inputs and one series output.
- a first input of the multiplexer 56 is connected to the output of the addition circuit 51 , namely to the output of the second of the two series-connected addition circuits 30 and 51 .
- a second input of the multiplexer 56 is connected to the output of the addition circuit 31 .
- the output of this multiplexer is connected to third inputs of the multiplexers 13 and 15 and to a second input of the multiplexer 14 ,
- output and input terminals it could be chosen to use distinct terminals but these could also be one or more input/output terminals common to several elements of the coprocessor.
- One advantage in using distinct terminals is that it is possible to receive and/or give pieces of data from and/or to elements external to the coprocessor (such as the processor 2 for example) in parallel, thus reducing the period of the exchanges between the circuit and the external elements.
- the above description must also be understood as being a description that is functional rather than structural. In other words, it is important to be able to make the connections as indicated between elements fulfilling the functions listed (multiplexing, addition, subtraction, etc.). In particular, the multiplexers could be combined with one another and then have a greater number of inputs, outputs and selection inputs.
- multiplexers 25 , 36 , 37 , 40 , 50 , 52 and 53 which have an input receiving a fixed signal (a logic zero or logic one), it is possible to incorporate them functionally into the circuits 19 , 27 , 28 , 30 and 51 if these circuits include a control input enabling a logic state to be imposed on one of their inputs.
- the device of FIG. 3 includes the same elements, some added elements and modifications in the connections of the elements with one another.
- the device of FIG. 3 has a supplementary register 40 , a supplementary addition circuit 51 , a supplementary delay cell 54 and multiplexers 41 , 46 , 50 , 52 , 53 , 55 and 56 .
- the multiplexer 14 has its first input connected to an input terminal 44 .
- the multiplexer 48 has a supplementary input so that the pieces of data stored in the register 10 can be given at the series input of the multiplication circuit 20 .
- the register 40 enables the storage in the coprocessor of an n-bit operand A, enabling the exchanges between the coprocessor and the exterior to be limited and therefore providing for a gain in time.
- the multiplexer 46 enables the loading of the register 17 with bits given by the register 40 . This enables the parallel use of the addition circuits 19 and 20 without requiring successive operations for the loading of the registers 16 and 17 with words given to the coprocessor.
- the multiplexer 41 makes it possible, by looping the output of the register 40 to its input, to keep the contents of this register permanently intact when words are given, by shifting, at the output of this register.
- the working of the coprocessor is interrupted, it is possible to resume the course of operations at an already performed step without having to reload the register from the exterior of the coprocessor. Care will be taken simply to carry out the shifts that are necessary in the register to return to the state corresponding to the step to be performed (if the words contained in the register 40 are always identical, their organization develops as a function of the shifts made).
- the multiplexer 50 enables the setting at zero of the second input of the addition circuit 30 independently of the contents of the register 11 . This makes it possible to make the addition circuit 30 transparent to the flow of bits produced at the output of the multiplication circuit 19 . Indeed, in this case, the bits at the output of the addition circuit 30 are identical to those received at the first input of this circuit.
- the multiplexer 50 can be placed directly between the output of the subtraction circuit 28 and the first input of the addition circuit 30 .
- the multiplexer used then will be a three-input multiplexer having a first input connected to the output of the register 11 , a second input connected to the output of the subtraction circuit 28 and a third input receiving a logic zero.
- Another approach includes using a two-input multiplexer 50 as shown in FIG. 3 and in selecting the first input of the multiplexer 37 so that the subtraction circuit 28 does not modify the bits given to the addition circuit 30 by the register 11 .
- the fourth input of the multiplexer 48 could be connected directly to the output of the register 10 (which is equivalent, from a functional point of view, to connecting it to the output of the subtraction circuit 27 if the latter receives a logic zero constantly at its second input).
- the addition circuit 51 enables the implementation of operations of the type A*B+C+D, the pieces of data C and D being given by the registers 11 and 12 .
- the multiplexer 52 makes it possible firstly to give the contents of the register 12 to the addition circuit 51 to implement said operation or secondly to permanently give it a logic zero. This latter feature makes it possible to make the addition circuit 51 transparent to the bits produced by the multiplication circuit 19 and addition circuit 30 . This makes it possible firstly to carry out the operation A*B+C in a simple way (there is then no need to load the register 12 with n logic zeros) and, secondly, to carry out, without excessive complication, the usual operations for which the device of FIG. 1 is planned (it is necessary only to provide for an additional selection control signal for the multiplexer 52 ).
- the addition circuit 51 is used to carry out operations on pieces of data with a size greater than n, these operations requiring handling operations of the A*B+C+D type. It is possible to do without these operations if it is desired simply to obtain a gain in time with regard to operations relating to pieces of data whose size is smaller than or equal to n, the invention relating above all to the simultaneous use of the two multiplication circuits 19 and 20 .
- the multiplexer 53 enables the output of the addition circuit 51 to be connected directly to the first input of the addition circuit 31 without delaying the transfer of data between these circuits through the delay cell 34 . It also makes it possible to provide logic zeros to the addition circuit 31 . This makes it possible, as shall be seen further below, if necessary to complement the pieces of n-bit data given by the addition circuits 30 and 51 to the addition circuit 31 with k logic zeros so that the size of these pieces of data corresponds to (n+k) bits of the pieces of data produced by the multiplication circuit 20 .
- the delay cell 54 enables the flow of the bits produced by the multiplication circuit 20 to be delayed by k cycle periods.
- the delay cell 34 is not short-circuited by means of the multiplexer 53 , the cell 54 will be modified so as to obtain a delay of 2*k cycle periods.
- the multiplexer 56 enables the storage, in one of the registers 10 to 12 , of the bits produced by the addition circuits 30 (if the second input of the addition circuit 51 receives a logic zero, in which case the circuit 51 is transparent to the bits produced by the circuit 30 ) or 51 .
- the invention enables operations to be performed by the parallel use of the two multiplication circuits 19 and 20 .
- the computation time is halved by the parallel processing of two k-bit words of the multiplicand A as shall be described here below.
- m is a multiple of two. If this is not so, registers of (m+1)*k bits will be used. It is also possible to control the multiplexer 48 so as to give logic zeros at the series input of the multiplication circuit 20 when the mth word of k bits of the piece of data A is given at the parallel input of the multiplication circuit 19 (the multiplication circuit 20 then produces n+k logic zeros).
- the pieces of data A and B could have a size smaller than n. If necessary, the most significant bits of these pieces of data will be complemented by zeros to obtain pieces of data of a size equal to n bits, if it is desired to have only one control program for the processor. It is also possible to provide for a sequencing of the conmmands of the device comprising a variable number of computation loops, enabling the processing time to be reduced if the pieces of data are encoded on a number of bits smaller than n. It is also possible to use operands of different sizes, by complementing the smaller-sized operand with logic zeros or by adapting the control program of the coprocessor.
- the gain in time to perform the operation is equal to 50% with respect to the device shown in FIG. 1 (taking account of the computation steps proper, the initialization step having an identical duration for the devices of FIGS. 1 and 3).
- addition circuit 51 is not used in the implementation of the method. It is therefore possible however in a first stage to improve the prior art device by implementing the device of FIG. 3, except for the addition circuit 51 and the multiplexer 52 . It is also possible, as mentioned already here above, to place the addition circuit 51 between the output of the multiplication circuit 19 and the first input of the addition circuit 30 . It will then be appropriate to connect the first input of the multiplexer 56 to the output of the circuit 30 if it is desired to carry out operations of the A*B+C+D type as illustrated here below.
- One approach is to delay the pieces of data produced by the multiplication circuit 19 between the output of this circuit and the first input of the addition circuit 30 . In this case, it is necessary to short-circuit the cell 34 (or else to use the cell 54 to delay the pieces of data at output of the multiplication circuit 20 ) so that the bits having identical place values are added up at the appropriate time in the addition circuit 31 .
- Another approach is to position the addition circuit 51 downline with respect to the addition circuit 31 and to provide the pieces of data of the register 11 to the circuit 51 by means of appropriate multiplexing means.
- the addition circuit 51 replaces the addition circuit 30 for the addition of pieces of data produced by the multiplication circuits and the contents of the register 11 . It is then necessary to add this addition circuit 51 to the known circuit whereas, in the example described in detail, this circuit 51 is not used and may be eliminated if it is not desired to process pieces of data of a size greater than n. We shall not go into detail into the reversal of the role of the multiplication circuit. The implementing of this reversal entails no difficulty to those skilled in the art, given the indications given here above and the description of the device illustrated with reference to FIG. 3.
- A, B and C have the form A[ 1 ] A[ 0 ], B[ 1 ] B[ 0 ] and C[ 3 ] C[ 2 ] C[ 1 ] C[ 0 ], the operands A[ 1 ] to C[ 0 ] being encoded on n bits.
- [0249] 1 the loading of the piece of data B[ 0 ] into the register 10 .
- [0252] 4 the loading of the piece of data B[ 1 ] in the register 10 .
- [0260] 1 the loading of the pieces of data A[ 0 ] and B[ 0 ] in the registers 40 and 10 .
- [0262] 3 the loading of the piece of data A[ 1 ] in the register 40 (this can be done during the shifting of A[ 0 ] to the registers 16 and 17 ).
- [0264] 5 the loading of the pieces of data A[ 0 ] and B[ 1 ] in the registers 40 and 10 (these may be done during the last iteration of the computation of the step 4).
- [0266] 7 the loading of the pieces of data A[ 1 ] and R′[ 1 ] in the registers 40 and 12 .
- the gain in computation time is 50% for the steps 2, 3, 5 and 6 of the prior art to which it is necessary to add the absence of the steps corresponding to external additions of 2*n bits. Furthermore, with the device of FIG. 1, it is necessary to take account of the routine outputs of data towards the exterior of the coprocessor, once the multiplication operations have been done (which is detrimental from the viewpoint of time and the viewpoint of memory space needed for storage).
- the only piece of data that is output, apart from the sub-operands of the result, is the intermediate result R′[ 1 ]. It is possible, if necessary, to output R′[ 1 ] directly at the output of the multiplexer 56 by connecting this output to an output terminal. This enables this transfer to be done during the computation of the step 4.
- [0272] 1 the comparison firstly of the n bits of A[ 0 ] and A[ 1 ] and secondly of the n bits of B[ 0 ] and B[ 1 ] (it will be assumed that A[ 1 ] and B[ 1 ] encoded on n bits are greater than A[ 0 ] and B[ 0 ]).
- [0274] 3 the computation of A[ 1 ]*B[ 1 ] in the coprocessor and the output of the result referenced C[ 1 ] encoded on 2*n bits.
- [0275] 4 the computation of (B[ 1 ] ⁇ B[ 0 ]) and (A[ 1 ] ⁇ A[ 0 ]) on n bits.
- [0276] 5 the computation of (B[ 1 ]—B[ 0 ])*(A[ 1 ] ⁇ A[ 0 ]) in the coprocessor and the output of the result C[ 3 ] encoded on 2*n bits.
- the coprocessor illustrated in FIG. 3 makes it possible to perform modular operations on operands encoded on a number m′*k bits with m′ greater than or equal to m more quickly than is the case with the device of FIG. 1.
- the operands A, B and N are manipulated by being divided into q (q as an integer) sub-operands of n bits: A[q ⁇ 1] A[q2] . . . A[ 0 ], B[q ⁇ 1] B[q ⁇ 2] . . . B[ 0 ], N[q ⁇ 1] N[q ⁇ 2] . . . N[ 0 ].
- a and B could in practice have different sizes. It is assumed here, for the sake of simplicity, that A and B have the same size.
- the piece of data A comprises an even number of sub-operands (if this is not so, it may be complemented by bits of higher place value that are set at zero so as to obtain an even number).
- the step-by-step operation of the elements of the coprocessor shall not be described in detail as it does not raise any problem to those skilled in the art who know the document D2.
- the result has the form R[ 1 ] 0 R[ 0 ] 0 with R[ 1 ] 0 and R[ 0 ] 0 being pieces of binary data encoded on n bits.
- R[ 0 ]0 X[ 0 ] is output from the register 12 and it is stored in a memory (for example the memory 3 of the circuit 1 )
- [0288] 2 A[ 0 ]*B[ 1 ]+R[ 1 ] 0 is computed.
- the result has the form R[ 1 ] 1 R[ 0 ] 1 with R[ 1 ] 1 and R[ 0 ] 1 being pieces of binary data encoded on n bits.
- R[ 0 ] 1 X[ 1 ] is output from the register 12 and it is stored in a memory.
- the result of the multiplication is the piece of data X[q] X[q ⁇ 1] . . . X[ 1 ] X[ 0 ].
- the result of the multiplication is the piece of data U[q] U[q ⁇ 1] . . . U[ 1 ] U[ 0 ].
- the result Z of the addition has the form (c) Z[q] Z[q ⁇ 1] . . . Z[ 1 ] Z[ 0 ].
- the result has the form R[ 1 ] 0 R[ 0 ] 0 with R[ 1 ] 0 and R[ 0 ] 0 being pieces of binary data encoded on n bits.
- R[ 0 ] 0 W[ 0 ] is output from the register 12 and it is stored in a memory (for example the memory 3 of the circuit 1 )
- [0308] 2 A[ 1 ]*B[ 1 ]+R[ 1 ] 0 is computed.
- the result has the form R[ 1 ] 1 R[ 0 ] 1 with R[ 1 ] 1 and R[ 0 ] 1 being pieces of binary data encoded on n bits.
- R[ 0 ] 1 W[ 1 ] is output from the register 12 and it is stored in a memory.
- W+S( 1 ) is computed, giving the result of the multiplication X with X having the form X[q] X[q ⁇ 1] . . . X[ 1 ] X[ 0 ].
- the computation time is measured in terms of number of clock cycles of the coprocessor.
- the capacity of the coprocessor according to the invention to implement operations of the A*B+C type also has other advantages, for example in the implementation of the RSA encryption method.
- the invention enables the computation of C which has the form X*Q+B by loading B into the register 11 . It will be noted in this respect that it is particularly useful to provide for a multiplexer 14 having one input connected to an input terminal of the circuit. Indeed, in the method explained here above dealing with multiplication for pieces of data whose size is greater than that of the registers, the contents of the register 11 are produced within the circuit. An input terminal is therefore not used in this case to load the contents of the register 11 . If it is desired to compute A*B+C, with C having any unspecified value, it should be possible to enter the piece of data C into the register 11 by connecting one input of the multiplexer to an input terminal.
- the addition circuit 51 is used.
- the operands A, B and N take the form of q n-bit sub-operands: A[q ⁇ 1] A[q ⁇ 2] . . . A[ 0 ], B[q ⁇ 1] B[q ⁇ 2] . . . B[ 0 ], N[q ⁇ 1] N[q ⁇ 2] . . . N[ 0 ].
- [0358] 2 A[ 0 ]*B[ 1 ]+R[ 1 ] 0 +0 is computed.
- the result has the form R[ 1 ] 1 R[ 0 ] 1 with R[ 1 ] 1 and R[ 0 ] 1 being pieces of binary data encoded on n bits.
- R[ 0 ] 1 X[ 1 ] is output from the register 12 and it is stored in a memory.
- the result of the multiplication is the piece of data X[q] X[q ⁇ 1] . . . X[ 1 ] X[ 0 ].
- the result Z of the multiplication is the piece of data Z[q] Z[q ⁇ 1] . . . Z[ 1 ] Z[ 0 ].
- the computation time is measured in terms of number of clock cycles of the coprocessor.
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Complex Calculations (AREA)
Abstract
A coprocessor including a first multiplication circuit and a second multiplication circuit with a series input to receive n bits and a series output to give n+k bits. The coprocesser also includes addition and multiplexing circuits enabling the data elements produced by the multiplication circuits to be added up with one another and with other data elements encoded on n bits. The invention makes parallel use of the multiplication circuits to carry out modular or non-modular operations on pieces of binary data having n bits or more.
Description
- This application is a continuation of application Ser. No. 09/428,607, filed Oct. 27, 1999, which in turn is a continuation of application Ser. No. 09/004,375, filed Jan. 8, 1998, entitled MODULAR ARITHMETIC COPROCESSOR COMPRISING TWO MULTIPLICATION CIRCUITS WORKING IN PARALLEL, which prior applications are incorporated herein by reference.
- 1. Field of the Invention
- The invention relates to a modular arithmetic coprocessor comprising two multiplication circuits working in parallel. More specifically, the invention relates to the improvement of a known arithmetic coprocessor enabling the performance of modular operations according to the Montgomery method in order to extend the applications of this coprocessor. The Montgomery method performs modular computations in a finite field denoted GF(2n) without the performance of divisions.
- 2. Description of the Prior Art
- Conventionally, modular operations on GF(2n) are used in cryptography for applications such as authentication of messages, identification of a user and exchange of keys. Such exemplary applications are described for example in French patent application published under No. 2 679 054.
- There are commercially available integrated circuits dedicated to such applications.
- These include, for example the device manufactured by SGS-THOMSON MICROELECTRONICS S.A. as model number ST16CF54, built around an association of the type including a central processing unit and an arithmetic coprocessor and dedicated to performing modular computations. The coprocessor used enables processing of the modular operations by the use of the Montgomery method. It is the object of European patent application No. 0 601 907 A2, hereinafter called the document D2 which in incorporated herein by reference. This coprocessor is illustrated in FIG. 1 (this figure corresponds to FIG. 2 of the document D2).
- The basic operation (called a Pfield operation) implemented by this processor consists of producing, on the basis of three pieces of binary data A (multiplicand), B (multiplier) and N (modulo) encoded on an integer of n bits, of a piece of binary data denoted Pfield(A, B)N encoded on n bits, such that Pfield(A, B)N=A*B*I mod N. I is a piece of binary data called an error and encoded on n bits such that I=2−n mod N (more specifically, the value of I depends on the number of k-bit blocks, with k as an integer, considered for the encoding of A). To carry out the operation A*B*I mod N, it is assumed that the pieces of data are encoded on m words of k bits (m and k being integers), with m*k =n. The words of the pieces of data A and B are given to a
multiplication circuit 19 having a series input to receive B, a parallel input to receive the k-bit blocks of A, and a series output. - In the coprocessor described in the document D2, k=32 and m=8 or 16. This device may be used to produce the result of the modular multiplication A*B mod N. The modular multiplication can be broken down into two successive Pfield elementary operations. Pfield(Pfield(A,B)N, H)N, is computed with H as a piece of data encoded on n bits, called an error correction parameter, and equal to 22n mod N. For further details on the implementation of the modular multiplication, reference may be made to the document D2. Several possibilities of computation are already known. They consist in using either a software method or a specialized circuit such as the one illustrated in the document D2.
- The
coprocessor 4 a illustrated in FIG. 1 comprises: - three
shift registers -
multiplexers registers - three
registers - two
multiplication circuits - two
registers -
multiplexers - a
demultiplexer 39; -
series subtraction circuits -
series addition circuits -
delay cells - a
comparison circuit 35. - For further details on the arrangement of the different elements of the circuit with respect to one another, reference may be made to the document D2 and especially to FIGS. 2 and 3, and to the extracts from the description pertaining thereto:
page 15,line 54 topage 16,line 13, andpage 17,line 50 topage 18,line 55. - The use of the
coprocessor 4 a illustrated in FIG. 2 to carry out a Pfield(A,B)N operation where A, B, and N are encoded on n bits in m words of k bits, N is an odd number and A is subdivided into m words Ai−1 (with i as an integer index varying from 1 to m), includes the following steps: - 1—The Initialization of the Circuit
- the software computation of a parameter J0 defined by the relationship[(N*J0)+1]
mod 2k=0 and the serial loading of the parameter J0 into theregister 17, - the serial loading of B into the
register 10, the serial loading of N into theregister 12, the serial loading of A0 into theregister 16 and the serial loading of n consecutive logic zeros into theregister 11, - the initialization of the two
multiplication circuits subtraction circuits addition circuits - 2—The Setting up of a Loop Indexed by i with i Varying from 1 to m
- the parallel loading into the
register 21 of the contents of theregister 16, - the performance of the different elementary operations in order to perform the following computations:
- X(i)=S(i−1)+B*A i−1
- Y 0(i)=[X(i)*J 0]
mod 2k - Z(i)=X(i)+(N*Y 0(i))
- S(i)=Z/2k, /being the integer division,
- the subtraction, during the following iteration, of N or 0 from S depending on whether S is greater than N or not.
- 3—The Output of the Result S(k) by Means of an Output Terminal
- For farther details on the running of a method of this kind, reference may be made to the document D2 and more particularly to the following extracts: page 4-
line 41 to page 6-line 17 and page 19-lines 7 to 49. - Up till now, the use of the device shown in FIG. 1 could be used to optimize processing operations (in terms of computation time, memory size, etc.) for modular operations using a fixed data size, in this case 256 or 512 bits (depending on whether m is equal to 8 or 16). Now, cryptography requires increasingly efficient machines working at ever-higher speeds and using ever-more complex keys. The trend is thus towards the handling of pieces of data encoded on 768 or even 1024 bits. To process pieces of data of this kind, it is possible to envisage the use of larger-sized circuits by adapting the elements of the circuit to the sizes of the pieces of data. This approach may raise problems in applications such as smart card applications for which the size of the circuits is physically limited because of the differences in flexibility between the cards and the silicon substrates. Furthermore, there is a demand for the integration of increasing numbers of different functional elements on a card of this kind, and the place available for an encryption circuit is accordingly further reduced. It is therefore necessary to find solutions with which to limit the increase in the size of this circuit while at the same time enabling optimum operation for pieces of data whose size is greater than the size of the originally planned registers. This problem is not limited to modular arithmetic coprocessors that process pieces of data with a fixed size of 256 or 512 bits. It can also be transposed more generally to data-handling coprocessors that need to be used for operations on data whose size exceeds their processing capacity.
- If it is desired to carry out modular operations using operands with a size greater than what is managed by the coprocessor (namely in practice greater than the size of the registers), it is possible to use a standard processor (with 8, 16 or 32 bits), a memory and the coprocessor of FIG. 1, the coprocessor being used to perform standard (that is to say non-modular) operations of multiplication.
- It is possible, with the processor described in D2, to carry out standard operations of multiplication A*B on sizes of up to n bits by means of the following procedure.
- 1—Initialization
- the loading of k logic zeros into the
register 17, the loading of B into theregister 10, the loading of n logic zeros into theregisters register 16, - the initialization of the
multiplication circuit 19, the initialization of the addition andsubtraction circuits - 2—The Setting up of a Computation Loop with i as an Index Varying from 1 to m
- the loading of the contents of the
register 16 into theregister 21, - the performance, by a simultaneous rightward shift of the
registers register 11 with the result of the product, - the storage of the k least significant bits into the
register 12 by a k-bit rightward shift, and the storage of the n most significant bits of the result in theregister 11, - the loading of the word A, into the register16 (preferably during the running of one or more of the preceding steps).
- At the end of a procedure such as this, there is therefore the least significant bit of the result in the
register 12 and the most significant bit of the result in theregister 11. All that needs to be done is to add an output terminal connected to the output of theregisters - It is possible to perform the multiplication of a piece of data B encoded on n bits by means of a piece of data A encoded on m′ words with m′ as an integer greater than m. For this purpose, the loop is done with i varying from 1 to m′. At every m iterations, the contents of the
register 12 are output by an output terminal. - Since the coprocessor can be used to carry out standard operations of multiplication, it is possible to perform modular operations on operands encoded on a number m′*k bits with m′>m. For this purpose, the operands A, B and N are manipulated by being divided into q (q as an integer) sub-operands of n bits: A[q−1], A[q−2] . . . A[0], B[q−1], B[q−2] . . . B[0], N[q−1], N[q−2]. . . N[0]. The coprocessor is used to perform standard operations of multiplication on the sub-operands.
- The following method is used:
- 1.1—The Multiplication of B by the First Sub-operand of the Piece of Data A
-
registers - 2—A[0]*B[1] is computed. The result has the form R[1]1 R[0]1 with R[1]1 and R[0]1 being pieces of binary data encoded on n bits. R[1]1 and R[0]1 are output from the
registers - .
- .
- Q—A[0]*B[q−1] is computed. The result has the form R[1]q−1 R[0]q−1 with R[1]q−1 and R[0]q−1 being pieces of binary data encoded on n bits. R[1]q−1 and R[0]q−1 are output from the
registers - 1.2—Computation of the Result of the Multiplication of B by the First Sub-operand of A
- computation of R[1]0+R[0]1 and storage of the result referenced X[1],
- computation of c1+R[1]1+R[0]2 (c1 being the carry value of the previous calculation) and storage of the result referenced X[2],
- .
- .
- computation of cq−2+R[1]q−2+R[0]q−1 and storage of the result referenced X[q−1],
- computation of cq−1+R[1]q−1 and storage of the result referenced X[q].
- If it is assumed that R[0]0=X[0], then the result of the multiplication is the piece of data X[q] X[q−1] . . . X[1] X[0].
- It is of course possible to perform the addition operations as and when the results are output. This makes it possible to minimize the size of the memory in which the results are stored.
- 1.3—Computation of the Result of a Multiplication
- X[0]*J[0]=Y, with Y having the form Y[1] Y[0], Y[1] and Y[0] being encoded on n words, output and storage of Y[0].
- 1.4—Computation of the Result of the Multiplication of the First Sub-operand of Y by the Piece of Data N
- 1—Y[0]*N[0] is computed. The result has the form T[1]0 T[0]0 with T[1]0 and T[0]0 being binary pieces of data encoded on n bits. T[1]0 and T[0]0 are output from the
registers - 2—Y[0]*N[1] is computed. The result has the form T[1]1 T[0]1 with T[1]1 and T[0]1 encoded on n bits. T[1]1 and T[0]1 are output from the
registers - .
- .
- Q—Y[0]*N[q−1] is computed. The result has the form T[1]q−1 T[0]q−1 with T[1]q−1 and T[0]q−1 being pieces of binary data encoded on m words of k bits. T[1]q−1 and T[0]q−1 are output from the
registers - 1.5—Computation of the Result of the Multiplication of N by the First Sub-operand of the Piece of Data Y
- computation of T[1]0+T[0]1 and storage of the result referenced U[1],
- computation of c1+T[1]1+T[0]2 (c1 being the carry value of the previous computation) and storage of the result referenced U[2],
- .
- .
- computation cq−2+T[1]q−2+T[0]q−1 and storage of the result referenced U[q−1],
- computation of cq−1+T[1]q−1 and storage of the result referenced U[q].
- If it is assumed that T[0]0=U[0], then the result of the multiplication is the piece of data U[q] U[q−1] . . . U[1] U[0].
- It is of course possible to perform the addition operations as and when the results are output. This makes it possible to minimize the size of the memory in which the results are stored.
- 1.6—Computation of the Result of the Modular Multiplication of B by the First Sub-operand of the Piece of Data A
- Computation of U+X and storage of the result, referenced Z.
- The result Z of the addition has the form (c) Z[q] Z[q−1] . . . Z[1] Z[0].
- storage of S(1)=Z/2k−(N or 0).
- 2—Resumption of the Steps 1.1 to 1.6 in Considering the Second Sub-operand of the Piece of Data A and in Modifying the Step 1.2 as Here Below
- computation of R[1]0+R[0]1 and storage of the result referenced W[1],
- computation of c1+R[1]1+R[0]2 (c1 being the carry value of the previous calculation) and storage of the result referenced W[2],
- .
- .
- computation of cq−2+R[1]q−2+R[0]q−1 and storage of the result referenced W[q−1],
- computation of cq−1+R[1]q−1 and storage of the result referenced W[q].
- Then:
- computation of W+S(1) which then gives the result of the multiplication X with X having the form X[q] X[q−1] . . . X[1] X[0] and W=W[q] W[q−1] . . . W[1] W[0] with W[0]=R[0]0.
- Q—Resumption of the
Above Step 2 in Taking into Consideration the qth Sub-operand of A. - The final result of the computation is S(q)−N or 0).
- As can be seen, the method requires a certain number of exchanges of data with the exterior. These exchanges entail penalties in terms of computation time and memory space to store the results extracted from the coprocessor. Generally, the value of the coprocessors is that they use a faster clock frequency than that of the other elements that are connected to them. Hence, the value of using a coprocessor is minimized if the processing operations for which it is designed involve exchanges with circuits (standard processors, memories, etc.) that work more slowly, namely circuits to whose speeds they have to adapt during the exchanges.
- The inventor has sought to modify the coprocessor illustrated in FIG. 1 so as to improve the processing of the above operations, and more particularly so as to reduce the processing time. To do this, the inventor proposes to modify the existing device so that it makes parallel use of the
multiplication circuits - Thus, the invention relates to a device comprising:
- a first register, a second register, a third register, a fourth register and a fifth register,
- at least one input terminal to receive binary pieces of data to be stored in these registers,
- a first multiplication circuit that performs a multiplication operation between two pieces of data stored in the first and third registers,
- a second multiplication circuit that performs a multiplication operation between two pieces of data stored in the first and fourth registers,
- a first addition circuit that performs operations of addition between a piece of data stored in the second register and a piece of data produced by the first multiplication circuit,
- a second addition circuit that performs an operation of addition between a piece of data produced by the first addition circuit and a piece of data given to the second addition circuit by the second multiplication circuit,
- a delay cell to delay the supply to the second addition circuit of the piece of data given by the second multiplication circuit,
- multiplexing means that selectively supplies, to inputs of the first addition circuit, the contents of the second register or a permanent logic state, the connection of an input of the second multiplication circuit to an output of the first register, the connection of the output of the first multiplication circuit to one of the registers and the supply to the second addition circuit of a piece of data produced by the first addition circuit or a permanent logic state.
- According to one embodiment, the multiplexing means include a first multiplexer comprising two series inputs and one series output, a first input of said multiplexer being connected to an output of the second register, a second input of the multiplexer receiving a permanent logic state and the output of the multiplexer being connected to an input of the first addition circuit.
- According to one embodiment, the device further comprises a subtraction circuit, placed between the second register and the first addition circuit, that performs a subtraction operation between a piece of data stored in the second register and a piece of data stored in the fifth register, the first multiplexer comprises a third series input, said multiplexer being placed between the subtraction circuit and the first addition circuit and the third input of said multiplexer being connected to an output of the subtraction circuit.
- According to one embodiment, the device furthermore comprises a third addition circuit, series-connected with the first addition circuit, that performs addition operations between pieces of data stored in the second and fifth registers and a piece of data produced by the first multiplication circuit and multiplexing means that selectively supplies, to an input of the third addition circuit, the contents of the fifth register or a permanent logic state.
- According to one embodiment, the multiplexing means comprise a second multiplexer having a first input, this first input enabling the connection of the output of the first or third addition circuit to one of the registers to store all or a part of the pieces of data produced by addition between the pieces of data stored in the second and fifth registers and a piece of data produced by the first multiplication circuit.
- According to one embodiment, the second multiplexer comprises a second input connected to the output of the second addition circuit for the storage, in one or more of the registers, of the data produced by this second multiplication circuit.
- According to one embodiment, the third and fourth registers being used to provide pieces of data to the first and second multiplication circuits, the device comprises means to connect the output of either one of the second or fifth registers to inputs of these third and fourth registers.
- According to one embodiment, the device comprises a sixth register with series input and series output and multiplexing means to connect the output of this sixth register to inputs of the third and fourth registers.
- According to one embodiment, the device comprises a multiplexer to selectively connect the input of the third register to the output of the sixth register or to an input terminal.
- According to one embodiment, the device comprises a multiplexer having two inputs and one output, a first input of the multiplexer being connected to an input terminal to receive pieces of data from outside the device, a second input of the multiplexer being connected to the output of the sixth register for reintroducing, into said register, the pieces of data given at its output.
- According to one embodiment, the device further comprises a delay cell, placed between an output of the first addition circuit and an input of the second addition circuit, comprising multiplexing means to directly connect said first and second addition circuits, thus preventing the introduction of a delay between said circuits.
- The invention also relates to a device comprising a processor, a memory, a communications bus and a device as defined here above.
- The invention also relates to a method for the implementation of a non-modular multiplication A*B, A and B being pieces of binary data encoded in n bits, n being an integer, these pieces of data being subdivided into m words of k bits A=Am−1 . . . A0 and B=Bm−1 . . . B0, m being an even number, the method comprising the following steps:
- 1—Initialization:
- loading the pieces of data A and B into first and second n-bit registers with series input and output, and loading the words A0 and A1 into third and fourth k-bit registers with series input and parallel output,
- initializing first and second addition circuits and of first and second multiplication circuits,
- selecting a first input of a first multiplexer so that it permanently supplies logic zeros to a first series input of the first addition circuit,
- selecting an input of a second multiplexer so that the pieces of data produced by the second multiplication circuit are given with a delay of k clock strokes to a series input of the second addition circuit,
- selecting inputs of a third and fourth multiplexers so as to connect an output of the first register to series inputs of the first and second multiplication circuits.
- 2—Implementation of a Computation Loop with i as an Index Varying from 1 to m/2
- 2.1—Iteration 1:
- loading the contents of the third and fourth registers into fifth and sixth k-bit registers with parallel input and output, these outputs being connected to parallel inputs of the first and second multiplication circuits,
- performing, by simultaneous rightward shifting of the contents of the first register and of a seventh n-bit register with series input and output, multiplication operations of the words A1 and A0 by the piece of data B, the pieces of data produced by the first and second multiplication circuits being encoded on n+k bits,
- adding, in the first addition circuit, the bits produced by the first multiplication circuit with the bits given by the first multiplexer,
- storing the k first bits produced by the first multiplication circuit in an eighth n-bit register with series input and output,
- adding, in the second addition circuit, the n+k bits produced by the second multiplication circuit with the n most significant bits produced by the first multiplication circuit, these bits being complemented by k zeros,
- storing, in the eighth register, of the k first bits produced by the second addition circuit and the storage, in the seventh register, of the following n bits,
- during the above operations, transferring the words A3 and A2 into the third and fourth registers,
- selecting a second input of the first multiplexer in order to connect the output of the seventh register to the first input of the first addition circuit.
- 2j—Iteration j, j Varying from 2 to m/2−1:
- loading the contents of the third and fourth registers into the fifth and sixth registers,
- performing, by simultaneous rightward shifting of the contents of the first and seventh registers, multiplication operations of the words A2j−1 and A2j−2 by the piece of data B,
- adding, in the first addition circuit, the bits produced by the first multiplication circuit with the contents of the seventh register,
- storing the k first bits produced by the first addition circuit in the eighth register,
- adding, in the second addition circuit, of the n+k bits produced by the second multiplication circuit with the n most significant bits produced by the first addition circuit complemented by k zeros to obtain an identical size for the pieces of data that are added up,
- storing, in the eighth register, of the k first bits produced by the second addition circuit and the storage, in the seventh register, of the n following bits,
- during the above operations, the transfer of the words A2j+1 and A2j into the third and fourth registers.
- 2.m/2—Iteration m/2
- Resuming the step 2.j, apart from the transfer of words from the second register into the third and fourth registers, the n least significant bits of the result being in the eighth register and the n most significant bits of the result being in the seventh register at the end of this iteration.
- The invention will be understood more clearly and other particular features and advantages shall appear from the following description, made with reference to the appended drawings, of which:
- FIG. 1 shows a
coprocessor 4 a according to the prior art, - FIG. 2 shows an example of a structure of a circuit comprising a coprocessor,
- FIG. 3 shows an exemplary embodiment of a
coprocessor 4 b according to the invention. - FIG. 2 shows an
encryption circuit 1 comprising a processor 2 (for example an 8-bit, 16-bit or 32-bit processor), amemory 3, acoprocessor 4 and a communications bus 5. The bus 5 is used to connect theelements circuit 1 in order to exchange pieces of data and/or control signals. The memory is used to store pieces of binary data having different formats. Theprocessor 2 is used to manage the information exchanges between thememory 3, thecoprocessor 4 and the exterior. The coprocessor is a modified version of thecoprocessor 4 a shown in FIG. 1, as illustrated in FIG. 3. - FIG. 3 shows an
exemplary coprocessor 4 b according to the invention. - The circuit shown in FIG. 3 comprises:
- four
shift registers - a
multiplexer 41 comprising two series inputs and one series output. The series output of themultiplexer 41 is connected to the input of theregister 40. A first input of themultiplexer 41 is connected to afirst input terminal 42 and a second input of themultiplexer 41 is connected to the output of theregister 40. - a
multiplexer 13 comprising three series inputs and one series output. The series output of themultiplexer 13 is connected to the input of theregister 10. A first input of themultiplexer 13 is connected to asecond input terminal 43 and a second input of themultiplexer 13 is connected to the output of theregister 10. - a
multiplexer 14 comprising two series inputs and one series output. The series output of themultiplexer 14 is connected to the input of theregister 11 and a first input of themultiplexer 14 is connected to athird input terminal 44, - a
multiplexer 15 comprising three series inputs and one series output. The series output of themultiplexer 15 is connected to the input of theregister 12. A first input of themultiplexer 15 is connected to afourth input terminal 45 and a second input of the multiplexer, 15 is connected to the output of theregister 12, - three k-cell registers16, 17 and 18 comprising a series input and a parallel output,
- a
multiplexer 46 comprising two series inputs and one series output. The series output of themultiplexer 46 is connected to the input of theregister 17. A first input of themultiplexer 46 is connected to afifth input terminal 47, and a second input of themultiplexer 46 is connected to the output of theregister 40. - two
multiplication circuits - two k-bit storage registers21 and 22 comprising a parallel input and output. The input of the
register 21 is connected to the output of theregister 16. The output of theregister 21 is connected to the parallel input of themultiplication circuit 19 and the output of theregister 22 is connected to the parallel input of themultiplication circuit 20, - a
multiplexer 23 comprising two parallel inputs and one parallel output. The first input of themultiplexer 23 is connected to the output of theregister 17 and a second input of themultiplexer 23 is connected to the output of theregister 18. The output of themultiplexer 23 is connected to the input of theregister 22. - two
multiplexers multiplexer 24 is connected to the input of theregister 16. A first input of themultiplexer 24 is connected to the output of theregister 40. The output of themultiplexer 25 is connected to the series input of themultiplication circuit 19. A first input of themultiplexer 25 receives a logic zero. - a
multiplexer 48 comprising four series inputs and one series output. The output of themultiplexer 48 is connected to the series input of themultiplication circuit 20 and a first input of this multiplexer receives a logic zero, -
subtraction circuits circuit 27 is connected to the output of theregister 10. The output of thecircuit 27 is connected to each of the second inputs of themultiplexers output terminal 49 and to a fourth input of themultiplexer 48. - a
multiplexer 50 comprising two series inputs and one series output. The output of themultiplexer 50 is connected to a first input of thecircuit 28. A first input of themultiplexer 50 is connected to the output of theregister 11 and a second input of this multiplexer receives a logic zero. - three
addition circuits circuit 30 is connected to the output of themultiplication circuit 19 and a second input of this circuit is connected to the output of thesubtraction circuit 28. The output of thecircuit 30 is connected firstly to a second input of themultiplexer 48 and, secondly, to a first input of thecircuit 51. The output of thecircuit 31 is connected to a first input of thecircuit 29. - a
multiplexer 52 comprising two series inputs and one series output. The output of themultiplexer 52 is connected to a second input of theaddition circuit 51. A first input of themultiplexer 52 is connected to the output of theregister 12 and a second input of this multiplexer receives a logic zero. - a
multiplexer 53 comprising three series inputs and one series output. The series output of themultiplexer 53 is connected to a first input of theaddition circuit 31 and a first input of this multiplexer is connected to the output of theaddition circuit 51. The third input of the multiplexer receives a logic zero. -
delay cells registers cell 32 is connected firstly to the third input of themultiplexer 48 and secondly to the input of thecell 33. The output of thecell 33 is connected to a second input of thecircuit 29. The input of thecell 34 is connected to the output of theaddition circuit 51 and the output of thecell 34 is connected to the second input of themultiplexer 53, - a
comparison circuit 35 comprising two series inputs and two outputs. A first input of thecircuit 35 is connected to the output of thecircuit 31 and a second input of thecircuit 35 is connected to the output of thecircuit 29. - two
multiplexers multiplexers circuit 35. The output of themultiplexer 36 is connected to a second input of thecircuit 27 and the output of themultiplexer 37 is connected to a second input of thecircuit 28, - a
multiplexer 38 comprising two series inputs and one series output. A first input of themultiplexer 38 receives a logic “one”. A second input of themultiplexer 38 is connected to the output of theregister 12. The output of themultiplexer 38 is connected, firstly, to the input of thecell 32 and, secondly, to second inputs of themultiplexers - a
demultiplexer 39 comprising a series input and two series outputs. The input of thedemultiplexer 39 is connected to the output of thecircuit 20 and a first output of thedemultiplexer 39 is connected to the input of theregister 18, - a
delay cell 54 for the delaying, by k cycle periods, the propagation of pieces of binary data (these cells being typically k-bit shift registers), this cell comprising one series input and one series output. The input of thecell 54 is connected to a second output of thedemultiplexer 39. - a
multiplexer 55 comprising two series inputs and one series output. A first input of themultiplexer 55 is connected to the second output of thedemultiplexer 39. A second input of themultiplexer 55 is connected to the output of thecell 54 and the output of themultiplexer 55 is connected to a second input of theaddition circuit 31, - a
multiplexer 56 comprising two series inputs and one series output. A first input of themultiplexer 56 is connected to the output of theaddition circuit 51, namely to the output of the second of the two series-connectedaddition circuits multiplexer 56 is connected to the output of theaddition circuit 31. The output of this multiplexer is connected to third inputs of themultiplexers multiplexer 14, - two
output terminals registers - As shall be seen further below, this exemplary coprocessor, made according to the invention could undergo modifications without departing from the framework of the invention.
- With regard to the output and input terminals, it could be chosen to use distinct terminals but these could also be one or more input/output terminals common to several elements of the coprocessor. One advantage in using distinct terminals is that it is possible to receive and/or give pieces of data from and/or to elements external to the coprocessor (such as the
processor 2 for example) in parallel, thus reducing the period of the exchanges between the circuit and the external elements. - Furthermore, with regard to the elements of the
circuit 4 b, the above description must also be understood as being a description that is functional rather than structural. In other words, it is important to be able to make the connections as indicated between elements fulfilling the functions listed (multiplexing, addition, subtraction, etc.). In particular, the multiplexers could be combined with one another and then have a greater number of inputs, outputs and selection inputs. With regard to themultiplexers circuits - As compared with the device of FIG. 1, the device of FIG. 3 includes the same elements, some added elements and modifications in the connections of the elements with one another. In particular, the device of FIG. 3 has a
supplementary register 40, asupplementary addition circuit 51, asupplementary delay cell 54 andmultiplexers multiplexer 14 has its first input connected to aninput terminal 44. Finally, themultiplexer 48 has a supplementary input so that the pieces of data stored in theregister 10 can be given at the series input of themultiplication circuit 20. - The
register 40 enables the storage in the coprocessor of an n-bit operand A, enabling the exchanges between the coprocessor and the exterior to be limited and therefore providing for a gain in time. Themultiplexer 46 enables the loading of theregister 17 with bits given by theregister 40. This enables the parallel use of theaddition circuits registers - It is possible, if necessary, to do without the
register 40 and themultiplexer 46 and carry out the parallel loading from the exterior of the k-bit words in order to implement parallel operation of themultiplication circuits registers registers register multiplexers - The
multiplexer 41 makes it possible, by looping the output of theregister 40 to its input, to keep the contents of this register permanently intact when words are given, by shifting, at the output of this register. Thus, if the working of the coprocessor is interrupted, it is possible to resume the course of operations at an already performed step without having to reload the register from the exterior of the coprocessor. Care will be taken simply to carry out the shifts that are necessary in the register to return to the state corresponding to the step to be performed (if the words contained in theregister 40 are always identical, their organization develops as a function of the shifts made). - The
multiplexer 50 enables the setting at zero of the second input of theaddition circuit 30 independently of the contents of theregister 11. This makes it possible to make theaddition circuit 30 transparent to the flow of bits produced at the output of themultiplication circuit 19. Indeed, in this case, the bits at the output of theaddition circuit 30 are identical to those received at the first input of this circuit. - It will be noted that the
multiplexer 50 can be placed directly between the output of thesubtraction circuit 28 and the first input of theaddition circuit 30. The multiplexer used then will be a three-input multiplexer having a first input connected to the output of theregister 11, a second input connected to the output of thesubtraction circuit 28 and a third input receiving a logic zero. By thus getting rid of thesubtraction circuit 28, it is possible to reduce the total consumption of thecoprocessor 4 b by planning to cut off thecircuit 28 and if necessary other circuits (such as the multiplexer 37) that are functionally attached to it when thecircuit 28 is not used. Another approach includes using a two-input multiplexer 50 as shown in FIG. 3 and in selecting the first input of themultiplexer 37 so that thesubtraction circuit 28 does not modify the bits given to theaddition circuit 30 by theregister 11. - The fourth input of the
multiplexer 48 could be connected directly to the output of the register 10 (which is equivalent, from a functional point of view, to connecting it to the output of thesubtraction circuit 27 if the latter receives a logic zero constantly at its second input). - The
addition circuit 51 enables the implementation of operations of the type A*B+C+D, the pieces of data C and D being given by theregisters - The
multiplexer 52 makes it possible firstly to give the contents of theregister 12 to theaddition circuit 51 to implement said operation or secondly to permanently give it a logic zero. This latter feature makes it possible to make theaddition circuit 51 transparent to the bits produced by themultiplication circuit 19 andaddition circuit 30. This makes it possible firstly to carry out the operation A*B+C in a simple way (there is then no need to load theregister 12 with n logic zeros) and, secondly, to carry out, without excessive complication, the usual operations for which the device of FIG. 1 is planned (it is necessary only to provide for an additional selection control signal for the multiplexer 52). - It will be noted that it would be possible to place the
addition circuit 51 between the output of themultiplication circuit 19 and the first input of theaddition circuit 30 and connect the first input of themultiplexer 56 to the output of theaddition circuit 30. - It would also be possible to place the
addition circuit 51 between the output of theregister 11 and the second input of theaddition circuit 30. Just as here above, the first input of themultiplexer 56 is then connected to the output of theaddition circuit 30. - It is also possible to place the
addition circuit 51 between the output of themultiplexer 53 and the first input of theaddition circuit 31. - It is again possible to place the
addition circuit 51 downline from theaddition circuit 31. This makes the management of the A*B+C+D type operations more complicated inasmuch as it may then become necessary, downline from theaddition circuit 31, to add pieces of data whose significance is lower than that of the pieces of data given at output of theaddition circuit 31. It is then necessary to provide for a means to set up a delay, downline from theaddition circuit 31, of the pieces of data produced by thiscircuit 31 or, upline, for the providing of pieces of data to thiscircuit 31. It will be noted that this positioning of theaddition circuit 51 downline from theaddition circuit 31 is not an encumbrance as regards the performance of simple multiplication on pieces of n-bit data since, in this case, thecircuit 51 is not used (as shall be seen further below). - As shall be seen further below, the
addition circuit 51 is used to carry out operations on pieces of data with a size greater than n, these operations requiring handling operations of the A*B+C+D type. It is possible to do without these operations if it is desired simply to obtain a gain in time with regard to operations relating to pieces of data whose size is smaller than or equal to n, the invention relating above all to the simultaneous use of the twomultiplication circuits - The
multiplexer 53 enables the output of theaddition circuit 51 to be connected directly to the first input of theaddition circuit 31 without delaying the transfer of data between these circuits through thedelay cell 34. It also makes it possible to provide logic zeros to theaddition circuit 31. This makes it possible, as shall be seen further below, if necessary to complement the pieces of n-bit data given by theaddition circuits addition circuit 31 with k logic zeros so that the size of these pieces of data corresponds to (n+k) bits of the pieces of data produced by themultiplication circuit 20. - It is possible if necessary to use a two-input multiplexer and not circumvent the
delay cell 34 but this will be done to the detriment of the computation time owing to the systematic crossing of thisdelay cell 34. It is also possible functionally, during the operations, to replace the third input of themultiplexer 53 by a selection of the first inputs of themultiplexers multiplexers multiplexer 53 then receives logic zeros. - The
delay cell 54 enables the flow of the bits produced by themultiplication circuit 20 to be delayed by k cycle periods. - It is thus possible to add up the bits produced by the
multiplication circuits multiplication circuit 19 during an operation. Themultiplexer 55 enables said operation to be implemented while at the same time enabling the use of the improved coprocessor of FIG. 3 to carry out the operations implemented by the coprocessor of FIG. 1 (it is enough, for this purpose, to connect the second output of thedemultiplexer 39 directly to the second input of the addition circuit 31). - If the
delay cell 34 is not short-circuited by means of themultiplexer 53, thecell 54 will be modified so as to obtain a delay of 2*k cycle periods. - The
multiplexer 56 enables the storage, in one of theregisters 10 to 12, of the bits produced by the addition circuits 30 (if the second input of theaddition circuit 51 receives a logic zero, in which case thecircuit 51 is transparent to the bits produced by the circuit 30) or 51. - It is also possible to directly store the bits produced by the
multiplication circuit 19 by selecting themultiplexers multiplication circuit 19. - It is possible, by means of the invention, to carry out a non-modular multiplication without taking account of the elements of the
device 4 a used in the context of a Pfield operation. It is therefore possible, in such a case, to consider limiting the consumption of thecircuit 4 by inhibiting the working of the elements that are used solely in the context of a Pfield operation but are not used in the computation methods described here below. - Furthermore, the invention enables operations to be performed by the parallel use of the two
multiplication circuits multiplication circuits circuit 19 and the most significant word is given to thecircuit 20. - 1. Non-modular Multiplication on Pieces of Data with a Size Smaller than or Equal to n Bits
- Let us assume that it is sought to carry out a standard multiplication A*B, the pieces of binary data A and B being encoded on n bits. We shall consider the subdivision of the pieces of data A and B into m words of k bits. Let A=Am−1 . . . A0 and B=Bm−1 . . . B0.
- It is assumed that m is a multiple of two. If this is not so, registers of (m+1)*k bits will be used. It is also possible to control the
multiplexer 48 so as to give logic zeros at the series input of themultiplication circuit 20 when the mth word of k bits of the piece of data A is given at the parallel input of the multiplication circuit 19 (themultiplication circuit 20 then produces n+k logic zeros). - Similarly, the pieces of data A and B could have a size smaller than n. If necessary, the most significant bits of these pieces of data will be complemented by zeros to obtain pieces of data of a size equal to n bits, if it is desired to have only one control program for the processor. It is also possible to provide for a sequencing of the conmmands of the device comprising a variable number of computation loops, enabling the processing time to be reduced if the pieces of data are encoded on a number of bits smaller than n. It is also possible to use operands of different sizes, by complementing the smaller-sized operand with logic zeros or by adapting the control program of the coprocessor.
- The following procedure is used:
- 1—Initialization:
- the loading in the
registers registers - the initialization of the
addition circuits multiplication circuits 19 and 20 (and possibly of the subtraction circuits if they enter the data processing paths; in this case themultiplexers subtraction circuits - the selection of the second inputs of the
multiplexers addition circuits - the selection of the second input of the
multiplexer 55 so that the pieces of data produced by themultiplexer circuit 20 are given with a delay of k clock strokes to theaddition circuit 31, - the selection of the second and fourth inputs of the
multiplexers - 2—Implementation of a Computation Loop with i as an Index Varying from 1 to m/2 2.1—Iteration 1:
- the loading of the contents of the
registers registers - the performance, by simultaneous rightward shifting of the contents of the
registers multiplication circuits - the addition, in the
addition circuit 30, of the bits produced by themultiplication circuit 19 with the bits given by the multiplexer 50 (in other words with a piece of data equal to zero) and addition with a piece of data equal to zero in theaddition circuit 51, - the storage of the k first bits produced by the
multiplication circuit 19 in theregister 12, the first input of themultiplexer 56 being selected, - the addition, in the
addition circuit 31, of the n+k bits produced by themultiplication circuit 20 with the n most significant bits produced by themultiplication circuit 19, these bits being complemented by k zeros, and the selection of the second input of themultiplexer 56, - the storage in the
register 12 of the k first bits produced by theaddition circuit 31 and the storage in theregister 11 of the n following bits, - during the above operations, the transfer of the words A3 and A2 into the
registers - the selection of the first input of the
multiplexer 50 so as to connect the output of theregister 11 to the second input of theaddition circuit 30. - 2j—Iteration j, j Varying from 2 to m/2−1:
- the loading of the contents of the
registers registers - the performance, by simultaneous rightward shifting of the
registers multiplication circuits - the addition, in the
addition circuit 30, of the bits produced by themultiplication circuit 19 with the contents of theregister 11, - the storage of the k first bits produced by the
addition circuit 30 in theregister 12, the second input of themultiplexer 56 being selected and theaddition circuit 51 being transparent to the bits given by theaddition circuit 30, - the addition, in the
addition circuit 31, of the n+k bits produced by themultiplication circuit 20 with the n most significant bits produced by the addition circuit 30 (complemented by k zeros to ensure an identical size of the added data) and the selection of the first input of themultiplexer 56, - the storage in the
register 12 of the k first bits produced by theaddition circuit 31 and the storage in theregister 11 of the n following bits, - during the above operations, the transfer of the words A2j+1 and A2j into the
registers - 2.m/2—Iteration m/2
- The resumption of the step 2j, apart from the transfer of words from the
register 40 into theregisters 16 and 16 (this transfer is unnecessary). - At the end of this iteration, the n least significant bits of the result are in the
register 12 and the n most significant bits are in theregister 11. - The gain in time to perform the operation is equal to 50% with respect to the device shown in FIG. 1 (taking account of the computation steps proper, the initialization step having an identical duration for the devices of FIGS. 1 and 3).
- It will be noted that the
addition circuit 51 is not used in the implementation of the method. It is therefore possible however in a first stage to improve the prior art device by implementing the device of FIG. 3, except for theaddition circuit 51 and themultiplexer 52. It is also possible, as mentioned already here above, to place theaddition circuit 51 between the output of themultiplication circuit 19 and the first input of theaddition circuit 30. It will then be appropriate to connect the first input of themultiplexer 56 to the output of thecircuit 30 if it is desired to carry out operations of the A*B+C+D type as illustrated here below. - It is also possible, without modifying the device of FIG. 3, to exchange the roles of the
addition circuit registers - It is also possible, as the case may be, to place the
addition circuit 51 between the output of theregister 11 and theaddition circuit 30. This introduces no difference in the running of the method described here above but it will be appropriate then to connect the first input of themultiplexer 56 to the output of theaddition circuit 30. - Furthermore, if it is desired simply to improve the prior art device in the performance of multiplication on pieces of data with a size n, it will also be possible to eliminate the
register 40 and use theregister 12 to store the piece of data A. The illustrated circuit will then be modified by connecting the output of theregister 12 to the first and second inputs of themultiplexers registers register 12. It is therefore possible to limit the operation to using theregister 12, the words of the piece of data A being gradually replaced by the least significant words of the final result of the operation. - It will be noted that, in the parallel performance of multiplication operations as described, the
multiplication circuit 19 placed upline with respect to themultiplication circuit 20 produces the least significant bits. - It is possible if necessary to operate in reverse. In this case, the data elements produced by the
multiplication circuit 19 are delayed in the cell 34 (which then functionally replaces the cell 54) and the k first and k last logic zeros are respectively given to the first and second inputs of theaddition circuit 31 to obtain a correspondence between the place values of the pieces of data received by thisaddition circuit 31. Since themultiplexer 56 has no reason to exist, it can be removed. - A problem arises for the addition of the contents of the
register 11 to the pieces of data produced by themultiplication circuits multiplication circuit 19 between the output of this circuit and the first input of theaddition circuit 30. In this case, it is necessary to short-circuit the cell 34 (or else to use thecell 54 to delay the pieces of data at output of the multiplication circuit 20) so that the bits having identical place values are added up at the appropriate time in theaddition circuit 31. Another approach is to position theaddition circuit 51 downline with respect to theaddition circuit 31 and to provide the pieces of data of theregister 11 to thecircuit 51 by means of appropriate multiplexing means. In this case, theaddition circuit 51 replaces theaddition circuit 30 for the addition of pieces of data produced by the multiplication circuits and the contents of theregister 11. It is then necessary to add thisaddition circuit 51 to the known circuit whereas, in the example described in detail, thiscircuit 51 is not used and may be eliminated if it is not desired to process pieces of data of a size greater than n. We shall not go into detail into the reversal of the role of the multiplication circuit. The implementing of this reversal entails no difficulty to those skilled in the art, given the indications given here above and the description of the device illustrated with reference to FIG. 3. - Finally it will be noted that, to make the
cell 54, it is possible, inasmuch as thecell 34 has not been used elsewhere, to use thiscell 34. In this case, it is enough firstly to have available a multiplexer at input of the cell 34 (to enable the connection of this input to the second output of thedemultiplexer 39 or to the output of one of theaddition circuits multiplexer 55 or to the second input of the multiplexer 53). - 2. Multiplication on Pieces of Data with a Size Greater than n
- Let it be assumed that it is sought to make a standard multiplication A*B=C, the binary pieces of data A and B being encoded on a size greater than n. As an example, we shall assume pieces of data encoded on 2*n bits, the result C being encoded on 4*n bits. A, B and C have the form A[1] A[0], B[1] B[0] and C[3] C[2] C[1] C[0], the operands A[1] to C[0] being encoded on n bits.
- As above, the method described shall be extended without difficulty to operands of different sizes.
- By using the device of FIG. 1, the following procedure is used:
- 1—the loading of the piece of data B[0] into the
register 10. - 2—the computation of A[0]*B[0]=R[1] C[0], R[1] and C[0] being encoded in n bits and stored outside the coprocessor at the end of the computation.
- 3—the computation of A[1]*B[0]=R′[1] R′[0], R′[1] and R′[0] being encoded in n bits and stored outside the coprocessor at the end of the computation.
- 4—the loading of the piece of data B[1] in the
register 10. - 5—the computation of A[0]*B[1]=R″[1] R″[0], R″[1] and R″[0] being encoded in n bits and stored outside the coprocessor at the end of the computation.
- 6—the computation of A[1]*B[1]=R′″[1] R′″[0], R′″[1] and R′″[0] being encoded in n bits and stored outside the coprocessor at the end of the computation.
- The subsequent steps are performed outside the coprocessor, for example by means of a processor or a dedicated wired circuit.
- 7—the computation of R′[1] R′[0]+R[1]=T[1] T[0], T[1] and T[0] being encoded in n bits.
- 8—the computation of T[1] T[0]+R″[1] R″[0]=T′[1] C[1], T′[1] and C[1] being encoded in n bits.
- 9—the computation of R′″[1] R′″[0]+T′[1]=C[3] C[2], C[3] C[2] being encoded in n bits.
- By using the device of FIG. 3, the same operation can be done as follows:
- 1—the loading of the pieces of data A[0] and B[0] in the
registers - 2—the computation of A[0]*B[0]+0+0=R[1] C[0] (according to the method described here above) and output of the pieces of data C[0] from the
register 12, the piece of data R[1] being stored in theregister 11. - 3—the loading of the piece of data A[1] in the register 40 (this can be done during the shifting of A[0] to the
registers 16 and 17). - 4—the computation of A[1]=B[0]*R[1]+0=R′[1] R′[0], the selection of the
multiplexer 50 having been modified so as to connect the output of theregister 11 to the second input of theaddition circuit 30 during the first iteration of the loop, and the output of the piece of data R′[1] from theregister 11, the piece of data R′[0] being stored in theregister 12. - 5—the loading of the pieces of data A[0] and B[1] in the
registers 40 and 10 (these may be done during the last iteration of the computation of the step 4). - 6—the computation of A[0]*B[1]+0+R′[0]=R″[1] C[1], the selection of the
multiplexer 52 having been modified so as to connect the output of theregister 12 to the second input of theaddition circuit 51 during the first iteration of the loop, and the output of the piece of data C[1] from theregister 12, the piece of data R″[1] being stored in theregister 11. - 7—the loading of the pieces of data A[1] and R′[1] in the
registers - 8—the computation of A[1]*B[1]+R″[1]+R′[1]=C[3] C[2].
- The gain in computation time is 50% for the
steps - With the device according to the invention, the only piece of data that is output, apart from the sub-operands of the result, is the intermediate result R′[1]. It is possible, if necessary, to output R′[1] directly at the output of the
multiplexer 56 by connecting this output to an output terminal. This enables this transfer to be done during the computation of thestep 4. - The method described is given by way of an example. It is possible to implement other methods while continuing to benefit from the advantages obtained through the simultaneous use of two
multiplication circuits - Thus, it is possible for example to use the method of Karatsuba described here below:
- 1—the comparison firstly of the n bits of A[0] and A[1] and secondly of the n bits of B[0] and B[1] (it will be assumed that A[1] and B[1] encoded on n bits are greater than A[0] and B[0]).
- 2—the computation of A[0]*B[0] in the coprocessor and the output of the result referenced C[0] and encoded on 2*n bits.
- 3—the computation of A[1]*B[1] in the coprocessor and the output of the result referenced C[1] encoded on 2*n bits.
- 4—the computation of (B[1]−B[0]) and (A[1]−A[0]) on n bits.
- 5—the computation of (B[1]—B[0])*(A[1]−A[0]) in the coprocessor and the output of the result C[3] encoded on 2*n bits.
- 6—the computation of 2*C[0]+2*C[1]+C[3].
- Only three operations of multiplication are done instead of four (the multiplication by two is obtained directly in binary logic by the shifting of the pieces of data) and the operation can be faster (depending on the difference of the clock frequencies used by the coprocessor and the processor). This, however, calls for exchanges between the coprocessor and the exterior, and memory space to store the intermediate results. It will be noted that the additions could possibly be obtained by using the resources of the coprocessor (registers and addition circuits).
- For the implementation of the multiplication of pieces of data of a size greater than 2*n, the method requires exchanges between the coprocessor and the exterior since it is then necessary to be able to add at least three pieces of data of the same place value. The fact nevertheless is that the coprocessor according to the invention remains advantageous in the implementation of multiplication operations. Furthermore, if we assume that n=512, the possibility of performing computations on 1024 bits appears to date to be generally sufficient given the goals of security in civilian applications of encryption.
- 3. Modular Operations on Pieces of Data with a Size of n: Example 1
- The coprocessor illustrated in FIG. 3 makes it possible to perform modular operations on operands encoded on a number m′*k bits with m′ greater than or equal to m more quickly than is the case with the device of FIG. 1.
- For this purpose, the operands A, B and N are manipulated by being divided into q (q as an integer) sub-operands of n bits: A[q−1] A[q2] . . . A[0], B[q−1] B[q−2] . . . B[0], N[q−1] N[q−2] . . . N[0]. It will be noted that A and B could in practice have different sizes. It is assumed here, for the sake of simplicity, that A and B have the same size. It will also be assumed that the piece of data A comprises an even number of sub-operands (if this is not so, it may be complemented by bits of higher place value that are set at zero so as to obtain an even number). The step-by-step operation of the elements of the coprocessor shall not be described in detail as it does not raise any problem to those skilled in the art who know the document D2.
- It will be noted that in the above method the
addition circuit 51 is not used. - The operation A*B+C is therefore performed with the utmost efficiency by means of the resources of the coprocessors.
- The following is the method:
- 1.1—The Multiplication of B by the First Sub-operand of A
- 1—A[0]*B[0]+0 is computed. The result has the form R[1]0 R[0]0 with R[1]0 and R[0]0 being pieces of binary data encoded on n bits. R[0]0=X[0] is output from the
register 12 and it is stored in a memory (for example thememory 3 of the circuit 1) - 2—A[0]*B[1]+R[1]0 is computed. The result has the form R[1]1 R[0]1 with R[1]1 and R[0]1 being pieces of binary data encoded on n bits. R[0]1=X[1] is output from the
register 12 and it is stored in a memory. - .
- .
- Q—A[0]*B[q−1]+R[1]q−2 is computed. The result has the form R[1]q−1 R[0]q−1 with R[1]q−1 and R[0]q−1 being pieces of binary data encoded on n bits. R[1]q−1=X[q] and R[0]q−1=X[q−1] are output from the
registers - The result of the multiplication is the piece of data X[q] X[q−1] . . . X[1] X[0].
- 1.2—The Computation of the Result of a Multiplication
- X[0]*J[0]=Y with Y having the form Y[1] Y[0], Y being a piece of binary data encoded on 2*n bits, output and storage of Y[0].
- 1.3—The Computation of the Result of the Multiplication of the First Sub-operand of Y by the Piece of Data N
- 1—Y[0]*N[0]+X[0] is computed. The result has the form T[1]0 T[0]0 with T[1]0 and T[0]0 being pieces of binary data encoded on n bits. T[0]0=U[0] is output from the
register 12 and it is stored in a memory. - 2—Y[0]*N[1]+T[1]0 is computed. The result has the form T[1]1 T[0]1 with T[1]1 and T[0]1 being pieces of binary data encoded on n bits. T[0]1=U[1] is output from the
register 12 and it is stored in a memory. - .
- .
- Q—Y[0]*N[q−1]+T[1]q−2 is computed. The result has the form T[1]q−1 T[0]q−1 with T[1]q−1 and T[0]q−1 being pieces of binary data encoded on n bits. T[1]q−1=U[q] and T[1]q−1=U[q−1] are output from the
registers - The result of the multiplication is the piece of data U[q] U[q−1] . . . U[1] U[0].
- 1.4—The Computation of the Result of the Modular Multiplication of B by the First Sub-operand of A
- U+X is computed and the result referenced Z is stored.
- The result Z of the addition has the form (c) Z[q] Z[q−1] . . . Z[1] Z[0].
- S(1)=Z/2k−(N or 0) is stored.
- 2—Resumption of the Steps 1.1 to 1.4 in Considering the Second Sub-operand of A by Modifying the Step 1.1 as here Below
- 1—A[0]*B[0] is computed. The result has the form R[1]0 R[0]0 with R[1]0 and R[0]0 being pieces of binary data encoded on n bits. R[0]0=W[0] is output from the
register 12 and it is stored in a memory (for example thememory 3 of the circuit 1) - 2—A[1]*B[1]+R[1]0 is computed. The result has the form R[1]1 R[0]1 with R[1]1 and R[0]1 being pieces of binary data encoded on n bits. R[0]1=W[1] is output from the
register 12 and it is stored in a memory. - .
- .
- Q—A[1]*B[q−1]+R[1]q−2 is computed. The result has the form R[1]q−1 R[0]q−1 with R[1]q−1 and R[0]q−1 being pieces of binary data encoded on n bits. R[1]q−1=W[q] and R[0]q−1=W[q−1] are output from the
registers - W+S(1) is computed, giving the result of the multiplication X with X having the form X[q] X[q−1] . . . X[1] X[0].
- Q—Resumption of the
Above Step 2 in Taking into Consideration the Qth Sub-operand of A - The final result of the computation is S(q)−(N or 0).
- Gain in Computation Time
- The computation time is measured in terms of number of clock cycles of the coprocessor.
- The multiplication of the contents of the
register 10 by the contents of theregisters - Method According to the Prior Art
- Computation of the values Ai*B: q·[q·m (n+2.k)]=n·q2·(m+2)
- Computation of the values W: q·q·n=n·q2
- Computation of the values X: (q−1)·(q+1)·n=n ·(q2 −1)
- Computation of the values Y: q·m·(
n+ 2·k)=n·q·(m+2) - Computation of the values T: q·[q·m·(n+2.k)]=n·q2·(m+2)
- Computation of the values U: q·q·n=n·q2
- Computation of the values Z: q·(q+1)·n=n·(q2+q)
- The number of cycles needed to perform the computations is given by the following formula: 2·n·(m+4)·q2+n·(m+3)·q−n.
- Method Using the Invention
- Computation of the values A*B+R: q·[q·m/2·(n+k)]=n/2·(m+1)·q2
- Computation of the values X: (q−1)·(q+1)·n=n·q2−n
- Computation of the values Y: q·m/2·(n+k)=n/2·(m+1)·q
- Computation of the values T: q·[q·m/2·(n+k)]=n/2·(m−1)·q2
- Computation of the values Z: q·(q+1)·n=n·q2+n·q
- The number of cycles needed to perform the computations is given by the following formula:
- n/2·(2m+4)·q2+n/2·(m+3)·q−n.
- Let it be assumed that q=3 and k=32.
- For m=8 (n=256), the first method requires 63,488 cycles and the second method requires 27,136 cycles, giving a gain of 57.26%.
- For m=16 (n=512), the first method requires 212,992 cycles and the second method requires 97,024 cycles, giving a gain of 54.45%.
- It will be observed that these computations do not take account of the exchanges of data between the coprocessor and the exterior, these exchanges being far more numerous in the implementation of the first method. The time needed to perform these exchanges depends on the clock frequency used to set the rate of operation of the external elements (such as the
processor 2, thememory 3 and the communications bus 5 of the circuit 1), this frequency being in practice generally lower than the clock frequency of the coprocessor. - The capacity of the coprocessor according to the invention to implement operations of the A*B+C type also has other advantages, for example in the implementation of the RSA encryption method.
- RSA Method
- The RSA encryption method makes it necessary to perform computations of the C=MD mod N type with M as a message to be encrypted or decrypted, N as a modulo value such that N=P*Q, with P and Q being prime integers and D such that D*E=mod ((P−1)*(Q−1)), with E as a known quantity.
- An algorithm to perform this computation is the following:
- A=(M mod P)D mod(P−1) mod P
- B=(M mod Q)D mod(Q−1) mod Q
- U=Q−1 mod P
- If A<B mod P then
- C=(((A+P−(B mod P))*U)mod P)*Q+B
- Else
- C=(((A−(B mod P))*U) mod P)*Q+B
- The invention enables the computation of C which has the form X*Q+B by loading B into the
register 11. It will be noted in this respect that it is particularly useful to provide for amultiplexer 14 having one input connected to an input terminal of the circuit. Indeed, in the method explained here above dealing with multiplication for pieces of data whose size is greater than that of the registers, the contents of theregister 11 are produced within the circuit. An input terminal is therefore not used in this case to load the contents of theregister 11. If it is desired to compute A*B+C, with C having any unspecified value, it should be possible to enter the piece of data C into theregister 11 by connecting one input of the multiplexer to an input terminal. - 4. Modular Operations on Pieces of Data with a Size Greater than n: Example 2.
- In this example, the
addition circuit 51 is used. - The operation A*B+C+D is performed.
- It is considered, in the same way as earlier, that the operands A, B and N take the form of q n-bit sub-operands: A[q−1] A[q−2] . . . A[0], B[q−1] B[q−2] . . . B[0], N[q−1] N[q−2] . . . N[0].
- The following is the method:
- 1.1—The Multiplication of B by the First Sub-operand of A
- 1—A[0]*B[0]+0+0 is computed. The result has the form R[1]0 R[0]0 with R[1]0 and R[0]0 being pieces of binary data encoded on n bits. R[0]0=X[0] is output from the
register 12 and it is stored in a memory (for example thememory 3 of the circuit 1) - 2—A[0]*B[1]+R[1]0+0 is computed. The result has the form R[1]1 R[0]1 with R[1]1 and R[0]1 being pieces of binary data encoded on n bits. R[0]1=X[1] is output from the
register 12 and it is stored in a memory. - .
- .
- Q—A[0]*B[q−1]+R[1]q−2+0 is computed. The result has the form R[1]q−1 R[0]q−1 with R[1]q−1 and R[0]q−1 being pieces of binary data encoded on n bits. R[1]q−1=X[q] and R[0]q−1=X[q−1] are output from the
registers - The result of the multiplication is the piece of data X[q] X[q−1] . . . X[1] X[0].
- 1.2—The Computation of the Result of an Operation of Multiplication
- X[0]*J[0]=Y is computed with Y having the form Y[1] Y[0], Y being a piece of data encoded on 2*n bits, output and storage of Y[0].
- 1.3—The Computation of the Result of the Multiplication of the First Sub-operand of Y by the Piece of Data N
- 1—Y[0]*N[0]+X[0]+0 is computed. The result has the form T[1]0 T[0]0 with T[1]0 and T[0]0 being pieces of binary data encoded on n bits. T[0]0=Z[0] is output from the
register 12 and it is stored in a memory. - 2—Y[0]*N[1]+X[1]+T[1]0 is computed. The result has the form T[1]1 T[0]1 with T[1]1 and T[0]1 being pieces of binary data encoded on n bits. T[0]1=Z[1] is output from the
register 12 and it is stored in a memory. - .
- .
- .
- Q—Y[0]*N[q−1]+X[q−1]+T[1]q−2 is computed. The result has the form T[1]q−1 T[0]q−1 with T[1]q−1 and T[0]q−1 being pieces of binary data encoded on n bits. T[1]q−1 and T[0]q−1=Z[q−1] is output from the
registers - X[q]+T[1]q−1=Z[q] is computed.
- The result Z of the multiplication is the piece of data Z[q] Z[q−1] . . . Z[1] Z[0].
- Storage of S1=Z/2k−(N or 0)=S1[q] S1[q−1] . . . S1[1] S1[0].
- 2—Resumption of the Steps 1.1 to 1.3 in Considering the Second Sub-operand of A by Modifying the Step 1.1 as Here Below
- 1—A[1]*B[0]+S1[0]+0 is computed. The result has the form R[1]0 R[0]0 with R[1]0 and R[0]0 being pieces of binary data encoded on n bits. R[0]0=X[0] is output from the
register 12 and it is stored in a memory (for example thememory 3 of the circuit 1) - 2—A[1]*B[1]+S1[1]+R[1]0 is computed. The result has the form R[1]1 R[0]1 with R[1]1 and R[0]1 being pieces of binary data encoded on n bits. R[0]1=X[1] is output from the
register 12 and it is stored in a memory. - .
- .
- Q—A[1]*B[q−1]+S1[q−1]+R1q−2 is computed. The result has the form R[1]q−1 R[0]q−1 with R[1]q−1 and R[0]q−1 being pieces of binary data encoded on n bits. R[1]q−1 and R[0]q−1=X[q−1] is output from the
registers - X[q]=R[1]q−1+S1[q] is computed, giving the result of the multiplication X with X having the form X[q] X[q−1] . . . X[1] X[0].
- .
- .
- .
- Q—Resumption of the Above Steps in Taking into Consideration the gth Sub-operand of A
- The final result of the computation is S(q)−(N or 0).
- Gain in computation time
- The computation time is measured in terms of number of clock cycles of the coprocessor.
- The multiplication of the contents of the
register 10 by the contents of theregisters n+ 2·k) cycles (in not taking account of the loading of the registers and the initialization of the circuit). Furthermore, the additions shall be considered to have been performed by means of a serial adder whose rate of operation is set by the clock of the coprocessor. An addition then requires n cycles. - Computation of the values A*B+R+S: [q·m/2·(n+k)]=n/2·q2·(m+1)
- Computation of the values Xi: (q−1)·n=n·q−n
- Computation of the values Y: q·m/2·(n+k)=n/2·q·(m+1)
- Computation of the values Y*N+T+X: q·[q·m/2·(n+k)]=n/2·Q2·(m+1)
- Computation of the values Zq: q·n=n·q
- The number of cycles needed to perform the computations is given by the following formula:
- n·(m+1)·q2+n/2·(m+5)·q−n.
- Let it be assumed that q=3 and k=32.
- For m=8 (n=256), the method requires 25,472 giving a gain of 59.88%.
- For m=16 (n=512), the method requires 90,880 cycles, giving a gain of 57,33%.
- It will be observed that these computations do not take account of the exchanges of data between the coprocessor and the exterior, these exchanges being far more numerous in the implementation of the first method. The time needed to perform these exchanges depends on the clock frequency used to set the rate of operation of the external elements (such as the
processor 2, thememory 3 and the communications bus 5 of the circuit 1), this frequency being in practice generally lower than the clock frequency of the coprocessor. - Having thus described at least one illustrative embodiment of the invention, various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to be within the spirit and scope of the invention. Accordingly, the foregoing description is by way of example only and is not intended as limiting. The invention is limited only as defined in the following claims and the equivalents thereto.
Claims (13)
1. A device comprising:
a first register, a second register, a third register, a fourth register and a fifth register,
at least one input terminal to receive binary pieces of data to be stored in these registers,
a first multiplication circuit that performs a multiplication operation between two pieces of data stored in the first and third registers,
a second multiplication circuit that performs a multiplication operation between two pieces of data stored in the first and fourth registers,
a first addition circuit that performs operations of addition between a piece of data stored in the second register and a piece of data produced by the first multiplication circuit,
a second addition circuit that performs an operation of addition between a piece of data produced by the first addition circuit and a piece of data given to the second addition circuit by the second multiplication circuit,
a delay cell to delay the supply to the second addition circuit of the piece of data given by the second multiplication circuit,
multiplexing means that selectively supplies, to inputs of the first addition circuit, the contents of the second register or a permanent logic state, the connection of an input of the second multiplication circuit to an output of the first register, the connection of the output of the first multiplication circuit to one of the registers and the supply to the second addition circuit of a piece of data produced by the first addition circuit or a permanent logic state.
2. A device according to claim 1 , wherein the multiplexing means comprise a first multiplexer with two series inputs and one series output, a first input of said multiplexer being connected to an output of the second register, a second input of the multiplexer receiving a permanent logic state and the output of the multiplexer being connected to an input of the first addition circuit.
3. A device according to claim 2 , further comprising a subtraction circuit, placed between the second register and the first addition circuit, that performs a subtraction operation between a piece of data stored in the second register and a piece of data stored in the fifth register, wherein the first multiplexer comprises a third series input, said multiplexer being placed between the subtraction circuit and the first addition circuit and the third input of said multiplexer being connected to an output of the subtraction circuit.
4. A device according to one of the claim 1 , further comprising a third addition circuit, series-connected with the first addition circuit that performs addition operations between the pieces of data stored in the second and fifth registers and a piece of data produced by the first multiplication circuit and multiplexing means that selectively supplies, to an input of the third addition circuit, of the contents of the fifth register or a permanent logic state.
5. A device according to claim 4 , wherein the multiplexing means comprise a second multiplexer having a first input, this first input enabling the connection of the output of the first or third addition circuit to one of the registers to store all or a part of the pieces of data produced by addition between pieces of data stored in the second and fifth registers and a piece of data produced by the first multiplication circuit.
6. A device according to claim 5 , wherein the second multiplexer comprises a second input connected to the output of the second addition circuit for the storage, in one or more of the registers, of the data produced by this second multiplication circuit.
7. A device according to claim 1 , the third and fourth registers being used to provide pieces of data to the first and second multiplication circuits, wherein the device comprises means to connect the output of either one of the second or fifth registers to inputs of these third and fourth registers.
8. A device according to claim 1 , comprising a sixth register with series input and series output and multiplexing means to connect the output of this sixth register to inputs of the third and fourth registers.
9. A device according to claim 8 , comprising a multiplexer to selectively connect the input of the third register to the output of the sixth register or to an input terminal.
10. A device according to claim 8 , comprising a multiplexer having two inputs and one output, a first input of the multiplexer being connected to an input terminal to receive pieces of data from outside the device, a second input of the multiplexer being connected to the output of the sixth register for reintroducing, into said register, of the pieces of data given at its output.
11. A device according to claim 1 , further comprising a delay cell placed between an output of the first addition circuit and an input of the second addition circuit, the device comprising multiplexing means to directly connect said first and second addition circuits, thus preventing the introduction of a delay between said circuits.
12. A device comprising a processor, a memory, a communications bus and a device defined according to claim 1 .
13. A method for the implementation of a non-modular multiplication A*B, A and B being pieces of binary data encoded in n bits, n being an integer, these pieces of data being subdivided into m words of k bits A=Am−1 . . . A0 and B=Bm−1 . . . B0, m being an even number, the method comprising the following steps:
1—Initialization:
loading the pieces of data A and B into first and second n-bit registers with series input and output, and loading the words A0 and A1 into third and fourth k-bit registers with series input and parallel output,
initializing first and second addition circuits and first and second multiplication circuits,
selecting a first input of a first multiplexer so that it permanently supplies logic zeros to a first series input of the first addition circuit,
selecting an input of a second multiplexer so that the pieces of data produced by the second multiplication circuit are given with a delay of k clock strokes to a series input of the second addition circuit,
selecting inputs of a third and fourth multiplexers so as to connect an output of the first register to series inputs of the first and second multiplication circuits;
2—Implementation of a computation loop with i as an index varying from 1 to m/2
2.1—Iteration 1:
loading the contents of the third and fourth registers into fifth and sixth k-bit registers with parallel input and output, these outputs being connected to parallel inputs of the first and second multiplication circuits,
performing, by simultaneous rightward shifting of the contents of the first register and of a seventh n-bit register with series input and output, multiplication operations of the words A1 and A0 by the piece of data B, the pieces of data produced by the first and second multiplication circuits being encoded on n+k bits,
adding, in the first addition circuit, the bits produced by the first multiplication circuit with the bits given by the first multiplexer,
storing of the k first bits produced by the first multiplication circuit in an eighth n-bit register with series input and output,
adding, in the second addition circuit, the n+k bits produced by the second multiplication circuit with the n most significant bits produced by the first multiplication circuit, these bits being complemented by k zeros,
storaging, in the eighth register, of the k first bits produced by the second addition circuit and the storage, in the seventh register, of the following n bits,
during the above operations, transferring the words A3 and A2 into the third and fourth registers,
selecting of a second input of the first multiplexer in order to connect the output of the seventh register to the first input of the first addition circuit;
2.j—iteration j, j varying from 2 to m/2−1:
loading the contents of the third and fourth registers into the fifth and sixth registers,
performing, by simultaneous rightward shifting of the contents of the first and seventh registers, multiplication operations of the words A2j−1 and A2j−2 by the piece of data B,
adding, in the first addition circuit, the bits produced by the first multiplication circuit with the contents of the seventh register,
storing the k first bits produced by the first addition circuit in the eighth register,
adding, in the second addition circuit, the n+k bits produced by the second multiplication circuit with the n most significant bits produced by the first addition circuit complemented by k zeros to obtain an identical size for the pieces of data that are added up,
storing, in the eighth register, the k first bits produced by the second addition circuit and the storage, in the seventh register, of the n following bits,
during the above operations, the transfer of the words A2j+1 and A2j into the third and fourth registers; and
2.m/2—iteration m/2
Resuming step 2.j, apart from the transfer of words from the second register into the third and fourth registers, the n least significant bits of the result being in the eighth register and the n most significant bits of the result being in the seventh register at the end of this iteration.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/991,494 US20020178196A1 (en) | 1997-01-09 | 2001-11-21 | Modular arithmetic coprocessor comprising two multiplication circuits working in parallel |
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR9700264A FR2758195B1 (en) | 1997-01-09 | 1997-01-09 | MODULAR ARITHMETIC CO-PACKER COMPRISING TWO MULTIPLICATION CIRCUITS OPERATING IN PARALLEL |
FR9700264 | 1997-01-09 | ||
US09/004,375 US6035317A (en) | 1997-01-09 | 1998-01-08 | Modular arithmetic coprocessor comprising two multiplication circuits working in parallel |
US42860799A | 1999-10-27 | 1999-10-27 | |
US09/991,494 US20020178196A1 (en) | 1997-01-09 | 2001-11-21 | Modular arithmetic coprocessor comprising two multiplication circuits working in parallel |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US42860799A Continuation | 1997-01-09 | 1999-10-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
US20020178196A1 true US20020178196A1 (en) | 2002-11-28 |
Family
ID=9502555
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/004,375 Expired - Fee Related US6035317A (en) | 1997-01-09 | 1998-01-08 | Modular arithmetic coprocessor comprising two multiplication circuits working in parallel |
US09/991,494 Abandoned US20020178196A1 (en) | 1997-01-09 | 2001-11-21 | Modular arithmetic coprocessor comprising two multiplication circuits working in parallel |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/004,375 Expired - Fee Related US6035317A (en) | 1997-01-09 | 1998-01-08 | Modular arithmetic coprocessor comprising two multiplication circuits working in parallel |
Country Status (4)
Country | Link |
---|---|
US (2) | US6035317A (en) |
EP (1) | EP0853275B1 (en) |
DE (1) | DE69703085T2 (en) |
FR (1) | FR2758195B1 (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040138098A1 (en) * | 2002-05-07 | 2004-07-15 | Fein Seymour H. | Pharmaceutical compositions including low dosages of desmopressin |
US20060023878A1 (en) * | 2004-07-29 | 2006-02-02 | Hee-Kwan Son | Modular multipliers having segmentable structure and cryptography systems utilizing same |
US20060080071A1 (en) * | 2000-06-20 | 2006-04-13 | International Business Machines Cor | Method, apparatus and computer program product for network design and analysis |
US20070169061A1 (en) * | 2003-12-15 | 2007-07-19 | Bera Rajendra K | Run-Time Parallelization of Loops in Computer Programs Using Bit Vectors |
US20080127152A1 (en) * | 1999-12-01 | 2008-05-29 | Rajendra Kumar Bera | Compiler optimisation of source code by determination and utilization of the equivalence of algebraic expressions in the source code |
US20090043836A1 (en) * | 2007-08-10 | 2009-02-12 | Atmel Corporation | Method and system for large number multiplication |
US8399410B2 (en) | 2007-08-06 | 2013-03-19 | Allergan, Inc. | Methods and devices for desmopressin drug delivery |
US9974826B2 (en) | 2008-05-21 | 2018-05-22 | Ferring B.V. | Methods comprising desmopressin |
US10137167B2 (en) | 2008-05-21 | 2018-11-27 | Ferring B.V. | Methods comprising desmopressin |
US11963995B2 (en) | 2008-05-21 | 2024-04-23 | Ferring B.V. | Methods comprising desmopressin |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7197625B1 (en) | 1997-10-09 | 2007-03-27 | Mips Technologies, Inc. | Alignment and ordering of vector elements for single instruction multiple data processing |
US5864703A (en) * | 1997-10-09 | 1999-01-26 | Mips Technologies, Inc. | Method for providing extended precision in SIMD vector arithmetic operations |
FR2773231B1 (en) * | 1997-12-31 | 2000-02-25 | Sgs Thomson Microelectronics | IMPROVED PRODUCTION PROCESS OF A JO PARAMETER ASSOCIATED WITH THE IMPLEMENTATION OF MODULAR OPERATIONS ACCORDING TO THE MONTGOMERY METHOD |
FR2774783B1 (en) * | 1998-02-09 | 2000-04-14 | Sgs Thomson Microelectronics | METHOD OF IMPLEMENTING AN ELEMENTARY MODULAR OPERATION ACCORDING TO THE MONTGOMERY METHOD |
FR2775369B1 (en) | 1998-02-26 | 2001-08-03 | Sgs Thomson Microelectronics | METHOD FOR IMPLEMENTING A SPECIFIC MODULAR MULTIPLICATION RELATING TO THE MONTGOMERY METHOD |
FR2791155B1 (en) * | 1999-03-17 | 2001-05-11 | St Microelectronics Sa | DEVICE AND METHOD FOR IMPLEMENTING AN ELEMENTARY MODULAR OPERATION ACCORDING TO THE MONTGOMERY METHOD |
FR2791156B1 (en) * | 1999-03-17 | 2001-05-11 | St Microelectronics Sa | DEVICE AND METHOD FOR IMPLEMENTING AN ELEMENTARY MODULAR OPERATION ACCORDING TO THE MONTGOMERY METHOD |
US7240204B1 (en) * | 2000-03-31 | 2007-07-03 | State Of Oregon Acting By And Through The State Board Of Higher Education On Behalf Of Oregon State University | Scalable and unified multiplication methods and apparatus |
US7620832B2 (en) * | 2000-09-20 | 2009-11-17 | Mips Technologies, Inc. | Method and apparatus for masking a microprocessor execution signature |
US7237097B2 (en) | 2001-02-21 | 2007-06-26 | Mips Technologies, Inc. | Partial bitwise permutations |
US7711763B2 (en) * | 2001-02-21 | 2010-05-04 | Mips Technologies, Inc. | Microprocessor instructions for performing polynomial arithmetic operations |
US7162621B2 (en) * | 2001-02-21 | 2007-01-09 | Mips Technologies, Inc. | Virtual instruction expansion based on template and parameter selector information specifying sign-extension or concentration |
US7181484B2 (en) * | 2001-02-21 | 2007-02-20 | Mips Technologies, Inc. | Extended-precision accumulation of multiplier output |
US7599981B2 (en) * | 2001-02-21 | 2009-10-06 | Mips Technologies, Inc. | Binary polynomial multiplier |
US7113593B2 (en) * | 2001-03-06 | 2006-09-26 | Ericsson Inc. | Recursive cryptoaccelerator and recursive VHDL design of logic circuits |
US7318145B1 (en) | 2001-06-01 | 2008-01-08 | Mips Technologies, Inc. | Random slip generator |
US6922717B2 (en) | 2001-09-28 | 2005-07-26 | Intel Corporation | Method and apparatus for performing modular multiplication |
US20030065696A1 (en) * | 2001-09-28 | 2003-04-03 | Ruehle Michael D. | Method and apparatus for performing modular exponentiation |
JP2004145010A (en) * | 2002-10-24 | 2004-05-20 | Renesas Technology Corp | Encryption circuit |
JP4072503B2 (en) * | 2004-02-04 | 2008-04-09 | シャープ株式会社 | IC card with built-in coprocessor for auxiliary operation and control method thereof |
DE102004016412A1 (en) * | 2004-03-30 | 2005-10-27 | Cv Cryptovision Gmbh | Multiplication of two very large numbers for cryptographic purposes, whereby a brackets multiplication method is used based on reduction to a prime element and use of a successive look ahead mechanism |
US7805479B2 (en) * | 2006-03-28 | 2010-09-28 | Michael Andrew Moshier | Scalable, faster method and apparatus for montgomery multiplication |
US10255462B2 (en) | 2016-06-17 | 2019-04-09 | Arm Limited | Apparatus and method for obfuscating power consumption of a processor |
WO2023003737A2 (en) * | 2021-07-23 | 2023-01-26 | Cryptography Research, Inc. | Multi-lane cryptographic engine and operations thereof |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6297060A (en) * | 1985-10-23 | 1987-05-06 | Mitsubishi Electric Corp | Digital signal processor |
US4949292A (en) * | 1987-05-14 | 1990-08-14 | Fujitsu Limited | Vector processor for processing recurrent equations at a high speed |
US5047973A (en) * | 1989-04-26 | 1991-09-10 | Texas Instruments Incorporated | High speed numerical processor for performing a plurality of numeric functions |
US5513133A (en) * | 1992-11-30 | 1996-04-30 | Fortress U&T Ltd. | Compact microelectronic device for performing modular multiplication and exponentiation over large numbers |
-
1997
- 1997-01-09 FR FR9700264A patent/FR2758195B1/en not_active Expired - Fee Related
- 1997-12-19 EP EP97470032A patent/EP0853275B1/en not_active Expired - Lifetime
- 1997-12-19 DE DE69703085T patent/DE69703085T2/en not_active Expired - Lifetime
-
1998
- 1998-01-08 US US09/004,375 patent/US6035317A/en not_active Expired - Fee Related
-
2001
- 2001-11-21 US US09/991,494 patent/US20020178196A1/en not_active Abandoned
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080127152A1 (en) * | 1999-12-01 | 2008-05-29 | Rajendra Kumar Bera | Compiler optimisation of source code by determination and utilization of the equivalence of algebraic expressions in the source code |
US8028280B2 (en) | 1999-12-01 | 2011-09-27 | International Business Machines Corporation | Compiler optimisation of source code by determination and utilization of the equivalence of algebraic expressions in the source code |
US20060080071A1 (en) * | 2000-06-20 | 2006-04-13 | International Business Machines Cor | Method, apparatus and computer program product for network design and analysis |
US8176108B2 (en) | 2000-06-20 | 2012-05-08 | International Business Machines Corporation | Method, apparatus and computer program product for network design and analysis |
US7560429B2 (en) | 2002-05-07 | 2009-07-14 | Ferring B.V. | Orodispersible dosage forms of desmopressin acetate |
US7405203B2 (en) | 2002-05-07 | 2008-07-29 | Reprise Biopharmaceutics, Llc | Pharmaceutical compositions including low dosages of desmopressin |
US9504647B2 (en) | 2002-05-07 | 2016-11-29 | Ferring B.V. | Pharmaceutical formulations of desmopressin |
US9220747B2 (en) | 2002-05-07 | 2015-12-29 | Ferring B.V. | Methods using desmopressin acetate in orodispersible form |
US9919025B2 (en) | 2002-05-07 | 2018-03-20 | Ferring B.V. | Pharmaceutical formulations of desmopressin |
US7579321B2 (en) | 2002-05-07 | 2009-08-25 | Reprise Biopharmaceutics, Llc | Pharmaceutical compositions including low dosages of desmopressin |
US7799761B2 (en) | 2002-05-07 | 2010-09-21 | Allergan, Inc. | Pharmaceutical compositions including low dosages of desmopressin |
US7947654B2 (en) | 2002-05-07 | 2011-05-24 | Ferring B.V. | Pharmaceutical formulations |
US8802624B2 (en) | 2002-05-07 | 2014-08-12 | Ferring B.V. | Methods of treatment using orodispersible desmopressin pharmaceutical formulations |
US10307459B2 (en) | 2002-05-07 | 2019-06-04 | Ferring B.V. | Pharmaceutical formulations of desmopressin |
US20040138098A1 (en) * | 2002-05-07 | 2004-07-15 | Fein Seymour H. | Pharmaceutical compositions including low dosages of desmopressin |
US8143225B2 (en) | 2002-05-07 | 2012-03-27 | Allergan, Inc. | Pharmaceutical compositions including low dosages of desmopressin |
US8028281B2 (en) * | 2003-12-15 | 2011-09-27 | International Business Machines Corporation | Run-Time parallelization of loops in computer programs using bit vectors |
US20070169061A1 (en) * | 2003-12-15 | 2007-07-19 | Bera Rajendra K | Run-Time Parallelization of Loops in Computer Programs Using Bit Vectors |
US20060023878A1 (en) * | 2004-07-29 | 2006-02-02 | Hee-Kwan Son | Modular multipliers having segmentable structure and cryptography systems utilizing same |
US8399410B2 (en) | 2007-08-06 | 2013-03-19 | Allergan, Inc. | Methods and devices for desmopressin drug delivery |
US9375530B2 (en) | 2007-08-06 | 2016-06-28 | Allergan, Inc. | Methods and devices for desmopressin drug delivery |
US8028015B2 (en) | 2007-08-10 | 2011-09-27 | Inside Contactless S.A. | Method and system for large number multiplication |
WO2009023595A1 (en) * | 2007-08-10 | 2009-02-19 | Atmel Corporation | Method and system for large number multiplication |
US20090043836A1 (en) * | 2007-08-10 | 2009-02-12 | Atmel Corporation | Method and system for large number multiplication |
US9974826B2 (en) | 2008-05-21 | 2018-05-22 | Ferring B.V. | Methods comprising desmopressin |
US10137167B2 (en) | 2008-05-21 | 2018-11-27 | Ferring B.V. | Methods comprising desmopressin |
US11020448B2 (en) | 2008-05-21 | 2021-06-01 | Ferring B.V. | Methods comprising desmopressin |
US11963995B2 (en) | 2008-05-21 | 2024-04-23 | Ferring B.V. | Methods comprising desmopressin |
Also Published As
Publication number | Publication date |
---|---|
FR2758195A1 (en) | 1998-07-10 |
US6035317A (en) | 2000-03-07 |
FR2758195B1 (en) | 1999-02-26 |
EP0853275B1 (en) | 2000-09-13 |
DE69703085D1 (en) | 2000-10-19 |
DE69703085T2 (en) | 2001-01-18 |
EP0853275A1 (en) | 1998-07-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6035317A (en) | Modular arithmetic coprocessor comprising two multiplication circuits working in parallel | |
US6349318B1 (en) | Arithmetic processor for finite field and module integer arithmetic operations | |
US5764554A (en) | Method for the implementation of modular reduction according to the Montgomery method | |
US5982900A (en) | Circuit and system for modulo exponentiation arithmetic and arithmetic method of performing modulo exponentiation arithmetic | |
US5745398A (en) | Method for the implementation of modular multiplication according to the Montgomery method | |
US6397241B1 (en) | Multiplier cell and method of computing | |
US5289397A (en) | High-speed modulo exponentiator device | |
US5210710A (en) | Modulo arithmetic processor chip | |
US4891781A (en) | Modulo arithmetic processor chip | |
US7206410B2 (en) | Circuit for the inner or scalar product computation in Galois fields | |
JPH04216588A (en) | Method and apparatus for coding | |
JP2002521720A (en) | Circuits and methods for modulo multiplication | |
US8078661B2 (en) | Multiple-word multiplication-accumulation circuit and montgomery modular multiplication-accumulation circuit | |
CN109814838B (en) | Method, hardware device and system for obtaining intermediate result set in encryption and decryption operation | |
JPH1021057A (en) | Data processor and microcomputer | |
US5987489A (en) | Modular arithmetic coprocessor enabling the performance of non-modular operations at high speed | |
US5751620A (en) | Method for the production of an error correction parameter associated with the implementation of modular operations according to the Montgomery method | |
EP0504996B1 (en) | Arithmetic unit for multiplying long integers modulo M and R.S.A. converter provided with such multiplication device | |
US20030037087A1 (en) | Apparatus and method for efficient modular exponentiation | |
US6341299B1 (en) | Modular arithmetic coprocessor enabling the performance of non-modular operations at high speed | |
US6963644B1 (en) | Multi-word arithmetic device for faster computation of cryptosystem calculations | |
US6424987B1 (en) | Method for the implementation of a specific modular multiplication operation relating to the montgomery method | |
US5948051A (en) | Device improving the processing speed of a modular arithmetic coprocessor | |
CN115270155A (en) | Method for obtaining maximum common divisor of big number expansion and hardware architecture | |
US5912904A (en) | Method for the production of an error correction parameter associated with the implementation of modular operations according to the Montgomery method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |