Binary Digital Processor. The present invention relates to digital processing circuits for performing mathematical operations on vectors and other variables which may be expressed in the complex form x+jy , and to signal-processing arrangements utilising such circuits.
It is frequently necessary or convenient, when dealing with a vector of the form x+jy, to rotate it in the xy plane by an angle ø, and this operation may be performed by the transformation:

$$x + jy \rightarrow (x\cos ø - y\sin ø) + j(y\cos ø + x\sin ø)$$ ....1)

involving multiplication by sin ø and cos ø. In computing terms, the accurate calculation of cos ø and sin ø is time-consuming and/or uneconomic in terms of memory. An object of the present invention is to provide a digital processing circuit which is capable of rapidly and accurately performing calculations involving complex variables.
Consider the expressions:

$$\cos ø = 1 - 2^{-a}, \qquad \sin ø = 2^{-m}$$

Since $\sin^2 ø = 1 - \cos^2 ø$,

$$\sin^2 ø = 2^{(1-a)} - 2^{-2a} \approx 2^{(1-a)} \quad (a \gg 1)$$

But $\sin^2 ø = 2^{-2m}$ (exactly true), so that

$$a \approx 2m + 1 \quad (a \gg 1)$$

Thus in the transformation 1) above, if the functions cos ø and sin ø are replaced by the functions c(m) and s(m) where

$$c(m) = 1 - 2^{-(2m+1)}$$ ....2)

$$s(m) = 2^{-m}$$ ....3)

and m is chosen such that

$$\tan ø = \frac{s(m)}{c(m)} = \frac{2^{-m}}{1 - 2^{-(2m+1)}}$$ ....4)

then rotation of x+jy by exactly the angle ø will be achieved. The modulus of x+jy will be increased slightly by the approximation used for a in terms of m, thus:

$$\left(c(m)^2 + s(m)^2\right)^{1/2} = \left[\left(1 - 2^{-(2m+1)}\right)^2 + \left(2^{-m}\right)^2\right]^{1/2} \approx 1 + 2^{-(4m+3)}$$

giving a per-unit error k(m) of

$$k(m) \approx 2^{-(4m+3)}$$ ....5)
Thus the transformation

$$x + jy \rightarrow (x \cdot c(m) - y \cdot s(m)) + j(y \cdot c(m) + x \cdot s(m))$$ ....6)

is exactly equivalent to the transformation

$$x + jy \rightarrow (1 + k(m))\,\{(x\cos ø(m) - y\sin ø(m)) + j(y\cos ø(m) + x\sin ø(m))\}$$ ....7)

where ø(m) is defined by equation 4).
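The relationship between m, the angle of rotation and the per-unit modulus error can be checked numerically. The following is a minimal sketch only; it assumes that the angle realised by transformation 6) satisfies tan ø(m) = s(m)/c(m), which is what the equivalence of 6) and 7) requires, and the function names `c`, `s`, `angle_deg` and `gain_error` are illustrative and do not appear in the specification.

```python
import math

def c(m):
    # c(m) = 1 - 2^-(2m+1), the shift-and-subtract stand-in for cos
    return 1.0 - 2.0 ** -(2 * m + 1)

def s(m):
    # s(m) = 2^-m, the single-shift stand-in for sin
    return 2.0 ** -m

def angle_deg(m):
    # Angle realised by transformation 6), assuming tan(angle) = s(m)/c(m)
    return math.degrees(math.atan2(s(m), c(m)))

def gain_error(m):
    # Per-unit growth of the modulus: sqrt(c^2 + s^2) - 1
    return math.hypot(c(m), s(m)) - 1.0

if __name__ == "__main__":
    for m in range(2, 9):
        approx = 2.0 ** -(4 * m + 3)  # the approximation of equation 5)
        print(m, round(angle_deg(m), 4), gain_error(m), approx)
```

Under these assumptions m=2 gives an angle of roughly 14.5 degrees and m=4 roughly 3.6 degrees, with the modulus error close to the value given by equation 5).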
Having chosen an appropriate parameter m to give the desired angle ø, x.c(m) and y.s(m) can be evaluated simply by shifting x and y (2m+1) and m places right respectively in a binary shift register and, in the case of x.c(m), subtracting the shifted value from x. Thus the transformation 6) may be accomplished in a binary digital circuit arrangement using adders, shift registers and a set of stored values of m corresponding to various values of ø, no multiplication being necessary. Rotation by an angle ø which does not correspond closely to an integral value of m may be achieved by successive rotations by ø1, ø2 ... øN, where ø = ø1 + ø2 + ... + øN and each of ø1, ø2 ... øN corresponds closely to an integral value of m.

Therefore, according to the present invention, a binary digital processor for rotating a vector representative of two independent input variables x and y by an angle ø in the xy plane by means of respective approximations to the operations of multiplication by sin ø and cos ø comprises adding means and means for performing controlled binary shifts on said variables so as to achieve multiplication of said variables by the functions S(m) or C(m) and thereby approximate to multiplication by sin ø or cos ø respectively, where $S(m) = 2^{-m}$ and $C(m) = 1 - 2^{-(2m+1)}$, m being chosen such that $\tan ø = S(m)/C(m)$.
The processor may conveniently be incorporated in an integrated circuit.
Means (such as a look-up table stored in a memory) may be provided for supplying one or more values of m (which is necessarily an integer) to enable any desired angle of rotation to be achieved. In general this will involve the addition and subtraction of a plurality of smaller rotations, each of which corresponds exactly to one or other value of m. The number of rotations required will depend on the desired accuracy of the required total rotation.
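One way in which a sequence of values of m might be selected for a given total rotation is sketched below. This is an illustrative greedy search only, not a procedure given in the specification: the names `phi_deg` and `decompose`, the tolerance and the permitted range of m are all assumptions, and the angle for each m is again taken as arctan(s(m)/c(m)).

```python
import math

def phi_deg(m):
    # Angle for a given m, assuming tan(angle) = 2^-m / (1 - 2^-(2m+1))
    return math.degrees(math.atan2(2.0 ** -m, 1.0 - 2.0 ** -(2 * m + 1)))

def decompose(target_deg, m_values=range(1, 12), tol_deg=0.05, max_steps=16):
    """Greedily pick signed rotations (sign, m) whose angles sum to the target.

    Returns the list of steps and the residual angle left uncorrected.
    """
    steps = []
    residual = target_deg
    for _ in range(max_steps):
        if abs(residual) <= tol_deg:
            break
        # Choose the m whose angle is closest to what is still needed
        m = min(m_values, key=lambda mm: abs(phi_deg(mm) - abs(residual)))
        sign = 1 if residual > 0 else -1
        steps.append((sign, m))
        residual -= sign * phi_deg(m)
    return steps, residual

if __name__ == "__main__":
    print(decompose(18.0))
```

For a target of 18 degrees the first two steps chosen are m=2 and m=4; a further small correction is added only because of the tolerance assumed here. In hardware such a list would simply be precomputed and held in the look-up table.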
The processor may incorporate selectors and additional adders to enable vectors to be added, subtracted, rotated and multiplied. Particular embodiments of the invention will now be described by way of example with reference to Figures 1 to 8 of the accompanying drawings, of which:
Figure 1 is a schematic representation of a simple vector rotator accumulator in accordance with the invention;
Figure 2 is a table showing the angle of rotation as a function of m and the corresponding magnitude error k(m);
Figure 3 is a schematic representation of a more versatile digital processor which is suitable for use in a frequency analyser;
Figure 4 shows a microprogram for operating the processor of Figure 3;
Figure 5 shows an arrangement for evaluating a discrete Fourier transform (D.F.T.);
Figure 6 is a high level program for operating the arrangement of Figure 5;
Figure 7 shows an arrangement for evaluating a complex D.F.T., and
Figure 8 is a high level program for operating the arrangement of Figure 7.
Referring to Figure 1, the simple processor shown comprises an upper half which handles the real component of a vector x+jy and a precisely similar lower half which handles the imaginary component of x+jy (j being implied), the instantaneous values of x and y being stored in binary registers 1 and 2 respectively. Shifters 3 and 4, 5 and 6 are associated with registers 1 and 2 respectively and shift the binary representations of x and y by either m places (shifters 4 and 6) or (2m+1) places (shifters 3 and 5) in accordance with an input value of m. Input data in the form of a vector a+jb is added to the vector x+jy stored in the registers and the latter is rotated by an angle ø(m) in accordance with equation 4) each time the registers are clocked. Each resulting new value of x+jy is successively stored in registers 1 and 2.

The operation of the processor is as follows. When register 1 is clocked the stored value of x is applied to the positive input of adder 7 and the stored value of x shifted (2m+1) places to the right (i.e. $x \cdot 2^{-(2m+1)}$) is applied to the negative input of adder 7. The resulting output $x \cdot (1 - 2^{-(2m+1)})$ (which is equal to x.c(m)) is added to the output of adder 9, fed to the output of the processor, and stored as the new value of x in register 1. The output of adder 9 is $a - y \cdot 2^{-m}$ (i.e. a − y.s(m)) and hence the new value of x output from the processor and stored in register 1 is:

$$x \cdot c(m) - y \cdot s(m) + a$$

The operation of the other half of the processor is precisely similar so that the new value of y output from the processor and stored in register 2 is:

$$y \cdot c(m) + x \cdot s(m) + b$$

Thus each time the processor is clocked, inputting a, b and m, the stored vector x+jy is transformed to the vector:

$$(x \cdot c(m) - y \cdot s(m) + a) + j(y \cdot c(m) + x \cdot s(m) + b)$$
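The clocked behaviour described above can be modelled in software with integer shifts and additions only. The sketch below is a behavioural model under stated assumptions, not a description of the circuit of Figure 1: the function name `rotate_accumulate` is illustrative, Python integers stand in for the registers, and Python's `>>` on integers is used as an arithmetic right shift.

```python
def rotate_accumulate(x, y, a, b, m):
    """One clock of a rotate-and-accumulate step using shifts and adds only.

    x, y: stored vector (integers); a, b: input vector; m: shift parameter.
    """
    xc = x - (x >> (2 * m + 1))  # x.c(m): x minus x shifted (2m+1) places
    yc = y - (y >> (2 * m + 1))  # y.c(m)
    xs = x >> m                  # x.s(m): x shifted m places
    ys = y >> m                  # y.s(m)
    return xc - ys + a, yc + xs + b

if __name__ == "__main__":
    # Rotate the vector (2^20, 0) once with m = 2 and no input data
    print(rotate_accumulate(1 << 20, 0, 0, 0, 2))
```

Because right shifts discard low-order bits, the integer result differs slightly from the exact product; a wider word length reduces that truncation error.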
From equation 7) it can be seen that this transformation corresponds to rotation of x+jy by ø(m), multiplication of x+jy by 1+k(m) (where k(m) is given by equation 5)) and addition of a+jb to the rotated vector. It will be apparent that selectors may be inserted in series with the adders in Figure 1 to allow simple addition of a+jb to x+jy or simple rotation of x+jy.

Figure 2 shows the values of m and k(m) corresponding to values of ø (angle of rotation) up to 14°. It can be seen that k(m) rapidly becomes negligible even at small values of m. A suitable program for choosing successive values of m to apply to the processor of Figure 1 so as to achieve a desired total rotation may readily be devised by a person skilled in the art and stored if necessary in a PROM (programmable read only memory). The total values of both ø(m) and k(m) are equal to the respective sums of the individual ø's and k's (to first order in the case of the k's, which combine as a product of factors 1+k). Since ø(m) approximately halves with each increment in m, an accurate rotation may be achieved by using only a few successive values of m.

It will be appreciated that rotations of 90, 180 and 270 degrees may be achieved exactly by signed transfer of data between registers. Rotation by zero degrees may be used for simple addition and to pad out compound rotation sequences to a fixed length. Reverse rotation may be achieved by changing the sign of the s(m) coefficients and rotation to the first quadrant may be achieved by using sign changes on negative data only (thereby replacing x+jy by |x| + j|y|).

The modulus of a stored vector may be determined by providing means for registering a change in sign of y from positive to negative and for inhibiting further rotation, so that the vector is rotated onto the real axis in small steps. The x value is then equal to the modulus of the vector. The argument of the vector may be found by summing the rotations required to bring the vector on to the x-axis.
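The modulus and argument procedure just described can be sketched as a loop which rotates the vector clockwise in equal small steps until y changes sign, counting the steps. This is an assumption-laden illustration: the name `modulus_and_argument`, the choice m=6 (a step of a little under one degree), the step limit and the restriction to first-quadrant inputs are not taken from the specification.

```python
import math

def modulus_and_argument(x, y, m=6, max_steps=100_000):
    """Rotate (x, y) clockwise by the angle for m until y turns negative.

    Returns (approximate modulus, approximate argument in degrees).
    Integer inputs are assumed so that shifts can stand in for multiplies.
    """
    if x < 0 or y < 0:
        raise ValueError("sketch assumes a first-quadrant vector")
    step_deg = math.degrees(math.atan2(2.0 ** -m, 1.0 - 2.0 ** -(2 * m + 1)))
    count = 0
    while y >= 0 and count < max_steps:
        # Clockwise rotation: the sign of the s(m) terms is reversed.
        # Both right-hand sides are evaluated from the old x and y.
        x, y = (x - (x >> (2 * m + 1)) + (y >> m),
                y - (y >> (2 * m + 1)) - (x >> m))
        count += 1
    return x, count * step_deg

if __name__ == "__main__":
    print(modulus_and_argument(3 << 20, 4 << 20))
```

The argument is resolved only to within one step, and the final x slightly underestimates the modulus because the vector overshoots the real axis by up to one step; smaller steps (larger m) tighten both at the cost of more clock cycles.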
Gain control may be achieved by adding and/or subtracting appropriate fractions of the input a+jb (achieved by binary shifts) a controlled number of times before rotating the vector; this facility is useful, for example, in "windowing" input waveform samples in Fourier Transform frequency analysis.
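As an illustration of gain control by shifted additions and subtractions, a weight may be approximated as a short signed sum of powers of two and then applied with shifts alone. The sketch below is hypothetical: the names `signed_shift_terms` and `apply_weight`, the greedy choice of terms and the limits on the number of terms and on the shift are assumptions, not features described in the specification.

```python
def signed_shift_terms(weight, max_terms=4, max_shift=15):
    """Approximate weight as a sum of signed powers of two, (sign, shift) pairs."""
    terms = []
    residual = weight
    for _ in range(max_terms):
        if residual == 0:
            break
        # Pick the power of two closest in size to the remaining error
        shift = min(range(max_shift + 1),
                    key=lambda sh: abs(abs(residual) - 2.0 ** -sh))
        sign = 1 if residual > 0 else -1
        terms.append((sign, shift))
        residual -= sign * 2.0 ** -shift
    return terms

def apply_weight(sample, terms):
    # Each term is one shifted add or subtract of the integer input sample
    return sum(sign * (sample >> shift) for sign, shift in terms)

if __name__ == "__main__":
    terms = signed_shift_terms(0.75)
    print(terms, apply_weight(1024, terms))
```

A window function would then be stored as one short list of such terms per sample position rather than as a table of multiplier coefficients.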
The extra functions described above may be achieved by simple modifications (generally involving the insertion of selectors in the input paths of the adders) to the processor of Figure 1. Some of these functions are incorporated in the more sophisticated processor of Figure 3, but will be apparent to one skilled in the art and will not therefore be described individually in detail.
Figure 3 shows a processor which is similar to that shown in Figure 1, but which is more versatile. The operation of the processor of Figure 3 will be described with reference to the operating program tabulated in Figure 4. The processor employs the "two's complement binary" format, in which the most significant bits Xm and Ym stored in the X and Y registers 1 and 2 represent $-2^L$, where $2^{L-1}$ is the value of the next most significant bit. Thus each register can store a binary number having a value between $\pm(2^L - 1)$ inclusive, Xm and Ym having a value zero for positive numbers and unity for negative numbers. Typically the X and Y registers, adders, shifters and selectors (and the data buses between them) might have a capacity of 24 bits without undue complexity.
The processor of Figure 3 is controlled by a control section 39, which is in turn controlled by inputs Xm and Ym (the most significant (sign) bits from the
X and Y registers 1 and 2 respectively), a 3-bit binary code F corresponding to F2, F1, F0 in the program listed
in Figure 4(a) and a 1-bit sign code S listed in the same program. S=0 gives true input data or positive
(counter-clockwise) rotation. S=1 gives negated input data or clockwise rotation. Control 39 generates 1-bit output codes a,b,c,d,e,f,g,h,j,k in response to the inputs
Xm, Ym, F and S and feeds them to the various adders and shifters referenced 3 to 34 in Figure 3. It will be appreciated that the upper and lower halves of this part of the processor handle the real input data (from input A) and imaginary input data (from input B) in precisely the same manner. Accordingly, each control code a, b, e, f, j, k is routed to two identical components as indicated; thus code a controls components 19 and 22, code b controls components 20 and 23 and so on. However XOR gates 27 to 30 are independently controlled by codes g, c, h and d respectively. An additional "halt" output H is fed to a counter 42, which counter counts small (known) rotations of the vector x+jy. Output H is generated by the sign bit Ym and has a value unity when Ym changes from zero to unity, corresponding to rotation of x+jy clockwise past the x-axis. When H changes to unity, counter 42 is stopped, and thus registers the number of known small rotations (typically one degree) required to bring a vector x+jy onto the x-axis, which corresponds to the argument of the vector.
Additional control signals are fed directly to some or all of the components of the processor, namely a clock signal; a value of m in 4-bit code (to shift registers 3, 4, 5 and 6 only); a 1-bit transfer code T (to selectors 35, 36 and 41), for either updating the output registers 37, 38 and 40 (T=1) or leaving them unchanged (T=0); a 2-bit output enable code O to selector 43 for selecting one of inputs 0, 1, 2 or 3; and a 1-bit output enable code E to a common output buffer 44. The states of transfer code T and output enable code O are tabulated in Figure 4(b) and 4(c) and labelled with two-letter or three-letter program codes which appear in the higher-level programs of Figures 6 and 7. Codes RL, IM, NIM and ARG correspond to inputs 0, 2, 1 and 3 respectively of selector 43. Inputs m, F, S, T, O and E, together with suitable write commands and source and destination addresses (not shown), are generated by a high level program (Figure 6) which is suitably stored in ROM (read only memory).
Figure 4(a) shows the eight operations (in program code, NOP, LSI, ASI, ROT, RAC, RFQ, ROC and CAR) which the processor can perform on input data A+jB, in terms of the inputs to control 39 and the corresponding outputs generated by control 39. It will be appreciated that those operations which involve rotation will additionally be governed by the variable m, which determines the angle of rotation ø(m) in accordance with expression 4).
Before describing the operations tabulated in Figure 4(a) the hardware for implementing control of the processor shown in Figure 3 will be briefly described. Components 25 and 26 are NAND gates. Components 27, 28, 29 and 30 are XOR (exclusive OR) gates. Components 31, 32, 33 and 34 are selectors, which select inputs 1 or 0 according to the value of their binary control code j or e. Components 19, 20, 21, 22, 23, 24 are AND gates and therefore behave as adders when their binary control codes are high (a, b, f = 1). Components 13, 14, 15, 16, 17, 18 are simple adders and components 1, 2, 3, 4, 5 and 6 are the same as the correspondingly referenced components in Figure 1.
Referring to Figure 4(a), the first operation listed is NOP (no operation) which simply leaves stored x+jy unchanged, irrespective of the input data. This is achieved by feeding the contents of the x and y registers back into their inputs. Accordingly a path for the stored x data is formed through selector 31, XOR gate 27, AND gate 21 and adders 14 and 15 by putting k=0 (blocking off gate 25); j=0 (to select input 0 of selector 31); g=0 (to open gate 27) and f=1 (to open gate 21). By setting h=0 a similar path is formed for the y register. The addition of A and B to x and y at adders 14 is prevented by setting a=0. These values are set out in the first row of the table of Figure 4(a). Outputs c, d and e are not specified. Since no operation is carried out, inputs S, Xm, Ym and H are immaterial: such "don't care" inputs are indicated by crosses.

Operation LSI exists in two versions and replaces x+jy by either s(m)(A+jB) if the sign bit S=0 or −s(m)(A+jB) if S=1. Similarly, operation ASI achieves the transformation:

$$x + jy \rightarrow x + jy \pm s(m)(A + jB)$$

according to whether S=0 (making c=0 and d=0) or 1 (making c=1 and d=1).
Operation ROT achieves the rotation (in either direction):

$$x + jy \rightarrow (x \cdot c(m) \mp y \cdot s(m)) + j(y \cdot c(m) \pm x \cdot s(m))$$

according to whether S=0 or 1.

Operation RAC performs the combined rotation and addition:

$$x + jy \rightarrow (x \cdot c(m) \mp y \cdot s(m) + A) + j(y \cdot c(m) \pm x \cdot s(m) + B)$$

again according to whether S=0 or S=1.

Operation RFQ exists in four different versions according to the state of Xm and Ym (i.e. according to the quadrant in which x+jy falls). This changes the signs of x and y only if they are respectively negative; thus the new x and y are always positive.

Operation ROC leaves x+jy unchanged if H=1 and rotates x+jy by ø(m) if H=0; i.e. when H=0,

$$x + jy \rightarrow (x \cdot c(m) \mp y \cdot s(m)) + j(y \cdot c(m) \pm x \cdot s(m))$$

the direction of rotation depending on the value of S. If m is chosen to make ø small, the argument of x+jy can be found by adding the ø's in counter 42, as already described.

The operation CAR rotates x+jy by ±90° according to the value of S.
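The eight operations can be summarised as a small behavioural model. The sketch below is an assumption-based illustration of the arithmetic effect of each operation only: it uses floating-point values rather than the two's complement registers, it does not model the control codes a to k or the gates of Figure 3, and the function name `step` and its argument layout are hypothetical.

```python
def c(m):
    return 1.0 - 2.0 ** -(2 * m + 1)

def s(m):
    return 2.0 ** -m

def step(op, x, y, A=0.0, B=0.0, m=0, S=0, H=0):
    """Return the new (x, y) after one operation; S=0 is the positive sense."""
    sgn = -1.0 if S else 1.0
    if op == "NOP":                      # leave x+jy unchanged
        return x, y
    if op == "LSI":                      # load +/- s(m)(A+jB)
        return sgn * s(m) * A, sgn * s(m) * B
    if op == "ASI":                      # add +/- s(m)(A+jB)
        return x + sgn * s(m) * A, y + sgn * s(m) * B
    if op in ("ROT", "RAC", "ROC"):
        if op == "ROC" and H:            # rotation inhibited once H is set
            return x, y
        a, b = (A, B) if op == "RAC" else (0.0, 0.0)
        return (x * c(m) - sgn * y * s(m) + a,
                y * c(m) + sgn * x * s(m) + b)
    if op == "RFQ":                      # reflect into the first quadrant
        return abs(x), abs(y)
    if op == "CAR":                      # exact rotation by +/- 90 degrees
        return (-y, x) if S == 0 else (y, -x)
    raise ValueError(op)

if __name__ == "__main__":
    print(step("ROT", 1.0, 0.0, m=2))
    print(step("CAR", 1, 2, S=1))
```

In this model S=0 selects the upper sign in each expression above, i.e. counter-clockwise rotation.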
Figure 5 shows a processor P as described above with reference to Figure 3, incorporated in an arrangement for evaluating the frequency components of an input waveform by means of a discrete Fourier transform (D.F.T.). Samples of the waveform are stored digitally in a memory M1, fed to the real input of processor P (the imaginary input being left unused or used for some other function), processed and output to a memory M2, which is divided in two for storing the real and imaginary parts respectively of the output vector x+jy. The arrangement is controlled by a controller/sequencer C, which is in turn fed with the program tabulated in Figure 6. The program of Figure 6 is written in terms of the operations set out in Figure 4 and also includes WR (write real) and WI (write imaginary) commands for entering output data from P into memory M2 together with the necessary source and destination addresses SA and DA.
The general expression for an N point (radix N) discrete Fourier transform is:

$$X_k = \sum_{n=0}^{N-1} x_n W^{nk}$$ ....8.)

where $0 \le k < N$, $W = e^{-j2\pi/N}$, $x_n$ is the value of the nth (complex) waveform sample and $X_k$ is the amplitude of the kth frequency component. It has been shown by T.E. Curtis and J.E. Wickenden on pages 424 and 425 of I.E.E. Proceedings Vol. 130, Part F, Number 5 (August 1983) that expression 8.) may be re-written as:

$$X_k = \sum_{n=0}^{N-1} x_{nk^{*}\,(\mathrm{mod}\ N)}\, W^{nQ}$$ ....9.)

in the special case where N is a prime number, where Q is any integer such that Q ≢ 0 (mod N) and k* is defined by the expression kk* (mod N) = Q. The above mentioned article (which starts at p.423) is hereby incorporated by reference. It will be noted that a single value of $W^Q$ (which corresponds to a vector rotation in expression 9.)) may be chosen and used for all frequency components k (except k=0), different values of k using different input data sequences. Thus expression 9.) may be evaluated more easily than expression 8.), which requires a different rotation for each value of k.
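The input data sequence used for each frequency component follows from the relation kk* (mod N) = Q. A minimal sketch of that index calculation is given below; the names `k_star` and `input_order` are illustrative, Q=1 is assumed by default, and the modular inverse is obtained from Fermat's little theorem, which is valid only because N is prime.

```python
def k_star(k, N, Q=1):
    """Solve k * k_star = Q (mod N) for prime N and k not a multiple of N."""
    inverse_k = pow(k, N - 2, N)  # Fermat: k^(N-2) is the inverse of k mod N
    return (Q * inverse_k) % N

def input_order(k, N, Q=1):
    """Order in which the samples x_n are taken for frequency component k."""
    ks = k_star(k, N, Q)
    return [(n * ks) % N for n in range(N)]

if __name__ == "__main__":
    for k in range(1, 5):
        print(k, k_star(k, 5), input_order(k, 5))
```

For N=5 and Q=1 this gives the sample order x0, x3, x1, x4, x2 for k=2, and for k=4 the order for k=1 reversed apart from x0.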
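The program of Figure 6 evaluates expression 9.) for a 5 point D.F.T. (N=5, k=1,2,3,4), which is equivalent to evaluating the matrix product:

$$\begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix} = \begin{bmatrix} x_0 & x_1 & x_2 & x_3 & x_4 \\ x_0 & x_3 & x_1 & x_4 & x_2 \\ x_0 & x_2 & x_4 & x_1 & x_3 \\ x_0 & x_4 & x_3 & x_2 & x_1 \end{bmatrix} \begin{bmatrix} W^0 \\ W^1 \\ W^2 \\ W^3 \\ W^4 \end{bmatrix}$$ ....11.)

The matrix of $x_n$ above represents the $W^{nk\,(\mathrm{mod}\ 5)}$'s in the sum. The $X_0$ terms (representing the D.C. signal component) are omitted. From 11.),

$$X_1 = x_0 W^0 + x_1 W^1 + x_2 W^2 + x_3 W^3 + x_4 W^4 = (((x_4 W^1 + x_3) W^1 + x_2) W^1 + x_1) W^1 + x_0$$ ....12.)

and

$$X_2 = x_0 W^0 + x_3 W^1 + x_1 W^2 + x_4 W^3 + x_2 W^4 = (((x_2 W^1 + x_4) W^1 + x_1) W^1 + x_3) W^1 + x_0$$ ....13.)

so that each successive $W^n$ is rotated by −72°. This is achieved in Figure 6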
by a rotation of 90° and then two reverse rotations of approximately 14° (m=2) and 4° (m=4) respectively. Expression 12.) is evaluated in lines 1 to 13 of the program (source addresses 4, 3, 2, 1 and 0 corresponding to x4, x3, x2, x1 and x0 respectively) and X1 is output in lines 15 and 16. Because of the conjugate symmetry of the matrix of xn in expression 11.), X4 can be immediately derived from X1 and is output in lines 17 and 18 of the program. X2 is calculated in lines 14 to 26 of the program and output at lines 28 and 29, enabling X3 to be output at lines 30 and 31, and X0 is calculated in lines 27 to 31 of the program and output at lines 33 and 34. Thus a set of complex harmonics X0 to X4 is output from memory M2. The arrangement of Figure 5, programmed as shown in Figure 6, may thus be used substantially as it stands to perform a frequency analysis on an input waveform.

It will be appreciated that a D.F.T. having many more than 5 points will generally be used in practice. A program on the lines set out in Figure 6, but with a proportionally greater number of steps, may be derived without difficulty for a D.F.T. having a large, prime number of points. Furthermore it will be appreciated that the processor of Figure 3 is generally applicable to the calculation of D.F.T.'s, and not just the special form of prime radix D.F.T. of expression 9.). Thus the general D.F.T. of expression 8.) may be written recursively

$$X_k = ((\ldots((x_{N-1} W^k + x_{N-2}) W^k + x_{N-3}) W^k + \ldots) W^k + x_1) W^k + x_0$$

and computed by the processor of Figure 3 by summing the data samples in reverse order with a repeating compound rotation $W^k$. However the prime radix D.F.T. 9.) may be computed more efficiently and in particular a D.F.T. decomposed into a plurality of (particularly 2) prime radix D.F.T.'s such as 9.) may be computed highly efficiently by a processor in accordance with the invention. On page 424 of the Curtis and Wickenden paper referred to above, it is shown that a radix N1N2 D.F.T. can be efficiently evaluated by performing N1 radix N2 D.F.T.'s and performing N2 radix N1 D.F.T.'s on the result, if N1 and N2 are relatively prime (i.e. the highest common factor = 1). This condition is of course fulfilled if both N1 and N2 are themselves prime numbers, and accordingly Figure 7 shows an arrangement for processing the complex output X0 to X4 of the processor arrangement of Figure 5 to successively calculate five radix 3 D.F.T.'s of the type shown in expression 9.).
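The rotate-and-accumulate evaluation of a prime radix D.F.T. can be illustrated in software as follows. This sketch makes several assumptions that should be noted: it uses Python complex multiplication in place of the shift-and-add rotations of the processor, it takes W = exp(−j2π/N), and the function names `dft_direct` and `dft_prime_horner` are illustrative. It shows only that nesting the sum with a single fixed rotation and a permuted sample order reproduces the ordinary D.F.T. when N is prime.

```python
import cmath

def dft_direct(x):
    """Reference D.F.T. evaluated term by term."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def dft_prime_horner(x, Q=1):
    """Prime-length D.F.T. using one fixed rotation W^Q for every k != 0."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi * Q / N)   # the single repeated rotation
    X = [sum(x)]                            # k = 0 needs no rotation
    for k in range(1, N):
        ks = (Q * pow(k, N - 2, N)) % N     # k * ks = Q (mod N), N prime
        acc = 0
        # Samples are taken in reverse order: rotate, then add the next one
        for n in range(N - 1, -1, -1):
            acc = acc * W + x[(n * ks) % N]
        X.append(acc)
    return X

if __name__ == "__main__":
    samples = [1, 2, 3, 4, 5]
    for a, b in zip(dft_direct(samples), dft_prime_horner(samples)):
        print(a, b)
```

Each pass of the inner loop corresponds to one rotate-and-accumulate step; replacing the complex multiplication by a compound shift-and-add rotation would introduce the small gain and angle errors discussed earlier.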
Thus the arrangement of Figure 7 is precisely similar to that of Figure 5 except that it handles complex input data, which is fed to a divided input memory M3. The input data is fed to the real and imaginary inputs of a processor P of the type shown in Figure 3 and the resulting output is fed to an output data store M4, where the modulus is extracted and output as a set of 15 frequency components. The arrangement is controlled by a controller/sequencer C, which is in turn controlled by the program of Figure 8. This program is similar to that of Figure 6, except that rotations of −120° (approximately) are used. The three computed output values are output at lines 20, 38 and 48 respectively.