An Efficient VLSI Architecture For Iterative Logarithmic Multiplier

2017 4th International Conference on Signal Processing and Integrated Networks (SPIN)

Durgesh Nandan¹, Jitendra Kanungo², Anurag Mahajan³
Department of Electronics and Communication Engineering, J.U.E.T, Guna, Madhya Pradesh, India
¹prof.durgeshnandan@gmail.com, ²jitendra.kanungo@juet.ac.in, ³anurag.mahajan@juet.ac.in

Abstract: Logarithmic Number System (LNS) based multipliers play a significant role in Digital Signal Processing (DSP), image processing, and neural networks, all of which require a large number of arithmetic operations. Among the arithmetic operations, multiplication is the most hardware-consuming component. Here, we address that problem with an efficient VLSI architecture of a Mitchell’s algorithm based iterative logarithmic multiplier using the seamless pipelining technique. The proposed work minimizes hardware at the same error cost as previously reported architectures. We use VHDL to design the existing and the proposed Mitchell’s algorithm based iterative logarithmic multipliers. Both designs are evaluated with the Synopsys Design Compiler using 90 nm CMOS technology, and the results are compared in terms of Data Arrival Time (DAT), Area, Power, Area Delay Product (ADP), and Energy per Sample (EPS). The proposed design achieves 30.99 %, 31.10 %, and 20.84 % less ADP and 5.12 %, 15.48 %, and 23.55 % less EPS than the existing Mitchell’s algorithm based iterative logarithmic multiplier for 8-bit, 16-bit, and 32-bit operations, respectively.

Keywords: Iterative logarithmic multiplier, Logarithmic number system, Mitchell method, Operand decomposition, Seamless pipelined.

I. INTRODUCTION

Logarithmic computation is widely used in DSP, neural network, and image processing applications. Arithmetic operations (addition, subtraction, multiplication, and division) are the most widely used operations in real-time DSP applications. Logarithms give digital designers an alternative to binary arithmetic because logarithmic operations are faster and consume less area than binary arithmetic. An efficient hardware implementation of logarithmic operations is therefore a good choice in place of binary arithmetic operations.

Among all arithmetic operations, multiplication is the most hardware-consuming component because of the generation and processing of a large number of partial products. The generation of partial products can be avoided by using a Logarithmic Number System (LNS) [1]-[4]. Logarithmic operations convert multiplications into additions and divisions into subtractions, which can save a lot of computation effort [5].

Logarithmic multiplication is performed in three steps: (1) conversion of the binary numbers into logarithmic numbers with a logarithmic converter, (2) addition, and (3) antilogarithmic conversion of the sum with an antilogarithmic converter [3]. A simple block diagram of logarithm-based multiplication is shown in Fig. 1 [2]. The implementation of the logarithmic and antilogarithmic converters plays a major role in deciding the accuracy and performance of LNS based circuits.

[Figure: Input X and Input Y each feed a Logarithmic Converter; the two converter outputs feed an Adder Circuit, followed by an Antilogarithmic Converter that outputs X*Y.]
Figure 1. Block diagram of logarithmic based multiplication

An example of the logarithmic multiplication of two numbers, 20 and 60, is given below [8].

Let A = b’(00010100) = d’20; and B = b’(00111100) = d’60;
Log A = 0100.0100; and Log B = 0101.11100;
Sum = Log A + Log B
Sum = (0100.0100) + (0101.11100) = 1010.00100;
Antilog (Sum) = Antilog (Log A + Log B);
Antilog (Sum) = b’(0000010010000000);
A*B ≈ Antilog (Sum) = d’1152;

The exact product is 1200; the difference is the approximation error of the logarithmic method.
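The three-step procedure and the 20 × 60 example above can be sketched in Python as follows. This is a minimal behavioural illustration, not the authors' VHDL design: the function names and the fractional width `FRAC_BITS` are assumptions made for the example, and the logarithm is approximated by the straight-line rule log2(2^k (1 + x)) ≈ k + x.

```python
FRAC_BITS = 16  # assumed fixed-point width of the mantissa (not from the paper)


def mitchell_log2(n: int) -> int:
    """Step 1: approximate log2(n) as k + x in fixed point.

    k is the position of the leading one bit and x = (n - 2^k) / 2^k.
    The result is exact in fixed point as long as k <= FRAC_BITS.
    """
    if n <= 0:
        raise ValueError("operand must be a positive integer")
    k = n.bit_length() - 1
    mantissa = ((n - (1 << k)) << FRAC_BITS) >> k
    return (k << FRAC_BITS) | mantissa


def mitchell_antilog2(log_value: int) -> int:
    """Step 3: approximate 2^(k + x) as 2^k * (1 + x)."""
    k = log_value >> FRAC_BITS
    mantissa = log_value & ((1 << FRAC_BITS) - 1)
    return (((1 << FRAC_BITS) + mantissa) << k) >> FRAC_BITS


def mitchell_multiply(a: int, b: int) -> int:
    """Steps 1 to 3: log conversion, addition (step 2), antilog conversion."""
    return mitchell_antilog2(mitchell_log2(a) + mitchell_log2(b))


if __name__ == "__main__":
    print(mitchell_multiply(20, 60))  # approximate product of the example
```

For the operands of the example, `mitchell_multiply(20, 60)` returns 1152, against an exact product of 1200.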

978-1-5090-2797-2/17/$31.00 ©2017 IEEE

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY CALICUT. Downloaded on August 28,2020 at 13:53:26 UTC from IEEE Xplore. Restrictions apply.

The rest of the paper is organized as follows. Section II describes the systematic development of LNS. Section III presents the proposed method of logarithmic multiplication and the new seamless pipelining approach. Section IV gives the hardware complexity, synthesis, and simulation results. Finally, Section V discusses the applications of LNS and concludes the paper.

II. REVIEW OF LOGARITHMIC MULTIPLICATION METHODS

The systematic development of logarithmic multiplication methods is summarized in the block diagram of Fig. 2. LNS based multipliers can be divided into two categories: (1) Look-Up Table (LUT) and interpolation based methods and (2) Mitchell’s algorithm based methods [3]-[11]. Mitchell’s algorithm (MA) based methods can be subdivided into the following sub-categories: a) divided approximation (DA), b) correction term based, c) operand decomposition, d) MA-based iterative algorithm, and e) iterative algorithm using seamless pipelining.

[Figure: tree diagram. Logarithmic Multiplication branches into Look-up Table based and Mitchell’s Algorithm (MA) based methods. Under "Improving MA Accuracy": Divided Approximation, Correction Term based, Operand Decomposition, and MA based Iterative Algorithm, leading to Iterative Algorithm using Seamless pipelined.]
Fig. 2. Organization model of Logarithmic Multiplication

In 1962, Mitchell proposed an algorithm for multiplication using a straight-line approximation, with an error of approximately 12 % during the multiplication operation [8]. Interpolation based LNS multipliers give high accuracy, but at high design cost. In 1970, Hall et al. proposed a multiple piecewise linear approximation, but the accuracy came at the price of hardware complexity, high power consumption, and lower speed [9]. In 1999, SanGregory’s correcting algorithm was small and fast because it uses only the four MSBs of the mantissa for adjustment of the concatenated result [10]. In 2003, Abed and Siferd developed logarithmic/antilogarithmic converter correction algorithms that fulfill the required trade-off [11]-[12]. In 2009 and 2011, Juang et al. proposed a two-region bit-level manipulation scheme to achieve efficient hardware implementation with a high level of accuracy [13]-[14]. A similar approach for error minimization has been used for an antilogarithmic converter. In 2015, Juang et al. reported a two-region logarithm approximation with a range of 0 to 0.0319, and a range of -0.60 to 1.72 for the antilogarithmic converter [15]. The operand decomposition approach improves the average error percentage and the error range of Mitchell’s algorithm at the cost of a 100 % area increase. It is equally applicable to other methods such as the divided approximation based and correction term based methods [3]. In 2010, Bulic et al. proposed an iterative logarithmic approximation. It is based on an iterative process in which the correction terms of the multiplication are calculated immediately after the product, which avoids the comparison of the sum of the mantissas with 1. It requires fewer logic resources and increases the speed of the multiplier with error correction circuits [16]. In 2013, Agrawal et al. proposed a similar approach but gave an ASIC implementation of an iterative pipelined architecture [17]. We found that the Mitchell’s algorithm based iterative logarithmic multiplier suffers from poor hardware efficiency; the challenge now is to make it efficient [16]-[17].

III. PROPOSED METHOD

The existing MA based iterative algorithm for logarithmic multiplication does not provide a trade-off between accuracy, hardware efficiency, and speed. We propose a hardware-efficient VLSI architecture of the Mitchell’s algorithm based iterative logarithmic multiplier using the seamless pipelining technique to further improve the trade-off among these parameters.

A. Algorithm for iterative logarithmic multiplier

In this section, we present the mathematical analysis of the iterative algorithm, which makes it possible to achieve an exact result.
Consider two n-bit numbers N1 and N2 of the form
$$N_1 = x_{n-1} x_{n-2} \ldots x_2 x_1 x_0 \tag{1}$$
and
$$N_2 = y_{n-1} y_{n-2} \ldots y_2 y_1 y_0 \tag{2}$$
Each operand can be written as $N_i = 2^{k_i}(1 + x_i)$, where $k_i$ is the position of the leading one bit (the characteristic) and $0 \le x_i < 1$ is the mantissa; from here on, $x_1$ and $x_2$ denote the mantissas of N1 and N2, not individual bits.
The final MA approximation for the multiplication depends on the carry bit from the sum of the mantissas and is given by:
$$P_{MA} = 2^{k_1 + k_2}(1 + x_1 + x_2), \quad x_1 + x_2 < 1 \tag{3}$$
$$P_{MA} = 2^{1 + k_1 + k_2}(x_1 + x_2), \quad x_1 + x_2 \ge 1 \tag{4}$$
The true product of the two numbers N1 and N2 is expressed as:
$$P_{true} = N_1 \cdot N_2 = 2^{k_1 + k_2}(1 + x_1 + x_2) + 2^{k_1 + k_2}(x_1 x_2) \tag{5}$$
To avoid the approximation error, we take
$$x \cdot 2^{k} = N - 2^{k} \tag{6}$$
Substituting equation (6) into equation (5), we find
$$P_{true} = N_1 \cdot N_2 = 2^{k_1 + k_2} + (N_1 - 2^{k_1})2^{k_2} + (N_2 - 2^{k_2})2^{k_1} + (N_1 - 2^{k_1})(N_2 - 2^{k_2}) \tag{7}$$
Suppose that
$$P^{(0)}_{approx} = 2^{k_1 + k_2} + (N_1 - 2^{k_1})2^{k_2} + (N_2 - 2^{k_2})2^{k_1} \tag{8}$$


is the first approximation of the product, and $P_{true}$ is
$$P_{true} = P^{(0)}_{approx} + (N_1 - 2^{k_1})(N_2 - 2^{k_2}) \tag{9}$$
The absolute error after the first approximation is
$$E^{(0)} = P_{true} - P^{(0)}_{approx} = (N_1 - 2^{k_1})(N_2 - 2^{k_2}) \tag{10}$$
If we repeat the multiplication procedure for n correction terms, we can approximate the product as:
$$P^{(n)}_{approx} = P^{(0)}_{approx} + \sum_{j=1}^{n} C^{(j)} \tag{11}$$
where $C^{(j)}$ is the j-th correction term.
Equations (1) to (11) show the complete mathematical analysis of the iterative algorithm [16].

B. Seamless pipelined technique

Pipelining is performed by inserting registers in the combinational sections of the data path to accelerate digital signal processing (DSP) applications. However, it is hard to decide on the proper locations where pipeline registers should be inserted to derive the desired advantage. Architecture-level pipelining, which injects registers in between adjacent arithmetic circuits, does not achieve a meaningful reduction in the critical path, so the advantage of pipelining is usually missed in DSP circuits. In the conventional approach, pipelining at the architecture level and pipelining within the arithmetic circuits are considered separately [18]. To derive greater advantage from pipelining, the combinational logic needs to be partitioned seamlessly, without restricting the locations of pipeline registers to only in between arithmetic circuits. We propose a seamless pipelining approach where the delay registers are placed across the signal path of a unified network of combinational components, without restricting them to be placed in between two arithmetic components. It differs from conventional fine-grained pipelining, which requires additional registers to be placed within the arithmetic circuits. The proposed pipelining approach takes a seamless view of the combinational sections and decides on the appropriate locations where delay registers should be placed to reduce the critical path most effectively. For effective seamless pipelining, we obtain a precise estimate of the propagation delay of composite circuits to find the pipeline register locations that minimize the critical path. A conceptual diagram of seamless pipelining of an 8-bit ripple carry adder circuit is shown in Fig. 3.

[Figure: 8-bit ripple carry adder with x and y input bits, a chain of seven full adders (F.A) and one half adder (H.A) linked by carries c0 to c7, pipeline latches L1 and L2, and sum outputs s7 to s0.]
Fig. 3. Block diagram of seamless pipelining of 8 bit ripple carry adder circuit

C. Functional Architecture of MA based iterative logarithmic multiplier using seamless pipelined architecture

The generalized diagram of the proposed iterative logarithmic multiplier using seamless pipelining is presented in Fig. 4. The MA-based iterative logarithmic multiplier using seamless pipelined architecture consists of six major blocks: leading one detector (LOD), priority encoder, barrel shifter (left), adder circuit with seamless pipelined register, decoder, and register. In Fig. 4, PIPO stands for parallel-in parallel-out register, which is used here for general pipelining; RCA with S.P stands for a ripple carry adder with seamless pipelining; and B.S.L stands for barrel shifter (left).

[Figure: Input 1 and Input 2 each pass through a LOD and a Priority Encoder; PIPO registers, barrel shifters (B.S.L), a Decoder, and ripple carry adders with seamless pipelining (RCA with S.P) feed a final RCA that outputs P(0) approx.]
Fig. 4. Generalized diagram of proposed seamless pipelined iterative logarithmic multiplier

The functionality of the various blocks is as follows: the LOD is simple to design and gives a result that marks the position of the leading one (the most significant set bit), the priority encoder encodes the LOD output as a number, and the barrel shifter shifts a number to the left. Seamless pipelining is a newly proposed pipelining technique; it reduces the delay of the architecture without any increase in area [18]. Finally, we remove unnecessary pipeline registers. In the seamless pipelined implementation of the basic block, the residues are available after the first stage, so the correction circuit can start to work immediately after the first stage of the prior block is finished [18].

[Figure: block diagram of the multiplier with one error correction circuit; no further detail recoverable.]
Fig. 5. Block diagram of the proposed seamless pipelined multiplier with one error correction circuit

The seamless pipelined multiplier with one correction circuit is presented in Fig. 5.
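The iterative approximation of Eqs. (8) to (11) can be sketched in Python as follows. This is a behavioural sketch only, under the assumption that each correction term is obtained by feeding the residues N1 − 2^k1 and N2 − 2^k2 back into the same basic block, as in [16]. The function names are hypothetical, the LOD and priority encoder are modelled by `int.bit_length`, the barrel shifters by `<<`, and the pipeline registers are not modelled.

```python
def leading_one(n: int) -> int:
    """Position k of the most significant set bit of n (n > 0).

    Stands in for the LOD followed by the priority encoder.
    """
    return n.bit_length() - 1


def basic_block(n1: int, n2: int):
    """Evaluate Eq. (8) and return (approximation, residue1, residue2)."""
    k1, k2 = leading_one(n1), leading_one(n2)
    r1 = n1 - (1 << k1)  # N1 - 2^k1
    r2 = n2 - (1 << k2)  # N2 - 2^k2
    # 2^(k1+k2) + (N1 - 2^k1) * 2^k2 + (N2 - 2^k2) * 2^k1
    approx = (1 << (k1 + k2)) + (r1 << k2) + (r2 << k1)
    return approx, r1, r2


def iterative_log_multiply(n1: int, n2: int, corrections: int = 0) -> int:
    """Eq. (11): first approximation plus up to `corrections` correction terms."""
    if n1 == 0 or n2 == 0:
        return 0
    product, r1, r2 = basic_block(n1, n2)
    for _ in range(corrections):
        if r1 == 0 or r2 == 0:
            break  # the error term (Eq. 10) is zero: the result is exact
        term, r1, r2 = basic_block(r1, r2)
        product += term
    return product


if __name__ == "__main__":
    print(iterative_log_multiply(20, 60))     # first approximation only
    print(iterative_log_multiply(20, 60, 1))  # one correction term
```

For 20 × 60 the first approximation is 1088 and the error term of Eq. (10) is 4 × 28 = 112; one correction term already recovers the exact product 1200 because one residue becomes zero.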


IV. RESULTS AND DISCUSSION

The proposed structure takes N-bit input samples and produces a 2N-bit output sample, where N is the number of bits of each input. The proposed structure for logarithmic multiplication involves a LOD, XOR gate, priority encoder, barrel shifter (left), RCA circuit with seamless pipelining, decoder circuit, and PIPO register.

A. Hardware complexity and Synthesis results

The aim of this work is to use the seamless pipelining technique to construct an MA based iterative logarithmic multiplier architecture that is hardware-efficient. We concentrate on minimizing the hardware at equal accuracy. To this end, we use seamless pipelined registers in place of simple pipelining and remove additional pipeline registers wherever they are not required.
The theoretical analysis of area complexity, in terms of gate counts, for the proposed and reported 8-bit structures is listed in Table 1. The proposed iterative logarithmic multiplier structure involves 74 fewer NOT gates, 74 fewer OR gates, 148 fewer AND gates, and the same number of XOR/XNOR gates compared with the existing structure.

Table 1. General comparison of hardware complexities

Structure      Existing [17]   Proposed
NOT            205             131
OR             234             160
AND            493             345
EXOR/EXNOR     81              81

We have synthesized the proposed iterative logarithmic multiplier using seamless pipelined architecture at the 90 nm CMOS technology node and evaluated its performance for 8-, 16-, and 32-bit data inputs. We have also implemented and synthesized the 8-, 16-, and 32-bit designs of the ASIC-based logarithmic multiplier using iterative pipelined architecture [17]. The area, power, and timing of the reported ASIC-based logarithmic multiplier using iterative pipelined architecture [17] are compared with those of the Mitchell’s algorithm based iterative logarithmic multiplier with the seamless pipelining technique and without the error correction circuit, as listed in Table 2. The ADP and EPS for the proposed and reported structures [17] are also listed in Table 2. As shown in Table 2, the proposed Mitchell’s algorithm based iterative logarithmic multiplier using the seamless pipelining technique without the error correction circuit involves 8.21 %, 18.74 %, and 12.68 % less DAT; 24.82 %, 15.21 %, and 9.34 % less area; 30.99 %, 31.10 %, and 20.84 % less ADP; and 5.12 %, 15.48 %, and 23.55 % less EPS for the 8-bit, 16-bit, and 32-bit architectures, respectively, compared with the reported ASIC based logarithmic multiplier using iterative pipelined architecture [17].

V. CONCLUSION

Seamless pipelining can reduce the critical path better than conventional pipelining techniques. Conventional pipelining reduces DAT by only up to 10 %, whereas seamless pipelining reduces it by up to 20 % at the same hardware cost. Our main motive in designing the hardware architecture of the logarithmic multiplier is to make it efficient for DSP applications where accuracy is not critical. An LNS multiplier requires fewer hardware resources, while a binary multiplier takes a lot of hardware resources. The proposed multiplier is efficient and can be used as an independent multiplier.

Table 2. Synthesis results of proposed Logarithmic Multiplication using seamless pipelined architecture and reported structure [17].

                 Existing [17]                    Proposed                         % Gain
Parameter        N=8       N=16      N=32        N=8       N=16      N=32        N=8        N=16      N=32
DAT (ns)         37.04     73.80     127.80      34.00     59.97     111.60      8.21 %     18.74 %   12.68 %
Area (μm²)       9557.47   22381.69  63586.93    7186.07   18977.67  57648.15    24.82 %    15.21 %   9.34 %
Power (μW)       103.63    263.36    754.80      142.48    223.91    660.89      -37.48 %   14.98 %   12.45 %
ADP (μm²·μs)     354.008   1651.76   8126.40     244.32    1138.09   6433.53     30.99 %    31.10 %   20.84 %
EPS (nJ)         3826.23   19435.96  96463.44    3630.96   16429.13  73755.32    5.12 %     15.48 %   23.55 %
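The derived columns of Table 2 can be cross-checked with a few lines of Python. The sketch assumes that the percentage gain is computed as (existing − proposed) / existing × 100 and that ADP is the product of area and DAT; both formulas are inferred from the tabulated values rather than stated in the text, and the function names are hypothetical.

```python
def percent_gain(existing: float, proposed: float) -> float:
    """Relative saving of the proposed design over the existing one, in percent."""
    return (existing - proposed) / existing * 100.0


def area_delay_product(area_um2: float, dat_ns: float) -> float:
    """ADP in um^2 * us, from area in um^2 and DAT in ns."""
    return area_um2 * dat_ns / 1000.0


# DAT (ns) and area (um^2) copied from Table 2, keyed by operand width N.
existing = {8: (37.04, 9557.47), 16: (73.80, 22381.69), 32: (127.80, 63586.93)}
proposed = {8: (34.00, 7186.07), 16: (59.97, 18977.67), 32: (111.60, 57648.15)}

if __name__ == "__main__":
    for n in (8, 16, 32):
        dat_e, area_e = existing[n]
        dat_p, area_p = proposed[n]
        print(
            f"N={n}: DAT gain {percent_gain(dat_e, dat_p):.2f} %, "
            f"ADP {area_delay_product(area_e, dat_e):.2f} -> "
            f"{area_delay_product(area_p, dat_p):.2f} um^2*us"
        )
```

Small differences in the last digit against the tabulated gains are to be expected, since the table entries are themselves rounded.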


References

[1]. H. Fu, O. Mencer, and W. Luk, “FPGA Designs with Optimized Logarithmic Arithmetic,” IEEE Transactions on Computers, Vol. 59, No. 7, pp. 1000-1006, July 2010.

[2]. K. Johansson, O. Gustafsson, and L. Wanhammar, “Implementation of Elementary Functions for Logarithmic Number Systems,” IET Computers & Digital Techniques, Vol. 2, No. 4, pp. 295-304, April 2008.

[3]. V. Mahalingam and N. Ranganathan, “Improving Accuracy in Mitchell's Logarithmic Multiplication Using Operand Decomposition,” IEEE Transactions on Computers, Vol. 55, No. 12, pp. 1523-1535, December 2006.

[4]. P. Saha, A. Banerjee, A. Dandapat, and P. Bhattacharyya, “High speed multiplier using high accuracy floating point logarithmic number system,” Scientia Iranica Transactions D: Computer Science & Engineering and Electrical Engineering, Vol. 21, No. 3, pp. 826-841, 2014.

[5]. L. K. Yu and D. M. Lewis, “A 30-b Integrated Logarithmic Number System Processor,” IEEE Journal of Solid-State Circuits, Vol. 26, No. 10, pp. 1433-1440, Oct. 1991.

[6]. E. Swartzlander and A. Alexopoulos, “The sign/logarithm number system,” IEEE Transactions on Computers, Vol. C-24, No. 12, pp. 1238-1242, December 1975.

[7]. M. J. Schulte and E. E. Swartzlander, “Hardware Designs for Exactly Rounded Elementary Functions,” IEEE Transactions on Computers, Vol. 43, No. 8, pp. 964-973, Aug. 1994.

[8]. J. N. Mitchell, “Computer Multiplication and Division using Binary Logarithms,” IRE Transactions on Electronic Computers, Vol. 11, No. 6, pp. 512-517, Aug. 1962.

[9]. E. L. Hall, D. D. Lynch, and S. J. Dwyer III, “Generation of Products and Quotients Using Approximate Binary Logarithms for Digital Filtering Applications,” IEEE Transactions on Computers, Vol. 19, No. 2, pp. 97-105, Feb. 1970.

[10]. S. L. SanGregory, R. E. Siferd, C. Brother, and D. Gallagher, “Low-Power Logarithm Approximation with CMOS VLSI Implementation,” Proc. IEEE Midwest Symp. Circuits and Systems, Aug. 1999.

[11]. K. H. Abed and R. E. Siferd, “CMOS VLSI Implementation of a Low-Power Logarithmic Converter,” IEEE Transactions on Computers, Vol. 52, No. 11, pp. 1421-1433, November 2003.

[12]. K. H. Abed and R. E. Siferd, “VLSI Implementation of a Low-Power Antilogarithmic Converter,” IEEE Transactions on Computers, Vol. 52, No. 9, pp. 1221-1228, September 2003.

[13]. T. B. Juang, S. H. Chen, and H. J. Cheng, “A Lower error and ROM-free Logarithmic Converter for Digital Signal Processing Applications,” IEEE Transactions on Circuits and Systems II, Vol. 56, No. 12, pp. 931-935, 2009.

[14]. T.-B. Juang, P. K. Meher, and K. S. Jan, “High-performance Logarithmic Converters Using Novel Two-region Bit-level Manipulation Schemes,” IEEE International Symposium on VLSI Design, Automation and Test, pp. 1-4, 25-28 April 2011.

[15]. Tso-Bing Juang, Han-Lung Kuo, and Kai-Shiang Jan, “Lower-error and area-efficient antilogarithmic converters with bit-correction schemes,” Journal of the Chinese Institute of Engineers, 2015.

[16]. P. Bulic, Z. Babic, and A. Avramovic, “A simple pipelined logarithmic multiplier,” IEEE International Conference on Computer Design (ICCD), pp. 235-240, 2010.

[17]. R. K. Agrawal and H. M. Kittur, “ASIC based logarithmic multiplier using iterative pipelined architecture,” IEEE Conference on Information & Communication Technologies (ICT), pp. 362-366, 2013.

[18]. P. K. Meher, “Seamless pipelining of DSP circuits,” Circuits, Systems, and Signal Processing, 4 July 2015, DOI 10.1007/s00034-015-0089-2.
