

A New CMOS Topology for Low-Voltage Null Convention Logic Gates Design

2014, 2014 20th IEEE International Symposium on Asynchronous Circuits and Systems

Matheus Trevisan Moreira, Michel Evandro Arendt, Ricardo Aquino Guazzelli, Ney Laert Vilar Calazans
GAPH – FACIN – Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, Brazil
{matheus.moreira, ney.calazans}@pucrs.br, {michel.arendt, ricardo.guazzelli}@acad.pucrs.br

Abstract: This paper proposes a new transistor topology to design the gates required by Null Convention Logic for low voltage operation. The new topology enables implementing all functionalities required by this design style. Extensive simulation results obtained in a 65 nm CMOS technology allow comparing the new topology to the popular static and semi-static ones and indicate that the new topology presents better speed, energy and leakage trade-offs for different voltage levels, demonstrating its suitability for low voltage applications. Drawbacks are an area overhead of 4 minimum size transistors and reduced robustness against soft errors, when operating at non-minimum voltages.

Keywords: Null Convention Logic, static, low-power.

I. INTRODUCTION

Asynchronous or clockless circuit design can cope better with inherent problems of current technologies that make synchronous design overconstrained, like susceptibility to PVT variations and excessive power dissipation in clock trees. Although asynchronous circuits can be implemented using many different templates, the quasi-delay-insensitive (QDI) [1] template is attractive for several reasons, but especially because it allows wire and gate delays to be ignored, given that the isochronic fork [2] delay assumption is respected. This allows QDI circuits to better accommodate delay discrepancies caused by PVT variations, making them more robust than circuits based on templates that rely on complex timing assumptions. Also, with QDI, design complexity can be considerably reduced, facilitating timing closure and analysis.
The definition of a specific QDI template requires the choice of a handshake protocol and a delay-insensitive (DI) data encoding. According to Martin and Nyström [3], the 4-phase handshaking protocol coupled to either 1-of-2 or 1-of-4 codes comprises almost the entirety of options in practical QDI design. For such templates, various logic styles support sequential and combinational logic implementations. Of the styles proposed to date, the Null Convention Logic (NCL) [4] is one that enables power-, area- and speed-efficient design [5]-[11] based on standard cells. In fact, NCL has been successfully employed in the design of many fabricated chips, as discussed in [12]. Also, several design flows have been proposed to date to help automate NCL design, as described in [12]-[15].

In this paper we propose a new CMOS topology for implementing the gates required by NCL design at the standard-cell level. The topology is compared to the classic static and semi-static topologies in the following aspects:
• Post-layout extraction input capacitance and internal parasitics;
• Area, speed and power tradeoffs;
• Vulnerability to PVT variations;
• Applicability to voltage scaling applications;
• Robustness against single event effects.

For the sake of comparison, 27 case study gates were designed down to the layout level, targeting a 65nm bulk CMOS technology from STMicroelectronics. These gates employed 3 different NCL functionalities implemented using 3 different topologies (static, semi-static and the proposed one) for 3 different driving strengths. Results indicate that the proposed topology is suited for voltage scaling applications and presents better speed, energy and power tradeoffs in exchange for an increased silicon area cost. This cost is fixed: 4 minimum sized transistors per gate. Also, the proposed topology is generally more robust against PVT variations and presents lower input and parasitic capacitances.

II. NULL CONVENTION LOGIC

Theseus Logic, Inc.
[4] proposed NCL, a logic that has been employed to implement QDI asynchronous systems on silicon. NCL is an alternative to other design styles like delay insensitive minterm synthesis (DIMS) [16] and was applied to cope with power problems [5]-[7], to design high speed circuits [8] [9] and to fault tolerant schemes [10] [11], as well as other applications such as ternary logic [17]. One of its advantages is to enable power-, area- and speed-efficient QDI design with a standard-cell-based approach, while other asynchronous templates require recourse to full-custom approaches.

NCL gates are sometimes called threshold gates, but this is imprecise. In fact, NCL gates couple a threshold function [18] with positive integer weights assigned to inputs to the use of a hysteresis mechanism. This is required to support DI circuit design using 1-of-n data encoding. Figure 1 shows a generic NCL gate symbol.

Figure 1 – Basic NCL gate symbol (inputs I1 to IN, label MWw1..wn, output Q).

In NCL gates N is the number of gate inputs, M is the gate threshold or a threshold function, and each input has a weight (wi). Wherever no weight is specified, weight 1 is assumed. Weights come after the W specifier. The output switches to 0 when all N inputs are 0 and to 1 when the sum of weights for inputs at 1 reaches threshold M or satisfies the threshold function. Otherwise, the previous output value is maintained.

There are different manners of implementing NCL gates as standard-cells. Some rely on the usage of differential logic, as discussed in [19] and [20]. However, this requires structural modifications in the design and is not compatible with state-of-the-art methods and tools for NCL design automation. Others rely on the usage of multi-threshold CMOS technologies, as discussed in [21]. However, these also require the availability of specific technologies and are not directly compatible with methods and tools proposed to date. Therefore we do not consider differential or multi-threshold topologies in this work. We also discard the topology proposed in [22] as it does not present a general schematic and requires gate specific designs. In fact, this work considers the static and semi-static topologies [23], which are most referred in literature and have been successfully fabricated on silicon. Note that we also discarded dynamic topologies [23] as these are not suited to QDI design.

In the classic semi-static topology, presented in Figure 2 (a), the reset and set functions correspond to pull-up (RESET) and pull-down (SET) networks, followed by an output inverter. RESET detects when all inputs are 0, corresponding to a series of N PMOS transistors. Note that the output Q is then inverted by the output inverter composed by P0 and N0. SET depends on the gate threshold M. Also, to ensure delay insensitivity, the gate keeps its output value when neither RESET nor SET functions are true. Since these are not complementary, the static gate requires a feedback inverter, composed by P1 and N1. Note that this inverter is typically minimum size, as it is not required for output switching, being used just for keeping the output stable (a memory effect). In fact, using a minimum size inverter allows reducing interference during gate switching.

The static topology, presented in Figure 2 (b), is similar to the semi-static one. However, its feedback inverter is controlled by pull-up (HOLD0) and pull-down (HOLD1) networks. The former is the complement network of SET and the latter is the complement network of RESET. This prevents the feedback inverter from interfering during output switching. Note that for both static and semi-static topologies, the output inverter transistors are sized to be able to drive the output load. Transistors of the RESET and SET functions are dimensioned to drive the equivalent input capacitance of the output inverter. All remaining transistors can be minimum size.

It is common knowledge that static topologies are typically more power efficient, while semi-static ones are more area efficient [24] [25]. Also, according to Moreira and Calazans [26] and Bastos et al. [27], the static topology is more suited to voltage scaling applications, while the semi-static one is more robust to single event effects. These works explored static and semi-static C-Element topologies that are compatible with NCL gates, since basic C-Elements are the special case of NCL gates where threshold M is equal to the number of inputs N, with input weights all equal to 1.

Figure 2 – NCL topologies: (a) semi-static; (b) static; (c) proposed.

III. THE PROPOSED TOPOLOGY

This work proposes the alternative NCL topology displayed in Figure 2 (c). In it, output Q switches to 1 or 0, according to the set and reset transistors P0 and N0, respectively. Dimensioning of these follows a process similar to that of the output inverter for the topologies previously mentioned. The same RESET and SET networks control these transistors. However, the latter are respectively connected to their complements HOLD1 and HOLD0 networks. Accordingly, the P0 and N0 driving transistors are both turned off by a hold state input combination. This state uses transistors P1 to P3 and N1 to N3 to maintain the output stable. All these are minimum sized and the mechanism employs a loop of two inverters (P1-N1 and P3-N3), where the driving inverter (P3-N3) is controlled by HOLD1/RESET or by HOLD0/SET (through P2 and N2). Note that HOLD0 and HOLD1 transistors are also minimum size and transistors used for SET and RESET are dimensioned for driving P0 and N0 input capacitance loads. Also, since C-elements are special NCL gates the new scheme is also a new topology for C-elements.

The new scheme guarantees that P0 and N0 will never be on simultaneously, since the RESET network is on only when all inputs are 0, while SET requires that at least one input be 1. Practical SET functions require at least 2 inputs to be 1. This renders the proposed topology even more interesting, because momentary shorts present in the output inverter of the static topology are avoided. In fact, before making either P0 or N0 on, both transistors are always turned off, because as soon as SET and RESET are off, their complements HOLD0 or HOLD1 will be on, and a minimum of two inputs are required to switch from an on SET network to an on RESET network and vice versa. Also, because input capacitances of P0 and N0 are decoupled, contrary to what happens in the classical static and semi-static topologies, SET and RESET networks need to drive smaller capacitances to switch the output. In this way, better power and delay tradeoffs are expected as characteristics of the new topology.

IV. EXPERIMENTS AND DISCUSSION

A. Gates Design

In order to compare the discussed topologies, 3 different NCL functionalities were implemented down to the layout level for 3 different driving strengths (X2, X7 and X13) for the static, semi-static and proposed topologies. This produced a total of 3*3*3=27 case study gates. The target technology was the STMicroelectronics 65nm bulk CMOS and gate design followed the flow proposed by the authors in [28]. This flow relies on the use of specially designed design automation tools as well as on the Cadence Framework and Mentor Calibre for layout verification and extraction. Selected functionalities were¹: 2-of-2 (M=2, N=2 and weights=1 1); AO2-of-4 (M=AndOr, N=4 and weights=1 1 1 1); 3W22-of-4 (M=3, N=4 and weights=2 2 1 1). This choice allows evaluating trade-offs of topologies for low (2-of-2), medium (AO2-of-4) and high (3W22-of-4) complexity gates. Also, in the authors' experience these functionalities are widely used in NCL design.
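As a behavioural illustration of the weighted-threshold gates selected here, the following Python sketch models an NCL gate as a threshold function with hysteresis, following the description in Section II. The function name `ncl_gate` and its argument names are illustrative assumptions and do not come from the paper or from any NCL tool. Only the weighted-threshold cases (2-of-2 and 3W22-of-4) are covered, because the AndOr function of AO2-of-4 is not spelled out in the text.

```python
from typing import Sequence


def ncl_gate(inputs: Sequence[int], weights: Sequence[int],
             threshold: int, prev_q: int) -> int:
    """Behavioural model of a weighted-threshold NCL gate (hypothetical helper)."""
    if len(inputs) != len(weights):
        raise ValueError("one weight per input is required")
    # RESET condition: every input is 0, so the output returns to 0.
    if not any(inputs):
        return 0
    # SET condition: the weights of the inputs at 1 reach the threshold M.
    if sum(w for x, w in zip(inputs, weights) if x) >= threshold:
        return 1
    # Hysteresis: neither SET nor RESET holds, so the previous output is kept.
    return prev_q


# 2-of-2 (a C-element): M=2, weights 1 1.
q = ncl_gate([1, 1], [1, 1], 2, prev_q=0)  # SET drives the output to 1
q = ncl_gate([1, 0], [1, 1], 2, prev_q=q)  # hold state keeps the 1
print(q)

# 3W22-of-4: M=3, weights 2 2 1 1; inputs 1 and 3 at 1 give a weight sum of 3.
print(ncl_gate([1, 0, 1, 0], [2, 2, 1, 1], 3, prev_q=0))
```

The hold branch is what distinguishes an NCL gate from a plain threshold gate: with a single input of a 2-of-2 gate at 1, the result depends only on the previous output.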
After layout extraction, analog simulation allows evaluating the case study gates. All results presented herein are based on worst case RC parasitics extraction.

B. Area, Power and Speed

Table I presents a general comparison of five relevant parameters for all 27 cells: area, input capacitance, parasitic capacitance, speed to energy efficiency and speed to leakage efficiency. Each column of parameter values is followed by a column that expresses the same values as percentages of the value of that parameter in the new topology.

Regarding area, it is clear that for small driving strengths, the proposed topology displays substantial area overhead. In fact, for a 2-of-2 gate of drive X2, it requires 2 times the area required by a semi-static topology. However, when compared to the static implementation its worst case overhead is 40%, in the X2 and X7 2-of-2 gates. Note that the static topology requires 71.4% of the area of the new topology in this case. This area overhead for small driving strengths is a consequence of the fixed overhead of four transistors. In fact, as the driving strength is increased, the overhead decreases. Besides, as gate complexity grows the cost is also amortized. This can be noticed by analyzing the area results for 3W22-of-4 and AO2-of-4 gates. For the latter, the area required by the static topology is never less than 90% of the area of the new topology. For the semi-static topology, the area is always equal to or above 94.1% of the new topology. Note that for AO2-of-4 gates the semi-static topology can present larger area than the static one. This is due to the big transistors required by the former, folded in layout, which increases area.

Regarding the input capacitance data, Table I presents results obtained after layout extraction. Values correspond to the worst case among capacitances for all inputs in each cell.
In all cases, the semi-static topology presents an input capacitance lower or equal to the proposed one. This is justified by the reduced number of transistors that have their gates connected to its gate inputs, due to the simplicity of the topology. On the other hand, the static topology presents an input capacitance which is always bigger than the new topology for the more complex gates (3W22-of-4 and AO2-of-4). In the worst case, it presents an overhead of almost 30%. For the 2-of-2 case study, the input capacitance is quite similar to that obtained for the proposed topology. The justification for this is that the new topology typically requires smaller transistors in the SET and RESET networks, leading to reduced gate capacitance. This indicates that the new topology is suited for standard-cell based design, as input capacitance is crucial in technology mapping and optimization steps. In fact, smaller capacitances enable better speed, energy and power tradeoffs at the circuit level.

¹ The authors consider the classical NCL gate terminology rather confusing, and do not employ it here. In the classical terminology, 2-of-2 is TH22, 3W22-of-4 is TH34W22 and AO2-of-4 is TH44 (AND case).

As for parasitics, Table I shows that the semi-static topology presents lower parasitics than the proposed topology in most cases. This is justified by the considerably smaller number of transistors required by the former. Results also indicate that compared to the static topology the proposed topology presents lower parasitics, except for the lowest complexity gate (2-of-2). This indicates the suitability of the new topology to produce complex NCL gates.

Speed, energy and leakage power figures were obtained during analog simulation of the extracted circuits using Cadence Spectre simulation and “.measure” commands of SPICE language. Simulations employed typical fabrication process and operating conditions (1V and 25C).
Also, during simulation, all gates had a fixed output load equivalent to four inverters with the same drive strength of the gate (FO4). This allows a fair and realistic comparison, as this metric is classically employed in digital circuit design. For each case study gate, all transition arcs for each input/output pair were simulated and, for each arc, propagation delay was measured as the time a transition in an input takes to cause an output switching. The energy consumed for switching the output was also measured for each arc. The energy was measured as the integral of the current in the power supply during the time the cell is switching multiplied by the operating voltage. Also, we simulated all static states of each case study gate and we measured leakage power as the average current in the power source during each state multiplied by the operating voltage.

From these results we obtained the number of times the cell is capable of switching its output per second, measured in giga transitions per second (GTPS). The GTPS value considers the average of all obtained propagation delays for each case study. Energy per transition (EPT) was measured as the average energy consumed for switching the output of each gate. Since each topology can present varying propagation delay, energy and leakage power figures, even for the same driving strength, a fair comparison is not possible by analyzing just these figures. Accordingly the expression of results employs cost-benefit functions.

The tenth column of Table I presents the GTPS/EPT results. The ratio between the measured GTPS and EPT values defines the speed-energy efficiency function. This enables evaluating the speed of the gates without overlooking the associated energy consumption. The proposed topology presents bigger GTPS/EPT figures in all cases but for the 2-of-2 gate with a driving strength of X2, where its results are very close to the ones obtained for the static topology.
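The measurement definitions above translate into a few lines of arithmetic. The sketch below is only an illustration of those definitions: the function names, the trapezoid-rule integration and the choice of nanoseconds for delays are assumptions made here, not details taken from the authors' simulation scripts, and the sample numbers are made up.

```python
import math
from statistics import mean


def switching_energy(times_s, currents_a, vdd_v):
    """Integral of the supply current over the switching window, times VDD.

    Uses the trapezoid rule over sampled (time, current) points (an assumption).
    """
    charge = 0.0
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        charge += 0.5 * (currents_a[k] + currents_a[k - 1]) * dt
    return charge * vdd_v


def leakage_power(avg_current_a, vdd_v):
    """Leakage power of one static state: average supply current times VDD."""
    return avg_current_a * vdd_v


def gtps(delays_ns):
    """Giga transitions per second, from the average propagation delay in ns."""
    return 1.0 / mean(delays_ns)


def speed_energy_efficiency(delays_ns, energies):
    """GTPS/EPT: GTPS divided by the average energy per transition."""
    return gtps(delays_ns) / mean(energies)


# Made-up numbers for illustration only (not taken from Table I).
arc_delays_ns = [0.1, 0.3]
arc_energies = [2.0, 3.0]
print(gtps(arc_delays_ns))
print(speed_energy_efficiency(arc_delays_ns, arc_energies))
```

Because GTPS is built from the average delay over all arcs, a single slow arc lowers the figure for the whole gate, which is consistent with using it as a per-gate cost function.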
These GTPS/EPT results mean that, in general, the new topology is capable of making more transitions than the static and semi-static topologies for a given amount of consumed energy.

Table I – Area, input and parasitic capacitances, speed-energy and speed-leakage tradeoffs for the case study gates. The "% of Proposed" columns are relative comparisons between the proposed topology and the others.

| NCL Function | Drive | Topology | Area (µm²) | % of Proposed | Input Cap. (fF) | % of Proposed | Parasitic Cap. (fF) | % of Proposed | GTPS / EPT | % of Proposed | GTPS / Avg. Leakage | % of Proposed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2-of-2 | X2 | Proposed | 7.28 | 100.0 | 0.61 | 100.0 | 3.760 | 100.0 | 5.74 | 100.0 | 0.12 | 100.0 |
| 2-of-2 | X2 | Static | 5.2 | 71.4 | 0.6 | 98.4 | 3.084 | 82.0 | 5.87 | 102.3 | 0.13 | 108.3 |
| 2-of-2 | X2 | Semi Static | 3.64 | 50.0 | 0.36 | 59.0 | 1.766 | 47.0 | 3.35 | 58.4 | 0.07 | 58.3 |
| 2-of-2 | X7 | Proposed | 7.28 | 100.0 | 0.62 | 100.0 | 3.662 | 100.0 | 3.82 | 100.0 | 0.08 | 100.0 |
| 2-of-2 | X7 | Static | 5.2 | 71.4 | 0.66 | 106.5 | 3.40 | 92.8 | 2.86 | 74.9 | 0.06 | 75.0 |
| 2-of-2 | X7 | Semi Static | 3.64 | 50.0 | 0.36 | 58.1 | 1.966 | 53.7 | 2.46 | 64.4 | 0.05 | 62.5 |
| 2-of-2 | X13 | Proposed | 8.32 | 100.0 | 0.78 | 100.0 | 4.98 | 100.0 | 2.19 | 100.0 | 0.05 | 100.0 |
| 2-of-2 | X13 | Static | 6.76 | 81.3 | 0.69 | 88.5 | 3.821 | 76.7 | 1.52 | 69.4 | 0.03 | 60.0 |
| 2-of-2 | X13 | Semi Static | 5.72 | 68.8 | 0.78 | 100.0 | 4.32 | 86.7 | 1.64 | 74.9 | 0.04 | 80.0 |
| 3W22-of-4 | X2 | Proposed | 11.44 | 100.0 | 1.2 | 100.0 | 7.190 | 100.0 | 3.36 | 100.0 | 0.31 | 100.0 |
| 3W22-of-4 | X2 | Static | 9.36 | 81.8 | 1.34 | 111.7 | 7.986 | 111.1 | 3.33 | 99.1 | 0.47 | 151.6 |
| 3W22-of-4 | X2 | Semi Static | 8.84 | 77.3 | 0.77 | 64.2 | 5.77 | 80.3 | 1.58 | 47.0 | 0.27 | 87.1 |
| 3W22-of-4 | X7 | Proposed | 11.44 | 100.0 | 1.18 | 100.0 | 7.723 | 100.0 | 2.30 | 100.0 | 0.25 | 100.0 |
| 3W22-of-4 | X7 | Static | 9.36 | 81.8 | 1.53 | 129.7 | 8.44 | 109.3 | 1.80 | 78.3 | 0.28 | 112.0 |
| 3W22-of-4 | X7 | Semi Static | 8.84 | 77.3 | 0.77 | 65.3 | 6 | 77.7 | 1.32 | 57.4 | 0.21 | 84.0 |
| 3W22-of-4 | X13 | Proposed | 13 | 100.0 | 0.93 | 100.0 | 8.657 | 100.0 | 1.36 | 100.0 | 0.17 | 100.0 |
| 3W22-of-4 | X13 | Static | 10.92 | 84.0 | 1.2 | 129.0 | 8.7 | 100.5 | 0.99 | 72.8 | 0.16 | 94.1 |
| 3W22-of-4 | X13 | Semi Static | 9.36 | 72.0 | 0.81 | 87.1 | 6.544 | 75.6 | 0.88 | 64.7 | 0.14 | 82.4 |
| AO2-of-4 | X2 | Proposed | 8.84 | 100.0 | 0.77 | 100.0 | 5.754 | 100.0 | 3.43 | 100.0 | 0.35 | 100.0 |
| AO2-of-4 | X2 | Static | 8.32 | 94.1 | 0.94 | 122.1 | 6.344 | 110.3 | 3.21 | 93.6 | 0.49 | 140.0 |
| AO2-of-4 | X2 | Semi Static | 8.84 | 100.0 | 0.64 | 83.1 | 4.83 | 83.9 | 1.37 | 39.9 | 0.29 | 82.9 |
| AO2-of-4 | X7 | Proposed | 9.36 | 100.0 | 0.77 | 100.0 | 5.938 | 100.0 | 2.56 | 100.0 | 0.30 | 100.0 |
| AO2-of-4 | X7 | Static | 8.84 | 94.4 | 0.91 | 118.2 | 6.24 | 105.1 | 1.68 | 65.6 | 0.28 | 93.3 |
| AO2-of-4 | X7 | Semi Static | 8.84 | 94.4 | 0.64 | 83.1 | 4.83 | 81.3 | 1.24 | 48.4 | 0.23 | 76.7 |
| AO2-of-4 | X13 | Proposed | 10.4 | 100.0 | 0.76 | 100.0 | 6.73 | 100.0 | 1.61 | 100.0 | 0.20 | 100.0 |
| AO2-of-4 | X13 | Static | 9.36 | 90.0 | 0.97 | 127.6 | 6.938 | 103.1 | 0.94 | 58.4 | 0.16 | 80.0 |
| AO2-of-4 | X13 | Semi Static | 9.88 | 95.0 | 0.67 | 88.2 | 7.27 | 108.0 | 0.89 | 55.3 | 0.16 | 80.0 |

Note that the bigger the driving strength is the bigger is the gain in delay-energy efficiency of the proposed topology. In the best case, an improvement of over 150% is observed when compared to the static topology. E.g. for the AO2-of-4 gate, the static topology has a value of GTPS/EPT that is just 39.9% of the new topology value. This is justified by the fact that the new topology has typically lower capacitances to drive when switching the output. Note that in this topology (refer to Figure 2(c)) the gate capacitances of the PMOS and NMOS transistors of the output inverter (P0 and N0) are decoupled. In this way, the proposed topology is suited for energy efficient applications.
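The "% of Proposed" columns of Table I are plain ratios against the proposed topology. As a small worked check, the sketch below recomputes them for the area of the 2-of-2 gate at drive X2 (7.28, 5.2 and 3.64 µm², taken from Table I). The helper name `percent_of_proposed` is an illustrative assumption.

```python
def percent_of_proposed(value: float, proposed: float) -> float:
    """Express a parameter value as a percentage of the proposed-topology value."""
    return round(100.0 * value / proposed, 1)


# Area (µm²) of the 2-of-2 gate at drive X2, as listed in Table I.
area_2of2_x2 = {"Proposed": 7.28, "Static": 5.2, "Semi Static": 3.64}

for topology, area in area_2of2_x2.items():
    print(topology, percent_of_proposed(area, area_2of2_x2["Proposed"]))

# Overhead of the proposed topology relative to the static one, in percent.
overhead_vs_static = round(
    100.0 * (area_2of2_x2["Proposed"] / area_2of2_x2["Static"] - 1.0), 1)
print(overhead_vs_static)
```

The two views are equivalent: a static gate at 71.4% of the proposed area corresponds to a 40% overhead of the proposed gate over the static one, and 50.0% corresponds to the factor of 2 quoted for the semi-static gate.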
Also, the proposed topology is well suited for standard-cell-based design, as its speed-energy efficiency does not decay as in the static and semi-static topologies as driving strength increases. This allows design optimizations to take place without compromising low power operation.

Table I also shows speed-leakage efficiency results. GTPS/Average Leakage was measured as the obtained GTPS for each gate divided by the average leakage power. This creates a cost function to evaluate speed-leakage tradeoffs and enables analyzing the speed of the gates without overlooking leakage power. In this case, improvements were not as substantial as in delay-energy tradeoffs. In fact, for X2 gates, our topology presented worse GTPS/Average Leakage tradeoffs in all cases. However, as the driving strength increases, our topology proves to be more efficient in terms of the speed-leakage power tradeoff. In the best case, improvements were of more than 70%. This corroborates the previous results that indicate the suitability of the topology for cell-based design.

C. Process, Voltage and Temperature Variations

Another desirable feature for contemporary integrated circuits is increased tolerance to process, voltage and temperature (PVT) variations. In fact, process variations are a critical problem in current technologies. A second set of experiments measured the variability in the observed performance figures for the case study gates while varying operating voltage, temperature and process fabrication parameters. Note that, because leakage variations were not significant and there were few discrepancies between the analyzed gates, the paper presents only results obtained for propagation delay and energy per transition.

Firstly, we submitted the gates to variations in operating voltage and temperature. We simulated these for a range from 90% to 110% of the nominal values (1V and 25C) in steps of 1%, combining all possible values.
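The voltage and temperature sweep just described can be enumerated as a Cartesian product. The sketch below is a minimal illustration; the constant and function names are assumptions, and each resulting pair would correspond to one analog simulation scenario per gate.

```python
from itertools import product

VDD_NOM_V = 1.0    # nominal supply voltage stated in the text
TEMP_NOM_C = 25.0  # nominal temperature stated in the text


def voltage_temperature_scenarios():
    """All (voltage, temperature) pairs from 90% to 110% of nominal in 1% steps."""
    percents = range(90, 111)  # 90, 91, ..., 110
    voltages = [VDD_NOM_V * p / 100 for p in percents]
    temperatures = [TEMP_NOM_C * p / 100 for p in percents]
    return list(product(voltages, temperatures))


scenarios = voltage_temperature_scenarios()
print(len(scenarios))
print(scenarios[0], scenarios[-1])
```

Stepping through integer percentages rather than accumulating a floating-point increment keeps the number of points per axis exact.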
This means 21 distinct voltages and temperatures that combined lead to 21*21=441 simulation scenarios for each gate. Figure 3 shows the variations observed in measured energy per transition. In the Figure, case study gates are distributed along the horizontal axis. A PR prefix indicates the gate employs the proposed topology; an ST prefix indicates it employs the static topology and an SS prefix indicates it employs the semi-static topology. Also, 22 stands for a 2-of-2 gate, 3W22 is a 3W22-of-4 gate and AO24 is an AO2-of-4 gate. Gate names were abbreviated to allow a more compact representation in the charts. In these, variations can be either positive or negative and they are measured from the base value obtained for nominal voltage and temperature, represented by the 0 value in the vertical axis in the charts. Note that in most cases the proposed topology presents smaller amplitude in energy per transition variations, when compared to the other topologies. Also, as driving strength is increased, susceptibility to voltage and temperature variations is worsened in all topologies. However, for the proposed topology, the increase is not as substantial as for the static and semi-static topologies.

Figure 3 – Energy variation distribution for varying operating voltage and temperature.²

Concerning variations in propagation delay, Figure 4 shows that all topologies present similar susceptibility to voltage and temperature variations. However, for the biggest simulated driving strength, the proposed topology always presents smaller variations. In this way, charts of Figure 3 and Figure 4 confirm the suitability of the proposed topology for standard-cell-based design.

Figure 4 – Propagation delay variation distribution for varying operating voltage and temperature.

A second set of simulations allowed evaluating the effect of process variations in gate performance figures. To do this, we proceeded to a process and mismatch Monte Carlo analysis with 5000 samples and measured energy per transition and average propagation delay for each simulated scenario. The charts in Figure 5 and Figure 6 summarize the results. Figure 5 shows the proposed topology presents smaller variations in energy per transition than the other topologies for X7 and X13 driving strengths. For X2, the results observed are comparable to those of the static topology. As for the observed propagation delay variations, the proposed topology presents values comparable to the ones observed for the static topology in most cases. Note that PVT variations are more significant in the semi-static topology. This is expected, as this topology relies on the usage of a weak inverter for maintaining the output stable, which imposes a resistance when switching the output, while the other topologies employ a more sophisticated and expensive mechanism (see Figure 2). Also, in general, the proposed topology is the most robust against PVT variations.

Figure 5 – Energy variation distribution observed from Monte Carlo analysis.

Figure 6 – Propagation delay variation distribution observed from Monte Carlo analysis.

² Note that, for Figures 3-6, data were divided into 10 sets (5 positive and 5 negative). Each color represents a quintile of positive sets (in red) and negative sets (in blue), where darker colors represent lower quintiles and brighter colors represent upper quintiles.

D. Voltage Scaling

Another very important aspect for contemporary technologies is the ability to operate at voltage levels lower than the nominal. In fact, according to Hanson et al. [29], voltage scaling is the most effective solution to cope with increasing power constraints. Accordingly, we performed a set of experiments to evaluate the impact of voltage scaling in the case study gates. The first experiment detected the minimum voltages that can be applied to each gate without interfering in their correct behavior.
The experiment investigated scenarios for varying temperatures and a fixed fan-out of four (FO4) output load. Minimum voltages were estimated by simulating all transition arcs and static states of each gate for each temperature/voltage scenario. When at least one arc does not generate the correct output or a static state is not able to maintain correct functionality, the scenario is defined as not functional. Also, the signals generated must have voltages in well defined regions, for logic ‘1’ (from 90% to 100% of the power supply) or for logic ‘0’ (from 0% to 10% of the power supply). If a signal presents a voltage level in the undefined region (from 10% to 90%), the scenario is also defined as not functional. In summary, the minimum voltage is defined as the lowest voltage at which the gates can operate without jeopardizing their correct logical/electrical behavior. Figure 7 summarizes the results obtained.

Figure 7 – Observed minimum operating voltage for the case study gates. Darker case values are worse than light case values.

As the tables in Figure 7 show, the semi-static topology is clearly not suitable for voltage scaling, as it tolerates less variations in operational voltage. The proposed and the static topologies, on the other hand, tolerate voltages as low as 0.15V and 0.1V, respectively. In fact, the obtained results for these topologies are similar. Results can be explained analyzing the transistors arrangement of each topology. Recalling Figure 2, in the semi-static topology there is a conflict-solving situation for every output transition and there is an assumption that the feedback inverter is weaker than the driving RESET and SET networks. However, as operating voltage is scaled down this assumption no longer holds. This phenomenon does not happen in the static and proposed topologies, because these employ a mechanism for disabling the feedback inverter while switching, enabling them to operate at reduced voltages. Therefore, the latter are better for semi-custom low voltage design.

A second experiment allowed us to compare speed-energy and speed-leakage tradeoffs for the static and the proposed topologies under varying supply voltages. Note that we discarded the semi-static topology for this experiment as it does not cope well with variations in supply voltage. Figure 8 shows the speed-energy efficiency values, in GTPS/EPT. As the charts show, the proposed topology is the one that presents highest GTPS/EPT values for all case study gates. In fact, the proposed topology achieves optimizations of roughly 50%, in average when compared to the static one. Also, both topologies reach optimum power efficiency when supplied with near-threshold voltages (between 0.5 V and 0.6 V), for all driving strengths. Note that for minimum operating voltages the proposed topology presents GTPS/EPT values that are 37% better, in average, than the same values for the static topology.

Figure 8 – Speed-energy efficiency for 2-of-2, 3W22-of-4 and AO2-of-4 gates using ((a), (b) and (c)) the proposed and ((d), (e) and (f)) the static topology.

Figure 9 shows the speed-leakage efficiency values, in GTPS/Avg. Leak., measured for all gates considered here. As the charts show, in this case, the static topology is the best for lower driving strengths. However, as the driving strength is increased, the proposed topology becomes advantageous. This confirms previous results. As the charts show, optimum speed-leakage efficiency is also obtained in the near-threshold voltages. Moreover, for minimum operating voltages, the differences in the observed GTPS/Avg. Leak. between the proposed and the static topologies were negligible. In view of the obtained results, we understand that, in general, the proposed topology presents better speed, energy and leakage power tradeoffs for different voltage levels.
Therefore, we consider that the topology is suited for low voltage operation.

E. Fault Tolerance

Single event effects can cause the output of an NCL gate to incorrectly flip. Depending on the state of the gate, this can have irreversible consequences. For instance, if the gate is in a state of memorization, i.e. if its output is not being driven by the SET or RESET networks, a glitch on the output may generate a single event upset (SEU). This is widely discussed in [27] and can compromise the correct functionality of QDI circuits. A final experiment allowed us to evaluate the robustness of the implemented gates against single event effects (SEEs).

Figure 9 – Speed-leakage efficiency using ((a), (b) and (c)) proposed and ((d), (e) and (f)) static 2-of-2, 3W22-of-4 and AO2-of-4 gates.

To compute robustness data we simulated the behavior of NCL gates under SEEs using the particle strike model proposed in [30]. In this model, the current injected in the gate nodes is a consequence of the charge collected, expressed by the following equation:

$$I(t) = \frac{Q}{\tau_\alpha - \tau_\beta}\left(e^{-t/\tau_\alpha} - e^{-t/\tau_\beta}\right),$$

where Q is the collected charge at the junction, τ_α is the collection time-constant of the junction and τ_β is the ion-track establishment time-constant. Note that τ_α and τ_β are technology specific constants. The resulting model entered in MATLAB was converted to a transient current source described in SPICE. This source was used to simulate the effect of particle strikes in all input and output nodes of the NCL gates while these are kept in memorization states. It is important to clarify that a pair of inverters was inserted in the inputs of the simulated gates, to allow injecting current without interference of fixed input sources, which were employed for feeding the input inverters. Furthermore, four parallel inverters with driving strength identical to that of the gate under simulation were added at the output, to respect the FO4 output load principle.
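To make the role of Q, τ_α and τ_β concrete, the sketch below evaluates a particle-strike current pulse numerically. It assumes the commonly used double-exponential form I(t) = Q/(τ_α − τ_β)·(e^(−t/τ_α) − e^(−t/τ_β)), which matches the parameter definitions given in the text. The time-constant values are placeholders chosen only for illustration, since the technology-specific constants are not given, and the function names are hypothetical.

```python
import math


def strike_current(t_s: float, q_c: float, tau_a_s: float, tau_b_s: float) -> float:
    """Double-exponential strike current at time t (assumed model form)."""
    if t_s < 0.0:
        return 0.0
    scale = q_c / (tau_a_s - tau_b_s)
    return scale * (math.exp(-t_s / tau_a_s) - math.exp(-t_s / tau_b_s))


def collected_charge(q_c: float, tau_a_s: float, tau_b_s: float,
                     t_end_s: float, steps: int = 20000) -> float:
    """Trapezoid-rule integral of the pulse from 0 to t_end."""
    dt = t_end_s / steps
    total = 0.0
    for k in range(steps):
        i0 = strike_current(k * dt, q_c, tau_a_s, tau_b_s)
        i1 = strike_current((k + 1) * dt, q_c, tau_a_s, tau_b_s)
        total += 0.5 * (i0 + i1) * dt
    return total


# Placeholder values, NOT taken from the paper: 10 fC, tau_a = 200 ps, tau_b = 50 ps.
Q_C, TAU_A_S, TAU_B_S = 10e-15, 200e-12, 50e-12
# The pulse peaks where its time derivative vanishes.
t_peak_s = (TAU_A_S * TAU_B_S / (TAU_A_S - TAU_B_S)) * math.log(TAU_A_S / TAU_B_S)
print(t_peak_s, strike_current(t_peak_s, Q_C, TAU_A_S, TAU_B_S))
print(collected_charge(Q_C, TAU_A_S, TAU_B_S, t_end_s=20 * TAU_A_S))
```

Under this form the pulse integrates to Q over a long enough window, so sweeping Q directly sweeps the total injected charge while the time constants only shape the pulse.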
Figure 10 shows an example simulation environment, as described for the 2-of-2 NCL gate, where the effect of charge collection for each scenario is generated by two current sources at the inputs (I0 and I1) and one at the output (I2). Using this model, different scenarios were simulated, where the collected charge Q varies from 0.1 fC to 30 fC, in 0.1 fC steps. These values are realistic for the target technology, according to the related documentation. All simulation scenarios employed typical fabrication process and operating temperature.

Figure 10 – Example of simulation environment using a 2-of-2 NCL gate.

During simulation, we measured the minimum collected charge that caused the output of the gate to flip incorrectly, i.e. the minimum collected charge that generated an SEU. This was done both for injections at the inputs and for injections at the outputs.

A first set of simulations was conducted assuming an operating voltage of 1 V, as Table II shows. For this set, the semi-static topology displayed superior results, as it always requires higher charges to generate SEUs. For injections at the inputs, the proposed topology presented results similar to those obtained for the static topology. However, for injections at the outputs, the new topology proved to be much more sensitive to SEEs. In fact, it typically required less than 50% of the charge required by static and semi-static topologies to produce an SEU.

Table II – Input (I.) and output (O.) critical charge, in fC, for all NCL gates. A dash (-) marks cases where the critical charge is above the simulated range (30 fC); n/a marks the semi-static topology, which was not simulated at 0.6 V and 0.2 V.

NCL Func.   Drive  Topology     1 V I.  1 V O.  0.6 V I.  0.6 V O.  0.2 V I.  0.2 V O.
2-of-2      X2     Proposed     22.5    9.3     8.9       4.1       0.1       0.5
2-of-2      X2     Static       21.7    20      8.6       8.5       0.1       0.5
2-of-2      X2     Semi-static  22.5    20      n/a       n/a       n/a       n/a
2-of-2      X7     Proposed     -       12.6    21.7      5.7       0.1       1.1
2-of-2      X7     Static       -       -       22.1      23.6      0.1       1.1
2-of-2      X7     Semi-static  -       -       n/a       n/a       n/a       n/a
2-of-2      X13    Proposed     -       17.2    -         8.3       0.1       1.7
2-of-2      X13    Static       -       -       -         -         0.1       2.3
2-of-2      X13    Semi-static  -       -       n/a       n/a       n/a       n/a
3W22-of-4   X2     Proposed     22.2    9.8     9         4.1       0.1       0.4
3W22-of-4   X2     Static       23.1    20.3    9.4       8.8       0.1       0.2
3W22-of-4   X2     Semi-static  25      20.4    n/a       n/a       n/a       n/a
3W22-of-4   X7     Proposed     -       12.6    21.9      5.8       0.1       0.5
3W22-of-4   X7     Static       -       -       22.8      23.8      0.1       0.5
3W22-of-4   X7     Semi-static  -       -       n/a       n/a       n/a       n/a
3W22-of-4   X13    Proposed     -       17.3    -         8.4       0.1       1
3W22-of-4   X13    Static       -       -       -         -         0.1       1
3W22-of-4   X13    Semi-static  -       -       n/a       n/a       n/a       n/a
AO2-of-4    X2     Proposed     22.6    9.8     9.2       4.1       0.1       0.2
AO2-of-4    X2     Static       22.3    20      9         8.6       0.1       0.2
AO2-of-4    X2     Semi-static  23.2    20.4    n/a       n/a       n/a       n/a
AO2-of-4    X7     Proposed     -       12.6    22.3      5.7       0.1       0.5
AO2-of-4    X7     Static       -       -       21.6      23.9      0.1       0.6
AO2-of-4    X7     Semi-static  -       -       n/a       n/a       n/a       n/a
AO2-of-4    X13    Proposed     -       17.2    -         8.4       0.1       1
AO2-of-4    X13    Static       -       -       -         -         0.1       1.1
AO2-of-4    X13    Semi-static  -       -       n/a       n/a       n/a       n/a

Note that as the driving strength is increased, the minimum charge for generating SEUs for the static and semi-static topologies is over the boundary of the performed experiments (30 fC). This is due to increases in input and parasitic capacitances of the gates, which filter transients. The same is not observed for the new topology. In fact, for input injections, bigger driving strengths improve robustness, as input and internal capacitances are increased. However, for output injections, robustness is minimally affected by driving strength. This is justified because, during memorization states, the integrity of the new topology output relies on minimum sized transistors in a loop of inverters, as Section III describes. In this way, transients in the output node easily corrupt its value, which is the Achilles heel of the proposed topology.

Another two sets of experiments were conducted, employing the same simulation environment, but using operating voltages of 0.6 V and 0.2 V, respectively. It was then possible to evaluate the robustness of the topologies when operating at low voltages. The obtained results are summarized in the remaining columns of Table II.
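The critical-charge measurement described in this subsection (sweeping the collected charge from 0.1 fC to 30 fC in 0.1 fC steps and recording the smallest value that flips the gate output) can be sketched as a simple linear search. In the sketch below, `flips_output` is a hypothetical callback standing in for one SPICE transient simulation of the gate under a strike of the given charge; the function names and the toy threshold in the usage example are assumptions for illustration and do not reproduce the actual simulation scripts.

```python
from typing import Callable, Optional


def critical_charge(
    flips_output: Callable[[float], bool],
    q_min_fc: float = 0.1,
    q_max_fc: float = 30.0,
    step_fc: float = 0.1,
) -> Optional[float]:
    """Return the smallest swept charge (fC) that causes an SEU, or None if none does.

    flips_output(q_fc) is a hypothetical stand-in for running one transient
    simulation with collected charge q_fc and checking whether the output flipped.
    """
    n_steps = round((q_max_fc - q_min_fc) / step_fc)
    for k in range(n_steps + 1):
        # Integer step index avoids accumulating floating-point error.
        q_fc = round(q_min_fc + k * step_fc, 10)
        if flips_output(q_fc):
            return q_fc
    return None  # critical charge lies above the simulated range


if __name__ == "__main__":
    # Toy model (assumption): the output flips for any charge above 9.25 fC.
    print(critical_charge(lambda q_fc: q_fc > 9.25))
    # Toy model of a gate that never flips within the swept range.
    print(critical_charge(lambda q_fc: False))
```

A `None` result corresponds to the cases where no SEU is observed up to 30 fC. If each simulation is expensive and the flip condition is monotonic in Q, a binary search over the same grid would need far fewer runs, but the linear sweep mirrors the procedure as described.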
Note that for these experiments we discarded the semi-static topology, because it does not tolerate significant variations in its operating voltage. As Table II shows, when the gates operate at 0.6 V, the results are similar to those observed for 1 V: the static topology tolerates bigger charges than the proposed topology. However, when operating at minimum voltage levels (0.2 V), the proposed topology presents a robustness similar to that observed for the static topology. This confirms the suitability of the proposed topology for low voltage operation.

V. CONCLUSIONS

This article proposed a new topology for designing NCL gates. Experimental results indicate the suitability of the topology for low voltage applications. Accordingly, when operating at minimum voltages, it provides improvements in speed, energy and leakage trade-offs while maintaining robustness against single event effects at levels similar to the classic static topology. As future work, we intend to evaluate the improvements that can be achieved by employing multi-threshold logic in the proposed topology, with the aim of mitigating problems caused by single event effects. We also intend to evaluate metastability effects on the proposed topology and its usage for constructing NCL+ gates [31].

ACKNOWLEDGEMENTS

Authors acknowledge the support of CNPq under grants 401839/2013-3, 200147/2014-5 and 310864/2011-9 and the support of FAPERGS under grant 11/1445-0.

REFERENCES

[1] R. Manohar and A. J. Martin. Quasi-delay-insensitive circuits are Turing-complete. In International Symposium on Advanced Research in Asynchronous Circuits and Systems, 1996.
[2] A. J. Martin. The limitations to delay-insensitivity in asynchronous circuits. In Proceedings of the 6th MIT Conference on Advanced Research in VLSI, 1990, pp. 263-278.
[3] A. J. Martin and M. Nyström. Asynchronous Techniques for System-on-Chip Design. Proceedings of the IEEE, 94(6), June 2006, pp. 1089-1120.
[4] K. M. Fant and S. A. Brandt. NULL convention logic: a complete and consistent logic for asynchronous digital circuit synthesis. In International Conference on Application Specific Systems, Architectures and Processors, 1996, pp. 261-273.
[5] R. D. Jorgenson, L. Sorensen, D. Leet, M. Hagedorn, D. R. Lamb, T. H. Friddell and W. P. Snapp. Ultralow-Power Operation in Subthreshold Regimes Applying Clockless Logic. Proceedings of the IEEE, 98(2), February 2010, pp. 299-314.
[6] Z. Liang, S. C. Smith and J. Di. Bit-Wise MTNCL: An ultra-low power bit-wise pipelined asynchronous circuit design methodology. In IEEE International Midwest Symp. on Circ. and Syst., 2010, pp. 217-220.
[7] G. Xuguang, Y. Liu and Y. Yang. Performance Analysis of Low Power Null Convention Logic Units with Power Cutoff. In Asia-Pacific Conference on Wearable Computing Syst., 2010, pp. 55-58.
[8] W. Jun and C. Minsu. Latency & area measurement and optimization of asynchronous nanowire crossbar system. In Instrumentation and Measurement Technology Conference, 2010, pp. 1596-1601.
[9] Y. Yang, Y. Yang, Z. Zhu and D. Zhou. A high-speed asynchronous array multiplier based on multi-threshold semi-static Null convention logic pipeline. In IEEE International Conf. on ASIC, 2011, pp. 633-636.
[10] F. K. Lodhi, O. Hasan, S. R. Hasan and F. Awwad. Modified null convention logic pipeline to detect soft errors in both null and data phases. In IEEE International Midwest Symp. on Circ. and Syst., 2012, pp. 402-405.
[11] W. Kuang, P. Zhao, J. S. Yuan and R. F. DeMara. Design of Asynchronous Circuits for High Soft Error Tolerance in Deep Submicrometer CMOS Circuits. IEEE Transactions on VLSI Systems, 18(3), March 2010, pp. 410-422.
[12] M. Ligthart, K. Fant, R. Smith, A. Taubin and A. Kondratyev. Asynchronous design using commercial HDL synthesis tools. In International Symposium on Advanced Research in Asynchronous Circuits and Systems, 2000, pp. 114-125.
[13] J. Cheoljoo and S. Nowick. Technology mapping and cell merger for asynchronous threshold networks. IEEE Trans. on Computer-Aided Design of Integrated Circ. and Systems, 27(4), April 2008, pp. 659-672.
[14] F. A. Parsan, W. K. Al-Assadi and S. C. Smith. Gate mapping automation for asynchronous null convention logic circuits. IEEE Transactions on VLSI Systems, 22(1), January 2014, pp. 99-112.
[15] R. Reese, S. C. Smith and M. A. Thornton. Uncle - an RTL approach to asynchronous design. In International Symposium on Advanced Research in Asynchronous Circuits and Systems, 2012, pp. 65-72.
[16] P. A. Beerel, R. Ozdag and M. Ferretti. A Designer's Guide to Asynchronous VLSI. Cambridge University Press, 2010, 337 p.
[17] S. Andrawes and P. Beckett. Ternary circuits for Null Convention Logic. In Intern. Conf. on Computer Engineering & Systems, 2011, pp. 3-8.
[18] S. L. Hurst. An Introduction to Threshold Logic: A Survey of Present Theory and Practice. The Radio and Electronic Engineer, 37(6), June 1969, pp. 339-351.
[19] S. Yancey and S. Smith. A differential design for C-elements and NCL gates. In IEEE International Midwest Symp. on Circ. and Syst., 2010, pp. 632-635.
[20] H. Lee and Y. Kim. Low Power Null Convention Logic Circuit Design Based on DCVSL. In IEEE International Midwest Symp. on Circ. and Syst., 2013, pp. 29-32.
[21] A. D. Bailey, J. Di, S. C. Smith and H. A. Mantooth. Ultra-low power delay-insensitive circuit design. In IEEE International Midwest Symp. on Circ. and Syst., 2008, pp. 503-506.
[22] F. Parsan and S. Smith. CMOS implementation of static threshold gates with hysteresis: A new approach. In IEEE/IFIP 20th International Conference on VLSI and System-on-Chip, 2012, pp. 41-45.
[23] G. E. Sobelman and K. Fant. CMOS circuit design of threshold gates with hysteresis. In IEEE International Symposium on Circuits and Systems, 1998, pp. 61-64.
[24] M. T. Moreira, B. S. Oliveira, F. G. Moraes and N. L. V. Calazans. Impact of C-Elements in Asynchronous Circuits. In IEEE International Symp. on Quality Electronic Design, 2012, pp. 85-90.
[25] M. Shams, J. C. Ebergen and M. I. Elmasry. Modeling and Comparing CMOS Implementations of the C-Element. IEEE Transactions on VLSI Systems, 6(4), 1998, pp. 563-567.
[26] M. T. Moreira and N. L. V. Calazans. Voltage Scaling on C-Elements: A Speed, Power and Energy Efficiency Analysis. In IEEE International Conference on Computer Design, 2013, pp. 329-334.
[27] R. P. Bastos, G. Sicard, F. Kastensmidt, M. Renaudin and R. Reis. Evaluating transient-fault effects on traditional C-Elements implementations. In Int. On-Line Testing Symp., 2010, pp. 34-40.
[28] M. T. Moreira, C. H. M. Oliveira, R. C. Porto and N. L. V. Calazans. Design of NCL Gates with the ASCEnD Flow. In Latin American Symposium on Circuits and Systems, 2013, 6 p.
[29] S. Hanson, B. Zhai, K. Bernstein, D. Blaauw, A. Bryant, L. Chang, K. K. Das, W. Haensch, E. J. Nowak and D. M. Sylvester. Ultralow voltage, minimum-energy CMOS. IBM Journal of Research and Development, 50(4-5), 2006, pp. 469-490.
[30] R. Garg and S. P. Khatri. A Novel, Highly SEU Tolerant Digital Circuit Design Approach. In IEEE International Conference on Computer Design, 2008, pp. 14-20.
[31] M. T. Moreira, C. H. Menezes, R. C. Porto and N. L. V. Calazans. NCL+: Return-to-One Null Convention Logic. In IEEE International Midwest Symp. on Circ. and Syst., 2013, pp. 836-839.