
DFT Timing Design Methodology For At-Speed BIST: February 2003



DFT timing design methodology for at-speed BIST

Conference Paper · February 2003


DOI: 10.1109/ASPDAC.2003.1195122 · Source: IEEE Xplore



DFT Timing Design Methodology for At-Speed BIST

Yasuo Sato 1), Motoyuki Sato 1), Koki Tsutsumida 1), Masatoshi Kawashima 1),
Kazumi Hatayama 2), and Kazuyuki Nomoto 3)

1) Device Development Center, Hitachi, Ltd.
   Ome-shi, Tokyo, Japan, 198-8512
   Tel: +81-428-33-2017, Fax: +81-428-33-2157
   e-mail: ysatoh@ddc.hitachi.co.jp, motoyuki@ddc.hitachi.co.jp,
   k-tsutsu@ddc.hitachi.co.jp, masato-k@ddc.hitachi.co.jp

2) Central Research Laboratory, Hitachi, Ltd.
   Kokubunji-shi, Tokyo, Japan, 185-8601
   Tel: +81-42-327-7877, Fax: +81-42-327-7736
   e-mail: k-hataya@crl.hitachi.co.jp

3) Semiconductor & Integrated Circuits, Hitachi, Ltd.
   Kodaira-shi, Tokyo, Japan, 187-8588
   Tel: +81-42-320-7300, Fax: +81-42-327-8638
   e-mail: nomoto-kazuyuki@sic.hitachi.co.jp
Abstract
Logic BIST is well known as an effective method for low-cost testing.
However, at-speed testing is difficult to realize, as it requires
deliberate timing design with regard to both the logic design and the
layout of the chip. This paper presents a timing design methodology for
at-speed BIST using a multiple-clock domain scheme. Experimental test
results for large industrial designs, obtained with our custom tool
“Singen”, are also shown.

1. INTRODUCTION
The increase in timing-related failures has become a crucial issue in
deep sub-micron (DSM) technology. Fig.1 shows the distribution of defects
[1][2]: the potential for failure increases as the particle (defect) size
decreases. Moreover, small defects, which were benign in conventional
processes, tend to cause fatal timing failures in high-speed LSIs. To
detect them, at-speed testing has been investigated intensively [3]-[7].

[Fig.1 The distribution of defect (axes: Particle size, Distribution;
regions labeled Benign and Fatal)]

Logic BIST is well known as an effective method for low-cost testing
because it enables us to test a high-speed chip with a low-speed ATE.
Several papers on at-speed BIST have been published. They present
multiple-clock domain schemes that test the DUT (device under test) at
the system cycle [6][7].

However, at-speed BIST is difficult to realize, as it requires deliberate
timing design with regard to the logic design and the layout of the chip.
Few papers have reported on the timing design of DFT circuits or on clock
design [8], and ad hoc approaches have been adopted in industrial designs.
Special care is needed to satisfy restrictions such as set-up time and
hold time. Clock design is the most difficult part: the clock network
should be designed to guarantee that every logic gate operates properly in
every testing mode (at-speed, medium speed, and slow speed).

In this paper, we show our DFT timing design methodology for at-speed BIST
using a multiple-clock domain scheme. We introduce the layout design of
the DFT circuits and the clock network, which were realized with small
modifications to the original layout of the user logic. We applied this
methodology to our industrial chip designs using our custom tool “Singen”,
and confirmed a short design term.

2. BIST DESIGN FLOW

2.1 The DFT structure
Fig.2 shows our DFT structure, which is based on STUMPS [9]. TPG is a test
pattern generator based on an LFSR (Linear Feedback Shift Register). A
MISR (Multiple Input Signature Register) is used as a pattern compressor.
The scan chain length is reduced to 200-300 to achieve a short testing
time, and it is independent of the number of input pins. Three types of
testing clock resources are available.

(1) PLLIN: the clock input of the PLL (Phase-Locked Loop)
    It is used for at-speed testing.

(2) TCK: the clock input for the boundary scan test
    It is used for slow-speed testing (DC-BIST), and is also used for slow
    scan shifting.

(3) C1,C2: the clock inputs for debugging
    They are used for fast BIST (AC-BIST), which may not be at-speed but
    is faster than TCK. The test timing is controllable through the
    difference between C1 and C2. It is known that the skew between two
    pins on an ATE can ordinarily be adjusted to a fair level. As the PLL
    doesn’t operate well at speeds slower than its specification, this
    function is essential for debugging.

CIF (external clock interface) generates the test clocks from (C1,C2) or
TCK. TGN (test clock generator) generates at-speed test clocks from the
PLL. TCU (test control unit) controls them.

[Fig.2 The DFT structure (blocks: TAP, TPG, Scan Chains, MISR, CIF, TCU,
TGN, PLL; signals: TCK, C1, C2, PLLIN, System Clock)]
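
To make the TPG-to-MISR data path of section 2.1 more concrete, here is a minimal Python sketch of an LFSR-based pattern generator feeding a MISR. The 16-bit width, the tap positions (a widely used example polynomial, x^16 + x^14 + x^13 + x^11 + 1), the seed, and the toy `good_dut` and `faulty_dut` functions are all illustrative assumptions; none of them are parameters of the design described in this paper, and a real STUMPS structure shifts patterns through many scan chains rather than applying one word per pattern.

```python
"""Toy LFSR pattern generator and MISR signature compactor (illustrative only)."""

WIDTH = 16
MASK = (1 << WIDTH) - 1
# Assumed feedback taps (bit positions) for x^16 + x^14 + x^13 + x^11 + 1.
TAPS = (15, 13, 12, 10)


def lfsr_step(state: int) -> int:
    """Advance a left-shifting Fibonacci LFSR by one cycle and return the new state."""
    feedback = 0
    for tap in TAPS:
        feedback ^= (state >> tap) & 1
    return ((state << 1) | feedback) & MASK


def misr_step(signature: int, response: int) -> int:
    """Shift the signature register once, then XOR in one parallel response word."""
    return lfsr_step(signature) ^ (response & MASK)


def run_bist(dut, seed: int, n_patterns: int) -> int:
    """Apply n_patterns pseudo-random words to dut and return the final signature."""
    state, signature = seed & MASK, 0
    for _ in range(n_patterns):
        signature = misr_step(signature, dut(state))
        state = lfsr_step(state)
    return signature


def good_dut(x: int) -> int:
    """Hypothetical fault-free circuit response."""
    return (x ^ (x >> 3)) & MASK


def faulty_dut(x: int) -> int:
    """The same hypothetical circuit with output bit 0 stuck at 1."""
    return good_dut(x) | 1


if __name__ == "__main__":
    golden = run_bist(good_dut, seed=0xACE1, n_patterns=1000)
    observed = run_bist(faulty_dut, seed=0xACE1, n_patterns=1000)
    print(f"golden={golden:04x} observed={observed:04x} pass={golden == observed}")
```

The tester only needs to compare the final signature with the golden value, which is why a low-speed ATE is sufficient even when the patterns are applied at a much higher rate on chip.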

2.2 The BIST design flow
Fig.3 shows our BIST design flow. It is mainly applied to ASICs in 0.18um
technology, for which a short design term is strongly required. Their
frequencies ordinarily range from 30MHz to 500MHz. The DFT consists of
logic BIST, memory BIST, and boundary scan [10]-[15].

[Fig.3 BIST design flow (steps: Logical Design, DFT rule check, Test point
insertion (TPI), DFT synthesis, STA (pre-layout), Floor plan, Auto
placement, Test clock synthesis, Scan chain reordering, Auto routing, STA
(post-layout), BIST fault sim. & Reseeding, Logical equivalence check,
Test pattern verification)]

Firstly, the DFT rule checker finds design rule violations. For example, a
gate loop, an asynchronous “set” or “reset” signal, a gated clock, or a
negative-edge flip-flop should be modified to satisfy the rules.

Secondly, test point insertion (TPI) is applied. TPI improves the fault
coverage with a small number of additional gates. It was developed to
minimize the gate overhead and delay overhead [12].

Then, DFT synthesis inserts several control blocks (TAP, TCU, CIF, TGN),
scan chains, TPG and MISR into the original logic. At the same time, it
outputs the timing script file, which is used for timing driven layout
(TDL) and static timing analysis (STA). To realize this function, all of
the test signals were categorized into the following three levels:

Level-1: The signals that should operate at the speed of the system cycle
         (at-speed):
         TGN, CIF, test clocks, and scan enable signal
Level-2: The signals that should operate at the speed of the scan shifting
         cycle:
         JTAG signals, scan chains, TPG, and MISR
Level-3: The signals that need only DC-level speed:
         Mode control signals

The signals in Level-1 should be treated carefully. For instance, some of
them need manual layout and others use timing driven layout (TDL) and
clock-tree synthesis (CTS). The signals in Level-2 are realized using TDL
or manual floorplanning of blocks. The signals in Level-3 don’t need any
special care for their timing. However, their fan-out should be optimized.

After TDL is completed, BIST pattern generation and fault simulation are
performed. They consist of random pattern-based BIST and reseeding-based
BIST (NPG: neighborhood pattern generation [11][15]). The details of the
at-speed BIST scheme are described in the following section.

3. AT-SPEED BIST SCHEME

3.1 Multiple-clock domain testing scheme
Fig.4 and Fig.5 show our at-speed BIST scheme for multiple clock domains.
Fig.4 shows the case where clock TI launches a pulse and the same clock
(TI) captures it. Fig.5 shows the case where TI launches a pulse and TJ
captures it. Thus, each clock pair is tested separately, while the other
clocks are frozen in the capture window. TI and TJ operate at a rather
slow speed in scan-in mode or scan-out mode. The
timing between the launch and the capture (Tij) is the same as the system
clock cycle (i.e., at-speed).

[Fig.4 Test timing (TI-TI) (waveforms: TI with launch and capture edges,
SEN, FF; phases: Scan-in, Capture window, Scan-out)]

[Fig.5 Test timing (TI-TJ) (waveforms: TI with launch edge, TJ with capture
edge, SEN, FF; phases: Scan-in, Capture window, Scan-out)]

This scheme has the following features:
(1) The test control is simple.
(2) Two delay testing methods (the skewed-load test [16][17][18] and the
    broad-side test [19]) are available.
(3) A pair of clocks, for instance one from a PLL and the other from an
    external clock pin, is not synchronized with each other. So such pairs
    can be tested at a slower speed (AC-BIST or DC-BIST).
(4) The power and noise during scan shifting are reduced.
(5) Debugging and diagnosis of testing are viable.

The only drawback of this method is an increase in testing time. However,
a chip usually consists of a few main clocks and many sub-clocks. If so,
the testing time is mostly due to the main clocks, and the others
contribute little. Our experiment [14] also confirmed this.

Fig.6 shows our multiple-clock domain scheme. There are many clock
domains, which have different delay lengths (Di). The depth of each cone
shows Di. Each clock is supplied from a PLL or an external clock pin. In
at-speed testing, the clock in a capture window is supplied from a PLL
(bold line), as in system clock operation. Each path from domain-I to
domain-J should be designed to operate at the speed of Tij.

[Fig.6 Multiple-clock domain model (labels: PLLIN, PLL, TGN, TCK, C1, C2,
CIF, TCU, Domain-I, Domain-J, Domain-K, ∆IJ)]

However, when the clock is supplied from TCK or C1-C2 in DC-BIST or
AC-BIST, the clocks go through other paths (Fig.6). So the clock skew from
domain-I to domain-J can be as large as ∆IJ (=Di−Dj). Therefore, the test
timing (TDC-BIST or TAC-BIST) should be greater than Tij + ∆IJ. From Fig.4,
5 and 6, we know that SEN (scan enable) should be enabled between launch
and capture. We generate SEN from a clock resource and treat it like
another clock. According to the discussion above, we derive the following
restrictions:

$$T_I(\mathrm{launch}) < \mathrm{SEN}(\mathrm{low}) < T_J(\mathrm{capture}) = T_J(\mathrm{launch}) + T_{ij} \tag{1}$$

$$T^{\mathrm{DC\text{-}BIST}}_{IJ} \ge T_{ij} + \Delta_{IJ} \tag{2}$$

$$T^{\mathrm{AC\text{-}BIST}}_{IJ} \ge T_{ij} + \Delta_{IJ} \tag{3}$$

$$T^{\mathrm{at\_speed}} = T_{ij} \tag{4}$$

We should remark that relation (1) is needed for the skewed-load test. If
we only use the broad-side test for delay testing, relation (1) can be
neglected. In our example in section 4, we have used both methods combined
to get high delay fault coverage.

From (1)-(4), we conclude that reducing ∆IJ is crucial in timing design.
It is also effective in reducing hold-time violations during scan shifting
between different clocks, as shown in Fig.7. The delay from FF2 to FF3
(Dij) should be larger than the hold time of FF3 (Thold):

$$D_{ij} \ge T_{hold} + \Delta_{IJ} \tag{5}$$

[Fig.7 scan-chain timing (scan chain FF1 to FF4, each with SID and Q pins;
FF1 and FF2 clocked by TI, FF3 and FF4 clocked by TJ; Dij marked on the
path)]
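
Relations (2), (3) and (5) can be read as simple per-pair checks, which the following Python sketch illustrates. The function names, the domain delays, the system cycle and the hold time are hypothetical values chosen for illustration, and the skew is taken as the magnitude |Di − Dj| as a conservative assumption, whereas the text defines ∆IJ as Di − Dj.

```python
"""Check restrictions (2), (3) and (5) for clock-domain pairs (illustrative sketch)."""
from itertools import permutations


def skew(delay_ns: dict, i: str, j: str) -> float:
    """Delta_IJ between two domains; the magnitude is used as a conservative bound."""
    return abs(delay_ns[i] - delay_ns[j])


def min_test_period(t_ij_ns: float, delay_ns: dict, i: str, j: str) -> float:
    """Smallest DC-/AC-BIST launch-to-capture time allowed by relations (2) and (3)."""
    return t_ij_ns + skew(delay_ns, i, j)


def hold_ok(d_ij_ns: float, t_hold_ns: float, delay_ns: dict, i: str, j: str) -> bool:
    """Relation (5): the scan path delay must cover the hold time plus the skew."""
    return d_ij_ns >= t_hold_ns + skew(delay_ns, i, j)


if __name__ == "__main__":
    # Hypothetical test-clock delays (ns) of three domains.
    delays = {"I": 3.0, "J": 2.0, "K": 1.5}
    t_ij = 2.5  # hypothetical system cycle (ns), taken as equal for every pair
    for i, j in permutations(delays, 2):
        period = min_test_period(t_ij, delays, i, j)
        print(f"{i}->{j}: test period >= {period:.2f} ns")
    print("hold I->J ok:", hold_ok(1.2, 0.1, delays, "I", "J"))
```

The sketch makes the motivation for section 3.2 visible: every nanosecond of skew between two domains is added directly to the slowest test period and to the required scan path delay.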
3.2 Reducing clock skew
In the previous section, we have shown that all clocks should be treated
as if they were in a single domain during AC-BIST or DC-BIST mode. To
ensure signal propagation between different clock domains during AC-BIST
or DC-BIST, the clock skew ∆IJ (I=1 to N, J=1 to N) should be minimized to
a reasonable level.

Fig.8 shows our concept of reducing ∆IJ. We insert delay gates (∆Dj)
between the test pins and the selectors that switch between the PLL clock
and the test clock. This process is performed after generating the system
clock domains. Therefore, it doesn’t affect the system clock delay or skew
at all. The layout procedure using a commercial CTS (clock tree synthesis)
tool is as follows:

1st step: create system clock domains.
          (specify clock delay and skew)
2nd step: create test clock domain (all clocks are treated as a domain),
          preserving each system clock domain-I.
          (specify the longest delay of Di + α)
3rd step: create scan enable trees corresponding to each system clock
          domain-I.
          (specify the delay as Di + α, the skew as SKi)

Fig.9 shows a layout of Fig.8. The clocks of domain-I and domain-J are
supplied from the PLL. The clock of domain-K is supplied from a clock pin
(T-K). The revised clock skew (∆IJnew) is as follows:

$$\Delta_{IJ}^{new} = (D_i + \Delta D_i) - (D_j + \Delta D_j) = (D_i - D_j) - (\Delta D_j - \Delta D_i) = \Delta_{IJ} - (\Delta D_j - \Delta D_i) \tag{6}$$

[Fig.8 Multi-clock domain (optimized) (labels: PLLIN, PLL, TGN, TCK, C1,
C2, CIF, TCU, Domain-I, Domain-J, Domain-K, delay gates ∆Dj and ∆Dk,
∆IJnew)]

[Fig.9 Multi-clock domain layout (labels: TCK, C1, C2, PLLIN, PLL, TCU,
CIF, TGN, T-I, SEN-I, T-J, SEN-J, T-K, SEN-K, Domain-I, Domain-J,
Domain-K; legend: system clock, scan enable)]

After the layout design is completed, STA (Static Timing Analysis) is
performed with regard to the timing restrictions described in section 3.1.
The STA script is generated automatically by DFT synthesis. It is as
follows:

(a) Script for scan enable to check restriction (1)
    - Define a clock that starts from port C2 to TI (I=1,N).
    - Check setup and hold time at each flip-flop, considering scan enable
      to be a data path triggered by the clock (Fig.10).

[Fig.10 timing check of scan enable (labels: C2, CIF, TI, SEN, delay,
SEN-I, FF)]

(b) Script for capture window to check restrictions (2) & (3)
    - Set scan enable (SEN-J) as ‘0’ (capture mode).
    - Define a clock that starts from port C2 to TI (I=1,N).
    - Check setup and hold time between TI and TJ.
    Other clock pairs are defined as false paths, which will not be
    checked. (See section 3.1 (3) and Fig.5)

(c) Script for scan shifting to check restriction (5)
    - Set scan enable as ‘1’ (scan in or scan out mode).
    - Define a clock that starts from port TCK.
    - Check setup and hold time at each flip-flop.
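
The leveling idea behind relation (6) and the 2nd layout step can be sketched numerically as follows. This is only an idealized illustration: the helper names and the domain delays are hypothetical, the delay gates are assumed to be exact, and a real CTS tool will leave some residual skew.

```python
"""Choose delay-gate values that level the test-clock delay of every domain (sketch)."""


def padding_for_leveling(delay_ns: dict, alpha_ns: float = 0.0) -> dict:
    """Return the padding per domain so that D_i + padding_i equals max(D) + alpha."""
    target = max(delay_ns.values()) + alpha_ns
    return {name: target - d for name, d in delay_ns.items()}


def revised_skew(delay_ns: dict, pad_ns: dict, i: str, j: str) -> float:
    """Relation (6): Delta_IJ_new = (D_i + pad_i) - (D_j + pad_j)."""
    return (delay_ns[i] + pad_ns[i]) - (delay_ns[j] + pad_ns[j])


if __name__ == "__main__":
    # Hypothetical system clock domain delays (ns) and margin alpha (ns).
    delays = {"I": 4.0, "J": 2.5, "K": 1.0}
    pads = padding_for_leveling(delays, alpha_ns=0.5)
    for name, pad in pads.items():
        print(f"domain-{name}: insert {pad:.2f} ns of delay on the test clock path")
    print("revised skew I->J (ns):", revised_skew(delays, pads, "I", "J"))
```

Because the padding sits only on the test clock path in front of the selectors, the system clock delays Di themselves stay unchanged, which matches the statement that the system clock delay and skew are not affected.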
4. EXPERIMENTAL RESULTS
Fig.11 shows the distribution of delay and clock skew for the 8 clock
domains of a 700k-gate ASIC. We applied the method introduced in section
3.2 using a commercial CTS tool. Preserving the original clock domain
cones, the delay from the test clock pin (TCK or C1-C2) was leveled to
around 10.58 ns with 2.189 ns skew (all clocks were treated as one
domain). This level was sufficient for our 54MHz design.

clock   delay (ns)   skew (ns)
T0      7.79         0.318
T1      1.64         0.003
T2      2.17         0.035
T3      2.10         0.042
T4      3.44         0.04
T5      3.15         0.146
T6      2.85         0.161
T7      1.23         0.000

Fig.11 clock skew distribution

Fig.12 shows the analysis of turnaround time. Each design step needs some
manual operation, such as optimizing the parameters for the first time, so
we needed 17 hours in total (excluding layout time). In layout design, CTS
needed 5 hours. At the final design stage, manual work should be almost
negligible, and TPI will need less time.

Item             Manual (hr)   CPU (hr)
DFT rule check   0.3           0.2
TPI              1.0           8.0
DFT synthesis    0.5           0.2
Formality        0.5           0.2
BIST sim.        1.0           3.3
Verification     1.0           1.0
Sum.             4.3           12.9

Fig.12 Turn around time

Fig.13 shows the evaluation results for three ASICs. Their stuck-at fault
efficiency was 99.67%, 99.97% and 99.98%, respectively. Their transition
fault efficiency was 96.94%, 94.67% and 98.35%, respectively. They were
acquired using random-based BIST and reseeding-based BIST (NPG:
neighborhood pattern generation). The BIST pattern count of ASIC3 was
reduced below that of the others through optimization.

Item                           ASIC1    ASIC2    ASIC3
Gate count                     1495K    1050K    1272K
Frequency (MHz)                51       136      400
Scan chain length              300      214      300
BIST pattern count (DC)        510K     510K     133K
BIST pattern count (AC)        510K     510K     314K
Fault Efficiency (BIST-DC)     98.30%   99.74%   99.88%
Fault Efficiency (reseed-DC)   99.67%   99.97%   99.98%
Fault Efficiency (AC)          96.94%   94.69%   98.35%
BIST sim time (hr)             11.0     4.6      8.5

Fig.13 Implementation results

ASIC3 was tested at 400MHz (launch-capture speed). It has 3 test clocks:
• CK1: 400MHz main clock
• CK2: 100MHz sub-clock
• TT: 50MHz dedicated test clock that is used for the boundary-scan test
  and memory BIST

The clock layout was performed manually using the tree buffering
technique, and its skew was reduced to less than 100 ps. The scan enable
signals were designed in the same way. The scan shift worked at a cycle of
20ns, and the scan chain length was within 300 flip-flops. Hold-time
violations, most of which stem from the MUXD-scan structure, occurred
frequently. However, timing tuning gates were inserted automatically and
their layout was performed incrementally in several hours.

Item              ASIC3
TPG               2.2%
MISR              0.8%
Scan chain        44.7%
Boundary Scan     2.8%
Scan enable, TT   29.2%
Control           19.9%
TPI               0.4%
else              1.6%
Sum.              100%

Fig.14 Wiring overhead evaluation
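
As a rough cross-check of what the ASIC3 figures imply for scan shifting time, the following Python sketch multiplies the reported pattern counts by the scan chain length and the scan shift cycle. It is a back-of-the-envelope estimate and not a result from the paper: it assumes one full chain shift per pattern (scan-out overlapping the next scan-in), reads "K" as 1000, and ignores the capture windows and any reseeding overhead.

```python
"""Rough scan-shift time estimate for ASIC3 from the reported figures (sketch)."""

SHIFT_PERIOD_NS = 20   # scan shift cycle reported for ASIC3
CHAIN_LENGTH = 300     # scan chain length in flip-flops (Fig.13)
PATTERNS = {"DC": 133_000, "AC": 314_000}  # Fig.13 counts, reading "K" as 1000


def shift_time_s(patterns: int, chain_length: int = CHAIN_LENGTH,
                 period_ns: int = SHIFT_PERIOD_NS) -> float:
    """Estimate shifting time, assuming one full chain shift per pattern."""
    return patterns * chain_length * period_ns * 1e-9


if __name__ == "__main__":
    for mode, count in PATTERNS.items():
        print(f"{mode}-BIST: about {shift_time_s(count):.3f} s of scan shifting")
    total = sum(shift_time_s(count) for count in PATTERNS.values())
    print(f"total: about {total:.3f} s")
```

Under these assumptions the shifting time scales linearly with the chain length, which is consistent with keeping the chains within a few hundred flip-flops.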


The gate overhead of ASIC3 was 8.5%. This includes the overhead of the
scan chains, TPG, MISR, boundary scan, TAP, TCU, TT-related gates, and
fan-out gates for test control signals. The wiring overhead of ASIC3 is
shown in Fig.14. As today’s LSIs have many wiring layers whose capacity
defines the chip size, we are more interested in the wiring overhead than
in the gate overhead. The DFT synthesis tool generates signal names that
correspond to the items in Fig.14. Therefore, it is easy to extract them
from the layout.

5. CONCLUSIONS
In this paper, we have shown a timing design methodology for at-speed
BIST, and some experimental test results of industrial designs using our
custom DFT tool “Singen”.

(1) An at-speed BIST scheme was presented. It tests the DUT for each clock
    pair. Timing restrictions for this scheme were extracted. Reducing the
    clock skew between different clock domains was crucial.
(2) We showed a systematic layout approach to reduce the clock skew
    described in (1). Actual experimental data was shown. A short design
    time was confirmed.
(3) Implementation results for three ASICs were introduced. A 400MHz
    at-speed test was achieved, and a high fault efficiency of 99.67 -
    99.98% was acquired.

ACKNOWLEDGEMENTS
Many other people helped our development and evaluations. The authors
would particularly like to thank Toyohito Ikeya, Tadayoshi Yamada, Takashi
Natabe, Masahiro Takakura, Haruki Ishida, and Michinobu Nakao for their
invaluable contributions.

REFERENCES
[1] C.H.Stapper, “Modeling of defects in integrated circuit
    photolithographic patterns”, IBM J. Res. Develop, Vol.28 No.4 July 1984
[2] International Technology Roadmap for Semiconductors, 1999 Edition
[3] T.G.Foote, D.E.Hoffman, W.V.Huott, T.J.Koprowski, B.J.Robbins,
    M.P.Kusko, “Testing the 400MHz IBM generation-4 CMOS chip”, Int. Test
    Conf., 1997, pp. 106-114
[4] P.S.Gillis, T.S.Guzowski, B.L.Keller, R.H.Kerr, “Test methodologies and
    design automation for IBM ASICs”, IBM J. Res. Develop, Vol.40 No.4 July
    1996
[5] J.Braden, Q.Lin, B.Smith, “Use of BIST in SUN FIRE(TM) servers”, Int.
    Test Conf., 2001, pp. 1017-1022
[6] G.Hetherington, T.Fryars, N.Tamarapalli, M.Kassab, A.Hassan, J.Rajski,
    “Logic BIST for large industrial designs: real issues and case
    studies”, Int. Test Conf., 1999, pp. 358-367
[7] B.N.-Dostie, “Design for at-speed test, diagnosis and measurement”,
    Kluwer Academic Publishers, 1999
[8] X.Gu, et al., “An Effort-Minimized Logic BIST Implementation Method”,
    Int. Test Conf., 2001, pp. 1002-1010
[9] P.Bardell, W.McAnney, J.Savir, “Built-in test for VLSI: pseudorandom
    techniques”, John Wiley and Sons, 1987
[10] Y.Sato, T.Ikeya, M.Nakao, T.Nagumo, “A BIST approach for very deep
    sub-micron (VDSM) defects”, Int. Test Conf., 2000, pp. 283-291
[11] M.Nakao, Y.Kiyoshige, K.Hatayama, S.Fukumoto, K.Iwasaki,
    “Deterministic built-in test with neighborhood pattern generator”,
    IEICE TRANS. INF.&SYST., Vol.E85-D, No.5, 2002, pp. 874-883
[12] N.Nakao, S.Kobayashi, K.Hatayama, K.Iijima, S.Terada, “Low overhead
    test point insertion for scan-based BIST”, Int. Test Conf., 1999, pp.
    348-357
[13] M.Nakao, Y.Kiyoshige, K.Hatayama, Y.Sato, T.Nagumo, “Test generation
    for multiple-threshold gate-delay fault model”, Asian Test Symposium,
    2001, pp. 244-249
[14] K.Hatayama, M.Nakao, Y.Sato, “At-speed Built-in Test for Logic Circuits
    with Multiple Clocks”, Asian Test Symposium, 2002, in press.
[15] K.Hatayama, M.Nakao, Y.Kiyoshige, K.Natsume, Y.Sato, T.Nagumo,
    “Application of High-Quality Built-in Test to Industrial Designs”, Int.
    Test Conf., 2002, pp. 1003-1012
[16] K.-T Cheng, S.Devadas, K.Keutzer, “Delay-fault test generation and
    synthesis for testability under a standard scan design methodology”,
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and
    Systems, 12(8), 1993, pp. 1217-1231
[17] J.Savir, “Skewed-load transition test: part I, calculus”, Int. Test
    Conf., 1992, pp. 705-713
[18] J.Savir, “Skewed-load transition test: part II, coverage”, Int. Test
    Conf., 1992, pp. 714-722
[19] J.Savir, “On broad-side delay testing”, VLSI Test Symposium, 1994, pp.
    284-290
