FPGAs are programmable hardware devices that provide great flexibility, in-field updates, and rapid prototyping of digital designs. The majority of FPGA applications require high-speed processing, including digital filters and control loops with strict low-latency requirements. Static Random Access Memory (SRAM)–based FPGAs are the most popular in the market and use volatile memory to store their configuration. These devices present a regular structure of configurable logic blocks and programmable switch matrices that allow for a mapping of logic functions.
Given the general-purpose nature of FPGAs, and unlike with ASICs, manufacturers cannot know beforehand what functionality will be synthesized on the chip, which hinders the implementation of precautionary measures against circuit degradation (e.g., guard-bands). This shifts the task of monitoring ageing processes to application developers, who must know the tolerances of signal propagation times for their specific use case.
4.1 Ring Oscillator
The flexibility of FPGAs enables the implementation of monitoring techniques directly on the programmable hardware.
Ring Oscillators (ROs), which consist of an odd number of inverters connected in a ring, are one of the most common approaches to measuring digital signal propagation time. RO-based sensors are usually composed of an RO and a frequency-measuring circuit (Figure 7). The frequency at the output of an RO depends, at a given temperature and voltage, on the inverter propagation times \(t_p\) and the number of gates \(n\), as shown in Equation (6).
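As a rough illustration of this dependence, the sketch below uses the textbook relation for an ideal ring, \(f = 1/(2 n t_p)\), in which one oscillation period corresponds to a transition travelling twice around the \(n\) stages. The function names are hypothetical, and lumping FPGA routing delay into \(t_p\) is a simplifying assumption.

```python
def ro_frequency(n_stages: int, t_p: float) -> float:
    """Ideal oscillation frequency (Hz) of a ring of n_stages inverters,
    each with propagation delay t_p (seconds)."""
    if n_stages < 1 or n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverters")
    # One period = a rising and a falling transition each traversing the ring.
    return 1.0 / (2 * n_stages * t_p)


def stage_delay(n_stages: int, frequency: float) -> float:
    """Invert the relation: average per-stage delay from a measured frequency."""
    if n_stages < 1 or n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverters")
    return 1.0 / (2 * n_stages * frequency)
```

Under these assumptions, a five-stage ring with 100 ps per stage oscillates at about 1 GHz, and an increase in \(t_p\) caused by ageing shows up as a proportional drop in frequency.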
Zick and Hayes [129] proposed an online concurrent sensing method to measure variations in physical parameters. By using an enhanced RO, an efficient counter, and control logic, the authors developed a compact sensor requiring only 8 Look-Up Tables (LUTs). A residue number system ring counter was implemented, as it requires fewer resources than a binary counter. The temperature sensitivity of the RO was increased to detect hotspots on the chip. RO sensors were placed regularly on a hexagonal tessellation all over the FPGA, together with a softcore, a timer, and a Universal Asynchronous Receiver/Transmitter (UART). The authors measured propagation times and indirectly estimated a transistor current leakage profile, and localized dynamic power usage and temperature.
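To give an intuition for residue number system counting, the following sketch models several short ring counters with pairwise-coprime lengths and recovers the total count with the Chinese remainder theorem. The moduli (3, 5, 7) and the software decoding step are illustrative assumptions and are not taken from Zick and Hayes [129]. The code requires Python 3.8 or later.

```python
from math import prod

MODULI = (3, 5, 7)  # hypothetical pairwise-coprime ring lengths (range: 105)


def count_pulses(n_pulses, moduli=MODULI):
    """Advance one small ring counter per modulus; each wraps independently."""
    residues = [0] * len(moduli)
    for _ in range(n_pulses):
        residues = [(r + 1) % m for r, m in zip(residues, moduli)]
    return tuple(residues)


def decode(residues, moduli=MODULI):
    """Recover the count (modulo the product of the moduli) via the CRT."""
    total = prod(moduli)
    value = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        # pow(x, -1, m) is the modular inverse of x modulo m.
        value += r * partial * pow(partial, -1, m)
    return value % total
```

In this model, three counters with 3, 5, and 7 states distinguish 105 counts, and no carry has to propagate between them, which is one plausible reason for the lower resource usage reported by the authors.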
Sedcole and Cheung [107] placed ROs in a matrix configuration to measure within-die delay variability. The structure enabled the encoding of regions under test into columns and rows. In such a dense configuration, ROs should only be active for short periods of time to avoid heating neighbouring sensors. A single counter, timer, and unified control logic performed the measurements. Pfeifer and Pliva characterized chip delays during design time [95] and analysed the limitations of 28 nm FPGAs [96] with ROs. To measure the oscillations, they implemented a method based on Block RAM (BRAM), which was later formalized in an approach called “Reliability-on-Chip” [93]. The approach requires a true-dual-port BRAM on the chip to undersample the oscillator outputs and a softcore to read the signal streams for calculating the delays. Li et al. [76] not only studied the process variability of 28 nm FPGAs under NBTI by deploying ROs but also utilized the data extracted from accelerated ageing to train various ML models. By building datasets with information regarding the artificial ageing stress conditions and the RO sensor configurations, they found that the XGBoost model performed best across all conditions.
Lanzieri et al. [69] developed an RO measurement module to perform a large-scale study of the propagation delay on 298 Xilinx Virtex-6 devices, which had aged naturally while in operation as part of a linear particle accelerator. By comparing delay measurements of used and unused devices, the authors found evidence of effects caused by ageing mechanisms. Moreover, an analysis of the radiation exposure of the devices inside the accelerator showed that longer propagation delays correlate with higher radiation doses.
The application of ROs as variability sensors extends to more modern node technologies as well. Maragos et al. [79] explored increased intra- and inter-die variability on 16 nm FinFET FPGAs with various ROs controlled by an embedded Cortex-A53 CPU. After an ageing process of 8,000 h, Sobas and Marc [111] measured the degradation of various 16 nm FinFET FPGAs by implementing an RO-based test bench. The authors evaluated the effects of static and dynamic stress, and derived an empirical degradation model for both modes that yields estimations with less than 10% relative error. When comparing their results to similar evaluations on 28 nm MOSFETs, they found that the smaller nodes show better reliability under static stress and that BTI was the predominant ageing mechanism (as opposed to HCI). A similar conclusion was reached by Bender et al. [12] from assessing 16 nm FinFET Xilinx devices with a multi-temperature operational life testing method. By evaluating the impact of different temperatures, voltages, frequencies, and RO sizes, the authors managed to isolate the effects of various degradation mechanisms. Additionally, their results suggest that there is a contribution from the self-heating effect to BTI due to the lower heat dissipation of the transistor fins.
Naouss and Marc [85] proposed a test bench to self-characterize the delay of LUTs. Their design allows stress signals to be injected into the Circuit Under Test (CUT) to produce accelerated ageing. This feature was later used to independently study BTI [87] and HCI [86] impact by using different stress signals. They implemented a frequency measuring circuit using three asynchronous counters (one of N bits and two of K bits) and a clock reference. This circuit (Figure 8) allowed for counting the number of cycles in a given period of time as well as the duty cycle of the signal. An enable signal activated the counters and controlled the counting window. Counter A registered the number of cycles of the RO output signal, which was used to calculate the oscillation frequency. Counter B counted the number of clock cycles during the active time of the RO signal, from which the duty cycle was calculated. Measuring the signal duty cycle allowed for studying the impact of ageing on the rising and falling times.
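The post-processing of such counter values can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes that the length of the enable window is known as a number of reference clock cycles; how the original design obtains this value is not specified here.

```python
def ro_measurement(count_a, count_b, window_cycles, f_ref):
    """Derive RO frequency and duty cycle from counter readings.

    count_a       -- RO cycles registered during the enable window (Counter A)
    count_b       -- reference clock cycles during which the RO output
                     was active (Counter B)
    window_cycles -- assumed window length, in reference clock cycles
    f_ref         -- reference clock frequency in Hz
    """
    if window_cycles <= 0 or f_ref <= 0:
        raise ValueError("window length and reference frequency must be positive")
    window_seconds = window_cycles / f_ref
    frequency = count_a / window_seconds    # RO cycles per second
    duty_cycle = count_b / window_cycles    # fraction of the window spent high
    return frequency, duty_cycle
```

A duty cycle drifting away from its initial value over time would indicate that rising and falling transitions degrade at different rates, which is the effect the authors set out to study.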
ROs have been employed in offline tests as well [4, 6, 21, 103]. Ahmed et al. [4] used performance degradation to detect recycled FPGAs by exhaustively fingerprinting LUT delays. They synthesized ROs with routes configurable via SRAM. External equipment was used to control the test logic and read out the frequency of each oscillator. Ruffoni and Bogliolo [103] focused on measuring delays of the internal FPGA wires. Two ROs were used, of which one included the wire structure under test. By comparing oscillation frequencies, the delay of the wire was derived. Although the authors employed external equipment, they argued that their method could potentially be operated solely on the device under test. Bruguier et al. [21] proposed a non-invasive method to characterize FPGA performance, analysing the spectra of electromagnetic radiation caused by the ROs. A similar measurement approach was used by Amouri et al. [6] to explore the impact of elevated temperatures and voltages on the performance degradation of FPGAs.
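The differential measurement with two ROs can be illustrated with a short calculation. The sketch assumes that both rings are identical except for the wire under test and that the wire is traversed twice per oscillation period (once per transition polarity); these assumptions and the function name are illustrative and not taken from [103].

```python
def wire_delay(f_reference, f_with_wire):
    """Estimate the delay (seconds) of a wire inserted into one of two
    otherwise identical ring oscillators, from their frequencies in Hz."""
    if f_reference <= 0 or f_with_wire <= 0:
        raise ValueError("frequencies must be positive")
    if f_with_wire > f_reference:
        raise ValueError("the ring containing the wire should not be faster")
    # The period difference equals twice the wire delay under the
    # stated assumptions.
    return (1.0 / f_with_wire - 1.0 / f_reference) / 2.0
```

For instance, a reference ring at 250 MHz and a ring with the wire at 200 MHz differ by 1 ns in period, which corresponds to a wire delay of about 0.5 ns in this model.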
Discussion. RO sensors are relatively easy to implement and very versatile considering that they can provide insights into the effects of BTI and HCI when combined with methods such as multi-temperature operational life testing [12]. Additionally, they can be tuned to indirectly measure other quantities beyond the propagation delay [129]. Although the main sensing principle among approaches is similar, the literature presents multiple approaches to placing the sensors and counting the frequency. A trade-off exists with RO sensors: longer ring chains cover larger areas and require fewer measurement circuits but decrease sensor resolution. Care should be taken to avoid overheating the die or stressing the power rails by using too few stages in the rings, as this results in unrealistic measurements [95]. A downside of ROs is that they measure delays of the FPGA resources that form the ring and not of the synthesized circuits, which may differ depending on the position and routing. Other techniques measure delays of the existing combinational logic instead, as we describe in the following sections.
4.2 Shadow Register
The usage of Shadow Registers (SRs) is a well-studied technique for delay characterization and degradation monitoring. Sensors are usually placed at the end of critical combinational paths in parallel to a destination register for detecting late transitions.
Li and Lach [75], Valdés et al. [120], and Leong et al. [72] proposed to place an SR after the CUT that is clocked by a signal skewed from the destination register (Figure 9). By comparing the latched value on both registers and controlling the phase difference of the clock signals, they determine the delay of the CUT. Li and Lach [75] varied the phase difference at runtime to characterize the FPGA propagation delay and built a histogram. Leong et al. [72] implemented an online concurrent ageing monitoring sensor, which detected when the propagation delay was higher than a predefined threshold. Valdés et al. [120] added an on/off signal to their concurrent sensor that interrupts the clock, enabling the authors to differentiate the type of ageing mostly suffered by the sensor itself: static ageing (continuous monitoring) or dynamic ageing (periodic monitoring). The sensor functionality was initially tested by operating the circuit under different power supply voltages, which induced a change in its signal propagation delays but did not affect the sensor. The sensor from Valdés et al. was then validated [119] by performing an accelerated ageing process on an FPGA.
The authors reported no significant frequency variations of the clocks on which the sensor reliability depended after the burn-in process. Leong et al. [72] tested sensors by increasing the FPGA frequency and reducing the gap time.
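The principle of deriving the CUT delay from the clock phase difference can be sketched with a behavioural model. The model assumes one possible arrangement in which the shadow clock leads the main clock by a configurable amount, so that the SR latches a stale value first as the lead grows; the function names, the integer time units (e.g., picoseconds), and the sweep strategy are assumptions and do not reproduce any of the cited designs.

```python
def registers_disagree(cut_delay, period, lead):
    """True when the main register latches the settled value but the shadow
    register, sampling `lead` time units earlier, latches a stale one."""
    main_ok = cut_delay <= period
    shadow_ok = cut_delay <= period - lead
    return main_ok and not shadow_ok


def estimate_cut_delay(disagree, period, step):
    """Sweep the lead from 0 upwards in increments of `step`.

    `disagree` is a callable mapping a lead value to True/False.
    Returns (lower, upper) bounds on the CUT delay, with
    lower < delay <= upper, or None if no mismatch is observed.
    """
    for i in range(period // step + 1):
        lead = i * step
        if disagree(lead):
            return (period - lead, period - lead + step)
    return None
```

With a period of 10,000 units, a step of 100, and a true delay of 7,350, the first mismatch appears at a lead of 2,700, bounding the delay between 7,300 and 7,400. The resolution of the estimate is therefore set by the phase step of the clock generator.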
Ghaderi et al. [37] proposed ageing monitors clocked by a single “sensor clock,” which set the maximum allowed slack for combinational signals of critical paths. By injecting the CUT signal and a shifted version of it to an XOR, a positive pulse was generated on each transition of the CUT. The XOR was latched by a Flip-Flop (FF) and triggered by the sensor clock, thereby detecting invalid transitions whenever the pulse occurred too late.
Pfeifer and Pliva [94] presented an online concurrent delay-fault detection technique for combinatorial circuits. The authors used the D FFs at the input of on-chip BRAMs as SRs, which map the signals to memory rows for later analysis. The interconnect introduced a fixed delay between the destination and the SR to control the sensor sensitivity, and an on-board CPU performed the signal comparison.
Wong et al. [123] presented a self-characterization method, with two registers around a combinational CUT, clocked in counter-phase. An XOR between the CUT output and the SR latched value produced the error signal. Transitions occurring after the first half of the test clock period were invalid. The authors leveraged on-chip clock generation to sweep the test clock frequency until the maximum was found. A non-concurrent circuit for start-up tests was also proposed and optimized in [125], which stored test results of each region on the FPGA RAM.
Amouri and Tahoori [7] implemented an ageing sensor to detect late transitions of combinational paths on a Virtex-5 FPGA. The sensor, illustrated in Figure 10, was composed of two edge-triggered D FFs clocked by the combinational output, with their inputs connected to the principal clock signal. Whenever an invalid change in the combinational signal occurred (i.e., during the active clock cycle), the sensor output was activated. Two FFs were used to detect rising and falling signal transitions. By the addition of independently configurable delay blocks on the combinational output signal and the clock signal, the authors could control the sensor sensitivity (i.e., how late after the rising clock signal a change in the combinational output is detected). When sensitivity is configured to a negative value, the sensor is turned into an early warning monitor, which checks that the signal is stabilized at least by a given time before the clock rises. In comparison with the previously described work, this method bears the great advantage of not requiring extra clock resources for the sensor. On the downside, the approach does not quantitatively measure the propagation delay but rather detects transitions only when slower than a given threshold.
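The effect of the two delay blocks on the sensitivity can be illustrated with a simple behavioural model. It assumes a 50% duty cycle, integer time units, and that the sensor effectively samples the delayed clock at the moment of the delayed signal transition; the function and parameter names are hypothetical and the model does not reproduce the circuit of Figure 10.

```python
def sensor_fires(t_transition, period, clk_delay=0, sig_delay=0):
    """Return True if a transition of the combinational output is flagged.

    t_transition -- transition time relative to the rising clock edge
                    (negative values mean before the edge)
    clk_delay    -- delay added to the clock path (raises the threshold)
    sig_delay    -- delay added to the signal path (lowers the threshold;
                    turns the sensor into an early warning monitor)
    """
    # Phase of the delayed clock at the moment the FF is triggered.
    sampled_phase = (t_transition + sig_delay - clk_delay) % period
    # The clock is assumed high during the first half of its period.
    return sampled_phase < period // 2
```

In this model, the sensitivity equals `clk_delay - sig_delay`: with a period of 10,000 units and `clk_delay=1000`, a transition 300 units after the edge is tolerated while one 1,200 units after the edge is flagged. With `sig_delay=800`, a transition 500 units before the edge already triggers the sensor.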
Jiang et al. [54] proposed a similar architecture but connected the inputs Q of both SRs to a shadow clock signal, which had a phase shift relative to the main clock. Given that the frequency of the main clock is known, the authors were able to derive the CUT delay by changing the phase angle between clocks and observing the sensor output.
Discussion. Shadow registers appear more complex to implement and place than ROs, but they provide higher versatility. These sensors enable measuring propagation delays of application-specific circuits (e.g., to characterize chips) as well as implementing late transition detectors. For the detection of transitions within a given time window, this method can verify circuit functionality under different conditions and can even run for continuous monitoring of critical combinational paths to detect degradations caused by BTI and HCI. While ROs either act as probes on unused die areas or temporarily replace functioning circuits to test the underlying hardware, SRs are able to run in parallel to application-specific combinational logic. SRs enable at-speed tests, which is a great advantage as they can seamlessly be added concurrently to applications at the cost of additional resource usage.
4.3 Circuit Transition Probability
Propagation delay variations can be detected by observing the Transition Probability (TP) of a circuit [114, 115, 124]. Consider a combinatorial digital circuit with an output node \(z\). For each applied input combination, an output value \(z(k)\) is produced. The transition probability of \(z\), denoted \(D(z)\), is the probability of the state changing when the next input stimuli are applied [38] on the following clock cycle. As \(z\) can only be zero or one, \(D(z)\) is the probability of \(z\) experiencing a transition between these states:
\[D(z) = p_z^{01} + p_z^{10},\]
where \(p_z^{01}\) and \(p_z^{10}\) indicate the probability of \(z\) undergoing the \(0 \rightarrow 1\) and \(1 \rightarrow 0\) transitions, respectively. From [38], this probability can be calculated as the relative number of transitions that occurred in an interval of \(N\) clock cycles with \(N \rightarrow \infty\). As an example, \(p_z^{10}\) can be defined as
\[p_z^{10} = \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{k=1}^{N} z(k)\,\overline{z(k+1)}.\]
If the probabilities are approximated by observing the transitions during a large number \(N\) of clock cycles, we obtain
\[D(z) \approx \frac{1}{N} \sum_{k=1}^{N} \left[ \overline{z(k)}\,z(k+1) + z(k)\,\overline{z(k+1)} \right] = \frac{1}{N} \sum_{k=1}^{N} z(k) \oplus z(k+1).\]
Hence, \(D(z)\) can be estimated by the relative amount of rising and falling edges of \(z\) over a time interval \(N\).
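The estimate of \(D(z)\) from a finite observation can be sketched in a few lines. The function name is hypothetical, and the sketch assumes that the output has been sampled once per clock cycle into a sequence of zeros and ones.

```python
def transition_probability(samples):
    """Estimate D(z) as the fraction of consecutive sample pairs that differ,
    i.e., the relative number of rising and falling edges."""
    if len(samples) < 2:
        raise ValueError("at least two samples are needed")
    transitions = sum(a != b for a, b in zip(samples, samples[1:]))
    return transitions / (len(samples) - 1)


print(transition_probability([0, 1, 1, 0, 0, 0, 1, 0, 1]))  # prints 0.625
```

The nine samples form eight consecutive pairs, five of which contain an edge. In hardware, the same quantity is obtained by counting edges over a fixed number of clock cycles rather than storing the samples.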
Ghosh et al. [38] derived a theorem that relates the output value probability with the input value probability on a combinatorial circuit. If the input signals have a probability distribution independent of time (i.e., they form a stationary process), then the output signal \(z\) will also have this characteristic. This means for stationary input signals that the TP \(D(z)\) does not change in time.
Wong et al. [124] estimated the maximum functioning frequency of an arbitrary circuit by measuring its output TP. With a careful selection of the input signals, they ensured a constant TP of the output under normal operating frequencies. They performed various measurements at increasing frequencies, up to the point at which changes in the TP could be observed. The change indicated that the maximum frequency was reached, and the circuit started to fail. The proposed setup was implemented on a 65 nm Altera Cyclone III FPGA, as shown in Figure 11. The CUT and registers were clocked from a test clock generator. On each clock cycle, the test vector generator injected input vectors, which propagated through the CUT, generating an output \(z(k)\). In addition, the sample register captured a sample \(y(k)\) from the CUT at a frequency \(f_{clk}\). An asynchronous counter recorded the transitions in \(y(k)\) over \(N\) clock periods, later used to estimate \(D(y)\) in the TP analyser circuit. When \(f_{clk}\) is within the operational range and no faults occur on the circuit, then \(y(k) = z(k)\) and the transition probabilities \(D(y) = D(z)\). If \(f_{clk}\) is increased until the clock period becomes shorter than the CUT propagation time, then \(y\) will start to sample values of the previous cycle (i.e., \(y(k) = z(k - 1)\)), thus changing \(D(y)\).
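The search for the maximum frequency can be sketched as a scan over TP measurements taken at increasing clock frequencies. The tolerance, the function name, and the sample values below are hypothetical; the sketch assumes that the first measurement lies within the safe operating range and serves as the baseline.

```python
def max_operating_frequency(sweep, tolerance=0.01):
    """Return the highest frequency whose measured TP still matches the
    baseline within `tolerance`.

    sweep -- list of (frequency_hz, measured_tp) pairs in ascending
             frequency order; the first entry defines the baseline TP.
    """
    if not sweep:
        raise ValueError("the sweep must contain at least one measurement")
    f_max, baseline = sweep[0]
    for freq, tp in sweep[1:]:
        if abs(tp - baseline) > tolerance:
            break  # TP changed: the sample register captures wrong values
        f_max = freq
    return f_max


# Hypothetical measurements: the TP departs from 0.50 beyond 200 MHz.
measurements = [(100e6, 0.50), (150e6, 0.50), (200e6, 0.505),
                (250e6, 0.43), (300e6, 0.31)]
print(max_operating_frequency(measurements))  # prints 200000000.0
```

Repeating such a sweep over the lifetime of a device and tracking the returned frequency gives a direct indication of how much timing margin ageing has consumed.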
The method proposed by Wong et al. [124] was later applied in the study of circuit degradation under accelerated stress conditions by Stott et al. [114, 115]. The authors implemented multiple CUTs on a pair of Cyclone III FPGAs, which could be measured using the TP method and electrically stressed by an input signal. Environmental stress with an ageing acceleration factor of 180 was applied to the chips by means of elevated temperature and core voltage, which sped up the NBTI process. Additionally, the CUT was subjected to electrical stress by controlling its switching activity through the input signals, which triggered NBTI, TDDB, and HCI degradation mechanisms. Their experiments revealed a circuit speed reduction of up to 15% by the end of the test schedule. This stress condition degraded LUTs more strongly than interconnects. Moreover, the method was verified against an RO-based (see Section 4.1) frequency measurement.
Discussion. Measuring changes in the TP of a CUT output allows for detecting its maximum operational frequency. Unlike ROs (Section 4.1), this method measures the propagation delay of the FPGA using existing circuits; thus, the evaluation of the impact of BTI and HCI processes on the application is more direct. On the one hand, this technique has the advantage of being implementable with common resources and only requires a controllable clock signal. On the other hand, it requires injecting precise test vectors, which depend on the CUT and need to be stored or consistently generated. As custom inputs are needed, this technique affects the system operation and can only be implemented during a testing period.