6.1 Reduction in Adversarial Noise Sensitivity
Earlier works [31, 39] have identified a metric termed Adversarial Noise Sensitivity (ANS) to quantify the sensitivity of each layer of a neural network to adversarial perturbations. ANS for a layer \(l\) in a neural network, subjected to an adversarial attack, is defined in terms of an error ratio as follows:
\[ ANS^{l} = \frac{\Vert A_{adv}^{l} - A^{l} \Vert}{\Vert A^{l} \Vert}, \]
where \(A^l\) and \(A_{adv}^l\) are respectively the clean and adversarial activation values of layer \(l\). ANS is a simple metric to evaluate how much each layer contributes to the net adversarial perturbation during gradient propagation.
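For illustration, a minimal PyTorch sketch of this error ratio is given below; the per-layer activation dictionaries (e.g., collected via forward hooks) and the layer names/shapes are hypothetical placeholders rather than the exact evaluation code used here.

```python
import torch

def adversarial_noise_sensitivity(clean_acts, adv_acts):
    """Per-layer ANS as the error ratio ||A_adv^l - A^l|| / ||A^l||,
    computed over flattened activations."""
    ans = {}
    for name, a in clean_acts.items():
        a = a.flatten().float()
        a_adv = adv_acts[name].flatten().float()
        ans[name] = (torch.norm(a_adv - a) / torch.norm(a)).item()
    return ans

# Dummy activations for two layers (hypothetical shapes), e.g., captured via forward hooks:
clean = {"conv1": torch.randn(1, 64, 32, 32), "fc": torch.randn(1, 10)}
adv = {k: v + 0.05 * torch.randn_like(v) for k, v in clean.items()}
print(adversarial_noise_sensitivity(clean, adv))
```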
In Figure 6, we show results for a VGG16 BNN (trained using state-aware training) mapped via SwitchX and a standard VGG16 BNN mapped normally, both onto 32\(\times\)32 crossbars (\(R_{MIN} = 20 k\Omega\) and ON/OFF ratio = 10) and both adversarially perturbed using the HH mode of the FGSM attack. We plot the ANS values for all the convolutional layers and the final fully-connected layer (marked as FC). For the SwitchX-mapped BNN, we observe lower ANS values, implying a reduced error-amplification effect [31] and hence lower sensitivity and better stability against the induced adversarial perturbations. However, the authors in [39] also note that a lower value of ANS across the layers of a neural network does not necessarily imply improved adversarial robustness under all circumstances. Hence, in Section 6.2, we propose a novel graphical approach, plotting “robustness maps”, to evaluate the adversarial robustness of BNNs upon SwitchX mapping onto non-ideal crossbars.
6.2 Robustness Analysis
Earlier approaches to quantifying the adversarial robustness of neural networks have been based on the Adversarial Loss (AL) metric, which is the difference between the clean accuracy and the adversarial accuracy for a given value of \(\epsilon\) [2, 38, 39]. Note that clean accuracy is the natural accuracy of a neural network when not under attack. A reduction in the value of AL is said to improve the adversarial robustness of the network. However, AL is not always a suitable metric for assessing robustness, since a reduction in AL does not convey whether the cause is a decrease in clean accuracy, an increase in adversarial accuracy, or both. For instance, suppose that while evaluating a network on the CIFAR-10 dataset, we find the clean and the adversarial accuracies (for a particular \(\epsilon\)) to be \(10\%\) each. The value of AL is then zero, implying that the network is exceptionally robust. This conclusion is absurd, since \(10\%\) accuracy corresponds to random guessing on the CIFAR-10 dataset, implying that the network is predicting arbitrarily and is effectively untrained.
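As a concrete illustration of this failure mode, the short sketch below computes AL for the degenerate case above alongside a more typical case; the accuracy numbers are purely hypothetical.

```python
def adversarial_loss(clean_acc, adv_acc):
    """AL = clean accuracy - adversarial accuracy (in %), for a fixed epsilon."""
    return clean_acc - adv_acc

# Degenerate case on CIFAR-10: both accuracies at the 10% chance level.
print(adversarial_loss(10.0, 10.0))  # 0.0 -> "perfectly robust" by AL, yet the model is untrained
# A useful model, e.g., 90% clean / 40% adversarial accuracy:
print(adversarial_loss(90.0, 40.0))  # 50.0 -> large AL despite a far more useful network
```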
In this work, we evaluate the robustness of the mapped BNNs on crossbars graphically, as shown in Figures 7 and 8. For a specific mode of attack (SH or HH) and a given crossbar size, we plot \(\Delta ~Clean~Accuracy\), the difference between the clean accuracy of the mapped network in question and the corresponding clean accuracy of the software baseline, on the x-axis. \(\Delta ~Adversarial~Accuracy\) (for a particular \(\epsilon\) value), the difference between the adversarial accuracy of the mapped network in question and the corresponding adversarial accuracy of the software baseline, is plotted on the y-axis. We term this plot a “robustness map”. The value of \(\Delta ~Clean~Accuracy\) is negative, since BNNs mapped on hardware suffer accuracy loss owing to non-idealities. The region bounded by the line \(y=-x\) and the y-axis denotes that the absolute increase in adversarial accuracy is higher than the absolute degradation in clean accuracy. If a point lies in this region closer to the y-axis, it implies higher adversarial accuracy with lower clean-accuracy loss and hence greater robustness. Therefore, this is our favorable region. As we move farther away from the y-axis within the favorable region, the robustness reduces as the neural network suffers a high loss of clean accuracy; this is shown by the variation in color gradient from dark to light brown (a darker shade implying higher robustness). Likewise, the region bounded by the line \(y=x\) and the y-axis is where the mapped network is highly vulnerable to adversarial attacks and hence is the unfavorable region. Here, as we move farther away from the y-axis within the unfavorable region, the robustness reduces and the degree of unfavorability increases. This is shown by the variation in color gradient from dark to light yellow (a darker shade implying higher robustness).
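A minimal sketch of how such a robustness map can be constructed is given below; the accuracy values, region shading, and matplotlib styling are illustrative assumptions and not the exact plotting code used for Figures 7 and 8.

```python
import matplotlib.pyplot as plt
import numpy as np

def robustness_map_point(clean_hw, adv_hw, clean_sw, adv_sw):
    """Return (delta clean accuracy, delta adversarial accuracy) w.r.t. the software baseline."""
    return clean_hw - clean_sw, adv_hw - adv_sw

# Hypothetical accuracies (%): software baseline vs. two crossbar mappings at one epsilon.
clean_sw, adv_sw = 88.0, 35.0
points = {
    "Normal (32x32)": robustness_map_point(72.0, 40.0, clean_sw, adv_sw),
    "SwitchX (32x32)": robustness_map_point(75.0, 43.0, clean_sw, adv_sw),
}

fig, ax = plt.subplots()
x = np.linspace(-30, 0, 100)
# Favorable region: between the line y = -x and the y-axis (adversarial gain exceeds clean-accuracy loss).
ax.fill_between(x, -x, 30, color="saddlebrown", alpha=0.3, label="favorable region")
# Unfavorable region: between the line y = x and the y-axis.
ax.fill_between(x, -30, x, color="khaki", alpha=0.4, label="unfavorable region")
for label, (dx, dy) in points.items():
    ax.scatter(dx, dy, label=label)
ax.set_xlabel("$\\Delta$ Clean Accuracy (%)")
ax.set_ylabel("$\\Delta$ Adversarial Accuracy (%)")
ax.legend()
plt.show()
```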
This approach to assessing the robustness of a network is comprehensive and accurate, since it takes into account the cumulative impact of both clean accuracy and adversarial accuracy (which is a strong function of the clean accuracy). The closer a point is to the region marked with the dark-brown color, the better the robustness of the neural network. Note that in Figures 7 and 8, the circular points correspond to mappings on 16\(\times\)16 crossbars while the triangular points correspond to mappings on 32\(\times\)32 crossbars. All the details pertaining to the baseline software models are listed in Table 3. Furthermore, the results in Figures 7 and 8 are for BNNs mapped onto crossbars having a ReRAM device ON/OFF ratio of 10 with \(R_{MIN} = 20k\Omega\).
Figure 7 shows the robustness maps for BNNs based on the VGG16 network with the CIFAR-100 dataset for both SH and HH modes of attack. Figures 7(a) and (b) pertain to the FGSM attack with \(\epsilon\) varying from 0.05 to 0.3 in steps of 0.05. We find that SwitchX imparts higher clean accuracy (\(\sim\)3% for the 32\(\times\)32 crossbar) as well as better adversarial accuracies on hardware for both modes of attack, with the points corresponding to SwitchX situated closer to the dark-brown portion of the favorable region than the corresponding points for Normal Mapping. This is a consequence of the reduction in the non-ideality factor for SwitchX with respect to Normal Mapping, as discussed in Section 3.1. Note that the points for 32\(\times\)32 crossbars are situated farther from the favorable region than the corresponding points for 16\(\times\)16 crossbars. This is owing to the greater non-idealities in a 32\(\times\)32 crossbar than in a 16\(\times\)16 crossbar (Figure 4). We further observe that the points for the SwitchX BNN for a given crossbar size are more closely packed than the corresponding points for the Normal BNN. This implies that even on increasing the perturbation strength (\(\epsilon\)), a smaller adversarial loss is observed for the SwitchX BNN.
Figures 7(c) and (d) present similar results, but for a PGD attack with \(\epsilon\) varying from 2/255 to 32/255 in steps of 2/255. In this case, the robustness is much higher for the SH mode of attack than for the HH mode, with the points corresponding to 16\(\times\)16 crossbars situated inside the favorable region. Similar to the case of the FGSM attack, SwitchX outperforms a Normal BNN in terms of robustness for both modes of attack. Here, the points for different \(\epsilon\) values, given a style of mapping and crossbar size, are more closely packed than the corresponding points for the FGSM attack. This implies that hardware non-idealities interfere more with PGD attacks than with FGSM attacks, resulting in less accuracy degradation.
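For reference, the adversarial examples used in these experiments follow the standard FGSM and PGD formulations; a minimal PyTorch sketch with the \(\epsilon\) ranges quoted above is shown below. The model, labels, and input normalization to [0, 1], as well as the PGD step size and iteration count, are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """Single-step FGSM: x_adv = x + eps * sign(grad_x L), inputs assumed in [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def pgd_attack(model, x, y, eps, alpha=2 / 255, steps=7):
    """Iterative PGD within an L-infinity ball of radius eps around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv

# Epsilon sweeps used for the robustness maps:
fgsm_eps = [0.05 * i for i in range(1, 7)]     # 0.05, 0.10, ..., 0.30
pgd_eps = [i / 255 for i in range(2, 33, 2)]   # 2/255, 4/255, ..., 32/255
```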
Effect of varying \(R_{MIN}\) of ReRAM devices: Previous works such as [3, 6] have shown that the impact of resistive crossbar non-idealities (or NF) decreases upon increasing \(R_{MIN}\) of the memristive devices. To this end, we plot robustness maps in Figure 9(a) to compare Normal and SwitchX BNNs upon increasing \(R_{MIN}\) from 20 \(k\Omega\) to 50 \(k\Omega\). We find that for both Normal and SwitchX BNNs, the data points corresponding to \(R_{MIN} = 50 k\Omega\) (triangles) are closer to the favorable region than those for \(R_{MIN} = 20 k\Omega\) (circles) at the same ON/OFF ratio of 10, signifying a reduced impact of non-idealities on the crossbar-mapped network models. Further, for \(R_{MIN} = 50 k\Omega\), SwitchX BNNs show a performance improvement in terms of robustness over Normal BNNs (\(\sim\)1% improvement in clean and adversarial accuracies for the HH-based FGSM attack on 32\(\times\)32 crossbars). Also, in Figure 9(a), we increase \(R_{MIN}\) from 20 \(k\Omega\) to 50 \(k\Omega\) and reduce the device ON/OFF ratio to 5 (diamond-shaped points). We find that even at the lower ON/OFF ratio of 5, the SwitchX BNN outperforms the corresponding Normal BNN in terms of robustness. Another important observation from Figure 9(a) is that the robustness of a neural network on non-ideal crossbars is a stronger function of \(R_{MIN}\) than of \(R_{MAX}\). This is because, for both SwitchX and Normal BNNs, the improvement in robustness is greater on traversing from the circular to the diamond-shaped points than from the diamond-shaped points to the triangular points.
Efficacy of SwitchX combined with state-aware training: Figure 8 shows the robustness maps for BNNs based on the VGG16 network with the CIFAR-10 dataset for both SH and HH modes of the FGSM attack. Here, we find a significant benefit in terms of improvement in clean accuracy (\(\sim\)10%) and adversarial accuracies (\(\sim\)2–5%) due to SwitchX for a 32\(\times\)32 crossbar with respect to Normal Mapping. Similar to the case of the CIFAR-100 dataset, we find that SwitchX outperforms a Normal BNN in terms of robustness for both modes of attack. We now analyze the cases when SwitchX is combined with state-aware training as well as with adversarial training. The results for the FGSM attack are summarized below:
6.2.1 With Adversarial Training.
From Figures 8(c) and (d), we find that SwitchX combined with adversarial training significantly boosts the robustness of the mapped BNN in terms of both clean and adversarial accuracy improvements. For the 16\(\times\)16 crossbar, the points (in red) for different \(\epsilon\) lie in the close vicinity of the boundary of the favorable region, while for the 32\(\times\)32 crossbar, the rise in clean accuracy with respect to the Normal BNN is very high (\(\sim\)20%). Furthermore, given a style of mapping and crossbar size, the points for adversarial training are more closely packed than the corresponding points for a standalone SwitchX BNN or Normal BNN, implying smaller accuracy losses on increasing the perturbation strength of the attack.
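A minimal sketch of single-step (FGSM-based) adversarial training of the kind referred to here is shown below; the model, optimizer, data loader, device, and the training \(\epsilon\) are placeholders and not the exact recipe used in this work.

```python
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps=8 / 255, device="cpu"):
    """One epoch of FGSM-based adversarial training: update the model on perturbed inputs."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        # Craft FGSM adversarial examples against the current model state.
        x_pert = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x_pert), y).backward()
        x_adv = (x + eps * x_pert.grad.sign()).clamp(0, 1).detach()
        # Standard gradient step on the adversarial batch.
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()
        optimizer.step()
```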
6.2.2 With State-aware Training.
As discussed in Section 5, state-aware training combined with SwitchX can lead to an increase in the proportion of HRS states in the crossbar instances, and thus a reduction in the non-ideality factor of the crossbars. From Figures 8(a) and (b), we find that this approach significantly boosts the robustness of the mapped BNN in terms of both clean and adversarial accuracy improvements. For both 16\(\times\)16 and 32\(\times\)32 crossbars, the points (in red) for different \(\epsilon\) lie in the close vicinity of the boundary of the favorable region. We find that for the 32\(\times\)32 crossbar, the rise in clean and adversarial accuracies is so large that the result becomes comparable to that of a standalone SwitchX BNN mapped on a smaller 16\(\times\)16 crossbar. Overall, we find the rises in clean and adversarial accuracies to be \(\sim\)35% and \(\sim\)6–16% greater than for the Normal BNN, respectively. In this case too, the points for different \(\epsilon\) values, given a style of mapping and crossbar size, are more closely packed than the corresponding points for a standalone SwitchX BNN or Normal BNN, implying smaller accuracy losses on increasing the perturbation strength of the attack. Interestingly, we find that this approach emerges as a stronger defense against adversarial attacks than SwitchX combined with adversarial training (a state-of-the-art software defense), the defensive action being more pronounced for larger crossbar sizes (32\(\times\)32 in Figures 8(a) and (b)).
Furthermore, we observe similar results in Figure 9(b) using the large and complex TinyImagenet dataset with the VGG16 BNN. For crossbar sizes of 16\(\times\)16 and 32\(\times\)32, we find SwitchX BNNs to outperform the corresponding Normal BNNs in terms of both clean and adversarial accuracies. There is a \(\sim\)5.5% improvement in clean accuracies and a \(\sim\)1–2% improvement in FGSM adversarial accuracies for the HH mode of attack. With state-aware training and SwitchX mapping combined, the improvements on 16\(\times\)16 and 32\(\times\)32 crossbars shoot up to \(\sim\)21.2% for clean accuracies and \(\sim\)4.5–8.3% for adversarial accuracies.
Effect of varying device ON/OFF ratio in crossbars: The results in [6] show that on increasing the ReRAM device ON/OFF ratio (increasing the value of HRS at a constant value of LRS) in crossbars, the non-ideality factor decreases. This should translate into robustness benefits for our SwitchX approach, which in itself increases the proportion of HRS states in crossbars. In Figure 10, we map BNNs onto 32\(\times\)32 crossbars having device ON/OFF ratios of 10 and 100, respectively. We then compare the robustness of a standalone SwitchX BNN and a SwitchX BNN combined with state-aware training for a VGG16 network with the CIFAR-10 dataset for both cases. We find that mapping BNNs on crossbars having higher ON/OFF ratios boosts the robustness of the mapped BNNs against both SH- and HH-based adversarial attacks (\(\sim\)2–4% for standalone SwitchX). Further, there are \(\sim\)6% and \(\sim\)2% improvements in clean accuracy for standalone SwitchX and SwitchX combined with state-aware training, respectively.
Effect of synaptic device variations on larger crossbars with greater \(R_{MIN}\): Here, we increase \(R_{MIN}\) of the NVM devices to \(200 k\Omega\) (at an ON/OFF ratio of 10) so that the impact of interconnect parasitic non-idealities is minimized, thereby making simulations on larger crossbar sizes feasible with a modest runtime. In this scenario, the synaptic device variations become predominant and determine the robustness of the crossbar-mapped BNN models. For the robustness maps in Figure 11, we assume the NVM device variations to be Gaussian with \(\sigma /\mu = 20\%\), and the BNN mappings are carried out on 64\(\times\)64 and 128\(\times\)128 crossbars. The results are consistent with those in Figures 8(a) and (b), indicating that the SwitchX method is applicable to BNNs mapped onto larger crossbar sizes.
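A minimal sketch of how such Gaussian device variations can be injected into the mapped conductances is given below; the multiplicative-noise model applied to an already-programmed conductance matrix is an illustrative assumption.

```python
import numpy as np

def apply_device_variations(G, sigma_by_mu=0.20, rng=None):
    """Perturb each programmed conductance with multiplicative Gaussian noise of
    relative spread sigma/mu, clipping at zero to keep conductances physical."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=1.0, scale=sigma_by_mu, size=G.shape)
    return np.clip(G * noise, 0.0, None)

# Example: a 128 x 128 crossbar with R_MIN = 200 kOhm and an ON/OFF ratio of 10.
rng = np.random.default_rng(0)
G_LRS, G_HRS = 1 / 200e3, 1 / 2e6
G_ideal = rng.choice([G_HRS, G_LRS], size=(128, 128))
G_varied = apply_device_variations(G_ideal, sigma_by_mu=0.20, rng=rng)
```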
6.3 Impact of SwitchX on Crossbar Power Consumption
Owing to the low leakage of the memristive devices, the \(V.I\) power consumption in the crossbars constitutes a significantly large portion of the overall power expended in low-precision BNN inference on ReRAM crossbars [8, 18, 27]. Thus, in this section, we show an interesting by-product of SwitchX mapping, whereby we can also reduce the crossbar power consumption compared to normally mapped BNNs (shown in Figure 12). Here, the simulations for estimating the \(V.I\) power consumed by the crossbars during BNN inference have been performed using ReRAM devices with \(R_{MIN} = 20 k\Omega\) and an ON/OFF ratio of 10 using the NeuroSim tool [41], and the analog voltages input to the crossbars are +0.1/–0.1 V (obtained via SPICE simulations as specified in Section 3.1).
Trends for energy-efficiency in crossbars: Figure 13 shows a plot of the power savings observed when randomly generated binary weight matrices with a higher proportion of “+1” values are mapped via SwitchX onto crossbars of sizes ranging from 16\(\times\)16 to 128\(\times\)128. Here, the input voltages to the crossbars are drawn from a uniform distribution. The case of “90% HRS states” implies that 10% of the values in the BNN weight matrix were “–1”, while “75% HRS states” implies that 25% of the values were “–1”, i.e., the weight matrix is less non-uniformly distributed than in the former case. Similarly, “60% HRS states” implies that 40% of the values in the BNN weight matrix were “–1”. We find that the power savings (\(\sim\)7–34% on 64\(\times\)64 crossbars) increase when the distribution of BNN weights becomes more non-uniform (from “60% HRS states” to “90% HRS states”) across the different crossbar sizes. A more non-uniform distribution implies a greater proportion of HRS states on SwitchX mapping, thereby translating into greater crossbar power savings owing to lower dot-product currents. Note that state-aware training in BNNs increases the non-uniformity in the distribution of HRS-LRS synapses in crossbars when mapped using SwitchX.
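The trend above can be illustrated with a simplified estimate of the ideal dot-product \(V.I\) power of a single crossbar (ignoring interconnect parasitics and peripheral circuits, which the NeuroSim-based results do account for); the uniformly distributed input voltages and the random weight matrices below are illustrative assumptions.

```python
import numpy as np

def crossbar_dot_product_power(V, G):
    """Ideal V.I power: device (i, j) with row voltage V_i and conductance G_ij
    dissipates V_i^2 * G_ij; the total is the sum over the array."""
    return float(np.sum((V[:, None] ** 2) * G))

def random_crossbar(size, hrs_fraction, r_min=20e3, on_off=10, rng=None):
    """Random crossbar with the given fraction of HRS (high-resistance) devices."""
    rng = np.random.default_rng() if rng is None else rng
    G_LRS, G_HRS = 1 / r_min, 1 / (r_min * on_off)
    mask = rng.random((size, size)) < hrs_fraction
    return np.where(mask, G_HRS, G_LRS)

rng = np.random.default_rng(0)
size = 64
V = rng.uniform(-0.1, 0.1, size)  # analog input voltages drawn uniformly (illustrative)
for hrs in (0.60, 0.75, 0.90):
    G = random_crossbar(size, hrs, rng=rng)
    print(f"{int(hrs * 100)}% HRS states -> {1e6 * crossbar_dot_product_power(V, G):.2f} uW")
```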
From Figure 12(a) (data shown for a 16\(\times\)16 crossbar), we find that the standalone SwitchX approach leads to \(\sim\)9% power savings on average with respect to the Normal BNN for a VGG16 network with the CIFAR-100 dataset. While for the CIFAR-10 dataset there is a \(\sim\)8% power saving for standalone SwitchX, it increases to \(\sim\)22% when SwitchX is combined with state-aware training. Furthermore, on carrying out similar experiments on 32\(\times\)32 crossbars, we obtain \(\sim\)21% power savings on combining SwitchX with state-aware training (data not shown for brevity). For the CIFAR-100 dataset, we obtain \(\sim\)19% power savings on 16\(\times\)16 crossbars on combining SwitchX with state-aware training. Figure 12(b) shows the layer-wise normalized average power consumption by the network with the CIFAR-100 and CIFAR-10 datasets. We find that, overall, for each convolutional layer of the network, the power consumed by the SwitchX BNN is lower than that consumed by the Normal BNN. Furthermore, this reduction becomes even more significant between the “conv7” and “conv8” layers when SwitchX is combined with state-aware training (for the CIFAR-10 dataset). This result is in accordance with Figure 13, which shows that the power savings on crossbar arrays increase significantly when the HRS-LRS state distribution becomes more non-uniform.