Deep Neural Networks (DNNs) have achieved state-of-the-art performance on a wide range of tasks and are therefore increasingly deployed in real-world applications. However, recent studies have found that DNNs are vulnerable to carefully crafted perturbations that are imperceptible to human eyes yet fool DNNs into making incorrect predictions. Since this discovery, an arms race between adversarial perturbation attacks and defenses designed to thwart them has taken off. This dissertation pursues important directions in this area: it develops a series of adversarial attack methods and proposes an adversarial defense strategy.
The dissertation starts with white-box attacks, in which an adversary has full access to the victim DNN model, including its parameters and training settings, and proposes white-box attack methods against face recognition models and real-time video classification models. In most real-world attacks, however, the adversary has only partial information about the victim model, such as its predicted labels. In such black-box attacks, the attacker can send queries to the victim model, collect the corresponding labels, and thereby estimate the gradients needed to craft adversarial perturbations. A query-efficient black-box video attack method is proposed that parameterizes the temporal structure of the gradient search space with geometric transformations. The new method exposes vulnerabilities of diverse video classification models and achieves new state-of-the-art attack results.
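To make the query-based gradient estimation concrete, the sketch below shows a generic zeroth-order (NES-style) estimator that probes the victim model only through a scalar loss derived from its returned labels. This is an illustrative stand-in under assumed names (query_fn, estimate_gradient), not the dissertation's exact method.

    import numpy as np

    def estimate_gradient(query_fn, x, sigma=0.001, n_samples=50):
        # query_fn(x) returns a scalar loss computed from the victim model's
        # returned label/score for input x; no internal gradients are used.
        grad = np.zeros_like(x)
        for _ in range(n_samples):
            u = np.random.randn(*x.shape)          # random search direction
            # antithetic sampling: probe the loss on both sides of x
            grad += (query_fn(x + sigma * u) - query_fn(x - sigma * u)) * u
        return grad / (2 * sigma * n_samples)

    # Illustrative use: one step of a sign-gradient perturbation update
    # x_adv = x + step_size * np.sign(estimate_gradient(query_fn, x))

Each gradient estimate costs 2 * n_samples queries, which is why restricting the search space (for example, via the temporal parameterization proposed in the dissertation) directly improves query efficiency.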
In addition to attack methods, a defense strategy based on context-consistency checks is proposed, inspired by the observation that humans readily notice objects that appear out of place in a scene. By augmenting DNN models with a system that learns context-consistency rules during training and checks for violations of these rules during testing, the proposed approach effectively detects various adversarial attacks, with a detection rate more than 20% higher than that of state-of-the-art context-agnostic methods.
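As a rough illustration of a context-consistency check, the sketch below flags a scene whose detected object categories rarely co-occur under statistics gathered from clean training data. The rule representation here (a pairwise co-occurrence table and threshold tau) is an assumption for illustration only, not the dissertation's actual rule-learning system.

    def violates_context(detections, cooccurrence, tau=0.05):
        # detections: list of predicted category labels for one scene
        # cooccurrence: dict mapping (label_a, label_b) to the frequency
        #   with which the pair appears together in clean training scenes
        for a in detections:
            for b in detections:
                if a != b and cooccurrence.get((a, b), 0.0) < tau:
                    return True   # rarely co-occurring pair -> flag as suspicious
        return False

    # Illustrative use: flag an adversarially induced out-of-context prediction
    # violates_context(["car", "keyboard"], cooccurrence_table)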
In summary, the dissertation reveals the vulnerabilities of several DNN models to adversarial attacks in both white-box and black-box settings. The proposed adversarial attack methods can serve as benchmarks for evaluating the robustness of image and video models, and are expected to stimulate further studies on adversarial image/video defense. In addition, an adversarial defense strategy is proposed to enhance the robustness of DNN models.