
DOI: 10.1145/3474370.3485655

Using Honeypots to Catch Adversarial Attacks on Neural Networks

Published: 15 November 2021

Abstract

Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. Numerous efforts either try to patch weaknesses in trained models, or try to make it difficult or costly to compute adversarial examples that exploit them. In our work, we explore a new "honeypot" approach to protect DNN models. We intentionally inject trapdoors, honeypot weaknesses in the classification manifold that attract attackers searching for adversarial examples. Attackers' optimization algorithms gravitate towards trapdoors, leading them to produce attacks similar to trapdoors in the feature space. Our defense then identifies attacks by comparing neuron activation signatures of inputs to those of trapdoors.
In this paper, we introduce trapdoors and describe an implementation of a trapdoor-enabled defense. First, we analytically prove that trapdoors shape the computation of adversarial attacks so that attack inputs will have feature representations very similar to those of trapdoors. Second, we experimentally show that trapdoor-protected models can detect, with high accuracy, adversarial examples generated by state-of-the-art attacks (PGD, optimization-based CW, Elastic Net, BPDA), with negligible impact on normal classification. These results generalize across classification domains, including image, facial, and traffic-sign recognition. We also present significant results measuring trapdoors' robustness against customized adaptive attacks (countermeasures).
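
The detection idea above can be summarized concretely: record the trapdoor's neuron activation signature (the average feature-space representation of trigger-stamped inputs), then flag any incoming input whose activations are highly similar to that signature. The sketch below illustrates this in PyTorch under stated assumptions; the feature extractor, trigger and mask shapes, threshold value, and helper names (stamp_trigger, trapdoor_signature, flag_adversarial) are illustrative, not the authors' implementation.

    # Minimal sketch of signature-based detection (assumed names and shapes).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def stamp_trigger(x, trigger, mask):
        # Blend a small trigger pattern into a batch of images.
        return x * (1 - mask) + trigger * mask

    def trapdoor_signature(feature_extractor, clean_x, trigger, mask):
        # Trapdoor "signature": mean feature representation of trigger-stamped
        # inputs, computed once after the trapdoored model is trained.
        with torch.no_grad():
            feats = feature_extractor(stamp_trigger(clean_x, trigger, mask))
        return feats.mean(dim=0)

    def flag_adversarial(feature_extractor, x, signature, threshold=0.9):
        # Inputs whose activations closely match the trapdoor signature are
        # flagged as likely adversarial examples.
        with torch.no_grad():
            feats = feature_extractor(x)
        sims = F.cosine_similarity(feats, signature.unsqueeze(0), dim=1)
        return sims > threshold

    # Toy usage with a stand-in feature extractor and random data.
    feature_extractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
    trigger = torch.rand(3, 32, 32)
    mask = torch.zeros(3, 32, 32)
    mask[:, -6:, -6:] = 0.1           # small, low-opacity corner patch
    clean_x = torch.rand(64, 3, 32, 32)
    signature = trapdoor_signature(feature_extractor, clean_x, trigger, mask)
    flags = flag_adversarial(feature_extractor, clean_x, signature)

In practice the threshold would be calibrated (for example, to a fixed false-positive rate on clean inputs) rather than hard-coded; the value here is only a placeholder.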

Supplementary Material

MP4 File (MTD21-fp12345.mp4)
In our work, we explore a new honeypot approach to protect DNN models. We intentionally inject honeypot weaknesses in the classification manifold that attract attackers searching for adversarial examples. Attackers' optimization algorithms gravitate towards trapdoors, leading them to produce attacks similar to trapdoors in the feature space. We introduce trapdoors and describe an implementation of a trapdoor-enabled defense. We analytically prove that trapdoors shape the computation of adversarial attacks so that attack inputs will have feature representations very similar to those of trapdoors. We experimentally show that trapdoor-protected models can detect, with high accuracy, adversarial examples generated by state-of-the-art attacks. These results generalize across classification domains, including image, facial, and traffic-sign recognition. We also present significant results measuring trapdoors' robustness against customized adaptive attacks (countermeasures).
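
As a complement to the detection sketch after the abstract, the following sketch shows one way the trapdoor itself could be embedded during training: a fraction of each batch is stamped with the trigger and relabeled to the protected target class, so the model learns the honeypot shortcut alongside its normal task. The names and hyperparameters (model, loader, target_class, inject_ratio) are assumptions for illustration, not the paper's implementation.

    # Hedged sketch: embedding a trapdoor for one protected class during training.
    import torch
    import torch.nn as nn

    def train_with_trapdoor(model, loader, trigger, mask, target_class=0,
                            inject_ratio=0.1, epochs=1, lr=1e-3, device="cpu"):
        trigger, mask = trigger.to(device), mask.to(device)
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        model.to(device).train()
        for _ in range(epochs):
            for x, y in loader:
                x, y = x.clone().to(device), y.clone().to(device)
                n = max(1, int(inject_ratio * x.size(0)))
                # Stamp the first n samples with the trigger and relabel them,
                # so the model associates the trigger with target_class.
                x[:n] = x[:n] * (1 - mask) + trigger * mask
                y[:n] = target_class
                opt.zero_grad()
                loss = loss_fn(model(x), y)
                loss.backward()
                opt.step()
        return model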

Cited By

  • (2023) Beating Backdoor Attack at Its Own Game. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 4597-4606. DOI: 10.1109/ICCV51070.2023.00426. Online publication date: 1-Oct-2023.
  • (2022) Application of Artificial Intelligence Technology in Honeypot Technology. 2021 International Conference on Advanced Computing and Endogenous Security, pp. 01-09. DOI: 10.1109/IEEECONF52377.2022.10013349. Online publication date: 21-Apr-2022.


Information & Contributors

Information

Published In

MTD '21: Proceedings of the 8th ACM Workshop on Moving Target Defense
November 2021
48 pages
ISBN:9781450386586
DOI:10.1145/3474370
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 November 2021


Author Tags

  1. adversarial examples
  2. honeypots
  3. neural networks

Qualifiers

  • Invited-talk

Conference

CCS '21

Acceptance Rates

Overall Acceptance Rate 40 of 92 submissions, 43%




Bibliometrics & Citations


Article Metrics

  • Downloads (Last 12 months)33
  • Downloads (Last 6 weeks)1
Reflects downloads up to 03 Oct 2024


