DOI: 10.1145/3658644.3690208
Open access

Training Robust ML-based Raw-Binary Malware Detectors in Hours, not Months

Published: 09 December 2024

Abstract

Machine-learning (ML) classifiers are increasingly used to distinguish malware from benign binaries. Recent work has shown that ML-based detectors can be evaded by adversarial examples, but also that one may defend against such attacks via adversarial training. However, adversarial training, and the subsequent robustness evaluation, is computationally expensive in the raw-binary malware-detection domain because it requires producing many adversarial examples for both training and evaluation. Prior work found that Greedy-training, a faster robust-training technique that forgoes adversarial examples, showed some promise in producing robust malware detectors. However, Greedy-training was far less effective at inducing robustness than the more expensive adversarial training, and it also severely hurt natural accuracy (i.e., accuracy on the original data). To train models faster, this work presents GreedyBlock-training, an enhanced version of Greedy-training that we empirically show not only achieves state-of-the-art robustness in malware detectors, exceeding even adversarial training, but also retains natural accuracy better than adversarial training. Furthermore, because it does not require creating adversarial (or functional) examples, GreedyBlock-training is significantly faster than adversarial training. Specifically, we show that GreedyBlock-training can produce more robust (+54% on average), more naturally accurate (+7% on average), and more efficiently trained (-91% average computation) malware detectors than prior work. To evaluate models faster, we also develop methods to gauge the robustness of ML-based raw-binary malware detectors more cheaply by introducing robustness proxies, which can be used either to predict which models are likely to be the most robust, thus helping prioritize which detectors to evaluate with expensive attacks, or to decide which detectors are worth continuing to train. Experimentally, we show these proxy measures can find the most robust detector in a pool of detectors while using only ~20-50% of the computation that would otherwise be required.
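The proxy-based evaluation strategy described above amounts to a rank-then-verify loop: score every candidate detector with a cheap proxy, then spend the expensive attack budget only on the top-ranked candidates. The sketch below is a minimal illustration of that idea under stated assumptions, not the paper's implementation; `proxy_score`, `expensive_attack_eval`, and `budget` are hypothetical placeholders standing in for whatever proxy measure, attack procedure, and evaluation budget are actually used.

```python
# Illustrative sketch (not the authors' implementation): using a cheap
# robustness proxy to prioritize which detectors get the expensive
# attack-based robustness evaluation.
from typing import Callable, Sequence, Tuple


def select_most_robust(
    detectors: Sequence[object],
    proxy_score: Callable[[object], float],          # cheap proxy: higher = likely more robust (hypothetical)
    expensive_attack_eval: Callable[[object], float],  # costly: robust accuracy under attack (hypothetical)
    budget: int,                                      # how many detectors we can afford to attack
) -> Tuple[object, float]:
    """Rank detectors by the cheap proxy, then run the expensive
    attack-based evaluation only on the top `budget` candidates."""
    # 1. Score every detector with the cheap proxy and rank them.
    ranked = sorted(detectors, key=proxy_score, reverse=True)

    # 2. Attack-evaluate only the top-ranked candidates.
    best_detector, best_robust_acc = None, float("-inf")
    for detector in ranked[:budget]:
        robust_acc = expensive_attack_eval(detector)
        if robust_acc > best_robust_acc:
            best_detector, best_robust_acc = detector, robust_acc

    return best_detector, best_robust_acc
```

If the proxy correlates well with attack-time robustness, most of the cost of evaluating every detector with real attacks is avoided, which is consistent with the ~20-50% computation figure reported in the abstract.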

Published In

CCS '24: Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security
December 2024, 5188 pages
ISBN: 9798400706363
DOI: 10.1145/3658644
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. adversarial robustness
  2. machine learning
  3. malware detection

Qualifiers

  • Research-article

Conference

CCS '24

Acceptance Rates

Overall acceptance rate: 1,261 of 6,999 submissions (18%)
