Computer Science > Machine Learning

arXiv:2311.11961 (cs)

[Submitted on 20 Nov 2023 (v1), last revised 11 Jun 2024 (this version, v2)]

Title: NNG-Mix: Improving Semi-supervised Anomaly Detection with Pseudo-anomaly Generation

Authors: Hao Dong, Gaëtan Frusque, Yue Zhao, Eleni Chatzi, Olga Fink

Abstract: Anomaly detection (AD) is essential in identifying rare and often critical events in complex systems, finding applications in fields such as network intrusion detection, financial fraud detection, and fault detection in infrastructure and industrial systems. While AD is typically treated as an unsupervised learning task due to the high cost of label annotation, it is more practical to assume access to a small set of labeled anomaly samples from domain experts, as is the case for semi-supervised anomaly detection. Semi-supervised and supervised approaches can leverage such labeled data, resulting in improved performance. In this paper, rather than proposing a new semi-supervised or supervised approach for AD, we introduce a novel algorithm for generating additional pseudo-anomalies on the basis of the limited labeled anomalies and a large volume of unlabeled data. This serves as an augmentation to facilitate the detection of new anomalies. Our proposed algorithm, named Nearest Neighbor Gaussian Mixup (NNG-Mix), efficiently integrates information from both labeled and unlabeled data to generate pseudo-anomalies. We compare the performance of this novel algorithm with commonly applied augmentation techniques, such as Mixup and Cutout. We evaluate NNG-Mix by training various existing semi-supervised and supervised anomaly detection algorithms on the original training data along with the generated pseudo-anomalies. Through extensive experiments on 57 benchmark datasets in ADBench, reflecting different data types, we demonstrate that NNG-Mix outperforms other data augmentation methods. It yields significant performance improvements compared to the baselines trained exclusively on the original training data. Notably, NNG-Mix yields up to 16.4%, 8.8%, and 8.0% improvements on Classical, CV, and NLP datasets in ADBench. Our source code is available at this https URL.
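The abstract names three ingredients (nearest neighbors, Gaussian noise, Mixup) without giving the procedure. The block below is a rough sketch of one way these could be combined, not the authors' implementation: the function name `nng_mix_sketch`, the parameters `k`, `alpha`, and `noise_std`, their default values, and the exact ordering of the steps are all assumptions made for illustration.

```python
import numpy as np

def nng_mix_sketch(anomalies, unlabeled, n_generate, k=5, alpha=0.2,
                   noise_std=0.01, seed=0):
    """Hypothetical pseudo-anomaly generator; all defaults are assumptions."""
    rng = np.random.default_rng(seed)
    # Candidate partners come from both labeled anomalies and unlabeled data.
    pool = np.vstack([anomalies, unlabeled])
    out = np.empty((n_generate, anomalies.shape[1]))
    for i in range(n_generate):
        # Pick a labeled anomaly as the anchor.
        a = anomalies[rng.integers(len(anomalies))]
        # Euclidean distance from the anchor to every pool sample.
        dist = np.linalg.norm(pool - a, axis=1)
        # Drop the single closest point (normally the anchor itself),
        # then keep the next k as the neighbor set.
        nn_idx = np.argsort(dist)[1:k + 1]
        b = pool[rng.choice(nn_idx)]
        # Perturb both samples with isotropic Gaussian noise.
        a_noisy = a + rng.normal(0.0, noise_std, size=a.shape)
        b_noisy = b + rng.normal(0.0, noise_std, size=b.shape)
        # Mixup: convex combination with a Beta-distributed weight.
        lam = rng.beta(alpha, alpha)
        out[i] = lam * a_noisy + (1.0 - lam) * b_noisy
    return out

# Toy usage on synthetic data (not from ADBench).
rng = np.random.default_rng(1)
unlabeled = rng.normal(size=(200, 4))
anomalies = rng.normal(loc=4.0, size=(5, 4))
pseudo = nng_mix_sketch(anomalies, unlabeled, n_generate=10)
```

In a setup like the one the abstract describes, the rows of `pseudo` would be appended to the training set with the anomaly label before fitting a semi-supervised or supervised detector.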

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2311.11961 [cs.LG]
	(or arXiv:2311.11961v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2311.11961

Submission history

From: Hao Dong
[v1] Mon, 20 Nov 2023 17:38:35 UTC (4,083 KB)
[v2] Tue, 11 Jun 2024 15:39:52 UTC (5,760 KB)
