Deep Anomaly Detection Based on Variational Deviation Network
Figure 1. (a) Learning features for subsequent anomaly measures vs. (b) direct learning of anomaly scores vs. (c) learning a reference score for subsequent anomaly measures.

Figure 2. The training process of the variational auto-encoder. $X$ is passed through two encoders to obtain the mean and the variance. The mean encoder is driven toward 0 under injected Gaussian noise, which makes the decoder robust to noise; the variance encoder drives the learned variance toward 1, dynamically adjusting the noise intensity and further optimizing the decoder. The reconstruction $\hat{X}$ produced by the decoder is continuously optimized toward $X$ through the loss function.

Figure 3. The overall framework of the variational deviation network. $\phi(x; \Theta)$ is an anomaly score learner with parameters $\Theta$; $\hat{\mu}_R$ and $\hat{\sigma}_R$ are the reference scores used to train the anomaly scores, namely the average mean and average variance of the normal data, where the normal data $R^k$ are passed through the encoder of the variational auto-encoder to obtain their means and variances.

Figure 4. Autoencoder schematic. The encoder maps $x$ to the low-dimensional latent representation $z$; the decoder reconstructs $z$ into $\hat{x}$, and the loss function is continuously optimized to keep $\hat{x}$ close to $x$.

Figure 5. Schematic of the basic principle of a variational auto-encoder. $X$ is passed through two encoders separately to obtain the mean and the variance, which are kept close to a standard normal distribution by an additional loss term, yielding a probability distribution $Z$ that conforms to $X$. The decoder reconstructs $Z$ into $\hat{X}$, similar to $X$, and is continuously optimized through the reconstruction error loss.
Abstract
1. Introduction
- We introduced the variational auto-encoder into the framework. It specifies the reference scores by learning the normal distribution of each normal data instance and generates a different reference score for each instance, giving the reference scores stronger explanatory power; and
- The framework instantiates a new deep anomaly detection method, the variational deviation network (V-DevNet). V-DevNet optimizes the anomaly score through an anomaly score neural network, a variational auto-encoder, and the deviation loss, yielding anomaly scores that are both accurate and easy to interpret.
2. Related Work
2.1. Traditional Anomaly Detection
2.2. Deep Anomaly Detection
2.3. Limited Abnormal Data
3. Reference Score Learning
3.1. Problem Statement
3.2. Variational Auto-Encoder
4. Methods
4.1. Overall Framework
- Anomaly score network $\phi(x; \Theta)$: its main function is to generate an anomaly score for each input $x$.
- Variational auto-encoder: to generate reference scores in a data-driven manner, we introduce a variational auto-encoder, which produces the reference scores $\hat{\mu}_R$ and $\hat{\sigma}_R$ through its two encoders. Reference scores generated this way are more explanatory and better fit the data from which they are learned.
- Finally, we define the deviation loss function and use $\phi(x; \Theta)$, $\hat{\mu}_R$, and $\hat{\sigma}_R$ to guide the optimization of the parameters $\Theta$: the deviation loss makes the anomaly scores of abnormal data deviate significantly from the reference score, while keeping the scores of normal data close to $\hat{\mu}_R$ (a minimal sketch of this loss follows the list).
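The following is a minimal sketch of such a deviation loss, following the formulation popularized by DevNet (Pang et al.) and assuming the reference scores are supplied by the variational auto-encoder; the function names and the margin value (5, as commonly used in DevNet) are illustrative assumptions:

```python
import tensorflow as tf

def deviation_loss(y_true, scores, mu_r, sigma_r, margin=5.0):
    """Hypothetical deviation loss: normal data are pulled toward the
    reference mean; anomalies are pushed at least `margin` standardized
    deviations above it."""
    y = tf.cast(y_true, scores.dtype)
    dev = (scores - mu_r) / sigma_r               # standardized deviation from reference
    inlier_loss = tf.abs(dev)                     # y = 0: stay close to the reference
    outlier_loss = tf.maximum(0.0, margin - dev)  # y = 1: deviate significantly
    return tf.reduce_mean((1.0 - y) * inlier_loss + y * outlier_loss)
```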
4.2. Anomaly Score Network
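As a concrete illustration, a shallow scorer $\phi(x; \Theta)$ could look like the following Keras sketch; the single 20-unit hidden layer mirrors the DevNet default, but all sizes here are assumptions rather than the authors' exact architecture:

```python
from tensorflow.keras import layers, Model

def build_scorer(input_dim):
    # phi(x; Theta): maps each input x to one scalar anomaly score.
    x_in = layers.Input(shape=(input_dim,))
    h = layers.Dense(20, activation="relu")(x_in)    # assumed hidden width
    score = layers.Dense(1, activation="linear")(h)  # linear unit: unbounded score
    return Model(x_in, score, name="anomaly_scorer")
```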
4.3. Variational Auto-Encoder
4.3.1. Variational Auto-Encoder and Autoencoder
4.3.2. Principle of Variational Auto-Encoder
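Following the two-encoder-head structure shown in Figures 2 and 5, a minimal TensorFlow/Keras sketch of the variational auto-encoder is given below; all layer sizes are illustrative assumptions, not the authors' exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

class Sampling(layers.Layer):
    """Reparameterization trick: z = mean + sigma * eps, with eps ~ N(0, I),
    so gradients can flow through the sampling step."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        eps = tf.random.normal(tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps

def build_vae(input_dim, latent_dim=8):
    x_in = layers.Input(shape=(input_dim,))
    h = layers.Dense(64, activation="relu")(x_in)
    z_mean = layers.Dense(latent_dim, name="z_mean")(h)        # encoder head 1: mean
    z_log_var = layers.Dense(latent_dim, name="z_log_var")(h)  # encoder head 2: log-variance
    z = Sampling()([z_mean, z_log_var])
    x_hat = layers.Dense(input_dim)(layers.Dense(64, activation="relu")(z))

    vae = Model(x_in, x_hat, name="vae")
    encoder = Model(x_in, [z_mean, z_log_var], name="vae_encoder")

    # Loss = reconstruction error + KL divergence pulling N(mean, var)
    # toward the standard normal prior N(0, I).
    recon = tf.reduce_sum(tf.square(x_in - x_hat), axis=1)
    kl = -0.5 * tf.reduce_sum(
        1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1)
    vae.add_loss(tf.reduce_mean(recon + kl))
    vae.compile(optimizer="rmsprop")
    return vae, encoder
```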
4.3.3. Generate Reference Score
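The Figure 3 caption describes $\hat{\mu}_R$ and $\hat{\sigma}_R$ as the average mean and average variance of the normal data under the trained encoder. One plausible reading of that averaging step, reducing across both instances and latent dimensions to scalar reference scores, is sketched here:

```python
import numpy as np

def reference_scores(encoder, normal_data):
    # Encode each normal instance into its Gaussian parameters, then average
    # over instances and latent dimensions (an assumption about the reduction)
    # to obtain scalar reference scores for the deviation loss.
    z_mean, z_log_var = encoder.predict(normal_data)
    mu_r = float(np.mean(z_mean))                      # average mean
    sigma_r = float(np.mean(np.exp(0.5 * z_log_var)))  # average standard deviation
    return mu_r, sigma_r
```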
4.4. Deviation Loss Function
4.5. Variational Deviation Network Algorithm
Algorithm 1 Training the variational deviation network.
Input: normal data
Output: reference score

Algorithm 2 Training the variational deviation network.
Input: training data
Output: the trained model
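Putting the pieces together, a hedged sketch of how the two algorithms could be wired, reusing the hypothetical helpers sketched in the previous sections (`build_vae`, `reference_scores`, `build_scorer`, `deviation_loss`; `X_normal`, `X_train`, `y_train` are assumed NumPy arrays), rather than the authors' exact procedure:

```python
# Algorithm 1 (sketch): learn reference scores from normal data.
vae, encoder = build_vae(input_dim=X_normal.shape[1])
vae.fit(X_normal, epochs=50, batch_size=512, verbose=0)
mu_r, sigma_r = reference_scores(encoder, X_normal)

# Algorithm 2 (sketch): train the anomaly scorer with the deviation loss.
scorer = build_scorer(input_dim=X_train.shape[1])
scorer.compile(optimizer="rmsprop",
               loss=lambda y, s: deviation_loss(y, s, mu_r, sigma_r))
scorer.fit(X_train, y_train, epochs=50, batch_size=512, verbose=0)
```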
5. Results
5.1. Dataset
5.2. Comparison Algorithm
5.3. Experiment Settings
5.4. Performance Evaluation Method
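The result tables later in this document report AUC-ROC and AUC-PR as mean ± standard deviation. A minimal sketch of computing these two metrics from anomaly scores (using scikit-learn, which is an assumption about tooling):

```python
from sklearn.metrics import roc_auc_score, average_precision_score

def evaluate(y_true, scores):
    # AUC-ROC: probability that a random anomaly is scored above a random
    # normal instance.
    auc_roc = roc_auc_score(y_true, scores)
    # AUC-PR (average precision): more informative than AUC-ROC under the
    # heavy class imbalance typical of anomaly detection.
    auc_pr = average_precision_score(y_true, scores)
    return auc_roc, auc_pr
```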
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Moustafa, N.; Slay, J. UNSW-NB15: A Comprehensive Data Set for Network Intrusion Detection Systems (UNSW-NB15 Network Data Set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, Australia, 10–12 November 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1–6.
- Thapa, N.; Liu, Z.; Kc, D.B.; Gokaraju, B.; Roy, K. Comparison of Machine Learning and Deep Learning Models for Network Intrusion Detection Systems. Future Internet 2020, 12, 167.
- Ryan, S.; Corizzo, R.; Kiringa, I.; Japkowicz, N. Pattern and Anomaly Localization in Complex and Dynamic Data. In Proceedings of the 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, USA, 16–19 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1756–1763.
- Vlaminck, M.; Heidbuchel, R.; Philips, W.; Luong, H. Region-Based CNN for Anomaly Detection in PV Power Plants Using Aerial Imagery. Sensors 2022, 22, 1244.
- Pang, G.; Cao, L.; Chen, L.; Liu, H. Learning Representations of Ultrahigh-Dimensional Data for Random Distance-Based Outlier Detection. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 2041–2050.
- Ruff, L.; Vandermeulen, R.; Goernitz, N.; Deecke, L.; Siddiqui, S.A.; Binder, A.; Müller, E.; Kloft, M. Deep One-Class Classification. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 4393–4402.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444.
- Chen, J.; Sathe, S.; Aggarwal, C.; Turaga, D. Outlier Detection with Autoencoder Ensembles. In Proceedings of the 2017 SIAM International Conference on Data Mining, Houston, TX, USA, 27–29 April 2017; SIAM: Philadelphia, PA, USA, 2017; pp. 90–98.
- Hawkins, S.; He, H.; Williams, G.; Baxter, R. Outlier Detection Using Replicator Neural Networks. In Proceedings of the International Conference on Data Warehousing and Knowledge Discovery, Vienna, Austria, 3–6 September 2002; Springer: Berlin/Heidelberg, Germany, 2002; pp. 170–180.
- Schlegl, T.; Seeböck, P.; Waldstein, S.M.; Schmidt-Erfurth, U.; Langs, G. Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery. In Proceedings of the International Conference on Information Processing in Medical Imaging, Boone, NC, USA, 25–30 June 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 146–157.
- Zenati, H.; Romain, M.; Foo, C.-S.; Lecouat, B.; Chandrasekhar, V. Adversarially Learned Anomaly Detection. In Proceedings of the 2018 IEEE International Conference on Data Mining (ICDM), Beijing, China, 17–20 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 727–736.
- Zhou, C.; Paffenroth, R.C. Anomaly Detection with Robust Deep Autoencoders. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 665–674.
- Sansone, E.; De Natale, F.G.; Zhou, Z.-H. Efficient Training for Positive Unlabeled Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 2584–2598.
- Aggarwal, C.C. Supervised Outlier Detection. In Outlier Analysis; Springer: Berlin/Heidelberg, Germany, 2017; pp. 219–248.
- Li, X.; Liu, B. Learning to Classify Texts Using Positive and Unlabeled Data. In Proceedings of the IJCAI, Acapulco, Mexico, 9–15 August 2003; Volume 3, pp. 587–592.
- Pang, G.; Shen, C.; van den Hengel, A. Deep Anomaly Detection with Deviation Networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 353–362.
- Liu, F.T.; Ting, K.M.; Zhou, Z.-H. Isolation-Based Anomaly Detection. ACM Trans. Knowl. Discov. Data 2012, 6, 1–39.
- Keller, F.; Muller, E.; Bohm, K. HiCS: High Contrast Subspaces for Density-Based Outlier Ranking. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, Arlington, VA, USA, 1–5 April 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 1037–1048.
- Pang, G.; Cao, L.; Chen, L.; Lian, D.; Liu, H. Sparse Modeling-Based Sequential Ensemble Learning for Effective Outlier Detection in High-Dimensional Numeric Data. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
- Li, P.; Hastie, T.J.; Church, K.W. Very Sparse Random Projections. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, 20–23 August 2006; pp. 287–296.
- Tax, D.M.; Duin, R.P. Support Vector Data Description. Mach. Learn. 2004, 54, 45–66.
- Elkan, C.; Noto, K. Learning Classifiers from Only Positive and Unlabeled Data. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008; pp. 213–220.
- Ma, Y.; Lei, Y.; Wang, T. A Natural Scene Recognition Learning Based on Label Correlation. IEEE Trans. Emerg. Top. Comput. Intell. 2020.
- Fei-Fei, L.; Fergus, R.; Perona, P. One-Shot Learning of Object Categories. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 594–611.
- Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv 2013, arXiv:1312.6114.
- Xiang, Y.; Cobben, J.F. A Bayesian Approach for Fault Location in Medium Voltage Grids with Underground Cables. IEEE Power Energy Technol. Syst. J. 2015, 2, 116–124.
- Doersch, C. Tutorial on Variational Autoencoders. arXiv 2016, arXiv:1606.05908.
- Oluwasanmi, A.; Aftab, M.U.; Baagyere, E.; Qin, Z.; Ahmad, M.; Mazzara, M. Attention Autoencoder for Generative Latent Representational Learning in Anomaly Detection. Sensors 2022, 22, 123.
- Corizzo, R.; Ceci, M.; Pio, G.; Mignone, P.; Japkowicz, N. Spatially-Aware Autoencoders for Detecting Contextual Anomalies in Geo-Distributed Data. In Proceedings of the International Conference on Discovery Science, Halifax, NS, Canada, 11–13 October 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 461–471.
- Kriegel, H.-P.; Kroger, P.; Schubert, E.; Zimek, A. Interpreting and Unifying Outlier Scores. In Proceedings of the 2011 SIAM International Conference on Data Mining, Philadelphia, PA, USA, 6 April 2011; SIAM: Philadelphia, PA, USA, 2011; pp. 13–24.
- Hadsell, R.; Chopra, S.; LeCun, Y. Dimensionality Reduction by Learning an Invariant Mapping. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 22 June 2006; IEEE: Piscataway, NJ, USA, 2006; Volume 2, pp. 1735–1742.
- Snell, J.; Swersky, K.; Zemel, R.S. Prototypical Networks for Few-Shot Learning. arXiv 2017, arXiv:1703.05175.
- Gulli, A.; Pal, S. Deep Learning with Keras; Packt Publishing Ltd.: Birmingham, UK, 2017.
- Hinton, G.; Srivastava, N.; Swersky, K. Neural Networks for Machine Learning, Lecture 6a: Overview of Mini-Batch Gradient Descent. Available online: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf (accessed on 28 February 2022).
- Davis, J.; Goadrich, M. The Relationship between Precision-Recall and ROC Curves. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; pp. 233–240.
- Schütze, H.; Manning, C.D.; Raghavan, P. Introduction to Information Retrieval; Cambridge University Press: Cambridge, UK, 2008; Volume 39.
- Woolson, R.F. Wilcoxon Signed-Rank Test. In Wiley Encyclopedia of Clinical Trials; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2007; pp. 1–3.
AUC-ROC performance (mean ± standard deviation):

| Datasets | V-DevNet | DevNet | REPEN | DSVDD | FSNet | iForest |
|---|---|---|---|---|---|---|
| donors | 1.000 ± 0.000 | 1.000 ± 0.000 | 0.975 ± 0.005 | 0.995 ± 0.005 | 0.997 ± 0.002 | 0.874 ± 0.015 |
| census | 0.856 ± 0.017 | 0.828 ± 0.008 | 0.794 ± 0.005 | 0.835 ± 0.014 | 0.732 ± 0.020 | 0.624 ± 0.020 |
| fraud | 0.988 ± 0.003 | 0.980 ± 0.001 | 0.972 ± 0.003 | 0.977 ± 0.001 | 0.734 ± 0.046 | 0.953 ± 0.002 |
| celeba | 0.967 ± 0.009 | 0.951 ± 0.001 | 0.894 ± 0.005 | 0.944 ± 0.003 | 0.808 ± 0.027 | 0.698 ± 0.020 |
| backdoor | 0.975 ± 0.023 | 0.969 ± 0.004 | 0.878 ± 0.007 | 0.952 ± 0.018 | 0.928 ± 0.019 | 0.752 ± 0.021 |
| URL | 0.985 ± 0.016 | 0.977 ± 0.004 | 0.842 ± 0.006 | 0.908 ± 0.027 | 0.786 ± 0.047 | 0.720 ± 0.032 |
| campaign | 0.914 ± 0.015 | 0.807 ± 0.006 | 0.723 ± 0.006 | 0.748 ± 0.019 | 0.623 ± 0.024 | 0.731 ± 0.015 |
| news20 | 0.948 ± 0.002 | 0.950 ± 0.007 | 0.885 ± 0.003 | 0.887 ± 0.000 | 0.578 ± 0.050 | 0.328 ± 0.016 |
| thyroid | 0.831 ± 0.005 | 0.783 ± 0.003 | 0.580 ± 0.016 | 0.749 ± 0.011 | 0.564 ± 0.017 | 0.688 ± 0.020 |
AUC-PR performance (mean ± standard deviation):

| Datasets | V-DevNet | DevNet | REPEN | DSVDD | FSNet | iForest |
|---|---|---|---|---|---|---|
| donors | 1.000 ± 0.000 | 1.000 ± 0.000 | 0.508 ± 0.048 | 0.846 ± 0.114 | 0.994 ± 0.002 | 0.221 ± 0.025 |
| census | 0.368 ± 0.015 | 0.321 ± 0.004 | 0.164 ± 0.003 | 0.291 ± 0.008 | 0.193 ± 0.019 | 0.076 ± 0.004 |
| fraud | 0.713 ± 0.004 | 0.690 ± 0.002 | 0.674 ± 0.004 | 0.688 ± 0.004 | 0.043 ± 0.021 | 0.254 ± 0.043 |
| celeba | 0.326 ± 0.006 | 0.279 ± 0.009 | 0.161 ± 0.006 | 0.261 ± 0.008 | 0.085 ± 0.012 | 0.065 ± 0.006 |
| backdoor | 0.886 ± 0.002 | 0.883 ± 0.008 | 0.116 ± 0.003 | 0.856 ± 0.016 | 0.573 ± 0.167 | 0.051 ± 0.005 |
| URL | 0.695 ± 0.015 | 0.681 ± 0.022 | 0.103 ± 0.003 | 0.475 ± 0.040 | 0.149 ± 0.076 | 0.066 ± 0.012 |
| campaign | 0.426 ± 0.009 | 0.381 ± 0.008 | 0.330 ± 0.009 | 0.349 ± 0.023 | 0.193 ± 0.012 | 0.328 ± 0.022 |
| news20 | 0.650 ± 0.003 | 0.653 ± 0.009 | 0.222 ± 0.004 | 0.253 ± 0.001 | 0.082 ± 0.010 | 0.035 ± 0.002 |
| thyroid | 0.386 ± 0.003 | 0.274 ± 0.011 | 0.093 ± 0.005 | 0.241 ± 0.009 | 0.116 ± 0.014 | 0.166 ± 0.017 |
| | Actual Positive | Actual Negative |
|---|---|---|
| Predicted positive | True positive (TP) | False positive (FP) |
| Predicted negative | False negative (FN) | True negative (TN) |