A comparative evaluation of novelty detection algorithms for discrete sequences

Rémi Domingues ORCID: orcid.org/0000-0001-9613-2187¹,
Pietro Michiardi¹,
Jérémie Barlet² &
…
Maurizio Filippone¹

969 Accesses
12 Citations
3 Altmetric
Explore all metrics

Abstract

The identification of anomalies in temporal data is a core component of numerous research areas such as intrusion detection, fault prevention, genomics and fraud detection. This article provides an experimental comparison of candidate methods for the novelty detection problem applied to discrete sequences. The objective of this study is to identify which state-of-the-art methods are efficient and appropriate candidates for a given use case. These recommendations rely on extensive novelty detection experiments based on a variety of public datasets in addition to novel industrial datasets. We also perform thorough scalability and memory usage tests resulting in new supplementary insights of the methods’ performance, key selection criteria to solve problems relying on large volumes of data and to meet the expectations of applications subject to strict response time constraints.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Novelets: a new primitive that allows online detection of emerging behaviors in time series

Article 10 August 2023

An Empirical Evaluation of Sequential Pattern Mining Algorithms

MERLIN++: parameter-free discovery of time series anomalies

Article 16 January 2023

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

References

Aggarwal CC (2015) Outlier analysis. Springer, Cham, pp 237–263
Google Scholar
Bergroth L, Hakonen H, Raita T (2000) A survey of longest common subsequence algorithms. In: Proceedings seventh international symposium on string processing and information retrieval. SPIRE 2000, pp 39–48
Breiman L, Friedman J, Olsen R, Stone C (1984) Classification and regression trees. Wadsworth and Brooks
Breunig MM, Kriegel H-P, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. SIGMOD Rec 29(2):93–104
Article Google Scholar
Budalakoti S, Srivastava AN, Akella R,Turkov E (2006) Anomaly detection in large sets of high-dimensional symbol sequences. Technical Report NASA TM-2006-214553
Budalakoti S, Srivastava AN, Otey ME (2009) Anomaly detection and diagnosis algorithms for discrete symbol sequences with applications to airline safety. IEEE Trans Syst Cybern C (Appl Rev) 39(1):101–113
Article Google Scholar
Chandola V, Banerjee A, Kumar V (2012) Anomaly detection for discrete sequences: a survey. IEEE Trans Knowl Data Eng 24(5):823–839
Article Google Scholar
Chandola V, Mithal V, Kumar V (2008) Comparative evaluation of anomaly detection techniques for sequence data. In: 2008 Eighth IEEE international conference on data mining, pp 743–748
Chang D, Jones NA, Li D, Ouyang M, Ragade RK (2008) Compute pairwise Euclidean distances of data points with gpus. In: Proceedings of the iASTED international symposium on computational biology and bioinformatics, pp 278–283
Christ M, Kempa-Liehr AW, Feindt M (2016) Distributed and parallel time series feature extraction for industrial big data applications. arXiv:1610.07717
Cohen WW (1995) Fast effective rule induction. In: Prieditis A, Russell S (eds) Machine learning proceedings 1995. Morgan Kaufmann, San Francisco, pp 115–123
Chapter Google Scholar
Crochemore M, Iliopoulous CS, Pinzon YJ (2003) Speeding-up hirschberg and hunt-szymanski lcs algorithms. Fundamenta Informaticae 56(1–2):89–103
MathSciNet MATH Google Scholar
Davis J, Goadrich M (2006) The relationship between precision-recall and roc curves. In: Proceedings of the 23rd international conference on Machine learning. ACM, pp 233–240
de Fortuny EJ, Martens D (2015) Active learning-based pedagogical rule extraction. IEEE Trans Neural Netw Learn Syst 26(11):2664–2677
Article MathSciNet Google Scholar
Domingues R, Michiardi P, Zouaoui J, Filippone M (2018) Deep gaussian process autoencoders for novelty detection. Mach Learn
Emmott A, Das S, Dietterich T, Fern A, Wong W-K (2016) A meta-analysis of the anomaly detection problem. arXiv:1503.01158v2
Etchells TA, Lisboa PJG (2006) Orthogonal search-based rule extraction (osre) for trained neural networks: a practical and efficient approach. IEEE Trans Neural Netw 17(2):374–384
Article Google Scholar
Fowkes J, Sutton C (2016) A subsequence interleaving model for sequential pattern mining. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’16, New York, NY, USA, 2016. ACM, pp 835–844
Gan W, Lin JC-W, Fournier-Viger P, Chao H-C, Yu PS (2018) A survey of parallel sequential pattern mining. arXiv:1805.10515
García S, Fernández A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf Sci 180(10):2044–2064 (Special Issue on Intelligent Distributed Information Systems)
Gupta M, Gao J, Aggarwal CC, Han J (2014) Outlier detection for temporal data: a survey. IEEE Trans Knowl Data Eng 26(9):2250–2267
Article MATH Google Scholar
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Article Google Scholar
Hodge V, Austin J (2004) A survey of outlier detection methodologies. Artif Intell Rev 22(2):85–126
Article MATH Google Scholar
Hofmeyr SA, Forrest S, Somayaji A (1998) Intrusion detection using sequences of system calls. J Comput Secur 6(3):151–180
Article Google Scholar
Hunt JW, Szymanski TG (1977) A fast algorithm for computing longest common subsequences. Commun ACM 20(5):350–353
Article MathSciNet MATH Google Scholar
Huysmans J, Baesens B, Vanthienen J (2006) Iter: an algorithm for predictive regression rule extraction. In: Tjoa AM, Trujillo J (eds) Data warehousing and knowledge discovery. Springer, Berlin, pp 270–279
Chapter Google Scholar
Karpathy A, Johnson J, Li F (2016) Visualizing and understanding recurrent networks. In: Proceedings of the fourth international conference on learning representations (ICLR 2016)
Kundzewicz ZW, Robson AJ (2004) Change detection in hydrological records—a review of the methodology. Hydrol Sci J 49(1):7–19
Article Google Scholar
Lazarevic A, Ertoz L, Kumar V, Ozgur A, Srivastava J (2003) A comparative study of anomaly detection schemes in network intrusion detection. In: Proceedings of the 2003 SIAM international conference on data mining, pp 25–36
Lee W, Stolfo SJ, Chan PK (1997) Learning patterns from unix process execution traces for intrusion detection. In: AAAI workshop on AI approaches to fraud detection and risk management, pp 50–56
Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Dokl. 10:707
MathSciNet Google Scholar
Lipton ZC, Berkowitz J, Elkan C (2015) A critical review of recurrent neural networks for sequence learning
Liu FT, Ting KM, Zhou Z-H (2008) Isolation forest. In: Proceedings of the 2008 eighth IEEE international conference on data mining, ICDM ’08. IEEE Computer Society, pp 413–422
Luong M-T, Pham H, Manning CD (2015) Effective approaches to attention-based neural machine translation. arXiv:1508.04025
Marchi E, Vesperini F, Eyben F, Squartini S, Schuller B (2015) A novel approach for automatic acoustic novelty detection using a denoising autoencoder with bidirectional lstm neural networks. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 1996–2000
Maxion RA, Townsend TN (2002) Masquerade detection using truncated command lines. In: Proceedings international conference on dependable systems and networks, pp 219–228
Onderwater M (2015) Outlier preservation by dimensionality reduction techniques. Int J Data Anal Tech Strateg 7(3):231–252
Article Google Scholar
Park H-S, Jun C-H (2009) A simple and fast algorithm for k-medoids clustering. Expert Syst Appl 36(2, Part 2):3336–3341
Article Google Scholar
Petrović S, Osborne M, Lavrenko V (2010) Streaming first story detection with application to twitter. In: Human Language technologies: the 2010 annual conference of the North American Chapter of the Association for Computational Linguistics, HLT ’10. Association for Computational Linguistics, Stroudsburg, pp 181–189
Pihur V, Datta S, Datta S (2009) RankAggreg, an r package for weighted rank aggregation. BMC Bioinform 10(1):62
Article Google Scholar
Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286
Article Google Scholar
Ramaswamy S, Rastogi R, Shim K (2000) Efficient algorithms for mining outliers from large data sets. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data, SIGMOD ’00, New York, NY, USA. ACM, pp 427–438
Saad EW, Wunsch DC (2007) Neural network explanation using inversion. Neural Netw 20(1):78–93
Article MATH Google Scholar
Saidi R, Maddouri M, MephuNguifo E (2010) Protein sequences classification by means of feature extraction with substitution matrices. BMC Bioinform 11(1):175
Article Google Scholar
Sakurada M, Yairi T (2014) Anomaly detection using autoencoders with nonlinear dimensionality reduction. In: Proceedings of the MLSDA 2014 2nd workshop on machine learning for sensory data analysis. ACM, pp 4–11
Schubert E, Rousseeuw PJ (2018) Faster k-Medoids clustering: improving the PAM, CLARA, and CLARANS algorithms. arXiv:1810.05691
Sculley D, Brodley CE (2006) Compression and machine learning: a new perspective on feature space vectors. In: Data compression conference (DCC’06), pp 332–341
Setiono R, Leow WK, Zurada JM (2002) Extraction of rules from artificial neural networks for nonlinear regression. IEEE Trans Neural Netw 13(3):564–577
Article Google Scholar
Sun P, Chawla S, Arunasalam B (2006) Mining for outliers in sequential databases. In: Proceedings of the 2006 SIAM international conference on data mining, pp 94–105
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems, vol 27. Curran Associates Inc., New York, pp 3104–3112
Google Scholar
Taylor SJ, Letham B (2018) Forecasting at scale. Am Stat 72(1):37–45
Article MathSciNet Google Scholar
Wang JTL, Ma Q, Shasha D, Wu CH (2001) New techniques for extracting features from protein sequences. IBM Syst J 40(2):426–441
Article Google Scholar
Warrender C, Forrest S, Pearlmutter B (1999) Detecting intrusions using system calls: alternative data models. In: Proceedings of the 1999 IEEE symposium on security and privacy (Cat. No. 99CB36344), pp 133–145
Zhang Q-S, Zhu S-C (2018) Visual interpretability for deep learning: a survey. Front Inf Technol Electron Eng 19(1):27–39
Article Google Scholar

Download references

Acknowledgements

The authors wish to thank the Amadeus Middleware Fraud Detection team directed by Virginie Amar and Jérémie Barlet, led by the product owner Christophe Allexandre and composed of Jean-Blas Imbert, Jiang Wu, Yang Pu and Damien Fontanes for building the rights, transactions-fr and transactions-mo datasets. MF gratefully acknowledges support from the AXA Research Fund.

Author information

Authors and Affiliations

Department of Data Science, EURECOM, 450 Route des Chappes, Sophia Antipolis, France
Rémi Domingues, Pietro Michiardi & Maurizio Filippone
Amadeus, 485 Route du Pin Montard, Sophia Antipolis, France
Jérémie Barlet

Authors

Rémi Domingues
View author publications
You can also search for this author in PubMed Google Scholar
Pietro Michiardi
View author publications
You can also search for this author in PubMed Google Scholar
Jérémie Barlet
View author publications
You can also search for this author in PubMed Google Scholar
Maurizio Filippone
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Rémi Domingues.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Domingues, R., Michiardi, P., Barlet, J. et al. A comparative evaluation of novelty detection algorithms for discrete sequences. Artif Intell Rev 53, 3787–3812 (2020). https://doi.org/10.1007/s10462-019-09779-4

Download citation

Published: 08 November 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s10462-019-09779-4

A comparative evaluation of novelty detection algorithms for discrete sequences

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Novelets: a new primitive that allows online detection of emerging behaviors in time series

An Empirical Evaluation of Sequential Pattern Mining Algorithms

MERLIN++: parameter-free discovery of time series anomalies

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

A comparative evaluation of novelty detection algorithms for discrete sequences

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Novelets: a new primitive that allows online detection of emerging behaviors in time series

An Empirical Evaluation of Sequential Pattern Mining Algorithms

MERLIN++: parameter-free discovery of time series anomalies

Explore related subjects

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation