Nothing Special   »   [go: up one dir, main page]

Skip to main content
Log in

TS-CHIEF: a scalable and accurate forest algorithm for time series classification

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

Time Series Classification (TSC) has seen enormous progress over the last two decades. HIVE-COTE (Hierarchical Vote Collective of Transformation-based Ensembles) is the current state of the art in terms of classification accuracy. HIVE-COTE recognizes that time series data are a specific data type for which the traditional attribute-value representation, used predominantly in machine learning, fails to provide a relevant representation. HIVE-COTE combines multiple types of classifiers: each extracting information about a specific aspect of a time series, be it in the time domain, frequency domain or summarization of intervals within the series. However, HIVE-COTE (and its predecessor, FLAT-COTE) is often infeasible to run on even modest amounts of data. For instance, training HIVE-COTE on a dataset with only 1500 time series can require 8 days of CPU time. It has polynomial runtime with respect to the training set size, so this problem compounds as data quantity increases. We propose a novel TSC algorithm, TS-CHIEF (Time Series Combination of Heterogeneous and Integrated Embedding Forest), which rivals HIVE-COTE in accuracy but requires only a fraction of the runtime. TS-CHIEF constructs an ensemble classifier that integrates the most effective embeddings of time series that research has developed in the last decade. It uses tree-structured classifiers to do so efficiently. We assess TS-CHIEF on 85 datasets of the University of California Riverside (UCR) archive, where it achieves state-of-the-art accuracy with scalability and efficiency. We demonstrate that TS-CHIEF can be trained on 130 k time series in 2 days, a data quantity that is beyond the reach of any TSC algorithm with comparable accuracy.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

References

  • Bagnall A, Davis L, Hills J, Lines J (2012) Transformation based ensembles for time series classification. In: Proceedings of the SIAM international conference on data mining, pp 307–318

  • Bagnall A, Lines J, Hills J, Bostrom A (2015) Time-series classification with COTE: the collective of transformation-based ensembles. IEEE Trans Knowl Data Eng 27(9):2522–2535

    Google Scholar 

  • Bagnall A, Lines J, Bostrom A, Large J, Keogh E (2017) The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min Knowl Discov 31(3):606–660

    MathSciNet  Google Scholar 

  • Baydogan MG, Runger G (2016) Time series representation and similarity based on local autopatterns. Data Min Knowl Discov 30(2):476–509

    MathSciNet  MATH  Google Scholar 

  • Baydogan MG, Runger G, Tuv E (2013) A bag-of-features framework to classify time series. IEEE Trans Pattern Anal Mach Intell 35(11):2796–2802

    Google Scholar 

  • Benavoli A, Corani G, Mangili F (2016) Should we really use post-hoc tests based on mean-ranks? J Mach Learn Res 17(1):152–161

    MathSciNet  MATH  Google Scholar 

  • Bostrom A, Bagnall A (2015) Binary shapelet transform for multiclass time series classification. In: International conference on big data analytics and knowledge discovery, pp 257–269. Springer

  • Breiman L (2001) Random forests. Mach Learn 45(1):5–32 ISSN 08856125

    MATH  Google Scholar 

  • Chen L, Ng R (2004) On The marriage of lp-norms and edit distance. In: Proceedings of the 13th international conference on very large data bases (VLDB), pp. 792–803

  • Chen Y, Keogh E, Hu B, Begum N, Bagnall A, Mueen A, Batista G (2015) The UCR time series classification archive. www.cs.ucr.edu/~eamonn/time_series_data/

  • Dau HA, Bagnall A, Kamgar K, Yeh C-CM, Zhu Y, Gharghabi S, Ratanamahatana CA, Keogh E (2018a) The UCR time series archive. arXiv:1810.07758. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/

  • Dau HA, Keogh E, Kamgar K, Yeh C-CM, Zhu Y, Gharghabi S, Ratanamahatana CA, Yanping, Hu B, Begum N, Bagnall A, Mueen A, Batista G (2018b) The UCR time series classification archive. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/

  • Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

    MathSciNet  MATH  Google Scholar 

  • Deng H, Runger G, Tuv E, Vladimir M (2013) A time series forest for classification and feature extraction. Inform Sci 239:142–153

    MathSciNet  MATH  Google Scholar 

  • Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh EJ (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. Proc VLDB Endow 1(2):1542–1552

    Google Scholar 

  • Esling P, Agon C (2012) Time-series data mining. ACM Comput Surv (CSUR) 45(1):12

    MATH  Google Scholar 

  • Fawaz HI, Forestier G, Weber J, Idoumghar L, Muller PA (2019) Deep learning for time series classification: a review. Data Min and Knowl Discov 33(4):917–963

    MathSciNet  Google Scholar 

  • Friedman E, Eden R (2013) GNU Trove: high-performance collections library for Java. https://bitbucket.org/trove4j/trove/src/master/. Accessed 25 June 2019

  • Górecki T, Łuczak M (2013) Using derivatives in time series classification. Data Min Knowl Discov 26(2):310–331 ISSN 13845810

    MathSciNet  Google Scholar 

  • Grabocka J, Schilling N, Wistuba M, Schmidt-Thieme L (2014) Learning time-series shapelets. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining—KDD ’14, pp 392–401

  • Hills J, Lines J, Baranauskas E, Mapp J, Bagnall A (2014) Classification of time series by shapelet transformation. Data Min Knowl Discov 28(4):851–881 ISSN 13845810

    MathSciNet  MATH  Google Scholar 

  • Hirschberg DS (1977) Algorithms for the longest common subsequence problem. J ACM 24(4):664–675

    MathSciNet  MATH  Google Scholar 

  • Jeong YS, Jeong MK, Omitaomu OA (2011) Weighted dynamic time warping for time series classification. Pattern Recognit 44(9):2231–2240

    Google Scholar 

  • Karlsson I, Papapetrou P, Boström H (2016) Generalized random shapelet forests. Data Min Knowl Discov 30(5):1053–1085

    MathSciNet  MATH  Google Scholar 

  • Keogh EJ, Pazzani MJ (2001) Derivative dynamic time warping. In: Proceedings of the 2001 SIAM international conference on data mining, pp 1–11

  • Keogh E, Kasetty S (2003) On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Min Knowl Discov 7(4):349–371

    MathSciNet  Google Scholar 

  • Keogh E, Chakrabarti K, Pazzani M, Mehrotra S (2001) Locally adaptive dimensionality reduction for indexing large time series databases. ACM Sigmod Record 30(2):151–162

    MATH  Google Scholar 

  • Large J, Lines J, Bagnall A (2017) The heterogeneous ensembles of standard classification algorithms (HESCA): the whole is greater than the sum of its parts, pp 1–31. arXiv:1710.09220

  • Large J, Bagnall A, Malinowski S, Tavenard R (2018) From BOP to BOSS and beyond: time series classification with dictionary based classifiers, pp 1–22. arXiv:1809.06751

  • Le Guennec A, Malinowski S, Tavenard R (2016) Data augmentation for time series classification using convolutional neural networks. In: ECML/PKDD workshop on advanced analytics and learning on temporal data

  • Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing SAX: a novel symbolic representation of time series. Data Min Knowl Discov 15(2):107–144 ISSN 13845810

    MathSciNet  Google Scholar 

  • Lin J, Khade R, Li Y (2012) Rotation-invariant similarity in time series using bag-of-patterns representation. J Intell Inf Syst 39(2):287–315

    Google Scholar 

  • Lines J, Bagnall A (2015) Time series classification with ensembles of elastic distance measures. Data Min Knowl Discov 29(3):565–592 ISSN 13845810

    MathSciNet  MATH  Google Scholar 

  • Lines J, Taylor S, Bagnall A (2018) Time series classification with hive-cote: the hierarchical vote collective of transformation-based ensembles. ACM Trans Knowl Discov Data (TKDD) 12(5):52

    Google Scholar 

  • Lucas B, Shifaz A, Pelletier C, O’Neill L, Zaidi N, Goethals B, Petitjean F, Webb GI (2019) Proximity Forest: an effective and scalable distance-based classifier for time series. Data Min Knowl Discov 33(3):607–635

    Google Scholar 

  • Marteau P-F (2009) Time warp edit distance with stiffness adjustment for time series matching. IEEE Trans Pattern Anal Mach Intell 31(2):306–318

    Google Scholar 

  • Middlehurst M, Vickers W, Bagnall A (2019) Scalable dictionary classifiers for time series classification. arXiv:1907.11815

  • Mueen A, Keogh E, Young N (2011) Logical-shapelets: an expressive primitive for time series classification. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining—KDD ’11, p 1154

  • Nwe TL, Dat TH, Ma B (2017) Convolutional neural network with multi-task learning scheme for acoustic scene classification. In: 2017 Asia-pacific signal and information processing association annual summit and conference (APSIPA ASC), pp 1347–1350. IEEE

  • Nweke HF, Teh YW, Al-Garadi MA, Alo UR (2018) Deep learning algorithms for human activity recognition using mobile and wearable sensor networks: state of the art and research challenges. Expert Syst Appl 105:233–261

    Google Scholar 

  • Osinski S, Weiss D (2015) HPPC: High performance primitive collections for Java. https://labs.carrotsearch.com/hppc.html. Accessed 25 June 2019

  • Pelletier C, Webb GI, Petitjean F (2019) Temporal convolutional neural network for the classification of satellite image time series. Remote Sens 11(5):523

    Google Scholar 

  • Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, Liu PJ, Liu X, Marcus J, Sun M et al (2018) Scalable and accurate deep learning with electronic health records. NPJ Digit Med 1(1):18

    Google Scholar 

  • Rakthanmanon T, Keogh E (2013) Fast shapelets: a scalable algorithm for discovering time series shapelets. In: Proceedings of the 2013 SIAM international conference on data mining, pp 668–676

  • Rakthanmanon T, Campana B, Mueen A, Batista G, Westover B, Zhu Q, Zakaria J, Keogh E (2013) Addressing big data time series: mining trillions of time series subsequences under dynamic time warping. ACM Trans Knowl Discov Data (TKDD) 7(3):10

    Google Scholar 

  • Schäfer P (2015) The BOSS is concerned with time series classification in the presence of noise. Data Min Knowl Discov 29(6):1505–1530

    MathSciNet  MATH  Google Scholar 

  • Schäfer P (2016) Scalable time series classification. Data Min Knowl Discov 30(5):1273–1298 ISSN 1573756X

    MathSciNet  MATH  Google Scholar 

  • Schäfer P, Högqvist M (2012) SFA: a symbolic fourier approximation and index for similarity search in high dimensional datasets. In: Proceedings of the 15th international conference on extending database technology, pp 516–527

  • Schäfer P, Leser U (2017) Fast and accurate time series classification with WEASEL. In: Proceedings of the 2017 ACM on conference on information and knowledge management (CIKM), pp 637–646. ISBN 9781450349185

  • Senin P, Malinchik S (2013) SAX-VSM: interpretable time series classification using SAX and vector space model. In: Proceedings of IEEE international conference on data mining, ICDM, pp 1175–1180, ISSN 15504786

  • Silva DF, Giusti R, Keogh E, Batista GE (2018) Speeding up similarity search under dynamic time warping by pruning unpromising alignments. Data Min Knowl Discov 32(4):988–1016

    MathSciNet  MATH  Google Scholar 

  • Stefan A, Athitsos V, Das G (2013) The move-split-merge metric for time series. IEEE Trans Knowl Data Eng 25(6):1425–1438 ISSN 10414347

    Google Scholar 

  • Susto GA, Cenedese A, Terzi M (2018) Time-series classification methods: review and applications to power systems data. In: Big data application in power systems, pp 179–220 Elsevier

  • Tan CW, Webb GI, Petitjean F (2017) Indexing and classifying gigabytes of time series under time warping. In: Proceedings of the 2017 SIAM international conference on data mining, pp 282–290. SIAM

  • Ueda N, Nakano R (1996) Generalization error of ensemble estimators. In: IEEE international conference on neural networks, volume 1, pp 90–95. IEEE

  • Wang Z, Yan W, Oates T (2017) Time series classification from scratch with deep neural networks: A strong baseline. In: 2017 international joint conference on neural networks (IJCNN), pp 1578–1585. IEEE

  • Wang J, Liu P, She MF, Nahavandi S, Kouzani A (2013) Bag-of-words representation for biomedical time series classification. Biomed Signal Process Control 8(6):634–644

    Google Scholar 

  • Wang J, Chen Y, Hao S, Peng X, Hu L (2019) Deep learning for sensor-based activity recognition: a survey. Pattern Recognit Lett 119:3–11

    Google Scholar 

  • Yang Q, Wu X (2006) 10 challenging problems in data mining research. Int J Inf Technol Decis Mak 5(04):597–604

    Google Scholar 

  • Ye L, Keogh E (2009) Time series shapelets. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining—KDD ’09, p 947

Download references

Acknowledgements

This research was supported by the Australian Research Council under grant DE170100037. This material is based upon work supported by the Air Force Office of Scientific Research, Asian Office of Aerospace Research and Development (AOARD) under award number FA2386-17-1-4036. The authors would like to thank Prof. Eamonn Keogh and all the people who have contributed to the UCR time series classification archive. We also would like to acknowledge the use of source code freely available at http://www.timeseriesclassification.com and thank Prof. Anthony Bagnall and other contributors of the project. We also acknowledge the use of source code freely provided by the original author of BOSS algorithm, Dr. Patrick Schäfer. Finally, we acknowledge the use of two Java libraries (Osinski and Weiss 2015; Friedman and Eden 2013), which was used to optimize the implementation of our source code.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ahmed Shifaz.

Additional information

Responsible editor: Eamonn Keogh.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix

See Tables 3 and 4.

Table 3 Complexities of the methods mentioned in Sect. 2
Table 4 Accuracy of leading TSC classifiers on 85 UCR datasets

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Shifaz, A., Pelletier, C., Petitjean, F. et al. TS-CHIEF: a scalable and accurate forest algorithm for time series classification. Data Min Knowl Disc 34, 742–775 (2020). https://doi.org/10.1007/s10618-020-00679-8

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-020-00679-8

Keywords

Navigation