An ensemble of shapelet-based classifiers on inter-class and intra-class imbalanced multivariate time series at the early stage

  • Methodologies and Application
Soft Computing

Abstract

Classifying a time series early, from only its initial portion, reduces accuracy to some degree. When the data are also imbalanced, accurately identifying minority-class examples becomes challenging as well. So far, these two problems have been studied intensively but separately on univariate time series, and they have yet to be well studied when they occur together. Multivariate time series (MTS) are more complex than univariate ones: they contain multiple variables, and the interconnections between those variables are hidden. Handling both problems together on MTS is therefore even more challenging. In this paper, we propose an adaptive ensemble classification method, called early prediction on imbalanced MTS, that deals with early classification on inter-class and intra-class imbalanced MTS data simultaneously. First, an adaptive ensemble framework is designed to learn an early classification model on imbalanced MTS data. A multiple under-sampling approach and a dynamic subspace generation method give the base classifiers diversity while ensuring that all majority-class examples are fully utilized. Second, to deal with the implicit intra-class imbalance in the training data, a cluster-based shapelet selection method is introduced to obtain an optimal set of stable and robust shapelets. Finally, an associate-pattern mining approach is designed to learn the base classifiers efficiently, which could enhance the interpretability of classification. Experimental results show that the proposed method achieves effective early prediction on inter-class and intra-class imbalanced MTS data.
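The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea behind multiple under-sampling combined with per-classifier variable subspaces. It is not the authors' procedure. The function names `balanced_subsets` and `random_subspace` are hypothetical, and a fixed-size random subspace stands in for the paper's dynamic subspace generation, which is not described here. The sketch splits the majority class into disjoint parts of roughly minority-class size, so that each base classifier sees a balanced training set while every majority example is used exactly once across the ensemble.

```python
import math
import random


def balanced_subsets(majority, minority, rng):
    """Pair each disjoint slice of the majority class with all minority examples.

    Hypothetical helper: the majority examples are shuffled and dealt
    round-robin into ceil(|majority| / |minority|) parts, so every
    majority example lands in exactly one training subset.
    """
    if not minority:
        raise ValueError("need at least one minority example")
    shuffled = list(majority)
    rng.shuffle(shuffled)
    n_subsets = max(1, math.ceil(len(shuffled) / len(minority)))
    return [(shuffled[i::n_subsets], list(minority)) for i in range(n_subsets)]


def random_subspace(n_variables, k, rng):
    """Pick k distinct variable indices for one base classifier.

    Assumption: a plain random subspace replaces the paper's
    dynamic subspace generation, which the abstract does not specify.
    """
    return sorted(rng.sample(range(n_variables), k))


# Toy setup: 10 majority examples, 3 minority examples, 6 variables per series.
rng = random.Random(0)
majority_ids = list(range(10))
minority_ids = [100, 101, 102]

plans = []
for maj_part, min_part in balanced_subsets(majority_ids, minority_ids, rng):
    variables = random_subspace(n_variables=6, k=3, rng=rng)
    # A base classifier would be trained on these examples and variables.
    plans.append((maj_part, min_part, variables))

for maj_part, min_part, variables in plans:
    print(len(maj_part), "majority +", len(min_part), "minority on variables", variables)
```

Dealing the shuffled majority examples round-robin keeps the parts nearly equal in size, which avoids a small leftover subset that would be imbalanced in the opposite direction.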

Funding

This study was supported by the National Key Research and Development Plan of China under Grant Nos. 2017YFB0503700 and 2016YFB0501801, the National Natural Science Foundation of China under Grant No. 61170026, and the Natural Science Foundation of Hubei Province of China under Grant No. 2011CDB462.

Author information

Corresponding author

Correspondence to Xiaoying Wu.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

He, G., Zhao, W., Xia, X. et al. An ensemble of shapelet-based classifiers on inter-class and intra-class imbalanced multivariate time series at the early stage. Soft Comput 23, 6097–6114 (2019). https://doi.org/10.1007/s00500-018-3261-3

  • DOI: https://doi.org/10.1007/s00500-018-3261-3