Abstract
Classifying a time series early, before it has been fully observed, generally costs some accuracy. If the time series data are also imbalanced, accurately identifying minority-class examples becomes challenging as well. Up to now, these two problems have been studied intensively but separately on univariate time series data, and they have yet to be well studied when they occur together. Compared with univariate time series, multivariate time series (MTS) are more complex: they contain multiple variables, and the interconnections between those variables are hidden. Handling both problems together on MTS is therefore even more challenging. In this paper, we propose an adaptive ensemble classification method, called early prediction on imbalanced MTS, that deals with early classification on inter-class and intra-class imbalanced MTS data simultaneously. First, an adaptive ensemble framework is designed to learn an early classification model on imbalanced MTS data. Based on a multiple under-sampling approach and a dynamical subspace generation method, the framework keeps the base classifiers diverse while making full use of all majority-class examples. Second, to deal with the implicit issue of intra-class imbalance in the training data, a cluster-based shapelet selection method is introduced to obtain an optimal set of stable and robust shapelets. Finally, an associate-pattern mining approach is designed to learn the base classifiers efficiently, which can also enhance the interpretability of the classification. Experimental results show that the proposed method achieves effective early prediction on inter-class and intra-class imbalanced MTS data.
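The first step, multiple under-sampling combined with subspace generation, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the under-sampling or the dynamical subspace generation is carried out, so the sketch assumes a disjoint partition of the majority class and substitutes a plain random choice of variables. All function names and parameters (`balanced_subsets`, `random_subspace`, `build_ensemble_plan`, `subspace_size`) are hypothetical.

```python
import random


def balanced_subsets(majority, minority, seed=0):
    """Split the majority class into disjoint chunks of roughly minority size.

    Assumption: every majority example lands in exactly one chunk, so no
    majority example is discarded across the ensemble as a whole.
    """
    if not minority:
        raise ValueError("minority class must not be empty")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    n_subsets = max(1, round(len(shuffled) / len(minority)))
    # Striding by n_subsets yields disjoint chunks that cover the whole list.
    chunks = [shuffled[i::n_subsets] for i in range(n_subsets)]
    return [(chunk, list(minority)) for chunk in chunks]


def random_subspace(n_variables, size, rng):
    """Pick a random subset of variable indices (a stand-in, assumed here,
    for the paper's dynamical subspace generation)."""
    return sorted(rng.sample(range(n_variables), size))


def build_ensemble_plan(majority, minority, n_variables, subspace_size, seed=0):
    """Describe the training data and variables of each base classifier."""
    rng = random.Random(seed)
    plan = []
    for maj_chunk, min_all in balanced_subsets(majority, minority, seed):
        plan.append({
            "majority": maj_chunk,   # under-sampled majority examples
            "minority": min_all,     # all minority examples, reused each time
            "variables": random_subspace(n_variables, subspace_size, rng),
        })
    return plan


# Example: 10 majority ids, 3 minority ids, 5 variables, 2 per subspace.
plan = build_ensemble_plan(list(range(10)), [100, 101, 102], 5, 2)
```

Each entry of `plan` would feed one base classifier. Diversity comes from the different majority chunks and variable subsets, while the union of the chunks still covers the whole majority class.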
Funding
This study was supported by the National Key Research and Development Plan of China under Grant Nos. 2017YFB0503700 and 2016YFB0501801, the National Natural Science Foundation of China under Grant No. 61170026, and the Natural Science Foundation of Hubei Province of China under Grant No. 2011CDB462.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
He, G., Zhao, W., Xia, X. et al. An ensemble of shapelet-based classifiers on inter-class and intra-class imbalanced multivariate time series at the early stage. Soft Comput 23, 6097–6114 (2019). https://doi.org/10.1007/s00500-018-3261-3
DOI: https://doi.org/10.1007/s00500-018-3261-3