Abstract
Time series discretization is a technique commonly used to tackle time series classification problems. This manuscript presents an enhanced multi-objective approach for the symbolic discretization of time series called eMODiTS. The method proposed uses a different breakpoints vector, defined per each word segment, to increase the search space of the discretization schemes. eMODiTS’ search mechanism is the well-known evolutionary multi-objective algorithm NSGA-II, which finds a set of possible solutions according to entropy, complexity, and information loss estimations. Final solutions were appraised depending on the misclassification rate computed through the decision tree classifier. The trees obtained also produce graphical and significant information from the regions, relationships, or patterns in each database. Our proposal was compared against ten state-of-the-art time symbolic discretization algorithms. The results suggest that our proposal finds a suitable discretization scheme regarding classification, dimensionality, cardinality reduction, and information loss.
Similar content being viewed by others
References
Chaudhari P, Rana DP, Mehta RG, Mistry NJ, Raghuwanshi MM (2014) Discretization of temporal data: a survey. CoRR abs/1402.4283. arXiv:1402.4283
Esling P, Agon C (2012) Time-series data mining. ACM Comput Surv 45(1):12:1–12:34. https://doi.org/10.1145/2379776.2379788
Azulay R, Moskovitch R, Stopel D, Verduijn M, De Jonge E, Shahar Y (2007) Temporal discretization of medical time series: a comparative study. In: Workshop on intelligent data analysis in biomedicine and pharmacology. Amsterdam, The Netherlands
Lin J, Keogh E, Lonardi S, Chiu B (2003) A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 8th ACM SIGMOD workshop on research issues in data mining and knowledge discovery, DMKD ’03. ACM, New York, NY, USA, pp 2–11
Acosta-Mesa HG, Rechy-Ramírez F, Mezura-Montes E, Cruz-Ramírez N, Jiménez-Hernández R (2014) Application of time series discretization using evolutionary programming for classification of precancerous cervical lesions. J Biomed Inform 49:73–83
Song W, Wang Z, Zhang F, Ye Y, Fan M (2017) Empirical study of symbolic aggregate approximation for time series classification. Intell Data Anal 21(1):135–150
Keogh E, Xi X, Wei L, Ratanamahatana C (2006) The UCR time series classification/clustering homepage. www.cs.ucr.edu/~eamonn/time_series_data/
Garcia S, Luengo J, Sáez JA, Lopez V, Herrera F (2012) A survey of discretization techniques: Taxonomy and empirical analysis in supervised learning. IEEE Trans Knowl Data Eng 25(4):734–750
Ramírez-Gallego S, García S, Mouriño-Talín H, Martínez-Rego D, Bolón-Canedo V, Alonso-Betanzos A, Benítez JM, Herrera F (2016) Data discretization: taxonomy and big data challenge. Wiley Interdiscip Rev Data Min Knowl Discov 6(1):5–21
Fu TC (2011) A review on time series data mining. Eng Appl Artif Intell 24(1):164–181. https://doi.org/10.1016/j.engappai.2010.09.007
Dougherty J, Kohavi R, Sahami M (1995) Supervised and unsupervised discretization of continuous features. In: Machine learning proceedings 1995. Elsevier, pp 194–202
Mörchen F, Ultsch A (2005) Optimizing time series discretization for knowledge discovery. In: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM, pp 660–665
Chaves R, Ramírez J, Gorriz J, Initiative ADN et al (2013) Integrating discretization and association rule-based classification for alzheimer’s disease diagnosis. Expert Syst Appl 40(5):1571–1578
Agrawal R, Faloutsos C, Swami A (1993) Efficient similarity search in sequence databases. In: International conference on foundations of data organization and algorithms. Springer, pp 69–84
Chan F, Fu A, Yu C (2003) Haar wavelets for efficient similarity search of time-series: with and without time warping. IEEE Trans Knowl Data Eng 15:686–705. https://doi.org/10.1109/TKDE.2003.1198399
Yang K, Shahabi C (2005) On the stationarity of multivariate time series for correlation-based data analysis. In: Fifth IEEE international conference on data mining (ICDM’05). IEEE, p 4
Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing sax: a novel symbolic representation of time series. Data Min Knowl Discov 15(2):107–144. https://doi.org/10.1007/s10618-007-0064-z
Pham ND, Le QL, Dang TK (2010) Two novel adaptive symbolic representations for similarity search in time series databases. In: 2010 12th international asia-pacific web conference, pp 181–187. https://doi.org/10.1109/APWeb.2010.23
Bondu A, Boullé M, Cornuéjols A (2016) Symbolic representation of time series: a hierarchical coclustering formalization. Springer International Publishing, Cham, pp 3–16
Bondu A, Boullé M, Grossin B (2013) Saxo: an optimized data-driven symbolic representation of time series. In: The 2013 international joint conference on neural networks (IJCNN), pp 1–9. https://doi.org/10.1109/IJCNN.2013.6706816
Ahmed A, Bakar A, Hamdan A (2011) Harmony search algorithm for optimal word size in symbolic time series representation. In: 2011 3rd conference on data mining and optimization (DMO), pp 57–62
Ahmed AM, Bakar AA, Hamdan AR (2014) A harmony search algorithm with multi-pitch adjustment rate for symbolic time series data representation. Int J Mod Educ Comput Sci 6(6):58
Muhammad Fuad MM (2012) Genetic algorithms-based symbolic aggregate approximation. Springer, Berlin, pp 105–116. https://doi.org/10.1007/978-3-642-32584-7_9
Fuad MMM (2012) Differential evolution versus genetic algorithms: Towards symbolic aggregate approximation of non-normalized time series. In: Proceedings of the 16th international database engineering & applications symposium. ACM, pp 205–210
Acosta-Mesa H, Cruz-Ramírez N, García-López D (2008) Entropy based linear approximation algorithm for time series discretization. Adv Artif Intell Appl 32:214–224
García-López D, Acosta-Mesa HG (2009) Discretization of time series dataset with a genetic search. In: Proceedings of the 8th Mexican international conference on artificial intelligence, MICAI ’09. Springer, Berlin, pp 201–212
Rechy-Ramírez F, Acosta-Mesa HG, Mezura-Montes E, Cruz-Ramírez N (2011) Times series discretization using evolutionary programming. In: Batyrshin IZ, Sidorov G (eds) MICAI (2), vol 7095. Lecture notes in computer science. Springer, Berlin, pp 225–234
Marler R, Arora J (2004) Survey of multi-objective optimization methods for engineering. Struct Multidiscip Optim 26(6):369–395. https://doi.org/10.1007/s00158-003-0368-6
Lkhagva B, Suzuki Y, Kawagoe K (2006) Extended SAX: Extension of symbolic aggregate approximation for financial time series data representation. DEWS2006 4A-i8 7
Sant’Anna A, Wickström N (2011) Symbolization of time-series: an evaluation of sax, persist, and aca. In: 2011 4th international congress on image and signal processing, vol 4, pp 2223–2228. https://doi.org/10.1109/CISP.2011.6100559
Malinowski S, Guyet T, Quiniou R, Tavenard R (2013) 1d-SAX: a novel symbolic representation for time series. Springer, Berlin, pp 273–284. https://doi.org/10.1007/978-3-642-41398-8_24
Lkhagva B, Suzuki Y, Kawagoe K (2006) New time series data representation ESAX for financial applications. In: Proceedings of 22nd international conference on data engineering workshops, 2006. IEEE, pp x115–x115
Bai X, Xiong Y, Zhu Y, Zhu H (2013) Time series representation: a random shifting perspective. In: International conference on web-age information management. Springer, pp 37–50
dos Santos Passos H, Teodoro FGS, Duru BM, de Oliveira EL, Peres SM, Lima CAM (2017) Symbolic representations of time series applied to biometric recognition based on ecg signals. In: 2017 international joint conference on neural networks (IJCNN), pp 3199–3207. https://doi.org/10.1109/IJCNN.2017.7966255
Moskovitch R, Shahar Y (2015) Classification-driven temporal discretization of multivariate time series. Data Min Knowl Discov 29(4):871–913
Márquez-Grajales A, Acosta-Mesa HG, Mezura-Montes E (2017) An adaptive symbolic discretization scheme for the classification of temporal datasets using nsga-ii. In: 2017 IEEE international autumn meeting on power, electronics and computing (ROPEC), pp 1–8. https://doi.org/10.1109/ROPEC.2017.8261674
Sammut C, Webb GI (eds) (2017) Encyclopedia of machine learning and data mining. Springer, Berlin. https://doi.org/10.1007/978-1-4899-7687-1
Deb K, Kalyanmoy D (2001) Multi-objective optimization using evolutionary algorithms. Wiley, New York
Rangaiah G (2009) Multi-objective optimization: techniques and applications in chemical engineering. Advances in process systems engineering. World Scientific, Singapore
Edgeworth FY (1881) Mathematical psychics: an essay on the application of mathematics to the moral sciences. Kegan Paul, London
Pareto V (1896) Cours d’Economie Politique. Droz, Genève
Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: Nsga-ii. Trans Evol Comp 6(2):182–197. https://doi.org/10.1109/4235.996017
Dimitrova ES, Licona MPV, McGee J, Laubenbacher R (2010) Discretization of time series data. J Comput Biol 17(6):853–868
Fayyad UM, Irani KB (1993) Multi-interval discretization of continuous-valued attributes for classification learning. In: Bajesy (ed) Proceedings of the 13th international joint conference on artificail intelligence. Morgan Kaufmann, Chambèry, France, pp 1022–1029
Kurgan LA, Cios KJ (2004) Caim discretization algorithm. IEEE Trans Knowl Data Eng 16(2):145–153
Bagnall A, Lines J, Bostrom A, Large J, Keogh E (2016) The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min Knowl Discov 31:606–660
Tan PN, Steinbach M, Kumar V (2005) Introduction to data mining, 1st edn. Addison-Wesley Longman Publishing Co.,Inc, Boston
Dau HA, Bagnall A, Kamgar K, Yeh CCM, Zhu Y, Gharghabi S, Ratanamahatana CA, Keogh E (2018) The UCR time series archive. arXiv preprint arXiv:1810.07758
Batista GE, Wang X, Keogh EJ (2011) A complexity-invariant distance measure for time series. In: Proceedings of the 2011 SIAM international conference on data mining. SIAM, pp 699–710
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7(Jan):1–30
García S, Fernández A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf Sci 180(10):2044–2064
Vlachos M, Kollios G, Gunopulos D (2002) Discovering similar multidimensional trajectories. In: Proceedings of 18th international conference on data engineering, 2002. IEEE, pp 673–684
Ding H, Trajcevski G, Wang X, Keogh E (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. In: Proceedings of the 34 th VLDB, pp 1542–1552
Acknowledgements
The first author acknowledges the support from the Mexican National Council for Science and Technology (CONACyT) through scholarship number 389200 to pursue graduate studies at the University of Veracruz. Besides, the first author especially acknowledges to the INFOTEC Centro de Investigación e Innovación en Tecnologías de la Información y Comunicación for the borrowed computational resources and the received advice by the researchers during the research stay. The third author acknowledges support from CONACYT through project No. 220522. All the authors acknowledge to all the people who created, collected, and made available the temporal databases used in this work since without them this work would not have been successful.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Márquez-Grajales, A., Acosta-Mesa, HG., Mezura-Montes, E. et al. A multi-breakpoints approach for symbolic discretization of time series. Knowl Inf Syst 62, 2795–2834 (2020). https://doi.org/10.1007/s10115-020-01437-4
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-020-01437-4