Dual-Metric Clustering for Multivariate Time Series: KMeans with DTW and QuadTree with Entropy
Abstract
The efficacy of machine learning models is contingent on both the quality of the input data and the choice of model. In this work, we highlight the importance of data quality, particularly the identification of regions of the input space that exhibit similar behavior. Clustering, which groups similar data, is explored for its potential to enhance model performance by identifying these regions. The aim of this paper is to provide insights into the effectiveness of clustering as a means of improving machine learning model performance.