Debunking Four Long-Standing Misconceptions of Time-Series Distance Measures

Authors:

John Paparrizos,

Aaron J. Elmore,

Michael J. Franklin

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

Pages 1887 - 1905

https://doi.org/10.1145/3318464.3389760

Published: 31 May 2020

Abstract

Distance measures are core building blocks in time-series analysis and have been the subject of active research for decades. Unfortunately, the most detailed experimental study in this area is outdated (over a decade old) and, naturally, does not reflect recent progress. Importantly, this study (i) omitted multiple distance measures, including a classic measure in the time-series literature; (ii) considered only a single time-series normalization method; and (iii) reported only raw classification error rates without statistically validating the findings, resulting in or fueling four misconceptions in the time-series literature. Motivated by the aforementioned drawbacks and our curiosity to shed some light on these misconceptions, we comprehensively evaluate 71 time-series distance measures. Specifically, our study includes (i) 8 normalization methods; (ii) 52 lock-step measures; (iii) 4 sliding measures; (iv) 7 elastic measures; (v) 4 kernel functions; and (vi) 4 embedding measures. We extensively evaluate these measures across 128 time-series datasets using rigorous statistical analysis. Our findings debunk four long-standing misconceptions that significantly alter the landscape of what is known about existing distance measures. With the new foundations in place, we discuss open challenges and promising directions.
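To make the measure families named in the abstract concrete, the following is a minimal Python sketch of one common representative of three of them: Euclidean distance (lock-step), a cross-correlation-based distance (sliding), and dynamic time warping (elastic), applied after z-normalization. The choice of representatives, the function names, and the zero-padded shifting scheme are assumptions made for illustration. They are not taken from the paper and are not the authors' implementations. The sketch also assumes equal-length, univariate series for the lock-step and sliding cases.

```python
import math


def z_normalize(x):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        return [0.0] * n  # constant series carries no shape information
    return [(v - mean) / std for v in x]


def euclidean(x, y):
    """Lock-step: the i-th point of x is compared only with the i-th point of y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def sliding_distance(x, y):
    """Sliding: 1 minus the largest normalized cross-correlation over all shifts.

    y is shifted against x with zero padding (an illustrative choice).
    """
    n = len(x)
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if norm == 0:
        return 1.0
    best = -1.0
    for shift in range(-(n - 1), n):
        # Pair x[i] with y[i - shift] wherever both indices are valid.
        cc = sum(x[i] * y[i - shift]
                 for i in range(max(0, shift), min(n, n + shift)))
        best = max(best, cc / norm)
    return 1.0 - best


def dtw(x, y):
    """Elastic: unconstrained dynamic time warping with squared point costs."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


if __name__ == "__main__":
    # Two copies of the same bump, the second delayed by two time steps.
    a = z_normalize([0, 0, 1, 2, 1, 0, 0, 0])
    b = z_normalize([0, 0, 0, 0, 1, 2, 1, 0])
    print("lock-step (Euclidean):", euclidean(a, b))
    print("sliding (cross-correlation):", sliding_distance(a, b))
    print("elastic (DTW):", dtw(a, b))
```

On the delayed-bump pair in the usage block, the lock-step measure penalizes the misalignment point by point, whereas the sliding and elastic measures can compensate for it, by shifting and by warping respectively. This is only meant to show what the categories mean; it says nothing about which measure performs best, which is the question the study addresses empirically.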

References

[1]

Amaia Abanda, Usue Mori, and Jose A Lozano. 2019. A review on distance based time series classification. Data Mining and Knowledge Discovery 33, 2 (2019), 378--412.

Digital Library

[2]

Rakesh Agrawal, Christos Faloutsos, and Arun N. Swami. 1993. Efficient Similarity Search In Sequence Databases. In FODO. 69--84.

Digital Library

[3]

Rakesh Agrawal, King-Ip Lin, Harpreet S. Sawhney, and Kyuseok Shim. 1995. Fast similarity search in the presence of noise, scaling, and translation in time-series databases. In Proceeding of the 21th International Conference on Very Large Data Bases. Citeseer, 490--501.

Digital Library

[4]

Shadab Alam, Franco D Albareti, Carlos Allende Prieto, Friedrich Anders, Scott F Anderson, Timothy Anderton, Brett H Andrews, Eric Armengaud, Éric Aubourg, Stephen Bailey, et al.2015. The eleventh and twelfth data releases of the Sloan Digital Sky Survey: final data from SDSS-III. The Astrophysical Journal Supplement Series 219, 1(2015), 12.

[5]

Jonathan Alon, Stan Sclaroff, George Kollios, and Vladimir Pavlovic. 2003. Discovering clusters in motion time-series data. In CVPR. 375--381.

[6]

Francisco Martinez Alvarez, Alicia Troncoso, Jose C Riquelme, and Jesus S Aguilar Ruiz. 2010. Energy time series forecasting based on pattern sequence similarity. IEEE Transactions on Knowledge and Data Engineering 23, 8 (2010), 1230--1243.

Digital Library

[7]

Henrik André-Jönsson and Dushan Z Badal. 1997. Using signature files for querying time-series data. In European Symposium on Principles of Data Mining and Knowledge Discovery. Springer, 211--220.

[8]

Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Peter Kunath, Alexey Pryakhin, and Matthias Renz. 2006. Similarity search on time series based on threshold queries. In International Conference on Extending Database Technology. Springer, 276--294.

[9]

Martin Bach-Andersen, Bo Rømer-Odgaard, and Ole Winther. 2017. Flexible non-linear predictive models for large-scale wind turbine diagnostics. Wind Energy 20, 5 (2017), 753--764.

[10]

Anthony Bagnall, Hoang Anh Dau, Jason Lines, Michael Flynn, James Large, Aaron Bostrom, Paul Southam, and Eamonn Keogh. 2018.The UEA multivariate time series classification archive, 2018. arXivpreprint arXiv:1811.00075(2018).

[11]

Anthony Bagnall, Jason Lines, Aaron Bostrom, James Large, and Eamonn Keogh. 2017. The great time series classification bake off: are view and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery 31, 3 (2017), 606--660.

Digital Library

[12]

Anthony J Bagnall and Gareth J Janacek. 2004. Clustering time series from ARMA models with clipped data. In KDD. 49--58.

[13]

Ziv Bar-Joseph. 2004. Analyzing time series gene expression data. Bioinformatics 20, 16 (2004), 2493--2503.

Digital Library

[14]

Ziv Bar-Joseph, Georg K Gerber, David K Gifford, Tommi S Jaakkola, and Itamar Simon. 2003. Continuous representations of time-series gene expression data.Journal of Computational Biology 10, 3--4 (2003),341--356.

[15]

Ziv Bar-Joseph, Anthony Gitter, and Itamar Simon. 2012. Studying and modelling dynamic biological processes using time-series gene expression data.Nature Reviews Genetics13, 8 (2012), 552.

[16]

Gustavo EAPA Batista, Eamonn J Keogh, Oben Moses Tataw, and Vinicius MA De Souza. 2014. CID: an efficient complexity-invariant distance for time series.Data Mining and Knowledge Discovery 28, 3(2014), 634--669.

[17]

Nurjahan Begum and Eamonn Keogh. 2014. Rare time series motif discovery from unbounded streams. Proceedings of the VLDB Endowment 8, 2 (2014), 149--160.

Digital Library

[18]

Donald J Berndt and James Clifford. 1994. Using Dynamic TimeWarping to Find Patterns in Time Series. In AAAI Workshop on KDD. 359--370.

[19]

Bharat B Biswal, Maarten Mennes, Xi-Nian Zuo, Suril Gohel, ClareKelly, Steve M Smith, Christian F Beckmann, Jonathan S Adelstein, Randy L Buckner, Stan Colcombe, et al. 2010. Toward discovery science of human brain function. Proceedings of the National Academy of Sciences 107, 10 (2010), 4734--4739.

[20]

R Bracewell. 1965. Pentagram notation for cross correlation. The Fourier transform and its applications. New York: McGraw-Hill46(1965), 243.

[21]

Markus M Breunig, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander. 2000. LOF: identifying density-based local outliers. In ACM sigmod record, Vol. 29. ACM, 93--104.

Digital Library

[22]

Peter J Brockwell and Richard A Davis. 2016.Introduction to timeseries and forecasting. springer.

[23]

Lisa Gottesfeld Brown. 1992. A survey of image registration techniques. ACM computing surveys (CSUR)24, 4 (1992), 325--376.

[24]

Yuhan Cai and Raymond Ng. 2004. Indexing spatio-temporal trajectories with Chebyshev polynomials. In SIGMOD. 599--610.

[25]

Alessandro Camerra, Themis Palpanas, Jin Shieh, and Eamonn Keogh. 2010. iSAX 2.0: Indexing and mining one billion time series. In 2010 IEEE International Conference on Data Mining. IEEE, 58--67.

Digital Library

[26]

Sung-Hyuk Cha. 2007. Comprehensive survey on distance/similarity measures between probability density functions. City1, 2 (2007), 1.

[27]

Lei Chen and Raymond Ng. 2004. On the marriage of Lp-norms and edit distance. InVLDB. 792--803.

[28]

Lei Chen, M Tamer Özsu, and Vincent Oria. 2005. Robust and fast similarity search for moving object trajectories. In SIGMOD. 491--502.

[29]

Qiuxia Chen, Lei Chen, Xiang Lian, Yunhao Liu, and Jeffrey Xu Yu.2007. Indexable PLA for efficient similarity search. In VLDB. 435--446.

[30]

Yueguo Chen, Mario A Nascimento, Beng Chin Ooi, and Anthony KHTung. 2007. Spade: On shape-based pattern detection in streaming time series. In ICDE. 786--795.

[31]

Bill Chiu, Eamonn Keogh, and Stefano Lonardi. 2003. Probabilistic discovery of time series motifs. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 493--498.

Digital Library

[32]

Kelvin Kam Wing Chu and Man Hon Wong. 1999. Fast time-series searching with scaling and shifting. In Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. Citeseer, 237--248.

[33]

Richard Cole, Dennis Shasha, and Xiaojian Zhao. 2005. Fast window correlations over uncooperative time series. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM, 743--749.

Digital Library

[34]

James W Cooley and John W Tukey. 1965. An algorithm for the machine calculation of complex Fourier series. Math. Comp.19, 90(1965), 297--301.

[35]

Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine learning 20, 3 (1995), 273--297.

Digital Library

[36]

Madalena Costa, Ary L Goldberger, and C-K Peng. 2002. Multiscale entropy analysis of complex physiologic time series.Physical review letters 89, 6 (2002), 068102.

[37]

Nello Cristianini and John Shawe-Taylor. 2000. An introduction to support vector machines and other kernel-based learning methods. Cambridge university press.

[38]

Marco Cuturi. 2011. Fast global alignment kernels. In Proceedings of the 28th international conference on machine learning (ICML-11). 929--936.

Digital Library

[39]

Michele Dallachiesa, Besmira Nushi, Katsiaryna Mirylenka, and Themis Palpanas. 2012. Uncertain time-series similarity: Return to the basics. Proceedings of the VLDB Endowment 5, 11 (2012), 1662--1673.

Digital Library

[40]

Michele Dallachiesa, Themis Palpanas, and Ihab F Ilyas. 2014. Top-k nearest neighbor search in uncertain data series.Proceedings of the VLDB Endowment 8, 1 (2014), 13--24.

[41]

Hoang Anh Dau, Eamonn Keogh, Kaveh Kamgar, Chin-Chia MichaelYeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana,Yanping, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen, Gustavo Batista, and Hexagon-ML. 2018. The UCR Time Series Classification Archive. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/.

[42]

Janez Demar. 2006. Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research 7 (2006),1--30.

Digital Library

[43]

Michel-Marie Deza and Elena Deza. 2006.Dictionary of distances. Elsevier.

[44]

Michel Marie Deza and Elena Deza. 2009. Encyclopedia of distances. In Encyclopedia of distances. Springer, 1--583.

[45]

Hui Ding, Goce Trajcevski, Peter Scheuermann, Xiaoyue Wang, and Eamonn Keogh. 2008. Querying and mining of time series data: experimental comparison of representations and distance measures. Proceedings of the VLDB Endowment 1, 2 (2008), 1542--1552.

Digital Library

[46]

Rui Ding, Qiang Wang, Yingnong Dang, Qiang Fu, Haidong Zhang,and Dongmei Zhang. 2015. Yading: Fast clustering of large-scale time series data. Proceedings of the VLDB Endowment 8, 5 (2015), 473--484.

Digital Library

[47]

Alejandro Domínguez. 2015. A history of the convolution operation [Retrospectroscope]. IEEE pulse6, 1 (2015), 38--49.

[48]

Karima Echihabi, Kostas Zoumpatianos, Themis Palpanas, and Houda Benbrahim. 2018. The lernaean hydra of data series similarity search:An experimental evaluation of the state of the art. Proceedings of the VLDB Endowment 12, 2 (2018), 112--127.

Digital Library

[49]

Jason Ernst and Ziv Bar-Joseph. 2006. STEM: a tool for the analysis of short time series gene expression data. BMC bioinformatics 7, 1(2006), 191.

[50]

Philippe Esling and Carlos Agon. 2012. Time-series data mining. ACM Computing Surveys (CSUR)45, 1 (2012), 12.

[51]

Christos Faloutsos, M. Ranganathan, and Yannis Manolopoulos. 1994. Fast Subsequence Matching in Time-series Databases. In SIGMOD. 419--429.

[52]

Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, and Dinani Amorim. 2014. Do we need hundreds of classifiers to solve real world classification problems?The journal of machine learning research15,1 (2014), 3133--3181.

[53]

Elias Frentzos, Kostas Gratsias, and Yannis Theodoridis. 2007. Index-based most similar trajectory search. In ICDE. 816--825.

[54]

Milton Friedman. 1937. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Amer. Statist. Assoc. 32 (1937), 675--701.

[55]

Daniel G Gavin, W Wyatt Oswald, Eugene R Wahl, and John W Williams. 2003. A statistical approach to evaluating distance metrics and analog assignments for pollen records.Quaternary Research 60, 3 (2003), 356--367.

[56]

Martin Gavrilov, Dragomir Anguelov, Piotr Indyk, and Rajeev Motwani. 2000. Mining the stock market: Which measure is best. In Proc. of the 6th ACM SIGKDD. 487--496.

[57]

Rafael Giusti and Gustavo EAPA Batista. 2013. An Empirical Comparison of Dissimilarity Measures for Time Series Classification. In BRACIS. 82--88.

[58]

Steve Goddard, Sherri K Harms, Stephen E Reichenbach, Tsegaye Tadesse, and William J Waltman. 2003. Geospatial decision support for drought risk management. Commun. ACM46, 1 (2003), 35--37.

[59]

Dina Q Goldin and Paris C Kanellakis. 1995. On similarity queries for time-series data: constraint specification and implementation. In International Conference on Principles and Practice of Constraint Programming. Springer, 137--153.

[60]

Tomasz Górecki and Maciej Luczak. 2013. Using derivatives in time series classification.Data Mining and Knowledge Discovery 26, 2(2013), 310--331.

[61]

Aditya Grover, Ashish Kapoor, and Eric Horvitz. 2015. A deep hybrid model for weather forecasting. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 379--386.

Digital Library

[62]

Joel Grus. 2019. Data science from scratch: first principles with python. O'Reilly Media.

[63]

Jon Hills, Jason Lines, Edgaras Baranauskas, James Mapp, and Anthony Bagnall. 2014. Classification of time series by shapelet transformation. Data Mining and Knowledge Discovery 28, 4 (2014), 851--881.

Digital Library

[64]

Ove Hoegh-Guldberg, Peter J Mumby, Anthony J Hooten, Robert S Steneck, Paul Greenfield, Edgardo Gomez, C Drew Harvell, Peter FSale, Alasdair J Edwards, Ken Caldeira, et al. 2007. Coral reefs under rapid climate change and ocean acidification. Science 318, 5857 (2007), 1737--1742.

[65]

Rie Honda, Shuai Wang, Tokio Kikuchi, and Osamu Konishi. 2002.Mining of moving objects from time-series images and its application to satellite weather imagery. Journal of Intelligent Information Systems 19, 1 (2002), 79--93.

Digital Library

[66]

Bing Hu, Yanping Chen, and Eamonn Keogh. 2013. Time Series Classification under More Realistic Assumptions. In SDM. 578--586.

[67]

Pablo Huijse, Pablo A Estevez, Pavlos Protopapas, Jose C Principe, and Pablo Zegers. 2014. Computational intelligence challenges and applications on large-scale astronomical time series databases. IEEE Computational Intelligence Magazine 9, 3 (2014), 27--39.

Digital Library

[68]

Young-Seon Jeong, Myong K Jeong, and Olufemi A Omitaomu. 2011. Weighted dynamic time warping for time series classification. Pattern Recognition 44, 9 (2011), 2231--2240.

Digital Library

[69]

Konstantinos Kalpakis, Dhiral Gada, and Vasundhara Puttagunta.2001. Distance measures for effective clustering of ARIMA time-series. In ICDM. 273--280.

[70]

Kunio Kashino, Gavin Smith, and Hiroshi Murase. 1999. Time-series active search for quick retrieval of audio and video. In 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No. 99CH36258), Vol. 6. IEEE, 2993--2996.

Digital Library

[71]

Shrikant Kashyap and Panagiotis Karras. 2011. Scalable knn search on vertically stored time series. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1334--1342.

Digital Library

[72]

Eamonn Keogh. 2006. A decade of progress in indexing and mining large time series databases. In VLDB. 1268--1268.

[73]

Eamonn Keogh, Kaushik Chakrabarti, Michael Pazzani, and Sharad Mehrotra. 2001. Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases. In SIGMOD. 151--162.

[74]

Eamonn Keogh and Jessica Lin. 2005. Clustering of time-series subsequences is meaningless: Implications for previous and future research. Knowledge and Information Systems 8, 2 (2005), 154--177.

Digital Library

[75]

Eamonn Keogh and Chotirat Ann Ratanamahatana. 2005. Exact indexing of dynamic time warping. Knowledge and Information Systems 7, 3 (2005), 358--386.

Digital Library

[76]

Chan Kin-pong and Fu Ada. 1999. Efficient Time Series Matching by Wavelets. In ICDE. 126--133.

[77]

S Knieling, J Niediek, E Kutter, J Bostroem, CE Elger, and F Mormann. 2017. An online adaptive screening procedure for selective neuronal responses. Journal of neuroscience methods291 (2017), 36--42.

[78]

Flip Korn, H. V. Jagadish, and Christos Faloutsos. 1997. Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences. In SIGMOD. 289--300.

[79]

Yann LeCun, Yoshua Bengio, et al.1995. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks 3361, 10 (1995), 1995.

[80]

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436.

[81]

Yann A LeCun, Léon Bottou, Genevieve B Orr, and Klaus-Robert Müller. 2012. Efficient backprop. InNeural networks: Tricks of the trade. Springer, 9--48.

[82]

Qi Lei, Jinfeng Yi, Roman Vaculin, Lingfei Wu, and Inderjit S Dhillon.2017. Similarity preserving representation learning for time series analysis. arXiv preprint arXiv:1702.03584(2017).

[83]

Chung-Sheng Li, Philip S. Yu, and Vittorio Castelli. 1996. Hierarchyscan: A hierarchical similarity search algorithm for databases of long sequences. In ICDE. IEEE, 546--553.

[84]

Xiang Lian, Lei Chen, Jeffrey Xu Yu, Guoren Wang, and Ge Yu. 2007. Similarity match over high speed time-series streams. InICDE. 1086--1095.

[85]

Jessica Lin, Michail Vlachos, Eamonn Keogh, and Dimitrios Gunopulos. 2004. Iterative incremental clustering of time series. In EDBT. 106--122.

[86]

Michele Linardi and Themis Palpanas. 2018. Scalable, variable-length similarity search in data series: The ULISSE approach. Proceedings of the VLDB Endowment 11, 13 (2018), 2236--2248.

Digital Library

[87]

Jason Lines and Anthony Bagnall. 2015. Time series classification with ensembles of elastic distance measures.Data Mining and Knowledge Discovery 29, 3 (2015), 565--592.

[88]

Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. 2008. Isolation forest. In2008 Eighth IEEE International Conference on Data Mining. IEEE, 413--422.

Digital Library

[89]

Helmut Lütkepohl, Markus Krätzig, and Peter CB Phillips. 2004. Applied time series econometrics. Cambridge university press.

[90]

Mohammad Saeid Mahdavinejad, Mohammadreza Rezvan, Moham-madamin Barekatain, Peyman Adibi, Payam Barnaghi, and Amit P Sheth. 2017. Machine learning for Internet of Things data analysis: Asurvey. Digital Communications and Networks(2017).

[91]

Rosario N Mantegna. 1999. Hierarchical structure in financial markets.The European Physical Journal B-Condensed Matter and Complex Systems 11, 1 (1999), 193--197.

[92]

Pierre-François Marteau. 2008. Time warp edit distance with stiffness adjustment for time series matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 31, 2 (2008), 306--318.

Digital Library

[93]

Pierre-François Marteau and Sylvie Gibet. 2014. On recursive edit distance kernels with application to time series classification. IEEE transactions on neural networks and learning systems 26, 6 (2014),1121--1133.

[94]

Francisco Martínez-Álvarez, Alicia Troncoso, Gualberto Asencio-Cortés, and José Riquelme. 2015. A survey on data mining techniques applied to electricity-related time series forecasting. Energies 8, 11(2015), 13162--13193.

[95]

Richard McCleary, Richard A Hay, Erroll E Meidinger, and David McDowall. 1980.Applied time series analysis for the social sciences. Sage Publications Beverly Hills, CA.

[96]

Vasileios Megalooikonomou, Qiang Wang, Guo Li, and Christos Faloutsos. 2005. A multiresolution symbolic representation of time series. In Data Engineering, 2005. ICDE 2005. Proceedings. 21st International Conference on. IEEE, 668--679.

Digital Library

[97]

Katsiaryna Mirylenka, Vassilis Christophides, Themis Palpanas, Ioannis Pefkianakis, and Martin May. 2016. Characterizing home device usage from wireless traffic time series.

[98]

Katsiaryna Mirylenka, Michele Dallachiesa, and Themis Palpanas. 2017. Data series similarity using correlation-aware measures. In Proceedings of the 29th International Conference on Scientific and Statistical Database Management. 1--12.

Digital Library

[99]

A Morales-Esteban, Francisco Martínez-Álvarez, A Troncoso, JL Justo,and Cristina Rubio-Escudero. 2010. Pattern recognition to forecast seismic time series.Expert Systems with Applications 37, 12 (2010),8333--8342.

[100]

Michael D Morse and Jignesh M Patel. 2007. An efficient and accurate method for evaluating time series similarity. In SIGMOD. 569--580.

[101]

Abdullah Mueen, Eamonn Keogh, and Neal Young. 2011. Logical-shapelets: An expressive primitive for time series classification. In KDD. 1154--1162.

Digital Library

[102]

Abdullah Mueen, Eamonn Keogh, Qiang Zhu, Sydney Cash, and Brandon Westover. 2009. Exact discovery of time series motifs. In Proceedings of the 2009 SIAM international conference on data mining. SIAM, 473--484.

[103]

Abdullah Mueen, Yan Zhu, Michael Yeh, Kaveh Kamgar, Krishnamurthy Viswanathan, Chetan Gupta, and Eamonn Keogh. 2017.The Fastest Similarity Search Algorithm for Time Series Subsequences under Euclidean Distance. http://www.cs.unm.edu/~mueen/FastestSimilaritySearch.html.

[104]

Peter Nemenyi. 1963. Distribution-free Multiple Comparisons. Ph.D. Dissertation. Princeton University.

[105]

Themis Palpanas. 2015. Data series management: the road to big sequence analytics. ACM SIGMOD Record 44, 2 (2015), 47--52.

Digital Library

[106]

Themis Palpanas. 2016. Big sequence management: A glimpse of the past, the present, and the future. InInternational Conference onCurrent Trends in Theory and Practice of Informatics. Springer, 63--80.

Digital Library

[107]

Panagiotis Papapetrou, Vassilis Athitsos, Michalis Potamias, GeorgeKollios, and Dimitrios Gunopulos. 2011. Embedding-based subsequence matching in time-series databases. TODS 36, 3 (2011), 17.

Digital Library

[108]

John Paparrizos. 2019. 2018 UCR Time-Series Archive: Backward Compatibility, Missing Values, and Varying Lengths. https://github.com/johnpaparrizos/UCRArchiveFixes.

[109]

John Paparrizos and Michael J Franklin. 2019. GRAIL: efficient time-series representation learning. Proceedings of the VLDB Endowment12, 11 (2019), 1762--1777.

Digital Library

[110]

John Paparrizos and Luis Gravano. 2015. k-shape: Efficient and accurate clustering of time series. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 1855--1870.

Digital Library

[111]

John Paparrizos and Luis Gravano. 2017. Fast and Accurate Time-Series Clustering. ACM Transactions on Database Systems (TODS)42, 2 (2017), 8.

[112]

Athanasios Papoulis. 1962. The Fourier integral and its applications. McGraw-Hill.

[113]

C-K Peng, Shlomo Havlin, H Eugene Stanley, and Ary L Goldberger. 1995. Quantification of scaling exponents and crossover phenomenain nonstationary heartbeat time series. Chaos: An Interdisciplinary Journal of Nonlinear Science 5, 1 (1995), 82--87.

[114]

François Petitjean, Germain Forestier, Geoffrey I Webb, Ann E Nicholson, Yanping Chen, and Eamonn Keogh. 2014. Dynamic time warping averaging of time series allows faster and more accurate classification. In 2014 IEEE international conference on data mining. IEEE, 470--479.

Digital Library

[115]

François Petitjean, Germain Forestier, Geoffrey I Webb, Ann E Nicholson, Yanping Chen, and Eamonn Keogh. 2016. Faster and more accurate classification of time series by exploiting a novel dynamic time warping averaging algorithm. Knowledge and Information Systems 47, 1 (2016), 1--26.

Digital Library

[116]

François Petitjean, Alain Ketterlin, and Pierre Gançarski. 2011. A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition 44, 3 (2011), 678--693.

Digital Library

[117]

Davood Rafiei and Alberto Mendelzon. 1997. Similarity-based queries for time series data. In ACM SIGMOD Record, Vol. 26. ACM, 13--25.

Digital Library

[118]

Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen, Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, and Eamonn Keogh. 2012. Searching and mining trillions of time series subsequences under dynamic time warping. InKDD. 262--270.

[119]

Chotirat Ann Ralanamahatana, Jessica Lin, Dimitrios Gunopulos, Eamonn Keogh, Michail Vlachos, and Gautam Das. 2005. Mining time series data. InData mining and knowledge discovery handbook. Springer, 1069--1103.

[120]

Chotirat Ann Ratanamahatana and Eamonn Keogh. 2004. Making time-series classification more accurate using learned constraints. In SDM. 11--22.

[121]

Usman Raza, Alessandro Camerra, Amy L Murphy, Themis Palpanas, and Gian Pietro Picco. 2015. Practical data prediction for real-world wireless sensor networks.IEEE Transactions on Knowledge and DataEngineering 27, 8 (2015), 2231--2244.

Digital Library

[122]

John Rice. 2006.Mathematical statistics and data analysis. Cengage Learning.

[123]

Joshua S Richman and J Randall Moorman. 2000. Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology 278, 6(2000), H2039--H2049.

[124]

Kexin Rong, Clara E Yoon, Karianne J Bergen, Hashem Elezabi, Peter Bailis, Philip Levis, and Gregory C Beroza. 2018. Locality-sensitive hashing for earthquake detection: A case study of scaling data-driven science. Proceedings of the VLDB Endowment11, 11 (2018), 1674--1687.

Digital Library

[125]

Eduardo J Ruiz, Vagelis Hristidis, Carlos Castillo, Aristides Gionis, and Alejandro Jaimes. 2012. Correlating financial time series with micro-blogging activity. InProceedings of the fifth ACM international conference on Web search and data mining. ACM, 513--522.

[126]

Hiroaki Sakoe and Seibi Chiba. 1971. A dynamic programming approach to continuous speech recognition. In ICA. 65--69.

[127]

Hiroaki Sakoe and Seibi Chiba. 1978. Dynamic programming algorithm optimization for spoken word recognition. IEEE transactions on acoustics, speech, and signal processing 26, 1 (1978), 43--49.

[128]

Yasushi Sakurai, Spiros Papadimitriou, and Christos Faloutsos. 2005.Braid: Stream mining through group lag correlations. In SIGMOD. ACM, 599--610.

[129]

Patrick Schäfer and Mikael Högqvist. 2012. SFA: a symbolic fourier approximation and index for similarity search in high dimensional datasets. InProceedings of the 15th International Conference on Extend-ing Database Technology. ACM, 516--527.

Digital Library

[130]

Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. 1997. Kernel principal component analysis. InInternational Conference on Artificial Neural Networks. Springer, 583--588.

[131]

Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. 1998. Nonlinear component analysis as a kernel eigenvalue problem. Neural computation 10, 5 (1998), 1299--1319.

Digital Library

[132]

Bernhard Schölkopf and Alexander J Smola. 2002. Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT press.

[133]

Pavel Senin, Jessica Lin, Xing Wang, Tim Oates, Sunil Gandhi,Arnold P Boedihardjo, Crystal Chen, and Susan Frankenstein. 2015. Time series anomaly discovery with grammar-based compression. In Edbt. 481--492.

[134]

Dennis Shasha. 1999. Tuning time series queries in finance: Case studies and recommendations. IEEE Data Eng. Bull. 22, 2 (1999),40--46.

[135]

Jin Shieh and Eamonn Keogh. 2008. i SAX: indexing and mining terabyte sized time series. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining.ACM, 623--631.

Digital Library

[136]

Yutao Shou, Nikos Mamoulis, and David Cheung. 2005. Fast and exact warping of time series using adaptive segmental approximations.Machine Learning 58, 2--3 (2005), 231--267.

[137]

Alexandra Stefan, Vassilis Athitsos, and Gautam Das. 2013. The move-split-merge metric for time series. TKDE 25, 6 (2013), 1425--1438.

Digital Library

[138]

Ruey S Tsay. 2014. Financial Time Series. Wiley StatsRef: Statistics Reference Online(2014), 1--23.

[139]

Kuniaki Uehara and Mitsuomi Shimada. 2002. Extraction of primitive motion and discovery of association rules from human motion data. In Progress in Discovery Science. Springer, 338--348.

[140]

Michail Vlachos, Marios Hadjieleftheriou, Dimitrios Gunopulos, and Eamonn Keogh. 2006. Indexing multidimensional time-series. The VLDB Journal 15, 1 (2006), 1--20.

Digital Library

[141]

Michail Vlachos, George Kollios, and Dimitrios Gunopulos. 2002. Discovering similar multidimensional trajectories. In Proceedings 18th international conference on data engineering. IEEE, 673--684.

[142]

Gabriel Wachman, Roni Khardon, Pavlos Protopapas, and Charles R Alcock. 2009. Kernels for periodic time series arising in astronomy. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 489--505.

[143]

Hao Wang, Yilun Cai, Yin Yang, Shiming Zhang, and Nikos Mamoulis. 2014. Durable Queries over Historical Time Series. TKDE 26, 3 (2014),595--607.

Digital Library

[144]

Xiaoyue Wang, Abdullah Mueen, Hui Ding, Goce Trajcevski, Peter Scheuermann, and Eamonn Keogh. 2013. Experimental comparison of representation methods and distance measures for time series data. Data Mining and Knowledge Discovery(2013), 1--35.

[145]

Xiaozhe Wang, Kate Smith, and Rob Hyndman. 2006. Characteristic-based clustering for time series data. Data mining and knowledge Discovery 13, 3 (2006), 335--364.

[146]

Yang Wang, Peng Wang, Jian Pei, Wei Wang, and Sheng Huang. 2013. A data-adaptive and dynamic segmentation index for whole matching on time series. Proceedings of the VLDB Endowment 6, 10 (2013), 793--804.

Digital Library

[147]

T Warren Liao. 2005. Clustering of time series data - a survey. Pattern Recognition 38, 11 (2005), 1857--1874.

Digital Library

[148]

Peter J Webster, Greg J Holland, Judith A Curry, and H-R Chang.2005. Changes in tropical cyclone number, duration, and intensity in a warming environment. Science 309, 5742 (2005), 1844--1846.

[149]

Frank Wilcoxon. 1945. Individual comparisons by ranking methods. Biometrics Bulletin(1945), 80--83.

[150]

Billy M Williams and Lester A Hoel. 2003. Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results. Journal of transportation engineering 129, 6(2003), 664--672.

[151]

Lingfei Wu, Ian En-Hsu Yen, Jinfeng Yi, Fangli Xu, Qi Lei, and Michael Witbrock. 2018. Random Warping Series: A Random Features Method for Time-Series Embedding. In AISTATS. 793--802.

[152]

Xiaopeng Xi, Eamonn Keogh, Christian Shelton, Li Wei, and Chotirat Ann Ratanamahatana. 2006. Fast time series classification using numerosity reduction. In Proceedings of the 23rd international conference on Machine learning. ACM, 1033--1040.

Digital Library

[153]

Yimin Xiong and Dit-Yan Yeung. 2002. Mixtures of ARMA models for model-based time series clustering. In ICDM. 717--720.

[154]

Jaewon Yang and Jure Leskovec. 2011. Patterns of temporal variation in online media. In WSDM. 177--186.

[155]

Dragomir Yankov, Eamonn Keogh, and Umaa Rebbapragada. 2008. Disk aware discord discovery: Finding unusual time series in terabyte sized datasets. Knowledge and Information Systems 17, 2 (2008), 241--262.

Digital Library

[156]

Lexiang Ye and Eamonn Keogh. 2009. Time series shapelets: a new primitive for data mining. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 947--956.

Digital Library

[157]

Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Diego Furtado Silva, Abdullah Mueen, and Eamonn Keogh. 2016. Matrix profile I: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th international conference on data mining(ICDM). IEEE, 1317--1322.

[158]

Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Zachary Zimmerman, Diego Furtado Silva, Abdullah Mueen, and Eamonn Keogh. 2018. Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile. Data Mining and Knowledge Discovery 32, 1 (2018), 83--123.

[159]

Mi-Yen Yeh, Kun-Lung Wu, Philip S Yu, and Ming-Syan Chen. 2009. PROUD: a probabilistic approach to processing similarity queries over uncertain data streams. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology. ACM, 684--695.

[160]

Byoung-Kee Yi and Christos Faloutsos. 2000. Fast time sequence indexing for arbitrary Lp norms. VLDB.

[161]

Jesin Zakaria, Abdullah Mueen, and Eamonn Keogh. 2012. Clustering Time Series Using Unsupervised-Shapelets. In ICDM. 785--794.

[162]

Pavel Zezula, Giuseppe Amato, Vlastislav Dohnal, and Michal Batko. 2006. Similarity search: the metric space approach. Vol. 32. Springer Science & Business Media.

[163]

Guoqing Zheng, Yiming Yang, and Jaime Carbonell. 2016. Efficient shift-invariant dictionary learning. In SIGKDD. ACM, 2095--2104.

[164]

Kostas Zoumpatianos, Stratos Idreos, and Themis Palpanas. 2016. ADS: the adaptive data series index. The VLDB Journal-The International Journal on Very Large Data Bases 25, 6 (2016), 843--866.

Published In

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data

June 2020

2925 pages

ISBN: 9781450367356

DOI: 10.1145/3318464

General Chairs: David Maier (Portland State University, USA), Rachel Pottinger (University of British Columbia, Canada)

Program Chairs: AnHai Doan (University of Wisconsin, USA), Wang-Chiew Tan (Megagon Labs, USA)

Publications Chairs: Abdussalam Alawini (University of Illinois at Urbana-Champaign, USA), Hung Q. Ngo (RelationalAI, USA)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 May 2020

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

SIGMOD/PODS '20: International Conference on Management of Data

Sponsor: SIGMOD

June 14 - 19, 2020

Portland, OR, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

Total Citations: 30
Total Downloads: 1,039
Downloads (Last 12 months): 134
Downloads (Last 6 weeks): 11

Reflects downloads up to 24 Sep 2024

Citations

Cited By

d’Hondt J, Papapetrou O, Paparrizos J (2024). Beyond the Dimensions: A Structured Evaluation of Multivariate Time Series Distance Measures. 2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW), 107-112. Online publication date: 13-May-2024.
https://doi.org/10.1109/ICDEW61823.2024.00020
Boniol P, Paparrizos J, Palpanas T (2024). An Interactive Dive into Time-Series Anomaly Detection. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 5382-5386. Online publication date: 13-May-2024.
https://doi.org/10.1109/ICDE60146.2024.00409
Zhang H, Li J, Feng J, Yao Q, Dong Y (2024). Accelerating time series similarity search under Move-Split-Merge distance via dissimilarity space embedding. Expert Systems with Applications 255, 124889. Online publication date: Dec-2024.
https://doi.org/10.1016/j.eswa.2024.124889
Giannoulidis A, Gounaris A, Naskos A, Nikolaidis N, Caljouw D (2024). Engineering and evaluating an unsupervised predictive maintenance solution: a cold-forming press case-study. Journal of Intelligent Manufacturing. Online publication date: 28-Mar-2024.
https://doi.org/10.1007/s10845-024-02352-z
Ting K, Liu Z, Gong L, Zhang H, Zhu Y (2024). A new distributional treatment for time series anomaly detection. The VLDB Journal 33:3, 753-780. Online publication date: 15-Feb-2024.
https://doi.org/10.1007/s00778-023-00832-x
De Felice G, Goulermas J, Gusev V, Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S (2023). Time series kernels based on nonlinear vector AutoRegressive delay embeddings. Proceedings of the 37th International Conference on Neural Information Processing Systems, 37230-37251. Online publication date: 10-Dec-2023.
https://dl.acm.org/doi/10.5555/3666122.3667740
Paparrizos J, Reddy S (2023). Odyssey: An Engine Enabling the Time-Series Clustering Journey. Proceedings of the VLDB Endowment 16:12, 4066-4069. Online publication date: 1-Aug-2023.
https://dl.acm.org/doi/10.14778/3611540.3611622
Sylligardos E, Boniol P, Paparrizos J, Trahanias P, Palpanas T (2023). Choose Wisely: An Extensive Evaluation of Model Selection for Anomaly Detection in Time Series. Proceedings of the VLDB Endowment 16:11, 3418-3432. Online publication date: 24-Aug-2023.
https://dl.acm.org/doi/10.14778/3611479.3611536
Paparrizos J, Wu K, Elmore A, Faloutsos C, Franklin M (2023). Accelerating Similarity Search for Elastic Measures: A Study and New Generalization of Lower Bounding Distances. Proceedings of the VLDB Endowment 16:8, 2019-2032. Online publication date: 22-Jun-2023.
https://dl.acm.org/doi/10.14778/3594512.3594530
Barbarioli B, Mersy G, Sintos S, Krishnan S (2023). Hierarchical Residual Encoding for Multiresolution Time Series Compression. Proceedings of the ACM on Management of Data 1:1, 1-26. Online publication date: 30-May-2023.
https://dl.acm.org/doi/10.1145/3588953
