Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Querying and mining of time series data: experimental comparison of representations and distance measures

Published: 01 August 2008 Publication History

Abstract

The last decade has witnessed a tremendous growths of interests in applications that deal with querying and mining of time series data. Numerous representation methods for dimensionality reduction and similarity measures geared towards time series have been introduced. Each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive set of time series experiments re-implementing 8 different representation methods and 9 similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this paper, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. Our experiments have provided both a unified validation of some of the existing achievements, and in some cases, suggested that certain claims in the literature may be unduly optimistic.

References

[1]
Additional Experiment Results for Representation and Similarity Measures of Time Series. http://www.ece.northwestern.edu/~hdi117/tsim.htm.
[2]
R. T. Ng (2006), Note of Caution. http://www.cs.ubc.ca/~rng/psdepository/chebyReport2.pdf.
[3]
H. André-Jönsson and D. Z. Badal. Using signature files for querying time-series data. In PKDD, 1997.
[4]
J. Aßfalg, H.-P. Kriegel, P. Kröger, P. Kunath, A. Pryakhin, and M. Renz. Similarity search on time series based on threshold queries. In EDBT, 2006.
[5]
D. J. Berndt and J. Clifford. Using dynamic time warping to find patterns in time series. In KDD Workshop, 1994.
[6]
Y. Cai and R. T. Ng. Indexing spatio-temporal trajectories with chebyshev polynomials. In SIGMOD Conference, 2004.
[7]
M. Cardle. Automated motion editing. In Technical Report, Computer Laboratory, University of Cambridge, UK, 2004.
[8]
L. Chen and R. T. Ng. On the marriage of lp-norms and edit distance. In VLDB, 2004.
[9]
L. Chen, M. T. Özsu, and V. Oria. Robust and fast similarity search for moving object trajectories. In SIGMOD Conference, 2005.
[10]
L. Chen, M. T. Özsu, and V. Oria. Using multi-scale histograms to answer pattern existence and shape match queries. In SSDBM, 2005.
[11]
Q. Chen, L. Chen, X. Lian, Y. Liu, and J. X. Yu. Indexable PLA for Efficient Similarity Search. In VLDB, 2007.
[12]
Y. Chen, M. A. Nascimento, B. C. Ooi, and A. K. H. Tung. SpADe: On Shape-based Pattern Detection in Streaming Time Series. In ICDE, 2007.
[13]
C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast Subsequence Matching in Time-Series Databases. In SIGMOD Conference, 1994.
[14]
E. Frentzos, K. Gratsias, and Y. Theodoridis. Index-based most similar trajectory search. In ICDE, 2007.
[15]
P. Geurts. Pattern Extraction for Time Series Classification. In PKDD, 2001.
[16]
P. Geurts. Contributions to Decision Tree Induction: bias/variance tradeoff and time series classification. PhD thesis, University of Lige, Belgium, May 2002.
[17]
Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, CA, 2005.
[18]
I. Karydis, A. Nanopoulos, A. N. Papadopoulos, and Y. Manolopoulos. Evaluation of similarity searching methods for music data in P2P networks. IJBIDM, 1(2), 2005.
[19]
K. Kawagoe and T. Ueda. A Similarity Search Method of Time Series Data with Combination of Fourier and Wavelet Transforms. In TIME, 2002.
[20]
E. Keogh, X. Xi, L. Wei, and C. Ratanamahatana. The UCR Time Series dataset. In http://www.cs.ucr.edu/~eamonn/time_series_data/, 2006.
[21]
E. J. Keogh. Exact Indexing of Dynamic Time Warping. In VLDB, 2002.
[22]
E. J. Keogh. A Decade of Progress in Indexing and Mining Large Time Series Databases. In VLDB, 2006.
[23]
E. J. Keogh, K. Chakrabarti, S. Mehrotra, and M. J. Pazzani. Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases. In SIGMOD Conference, 2001.
[24]
E. J. Keogh, K. Chakrabarti, M. J. Pazzani, and S. Mehrotra. Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases. Knowl. Inf. Syst., 3(3), 2001.
[25]
E. J. Keogh and S. Kasetty. On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration. Data Min. Knowl. Discov., 7(4), 2003.
[26]
E. J. Keogh and C. A. Ratanamahatana. Exact indexing of dynamic time warping. Knowl. Inf. Syst., 7(3), 2005.
[27]
S.-W. Kim, S. Park, and W. W. Chu. An Index-Based Approach for Similarity Search Supporting Time Warping in Large Sequence Databases. In ICDE, 2001.
[28]
R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI, 1995.
[29]
F. Korn, H. V. Jagadish, and C. Faloutsos. Efficiently supporting ad hoc queries in large datasets of time sequences. In SIGMOD Conference, 1997.
[30]
J. Lin, E. J. Keogh, L. Wei, and S. Lonardi. Experiencing SAX: a novel symbolic representation of time series. Data Min. Knowl. Discov., 15(2), 2007.
[31]
M. D. Morse and J. M. Patel. An efficient and accurate method for evaluating time series similarity. In SIGMOD Conference, 2007.
[32]
Pang-Ning Tan and Michael Steinbach and Vipin Kumar. Introduction to Data Mining. Addison-Wesley, Reading, MA, 2005.
[33]
K. pong Chan and A. W.-C. Fu. Efficient Time Series Matching by Wavelets. In ICDE, 1999.
[34]
I. Popivanov and R. J. Miller. Similarity Search Over Time-Series Data Using Wavelets. In ICDE, 2002.
[35]
C. A. Ratanamahatana and E. J. Keogh. Three myths about dynamic time warping data mining. In SDM, 2005.
[36]
Richard O. Duda and Peter E. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons, 1973.
[37]
S. Salzberg. On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach. Data Min. Knowl. Discov., 1(3), 1997.
[38]
M. Steinbach, P.-N. Tan, V. Kumar, S. A. Klooster, and C. Potter. Discovery of climate indices using clustering. In KDD, 2003.
[39]
M. Vlachos, D. Gunopulos, and G. Kollios. Discovering similar multidimensional trajectories. In ICDE, 2002.
[40]
M. Vlachos, M. Hadjieleftheriou, D. Gunopulos, and E. J. Keogh. Indexing Multidimensional Time-Series. VLDB J., 15(1), 2006.
[41]
Y.-L. Wu, D. Agrawal, and A. E. Abbadi. A Comparison of DFT and DWT based Similarity Search in Time-Series Databases. In CIKM, 2000.
[42]
X. Xi, E. J. Keogh, C. R. Shelton, L. Wei, and C. A. Ratanamahatana. Fast time series classification using numerosity reduction. In ICML, 2006.
[43]
B.-K. Yi and C. Faloutsos. Fast Time Sequence Indexing for Arbitrary Lp Norms. In VLDB, 2000.
[44]
B.-K. Yi, H. V. Jagadish, and C. Faloutsos. Efficient retrieval of similar time sequences under time warping. In ICDE. IEEE Computer Society, 1998.
[45]
Y. Zhu and D. Shasha. Warping Indexes with Envelope Transforms for Query by Humming. In SIGMOD Conference, 2003.

Cited By

View all
  • (2024)DeepSketch: A Query Sketching Interface for Deep Time Series Similarity SearchProceedings of the VLDB Endowment10.14778/3685800.368587717:12(4369-4372)Online publication date: 1-Aug-2024
  • (2024)CIVET: Exploring Compact Index for Variable-Length Subsequence Matching on Time SeriesProceedings of the VLDB Endowment10.14778/3665844.366584517:9(2123-2135)Online publication date: 1-May-2024
  • (2024)DarkSim: A similarity-based time-series analytic framework for darknet trafficProceedings of the 2024 ACM on Internet Measurement Conference10.1145/3646547.3688426(241-258)Online publication date: 4-Nov-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 1, Issue 2
August 2008
461 pages

Publisher

VLDB Endowment

Publication History

Published: 01 August 2008
Published in PVLDB Volume 1, Issue 2

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)277
  • Downloads (Last 6 weeks)30
Reflects downloads up to 24 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)DeepSketch: A Query Sketching Interface for Deep Time Series Similarity SearchProceedings of the VLDB Endowment10.14778/3685800.368587717:12(4369-4372)Online publication date: 1-Aug-2024
  • (2024)CIVET: Exploring Compact Index for Variable-Length Subsequence Matching on Time SeriesProceedings of the VLDB Endowment10.14778/3665844.366584517:9(2123-2135)Online publication date: 1-May-2024
  • (2024)DarkSim: A similarity-based time-series analytic framework for darknet trafficProceedings of the 2024 ACM on Internet Measurement Conference10.1145/3646547.3688426(241-258)Online publication date: 4-Nov-2024
  • (2024)An accurate slicing method for dynamic time warping algorithm and the segment-level early abandoning optimizationKnowledge-Based Systems10.1016/j.knosys.2024.112231300:COnline publication date: 18-Nov-2024
  • (2024)Task-oriented analysis and visualization of correlation patterns in multi-sensor time seriesKnowledge-Based Systems10.1016/j.knosys.2024.111525289:COnline publication date: 8-Apr-2024
  • (2024)Time series clustering to improve one-class classifier performance▪Expert Systems with Applications: An International Journal10.1016/j.eswa.2023.122895243:COnline publication date: 25-Jun-2024
  • (2024)SE-shapeletsExpert Systems with Applications: An International Journal10.1016/j.eswa.2023.122584240:COnline publication date: 15-Apr-2024
  • (2024)Dynamic trajectory partition optimization method based on historical trajectory dataApplied Soft Computing10.1016/j.asoc.2023.111120151:COnline publication date: 17-Apr-2024
  • (2024)Somtimes: self organizing maps for time series clustering and its application to serious illness conversationsData Mining and Knowledge Discovery10.1007/s10618-023-00979-938:3(813-839)Online publication date: 1-May-2024
  • (2024)DumpyOS: A data-adaptive multi-ary index for scalable data series similarity searchThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00874-933:6(1887-1911)Online publication date: 1-Nov-2024
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media