Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Scalable, variable-length similarity search in data series: the ULISSE approach

Published: 01 September 2018 Publication History

Abstract

Data series similarity search is an important operation and at the core of several analysis tasks and applications related to data series collections. Despite the fact that data series indexes enable fast similarity search, all existing indexes can only answer queries of a single length (fixed at index construction time), which is a severe limitation. In this work, we propose ULISSE, the first data series index structure designed for answering similarity search queries of variable length. Our contribution is two-fold. First, we introduce a novel representation technique, which effectively and succinctly summarizes multiple sequences of different length (irrespective of Z-normalization). Based on the proposed index, we describe efficient algorithms for approximate and exact similarity search, combining disk based index visits and in-memory sequential scans. We experimentally evaluate our approach using several synthetic and real datasets. The results show that ULISSE is several times (and up to orders of magnitude) more efficient in terms of both space and time cost, when compared to competing approaches.

References

[1]
K. Kashino, G. Smith, and H. Murase, "Time-series active search for quick retrieval of audio and video," in ICASSP, 1999.
[2]
U. Raza, A. Camerra, A. L. Murphy, T. Palpanas, and G. P. Picco, "Practical data prediction for real-world wireless sensor networks," IEEE Trans. Knowl. Data Eng., 2015.
[3]
D. Shasha, "Tuning time series queries in finance: Case studies and recommendations," IEEE Data Eng. Bull., 1999.
[4]
P. Huijse, P. A. Estévez, P. Protopapas, J. C. Principe, and P. Zegers, "Computational intelligence challenges and applications on large-scale astronomical time series databases," 2014.
[5]
T. Palpanas, "Data series management: The road to big sequence analytics," SIGMOD Rec., 2015.
[6]
ESA. SENTINEL-2 mission. [Online]. Available: https://sentinel.esa.int/web/sentinel/missions/sentinel-2
[7]
K. Zoumpatianos and T. Palpanas., "Data series management: Fulfilling the need for big sequence analytics." in ICDE, 2018.
[8]
V. Niennattrakul and C. A. Ratanamahatana, "On clustering multimedia time series data using k-means and dynamic time warping," ser. MUE '07, 2007.
[9]
J. Lines and A. Bagnall, "Time series classification with ensembles of elastic distance measures," Data Mining and Knowledge Discovery, 2015.
[10]
P. Senin, J. Lin, X. Wang, T. Oates, S. Gandhi, A. P. Boedihardjo, C. Chen, and S. Frankenstein, "Time series anomaly discovery with grammar-based compression," in EDBT, 2015.
[11]
Y. Wang, P. Wang, J. Pei, W. Wang, and S. Huang, "A data-adaptive and dynamic segmentation index for whole matching on time series," PVLDB 6(10):793--804, 2013.
[12]
K. Zoumpatianos, S. Idreos, and T. Palpanas, "Indexing for interactive exploration of big data series," in SIGMOD, 2014.
[13]
T. Palpanas, "Big sequence management: A glimpse of the past, the present, and the future," in SOFSEM, 2016.
[14]
T. Palpanas, "The parallel and distributed future of data series mining," in High Performance Computing & Simulation (HPCS), 2017.
[15]
C. Faloutsos, M. Ranganathan, and Y. Manolopoulos, "Fast subsequence matching in time-series databases," in SIGMOD, 1994.
[16]
D. Rafiei and A. Mendelzon, "Efficient retrieval of similar time sequences using dft," in ICDE, 1998.
[17]
E. J. Keogh, T. Palpanas, V. B. Zordan, D. Gunopulos, and M. Cardle, "Indexing large human-motion databases," in VLDB, 2004.
[18]
I. Assent, R. Krieger, F. Afschari, and T. Seidl, "The ts-tree: Efficient time series search and retrieval," in EDBT, 2008.
[19]
J. Shieh and E. J. Keogh, "isax: indexing and mining terabyte sized time series," in KDD, 2008, pp. 623--631. [Online]. Available
[20]
S. Kadiyala and N. Shiri, "A compact multi-resolution index for variable length queries in time series databases," KAIS, 2008.
[21]
A. Camerra, J. Shieh, T. Palpanas, T. Rakthanmanon, and E. J. Keogh, "Beyond one billion time series: indexing and mining very large time series collections with isax2+," KAIS, 2014.
[22]
M. Dallachiesa, T. Palpanas, and I. F. Ilyas, "Top-k nearest neighbor search in uncertain data series," PVLDB(8)1:13--24, 2014.
[23]
K. Zoumpatianos, S. Idreos, and T. Palpanas, "RINSE: interactive data series exploration with ADS+," PVLDB (8)12:1912--1915, 2015.
[24]
K. Zoumpatianos, "ADS: the adaptive data series index," VLDB J. 25(6):843--866, 2016.
[25]
D. E. Yagoubi, R. Akbarinia, F. Masseglia, and T. Palpanas, "Dpisax: Massively distributed partitioned isax," in ICDM, 2017, pp. 1135--1140.
[26]
H. Kondylakis, N. Dayan, K. Zoumpatianos, and T. Palpanas, "Coconut: A scalable bottom-up approach for building data series indexes," PVLDB (11)6:677--690, 2018.
[27]
T. Kahveci and A. Singh, "Variable length queries for time series data," in Proceedings 17th International Conference on Data Engineering, 2001.
[28]
T. Rakthanmanon, B. J. L. Campana, A. Mueen, G. E. A. P. A. Batista, M. B. Westover, Q. Zhu, J. Zakaria, and E. J. Keogh, "Searching and mining trillions of time series subsequences under dynamic time warping," in SIGKDD, 2012.
[29]
M. Linardi, Y. Zhu, T. Palpanas, and E. J. Keogh, "Matrix profile X: VALMOD - scalable discovery of variable-length motifs in data series," in SIGMOD Conference 2018.
[30]
M. Linardi, "VALMOD: A suite for easy and exact detection of variable length motifs in data series," in SIGMOD Conference 2018.
[31]
A. G. H. of Operational Intelligence Department Airbus., "Personal communication." 2017.
[32]
"Automatic detection of cyclic alternating pattern (cap) sequences in sleep: preliminary results," Clinical Neurophysiology, 1999.
[33]
E. J. Keogh and S. Kasetty, "On the need for time series data mining benchmarks: A survey and empirical demonstration," Data Min. Knowl. Discov., 2003.
[34]
A. Camerra, T. Palpanas, J. Shieh, and E. J. Keogh, "isax 2.0: Indexing and mining one billion time series," in ICDM 2010.
[35]
M. Linardi and T. Palpanas, "ULISSE: ULtra compact Index for Variable-Length Similarity SEarch in Data Series," in ICDE 2018. [Online]. Available: http://www.mi.parisdescartes.fr/~themisp/publications/icde18-ulisse.pdf
[36]
Y. Bu, T. wing Leung, A. W. chee Fu, E. Keogh, J. Pei, and S. Meshkin, "Wat: Finding top-k discords in time series database," in SDM, 2007, pp. 449--454.
[37]
E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra, "Dimensionality reduction for fast similarity search in large time series databases," KAIS, vol. 3, 2000.
[38]
A. Mueen, H. Hamooni, and T. Estrada, "Time series join on subsequence correlation," in ICDM 2014, 2014.
[39]
E. J. Keogh and C. A. Ratanamahatana, "Exact indexing of dynamic time warping," Knowl. Inf. Syst., 2005.
[40]
J. Lin, E. Keogh, L. Wei, and S. Lonardi, "Experiencing sax: a novel symbolic representation of time series," Data Mining and Knowledge Discovery, 2007.
[41]
K. Zoumpatianos, Y. Lou, T. Palpanas, and J. Gehrke, "Query workloads for data series indexes," in SIGKDD, 2015, pp. 1603--1612.
[42]
www.mi.parisdescartes.fr/~mlinardi/ULISSE.html.
[43]
W. e. a. G. C. P. L. H. F. M. J. T. S. Soldi, V. Beckmann, "Long-term variability of agn at hard x-rays," Astronomy & Astrophysics, 2014.
[44]
IRIS. Seismic Data Access 2016. [Online]. Available: http://ds.iris.edu/data/access
[45]
A. Bagnall, J. Lines, A. Bostrom, J. Large, and E. J. Keogh, "The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances," Data Min. Knowl. Discov., 2017.
[46]
X. Wang, A. Mueen, H. Ding, G. Trajcevski, P. Scheuermann, and E. J. Keogh, "Experimental comparison of representation methods and distance measures for time series data," Data Min. Knowl. Discov., 2013.
[47]
Y. Chen, E. Keogh, B. Hu, N. Begum, A. Bagnall, A. Mueen, and G. Batista, "The ucr time series classification archive," July 2015, www.cs.ucr.edu/~eamonn/time_series_data/.
  1. Scalable, variable-length similarity search in data series: the ULISSE approach

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 11, Issue 13
    September 2018
    218 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    Published: 01 September 2018
    Published in PVLDB Volume 11, Issue 13

    Qualifiers

    • Research-article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 97
      Total Downloads
    • Downloads (Last 12 months)1
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 20 Dec 2024

    Other Metrics

    Citations

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media