Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1835804.1835854acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

DUST: a generalized notion of similarity between uncertain time series

Published: 25 July 2010 Publication History

Abstract

Large-scale sensor deployments and an increased use of privacy-preserving transformations have led to an increasing interest in mining uncertain time series data. Traditional distance measures such as Euclidean distance or dynamic time warping are not always effective for analyzing uncertain time series data. Recently, some measures have been proposed to account for uncertainty in time series data. However, we show in this paper that their applicability is limited. In specific, these approaches do not provide an intuitive way to compare two uncertain time series and do not easily accommodate multiple error functions.
In this paper, we provide a theoretical framework that generalizes the notion of similarity between uncertain time series. Secondly, we propose DUST, a novel distance measure that accommodates uncertainty and degenerates to the Euclidean distance when the distance is large compared to the error. We provide an extensive experimental validation of our approach for the following applications: classification, top-k motif search, and top-k nearest-neighbor queries.

Supplementary Material

JPG File (kdd2010_sarangi_dust_01.jpg)
MOV File (kdd2010_sarangi_dust_01.mov)

References

[1]
C. C. Aggarwal and P. S. Yu. A framework for clustering uncertain data streams. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, 2008.
[2]
C. C. Aggarwal and P. S. Yu. A survey of uncertain data algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 21(5):609--623, 2009.
[3]
J. Assfalg, H. Kriegel, P. Kroeger, and M. Renz. Probabilistic similarity search for uncertain time series. In Proceedings of the 21st International Conference on Scientific and Statistical Database Management, 2009.
[4]
D. J. Berndt and J. Clifford. Using dynamic time warping to find patterns in time series. In Proceedings of the 1994 AAAI Workshop, 1994.
[5]
R. Braff and C. Shively. A method of over bounding ground based augmentation system (gbas) heavy tail error distributions. Journal of Navigation, 58(1):83--103, 2005.
[6]
S. Cho. Bidirectional data aggregation scheme for wireless sensor networks. In Proceedings of the 3rd International Conference on Ubiquitous Intelligence and Computing, 2006.
[7]
P. Ciarlini and U. Maniscalco. Mixture of soft sensors for monitoring air ambient parameters. In Proceedings of the XVIII IMEKO World Congress, 2006.
[8]
H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, and E. Keogh. Querying and mining of time series data: experimental comparison of representations and distance measures. Proceedings of the VLDB Endowment, 1(2):1542--1552, 2008.
[9]
C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast subsequence matching in time-series databases. SIGMOD Record, 23(2):419--429, 1994.
[10]
S. R. Jeffery, M. Garofalakis, and M. J. Franklin. Adaptive cleaning for rfid data streams. In Proceedings of the 32nd International Conference on Very Large Databases, 2006.
[11]
E. Keogh, J. Lin, and W. Truppel. Clustering of time series subsequences is meaningless: Implications for previous and future research. Knowledge and Information Systems, 8(2), 2005.
[12]
E. Keogh, X. Xi, L. Wei, and C. A. Ratanamahatana. The ucr time series classification/clustering homepage. www.cs.ucr.edu/~eamonn/time_series_data, Accessed on Feb 5th 2010.
[13]
J. Lin, E. J. Keogh, L. Wei, and S. Lonardi. Experiencing sax: a novel symbolic representation of time series. Data Mining and Knowledge Discovery, 15(2):107--144, 2007.
[14]
A. Mueen, E. J. Keogh, Q. Zhu, S. Cash, and B. Westover. Exact discovery of time series motifs. In Proceedings of the SIAM International Conference on Data Mining, 2009.
[15]
S. V. R. Nageswara. Algorithms for fusion of multiple sensors having unknown error distributions. In Proceedings of the 15th Symposium on Energy Engineering Sciences, 1997.
[16]
Nature Publishing Group. Bone marrow transplantation. www.nature.com/bmt/journal/v31/ n8/fig_tab/1703917f2.html, Accessed on Feb 5th 2010.
[17]
S. Palit. Signal extraction from multiple noisy sensors. Signal Processing, 61(3):199--212, 1999.
[18]
A. D. Sarma, O. Benjelloun, A. Halevy, S. Nabar, and J. Widom. Representing uncertain data: Models, properties, and algorithms. The International Journal on Very Large Data Bases, 18(5), 2009.
[19]
A. Sharma, L. Golubchick, and R. Govindam. On the prevalence of sensor faults in real-world deployments. In Proceedings of the 4th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks, 2007.
[20]
J. Sudano. Dynamic real-time sensor performance evaluation. In Proceedings of the 5th International Conference on Information Fusion, 2002.
[21]
M. Yeh, K. Wu, P. S. Yu, and M. Chen. Proud: A probabilistic approach to processing similarity queries over uncertain data streams. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, 2009.

Cited By

View all
  • (2022)ProS: data series progressive k-NN similarity search and classification with probabilistic quality guaranteesThe VLDB Journal10.1007/s00778-022-00771-z32:4(763-789)Online publication date: 30-Nov-2022
  • (2021)Matrix Profile IX: Admissible Time Series Motif Discovery With Missing DataIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2019.295062333:6(2616-2626)Online publication date: 1-Jun-2021
  • (2020)Uncertain Time Series Classification with Shapelet Transform2020 International Conference on Data Mining Workshops (ICDMW)10.1109/ICDMW51313.2020.00044(259-266)Online publication date: Nov-2020
  • Show More Cited By

Index Terms

  1. DUST: a generalized notion of similarity between uncertain time series

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
    July 2010
    1240 pages
    ISBN:9781450300551
    DOI:10.1145/1835804
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 July 2010

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. data mining
    2. distance measure
    3. similarity
    4. time series
    5. uncertain data

    Qualifiers

    • Research-article

    Conference

    KDD '10
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Upcoming Conference

    KDD '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)7
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 16 Feb 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2022)ProS: data series progressive k-NN similarity search and classification with probabilistic quality guaranteesThe VLDB Journal10.1007/s00778-022-00771-z32:4(763-789)Online publication date: 30-Nov-2022
    • (2021)Matrix Profile IX: Admissible Time Series Motif Discovery With Missing DataIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2019.295062333:6(2616-2626)Online publication date: 1-Jun-2021
    • (2020)Uncertain Time Series Classification with Shapelet Transform2020 International Conference on Data Mining Workshops (ICDMW)10.1109/ICDMW51313.2020.00044(259-266)Online publication date: Nov-2020
    • (2020)Frobenius correlation based u-shapelets discovery for time series clusteringPattern Recognition10.1016/j.patcog.2020.107301103(107301)Online publication date: Jul-2020
    • (2020)An ultra-fast time series distance measure to allow data mining in more complex real-world deploymentsData Mining and Knowledge Discovery10.1007/s10618-020-00695-834:4(1104-1135)Online publication date: 30-May-2020
    • (2019)Return of the Lernaean HydraProceedings of the VLDB Endowment10.14778/3368289.336830313:3(403-420)Online publication date: 1-Nov-2019
    • (2018)The lernaean hydra of data series similarity searchProceedings of the VLDB Endowment10.14778/3282495.328249812:2(112-127)Online publication date: 1-Oct-2018
    • (2018)A parallel computational approach for similarity search using Bloom filtersComputational Intelligence10.1111/coin.1217234:2(713-733)Online publication date: 25-Mar-2018
    • (2018)TimeTubes: Automatic Extraction of Observable Blazar Features from Long-Term, Multi-Dimensional Datasets2018 IEEE Scientific Visualization Conference (SciVis)10.1109/SciVis.2018.8823802(67-71)Online publication date: Oct-2018
    • (2018)Privacy-Preserving Frequent Pattern Mining from Big Uncertain Data2018 IEEE International Conference on Big Data (Big Data)10.1109/BigData.2018.8622260(5101-5110)Online publication date: Dec-2018
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media