Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2588555.2610498acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Indexing for interactive exploration of big data series

Published: 18 June 2014 Publication History

Abstract

Numerous applications continuously produce big amounts of data series, and in several time critical scenarios analysts need to be able to query these data as soon as they become available, which is not currently possible with the state-of-the-art indexing methods and for very large data series collections. In this paper, we present the first adaptive indexing mechanism, specifically tailored to solve the problem of indexing and querying very large data series collections. The main idea is that instead of building the complete index over the complete data set up-front and querying only later, we interactively and adaptively build parts of the index, only for the parts of the data on which the users pose queries. The net effect is that instead of waiting for extended periods of time for the index creation, users can immediately start exploring the data series. We present a detailed design and evaluation of adaptive data series indexing over both synthetic data and real-world workloads. The results show that our approach can gracefully handle large data series collections, while drastically reducing the data to query delay: by the time state-of-the-art indexing techniques finish indexing 1 billion data series (and before answering even a single query), adaptive data series indexing has already answered $3*10^5$ queries.

References

[1]
R. Agrawal, C. Faloutsos, and A. N. Swami. Efficient similarity search in sequence databases. In FODO, pages 69--84, 1993.
[2]
I. Assent, R. Krieger, F. Afschari, and T. Seidl. The TS-tree: efficient time series search and retrieval. In EDBT, pages 252--263, 2008.
[3]
J. L. Bentley. Multidimensional Binary Search Trees Used for Associative Searching. Commun. ACM, 18(9):509--517, Sept. 1975.
[4]
S. Berchtold, D. A. Keim, and H.-P. Kriegel. The X-tree: An index structure for high-dimensional data. In VLDB, pages 28--39, 1996.
[5]
Y. Bu, T. wing Leung, A. W. chee Fu, E. Keogh, J. Pei, and S. Meshkin. Wat: Finding top-k discords in time series database. In SDM, pages 449--454, 2007.
[6]
A. Camerra, T. Palpanas, J. Shieh, and E. Keogh. iSAX 2.0: Indexing and mining one billion time series. In ICDM, pages 58--67, 2010.
[7]
A. Camerra, J. Shieh, T. Palpanas, T. Rakthanmanon, and E. Keogh. Beyond One Billion Time Series: Indexing and Mining Very Large Time Series Collections with iSAX2+. KAIS, 39(1):123--151, 2014.
[8]
K.-P. Chan and A.-C. Fu. Efficient time series matching by wavelets. In ICDE, pages 126--133, 1999.
[9]
V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: a survey. ACM Computing Surveys, 41(3):1--58, 2009.
[10]
C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast subsequence matching in time-series databases. In SIGMOD, pages 419--429, 1994.
[11]
G. Graefe, F. Halim, S. Idreos, H. A. Kuno, and S. Manegold. Concurrency control for adaptive indexing. PVLDB, 5(7):656--667, 2012.
[12]
G. Graefe, F. Halim, S. Idreos, H. A. Kuno, S. Manegold, and B. Seeger. Transactional support for adaptive indexing. VLDB J., 23(2):303--328, 2014.
[13]
A. Guttman. R-Trees A Dynamic Structure for Spatial Searching. In SIGMOD, pages 47--57, 1984.
[14]
F. Halim, S. Idreos, P. Karras, and R. H. C. Yap. Stochastic database cracking: Towards robust adaptive indexing in main-memory column-stores. PVLDB, 5(6):502--513, 2012.
[15]
S. Idreos, I. Alagiannis, R. Johnson, and A. Ailamaki. Here are my Data Files. Here are my Queries. Where are my Results? In CIDR, pages 57--68, 2011.
[16]
S. Idreos, M. L. Kersten, and S. Manegold. Updating a Cracked Database. In SIGMOD, pages 413--424, 2007.
[17]
S. Idreos, M. L. Kersten, and S. Manegold. Database Cracking. In CIDR, pages 68--78, 2007.
[18]
S. Idreos, M. L. Kersten, and S. Manegold. Self-organizing Tuple Reconstruction in Column-stores. In SIGMOD, pages 297--308, 2009.
[19]
S. Idreos and E. Liarou. dbtouch: Analytics at your fingertips. In CIDR, 2013.
[20]
S. Idreos, S. Manegold, H. A. Kuno, and G. Graefe. Merging what's cracked, cracking what's merged: Adaptive indexing in main-memory column-stores. PVLDB, 4(9):585--597, 2011.
[21]
H. Jegou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. TPAMI, 33(1):117--128, 2011.
[22]
E. Keogh, K. Chakrabarti, and M. Pazzani. Dimensionality reduction for fast similarity search in large time series databases. KAIS, 3(3):263--286, 2000.
[23]
J. Lin, E. Keogh, and S. Lonardi. A symbolic representation of time series, with implications for streaming algorithms. In DMKD, pages 2--11, 2003.
[24]
T. Palpanas, M. Vlachos, E. J. Keogh, and D. Gunopulos. Streaming time series summarization using user-defined amnesic functions. TKDE, 20(7):992--1006, 2008.
[25]
D. Rafiei and A. Mendelzon. Similarity-based queries for time series data. In SIGMOD, pages 13--25, 1997.
[26]
T. Rakthanmanon, B. Campana, A. Mueen, G. Batista, B. Westover, Q. Zhu, J. Zakaria, and E. Keogh. Searching and mining trillions of time series subsequences under dynamic time warping. In SIGKDD, pages 262--270, 2012.
[27]
S. Richter, J.-A. Quiane-Ruiz, S. Schuh, and J. Dittrich. Towards zero-overhead static and adaptive indexing in hadoop. VLDBJ, 2013.
[28]
F. M. Schuhknecht, A. Jindal, and J. Dittrich. The Uncracked Pieces in Database Cracking. PVLDB, 7(2):97--108, 2013.
[29]
J. Shieh and E. Keogh. iSAX: Indexing and Mining Terabyte Sized Time Series. In SIGKDD, pages 623--631, 2008.
[30]
J. Shieh and E. Keogh. iSAX: disk-aware mining and indexing of massive time series datasets. DMKD, 19(1):24--57, 2009.
[31]
M. Stonebraker. The case for partial indexes. SIGMOD Rec., 18(4):4--11, 1989.
[32]
Y. Wang, P. Wang, J. Pei, W. Wang, and S. Huang. A data-adaptive and dynamic segmentation index for whole matching on time series. PVLDB, 6(10):793--804, 2013.
[33]
T. Warren Liao. Clustering of time series data - a survey. Pattern Recognition, 38(11):1857--1874, 2005.
[34]
B. Yi and C. Faloutsos. Fast time sequence indexing for arbitrary lp norms. In VLDB, pages 385--394, 2000.
[35]
J. Zhou and K. A. Ross. Buffering accesses to memory-resident index structures. In VLDB, pages 405--416, 2003.
[36]
J. Zhou and K. A. Ross. Buffering database operations for enhanced instruction cache performance. In SIGMOD, pages 191--202, 2004.

Cited By

View all
  • (2024)A Survey of Techniques for Discovering, Using, and Paying for Third-Party IoT SensorsSensors10.3390/s2408253924:8(2539)Online publication date: 15-Apr-2024
  • (2024)DIDS: Double Indices and Double Summarizations for Fast Similarity SearchProceedings of the VLDB Endowment10.14778/3665844.366585117:9(2198-2211)Online publication date: 1-May-2024
  • (2024)Optimizing Dataflow Systems for Scalable Interactive VisualizationProceedings of the ACM on Management of Data10.1145/36392762:1(1-25)Online publication date: 26-Mar-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
June 2014
1645 pages
ISBN:9781450323765
DOI:10.1145/2588555
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2014

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. adaptive data-series index
  2. adaptive indexing
  3. data-series
  4. nearest neighbor
  5. similarity search

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'14
Sponsor:

Acceptance Rates

SIGMOD '14 Paper Acceptance Rate 107 of 421 submissions, 25%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)27
  • Downloads (Last 6 weeks)2
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)A Survey of Techniques for Discovering, Using, and Paying for Third-Party IoT SensorsSensors10.3390/s2408253924:8(2539)Online publication date: 15-Apr-2024
  • (2024)DIDS: Double Indices and Double Summarizations for Fast Similarity SearchProceedings of the VLDB Endowment10.14778/3665844.366585117:9(2198-2211)Online publication date: 1-May-2024
  • (2024)Optimizing Dataflow Systems for Scalable Interactive VisualizationProceedings of the ACM on Management of Data10.1145/36392762:1(1-25)Online publication date: 26-Mar-2024
  • (2024)DumpyOS: A data-adaptive multi-ary index for scalable data series similarity searchThe VLDB Journal10.1007/s00778-024-00874-933:6(1887-1911)Online publication date: 21-Aug-2024
  • (2023)Querying Similar Multi-Dimensional Time Series with a Spatial DatabaseISPRS International Journal of Geo-Information10.3390/ijgi1204017912:4(179)Online publication date: 21-Apr-2023
  • (2023)Odyssey: A Journey in the Land of Distributed Data Series Similarity SearchProceedings of the VLDB Endowment10.14778/3579075.357908716:5(1140-1153)Online publication date: 6-Mar-2023
  • (2023)Dumpy: A Compact and Adaptive Index for Large Data Series CollectionsProceedings of the ACM on Management of Data10.1145/35889651:1(1-27)Online publication date: 30-May-2023
  • (2023)Indexing for Near-Sorted Data2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00117(1475-1488)Online publication date: Apr-2023
  • (2022)G-tranProceedings of the VLDB Endowment10.14778/3551793.355181315:11(2545-2558)Online publication date: 29-Sep-2022
  • (2022)Efficient Range and kNN Twin Subsequence Search in Time SeriesIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.3167257(1-1)Online publication date: 2022
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media