Research article, DOI: 10.1145/2806777.2806839

Using data transformations for low-latency time series analysis

Published: 27 August 2015

Abstract

Time series analysis is commonly used when monitoring data centers, networks, weather, and even human patients. In most cases, the raw time series data is massive, from millions to billions of data points, and yet interactive analyses require low (e.g., sub-second) latency. Aperture transforms raw time series data, during ingest, into compact summarized representations that it can use to efficiently answer queries at runtime. Aperture handles a range of complex queries, from correlating hundreds of lengthy time series to predicting anomalies in the data. Aperture achieves much of its high performance by executing queries on data summaries, while providing a bound on the information lost when transforming data. By doing so, Aperture can reduce query latency as well as the data that needs to be stored and analyzed to answer a query. Our experiments on real data show that Aperture can provide one to four orders of magnitude lower query response time, while incurring only 10% ingest time overhead and less than 20% error in accuracy.
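The ingest-time transformation described in the abstract can be illustrated with a minimal sketch. This is not Aperture's implementation or API: the `WindowSummary` layout, the fixed window size, and every function name below are assumptions made for illustration. The sketch collapses each window of raw points into a small fixed-size summary at ingest, then answers mean, variance, and range queries from the summaries alone. These particular aggregates happen to be recoverable exactly; queries that depend on the ordering of points inside a window would only be approximated by such a summary.

```python
import math
from dataclasses import dataclass


@dataclass
class WindowSummary:
    """Fixed-size summary of one window of raw points (hypothetical layout)."""
    count: int
    total: float
    total_sq: float
    minimum: float
    maximum: float


def summarize(points, window):
    """Ingest-time step: collapse each run of `window` raw points into one summary."""
    summaries = []
    for start in range(0, len(points), window):
        chunk = points[start:start + window]
        summaries.append(WindowSummary(
            count=len(chunk),
            total=sum(chunk),
            total_sq=sum(x * x for x in chunk),
            minimum=min(chunk),
            maximum=max(chunk),
        ))
    return summaries


def query_mean(summaries):
    """Query-time step: mean computed from summaries only."""
    n = sum(s.count for s in summaries)
    return sum(s.total for s in summaries) / n


def query_variance(summaries):
    """Population variance from summaries only, via E[x^2] - E[x]^2."""
    n = sum(s.count for s in summaries)
    mean = query_mean(summaries)
    return sum(s.total_sq for s in summaries) / n - mean * mean


def query_range(summaries):
    """Global (min, max) from summaries only."""
    return (min(s.minimum for s in summaries),
            max(s.maximum for s in summaries))


# Synthetic series: 10,000 raw points reduced to 100 summaries at ingest.
raw = [math.sin(i / 50.0) for i in range(10_000)]
summaries = summarize(raw, window=100)
```

In this sketch a query reads 100 summary records instead of 10,000 raw points, which is the kind of reduction in data touched per query that the abstract attributes to executing queries on summaries.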



    Published In

    SoCC '15: Proceedings of the Sixth ACM Symposium on Cloud Computing
    August 2015
    446 pages
    ISBN:9781450336512
    DOI:10.1145/2806777
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article

    Funding Sources

    • PDL Consortium
    • Intel Science and Technology Center for Cloud Computing

    Conference

    SoCC '15
    SoCC '15: ACM Symposium on Cloud Computing
    August 27 - 29, 2015
    Hawaii, Kohala Coast

    Acceptance Rates

    SoCC '15 Paper Acceptance Rate 34 of 157 submissions, 22%;
    Overall Acceptance Rate 169 of 722 submissions, 23%

    Cited By

    • (2024) Trinity: A Fast Compressed Multi-attribute Data Store. Proceedings of the Nineteenth European Conference on Computer Systems, pages 405-420. DOI: 10.1145/3627703.3650072. Online publication date: 22-Apr-2024
    • (2021) TSCache. Proceedings of the VLDB Endowment, 14(13):3253-3266. DOI: 10.14778/3484224.3484225. Online publication date: 28-Oct-2021
    • (2019) Fast and accurate stream processing by filtering the cold. The VLDB Journal. DOI: 10.1007/s00778-019-00560-1. Online publication date: 13-Aug-2019
    • (2018) Integrating Low-latency Analysis into HPC System Monitoring. Proceedings of the 47th International Conference on Parallel Processing, pages 1-10. DOI: 10.1145/3225058.3225086. Online publication date: 13-Aug-2018
    • (2017) Low-Latency Analytics on Colossal Data Streams with SummaryStore. Proceedings of the 26th Symposium on Operating Systems Principles, pages 647-664. DOI: 10.1145/3132747.3132758. Online publication date: 14-Oct-2017
    • (undated) Ccs: A Counter-Cascading Sketch-Based Solution for Data Streams on Fpga. SSRN Electronic Journal. DOI: 10.2139/ssrn.4112924
