Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

T-Rex: Optimizing Pattern Search on Time Series

Published: 20 June 2023 Publication History

Abstract

Pattern search is an important class of queries for time series data. Time series patterns often match variable-length segments with a large search space, thereby posing a significant performance challenge. The existing pattern search systems, for example, SQL query engines supporting MATCH_RECOGNIZE, are ineffective in pruning the large search space of variable-length segments. In many cases, the issue is due to the use of a restrictive query language modeled on time series points and a computational model that limits search space pruning. We built T-ReX to address this problem using two main building blocks: first, a MATCH_RECOGNIZE language extension that exposes the notion of segment variable and adds new operators, lending itself to better optimization; second, an executor capable of pruning the search space of matches and minimizing total query time using an optimizer. We conducted experiments using 5 real-world datasets and 11 query templates, including those from existing works. T-ReX outperformed an optimized NFA-based pattern search executor by 6x in median query time and an optimized tree-based executor by 19X.

Supplemental Material

MP4 File
Recorded presentation of "T-Rex: Optimizing Pattern Search on Time Series"

References

[1]
Covid-19 data repository by the center for systems science and engineering (csse) at Johns Hopkins University. https://github.com/CSSEGISandData/COVID-19/tree/master/archived_data/archived_time_series, 2021.
[2]
Cold Wave. https://en.wikipedia.org/wiki/Cold_wave, 2022.
[3]
Expected Distance Between Random Points on a Line Segment. https://math.stackexchange.com/questions/195245/average-distance-between-random-points-on-a-line-segment, 2022.
[4]
Expected Number of Distinct Items from a Multi-Set. https://math.stackexchange.com/a/72351, 2022.
[5]
Historical hourly weather data 2012--2017. https://www.kaggle.com/selfishgene/historical-hourly-weather-data, 2022.
[6]
ISO/IEC TR 19075--5:2016 Information technology -- Database languages -- SQL Technical Reports -- Part 5: Row Pattern Recognition in SQL. https://www.iso.org/standard/65143.html, 2022.
[7]
MATCH_RECOGNIZE. https://trino.io/docs/current/sql/match-recognize.html, 2022.
[8]
MATCH_RECOGNIZE - Snowflake Documentation. https://docs.snowflake.com/en/sql-reference/constructs/match_recognize.html, 2022.
[9]
MATCH_RECOGNIZE for Pattern Recognition. https://docs.oracle.com/en/middleware/fusion-middleware/osa/19.1/cqlreference/pattern-recognition-match_recognize.html, 2022.
[10]
MATCH_RECOGNIZE for Pattern Recognition. https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sql/queries/match_recognize/, 2022.
[11]
MATCH_RECOGNIZE (Stream Analytics). https://docs.microsoft.com/en-us/stream-analytics-query/match-recognize-stream-analytics, 2022.
[12]
The numenta anomaly benchmark (nab). https://github.com/numenta/NAB/tree/master/data/realKnownCause, 2022.
[13]
Snowflake MATCH_RECOGNIZE for finding transactions that do NOT match a pattern. https://livenewcapital.com/snowflake-match_recognize-for-finding-transactions-that-do-not-match-a-pattern/, 2023.
[14]
Snowflake MATCH_RECOGNIZE for performing A/B analysis on streaming data. https://livenewcapital.com/match_recognize-for-performing-a-b-analysis/, 2023.
[15]
T-ReX: Optimizing Pattern Search on Time Series (Extended Version). https://www.microsoft.com/en-us/research/publication/t-rex-optimizing-pattern-search-on-time-series-extended-version/, 2023.
[16]
Kolchinsky, Ilya and Schuster, Assaf. OpenCEP. https://research.redhat.com/blog/research_project/complex-event-processing-2/ https://github.com/ilya-kolchinsky/OpenCEP, 2021.
[17]
J. Agrawal, Y. Diao, D. Gyllstrom, and N. Immerman. Efficient pattern matching over event streams. In J. T. Wang, editor, Proceedings of the ACM SIGMOD Int'l Conf. on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, June 10--12, 2008, pages 147--160. ACM, 2008.
[18]
R. Agrawal, G. Psaila, E. L. Wimmers, and M. Zaït. Querying shapes of histories. In U. Dayal, P. M. D. Gray, and S. Nishio, editors, VLDB'95, Proceedings of 21th Int'l Conf. on Very Large Data Bases, September 11--15, 1995, Zurich, Switzerland, pages 502--514. Morgan Kaufmann, 1995.
[19]
A. Arasu and J. Widom. Resource sharing in continuous sliding-window aggregates. In M. A. Nascimento, M. T. Özsu, D. Kossmann, R. J. Miller, J. A. Blakeley, and K. B. Schiefer, editors, (e)Proceedings of the Thirtieth Int'l Conf. on Very Large Data Bases, VLDB 2004, Toronto, Canada, August 31 - September 3 2004, pages 336--347. Morgan Kaufmann, 2004.
[20]
P. Buono, A. Aris, C. Plaisant, A. Khella, and B. Shneiderman. Interactive pattern search in time series. In R. F. Erbacher, J. C. Roberts, M. T. Gröhn, and K. Börner, editors, Visualization and Data Analysis 2005, San Jose, CA, USA, January 17, 2005, volume 5669 of SPIE Proceedings, pages 175--186. SPIE, 2005.
[21]
B. Cadonna, J. Gamper, and M. H. Böhlen. Sequenced event set pattern matching. In A. Ailamaki, S. Amer-Yahia, J. M. Patel, T. Risch, P. Senellart, and J. Stoyanovich, editors, EDBT 2011, 14th Int'l Conf. on Extending Database Technology, Uppsala, Sweden, March 21--24, 2011, Proceedings, pages 33--44. ACM, 2011.
[22]
B. Cadonna, J. Gamper, and M. H. Böhlen. Efficient event pattern matching with match windows. In Q. Yang, D. Agarwal, and J. Pei, editors, The 18th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, KDD '12, Beijing, China, August 12--16, 2012, pages 471--479. ACM, 2012.
[23]
B. Chandramouli, J. Goldstein, M. Barnett, R. DeLine, J. C. Platt, J. F. Terwilliger, and J. Wernsing. Trill: A high-performance incremental query processor for diverse analytics. Proc. VLDB Endow., 8(4):401--412, 2014.
[24]
B. Chandramouli, J. Goldstein, and D. Maier. High-performance dynamic pattern matching over disordered streams. Proc. VLDB Endow., 3(1):220--231, 2010.
[25]
A. J. Demers, J. Gehrke, B. Panda, M. Riedewald, V. Sharma, and W. M. White. Cayuga: A general purpose event monitoring system. In Third Biennial Conf. on Innovative Data Systems Research, CIDR 2007, Asilomar, CA, USA, January 7--10, 2007, Online Proceedings, pages 412--422. www.cidrdb.org, 2007.
[26]
Y. Diao, N. Immerman, and D. Gyllstrom. Sase: An agile language for kleene closure over event streams. UMass Technical Report, 2007.
[27]
T. Fu, K. F. Chung, R. W. P. Luk, and C. Ng. Stock time series pattern matching: Template-based vs. rule-based approaches. Eng. Appl. Artif. Intell., 20(3):347--364, 2007.
[28]
G. Graefe. Volcano - an extensible and parallel query evaluation system. IEEE Trans. Knowl. Data Eng., 6(1):120--135, 1994.
[29]
K. Järvelin and J. Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst., 20(4):422--446, 2002.
[30]
I. Kolchinsky and A. Schuster. Efficient adaptive detection of complex event patterns. Proc. VLDB Endow., 11(11):1346--1359, 2018.
[31]
I. Kolchinsky and A. Schuster. Join query optimization techniques for complex event processing applications. Proc. VLDB Endow., 11(11):1332--1345, 2018.
[32]
I. Kolchinsky and A. Schuster. Real-time multi-pattern detection over event streams. In P. A. Boncz, S. Manegold, A. Ailamaki, A. Deshpande, and T. Kraska, editors, Proceedings of the 2019 Int'l Conf. on Management of Data, SIGMOD Conf. 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, pages 589--606. ACM, 2019.
[33]
I. Kolchinsky, I. Sharfman, and A. Schuster. Lazy evaluation methods for detecting complex events. In F. Eliassen and R. Vitenberg, editors, Proceedings of the 9th ACM Int'l Conf. on Distributed Event-Based Systems, DEBS '15, Oslo, Norway, June 29 - July 3, 2015, pages 34--45. ACM, 2015.
[34]
V. Leis, K. Kundhikanjana, A. Kemper, and T. Neumann. Efficient processing of window functions in analytical SQL queries. Proc. VLDB Endow., 8(10):1058--1069, 2015.
[35]
M. Liu, M. Ray, E. A. Rundensteiner, D. J. Dougherty, C. Gupta, S. Wang, I. Ari, and A. Mehta. Processing nested complex sequence pattern queries over event streams. In D. Zeinalipour-Yazti and W. Lee, editors, Proceedings of the 7th Workshop on Data Management for Sensor Networks, in conjunction with VLDB, DMSN 2010, Singapore, September 13, 2010, ACM Int'l Conf. Proceeding Series, pages 14--19. ACM, 2010.
[36]
M. Mannino and A. Abouzied. Expressive time series querying with hand-drawn scale-free sketches. In R. L. Mandryk, M. Hancock, M. Perry, and A. L. Cox, editors, Proceedings of the 2018 CHI Conf. on Human Factors in Computing Systems, CHI 2018, Montreal, QC, Canada, April 21--26, 2018, page 388. ACM, 2018.
[37]
Y. Mei and S. Madden. Zstream: a cost-based query processor for adaptively detecting composite events. In U. Çetintemel, S. B. Zdonik, D. Kossmann, and N. Tatbul, editors, Proceedings of the ACM SIGMOD Int'l Conf. on Management of Data, SIGMOD 2009, Providence, Rhode Island, USA, June 29 - July 2, 2009, pages 193--206. ACM, 2009.
[38]
M. Ray, C. Lei, and E. A. Rundensteiner. Scalable pattern sharing on event streams. In F. Özcan, G. Koutrika, and S. Madden, editors, Proceedings of the 2016 Int'l Conf. on Management of Data, SIGMOD Conf. 2016, San Francisco, CA, USA, June 26 - July 01, 2016, pages 495--510. ACM, 2016.
[39]
M. Ray, E. A. Rundensteiner, M. Liu, C. Gupta, S. Wang, and I. Ari. High-performance complex event processing using continuous sliding views. In G. Guerrini and N. W. Paton, editors, Joint 2013 EDBT/ICDT Conf., EDBT '13 Proceedings, Genoa, Italy, March 18--22, 2013, pages 525--536. ACM, 2013.
[40]
R. Sadri, C. Zaniolo, A. M. Zarkesh, and J. Adibi. Expressing and optimizing sequence queries in database systems. ACM Trans. Database Syst., 29(2):282--318, 2004.
[41]
T. Siddiqui, P. Luh, Z. Wang, K. Karahalios, and A. G. Parameswaran. Shapesearch: A flexible and efficient system for shape-based exploration of trendlines. In D. Maier, R. Pottinger, A. Doan, W. Tan, A. Alawini, and H. Q. Ngo, editors, Proceedings of the 2020 Int'l Conf. on Management of Data, SIGMOD Conf. 2020, online Conf. [Portland, OR, USA], June 14--19, 2020, pages 51--65. ACM, 2020.
[42]
A. Vogelsgesang, T. Neumann, V. Leis, and A. Kemper. Efficient evaluation of arbitrarily-framed holistic SQL aggregates and window functions. In Z. Ives, A. Bonifati, and A. E. Abbadi, editors, SIGMOD '22: Int'l Conf. on Management of Data, Philadelphia, PA, USA, June 12 - 17, 2022, pages 1243--1256. ACM, 2022.
[43]
R. Wesley and F. Xu. Incremental computation of common windowed holistic aggregates. Proc. VLDB Endow., 9(12):1221--1232, 2016.
[44]
E. Wu, Y. Diao, and S. Rizvi. High-performance complex event processing over streams. In S. Chaudhuri, V. Hristidis, and N. Polyzotis, editors, Proceedings of the ACM SIGMOD Int'l Conf. on Management of Data, Chicago, Illinois, USA, June 27--29, 2006, pages 407--418. ACM, 2006.
[45]
J. Yang and J. Widom. Incremental computation and maintenance of temporal aggregates. In D. Georgakopoulos and A. Buchmann, editors, Proceedings of the 17th Int'l Conf. on Data Engineering, April 2--6, 2001, Heidelberg, Germany, pages 51--60. IEEE Computer Society, 2001.
[46]
S. Yue, P. Pilon, and G. Cavadias. Power of the mann--kendall and spearman's rho tests for detecting monotonic trends in hydrological series. Journal of Hydrology, 259(1):254--271, 2002.
[47]
H. Zhang, Y. Diao, and N. Immerman. On complexity and optimization of expensive queries in complex event processing. In C. E. Dyreson, F. Li, and M. T. Özsu, editors, Int'l Conf. on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22--27, 2014, pages 217--228. ACM, 2014.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data
Proceedings of the ACM on Management of Data  Volume 1, Issue 2
PACMMOD
June 2023
2310 pages
EISSN:2836-6573
DOI:10.1145/3605748
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 June 2023
Published in PACMMOD Volume 1, Issue 2

Permissions

Request permissions for this article.

Author Tags

  1. operator
  2. pattern search
  3. query optimizer
  4. query processing
  5. time series

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)195
  • Downloads (Last 6 weeks)16
Reflects downloads up to 30 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Streamlining Temporal Formal Verification over Columnar DatabasesInformation10.3390/info1501003415:1(34)Online publication date: 8-Jan-2024
  • (2024)Window Function Expression: Let the Self-Join EnterProceedings of the VLDB Endowment10.14778/3665844.366584817:9(2162-2174)Online publication date: 6-Aug-2024
  • (2024)Proximity Queries on Point Clouds using Rapid Construction Path OracleProceedings of the ACM on Management of Data10.1145/36392612:1(1-26)Online publication date: 26-Mar-2024
  • (2024)ACER: Accelerating Complex Event Recognition via Two-Phase Filtering under Range Bitmap-Based IndexesProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671814(1933-1943)Online publication date: 25-Aug-2024
  • (2024)When Data Pricing Meets Non-Cooperative Game Theory2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00443(5548-5559)Online publication date: 13-May-2024
  • (2024)An Efficient Algorithm for Continuous Complex Event Matching Using Bit-Parallelism2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00037(396-408)Online publication date: 13-May-2024

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media