Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

SWIX: A Memory-efficient Sliding Window Learned Index

Published: 26 March 2024 Publication History

Abstract

Data stream processing systems enable querying over sliding windows of streams of data. Efficient index structures for the streaming window are a crucial building block to enable querying the sliding window for operations such as aggregation and joins. This paper proposes SWIX, a novel memory-efficient learned index for sliding windows. Unlike conventional learned indexes that rely on tree structures to achieve logarithmic query cost, SWIX has a flat structure that uses substantially less memory and enables efficient query execution while having a low cost for index maintenance when inserting (and retraining). SWIX dynamically adapts itself to the real-time distribution shifts of data streams.
SWIX outperforms existing indexes in terms of query execution time and memory footprint for workloads characterized by very frequent updates. Our results show that SWIX has a significantly smaller memory footprint than conventional, streaming, and learned indexes, using only 22% to 42% of the size compared to state-of-the-art approaches, yet outperforming them by up 1.2× to 1.6× on average (and up to 52×) in terms of query time, making it a space- and time-efficient method for indexing data streams. For concurrent learned indexes, Parallel SWIX can achieve up to 3.45× throughput with only 34% of memory consumption.

References

[1]
Ajay Acharya and Nandini S. Sidnal. 2016. High Frequency Trading with Complex Event Processing. In 2016 IEEE 23rd International Conference on High Performance Computing Workshops (HiPCW). 39--42. https://doi.org/10.1109/HiPCW.2016.014
[2]
Hirotugu Akaike. 1974. A new look at the statistical model identification. IEEE transactions on automatic control, Vol. 19, 6 (1974), 716--723.
[3]
Daniel Berjón, Guillermo Gallego, Carlos Cuevas, Francisco Morán, and Narciso Garc'ia. 2015. Optimal piecewise linear function approximation for GPU-based applications. IEEE transactions on cybernetics, Vol. 46, 11 (2015), 2584--2595.
[4]
bingmann. 2019. STX-BTree. https://github.com/bingmann/stx-btree Last Accessed: 2023--12--28.
[5]
Robert Binna, Eva Zangerle, Martin Pichl, Günther Specht, and Viktor Leis. 2018. HOT: a height optimized Trie index for main-memory database systems. In Proceedings of the International Conference on Management of Data (SIGMOD). ACM, 521--534.
[6]
Sanket Chintapalli, Derek Dagit, Bobby Evans, Reza Farivar, Thomas Graves, Mark Holderbaugh, Zhuo Liu, Kyle Nusbaum, Kishorkumar Patil, Boyang Jerry Peng, et al. 2016. Benchmarking streaming computation engines: Storm, flink and spark streaming. In 2016 IEEE international parallel and distributed processing symposium workshops (IPDPSW). IEEE, 1789--1792.
[7]
Yifan Dai, Yien Xu, Aishwarya Ganesan, Ramnatthan Alagappan, Brian Kroth, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau. 2020. From wisckey to bourbon: A learned index for log-structured merge trees. In OSDI. 155--171.
[8]
Herbert A David and Haikady N Nagaraja. 2004. Order statistics. John Wiley & Sons.
[9]
Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David Lomet, and Tim Kraska. 2020. ALEX: An Updatable Adaptive Learned Index. In Proceedings of the International Conference on Management of Data (SIGMOD) (Portland, OR, USA). 969--984.
[10]
Adam Dziedzic, Jingjing Wang, Sudipto Das, Bolin Ding, Vivek R Narasayya, and Manoj Syamala. 2018. Columnstore and B tree-Are Hybrid Physical Designs Important?. In Proceedings of the International Conference on Management of Data (SIGMOD). 177--190.
[11]
Paolo Ferragina and Giorgio Vinciguerra. 2020. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. Proceedings of the VLDB Endowment (PVLDB), Vol. 13, 8 (2020).
[12]
Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, and Tim Kraska. 2019. FITing-Tree: A Data-aware Index Structure. In Proceedings of the International Conference on Management of Data (SIGMOD). 1189--1206.
[13]
Goetz Graefe et al. 2011. Modern B-tree Techniques. Foundations and Trends in Databases, Vol. 3, 4 (2011), 203--402.
[14]
gvinciguerra. 2023. PGM-Index. https://github.com/gvinciguerra/PGM-index Last Accessed: 2023--12--28.
[15]
Ali Hadian, Behzad Ghaffari, Taiyi Wang, and Thomas Heinis. 2021. COAX: Correlation-Aware Indexing on Multidimensional Data with Soft Functional Dependencies. arXiv preprint arXiv:2006.16393 (2021).
[16]
Ali Hadian and Thomas Heinis. 2019. Interpolation-friendly B-trees: Bridging the Gap Between Algorithmic and Learned Indexes. In Proceedings of the International Conference on Extending Database Technology (EDBT).
[17]
Ali Hadian and Thomas Heinis. 2020. MADEX: Learning-augmented Algorithmic Index Structures. In Proceedings of the International VLDB Workshop on Applied AI for Database Systems and Applications (AIDB).
[18]
Ali Hadian and Thomas Heinis. 2021. Shift-Table: A Low-latency Learned Index for Range Queries using Model Correction. In Proceedings of the International Conference on Extending Database Technology (EDBT).
[19]
Ali Hadian, Ankit Kumar, and Thomas Heinis. 2020. Hands-off Model Integration in Spatial Index Structures. In Proceedings of the International VLDB Workshop on Applied AI for Database Systems and Applications (AIDB).
[20]
Intel. 1997. Read Time-Stamp Counter. https://c9x.me/x86/html/file_module_x86_id_278.html Last Accessed: 2023--10--14.
[21]
Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, and Thomas Neumann. 2019. SOSD: A Benchmark for Learned Indexes. NeurIPS Workshop on Machine Learning for Systems (2019).
[22]
Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, and Thomas Neumann. 2020. RadixSpline: A Single-Pass Learned Index. In Proceedings of the International SIGMOD Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM).
[23]
Tim Kraska, Alex Beutel, Ed H Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The case for learned index structures. In Proceedings of the International Conference on Management of Data (SIGMOD). 489--504.
[24]
Viktor Leis, Alfons Kemper, and Thomas Neumann. 2013. The adaptive radix tree: ARTful indexing for main-memory databases. In Proceedings of the IEEE International Conference on Data Engineering (ICDE). 38--49.
[25]
Justin J Levandoski, David B Lomet, and Sudipta Sengupta. 2013. The Bw-Tree: A B-tree for new hardware platforms. In Proceedings of the IEEE International Conference on Data Engineering (ICDE). 302--313.
[26]
Pengfei Li, Yu Hua, Jingnan Jia, and Pengfei Zuo. 2021. FINEdex: a fine-grained learned index scheme for scalable and concurrent memory systems. Proceedings of the VLDB Endowment, Vol. 15, 2 (2021), 321--334.
[27]
Pengfei Li, Hua Lu, Qian Zheng, Long Yang, and Gang Pan. 2020. LISA: A Learned Index Structure for Spatial Data. In Proceedings of the International Conference on Management of Data.
[28]
Anisa Llavesh, Utku Sirin, Robert West, and Anastasia Ailamaki. 2019. Accelerating Btree Search by Using Simple Machine Learning Techniques. In Proceedings of the 1st International VLDB Workshop on Applied AI for Database Systems and Applications (AIDB).
[29]
Ryan Marcus, Andreas Kipf, Alexander van Renen, Mihail Stoian, Sanchit Misra, Alfons Kemper, Thomas Neumann, and Tim Kraska. 2020a. Benchmarking learned indexes. Proceedings of the VLDB Endowment (PVLDB), Vol. 14, 1 (2020), 1--13.
[30]
Ryan Marcus, Emily Zhang, and Tim Kraska. 2020b. CDFShop: Exploring and Optimizing Learned Index Structures. In Proceedings of the International Conference on Management of Data (SIGMOD). 2789--2792.
[31]
microsoft. 2022. ALEX. https://github.com/microsoft/ALEX Last Accessed: 2023--12--28.
[32]
Vikram Nathan, Jialin Ding, Mohammad Alizadeh, and Tim Kraska. 2020. Learning Multi-dimensional Indexes. In Proceedings of the International Conference on Management of Data (SIGMOD).
[33]
Varun Pandey, Alexander van Renen, Andreas Kipf, Ibrahim Sabek, Jialin Ding, and Alfons Kemper. 2020. The Case for Learned Spatial Indexes. In Proceedings of the International VLDB Workshop on Applied AI for Database Systems and Applications (AIDB).
[34]
Naufal Fikri Setiawan, Benjamin IP Rubinstein, and Renata Borovica-Gajic. 2020. Function Interpolation for Learned Index Structures. In ADC.
[35]
Amirhesam Shahvarani and Hans-Arno Jacobsen. 2020. Parallel Index-based Stream Join on a Multicore CPU. In Proceedings of the International Conference on Management of Data (SIGMOD). Association for Computing Machinery, New York, NY, USA, 2523--2537.
[36]
Stefan Sprenger, Steffen Zeuch, and Ulf Leser. 2016. Cache-sensitive skip list: Efficient range queries on modern cpus. In Proceedings of the International Workshop on Data Management on New Hardware (DaMoN). Springer, 1--17.
[37]
Hari Subramoni, Fabrizio Petrini, Virat Agarwal, and Davide Pasetto. 2010. Streaming, low-latency communication in on-line trading systems. In International Symposium on Parallel Distributed Processing, Workshops and Phd Forum (IPDPSW). 1--8.
[38]
Chuzhe Tang. 2022a. FINEdex. https://github.com/iotlpf/FINEdex Last Accessed: 2023--12--28.
[39]
Chuzhe Tang. 2022b. XIndex. https://ipads.se.sjtu.edu.cn:1312/opensource/xindex Last Accessed: 2023--12--28.
[40]
Chuzhe Tang, Youyun Wang, Zhiyuan Dong, Gansen Hu, Zhaoguo Wang, Minjie Wang, and Haibo Chen. 2020. XIndex: a scalable learned index for multicore data storage. In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 308--320.
[41]
Jiacheng Wu, Yong Zhang, Shimin Chen, Jin Wang, Yu Chen, and Chunxiao Xing. 2021. Updatable learned index with precise positions. arXiv preprint arXiv:2104.05520 (2021).
[42]
Zhongle Xie, Qingchao Cai, HV Jagadish, Beng Chin Ooi, and Weng-Fai Wong. 2017. Parallelizing skip lists for in-memory multi-core database systems. In Proceedings of the IEEE International Conference on Data Engineering (ICDE). IEEE, 119--122.
[43]
Yu Ya-xin, Yang Xing-hua, Yu Ge, and Wu Shan-shan. 2006. An Indexed Non-equijoin Algorithm Based on Sliding Windows over Data Streams. Wuhan University Journal of Natural Sciences, Vol. 11, 1 (2006), 294--298.
[44]
Guang Yang, Liang Liang, Ali Hadian, and Thomas Heinis. 2023. FLIRT: A Fast Learned Index for Rolling Time frames. In Proceedings of the International Conference on Extending Database Technology (EDBT). 234--246.
[45]
Jiaoyi Zhang and Yihan Gao. 2022. CARMI: A Cache-Aware Learned Index with a Cost-Based Construction Algorithm. Proc. VLDB Endow., Vol. 15, 11 (jul 2022), 2679--2691. https://doi.org/10.14778/3551793.3551823
[46]
Shuhao Zhang, Yancan Mao, Jiong He, Philipp M Grulich, Steffen Zeuch, Bingsheng He, Richard TB Ma, and Volker Markl. 2021. Parallelizing Intra-Window Join on Multicores: An Experimental Study. In Proceedings of the 2021 International Conference on Management of Data. 2089--2101.
[47]
Wenshao Zhong, Chen Chen, Xingbo Wu, and Song Jiang. 2021. REMIX: Efficient Range Query for LSM-trees. In USENIX Conference on File and Storage Technologies (FAST). 51--64.

Index Terms

  1. SWIX: A Memory-efficient Sliding Window Learned Index

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image Proceedings of the ACM on Management of Data
    Proceedings of the ACM on Management of Data  Volume 2, Issue 1
    SIGMOD
    February 2024
    1874 pages
    EISSN:2836-6573
    DOI:10.1145/3654807
    Issue’s Table of Contents
    This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 26 March 2024
    Published in PACMMOD Volume 2, Issue 1

    Author Tags

    1. automatic tuning
    2. flat structure
    3. learned index
    4. lightweight structure
    5. parallel structure
    6. streaming processing
    7. update heavy
    8. window-based query

    Qualifiers

    • Research-article

    Funding Sources

    • National Agency for Research and Development (ANID)

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • 0
      Total Citations
    • 559
      Total Downloads
    • Downloads (Last 12 months)559
    • Downloads (Last 6 weeks)80
    Reflects downloads up to 18 Nov 2024

    Other Metrics

    Citations

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media