DOI: 10.1145/3318464.3380563
Research article

BinDex: A Two-Layered Index for Fast and Robust Scans

Published: 31 May 2020

Abstract

In modern analytical database systems, the performance of the data scan operation is key to the performance of query execution. Existing approaches can be categorized into index scans and sequential scans, but both have inherent inefficiencies. A sequential scan may need to access a large amount of unneeded data, especially for queries with low selectivity. Conversely, an index scan may involve a large number of expensive random memory accesses when query selectivity is high. Moreover, with the growing complexity of database query workloads, it has become hard to predict which approach is better for a particular query. To obtain fast and robust scans under all selectivities, this paper proposes BinDex, a two-layered index structure based on binned bitmaps that can significantly accelerate scan operations in in-memory column stores. The first layer of BinDex consists of a set of binned bitmaps that filter out most unneeded values in a column. The second layer provides auxiliary information used to correct the bits that have incorrect values. By varying the number of bit vectors in the first layer, BinDex can trade memory space for performance. Experimental results show that BinDex outperforms state-of-the-art approaches while using less memory than a B+-tree would. By enlarging the memory space, BinDex can achieve up to 2.9 times higher performance, eliminating the need to choose between sequential and index scans.
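The abstract describes the two layers only at a high level. The Python sketch below shows one way such a binned-bitmap scheme could answer a range predicate; it is an illustration under stated assumptions, not the authors' implementation. Assumptions (all hypothetical): an integer column; roughly equal-frequency bins; first-layer bitmaps that are cumulative (bit r is set iff the row's value is below a bin boundary) and stored as Python ints; a second layer holding each bin's row ids sorted by value; and only the `value < x` predicate. The names `TwoLayerBinnedIndex`, `less_than`, and `rows_of` are invented for this example.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


class TwoLayerBinnedIndex:
    """Hypothetical two-layer binned-bitmap index over one integer column.

    Layer 1: bitmaps[i] has bit r set iff column[r] < boundaries[i];
             bitmaps[k] (k = number of bins) covers every row.
    Layer 2: per bin, its values and row ids sorted by value, used to
             flip the bits that the nearest layer-1 bitmap gets wrong.
    """

    def __init__(self, column: List[int], num_bins: int) -> None:
        if not column or num_bins < 1:
            raise ValueError("need a non-empty column and at least one bin")
        n = len(column)
        ranked = sorted(column)
        # Roughly equal-frequency boundaries; boundaries[0] is the column minimum.
        self.boundaries = sorted({ranked[j * n // num_bins] for j in range(num_bins)})
        k = len(self.boundaries)
        # Layer 2: (value, row) pairs grouped by bin, then sorted by value.
        pairs: List[List[Tuple[int, int]]] = [[] for _ in range(k)]
        for row, value in enumerate(column):
            pairs[bisect_right(self.boundaries, value) - 1].append((value, row))
        for p in pairs:
            p.sort()
        self.bin_values = [[v for v, _ in p] for p in pairs]
        self.bin_rows = [[r for _, r in p] for p in pairs]
        # Layer 1: cumulative bitmaps, built bin by bin.
        self.bitmaps = [0]
        for rows in self.bin_rows:
            bits = 0
            for r in rows:
                bits |= 1 << r
            self.bitmaps.append(self.bitmaps[-1] | bits)

    def less_than(self, x: int) -> int:
        """Return a bitmap (Python int) of the rows whose value is < x."""
        if x <= self.boundaries[0]:
            return 0
        i = bisect_right(self.boundaries, x) - 1  # bin that contains x
        values, rows = self.bin_values[i], self.bin_rows[i]
        cnt = bisect_left(values, x)              # rows of bin i with value < x
        if cnt <= len(values) - cnt:
            # Lower bitmap misses the first cnt rows of the bin: set them.
            result, fix = self.bitmaps[i], rows[:cnt]
        else:
            # Upper bitmap wrongly includes the rest of the bin: clear them.
            result, fix = self.bitmaps[i + 1], rows[cnt:]
        for r in fix:
            result ^= 1 << r                      # flip one incorrect bit
        return result


def rows_of(bitmap: int) -> List[int]:
    """Decode a bitmap into the sorted list of set row ids."""
    return [r for r in range(bitmap.bit_length()) if bitmap >> r & 1]


if __name__ == "__main__":
    col = [7, 3, 9, 1, 5, 8, 2, 6]
    idx = TwoLayerBinnedIndex(col, num_bins=4)
    print(rows_of(idx.less_than(6)))  # [1, 3, 4, 6]
```

In this sketch, starting from whichever neighbouring first-layer bitmap is closer to `x` means at most half of one bin's rows need their bits flipped. Adding bins (more bitmaps, more memory) therefore shrinks the per-query correction work, which loosely mirrors the memory/performance tradeoff the abstract attributes to varying the number of first-layer bit vectors.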

Supplementary Material

Presentation video (MP4 file: 3318464.3380563.mp4)




    Published In

    SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
    June 2020
    2925 pages
    ISBN: 9781450367356
    DOI: 10.1145/3318464

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. in-memory column stores
    2. indexing
    3. scan


    Funding Sources

    • NSFC
    • National Key R&D Program of China

    Conference

    SIGMOD/PODS '20

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%



    Article Metrics

    • Downloads (last 12 months): 89
    • Downloads (last 6 weeks): 9
    Reflects downloads up to 13 Nov 2024


    Cited By

    • (2024) RTScan: Efficient Scan with Ray Tracing Cores. Proceedings of the VLDB Endowment 17, 6, 1460-1472. DOI: 10.14778/3648160.3648183. Online publication date: 3-May-2024.
    • (2022) ByteStore: Hybrid Layouts for Main-Memory Column Stores. 2022 IEEE International Conference on Big Data (Big Data), 170-179. DOI: 10.1109/BigData55660.2022.10020303. Online publication date: 17-Dec-2022.
    • (2022) SgIndex: An Index Structure Supporting Multiple Graph Queries. Web and Big Data, 553-561. DOI: 10.1007/978-3-031-25158-0_45. Online publication date: 11-Aug-2022.
    • (2022) CrossIndex: Memory-Friendly and Session-Aware Index for Supporting Crossfilter in Interactive Data Exploration. Database Systems for Advanced Applications, 476-492. DOI: 10.1007/978-3-031-00123-9_38. Online publication date: 11-Apr-2022.
    • (2021) LES3. Proceedings of the VLDB Endowment 14, 11, 2073-2086. DOI: 10.14778/3476249.3476263. Online publication date: 27-Oct-2021.
