DOI: 10.1145/3625549.3658824
short-paper
Open access

Constrained Approximate Query Processing with Error and Response Time-Bound Guarantees for Efficient Big Data Analytics

Published: 30 August 2024

Abstract

Approximate query processing (AQP) is a technique for obtaining approximate answers to queries over large datasets. AQP techniques trade off accuracy for speed, making them ideal for scenarios where exact answers are not required or the cost of obtaining exact answers is prohibitive. This paper proposes a novel machine learning (ML)-based AQP framework that leverages both generative and inferential ML models to improve accuracy and efficiency. The framework first constructs a generative ML model that learns the underlying data distribution and then generates synthetic data that follows the same distribution. The proposed framework also includes a mechanism for constrained approximate query processing (CAQP) with bounded errors and bounded response times. This allows users to specify the desired error bound for the results of an aggregation query. The framework then selects a subset of the synthetic data that is guaranteed to satisfy the error bound. An evaluation of the proposed framework using the Instacart benchmark dataset and queries demonstrates significant efficiency improvements in AQP compared to existing techniques.
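The error-bounded selection step described in the abstract can be illustrated with a minimal sketch. This is not the paper's method: it assumes an AVG aggregation, uses a standard central-limit-theorem sample-size formula, and therefore gives a probabilistic bound (roughly 95% confidence for `z = 1.96`) rather than a hard guarantee. The function names, the pilot-sample step, and all parameters are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def min_sample_size_for_mean(pilot, rel_error, z=1.96):
    """Estimate the smallest n with z * s / sqrt(n) <= rel_error * |mean|.

    Assumption: `pilot` is a small random sample used only to estimate the
    mean and standard deviation of the aggregated column.
    """
    mean = statistics.fmean(pilot)
    std = statistics.stdev(pilot)
    if mean == 0:
        raise ValueError("relative error is undefined for a zero mean")
    n = math.ceil((z * std / (rel_error * abs(mean))) ** 2)
    return max(n, 2)


def approximate_avg(rows, rel_error, pilot_size=1000, seed=0):
    """Answer AVG over `rows` from a sample sized to the requested error bound.

    Assumption: `rows` stands in for values drawn from synthetic data.
    Returns the estimate and the number of rows actually scanned.
    """
    rng = random.Random(seed)
    pilot = rng.sample(rows, min(pilot_size, len(rows)))
    n = min(min_sample_size_for_mean(pilot, rel_error), len(rows))
    sample = rng.sample(rows, n)
    return statistics.fmean(sample), n


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Stand-in for synthetic rows; a real generative model would supply these.
    rows = [data_rng.gauss(100.0, 20.0) for _ in range(200_000)]
    estimate, scanned = approximate_avg(rows, rel_error=0.01)
    print(f"estimate={estimate:.2f} using {scanned} of {len(rows)} rows")
```

A tighter error bound makes the required sample grow quadratically (halving `rel_error` roughly quadruples `n`), which is why an error bound and a response-time bound have to be traded off against each other.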


        Published In

        HPDC '24: Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing
        June 2024
        436 pages
        ISBN: 9798400704130
        DOI: 10.1145/3625549
        This work is licensed under a Creative Commons Attribution International 4.0 License.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. approximate query processing
        2. exploratory data analysis
        3. query optimization

        Qualifiers

        • Short-paper

        Funding Sources

        • Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT)

        Conference

        HPDC '24

        Acceptance Rates

        Overall Acceptance Rate 166 of 966 submissions, 17%
