DOI: 10.1145/3035918.3056098
invited-talk
Approximate Query Engines: Commercial Challenges and Research Opportunities

Published: 09 May 2017

Abstract

Recent years have witnessed a surge of interest in Approximate Query Processing (AQP) solutions, both in academia and the commercial world. In addition to well-known open problems in this area, there are many new research challenges that have surfaced as a result of the first interaction of AQP technology with commercial and real-world customers. We categorize these into deployment, planning, and interface challenges. At the same time, AQP settings introduce many interesting opportunities that would not be possible in a database with precise answers. These opportunities create hopes for overcoming some of the major limitations of traditional database systems. For example, we discuss how a database can reuse its past work in a generic way, and become smarter as it answers new queries. Our goal in this talk is to suggest some of the exciting research directions in this field that are worth pursuing.

      Published In

      SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
      May 2017
      1810 pages
      ISBN:9781450341974
      DOI:10.1145/3035918
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. analytics
      2. approximation
      3. interactive response times

      Qualifiers

      • Invited-talk

      Conference

      SIGMOD/PODS'17
      Acceptance Rates

      Overall Acceptance Rate 785 of 4,003 submissions, 20%

      Cited By

      • (2024) Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management Systems. Proceedings of the VLDB Endowment 17:11 (3680-3693). DOI: 10.14778/3681954.3682030. Online publication date: 1-Jul-2024
      • (2024) Enabling Adaptive Sampling for Intra-Window Join: Simultaneously Optimizing Quantity and Quality. Proceedings of the ACM on Management of Data 2:4 (1-31). DOI: 10.1145/3677134. Online publication date: 30-Sep-2024
      • (2024) Learning-Based Sample Tuning for Approximate Query Processing in Interactive Data Exploration. IEEE Transactions on Knowledge and Data Engineering 36:11 (6532-6546). DOI: 10.1109/TKDE.2023.3341451. Online publication date: Nov-2024
      • (2024) Generalized Measure-Biased Sampling and Priority Sampling. IEEE Transactions on Knowledge and Data Engineering 36:11 (6251-6265). DOI: 10.1109/TKDE.2023.3340673. Online publication date: Nov-2024
      • (2022) JENNER. Proceedings of the VLDB Endowment 15:11 (2666-2678). DOI: 10.14778/3551793.3551822. Online publication date: 1-Jul-2022
      • (2022) One Size Does Not Fit All: A Bandit-Based Sampler Combination Framework with Theoretical Guarantees. Proceedings of the 2022 International Conference on Management of Data (531-544). DOI: 10.1145/3514221.3517900. Online publication date: 10-Jun-2022
      • (2022) Road-aware Indexing for Trajectory Range Queries. IEEE Transactions on Knowledge and Data Engineering (1-14). DOI: 10.1109/TKDE.2022.3220822. Online publication date: 2022
      • (2022) Prediction Intervals for Learned Cardinality Estimation: An Experimental Evaluation. 2022 IEEE 38th International Conference on Data Engineering (ICDE) (3051-3064). DOI: 10.1109/ICDE53745.2022.00274. Online publication date: May-2022
      • (2022) Revisiting Approximate Query Processing and Bootstrap Error Estimation on GPU. Database Systems for Advanced Applications (72-87). DOI: 10.1007/978-3-031-00123-9_5. Online publication date: 11-Apr-2022
      • (2021) Approximate computation for big data analytics. ACM SIGWEB Newsletter 2021:Winter (1-8). DOI: 10.1145/3447879.3447883. Online publication date: 19-Feb-2021
