research-article

Approximate Aggregates in Oracle 12C

Authors: Vladimir Barrière, Andre Menck

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management

Pages 1603 - 1612

https://doi.org/10.1145/2983323.2983353

Published: 24 October 2016

Abstract

A new generation of analytic applications has emerged to process data generated from nonconventional sources. The challenge for traditional database systems is that the data sets are very large and keep growing at a very high rate, while application users have higher performance expectations. The most straightforward response to this challenge is to deploy larger hardware configurations, which makes the solution very expensive and unacceptable in most cases. Alternative solutions fall into two categories: reducing the data set using sampling techniques, or reducing the computational complexity of expensive database operations by using alternative algorithms. The alternative algorithms considered in this paper are approximate aggregates, which perform much better at the cost of a reduced but tolerable accuracy. In Oracle 12C we introduced approximate versions of two expensive aggregate functions that are very common in analytic applications: approximate count distinct and approximate percentile. Performance is improved in two ways. First, the approximate aggregates use bounded memory, often eliminating the need for temporary storage, which results in a significant performance improvement over the exact aggregates. Second, we provide materialized view support that allows users to store pre-computed results of approximate aggregates. These results can be rolled up to answer queries on different dimensions (such a rollup is not possible for exact aggregates).
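To make the two properties in the abstract concrete (bounded memory and rollup of stored results), here is a minimal Python sketch of a mergeable distinct-count summary. It is an illustration only: the abstract does not say which algorithm Oracle uses, and this example substitutes the well-known k-minimum-values (KMV) technique because it is short to write down. The class name `KMVSketch`, the parameter `k`, and the east/west user-ID data are all hypothetical.

```python
import hashlib
import heapq


class KMVSketch:
    """K-minimum-values sketch (illustrative stand-in, not Oracle's method).

    Keeps only the k smallest distinct 64-bit hashes, so memory is bounded
    by k no matter how many rows are added.
    """

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # max-heap of kept hashes (stored negated)
        self._members = set()  # same hashes, used to skip duplicates

    @staticmethod
    def _hash(value):
        digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def _offer(self, h):
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest kept hash: swap them.
            evicted = -heapq.heapreplace(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def add(self, value):
        self._offer(self._hash(value))

    def merge(self, other):
        """Roll up two sketches into the sketch of the combined data."""
        if self.k != other.k:
            raise ValueError("sketches must use the same k")
        merged = KMVSketch(self.k)
        for h in self._members | other._members:
            merged._offer(h)
        return merged

    def estimate(self):
        if len(self._heap) < self.k:
            # Fewer than k distinct hashes seen: the count is exact
            # (barring 64-bit hash collisions).
            return float(len(self._heap))
        kth_smallest = -self._heap[0] / 2.0**64  # normalized to (0, 1)
        return (self.k - 1) / kth_smallest


# Hypothetical data: user IDs per region, with 10,000 users in both regions.
east, west, total = KMVSketch(), KMVSketch(), KMVSketch()
for user_id in range(0, 30000):
    east.add(user_id)
    total.add(user_id)
for user_id in range(20000, 50000):
    west.add(user_id)
    total.add(user_id)

rolled_up = east.merge(west)
print(rolled_up.estimate() == total.estimate())  # merged sketch matches a full scan
print(round(rolled_up.estimate()))               # approximate; true answer is 50000
```

The per-region sketches play the role of pre-computed results stored in a materialized view: merging them gives the same estimate as scanning all rows again. The exact per-region counts (30,000 and 30,000) cannot be combined this way, because adding them gives 60,000 while the true number of distinct users is 50,000.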

Index Terms

Approximate Aggregates in Oracle 12C
1. Information systems
  1. Data management systems
    1. Database design and models
      1. Relational database model
    2. Database management system engines
      1. Database query processing
        1. Query optimization
2. Theory of computation
  1. Design and analysis of algorithms
    1. Approximation algorithms analysis

Published In

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management

October 2016

2566 pages

ISBN:9781450340731

DOI:10.1145/2983323

General Chairs: Snehasis Mukhopadhyay (Indiana University Purdue University Indianapolis, USA), ChengXiang Zhai (University of Illinois at Urbana-Champaign, USA)

Program Chairs: Elisa Bertino (Purdue University), Fabio Crestani (University of Lugano), Javed Mostafa (University of North Carolina), Jie Tang (Tsinghua University), Luo Si (Alibaba Group Inc & Purdue University), Xiaofang Zhou (University of Queensland), Yi Chang (Yahoo Research), Yunyao Li (IBM Research - Almaden), Parikshit Sondhi (WalmartLabs)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM'16: ACM Conference on Information and Knowledge Management

October 24 - 28, 2016

Indianapolis, Indiana, USA

Acceptance Rates

CIKM '16 Paper Acceptance Rate 160 of 701 submissions, 23%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Bibliometrics

Total Citations: 4
Total Downloads: 356
Downloads (Last 12 months): 9
Downloads (Last 6 weeks): 1

Reflects downloads up to 12 Nov 2024

Citations

Cited By

Wu, N., Vatsalan, D., Kaafar, M. and Ramesh, S. (2023). Privacy-Preserving Record Linkage for Cardinality Counting. Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security, pages 53-64. Online publication date: 10-Jul-2023.
https://dl.acm.org/doi/10.1145/3579856.3590338

Thirumuruganathan, S., Shetiya, S., Koudas, N. and Das, G. (2022). Prediction Intervals for Learned Cardinality Estimation: An Experimental Evaluation. 2022 IEEE 38th International Conference on Data Engineering (ICDE), pages 3051-3064. Online publication date: May-2022.
https://doi.org/10.1109/ICDE53745.2022.00274

Pennino, D., Pizzonia, M. and Papi, A. (2019). Overlay Indexes: Efficiently Supporting Aggregate Range Queries and Authenticated Data Structures in Off-the-Shelf Databases. IEEE Access, 7, pages 175642-175670. Online publication date: 2019.
https://doi.org/10.1109/ACCESS.2019.2957346

Li, K. and Li, G. (2018). Approximate Query Processing: What is New and Where to Go? Data Science and Engineering, 3(4), pages 379-397. Online publication date: 14-Sep-2018.
https://doi.org/10.1007/s41019-018-0074-4
