Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2983323.2983353acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

Approximate Aggregates in Oracle 12C

Published: 24 October 2016 Publication History

Abstract

New generation of analytic applications emerged to process data generated from non conventional sources. The challenge for the traditional database systems is that the data sets are very large and keep increasing at a very high rate while the application users have higher performance expectations. The most straightforward response to this challenge is to deploy larger hardware configurations making the solution very expensive and not acceptable for most cases. Alternative solutions fall into two categories: reduce the data set using sampling techniques or reduce the computational complexity of expensive database operations by using alternative algorithms. Alternative algorithms considered in this paper are approximate aggregates that perform a lot better at the cost of reduced and tolerable accuracy. In Oracle 12C we introduced approximate aggregates of expensive aggregate functions that are very common in analytic applications, that is, approximate count distinct and approximate percentile. The performance is improved in two ways. First, the approximate aggregates use bounded memory, often eliminating the need to use temporary storage which results in significant performance improvement over the exact aggregates. Second, we provide materialized view support that allows users to store pre-computed results of approximate aggregates. These results can be rolled up to answer queries on different dimensions (such rollup is not possible for exact aggregates).

References

[1]
Acharya, S., Gibbons, P.B., Poosala, V. and Ramaswamy, S. The Aqua approximate query answering system. SIGMOD 1999, pages 574--576.
[2]
Chakkappen, S., Cruanes, T., Dageville, B., Jiang, L., Shaft, U., Su, H. and Zait, M. Efficient and scalable statistics gathering for large databases in Oracle. SIGMOD 2008, pages 1053--1064.
[3]
Flajolet, P., Fusy, E., Gandouet, O. and Meunier, F. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. Analysis of Algorithms, 2007, pages 127--146
[4]
Gibbons, P.B. Distinct sampling for highly-accurate answers to distinct value queries and event reports. VLDB 2001, 541--550.
[5]
Heule, S., Nunkesser, M. and Hall, A. Hyperloglog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm. EDBT/ICDT 2013.
[6]
Wang, L., Luo, G., Yi, K. and Cormode G. Quantiles over data streams: an experimental study. SIGMOD 2013, pages 737--748.
[7]
Cormode, G. and Muthukrishnan, S. An improved data stream summary: the count-min sketch and its applications. Journal of algorithms, 55(1):58--75, 2005.
[8]
Zhang, Q. and Wang, W. A fast algorithm for approximate quantiles in high speed data streams. SSDBM 2007.
[9]
Space-efficient online computation of quantile summaries. SIGMOD 2001.
[10]
Random sampling techniques for space efficient online computation of order statistics of large datasets. S. Manku, S. Rajagopalan and G. Lindsay. SIGMOD 1999.
[11]
Shrivastava, N., Buragohain, C., Agrawal, D. and Suri, S. Medians and beyond: new aggregation techniques for sensor networks. ACM SenSys, 2004.
[12]
Manku, G.S. and Motwani, R. Approximate frequency counts over data streams. VLDB 2002, 346--357.
[13]
Agarwal, S., Milner, H., Kleiner, A., Talwalkar, A., Jordan, M., Madden, S., Mozafari, B. and Stoica, I. Knowing when you are wrong: building fast and reliable approximate query processing Systems. SIGMOD 2014.
[14]
Metwally, A., Agrawal, D. and Abbadi A. E. Efficient computation of frequent and top-k elements in data streams. ICDT 2005.
[15]
Hellerstein, J. M., Haas, P. J. and Wang, H. J. Online aggregation. SIGMOD 1997.
[16]
Potti, N. and Patel, J. M. DAQ: A new paradigm for approximate query process. VLDB 2015 Vol. 8, No. 9.
[17]
Aagrwal, S. Mozafari, B., Panda, A., Milner, H., Madden, S. and Stoica, I. Blinkdb: queries with bounded errors and bounded response times on very large data. ACM EuroSys, 2013.
[18]
Greenwald, M. and Khanna, S. Space-efficient online computation of quantile summaries. SIGMOD 2001.
[19]
Zhang, Q. and Wang, W. A fast algorithm for approximate quantiles in high speed data streams. SSDBM 2007.
[20]
Bellamkonda, B., Li, H., Jagtap, U., Zhu, Y., Liang, V. and Cruanes, T. Adaptive and Big Data Scale Parallel Execution in Oracle. VLDB 2013.
[21]
Chan, L. Presto: Interacting with petabytes of data at Facebook. https://www.facebook.com/notes/facebook-engineering/presto-interacting-with-petabytes-of-data-at-facebook/10151786197628920/. 2013.
[22]
Rhodes, L. Fast, approximate analysis of big data. http://yahooeng.tumblr.com/post/135390948446/data-sketches. 2015.
[23]
Hellerstein, J.M., Re, C., Schoppmann, F. and Wang, Z. etc. The MADlib analytics library or MAD Skills, The SQL. Technical Report No. UCB/EECS-2012--38.
[24]
Open Source Release: postgresql-hll. 2014
[25]
Mukherjee, N., Chavan, S. and Colgan M. etc. Distributed Architecture of Oracle Database In-memory. VLDB 2015

Cited By

View all
  • (2023)Privacy-Preserving Record Linkage for Cardinality CountingProceedings of the 2023 ACM Asia Conference on Computer and Communications Security10.1145/3579856.3590338(53-64)Online publication date: 10-Jul-2023
  • (2022)Prediction Intervals for Learned Cardinality Estimation: An Experimental Evaluation2022 IEEE 38th International Conference on Data Engineering (ICDE)10.1109/ICDE53745.2022.00274(3051-3064)Online publication date: May-2022
  • (2019)Overlay Indexes: Efficiently Supporting Aggregate Range Queries and Authenticated Data Structures in Off-the-Shelf DatabasesIEEE Access10.1109/ACCESS.2019.29573467(175642-175670)Online publication date: 2019
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
October 2016
2566 pages
ISBN:9781450340731
DOI:10.1145/2983323
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 October 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. approximate query processing
  2. bigdata

Qualifiers

  • Research-article

Conference

CIKM'16
Sponsor:
CIKM'16: ACM Conference on Information and Knowledge Management
October 24 - 28, 2016
Indiana, Indianapolis, USA

Acceptance Rates

CIKM '16 Paper Acceptance Rate 160 of 701 submissions, 23%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Upcoming Conference

CIKM '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)9
  • Downloads (Last 6 weeks)1
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Privacy-Preserving Record Linkage for Cardinality CountingProceedings of the 2023 ACM Asia Conference on Computer and Communications Security10.1145/3579856.3590338(53-64)Online publication date: 10-Jul-2023
  • (2022)Prediction Intervals for Learned Cardinality Estimation: An Experimental Evaluation2022 IEEE 38th International Conference on Data Engineering (ICDE)10.1109/ICDE53745.2022.00274(3051-3064)Online publication date: May-2022
  • (2019)Overlay Indexes: Efficiently Supporting Aggregate Range Queries and Authenticated Data Structures in Off-the-Shelf DatabasesIEEE Access10.1109/ACCESS.2019.29573467(175642-175670)Online publication date: 2019
  • (2018)Approximate Query Processing: What is New and Where to Go?Data Science and Engineering10.1007/s41019-018-0074-43:4(379-397)Online publication date: 14-Sep-2018

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media