Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2803140.2803148acmotherconferencesArticle/Chapter ViewAbstractPublication PagesimdmConference Proceedingsconference-collections
research-article

Query Optimization Time: The New Bottleneck in Real-time Analytics

Published: 31 August 2015 Publication History

Abstract

In the recent past, in-memory distributed database management systems have become increasingly popular to manage and query huge amounts of data. For an in-memory distributed database like MemSQL, it is imperative that the analytical queries run fast. A huge proportion of MemSQL's customer workloads have ad-hoc analytical queries that need to finish execution within a second or a few seconds. This leaves us with very little time to perform query optimization for complex queries involving several joins, aggregations, sub-queries etc. Even for queries that are not ad-hoc, a change in data statistics can trigger query re-optimization. Query Optimization, if not done intelligently, could very well be the bottleneck for such complex analytical queries that require real-time response. In this paper, we outline some of the early steps that we have taken to reduce the query optimization time without sacrificing plan quality. We optimized the Enumerator (the optimizer component that determines operator order), which takes up bulk of the optimization time. Generating bushy plans inside the Enumerator can be a bottleneck and so we used heuristics to generate bushy plans via query rewrite. We also implemented new distribution aware greedy heuristics to generate a good starting candidate plan that significantly prunes out states during search space analysis inside the Enumerator. We demonstrate the effectiveness of these techniques over several queries in TPC-H and TPC-DS benchmarks.

References

[1]
P. Selinger et al "Access Path Selection in a Database Management System", Proc. ACM SIGMOD, 1979.
[2]
R. Othayoth et al., "The Making of TPC-DS", Proc. of the 32nd International Conf. on VLDB, Seoul, S. Korea, 2006.
[3]
K. Ono and G. M. Lohman, "Measuring the Complexity of Join Enumeration in Query Optimization", Proc. Of 16th VLDB, Conf., Brisbane, Australia, 1990.
[4]
G. Moerkotte and W. Scheufele, "Constructing Optimal Bushy Processing Trees for Join Queries is NP-hard", Technical Report, University of Mannheim 1996.
[5]
R. Ahmed, R Sen, M Poess, S Chakkappen "Of Snowstorms and Bushy Trees", Proc. of the 40th VLDB Conference, Hangzhao, China, 2014.
[6]
S. Shankar et al. Query Optimization in Microsoft SQL Server PDW. In SIGMOD, 2012
[7]
M A. Soliman et al.: Orca: a modular query optimizer architecture for big data. SIGMOD Conference 2014
[8]
Lamb et al.: The Vertica Analytic Database: C-Store 7 Years Later. PVLDB 5(12): 1790--1801 (2012)
[9]
Farber et al. SAP HANA database: data management for modern business applications. SIGMOD Record, 2011
[10]
Teradata. http://www.teradata.com/Teradata-Aster-Database
[11]
M. Singh and B. Leonhardi. Introduction to the IBM Netezza Warehouse Appliance. In CASCON, 2011.
[12]
R. Weiss. A Technical Overview of the Oracle Exadata Database Machine and Exadata Storage Server, 2012.
[13]
MemSQL. www.memsql.com
[14]
MemSQL Two-Tiered Architecture. http://docs.memsql.com/docs/latest/intro.html#two-tiered-architecture
[15]
Amazon EC2. http://aws.amazon.com/ec2/
[16]
VectorWise. http://www.actian.com
[17]
SAP HANA. http://hana.sap.com/abouthana.html
[18]
P. Seshadri, H. Pirahesh, T.Y. Leung. "Complex Query Decorrelation", In ICDE, 1996
[19]
Bruno et al. Polynomial Heuristics for Query Optimization. In ICDE, 2010.
[20]
A. Swami. "Optimization of large join queries: combining heuristics and combinatorial techniques". In SIGMOD 1989
[21]
L. Fegaras. "A new heuristic for optimizing large queries". In DEXA, 1998
[22]
F Waas, J Hellerstein. "Parallelizing Extensible Query Optimizers". In SIGMOD 2009.
[23]
Han et al. "Parallelizing Query Optimization". VLDB 2008.
[24]
Heimel et al. "A First Step Towards GPU-assisted Query Optimization". In ADMS, 2012

Cited By

View all
  • (2019)Communication-Efficient Data Aggregation Tree Construction for Complex Queries in IoT ApplicationsIEEE Internet of Things Journal10.1109/JIOT.2018.28828206:2(3352-3363)Online publication date: Apr-2019
  • (2017)Access Path Selection in Main-Memory Optimized Data SystemsProceedings of the 2017 ACM International Conference on Management of Data10.1145/3035918.3064049(715-730)Online publication date: 9-May-2017
  • (2016)The MemSQL query optimizerProceedings of the VLDB Endowment10.14778/3007263.30072779:13(1401-1412)Online publication date: 1-Sep-2016

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
IMDM '15: Proceedings of the 3rd VLDB Workshop on In-Memory Data Mangement and Analytics
August 2015
63 pages
ISBN:9781450337137
DOI:10.1145/2803140
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

  • SAMSUNG: SAMSUNG
  • VLDB Endowment: Very Large Database Endowment
  • Microsoft: Microsoft

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 August 2015

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IMDM '15

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)19
  • Downloads (Last 6 weeks)0
Reflects downloads up to 25 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2019)Communication-Efficient Data Aggregation Tree Construction for Complex Queries in IoT ApplicationsIEEE Internet of Things Journal10.1109/JIOT.2018.28828206:2(3352-3363)Online publication date: Apr-2019
  • (2017)Access Path Selection in Main-Memory Optimized Data SystemsProceedings of the 2017 ACM International Conference on Management of Data10.1145/3035918.3064049(715-730)Online publication date: 9-May-2017
  • (2016)The MemSQL query optimizerProceedings of the VLDB Endowment10.14778/3007263.30072779:13(1401-1412)Online publication date: 1-Sep-2016

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media