Abstract
Cloud computing has become a widely used environment for database querying. In this context, the goal of a query optimizer is to satisfy the needs of tenants while maximizing the provider's benefit. Resource allocation is an important step toward achieving this goal. Allocation methods rely on analytical formulas and statistics collected from a catalog to estimate the cost of the possible allocations and then choose the best one. However, the allocation chosen initially is not necessarily optimal, because the analytical formulas are approximate and the catalog may be out of date. To address this problem, existing methods collect statistics during query execution and trigger a re-allocation if suboptimality is detected. However, these proposals assume that all queries have the same level of priority. Unlike existing work, we propose in this paper a method for statistics-collector placement and resource re-allocation that takes into account that the cloud is a multi-tenant environment in which queries have different service-level agreements. In the experimental section, we show that our method provides a better benefit for the provider than state-of-the-art methods.
Notes
In practice, there may be an additional waiting time if other queries share the same resources. If so, this waiting time must be included in \(ET_{stat}(q)\).
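As an illustration only (this decomposition and the names \(T_{wait}\) and \(T_{exec\_stat}\) are not taken from the paper), the estimate could then be written as
\[ ET_{stat}(q) \;=\; T_{wait}(q) + T_{exec\_stat}(q), \]
where \(T_{wait}(q)\) denotes the time query \(q\) spends waiting for the shared resources and \(T_{exec\_stat}(q)\) the time needed, once execution starts, to reach the point where the statistics are collected.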
Acknowledgements
This work is supported in part by the Ministries of Europe and Foreign Affairs (MEFA) and of Higher Education, Research and Innovation (MHERI), Amadeus Program 2020 (French-Austrian Hubert Curien Partnership—PHC) (Grant Number 44086TD).
Appendix A: Computation of \(T_{op}\)
In this appendix, we present the computation details of \(T_{op}(s)\) for the stages of the query of Figure 1. For stages that contain joins or aggregations, we consider the cases where a one-pass or a two-pass algorithm is applied. The M1\(\rightarrow \)M2 and M2\(\rightarrow \)M3 communications are of broadcast type, while the M3\(\rightarrow \)R1 and R1\(\rightarrow \)R2 communications are of partitioning type. An illustrative sketch of how the per-step costs are accumulated is given after the stage descriptions below.
The stage M1: (1) read \(R_{M1}\) from DFS, (2) apply the filter, (3) apply the projection.
The stage M2 (one-pass join): (1) read \(R_{M2}\) from local disk, (2) execute the build phase, (3) read \(S_{M2}\) from DFS, (4) apply the filter, (5) execute the probe phase.
The stage M2 (two-pass join): (1) read \(R_{M2}\) from local disk, (2) partition \(R_{M2}\) for the two-pass join, (3) write \(R_{M2}\) on local disk, (4) read \(S_{M2}\) from DFS, (5) apply the filter, (6) partition \(S_{M2}\) for the two-pass join, (7) write \(S_{M2}\) on local disk, (8) read \(R_{M2}\) from local disk, (9) execute the build phase, (10) read \(S_{M2}\) from local disk, (11) execute the probe phase.
The stage M3 (one-pass join): (1) read \(R_{M3}\) from local disk, (2) execute the build phase, (3) read \(S_{M3}\) from DFS, (4) apply the filter, (5) apply the projection, (6) execute the probe phase, (7) partition the data for the aggregation, (8) write the data on local disk, (9) read the data from local disk, (10) execute the aggregation operator.
The stage M3 (two-pass join): (1) read \(R_{M3}\) from local disk, (2) partition \(R_{M3}\) for the two-pass join, (3) write \(R_{M3}\) on local disk, (4) read \(S_{M3}\) from DFS, (5) apply the filter, (6) apply the projection, (7) partition \(S_{M3}\) for the two-pass join, (8) write \(S_{M3}\) on local disk, (9) read \(R_{M3}\) from local disk, (10) execute the build phase, (11) read \(S_{M3}\) from local disk, (12) execute the probe phase, (13) partition the data for the aggregation, (14) write the data on local disk, (15) read the data from local disk, (16) execute the aggregation operator.
The stage R1 (one-pass aggregation): (1) read \(R_{R1}\) from local disk, (2) execute the aggregation operator.
The stage R1 (two-pass aggregation): (1) read \(R_{R1}\) from local disk, (2) partition the data for the aggregation, (3) write the data on local disk, (4) read the data from local disk, (5) execute the aggregation operator.
The stage R2: (1) read the data from local disk, (2) write the data on DFS.
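To make the accumulation concrete, the following is a minimal sketch, in Python, of how \(T_{op}(s)\) can be obtained by summing the costs of the steps listed above, shown here for stages M1 and M2. It is an illustration only: the throughput constants, data volumes, memory threshold and the linear per-step cost model are hypothetical assumptions and do not reproduce the paper's cost formulas.

```python
# Illustrative sketch (not the paper's cost model): T_op(s) is the sum of the
# costs of the elementary steps of stage s. All constants and volumes below
# are hypothetical.

DFS_READ_MB_S = 100.0   # assumed DFS read throughput (MB/s)
DISK_IO_MB_S = 150.0    # assumed local-disk read/write throughput (MB/s)
CPU_MB_S = 400.0        # assumed CPU rate (MB/s) for filter, projection,
                        # partition, build, probe and aggregation steps

RATES = {
    "read_dfs": DFS_READ_MB_S,
    "read_local": DISK_IO_MB_S,
    "write_local": DISK_IO_MB_S,
    "cpu": CPU_MB_S,
}

def t_op(steps):
    """T_op(s): total time of stage s, i.e. the sum of its per-step costs."""
    return sum(volume_mb / RATES[kind] for kind, volume_mb in steps)

# Stage M1: (1) read R_M1 from DFS, (2) apply the filter, (3) apply the projection.
R_M1 = 512.0  # hypothetical size in MB
stage_m1 = [("read_dfs", R_M1), ("cpu", R_M1), ("cpu", R_M1)]

# Stage M2: one-pass join if the build relation R_M2 fits in the memory allocated
# to the stage, otherwise the two-pass variant with its partition/write/read round.
R_M2, S_M2, MEM_MB = 300.0, 800.0, 256.0  # hypothetical sizes
if R_M2 <= MEM_MB:   # one-pass: read R, build, read S, filter, probe
    stage_m2 = [("read_local", R_M2), ("cpu", R_M2),
                ("read_dfs", S_M2), ("cpu", S_M2), ("cpu", S_M2)]
else:                # two-pass: partition both inputs to local disk first
    stage_m2 = [("read_local", R_M2), ("cpu", R_M2), ("write_local", R_M2),
                ("read_dfs", S_M2), ("cpu", S_M2), ("cpu", S_M2), ("write_local", S_M2),
                ("read_local", R_M2), ("cpu", R_M2),   # build
                ("read_local", S_M2), ("cpu", S_M2)]   # probe

print(f"T_op(M1) = {t_op(stage_m1):.2f} s, T_op(M2) = {t_op(stage_m2):.2f} s")
```

The one-pass/two-pass choice is modeled here simply by comparing the size of the build relation with the memory assumed to be available to the stage, which mirrors the distinction made in the stage descriptions above.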
About this article
Cite this article
Kandi, M.M., Yin, S. & Hameurlain, A. SLA-driven resource re-allocation for SQL-like queries in the cloud. Knowl Inf Syst 62, 4653–4680 (2020). https://doi.org/10.1007/s10115-020-01501-z