Abstract
Cloud computing has become a widely used environment for database querying. In this context, the goal of a query optimizer is to satisfy the needs of tenants while maximizing the provider's benefit. Resource allocation is an important step toward achieving this goal. Allocation methods rely on analytical formulas and statistics collected from a catalog to estimate the cost of the possible allocations and then choose the best one. However, the allocation chosen initially is not necessarily optimal, because the analytical formulas are approximate and the catalog may be out of date. To address this problem, existing methods collect statistics during query execution and trigger a re-allocation if suboptimality is detected. However, these proposals assume that all queries have the same level of priority. Unlike existing work, we propose in this paper a method for statistics-collector placement and resource re-allocation that takes into account that the cloud is a multi-tenant environment in which queries have different service-level agreements. In the experimental section, we show that our method provides a better benefit for the provider than state-of-the-art methods.
Notes
In practice, there may be an additional waiting time if other queries share the same resources. If so, this waiting time must be included in \(ET_{stat}(q)\).
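As an illustration only (this decomposition and the names \(T_{wait}\) and \(T_{exec\_stat}\) are not taken from the paper), the estimate could then be written as
\[ ET_{stat}(q) \;=\; T_{wait}(q) + T_{exec\_stat}(q), \]
where \(T_{wait}(q)\) denotes the time query \(q\) spends waiting for the shared resources and \(T_{exec\_stat}(q)\) the time needed, once execution starts, to reach the point where the statistics are collected.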
Acknowledgements
This work is supported in part by the Ministries of Europe and Foreign Affairs (MEFA) and of Higher Education, Research and Innovation (MHERI), Amadeus Program 2020 (French-Austrian Hubert Curien Partnership—PHC) (Grant Number 44086TD).
Appendix A: Computation of \(T_{op}\)
In this appendix, we present the computation details of \(T_{op}(s)\) for the stages of the query of Figure 1. For stages that contain joins or aggregations, we consider the cases where a one-pass or a two-pass algorithm is applied. The M1\(\rightarrow \)M2 and M2\(\rightarrow \)M3 communications are of broadcast type, while the M3\(\rightarrow \)R1 and R1\(\rightarrow \)R2 communications are of partitioning type. An illustrative sketch of how the per-step costs are accumulated is given after the stage descriptions below.
The stage M1: (1) read \(R_{M1}\) from DFS, (2) apply the filter, (3) apply the projection.
The stage M2 (one-pass join): (1) read \(R_{M2}\) from local disk, (2) execute the build phase, (3) read \(S_{M2}\) from DFS, (4) apply the filter, (5) execute the probe phase.
The stage M2 (two-pass join): (1) read \(R_{M2}\) from local disk, (2) partition \(R_{M2}\) for the two-pass join, (3) write \(R_{M2}\) on local disk, (4) read \(S_{M2}\) from DFS, (5) apply the filter, (6) partition \(S_{M2}\) for the two-pass join, (7) write \(S_{M2}\) on local disk, (8) read \(R_{M2}\) from local disk, (9) execute the build phase, (10) read \(S_{M2}\) from local disk, (11) execute the probe phase.
The stage M3 (one-pass join): (1) read \(R_{M3}\) from local disk, (2) execute the build phase, (3) read \(S_{M3}\) from DFS, (4) apply the filter, (5) apply the projection, (6) execute the probe phase, (7) partition the data for the aggregation, (8) write the data on local disk, (9) read the data from local disk, (10) execute the aggregation operator.
The stage M3 (two-pass join): (1) read \(R_{M3}\) from local disk, (2) partition \(R_{M3}\) for the two-pass join, (3) write \(R_{M3}\) on local disk, (4) read \(S_{M3}\) from DFS, (5) apply the filter, (6) apply the projection, (7) partition \(S_{M3}\) for the two-pass join, (8) write \(S_{M3}\) on local disk, (9) read \(R_{M3}\) from local disk, (10) execute the build phase, (11) read \(S_{M3}\) from local disk, (12) execute the probe phase, (13) partition the data for the aggregation, (14) write the data on local disk, (15) read the data from local disk, (16) execute the aggregation operator.
The stage R1 (one-pass aggregation): (1) read \(R_{R1}\) from local disk, (2) execute the aggregation operator.
The stage R1 (two-pass aggregation): (1) read \(R_{R1}\) from local disk, (2) partition the data for the aggregation, (3) write the data on local disk, (4) read the data from local disk, (5) execute the aggregation operator.
The stage R2: (1) read the data from local disk, (2) write the data on DFS.
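To make the accumulation concrete, the following is a minimal sketch, in Python, of how \(T_{op}(s)\) can be obtained by summing the costs of the steps listed above, shown here for stages M1 and M2. It is an illustration only: the throughput constants, data volumes, memory threshold and the linear per-step cost model are hypothetical assumptions and do not reproduce the paper's cost formulas.

```python
# Illustrative sketch (not the paper's cost model): T_op(s) is the sum of the
# costs of the elementary steps of stage s. All constants and volumes below
# are hypothetical.

DFS_READ_MB_S = 100.0   # assumed DFS read throughput (MB/s)
DISK_IO_MB_S = 150.0    # assumed local-disk read/write throughput (MB/s)
CPU_MB_S = 400.0        # assumed CPU rate (MB/s) for filter, projection,
                        # partition, build, probe and aggregation steps

RATES = {
    "read_dfs": DFS_READ_MB_S,
    "read_local": DISK_IO_MB_S,
    "write_local": DISK_IO_MB_S,
    "cpu": CPU_MB_S,
}

def t_op(steps):
    """T_op(s): total time of stage s, i.e. the sum of its per-step costs."""
    return sum(volume_mb / RATES[kind] for kind, volume_mb in steps)

# Stage M1: (1) read R_M1 from DFS, (2) apply the filter, (3) apply the projection.
R_M1 = 512.0  # hypothetical size in MB
stage_m1 = [("read_dfs", R_M1), ("cpu", R_M1), ("cpu", R_M1)]

# Stage M2: one-pass join if the build relation R_M2 fits in the memory allocated
# to the stage, otherwise the two-pass variant with its partition/write/read round.
R_M2, S_M2, MEM_MB = 300.0, 800.0, 256.0  # hypothetical sizes
if R_M2 <= MEM_MB:   # one-pass: read R, build, read S, filter, probe
    stage_m2 = [("read_local", R_M2), ("cpu", R_M2),
                ("read_dfs", S_M2), ("cpu", S_M2), ("cpu", S_M2)]
else:                # two-pass: partition both inputs to local disk first
    stage_m2 = [("read_local", R_M2), ("cpu", R_M2), ("write_local", R_M2),
                ("read_dfs", S_M2), ("cpu", S_M2), ("cpu", S_M2), ("write_local", S_M2),
                ("read_local", R_M2), ("cpu", R_M2),   # build
                ("read_local", S_M2), ("cpu", S_M2)]   # probe

print(f"T_op(M1) = {t_op(stage_m1):.2f} s, T_op(M2) = {t_op(stage_m2):.2f} s")
```

The one-pass/two-pass choice is modeled here simply by comparing the size of the build relation with the memory assumed to be available to the stage, which mirrors the distinction made in the stage descriptions above.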
About this article
Cite this article
Kandi, M.M., Yin, S. & Hameurlain, A. SLA-driven resource re-allocation for SQL-like queries in the cloud. Knowl Inf Syst 62, 4653–4680 (2020). https://doi.org/10.1007/s10115-020-01501-z