Abstract
Data-intensive, long-running applications structured as workflows are increasingly being dispatched to cloud computing systems. Current scheduling approaches for dependency graphs fail to deliver high resource efficiency while keeping computation costs low, especially for continuous data processing workflows, where the scheduler does not reason about the impact that new input data may have on the workflow's final output. To address this challenge, we introduce a new scheduling criterion, Quality-of-Data (QoD), which describes the requirements that data must meet for the triggering of workflow tasks to be worthwhile. Based on the QoD notion, we propose a novel service-oriented scheduler planner for continuous data processing workflows that is capable of enforcing QoD constraints and guiding the scheduling to attain resource efficiency, overall controlled performance, and task prioritization. To contrast the advantages of our scheduling model against others, we developed WaaS (Workflow-as-a-Service), a workflow coordinator system for the Cloud in which data is shared among tasks via a cloud columnar database.
This work was partially supported by national funds through FCT - Fundação para a Ciência e a Tecnologia, under projects PEst-OE/EEI/LA0021/2013, PTDC/EIA-EIA/113613/2009.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Esteves, S., Veiga, L. (2014). Planning and Scheduling Data Processing Workflows in the Cloud with Quality-of-Data Constraints. In: Lomuscio, A.R., Nepal, S., Patrizi, F., Benatallah, B., Brandić, I. (eds) Service-Oriented Computing – ICSOC 2013 Workshops. ICSOC 2013. Lecture Notes in Computer Science, vol 8377. Springer, Cham. https://doi.org/10.1007/978-3-319-06859-6_29
DOI: https://doi.org/10.1007/978-3-319-06859-6_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06858-9
Online ISBN: 978-3-319-06859-6
eBook Packages: Computer Science, Computer Science (R0)