Abstract
Scientific workflows have gained great momentum in recent years due to their critical roles in e-Science and cyberinfrastructure applications. However, some tasks of a scientific workflow might fail during execution. A domain scientist might require a region of a scientific workflow to be “atomic”. Data provenance, which determines the source data that are used to produce a data item, is also essential to scientific workflows. In this paper, we propose: (i) an architecture for scientific workflow management systems that supports both provenance and atomicity; (ii) a dataflow-oriented atomicity model that supports the notions of commit and abort; and (iii) a dataflow-oriented provenance model that, in addition to supporting existing provenance graphs and queries, also supports queries related to atomicity and failure.
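To make the notions of commit, abort, and failure-related provenance queries concrete, the following is a minimal sketch and not the model proposed in the paper. It assumes a linear pipeline in which each task is a plain function of one data item. All names (`Record`, `ProvenanceLog`, `run_atomic`, `double`, `increment`, `fail`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Record:
    """One provenance entry: which task turned which input into which output."""
    task: str
    source: object
    result: object
    status: str = "pending"  # later set to "committed" or "aborted"


@dataclass
class ProvenanceLog:
    records: List[Record] = field(default_factory=list)

    def lineage(self, item: object) -> list:
        """Trace a committed data item back through the tasks that produced it."""
        chain = []
        current = item
        for rec in reversed(self.records):
            if rec.status == "committed" and rec.result == current:
                chain.append((rec.task, rec.source))
                current = rec.source
        return chain

    def aborted(self) -> List[Record]:
        """Failure-related query: records whose outputs were discarded."""
        return [r for r in self.records if r.status == "aborted"]


def run_atomic(tasks: List[Callable], item: object,
               log: ProvenanceLog) -> Optional[object]:
    """Run tasks as one atomic region: all outputs commit, or none do."""
    pending = []
    current = item
    try:
        for task in tasks:
            result = task(current)
            rec = Record(task.__name__, current, result)
            log.records.append(rec)
            pending.append(rec)
            current = result
    except Exception:
        # Abort: mark every intermediate output of the region as discarded.
        for rec in pending:
            rec.status = "aborted"
        return None
    # Commit: the region finished, so all of its outputs become visible.
    for rec in pending:
        rec.status = "committed"
    return current


# Hypothetical example tasks.
def double(x):
    return x * 2


def increment(x):
    return x + 1


def fail(x):
    raise RuntimeError("simulated task failure")


log = ProvenanceLog()
ok = run_atomic([double, increment], 3, log)   # region commits
bad = run_atomic([double, fail], 5, log)       # region aborts
```

In this sketch the provenance log keeps aborted records rather than deleting them, so the same structure can answer both a lineage query for a committed item and a query about which intermediate results were discarded by a failure.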
The first two authors contributed equally to this paper.
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Wang, L., Lu, S., Fei, X., Ram, J. (2007). A Dataflow-Oriented Atomicity and Provenance System for Pipelined Scientific Workflows. In: Shi, Y., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds) Computational Science – ICCS 2007. ICCS 2007. Lecture Notes in Computer Science, vol 4489. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72588-6_42
DOI: https://doi.org/10.1007/978-3-540-72588-6_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72587-9
Online ISBN: 978-3-540-72588-6
eBook Packages: Computer Science, Computer Science (R0)