DOI: 10.1007/978-3-642-34222-6_12

Using domain-specific data to enhance scientific workflow steering queries

Published: 19 June 2012

Abstract

In scientific workflows, provenance data helps scientists understand, evaluate, and reproduce their results. Provenance data generated at runtime can also support workflow steering mechanisms. Providing steering facilities for workflows is considered a challenge because of the dynamic demands that arise during execution. To steer, for example, scientists should be able to suspend (or stop) a workflow execution when the approximate solution meets (or deviates from) preset criteria. These criteria are commonly evaluated using both provenance data (execution data) and domain-specific data. We claim that the final decision on whether to interfere with the workflow execution may only become feasible when scientists can steer workflows using provenance data enriched with domain-specific data. In this paper we propose an approach based on specialized software components, named Data Extractors (DE), that acquire domain-specific data from the data files produced during a scientific workflow execution. A DE gathers domain-specific data from the produced data files and associates it with the existing provenance data in the provenance repository. We have evaluated the proposed approach using a real bioinformatics workflow for comparative genomics executed in the SciCumulus parallel cloud workflow engine.
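
The idea of a Data Extractor can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper and does not use the SciCumulus provenance schema: the table names (`activation`, `domain_data`), the `key=value` file format, the field name `e_value`, and all function names are assumptions made only for illustration. The sketch parses domain-specific values from a produced file, stores them next to the execution records, and then runs a steering query that combines both kinds of data.

```python
import sqlite3


def extract_domain_data(text):
    """Parse 'key=value' lines (a hypothetical file format) into a dict of floats."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        try:
            values[key.strip()] = float(raw)
        except ValueError:
            continue  # this sketch only keeps numeric fields
    return values


def store_domain_data(conn, activation_id, values):
    """Associate extracted values with an existing provenance record."""
    conn.executemany(
        "INSERT INTO domain_data (activation_id, name, value) VALUES (?, ?, ?)",
        [(activation_id, name, value) for name, value in values.items()],
    )


def activations_to_review(conn, name, threshold):
    """Steering query: activations whose domain value exceeds a preset criterion."""
    rows = conn.execute(
        """
        SELECT a.id
        FROM activation AS a
        JOIN domain_data AS d ON d.activation_id = a.id
        WHERE d.name = ? AND d.value > ?
        ORDER BY a.id
        """,
        (name, threshold),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical, simplified provenance repository held in memory.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE activation (id INTEGER PRIMARY KEY, activity TEXT, status TEXT);
    CREATE TABLE domain_data (
        activation_id INTEGER REFERENCES activation(id),
        name TEXT,
        value REAL
    );
    """
)
conn.execute("INSERT INTO activation VALUES (1, 'alignment', 'FINISHED')")
conn.execute("INSERT INTO activation VALUES (2, 'alignment', 'FINISHED')")

# Contents of two produced data files, inlined here to keep the sketch self-contained.
store_domain_data(conn, 1, extract_domain_data("e_value=1e-30\nscore=812.0\n"))
store_domain_data(conn, 2, extract_domain_data("e_value=0.5\nscore=21.0\n"))

flagged = activations_to_review(conn, "e_value", 1e-5)
print(flagged)
```

Keeping the extracted values in a table keyed by the execution record means a steering criterion can be expressed as an ordinary join between execution data and domain-specific data, which is the kind of enriched query the abstract argues for.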


Published In

IPAW'12: Proceedings of the 4th International Conference on Provenance and Annotation of Data and Processes
June 2012, 253 pages
ISBN: 9783642342219
Editors: Paul Groth, James Frew
Publisher: Springer-Verlag, Berlin, Heidelberg
