Abstract
Semantic modeling approaches (e.g., conceptual models, controlled vocabularies, and ontologies) are increasingly being adopted to help address a number of challenges in scientific data management. While semantic information has played a considerable role within bioinformatics, semantic technologies can similarly benefit a wide range of other scientific disciplines. Here we focus on three main areas where modeling and semantics play an increasingly important role: scientific workflows, scientific data provenance, and observational data management. Applications in these areas span a number of disciplines and pose both challenges and new opportunities for conceptual modeling research and development. We provide a brief overview of each area, discuss the role that modeling plays within each, and present current research opportunities.
Acknowledgments
This work was supported in part by NSF grants #0743429, #0753144, and #1118088. The author would especially like to thank Bertram Ludäscher, Timothy McPhillips, Manish Kumar Anand, Mark Schildhauer, and Matthew Jones, whose collaborations over many years on scientific workflows, provenance, and observational data semantics were essential to the ideas presented in this article.
Cite this article
Bowers, S. Scientific Workflow, Provenance, and Data Modeling Challenges and Approaches. J Data Semant 1, 19–30 (2012). https://doi.org/10.1007/s13740-012-0004-y
DOI: https://doi.org/10.1007/s13740-012-0004-y