Abstract
We present GPFlow, a novel, web-accessible scientific workflow system that makes large-scale comparative studies possible without programming or excessive configuration. GPFlow allows a workflow defined on single input values to be lifted automatically to operate over collections of input values. It also supports the formation and processing of collections without explicit iteration constructs. We introduce a new model for collection processing based on key aggregation and slicing. This model guarantees processing integrity and facilitates the automatic association of inputs, allowing scientific users to manage the combinatorial explosion of data values inherent in large-scale comparative studies. The approach is demonstrated on a core task from comparative genomics and builds on our previous work supporting combined interactive and batch operation through a lightweight web-based user interface.
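The abstract does not describe GPFlow's internal data model, so the following is only a hypothetical Python sketch of the three ideas it names: lifting a single-value function over keyed collections, automatic association of inputs by key, and key aggregation and slicing. It assumes that each collection is a dict mapping a key, here a sorted tuple of (dimension, label) pairs, to a single value. The names `make_key`, `lift`, `aggregate` and `slice_` are illustrative inventions, not GPFlow's API. The string-counting function also stands in for a real comparative genomics tool such as a BLAST search.

```python
from itertools import product
from typing import Any, Callable, Dict, Tuple

# Hypothetical representation (not GPFlow's): a key is a sorted tuple of
# (dimension, label) pairs, and a collection maps keys to single values.
Key = Tuple[Tuple[str, str], ...]
Collection = Dict[Key, Any]


def make_key(**dims: str) -> Key:
    """Build a canonical key; sorting makes equal dimension sets compare equal."""
    return tuple(sorted(dims.items()))


def lift(fn: Callable[..., Any]) -> Callable[..., Collection]:
    """Lift fn, written for single values, to operate over keyed collections.

    Association is automatic: entries are combined only if they agree on
    every dimension they share, while dimensions that appear in only one
    input are crossed (Cartesian product). No explicit loop is needed in fn.
    """
    def lifted(*collections: Collection) -> Collection:
        result: Collection = {}
        for entries in product(*(c.items() for c in collections)):
            merged: Dict[str, str] = {}
            consistent = True
            for key, _ in entries:
                for dim, label in key:
                    # setdefault returns the existing label if dim was seen before
                    if merged.setdefault(dim, label) != label:
                        consistent = False
            if consistent:
                result[tuple(sorted(merged.items()))] = fn(*(v for _, v in entries))
        return result
    return lifted


def aggregate(coll: Collection, dim: str) -> Collection:
    """Key aggregation: drop dimension `dim` from every key and group the
    values that share the remaining key into a dict indexed by the dropped label."""
    groups: Collection = {}
    for key, value in coll.items():
        rest = tuple((d, l) for d, l in key if d != dim)
        groups.setdefault(rest, {})[dict(key)[dim]] = value
    return groups


def slice_(coll: Collection, dim: str, label: str) -> Collection:
    """Slicing: keep entries whose key has dim == label, removing that dimension."""
    return {
        tuple((d, l) for d, l in key if d != dim): value
        for key, value in coll.items()
        if dict(key).get(dim) == label
    }


# Toy data: two query motifs and two genomes (illustrative values only).
queries = {make_key(gene="g1"): "AT", make_key(gene="g2"): "GC"}
genomes = {make_key(species="s1"): "ATATGC", make_key(species="s2"): "GCGC"}

# A function on one query and one genome, lifted to all 2 x 2 combinations.
count_hits = lift(lambda q, g: g.count(q))
hits = count_hits(queries, genomes)

# Aggregate over species: one dict of per-species counts for each gene.
by_gene = aggregate(hits, "species")
print(by_gene[make_key(gene="g1")])  # {'s1': 2, 's2': 0}

# Slice out gene g2, and associate hits with genomes by the shared
# "species" dimension: this yields 4 results, not 4 x 2 = 8.
g2_hits = slice_(hits, "gene", "g2")
per_base = lift(lambda h, g: h / len(g))(hits, genomes)
```

Keying every value by named dimensions is what lets the association step decide, without user-written loops, which inputs belong together. Shared dimensions are joined, and independent ones are crossed. This is one plausible way to keep the combinatorial growth of results explicit and traceable.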
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Buckingham, L., Hogan, J.M., Roe, P., Sumitomo, J., Towsey, M. (2008). Comparative Studies Simplified in GPFlow. In: Bubak, M., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds) Computational Science – ICCS 2008. ICCS 2008. Lecture Notes in Computer Science, vol 5103. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69389-5_56
DOI: https://doi.org/10.1007/978-3-540-69389-5_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69388-8
Online ISBN: 978-3-540-69389-5
eBook Packages: Computer Science, Computer Science (R0)