Personal Workspace for Large-Scale Data-Driven Computational Experiment
Pages 112 - 119
Abstract
As the scale and complexity of data-driven computational science grow, so does the burden on scientists and students of managing the data products used and generated during experiments. Products must be moved and directories created. Search support in traditional file systems is arcane. While storage management tools can store rich metadata, these tools do not address the particular needs of the individual computational science researcher working alone or cooperatively. We have developed a personal workspace tool, myLEAD, that actively manages metadata and data products for users. Inspired by the Globus MCS metadata catalog and layered on top of the UK e-Science OGSA-DAI tool, myLEAD provides capture, storage, and search tools to the computational scientist. In this paper we experimentally evaluate the performance of the myLEAD metadata catalog.
Published In
September 2006
382 pages
ISBN: 142440343X
Publisher
IEEE Computer Society
United States
Publication History
Published: 28 September 2006
Qualifiers
- Article