Workflow management for high volume supernova search

CR Aragon, KJ Runge - Proceedings of the 2009 ACM symposium on Applied Computing, 2009 - dl.acm.org
Observational astrophysics has recently become a data-intensive science after many decades of relative data poverty. As a result, many of the algorithms developed for processing astronomical data, although well established for low-volume data capture, do not scale well to today's high-volume sky surveys and transient searches. Specifically, problems may occur with data transfer, workflow management, efficient parallelization, and integration of legacy code. Observational astrophysics workflows present computational challenges unique in high performance computing, including 24/7 operations, time-critical processing, and very large numbers of relatively small data files which must all be processed and archived. We present a case study based on Sunfall, a distributed, parallel scientific workflow system we built for the Nearby Supernova Factory, the largest data-volume supernova search currently in existence. We describe innovative techniques for data transfer and workflow management, and discuss lessons learned in building a large-scale observational astrophysics workflow management system.