DOI: 10.1109/DISCS.2014.12

dispel4py: a Python framework for data-intensive scientific computing

Published: 16 November 2014

Abstract

This paper presents dispel4py, a new Python framework for describing abstract stream-based workflows for distributed data-intensive applications. The main aim of dispel4py is to enable scientists to focus on their computation instead of being distracted by details of the computing infrastructure they use. Therefore, special care has been taken to provide dispel4py with the ability to map abstract workflows to different enactment platforms dynamically, at run time. In this work we present four dispel4py mappings: Apache Storm, MPI, multi-threading and sequential. The results show that dispel4py is successful in enacting on different platforms, while also providing scalable performance.
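
The central idea of the abstract, one workflow description enacted by interchangeable mappings chosen at run time, can be illustrated with a minimal sketch. This is a plain-Python toy and not the dispel4py API: the names `make_workflow`, `run_sequential`, `run_threaded`, and `enact` are hypothetical, and only two of the four mappings (sequential and multi-threading) are imitated, using the standard library.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy model, NOT the dispel4py API: a workflow is an ordered
# list of processing steps, each a function from one stream item to one item.
def make_workflow():
    return [lambda x: x * x, lambda x: x + 1]

def apply_steps(steps, item):
    """Push a single stream item through every step in order."""
    for step in steps:
        item = step(item)
    return item

def run_sequential(steps, stream):
    """Toy 'sequential' mapping: process the stream in one thread."""
    return [apply_steps(steps, item) for item in stream]

def run_threaded(steps, stream, workers=4):
    """Toy 'multi-threading' mapping: process items in a thread pool.

    pool.map returns results in input order, so both mappings agree.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda item: apply_steps(steps, item), stream))

# The mapping is looked up by name at run time; the workflow is unchanged.
MAPPINGS = {"sequential": run_sequential, "threaded": run_threaded}

def enact(steps, stream, mapping="sequential"):
    return MAPPINGS[mapping](steps, stream)
```

In this sketch the workflow definition never mentions threads or processes; calling `enact(make_workflow(), [1, 2, 3], "threaded")` instead of `"sequential"` changes only how the same steps are executed.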

Published In

DISCS '14: Proceedings of the 2014 International Workshop on Data Intensive Scalable Computing Systems
November 2014
86 pages
ISBN: 9781479970384

Publisher

IEEE Press

Author Tags

  1. Python
  2. data streaming
  3. data-intensive computing
  4. e-infrastructures
  5. programming frameworks
  6. scientific workflows

Qualifiers

  • Research-article

Conference

SC '14

Acceptance Rates

Overall Acceptance Rate 19 of 34 submissions, 56%

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0

Reflects downloads up to 20 Nov 2024

Cited By

  • (2021) Dr.Aid: Supporting Data-governance Rule Compliance for Decentralized Collaboration in an Automated Way. Proceedings of the ACM on Human-Computer Interaction 5:CSCW2 (1-43). DOI: 10.1145/3479604. Online publication date: 18-Oct-2021
  • (2019) A Survey on Collecting, Managing, and Analyzing Provenance from Scripts. ACM Computing Surveys 52:3 (1-38). DOI: 10.1145/3311955. Online publication date: 18-Jun-2019
  • (2016) Asterism. Proceedings of the 7th International Workshop on Data-Intensive Computing in the Cloud (1-8). DOI: 10.5555/3018100.3018101. Online publication date: 13-Nov-2016
  • (2016) Web Services as Building Blocks for Science Gateways in Astrophysics. Journal of Grid Computing 14:4 (673-685). DOI: 10.1007/s10723-016-9382-y. Online publication date: 1-Dec-2016
  • (2015) dispel4py. Proceedings of the 5th Workshop on Python for High-Performance and Scientific Computing (1-10). DOI: 10.1145/2835857.2835863. Online publication date: 15-Nov-2015
  • (2014) Workflows in a dashboard. Proceedings of the 9th Workshop on Workflows in Support of Large-Scale Science (82-93). DOI: 10.1109/WORKS.2014.6. Online publication date: 16-Nov-2014
