DOI: 10.1145/1645164.1645175
research-article

Composing and executing parallel data-flow graphs with shell pipes

Published: 16 November 2009

Abstract

In this paper we extend the concept of shell pipes to incorporate forks, joins, cycles, and key-value aggregation. These extensions enable the implementation of a class of data-flow computation with strong deterministic properties, and provide a simple yet powerful coordination layer for leveraging multi-language and legacy components for large-scale parallel computation. Concretely, this paper describes the design and implementation of the language extensions in the Bourne Again SHell (BASH), and examines the performance of the system using micro- and macro-benchmarks. The implemented system is shown to scale to thousands of processors, enabling high-throughput performance for millions of processing tasks on large commodity compute clusters.
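As a rough illustration of the fork, join, and key-value aggregation ideas named in the abstract, the sketch below builds a small word count from standard Bash and POSIX tools only. It does not use the extended pipe syntax the paper describes, which is not reproduced on this page; the function names `word_pairs`, `sum_by_key`, and `fork_join_wordcount` are hypothetical, and the temporary files stand in for the pipe connections a real data-flow shell would provide.

```shell
#!/usr/bin/env bash
# Sketch only: plain Bash, NOT the paper's extended pipe operators.
# All function names here are illustrative assumptions.

# Map step: emit one "word 1" pair per word read from stdin.
word_pairs() {
    tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]' | awk 'NF { print $1, 1 }'
}

# Key-value aggregation: sum the values of identical keys,
# then sort so the output order is deterministic.
sum_by_key() {
    awk '{ total[$1] += $2 } END { for (k in total) print k, total[k] }' | LC_ALL=C sort
}

# Fork the input into two branches, process them in parallel,
# then join the partial results with a final aggregation.
fork_join_wordcount() {
    local tmp
    tmp=$(mktemp -d) || return 1

    # Fork: odd-numbered lines go to branch a, even-numbered lines to branch b.
    awk -v a="$tmp/a" -v b="$tmp/b" 'NR % 2 { print > (a); next } { print > (b) }'
    touch "$tmp/a" "$tmp/b"   # make sure both branch files exist, even if empty

    # Each branch runs as a background pipeline.
    word_pairs < "$tmp/a" | sum_by_key > "$tmp/a.out" &
    word_pairs < "$tmp/b" | sum_by_key > "$tmp/b.out" &
    wait                      # join point: block until both branches finish

    # Join: merge the partial counts and aggregate once more.
    cat "$tmp/a.out" "$tmp/b.out" | sum_by_key
    rm -rf "$tmp"
}
```

For example, `printf 'a b\nb c\n' | fork_join_wordcount` prints `a 1`, `b 2`, and `c 1` on separate lines. The final sort is what makes the result independent of which branch finishes first, a small-scale analogue of the deterministic behaviour the abstract claims for its data-flow model.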


Published In

WORKS '09: Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science
November 2009
136 pages
ISBN: 9781605587172
DOI: 10.1145/1645164
Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. coordination languages
2. data-flow processing
3. parallel processing

Qualifiers

• Research-article

Conference

SC '09

Acceptance Rates

Overall Acceptance Rate 30 of 54 submissions, 56%
