Research article · DOI: 10.1145/3149457.3154483 · HPCAsia Conference Proceedings

A Scalable Multi-Granular Data Model for Data Parallel Workflows

Published: 28 January 2018

Abstract

Scientific applications consist of many tasks, and each task has different requirements for its degree of parallelism and data access pattern. To satisfy these requirements, a task scheduler has to assign the required number of processes to each task, and each task's input has to be decomposed and distributed across those processes in a way that matches its data access pattern so as to exploit data locality. However, hand-writing such code is troublesome and error-prone. We propose a multi-view data model in which users specify rules for decomposing multi-dimensional data, changing the data layout across processes, and defining the unit of parallel processing with simple directives. Our framework performs data arrangement and affinity-aware task scheduling transparently to users by following the specified rules. Through a case study of a lattice QCD simulation program, we confirmed that our proposal reduces programming effort compared with hand-written MPI code, with performance penalties of at most 17%.
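The core idea in the abstract, decomposing a multi-dimensional array into blocks and mapping each block to a process according to a user-specified rule, can be illustrated with a small, purely hypothetical sketch. The function and rule names below are illustrative assumptions, not the paper's actual directive syntax, which is not shown on this page:

```python
# Hypothetical sketch of rule-driven data decomposition: split a 2-D
# array into tiles and assign each tile to a process. The multi-view
# model described in the abstract lets users state such rules as
# directives; here the "rule" is an explicit tile shape plus a
# round-robin owner mapping.

def decompose(shape, tile):
    """Split an array of the given shape into tile-sized blocks.

    Returns a list of (row_range, col_range) pairs, one per block,
    where each range is a half-open (start, stop) interval.
    """
    rows, cols = shape
    tr, tc = tile
    blocks = []
    for r in range(0, rows, tr):
        for c in range(0, cols, tc):
            blocks.append(((r, min(r + tr, rows)),
                           (c, min(c + tc, cols))))
    return blocks

def assign(blocks, nprocs):
    """Affinity rule (round-robin): block i is owned by process i % nprocs."""
    return {i: i % nprocs for i in range(len(blocks))}

if __name__ == "__main__":
    blocks = decompose((8, 8), (4, 4))  # four 4x4 tiles
    owners = assign(blocks, 2)          # spread over 2 processes
    print(blocks[0])                    # ((0, 4), (0, 4))
    print(owners)                       # {0: 0, 1: 1, 2: 0, 3: 1}
```

Changing the tile shape changes the data layout without touching the task code, which is the property the paper's directives aim to provide; a real implementation would additionally move the data to the owning MPI ranks.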


Cited By

  • (2020) Improved K-Means Clustering Algorithm for Big Data Mining under Hadoop Parallel Framework. Journal of Grid Computing 18(2), 239–250. DOI: 10.1007/s10723-019-09503-0 (online publication date: 1 June 2020)


        Published In

        HPCAsia '18: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region
        January 2018
        322 pages
ISBN: 9781450353724
DOI: 10.1145/3149457

Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: Research article · Refereed limited

        Conference

        HPC Asia 2018

        Acceptance Rates

HPCAsia '18 paper acceptance rate: 30 of 67 submissions (45%); overall acceptance rate: 69 of 143 submissions (48%)

