Research article · DOI: 10.1145/3149457.3154483 · HPCAsia Conference Proceedings

A Scalable Multi-Granular Data Model for Data Parallel Workflows

Published: 28 January 2018

Abstract

Scientific applications consist of many tasks, and each task has different requirements for its degree of parallelism and data access pattern. To satisfy these requirements, a task scheduler has to assign the required number of processes to each task, and each task's input has to be decomposed and distributed across those processes in a way that matches its data access pattern so as to exploit data locality. However, hand-writing such code is troublesome and error-prone. We propose a multi-view data model in which users specify rules for decomposing multi-dimensional data, changing the data layout across processes, and defining the unit of parallel processing with simple directives. Our framework performs data arrangement and affinity-aware task scheduling transparently to users by following the specified rules. Through a case study of a lattice QCD simulation program, we confirmed that our proposal reduces programming effort compared with hand-written MPI code, with performance penalties of at most 17%.
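The core idea in the abstract, decomposing a multi-dimensional array into blocks and mapping each block to a process according to a user-specified rule, can be illustrated with a small, purely hypothetical sketch. The function and rule names below are illustrative assumptions, not the paper's actual directive syntax, which is not shown on this page:

```python
# Hypothetical sketch of rule-driven data decomposition: split a 2-D
# array into tiles and assign each tile to a process. The multi-view
# model described in the abstract lets users state such rules as
# directives; here the "rule" is an explicit tile shape plus a
# round-robin owner mapping.

def decompose(shape, tile):
    """Split an array of the given shape into tile-sized blocks.

    Returns a list of (row_range, col_range) pairs, one per block,
    where each range is a half-open (start, stop) interval.
    """
    rows, cols = shape
    tr, tc = tile
    blocks = []
    for r in range(0, rows, tr):
        for c in range(0, cols, tc):
            blocks.append(((r, min(r + tr, rows)),
                           (c, min(c + tc, cols))))
    return blocks

def assign(blocks, nprocs):
    """Affinity rule (round-robin): block i is owned by process i % nprocs."""
    return {i: i % nprocs for i in range(len(blocks))}

if __name__ == "__main__":
    blocks = decompose((8, 8), (4, 4))  # four 4x4 tiles
    owners = assign(blocks, 2)          # spread over 2 processes
    print(blocks[0])                    # ((0, 4), (0, 4))
    print(owners)                       # {0: 0, 1: 1, 2: 0, 3: 1}
```

Changing the tile shape changes the data layout without touching the task code, which is the property the paper's directives aim to provide; a real implementation would additionally move the data to the owning MPI ranks.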


Cited By

  • (2020) Improved K-Means Clustering Algorithm for Big Data Mining under Hadoop Parallel Framework. Journal of Grid Computing 18(2), 239–250. DOI: 10.1007/s10723-019-09503-0 (online publication date: 1 June 2020)


        Published In

        HPCAsia '18: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region
        January 2018
        322 pages
ISBN: 9781450353724
DOI: 10.1145/3149457

Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: Research article · Refereed limited

        Conference

        HPC Asia 2018

        Acceptance Rates

HPCAsia '18 paper acceptance rate: 30 of 67 submissions (45%); overall acceptance rate: 69 of 143 submissions (48%)

