DOI: 10.1145/2503210.2503216

ACIC: automatic cloud I/O configurator for HPC applications

Published: 17 November 2013

Abstract

The cloud has become a promising alternative to traditional HPC centers or in-house clusters. This new environment highlights the I/O bottleneck problem, since cloud platforms typically pair top-of-the-line compute instances with sub-par communication and I/O facilities. It has been observed that changing cloud I/O system configurations leads to significant variation in the performance and cost efficiency of I/O-intensive HPC applications. However, configuring a storage system manually is tedious and error-prone, even for experts.
This paper proposes ACIC, which takes a given application running on a given cloud platform, and automatically searches for optimized I/O system configurations. ACIC utilizes machine learning models to perform black-box performance/cost predictions. To tackle the high-dimensional parameter exploration space unique to cloud platforms, we enable affordable, reusable, and incremental training guided by Plackett and Burman Matrices. Results with four representative applications indicate that ACIC consistently identifies near-optimal configurations among a large group of candidate settings.
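
To make the Plackett-Burman idea concrete, here is a minimal sketch of a two-level screening experiment. It is not the authors' implementation: the abstract only states that training is guided by Plackett and Burman matrices. The code builds the standard 12-run design for 11 factors, estimates each factor's main effect from measured responses, and ranks factors by effect magnitude. The factor names and the synthetic response in the usage section are hypothetical placeholders.

```python
"""Sketch: screening configuration parameters with a 12-run Plackett-Burman design."""

# Standard generator row of the 12-run Plackett-Burman design (11 two-level factors).
PB12_GENERATOR = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def pb12_design():
    """Return the 12 x 11 design: 11 cyclic shifts of the generator plus an all-low row."""
    n = len(PB12_GENERATOR)
    rows = [PB12_GENERATOR[n - s:] + PB12_GENERATOR[:n - s] for s in range(n)]
    rows.append([-1] * n)
    return rows


def main_effects(design, responses):
    """Main effect of each factor: mean response at +1 minus mean response at -1."""
    half = len(design) / 2  # each column holds equally many +1 and -1 entries
    n_factors = len(design[0])
    return [
        sum(row[j] * y for row, y in zip(design, responses)) / half
        for j in range(n_factors)
    ]


def rank_factors(names, effects):
    """Sort factors by the absolute size of their main effect, largest first."""
    return sorted(zip(names, effects), key=lambda pair: abs(pair[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical factor names, chosen only for illustration.
    names = [f"param_{i}" for i in range(11)]
    design = pb12_design()
    # Synthetic response standing in for a measured run time: in this toy
    # example only param_0 and param_4 influence the result.
    responses = [10 + 3 * row[0] + 0.5 * row[4] for row in design]
    for name, effect in rank_factors(names, main_effects(design, responses)):
        print(f"{name}: {effect:+.2f}")
```

The appeal of such a design is that 12 runs are enough to screen 11 factors, whereas a full factorial over the same factors would need 2^11 = 2048 runs. The price is that interactions between factors are not separated from main effects, so the ranking is a first-pass filter and not a full model.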

Published In

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2013
1123 pages
ISBN: 9781450323789
DOI: 10.1145/2503210
  • General Chair: William Gropp
  • Program Chair: Satoshi Matsuoka
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cloud computing
  2. modeling
  3. performance
  4. storage

Qualifiers

  • Research-article

Conference

SC13
Acceptance Rates

SC '13 Paper Acceptance Rate 91 of 449 submissions, 20%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Cited By

  • (2023) I/O Access Patterns in HPC Applications: A 360-Degree Survey. ACM Computing Surveys 56:2 (1-41). DOI: 10.1145/3611007. Online publication date: 15-Sep-2023
  • (2023) Towards OS Heterogeneity Aware Cluster Management for HPC. Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems (16-23). DOI: 10.1145/3609510.3609819. Online publication date: 24-Aug-2023
  • (2022) Prediction of job characteristics for intelligent resource allocation in HPC systems: a survey and future directions. Frontiers of Computer Science: Selected Publications from Chinese Universities 16:5. DOI: 10.1007/s11704-022-0625-8. Online publication date: 1-Oct-2022
  • (2021) Sova: A Software-Defined Autonomic Framework for Virtual Network Allocations. IEEE Transactions on Parallel and Distributed Systems 32:1 (116-130). DOI: 10.1109/TPDS.2020.3012146. Online publication date: 1-Jan-2021
  • (2020) Kill Two Birds with One Stone: Auto-tuning RocksDB for High Bandwidth and Low Latency. 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS) (652-664). DOI: 10.1109/ICDCS47774.2020.00113. Online publication date: Nov-2020
  • (2019) Harnessing Data Movement in Virtual Clusters for In-Situ Execution. IEEE Transactions on Parallel and Distributed Systems 30:3 (615-629). DOI: 10.1109/TPDS.2018.2867879. Online publication date: 1-Mar-2019
  • (2019) Can I/O Variability Be Reduced on QoS-Less HPC Storage Systems? IEEE Transactions on Computers 68:5 (631-645). DOI: 10.1109/TC.2018.2881709. Online publication date: 1-May-2019
  • (2016) Exploiting Redundancy and Application Scalability for Cost-Effective, Time-Constrained Execution of HPC Applications on Amazon EC2. IEEE Transactions on Parallel and Distributed Systems 27:9 (2574-2588). DOI: 10.1109/TPDS.2015.2508457. Online publication date: 1-Sep-2016
  • (2016) Support for Provisioning and Configuration Decisions for Data Intensive Workflows. IEEE Transactions on Parallel and Distributed Systems 27:9 (2725-2739). DOI: 10.1109/TPDS.2015.2497693. Online publication date: 1-Sep-2016
  • (2015) Towards Transparent Throughput Elasticity for IaaS Cloud Storage. International Journal of Distributed Systems and Technologies 6:4 (21-44). DOI: 10.4018/IJDST.2015100102. Online publication date: 1-Oct-2015
