DOI: 10.1145/2503210.2503238

Insights for exascale IO APIs from building a petascale IO API

Published: 17 November 2013

Abstract

Near the dawn of the petascale era, IO libraries had stabilized in function and data layout, with only incremental changes being incorporated. The shift in technology, particularly in the scale of parallel file systems and the number of compute processes, prompted a reexamination of best practices for optimal IO performance.
Alongside other efforts such as PLFS, the project that led to ADIOS, the ADaptable IO System, was motivated by both this shift in technology and the historical need to change how simulations performed IO on each platform in order to achieve optimal IO performance. To solve both issues, the ADIOS team, in consultation with other leading IO experts, sought to build a new IO platform based on the assumptions inherent in petascale hardware platforms.
This paper helps inform the design of future IO platforms by discussing lessons learned while designing and building ADIOS.


Published In

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
November 2013
1123 pages
ISBN:9781450323789
DOI:10.1145/2503210
General Chair: William Gropp
Program Chair: Satoshi Matsuoka
Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SC13
Acceptance Rates

SC '13 Paper Acceptance Rate 91 of 449 submissions, 20%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
