DOI: 10.1145/3208040.3208049
Research Article | Public Access

Performance analysis and optimization of in-situ integration of simulation with data analysis: zipping applications up

Published: 11 June 2018

Abstract

This paper targets an important class of applications that requires combining HPC simulations with data analysis for online or real-time scientific discovery. We use state-of-the-art parallel-I/O and data-staging libraries to build simulation-time data analysis workflows, and conduct performance analysis with real-world applications: computational fluid dynamics (CFD) and molecular dynamics (MD) simulations. Driven by an in-depth analysis of performance inefficiencies, we design an end-to-end, application-level approach that eliminates the interlocks and synchronizations present in existing methods. Our approach employs both task parallelism and pipeline parallelism to effectively reduce synchronization. In addition, we design a fully asynchronous, fine-grained, pipelined runtime system named Zipper. Zipper is a multi-threaded, distributed runtime system that executes in a layer below the simulation and analysis applications. To further reduce the simulation application's stall time and improve data-transfer performance, we design a concurrent data transfer optimization that uses both the HPC network and the parallel file system for improved bandwidth. The scalability of the Zipper system has been verified by a performance model and extensive large-scale experiments. Experimental results on an Intel multicore cluster and a Knights Landing HPC system demonstrate that the Zipper-based approach can outperform the fastest state-of-the-art I/O transport library by up to 220% using 13,056 processor cores.
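The pipelined, asynchronous design described in the abstract can be illustrated with a small sketch. This is not the Zipper implementation or its API; it is a minimal, hypothetical Python example showing how pipeline parallelism between a simulation producer and an analysis consumer, together with a middle transfer stage that moves each timestep's data over two concurrent channels (standing in for the HPC network and the parallel file system), lets the simulation proceed without blocking on analysis. All names, sizes, and timings below are assumptions made for illustration.

```python
# Minimal sketch (assumed names and timings, not the Zipper API): a three-stage
# pipeline that overlaps simulation, data transfer, and analysis so that the
# simulation never waits for the analysis of a timestep to finish.
import queue
import threading
import time

NUM_STEPS = 8                          # hypothetical number of timesteps
transfer_q = queue.Queue(maxsize=2)    # bounded queue: back-pressure, not a global barrier
analysis_q = queue.Queue()

def simulate():
    """Producer: emit one data block per timestep and continue immediately."""
    for step in range(NUM_STEPS):
        data = [step * 0.1] * 1000     # placeholder for simulation field data
        transfer_q.put((step, data))   # returns as soon as a queue slot is free
    transfer_q.put(None)               # end-of-stream marker

def transfer():
    """Middle stage: ship each block over two concurrent channels, then forward it."""
    while True:
        item = transfer_q.get()
        if item is None:
            analysis_q.put(None)
            break
        step, data = item
        half = len(data) // 2
        # Two concurrent "channels"; the sleeps stand in for network vs. file-system latency.
        ch1 = threading.Thread(target=time.sleep, args=(0.01,))  # fast interconnect path
        ch2 = threading.Thread(target=time.sleep, args=(0.02,))  # parallel-file-system path
        ch1.start()
        ch2.start()
        ch1.join()
        ch2.join()
        analysis_q.put((step, data[:half] + data[half:]))        # reassembled block

def analyze():
    """Consumer: run a simple reduction on each block as it arrives (in situ analysis stand-in)."""
    while True:
        item = analysis_q.get()
        if item is None:
            break
        step, data = item
        print(f"step {step}: mean = {sum(data) / len(data):.3f}")

threads = [threading.Thread(target=f) for f in (simulate, transfer, analyze)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In a design of this shape, the bounded queue applies back-pressure rather than forcing a synchronization barrier, and splitting each block across two transfer paths approximates the paper's idea of using the interconnect and the parallel file system concurrently for extra bandwidth.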



      Published In

HPDC '18: Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing
June 2018, 291 pages
ISBN: 9781450357852
DOI: 10.1145/3208040

      Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. high performance computing
      2. in-situ/in-transit workflows
      3. performance analysis and optimization


      Conference

      HPDC '18

      Acceptance Rates

HPDC '18 paper acceptance rate: 22 of 111 submissions (20%)
Overall acceptance rate: 166 of 966 submissions (17%)


      Cited By

• (2023) INSTANT: A Runtime Framework to Orchestrate In-Situ Workflows. Euro-Par 2023: Parallel Processing, 199-213. DOI: 10.1007/978-3-031-39698-4_14. Online publication date: 24-Aug-2023.
• (2022) Identifying Challenges and Opportunities of In-Memory Computing on Large HPC Systems. Journal of Parallel and Distributed Computing. DOI: 10.1016/j.jpdc.2022.02.002. Online publication date: Feb-2022.
• (2021) X-composer. Proceedings of the Platform for Advanced Scientific Computing Conference, 1-10. DOI: 10.1145/3468267.3470621. Online publication date: 5-Jul-2021.
• (2021) Bootstrapping in-situ workflow auto-tuning via combining performance models of component applications. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-15. DOI: 10.1145/3458817.3476197. Online publication date: 13-Nov-2021.
• (2020) A Comprehensive Study of In-Memory Computing on Large HPC Systems. 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS), 987-997. DOI: 10.1109/ICDCS47774.2020.00045. Online publication date: Nov-2020.
• (2020) Towards autonomic data management for staging-based coupled scientific workflows. Journal of Parallel and Distributed Computing. DOI: 10.1016/j.jpdc.2020.07.002. Online publication date: Jul-2020.
