SC 2019: Denver, CO, USA
- Michela Taufer, Pavan Balaji, Antonio J. Peña:
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019, Denver, Colorado, USA, November 17-19, 2019. ACM 2019, ISBN 978-1-4503-6229-0
ACM Gordon Bell finalists
- Alexandros Nikolaos Ziogas, Tal Ben-Nun, Guillermo Indalecio Fernández, Timo Schneider, Mathieu Luisier, Torsten Hoefler:
A data-centric approach to extreme-scale ab initio dissipative quantum transport simulations. 1:1-1:13
- Sambit Das, Phani Motamarri, Vikram Gavini, Bruno Turcksin, Ying Wai Li, Brent Leback:
Fast, scalable and accurate finite-element based ab initio calculations using mixed precision computing: 46 PFLOPS simulation of a metallic dislocation system. 2:1-2:11
Technical papers: Better data systems via better data structures
- Jin Zhao, Yu Zhang, Xiaofei Liao, Ligang He, Bingsheng He, Hai Jin, Haikun Liu, Yicheng Chen:
GraphM: an efficient storage system for high throughput of concurrent graph processing. 3:1-3:14
- HyeongSik Kim, Abhisha Bhattacharyya, Kemafor Anyanwu:
Semantic query transformations for increased parallelization in distributed knowledge graph query processing. 4:1-4:14
- Wei Zhang, Suren Byna, Houjun Tang, Brody Williams, Yong Chen:
MIQS: metadata indexing and querying service for self-describing file formats. 5:1-5:24
Technical papers: Computational fluid dynamics
- Libin Lu, Matthew J. Morse, Abtin Rahimian, Georg Stadler, Denis Zorin:
Scalable simulation of realistic volume fraction red blood cell flows through vascular networks. 6:1-6:30
- Wenqian Dong, Jie Liu, Zhen Xie, Dong Li:
Adaptive neural network-based approximation to accelerate Eulerian fluid simulation. 7:1-7:22
- Kiran Ravikumar, David Appelhans, P. K. Yeung:
GPU acceleration of extreme scale pseudo-spectral simulations of turbulence using asynchronism. 8:1-8:22
Technical papers: Machine learning training
- Yang You, Jonathan Hseu, Chris Ying, James Demmel, Kurt Keutzer, Cho-Jui Hsieh:
Large-batch training for LSTM and beyond. 9:1-9:16
- Nikoli Dryden, Naoya Maruyama, Tim Moon, Tom Benson, Marc Snir, Brian Van Essen:
Channel and filter parallelism for large-scale CNN training. 10:1-10:20
- Cédric Renggli, Saleh Ashkboos, Mehdi Aghagolzadeh, Dan Alistarh, Torsten Hoefler:
SparCML: high-performance sparse communication for machine learning. 11:1-11:15
Technical papers: Cloud scheduling
- Xiongchao Tang, Haojie Wang, Xiaosong Ma, Nosayba El-Sayed, Jidong Zhai, Wenguang Chen, Ashraf Aboulnaga:
Spread-n-share: improving application performance and cluster throughput with resource-aware job placement. 12:1-12:15
- Heyang Qin, Syed Zawad, Yanqi Zhou, Lei Yang, Dongfang Zhao, Feng Yan:
Swift machine learning model serving scheduling: a region based reinforcement learning approach. 13:1-13:23
- Krishna Giri Narra, Zhifeng Lin, Mehrdad Kiamari, Salman Avestimehr, Murali Annavaram:
Slack squeeze coded computing for adaptive straggler mitigation. 14:1-14:16
Technical papers: High radix routing
- Nic McDonald, Mikhail Isaev, Adriana Flores, Al Davis, John Kim:
Practical and efficient incremental adaptive routing for HyperX networks. 15:1-15:13
- Daniele De Sensi, Salvatore Di Girolamo, Torsten Hoefler:
Mitigating network noise on Dragonfly networks through application-aware routing. 16:1-16:32
- Md. Shafayat Rahman, Saptarshi Bhowmik, Yevgeniy Ryasnianskiy, Xin Yuan, Michael Lang:
Topology-custom UGAL routing on dragonfly. 17:1-17:15
Technical papers: Performance tools
- Muhammad Aditya Sasongko, Milind Chabbi, Palwisha Akhtar, Didem Unat:
ComDetective: a lightweight communication detection tool for threads. 18:1-18:21
- Pengfei Su, Shuyin Jiao, Milind Chabbi, Xu Liu:
Pinpointing performance inefficiencies via lightweight variance profiling. 19:1-19:19
- Abhinav Bhatele, Stephanie Brink, Todd Gamblin:
Hatchet: pruning the overgrowth in parallel profiles. 20:1-20:21
Technical papers: Frameworks & tools
- Benjamin Welton, Barton P. Miller:
Diogenes: looking for an honest CPU/GPU performance measurement tool. 21:1-21:20
- Nikhil Hegde, Qifan Chang, Milind Kulkarni:
D2P: from recursive formulations to distributed-memory codes. 22:1-22:22
- Michael Bauer, Michael Garland:
Legate NumPy: accelerated and distributed array computing. 23:1-23:23
Technical papers: Linear algebra algorithms
- Grzegorz Kwasniewski, Marko Kabic, Maciej Besta, Joost VandeVondele, Raffaele Solcà, Torsten Hoefler:
Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication. 24:1-24:22
- Zhihao Li, Haipeng Jia, Yunquan Zhang, Tun Chen, Liang Yuan, Luning Cao, Xiao Wang:
AutoFFT: a template-based FFT codes auto-generation framework for ARM and X86 CPUs. 25:1-25:15
- Mark Gates, Jakub Kurzak, Ali Charara, Asim YarKhan, Jack J. Dongarra:
SLATE: design of a modern distributed and accelerated linear algebra library. 26:1-26:18
Technical papers: Power and scale
- Neha Gholkar, Frank Mueller, Barry Rountree:
Uncore power scavenger: a runtime for uncore power conservation on HPC systems. 27:1-27:23
- Huazhe Zhang, Henry Hoffmann:
PoDD: power-capping dependent distributed applications. 28:1-28:23
- Atilim Günes Baydin, Lei Shao, Wahid Bhimji, Lukas Heinrich, Lawrence Meadows, Jialin Liu, Andreas Munk, Saeid Naderiparizi, Bradley Gram-Hansen, Gilles Louppe, Mingfei Ma, Xiaohui Zhao, Philip H. S. Torr, Victor W. Lee, Kyle Cranmer, Prabhat, Frank Wood:
Etalumis: bringing probabilistic programming to scientific simulators at scale. 29:1-29:24
Technical papers: State of the practice
- David Jauk, Dai Yang, Martin Schulz:
Predicting faults in high performance computing systems: an in-depth survey of the state-of-the-practice. 30:1-30:13
- Ignacio Laguna, Ryan J. Marshall, Kathryn M. Mohror, Martin Ruefenacht, Anthony Skjellum, Nawrin Sultana:
A large-scale study of MPI usage in open-source HPC applications. 31:1-31:14
- Ian Karlin, Yoonho Park, Bronis R. de Supinski, Peng Wang, Bert Still, David Beckingsale, Robert Blake, Tong Chen, Guojing Cong, Carlos H. A. Costa, Johann Dahm, Giacomo Domeniconi, Thomas Epperly, Aaron Fisher, Sara Kokkila Schumacher, Steven H. Langer, Hai Le, Eun Kyung Lee, Naoya Maruyama, Xinyu Que, David F. Richards, Björn Sjögreen, Jonathan Wong, Carol S. Woodward, Ulrike Meier Yang, Xiaohua Zhang, Bob Anderson, David Appelhans, Levi Barnes, Peter D. Barnes Jr., Sorin Bastea, David Böhme, Jamie A. Bramwell, James M. Brase, José R. Brunheroto, Barry Chen, Charway R. Cooper, Tony Degroot, Robert D. Falgout, Todd Gamblin, David J. Gardner, James N. Glosli, John A. Gunnels, Max P. Katz, Tzanio V. Kolev, I-Feng W. Kuo, Matthew P. LeGendre, Ruipeng Li, Pei-Hung Lin, Shelby Lockhart, Kathleen McCandless, Claudia Misale, Jaime H. Moreno, Rob Neely, Jarom Nelson, Rao Nimmakayala, Kathryn M. O'Brien, Kevin O'Brien, Ramesh Pankajakshan, Roger Pearce, Slaven Peles, Phil Regier, Steven C. Rennich, Martin Schulz, Howard Scott, James C. Sexton, Kathleen Shoga, Shiv Sundram, Guillaume Thomas-Collignon, Brian Van Essen, Alexey Voronin, Bob Walkup, Lu Wang, Chris Ward, Hui-Fang Wen, Daniel A. White, Christopher Young, Cyril Zeller, Edward Zywicz:
Preparation and optimization of a diverse workload for a large-scale heterogeneous system. 32:1-32:17
Technical papers: Compression
- Xin Liang, Sheng Di, Sihuan Li, Dingwen Tao, Bogdan Nicolae, Zizhong Chen, Franck Cappello:
Significantly improving lossy compression quality based on an optimized hybrid prediction model. 33:1-33:26
- Madhurima Vardhan, John Gounley, Luiz Hegele, Erik W. Draeger, Amanda Randles:
Moment representation in the lattice Boltzmann method on massively parallel hardware. 34:1-34:21
- Maciej Besta, Simon Weber, Lukas Gianinazzi, Robert Gerstenberger, Andrey Ivanov, Yishai Oltchik, Torsten Hoefler:
Slim graph: practical lossy graph compression for approximate graph processing, storage, and analytics. 35:1-35:25
Technical papers: Machine learning optimization
- Sangkug Lym, Esha Choukse, Siavash Zangeneh, Wei Wen, Sujay Sanghavi, Mattan Erez:
PruneTrain: fast neural network training by dynamic sparse model reconfiguration. 36:1-36:13
- Prasanna Balaprakash, Romain Egele, Misha Salim, Stefan M. Wild, Venkatram Vishwanath, Fangfang Xia, Tom Brettin, Rick Stevens:
Scalable reinforcement-learning-based neural architecture search for cancer deep learning research. 37:1-37:33
- Ang Li, Tong Geng, Tianqi Wang, Martin C. Herbordt, Shuaiwen Leon Song, Kevin J. Barker:
BSTC: a novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets. 38:1-38:30
Technical papers: Network evaluation
- Christopher Zimmer, Scott Atchley, Ramesh Pankajakshan, Brian E. Smith, Ian Karlin, Matthew L. Leininger, Adam Bertsch, Brian S. Ryujin, Jason Burmark, André Walker-Loud, Michael A. Clark, Olga Pearce:
An evaluation of the CORAL interconnects. 39:1-39:18
- Jens Domke, Satoshi Matsuoka, Ivan R. Ivanov, Yuki Tsushima, Tomoya Yuki, Akihiro Nomura, Shin'ichi Miura, Nic McDonald, Dennis Lee Floyd, Nicolas Dubé:
HyperX topology: first at-scale implementation and comparison to the fat-tree. 40:1-40:23
- George Michelogiannakis, Yiwen Shen, Min Yee Teh, Xiang Meng, Benjamin Aivazi, Taylor L. Groves, John Shalf, Madeleine Glick, Manya Ghobadi, Larry Dennison, Keren Bergman:
Bandwidth steering in HPC using silicon nanophotonics. 41:1-41:25
Technical papers: Network congestion and offload
- Sudheer Chunduri, Taylor L. Groves, Peter Mendygral, Brian Austin, Jacob Balma, Krishna Kandalla, Kalyan Kumaran, Glenn K. Lockwood, Scott Parker, Steven Warren, Nathan Wichmann, Nicholas J. Wright:
GPCNeT: designing a benchmark suite for inducing and measuring contention in HPC networks. 42:1-42:33
- Philip Taffet, John M. Mellor-Crummey:
Understanding congestion in high performance interconnection networks using sampling. 43:1-43:24
- Haiyang Shi, Xiaoyi Lu:
TriEC: tripartite graph based erasure coding NIC offload. 44:1-44:34
Technical papers: Partitioning & scheduling
- Wonchan Lee, Manolis Papadakis, Elliott Slaughter, Alex Aiken:
A constraint-based approach to automatic data partitioning for distributed memory execution. 45:1-45:24
- Serif Yesil, Azin Heidarshenas, Adam Morrison, Josep Torrellas:
Understanding priority-based scheduling of graph algorithms on a shared-memory platform. 46:1-46:14
- Shumpei Shiina, Kenjiro Taura:
Almost deterministic work stealing. 47:1-47:16
Technical papers: Sparse computations
- Athena Elafrou, Georgios I. Goumas, Nectarios Koziris:
Conflict-free symmetric sparse matrix-vector multiplication on multicore architectures. 48:1-48:15
- Israt Nisa, Jiajia Li, Aravind Sukumaran-Rajam, Prashant Singh Rawat, Sriram Krishnamoorthy, P. Sadayappan:
An efficient mixed-mode representation of sparse tensors. 49:1-49:25
- Oguz Selvitopi, Cevdet Aykanat:
Regularizing irregularly sparse point-to-point communications. 50:1-50:14
Technical papers: GPU
- Lingda Li, Barbara M. Chapman:
Compiler assisted hybrid implicit and explicit GPU memory management under unified address space. 51:1-51:16
- Tuowen Zhao, Protonu Basu, Samuel Williams, Mary W. Hall, Hans Johansen:
Exploiting reuse and vectorization in blocked stencil computations on CPUs and GPUs. 52:1-52:44
- Peng Chen, Mohamed Wahib, Shin'ichiro Takizawa, Ryousei Takano, Satoshi Matsuoka:
A versatile software systolic execution model for GPU memory-bound kernels. 53:1-53:81
Technical papers: Network and memory specialization
- Whit Schonbein, Ryan E. Grant, Matthew G. F. Dosanjh, Dorian C. Arnold:
INCA: in-network compute assistance. 54:1-54:13
- Daichi Fujiki, Niladrish Chatterjee, Donghyuk Lee, Mike O'Connor:
Near-memory data transformation for efficient sparse matrix multi-vector multiplication. 55:1-55:17
- Salvatore Di Girolamo, Konstantin Taranov, Andreas Kurth, Michael Schaffner, Timo Schneider, Jakub Beránek, Maciej Besta, Luca Benini, Duncan Roweth, Torsten Hoefler:
Network-accelerated non-contiguous memory transfers. 56:1-56:14
Technical papers: Software infrastructures for applications
- Francesco Di Natale, Harsh Bhatia, Timothy S. Carpenter, Chris Neale, Sara Kokkila Schumacher, Tomas Oppelstrup, Liam Stanton, Xiaohua Zhang, Shiv Sundram, Thomas R. W. Scogland, Gautham Dharuman, Michael P. Surh, Yue Yang, Claudia Misale, Lars Schneidenbach, Carlos H. A. Costa, Changhoan Kim, Bruce D'Amora, Sandrasegaram Gnanakaran, Dwight V. Nissley, Frederick H. Streitz, Felice C. Lightstone, Peer-Timo Bremer, James N. Glosli, Helgi I. Ingólfsson:
A massively parallel infrastructure for adaptive multiscale simulations: modeling RAS initiation pathway for cancer. 57:1-57:16
- Chao Chen, Greg Eisenhauer, Santosh Pande, Qiang Guan:
CARE: compiler-assisted recovery from soft failures. 58:1-58:23
- Martin Bauer, Johannes Hötzer, Dominik Ernst, Julian Hammer, Marco Seiz, Henrik Hierl, Jan Hönig, Harald Köstler, Gerhard Wellein, Britta Nestler, Ulrich Rüde:
Code generation for massively parallel phase-field simulations. 59:1-59:32
Technical papers: Algorithmic techniques for large-scale applications
- Arnur Nigmetov, Dmitriy Morozov:
Local-global merge tree computation with local exchanges. 60:1-60:13
- Masado Ishii, Milinda Fernando, Kumar Saurabh, Biswajit Khara, Baskar Ganapathysubramanian, Hari Sundar:
Solving PDEs in space-time: 4D tree-based adaptivity, mesh-free and matrix-free approaches. 61:1-61:61
- Gregor Daiß, Parsa Amini, John Biddiscombe, Patrick Diehl, Juhan Frank, Kevin A. Huck, Hartmut Kaiser, Dominic Marcello, David Pfander, Dirk Pflüger:
From Piz Daint to the stars: simulation of stellar mergers using high-level abstractions. 62:1-62:37
Technical papers: Improved performance through monitoring and fine-tuned orchestration
- Sarp Oral, Sudharshan S. Vazhkudai, Feiyi Wang, Christopher Zimmer, Christopher Brumgard, Jesse Hanley, George Markomanolis, Ross G. Miller, Dustin Leverman, Scott Atchley, Verónica G. Vergara Larrea:
End-to-end I/O portfolio for the Summit supercomputing ecosystem. 63:1-63:14
- Alessio Netti, Micha Müller, Axel Auweter, Carla Guillén, Michael Ott, Daniele Tafani, Martin Schulz:
From facility to application sensor data: modular, continuous and holistic monitoring with DCDB. 64:1-64:27
- Tirthak Patel, Suren Byna, Glenn K. Lockwood, Devesh Tiwari:
Revisiting I/O behavior in large-scale storage systems: the expected and the unexpected. 65:1-65:13
Technical papers: Molecular dynamics
- Tingjian Zhang, Yuxuan Li, Ping Gao, Qi Shao, Mingshan Shao, Meng Zhang, Jinxiao Zhang, Xiaohui Duan, Zhao Liu, Lin Gan, Haohuan Fu, Wei Xue, Weiguo Liu, Guangwen Yang:
SW_GROMACS: accelerate GROMACS on Sunway TaihuLight. 66:1-66:14
- Chen Yang, Tong Geng, Tianqi Wang, Rushi Patel, Qingqing Xiong, Ahmed Sanaullah, Chunshu Wu, Jiayi Sheng, Charles Lin, Vipin Sachdeva, Woody Sherman, Martin C. Herbordt:
Fully integrated FPGA molecular dynamics simulations. 67:1-67:31
- Kun Li, Honghui Shang, Yunquan Zhang, Shigang Li, Baodong Wu, Dong Wang, Libo Zhang, Fang Li, Dexun Chen, Zhiqiang Wei:
OpenKMC: a KMC design for hundred-billion-atom simulation using millions of cores on Sunway TaihuLight. 68:1-68:16
Technical papers: Resilience and fault injection
- Zitao Chen, Guanpeng Li, Karthik Pattabiraman, Nathan DeBardeleben:
BinFI: an efficient fault injector for safety-critical machine learning systems. 69:1-69:23
- Chun-Kai Chang, Wenqi Yin, Mattan Erez:
Assessing the impact of timing errors on HPC applications. 70:1-70:19
- Sihuan Li, Hongbo Li, Xin Liang, Jieyang Chen, Elisabeth Giem, Kaiming Ouyang, Kai Zhao, Sheng Di, Franck Cappello, Zizhong Chen:
FT-iSort: efficient fault tolerance for introsort. 71:1-71:17
Technical papers: Graph and tensor computations
- Patrick Flick, Srinivas Aluru:
Distributed enhanced suffix arrays: efficient algorithms for construction and querying. 72:1-72:17
- George M. Slota, Jonathan W. Berry, Simon D. Hammond, Stephen L. Olivier, Cynthia A. Phillips, Sivasankaran Rajamanickam:
Scalable generation of graphs for benchmarking HPC community-detection algorithms. 73:1-73:14
- Rui Li, Aravind Sukumaran-Rajam, Richard Veras, Tze Meng Low, Fabrice Rastello, Atanas Rountev, P. Sadayappan:
Analytical cache modeling and tilesize optimization for tensor contractions. 74:1-74:13
Technical papers: Improving next-generation performance and resilience
- Jacob Alter, Ji Xue, Alma Dimnaku, Evgenia Smirni:
SSD failures in the field: symptoms, causes, and prediction models. 75:1-75:14
- Michèle Weiland, Holger Brunst, Tiago Quintino, Nick Johnson, Olivier Iffrig, Simon D. Smart, Christian Herold, Antonino Bonanni, Adrian Jackson, Mark Parsons:
An early evaluation of Intel's Optane DC persistent memory module and its impact on high-performance scientific applications. 76:1-76:19
- Tapasya Patki, Jayaraman J. Thiagarajan, Alexis Ayala, Tanzima Z. Islam:
Performance optimality or reproducibility: that is the question. 77:1-77:30
Technical papers: Quantum applications
- Alexandros Nikolaos Ziogas, Tal Ben-Nun, Guillermo Indalecio Fernández, Timo Schneider, Mathieu Luisier, Torsten Hoefler:
Optimizing the data movement in quantum transport simulations via data-centric parallel programming. 78:1-78:17
- Weile Jia, Lin-Wang Wang, Lin Lin:
Parallel transport time-dependent density functional theory calculations with hybrid functional on Summit. 79:1-79:23
- Xin-Chuan Wu, Sheng Di, Emma Maitreyee Dasgupta, Franck Cappello, Hal Finkel, Yuri Alexeev, Frederic T. Chong:
Full-state quantum circuit simulation by using data compression. 80:1-80:24
Technical papers: Heterogeneous systems
- Tal Ben-Nun, Johannes de Fine Licht, Alexandros Nikolaos Ziogas, Timo Schneider, Torsten Hoefler:
Stateful dataflow multigraphs: a data-centric model for performance portability on heterogeneous architectures. 81:1-81:14
- Tiziano De Matteis, Johannes de Fine Licht, Jakub Beránek, Torsten Hoefler:
Streaming message interface: high-performance distributed memory programming on reconfigurable hardware. 82:1-82:33
- Kun Yang, Yi-Fan Chen, Georgios Roumpos, Chris Colby, John R. Anderson:
High performance Monte Carlo simulation of Ising model on TPU clusters. 83:1-83:15
Technical papers: Image reconstruction
- Peng Chen, Mohamed Wahib, Shin'ichiro Takizawa, Ryousei Takano, Satoshi Matsuoka:
iFDK: a scalable framework for instant high-resolution image reconstruction. 84:1-84:24
- Mert Hidayetoglu, Tekin Biçer, Simon Garcia De Gonzalo, Bin Ren, Doga Gürsoy, Rajkumar Kettimuthu, Ian T. Foster, Wen-mei W. Hwu:
MemXCT: memory-centric X-ray CT reconstruction with massive parallelization. 85:1-85:56
- Xiao Wang, Venkatesh Sridhar, Zahra Ronaghi, Rollin C. Thomas, Jack Deslippe, Dilworth Parkinson, Gregery T. Buzzard, Samuel P. Midkiff, Charles A. Bouman, Simon K. Warfield:
Consensus equilibrium framework for super-resolution and extreme-scale CT reconstruction. 86:1-86:23
Technical papers: The fewer tiers, the fewer tears
- Shaohua Duan, Pradeep Subedi, Philip E. Davis, Manish Parashar:
Addressing data resiliency for staging based scientific workflows. 87:1-87:22
- Yingjin Qian, Xi Li, Shuichi Ihara, Andreas Dilger, Carlos Thomaz, Shilong Wang, Wen Cheng, Chunyan Li, Lingfang Zeng, Fang Wang, Dan Feng, Tim Süß, André Brinkmann:
LPCC: hierarchical persistent client caching for Lustre. 88:1-88:14
- Anne Benoit, Thomas Hérault, Valentin Le Fèvre, Yves Robert:
Replication is more efficient than you think. 89:1-89:14