ICS 2023: Orlando, FL, USA
- Kyle A. Gallivan, Efstratios Gallopoulos, Dimitrios S. Nikolopoulos, Ramón Beivide:
Proceedings of the 37th International Conference on Supercomputing, ICS 2023, Orlando, FL, USA, June 21-23, 2023. ACM 2023
Best Papers
- Jinyang Liu, Sheng Di, Kai Zhao, Xin Liang, Zizhong Chen, Franck Cappello:
FAZ: A flexible auto-tuned modular error-bounded compression framework for scientific data. 1-13
- Neil Lindquist, Piotr Luszczek, Jack J. Dongarra:
Using Additive Modifications in LU Factorization Instead of Pivoting. 14-24
- Jun Xiao, Yaocheng Xiang, Xiaolin Wang, Yingwei Luo, Andy D. Pimentel, Zhenlin Wang:
FLORIA: A Fast and Featherlight Approach for Predicting Cache Performance. 25-36
Compilation and Scheduling
- Thomas Randall, Jaehoon Koo, Brice Videau, Michael Kruse, Xingfu Wu, Paul D. Hovland, Mary W. Hall, Rong Ge, Prasanna Balaprakash:
Transfer-learning-based Autotuning using Gaussian Copula. 37-49
- Lukas Trümper, Tal Ben-Nun, Philipp Schaad, Alexandru Calotoiu, Torsten Hoefler:
Performance Embeddings: A Similarity-Based Transfer Tuning Approach to Performance Optimization. 50-62
- Xu Wen, Wanling Gao, Anzheng Li, Lei Wang, Zihan Jiang, Jianfeng Zhan:
CMLCompiler: A Unified Compiler for Classical Machine Learning. 63-74
- Pu Pang, Yaoxuan Li, Bo Liu, Quan Chen, Zhou Yu, Zhibin Yu, Deze Zeng, Jingwen Leng, Jieru Zhao, Minyi Guo:
PAC: Preference-Aware Co-location Scheduling on Heterogeneous NUMA Architectures To Improve Resource Utilization. 75-86
Tools and Libraries
- Kelun Lei, Xin You, Hailong Yang, Zhongzhi Luan, Depei Qian:
BiRFIA: Selective Binary Rewriting for Function Interception on ARM. 87-98
- Milan Shah, Xiaodong Yu, Sheng Di, Michela Becchi, Franck Cappello:
Lightweight Huffman Coding for Efficient GPU Compression. 99-110
- RuQing G. Xu, Field G. Van Zee, Robert A. van de Geijn:
Towards a Unified Implementation of GEMM in BLIS. 111-121
I/O and Storage
- Md. Arifuzzaman, Engin Arslan:
Use Only What You Need: Judicious Parallelism For File Transfers in High Performance Networks. 122-132
- Meghana Madhyastha, Robert Underwood, Randal C. Burns, Bogdan Nicolae:
DStore: A Lightweight Scalable Learning Model Repository with Fine-Grain Tensor-Level Access. 133-143
- Amelie Chi Zhou, Zhoubin Ke, Jianming Lao:
DyVer: Dynamic Version Handling for Array Databases. 144-154
Accelerator Programming I
- Minh Pham, Yicheng Tu, Xiaoyi Lv:
Accelerating BWA-MEM Read Mapping on GPUs. 155-166
- Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka:
PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications. 167-179
- Marcelo Orenes-Vera, Ilya Sharapov, Robert Schreiber, Mathias Jacquelin, Philippe Vandermersch, Sharan Chetlur:
Wafer-Scale Fast Fourier Transforms. 180-191
- Ismayil Ismayilov, Javid Baydamirli, Dogan Sagbili, Mohamed Wahib, Didem Unat:
Multi-GPU Communication Schemes for Iterative Solvers: When CPUs are Not in Charge. 192-202
Large Scale Applications I
- Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele:
A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training. 203-214
- Han D. Tran, Siddharth Saurav, P. Sadayappan, Sandip Mazumder, Hari Sundar:
Scalable parallelization for the solution of phonon Boltzmann Transport Equation. 215-226
- Xiaojian Yang, Shengguo Li, Fan Yuan, Dezun Dong, Chun Huang, Zheng Wang:
Optimizing Multi-grid Computation and Parallelization on Multi-cores. 227-239
- Xinbiao Gan, Guang Wu, Ruigeng Zeng, Jiaqi Si, Ji Liu, Daxiang Dong, Chunye Gong, Cong Liu, Tiejun Li:
FT-topo: Architecture-Driven Folded-Triangle Partitioning for Communication-efficient Graph Processing. 240-250
Accelerator Programming II
- Lingqi Zhang, Mohamed Wahib, Peng Chen, Jintao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka:
Revisiting Temporal Blocking Stencil Optimizations. 251-263
- Jou-An Chen, Hsin-Hsuan Sung, Xipeng Shen, Sutanay Choudhury, Ang Li:
BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs. 264-276
- Shaofeng Yang, Xiandong Liu, Yunting Wang, Xin He, Guangming Tan:
Fast All-Pairs Shortest Paths Algorithm in Large Sparse Graph. 277-288
- Vani Nagarajan, Durga Mandarapu, Milind Kulkarni:
RT-kNNS Unbound: Using RT Cores to Accelerate Unrestricted Neighbor Search. 289-300
Large Scale Applications II
- Srinivas Eswar, Benjamin Cobb, Koby Hayashi, Ramakrishnan Kannan, Grey Ballard, Richard W. Vuduc, Haesun Park:
Distributed-Memory Parallel JointNMF. 301-312
- Yu Chen, Lucca Skon, James R. McCombs, Zhenming Liu, Andreas Stathopoulos:
Parallel Software for Million-scale Exact Kernel Regression. 313-323
- Chengming Zhang, Shaden Smith, Baixi Sun, Jiannan Tian, Jonathan Soifer, Xiaodong Yu, Shuaiwen Leon Song, Yuxiong He, Dingwen Tao:
HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs. 324-335
- Anqi Guo, Yuchen Hao, Chunshu Wu, Pouya Haghi, Zhenyu Pan, Min Si, Dingwen Tao, Ang Li, Martin C. Herbordt, Tong Geng:
Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training. 336-347
Parallel Algorithms
- Boyuan Zhang, Jiannan Tian, Sheng Di, Xiaodong Yu, Martin Swany, Dingwen Tao, Franck Cappello:
GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs. 348-359
- Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen:
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs. 360-372
- Marcin Copik, Roman Böhringer, Alexandru Calotoiu, Torsten Hoefler:
FMI: Fast and Cheap Message Passing for Serverless Functions. 373-385
- Maulein Pathak, Yogish Sabharwal, Neelima Gupta:
Scalable algorithms for compact spanners on real world graphs. 386-397
- Tun Chen, Haipeng Jia, Yunquan Zhang, Kun Li, Zhihao Li, Xiang Zhao, Jianyu Yao, Chendi Li:
OpenFFT: An Adaptive Tuning Framework for 3D FFT on ARM Multicore CPUs. 398-409
Architecture and Interconnects
- Grigory Chirkov, David Wentzlaff:
Seizing the Bandwidth Scaling of On-Package Interconnect in a Post-Moore's Law World. 410-422
- Ruiqi Wang, Dezun Dong, Fei Lei, Junchao Ma, Ke Wu, Kai Lu:
Roar: A Router Microarchitecture for In-network Allreduce. 423-436
- Guangnan Feng, Dezun Dong, Shizhen Zhao, Yutong Lu:
GRAP: Group-level Resource Allocation Policy for Reconfigurable Dragonfly Network in HPC. 437-449
- Pouya Haghi, William Krska, Cheng Tan, Tong Geng, Po-Hao Chen, Connor Greenwood, Anqi Guo, Thomas M. Hines, Chunshu Wu, Ang Li, Anthony Skjellum, Martin C. Herbordt:
FLASH: FPGA-Accelerated Smart Switches with GCN Case Study. 450-462
- Gagandeep Singh, Alireza Khodamoradi, Kristof Denolf, Jack Lo, Juan Gómez-Luna, Joseph Melber, Andra Bisca, Henk Corporaal, Onur Mutlu:
SPARTA: Spatial Acceleration for Efficient and Scalable Horizontal Diffusion Weather Stencil Computation. 463-476
- Nicholas Contini, Bharath Ramesh, Kaushik Kandadi Suresh, Tu Tran, Benjamin Michalowicz, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Enabling Reconfigurable HPC through MPI-based Inter-FPGA Communication. 477-487