ICS 2024: Kyoto, Japan
- Kenji Kise, Valentina Salapura, Murali Annavaram, Ana Lucia Varbanescu:
Proceedings of the 38th ACM International Conference on Supercomputing, ICS 2024, Kyoto, Japan, June 4-7, 2024. ACM 2024
Session 2: Best Paper Nominees
- Yelai Feng, Huaixi Wang, Yining Zhu, Xiandong Liu, Hongyi Lu, Qing Liu:
DAWN: Matrix Operation-Optimized Algorithm for Shortest Paths Problem on Unweighted Graphs. 1-13
- Durga Keerthi Mandarapu, Vani Nagarajan, Artem Pelenitsyn, Milind Kulkarni:
Arkade: k-Nearest Neighbor Search With Non-Euclidean Distances using GPU Ray Tracing. 14-25
- Bennett Cooper, Thomas R. W. Scogland, Rong Ge:
Shared Virtual Memory: Its Design and Performance Implications for Diverse Applications. 26-37
- Reece Neff, Mostafa Eghbali Zarch, Marco Minutoli, Mahantesh Halappanavar, Antonino Tumeo, Ananth Kalyanaraman, Michela Becchi:
FuseIM: Fusing Probabilistic Traversals for Influence Maximization on Exascale Systems. 38-49
- Juhyeon Lee, Insung Bahk, Hoseung Kim, Sinjin Jeong, Suyeon Lee, Donghyun Min:
An Autonomous Parallelization of Transformer Model Inference on Heterogeneous Edge Devices. 50-61
Session 3A: Memory and Storage Systems
- Chengtao Lai, Zhongchun Zhou, Akash Poptani, Wei Zhang:
LCM: LLM-focused Hybrid SPM-cache Architecture with Cache Management for Multi-Core AI Accelerators. 62-73
- Qi Shao, Angelos Arelakis, Per Stenström:
HMComp: Extending Near-Memory Capacity using Compression in Hybrid Memory. 74-84
- Raveendra Soori, Shreyas Prabhu, Harpreet Singh Chawla, Michael Ferdman:
NUCAlloc: Fine-Grained Block Placement in Hashed Last-Level NUCA Caches. 85-97
- Francesc Martínez Palau, Martí Torrents, Adrià Armejach, Marc Casas:
Exploiting Vector Code Semantics for Efficient Data Cache Prefetching. 98-109
Session 3B: Emerging supercomputing applications
- Du Wu, Peng Chen, Xiao Wang, Isaac Lyngaas, Takaaki Miyajima, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib:
Real-time High-resolution X-Ray Computed Tomography. 110-123
- Liang Geng, Rubao Lee, Xiaodong Zhang:
RayJoin: Fast and Precise Spatial Join. 124-136
- Xiao Fu, Weiling Yang, Dezun Dong, Xing Su:
Optimizing Attention by Exploiting Data Reuse on ARM Multi-core CPUs. 137-149
- Hans Vandierendonck:
Differentiating Set Intersections in Maximal Clique Enumeration by Function and Subproblem Size. 150-163
Session 5A: Reliability, dependability and availability
- Soheil Khadirsharbiyani, Movahhed Sadeghi, Mostafa Eghbali Zarch, Mahmut Taylan Kandemir:
Minimizing Coherence Errors via Dynamic Decoupling. 164-175
- Jianping Zeng, Shaoyu Huang, Jiuyang Liu, Changhee Jung:
Soft Error Resilience at Near-Zero Cost. 176-187
- Vladyslav Oles, Anna Schmedding, George Ostrouchov, Woong Shin, Evgenia Smirni, Christian Engelmann:
Understanding GPU Memory Corruption at Extreme Scale: The Summit Case Study. 188-200
- Dolores Miao, Ignacio Laguna, Cindy Rubio-González:
Input Range Generation for Compiler-Induced Numerical Inconsistencies. 201-212
Session 5B: Heterogeneous software: GPUs and domain specific accelerators
- Andreas Plesner, Hans Henrik Brandenborg Sørensen, Søren Hauberg:
Accurate Computation of the Logarithm of Modified Bessel Functions on GPUs. 213-224
- Benjamin Brock, Aydin Buluç, Katherine A. Yelick:
RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs. 225-235
- Benjamin Brock, Robert Cohn, Suyash Bakshi, Tuomas Karna, Jeongnim Kim, Mateusz Nowak, Lukasz Slusarczyk, Kacper Stefanski, Timothy G. Mattson:
Distributed Ranges: A Model for Distributed Data Structures, Algorithms, and Views. 236-246
- Wenxuan Zhao, Liang Yuan, Baicheng Yan, Penghao Ma, Yunquan Zhang, Long Wang, Zhe Wang:
Stencil Computation with Vector Outer Product. 247-258
Session 6A: Cloud and ML Systems Efficiency
- Wei Gao, Weiming Zhuang, Minghao Li, Peng Sun, Yonggang Wen, Tianwei Zhang:
Ymir: A Scheduler for Foundation Model Fine-tuning Workloads in Datacenters. 259-271
- Franz Kevin Stehle, Wainer Vandelli, Felix Zahn, Giuseppe Avolio, Holger Fröning:
DeepHYDRA: A Hybrid Deep Learning and DBSCAN-Based Approach to Time-Series Anomaly Detection in Dynamically-Configured Systems. 272-285
- Quentin R. Petit, Chong Li, Nahid Emad:
An Efficient and Scalable Approach to Build Co-occurrence Matrix for DNN's Embedding Layer. 286-297
- Justin McGowen, Ismet Dagli, Neil T. Dantam, Mehmet E. Belviranli:
Scheduling for Cyber-Physical Systems with Heterogeneous Processing Units under Real-World Constraints. 298-311
Session 6B: Accelerator Designs
- Raúl Taranco, José-María Arnau, Antonio González:
SLIDEX: A Novel Architecture for Sliding Window Processing. 312-323
- Zhengang Li, Alec Lu, Yanyue Xie, Zhenglun Kong, Mengshu Sun, Hao Tang, Zhong Jia Xue, Peiyan Dong, Caiwen Ding, Yanzhi Wang, Xue Lin, Zhenman Fang:
Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers. 324-337
- Sungmin Yun, Hwayong Nam, Kwanhee Kyung, Jaehyun Park, Byeongho Kim, Yongsuk Kwon, Eojin Lee, Jung Ho Ahn:
CLAY: CXL-based Scalable NDP Architecture Accelerating Embedding Layers. 338-351
- Xianbin Li, Yinyi Liu, Fan Jiang, Chengeng Li, Yuxiang Fu, Wei Zhang, Jiang Xu:
NEOCNN: NTT-Enabled Optical Convolution Neural Network Accelerator. 352-362
Session 8A: Supercomputing Software and Security
- Stepan Vanecek, Martin Schulz:
sys-sage: A Unified Representation of Dynamic Topologies & Attributes on HPC Systems. 363-375
- Yubo Du, Yanan Guo, Youtao Zhang, Jun Yang:
RTT-UAF: Reuse Time Tracking for Use-After-Free Detection. 376-387
- Shilpa Babalad, Shirish K. Shevade, Matthew Jacob Thazhuthaveetil, R. Govindarajan:
Tile Size and Loop Order Selection using Machine Learning for Multi-/Many-Core Architectures. 388-399
- Alexandre Chen, Brittany A. Erickson, Jeremy E. Kozdon, Jee Choi:
Matrix-free SBP-SAT finite difference methods and the multigrid preconditioner on GPUs. 400-412
Session 8B: Interconnects and Networks
- Pouya Haghi, Cheng Tan, Anqi Guo, Chunshu Wu, Dongfang Liu, Ang Li, Anthony Skjellum, Tong Geng, Martin C. Herbordt:
SmartFuse: Reconfigurable Smart Switches to Accelerate Fused Collectives in HPC Applications. 413-425
- Mert Hidayetoglu, Simon Garcia De Gonzalo, Elliott Slaughter, Yu Li, Christopher Zimmer, Tekin Bicer, Bin Ren, William Gropp, Wen-Mei Hwu, Alex Aiken:
CommBench: Micro-Benchmarking Hierarchical Networks with Multi-GPU, Multi-NIC Nodes. 426-436
- Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Yafan Huang, Ken Raffenetti, Hui Zhou, Kai Zhao, Xiaoyi Lu, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur:
gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters. 437-448
- Ram Sharan Chaulagain, Xin Yuan:
Enhanced UGAL Routing Schemes for Dragonfly Networks. 449-459
Session 9A: Machine learning systems
- Mingyi Li, Junmin Xiao, Kewei Zhang, Zhiheng Lin, Chaoyang Shui, Ke Meng, Zehua Wang, Yunfei Pang, Guangming Tan:
A Coordinated Strategy for GNN Combining Computational Graph and Operator Optimizations. 460-472
- Wei Gao, Xu Zhang, Shan Huang, Shangwei Guo, Peng Sun, Yonggang Wen, Tianwei Zhang:
AutoSched: An Adaptive Self-configured Framework for Scheduling Deep Learning Training Workloads. 473-484
- Baorun Mu, Christina Giannoula, Shang Wang, Gennady Pekhimenko:
Sylva: Sparse Embedded Adapters via Hierarchical Approximate Second-Order Information. 485-497
- Hanxian Huang, Xin Chen, Jishen Zhao:
Fasor: A Fast Tensor Program Optimization Framework for Efficient DNN Deployment. 498-510
Session 9B: Software Design for Accelerators
- Keren Zhou, Karthik Ganapathi Subramanian, Po-Hsun Lin, Matthias Fey, Binqian Yin, Jiajia Li:
FASTEN: Fast GPU-accelerated Segmented Matrix Multiplication for Heterogenous Graph Neural Networks. 511-524
- Mohammad Kefah Taha Issa, Muhammad Aditya Sasongko, Ilyas Turimbetov, Javid Baydamirli, Dogan Sagbili, Didem Unat:
Snoopie: A Multi-GPU Communication Profiler and Visualizer. 525-536
- Yifei Li, Bole Zhou, Jiejing Zhang, Xuechao Wei, Yinghan Li, Yingda Chen:
RadiK: Scalable and Optimized GPU-Parallel Radix Top-K Selection. 537-548
- Chendi Li, Yufan Xu, Sina Mahdipour Saravani, Ponnuswamy Sadayappan:
Accelerated Auto-Tuning of GPU Kernels for Tensor Computations. 549-561