PPoPP 2023: Montreal, QC, Canada
- Maryam Mehri Dehnavi, Milind Kulkarni, Sriram Krishnamoorthy:
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2023, Montreal, QC, Canada, 25 February 2023 - 1 March 2023. ACM 2023, ISBN 979-8-4007-0015-6
Data Structures
- Weihua Zhang, Chuanlei Zhao, Lu Peng, Yuzhe Lin, Fengzhe Zhang, Yunping Lu:
Boosting Performance and QoS for Concurrent GPU B+trees by Combining-Based Synchronization. 1-13
- Raed Romanov, Nikita Koval:
The State-of-the-Art LCRQ Concurrent Queue Algorithm Does NOT Require CAS2. 14-26
- Zhe Wang, Jinhao Zhao, Kunal Agrawal, He Liu, Meng Xu, Jing Li:
Provably Good Randomized Strategies for Data Placement in Distributed Key-Value Stores. 27-38
Algorithms
- Pedro Ramalhete, Andreia Correia, Pascal Felber:
2PLSF: Two-Phase Locking with Starvation-Freedom. 39-51
- Xiaojun Dong, Letong Wang, Yan Gu, Yihan Sun:
Provably Fast and Space-Efficient Parallel Biconnectivity. 52-65
- Yuanhao Wei, Guy E. Blelloch, Panagiota Fatourou, Eric Ruppert:
Practically and Theoretically Efficient Garbage Collection for Multiversioning. 66-78
Programming Models
- Muhammad Osama, Serban D. Porumbescu, John D. Owens:
A Programming Model for GPU Load Balancing. 79-91
- Mohak Chadha, Nils Krueger, Jophin John, Anshul Jindal, Michael Gerndt, Shajulin Benedict:
Exploring the Use of WebAssembly in HPC. 92-106
- Nikita Koval, Dan Alistarh, Roman Elizarov:
Fast and Scalable Channels in Kotlin Coroutines. 107-118
- William S. Moses, Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert, Oleksandr Zinenko:
High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs. 119-134
Applications
- Kehao Lin, Chunbao Zhou, Yan Zeng, Ningming Nie, Jue Wang, Shigang Li, Yangde Feng, Yangang Wang, Kehan Yao, Tiechui Yao, Jilin Zhang, Jian Wan:
A Scalable Hybrid Total FETI Method for Massively Parallel FEM Simulations. 135-147
- Yaojian Chen, Yong Liu, Xinmin Shi, Jiawei Song, Xin Liu, Lin Gan, Chu Guo, Haohuan Fu, Jie Gao, Dexun Chen, Guangwen Yang:
Lifetime-Based Optimization for Simulating Quantum Circuits on a New Sunway Supercomputer. 148-159
- Hunter McCoy, Steven A. Hofmeyr, Katherine A. Yelick, Prashant Pandey:
High-Performance Filters for GPUs. 160-173
- Lukas Breitwieser, Ahmad Hesam, Fons Rademakers, Juan Gómez-Luna, Onur Mutlu:
High-Performance and Scalable Agent-Based Simulation with BioDynaMo. 174-188
Task Parallelism
- Tao B. Schardl, I-Ting Angelina Lee:
OpenCilk: A Modular and Extensible Software Infrastructure for Fast Task-Parallel Code. 189-203
- Zhen Xie, Jie Liu, Jiajia Li, Dong Li:
Merchandiser: Data Placement on Heterogeneous Memory for Task-Parallel HPC Applications with Load-Balance Awareness. 204-217
- Michael Bauer, Elliott Slaughter, Sean Treichler, Wonchan Lee, Michael Garland, Alex Aiken:
Visibility Algorithms for Dynamic Dependence Analysis and Distributed Coherence. 218-231
Transactions
- Rati Gelashvili, Alexander Spiegelman, Zhuolun Xiang, George Danezis, Zekun Li, Dahlia Malkhi, Yu Xia, Runtian Zhou:
Block-STM: Scaling Blockchain Execution by Turning Ordering Curse to a Performance Blessing. 232-244
- Gal Assa, Andreia Correia, Pedro Ramalhete, Valerio Schiavoni, Pascal Felber:
TL4x: Buffered Durable Transactions on Disk as Fast as in Memory. 245-259
Decompositions
- Lizhi Xiang, Miao Yin, Chengming Zhang, Aravind Sukumaran-Rajam, P. Sadayappan, Bo Yuan, Dingwen Tao:
TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition. 260-273
- Jieyang Chen, Xin Liang, Kai Zhao, Hadi Zamani Sabzi, Laxmi N. Bhuyan, Zizhong Chen:
Improving Energy Saving of One-Sided Matrix Decompositions on CPU-GPU Heterogeneous Systems. 274-287
- Yang Xia, Peng Jiang, Gagan Agrawal, Rajiv Ramnath:
End-to-End LU Factorization of Large Matrices on GPUs. 288-300
- Shaoshuai Zhang, Ruchi Shah, Hiroyuki Ootomo, Rio Yokota, Panruo Wu:
Fast Symmetric Eigenvalue Decomposition via WY Representation on Tensor Core. 301-312
Kernels
- Zhen Peng, Minjia Zhang, Kai Li, Ruoming Jin, Bin Ren:
iQAN: Fast and Accurate Vector Search with Efficient Intra-Query Parallelism on Multi-Core Architectures. 313-328
- Serif Yesil, Azin Heidarshenas, Adam Morrison, Josep Torrellas:
WISE: Predicting the Performance of Sparse Matrix Vector Multiplication with Machine Learning. 329-341
- Alexandre de Limas Santana, Adrià Armejach, Marc Casas:
Efficient Direct Convolution Using Long SIMD Instructions. 342-353
Attention
- Yufeng Wang, Charith Mendis:
TGOpt: Redundancy-Aware Optimizations for Temporal Graph Attention Networks. 354-368
- Zhaodong Chen, Zheng Qu, Yuying Quan, Liu Liu, Yufei Ding, Yuan Xie:
Dynamic N:M Fine-Grained Structured Sparse Attention Mechanism. 369-379
Training
- Zihao Chen, Chen Xu, Weining Qian, Aoying Zhou:
Elastic Averaging for Efficient Pipelined DNN Training. 380-391
- Zhenkun Cai, Qihui Zhou, Xiao Yan, Da Zheng, Xiang Song, Chenguang Zheng, James Cheng, George Karypis:
DSP: Efficient GNN Training with Multiple GPUs. 392-404
- Chunyang Wang, Desen Sun, Yuebin Bai:
PiPAD: Pipelined and Parallel Dynamic GNN Training on GPUs. 405-418
Posters
- Ricardo Jesus, Michèle Weiland:
AArch64 Atomics: Might They Be Harming Your Performance? 419-421
- Fei Dai, Yawen Chen, Zhiyi Huang, Haibo Zhang, Fangfang Zhang:
Efficient All-Reduce for Distributed DNN Training in Optical Interconnect Systems. 422-424
- Jiantong Jiang, Zeyi Wen, Atif Bin Mansoor, Ajmal Mian:
Fast Parallel Exact Inference on Bayesian Networks. 425-426
- Zhihao Li, Haipeng Jia, Yunquan Zhang, Yuyan Sun, Yiwei Zhang, Tun Chen:
Generating Fast FFT Kernels on CPUs via FFT-Specific Intrinsics. 427-428
- Muhammad Osama, Duane Merrill, Cris Cecka, Michael Garland, John D. Owens:
Stream-K: Work-Centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU. 429-431
- Cheng Xu, Chao Li, Pengyu Wang, Xiaofeng Hou, Jing Wang, Shixuan Sun, Minyi Guo, Hanqing Wu, Dongbai Chen, Xiangwen Liu:
High-Throughput GPU Random Walk with Fine-Tuned Concurrent Query Processing. 432-434
- Gali Sheffi, Erez Petrank:
The ERA Theorem for Safe Memory Reclamation. 435-437
- Vitaly Aksenov, Trevor Brown, Alexander Fedorov, Ilya Kokorin:
Unexpected Scaling in Path Copying Trees. 438-440
- Wentao Cai, Haosen Wen, Michael L. Scott:
Transactional Composition of Nonblocking Data Structures. 441-443
- Ruobing Han, Jun Chen, Bhanu Garg, Jeffrey Young, Jaewoong Sim, Hyesoon Kim:
CuPBoP: A Framework to Make CUDA Portable. 444-446
- Yuchen Zhong, Guangming Sheng, Juncheng Liu, Jinhui Yuan, Chuan Wu:
Swift: Expedited Failure Recovery for Large-Scale DNN Training. 447-449
- Re'em Harel, Yuval Pinter, Gal Oren:
Learning to Parallelize in a Shared-Memory Environment with Transformers. 450-452