38th IPDPS 2024: San Francisco, CA, USA
- IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024, San Francisco, CA, USA, May 27-31, 2024. IEEE 2024, ISBN 979-8-3503-8711-7
- Zhengding Hu, Jingwei Sun, Zhongyang Li, Guangzhong Sun:
PckGNN: Optimizing Aggregation Operators with Packing Strategies in Graph Neural Networks. 2-13
- Luhan Wang, Haipeng Jia, Lei Xu, Cunyang Wei, Kun Li, Xianmeng Jiang, Yunquan Zhang:
VNEC: A Vectorized Non-Empty Column Format for SpMV on CPUs. 14-25
- Ichitaro Yamazaki, Andrew J. Higgins, Erik G. Boman, Daniel B. Szyld:
Two-Stage Block Orthogonalization to Improve Performance of s-step GMRES. 26-37
- Oded Schwartz, Sivan Toledo, Noa Vaknin, Gal Wiernik:
Alternative Basis Matrix Multiplication is Fast and Stable. 38-51
- Tianyu Liang, Riley Murray, Aydin Buluç, James Demmel:
Fast multiplication of random dense matrices with sparse matrices. 52-62
- Takeshi Fukaya, Yuji Nakatsukasa, Yusaku Yamamoto:
A Cholesky QR type algorithm for computing tall-skinny QR factorization with column pivoting. 63-75
- Yunfei Gu, Yihui Lu, Chentao Wu, Jie Li, Minyi Guo:
CKSM: An Efficient Memory Deduplication Method for Container-based Cloud Computing Systems. 76-88
- Amelie Chi Zhou, Rongzheng Huang, Zhoubin Ke, Yusen Li, Yi Wang, Rui Mao:
Tackling Cold Start in Serverless Computing with Multi-Level Container Reuse. 89-99
- Vivek M. Bhasi, Aakash Sharma, Shruti Mohanty, Mahmut Taylan Kandemir, Chita R. Das:
Paldia: Enabling SLO-Compliant and Cost-Effective Serverless Computing on Heterogeneous Hardware. 100-113
- Moiz Arif, Avinash Maurya, M. Mustafa Rafique, Dimitrios S. Nikolopoulos, Ali Raza Butt:
Application-Attuned Memory Management for Containerized HPC Workflows. 114-127
- Yunlong Cheng, Xiuqi Huang, Zifeng Liu, Jiadong Chen, Xiaofeng Gao, Zhen Fang, Yongqiang Yang:
FEDGE: An Interference-Aware QoS Prediction Framework for Black-Box Scenario in IaaS Clouds with Domain Generalization. 128-138
- Marcin Copik, Marcin Chrapek, Larissa Schmid, Alexandru Calotoiu, Torsten Hoefler:
Software Resource Disaggregation for HPC with Serverless Computing. 139-156
- Haishuang Fan, Rui Meng, Qichu Sun, Jingya Wu, Wenyan Lu, Xiaowei Li, Guihai Yan:
AMST: Accelerating Large-Scale Graph Minimum Spanning Tree Computation on FPGA. 157-168
- Ilya Kokorin, Victor Yudov, Vitaly Aksenov, Dan Alistarh:
Wait-free Trees with Asymptotically-Efficient Range Queries. 169-179
- Yves Baumann, Tal Ben-Nun, Maciej Besta, Lukas Gianinazzi, Torsten Hoefler, Piotr Luczynski:
Low-Depth Spatial Tree Algorithms. 180-192
- Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Yibo Zhu, Chuan Wu:
QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices. 193-204
- Van An Le, Nam Duong Tran, Phuong Nam Nguyen, Thanh Hung Nguyen, Phi Le Nguyen, Truong Thao Nguyen, Yusheng Ji:
Enhancing the Generalization of Personalized Federated Learning with Multi-head Model and Ensemble Voting. 205-216
- Yifei Li, Ryan Chard, Yadu N. Babuji, Kyle Chard, Ian T. Foster, Zhuozhao Li:
UniFaaS: Programming across Distributed Cyberinfrastructure with Federated Function Serving. 217-229
- Zhiqian Xu, Honghui Shang, Yi Fan, Xiongzhi Zeng, Yunquan Zhang, Chu Guo:
Scalable and Differentiable Simulator for Quantum Computational Chemistry. 230-240
- S. M. Ferdous, Reece Neff, Bo Peng, Salman Shuvo, Marco Minutoli, Sayak Mukherjee, Karol Kowalski, Michela Becchi, Mahantesh Halappanavar:
Picasso: Memory-Efficient Graph Coloring Using Palettes With Applications in Quantum Computing. 241-252
- Niteya Shah, Christine Sweeney, Vinay Ramakrishnaiah, Jeffrey Donatelli, Wu-Chun Feng:
Optimizing and Scaling the 3D Reconstruction of Single-Particle Imaging. 253-264
- Xiran Zhang, Sameh Abdulah, Jian Cao, Hatem Ltaief, Ying Sun, Marc G. Genton, David E. Keyes:
Parallel Approximations for High-Dimensional Multivariate Normal Probability Computation in Confidence Region Detection Applications. 265-276
- Zeyu Song, Lin Gan, Shengye Xiang, Yinuo Wang, Xiaohui Duan, Guangwen Yang:
Enabling High-Performance Physical Based Rendering on New Sunway Supercomputer. 277-288
- Taolei Wang, Chao Li, Jing Wang, Cheng Xu, Xiaofeng Hou, Minyi Guo:
CoCG: Fine-grained Cloud Game Co-location on Heterogeneous Platform. 289-299
- Thanh Son Phung, Douglas Thain:
Adaptive Task-Oriented Resource Allocation for Large Dynamic Workflows on Opportunistic Resources. 300-311
- David Álvarez, Kevin Sala, Vicenç Beltran:
nOS-V: Co-Executing HPC Applications Using System-Wide Task Scheduling. 312-324
- Jing Chen, Madhavan Manivannan, Bhavishya Goel, Miquel Pericàs:
SWEEP: Adaptive Task Scheduling for Exploring Energy Performance Trade-offs. 325-336
- Baolin Li, Siddharth Samsi, Vijay Gadepally, Devesh Tiwari:
Interpretable Analysis of Production GPU Clusters Monitoring Data via Association Rule Mining. 337-349
- Jan Laukemann, Thomas Gruber, Georg Hager, Dossay Oryspayev, Gerhard Wellein:
CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion. 350-360
- Yi-Chien Lin, Yuyang Chen, Sameh Gobriel, Nilesh Jain, Gopi Krishna Jha, Viktor K. Prasanna:
ARGO: An Auto-Tuning Runtime System for Scalable GNN Training on Multi-Core Processor. 361-372
- Yuke Li, Arjun Kashyap, Weicong Chen, Yanfei Guo, Xiaoyi Lu:
Accelerating Lossy and Lossless Compression on Emerging BlueField DPU Architectures. 373-385
- Tobias S. Flynn, Robert Manson-Sawko, Gihan R. Mudalige:
Performance-Portable Multiphase Flow Solutions with Discontinuous Galerkin Methods. 386-397
- Ahmed H. Mahmoud, Hesam Salehipour, Massimiliano Meneghin:
Optimized GPU Implementation of Grid Refinement in Lattice Boltzmann Method. 398-407
- Herbert Owen, Dominik Ernst, Thomas Gruber, Oriol Lehmkuhl, Guillaume Houzeaux, Lucas Gasparino, Gerhard Wellein:
Alya towards Exascale: Optimal OpenACC Performance of the Navier-Stokes Finite Element Assembly on GPUs. 408-416
- Zizhe Jian, Sheng Di, Jinyang Liu, Kai Zhao, Xin Liang, Haiying Xu, Robert Underwood, Shixun Wu, Jiajun Huang, Zizhong Chen, Franck Cappello:
CliZ: Optimizing Lossy Compression for Climate Datasets with Adaptive Fine-tuned Data Prediction. 417-429
- Eric Heisler, Siddharth Saurav, Aadesh Deshmukh, Sandip Mazumder, Hari Sundar:
Automating GPU Scalability for Complex Scientific Models: Phonon Boltzmann Transport Equation. 430-439
- Tianyu Liang, Chao Chen, Per-Gunnar Martinsson, George Biros:
An O(N) distributed-memory parallel direct solver for planar integral equations. 440-452
- Marc Blancafort, Roger Ferrer, Guillaume Houzeaux, Marta Garcia-Gasulla, Filippo Mantovani:
Exploiting long vectors with a CFD code: a co-design show case. 453-464
- Ahmad Tarraf, Alexis Bandet, Francieli Boito, Guillaume Pallez, Felix Wolf:
Capturing Periodic I/O Using Frequency Techniques. 465-478
- Anxin Guo, Jingwei Li, Pattara Sukprasert, Samir Khuller, Amol Deshpande, Koyel Mukherjee:
To Store or Not to Store: a graph theoretical approach for Dataset Versioning. 479-493
- Neeraj Rajesh, Keith Bateman, Jean Luca Bez, Suren Byna, Anthony Kougkas, Xian-He Sun:
TunIO: An AI-powered Framework for Optimizing HPC I/O. 494-505
- Dong Kyu Sung, Yongseok Son, Alex Sim, Kesheng Wu, Suren Byna, Houjun Tang, Hyeonsang Eom, Changjong Kim, Sunggon Kim:
A2FL: Autonomous and Adaptive File Layout in HPC through Real-time Access Pattern Analysis. 506-518
- Darren Ng, Andrew Lin, Arjun Kashyap, Guanpeng Li, Xiaoyi Lu:
NVMe-oPF: Designing Efficient Priority Schemes for NVMe-over-Fabrics with Multi-Tenancy Support. 519-531
- Hammad Ather, Jean Luca Bez, Yankun Xia, Suren Byna:
Drilling Down I/O Bottlenecks with Cross-layer I/O Profile Exploration. 532-543
- Mark Hildebrand, Jason Lowe-Power, Venkatesh Akella:
CachedArrays: Optimizing Data Movement for Heterogeneous Memory Systems. 545-555
- Junqi Yin, Avishek Bose, Guojing Cong, Isaac Lyngaas, Quentin Anthony:
Comparative Study of Large Language Model Architectures on Frontier. 556-569
- Daniel Nichols, Alexander Movsesyan, Jae-Seung Yeom, Abhik Sarkar, Daniel Milroy, Tapasya Patki, Abhinav Bhatele:
Predicting Cross-Architecture Performance of Parallel Programs. 570-581
- Md Hasanur Rahman, Sheng Di, Shengjian Guo, Xiaoyi Lu, Guanpeng Li, Franck Cappello:
Druto: Upper-Bounding Silent Data Corruption Vulnerability in GPU Applications. 582-594
- Jad El Karchi, Hanze Chen, Ali TehraniJamsaz, Ali Jannesari, Mihail Popov, Emmanuelle Saillard:
MPI Errors Detection using GNN Embedding and Vector Embedding over LLVM IR. 595-607
- Shuaipeng Zhang, Shiyi Li, Chentao Wu, Ruobin Wu, Saiqin Long, Wen Xia:
A Parallel Partial Merge Repair Algorithm for Multi-block Failures for Erasure Storage Systems. 608-618
- Payman Behnam, Uday Kamal, Ali Shafiee, Alexey Tumanov, Saibal Mukhopadhyay:
Harmonica: Hybrid Accelerator to Overcome Imperfections of Mixed-signal DNN Accelerators. 619-630
- Ricardo Nobre, Aleksandar Ilic, Sergio Santander-Jiménez, Leonel Sousa:
IPU-EpiDet: Identifying Gene Interactions on Massively Parallel Graph-Based AI Accelerators. 631-643
- Malith Jayaweera, Yanyu Li, Yanzhi Wang, Bin Ren, David R. Kaeli:
DEFCON: Deformable Convolutions Leveraging Interval Search and GPU Texture Hardware. 644-655
- Weile Luo, Ruibo Fan, Zeyu Li, Dayou Du, Qiang Wang, Xiaowen Chu:
Benchmarking and Dissecting the Nvidia Hopper GPU Architecture. 656-667
- Emanuele Del Sozzo, Xinyuan Wang, Boma A. Adhi, Carlos Cortes, Jason Anderson, Kentaro Sano:
Exploration of Trade-offs Between General-Purpose and Specialized Processing Elements in HPC-Oriented CGRA. 668-680
- Abeda Sultana, Fei Xu, Xu Yuan, Li Chen, Nian-Feng Tzeng:
Hadar: Heterogeneity-Aware Optimization-Based Online Scheduling for Deep Learning Cluster. 681-691
- Chen Chen, Xingbo Wu, Wenshao Zhong, Jakob Eriksson:
Fast Abort-Freedom for Deterministic Transactions. 692-704
- Di Zhang, Monish Soundar Raj, Bing Xie, Sheng Di, Dong Dai:
Cross-System Analysis of Job Characterization and Scheduling in Large-Scale Computing Clusters. 716-727
- Srinjoy Das, Lawrence Rauchwerger:
Automatic Task Parallelization of Dataflow Graphs in ML/DL Models. 728-739
- Thomas B. Rolinger, Alan Sussman:
Adaptive Prefetching for Fine-grain Communication in PGAS Programs. 740-751
- Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Zhaorui Zhang, Jinyang Liu, Xiaoyi Lu, Ken Raffenetti, Hui Zhou, Kai Zhao, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur:
An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression. 752-764
- Zijian Li, Zixuan Chen, Yiying Tang, Xin Ai, Yuanyi Zhu, Zhigao Zhao, Jiang Shao, Guowei Liu, Sen Liu, Bin Liu, Yang Xu:
MUSE: A Runtime Incrementally Reconfigurable Network Adapting to HPC Real-Time Traffic. 765-779
- Zicheng Wang, Zirui Zhuang, Jingyu Wang, Qi Qi, Haifeng Sun, Jianxin Liao:
Fast Policy Convergence for Traffic Engineering with Proactive Distributed Message-Passing. 780-790
- Chongshan Liang, Yi Dai, Jun Xia, Jinbo Xu, Jintao Peng, Weixia Xu, Ming Xie, Jie Liu, Zhiquan Lai, Sheng Ma, Qi Zhu:
The Self-adaptive and Topology-aware MPI_Bcast leveraging Collective offload on Tianhe Express Interconnect. 791-801
- Bharath Ramesh, Nick Contini, Nawras Alnaasan, Kaushik Kandadi Suresh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. D. K. Panda:
HINT: Designing Cache-Efficient MPI_Alltoall using Hybrid Memory Copy Ordering and Non-Temporal Instructions. 802-813
- Tu Dinh Ngoc, Boris Teabe, Georges Da Costa, Daniel Hagimont:
Flexible NVMe Request Routing for Virtual Machines. 814-824
- Xiang Chen, Tao Lu, Jiapin Wang, Yu Zhong, Guangchun Xie, Xueming Cao, Yuanpeng Ma, Bing Si, Feng Ding, Ying Yang, Yunxin Huang, Yafei Yang, You Zhou, Fei Wu:
HA-CSD: Host and SSD Coordinated Compression for Capacity and Performance. 825-838
- Md Nahid Newaz, Sayan Ghosh, Joshua Suetterlein, Nathan R. Tallent, Md Atiqul Mollah, Ming Hua:
Graph Analytics on Jellyfish topology. 839-851
- Sukarn Agarwal, Shounak Chakraborty, Magnus Själander:
TEEMO: Temperature Aware Energy Efficient Multi-Retention STT-RAM Cache Architecture. 852-864
- Li Wan, Fu Chao, Qiang Li, Jun Han:
LockillerTM: Enhancing Performance Lower Bounds in Best-Effort Hardware Transactional Memory. 865-875
- Pengmiao Zhang, Neelesh Gupta, Rajgopal Kannan, Viktor K. Prasanna:
Attention, Distillation, and Tabularization: Towards Practical Neural Network-Based Prefetching. 876-888
- Jiaqi Yang, Hao Zheng, Ahmed Louri:
Aurora: A Versatile and Flexible Accelerator for Graph Neural Networks. 890-902
- Lihan Hu, Jing Li, Peng Jiang:
cuKE: An Efficient Code Generator for Score Function Computation in Knowledge Graph Embedding. 903-914
- Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference. 915-925
- Gangda Deng, Hongkuan Zhou, Hanqing Zeng, Yinglong Xia, Christopher Leung, Jianbo Li, Rajgopal Kannan, Viktor K. Prasanna:
TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation Learning. 926-937
- Ruge Zhang, Haipeng Jia, Yunquan Zhang, Baicheng Yan, Penghao Ma, Long Wang, Wenxuan Zhao:
OpenFFT-SME: An Efficient Outer Product Pattern FFT Library on ARM SME CPUs. 938-949
- Evangelos Georganas, Dhiraj D. Kalamkar, Kirill Voronin, Abhisek Kundu, Antonio Noack, Hans Pabst, Alexander Breuer, Alexander Heinecke:
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures. 950-963
- Kainan Yu, Xinxin Qi, Peng Zhang, Jianbin Fang, Dezun Dong, Ruibo Wang, Tao Tang, Chun Huang, Yonggang Che, Zheng Wang:
Optimizing General Matrix Multiplications on Modern Multi-core DSPs. 964-975
- Yufan Xia, Giuseppe Maria Junior Barca:
Machine-Learning-Driven Runtime Optimization of BLAS Level 3 on Modern Multi-Core Systems. 976-986
- Debasish Pattanayak, Gokarna Sharma:
Time-Color Tradeoff on Uniform Circle Formation by Asynchronous Robots. 987-997
- Xiaohai Dai, Guanxiong Wang, Jiang Xiao, Zhengxuan Guo, Rui Hao, Xia Xie, Hai Jin:
LightDAG: A Low-latency DAG-based BFT Consensus through Lightweight Broadcast. 998-1008
- Rongyuan Tan, Zhuozhao Li:
MAAD: A Distributed Anomaly Detection Architecture for Microservices Systems. 1009-1021
- Jérémie Decouchant, David Kozhaya, Vincent Rahli, Jiangshan Yu:
OneShot: View-Adapting Streamlined BFT Protocols with Trusted Execution Environments. 1022-1033
- Alexandre Valentin Jamet, Georgios Vavouliotis, Daniel A. Jiménez, Lluc Alvarez, Marc Casas:
Practically Tackling Memory Bottlenecks of Graph-Processing Workloads. 1034-1045
- Yihua Wei, Peng Jiang:
GCSM: GPU-Accelerated Continuous Subgraph Matching for Large Graphs. 1046-1057
- Sam Coy, Artur Czumaj, Peter Davies-Peck, Gopinath Mishra:
Parallel Derandomization for Coloring. 1058-1069
- Jiangbo Li, Zichen Xu, Minh Pham, Yicheng Tu, Qihe Zhou:
A Comparative Study of Intersection-Based Triangle Counting Algorithms on GPUs. 1070-1081