23rd CLUSTER 2021: Portland, OR, USA
- IEEE International Conference on Cluster Computing, CLUSTER 2021, Portland, OR, USA, September 7-10, 2021. IEEE 2021, ISBN 978-1-7281-9666-4
- Bin-Rui Li, Shenggan Cheng, James Lin:
tcFFT: A Fast Half-Precision FFT Library for NVIDIA Tensor Cores. 1-11
- W. Pepper Marts, Matthew G. F. Dosanjh, Scott Levy, Whit Schonbein, Ryan E. Grant, Patrick G. Bridges:
MiniMod: A Modular Miniapplication Benchmarking Framework for HPC. 12-22
- Daniel Rosendo, Alexandru Costan, Gabriel Antoniu, Matthieu Simonin, Jean-Christophe Lombardo, Alexis Joly, Patrick Valduriez:
Reproducible Performance Optimization of Complex Applications on the Edge-to-Cloud Continuum. 23-34
- Bing Xie, Qiang Cao, Mayuresh Kunjir, Linli Wan, Jeffrey S. Chase, Anirban Mandal, Mats Rynge:
WIRE: Resource-efficient Scaling with Online Prediction for DAG-based Workflows. 35-46
- Zihan Jiang, Wanling Gao, Fei Tang, Lei Wang, Xingwang Xiong, Chunjie Luo, Chuanxin Lan, Hongxiao Li, Jianfeng Zhan:
HPC AI500 V2.0: The Methodology, Tools, and Metrics for Benchmarking HPC AI Systems. 47-58
- Wenyan Chen, Chengzhi Lu, Kejiang Ye, Yang Wang, Cheng-Zhong Xu:
RPTCN: Resource Prediction for High-dynamic Workloads in Clouds based on Deep Learning. 59-69
- Nathan Grinsztajn, Olivier Beaumont, Emmanuel Jeannot, Philippe Preux:
READYS: A Reinforcement Learning Based Strategy for Heterogeneous Dynamic Scheduling. 70-81
- Hongyuan Liu, Bogdan Nicolae, Sheng Di, Franck Cappello, Adwait Jog:
Accelerating DNN Architecture Search at Scale Using Selective Weight Transfer. 82-93
- Jing Cao, Zongwei Zhu, Xuehai Zhou:
SAP-SGD: Accelerating Distributed Parallel Training with High Communication Efficiency on Heterogeneous Clusters. 94-102
- Lizhi Zhang, Zhiquan Lai, Shengwei Li, Yu Tang, Feng Liu, Dongsheng Li:
2PGraph: Accelerating GNN Training over Large Graphs on GPU Clusters. 103-113
- Jaime Cernuda Garcia, Hariharan Devarajan, Luke Logan, Keith Bateman, Neeraj Rajesh, Jie Ye, Anthony Kougkas, Xian-He Sun:
HFlow: A Dynamic and Elastic Multi-Layered I/O Forwarder. 114-124
- Peng Xu, Nannan Zhao, Jiguang Wan, Wei Liu, Shuning Chen, Yuanhui Zhou, Hadeel Albahar, Hanyang Liu, Liu Tang, Changsheng Xie:
Building A Fast and Efficient LSM-tree Store by Integrating Local Storage with Cloud Storage. 125-134
- Ovidiu-Cristian Marcu, Alexandru Costan, Bogdan Nicolae, Gabriel Antoniu:
Virtual Log-Structured Storage for High-Performance Streaming. 135-145
- Pradeep Subedi, Philip E. Davis, Manish Parashar:
RISE: Reducing I/O Contention in Staging-based Extreme-Scale In-situ Workflows. 146-156
- Hanchen Guo, Zhehan Lin, Yunfei Gu, Chentao Wu, Li Jiang, Jie Li, Guangtao Xue, Minyi Guo:
Lazy-WL: A Wear-aware Load Balanced Data Redistribution Method for Efficient SSD Array Scaling. 157-168
- Frederic Schimmelpfennig, Marc-André Vef, Reza Salkhordeh, Alberto Miranda, Ramon Nou, André Brinkmann:
Streamlining distributed Deep Learning I/O with ad hoc file systems. 169-180
- Hao Wu, Jiangming Jin, Jidong Zhai, Yifan Gong, Wei Liu:
Accelerating GPU Message Communication for Autonomous Navigation Systems. 181-191
- Qingxiao Sun, Yi Liu, Hailong Yang, Zhonghui Jiang, Xiaoyan Liu, Ming Dun, Zhongzhi Luan, Depei Qian:
csTuner: Scalable Auto-tuning Framework for Complex Stencil Computation on GPUs. 192-203
- Patrick Diehl, Gregor Daiß, Dominic Marcello, Kevin A. Huck, Sagiv Shiber, Hartmut Kaiser, Juhan Frank, Geoffrey C. Clayton, Dirk Pflüger:
Octo-Tiger's New Hydro Module and Performance Using HPX+CUDA on ORNL's Summit. 204-214
- Manasi Tiwari, Sathish Vadhiyar:
Pipelined Preconditioned s-step Conjugate Gradient Methods for Distributed Memory Systems. 215-225
- Mohsen Koohi Esfahani, Peter Kilpatrick, Hans Vandierendonck:
Thrifty Label Propagation: Fast Connected Components for Skewed-Degree Graphs. 226-237
- Jonathan Lifflander, Nicole Lemaster Slattengren, Philippe P. Pébaÿ, Phil Miller, Francesco Rizzi, Matthew T. Bettencourt:
Optimizing Distributed Load Balancing for Workloads with Time-Varying Imbalance. 238-249
- Hrushit Parikh, Vinit Deodhar, Ada Gavrilovska, Santosh Pande:
Distributed Work Stealing at Scale via Matchmaking. 250-260
- Dominik Scheinert, Lauritz Thamsen, Houkun Zhu, Jonathan Will, Alexander Acker, Thorsten Wittkopp, Odej Kao:
Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across Contexts. 261-270
- Ping Chen, Shuibing He, Xuechen Zhang, Shuaiben Chen, Peiyi Hong, Yanlong Yin, Xian-He Sun, Gang Chen:
CSWAP: A Self-Tuning Compression Framework for Accelerating Tensor Swapping in GPUs. 271-282
- Jiannan Tian, Sheng Di, Xiaodong Yu, Cody Rivera, Kai Zhao, Sian Jin, Yunhe Feng, Xin Liang, Dingwen Tao, Franck Cappello:
Optimizing Error-Bounded Lossy Compression for Scientific Data on GPUs. 283-293
- Jinyang Liu, Sheng Di, Kai Zhao, Sian Jin, Dingwen Tao, Xin Liang, Zizhong Chen, Franck Cappello:
Exploring Autoencoder-based Error-bounded Compression for Scientific Data. 294-306
- Xiaodong Yu, Sheng Di, Ali Murat Gok, Dingwen Tao, Franck Cappello:
cuZ-Checker: A GPU-Based Ultra-Fast Assessment System for Lossy Compressions. 307-319
- Jialing Zhang, Jiaxi Chen, Xiaoyan Zhuo, Aekyeung Moon, Seung Woo Son:
DPZ: Improving Lossy Compression Ratio with Information Retrieval on Scientific Data. 320-331
- Subhadeep Bhattacharya, Weikuan Yu, Fahim Tahmid Chowdhury, Kathryn M. Mohror:
O(1) Communication for Distributed SGD through Two-Level Gradient Averaging. 332-343
- Nicholas O. Malott, Rishi R. Verma, Rohit P. Singh, Philip A. Wilsey:
Distributed Computation of Persistent Homology from Partitioned Big Data. 344-354
- Dalin Wang, Feng Zhang, Weitao Wan, Hourun Li, Xiaoyong Du:
FineQuery: Fine-Grained Query Processing on CPU-GPU Integrated Architectures. 355-365
- Shoichi Hirasawa, Hayato Yamaki, Michihiro Koibuchi:
Packet Forwarding Cache of Commodity Switches for Parallel Computers. 366-376
- Megan Grodowitz, Luis E. Peña, Curtis Dunham, Dong Zhong, Pavel Shamis, Steve Poole:
Two-Chains: High Performance Framework for Function Injection and Execution. 377-387
- Wei Liu, Haikun Liu, Xiaofei Liao, Hai Jin, Yu Zhang:
HNGraph: Parallel Graph Processing in Hybrid Memory Based NUMA Systems. 388-397
- Hoang-Dung Do, Valérie Hayot-Sasson, Rafael Ferreira da Silva, Christopher Steele, Henri Casanova, Tristan Glatard:
Modeling the Linux page cache for accurate simulation of data-intensive applications. 398-408
- Bo Fang, Daoce Wang, Sian Jin, Quincey Koziol, Zhao Zhang, Qiang Guan, Suren Byna, Sriram Krishnamoorthy, Dingwen Tao:
Characterizing Impacts of Storage Faults on HPC Applications: A Methodology and Insights. 409-420
- Kurt B. Ferreira, Scott Levy, Victor Kuhns, Nathan DeBardeleben, Sean Blanchard:
Understanding the Effects of DRAM Correctable Error Logging at Scale. 421-432
- Kun Suo, Junggab Son, Dazhao Cheng, Wei Chen, Sabur Baidya:
Tackling Cold Start of Serverless Applications by Efficient and Adaptive Container Runtime Reusing. 433-443
- Matthew Wolf, Jeremy Logan, Kshitij Mehta, Daniel A. Jacobson, Mikaela Cashman, Angelica M. Walker, Greg Eisenhauer, Patrick M. Widener, Ashley Cliff:
Reusability First: Toward FAIR Workflows. 444-455
- Teng Ma, Kang Chen, Shaonan Ma, Zhuo Song, Yongwei Wu:
Thinking More about RDMA Memory Semantics. 456-467
- Tapasya Patki, Adam Bertsch, Ian Karlin, Dong H. Ahn, Brian Van Essen, Barry Rountree, Bronis R. de Supinski, Nathan Besaw:
Monitoring Large Scale Supercomputers: A Case Study with the Lassen Supercomputer. 468-480
- Arnab Das, Tanmay Tirpankar, Ganesh Gopalakrishnan, Sriram Krishnamoorthy:
Robustness Analysis of Loop-Free Floating-Point Programs via Symbolic Automatic Differentiation. 481-491
- Elvis Rojas, Diego Pérez, Jon C. Calhoun, Leonardo Bautista-Gomez, Terry R. Jones, Esteban Meneses:
Understanding Soft Error Sensitivity of Deep Learning Models and Frameworks through Checkpoint Alteration. 492-503
- Edgar A. León, Marc Joos, Nathan Hanford, Adrien Cotte, Tony Delforge, François Diakhaté, Vincent Ducrot, Ian Karlin, Marc Pérache:
On-the-Fly, Robust Translation of MPI Libraries. 504-515
- Kaiming Ouyang, Min Si, Atsushi Hori, Zizhong Chen, Pavan Balaji:
Daps: A Dynamic Asynchronous Progress Stealing Model for MPI Communication. 516-527
- Kevin Sala, Sandra Macià, Vicenç Beltran:
Combining One-Sided Communications with Task-Based Programming Models. 528-541
- Wanrong Gao, Jianbin Fang, Chun Huang, Chuanfu Xu, Zheng Wang:
Optimizing Barrier Synchronization on ARMv8 Many-Core Architectures. 542-552
- Yuetsu Kodama, Masaaki Kondo, Mitsuhisa Sato:
Evaluation of SPEC CPU and SPEC OMP on the A64FX. 553-561
- Robert Schöne, Thomas Ilsche, Mario Bielert, Markus Velten, Markus Schmidl, Daniel Hackenberg:
Energy Efficiency Aspects of the AMD Zen 2 Architecture. 562-571
- Julita Corbalán, Oriol Vidal, Lluis Alonso, Jordi Aneas:
Explicit uncore frequency scaling for energy optimisation policies with EAR in Intel architectures. 572-581
- Robert Schöne, Markus Schmidl, Mario Bielert, Daniel Hackenberg:
FIRESTARTER 2: Dynamic Code Generation for Processor Stress Tests. 582-590
- Stefan A. Robila, David Grant, Chris DePrater, Vali Sorell, Terry L. Rodgers, David Martinez, Shlomo Novotny:
Cooling the Data Center: Design of a Mechanical Controls Owner Project Requirements (OPR) Template. 591-595
- Alessio Netti, Woong Shin, Michael Ott, Torsten Wilde, Natalie J. Bates:
A Conceptual Framework for HPC Operational Data Analytics. 596-603
- Thomas Jakobsche, Nicolas Lachiche, Aurélien Cavelan, Florina M. Ciorba:
An Execution Fingerprint Dictionary for HPC Application Recognition. 604-608
- Ashish Pal, Preeti Malakar:
An Integrated Job Monitor, Analyzer and Predictor. 609-617
- Kenneth Lamar, Alexander V. Goponenko, Christina L. Peterson, Benjamin A. Allan, Jim M. Brandt, Damian Dechev:
Backfilling HPC Jobs with a Multimodal-Aware Predictor. 618-622
- Louise Harding, Fabien Wernli, Frédéric Suter:
Sequence-RTG: Efficient and Production-Ready Pattern Mining in System Log Messages. 623-631
- Chengcheng Li, Ahmad Maroof Karimi, Woong Shin, Hairong Qi, Feiyi Wang:
The Challenge of Disproportionate Importance of Temporal Features in Predicting HPC Power Consumption. 632-636
- Shantenu Jha, Allen D. Malony:
Dynamic and Adaptive Monitoring and Analysis for Many-task Ensemble Computing. 637-641
- Jie Yin, Atsushi Hori, Balazs Gerofi, Yutaka Ishikawa:
A Scalability Study of Data Exchange in HPC Multi-component Workflows. 642-648
- Ricardo Macedo, Cláudia Correia, Marco Dantas, Cláudia Brito, Weijia Xu, Yusuke Tanimura, Jason Haga, João Paulo:
The Case for Storage Optimization Decoupling in Deep Learning Frameworks. 649-656
- Marco Dantas, Diogo Leitão, Cláudia Correia, Ricardo Macedo, Weijia Xu, João Paulo:
MONARCH: Hierarchical Storage Management for Deep Learning Frameworks. 657-663
- Luke Logan, Jay F. Lofstead, Scott Levy, Patrick M. Widener, Xian-He Sun, Anthony Kougkas:
pMEMCPY: a simple, lightweight, and portable I/O library for storing data in persistent memory. 664-670
- Sarah Neuwirth, Arnab Kumar Paul:
Parallel I/O Evaluation Techniques and Emerging HPC Workloads: A Perspective. 671-679
- Yuzhen Liu, Oana Marin:
Special function neural network (SFNN) models. 680-685
- Yuuichi Asahi, Sora Hatayama, Takashi Shimokawabe, Naoyuki Onodera, Yuta Hasegawa, Yasuhiro Idomura:
AMR-Net: Convolutional Neural Networks for Multi-resolution Steady Flow Prediction. 686-691
- Xavier Aguilar, Stefano Markidis:
A Deep Learning-Based Particle-in-Cell Method for Plasma Simulations. 692-697
- Li Zhong, Dennis Hoppe, Naweiluo Zhou, Oleksandr Shcherbakov:
Hybrid workflow of Simulation and Deep Learning on HPC: A Case Study for Material Behavior Determination. 698-704
- Martin Svedin, Artur Podobas, Steven Wei Der Chien, Stefano Markidis:
Higgs Boson Classification: Brain-inspired BCPNN Learning with StreamBrain. 705-710
- Md Abdullah Shahneous Bari, Barbara M. Chapman, Anthony Curtis, Robert J. Harrison, Eva Siegmann, Nikolay A. Simakov, Matthew D. Jones:
A64FX performance: experience on Ookami. 711-718
- Sarat Sreepathi, Mark Taylor:
Early Evaluation of Fugaku A64FX Architecture Using Climate Workloads. 719-727
- Miwako Tsuji, Mitsuhisa Sato:
Performance Evaluation and Analysis of A64FX many-core Processor for the Fiber Miniapp Suite. 728-735
- Jens Domke:
A64FX - Your Compiler You Must Decide! 736-740
- Fabio Banchelli, Kilian Peiro, Guillem Ramirez-Gargallo, Joan Vinyals, David Vicente, Marta Garcia-Gasulla, Filippo Mantovani:
Cluster of emerging technology: evaluation of a production HPC system based on A64FX. 741-750
- Jérôme Gurhem, Maxence Vandromme, Miwako Tsuji, Serge G. Petiton, Mitsuhisa Sato:
Sequences of Sparse Matrix-Vector Multiplication on Fugaku's A64FX processors. 751-758
- Karl F. A. Friebel, Stephanie Soldavini, Gerald Hempel, Christian Pilato, Jerónimo Castrillón:
From Domain-Specific Languages to Memory-Optimized Accelerators for Fluid Dynamics. 759-766
- Nick Brown:
Accelerating advection for atmospheric modelling on Xilinx and Intel FPGAs. 767-774
- Nick Brown, Mark Klaisoongnoen, Oliver Thomson Brown:
Optimisation of an FPGA Credit Default Swap engine by embracing dataflow techniques. 775-778
- Brad Green, Dillon Todd, Jon C. Calhoun, Melissa C. Smith:
TIGRA: A Tightly Integrated Generic RISC-V Accelerator Interface. 779-782
- Norihisa Fujita, Ryohei Kobayashi, Yoshiki Yamaguchi, Taisuke Boku:
HBM2 Memory System for HPC Applications on an FPGA. 783-786
- Takaaki Miyajima, Kentaro Sano:
A memory bandwidth improvement with memory space partitioning for single-precision floating-point FFT on Stratix 10 FPGA. 787-790
- Naoya Umezu, Yoshiki Yamaguchi, Taisuke Boku:
An FPGA-based storage control with load balancing. 791-794
- Yuting Li, Yun Xu, Xuehai Zhou:
CVFCC: CV-Based Framework for Container Consolidation in Cloud Data Centers. 795-796
- Sahil Sharma, Zhiling Lan, Xingfu Wu, Valerie Taylor:
A Dynamic Power Capping Library for HPC Applications. 797-798
- Shaoheng Luo, Lei Wang, Yufeng Liu, Changhai Zhao, Xudong Zhang:
SDIS: A PB-level seismic data index system with ML methods. 799-800
- Iker Martín-Álvarez, José Ignacio Aliaga, María Isabel Castillo, Rafael Mayo, Sergio Iserte:
Malleability Implementation in a MPI Iterative Method. 801-802
- Chen Zou, Andrew A. Chien, Robert W. Gardner, Ilija Vukotic:
Computational Storage to Increase the Analysis Capability of Tier-2 HEP Data Sites. 803-804
- Chan-Gyu Lee, Hyun-Wook Jin:
NUMA-aware I/O System Call Steering. 805-806
- Michela Taufer, Ewa Deelman, Rafael Ferreira da Silva, Trilce Estrada, Mary W. Hall, Miron Livny:
A Roadmap to Robust Science for High-throughput Applications: The Developers' Perspective. 807-808
- Menuka Warushavithana, Saptashwa Mitra, Mazdak Arabi, F. Jay Breidt, Sangmi Lee Pallickara, Shrideep Pallickara:
A Transfer Learning Scheme for Time Series Forecasting Using Facebook Prophet. 809-810
- Yuyang Wang, Fei Lei, Dezun Dong:
Exploring Node Connection Modes in Multi-Rail Fat-tree. 811-812
- Changhong Wang, Dezun Dong, Zicong Wang, Xiaoyun Zhang, Zhenyu Zhao:
RELAR: A Reinforcement Learning Framework for Adaptive Routing in Network-on-Chips. 813-814
- Saptashwa Mitra, Daniel Rammer, Shrideep Pallickara, Sangmi Lee Pallickara:
A Generative Approach to Visualizing Satellite Data. 815-816
- Lukas Reitz:
Load Balancing Policies for Nested Fork-Join. 817-818
- Xiaoliang Wang, Jianchuan Li, Peiquan Jin, Kuankuan Guo, Yuanjin Lin, Ming Zhao:
Supporting Elastic Compaction of LSM-tree with a FaaS Cluster. 819-820
- Gábor Dániel Balogh, István Z. Reguly:
Automatic Parallelisation of Structured Mesh Computations with SYCL. 821-822
- Kevin D. Colby, Shawn Rice:
Halcyon: Unified HPC Center Operations. 823-824
- Keshi Ge, Yiming Zhang, Yongquan Fu, Zhiquan Lai, Xiaoge Deng, Dongsheng Li:
CASQ: Accelerate Distributed Deep Learning with Sketch-Based Gradient Quantization. 825-826
- Sarah Neuwirth:
Toward a Comprehensive Benchmark Suite for Evaluating GASPI in HPC Environments. 827-828
- Trokon Johnson, Herman Lam:
Incorporating Fault-Tolerance Awareness into System-Level Modeling and Simulation. 829-830