HiHGNN: Accelerating HGNNs Through Parallelism and Data Reusability Exploitation
- Runzhen Xue,
- Dengke Han,
- Mingyu Yan,
- Mo Zou,
- Xiaocheng Yang,
- Duo Wang,
- Wenming Li,
- Zhimin Tang,
- John Kim,
- Xiaochun Ye,
- Dongrui Fan
Heterogeneous graph neural networks (HGNNs) have emerged as powerful algorithms for processing heterogeneous graphs (HetGs), which are widely used in many critical fields. To capture both structural and semantic information in HetGs, HGNNs first aggregate the ...
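The abstract cuts off at the aggregation step. As background, HGNNs typically aggregate in two stages: intra-semantic (within each metapath) and inter-semantic (fusing across metapaths). A minimal illustrative sketch of that two-stage pattern, not the paper's HiHGNN accelerator; the inputs `features`, `neighbors_by_semantic`, and `semantic_weights` are hypothetical:

```python
import numpy as np

def hgnn_aggregate(features, neighbors_by_semantic, semantic_weights):
    """Illustrative two-stage HGNN aggregation for one target vertex:
    (1) mean-aggregate neighbor features within each semantic (metapath),
    (2) fuse the per-semantic results with semantic-level weights."""
    fused = np.zeros(features.shape[1])
    for sem, nbr_ids in neighbors_by_semantic.items():
        intra = features[nbr_ids].mean(axis=0)   # intra-semantic aggregation
        fused += semantic_weights[sem] * intra   # inter-semantic fusion
    return fused
```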
A 3D Hybrid Optical-Electrical NoC Using Novel Mapping Strategy Based DCNN Dataflow Acceleration
The large number of multiply-accumulate operations and memory accesses required in deep convolutional neural networks (DCNNs) leads to high latency and energy consumption (EC), which hinder their further application. Dataflow-based acceleration schemes ...
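For a sense of scale, the MAC count of one standard convolutional layer follows the well-known formula K · K · C_in · C_out · H_out · W_out; a quick worked sketch (the layer shapes below are hypothetical):

```python
def conv_macs(k, c_in, c_out, h_out, w_out):
    """Multiply-accumulate count of one standard convolutional layer:
    K * K * C_in * C_out * H_out * W_out."""
    return k * k * c_in * c_out * h_out * w_out

# e.g., a 3x3 conv from 64 to 128 channels over a 56x56 output map:
print(conv_macs(3, 64, 128, 56, 56))  # 231,211,008 MACs in a single layer
```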
Synchronize Only the Immature Parameters: Communication-Efficient Federated Learning By Freezing Parameters Adaptively
Federated learning allows edge devices to collaboratively train a global model without sharing their local private data. Yet, with limited network bandwidth at the edge, communication often becomes a severe bottleneck. In this article, we find that it is ...
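As a rough illustration of the freezing idea in the title, a hypothetical stability test is sketched below; the threshold criterion is an assumption, not the paper's actual maturity metric:

```python
import numpy as np

def immature_mask(update_history, threshold=1e-3, window=3):
    """Hypothetical stability test: parameters whose updates stayed below
    `threshold` for the last `window` rounds are treated as mature and
    frozen; only the remaining (immature) ones are synchronized."""
    recent = np.abs(np.stack(update_history[-window:]))
    return recent.max(axis=0) >= threshold  # True -> still immature, sync it
```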
FastTuning: Enabling Fast and Efficient Hyper-Parameter Tuning With Partitioning and Parallelism of Search Space
- Xiaqing Li,
- Qi Guo,
- Guangyan Zhang,
- Siwei Ye,
- Guanhua He,
- Yiheng Yao,
- Rui Zhang,
- Yifan Hao,
- Zidong Du,
- Weimin Zheng
Hyper-parameter tuning (HPT) for deep learning (DL) models is prohibitively expensive. Sequential model-based optimization (SMBO) has emerged as the state-of-the-art (SOTA) approach for automatically optimizing HPT performance due to its heuristic advantages. ...
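For context, a minimal SMBO loop is sketched here with a Gaussian-process surrogate and expected improvement; `objective` and the `candidates` grid are placeholders, and FastTuning's search-space partitioning and parallelism are not shown:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def smbo(objective, candidates, n_init=5, n_iter=20, seed=0):
    """Minimal SMBO: fit a surrogate on observed (config, loss) pairs, then
    evaluate the candidate with the highest expected improvement (EI)."""
    rng = np.random.default_rng(seed)
    idx = list(rng.choice(len(candidates), n_init, replace=False))
    losses = [objective(candidates[i]) for i in idx]
    for _ in range(n_iter):
        gp = GaussianProcessRegressor().fit(candidates[idx], losses)
        mu, sigma = gp.predict(candidates, return_std=True)
        best = min(losses)
        z = (best - mu) / np.maximum(sigma, 1e-9)
        ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)  # we minimize
        nxt = int(np.argmax(ei))
        idx.append(nxt)
        losses.append(objective(candidates[nxt]))
    return candidates[idx[int(np.argmin(losses))]]
```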
FedREM: Guided Federated Learning in the Presence of Dynamic Device Unpredictability
Federated learning (FL) is a promising distributed machine learning scheme where multiple clients collaborate by sharing a common learning model while keeping their private data local. It can be applied in many applications, e.g., training an ...
Fed-RAC: Resource-Aware Clustering for Tackling Heterogeneity of Participants in Federated Learning
Federated Learning is a training framework that enables multiple participants to collaboratively train a shared model while preserving data privacy. The heterogeneity of the participants' devices and networking resources delays the training and ...
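As a rough sketch of resource-aware clustering in general (the capacity score and the `flops`/`bandwidth` keys below are hypothetical, not Fed-RAC's criterion):

```python
def cluster_by_resource(clients, n_clusters=3):
    """Hypothetical tiering: rank clients by a capacity score and split them
    into equal tiers so similarly provisioned participants train together."""
    ranked = sorted(clients, key=lambda c: c["flops"] * c["bandwidth"])
    tiers = [[] for _ in range(n_clusters)]
    for i, client in enumerate(ranked):
        tiers[i * n_clusters // len(ranked)].append(client)
    return tiers
```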
Graph-Centric Performance Analysis for Large-Scale Parallel Applications
Performance analysis is essential for understanding the performance behaviors of parallel programs and detecting performance bottlenecks. However, complex interconnections across several types of performance bugs, as well as inter-process communications ...
Spiking Neural P Systems With Microglia
Spiking neural P systems (SNP systems), a class of parallel and distributed computing models with biological interpretability, have been an active research topic in bio-inspired computing in recent years. To improve the stability of the models, ...
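For background, firing rules in an SNP system take the standard form E/a^c → a^p, where E is a regular expression over spikes, c spikes are consumed, and p are emitted. A tiny sketch of applying one such rule (delays omitted; the tuple encoding is an illustrative assumption):

```python
import re

def try_fire(spikes, rule):
    """Apply one SNP firing rule E/a^c -> a^p if the neuron's spike content
    (written as the string 'a' * spikes) matches the regular expression E."""
    E, c, p = rule                       # (regex over 'a', consumed, emitted)
    if re.fullmatch(E, "a" * spikes):
        return spikes - c, p             # remaining spikes, spikes emitted
    return spikes, 0

# e.g., rule (aa)+/a -> a fires only on an even, nonzero spike count:
print(try_fire(4, (r"(aa)+", 1, 1)))     # -> (3, 1)
print(try_fire(3, (r"(aa)+", 1, 1)))     # -> (3, 0)
```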
Bayesian-Driven Automated Scaling in Stream Computing With Multiple QoS Targets
Stream processing systems commonly work with auto-scaling to ensure resource efficiency and quality of service (QoS). Existing auto-scaling solutions lack accuracy in resource allocation because they rely on static QoS-resource models that fail to account ...
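As a minimal illustration of QoS-driven scaling in general (the latency model and greedy policy below are assumptions, not the paper's Bayesian method):

```python
def choose_parallelism(predict_p95_latency, qos_ms, max_workers=64):
    """Hypothetical policy: pick the smallest operator parallelism whose
    predicted tail latency (from some learned model) meets the QoS target;
    fall back to the maximum if none does."""
    for n in range(1, max_workers + 1):
        if predict_p95_latency(n) <= qos_ms:
            return n
    return max_workers

# e.g., with a toy model where latency halves as workers double:
print(choose_parallelism(lambda n: 400 / n, qos_ms=50))  # -> 8
```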
Availability-Aware Revenue-Effective Application Deployment in Multi-Access Edge Computing
Multi-access edge computing (MEC) has emerged as a promising computing paradigm to push computing resources and services to the network edge. It allows applications/services to be deployed on edge servers for provisioning low-latency services to nearby ...
AdaptChain: Adaptive Data Sharing and Synchronization for NFV Systems on Heterogeneous Architectures
In a Network Function Virtualization (NFV) system, network functions (NFs) are implemented on general-purpose hardware, including CPU, GPU, and FPGA. Studies have shown that there is no one-size-fits-all processor, as each processor demonstrates ...
CREPE: Concurrent Reverse-Modulo-Scheduling and Placement for CGRAs
Coarse-Grained Reconfigurable Array (CGRA) architectures are popular as high-performance and energy-efficient computing devices. Compute-intensive loop constructs of complex applications are mapped onto CGRAs by modulo-scheduling the innermost loop ...
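For background, modulo scheduling is bounded below by the standard minimum initiation interval, II ≥ max(ResMII, RecMII), where ResMII is the resource bound and RecMII the recurrence bound. A small worked sketch:

```python
import math

def min_initiation_interval(n_ops, n_pes, recurrences):
    """Lower bound on the initiation interval (II) for modulo scheduling:
    II >= ResMII = ceil(ops / PEs) and II >= RecMII over recurrence cycles."""
    res_mii = math.ceil(n_ops / n_pes)
    rec_mii = max((math.ceil(delay / distance)
                   for delay, distance in recurrences), default=0)
    return max(res_mii, rec_mii)

# e.g., 14 operations on a 4x4 CGRA with a 3-cycle, distance-1 recurrence:
print(min_initiation_interval(14, 16, [(3, 1)]))  # -> 3
```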
Rollback-Free Recovery for a High Performance Dense Linear Solver With Reduced Memory Footprint
The scale of today's High Performance Computing (HPC) systems is the key factor behind their impressive performance, as well as the reason for their relatively limited reliability. Over the last decade, specific areas of the High ...
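A classic rollback-free ingredient for dense linear algebra is ABFT-style checksum encoding; a minimal sketch of the general idea (not necessarily this paper's exact scheme):

```python
import numpy as np

def encode_checksum(A):
    """Append an ABFT-style checksum row holding the column sums of A."""
    return np.vstack([A, A.sum(axis=0)])

def faulty_columns(Ac, tol=1e-8):
    """A corrupted entry makes its column disagree with the checksum row,
    so the error can be located and repaired without rolling back."""
    residual = Ac[-1] - Ac[:-1].sum(axis=0)
    return np.flatnonzero(np.abs(residual) > tol)
```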
Adaptive Neural Control for a Network of Parabolic PDEs With Event-Triggered Mechanism
This paper investigates the finite-time consensus problem for nonlinear parabolic networks by designing a new tracking controller. For an undirected topology, the newly designed controller makes it possible to optimize the consensus time by adjusting the parameter ...