Model parallelism optimization for distributed inference via decoupled CNN structure

Document ID: 11036145885321513058
Author: Du J; Zhu X; Shen M; Du Y; Lu Y; Xiao N; Liao X
Publication year: 2020
Publication venue: IEEE Transactions on Parallel and Distributed Systems

Snippet

It is promising to deploy CNN inference on local end-user devices for high-accuracy and time-sensitive applications. Model parallelism has the potential to provide high throughput and low latency in distributed CNN inference. However, it is non-trivial to use model …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/485—Task life-cycle, e.g. stopping, restarting, resuming execution
- G06F9/4856—Task life-cycle, e.g. stopping, restarting, resuming execution resumption being on a different machine, e.g. task migration, virtual machine migration
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
- G06F9/5088—Techniques for rebalancing the load in a distributed system involving task migration
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/80—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores

Similar Documents

Publication	Publication Date	Title
Du et al.	2020	Model parallelism optimization for distributed inference via decoupled CNN structure
Jia et al.	2022	CoDL: efficient CPU-GPU co-execution for deep learning inference on mobile devices
CN104461466B (en)	2018-09-21	The method for improving calculating speed based on MPI and OpenMP Hybrid paradigm parallel computations
Pei et al.	2019	Iteration time prediction for CNN in multi-GPU platform: Modeling and analysis
Shetti et al.	2013	Optimization of the HEFT algorithm for a CPU-GPU environment
Wang et al.	2018	Towards memory-efficient allocation of CNNs on processing-in-memory architecture
Wang et al.	2018	Exploiting parallelism for CNN applications on 3D stacked processing-in-memory architecture
CN116258042A (en)	2023-06-13	Large-scale heat transfer heterogeneous parallel simulation method based on DDM
Wu et al.	2012	Using hybrid MPI and OpenMP programming to optimize communications in parallel loop self-scheduling schemes for multicore PC clusters
Yang et al.	2022	Aero: Design space exploration framework for resource-constrained cnn mapping on tile-based accelerators
Ye et al.	2024	Galaxy: A resource-efficient collaborative edge ai system for in-situ transformer inference
He et al.	2021	Gcim: a near-data processing accelerator for graph construction
Khaitan et al.	2013	Proactive task scheduling and stealing in master-slave based load balancing for parallel contingency analysis
Tianyang et al.	2021	A Survey: FPGA‐Based Dynamic Scheduling of Hardware Tasks
Zagaris et al.	2010	A toolkit for parallel overset grid assembly targeting large-scale moving body aerodynamic simulations
Wang et al.	2021	Directive-based hybrid parallel power system dynamic simulation on multi-core cpu and many-core gpu architecture
Na et al.	2021	Scalable smartphone cluster for deep learning
Yu et al.	2014	GPU-based JFNG method for power system transient dynamic simulation
Song	2021	Analysis on heterogeneous computing
CN110415162B (en)	2020-03-31	Adaptive Graph Partitioning Method for Heterogeneous Fusion Processors in Big Data
Chen et al.	2022	SunwayURANS: 3D full-annulus URANS simulations of transonic axial compressors on Sunway TaihuLight
Du et al.	2022	Enhancing Distributed In-Situ CNN Inference in the Internet of Things
CN116185378A (en)	2023-05-30	Optimization method of calculation graph, data processing method and related products
Zhang et al.	2020	An effective 2-dimension graph partitioning for work stealing assisted graph processing on multi-FPGAs
Khan et al.	2023	Analyzing the Implementation of the Newton Raphson Based Power Flow Formulation in CPU+ GPU Computing Environment