Du et al., 2020 - Google Patents
Model parallelism optimization for distributed inference via decoupled CNN structureDu et al., 2020
View PDF- Document ID
- 11036145885321513058
- Author
- Du J
- Zhu X
- Shen M
- Du Y
- Lu Y
- Xiao N
- Liao X
- Publication year
- Publication venue
- IEEE Transactions on Parallel and Distributed Systems
External Links
Snippet
It is promising to deploy CNN inference on local end-user devices for high-accuracy and time-sensitive applications. Model parallelism has the potential to provide high throughput and low latency in distributed CNN inference. However, it is non-trivial to use model …
- 238000005457 optimization 0 title abstract description 31
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/485—Task life-cycle, e.g. stopping, restarting, resuming execution
- G06F9/4856—Task life-cycle, e.g. stopping, restarting, resuming execution resumption being on a different machine, e.g. task migration, virtual machine migration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
- G06F9/5088—Techniques for rebalancing the load in a distributed system involving task migration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/80—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Du et al. | Model parallelism optimization for distributed inference via decoupled CNN structure | |
US9038088B2 (en) | Load balancing on hetrogenous processing cluster based on exceeded load imbalance factor threshold determined by total completion time of multiple processing phases | |
Zhou et al. | Optimization of parallel iterated local search algorithms on graphics processing unit | |
Pei et al. | Iteration time prediction for cnn in multi-gpu platform: modeling and analysis | |
Zhang et al. | MrHeter: improving MapReduce performance in heterogeneous environments | |
Shetti et al. | Optimization of the HEFT algorithm for a CPU-GPU environment | |
Wang et al. | Towards memory-efficient allocation of CNNs on processing-in-memory architecture | |
Dzafic et al. | High performance power flow algorithm for symmetrical distribution networks with unbalanced loading | |
Wu et al. | Using hybrid MPI and OpenMP programming to optimize communications in parallel loop self-scheduling schemes for multicore PC clusters | |
CN114970294A (en) | Three-dimensional strain simulation PCG parallel optimization method and system based on Shenwei architecture | |
CN116258042A (en) | Large-scale heat transfer heterogeneous parallel simulation method based on DDM | |
Alaasam et al. | Scientific micro-workflows: where event-driven approach meets workflows to support digital twins | |
Tianyang et al. | A Survey: FPGA‐Based Dynamic Scheduling of Hardware Tasks | |
Zagaris et al. | A toolkit for parallel overset grid assembly targeting large-scale moving body aerodynamic simulations | |
Zhang et al. | EdgeNN: Efficient Neural Network Inference for CPU-GPU Integrated Edge Devices | |
CN110415162B (en) | Adaptive graph partitioning method facing heterogeneous fusion processor in big data | |
Yu et al. | GPU-based JFNG method for power system transient dynamic simulation | |
CN116185378A (en) | Optimization method of calculation graph, data processing method and related products | |
Du et al. | Enhancing Distributed In-Situ CNN Inference in the Internet of Things | |
Fang et al. | A real-time and reliable dynamic migration model for concurrent taskflow in a GPU cluster | |
Khan et al. | Analyzing the Implementation of the Newton Raphson Based Power Flow Formulation in CPU+ GPU Computing Environment | |
KR20220117433A (en) | A open memory sub-system optimized for artificial intelligence semiconductors | |
Na et al. | Scalable smartphone cluster for deep learning | |
Song | Analysis on heterogeneous computing | |
Chen et al. | SunwayURANS: 3D full-annulus URANS simulations of transonic axial compressors on Sunway TaihuLight |