Nothing Special   »   [go: up one dir, main page]

Du et al., 2020 - Google Patents

Model parallelism optimization for distributed inference via decoupled CNN structure

Du et al., 2020

View PDF
Document ID
11036145885321513058
Author
Du J
Zhu X
Shen M
Du Y
Lu Y
Xiao N
Liao X
Publication year
Publication venue
IEEE Transactions on Parallel and Distributed Systems

External Links

Snippet

It is promising to deploy CNN inference on local end-user devices for high-accuracy and time-sensitive applications. Model parallelism has the potential to provide high throughput and low latency in distributed CNN inference. However, it is non-trivial to use model …
Continue reading at drive.google.com (PDF) (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Programme initiating; Programme switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Programme initiating; Programme switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485Task life-cycle, e.g. stopping, restarting, resuming execution
    • G06F9/4856Task life-cycle, e.g. stopping, restarting, resuming execution resumption being on a different machine, e.g. task migration, virtual machine migration
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • G06F9/5088Techniques for rebalancing the load in a distributed system involving task migration
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored programme computers
    • G06F15/80Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/50Computer-aided design
    • G06F17/5009Computer-aided design using simulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286Information retrieval; Database structures therefor; File system structures therefor in structured data stores

Similar Documents

Publication Publication Date Title
Du et al. Model parallelism optimization for distributed inference via decoupled CNN structure
Jia et al. CoDL: efficient CPU-GPU co-execution for deep learning inference on mobile devices.
CN104461466B (en) The method for improving calculating speed based on MPI and OpenMP Hybrid paradigm parallel computations
Pei et al. Iteration time prediction for CNN in multi-GPU platform: Modeling and analysis
Shetti et al. Optimization of the HEFT algorithm for a CPU-GPU environment
Wang et al. Towards memory-efficient allocation of CNNs on processing-in-memory architecture
Wang et al. Exploiting parallelism for CNN applications on 3D stacked processing-in-memory architecture
CN116258042A (en) Large-scale heat transfer heterogeneous parallel simulation method based on DDM
Wu et al. Using hybrid MPI and OpenMP programming to optimize communications in parallel loop self-scheduling schemes for multicore PC clusters
Yang et al. Aero: Design space exploration framework for resource-constrained cnn mapping on tile-based accelerators
Ye et al. Galaxy: A resource-efficient collaborative edge ai system for in-situ transformer inference
He et al. Gcim: a near-data processing accelerator for graph construction
Khaitan et al. Proactive task scheduling and stealing in master-slave based load balancing for parallel contingency analysis
Tianyang et al. A Survey: FPGA‐Based Dynamic Scheduling of Hardware Tasks
Zagaris et al. A toolkit for parallel overset grid assembly targeting large-scale moving body aerodynamic simulations
Wang et al. Directive-based hybrid parallel power system dynamic simulation on multi-core cpu and many-core gpu architecture
Na et al. Scalable smartphone cluster for deep learning
Yu et al. GPU-based JFNG method for power system transient dynamic simulation
Song Analysis on heterogeneous computing
CN110415162B (en) Adaptive Graph Partitioning Method for Heterogeneous Fusion Processors in Big Data
Chen et al. SunwayURANS: 3D full-annulus URANS simulations of transonic axial compressors on Sunway TaihuLight
Du et al. Enhancing Distributed In-Situ CNN Inference in the Internet of Things
CN116185378A (en) Optimization method of calculation graph, data processing method and related products
Zhang et al. An effective 2-dimension graph partitioning for work stealing assisted graph processing on multi-FPGAs
Khan et al. Analyzing the Implementation of the Newton Raphson Based Power Flow Formulation in CPU+ GPU Computing Environment