Nothing Special   »   [go: up one dir, main page]

Du et al., 2020 - Google Patents

Model parallelism optimization for distributed inference via decoupled CNN structure

Du et al., 2020

View PDF
Document ID
11036145885321513058
Author
Du J
Zhu X
Shen M
Du Y
Lu Y
Xiao N
Liao X
Publication year
Publication venue
IEEE Transactions on Parallel and Distributed Systems

External Links

Snippet

It is promising to deploy CNN inference on local end-user devices for high-accuracy and time-sensitive applications. Model parallelism has the potential to provide high throughput and low latency in distributed CNN inference. However, it is non-trivial to use model …
Continue reading at drive.google.com (PDF) (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Programme initiating; Programme switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Programme initiating; Programme switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485Task life-cycle, e.g. stopping, restarting, resuming execution
    • G06F9/4856Task life-cycle, e.g. stopping, restarting, resuming execution resumption being on a different machine, e.g. task migration, virtual machine migration
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for programme control, e.g. control unit
    • G06F9/06Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • G06F9/5088Techniques for rebalancing the load in a distributed system involving task migration
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored programme computers
    • G06F15/80Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/50Computer-aided design
    • G06F17/5009Computer-aided design using simulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286Information retrieval; Database structures therefor; File system structures therefor in structured data stores

Similar Documents

Publication Publication Date Title
Du et al. Model parallelism optimization for distributed inference via decoupled CNN structure
US9038088B2 (en) Load balancing on hetrogenous processing cluster based on exceeded load imbalance factor threshold determined by total completion time of multiple processing phases
Zhou et al. Optimization of parallel iterated local search algorithms on graphics processing unit
Pei et al. Iteration time prediction for cnn in multi-gpu platform: modeling and analysis
Zhang et al. MrHeter: improving MapReduce performance in heterogeneous environments
Shetti et al. Optimization of the HEFT algorithm for a CPU-GPU environment
Wang et al. Towards memory-efficient allocation of CNNs on processing-in-memory architecture
Dzafic et al. High performance power flow algorithm for symmetrical distribution networks with unbalanced loading
Wu et al. Using hybrid MPI and OpenMP programming to optimize communications in parallel loop self-scheduling schemes for multicore PC clusters
CN114970294A (en) Three-dimensional strain simulation PCG parallel optimization method and system based on Shenwei architecture
CN116258042A (en) Large-scale heat transfer heterogeneous parallel simulation method based on DDM
Alaasam et al. Scientific micro-workflows: where event-driven approach meets workflows to support digital twins
Tianyang et al. A Survey: FPGA‐Based Dynamic Scheduling of Hardware Tasks
Zagaris et al. A toolkit for parallel overset grid assembly targeting large-scale moving body aerodynamic simulations
Zhang et al. EdgeNN: Efficient Neural Network Inference for CPU-GPU Integrated Edge Devices
CN110415162B (en) Adaptive graph partitioning method facing heterogeneous fusion processor in big data
Yu et al. GPU-based JFNG method for power system transient dynamic simulation
CN116185378A (en) Optimization method of calculation graph, data processing method and related products
Du et al. Enhancing Distributed In-Situ CNN Inference in the Internet of Things
Fang et al. A real-time and reliable dynamic migration model for concurrent taskflow in a GPU cluster
Khan et al. Analyzing the Implementation of the Newton Raphson Based Power Flow Formulation in CPU+ GPU Computing Environment
KR20220117433A (en) A open memory sub-system optimized for artificial intelligence semiconductors
Na et al. Scalable smartphone cluster for deep learning
Song Analysis on heterogeneous computing
Chen et al. SunwayURANS: 3D full-annulus URANS simulations of transonic axial compressors on Sunway TaihuLight