
CN113076190A - Computing method based on cooperation of CPU and GPU - Google Patents

Computing method based on cooperation of CPU and GPU Download PDF

Info

Publication number
CN113076190A
CN113076190A (application CN202110233041.5A)
Authority
CN
China
Prior art keywords
task
gpu
cpu
calculation
cpu module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110233041.5A
Other languages
Chinese (zh)
Inventor
李健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Blue Yun Polytron Technologies Inc
Original Assignee
Beijing Blue Yun Polytron Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Blue Yun Polytron Technologies Inc filed Critical Beijing Blue Yun Polytron Technologies Inc
Priority to CN202110233041.5A priority Critical patent/CN113076190A/en
Publication of CN113076190A publication Critical patent/CN113076190A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5066Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a computing method based on cooperation between a CPU and a GPU, comprising the following steps: screening out a CPU module whose free memory meets the requirement, according to the size of each CPU module's free memory; the CPU module analyzing the input computing task and determining the task division reference corresponding to that task; the CPU module dividing the computing task into a plurality of subtasks according to the even-division strategy in the task division reference; the CPU module distributing the subtasks to a corresponding number of GPU modules for parallel computation; and the CPU module receiving the computation results returned by each GPU module and merging all of them. Because each CPU module divides the computing task, distributes the pieces to GPU modules for parallel computation, and finally merges the results, multiple CPUs and multiple GPUs can complete the same computing task together.

Description

Computing method based on cooperation of CPU and GPU
Technical Field
The invention relates to the technical field of computing methods, in particular to a computing method based on cooperation of a CPU and a GPU.
Background
A GPU module (GPU for short) runs the parallel portions of an application on the graphics processing unit, achieving performance many times that of a standard CPU module (CPU for short). GPU acceleration is widely used in high-performance-computing data centers because GPUs run high-performance computing programs faster, for example computational problems in biology, physics, seismic data processing, finance, and other disciplines.
GPGPU (general-purpose computing on graphics processing units) uses a graphics processor, originally designed for graphics tasks, to perform general-purpose computations that would otherwise be handled by the central processing unit. These computations often have no relation to graphics at all. The powerful parallel processing capability and programmable pipeline of modern graphics processors allow their stream processors to handle non-graphics data. In particular, for single-instruction, multiple-data (SIMD) workloads in which the computational load far exceeds the cost of data scheduling and transfer, a general-purpose graphics processor greatly outperforms a conventional CPU.
At present, existing computing methods rarely make use of cooperative CPU-GPU computation: the CPU and the GPU usually perform their own separate functions and seldom cooperate while a program runs, so efficiency is low.
An effective solution to the problems in the related art has not been proposed yet.
Disclosure of Invention
Aiming at the technical problems in the related art, the invention provides a computing method based on cooperation between a CPU and a GPU, which can overcome the above deficiencies in the prior art.
In order to achieve the technical purpose, the technical scheme of the invention is realized as follows:
a calculation method based on cooperation of a CPU and a GPU comprises the following steps:
s1, screening out a CPU module with an idle memory meeting the requirement according to the size of the idle memory of the CPU module;
s2, analyzing the input calculation task by the CPU module, and determining a task segmentation standard corresponding to the calculation task;
s3, the CPU module divides the calculation task into a plurality of subtasks according to the uniform division strategy in the task division reference;
s4, the CPU module distributes the subtasks to a corresponding number of GPU modules for parallel computation;
s5, the CPU module receives the calculation results transmitted by each GPU module and combines all the calculation results.
Further, step S2 specifically includes:
s21, substituting the calculation task into an analysis model by the CPU module for analysis to obtain a task label of the calculation task;
s22, comparing the task label with a pre-stored task division reference table, and finding the task division reference corresponding to the task label.
Further, in S3, the CPU module generates a task state table after dividing the computing task into the plurality of subtasks.
Further, in S5, the CPU module periodically checks the task state table to detect whether all the subtasks are completed, and merges all the computation results once they are.
Further, step S4 specifically includes:
s41, the CPU module evaluates the priority of each subtask;
s42, the CPU module checks the free memories of all the GPU modules, and screens out all the GPU modules with free memories meeting the subtask requirements;
s43, the CPU module evaluates the calculation ability of all the screened GPU modules, and appoints the GPU module with corresponding calculation ability for each subtask according to the priority.
Further, in S4, the GPU module further subdivides each received subtask, allocates a subtask detail block to each of its stream processors, and computes the subtask detail blocks in parallel.
Further, step S5 specifically includes:
s51, each GPU module transmits the calculation result back to the memory of the CPU module from the memory of the GPU module;
s52, the CPU module merges all the calculation results according to the merging reference in the task dividing reference;
s53, the CPU module clears the memory of the relevant GPU module.
The invention has the following beneficial effects. Each CPU module divides the computing task, distributes the subtasks to several GPU modules for parallel computation, and finally merges the results, so that multiple CPUs and multiple GPUs can complete the same computing task together. Substituting the computing task into the analysis model yields a task label, which is compared against the task division reference table to obtain the task division reference for that task; different division references can therefore be adopted for different computing tasks, improving the accuracy of task division and hence of the computed result.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a computing method based on cooperation of a CPU and a GPU according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.
As shown in fig. 1, the calculation method based on cooperation of the CPU and the GPU according to the embodiment of the present invention includes the following steps:
s1, screening out a CPU module with an idle memory meeting the requirement according to the size of the idle memory of the CPU module;
s2, analyzing the input calculation task by the CPU module, and determining a task segmentation standard corresponding to the calculation task;
s3, the CPU module divides the calculation task into a plurality of subtasks according to the uniform division strategy in the task division reference;
s4, the CPU module distributes the subtasks to a corresponding number of GPU modules for parallel computation;
s5, the CPU module receives the calculation results transmitted by each GPU module and combines all the calculation results.
In an embodiment of the present invention, step S2 specifically includes:
s21, substituting the calculation task into an analysis model by the CPU module for analysis to obtain a task label of the calculation task;
s22, comparing the task label with a pre-stored task division reference table, and finding the task division reference corresponding to the task label.
In an embodiment of the present invention, in S3, the CPU module generates a task state table after dividing the computing task into a plurality of the subtasks.
In a specific embodiment of the present invention, in S5, the CPU module periodically checks the task state table to detect whether all the subtasks are completed, and merges all the computation results once they are.
In an embodiment of the present invention, step S4 specifically includes:
s41, the CPU module evaluates the priority of each subtask;
s42, the CPU module checks the free memories of all the GPU modules, and screens out all the GPU modules with free memories meeting the subtask requirements;
s43, the CPU module evaluates the calculation ability of all the screened GPU modules, and appoints the GFU module with corresponding calculation ability for each subtask according to the priority.
In an embodiment of the present invention, in S4, the GPU module further subdivides the received subtasks, allocates a subtask detail block to each stream processor of the GPU module, and calculates each subtask detail block in parallel.
In an embodiment of the present invention, step S5 specifically includes:
s51, each GPU module transmits the calculation result back to the memory of the CPU module from the memory of the GPU module;
s52, the CPU module merges all the calculation results according to the merging reference in the task dividing reference;
s53, the CPU module clears the memory of the relevant GPU module.
To facilitate understanding of the above-described embodiments of the present invention, a concrete usage scenario is described in detail below.
In a device with multiple CPUs and multiple GPUs, the CPUs and GPUs are typically connected through the north bridge via an AGP or PCI-E bus, and each has its own independent external memory: system memory for the CPU and video memory for the GPU. Task division distributes a computing task across threads according to the even-division strategy so that the threads complete it cooperatively. Besides general processing tasks such as running the operating system, system software, and ordinary application programs, the CPU is responsible for scheduling and dividing each parallel computing job and the corresponding GPU parallel computing jobs, while the GPU is responsible for the parallel computation itself.
The multiple CPUs communicate and compute over a bus, while the multiple GPUs exchange data and compute through a unified shared memory; a high-speed serial bus connects the GPUs and CPUs, and data is exchanged and processed between the CPU's memory and the GPUs' shared memory.
In concrete use: 1) local resources and cloud resources are collected and integrated, duplicate data is removed, a task type database is built with task type names as categories, and a task analysis model is then built on top of this database using a label-matching method.
2) Setting a task division reference, and storing the set task division reference in a chart form;
3) screening out a CPU module with an idle memory meeting the requirement for processing a computing task according to the size of the idle memory of the CPU module;
4) the CPU module substitutes the calculation task into the analysis model for analysis to obtain a task label of the calculation task, and then compares the task label with a task division reference table to find a task division reference corresponding to the task label, wherein the task division reference comprises an even division strategy and a merging reference corresponding to the calculation task;
5) the CPU module divides the computing task into a plurality of subtasks according to the even-division strategy and generates a task state table; the even-division strategy keeps the computation time of each subtask roughly equal, so the GPU modules finish their subtasks at nearly the same time and no GPU module sits idle while a time-consuming computing task is executed;
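The even-division strategy of step 5 can be illustrated with a greedy load-balancing sketch. The patent does not specify the splitting algorithm; the longest-processing-time-first heuristic below is an assumption chosen because it keeps per-GPU load nearly equal:

```python
def even_division(items, costs, n_parts):
    """Greedy even division: place each work item (heaviest first) on
    the currently lightest part, so per-part cost stays nearly equal
    and no GPU finishes long before the others."""
    parts = [[] for _ in range(n_parts)]
    loads = [0] * n_parts
    for item, cost in sorted(zip(items, costs), key=lambda p: -p[1]):
        i = loads.index(min(loads))      # lightest part so far
        parts[i].append(item)
        loads[i] += cost
    return parts, loads

parts, loads = even_division(["t1", "t2", "t3", "t4"], [4, 3, 3, 2], 2)
# loads == [6, 6]: each of the two GPUs gets an equal share of the cost
```

Equalizing estimated cost rather than item count is what prevents part of the GPU modules from idling while a slow subtask finishes.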
6) the CPU module evaluates the priority of each subtask, then checks the idle memories of all GPU modules, screens out all GPU modules of which the idle memories meet the requirements of the subtasks, then evaluates the computing power of all screened GPU modules, and appoints GPU modules with corresponding computing power for each subtask according to the priority;
7) the GPU module further subdivides the received subtasks, sub-task detailed blocks are distributed to each stream processor, and each sub-task detailed block is calculated in parallel to obtain a calculation result;
8) each GPU module transfers its computation result from its own memory back to the CPU module's memory; the CPU module periodically checks the task state table to detect whether all the subtasks are completed; once they are, the CPU module merges all the computation results according to the merging reference, and finally clears the memory of the relevant GPU modules.
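Step 8's use of the task state table can be sketched as follows; the class name, field names, and polling interval are illustrative assumptions:

```python
import time

class TaskStateTable:
    """Sketch of step 8: the CPU module polls a task state table until
    every subtask is marked done, then merges the returned results."""
    def __init__(self, names):
        self.state = {n: {"done": False, "result": None} for n in names}

    def report(self, name, result):
        """A GPU module transfers its result back to CPU memory."""
        self.state[name] = {"done": True, "result": result}

    def all_done(self):
        return all(e["done"] for e in self.state.values())

    def merge(self):
        """Merge results in subtask order (a stand-in merging reference)."""
        return [self.state[n]["result"] for n in sorted(self.state)]

table = TaskStateTable(["s1", "s2"])
table.report("s2", 20)
table.report("s1", 10)
while not table.all_done():
    time.sleep(0.01)         # periodic detection via the state table
merged = table.merge()        # merged == [10, 20]
```

In a real system the final step would also free the relevant GPU memory (e.g. a device-side buffer release), which this host-only sketch omits.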
In summary, in the above technical solution of the present invention, each CPU module divides a computing task, distributes it to a plurality of GPU modules for parallel computation, and finally merges the results, so that multiple CPUs and multiple GPUs complete the same computing task together; this architecture fully exploits the parallel processing capability of the GPU cores to achieve fast parallel computation over large data volumes. Substituting the computing task into the analysis model yields a task label, which is compared against the task division reference table to obtain the corresponding task division reference; different division references can thus be adopted for different computing tasks, improving the accuracy of task division and hence of the computed result.
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent substitutions, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A computing method based on cooperation of a CPU and a GPU is characterized by comprising the following steps:
s1, screening out a CPU module with an idle memory meeting the requirement according to the size of the idle memory of the CPU module;
s2, analyzing the input calculation task by the CPU module, and determining a task segmentation standard corresponding to the calculation task;
s3, the CPU module divides the calculation task into a plurality of subtasks according to the uniform division strategy in the task division reference;
s4, the CPU module distributes the subtasks to a corresponding number of GPU modules for parallel computation;
s5, the CPU module receives the calculation results transmitted by each GPU module and combines all the calculation results.
2. The computing method based on cooperation of the CPU and the GPU as claimed in claim 1, wherein step S2 specifically includes:
s21, substituting the calculation task into an analysis model by the CPU module for analysis to obtain a task label of the calculation task;
s22, comparing the task label with a pre-stored task division reference table, and finding the task division reference corresponding to the task label.
3. The method for computing based on cooperation of a CPU and a GPU as claimed in claim 1, wherein in S3, the CPU module generates a task state table after dividing the computing task into a plurality of the subtasks.
4. The computing method based on cooperation of the CPU and the GPU as claimed in claim 3, wherein in S5, the CPU module periodically detects whether all the subtasks are completed through the task state table, and merges all the computing results after all the subtasks are completed.
5. The computing method based on cooperation of the CPU and the GPU as claimed in claim 1, wherein step S4 specifically includes:
s41, the CPU module evaluates the priority of each subtask;
s42, the CPU module checks the free memories of all the GPU modules, and screens out all the GPU modules with free memories meeting the subtask requirements;
s43, the CPU module evaluates the calculation ability of all the screened GPU modules, and appoints the GPU module with corresponding calculation ability for each subtask according to the priority.
6. The method for computing based on cooperation of a CPU and a GPU as claimed in claim 1, wherein in S4, the GPU module further subdivides the received subtasks, allocates a sub-task detail block for each stream processor of the GPU module, and computes each sub-task detail block in parallel.
7. The computing method based on cooperation of the CPU and the GPU as claimed in claim 1, wherein step S5 specifically includes:
s51, each GPU module transmits the calculation result back to the memory of the CPU module from the memory of the GPU module;
s52, the CPU module merges all the calculation results according to the merging reference in the task dividing reference;
s53, the CPU module clears the memory of the relevant GPU module.
CN202110233041.5A 2021-02-23 2021-02-23 Computing method based on cooperation of CPU and GPU Pending CN113076190A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110233041.5A CN113076190A (en) 2021-02-23 2021-02-23 Computing method based on cooperation of CPU and GPU

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110233041.5A CN113076190A (en) 2021-02-23 2021-02-23 Computing method based on cooperation of CPU and GPU

Publications (1)

Publication Number Publication Date
CN113076190A true CN113076190A (en) 2021-07-06

Family

ID=76609755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110233041.5A Pending CN113076190A (en) 2021-02-23 2021-02-23 Computing method based on cooperation of CPU and GPU

Country Status (1)

Country Link
CN (1) CN113076190A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116739884A (en) * 2023-08-16 2023-09-12 北京蓝耘科技股份有限公司 Calculation method based on cooperation of CPU and GPU
CN116739884B (en) * 2023-08-16 2023-11-03 北京蓝耘科技股份有限公司 Calculation method based on cooperation of CPU and GPU

Similar Documents

Publication Publication Date Title
US8990827B2 (en) Optimizing data warehousing applications for GPUs using dynamic stream scheduling and dispatch of fused and split kernels
US20120256922A1 (en) Multithreaded Processor and Method for Realizing Functions of Central Processing Unit and Graphics Processing Unit
CN105487838A (en) Task-level parallel scheduling method and system for dynamically reconfigurable processor
US8799858B2 (en) Efficient execution of human machine interface applications in a heterogeneous multiprocessor environment
US10268519B2 (en) Scheduling method and processing device for thread groups execution in a computing system
WO2023082575A1 (en) Graph execution pipeline parallelism method and apparatus for neural network model computation
CN104657111A (en) Parallel computing method and device
CN113434548B (en) Spark-based large-scale data stream analysis method and system
CN113076190A (en) Computing method based on cooperation of CPU and GPU
US9965318B2 (en) Concurrent principal component analysis computation
WO2022121273A1 (en) Simt instruction processing method and device
CN113051049A (en) Task scheduling system, method, electronic device and readable storage medium
CN114356550B (en) Automatic computing resource allocation method and system for three-level parallel middleware
CN110262884B (en) Running method for multi-program multi-data-stream partition parallel in core group based on Shenwei many-core processor
CN116302504B (en) Thread block processing system, method and related equipment
CN111913816A (en) Implementation method, device, terminal and medium for clusters in GPGPU (general purpose graphics processing unit)
EP4432210A1 (en) Data processing method and apparatus, electronic device, and computer-readable storage medium
CN113407333B (en) Task scheduling method, system, GPU and equipment for Warp level scheduling
CN116107634A (en) Instruction control method and device and related equipment
CN114003359A (en) Task scheduling method and system based on elastic and durable thread block and GPU
CN113076191A (en) Cluster GPU resource scheduling system
CN117194041B (en) Parallel optimization method and system for high-performance computer
Huang et al. Solving quadratic programming problems on graphics processing unit
Wang et al. Performance optimization for CPU-GPU heterogeneous parallel system
Iliakis et al. Decoupled mapreduce for shared-memory multi-core architectures

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination