
CN113076190A - Computing method based on cooperation of CPU and GPU - Google Patents

Computing method based on cooperation of CPU and GPU Download PDF

Info

Publication number
CN113076190A
CN113076190A (application CN202110233041.5A)
Authority
CN
China
Prior art keywords
task
gpu
cpu
calculation
cpu module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110233041.5A
Other languages
Chinese (zh)
Inventor
李健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Blue Yun Polytron Technologies Inc
Original Assignee
Beijing Blue Yun Polytron Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Blue Yun Polytron Technologies Inc filed Critical Beijing Blue Yun Polytron Technologies Inc
Priority to CN202110233041.5A priority Critical patent/CN113076190A/en
Publication of CN113076190A publication Critical patent/CN113076190A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5066Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a computing method based on cooperation between a CPU and a GPU, comprising the following steps: screening out a CPU module whose free memory meets the requirement, according to the size of each CPU module's free memory; the CPU module analyzing the input computing task and determining the task division reference corresponding to that task; the CPU module dividing the computing task into a plurality of subtasks according to the even-division strategy in the task division reference; the CPU module distributing the subtasks to a corresponding number of GPU modules for parallel computation; and the CPU module receiving the computation results returned by each GPU module and merging all of them. Because each CPU module divides the computing task, distributes the pieces to GPU modules for parallel computation, and finally merges the results, multiple CPUs and multiple GPUs can complete the same computing task together.

Description

Computing method based on cooperation of CPU and GPU
Technical Field
The invention relates to the technical field of computing methods, in particular to a computing method based on cooperation of a CPU and a GPU.
Background
A GPU module (GPU for short) runs the parallel portions of an application on the graphics processing unit, achieving performance many times that of a standard CPU module (CPU for short). GPU acceleration is widely used in high-performance-computing data centers because GPUs run high-performance computing programs faster, for example computational problems in biology, physics, seismic data processing, finance, and other disciplines.
GPGPU (general-purpose computing on graphics processing units) uses a graphics processor, originally designed for graphics tasks, to perform general-purpose computations that would otherwise be handled by the central processing unit. These computations often have no relation to graphics at all. The powerful parallel processing capability and programmable pipeline of modern graphics processors allow their stream processors to handle non-graphics data. In particular, for single-instruction, multiple-data (SIMD) workloads in which the computational load far exceeds the cost of data scheduling and transfer, a general-purpose graphics processor greatly outperforms a conventional CPU.
At present, existing computing methods rarely make use of cooperative CPU-GPU computation: the CPU and the GPU usually perform their own separate functions and seldom cooperate while a program runs, so efficiency is low.
An effective solution to the problems in the related art has not been proposed yet.
Disclosure of Invention
Aiming at the technical problems in the related art, the invention provides a computing method based on cooperation between a CPU and a GPU, which can overcome the above deficiencies in the prior art.
In order to achieve the technical purpose, the technical scheme of the invention is realized as follows:
a calculation method based on cooperation of a CPU and a GPU comprises the following steps:
s1, screening out a CPU module with an idle memory meeting the requirement according to the size of the idle memory of the CPU module;
s2, analyzing the input calculation task by the CPU module, and determining a task segmentation standard corresponding to the calculation task;
s3, the CPU module divides the calculation task into a plurality of subtasks according to the uniform division strategy in the task division reference;
s4, the CPU module distributes the subtasks to a corresponding number of GPU modules for parallel computation;
s5, the CPU module receives the calculation results transmitted by each GPU module and combines all the calculation results.
Further, step S2 specifically includes:
s21, substituting the calculation task into an analysis model by the CPU module for analysis to obtain a task label of the calculation task;
s22, comparing the task label with a pre-stored task division reference table, and finding the task division reference corresponding to the task label.
Further, in S3, the CPU module generates a task state table after dividing the computing task into the plurality of subtasks.
Further, in S5, the CPU module periodically checks the task state table to detect whether all the subtasks are completed, and merges all the computation results once they are.
Further, step S4 specifically includes:
s41, the CPU module evaluates the priority of each subtask;
s42, the CPU module checks the free memories of all the GPU modules, and screens out all the GPU modules with free memories meeting the subtask requirements;
s43, the CPU module evaluates the calculation ability of all the screened GPU modules, and appoints the GPU module with corresponding calculation ability for each subtask according to the priority.
Further, in S4, the GPU module further subdivides each received subtask, allocates a subtask detail block to each of its stream processors, and computes the subtask detail blocks in parallel.
Further, step S5 specifically includes:
s51, each GPU module transmits the calculation result back to the memory of the CPU module from the memory of the GPU module;
s52, the CPU module merges all the calculation results according to the merging reference in the task dividing reference;
s53, the CPU module clears the memory of the relevant GPU module.
The invention has the following beneficial effects. Each CPU module divides the computing task, distributes the subtasks to several GPU modules for parallel computation, and finally merges the results, so that multiple CPUs and multiple GPUs can complete the same computing task together. Substituting the computing task into the analysis model yields a task label, which is compared against the task division reference table to obtain the task division reference for that task; different division references can therefore be adopted for different computing tasks, improving the accuracy of task division and hence of the computed result.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a computing method based on cooperation of a CPU and a GPU according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.
As shown in fig. 1, the calculation method based on cooperation of the CPU and the GPU according to the embodiment of the present invention includes the following steps:
s1, screening out a CPU module with an idle memory meeting the requirement according to the size of the idle memory of the CPU module;
s2, analyzing the input calculation task by the CPU module, and determining a task segmentation standard corresponding to the calculation task;
s3, the CPU module divides the calculation task into a plurality of subtasks according to the uniform division strategy in the task division reference;
s4, the CPU module distributes the subtasks to a corresponding number of GPU modules for parallel computation;
s5, the CPU module receives the calculation results transmitted by each GPU module and combines all the calculation results.
In an embodiment of the present invention, step S2 specifically includes:
s21, substituting the calculation task into an analysis model by the CPU module for analysis to obtain a task label of the calculation task;
s22, comparing the task label with a pre-stored task division reference table, and finding the task division reference corresponding to the task label.
In an embodiment of the present invention, in S3, the CPU module generates a task state table after dividing the computing task into a plurality of the subtasks.
In a specific embodiment of the present invention, in S5, the CPU module periodically checks the task state table to detect whether all the subtasks are completed, and merges all the computation results once they are.
In an embodiment of the present invention, step S4 specifically includes:
s41, the CPU module evaluates the priority of each subtask;
s42, the CPU module checks the free memories of all the GPU modules, and screens out all the GPU modules with free memories meeting the subtask requirements;
s43, the CPU module evaluates the calculation ability of all the screened GPU modules, and appoints the GFU module with corresponding calculation ability for each subtask according to the priority.
In an embodiment of the present invention, in S4, the GPU module further subdivides the received subtasks, allocates a subtask detail block to each stream processor of the GPU module, and calculates each subtask detail block in parallel.
In an embodiment of the present invention, step S5 specifically includes:
s51, each GPU module transmits the calculation result back to the memory of the CPU module from the memory of the GPU module;
s52, the CPU module merges all the calculation results according to the merging reference in the task dividing reference;
s53, the CPU module clears the memory of the relevant GPU module.
To facilitate understanding of the above-described embodiments of the present invention, a concrete usage scenario is described in detail below.
In a device with multiple CPUs and multiple GPUs, the CPUs and GPUs are typically connected through the north bridge via an AGP or PCI-E bus, and each has its own independent external memory: system memory for the CPU and video memory for the GPU. Task division distributes a computing task across threads according to the even-division strategy so that the threads complete it cooperatively. Besides general processing tasks such as running the operating system, system software, and ordinary application programs, the CPU is responsible for scheduling and dividing each parallel computing job and the corresponding GPU parallel computing jobs, while the GPU is responsible for the parallel computation itself.
The multiple CPUs communicate and compute over a bus, while the multiple GPUs exchange data and compute through a unified shared memory; a high-speed serial bus connects the GPUs and CPUs, and data is exchanged and processed between the CPU's memory and the GPUs' shared memory.
In concrete use: 1) local resources and cloud resources are collected and integrated, duplicate data is removed, a task type database is built with task type names as categories, and a task analysis model is then built on top of this database using a label-matching method.
2) Setting a task division reference, and storing the set task division reference in a chart form;
3) screening out a CPU module with an idle memory meeting the requirement for processing a computing task according to the size of the idle memory of the CPU module;
4) the CPU module substitutes the calculation task into the analysis model for analysis to obtain a task label of the calculation task, and then compares the task label with a task division reference table to find a task division reference corresponding to the task label, wherein the task division reference comprises an even division strategy and a merging reference corresponding to the calculation task;
5) the CPU module divides the computing task into a plurality of subtasks according to the even-division strategy and generates a task state table; the even-division strategy keeps the computation time of each subtask roughly equal, so the GPU modules finish their subtasks at nearly the same time and no GPU module sits idle while a time-consuming computing task is executed;
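The even-division strategy of step 5 can be illustrated with a greedy load-balancing sketch. The patent does not specify the splitting algorithm; the longest-processing-time-first heuristic below is an assumption chosen because it keeps per-GPU load nearly equal:

```python
def even_division(items, costs, n_parts):
    """Greedy even division: place each work item (heaviest first) on
    the currently lightest part, so per-part cost stays nearly equal
    and no GPU finishes long before the others."""
    parts = [[] for _ in range(n_parts)]
    loads = [0] * n_parts
    for item, cost in sorted(zip(items, costs), key=lambda p: -p[1]):
        i = loads.index(min(loads))      # lightest part so far
        parts[i].append(item)
        loads[i] += cost
    return parts, loads

parts, loads = even_division(["t1", "t2", "t3", "t4"], [4, 3, 3, 2], 2)
# loads == [6, 6]: each of the two GPUs gets an equal share of the cost
```

Equalizing estimated cost rather than item count is what prevents part of the GPU modules from idling while a slow subtask finishes.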
6) the CPU module evaluates the priority of each subtask, then checks the idle memories of all GPU modules, screens out all GPU modules of which the idle memories meet the requirements of the subtasks, then evaluates the computing power of all screened GPU modules, and appoints GPU modules with corresponding computing power for each subtask according to the priority;
7) the GPU module further subdivides the received subtasks, sub-task detailed blocks are distributed to each stream processor, and each sub-task detailed block is calculated in parallel to obtain a calculation result;
8) each GPU module transfers its computation result from its own memory back to the CPU module's memory; the CPU module periodically checks the task state table to detect whether all the subtasks are completed; once they are, the CPU module merges all the computation results according to the merging reference, and finally clears the memory of the relevant GPU modules.
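Step 8's use of the task state table can be sketched as follows; the class name, field names, and polling interval are illustrative assumptions:

```python
import time

class TaskStateTable:
    """Sketch of step 8: the CPU module polls a task state table until
    every subtask is marked done, then merges the returned results."""
    def __init__(self, names):
        self.state = {n: {"done": False, "result": None} for n in names}

    def report(self, name, result):
        """A GPU module transfers its result back to CPU memory."""
        self.state[name] = {"done": True, "result": result}

    def all_done(self):
        return all(e["done"] for e in self.state.values())

    def merge(self):
        """Merge results in subtask order (a stand-in merging reference)."""
        return [self.state[n]["result"] for n in sorted(self.state)]

table = TaskStateTable(["s1", "s2"])
table.report("s2", 20)
table.report("s1", 10)
while not table.all_done():
    time.sleep(0.01)         # periodic detection via the state table
merged = table.merge()        # merged == [10, 20]
```

In a real system the final step would also free the relevant GPU memory (e.g. a device-side buffer release), which this host-only sketch omits.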
In summary, in the above technical solution of the present invention, each CPU module divides a computing task, distributes it to a plurality of GPU modules for parallel computation, and finally merges the results, so that multiple CPUs and multiple GPUs complete the same computing task together; this architecture fully exploits the parallel processing capability of the GPU cores to achieve fast parallel computation over large data volumes. Substituting the computing task into the analysis model yields a task label, which is compared against the task division reference table to obtain the corresponding task division reference; different division references can thus be adopted for different computing tasks, improving the accuracy of task division and hence of the computed result.
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent substitutions, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A computing method based on cooperation of a CPU and a GPU is characterized by comprising the following steps:
s1, screening out a CPU module with an idle memory meeting the requirement according to the size of the idle memory of the CPU module;
s2, analyzing the input calculation task by the CPU module, and determining a task segmentation standard corresponding to the calculation task;
s3, the CPU module divides the calculation task into a plurality of subtasks according to the uniform division strategy in the task division reference;
s4, the CPU module distributes the subtasks to a corresponding number of GPU modules for parallel computation;
s5, the CPU module receives the calculation results transmitted by each GPU module and combines all the calculation results.
2. The computing method based on cooperation of the CPU and the GPU as claimed in claim 1, wherein step S2 specifically includes:
s21, substituting the calculation task into an analysis model by the CPU module for analysis to obtain a task label of the calculation task;
s22, comparing the task label with a pre-stored task division reference table, and finding the task division reference corresponding to the task label.
3. The method for computing based on cooperation of a CPU and a GPU as claimed in claim 1, wherein in S3, the CPU module generates a task state table after dividing the computing task into a plurality of the subtasks.
4. The computing method based on cooperation of the CPU and the GPU as claimed in claim 3, wherein in S5, the CPU module periodically detects whether all the subtasks are completed through the task state table, and merges all the computing results after all the subtasks are completed.
5. The computing method based on cooperation of the CPU and the GPU as claimed in claim 1, wherein step S4 specifically includes:
s41, the CPU module evaluates the priority of each subtask;
s42, the CPU module checks the free memories of all the GPU modules, and screens out all the GPU modules with free memories meeting the subtask requirements;
s43, the CPU module evaluates the calculation ability of all the screened GPU modules, and appoints the GPU module with corresponding calculation ability for each subtask according to the priority.
6. The method for computing based on cooperation of a CPU and a GPU as claimed in claim 1, wherein in S4, the GPU module further subdivides the received subtasks, allocates a sub-task detail block for each stream processor of the GPU module, and computes each sub-task detail block in parallel.
7. The computing method based on cooperation of the CPU and the GPU as claimed in claim 1, wherein step S5 specifically includes:
s51, each GPU module transmits the calculation result back to the memory of the CPU module from the memory of the GPU module;
s52, the CPU module merges all the calculation results according to the merging reference in the task dividing reference;
s53, the CPU module clears the memory of the relevant GPU module.
CN202110233041.5A 2021-02-23 2021-02-23 Computing method based on cooperation of CPU and GPU Pending CN113076190A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110233041.5A CN113076190A (en) 2021-02-23 2021-02-23 Computing method based on cooperation of CPU and GPU

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110233041.5A CN113076190A (en) 2021-02-23 2021-02-23 Computing method based on cooperation of CPU and GPU

Publications (1)

Publication Number Publication Date
CN113076190A true CN113076190A (en) 2021-07-06

Family

ID=76609755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110233041.5A Pending CN113076190A (en) 2021-02-23 2021-02-23 Computing method based on cooperation of CPU and GPU

Country Status (1)

Country Link
CN (1) CN113076190A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116739884A (en) * 2023-08-16 2023-09-12 北京蓝耘科技股份有限公司 Calculation method based on cooperation of CPU and GPU
CN116739884B (en) * 2023-08-16 2023-11-03 北京蓝耘科技股份有限公司 Calculation method based on cooperation of CPU and GPU

Similar Documents

Publication Publication Date Title
US8990827B2 (en) Optimizing data warehousing applications for GPUs using dynamic stream scheduling and dispatch of fused and split kernels
US20120256922A1 (en) Multithreaded Processor and Method for Realizing Functions of Central Processing Unit and Graphics Processing Unit
CN105487838A (en) Task-level parallel scheduling method and system for dynamically reconfigurable processor
US8799858B2 (en) Efficient execution of human machine interface applications in a heterogeneous multiprocessor environment
US10268519B2 (en) Scheduling method and processing device for thread groups execution in a computing system
WO2023082575A1 (en) Graph execution pipeline parallelism method and apparatus for neural network model computation
CN104657111A (en) Parallel computing method and device
CN113434548B (en) Spark-based large-scale data stream analysis method and system
CN113076190A (en) Computing method based on cooperation of CPU and GPU
US9965318B2 (en) Concurrent principal component analysis computation
WO2022121273A1 (en) Simt instruction processing method and device
CN113051049A (en) Task scheduling system, method, electronic device and readable storage medium
CN114356550B (en) Automatic computing resource allocation method and system for three-level parallel middleware
CN110262884B (en) Running method for multi-program multi-data-stream partition parallel in core group based on Shenwei many-core processor
CN116302504B (en) Thread block processing system, method and related equipment
CN111913816A (en) Implementation method, device, terminal and medium for clusters in GPGPU (general purpose graphics processing unit)
EP4432210A1 (en) Data processing method and apparatus, electronic device, and computer-readable storage medium
CN113407333B (en) Task scheduling method, system, GPU and equipment for Warp level scheduling
CN116107634A (en) Instruction control method and device and related equipment
CN114003359A (en) Task scheduling method and system based on elastic and durable thread block and GPU
CN113076191A (en) Cluster GPU resource scheduling system
CN117194041B (en) Parallel optimization method and system for high-performance computer
Huang et al. Solving quadratic programming problems on graphics processing unit
Wang et al. Performance optimization for CPU-GPU heterogeneous parallel system
Iliakis et al. Decoupled mapreduce for shared-memory multi-core architectures

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination