CN117407155A - Resource scheme determining method and device, storage medium and electronic equipment - Google Patents
- Publication number
- CN117407155A (application number CN202311229224.5A)
- Authority
- CN
- China
- Prior art keywords
- node
- resource
- resource scheme
- list
- scheme
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
- G06F9/5005—Allocation of resources to service a request
- G06F9/5027—Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The application discloses a resource scheme determining method and device, a storage medium, and electronic equipment, and relates to the field of computer technologies. The method comprises the following steps: constructing a single-node resource scheme list and a cross-node resource scheme list of a target task; calculating performance values of the different resource schemes in the single-node resource scheme list and the cross-node resource scheme list; and determining a target resource scheme of the target task according to the performance values. In this way, the resource scheme with the highest operation efficiency can be selected from a plurality of resource schemes meeting the deadline requirement, and resource performance is fully exploited.
Description
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and apparatus for determining a resource scheme, a storage medium, and an electronic device.
Background
A graphics processor (Graphics Processing Unit, GPU), also known as a display core, vision processor, or display chip, is designed for computationally intensive, highly parallel computation. A GPU contains various resources, and if any one of these resources is allocated in an unbalanced manner while the GPU executes a task, GPU resources may be wasted, reducing the GPU's computing performance. Therefore, during GPU operation, each GPU resource needs to be scheduled in a balanced manner, so that every GPU resource is in a resource-balanced state and the GPU as a whole runs in a resource-balanced state, thereby improving computing performance. How to determine a high-performance resource scheme is thus a problem to be solved.
Disclosure of Invention
In view of the above problems, the present application provides a method, an apparatus, a storage medium, and an electronic device for determining a resource scheme, which solve the problem of how to determine a high-performance resource scheme.
In order to solve the technical problems, the application provides the following scheme:
in a first aspect, the present application provides a method for determining a resource scheme, where the method includes: constructing a single-node resource scheme list and a cross-node resource scheme list of the target task; calculating performance values of different resource schemes in the single-node resource scheme list and the cross-node resource scheme list; and determining a target resource scheme of the target task according to the performance value.
With reference to the first aspect, in one possible implementation manner, the maximum number of idle GPUs on a single node in the GPU cluster and the number of idle GPUs in the GPU cluster are determined; when only one node in the GPU cluster has idle GPUs, the single-node resource scheme list is constructed according to the maximum number of idle GPUs on that single node; and when a plurality of nodes in the GPU cluster have idle GPUs, the cross-node resource scheme list is constructed according to the number of idle GPUs in the GPU cluster.
With reference to the first aspect, in another possible implementation manner, calculating a task end time of the target task under different resource schemes in the single-node resource scheme list and the cross-node resource scheme list; and calculating the performance value of the target task under the resource scheme corresponding to the task ending time according to the ending time indicated by the target task, the task ending time and the total quantity of the GPUs in the GPU cluster.
With reference to the first aspect, in another possible implementation manner, a resource scheme with the largest performance value in the list of single-node resource schemes is used as a single-node target resource scheme; and/or taking the resource scheme with the maximum performance value in the cross-node resource scheme list as the cross-node target resource scheme.
With reference to the first aspect, in another possible implementation manner, when the cross-node target resource scheme exists and the number of idle GPUs in a single node is smaller than the total number of GPUs in the GPU cluster, the cross-node target resource scheme is taken as the target resource scheme; when the cross-node target resource scheme does not exist, the single-node target resource scheme is taken as the target resource scheme.
With reference to the first aspect, in another possible implementation manner, the resource schemes in the single-node resource scheme list and the cross-node resource scheme list are filtered.
With reference to the first aspect, in another possible implementation manner, 1 is subtracted from the total number of GPUs in the GPU cluster to obtain a first value; and resource schemes in the single-node resource scheme list and the cross-node resource scheme list whose inter-GPU communication time is not less than the product of the first value and the computation time on a single GPU are deleted.
In a second aspect, the present application provides a resource scheme determining apparatus, including: the device comprises a construction module, a calculation module and a determination module.
And the construction module is used for constructing a single-node resource scheme list and a cross-node resource scheme list of the target task.
And the calculating module is used for calculating the performance values of different resource schemes in the single-node resource scheme list and the cross-node resource scheme list.
And the determining module is used for determining a target resource scheme of the target task according to the performance value.
With reference to the second aspect, in one possible implementation manner, the construction module is specifically configured to determine the maximum number of idle GPUs on a single node in the GPU cluster and the number of idle GPUs in the GPU cluster; construct the single-node resource scheme list according to the maximum number of idle GPUs on that single node when only one node in the GPU cluster has idle GPUs; and construct the cross-node resource scheme list according to the number of idle GPUs in the GPU cluster when a plurality of nodes in the GPU cluster have idle GPUs.
With reference to the second aspect, in another possible implementation manner, the calculating module is specifically configured to calculate a task end time of the target task under different resource schemes in the single-node resource scheme list and the cross-node resource scheme list; and calculating the performance value of the target task under the resource scheme corresponding to the task ending time according to the ending time indicated by the target task, the task ending time and the total quantity of the GPUs in the GPU cluster.
With reference to the second aspect, in another possible implementation manner, the determining module is specifically configured to use a resource scheme with a maximum performance value in the list of single-node resource schemes as a single-node target resource scheme; and/or taking the resource scheme with the maximum performance value in the cross-node resource scheme list as the cross-node target resource scheme.
With reference to the second aspect, in another possible implementation manner, the determining module is further configured to use the cross-node target resource scheme as the target resource scheme when the cross-node target resource scheme exists and the number of idle GPUs in the single node is less than the total number of GPUs in the GPU cluster; when the cross-node target resource scheme does not exist, the single-node target resource scheme is taken as the target resource scheme.
With reference to the second aspect, in another possible implementation manner, the construction module is further configured to filter resource schemes in the single-node resource scheme list and the cross-node resource scheme list.
With reference to the second aspect, in another possible implementation manner, the construction module is further configured to subtract 1 from the total number of GPUs in the GPU cluster to obtain a first value, and delete resource schemes in the single-node resource scheme list and the cross-node resource scheme list whose inter-GPU communication time is not less than the product of the first value and the computation time on a single GPU.
In order to achieve the above object, according to a third aspect of the present application, there is provided a storage medium including a stored program, wherein when the program runs, a device in which the storage medium is located is controlled to execute the resource scheme determining method of the first aspect.
To achieve the above object, according to a fourth aspect of the present application, there is provided an electronic device including at least one processor, and at least one memory and a bus connected to the processor; the processor and the memory communicate with each other through the bus; and the processor is configured to invoke program instructions in the memory to perform the resource scheme determining method of the first aspect.
By means of the technical scheme, the technical scheme provided by the application has the following advantages:
according to the resource scheme determining method, the device, the storage medium and the electronic equipment, the performance values of different resource schemes in the single-node resource scheme list and the cross-node resource scheme list are calculated, and the target resource scheme of the target task is determined according to the performance values. The resource scheme with highest operation efficiency can be selected from a plurality of resource schemes meeting the deadline requirement, and the resource performance is fully exerted.
In addition, all resource schemes are divided into single-node resource schemes and cross-node resource schemes, which are filtered to remove cross-node resource schemes whose running time is longer than that of a single-node resource scheme using the same number of GPUs. In this way, resource schemes capable of effective distributed training are retained, and resource waste is reduced.
The foregoing description is only an overview of the technical solutions of the present application. In order that the technical means of the present application may be understood more clearly and implemented according to the content of the specification, and in order that the above and other objects, features, and advantages of the present application may be more readily apparent, specific embodiments of the present application are set forth below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
fig. 1 shows a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a resource scheme determining method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a resource scheme determining apparatus according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The words "first", "second", and the like in the embodiments of the present application do not have a logical or time-series dependency, and are not limited in number and execution order. It will be further understood that, although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by the terms. These terms are only used to distinguish one element from another element.
The term "at least one" in the embodiments of the present application means one or more, and the term "plurality" in the embodiments of the present application means two or more.
It should also be understood that the term "if" may be interpreted as "when" or "upon", or as "in response to a determination" or "in response to detection". Similarly, the phrase "if a [stated condition or event] is detected" may be interpreted, depending on the context, as "when a [stated condition or event] is determined", "in response to a determination", "when a [stated condition or event] is detected", or "in response to detection of a [stated condition or event]".
To facilitate an understanding of the aspects of the present application, a brief description of related concepts is first presented below.
Graphics processor (Graphics Processing Unit, GPU), also known as display core, vision processor, display chip, is a microprocessor that is dedicated to image and graphics related operations on personal computers, workstations, gaming machines, and some mobile devices (e.g., tablet computers, smartphones, etc.).
The foregoing is a description of technical terms related to the embodiments of the present application, and is not repeated herein below.
As described in the background, in recent years, GPUs have achieved continuous high-speed development in hardware architecture, and have evolved into highly parallel, multi-threaded, and multi-processing core processors with powerful computing capabilities, which employ a single instruction multi-thread (Single Instruction Multiple Thread, SIMT) architecture that differs from the central processing unit (Central Processing Unit, CPU), increasing the flexibility of programming.
GPUs are dedicated to solving problems that can be expressed as data-parallel computations, i.e., computations in which most data elements follow the same data path, with extremely high computational density (the ratio of mathematical operations to memory operations), which can hide memory access latency. Because of their high-performance computing power, GPUs are currently in wide use in deep learning, machine learning, and similar fields. With the continuous development and popularization of artificial intelligence technology, training of AI models increasingly depends on large numbers of GPU devices. How to schedule each GPU resource in a balanced manner after such devices form a cluster, so that every GPU resource, and thus the cluster as a whole, is in a resource-balanced state and GPU computing performance is improved, has become a research hotspot.
In view of this, an embodiment of the present application provides a method for determining a resource scheme, where after a GPU cluster of the present application receives a task submitted by a user and including a task deadline, a plurality of resource schemes are established for the task, and an optimal resource scheme is determined from the plurality of resource schemes, so that the task can run on a GPU indicated by the optimal resource scheme.
The specific method comprises the following steps: constructing a single-node resource scheme list and a cross-node resource scheme list of the target task; calculating performance values of different resource schemes in the single-node resource scheme list and the cross-node resource scheme list; and determining a target resource scheme of the target task according to the performance values. In this way, the resource scheme with the highest operation efficiency can be selected from a plurality of resource schemes meeting the deadline requirement, and resource performance is fully exploited.
In addition, all resource schemes are divided into single-node resource schemes and cross-node resource schemes, which are filtered to remove cross-node resource schemes whose running time is longer than that of a single-node resource scheme using the same number of GPUs. In this way, resource schemes capable of effective distributed training are retained, and resource waste is reduced.
The embodiment of the application also provides a resource scheme determining device which can be used for executing the resource scheme determining method. Alternatively, the resource scheme determining device may be an electronic device with data processing capability, or a functional module in the electronic device, which is not limited thereto.
For example, the electronic device may be a server, which may be a single server or a server cluster composed of a plurality of servers. As another example, the electronic device may be a terminal device such as a cell phone, tablet, desktop computer, laptop, handheld computer, notebook, ultra-mobile personal computer (Ultra-Mobile Personal Computer, UMPC), netbook, personal digital assistant (Personal Digital Assistant, PDA), augmented reality (Augmented Reality, AR) device, or virtual reality (Virtual Reality, VR) device. For another example, the electronic device may also be a video recording device, a video monitoring device, or the like. The specific form of the electronic device is not particularly limited in the present application.
Taking the above electronic device as an example, fig. 1 shows a hardware structure of an electronic device 100 provided in the present application.
As shown in fig. 1, the electronic device 100 includes a processor 110, a communication line 120, and a communication interface 130.
Optionally, the electronic device 100 may also include a memory 140. The processor 110, the memory 140, and the communication interface 130 may be connected by a communication line 120.
The processor 110 may be a central processing unit (Central Processing Unit, CPU), a general-purpose processor, a network processor (Network Processor, NP), a digital signal processor (Digital Signal Processing, DSP), a microprocessor, a microcontroller, a programmable logic device (Programmable Logic Device, PLD), or any combination thereof. The processor 110 may also be any other apparatus having a processing function, such as a circuit, a device, or a software module, without limitation.
In one example, processor 110 may include one or more CPUs, such as CPU0 and CPU1 in fig. 1.
As an alternative implementation, electronic device 100 includes multiple processors, e.g., processor 170 may be included in addition to processor 110. Communication line 120 is used to communicate information between various components included in electronic device 100.
A communication interface 130 for communicating with other devices or other communication networks. The other communication network may be an ethernet, a radio access network (Radio Access Network, RAN), a wireless local area network (Wireless Local Area Networks, WLAN), etc. The communication interface 130 may be a module, a circuit, a transceiver, or any device capable of enabling communication.
Memory 140 for storing instructions. Wherein the instructions may be computer programs.
The memory 140 may be, but is not limited to, a read-only memory (Read-Only Memory, ROM) or other type of static storage device capable of storing static information and/or instructions, a random access memory (Random Access Memory, RAM) or other type of dynamic storage device capable of storing information and/or instructions, an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, and so on.
It should be noted that the memory 140 may exist separately from the processor 110 or may be integrated with the processor 110. Memory 140 may be used to store instructions or program code or some data or the like. The memory 140 may be located in the electronic device 100 or may be located outside the electronic device 100, without limitation.
The processor 110 is configured to execute instructions stored in the memory 140 to implement the resource scheme determining method provided in the following embodiments of the present application. For example, when the electronic device 100 is a terminal or a chip in a terminal, the processor 110 may execute instructions stored in the memory 140 to implement the steps performed in the embodiments described below.
As an alternative implementation, the electronic device 100 further comprises an output device 150 and an input device 160. The output device 150 may be a device capable of outputting data of the electronic apparatus 100 to a user, such as a display screen, a speaker, or the like. The input device 160 is a device capable of inputting data to the electronic apparatus 100, such as a keyboard, a mouse, a microphone, or a joystick.
It should be noted that the structure shown in fig. 1 does not constitute a limitation of the electronic device, which may include more or fewer components than shown in fig. 1, combine some components, or use a different arrangement of components.
The resource scheme determining device and the application scenario described in the embodiments of the present application are for more clearly describing the technical scheme of the embodiments of the present application, and do not constitute a limitation to the technical scheme provided in the embodiments of the present application, and as a person of ordinary skill in the art can know, with the evolution of the resource scheme determining device and the appearance of a new service scenario, the technical scheme provided in the embodiments of the present application is equally applicable to similar technical problems.
Next, a detailed description will be given of a resource scheme determination method with reference to the drawings. Fig. 2 is a flow chart of a method for determining a resource scheme provided in the present application. The method specifically comprises the following steps:
step 210, a single node resource scheme list and a cross node resource scheme list of the target task are constructed.
After receiving a target task which is submitted by a user and contains task deadline, the GPU cluster establishes a plurality of resource schemes for the target task, and determines an optimal resource scheme from the plurality of resource schemes so that the target task can run on the GPU indicated by the optimal resource scheme.
One or more GPUs may be deployed on one computer device (node). When the resource scheme indicates that the GPU devices in the GPU cluster are all on the same computer device, the resource scheme at the moment is a single-node resource scheme. When the resource scheme indicates that the GPU devices in the GPU cluster are on multiple different computer devices, the resource scheme at this time is a cross-node resource scheme. Thus, in the present embodiment, the resource scheme list may be divided into a single node resource scheme list and a cross node resource scheme list.
Specifically, first, a node with an idle GPU is determined, the number of idle GPUs on the node, and the number of nodes with idle GPUs in the GPU cluster. And secondly, determining the maximum value of the idle GPUs of the single node in the GPU cluster, namely, which node in the GPU cluster has the largest number of idle GPUs and the number of all idle GPUs in the GPU cluster.
When only one node in the GPU cluster has idle GPUs, namely the number of the nodes with the idle GPUs in the GPU cluster is 1, a single-node resource scheme list is built according to the maximum value of the idle GPUs of the single node. The single-node resource scheme can occupy one idle GPU on a single node, and can also occupy all idle GPUs (single-node idle GPU maximum value) on the single node, so that all single-node resource schemes from occupying one idle GPU on the single node to occupying all idle GPUs on the single node are added into the single-node resource scheme list.
When a plurality of nodes in the GPU cluster have idle GPUs, i.e., the number of nodes with idle GPUs in the GPU cluster is greater than 1, the cross-node resource scheme list is constructed according to the number of idle GPUs in the GPU cluster. Under a cross-node resource scheme, a task may occupy at most all the idle GPUs in the GPU cluster. Therefore, all cross-node resource schemes, from occupying at least two idle GPUs across nodes to occupying all idle GPUs in the GPU cluster, are added into the cross-node resource scheme list.
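The two construction branches above can be sketched as follows. This is a simplified illustration rather than the patent's implementation: `idle_gpus_per_node` is an assumed input shape mapping a node id to its idle-GPU count, and a scheme is represented only by the number of GPUs it occupies.

```python
def build_scheme_lists(idle_gpus_per_node):
    """Build the single-node and cross-node resource scheme lists.

    Follows the text literally: if exactly one node has idle GPUs, only
    single-node schemes are generated; if several nodes do, cross-node
    schemes are generated (occupying from 2 up to all idle GPUs).
    """
    nodes_with_idle = {n: k for n, k in idle_gpus_per_node.items() if k > 0}
    single_node_max = max(nodes_with_idle.values(), default=0)  # per-node maximum
    total_idle = sum(nodes_with_idle.values())                  # cluster-wide count

    single_node_schemes, cross_node_schemes = [], []
    if len(nodes_with_idle) == 1:
        # Occupy 1 .. all idle GPUs on that single node.
        single_node_schemes = list(range(1, single_node_max + 1))
    elif len(nodes_with_idle) > 1:
        # Occupy at least 2 .. all idle GPUs in the cluster.
        cross_node_schemes = list(range(2, total_idle + 1))
    return single_node_schemes, cross_node_schemes
```

For example, a cluster with one node holding 3 idle GPUs yields single-node schemes occupying 1, 2, or 3 GPUs and no cross-node schemes.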
In addition, there is bandwidth sensitivity in the distributed deep learning process. That is, due to the difference of bandwidths between the GPU devices, when two different resource schemes adopt the same number of GPU devices, the running time of the task under the two resource schemes is different due to the different layout modes of the GPU devices.
When the GPU clusters indicated by the resource scheme are all on the same node, the GPU devices directly perform physical communication, so that the bandwidth speed is the direct connection bandwidth between the GPU devices. When the GPU clusters indicated by the resource scheme are on a plurality of nodes, the GPU devices need to communicate with each other through a network, so the bandwidth speed is the network bandwidth between the nodes.
In the case where the total number of GPUs in the GPU cluster and the number of parameters are unchanged, the communication time between GPU devices increases as the bandwidth decreases. When the bandwidth between GPU devices is insufficient to support distributed training, the running time of multi-node distributed training may become longer than that of single-node training. However, in the field of artificial intelligence, the run time of multi-node distributed training is required to be shorter than that of single-node training. The run time of the multi-node distributed training is (Sd/(Sb*N))*(T1+T2), and the run time of the single-node training is (Sd/Sb)*T1, wherein T1 represents the computation time of running the task on a single GPU device, T2 represents the communication time between GPU devices, Sd represents the data quantity of the target task, Sb represents the data quantity of one iteration round, and N represents the total number of GPUs in the GPU cluster.
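Requiring the multi-node run time to be below the single-node run time yields the filter condition used below. Written out with the symbols just defined (a reconstruction consistent with the inequality stated in the text):

```latex
\frac{S_d}{S_b \cdot N}\,(T_1 + T_2) \;<\; \frac{S_d}{S_b}\,T_1
\;\Longrightarrow\; T_1 + T_2 \;<\; N \cdot T_1
\;\Longrightarrow\; T_2 \;<\; (N-1)\,T_1
```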
From the above derivation, T2 < (N-1) * T1 is obtained. Therefore, when a resource scheme is a cross-node resource scheme, 1 is subtracted from the total number of GPUs in the GPU cluster to obtain a first value; the resource schemes in the single-node resource scheme list and the cross-node resource scheme list in which the communication time between GPUs is not less than the product of the first value and the computation time on a single GPU are then deleted. This filters out resource schemes that, for the same number of GPUs, have a longer cross-node run time than the single-node run time. In this way, only the resource schemes capable of effective distributed training are retained, and resource waste is reduced.
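A minimal sketch of this filter. Each scheme is assumed to carry estimated per-GPU computation time (T1) and communication time (T2) under the illustrative field names t_compute and t_comm, which are not from the patent text.

```python
def filter_schemes(schemes, n_total_gpus):
    """Drop schemes that cannot train effectively across nodes.

    schemes: list of dicts with 't_compute' (T1, computation time on
    a single GPU) and 't_comm' (T2, inter-GPU communication time).
    Keeps only schemes satisfying T2 < (N - 1) * T1.
    """
    first_value = n_total_gpus - 1  # N - 1, the "first value"
    return [s for s in schemes
            if s["t_comm"] < first_value * s["t_compute"]]
```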
Step 220, calculating performance values of different resource schemes in the single-node resource scheme list and the cross-node resource scheme list.
Because some models with many parameters are strongly bandwidth-sensitive, they may obtain a training speed-up on a single node yet fail to obtain one under a cross-node resource scheme with the same number of GPUs, since the communication overhead is not fully offset. Therefore, the bandwidth differences of different models under single-node and cross-node resource layouts must be considered: selecting a single-node resource scheme purely to minimize the completion time of the current task can cause queuing delays for subsequently arriving tasks and reduce resource utilization.
In order to measure the performance of a task under different resource schemes, select the resource scheme with the highest running efficiency from the multiple resource schemes that meet the deadline requirement, and make full use of resource performance, the performance values of the resource schemes must be calculated after the single-node and cross-node resource schemes are obtained, so that the resource scheme with the best performance can be determined among them.
Specifically, the task end time of the target task under the different resource schemes in the single-node resource scheme list and the cross-node resource scheme list is calculated; then, according to the end time indicated by the target task, the task end time, and the total number of GPUs in the GPU cluster, the performance value of the target task under the resource scheme corresponding to that task end time is calculated.
The fewer resources a resource scheme uses and the earlier the task can finish, the higher the resource performance it delivers.
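The patent does not give a concrete formula for the performance value; it states only that the value depends on the indicated end time (deadline), the predicted task end time, and the GPU counts, and that fewer GPUs plus an earlier finish score higher. One hypothetical scoring function consistent with that description:

```python
def performance_value(deadline, task_end_time, gpus_used, n_total_gpus):
    """Hypothetical performance score for a resource scheme.

    This concrete formula is an illustrative assumption, not the
    patent's: schemes that miss the deadline are rejected outright,
    and among the rest, earlier finishes and fewer GPUs score higher.
    """
    if task_end_time > deadline:
        return float("-inf")            # misses the deadline: reject
    slack = deadline - task_end_time    # earlier finish -> larger slack
    frugality = 1.0 - gpus_used / n_total_gpus  # fewer GPUs -> higher
    return slack * (1.0 + frugality)
```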
And 230, determining a target resource scheme of the target task according to the performance value.
And taking the resource scheme with the maximum performance value in the single-node resource scheme list as a single-node target resource scheme. And taking the resource scheme with the maximum performance value in the cross-node resource scheme list as a cross-node target resource scheme.
Further, when the cross-node target resource scheme exists and the number of idle GPUs in a single node is smaller than the total number of GPUs in the GPU cluster, the cross-node target resource scheme is used as the target resource scheme. When the cross-node target resource scheme does not exist, the single-node target resource scheme is taken as the target resource scheme.
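The selection rule above can be sketched as follows; the function and parameter names are illustrative assumptions.

```python
def choose_target_scheme(single_best, cross_best,
                         max_single_node_idle, n_total_gpus):
    """Pick the final target scheme per the selection rule.

    single_best / cross_best: best-scoring scheme from each list,
    or None when that list is empty.
    """
    # Prefer the cross-node scheme when one exists and the number of
    # idle GPUs on any single node is smaller than the cluster total.
    if cross_best is not None and max_single_node_idle < n_total_gpus:
        return cross_best
    # Otherwise fall back to the best single-node scheme.
    return single_best
```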
In summary, the present application calculates performance values of different resource schemes in a single node resource scheme list and a cross node resource scheme list, and determines a target resource scheme of a target task according to the performance values. The resource scheme with highest operation efficiency can be selected from a plurality of resource schemes meeting the deadline requirement, and the resource performance is fully exerted.
In addition, all the resource schemes are divided into single-node resource schemes and cross-node resource schemes, and both lists are filtered to remove the resource schemes whose cross-node run time, for the same number of GPUs, is longer than the single-node run time. In this way, only the resource schemes capable of effective distributed training are retained, and resource waste is reduced.
It will be appreciated that, in order to implement the functions of the above embodiments, the computer device includes corresponding hardware structures and/or software modules that perform the respective functions. Those skilled in the art will readily appreciate that the elements and method steps of the examples described in connection with the embodiments disclosed herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is implemented as hardware or as computer software driving hardware depends on the particular application scenario and the design constraints imposed on the solution.
Further, as an implementation of the method embodiment shown in fig. 2, an embodiment of the present application provides a resource scheme determining device, where the device is configured to match a resource scheme for a task to be executed. The embodiment of the device corresponds to the foregoing method embodiment, and for convenience of reading, details of the foregoing method embodiment are not described one by one in this embodiment, but it should be clear that the device in this embodiment can correspondingly implement all the details of the foregoing method embodiment. As shown in fig. 3, the resource scheme determining apparatus 300 includes: a construction module 310, a calculation module 320, and a determination module 330.
A construction module 310, configured to construct a list of single-node resource schemes and a list of cross-node resource schemes for the target task.
A calculating module 320, configured to calculate performance values of different resource schemes in the single node resource scheme list and the cross node resource scheme list.
A determining module 330, configured to determine a target resource scheme of the target task according to the performance value.
Further, as shown in fig. 3, the construction module 310 is specifically configured to determine a maximum value of idle GPUs of a single node in the GPU cluster and a number of idle GPUs in the GPU cluster; when only one node in the GPU cluster has idle GPU, constructing a single-node resource scheme list according to the maximum value of the idle GPU of the single node; when a plurality of nodes in the GPU cluster have idle GPUs, a cross-node resource scheme list is constructed according to the number of the idle GPUs in the GPU cluster.
Further, as shown in fig. 3, the calculating module 320 is specifically configured to calculate a task end time of the target task under different resource schemes in the single-node resource scheme list and the cross-node resource scheme list; and calculating the performance value of the target task under the resource scheme corresponding to the task ending time according to the ending time indicated by the target task, the task ending time and the total quantity of the GPUs in the GPU cluster.
Further, as shown in fig. 3, the determining module 330 is specifically configured to use the resource scheme with the largest performance value in the list of single-node resource schemes as the single-node target resource scheme; and/or taking the resource scheme with the maximum performance value in the cross-node resource scheme list as the cross-node target resource scheme.
Further, as shown in fig. 3, the determining module 330 is further configured to use the cross-node target resource scheme as the target resource scheme when the cross-node target resource scheme exists and the number of idle GPUs in the single node is smaller than the total number of GPUs in the GPU cluster; when the cross-node target resource scheme does not exist, the single-node target resource scheme is taken as the target resource scheme.
Further, as shown in fig. 3, the construction module 310 is further configured to filter the resource schemes in the single-node resource scheme list and the cross-node resource scheme list.
Further, as shown in fig. 3, the construction module 310 is further configured to subtract 1 from the total number of GPUs in the GPU cluster as a first value; and delete the resource schemes in the single-node resource scheme list and the cross-node resource scheme list in which the communication time between GPUs is not less than the product of the first value and the calculation time on a single GPU.
The embodiment of the application provides a storage medium, on which a program is stored, which when executed by a processor, implements the resource scheme determination method.
The embodiment of the application provides a processor, which is used for running a program, wherein the program runs to execute the resource scheme determining method.
The present application also provides a computer program product adapted to perform, when executed on a data processing device, a program initialized with the method steps of: constructing a single-node resource scheme list and a cross-node resource scheme list of the target task; calculating performance values of different resource schemes in the single-node resource scheme list and the cross-node resource scheme list; and determining a target resource scheme of the target task according to the performance value.
Further, determining the maximum value of the idle GPUs of the single node in the GPU cluster and the number of the idle GPUs in the GPU cluster; when only one node in the GPU cluster has idle GPU, constructing a single-node resource scheme list according to the maximum value of the idle GPU of the single node; when a plurality of nodes in the GPU cluster have idle GPUs, a cross-node resource scheme list is constructed according to the number of the idle GPUs in the GPU cluster.
Further, calculating task end time of the target task under different resource schemes in the single-node resource scheme list and the cross-node resource scheme list; and calculating the performance value of the target task under the resource scheme corresponding to the task ending time according to the ending time indicated by the target task, the task ending time and the total quantity of the GPUs in the GPU cluster.
Further, the resource scheme with the largest performance value in the single-node resource scheme list is used as a single-node target resource scheme; and/or taking the resource scheme with the maximum performance value in the cross-node resource scheme list as the cross-node target resource scheme.
Further, when the cross-node target resource scheme exists and the number of idle GPUs in a single node is smaller than the total number of GPUs in the GPU cluster, the cross-node target resource scheme is used as the target resource scheme; when the cross-node target resource scheme does not exist, the single-node target resource scheme is taken as the target resource scheme.
Further, the resource schemes in the single-node resource scheme list and the cross-node resource scheme list are filtered.
Further, 1 is subtracted from the total number of GPUs in the GPU cluster as a first value; and the resource schemes in the single-node resource scheme list and the cross-node resource scheme list in which the communication time between GPUs is not less than the product of the first value and the calculation time on a single GPU are deleted.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, the device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip. Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises an element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.
Claims (10)
1. A method for determining a resource scheme, the method comprising:
constructing a single-node resource scheme list and a cross-node resource scheme list of the target task;
calculating performance values of different resource schemes in the single-node resource scheme list and the cross-node resource scheme list;
and determining a target resource scheme of the target task according to the performance value.
2. The method of claim 1, wherein constructing the list of single-node resource schemes and the list of cross-node resource schemes for the target task comprises:
determining a single-node idle Graphics Processing Unit (GPU) maximum value in a GPU cluster and the number of idle GPUs in the GPU cluster;
when only one node in the GPU cluster has idle GPU, constructing the single-node resource scheme list according to the maximum value of the idle GPU of the single node;
when a plurality of nodes in the GPU cluster have idle GPUs, the cross-node resource scheme list is constructed according to the number of the idle GPUs in the GPU cluster.
3. The method of claim 2, wherein calculating performance values for different resource schemes in the list of single-node resource schemes and the list of cross-node resource schemes comprises:
calculating task ending time of the target task under different resource schemes in the single-node resource scheme list and the cross-node resource scheme list;
and calculating the performance value of the target task under a resource scheme corresponding to the task ending time according to the ending time indicated by the target task, the task ending time and the total quantity of GPUs in the GPU cluster.
4. A method according to claim 3, wherein determining a target resource scheme for the target task from the performance value comprises:
taking the resource scheme with the maximum performance value in the single-node resource scheme list as a single-node target resource scheme; and/or
And taking the resource scheme with the maximum performance value in the cross-node resource scheme list as a cross-node target resource scheme.
5. The method according to claim 4, wherein the method further comprises:
when the cross-node target resource scheme exists and the number of idle GPUs in a single node is smaller than the total number of GPUs in the GPU cluster, taking the cross-node target resource scheme as the target resource scheme;
and when the cross-node target resource scheme does not exist, taking the single-node target resource scheme as the target resource scheme.
6. The method of claim 1, wherein after constructing the list of single node resource schemes and the list of cross node resource schemes for the target task, the method further comprises:
and filtering the resource schemes in the single-node resource scheme list and the cross-node resource scheme list.
7. The method of claim 6, wherein filtering the resource schemes in the list of single-node resource schemes and the list of cross-node resource schemes comprises:
subtracting 1 from the total number of GPUs in the GPU cluster as a first value;
and deleting the resource schemes in the single-node resource scheme list and the cross-node resource scheme list, wherein the communication time between the GPUs is not less than the product of the first value and the calculation time on the single GPU.
8. A resource scheme determining apparatus, the apparatus comprising:
the construction module is used for constructing a single-node resource scheme list and a cross-node resource scheme list of the target task;
the calculation module is used for calculating the performance values of different resource schemes in the single-node resource scheme list and the cross-node resource scheme list;
and the determining module is used for determining a target resource scheme of the target task according to the performance value.
9. A storage medium comprising a stored program, wherein the program, when run, controls a device in which the storage medium is located to perform the resource scheme determination method according to any one of claims 1-7.
10. An electronic device comprising at least one processor, and at least one memory, bus coupled to the processor; the processor and the memory complete communication with each other through the bus; the processor is configured to invoke program instructions in the memory to perform the resource scheme determination method according to any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311229224.5A CN117407155A (en) | 2023-09-22 | 2023-09-22 | Resource scheme determining method and device, storage medium and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311229224.5A CN117407155A (en) | 2023-09-22 | 2023-09-22 | Resource scheme determining method and device, storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117407155A true CN117407155A (en) | 2024-01-16 |
Family
ID=89486217
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311229224.5A Pending CN117407155A (en) | 2023-09-22 | 2023-09-22 | Resource scheme determining method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117407155A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111736987A (en) * | 2020-05-29 | 2020-10-02 | 山东大学 | Task scheduling method based on GPU space resource sharing |
CN114116149A (en) * | 2020-08-25 | 2022-03-01 | 华为技术有限公司 | Task scheduling method, device, equipment and medium |
CN114647515A (en) * | 2022-04-12 | 2022-06-21 | 杭州电子科技大学 | GPU cluster-oriented dynamic resource scheduling method |
US20220276899A1 (en) * | 2021-07-07 | 2022-09-01 | Beijing Baidu Netcom Science Technology Co., Ltd. | Resource scheduling method, device, and storage medium |
CN115599533A (en) * | 2021-07-07 | 2023-01-13 | 腾讯科技(深圳)有限公司(Cn) | Task processing method, device, equipment and storage medium |
CN115934362A (en) * | 2023-02-27 | 2023-04-07 | 北京大学 | Deep learning-oriented server non-perception computing cluster scheduling method and product |
-
2023
- 2023-09-22 CN CN202311229224.5A patent/CN117407155A/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111736987A (en) * | 2020-05-29 | 2020-10-02 | 山东大学 | Task scheduling method based on GPU space resource sharing |
CN114116149A (en) * | 2020-08-25 | 2022-03-01 | 华为技术有限公司 | Task scheduling method, device, equipment and medium |
US20220276899A1 (en) * | 2021-07-07 | 2022-09-01 | Beijing Baidu Netcom Science Technology Co., Ltd. | Resource scheduling method, device, and storage medium |
CN115599533A (en) * | 2021-07-07 | 2023-01-13 | 腾讯科技(深圳)有限公司(Cn) | Task processing method, device, equipment and storage medium |
CN114647515A (en) * | 2022-04-12 | 2022-06-21 | 杭州电子科技大学 | GPU cluster-oriented dynamic resource scheduling method |
CN115934362A (en) * | 2023-02-27 | 2023-04-07 | 北京大学 | Deep learning-oriented server non-perception computing cluster scheduling method and product |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20150295970A1 (en) | Method and device for augmenting and releasing capacity of computing resources in real-time stream computing system | |
CN111506434B (en) | Task processing method and device and computer readable storage medium | |
KR101531752B1 (en) | Locality aware work stealing runtime scheduler | |
US11422857B2 (en) | Multi-level scheduling | |
CN104461673B (en) | A kind of virtual machine (vm) migration determination method and device | |
CN115098412B (en) | Peripheral access controller, data access device and corresponding method, medium and chip | |
US20160210171A1 (en) | Scheduling in job execution | |
WO2021259098A1 (en) | Acceleration system and method based on convolutional neural network, and storage medium | |
CN114706689B (en) | Multi-core processor task scheduling method and system based on subtask characteristics | |
CN112882819B (en) | Method and device for setting chip working frequency | |
CN114005458A (en) | Voice noise reduction method and system based on pipeline architecture and storage medium | |
CN118035618B (en) | Data processor, data processing method, electronic device, and storage medium | |
CN117407155A (en) | Resource scheme determining method and device, storage medium and electronic equipment | |
CN112965788A (en) | Task execution method, system and equipment in hybrid virtualization mode | |
US20230143270A1 (en) | Apparatus and method with scheduling | |
CN111078286A (en) | Data communication method, computing system and storage medium | |
CN117632457A (en) | Method and related device for scheduling accelerator | |
KR20220049294A (en) | Scheduler, method for operating the same and electronic device including the same | |
CN113556242A (en) | Method and equipment for performing inter-node communication based on multi-processing nodes | |
CN118113435A (en) | Task scheduling model construction method and device, storage medium and electronic equipment | |
US20240330230A1 (en) | Apparatus and methods for universal serial bus 4 (usb4) data bandwidth scaling | |
Kong et al. | Energy-constrained scheduling for weakly-hard real-time tasks on multiprocessors | |
CN117651044B (en) | Edge computing task scheduling method and device | |
CN117519996B (en) | Data processing method, device, equipment and storage medium | |
CN118034876A (en) | Task scheduling system, method and device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |