CN117170848A - Resource scheduling method and device - Google Patents
- Publication number
- CN117170848A (application CN202311168782.5A)
- Authority
- CN
- China
- Prior art keywords
- computing
- result
- computing nodes
- computing node
- task set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application discloses a resource scheduling method and device. The method comprises: obtaining the comprehensive performance condition of each computing node according to the hardware information of a plurality of computing nodes in a cluster, summarizing and classifying the computing nodes according to the comprehensive performance conditions, and marking the category of each computing node to obtain a first sorting result; sorting the plurality of computing nodes according to the measured simulated response times of the computing nodes to obtain a second sorting result; and comparing the second sorting result with the first sorting result, and if the similarity between the two is higher than a preset threshold, the first sorting result is proven reliable, the actual sorting result is updated to the first sorting result to obtain a third sorting result, and computing resources are scheduled according to the third sorting result. The embodiment of the application can allocate appropriate computing resources to task sets with different biases and realize reasonable resource scheduling, thereby improving scheduling efficiency and reducing energy consumption.
Description
Technical Field
The application belongs to the technical field of computers, and particularly relates to a resource scheduling method and device.
Background
Cloud computing is an internet-based computing approach by which shared software and hardware resources and information can be provided to computers and other devices as needed. The core idea of cloud computing is to uniformly manage and schedule a large number of computing resources connected by a network, so as to form a uniform resource pool to provide on-demand service for users.
At present, the cloud computing resource scheduling strategy can only be set manually, reasonable resource scheduling cannot be realized, the efficiency is low, and the further development of the cloud computing automation management strategy is limited.
Summary of the application
The embodiments of the application aim to provide a resource scheduling method and device to address the low efficiency of cloud computing resource scheduling strategies in the prior art.
In order to solve the technical problems, the application is realized as follows:
in a first aspect, a resource scheduling method is provided, including the following steps:
acquiring the comprehensive performance condition of each computing node according to the hardware information of a plurality of computing nodes in a cluster, summarizing and classifying the computing nodes according to the comprehensive performance conditions, and marking the category of each computing node to obtain a first sorting result;
predicting the computing resources most likely to be needed by the target virtual machine according to the first sorting result and the resource consumption of a target task set running in the virtual machine, and screening out a computing node set matched with the target task set;
acquiring a sample task set according to the execution-order priority of the most active parent-child task flows in the virtual machine, randomly issuing the sample task set to part of the computing nodes in the computing node set, and calculating the simulated response time spent by those computing nodes in simulating the running of the sample task set;
and sorting the plurality of computing nodes according to the measured simulated response times to obtain a second sorting result, comparing the second sorting result with the first sorting result, and if the similarity between the two is higher than a preset threshold, the first sorting result is proven reliable, the actual sorting result is updated to the first sorting result to obtain a third sorting result, and computing resources are scheduled according to the third sorting result.
In a second aspect, there is provided a resource scheduling apparatus, including:
the acquisition module is used for respectively acquiring the comprehensive performance condition of each computing node according to the hardware information of the computing nodes in the cluster, summarizing and classifying the computing nodes according to the comprehensive performance conditions, and marking the category of each computing node to obtain a first sorting result;
the screening module is used for predicting the computing resources most likely to be needed by the target virtual machine according to the first sorting result and the resource consumption of a target task set running in the virtual machine, and screening out a computing node set matched with the target task set;
the computing module is used for acquiring a sample task set according to the execution-order priority of the most active parent-child task flows in the virtual machine, randomly issuing the sample task set to part of the computing nodes in the computing node set, and calculating the simulated response time spent by those computing nodes in simulating the running of the sample task set;
the scheduling module is used for sorting the plurality of computing nodes according to the measured simulated response times to obtain a second sorting result, comparing the second sorting result with the first sorting result, and if the similarity between the two is higher than a preset threshold, proving that the first sorting result is reliable, updating the actual sorting result to the first sorting result to obtain a third sorting result, and scheduling computing resources according to the third sorting result.
According to the embodiment of the application, the computing nodes in the cluster are sorted both by the hardware information of each computing node and by the simulated response time each computing node spends running the sample task set in simulation, and computing resources are scheduled according to the sorting results, so that appropriate computing resources can be allocated to task sets with different biases and reasonable resource scheduling is realized, thereby improving scheduling efficiency and reducing energy consumption.
Drawings
FIG. 1 is a flowchart of a resource scheduling method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a resource scheduling device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In order to improve efficiency, reduce energy consumption, and allocate the most suitable computing resources according to the computing resource characteristics required by different task flows, an embodiment of the application provides a cloud-computing-based cluster computing resource scheduling method and device. The method includes: marking an expected computing power for each computing node according to its hardware computing power, summarizing the results, and sorting and classifying the plurality of computing nodes in the cluster. Then, the most active parent-child task flows of the target virtual machine are obtained to form a task set, the physical resources occupied by the task set are determined in detail, the computing resources the task set is likely to depend on most are estimated, and the computing resource occupancy of each computing node is obtained to yield an available computing cluster; the available computing cluster is then preliminarily allocated to the virtual machine according to the tasks running in it. Finally, a portion of the task set is extracted and issued to each computing node in order, the time required to complete a unit task is recorded and compared, and the optimal node is selected from the computing cluster so as to obtain the most suitable scheduling scheme.
The resource scheduling method provided by the embodiment of the application is described in detail through specific embodiments and application scenes thereof with reference to the accompanying drawings.
As shown in fig. 1, a flowchart of a resource scheduling method according to an embodiment of the present application is provided, where the method includes the following steps:
Step 101: respectively obtaining the comprehensive performance condition of each computing node according to the hardware information of a plurality of computing nodes in a cluster, summarizing and classifying the computing nodes according to the comprehensive performance conditions, and marking the category of each computing node to obtain a first sorting result.
The hardware information comprises the number of CPU chips, the number of single CPU cores, a CPU reference frequency, the number of GPU chips, the number of single GPU CUDA cores, a GPU core architecture and a GPU core frequency;
Specifically, the comprehensive performance score of each computing node in the cluster may be calculated from the hardware information of each computing node by the following formulas:
FP_C = S_C × CO_C × F_C × V_C
FP_G = S_G × CU_G × F_G × α
wherein N_P is the comprehensive performance score, a is the CPU performance standard value, b is the GPU performance standard value, FP_C is the theoretical computing power of CPU double-precision floating-point calculation, and FP_G is the theoretical performance of GPU single-precision floating-point calculation;
S_C is the number of CPU chips, CO_C is the number of cores per CPU, F_C is the CPU reference frequency, and V_C is the CPU single-core single-cycle single-precision floating-point calculation coefficient;
S_G is the number of GPU chips, CU_G is the number of CUDA cores per GPU, F_G is the GPU core frequency, and α is the weight of the core architecture.
Further, the plurality of computing nodes in the cluster may be sorted and classified according to the comprehensive performance score N_P of each computing node, and the category of each computing node determined, to obtain a first sorting result, wherein the categories include CPU-biased, GPU-biased and unbiased.
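As an illustrative sketch (not part of the patent text), the node scoring and classification described above could be implemented as follows. The way FP_C and FP_G are combined into N_P and the bias-classification rule are assumptions, since this section only names a and b as CPU/GPU performance standard values:

```python
from dataclasses import dataclass

@dataclass
class NodeHardware:
    cpu_chips: int              # S_C
    cpu_cores: int              # CO_C, cores per CPU chip
    cpu_freq_hz: float          # F_C, CPU reference frequency
    cpu_flops_per_cycle: float  # V_C, single-core single-cycle floating-point value
    gpu_chips: int              # S_G
    gpu_cuda_cores: int         # CU_G, CUDA cores per GPU
    gpu_freq_hz: float          # F_G, GPU core frequency
    gpu_arch_weight: float      # alpha, core-architecture weight

def cpu_peak(hw: NodeHardware) -> float:
    """FP_C = S_C x CO_C x F_C x V_C (theoretical CPU floating-point peak)."""
    return hw.cpu_chips * hw.cpu_cores * hw.cpu_freq_hz * hw.cpu_flops_per_cycle

def gpu_peak(hw: NodeHardware) -> float:
    """FP_G = S_G x CU_G x F_G x alpha (theoretical GPU floating-point peak)."""
    return hw.gpu_chips * hw.gpu_cuda_cores * hw.gpu_freq_hz * hw.gpu_arch_weight

def score_and_classify(nodes: dict[str, NodeHardware], a: float, b: float,
                       bias_ratio: float = 2.0):
    """Return an assumed 'first sorting result': nodes ranked by N_P and each
    labelled CPU-biased / GPU-biased / unbiased."""
    result = []
    for name, hw in nodes.items():
        fp_c, fp_g = cpu_peak(hw), gpu_peak(hw)
        n_p = a * fp_c + b * fp_g            # assumed combination of the two peaks
        if b * fp_g > bias_ratio * a * fp_c:  # assumed classification rule
            category = "GPU-biased"
        elif a * fp_c > bias_ratio * b * fp_g:
            category = "CPU-biased"
        else:
            category = "unbiased"
        result.append((name, n_p, category))
    return sorted(result, key=lambda x: x[1], reverse=True)
```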
Step 102: predicting the computing resources most likely to be needed by the target virtual machine according to the first sorting result and the resource consumption of a target task set running in the virtual machine, and screening out a computing node set matched with the target task set.
Specifically, the resource consumption of the target task set running in the virtual machine may be obtained, and the type of hardware resource the target task set is biased towards may be evaluated according to the CPU, GPU and memory resources it consumes;
a plurality of computing nodes matched with the hardware resources the target task set is biased towards are then screened out according to the evaluation result, the first sorting result and the category of each computing node in the cluster.
Step 103: acquiring a sample task set according to the execution-order priority of the most active parent-child task flows in the virtual machine, randomly issuing the sample task set to part of the computing nodes in the computing node set, and calculating the simulated response time spent by those computing nodes in simulating the running of the sample task set.
Specifically, the sample task set may be issued to some of the computing nodes in the computing node set, and the simulated response time each of those computing nodes spends running the sample task set in simulation is calculated.
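One way this measurement could look is sketched below; `run_simulation` and the sampled fraction of nodes are assumptions, since the patent only states that the sample task set is randomly issued to part of the candidate nodes:

```python
import random
import time

def measure_simulated_response(candidate_names, sample_task_set, run_simulation,
                               fraction: float = 0.5) -> dict:
    """Randomly pick part of the candidate nodes, simulate the sample task set on
    each, and record the simulated response time per node.

    run_simulation(node, task_set) is an assumed callable that executes the sample
    task set on (or against a model of) the given node."""
    names = list(candidate_names)
    k = max(1, int(len(names) * fraction))
    subset = random.sample(names, k)
    timings = {}
    for node in subset:
        start = time.monotonic()
        run_simulation(node, sample_task_set)
        timings[node] = time.monotonic() - start   # simulated response time
    return timings
```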
Step 104: sorting the plurality of computing nodes according to the measured simulated response times to obtain a second sorting result, comparing the second sorting result with the first sorting result, and if the similarity between the two is higher than a preset threshold, the first sorting result is proven reliable, the actual sorting result is updated to the first sorting result to obtain a third sorting result, and computing resources are scheduled according to the third sorting result.
Specifically, the part of the computing nodes may be sorted in order of simulated response time length to obtain the second sorting result. If the similarity between the second sorting result and the first sorting result is lower than the preset threshold, the sample task set is issued to the remaining computing nodes in the computing node set, and the simulated response time the remaining nodes spend simulating the running of the sample task set is calculated; all the computing nodes in the computing node set are then sorted in order of their simulated response times to obtain an actual optimal node set, and computing resources are scheduled according to the ordering of the computing nodes in the actual optimal node set.
In this embodiment, if the similarity between the second sorting result and the first sorting result is higher than the preset threshold, the first sorting result is trusted and the second sorting result is updated to the first sorting result to obtain the third sorting result. The simulated response time of the computing node in the computing node set that spends the shortest time simulating the running of the sample task set may also be obtained, and whether this simulated response time is less than the completion time on the computing node where the sample task set currently resides is judged; if so, computing resources are scheduled according to the second sorting result; otherwise, it is determined that no scheduling of computing resources is required.
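The comparison and decision logic of step 104 could be sketched as follows. The pairwise-agreement similarity measure and the 0.8 threshold are assumptions; the patent only requires "a similarity higher than a preset threshold" without fixing a metric:

```python
def ranking_similarity(first_ranking, second_ranking) -> float:
    """Assumed similarity: fraction of node pairs ordered the same way in both rankings."""
    common = [n for n in second_ranking if n in first_ranking]
    pairs = agree = 0
    for i in range(len(common)):
        for j in range(i + 1, len(common)):
            a, b = common[i], common[j]
            pairs += 1
            if (first_ranking.index(a) < first_ranking.index(b)) == \
               (second_ranking.index(a) < second_ranking.index(b)):
                agree += 1
    return agree / pairs if pairs else 1.0

def decide_schedule(first_ranking, timings, current_completion_time,
                    threshold: float = 0.8):
    """Build the second sorting result from measured timings, compare it with the
    first sorting result, and decide whether rescheduling is worthwhile."""
    second_ranking = sorted(timings, key=timings.get)       # shortest time first
    if ranking_similarity(first_ranking, second_ranking) < threshold:
        return "extend-measurement"   # issue the sample set to the remaining nodes
    best_node = second_ranking[0]
    if timings[best_node] < current_completion_time:
        return ("schedule", best_node)   # schedule onto the best-ranked node
    return "no-scheduling-needed"
```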
According to the embodiment of the application, the computing nodes in the cluster are sorted both by the hardware information of each computing node and by the simulated response time each computing node spends running the sample task set in simulation, and computing resources are scheduled according to the sorting results, so that appropriate computing resources can be allocated to task sets with different biases and reasonable resource scheduling is realized, thereby improving scheduling efficiency and reducing energy consumption.
In the embodiment of the application, the cloud computing resource scheduling method includes the following steps. The domain controller is used to obtain the standard hardware information of each physical node and to calculate its expected computing power. Based on the obtained comprehensive performance condition of each computing node's hardware, the nodes are calibrated and sorted according to the computing tasks they handle most proficiently. The occupancy of computing resources in the virtual machine is obtained, and the computing resources the task flow is likely to be biased towards are predicted according to the average load of each component of the virtual machine. Based on the prediction, combined with the average load of the target computing nodes, nodes with no spare resources or that do not meet the conditions are eliminated, and a set of computing nodes that may meet the conditions is preliminarily screened out. The most active parent-child task flows in the virtual machine are obtained and arranged according to their execution-order priority to obtain a sample task set. The sample task set is issued to the preliminarily screened computing node set, the task set is run in simulation on part of the computing nodes, and the time those nodes spend responding to the task set is calculated. The computing node set is then arranged in order of simulated response time to obtain the actual optimal computing node set, which is compared with the prediction to verify whether the two are highly consistent. If the result meets the requirement, the shortest simulated response time is compared with the current completion time of the task set; if the current response time is less than the shortest simulated completion time, no scheduling is needed. If the result meets the requirement and the computing node where the current task set resides is not the optimal node, scheduling is completed according to the set order. If the result does not meet the requirement, the sample task set is further issued to the remaining computing nodes, their response times for the task set are calculated, the actual optimal node set is obtained, and scheduling is completed according to that order.
The hardware capability of the computing node specifically comprises:
CPU part: the CPU chip number, the number of single CPU cores, the CPU reference frequency and the CPU instruction set;
GPU part: the method comprises the steps of GPU chip number, single GPU CUDA core number, GPU core architecture, GPU reference frequency and GPU special memory capacity;
other parts: memory frequency, memory capacity, network traffic bandwidth.
Accordingly, the method for predicting the expected computing power of each computing node by utilizing the hardware information specifically comprises the following steps:
The expected computing power is calculated from the CPU and GPU information according to the following formulas:
CPU part:
FP64 = number of CPU chips × cores per chip × single-core base frequency × single-core single-cycle floating-point value, i.e.
FP_C = S_C × CO_C × F_C × V_C
V_C = 2 (number of single-core FMA units) × 2 (a fused multiply-add counts as two operations) × 512/64
GPU part:
FP32 = number of GPU chips × number of FP32 cores × boost frequency, i.e.
FP_G = S_G × CU_G × F_G × α
Comprehensive performance weighting:
wherein FP_C represents the theoretical computing power of CPU double-precision floating-point calculation, S_C is the number of CPU chips, CO_C is the number of cores per CPU, F_C is the CPU reference frequency, and V_C is the CPU single-core single-cycle single-precision floating-point calculation coefficient; FP_G represents the theoretical performance of GPU single-precision floating-point calculation, S_G is the number of GPU chips, CU_G is the number of GPU CUDA cores, F_G is the GPU core frequency, and α is the weight of the core architecture; N_P is the comprehensive performance score, a is the CPU performance standard value, and b is the GPU performance standard value.
Because the CPU and GPU performances of the computing nodes differ, the computing nodes may be sorted according to their N_P values and classified as CPU-biased, GPU-biased or unbiased.
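As a hypothetical worked example (the figures are illustrative and not taken from the patent): for a node with 2 CPU chips of 32 cores each at a 2.5 GHz reference frequency and two FMA units per core, V_C = 2 × 2 × 512/64 = 32, so FP_C = 2 × 32 × 2.5×10⁹ × 32 ≈ 5.1 TFLOPS; for the same node with one GPU of 8704 CUDA cores at 1.7 GHz and an assumed architecture weight α = 2, FP_G = 1 × 8704 × 1.7×10⁹ × 2 ≈ 29.6 TFLOPS. Once FP_C and FP_G are weighted by the standard values a and b, such a node would typically fall into the GPU-biased category.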
Further, the method for evaluating the bias of the task set of the virtual machine includes: obtaining the resource consumption of the task set running in the virtual machine, and evaluating the hardware resources the task set is likely to be biased towards according to the CPU, GPU and memory resources of the virtual machine consumed by the task flow.
Specifically, the occupancy of virtual resources in the target virtual machine is obtained; the computing resources the task flow is likely to depend on are estimated according to the occupancy of the CPU, the GPU and the average traffic bandwidth; computing nodes meeting the requirements are preliminarily screened out according to the estimation result and the computing-node evaluation method described above; a sample task set formed from part of the task flows in the target virtual machine is obtained; the sample task set is issued to part of the computing nodes, and the task completion time of the sample task set is recorded and compared with the sorting result of the computing nodes; finally, the actual result is compared with the estimated result, and scheduling is completed according to the method described above.
In an embodiment of the present application, a task scheduling device based on cloud computing includes:
and the node computing power calculating module is used for collecting hardware data of the computing nodes and sequencing the hardware data according to expected computing power of the computing nodes.
And the node screening module is used for completing the performance sequencing of the computing nodes according to the time of completing the sample task set by the computing nodes, and primarily screening and secondarily screening out the optimal computing node set.
And the task consumption time calculation module is used for calculating the time consumed by each calculation node for completing the sample task set after the sample task set is issued to the calculation node set.
And the simulation running module is used for acquiring a sample task set in the target virtual machine and issuing the sample task set to the preliminarily screened computing nodes.
And the resource scheduling module is used for judging whether the actual sequencing result is identical with the estimated result or not, and completing scheduling for the virtual machine according to the result.
As shown in fig. 2, a schematic structural diagram of a resource scheduling device according to an embodiment of the present application includes:
the obtaining module 210 is configured to obtain the comprehensive performance conditions of each computing node according to the hardware information of the computing nodes in the cluster, and perform summary classification on the computing nodes according to the comprehensive performance conditions, and mark the class of each computing node to obtain a first sorting result.
The hardware information comprises the number of CPU chips, the number of single CPU cores, a CPU reference frequency, the number of GPU chips, the number of single GPU CUDA cores, a GPU core architecture and a GPU core frequency;
the obtaining module 210 is specifically configured to calculate, according to hardware information of each computing node in the cluster, a comprehensive performance score of each computing node according to the following formula:
FP C =S C ×CO C ×F C ×V C
FP G =S G ×CU G ×F G ×α
wherein N is P A is CPU performance standard value, b is GPU performance standard value, FP C Theoretical calculation power, FP, for CPU double-precision floating point calculation G Calculating theoretical performance for GPU single-precision floating point;
S C for CPU chip number, CO C F is the number of single CPU cores C For CPU reference frequency, V C Calculating coefficients for CPU single-core single-cycle single-precision floating points;
S G for GPU chip count, CU G For the number of CUDA cores of a single GPU, F G The core frequency of the GPU is shown, and alpha is the weight of the core architecture;
the obtaining module 210 is further specifically configured to obtain the comprehensive performance score N according to each computing node P And sorting and classifying the plurality of computing nodes in the cluster, and determining the category of each computing node to obtain a first sorting result, wherein the category comprises CPU bias, GPU bias or no bias.
The screening module 220 is configured to predict the computing resources most likely to be needed by the target virtual machine according to the first sorting result and the resource consumption of a target task set running in the virtual machine, and to screen out a computing node set matched with the target task set.
Specifically, the screening module 220 is configured to obtain the resource consumption of the target task set running in the virtual machine, and evaluate the type of hardware resource the target task set is biased towards according to the CPU, GPU and memory resources it consumes; and to screen out a plurality of computing nodes matched with the hardware resources the target task set is biased towards according to the evaluation result, the first sorting result and the category of each computing node in the cluster.
The computing module 230 is configured to acquire a sample task set according to the execution-order priority of the most active parent-child task flows in the virtual machine, randomly issue the sample task set to part of the computing nodes in the computing node set, and calculate the simulated response time spent by those computing nodes in simulating the running of the sample task set.
Specifically, the computing module 230 is configured to issue the sample task set to part of the computing nodes in the computing node set, and calculate the simulated response time each of those computing nodes spends running the sample task set in simulation.
The scheduling module 240 is configured to sort the plurality of computing nodes according to the measured simulated response times to obtain a second sorting result, compare the second sorting result with the first sorting result, and, if the similarity between the two is higher than a preset threshold, conclude that the first sorting result is reliable, update the actual sorting result to the first sorting result to obtain a third sorting result, and schedule computing resources according to the third sorting result.
Specifically, the scheduling module 240 is further configured to sort the part of the computing nodes in order of simulated response time to obtain the second sorting result; if the similarity between the second sorting result and the first sorting result is lower than the preset threshold, issue the sample task set to the remaining computing nodes in the computing node set, and calculate the simulated response time the remaining nodes spend simulating the running of the sample task set; sort all the computing nodes in the computing node set in order of their simulated response times to obtain an actual optimal node set; and schedule computing resources according to the ordering of the computing nodes in the actual optimal node set.
In this embodiment, the scheduling module 240 is specifically configured to, if the similarity between the second sorting result and the first sorting result is higher than the preset threshold, treat the first sorting result as reliable, update the second sorting result to the first sorting result to obtain the third sorting result, obtain the simulated response time of the computing node in the computing node set that spends the shortest time simulating the running of the sample task set, and judge whether this simulated response time is less than the completion time on the computing node where the sample task set currently resides; if so, schedule computing resources according to the third sorting result; otherwise, determine that no scheduling of computing resources is required.
According to the embodiment of the application, the computing nodes in the cluster are sorted both by the hardware information of each computing node and by the simulated response time each computing node spends running the sample task set in simulation, and computing resources are scheduled according to the sorting results, so that appropriate computing resources can be allocated to task sets with different biases and reasonable resource scheduling is realized, thereby improving scheduling efficiency and reducing energy consumption.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the processes of the above-mentioned embodiments of the resource scheduling method, and can achieve the same technical effects, so that repetition is avoided, and no further description is given here. Wherein the computer readable storage medium is selected from Read-Only Memory (ROM), random access Memory (Random Access Memory, RAM), magnetic disk or optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.
Claims (10)
1. A method for scheduling resources, comprising the steps of:
acquiring the comprehensive performance condition of each computing node according to the hardware information of a plurality of computing nodes in a cluster, summarizing and classifying the computing nodes according to the comprehensive performance conditions, and marking the category of each computing node to obtain a first sorting result;
predicting the computing resources most likely to be needed by the target virtual machine according to the first sorting result and the resource consumption of a target task set running in the virtual machine, and screening out a computing node set matched with the target task set;
acquiring a sample task set according to the execution-order priority of the most active parent-child task flows in the virtual machine, randomly issuing the sample task set to part of the computing nodes in the computing node set, and calculating the simulated response time spent by those computing nodes in simulating the running of the sample task set;
and sorting the plurality of computing nodes according to the measured simulated response times to obtain a second sorting result, comparing the second sorting result with the first sorting result, and if the similarity between the two is higher than a preset threshold, the first sorting result is proven reliable, the actual sorting result is updated to the first sorting result to obtain a third sorting result, and computing resources are scheduled according to the third sorting result.
2. The method of claim 1, wherein the hardware information includes a number of CPU chips, a number of individual CPU cores, a CPU reference frequency, a number of GPU chips, a number of individual GPU CUDA cores, a GPU core architecture, and a GPU core frequency;
the method for respectively obtaining the comprehensive performance conditions of each computing node according to the hardware information of a plurality of computing nodes in the cluster specifically comprises the following steps:
according to the hardware information of each computing node in the cluster, the comprehensive performance score of each computing node is calculated through the following formula:
FP C =S C ×CO C ×F C ×V C
FP G =S G ×CU G ×F G ×α
wherein N is P A is CPU performance standard value, b is GPU performance standard value, FP C Theoretical calculation power, FP, for CPU double-precision floating point calculation G Calculating theoretical performance for GPU single-precision floating point;
S C for CPU chip number, CO C F is the number of single CPU cores C For CPU reference frequency, V C Calculating coefficients for CPU single-core single-cycle single-precision floating points;
S G for GPU chip count, CU G For the number of CUDA cores of a single GPU, F G The core frequency of the GPU is shown, and alpha is the weight of the core architecture;
the method comprises the steps of summarizing and classifying the plurality of computing nodes according to the comprehensive performance condition, marking the category of each computing node, and obtaining a first ordering result, and specifically comprises the following steps:
according to the comprehensive performance score N of each computing node P And sorting and classifying the plurality of computing nodes in the cluster, and determining the category of each computing node to obtain a first sorting result, wherein the category comprises CPU bias, GPU bias or no bias.
3. The method according to claim 2, wherein predicting the computing resources most likely to be needed by the target virtual machine according to the first sorting result and the resource consumption of the target task set running in the virtual machine, and screening out the computing node set matched with the target task set, specifically comprises:
obtaining the resource consumption of the target task set running in the virtual machine, and evaluating the type of hardware resource the target task set is biased towards according to the CPU, GPU and memory resources consumed by the target task set;
and screening out a plurality of computing nodes matched with the hardware resources the target task set is biased towards according to the evaluation result, the first sorting result and the category of each computing node in the cluster.
4. The method according to claim 1, wherein randomly issuing the sample task set to part of the computing nodes in the computing node set and calculating the simulated response time spent by those computing nodes in simulating the running of the sample task set specifically comprises:
issuing the sample task set to part of the computing nodes in the computing node set, and calculating the simulated response time each of those computing nodes spends simulating the running of the sample task set;
wherein sorting the plurality of computing nodes according to the simulated response time of each computing node to obtain the second sorting result specifically comprises:
sorting the part of the computing nodes in order of simulated response time length to obtain the second sorting result;
and wherein, after sorting the plurality of computing nodes according to the measured simulated response times to obtain the second sorting result, the method further comprises:
if the similarity between the second sorting result and the first sorting result is lower than the preset threshold, issuing the sample task set to the remaining computing nodes in the computing node set, and calculating the simulated response time the remaining nodes spend simulating the running of the sample task set;
sorting all the computing nodes in the computing node set in order of their simulated response times to obtain an actual optimal node set;
and scheduling computing resources according to the ordering of the computing nodes in the actual optimal node set.
5. The method of claim 1, wherein, if the similarity between the second sorting result and the first sorting result is higher than the preset threshold, proving that the first sorting result is reliable and updating the actual sorting result to the first sorting result to obtain the third sorting result specifically comprises:
if the similarity between the second sorting result and the first sorting result is higher than the preset threshold, trusting the first sorting result and updating the second sorting result to the first sorting result to obtain the third sorting result; obtaining the simulated response time of the computing node in the computing node set that spends the shortest time simulating the running of the sample task set, and judging whether this simulated response time is less than the completion time on the computing node where the sample task set currently resides; if so, scheduling computing resources according to the third sorting result; otherwise, determining that no scheduling of computing resources is required.
6. A resource scheduling apparatus, comprising:
the acquisition module is used for respectively acquiring the comprehensive performance condition of each computing node according to the hardware information of the computing nodes in the cluster, summarizing and classifying the computing nodes according to the comprehensive performance conditions, and marking the category of each computing node to obtain a first sorting result;
the screening module is used for predicting the computing resources most likely to be needed by the target virtual machine according to the first sorting result and the resource consumption of a target task set running in the virtual machine, and screening out a computing node set matched with the target task set;
the computing module is used for acquiring a sample task set according to the execution-order priority of the most active parent-child task flows in the virtual machine, randomly issuing the sample task set to part of the computing nodes in the computing node set, and calculating the simulated response time spent by those computing nodes in simulating the running of the sample task set;
the scheduling module is used for sorting the plurality of computing nodes according to the measured simulated response times to obtain a second sorting result, comparing the second sorting result with the first sorting result, and if the similarity between the two is higher than a preset threshold, proving that the first sorting result is reliable, updating the actual sorting result to the first sorting result to obtain a third sorting result, and scheduling computing resources according to the third sorting result.
7. The apparatus of claim 6, wherein the hardware information comprises a number of CPU chips, a number of individual CPU cores, a CPU reference frequency, a number of GPU chips, a number of individual GPU CUDA cores, a GPU core architecture, and a GPU core frequency;
the acquisition module is specifically configured to calculate, according to the hardware information of each computing node in the cluster, the comprehensive performance score of each computing node by the following formulas:
FP_C = S_C × CO_C × F_C × V_C
FP_G = S_G × CU_G × F_G × α
wherein N_P is the comprehensive performance score, a is the CPU performance standard value, b is the GPU performance standard value, FP_C is the theoretical computing power of CPU double-precision floating-point calculation, and FP_G is the theoretical performance of GPU single-precision floating-point calculation;
S_C is the number of CPU chips, CO_C is the number of cores per CPU, F_C is the CPU reference frequency, and V_C is the CPU single-core single-cycle single-precision floating-point calculation coefficient;
S_G is the number of GPU chips, CU_G is the number of CUDA cores per GPU, F_G is the GPU core frequency, and α is the weight of the core architecture;
and the acquisition module is further specifically configured to sort and classify the plurality of computing nodes in the cluster according to the comprehensive performance score N_P of each computing node, and determine the category of each computing node to obtain the first sorting result, wherein the categories include CPU-biased, GPU-biased and unbiased.
8. The apparatus of claim 7, wherein
the screening module is specifically configured to obtain the resource consumption of the target task set running in the virtual machine, and evaluate the type of hardware resource the target task set is biased towards according to the CPU, GPU and memory resources it consumes; and to screen out a plurality of computing nodes matched with the hardware resources the target task set is biased towards according to the evaluation result, the first sorting result and the category of each computing node in the cluster.
9. The apparatus of claim 6, wherein
the computing module is specifically configured to issue the sample task set to part of the computing nodes in the computing node set, and calculate the simulated response time each of those computing nodes spends simulating the running of the sample task set;
the scheduling module is further configured to sort the part of the computing nodes in order of simulated response time to obtain the second sorting result; if the similarity between the second sorting result and the first sorting result is lower than the preset threshold, issue the sample task set to the remaining computing nodes in the computing node set, and calculate the simulated response time the remaining nodes spend simulating the running of the sample task set; sort all the computing nodes in the computing node set in order of their simulated response times to obtain an actual optimal node set; and schedule computing resources according to the ordering of the computing nodes in the actual optimal node set.
10. The apparatus of claim 6, wherein
the scheduling module is specifically configured to, if the similarity between the second sorting result and the first sorting result is higher than the preset threshold, treat the first sorting result as reliable, update the second sorting result to the first sorting result to obtain the third sorting result, obtain the simulated response time of the computing node in the computing node set that spends the shortest time simulating the running of the sample task set, and judge whether this simulated response time is less than the completion time on the computing node where the sample task set currently resides; if so, schedule computing resources according to the third sorting result; otherwise, determine that no scheduling of computing resources is required.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311168782.5A CN117170848B (en) | 2023-09-11 | 2023-09-11 | Resource scheduling method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311168782.5A CN117170848B (en) | 2023-09-11 | 2023-09-11 | Resource scheduling method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117170848A true CN117170848A (en) | 2023-12-05 |
CN117170848B CN117170848B (en) | 2024-06-11 |
Family
ID=88942692
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311168782.5A Active CN117170848B (en) | 2023-09-11 | 2023-09-11 | Resource scheduling method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117170848B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117742966A (en) * | 2023-12-28 | 2024-03-22 | 广州优刻谷科技有限公司 | Computing mode generation method, system and storage medium based on edge computing |
CN118037245A (en) * | 2024-04-12 | 2024-05-14 | 浪潮云洲工业互联网有限公司 | Comprehensive management and control method, equipment and medium for energy sources of power computing facility |
CN118296630A (en) * | 2024-05-29 | 2024-07-05 | 杭州锘崴信息科技有限公司 | Multi-party sharing processing method and device for data and government data |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101650746A (en) * | 2009-09-27 | 2010-02-17 | 中国电信股份有限公司 | Method and system for verifying sequencing results |
CN110311987A (en) * | 2019-07-24 | 2019-10-08 | 中南民族大学 | Node scheduling method, apparatus, equipment and the storage medium of microserver |
CN110362392A (en) * | 2019-07-15 | 2019-10-22 | 深圳乐信软件技术有限公司 | A kind of ETL method for scheduling task, system, equipment and storage medium |
US20210103468A1 (en) * | 2019-10-03 | 2021-04-08 | International Business Machines Corporation | Performance biased scheduler extender |
CN113296905A (en) * | 2021-03-30 | 2021-08-24 | 阿里巴巴新加坡控股有限公司 | Scheduling method, scheduling device, electronic equipment, storage medium and software product |
CN116643875A (en) * | 2023-04-23 | 2023-08-25 | 超聚变数字技术有限公司 | Task scheduling method, server and server cluster |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101650746A (en) * | 2009-09-27 | 2010-02-17 | 中国电信股份有限公司 | Method and system for verifying sequencing results |
CN110362392A (en) * | 2019-07-15 | 2019-10-22 | 深圳乐信软件技术有限公司 | A kind of ETL method for scheduling task, system, equipment and storage medium |
CN110311987A (en) * | 2019-07-24 | 2019-10-08 | 中南民族大学 | Node scheduling method, apparatus, equipment and the storage medium of microserver |
US20210103468A1 (en) * | 2019-10-03 | 2021-04-08 | International Business Machines Corporation | Performance biased scheduler extender |
CN113296905A (en) * | 2021-03-30 | 2021-08-24 | 阿里巴巴新加坡控股有限公司 | Scheduling method, scheduling device, electronic equipment, storage medium and software product |
CN116643875A (en) * | 2023-04-23 | 2023-08-25 | 超聚变数字技术有限公司 | Task scheduling method, server and server cluster |
Non-Patent Citations (1)
Title |
---|
LYU Xiangwen et al.: "Research on multi-GPU resource scheduling mechanisms in a cloud computing environment", Journal of Chinese Computer Systems (小型微型计算机系统), no. 4, pages 687-693 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117742966A (en) * | 2023-12-28 | 2024-03-22 | 广州优刻谷科技有限公司 | Computing mode generation method, system and storage medium based on edge computing |
CN117742966B (en) * | 2023-12-28 | 2024-06-11 | 广州优刻谷科技有限公司 | Computing mode generation method, system and storage medium based on edge computing |
CN118037245A (en) * | 2024-04-12 | 2024-05-14 | 浪潮云洲工业互联网有限公司 | Comprehensive management and control method, equipment and medium for energy sources of power computing facility |
CN118296630A (en) * | 2024-05-29 | 2024-07-05 | 杭州锘崴信息科技有限公司 | Multi-party sharing processing method and device for data and government data |
Also Published As
Publication number | Publication date |
---|---|
CN117170848B (en) | 2024-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN117170848B (en) | Resource scheduling method and device | |
CN113377540B (en) | Cluster resource scheduling method and device, electronic equipment and storage medium | |
CN109669774B (en) | Hardware resource quantification method, hardware resource arrangement method, hardware resource quantification device and hardware resource arrangement device and network equipment | |
CN110363427A (en) | Method and device for model quality assessment | |
CN108616553B (en) | Method and device for resource scheduling of cloud computing resource pool | |
CN103677960A (en) | Game resetting method for virtual machines capable of controlling energy consumption | |
CN110333991A (en) | Cloud platform task maximum resource usage prediction method | |
CN104182278A (en) | Method and device for judging busy degree of computer hardware resource | |
CN113807046A (en) | Test excitation optimization regression verification method, system and medium | |
CN111090401B (en) | Storage device performance prediction method and device | |
CN108681505A (en) | A kind of Test Case Prioritization method and apparatus based on decision tree | |
CN109918444A (en) | Training/verifying/management method/system, medium and equipment of model result | |
CN108833592A (en) | Cloud host schedules device optimization method, device, equipment and storage medium | |
CN117608809B (en) | Multi-task plan progress prediction system based on gradient lifting decision tree | |
CN117093463A (en) | Test program scheduling strategy generation method and device, storage medium and electronic equipment | |
CN116962419A (en) | Method and device for generating server allocation policy, electronic equipment and storage medium | |
CN116541128A (en) | Load adjusting method, device, computing equipment and storage medium | |
CN112052187B (en) | Distribution method of test matrix | |
CN116701896A (en) | Image tag determining method, image tag determining device, computer device, and storage medium | |
CN112148491B (en) | Data processing method and device | |
CN113076184A (en) | Power acquisition terminal task scheduling method based on fuzzy comprehensive evaluation | |
CN112767134A (en) | Sample screening method and device and electronic equipment | |
CN116775439B (en) | AI cloud computing resource pool evaluation system | |
CN112631776B (en) | A partition expansion method, device, storage medium and computer equipment of Kafka | |
Imai et al. | Simulation-Based Management Method for Circular Manufacturing Using Response Surfaces |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |