CN118394533B - Resource scheduling method, computer device, storage medium, and program product - Google Patents
Resource scheduling method, computer device, storage medium, and program product
- Publication number
- CN118394533B (application CN202410845845.4A)
- Authority
- CN
- China
- Prior art keywords
- target
- interface device
- host
- target interface
- candidate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resources being hardware resources other than CPUs, servers and terminals
- G06F9/5016—the resource being the memory
- G06F9/5022—Mechanisms to release resources
- G06F9/5027—the resource being a machine, e.g. CPUs, servers, terminals
- G06F13/382—Information transfer, e.g. on bus, using universal interface adapter
- G06F13/4022—Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
- G06F2213/0026—PCI express
Abstract
The disclosure relates to the technical field of data processing and provides a resource scheduling method, a computer device, a storage medium, and a program product. The method includes: receiving a scheduling request input by a user, where the scheduling request indicates a target mapping relationship and the target mapping relationship represents a target host invoking the resources of a target interface device; determining the physical location of the target interface device according to the current mapping relationship of the target interface device; hot-removing the target interface device to disconnect its current mapping relationship; and hot-adding the target interface device to the target host to establish the target mapping relationship between the target interface device and the target host. According to the technical solutions provided by one or more embodiments of the present disclosure, dynamic allocation and adjustment of device resources can be achieved, and the utilization efficiency of device resources is improved.
Description
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a resource scheduling method, a computer device, a storage medium, and a program product.
Background
In recent years, differentiated computing scenarios such as artificial intelligence, machine learning, and high-performance computing have emerged continuously in the internet field. To meet these differentiated computing demands, server providers often choose to decouple devices such as the central processing unit (CPU), the graphics processing unit (GPU), and Non-Volatile Memory Express (NVMe) hard disks, and to integrate the resources of these devices in a whole-cabinet system, enabling large-scale deployment and use of server products.
However, in a conventional whole-cabinet system, the allocation relationships of device resources such as general computing resources (e.g., CPU resources), heterogeneous acceleration resources (e.g., GPU resources), and storage resources (e.g., NVMe hard disk resources) are generally fixed, making it difficult to satisfy the demand for dynamic scheduling of device resources inside the system.
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure provide a resource scheduling method, a computer device, a storage medium, and a program product, which can implement dynamic allocation and adjustment of device resources, thereby improving the utilization efficiency of the device resources.
In a first aspect, the present disclosure provides a resource scheduling method, including: receiving a scheduling request input by a user, where the scheduling request indicates a target mapping relationship and the target mapping relationship represents a target host invoking the resources of a target interface device; determining the physical location of the target interface device according to the current mapping relationship of the target interface device; hot-removing the target interface device to disconnect its current mapping relationship; and hot-adding the target interface device to the target host to establish the target mapping relationship between the target interface device and the target host.
In a second aspect, the present disclosure provides a resource scheduling apparatus, including: a receiving module configured to receive a scheduling request input by a user, where the scheduling request indicates a target mapping relationship and the target mapping relationship represents a target host invoking the resources of a target interface device; a determining module configured to determine the physical location of the target interface device according to the current mapping relationship of the target interface device; a removal module configured to hot-remove the target interface device to disconnect its current mapping relationship; and an adding module configured to hot-add the target interface device to the target host to establish the target mapping relationship between the target interface device and the target host.
In a third aspect, the present disclosure provides a computer device, where the computer device includes a memory and a processor, the memory is configured to store a computer program, and the computer program, when executed by the processor, implements the resource scheduling method described above.
In a fourth aspect, the present disclosure provides a computer readable storage medium for storing a computer program which, when executed by a processor, implements the above-described resource scheduling method.
In a fifth aspect, the present disclosure provides a computer program product comprising computer instructions for causing a computer to perform the above-described resource scheduling method.
According to the technical solutions provided by one or more embodiments of the present disclosure, a resource invocation relationship between a target host and a target interface device can be established through a scheduling request, so that dynamic allocation and adjustment of interface device resources can be realized at the software layer and the utilization efficiency of device resources is improved. In this process, the resource allocation relationship that the user intends to realize is conveyed accurately by establishing the target mapping relationship. The location of the target interface device can be quickly pinned down through the current mapping relationship of the target device, improving the efficiency of device resource reallocation. In addition, based on the technical means of hot removal and hot addition, redeployment of interface device resources can be completed rapidly, further improving the efficiency of dynamic resource allocation.
Drawings
The features and advantages of the various embodiments of the present disclosure will be more clearly understood by reference to the accompanying drawings, which are schematic and should not be construed as limiting the disclosure in any way, in which:
FIG. 1 shows a schematic diagram of steps of a resource scheduling method in one embodiment of the present disclosure;
fig. 2 is a schematic diagram of a system structure of a whole cabinet system according to an embodiment of the disclosure;
FIG. 3 illustrates a system architecture diagram of another whole cabinet system in one embodiment of the present disclosure;
FIG. 4 shows a flow diagram of a resource scheduling method in one embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of steps of a templated resource scheduling method in one embodiment of the disclosure;
FIG. 6 shows a flow diagram of a templated resource scheduling method in one embodiment of the disclosure;
FIG. 7 is a schematic diagram of functional modules of a resource scheduling apparatus according to an embodiment of the present disclosure;
Fig. 8 is a schematic diagram showing a hardware configuration of a computer device in one embodiment of the present disclosure.
Detailed Description
For the purpose of making the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by those skilled in the art without inventive effort, based on the embodiments in this disclosure, are intended to fall within the scope of this disclosure.
In the related art, general computing resource devices (e.g., CPU resources), heterogeneous acceleration resource devices (e.g., GPU resources), storage resource devices (e.g., NVMe hard disk resources), and the like may be integrated into a whole-cabinet system through device interfaces in a modular manner, realizing large-scale deployment and use of server products. However, in conventional whole-cabinet systems, the resource scheduling and allocation relationships between these devices are generally fixed, making it difficult to meet the demand for dynamic scheduling of device resources inside the system.
In view of this, the resource scheduling method provided in one or more embodiments of the present disclosure may implement dynamic allocation and adjustment of device resources, thereby improving the utilization efficiency of the device resources. Referring to fig. 1, a resource scheduling method provided in one embodiment of the present disclosure may include the following steps.
S1: and receiving a scheduling request input by a user, wherein the scheduling request is used for indicating a target mapping relation, and the target mapping relation characterizes a target host to call resources of target interface equipment.
In this embodiment, the target host may be a host device containing computing resources, responsible for executing program instructions, processing data, controlling other devices, and so on. The target interface device may be a heterogeneous acceleration resource device, a storage resource device, or the like; it is controlled by a host device and its resources can be invoked by that host. Under the control of the target host, the target interface device can cooperate with the target host to jointly realize the functions of a computing system.
In this embodiment, the scheduling request input by the user may be received in various manners, such as touch-screen input, keyboard input, a voice command, or a physical switch. From the user's input, the target mapping relationship the user expects to realize can be determined. According to the target mapping relationship, a target host can be adjusted to use the resources of a new target interface device, or the resources of a target interface device can be made available to a new target host. In this way, flexible allocation and scheduling of interface device resources can be realized.
In some embodiments, a current usage status of an interface device may be detected. The interface device may be determined to be in an idle, normal, high-load, or other state by determining a comparison result between a certain index (e.g., a resource utilization rate of the interface device, an average power consumption during a preset period, an operating temperature, etc.) reflecting a current use state of the interface device and a predetermined threshold. According to different use states of the interface equipment, prompt information can be sent to the user to guide the user to issue a scheduling request.
Optionally, after statistics of the use states of different interface devices, an optional scheduling request item can be generated through a preset formula, an analysis model and the like, so that a user can select the optional scheduling request item, and resource scheduling efficiency is further improved.
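The threshold comparison and prompt generation described above can be sketched as a small helper. This is an illustrative Python sketch, not part of the patent: the state names, threshold values, and suggested actions are assumptions.

```python
def classify_usage_state(utilization: float,
                         idle_threshold: float = 0.10,
                         high_load_threshold: float = 0.85) -> str:
    """Map a resource-utilization ratio (0.0 to 1.0) to a coarse usage state."""
    if utilization < idle_threshold:
        return "idle"
    if utilization > high_load_threshold:
        return "high-load"
    return "normal"


def suggest_actions(state: str) -> list:
    """Generate selectable scheduling-request items to guide the user."""
    options = {
        "idle": ["release device back to resource pool"],
        "high-load": ["attach an additional device from the pool"],
        "normal": [],
    }
    return options[state]
```

Any index reflecting the current usage state (resource utilization, average power consumption over a preset period, operating temperature) could be classified the same way with its own thresholds.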
In some implementations, the target interface device may include a heterogeneous acceleration device in a first resource pool or a storage device in a second resource pool. The target mapping relationship may be the target host invoking the resources of at least one heterogeneous acceleration device, or of at least one storage device, or of at least one of each at the same time. The target mapping relationship may also involve the target host invoking the resources of other interface devices (e.g., a network card or a sound card).
In a practical application scenario, the heterogeneous acceleration device may include at least one of a graphics processing unit, a digital signal processor, a field-programmable gate array, an application-specific integrated circuit, and the like. The storage device may include at least one of an NVMe hard disk, a hard disk drive, a solid state disk, a memory, a flash drive, a memory card, a storage tape, and the like.
In some embodiments, the resource scheduling method may be applied to a switching device. A first number of candidate hosts and a second number of candidate interface devices may be indirectly connected through the switching device, with a candidate physical path between any candidate host and any candidate interface device. These candidate physical paths may be used to configure data links, and a data link may be used to establish a mapping relationship. Optionally, the switching device may uniformly connect the candidate hosts and candidate interface devices using Peripheral Component Interconnect Express (PCIe) ports.
Connecting multiple hosts and multiple interface devices through the switching device, with a physical channel available between any host and any interface device, provides the hardware foundation for the resource scheduling method. Establishing mapping relationships based on the configured data links allows the software layer to deploy resource scheduling tasks quickly.
In some embodiments, the switching device may include a third number of switching chips, the switching chips being connected in pairs. Any of the switch chips may include a fourth number of switch chip ports. The switch chip port may be physically connected to either the candidate host or the candidate interface device. The switch chip port may be a PCIe type port.
Because the switching device includes multiple switch chips, each with multiple chip ports, it can connect a large number of hosts and interface devices simultaneously, improving the resource processing capability of the whole cabinet.
Referring to fig. 2, in an actual application example, a whole-cabinet system may support 8 hosts, 16 GPUs, and 64 NVMe hard disks. The 8 hosts are located in 8 host boxes, the 16 GPUs are located in one GPU box, and the 64 NVMe hard disks are distributed across 4 hard disk boxes, with 16 NVMe hard disks per box. Through one switch box, the 16 GPUs in the GPU box and the 64 NVMe hard disks in the 4 hard disk boxes can be connected at the same time.
Inside the switch box, there may be 8 switch chips, each with 9 switch chip ports. The switch chips are connected in pairs, so that a host attached to one switch chip can reach interface devices attached to the other switch chips, ensuring that a physical path exists between any host and any interface device.
In order to facilitate software management and maintain the resource allocation relation between the host and the interface device in the whole system, the hardware can be identified by a number. For example, the box number of the host can be set to 0 to 7, respectively, depending on the location of the host box in the cabinet. The box number of the GPU may be set according to the location of the GPU box in the whole cabinet. The device number of the GPU may be set to 0 to 15, respectively, depending on the location of each GPU within a single GPU box. The cartridge number of the NVMe hard disk may be set according to the position of the hard disk cartridge in the whole cabinet. The device number of the NVMe hard disk may be set to 0 to 15, respectively, according to the position of each NVMe hard disk within a single hard disk cartridge. The switch chip numbers may be set to 0 to 7, respectively, depending on the location of the switch chip within the switch box. The chip port number may be set to 0 to 8, respectively, according to the definition of each switch chip port inside a single switch chip.
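The numbering scheme above (host box numbers 0 to 7, device numbers 0 to 15 per box, switch chips 0 to 7 with port numbers 0 to 8) can be captured in a few validation helpers. This is a minimal illustrative sketch; the type and function names are assumptions, not part of the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceAddress:
    box_id: int     # position of the box in the cabinet
    device_id: int  # position of the device within its box


def valid_gpu_address(addr: DeviceAddress) -> bool:
    """GPU device numbers run 0 to 15 inside a single GPU box."""
    return 0 <= addr.device_id <= 15


def valid_switch_port(chip_id: int, port_id: int) -> bool:
    """Switch chips are numbered 0 to 7; each chip has ports 0 to 8."""
    return 0 <= chip_id <= 7 and 0 <= port_id <= 8
```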
S2: and determining the physical position of the target interface equipment according to the current mapping relation of the target interface equipment.
In this embodiment, in order to implement resource allocation of the target interface device, it is necessary to accurately determine the actual physical location of the resource, so as to perform configuration and management of the data link. The method for determining the physical location of the target interface device may be to query the current mapping relationship of the target interface device. The physical location of the interface device can be determined by the parameters recorded in the mapping relationship.
In some embodiments, the target interface device may be packaged within an interface device box. To determine the physical location of the target interface device, its box identification and device identification may be determined first. Based on the box identification, the target interface device box corresponding to the target interface device, and the first physical location of that box, can be determined. The second physical location of the target interface device within the target interface device box can then be determined through the device identification. From the first physical location and the second physical location, the specific physical location of the target interface device can be determined.
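The two-step lookup, from box identification to the box's cabinet position and then from device identification to the slot inside the box, can be sketched as follows. The lookup tables and location strings here are illustrative assumptions.

```python
# Illustrative tables: box id -> cabinet position, (box id, device id) -> slot.
BOX_LOCATIONS = {"gpu_box_0": "rack slot U10"}
SLOT_LOCATIONS = {("gpu_box_0", 3): "bay 3"}


def locate_device(box_id: str, device_id: int) -> tuple:
    """Resolve the first (box) and second (in-box) physical locations."""
    first = BOX_LOCATIONS[box_id]                 # where the box sits in the cabinet
    second = SLOT_LOCATIONS[(box_id, device_id)]  # where the device sits in the box
    return (first, second)
```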
In some embodiments, the target mapping relationship between the target host and the target interface device may be recorded and an allocation relation table generated. The allocation relation table can track the current scheduling status of the target interface device, and can quickly provide the location information of the target interface device the next time device resources are scheduled. Optionally, a preset data format may be used to record the target mapping relationship between the target host and the target interface device. For example, the JSON format may be adopted, which reduces storage overhead and is easy to read and edit.
In one practical example, an allocation relation table may be as follows:
{
    "Host Box_ID": "0",
    "Switch ID": "0",
    "Switch_Port ID": "4",
    "Device": [
        {
            "GPU Box_ID": "0",
            "GPU ID": "3",
            "Switch ID": "0",
            "Switch_Port ID": "5"
        },
        {
            "NVMe Box_ID": "1",
            "NVMe ID": "5",
            "Switch ID": "0",
            "Switch_Port ID": "6"
        }
    ]
}
Here, Host Box_ID represents the box identification of the host. GPU Box_ID represents the box identification of the GPU, and GPU ID represents the device identification of the GPU. NVMe Box_ID represents the box identification of the NVMe hard disk, and NVMe ID represents the device identification of the NVMe hard disk. Switch ID represents the device identification of the switch chip in the switching device box, and Switch_Port ID represents the chip port identification of the switch chip.
It can be seen from the allocation relation table which port of the switching device the host in a specifically numbered host box is connected to, and which interface devices hold data-link relationships with which ports of the switching device. At the same time, the names and positions of the interface devices connected to the host are recorded.
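As a sketch of how such a table might be consumed, the following reads the example allocation relation table and lists the interface devices currently mapped to the host. The key names follow the example above; the function itself is an illustrative assumption, not part of the patent.

```python
import json

# The example allocation relation table from the text, as a JSON string.
ALLOCATION_TABLE = """
{
  "Host Box_ID": "0",
  "Switch ID": "0",
  "Switch_Port ID": "4",
  "Device": [
    {"GPU Box_ID": "0", "GPU ID": "3", "Switch ID": "0", "Switch_Port ID": "5"},
    {"NVMe Box_ID": "1", "NVMe ID": "5", "Switch ID": "0", "Switch_Port ID": "6"}
  ]
}
"""


def devices_of_host(table_json: str) -> list:
    """Return human-readable names of the interface devices mapped to the host."""
    table = json.loads(table_json)
    names = []
    for dev in table["Device"]:
        if "GPU ID" in dev:
            names.append("GPU {} in box {}".format(dev["GPU ID"], dev["GPU Box_ID"]))
        else:
            names.append("NVMe {} in box {}".format(dev["NVMe ID"], dev["NVMe Box_ID"]))
    return names
```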
S3: and thermally removing the target interface device to disconnect the current mapping relation of the target interface device.
In this embodiment, hot removal allows the target interface device to be detached without affecting the operation of the host, reducing maintenance time and cost and improving resource scheduling efficiency.
In this embodiment, hot-removing the target interface device may mean disconnecting the current data link of the target interface device while maintaining its current physical path. Disconnecting the data link reclaims the device resource, while the maintained physical path ensures the device resource can be reused.
In some embodiments, a reset instruction may be sent to the target interface device, and the reset-completed target interface device may then be hot-added to the target host. The reset operation ensures the state consistency of the target interface device, restores the device to its initial state, and avoids problems in the new connection caused by stale state or configuration. The reset operation can also clear potential errors, removing error states or abnormal data that may exist in the device, in preparation for reconnection. After the device is reset, the next hot addition can proceed with the target interface device in an optimal state, reducing the risk of connection failure.
S4: and thermally adding the target interface device to the target host to establish the target mapping relationship between the target interface device and the target host.
In this embodiment, hot addition allows the target interface device to be attached without affecting the operation of the host, reducing maintenance time and cost and improving resource scheduling efficiency. Establishing the target mapping relationship between the target interface device and the target host enables the target host to invoke the resources of the target interface device, realizing reclamation and optimal configuration of device resources.
In some implementations, the target interface device is hot-added to the target host, and a target data link between the target host and the target interface device may be configured according to a target physical path of the candidate physical paths. The target physical path may be formed by the switching device connecting the target host with the target interface device.
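For orientation only: on a Linux host, an analogous hot-remove and rescan cycle can be driven through the PCI sysfs interface (the per-device `remove` node and the bus-wide `rescan` node). The patent's mechanism is UART commands sent to the switch chip, not sysfs; this sketch is a host-side analogue, and the example bus address is hypothetical.

```python
from pathlib import Path


def remove_node(bdf: str) -> str:
    """sysfs node that hot-removes one PCIe function, e.g. bdf='0000:3b:00.0'."""
    return "/sys/bus/pci/devices/{}/remove".format(bdf)


RESCAN_NODE = "/sys/bus/pci/rescan"


def hot_remove(bdf: str) -> None:
    # Writing "1" detaches the device from the host (requires root).
    Path(remove_node(bdf)).write_text("1")


def hot_add_all() -> None:
    # A bus rescan rediscovers devices newly exposed behind the switch.
    Path(RESCAN_NODE).write_text("1")
```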
Referring to fig. 3, in a practical application example, in a whole cabinet system, a host and an interface device may complete pooled interconnection through a high-performance switching device. The hosts may be located in a host box, and multiple host boxes may form a common pool of computing resources. The interface devices may be located in respective interface device boxes, and the plurality of interface device boxes may form an interface device resource pool with Input/Output (IO) interfaces. The switching device may connect and manage devices in the general purpose computing resource pool, the interface device resource pool. The switching device may be a switching device box and may include a pooling management engine, dedicated physical channels, and dedicated software interfaces to interact with users.
The pooling management engine in the switching device may centrally manage and control the devices in all resource pools through a programmable interactive interface, such as a universal asynchronous receiver-transmitter (UART). Through UART interfaces, the pooling management engine can dynamically configure the data-link attributes of the IO ports of different devices. Under different application scenarios, the pooling management engine can be compatible with a variety of hosts and interface devices.
The pooling management engine in the switching device may also provide an application programming interface (API), such as a Representational State Transfer (REST) style API, for the whole-cabinet management platform, ensuring that the platform can perform centralized scheduling and management of the resources in the resource pools.
Referring to fig. 4, in an actual application example, a user may perform operations such as on-demand allocation, dynamic expansion, and resource release on computing resources and interface device resources through the application programming interface provided by the pooling management engine. Specifically, a user can invoke different amounts of interface device resources to match computing nodes according to load requirements. When some interface device resources temporarily carry no load, the user can release them back to the interface device resource pool for use by other computing nodes, realizing efficient circulation and full utilization of interface device resources. Assuming an interface device is dynamically adjusted from one host to another, a specific dynamic resource switching procedure may be as follows.
1. The user initiates a dynamic resource adjustment request according to the load condition of the interface device resources (such as heterogeneous acceleration resources) on host A, requesting adjustment of interface device B, which currently carries no resource load;
2. The pooling management engine obtains the switch chip identifier, switch chip port identifier, device box identifier, and device identifier of interface device B according to the mapping relationship between host A and interface device B provided by the whole cabinet management platform;
3. The pooling management engine confirms the physical location of interface device B;
4. The pooling management engine sends a UART command to the switch chip and hot-removes interface device B;
5. The whole cabinet management platform sends a reset signal to interface device B;
6. Interface device B completes the reset and may send reset feedback information to the pooling management engine;
7. The pooling management engine sends a UART command to the switch chip to hot-add interface device B to another host C.
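The seven steps above can be sketched as a small state machine over the engine's mapping table. The class name, the dictionary layout, and the logged operation tuples are illustrative assumptions, not the actual implementation.

```python
class PoolingEngine:
    """Minimal sketch of the dynamic resource-switching flow: resolve the
    device's current mapping, hot-remove it, reset it, and hot-add it to
    another host."""

    def __init__(self, mappings):
        # mappings: device id -> {"host": ..., "switch": ..., "port": ..., "box": ...}
        self.mappings = dict(mappings)
        self.log = []  # records the control operations, in order

    def switch_device(self, device_id, new_host):
        entry = self.mappings[device_id]      # step 2: resolve identifiers from the mapping
        old_host = entry["host"]              # step 3: physical location now known
        self.log.append(("uart_hot_remove", entry["switch"], entry["port"]))       # step 4
        self.log.append(("reset", device_id))                                      # steps 5-6
        self.log.append(("uart_hot_add", entry["switch"], entry["port"], new_host))  # step 7
        entry["host"] = new_host              # update the recorded mapping
        return old_host, new_host

# Move interface device B from host A to host C, as in the example above.
engine = PoolingEngine({"B": {"host": "A", "switch": 0, "port": 5, "box": 0}})
moved = engine.switch_device("B", "C")
```

The log ordering reflects the precondition stressed later in the document: the hot-remove and reset must complete before the hot-add is issued.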
It should be noted that the load condition of the interface device resources on host A may be obtained by the whole cabinet management platform monitoring each host, or may be actively reported by each host according to a preset rule (for example, a trigger condition for low-load devices).
Referring to fig. 5, the resource scheduling method provided in one embodiment of the present disclosure may further include the following steps.
S21: and responding to a selection instruction input by a user, and reading a preset scheduling template.
S22: and configuring a host and interface equipment related to the scheduling template according to at least one group of expected mapping relations contained in the scheduling template, and realizing the expected mapping relations, wherein the expected mapping relations represent resources of an expected host calling expected interface equipment.
In this embodiment, for the specific method and content of realizing the expected mapping relationship, reference may be made to steps S1 to S4 above, which are not repeated here.
It should be noted that, in order to meet the requirement of fast switching of interface device resources across multiple application scenarios, that is, flexible dynamic scheduling of interface device resources in different application scenarios, the present disclosure further defines scheduling templates corresponding to multiple typical application scenarios. In each template, expected mapping relationships between certain hosts and interface devices are preset; these may be mapping relationships optimized for the scenario. The expected mapping relationships may be predetermined based on historical data, or may be generated through model training. Performing resource scheduling quickly and automatically based on a scheduling template for a typical application scenario can improve the scheduling efficiency of device resources.
In one practical application example, the content of one scheduling template may be in the following form:
{
    "Host Box_ID": "0",
    "Switch ID": "0",
    "Switch_Port ID": "4",
    "Device": [
        {
            "GPU Box_ID": "0",
            "GPU ID": "3",
            "Switch ID": "0",
            "Switch_Port ID": "5"
        },
        ... (N devices)
    ]
}
......
{
    "Host Box_ID": "8",
    "Switch ID": "8",
    "Switch_Port ID": "4",
    "Device": [
        {
            "GPU Box_ID": "0",
            "GPU ID": "13",
            "Switch ID": "0",
            "Switch_Port ID": "15"
        },
        ... (N devices)
    ]
}
In the scheduling template, the mapping relationships of a plurality of hosts to interface devices may be defined. When the scheduling template is successfully applied, each host can use the resources of its corresponding interface devices according to its respective mapping relationship.
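A template in the form shown above can be parsed into host-to-device mappings in a few lines. The field names follow the example template; the function and the single-entry sample below are an illustrative sketch.

```python
import json

# Minimal single-host template following the field names in the example above.
template_json = """
[
  {"Host Box_ID": "0", "Switch ID": "0", "Switch_Port ID": "4",
   "Device": [{"GPU Box_ID": "0", "GPU ID": "3",
               "Switch ID": "0", "Switch_Port ID": "5"}]}
]
"""

def load_expected_mappings(text):
    """Return {host box id: [GPU ids]} extracted from a scheduling template."""
    result = {}
    for host_entry in json.loads(text):
        host = host_entry["Host Box_ID"]
        result[host] = [dev["GPU ID"] for dev in host_entry["Device"]]
    return result

mapping = load_expected_mappings(template_json)
```

Note that the template content as printed in the document uses trailing commas and ellipsis placeholders for brevity; an actual configuration file would need to be strict JSON for a parser like this to accept it.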
In some implementations, a template switching request input by a user may be received. According to the template switching request, the mapping relationship between a candidate host and a candidate interface device may be changed from a first expected mapping relationship in a first template to a second expected mapping relationship in a second template.
Specifically, the first expected mapping relationship and the second expected mapping relationship may be compared to determine the distinguishing hosts and distinguishing interface devices. The mapping relationships corresponding to the distinguishing hosts and the distinguishing interface devices are then configured according to the second expected mapping relationship. In this process, the mapping relationships between hosts and interface devices that appear in both the first and the second expected mapping relationships can be retained, so that repeated scheduling is avoided.
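The comparison step amounts to a set difference over (host, device) pairs: pairs present in both templates are left untouched, and only the differing pairs are released or established. The sketch below illustrates this under the assumption that each mapping is represented as a set of pairs; it is not the patented implementation.

```python
def diff_mappings(first, second):
    """Compare two expected mappings, each a set of (host, device) pairs.
    Returns pairs to release (present only in the first template), pairs
    to establish (present only in the second), and pairs to keep
    (present in both, requiring no rescheduling)."""
    to_release = first - second
    to_establish = second - first
    to_keep = first & second
    return to_release, to_establish, to_keep

first = {("hostA", "gpu3"), ("hostA", "gpu4"), ("hostB", "gpu7")}
second = {("hostA", "gpu3"), ("hostC", "gpu4"), ("hostB", "gpu7")}
release, establish, keep = diff_mappings(first, second)
```

Keeping the intersection untouched is what avoids the repeated hot-remove/hot-add cycles mentioned in the text.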
Because different scheduling templates are defined, scheduling and configuration of device resources can be realized automatically by receiving a template switching request input by the user through touch, screen input, keyboard input, voice command, physical switch, or other means, without requiring the user to submit multiple scheduling requests one by one, thereby improving resource scheduling efficiency.
Referring to fig. 6, in an actual application example, according to the actual load condition of device resources, a user may initiate a scheduling template switching request following prompt information from the whole cabinet management platform. The pooling management engine can respond to the template switching request to achieve rapid deployment of multiple device resources in the interface device resource pool. The specific flow may be as follows:
1. The user initiates a templated dynamic resource scheduling request according to the load;
2. The whole cabinet management platform compares the current mapping relationships between the multiple hosts and the interface devices with the expected mapping relationships in the template configuration file, confirms the switch chip identifier, switch chip port identifier, device box identifier, and device identifier of the first interface device to be released, and sends a first application programming interface instruction to the pooling management engine;
3. The pooling management engine sends a UART command to the switch chip and hot-removes the first interface device;
4. The whole cabinet management platform queries the current device mapping relationships and sends a reset signal to the corresponding first interface device;
5. The whole cabinet management platform compares the current mapping relationships between the multiple hosts and the interface devices with the expected mapping relationships in the template configuration file, confirms the switch chip identifier, switch chip port identifier, device box identifier, and device identifier of the second interface device to be hot-added, and sends a second application programming interface instruction to the pooling management engine;
6. The pooling management engine sends a UART command to the switch chip and hot-adds the second interface device;
7. The whole cabinet management platform updates its record of the current mapping relationships between the multiple hosts and the interface devices, completing the templated dynamic resource scheduling.
In the process of templated dynamic resource scheduling, the precondition for hot-removing an interface device is that the services related to that interface device have finished running, so as to avoid system faults or service accidents.
According to the technical solutions provided by one or more embodiments of the present disclosure, a resource calling relationship between a target host and a target interface device can be established through a scheduling request, dynamic allocation and adjustment of interface device resources can be realized at the software layer, and the utilization efficiency of device resources is improved. In this process, establishing the target mapping relationship accurately conveys the resource allocation the user expects to realize. The position of the target interface device can be quickly located through the current mapping relationship of the target device, improving the efficiency of device resource redistribution. In addition, based on the technical means of hot removal and hot addition, redeployment of interface device resources can be completed rapidly, further improving the efficiency of dynamically allocating device resources.
According to the technical solutions provided by one or more embodiments of the present disclosure, multiple scheduling templates are provided for users in typical application scenarios, so that dynamic scheduling of device resources can be realized rapidly. Meanwhile, online dynamic switching between different resource scheduling templates is supported, realizing rapid deployment of multiple devices in the device resource pool without affecting the other services in the whole cabinet system.
The present disclosure further provides a resource scheduling apparatus, which is used to implement the foregoing embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Referring to fig. 7, a resource scheduling apparatus provided in one embodiment of the present disclosure includes:
A receiving module 100, configured to receive a scheduling request input by a user, where the scheduling request is used to indicate a target mapping relationship, and the target mapping relationship represents the target host calling the resources of the target interface device;
A determining module 200, configured to determine a physical location of the target interface device according to a current mapping relationship of the target interface device;
A removal module 300, configured to hot-remove the target interface device to disconnect the current mapping relationship of the target interface device;
An adding module 400, configured to hot-add the target interface device to the target host, so as to establish the target mapping relationship between the target interface device and the target host.
In one embodiment, the method is applied to a switching device, a first number of candidate hosts and a second number of candidate interface devices are indirectly connected through the switching device, and a candidate physical path is arranged between any candidate host and any candidate interface device, wherein the candidate physical path is used for configuring a data link, and the data link is used for establishing a mapping relation.
In one embodiment, the switching device includes a third number of switching chips, where the switching chips are connected in pairs, and any of the switching chips includes a fourth number of switching chip ports, where the switching chip ports are used to physically connect to the candidate host or the candidate interface device.
In one embodiment, the adding module 400 is specifically configured to configure a target data link between the target host and the target interface device according to a target physical path in the candidate physical paths, where the target physical path is formed by the switching device connecting the target host and the target interface device.
In one embodiment, the removing module 300 is specifically configured to: disconnect the current data link of the target interface device while maintaining the current physical path of the target interface device; send a reset instruction to the target interface device; and hot-add the reset-completed target interface device to the target host.
In one embodiment, the target interface device includes a heterogeneous acceleration device in a first resource pool, and a storage device in a second resource pool, and the target mapping relationship includes: the target host invokes resources of at least one of the heterogeneous acceleration devices and/or the target host invokes resources of at least one of the storage devices.
In one embodiment, the heterogeneous acceleration device includes a graphics processing unit and the storage device includes a non-volatile memory host controller interface specification hard disk.
In one embodiment, the target interface device is encapsulated in an interface device box, and the determining module 200 is specifically configured to determine a box identifier and a device identifier of the target interface device; determining a target interface device box corresponding to the target interface device and a first physical position of the target interface device box based on the box identifier; determining a second physical location of the target interface device in the target interface device box based on the device identification; and determining the physical position of the target interface device according to the first physical position and the second physical position.
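The two-level lookup performed by the determining module 200 (box identifier to box position, then device identifier to slot within the box) can be sketched as below. The inventory tables, their contents, and the position strings are illustrative assumptions; in practice such records would be maintained by the whole cabinet management platform.

```python
# Hypothetical inventory tables mapping identifiers to physical positions.
BOX_POSITIONS = {"box0": "rack U10-U13", "box1": "rack U14-U17"}
SLOT_IN_BOX = {("box0", "gpu3"): "slot 3", ("box1", "gpu13"): "slot 5"}

def locate_device(box_id, device_id):
    """Compose the first (box-level) and second (slot-level) physical
    positions into the device's full physical location."""
    first = BOX_POSITIONS[box_id]                 # first physical position
    second = SLOT_IN_BOX[(box_id, device_id)]     # second physical position
    return f"{first}, {second}"

loc = locate_device("box0", "gpu3")
```

Splitting the lookup into box position and in-box slot mirrors the determination of the first and second physical positions described above.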
In one embodiment, the apparatus further includes a recording module 500 configured to record the target mapping relationship between the target host and the target interface device in a preset data format, and generate an allocation relationship table, where the preset data format includes json format.
In one embodiment, the apparatus further includes a template selection module 600, configured to read a preset scheduling template in response to a selection instruction input by a user; and configuring a host and interface equipment related to the scheduling template according to at least one group of expected mapping relations contained in the scheduling template, and realizing the expected mapping relations, wherein the expected mapping relations represent resources of an expected host calling expected interface equipment.
In one embodiment, the apparatus further includes a template selection module 600, further configured to receive a template switching request input by a user; and changing the mapping relation between the candidate host and the candidate interface device from the first expected mapping relation in the first template to the second expected mapping relation in the second template according to the template switching request.
In one embodiment, the apparatus further includes a template selection module 600, specifically configured to compare the first expected mapping relationship and the second expected mapping relationship, and determine a distinguishing host and a distinguishing interface device; and configuring the mapping relation corresponding to the distinguishing host and the mapping relation corresponding to the distinguishing interface equipment according to the second expected mapping relation.
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The resource scheduling apparatus in this embodiment is presented in the form of functional units, where a module may be an ASIC (Application Specific Integrated Circuit), a processor and memory that execute one or more software or firmware programs, and/or other devices that can provide the above functions.
The disclosure also provides a computer device having the resource scheduling apparatus shown in fig. 7.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the disclosure. As shown in fig. 8, the computer device includes: one or more processors 10, a memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, with each device providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 8.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, among others. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a general-purpose array logic, or any combination thereof.
Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform the methods shown in implementing the above embodiments.
The memory 20 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created according to the use of the computer device, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The computer device further comprises input means 30 and output means 40. The processor 10, memory 20, input device 30, and output device 40 may be connected by a bus or other means, for example in fig. 8.
The input device 30 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer apparatus, such as a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, a joystick, and the like. The output device 40 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. Such display devices include, but are not limited to, liquid crystal displays, light emitting diode displays, and plasma displays. In some alternative implementations, the display device may be a touch screen.
The embodiments of the present invention also provide a computer readable storage medium. The method according to the above embodiments of the present invention may be implemented in hardware or firmware, or as computer code that may be recorded on a storage medium, or as computer code originally stored on a remote storage medium or a non-transitory machine readable storage medium and downloaded through a network to be stored on a local storage medium, so that the method described herein may be processed by software stored on a storage medium using a general purpose computer, a special purpose processor, or programmable or special purpose hardware. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk, or the like; further, the storage medium may also comprise a combination of memories of the kinds described above. It will be appreciated that a computer, processor, microprocessor controller, or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the methods illustrated in the above embodiments.
Portions of the present invention may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or aspects in accordance with the present invention by way of operation of the computer. Those skilled in the art will appreciate that the form of computer program instructions present in a computer readable medium includes, but is not limited to, source files, executable files, installation package files, etc., and accordingly, the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instruction, or the computer compiles the instruction and then executes the corresponding compiled program, or the computer reads and executes the instruction, or the computer reads and installs the instruction and then executes the corresponding installed program. Herein, a computer-readable medium may be any available computer-readable storage medium or communication medium that can be accessed by a computer.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for embodiments of the apparatus, computer device, storage medium, and program product, the description is relatively simple, as it is substantially similar to the method embodiments, with reference to the description of portions of the method embodiments being relevant.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.
Although embodiments of the present disclosure have been described with reference to the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the disclosure, and such modifications and variations fall within the scope as defined by the appended claims.
Claims (14)
1. A method for scheduling resources, the method comprising:
receiving a scheduling request input by a user, wherein the scheduling request is used for indicating a target mapping relationship, and the target mapping relationship represents a target host calling resources of a target interface device;
Determining the physical position of the target interface device according to the current mapping relation of the target interface device;
Hot-removing the target interface device to disconnect the current mapping relationship of the target interface device;
Hot-adding the target interface device to the target host to establish the target mapping relationship between the target interface device and the target host;
Responding to a selection instruction input by a user, and reading a preset scheduling template;
And configuring a host and interface equipment related to the scheduling template according to at least one group of expected mapping relations contained in the scheduling template, and realizing the expected mapping relations, wherein the expected mapping relations represent resources of an expected host calling expected interface equipment.
2. The method according to claim 1, wherein the method is applied to a switching device, a first number of candidate hosts and a second number of candidate interface devices are indirectly connected through the switching device, and a candidate physical path is provided between any one of the candidate hosts and any one of the candidate interface devices, wherein the candidate physical path is used for configuring a data link, and the data link is used for establishing a mapping relationship.
3. The method of claim 2, wherein the switching device comprises a third number of switching chips, the switching chips being connected in pairs, any one of the switching chips comprising a fourth number of switching chip ports for physically connecting to the candidate host or the candidate interface device.
4. A method according to claim 2 or 3, wherein said hot-adding the target interface device to the target host comprises:
And configuring a target data link between the target host and the target interface device according to a target physical path in the candidate physical paths, wherein the target physical path is formed by connecting the target host and the target interface device by the switching device.
5. A method according to claim 2 or 3, wherein the hot-removing the target interface device comprises:
disconnecting a current data link of the target interface device and maintaining a current physical path of the target interface device;
sending a reset instruction to the target interface device;
and hot-adding the reset-completed target interface device to the target host.
6. The method of claim 1, wherein the target interface device comprises a heterogeneous acceleration device in a first resource pool and a storage device in a second resource pool, the target mapping relationship comprising:
The target host invokes resources of at least one of the heterogeneous acceleration devices and/or the target host invokes resources of at least one of the storage devices.
7. The method of claim 6, wherein the heterogeneous acceleration device comprises a graphics processing unit and the storage device comprises a non-volatile memory host controller interface specification hard disk.
8. The method of claim 1, wherein the target interface device is packaged within an interface device box, and wherein determining the physical location of the target interface device comprises:
determining a box identifier and an equipment identifier of the target interface equipment;
determining a target interface device box corresponding to the target interface device and a first physical position of the target interface device box based on the box identifier;
determining a second physical location of the target interface device in the target interface device box based on the device identification;
And determining the physical position of the target interface device according to the first physical position and the second physical position.
9. The method according to claim 1, wherein the method further comprises:
And recording the target mapping relation between the target host and the target interface device by adopting a preset data format, and generating an allocation relation table, wherein the preset data format comprises a json format.
10. The method according to claim 1, wherein the method further comprises:
receiving a template switching request input by a user;
And changing the mapping relation between the candidate host and the candidate interface device from the first expected mapping relation in the first template to the second expected mapping relation in the second template according to the template switching request.
11. The method of claim 10, wherein changing the mapping between the candidate host and the candidate interface device from the first desired mapping in the first template to the second desired mapping in the second template comprises:
comparing the first expected mapping relation and the second expected mapping relation, and determining a distinguishing host and distinguishing interface equipment;
and configuring the mapping relation corresponding to the distinguishing host and the mapping relation corresponding to the distinguishing interface equipment according to the second expected mapping relation.
12. A computer device, comprising:
A memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the resource scheduling method of any one of claims 1 to 11.
13. A computer readable storage medium having stored thereon computer instructions for causing a computer to perform the resource scheduling method of any one of claims 1 to 11.
14. A computer program product comprising computer instructions for causing a computer to perform the resource scheduling method of any one of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410845845.4A CN118394533B (en) | 2024-06-27 | 2024-06-27 | Resource scheduling method, computer device, storage medium, and program product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118394533A CN118394533A (en) | 2024-07-26 |
CN118394533B true CN118394533B (en) | 2024-10-01 |
Family
ID=91986469
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410845845.4A Active CN118394533B (en) | 2024-06-27 | 2024-06-27 | Resource scheduling method, computer device, storage medium, and program product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118394533B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111880750A (en) * | 2020-08-13 | 2020-11-03 | 腾讯科技(深圳)有限公司 | Method, device and equipment for distributing read-write resources of disk and storage medium |
CN117978811A (en) * | 2024-03-29 | 2024-05-03 | 苏州元脑智能科技有限公司 | Mapping relation determination method and system, storage medium and electronic device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114356516A (en) * | 2021-12-23 | 2022-04-15 | 科大讯飞股份有限公司 | Resource scheduling method, related device, equipment and storage medium |
CN114675976B (en) * | 2022-05-26 | 2022-09-16 | 深圳前海环融联易信息科技服务有限公司 | GPU (graphics processing Unit) sharing method, device, equipment and medium based on kubernets |
Similar Documents
Publication | Title |
---|---|
US10108460B2 (en) | Method and system for integrated deployment planning for virtual appliances |
US7793298B2 (en) | Facilitating access to input/output resources via an I/O partition shared by multiple consumer partitions |
US20070101029A1 (en) | Multiplexed computer peripheral device connection switching interface |
CN104503932B (en) | Multi-mainboard server baseboard management controller arbitration method and system |
CN113312142B (en) | Virtualized processing system, method, device and equipment |
US10972361B2 (en) | System and method for remote hardware support using augmented reality and available sensor data |
CN105450759A (en) | System image management method and device |
US7614055B2 (en) | Selecting a processor to run an executable of a distributed software application upon startup of the distributed software application |
US20060167886A1 (en) | System and method for transmitting data from a storage medium to a user-defined cluster of local and remote server blades |
CN116723198A (en) | Multi-node server host control method, device, equipment and storage medium |
US20060010133A1 (en) | Management of a scalable computer system |
JP4786255B2 (en) | Storage system and storage control method |
US10817820B2 (en) | Facilitating provisioning in a mixed environment of locales |
US7516108B2 (en) | Block allocation times in a computer system |
CN112965806B (en) | Method and device for determining resources |
CN118394533B (en) | Resource scheduling method, computer device, storage medium, and program product |
CN110532150B (en) | Chassis management method and device, storage medium and processor |
CN117041184B (en) | IO expansion device and IO switch |
US10528397B2 (en) | Method, device, and non-transitory computer readable storage medium for creating virtual machine |
CN103617077A (en) | Intelligent cloud migration method and system |
CN108958889A (en) | Method and device for managing virtual machines in a cloud data system |
CN117472596B (en) | Distributed resource management method, device, system, equipment and storage medium |
US11960899B2 (en) | Dual in-line memory module map-out in an information handling system |
US8819203B1 (en) | Techniques for providing an application service to an application from an appliance-style application services platform |
US20240248701A1 (en) | Full stack in-place declarative upgrades of a Kubernetes cluster |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||