
CN116366727A - Service request and service quality driven edge resource scheduling method and application - Google Patents


Info

Publication number
CN116366727A
Authority
CN
China
Prior art keywords
service
request
upper limit
resource
concurrent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310300866.3A
Other languages
Chinese (zh)
Other versions
CN116366727B (en)
Inventor
曾锃
滕昌志
缪巍巍
夏元轶
张瑞
李世豪
张明轩
毕思博
余益团
肖茂然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Priority to CN202310300866.3A
Publication of CN116366727A
Application granted
Publication of CN116366727B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Computer And Data Communications (AREA)

Abstract

The invention discloses a service request and service quality driven edge resource scheduling method and an application thereof. The method is applied to a serverless computing cluster comprising a plurality of edge nodes, each edge node comprising a plurality of service instances, and specifically comprises the following steps: querying the service states of all service instances on the edge node to determine whether to issue alarm information; if so, determining a priority queue based on the service states and delay sensitivity of all service instances on the edge node, wherein the priority queue comprises service instances whose resource usage is greater than the requested amount and whose real-time number of concurrent requests is greater than 0, and the service instances in the priority queue carry priority information; and adjusting, based on the priority information, the upper limit of the number of concurrent requests of the service instances in the priority queue. The method accommodates service instances that require different resource upper limits as well as service requests with different delay requirements.

Description

Service request and service quality driven edge resource scheduling method and application
Technical Field
The present invention relates to the technical field of resource scheduling, and in particular, to a method for scheduling edge resources driven by service requests and quality of service and an application thereof.
Background
Cloud computing reduces the burden on users of managing physical infrastructure, and pushing operation and maintenance capabilities down into the infrastructure layer has become a trend. Serverless computing is a new cloud computing model in which the user only provides the core code of a service, while the platform that supports running the code is managed and maintained by the cloud vendor.
In a serverless computing platform, one service instance may carry multiple concurrent requests simultaneously. However, the number of concurrent requests that a single instance can carry differs with its resource allocation, and the latency requirements and request rates of different service instances may also differ. Meanwhile, when multiple service instances run on a single node, resource contention may occur: the resources allocated to an instance are limited, so the higher the instance load, the more severe the contention and the longer the request delay.
Instance load and request execution time are positively correlated: the larger the load, the longer the instance takes to process a request. Therefore, a reasonable initial load upper limit must be set for a service instance under a given amount of resources, and that limit must also be dynamically adjusted at run time according to real-time conditions. How to ensure that service instances are reasonably scheduled while the service quality is met is thus a problem to be solved.
In the traditional monolithic application model, a delay-sensitive application is typically given exclusive use of a node so that it is not disturbed, but this leads to low cluster resource utilization. Some studies measure the impact of contention on an application through offline testing and use a regression or classification model to judge whether the application can be co-located with offline applications. Others monitor the application's service quality online to judge whether a delay-sensitive application can share a node with more batch applications, or derive a scheduling strategy from a small amount of offline test data by collaborative filtering. However, no study has yet addressed multiple service instances competing for node resources in a serverless computing scenario.
A cluster resource management platform (e.g., YARN, Kubernetes) allows the sum of the resource upper limits of the instances on a node to exceed the node's allocatable resources, which helps handle an application's request load at peak times. However, because the node's resources are limited, when multiple instances on the node are under high load their processes may contend for resources, degrading each other's quality of service.
Most existing service instance scheduling methods do not consider the edge node serverless computing scenario. The service instances on a node compete for node resources, and each has certain requirements on delay, although their sensitivity differs. Meanwhile, because one service instance can carry multiple concurrent requests, those requests are bounded by the instance's resource upper limit and also contend for resources, so a reasonable upper limit of the number of concurrent requests needs to be set according to the instance's resource upper limit.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person of ordinary skill in the art.
Disclosure of Invention
The invention aims to provide a service request and service quality driven edge resource scheduling method and application, which are used for solving the technical problem of ensuring that a service instance is reasonably scheduled on the premise of meeting the service quality.
To achieve the above object, an embodiment of the present invention provides a service request and quality of service driven edge resource scheduling method, which is applied to a serverless computing cluster, where the serverless computing cluster includes a plurality of edge nodes, and each of the edge nodes includes a plurality of service instances, and the method includes:
querying the service states of all service instances on the edge node to determine whether to issue alarm information; and if so,
determining a priority queue based on service states and time delay sensitivity degrees of all service instances on the edge node, wherein the priority queue comprises service instances with resource usage amount larger than the application amount and real-time concurrent request amount larger than 0, and the service instances in the priority queue have priority information;
and adjusting the upper limit of the number of concurrent requests of the service instance in the priority queue based on the priority information.
In one or more embodiments of the present invention, querying service states of all service instances on the edge node to determine whether to send out alarm information specifically includes:
inquiring the proportion of all service instances on the edge node to finish request processing within a set upper limit of request time delay;
and determining whether to send out alarm information based on the query proportion.
In one or more embodiments of the present invention, adjusting an upper limit of a number of concurrent requests of a service instance in the priority queue based on the priority information specifically includes:
when the edge node receives the alarm information, the upper limit of the number of concurrent requests of the service instance in the priority queue is adjusted downwards according to the order of the priority from low to high;
and/or, when the edge node does not receive the alarm information, periodically checking whether the resource usage of all the service instances on the edge node is less than the total quantity of concurrent requests; and if so,
raising, in order of priority from high to low, the upper limit of the number of concurrent requests of the service instances in the priority queue whose upper limit was previously lowered.
In one or more embodiments of the invention, the method further comprises:
in the stage of deploying the service instance to a cluster, inputting the upper limit of the resource quantity of the service instance, the standard request delay and the time delay growth proportion into a pre-trained machine learning model to obtain the upper limit of the concurrent request quantity of the service instance;
the standard request time delay is the time delay of the request of the service instance under the condition of no resource contention, and the time delay increment proportion is the maximum increment proportion of the request of the service instance relative to the standard request time delay under the condition of the resource contention.
In one or more embodiments of the present invention, the method specifically includes:
acquiring request time delays of the service instance in an offline mode under the conditions of the same resource quantity upper limit and different concurrent request quantity, and request time delays under the conditions of the different resource quantity upper limit and different concurrent request quantity;
constructing a sample data set based on the upper limit of the resource quantity, the concurrent request quantity and the corresponding request time delay;
training the machine learning model based on the sample dataset until a training condition is met;
wherein the machine learning model is a regression model.
In one or more embodiments of the invention, the method further comprises:
acquiring the resource quantity upper limit, the concurrency request quantity and the request time delay under different resource quantity upper limits and different concurrency request quantities of a new service instance in an online mode, and constructing an update data set;
the machine learning model is updated based on the update data set.
In one or more embodiments of the invention, the method further comprises:
acquiring request time delays of different types of service instances under different resource quantity upper limits and different concurrent request quantities so as to construct a request time delay increase proportion set;
determining a service level target as a service request delay upper limit of different types of service instances under different delay increasing proportions, and screening out partial request delay of each service instance, which is larger than the service request delay upper limit, under different concurrent request quantity;
inputting part of request time delay, different resource quantity upper limits and different time delay increasing proportions of each service instance into the machine learning model to obtain a plurality of different concurrent request quantity upper limits corresponding to each service instance;
and determining the maximum value in a plurality of different upper limits of the quantity of concurrent requests corresponding to each service instance as the upper limit of the quantity of concurrent requests corresponding to the service instance.
In another aspect of the present invention, there is provided a service request and quality of service driven edge resource scheduling system, the system comprising:
the query module is used for querying the service states of all the service instances on the edge node to determine whether to send out alarm information;
the determining module is used for determining a priority queue based on the service states and the time delay sensitivity degree of all the service instances on the edge node, wherein the priority queue comprises service instances with the resource usage amount larger than the application amount and the real-time concurrent request amount larger than 0, and the service instances in the priority queue have priority information;
and the adjusting module is used for adjusting the upper limit of the quantity of concurrent requests of the service instance in the priority queue based on the priority information.
In another aspect of the present invention, there is provided an electronic device including:
at least one processor; and
a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the service request and quality of service driven edge resource scheduling method as described above.
In another aspect of the invention, a machine-readable storage medium is provided that stores executable instructions that, when executed, cause the machine to perform the service request and quality of service driven edge resource scheduling method described above.
Compared with the prior art, the service request and service quality driven edge resource scheduling method and application according to the embodiments of the invention introduce a dynamic adjustment strategy for the upper limit of the number of concurrent requests. When resource contention on an edge node triggers an alarm, the upper limit of the number of concurrent requests of the service instances on that node can be adjusted in time, which reduces resource contention, carries the cluster through its load peak, and markedly increases the proportion of requests that each service instance completes within the set upper limit of request delay.
On the other hand, the method trains a machine learning model according to test data of the request time delay of the service instance under different resource quantity upper limits and different concurrency request quantities, and fits a value as the request concurrency quantity upper limit when the service instance is deployed through the machine learning model, so that the method can adapt to the situation that different service instances need different resource quantity upper limits, and simultaneously can adapt to the requirements of different service requests on time delay.
Drawings
FIG. 1 is a flow chart of a method for service request and quality of service driven edge resource scheduling in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of a structure of a server-less computing cluster in a service request and QoS driven edge resource scheduling method according to an embodiment of the present invention;
FIG. 3 is a flow chart of a method for scheduling service requests and quality of service driven edge resources to downregulate the upper limit of the number of concurrent requests of service instances in a priority queue according to an embodiment of the present invention;
FIG. 4 is a frame diagram of setting an upper limit on the number of concurrent requests of a service instance based on a pre-trained machine learning model in a service request and quality of service driven edge resource scheduling method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a service request and QoS driven edge resource scheduling system according to an embodiment of the present invention;
FIG. 6 is a second schematic diagram of a service request and QoS driven edge resource scheduling system according to an embodiment of the present invention;
FIG. 7 is a hardware configuration diagram of an electronic device for service request and quality of service driven edge resource scheduling according to an embodiment of the present invention.
Detailed Description
Embodiments of the invention are described in detail below with reference to the accompanying drawings; it should be understood that the scope of the invention is not limited to these specific embodiments.
Throughout the specification and claims, unless explicitly stated otherwise, the term "comprise" or variations thereof such as "comprises" or "comprising", etc. will be understood to include the stated element or component without excluding other elements or components.
Referring to FIG. 1, an embodiment of the service request and quality of service driven edge resource scheduling method of the present invention is described. The method is applied to a serverless computing cluster that includes a plurality of edge nodes, each of which includes a plurality of service instances.
A single service instance in the serverless computing cluster processes multiple concurrent requests at the same time, but resources are allocated at the granularity of the service instance. The requests are therefore bounded by the resources of their instance, and contention among them can prevent the service quality from being met. Likewise, multiple service instances on the same edge node may be under high load at the same time and exceed their requested resource amounts, which puts the node under high load; the processes of the different instances then compete for resources, and the service quality again cannot be met.
To address these two technical problems, namely resource contention among the concurrent requests of a single instance and resource contention among multiple instances on the same node, the invention takes the setting and the adjustment of the upper limit of the number of concurrent requests of an instance as its two starting points. It studies a service instance scheduling mechanism that guarantees service quality and resolves resource contention among different applications on the same host: by analyzing the performance of typical applications under differentiated resource configurations and collecting statistics on performance interference among applications on the host, it establishes a service instance scheduling strategy for the host's application load.
In this embodiment, the method includes the following steps.
S101, inquiring service states of all service instances on the edge node to determine whether to send out alarm information.
Specifically, the ratio of the completion of request processing of all service instances on the edge node within the set upper limit of the request time delay is queried, and whether alarm information is sent or not is determined based on the queried ratio.
Referring to fig. 2, a serverless computing cluster has an overall cluster monitor responsible for pulling metrics on all edge nodes. Each node has an alarm receiver in it that is responsible for receiving alarms belonging to the service instance on its own node. The alarm receiver contains information of all service instances on its own node, including priority information of the service instances, upper limit of the number of concurrent requests of the service instances, etc.
The alarm receiver may periodically query whether any service instance on its edge node has triggered an alarm rule, for example whether the request delay of a service instance has exceeded the set upper limit of request delay for a period of time. In this embodiment, the service level objective (SLO) of the request may be used as the upper limit of request delay.
When multiple service instances run on the same edge node, they may be under high load at the same time and compete for resources. Such contention affects the request delay of the service instances and lowers the proportion of requests that they complete within the set upper limit of request delay; when this proportion falls to the set threshold, an alarm is triggered.
And after the alarm information is sent out, introducing a dynamic concurrency request quantity upper limit adjustment strategy.
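The alarm decision in S101 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `InstanceStats`, `slo_compliance`, and `should_alarm`, the 0.95 threshold, and the choice to evaluate the proportion per instance are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceStats:
    name: str
    request_delays_ms: List[float]  # delays observed in the current window
    slo_ms: float                   # set upper limit of request delay (SLO)


def slo_compliance(stats: InstanceStats) -> float:
    """Proportion of requests that finished within the SLO."""
    if not stats.request_delays_ms:
        return 1.0  # no traffic in the window, so nothing violated the SLO
    ok = sum(1 for d in stats.request_delays_ms if d <= stats.slo_ms)
    return ok / len(stats.request_delays_ms)


def should_alarm(instances: List[InstanceStats], threshold: float = 0.95) -> bool:
    """Alarm when any instance's proportion falls to the threshold or below.

    The threshold value is hypothetical; the source only says that a set
    threshold exists.
    """
    return any(slo_compliance(s) <= threshold for s in instances)


node = [
    InstanceStats("f1", [100, 110, 240, 250], slo_ms=230),
    InstanceStats("f2", [50, 55, 60, 58], slo_ms=80),
]
alarm = should_alarm(node)  # f1 completes only half its requests in time
```

Here `f1` completes two of four requests within its SLO, so the node raises an alarm; `f2` on its own would not.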
S102, determining a priority queue based on service states and time delay sensitivity degrees of all service instances on the edge node.
The priority queue comprises service instances whose resource usage is greater than the requested amount and whose real-time number of concurrent requests is greater than 0, and the service instances in the priority queue carry priority information.
Specifically, in this embodiment, the service states of all service instances on the edge node determine which service instances join the priority queue (for example, those whose resource usage is greater than the requested amount and whose real-time number of concurrent requests is greater than 0). The selected instances are then sorted by delay sensitivity, from high to low (or from low to high), to obtain a priority queue that contains the priority information of the service instances.
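A minimal sketch of the queue construction in S102 is shown below. The `ServiceInstance` fields, the numeric `delay_sensitivity` score, and the convention that a higher score means a higher priority are assumptions for illustration; the source leaves the sort direction open.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceInstance:
    name: str
    resource_usage: float      # e.g. CPU cores currently used
    resource_request: float    # amount requested at deployment
    concurrent_requests: int   # real-time number of in-flight requests
    delay_sensitivity: int     # hypothetical score: larger = more sensitive


def build_priority_queue(instances: List[ServiceInstance]) -> List[ServiceInstance]:
    """Keep over-using, active instances and sort them by delay sensitivity."""
    candidates = [
        i for i in instances
        if i.resource_usage > i.resource_request and i.concurrent_requests > 0
    ]
    # Assumption: higher delay sensitivity means higher priority.
    return sorted(candidates, key=lambda i: i.delay_sensitivity, reverse=True)


instances = [
    ServiceInstance("a", 1.2, 1.0, 5, delay_sensitivity=3),
    ServiceInstance("b", 0.4, 1.0, 2, delay_sensitivity=5),  # under its request
    ServiceInstance("c", 2.0, 1.5, 0, delay_sensitivity=4),  # no live requests
    ServiceInstance("d", 0.8, 0.5, 1, delay_sensitivity=1),
]
queue = build_priority_queue(instances)
```

Only `a` and `d` satisfy both conditions, so the queue contains them in that order.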
S103, based on the priority information, adjusting the upper limit of the number of concurrent requests of the service instance in the priority queue.
Referring specifically to FIG. 3, when the alarm receiver of the edge node receives the alarm information, the upper limit of the number of concurrent requests of the service instances in the priority queue may be lowered in order of priority from low to high; for example, the upper limit of a service instance may be reduced by step (step > 0). If the alarm is triggered repeatedly, the upper limits continue to be lowered until resource contention no longer occurs among the service instances on the node and the alarm is no longer issued.
It should be noted that service instances of the same type on the same edge node have their upper limits of the number of concurrent requests lowered together, to avoid an unbalanced request load across the instances.
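The lowering strategy can be sketched as below. The source does not say whether one alarm lowers one service type or several, nor whether a floor exists; this sketch assumes that each alarm lowers the lowest-priority type that still has room above a hypothetical `floor`, and that all instances of that type move together. `QueuedInstance` and `lower_on_alarm` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueuedInstance:
    name: str
    service_type: str
    priority: int    # larger = higher priority (assumption)
    con_limit: int   # current upper limit of concurrent requests


def lower_on_alarm(queue: List[QueuedInstance], step: int = 1,
                   floor: int = 1) -> Optional[str]:
    """Handle one alarm; return the service type that was lowered, if any."""
    for inst in sorted(queue, key=lambda i: i.priority):  # low -> high
        if inst.con_limit - step >= floor:
            target = inst.service_type
            # Lower every instance of the same type together so that the
            # request load stays balanced across them.
            for peer in queue:
                if peer.service_type == target and peer.con_limit - step >= floor:
                    peer.con_limit -= step
            return target
    return None  # nothing left to lower


queue = [
    QueuedInstance("a1", "A", priority=1, con_limit=4),
    QueuedInstance("a2", "A", priority=1, con_limit=4),
    QueuedInstance("b1", "B", priority=5, con_limit=6),
]
first = lower_on_alarm(queue, step=2)   # lowers both instances of type A
second = lower_on_alarm(queue, step=2)  # A is at its floor, so B is lowered
```

Repeated alarms therefore walk up the priority order only after lower-priority types can no longer be reduced.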
When the edge node does not receive the alarm information, the alarm receiver periodically checks whether the upper limit of the number of concurrent requests of all service instances on the edge node needs to be recovered.
The recovery strategy is the reverse of the lowering strategy: each time, upper limits are raised in order of priority from high to low, and only for service instances whose upper limit of the number of concurrent requests was previously lowered.
Specifically, the alarm receiver periodically checks whether the resource usage of all service instances on the edge node is less than the total amount of concurrent requests. If it is, the sum of the resource usage on the node does not exceed the node's allocatable resources, and the recovery strategy can be applied: the upper limits of the number of concurrent requests of the service instances in the priority queue whose upper limits were previously lowered are raised in order of priority from high to low. Raising the limits in this way lets the node scheduler schedule more requests, so that the proportion of requests each service instance completes within the set upper limit of request delay can be markedly improved and resource utilization is further increased.
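A sketch of the periodic recovery check follows. It interprets the condition as "the summed resource usage is below the node's allocatable amount", which is one reading of the text above; `TrackedInstance`, `recover_limits`, the `initial_limit` field, and the step size are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrackedInstance:
    name: str
    priority: int          # larger = higher priority (assumption)
    con_limit: int         # current upper limit of concurrent requests
    initial_limit: int     # limit configured at deployment
    resource_usage: float  # e.g. CPU cores currently used


def recover_limits(instances: List[TrackedInstance], node_allocatable: float,
                   step: int = 1) -> List[str]:
    """Periodic check run while no alarm is pending; returns raised instances."""
    if sum(i.resource_usage for i in instances) >= node_allocatable:
        return []  # node is still full, keep the lowered limits
    raised = []
    for inst in sorted(instances, key=lambda i: i.priority, reverse=True):
        if inst.con_limit < inst.initial_limit:  # only previously lowered ones
            inst.con_limit = min(inst.initial_limit, inst.con_limit + step)
            raised.append(inst.name)
    return raised


instances = [
    TrackedInstance("hi", priority=5, con_limit=4, initial_limit=6, resource_usage=1.0),
    TrackedInstance("lo", priority=1, con_limit=2, initial_limit=4, resource_usage=0.5),
    TrackedInstance("ok", priority=3, con_limit=5, initial_limit=5, resource_usage=0.5),
]
raised = recover_limits(instances, node_allocatable=4.0)
```

The instance `ok` was never lowered, so it is skipped; `hi` is raised before `lo` because of its higher priority.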
The scheme mainly solves the problem of resource contention among a plurality of service instances in the same edge node, and by introducing a dynamic concurrency request quantity upper limit adjustment strategy, when the edge node resource contention occurs to trigger an alarm, the request concurrency quantity upper limit of the service instance under the edge node can be timely adjusted, the resource contention condition is reduced, the cluster load peak period is passed, and meanwhile, the proportion of the completion of request processing of each service instance in the set request time delay upper limit can be obviously improved.
Referring to fig. 4, the service request and service quality driven edge resource scheduling method further includes: in the stage of deploying the service instance to the cluster, the upper limit of the resource quantity of the service instance, the standard request time delay and the time delay increasing proportion are input into a pre-trained machine learning model to obtain the upper limit of the concurrent request quantity of the service instance.
The standard request time delay is the time delay of the request of the service instance under the condition of no resource contention, and the time delay increment proportion is the maximum increment proportion of the request of the service instance relative to the standard request time delay under the condition of the resource contention.
In this embodiment, the pre-training of the machine learning model is performed under off-line testing.
Specifically, first, the request delays of a service instance $f_i$ are acquired in offline mode under the same resource upper limit and different numbers of concurrent requests $con \in \{1, \dots, N\}$, and under different resource upper limits $limit \in \{k, 2k, \dots, nk\}$ and different numbers of concurrent requests, where $k$ is the basic unit of resource allocation; when the resource is CPU, $k$ is 0.5 core.
And then constructing a sample data set based on the obtained upper limit of all the resource amounts, the concurrent request quantity and the corresponding request time delay. And finally training the machine learning model based on the sample data set until the training condition is met.
In this embodiment, the machine learning model may be a regression model, and the specific model structure is:
(latency,ratio,f,limit)→con
wherein, in the input vector of the regression model:
f represents a service of a certain type; limit is the upper limit of the amount of resources for service f.
latency is the delay of a request of the service when there is no resource contention, that is, the standard request delay. The average delay may be used, or the 95th-percentile or 99th-percentile delay.
ratio relates to the upper limit of request delay set for a request, that is, its SLO (Service Level Objective). ratio ∈ {100+k, 100+2k, ..., 100+nk}, where the value is a percentage of latency and k is the unit of percentage change. Specifically, ratio expresses the upper limit of request delay under resource contention as a multiple of latency (for example, of the average delay). A request whose delay exceeds this limit is regarded as not satisfying its SLO, and the system should maximize the proportion of requests that satisfy their SLO. For example, if service f sets ratio to 115% of latency, the delay of a request must not exceed 115% of latency.
The output of the regression model is the number of concurrent requests con that meets the ratio requirement under the given limit setting.
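As a worked illustration of how ratio and latency define the upper limit of request delay, consider the 115% example with an assumed standard request delay of 200 ms (the 200 ms figure is hypothetical and not taken from the source):

```latex
\[
\mathrm{SLO} = \mathit{latency} \times \frac{\mathit{ratio}}{100},
\qquad
\mathit{latency} = 200\ \mathrm{ms},\ \mathit{ratio} = 115
\;\Rightarrow\;
\mathrm{SLO} = 200 \times 1.15 = 230\ \mathrm{ms}.
\]
```

Under these assumed values, any request that takes longer than 230 ms counts against the service's SLO compliance.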
Based on this scheme, the request delay of a service instance is tested under different resource upper limits and different numbers of concurrent requests, and a sample data set is generated according to the input and output design of the regression model. When a service instance needs to be deployed in the cluster, the regression model outputs the setting for the upper limit of the number of concurrent requests according to the resource upper limit set for the instance and the service's quality-of-service requirement on request delay.
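One way to turn offline measurements into rows of the form (latency, ratio, f, limit) → con is sketched below. It assumes that the delay measured at con = 1 stands in for the no-contention standard delay and that the label is the largest tested con whose delay stays within latency × ratio / 100; both are assumptions, and `build_samples` and the measurement values are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical offline measurements: (service, limit in cores, con) -> delay in ms
Measurement = Tuple[str, float, int]
Sample = Tuple[float, int, str, float, int]  # (latency, ratio, f, limit, con)


def build_samples(delays: Dict[Measurement, float], ratios: List[int]) -> List[Sample]:
    samples: List[Sample] = []
    groups = sorted({(f, limit) for (f, limit, _) in delays})
    for f, limit in groups:
        by_con = {con: d for (g, l, con), d in delays.items() if (g, l) == (f, limit)}
        latency = by_con[1]  # assumption: con == 1 means no resource contention
        for ratio in ratios:
            slo = latency * ratio / 100
            feasible = [con for con, d in by_con.items() if d <= slo]
            samples.append((latency, ratio, f, limit, max(feasible)))
    return samples


delays = {
    ("f", 0.5, 1): 100.0, ("f", 0.5, 2): 108.0, ("f", 0.5, 3): 130.0,
    ("f", 1.0, 1): 100.0, ("f", 1.0, 2): 104.0, ("f", 1.0, 3): 112.0,
    ("f", 1.0, 4): 125.0,
}
samples = build_samples(delays, ratios=[110, 120])
```

With these made-up numbers, doubling the resource upper limit from 0.5 to 1.0 core raises the feasible concurrency at ratio 120 from 2 to 3.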
The method initially requires acquiring a set of data from an offline test to train a regression model. During the online process, the model is updated by collecting data such as the time delay of the request of the serverless computing cluster during the running process, the number of concurrent requests carried at the time, the upper limit of the resource amount of the instance and the like.
Continuing to refer to fig. 4, an update data set is constructed by acquiring an upper limit of the resource amount of the new service instance in the online mode, the number of concurrent requests, and the request delay under different upper limits of the resource amount and different numbers of concurrent requests, and the machine learning model is updated based on the update data set.
Specifically, when a deployment request for service f reaches the Gateway, the trained regression model outputs the upper limit of concurrent requests con according to the resource upper limit of service f, its standard request delay, and its delay growth ratio. When a service instance is created, con is loaded into it as configuration and caps the number of concurrent requests while the instance runs.
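The deployment-time lookup can be illustrated with a small stand-in regressor. The source only states that the model is a regression model; the k-nearest-neighbour regressor below is swapped in purely to keep the example self-contained, and `predict_con`, the feature scaling constants, and the sample rows are assumptions. Rows are taken to belong to a single service type f.

```python
import math
from typing import List, Tuple

Row = Tuple[float, float, float, int]  # (latency_ms, ratio, limit_cores, con)


def predict_con(rows: List[Row], latency: float, ratio: float,
                limit: float, k: int = 2) -> int:
    """k-nearest-neighbour regression over the rows of one service type."""
    def dist(r: Row) -> float:
        # Crude per-feature scaling so that no feature dominates (assumption).
        return math.sqrt(((r[0] - latency) / 100) ** 2
                         + ((r[1] - ratio) / 10) ** 2
                         + ((r[2] - limit) / 0.5) ** 2)
    nearest = sorted(rows, key=dist)[:k]
    mean = sum(r[3] for r in nearest) / len(nearest)
    return max(1, math.floor(mean))  # round down to stay on the safe side


rows = [
    (100.0, 110, 0.5, 2), (100.0, 120, 0.5, 2),
    (100.0, 110, 1.0, 2), (100.0, 120, 1.0, 3),
    (100.0, 120, 1.5, 5),
]
con = predict_con(rows, latency=100.0, ratio=120, limit=1.25)
service_config = {"service": "f", "con_limit": con}  # loaded into the instance
```

The predicted value is written into the instance configuration, where it acts as the cap on concurrent requests at run time.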
Cluster monitoring continuously collects and saves data for service f. When a new service f that was not covered by offline training is deployed in the cluster, a default upper limit of concurrency is set for its instances; once a certain amount of delay data has been collected for the new service under different resource upper limits and different numbers of concurrent requests, the regression model is updated.
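The handling of a new, untrained service might be sketched as follows. The default limit of 10, the three-sample update trigger, and the `OnlineCollector` class are all hypothetical; the source only says that a default is used and that the model is updated after "certain data" has been collected.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

DEFAULT_CON_LIMIT = 10      # hypothetical default for untrained services
MIN_SAMPLES_FOR_UPDATE = 3  # hypothetical amount of data that triggers an update


class OnlineCollector:
    def __init__(self, trained_services: Set[str]) -> None:
        self.trained = set(trained_services)
        # service -> list of (limit, con, delay_ms) observations
        self.buffer: Dict[str, List[Tuple[float, int, float]]] = defaultdict(list)

    def initial_limit(self, service: str, predicted: Optional[int] = None) -> int:
        """Use the model's prediction for known services, else the default."""
        if service in self.trained and predicted is not None:
            return predicted
        return DEFAULT_CON_LIMIT

    def record(self, service: str, limit: float, con: int, delay_ms: float) -> bool:
        """Store one observation; return True when the model should be updated."""
        self.buffer[service].append((limit, con, delay_ms))
        if (service not in self.trained
                and len(self.buffer[service]) >= MIN_SAMPLES_FOR_UPDATE):
            self.trained.add(service)
            return True
        return False


collector = OnlineCollector({"f"})
new_limit = collector.initial_limit("g")  # "g" has no offline data yet
flags = [collector.record("g", 0.5, c, d)
         for c, d in [(1, 100.0), (2, 108.0), (3, 130.0)]]
```

The third observation for `g` crosses the assumed threshold, which is the point at which the buffered rows would be fed back into the regression model.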
Different types of services may have different latency requirements. The request delays of different types of service instances are acquired under different resource upper limits and different numbers of concurrent requests in order to construct a set of request delay growth ratios.
The request delay growth ratio set contains growth ratios for different delays. For service f, a specific percentile delay or the average delay may be taken as the standard request delay. Different data are then generated from different request delay growth ratios (ratio) and different resource allocation upper limits.
The service level objective (SLO) is determined as the upper limit of the service request delay for different types of service instances at their respective delay growth ratios. For some services the average delay and the high-percentile delay differ considerably, so for each service the partial request delays that exceed the service request delay upper limit under different numbers of concurrent requests are screened out.
Then, the screened partial request delays, the different resource upper limits, and the different delay growth ratios of each service instance are input into the machine learning model to obtain several candidate upper limits of concurrent requests for each service instance. Finally, the maximum of these candidates is taken as the upper limit of concurrent requests for that service instance.
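The final selection step can be sketched as follows, under two assumptions: the delay upper limit is read as standard delay × (1 + growth ratio), and `predict` is a placeholder for the trained regression model. Neither the function names nor the toy predictor in the usage lines come from the patent.

```python
def slo_upper_bound(standard_ms, growth_ratio):
    """Assumed reading of the SLO: standard delay scaled by the allowed growth."""
    return standard_ms * (1.0 + growth_ratio)

def pick_concurrency_limit(predict, resource_limits, standard_ms, growth_ratios):
    """Query the model for every (resource limit, growth ratio) pair
    and keep the maximum candidate, as the description states."""
    candidates = [
        predict(limit, standard_ms, ratio)
        for limit in resource_limits
        for ratio in growth_ratios
    ]
    return max(candidates)

# Toy stand-in for the trained model: a fixed table scaled by the resource limit.
toy_predict = lambda limit, standard_ms, ratio: {0.2: 3, 0.5: 5}[ratio] * limit
chosen = pick_concurrency_limit(toy_predict, [1, 2], 100.0, [0.2, 0.5])
```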
Referring to Fig. 5, an embodiment of the service request and quality-of-service driven edge resource scheduling system of the present invention is described. In this embodiment, the system includes a query module 201, a determination module 202, and an adjustment module 203.
A query module 201, configured to query service states of all service instances on the edge node to determine whether to issue alarm information;
a determining module 202, configured to determine a priority queue based on service states and delay sensitivity degrees of all service instances on the edge node, where the priority queue includes service instances with a resource usage amount greater than an application amount and a real-time concurrent request amount greater than 0, and the service instances in the priority queue have priority information;
and the adjusting module 203 is configured to adjust an upper limit of the number of concurrent requests of the service instance in the priority queue based on the priority information.
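The membership rule for the priority queue (resource usage above the applied amount, and at least one in-flight request) can be sketched as below. The numeric latency-sensitivity scale, the tie-break on overuse, and all names are assumptions; the text says only that priority derives from service state and delay sensitivity.

```python
from dataclasses import dataclass

@dataclass
class InstanceState:
    name: str
    resource_usage: float     # current usage, e.g. CPU cores
    resource_request: float   # amount the instance applied for
    inflight_requests: int    # real-time concurrent requests
    latency_sensitivity: int  # hypothetical scale: higher means more delay-sensitive

def build_priority_queue(instances):
    """Return eligible instances ordered from lowest to highest priority."""
    eligible = [
        i for i in instances
        if i.resource_usage > i.resource_request and i.inflight_requests > 0
    ]
    # Assumption: less delay-sensitive instances get lower priority; among equals,
    # the instance that exceeds its request by more is placed first.
    return sorted(
        eligible,
        key=lambda i: (i.latency_sensitivity, -(i.resource_usage - i.resource_request)),
    )
```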
In one embodiment, the query module 201 is specifically configured to: query the proportion of requests that all service instances on the edge node complete within the set request delay upper limit; and determine, based on the queried proportion, whether to issue alarm information.
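A minimal sketch of the alarm decision follows. The 0.95 threshold and the per-instance input layout are hypothetical; the patent does not state the proportion below which an alarm is raised.

```python
def slo_attainment(latencies_ms, bound_ms):
    """Fraction of requests that finished within the request delay upper limit."""
    if not latencies_ms:
        return 1.0  # no traffic, so nothing violated the bound
    within = sum(1 for latency in latencies_ms if latency <= bound_ms)
    return within / len(latencies_ms)

def should_alarm(per_instance, threshold=0.95):
    """per_instance: name -> (recent latencies in ms, delay upper limit in ms).
    Alarm when any instance falls below the assumed attainment threshold."""
    return any(
        slo_attainment(latencies, bound) < threshold
        for latencies, bound in per_instance.values()
    )
```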
In one embodiment, the adjustment module 203 is specifically configured to: when the edge node receives the alarm information, lower the upper limit of concurrent requests of the service instances in the priority queue in order of priority from low to high; and/or when the edge node does not receive the alarm information, periodically check whether the resource usage of all the service instances on the edge node is less than the total quantity of concurrent requests; if so, raise the upper limit of concurrent requests of those service instances in the priority queue whose upper limit was previously lowered, in order of priority from high to low.
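The lowering and raising rules can be sketched as a single control step. The step size, the floor of one request, and the choice to adjust one instance per round are assumptions; `has_headroom` stands for the outcome of the periodic resource check, and `deployed` holds the limits assigned at deployment.

```python
def adjust_once(queue, limits, deployed, alarm, has_headroom, step=1, floor=1):
    """queue: instance names ordered from lowest to highest priority.
    limits: current concurrency limits (mutated in place).
    Returns the name of the adjusted instance, or None if nothing changed."""
    if alarm:
        # Throttle the lowest-priority instance that can still be lowered.
        for name in queue:
            if limits[name] > floor:
                limits[name] = max(floor, limits[name] - step)
                return name
    elif has_headroom:
        # Restore the highest-priority instance that was previously lowered.
        for name in reversed(queue):
            if limits[name] < deployed[name]:
                limits[name] = min(deployed[name], limits[name] + step)
                return name
    return None
```

Calling this once per control period walks up the queue under sustained alarms and walks back down it once the node has spare resources, so the most delay-sensitive instances are throttled last and restored first.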
Referring to FIG. 6, in one embodiment, the system may further include an input module 204, a build module 205, and an update module 206.
An input module 204, configured to input, in a stage of deploying the service instance to the cluster, an upper limit of a resource amount of the service instance, a standard request delay, and a time delay growth proportion into a pre-trained machine learning model, so as to obtain an upper limit of a concurrent request amount of the service instance; the standard request time delay is the time delay of the request of the service instance under the condition of no resource contention, and the time delay increment proportion is the maximum increment proportion of the request of the service instance relative to the standard request time delay under the condition of the resource contention.
A building module 205, configured to obtain request delays of service instances in an offline mode under the same resource amount upper limit and different concurrent request amounts, and request delays under different resource amount upper limit and different concurrent request amounts; constructing a sample data set based on the upper limit of the resource quantity, the concurrent request quantity and the corresponding request time delay; training the machine learning model based on the sample data set until a training condition is met; wherein the machine learning model is a regression model.
The updating module 206 is configured to obtain an upper limit of resource amount, a concurrent request number, and a request delay under different upper limits of resource amount and different concurrent request numbers of a new service instance in an online mode, and construct an updated data set; the machine learning model is updated based on the updated dataset.
Fig. 7 shows a hardware block diagram of an electronic device 30 for service request and quality of service driven edge resource scheduling according to an embodiment of the present description. As shown in fig. 7, the electronic device 30 may include at least one processor 301, a memory 302 (e.g., a non-volatile memory), a memory 303, and a communication interface 304, and the at least one processor 301, the memory 302, the memory 303, and the communication interface 304 are connected together via a bus 305. The at least one processor 301 executes at least one computer readable instruction stored or encoded in memory 302.
It should be appreciated that the computer-executable instructions stored in memory 302, when executed, cause at least one processor 301 to perform the various operations and functions described above in connection with fig. 1-4 in various embodiments of the present specification.
In embodiments of the present description, electronic device 30 may include, but is not limited to: personal computers, server computers, workstations, desktop computers, laptop computers, notebook computers, mobile computing devices, smart phones, tablet computers, cellular phones, Personal Digital Assistants (PDAs), handsets, messaging devices, wearable computing devices, consumer electronic devices, and the like.
According to one embodiment, a program product, such as a computer readable storage medium, is provided. The computer-readable storage medium may have instructions (i.e., the elements described above implemented in software) that, when executed by a computer, cause the computer to perform the various operations and functions described above in connection with fig. 1-4 in various embodiments of the present specification. In particular, a system or apparatus provided with a readable storage medium having stored thereon software program code implementing the functions of any of the above embodiments may be provided, and a computer or processor of the system or apparatus may be caused to read out and execute instructions stored in the readable storage medium.
According to the service request and service quality driven edge resource scheduling method and application of the embodiment of the invention, a dynamic adjustment strategy for the upper limit of concurrent requests is introduced. When resource contention on an edge node triggers an alarm, the upper limit of concurrent requests of the service instances on that edge node can be adjusted in time, which reduces resource contention and carries the cluster through load peaks, while significantly improving the proportion of requests that each service instance completes within the set request delay upper limit.
On the other hand, the method trains a machine learning model according to test data of the request time delay of the service instance under different resource quantity upper limits and different concurrency request quantities, and fits a value as the request concurrency quantity upper limit when the service instance is deployed through the machine learning model, so that the method can adapt to the situation that different service instances need different resource quantity upper limits, and simultaneously can adapt to the requirements of different service requests on time delay.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing descriptions of specific exemplary embodiments of the present invention are presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain the specific principles of the invention and its practical application to thereby enable one skilled in the art to make and utilize the invention in various exemplary embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims (10)

1. A service request and quality of service driven edge resource scheduling method, applied to a serverless computing cluster, the serverless computing cluster comprising a plurality of edge nodes, each of the edge nodes comprising a plurality of service instances, the method comprising:
querying the service states of all service instances on the edge node to determine whether to send out alarm information; if so,
determining a priority queue based on service states and time delay sensitivity degrees of all service instances on the edge node, wherein the priority queue comprises service instances with resource usage amount larger than the application amount and real-time concurrent request amount larger than 0, and the service instances in the priority queue have priority information;
and adjusting the upper limit of the number of concurrent requests of the service instance in the priority queue based on the priority information.
2. The service request and service quality driven edge resource scheduling method according to claim 1, wherein querying service states of all service instances on the edge node to determine whether to send out alarm information specifically comprises:
inquiring the proportion of all service instances on the edge node to finish request processing within a set upper limit of request time delay;
and determining whether to send out alarm information based on the query proportion.
3. The method for scheduling service requests and edge resources driven by service quality according to claim 2, wherein adjusting the upper limit of the number of concurrent requests of service instances in the priority queue based on the priority information specifically comprises:
when the edge node receives the alarm information, the upper limit of the number of concurrent requests of the service instance in the priority queue is adjusted downwards according to the order of the priority from low to high;
and/or when the edge node does not receive the alarm information, periodically checking whether the resource usage of all the service instances on the edge node is less than the total quantity of concurrent requests; if so,
raising the upper limit of the number of concurrent requests of the service instances in the priority queue whose upper limit was previously lowered, in order of priority from high to low.
4. The service request and quality of service driven edge resource scheduling method of claim 1, further comprising:
in the stage of deploying the service instance to a cluster, inputting the upper limit of the resource quantity of the service instance, the standard request delay and the time delay growth proportion into a pre-trained machine learning model to obtain the upper limit of the concurrent request quantity of the service instance;
the standard request time delay is the time delay of the request of the service instance under the condition of no resource contention, and the time delay increment proportion is the maximum increment proportion of the request of the service instance relative to the standard request time delay under the condition of the resource contention.
5. The service request and quality of service driven edge resource scheduling method of claim 4, wherein the method specifically comprises:
acquiring request time delays of the service instance in an offline mode under the conditions of the same resource quantity upper limit and different concurrent request quantity, and request time delays under the conditions of the different resource quantity upper limit and different concurrent request quantity;
constructing a sample data set based on the upper limit of the resource quantity, the concurrent request quantity and the corresponding request time delay;
training the machine learning model based on the sample dataset until a training condition is met;
wherein the machine learning model is a regression model.
6. The service request and quality of service driven edge resource scheduling method of claim 4, further comprising:
acquiring the resource quantity upper limit, the concurrency request quantity and the request time delay under different resource quantity upper limits and different concurrency request quantities of a new service instance in an online mode, and constructing an update data set;
the machine learning model is updated based on the update data set.
7. The service request and quality of service driven edge resource scheduling method of claim 4, further comprising:
acquiring request time delays of different types of service instances under different resource quantity upper limits and different concurrent request quantities so as to construct a request time delay increase proportion set;
determining a service level target as a service request delay upper limit of different types of service instances under different delay increasing proportions, and screening out partial request delay of each service instance, which is larger than the service request delay upper limit, under different concurrent request quantity;
inputting part of request time delay, different resource quantity upper limits and different time delay increasing proportions of each service instance into the machine learning model to obtain a plurality of different concurrent request quantity upper limits corresponding to each service instance;
and determining the maximum value in a plurality of different upper limits of the quantity of concurrent requests corresponding to each service instance as the upper limit of the quantity of concurrent requests corresponding to the service instance.
8. A service request and quality of service driven edge resource scheduling system, the system comprising:
the query module is used for querying the service states of all the service instances on the edge node to determine whether to send out alarm information;
the determining module is used for determining a priority queue based on the service states and the time delay sensitivity degree of all the service instances on the edge node, wherein the priority queue comprises service instances with the resource usage amount larger than the application amount and the real-time concurrent request amount larger than 0, and the service instances in the priority queue have priority information;
and the adjusting module is used for adjusting the upper limit of the quantity of concurrent requests of the service instance in the priority queue based on the priority information.
9. An electronic device, comprising:
at least one processor; and
a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the service request and quality of service driven edge resource scheduling method of any one of claims 1 to 7.
10. A machine readable storage medium storing executable instructions that when executed cause the machine to perform the service request and quality of service driven edge resource scheduling method of any one of claims 1 to 7.
CN202310300866.3A 2023-03-24 Service request and service quality driven edge resource scheduling method and system Active CN116366727B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310300866.3A CN116366727B (en) 2023-03-24 Service request and service quality driven edge resource scheduling method and system

Publications (2)

Publication Number Publication Date
CN116366727A true CN116366727A (en) 2023-06-30
CN116366727B CN116366727B (en) 2024-11-12

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106789726A (en) * 2016-12-20 2017-05-31 兴唐通信科技有限公司 A kind of high concurrent service resource allocation method based on service priority
CN113934512A (en) * 2021-10-21 2022-01-14 平安国际智慧城市科技股份有限公司 Load balancing implementation method, device, equipment and storage medium
CN114265699A (en) * 2021-12-31 2022-04-01 城云科技(中国)有限公司 Task scheduling method and device, electronic device and readable storage medium
US20220239758A1 (en) * 2021-01-22 2022-07-28 Avago Technologies International Sales Pte. Limited Distributed Machine-Learning Resource Sharing and Request Routing
CN115629858A (en) * 2022-10-17 2023-01-20 南京航空航天大学 Self-adaptive method for number of function examples in server-free background and application
CN115665258A (en) * 2022-10-21 2023-01-31 南京航空航天大学 Deep reinforcement learning-based priority perception deployment method for multi-target service function chain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant