WO2023092466A1 - Resource sharing-aware online task unloading method
Publication number: WO2023092466A1
Application: PCT/CN2021/133572
Authority: WIPO (PCT)
Classification: G06F9/445 - Program loading or initiating (G06F - Electric digital data processing; G06F9/00 - Arrangements for program control; G06F9/44 - Arrangements for executing specific programs)
Description
The present invention relates to the field of computer technology and, more specifically, to a resource sharing-aware online task offloading method.

Cloud-based interactive applications place high demands on the user's network environment, and it is difficult to meet users' delay requirements. By exploiting emerging mobile edge computing and 5G networks, the rendering tasks of interactive applications can be offloaded to edge servers close to users, which reduces user latency.

Edge computing takes the form of localized clouds. Telecom operators are upgrading the infrastructure of existing communication networks into mobile edge computing platforms, comprising access sites such as cellular radio base stations, aggregation sites such as those housing distributed antenna systems, and core sites such as central offices. These sites are equipped with computing and storage resources, cooling and power systems, and have been redesigned to accommodate edge servers. Because edge servers are close to users, mobile edge computing can reduce the response time of user requests.
Cloud-based interactive applications, such as virtual reality and cloud gaming, use cloud resources to handle computationally intensive tasks. This avoids the need for high-end hardware (usually expensive and energy-intensive) on user devices and keeps clients lightweight. However, cloud-based interactive applications require high-throughput, low-latency network connections; if the distance between the user and the data center is large, the user's low-latency requirement is hard to satisfy. Leveraging emerging mobile edge computing, edge-assisted interactive applications instead offload computationally intensive 3D rendering tasks to a mobile edge computing system and stream the edge-rendered video to end users over 5G connections. Since edge servers are close to end users, this approach can greatly reduce latency.
As shown in Fig. 1, an edge-assisted interactive application system consists of cloud servers, edge servers, and user devices. The cloud server hosts the core logic of the application; the edge server is responsible for rendering and encoding; and the user device is responsible for decoding and display. The three form an information loop: the cloud server generates rendering commands and transmits them to the edge server; the edge server renders video frames and transmits them to the user device; the user device generates control commands and transmits them to the cloud server, which updates the application logic upon receipt.
In the prior art, task offloading in edge computing can be divided into three categories according to the optimization objective: minimizing task delay, minimizing system cost, and jointly minimizing task delay and system cost. Although these solutions reduce task latency or system cost in different ways, none of them considers the multi-dimensional resource-sharing characteristics of tasks; exploiting resource sharing between tasks can effectively reduce resource consumption and thus system cost.
Consider running a rendering task as an example: it involves multiple types of resources, such as storage, memory, GPU memory, CPU, GPU, and bandwidth. With the exception of bandwidth, each type of resource can be shared by multiple users served by the same server. By exploiting this multi-dimensional resource sharing in rendering tasks, users who share resources can be assigned to the same server, saving resource consumption and cost. Fig. 2 illustrates resource sharing with an example. First, for an application (in the example, a multiplayer 3D chess game), rendered assets (the board and pieces) are stored in storage and can be shared by multiple application instances. An instance is one execution of the application, corresponding in the example to one game of chess with two cooperating users on each side. Second, within an instance, cached data can be shared by all rendering tasks of that instance. Third, some users keep the same viewpoint throughout the run of the application, so these users can share the rendering task executed by the CPU and GPU; in the example, user 1 and user 2 share a viewpoint and therefore share rendering task 1. Using this multi-dimensional resource sharing, the users of an application instance can be allocated together to save resource consumption.
The multiple users of an application instance are usually located at different positions in the network. Given several edge sites equipped with servers, each user's rendering task must be offloaded to some server at an edge site. Since interactive applications are latency-sensitive, when choosing an edge site for offloading, the latency of every user must be kept acceptable. In addition, server costs may differ across sites. Therefore, a new task offloading method is needed that exploits the multi-dimensional resource-sharing characteristics of rendering tasks to minimize the cost of the servers used or, when server costs are equal, to minimize the number of servers used.
Summary of the Invention

The purpose of the present invention is to overcome the above defects in the prior art and to provide a resource sharing-aware online task offloading method. The method comprises:

sorting the sites to obtain a site sequence, where C denotes the first site in the sequence;

performing the following steps until all user tasks in the user set have been loaded or all target sites have been traversed: given the instance's user set U and the target site set V', pre-assign users to sites by solving the modeled user allocation problem, where V' denotes the set of all sites in the site sequence starting from site C, and the optimization goal of the user allocation problem is to satisfy each user's delay requirement while maximizing the opportunity for resource sharing; then, at the current site C, determine the servers on which to load the tasks of the obtained user set by solving the modeled server loading problem, where the optimization goal of the server loading problem is to load the tasks of the users pre-assigned to each site with the fewest servers while satisfying the servers' resource capacity constraints.
Compared with the prior art, the present invention has the advantage of exploiting the multi-dimensional resource sharing of tasks to provide a new technical solution for offloading rendering tasks, one that both satisfies each user's delay requirement and minimizes the cost of the servers used, where the cost may be a rental cost, energy consumption, or the like.
Other features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain its principles.

Fig. 1 is a schematic diagram of the system architecture of an edge-assisted interactive application in the prior art;
Fig. 2 is an example of resource sharing in the prior art;
Fig. 3 is an example of user allocation according to an embodiment of the present invention;
Fig. 4 is a schematic diagram showing that the cost reduction rate increases as the number of instances increases, according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the cost reduction rate when server costs are equal, according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of performance in a dynamic environment according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of costs under various delay limit values according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the cost reduction rate when the delay limit is 40 ms, according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of the empirical cumulative distribution function of the cardinality of set partition blocks under different weight values θ, according to an embodiment of the present invention;
Fig. 10 is a schematic diagram of cost reduction rates under different weight values θ according to an embodiment of the present invention.
Various exemplary embodiments of the present invention are now described in detail with reference to the accompanying drawings. Unless specifically stated otherwise, the relative arrangements of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the invention; the description of the exemplary embodiments is merely illustrative and in no way limits the invention, its application, or its uses. Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail but, where appropriate, should be considered part of the specification. Any specific values shown are exemplary rather than limiting, and similar reference numerals and letters denote similar items across the figures.

The present invention proposes a sharing-aware online task offloading method for optimizing the task offloading of multiple instances of a single application; the goal is to satisfy each user's delay requirement while minimizing the cost of the servers used. The present invention alternately and iteratively solves two sub-problems: the user allocation problem and the server loading problem.
The user allocation problem is to allocate users to edge sites so that each user's latency satisfies the requirement while the chance of resource sharing is maximized. In one embodiment, the user allocation problem is modeled as a set partitioning problem, and a heuristic algorithm is proposed to solve it.
The server loading problem is, at each site, to load the rendering tasks of the users pre-assigned to that site using the fewest servers while satisfying the servers' resource capacity limits. In one embodiment, the server loading problem is modeled as a multidimensional bin packing problem. However, resource sharing complicates server loading. Therefore, it is further proposed that each user may be treated as an item, or a group of users sharing a viewpoint may be treated as an item, or the users belonging to the same instance may be treated as an item, and the bin packing problem is then solved over these items. The three loading schemes differ in granularity, and the resulting loading plans and costs also differ. There are thus three granularity levels: the individual-user level, the shared-viewpoint user-group level, and the instance level. To find the granularity level with the lowest cost, the granularity decision problem is modeled as a multi-armed bandit problem and solved with a reinforcement learning algorithm.
In the following, user delay is analyzed first; then the task offloading problem, the user allocation problem, and the server loading problem are introduced in turn; finally, the sharing-aware online offloading method and its two sub-methods, the sharing-aware user allocation algorithm and the granularity decision algorithm, are described in detail.
1) User delay analysis

The location to which a rendering task is offloaded affects user latency. As shown in Fig. 1, in an edge-assisted interactive application, user latency consists of the uplink delay of transmitting control commands and the downlink delay of transmitting rendering commands and rendered video frames. The offloading location of the rendering task affects the downlink delay but not the uplink delay; therefore, the downlink delay is considered here as an example.

Suppose the rendering task of user u is offloaded to edge site v, and denote its downlink delay by $t_{uv}$. Let r and f denote the average data sizes of a rendering command and a video frame, respectively, and let $b_v$ and $d_v$ denote the downlink throughput and delay from the cloud server to edge site v. The delay of transmitting rendering commands is then

$$\frac{r}{b_v} + d_v \qquad (1)$$

Let $b_{vu}$ and $d_{vu}$ denote the downlink throughput and delay from edge site v to user device u. The delay of transmitting video frames is

$$\frac{f}{b_{vu}} + d_{vu} \qquad (2)$$

From formulas (1) and (2), the downlink delay is

$$t_{uv} = \frac{r}{b_v} + d_v + \frac{f}{b_{vu}} + d_{vu} \qquad (3)$$

Formula (3) shows that the downlink delay depends on the location of v. Therefore, when offloading rendering tasks, it must be ensured that every user's delay requirement is satisfied.
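As a concreteness check on formula (3), the sketch below evaluates the downlink delay for assumed link parameters; all numeric values are illustrative, not from the disclosure:

```python
# Minimal sketch of the downlink-delay model in formula (3); r and f are in
# bits, throughputs in bits/s, propagation delays in seconds (assumed units).

def downlink_delay(r, f, b_v, d_v, b_vu, d_vu):
    """t_uv = r/b_v + d_v + f/b_vu + d_vu, per formula (3)."""
    cmd_delay = r / b_v + d_v        # cloud -> edge site v, formula (1)
    frame_delay = f / b_vu + d_vu    # edge site v -> user device u, formula (2)
    return cmd_delay + frame_delay

# Example: 10 KB rendering command, 100 KB frame, 1 Gb/s backhaul with 5 ms
# delay, 100 Mb/s 5G link with 2 ms delay.
t_uv = downlink_delay(r=10e3 * 8, f=100e3 * 8,
                      b_v=1e9, d_v=0.005,
                      b_vu=100e6, d_vu=0.002)
print(f"t_uv = {t_uv * 1e3:.2f} ms")   # -> t_uv = 15.08 ms
```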
2) Task offloading problem modeling

Given an application and a set of instances (denoted by I), each instance is one execution of the application in which one or more users participate. Also given is a set of edge servers (denoted by S) distributed over a set of edge sites (denoted by V). Assuming all edge servers have the same hardware configuration, the cost of an edge server depends on the location of the edge site where it resides. The goal of the present invention is to offload the rendering tasks of these instances to edge servers while minimizing the total cost. The modeling below can be extended to scenarios with heterogeneous edge servers, for example servers with different hardware configurations or different capacities, and the proposed method also applies to such scenarios.

The task assignment problem is defined as follows: given a set of users U and a set of edge servers S, assign each user in U to an edge server in S such that each user's delay requirement is satisfied and the resources on each edge server stay within its capacity, while minimizing the total cost of the edge servers used. For each user, the rendering task serving that user runs on the assigned server. On each server, all users sharing a viewpoint share the same rendering task.
Let the Boolean variable $x_{uj}$ indicate whether server j hosts user u's rendering task. Each user must be assigned to exactly one server:

$$\sum_{j \in S} x_{uj} = 1, \quad \forall u \in U \qquad (4)$$

As noted above, user delay depends on the site to which the task is offloaded. Let $t_{uv}$ denote the network delay incurred by offloading user u's task to site v. For any server j at site v, $t_{uj} = t_{uv}$. Therefore, the delay of user u is $\sum_{j \in S} t_{uj} x_{uj}$. The delay is capped by a tolerable maximum value, denoted by τ:

$$\sum_{j \in S} t_{uj} x_{uj} \le \tau, \quad \forall u \in U \qquad (5)$$
Let a Boolean variable $y_{kj}$ indicate whether server j hosts shared-viewpoint user group k, and let $p_K$ denote the maximum number of rendering tasks allowed on each server, so that:

$$\sum_{k} y_{kj} \le p_K, \quad \forall j \in S$$

A Boolean variable $z_{ij}$ is introduced to denote whether server j hosts instance $i \in I$. The instance capacity constraint can then be expressed as:

$$\sum_{i \in I} z_{ij} \le p_I, \quad \forall j \in S$$

Since bandwidth cannot be shared among users, the bandwidth constraint bounds the number of users per server:

$$\sum_{u \in U} x_{uj} \le p_B, \quad \forall j \in S$$

Let k(u) denote the shared-viewpoint user group to which user u belongs. Then $x_{uj}$ can be 1 only when $y_{k(u),j}$ is 1; that is, server j hosting user group k(u) is a necessary condition for server j hosting user u:

$$x_{uj} \le y_{k(u),j}, \quad \forall u \in U,\ j \in S$$

A similar relationship holds between $y_{kj}$ and $z_{ij}$: if server j hosts user group k, it must host the instance to which that group belongs. Let i(k) denote the instance containing user group k; then:

$$y_{kj} \le z_{i(k),j}, \quad \forall k,\ j \in S$$
The goal of the present invention is to minimize the total cost of the servers used, for example the total fee paid by the application provider to the edge computing service provider. Let the Boolean variable $w_j$ indicate whether server j is used, and let $c_j$ denote the cost of server j. The objective is to minimize $\sum_{j \in S} c_j w_j$. The task assignment problem is therefore modeled as a Boolean linear program.
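Assembling the objective and the constraints above gives a sketch of that program. This is a reconstruction: the original numbers its constraints individually and refers to the whole model as (17), and the linking constraint $z_{ij} \le w_j$, which forces a server that hosts anything to be counted as used, is a conventional assumption here rather than a quoted part of the disclosure:

$$\begin{aligned}
\min_{x,y,z,w}\quad & \textstyle\sum_{j \in S} c_j\, w_j \\
\text{s.t.}\quad
 & \textstyle\sum_{j \in S} x_{uj} = 1,               && \forall u \in U \\
 & \textstyle\sum_{j \in S} t_{uj}\, x_{uj} \le \tau,  && \forall u \in U \\
 & \textstyle\sum_{i \in I} z_{ij} \le p_I,            && \forall j \in S \\
 & \textstyle\sum_{k} y_{kj} \le p_K,                  && \forall j \in S \\
 & \textstyle\sum_{u \in U} x_{uj} \le p_B,            && \forall j \in S \\
 & x_{uj} \le y_{k(u),j}, \quad y_{kj} \le z_{i(k),j}, \quad z_{ij} \le w_j, \\
 & x_{uj},\, y_{kj},\, z_{ij},\, w_j \in \{0,1\}.
\end{aligned}$$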
3) User allocation problem modeling

The user allocation problem is to allocate users to edge sites so as to satisfy each user's latency requirement while maximizing resource sharing. A given set of users is partitioned into disjoint subsets, and each subset is assigned to a site. Different partitions result in different resource consumption and hence different costs; finding the optimal partition is a set partitioning problem.
Fig. 3 shows a user allocation example; for instance, the set of users whose delay requirement is met at site $v_3$ is $R_3 = \{u_2, u_3, u_4, u_7\}$. Users assigned to site v must have their latency requirements met, so these users must be a subset of $R_v$. Additionally, each user must be assigned to exactly one site. Therefore, the sets of users assigned to the sites must be pairwise disjoint, and their union must equal the entire set of users; that is, the result of user assignment is a set partition. The user allocation scheme in Fig. 3(b) is one set partition, and the scheme $\{u_1, u_3\}, \{u_2, u_5, u_6\}, \{u_4, u_7\}, \{u_8\}$ in Fig. 3(c) is another. Both allocation schemes are valid, but they may incur different costs. The scheme in Fig. 3(b) requires 3 rendering tasks, running at sites $v_1$, $v_3$, and $v_4$, because at site $v_3$ users $u_2$, $u_3$, and $u_4$ share a viewpoint and thus share one rendering task; the same holds for the users assigned to site $v_4$. The scheme in Fig. 3(c) requires 1, 2, 2, and 1 rendering tasks at the respective sites, 6 in total. The more tasks, the higher the cost incurred.
In the following, U denotes the user set of a single instance, as distinct from the symbol U used earlier for the user set of all instances. The user assignment problem is to find a partition of U. A partition corresponds to an indexed family composed of subsets of U with index set V, written {U_v : v ∈ V}, where partition block U_v is the set of users assigned to site v. The indexed family must satisfy two constraints, which together ensure that each user is assigned to exactly one edge site and that the user's latency requirement is met at the assigned site.

Let $R_v$ denote the set of all users in U whose delay requirement is met at site v:

$$R_v = \{\, u \in U : t_{uv} \le \tau \,\}$$

The user set $U_v$ allocated to edge site v must then satisfy $U_v \subseteq R_v$.

For a site v and a user subset A, the cost of the 2-tuple ⟨v, A⟩ is defined as a weighted combination, with weight θ, of the server cost at site v and the resource consumption of serving A; the value of θ affects the degree of resource sharing.

The user assignment problem for a single instance is then defined as: given the user set U of an instance, the set of 2-tuples defined in Equation (22), and the cost function c, find a minimum-cost subset of the 2-tuples whose user sets form a partition of U. This is a set partitioning problem. This modeling ensures that each user is assigned to some edge site and that the user's latency requirement is satisfied at that site, i.e., the first two constraints of the mathematical model (17) hold. A sharing-aware user allocation algorithm for this problem is introduced below.
4) Server loading problem modeling

Next, the server loading problem is considered. The optimization goal of the present invention is to load the users' rendering tasks using the fewest servers. Each server has three resource capacity constraints: a limited number of instances, a limited number of shared-viewpoint user groups, and a limited number of users. The server loading problem can therefore be modeled as a three-dimensional bin packing problem in which each server is a bin with capacity $\langle p_I, p_K, p_B \rangle$, the three elements corresponding to the numbers of instances, tasks, and users, respectively. Items can be formed at three granularity levels (see the sketch after this list):

- Fine granularity (users): each user is treated as an item with resource requirement $\langle 1, 1, 1 \rangle$.
- Medium granularity (user groups): the users belonging to the same shared-viewpoint user group are grouped into one item with resource requirement $\langle 1, 1, n_B \rangle$, where $n_B$ is the number of users in the group.
- Coarse granularity (instances): the users belonging to the same instance are grouped into one item with resource requirement $\langle 1, n_K, n_B \rangle$, where $n_K$ and $n_B$ are the numbers of tasks (i.e., shared-viewpoint user groups) and users in the instance.
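The sketch below shows item construction at the three granularity levels, assuming a simple instance record: a list of shared-viewpoint user groups, each group a list of users. The function and variable names are illustrative, not from the patent:

```python
def make_items(instance_groups, level):
    """Return items as (instances, tasks, users) resource-demand triples.

    Note: at user granularity the nominal demand is <1,1,1>, but when items
    are packed, an instance or shared-view task already hosted on a server is
    only counted once (the resource-sharing subtlety in model (17)).
    """
    if level == "user":            # fine: one item per user
        return [(1, 1, 1) for g in instance_groups for _ in g]
    if level == "group":           # medium: one item per shared-view group
        return [(1, 1, len(g)) for g in instance_groups]
    if level == "instance":       # coarse: one item per instance
        n_users = sum(len(g) for g in instance_groups)
        return [(1, len(instance_groups), n_users)]
    raise ValueError(level)

# A chess instance with two shared-view groups of two users each:
groups = [["u1", "u2"], ["u3", "u4"]]
print(make_items(groups, "user"))      # [(1, 1, 1)] * 4
print(make_items(groups, "group"))     # [(1, 1, 2), (1, 1, 2)]
print(make_items(groups, "instance"))  # [(1, 2, 4)]
```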
The optimal granularity level depends on many factors, such as the server capacity $\langle p_I, p_K, p_B \rangle$, the delay limit τ, and whether the server costs of the edge sites are equal. In different situations the optimal granularity level differs, and it is difficult to derive a decision formula for the granularity level from an analytical model. Therefore, in one embodiment, the optimal granularity level is learned from experience using a reinforcement learning method: the granularity decision problem is modeled as a multi-armed bandit problem, and the goal is to maximize the expected total reward of action selection in the long run. The granularity decision algorithm used to solve this problem is described in detail below.
Once a granularity level has been selected, the group of users pre-assigned to a site forms a group of items according to that level, and the fewest possible servers are then used to load their rendering tasks while the three-dimensional resource capacity constraints are met. The 3D bin packing problem is NP-hard, and many approximation and heuristic algorithms exist for it. In one embodiment, the classical First-Fit algorithm is used: when loading a new item, the algorithm scans the already-started servers in order and loads the item into the first server that satisfies the capacity constraints; if no started server satisfies them, a new server is started. When checking feasibility, the resource capacity constraints (9), (13), and (14) are evaluated while taking into account the resource sharing captured in the mathematical model (17). Notably, the granularity decision algorithm of the present invention is applicable with any bin packing algorithm. In special cases the server loading problem reduces to a cardinality-constrained bin packing problem, which is still NP-hard and for which approximation algorithms have been proposed.
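To make the loading step concrete, here is a minimal First-Fit sketch under assumed data structures; the item format (instance id, group ids, user count) and the way shared instances and groups are tracked are illustrative, not the patent's:

```python
def first_fit(items, p_I, p_K, p_B):
    """items: list of (instance_id, set_of_group_ids, n_users)."""
    servers = []  # each: {"instances": set, "groups": set, "users": int}
    for inst, groups, n_users in items:
        placed = False
        for s in servers:  # scan started servers in order
            new_i = s["instances"] | {inst}   # shared instance counted once
            new_g = s["groups"] | groups      # shared task counted once
            if (len(new_i) <= p_I and len(new_g) <= p_K
                    and s["users"] + n_users <= p_B):
                s["instances"], s["groups"] = new_i, new_g
                s["users"] += n_users
                placed = True
                break
        if not placed:  # no started server fits: start a new server
            servers.append({"instances": {inst}, "groups": set(groups),
                            "users": n_users})
    return servers

# Two user-granularity items from the same instance and group share both the
# instance slot and the task slot on one server:
items = [("i1", {"g1"}, 1), ("i1", {"g1"}, 1), ("i1", {"g2"}, 2)]
print(len(first_fit(items, p_I=2, p_K=2, p_B=4)))  # -> 1 server
```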
5) Sharing-aware user allocation algorithm

The sharing-aware user allocation algorithm comprises the following steps (a sketch of the loop follows the list):

Step S101: given the user set U of an instance and the set of edge sites V, build a set of 2-tuples in which the first element of each 2-tuple is a site v and the second element is the subset $R_v$ of U, i.e., the users whose delay requirement is met at site v.

Step S102: from this set, select the most cost-effective 2-tuple, i.e., the one with the smallest cost (denoted ⟨v*, A*⟩), as the partition block of the user set U corresponding to site v*.

Step S103: update the set by deleting the users assigned in step S102 (namely A*) from every remaining 2-tuple, to ensure that the selected 2-tuples are pairwise disjoint.

Step S104: remove invalid 2-tuples (for example, those whose user subset has become empty) from the set.

Step S105: repeat steps S102 to S104 until the union of the selected 2-tuples equals U.
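A compact sketch of this greedy loop follows; cost(v, A) is a stand-in for the weighted pair cost c(⟨v, A⟩) with weight θ described above, whose exact expression is not reproduced here:

```python
def allocate_users(U, V, R, cost):
    """U: iterable of users; V: sites; R[v]: users meeting the delay bound at v."""
    candidates = {v: set(R[v]) & set(U) for v in V}         # S101
    partition, remaining = {}, set(U)
    while remaining:                                         # S105
        live = {v: A for v, A in candidates.items() if A}    # S104
        if not live:
            raise RuntimeError("some users meet no site's delay bound")
        v_star = min(live, key=lambda v: cost(v, live[v]))   # S102
        A_star = live[v_star]
        partition[v_star] = A_star
        remaining -= A_star
        candidates.pop(v_star)
        for v in candidates:                                 # S103
            candidates[v] -= A_star
    return partition
```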
6) Granularity decision algorithm

The granularity decision problem is modeled as a multi-armed bandit problem and solved with the action-value method from reinforcement learning. Let Q(a) denote the value function of action a, used to evaluate the merit of choosing each action; Q(a) is computed as the average of the rewards obtained in the steps in which action a was chosen. Since the initial value of Q(a) has a great influence on convergence, Q(a) is first initialized and only then used to select actions; during initialization, each action a is selected m times. Let N(a) denote the number of times action a has been selected.
The granularity decision algorithm comprises the following steps:

Step S201: initialize the value function Q(a) as follows: take the three actions in turn at the user-group granularity level, the instance granularity level, and the user granularity level; repeat this round-robin until each action has been selected m times, ending the initialization phase; assign the average reward obtained by action a over its m steps to Q(a), and set N(a) = m.

Step S202: after the initialization phase ends, select the action with the highest value Q(a) with probability 1 − ε, and select a random action with probability ε.

Step S203: use action a to perform server loading for the k instances that arrive next; count the number of newly started servers during this period, and use its negative as the reward R obtained by action a.

Step S204: increment the counter N(a) by 1.

Step S205: update the value function Q(a) with the reward R.

Step S206: repeat steps S202 to S205.
- the parameter ⁇ balances exploration and exploitation in reinforcement learning.
- the parameter m affects the quality of the initialization and further affects the learning process. The larger m is, the closer the value function is to the real value, but too large m may slow down the convergence of the algorithm.
- the parameter k is the number of instances performing offloading at each step, which affects the quality of the reward. If k is too small, 1 in extreme cases, the rewards of each action are very similar or even 0, so that it is impossible to distinguish the pros and cons of actions; if k is too large, the update of the value function needs to unload a large number of instances, and at the same time This leads to slower convergence of the algorithm.
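A minimal sketch of steps S201 to S206 follows. The incremental-average update Q(a) ← Q(a) + (R − Q(a))/N(a) is the standard action-value update and is an assumption here (the disclosure does not spell out the update rule), and load_k_instances(action) is a placeholder that loads k instances at the given granularity and returns the number of servers newly started:

```python
import random

ACTIONS = ["group", "instance", "user"]   # the three granularity levels

def run_bandit(load_k_instances, m=5, eps=0.1, rounds=100):
    Q = {a: 0.0 for a in ACTIONS}   # value: average reward of each action
    N = {a: 0 for a in ACTIONS}     # selection counts
    # S201: initialization, round-robin until each action is tried m times
    for _ in range(m):
        for a in ACTIONS:
            R = -load_k_instances(a)        # reward = -(newly started servers)
            N[a] += 1
            Q[a] += (R - Q[a]) / N[a]       # incremental average
    # S202-S206: epsilon-greedy selection and update
    for _ in range(rounds):
        if random.random() < eps:
            a = random.choice(ACTIONS)      # explore with probability eps
        else:
            a = max(ACTIONS, key=Q.get)     # exploit with probability 1-eps
        R = -load_k_instances(a)            # S203
        N[a] += 1                           # S204
        Q[a] += (R - Q[a]) / N[a]           # S205
    return max(ACTIONS, key=Q.get)
```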
7) Sharing-aware online offloading method

Based on the sub-algorithms introduced above, the online offloading method is now described. The present invention offloads instances one by one in their arrival order. For each instance, the user allocation problem is solved first, followed by the server loading problem at each site. Because the number of servers per site is not considered during user allocation, a site may be unable to accommodate all users assigned to it, and such resource shortages occur from time to time. Therefore, the sites are sorted in non-decreasing order of cost, and servers are then loaded site by site, giving priority to the lowest-cost site.
For any arriving instance, the steps of the sharing-aware online offloading method are:

Step S301: sort the sites in non-decreasing order of cost; if costs are equal, the site with more servers has higher priority. Let C denote the site with the currently lowest cost.

Step S302: following the site sequence obtained in step S301, let V' denote the set of all sites from site C onward. Solve the user allocation algorithm to obtain a set partition {U_v : v ∈ V'} of the user set U, where U_C is the user set corresponding to the current site C.

Step S303: obtain the current optimal granularity level from the granularity decision algorithm.

Step S304: at the granularity level obtained in step S303, use the First-Fit algorithm to load the tasks of the users in the set U_C obtained in step S302 onto servers at the current site C, until servers run out.

Step S305: delete the users whose tasks were successfully loaded in step S304 from the instance's user set U.

Step S306: following the site sequence obtained in step S301, take the site after the current site C as the new site C.

Step S307: repeat steps S302 to S306 until the tasks of all users in the user set U have been loaded or all sites have been tried.

A sketch of this outer loop is given below.
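This sketch wires together the components sketched earlier; the site objects with cost and num_servers fields, and the load_at_site helper (which runs First-Fit at one site and returns the users it managed to load), are simplifying assumptions:

```python
def offload_instance(U, sites, R, cost, choose_granularity, load_at_site):
    order = sorted(sites, key=lambda v: (v.cost, -v.num_servers))  # S301
    U = set(U)
    for idx, C in enumerate(order):                                # S306, S307
        if not U:
            break                       # all user tasks loaded
        V_prime = order[idx:]                                      # S302
        partition = allocate_users(U, V_prime, R, cost)
        U_C = partition.get(C, set())
        level = choose_granularity()                               # S303
        loaded = load_at_site(C, U_C, level)                       # S304
        U -= set(loaded)                                           # S305
    return U  # any users left over after all sites have been tried
```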
8) Experimental evaluation

The sharing-aware offloading method of the present invention (called SAO) takes resource sharing into consideration when offloading rendering tasks. It is compared against a baseline offloading algorithm (called SBO), which uses different user allocation and server loading algorithms: for each instance, SBO assigns each user to the edge site with the lowest latency and the lowest cost (if costs are equal, the site with the most servers is preferred), and it performs server loading with the First-Fit algorithm at user granularity. The method of the present invention is also compared with variants that perform server loading at a fixed granularity: user granularity (SAO-U), user-group granularity (SAO-G), and instance granularity (SAO-I).
The performance of the algorithms is evaluated using the cost reduction rate defined below.
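The exact expression does not survive extraction; relative to the SBO baseline it is presumably of the form

$$\text{cost reduction rate} = \frac{C_{\text{SBO}} - C_{\text{ALG}}}{C_{\text{SBO}}} \times 100\%$$

where $C_{\text{ALG}}$ is the total server cost incurred by the evaluated algorithm and $C_{\text{SBO}}$ is that of the SBO baseline.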
The parameter settings are shown in Table 1. Instances are started sequentially and remain running until the end of the experiment. Each experimental result is obtained from 20 randomized simulation runs.
The experiments show that the present invention can significantly reduce cost: compared with the SBO algorithm, the method of the present invention reduces cost by up to 52%. The performance improvement comes from two aspects: 1) the sharing-aware user allocation algorithm; and 2) the granularity decision algorithm. The cost of the SAO-U algorithm is reduced by 50%; since SAO-U performs server loading at the same user granularity as SBO, this shows that sharing-aware allocation reduces cost even at that granularity. The method of the present invention outperforms the SAO-U algorithm by about two percentage points, which suggests that the granularity decision algorithm can further reduce cost.
Table 2 shows the convergence results of the granularity decision algorithm in various situations, over 20 randomized simulation runs with 4000 instances. In this setting, the granularity decision algorithm converges to the instance granularity level in 16 runs and to the user-group granularity level in 4 runs.
In the case of equal server costs (Fig. 5), the method of the present invention (SAO) reduces cost by as much as 30%. The performance of the SAO-I algorithm is very poor here, because the large instance-granularity items leave server resources underused; the SAO-U algorithm performs best, because user-granularity items are small and resources can be fully utilized. The method of the present invention (SAO) achieves nearly the best performance through learning: the granularity decision algorithm converges to the user granularity level in 16 runs and to the user-group granularity level in 4 runs, successfully avoiding the worst-performing instance granularity and almost always converging to the best granularity level.
In a dynamic environment, instances arrive and leave over time: when an instance arrives, the method of the present invention allocates resources for it, and when the instance completes, the occupied resources are released for later reuse. To evaluate performance in such a dynamic environment, 4000 instances arrive and leave sequentially; the inter-arrival time of instances follows an exponential distribution with a mean of 500 milliseconds, and the running time of instances follows a uniform distribution between 10 and 20 minutes. The other parameter settings are shown in Table 1. The experimental results are obtained from 20 randomized simulation runs.

Fig. 6(a) shows the average number of running instances in the system over time. In each simulation run, the number of running instances first increases as instances arrive sequentially, reaches a maximum before the first instance leaves, stays there for a while, and then decreases after the last instance arrives. Fig. 6(b) shows the average cost reduction rate of each algorithm over time. The method of the present invention (SAO) reduces cost by more than 40% and learns the optimal granularity for server loading.
Next, the server cost is compared under three values of the delay limit τ: 20 ms, 30 ms, and 40 ms. Since the SBO algorithm consumes a large number of servers when τ is 20 ms, in this set of experiments the number of servers per site is set to follow a uniform distribution between 100 and 150. As shown in Fig. 7, the costs of both the method of the present invention (SAO) and the SBO algorithm decrease as the delay limit increases: the larger the delay limit, the more users have their latency requirement satisfied at each edge site, and the greater the opportunity for resource sharing and cost reduction.

Fig. 8 shows the cost reduction rate when the delay limit τ is 40 ms. Unlike the case where τ is 30 ms, the SAO-I algorithm performs poorly here: the larger the delay limit, the more users can be clustered together after user allocation, and loading servers at instance granularity then wastes resources because the items are large. The method of the present invention (SAO) achieves the best performance through learning; as shown in the third row of Table 2, in 20 randomized simulation runs with 4000 instances, the granularity decision algorithm converges to the user-group granularity level in every run.
Fig. 10 shows the cost reduction rate of the method of the present invention (SAO) under different values of the weight θ.
In summary, the present invention proposes a sharing-aware online offloading method for the task offloading problem of multiple instances of a single application, which satisfies the delay requirement of every user in an instance while minimizing the cost of the servers used. The present invention exploits the multi-dimensional resource sharing of tasks and reduces resource consumption through resource sharing, thereby reducing system cost. A sharing-aware user allocation algorithm is proposed to allocate users to edge sites so that each user's latency meets the requirement while resource sharing is maximized; the algorithm both satisfies users' latency requirements and aggregates users together, minimizing server cost and improving the chance of resource sharing among users. In addition, the optimal granularity level for loading tasks is learned from experience with a reinforcement learning method; since server loading at different granularity levels incurs different costs, using the best granularity level reduces the cost of server loading.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium bearing computer-readable program instructions for causing a processor to implement various aspects of the present invention.

The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions recorded thereon, and any suitable combination of the above.
Computer-readable storage media, as used herein, are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (for example, light pulses through fiber-optic cables), or electrical signals transmitted through a wire.
Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to the respective computing/processing devices, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, or Python, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), may be personalized with state information of the computer-readable program instructions, and the electronic circuit may execute the computer-readable program instructions so as to implement various aspects of the present invention.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create an apparatus for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
Abstract

Disclosed is a resource sharing-aware online task offloading method. The method comprises: sorting the sites to obtain a site sequence; pre-assigning users to the sites by solving a modeled user allocation problem, whose optimization objectives are to meet each user's delay requirement and to maximize the opportunity for resource sharing; and, for each site, determining the servers onto which the tasks of the obtained user set are to be loaded by solving a modeled server loading problem, whose optimization objectives are to load the tasks of the users pre-assigned to each site using the fewest servers while meeting the servers' resource capacity limits. The present invention solves the task offloading problem of multiple instances of a single application, meets the delay requirement of every user in the instances, and minimizes the cost of the servers used.
Description
本发明涉及计算机技术领域,更具体地,涉及一种资源共享感知的在线任务卸载方法。The present invention relates to the field of computer technology, and more specifically, to a resource sharing-aware online task offloading method.
基于云的交互式应用程序对用户的网络环境要求高,难以满足用户的延迟要求。利用新兴的移动边缘计算和5G网络,交互式应用程序的渲染任务被卸载到距离用户近的边缘服务器上,可以降低用户延迟。Cloud-based interactive applications have high requirements on the user's network environment, and it is difficult to meet the user's delay requirements. Utilizing emerging mobile edge computing and 5G networks, the rendering tasks of interactive applications are offloaded to edge servers close to users, which can reduce user latency.
边缘计算以本地化云的形式出现。电信运营商正在将现有通信网络中的基础设施升级为移动边缘计算平台,存在诸如蜂窝无线电基站的接入站点,诸如容纳分布式天线系统的聚合站点,以及诸如中心局的核心站点。这些站点配备了计算和存储资源、冷却、电力系统等,并且经过重新设计,可以容纳边缘服务器。由于边缘服务器靠近用户,利用移动边缘计算可以降低用户请求的响应时间。Edge computing comes in the form of localized clouds. Telecom operators are upgrading infrastructure in existing communication networks to mobile edge computing platforms, with access sites such as cellular radio base stations, aggregation sites such as housing distributed antenna systems, and core sites such as central offices. These sites are equipped with computing and storage resources, cooling, power systems, etc., and have been redesigned to accommodate edge servers. Since the edge server is close to the user, using mobile edge computing can reduce the response time of user requests.
基于云的交互式应用程序,例如虚拟现实和云游戏,利用云资源来处理计算密集型任务,这种方式可以避免用户设备对高端硬件(通常昂贵且能耗高)的需求,使客户端轻量化。然而,基于云的交互式应用需要高吞吐量和低延迟的网络连接。如果用户与数据中心之间的距离远,那么用户的低延迟要求将难以满足。利用新兴的移动边缘计算,边缘辅助交互式应用程序将计算密集型3D渲染任务卸载到移动边缘计算系统中,并通过5G连接将边缘渲染的视频流传输给终端用户。由于边缘服务器靠近终端用户,这种方式能够大幅度地降低延迟。Cloud-based interactive applications, such as virtual reality and cloud games, utilize cloud resources to handle computationally intensive tasks, which avoids the need for high-end hardware (usually expensive and energy-intensive) on user devices and makes clients light Quantify. However, cloud-based interactive applications require high-throughput and low-latency network connections. If the distance between the user and the data center is long, it will be difficult to meet the low latency requirement of the user. Leveraging the emerging mobile edge computing, edge-assisted interactive applications offload computationally intensive 3D rendering tasks to mobile edge computing systems, and stream edge-rendered video to end users via 5G connections. Since the edge server is close to the end user, this approach can greatly reduce latency.
如图1所示,一个边缘辅助交互式应用系统由云服务器、边缘服务器和用户设备组成。云服务器托管应用程序的核心逻辑;边缘服务器负责渲染和编码;用户设备负责解码和显示。三者之间形成了一个信息环。云服 务器生成渲染命令,并传输给边缘服务器;边缘服务器渲染生成视频帧,并传输给用户设备;用户设备生成控制命令,并传输给云服务器,其接收到之后更新应用程序逻辑。As shown in Figure 1, an edge-assisted interactive application system consists of cloud servers, edge servers, and user equipment. The cloud server hosts the core logic of the application; the edge server is responsible for rendering and encoding; and the user device is responsible for decoding and display. An information ring is formed between the three. The cloud server generates a rendering command and transmits it to the edge server; the edge server renders and generates a video frame and transmits it to the user device; the user device generates a control command and transmits it to the cloud server, which updates the application logic after receiving it.
在现有技术中,根据优化目标可以将边缘计算中任务卸载划分为三类:最小化任务延迟、最小化系统成本以及同时最小化任务延迟和系统成本。尽管这些技术方案以不同的方式来降低任务的时延,或者降低系统的成本,但均没有考虑任务的多维资源共享的特性,而利用任务之间的资源共享,可以有效地降低资源消耗,从而降低系统的成本。In the prior art, task offloading in edge computing can be divided into three categories according to optimization objectives: minimizing task delay, minimizing system cost, and simultaneously minimizing task delay and system cost. Although these technical solutions use different methods to reduce the delay of tasks or reduce the cost of the system, none of them consider the characteristics of multi-dimensional resource sharing of tasks, and the use of resource sharing between tasks can effectively reduce resource consumption, thereby Reduce system cost.
以运行渲染任务为例,其涉及多种类型的资源,如存储、内存、GPU内存、CPU、GPU和带宽。除了带宽外,每种类型的资源都可以被同一台服务器提供服务的多位用户共享。利用渲染任务具有多维度的资源共享的特征,可以将共享资源的用户分配到同一台服务器上,从而节省资源消耗和成本。参见图2的示例来说明资源共享。第一,对于一个应用程序(示例中为多人3D国际象棋游戏),渲染素材(棋盘和棋子)存储在存储器中,这些素材可以被多个应用程序实例共享。一个实例是应用程序的一次执行,例如示例中对应一局国际象棋游戏,其中每一方都有两位互相合作的用户。第二,在一个实例中,这些缓存数据可以被该实例的所有渲染任务共享。第三,某些用户在整个应用程序运行过程中拥有相同的视角,因此这些用户可以共享由CPU和GPU运行的渲染任务,例如示例中的用户1和用户2共享视角,因此共享渲染任务1。利用这种多维资源共享的特性,可以将一个应用实例的用户分配到一起来节省资源消耗。Taking running a rendering task as an example, it involves multiple types of resources such as storage, memory, GPU memory, CPU, GPU, and bandwidth. With the exception of bandwidth, each type of resource can be shared by multiple users served by the same server. Utilizing the feature of multi-dimensional resource sharing in rendering tasks, users who share resources can be assigned to the same server, thereby saving resource consumption and cost. See Figure 2 for an example to illustrate resource sharing. First, for one application (in the example, a multiplayer 3D chess game), rendered assets (board and pieces) are stored in memory, and these assets can be shared by multiple application instances. An instance is an execution of the application, such as the example for a game of chess with two cooperating users on each side. Second, in an instance, these cached data can be shared by all rendering tasks of the instance. Third, some users have the same viewpoint throughout the running of the application, so these users can share the rendering tasks run by the CPU and GPU, for example, user 1 and user 2 in the example share the viewpoint, and therefore share rendering task 1. Using this feature of multi-dimensional resource sharing, users of an application instance can be allocated together to save resource consumption.
一个应用程序实例的多位用户通常位于网络的不同位置,给定若干配备了服务器的边缘站点供用户卸载任务,要将每位用户的渲染任务卸载到某个边缘站点的某台服务器上。由于交互式应用程序对延迟敏感,在选择进行卸载的边缘站点时,要确保每位用户的延迟都能满足要求。此外,不同站点的服务器的成本可能不相同。因此,需要提供新的任务卸载方法,以利用渲染任务的多维资源共享的特性,最小化使用的服务器的成本,或者在服务器成本相同的情况下,最小化使用服务器的数量。Multiple users of an application instance are usually located in different locations on the network. Given several edge sites equipped with servers for users to offload tasks, each user's rendering task should be offloaded to a certain server at an edge site. Since interactive applications are latency sensitive, when choosing an edge location for offloading, make sure that the latency for each user is acceptable. Also, the cost of servers at different sites may vary. Therefore, it is necessary to provide a new task offloading method to take advantage of the multi-dimensional resource sharing characteristics of the rendering task, minimize the cost of the used server, or minimize the number of used servers when the server cost is the same.
发明内容Contents of the invention
本发明的目的是克服上述现有技术的缺陷,提供资源共享感知的在线任务卸载方法。该方法包括:The purpose of the present invention is to overcome the above-mentioned defects in the prior art, and provide an online task offloading method with resource sharing awareness. The method includes:
对站点进行排序,得到站点序列,其中用C表示排序第一的站点;Sorting the sites to obtain a site sequence, where C represents the site ranked first;
执行以下步骤,直到用户集合中的用户任务已装载或目标站点已遍历:基于实例的用户集合U和目标站点集合V’,通过求解建模的用户分配问题将用户预分配到站点,其中V’表示站点序列中从站点C开始的所有站点的集合,所述用户分配问题的优化目标是满足每位用户的延迟要求,并最大化资源共享的机会;在当前站点C中,对得到的用户集合中的用户任务通过求解建模的服务器装载问题,确定待装载任务的服务器,其中所述服务器装载问题的优化目标是针对预分配给各站点的用户,使用最少的服务器装载其任务,并满足服务器的资源容量限制。Perform the following steps until user tasks in the user set are loaded or target sites are traversed: Instance-based user set U and target site set V', users are preassigned to sites by solving the modeled user assignment problem, where V' Indicates the set of all sites starting from site C in the site sequence, the optimization goal of the user allocation problem is to meet the delay requirements of each user and maximize the opportunity of resource sharing; in the current site C, for the obtained user set The user tasks in the model determine the servers to be loaded by solving the modeled server loading problem, wherein the optimization goal of the server loading problem is to use the least number of servers to load their tasks for the users pre-assigned to each site, and satisfy the server requirements. resource capacity constraints.
与现有技术相比,本发明的优点在于,利用了任务的多维资源共享的特性,提供进行卸载渲染任务的新技术方案,既能使每位用户的延迟满足需求,同时又能最小化使用的服务器的成本,成本可以是租赁成本或能源消耗等。Compared with the prior art, the present invention has the advantage of utilizing the characteristics of multi-dimensional resource sharing of tasks to provide a new technical solution for unloading rendering tasks, which can not only meet the needs of each user's delay, but also minimize the use of The cost of the server, the cost can be rental cost or energy consumption etc.
通过以下参照附图对本发明的示例性实施例的详细描述,本发明的其它特征及其优点将会变得清楚。Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments of the present invention with reference to the accompanying drawings.
被结合在说明书中并构成说明书的一部分的附图示出了本发明的实施例,并且连同其说明一起用于解释本发明的原理。The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention.
图1是现有技术中边缘辅助交互式应用的系统架构示意图;FIG. 1 is a schematic diagram of a system architecture of an edge-assisted interactive application in the prior art;
图2是现有技术中的资源共享示例;FIG. 2 is an example of resource sharing in the prior art;
图3是根据本发明一个实施例的用户分配示例;Figure 3 is an example of user allocation according to one embodiment of the present invention;
图4是根据本发明一个实施例的随着实例数量的增加,成本降低率也随之增加的示意图;Fig. 4 is a schematic diagram showing that the cost reduction rate increases as the number of instances increases according to an embodiment of the present invention;
图5是根据本发明一个实施例的在服务器成本相等的情况下的成本降低率示意图;Fig. 5 is a schematic diagram of the cost reduction rate in the case of equal server costs according to an embodiment of the present invention;
图6是根据本发明一个实施例的动态环境中的性能示意图;Fig. 6 is a schematic diagram of performance in a dynamic environment according to an embodiment of the present invention;
图7是根据本发明一个实施例的多种延迟限制值下的成本示意图;Fig. 7 is a schematic diagram of costs under various delay limit values according to an embodiment of the present invention;
图8是根据本发明一个实施例的延迟限制为40ms的成本降低率示意图;Fig. 8 is a schematic diagram of the cost reduction rate when the delay is limited to 40 ms according to an embodiment of the present invention;
图9是根据本发明一个实施例的不同权重值θ下集合划分块的基数的经验累积分布函数示意图;Fig. 9 is a schematic diagram of the empirical cumulative distribution function of the cardinality of the set division blocks under different weight values θ according to an embodiment of the present invention;
图10是根据本发明一个实施例的不同权重值θ下的成本降低率示意图。Fig. 10 is a schematic diagram of cost reduction rates under different weight values θ according to an embodiment of the present invention.
现在将参照附图来详细描述本发明的各种示例性实施例。应注意到:除非另外具体说明,否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本发明的范围。Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that the relative arrangements of components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
以下对至少一个示例性实施例的描述实际上仅仅是说明性的,决不作为对本发明及其应用或使用的任何限制。The following description of at least one exemplary embodiment is merely illustrative in nature and in no way taken as limiting the invention, its application or uses.
对于相关领域普通技术人员已知的技术、方法和设备可能不作详细讨论,但在适当情况下,所述技术、方法和设备应当被视为说明书的一部分。Techniques, methods and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods and devices should be considered part of the description.
在这里示出和讨论的所有例子中,任何具体值应被解释为仅仅是示例性的,而不是作为限制。因此,示例性实施例的其它例子可以具有不同的值。In all examples shown and discussed herein, any specific values should be construed as exemplary only, and not as limitations. Therefore, other instances of the exemplary embodiment may have different values.
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步讨论。It should be noted that like numerals and letters denote like items in the following figures, therefore, once an item is defined in one figure, it does not require further discussion in subsequent figures.
本发明提出了一种共享感知(sharing-aware)的在线任务卸载方法,用于优化单个应用的多个实例的任务卸载问题,目标是使每位用户的延迟要求都得到满足,同时最小化使用的服务器的成本。本发明交替和迭代地求解两个子问题:用户分配问题和服务器装载问题。The present invention proposes a sharing-aware online task offloading method, which is used to optimize the task offloading problem of multiple instances of a single application. The cost of the server. The present invention alternately and iteratively solves two sub-problems: the user allocation problem and the server loading problem.
用户分配问题是将用户分配到边缘站点,使得每位用户的延迟都满足要求,同时最大化资源共享的机会。在一个实施例中,将用户分配问题建 模成集合划分问题,并提出了一种启发式算法来求解。The user allocation problem is to allocate users to edge sites so that the latency of each user satisfies the requirement while maximizing the chance of resource sharing. In one embodiment, the user assignment problem is modeled as a set partition problem and a heuristic algorithm is proposed to solve it.
服务器装载问题是在每个站点中,针对上述预分配给该站点的用户,使用最少的服务器装载其渲染任务,同时要满足服务器的资源容量限制。在一个实施例中,将服务器装载问题建模成多维装箱问题。然而,资源共享的特点使得服务器装载问题变复杂。因此,进一步地,提出了可以将每个用户视为一个物品,或者将一组共享视角的用户视为一个物品,或者将同属于一个实例的用户视为一个物品,然后对多个物品求解装箱问题。上述三种装载方式的区别在于粒度不同,装载方案和产生的成本也不同。因此,存在三种粒度级别:单个用户级别,共享视角用户组级别和实例级别。为了找到一种实现最低成本的粒度级别,将粒度决策问题建模成多臂赌博机问题,并使用强化学习算法来求解。The problem of server loading is that in each site, for the above-mentioned users pre-allocated to the site, use the least number of servers to load their rendering tasks, and at the same time meet the resource capacity limit of the server. In one embodiment, the server loading problem is modeled as a multidimensional bin packing problem. However, the shared nature of resources complicates server loading issues. Therefore, further, it is proposed that each user can be regarded as an item, or a group of users who share a perspective can be regarded as an item, or users who belong to the same instance can be regarded as an item, and then solve the problem of multiple items. box problem. The difference between the above three loading methods is that the granularity is different, and the loading scheme and the resulting cost are also different. Therefore, there are three levels of granularity: individual user level, shared perspective user group level, and instance level. To find a level of granularity that achieves the lowest cost, the granular decision problem is modeled as a multi-armed bandit problem and solved using a reinforcement learning algorithm.
在下文中,将首先对用户延迟进行分析,然后,分别介绍任务卸载问题、用户分配问题和服务器装载问题,并详细介绍共享感知的在线卸载方法和其包含的两个子方法:共享感知的用户分配算法和粒度决策算法。In the following, the user delay will be analyzed first, and then the task offloading problem, user allocation problem and server loading problem will be introduced respectively, and the sharing-aware online offloading method and its two sub-methods will be introduced in detail: the sharing-aware user assignment algorithm and granular decision-making algorithms.
1)、用户延迟分析1), user delay analysis
渲染任务的卸载位置会影响用户延迟。如图1所示,在边缘辅助交互式应用中,用户延迟由传输控制命令的上行延迟、传输渲染命令和渲染后的视频帧的下行延迟组成。渲染任务的卸载位置与下行延迟有关,而与上行延迟无关。因此,本文以考虑下行延迟为例进行说明。Where rendering tasks are offloaded affects user latency. As shown in Figure 1, in edge-assisted interactive applications, user latency consists of the upstream latency of transmitting control commands, the downstream latency of transmitting rendering commands and rendered video frames. Where rendering tasks are offloaded is related to downlink latency, not uplink latency. Therefore, this article takes the downlink delay as an example for illustration.
Assume that the rendering task of user u is offloaded to edge site v, and denote its downlink latency by t_uv. Let r and f denote the average data sizes of a rendering command and a video frame, respectively, and let b_v and d_v denote the downlink throughput and delay from the cloud server to edge site v. The latency of transmitting rendering commands is then:

r/b_v + d_v (1)
Let b_vu and d_vu denote the downlink throughput and delay from edge site v to the device of user u, respectively. The latency of transmitting video frames is then:

f/b_vu + d_vu (2)
Combining formulas (1) and (2) yields the downlink latency:

t_uv = r/b_v + d_v + f/b_vu + d_vu (3)
Formula (3) shows that the downlink latency depends on the location of site v. When offloading rendering tasks, it must therefore be ensured that every user's latency requirement is satisfied.
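For illustration, the downlink latency of formula (3) can be computed directly from measured link parameters. The following Python sketch is a minimal example under the assumption, suggested by the derivation above, that t_uv is the sum of the two transmission legs; the function name and sample values are illustrative, not taken from the embodiment.

```python
def downlink_delay(r, f, b_v, d_v, b_vu, d_vu):
    """Downlink latency t_uv per formula (3).

    r, f       -- average sizes of a rendering command and a video frame (bits)
    b_v, d_v   -- throughput (bit/s) and delay (s) from the cloud to edge site v
    b_vu, d_vu -- throughput (bit/s) and delay (s) from site v to user device u
    """
    t_cmd = r / b_v + d_v       # formula (1): rendering-command leg
    t_frame = f / b_vu + d_vu   # formula (2): video-frame leg
    return t_cmd + t_frame      # formula (3): total downlink latency

# Illustrative values: 2 KB command, 100 KB frame over 100/50 Mbit/s links.
t_uv = downlink_delay(r=16e3, f=800e3, b_v=100e6, d_v=0.005, b_vu=50e6, d_vu=0.010)
print(f"t_uv = {t_uv * 1e3:.2f} ms")   # compared against the latency limit tau
```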
2) Task offloading problem modeling
Given an application and a set of instances (denoted I), each instance is one execution of the application in which one or more users participate. Also given is a set of edge servers (denoted S) distributed across a set of edge sites (denoted V). Assume all edge servers have the same hardware configuration; the cost of an edge server depends on the location of the edge site hosting it. The goal of the present invention is to offload the rendering tasks of these instances onto edge servers while minimizing the total cost. The modeling below extends to scenarios with heterogeneous edge servers, e.g., servers with different hardware configurations or different capacities, and the proposed method likewise applies to such scenarios.
The task assignment problem is defined as follows: given a set of users U and a set of edge servers S, assign each user in U to one edge server in S such that every user's latency requirement is satisfied and the resources on each edge server stay within its capacity, while minimizing the total cost of the edge servers used. For each user, the rendering task serving that user runs on the assigned server. On each server, all users sharing a viewing perspective share a single rendering task.
Let the Boolean variable x_uj denote whether server j hosts the rendering task of user u. Each user must be assigned to exactly one server:

∑_{j∈S} x_uj = 1, ∀u∈U (4)
As noted above, user latency depends on the site to which the task is offloaded. Let t_uv denote the network latency incurred by offloading user u's task to site v. For any server j in site v, t_uj = t_uv. The latency of user u is therefore ∑_{j∈S} t_uj·x_uj, and it is capped by a tolerable maximum denoted τ:

∑_{j∈S} t_uj·x_uj ≤ τ, ∀u∈U (5)
Next, the resource constraints are modeled. Assume every edge server has enough storage to host the application; otherwise, servers unable to host any rendering task are simply ignored. Let the vector p = (p_C, p_G, p_M, p_GM, p_B) denote the CPU, GPU, memory, GPU-memory, and bandwidth capacities of each server. For simplicity, assume a rendering task demands one unit of each resource. The multi-dimensional resource sharing of rendering tasks complicates the modeling of the resource constraints, so auxiliary variables are introduced.
Assume the users in U are partitioned into view-sharing user groups, denoted K. To model the CPU and GPU constraints, a Boolean variable y_kj is introduced to indicate whether server j hosts the rendering task of view-sharing user group k. The number of tasks on server j is thus ∑_{k∈K} y_kj; because of task sharing, this may be smaller than the number of users assigned to the server (i.e., ∑_{u∈U} x_uj). Let the Boolean variable w_j denote whether server j has been started. The capacity constraints can then be expressed as:

∑_{k∈K} y_kj ≤ p_C·w_j, ∀j∈S (6)

∑_{k∈K} y_kj ≤ p_G·w_j, ∀j∈S (7)
The two constraints above can be merged. Let p_K denote the maximum number of tasks allowed per server:

p_K = min{p_C, p_G} (8)
The two constraints (6) and (7) are then replaced by:

∑_{k∈K} y_kj ≤ p_K·w_j, ∀j∈S (9)
The constraint of formula (9) ensures that if w_j is 0, all variables y_kj must be 0.
To model the memory and GPU-memory constraints, a Boolean variable z_ij is introduced to denote whether server j hosts instance i∈I. The capacity constraints can then be expressed as:

∑_{i∈I} z_ij ≤ p_M·w_j, ∀j∈S (10)

∑_{i∈I} z_ij ≤ p_GM·w_j, ∀j∈S (11)
Because any two instances use separate memory and GPU memory, the sum runs over all instances. The two constraints (10) and (11) can likewise be merged. Let p_I denote the maximum number of instances allowed per server:
p_I = min{p_M, p_GM} (12)
The two constraints (10) and (11) are then replaced by:

∑_{i∈I} z_ij ≤ p_I·w_j, ∀j∈S (13)
Unlike the resources above, bandwidth is assumed not to be shareable among users. The bandwidth constraint is therefore:

∑_{u∈U} x_uj ≤ p_B·w_j, ∀j∈S (14)
Note that certain relationships hold among the variables above. First, x_uj and y_kj must satisfy:

x_uj ≤ y_{k(u),j}, ∀u∈U, ∀j∈S (15)

where k(u) denotes the view-sharing user group to which user u belongs. This means x_uj can be 1 only if y_{k(u),j} is 1; that is, server j hosting user group k(u) is a necessary condition for server j hosting user u. A similar relationship holds between y_kj and z_ij: if server j hosts user group k, it must host the instance to which that group belongs. Let i(k) denote the instance containing user group k; then:

y_kj ≤ z_{i(k),j}, ∀k∈K, ∀j∈S (16)
Finally, formula (13) implies the necessary condition z_ij ≤ w_j.
The goal of the present invention is to minimize the total cost of the servers used, e.g., the total fee the application provider pays the edge computing provider. Let c_v denote the cost of each server in site v; for any server j in site v, c_j = c_v. The objective is to minimize ∑_{j∈S} c_j·w_j. The task assignment problem is thus modeled as the following Boolean linear program:

minimize ∑_{j∈S} c_j·w_j, subject to constraints (4), (5), (9), and (13)–(16) (17)
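For reference, model (17) can be written down verbatim with an off-the-shelf integer programming toolkit. The Python sketch below uses the PuLP library with tiny illustrative data (all names and values are assumptions); it states the model exactly, whereas the method of the present invention solves it heuristically for scalability.

```python
import pulp

# Toy data (illustrative): 4 users in 2 view-sharing groups of one instance,
# 2 servers located at 2 edge sites.
users, groups, instances, servers = range(4), range(2), range(1), range(2)
k_of = {0: 0, 1: 0, 2: 1, 3: 1}      # user u -> view-sharing group k(u)
i_of = {0: 0, 1: 0}                  # group k -> instance i(k)
site_of = {0: 0, 1: 1}               # server j -> edge site v
t = {(u, v): 20 for u in users for v in (0, 1)}   # network latency t_uv (ms)
c = {0: 3, 1: 5}                     # per-server cost c_v of each site
p_K, p_I, p_B, tau = 10, 5, 20, 30   # capacities and latency limit

m = pulp.LpProblem("task_offloading", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (users, servers), cat="Binary")
y = pulp.LpVariable.dicts("y", (groups, servers), cat="Binary")
z = pulp.LpVariable.dicts("z", (instances, servers), cat="Binary")
w = pulp.LpVariable.dicts("w", servers, cat="Binary")

m += pulp.lpSum(c[site_of[j]] * w[j] for j in servers)            # objective
for u in users:
    m += pulp.lpSum(x[u][j] for j in servers) == 1                # (4)
    m += pulp.lpSum(t[u, site_of[j]] * x[u][j] for j in servers) <= tau  # (5)
for j in servers:
    m += pulp.lpSum(y[k][j] for k in groups) <= p_K * w[j]        # (9)
    m += pulp.lpSum(z[i][j] for i in instances) <= p_I * w[j]     # (13)
    m += pulp.lpSum(x[u][j] for u in users) <= p_B * w[j]         # (14)
    for u in users:
        m += x[u][j] <= y[k_of[u]][j]                             # (15)
    for k in groups:
        m += y[k][j] <= z[i_of[k]][j]                             # (16)

m.solve(pulp.PULP_CBC_CMD(msg=False))
print("total cost =", pulp.value(m.objective))
```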
3) User assignment problem modeling
The user assignment problem is to assign users to edge sites such that every user's latency requirement is satisfied while resource sharing is maximized. The given user set is ultimately partitioned into disjoint subsets, each assigned to one site. Different partitions lead to different resource consumption and hence different costs; finding the best partition is a set partition problem.
Figure 3 illustrates the user assignment problem. Given are 4 edge sites (squares) and 8 users (circles). All users belong to the same instance but to 2 different view-sharing user groups, distinguished by solid and hollow circles. The relationship between users and sites is represented as a bipartite graph: an edge exists between a user and a site if the user's latency requirement is satisfied at that site. Light gray edges denote feasible assignments; dark gray edges denote actual assignments. Each site has a set of users connected to it; the set connected to site v is denoted R_v. For example, in Figure 3(a), R_3 = {u_2, u_3, u_4, u_7}. The latency requirements of users assigned to site v must be satisfied, so those users must form a subset of R_v. Moreover, each user must be assigned to some site; the user sets assigned to the sites must therefore be pairwise disjoint, and their union must equal the whole user set. That is, a user assignment is a set partition. For example, the user assignment scheme shown in Figure 3(b) is one set partition, and the scheme {{u_1, u_3}, {u_2, u_5, u_6}, {u_4, u_7}, {u_8}} in Figure 3(c) is another. Both assignments are valid but may incur different costs. The assignment of Figure 3(b) needs 3 rendering tasks, running in sites v_1, v_3 and v_4, because in site v_3 the users u_2, u_3 and u_4 share a viewing perspective and therefore share one rendering task; the same holds for the users assigned to site v_4. In contrast, the assignment of Figure 3(c) needs 1, 2, 2 and 1 rendering tasks in the respective sites, 6 in total. The more tasks, the higher the cost.
The user assignment problem is now defined. Let U denote the user set of a single instance, as distinct from the symbol U used earlier for the user set of all instances. Given an instance's user set U and a set of edge sites V, the user assignment problem is to find a partition of U. A partition corresponds to an indexed family of subsets of U with index set V, written {U_v : v∈V}. The indexed family satisfies the following two constraints.
1) The union of all subsets equals U:

∪_{v∈V} U_v = U (18)

2) All subsets are pairwise disjoint:

U_v ∩ U_{v'} = ∅, ∀v ≠ v' (19)

A partition block U_v corresponds to the set of users assigned to site v.
The partition must ensure that every user is assigned to an edge site and that the user's latency requirement is satisfied at the assigned site. Let R_v denote the set of all users in U whose latency requirement is satisfied at site v:

R_v = {u∈U | t_uv ≤ τ} (20)
Hence the user set U_v assigned to edge site v must satisfy:

U_v ⊆ R_v, ∀v∈V (21)
To bring this constraint into the set partition model, a two-tuple ⟨v, A⟩ is introduced, denoting a site v together with a feasible set A ⊆ R_v of that site. This yields a set of candidate two-tuples:

Ω = {⟨v, A⟩ | v∈V, A ⊆ R_v} (22)
The problem thus becomes: given U and Ω, select a subset of Ω that forms a partition of U, meaning that the second elements of the selected two-tuples satisfy constraints (18) and (19).
Many partitions of U can be obtained from Ω. Different partitions, however, lead to different kinds of resource sharing and hence different resource (memory and processor) consumption. For example, under a partition that assigns each user to a different edge site there is no resource sharing, so resource consumption is high. In contrast, under a partition that assigns all users to the same edge site, their rendering tasks can share resources if they are loaded onto the same server, reducing resource consumption.
Given the set Ω of candidate two-tuples defined in formula (22), a set A with large cardinality is preferred because it offers more opportunity for resource sharing; edge sites with low cost are likewise preferred. To capture both preferences at once, the cost of a two-tuple ⟨v, A⟩ is defined as:

c(⟨v, A⟩) = c_v·|A|^(−θ) (23)

The lower the cost, the better. θ is the weight of |A|, balancing the server cost against the cardinality of the set; the value of θ affects the degree of resource sharing.
The user assignment problem for a single instance is therefore defined as: given the user set U of an instance, the set Ω defined in formula (22), and a cost function c, find a minimum-cost subset of Ω that forms a partition of U. This is a set partition problem. The modeling ensures that every user is assigned to some edge site and that the user's latency requirement is satisfied there, i.e., the first two constraints of model (17) hold. A sharing-aware user assignment algorithm for this problem is introduced below.
4) Server loading problem
Once users have been assigned to edge sites, the server loading problem is solved. For each site, given the set of users assigned to it, the optimization goal is to load the users' rendering tasks using the fewest servers. According to model (17), each server has three capacity constraints: a bounded number of instances, a bounded number of view-sharing user groups, and a bounded number of users. The server loading problem can therefore be modeled as a three-dimensional bin-packing problem.
Resource sharing complicates the server loading problem, because loading some users' tasks together may reduce resource consumption. In one embodiment, each user is treated as an item with resource demand ⟨1, 1, 1⟩ to be packed into a bin of capacity ⟨p_I, p_K, p_B⟩; the three components correspond to the numbers of instances, tasks, and users, respectively. Alternatively, the users of one view-sharing user group can be combined into a single item with demand ⟨1, 1, n_B⟩, where n_B is the number of users in the group. Similarly, the users of one instance can be combined into a single item with demand ⟨1, n_K, n_B⟩, where n_K and n_B are the numbers of tasks (i.e., user groups) and users in the instance. Different granularity levels may incur different costs: coarse granularity (e.g., instances) maximizes resource sharing but may waste resources because the items are large, whereas fine granularity (e.g., users) does not maximize sharing but may utilize resources more fully because the items are small. An optimal granularity level that minimizes the loading cost therefore has to be found, as illustrated by the sketch below.
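To make the three granularity levels concrete, the following sketch builds the list of packing items for one instance at each level; the function name and the encoding of an item as a demand triple are illustrative assumptions consistent with the text.

```python
from collections import defaultdict

def build_items(user_group_pairs, granularity):
    """Turn one instance's users into packing items <instances, tasks, users>.

    user_group_pairs -- list of (user_id, group_id) pairs for one instance
    granularity      -- "user", "group", or "instance"
    """
    if granularity == "user":
        # one item per user: demand <1, 1, 1>
        return [(1, 1, 1) for _ in user_group_pairs]
    by_group = defaultdict(list)
    for u, k in user_group_pairs:
        by_group[k].append(u)
    if granularity == "group":
        # one item per view-sharing group: demand <1, 1, n_B>
        return [(1, 1, len(members)) for members in by_group.values()]
    # whole instance as one item: demand <1, n_K, n_B>
    return [(1, len(by_group), len(user_group_pairs))]

# Example: 2 groups of 4 users, matching the simulation setup of Table 1.
pairs = [(u, u // 4) for u in range(8)]
for g in ("user", "group", "instance"):
    print(g, build_items(pairs, g))
```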
Simulation experiments show that the optimal granularity level depends on many factors, such as the server capacity ⟨p_I, p_K, p_B⟩, the latency limit τ, and whether server costs are identical across edge sites. Since the optimal level differs from case to case and is hard to derive from an analytical model, in one embodiment the optimal granularity level is learned from experience with a reinforcement learning method.
Specifically, the granularity decision problem is modeled as a multi-armed bandit problem with three actions (the granularity levels). At each step, one action is selected, and the next k instances to be offloaded are all loaded onto servers using that granularity level; this process repeats at every step. Some servers are started to load the instances, and the fewer the better, so the reward of an action (denoted R) is defined as the negative of the number of newly started servers (denoted C), i.e., R = −C. The goal is to maximize the expected total reward over the long run of action selection. The granularity decision algorithm used to solve this problem is described in detail below.
Once a granularity level has been selected, the users are grouped into items at that granularity, and their rendering tasks are loaded using the fewest servers subject to the three-dimensional capacity constraints. The three-dimensional bin-packing problem is NP-hard, and many approximation or heuristic algorithms can solve it. For example, the classic First-Fit algorithm can be used: when loading a new item, the algorithm scans the started servers in order and loads the item onto the first server satisfying the capacity constraints; if no started server satisfies them, a new server is started. To determine whether an item fits within a server's capacity, the resource capacity constraints (9), (13), and (14) must be evaluated, taking into account the resource sharing captured in model (17). Notably, the granularity decision algorithm of the present invention works with any bin-packing algorithm.
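A minimal sketch of sharing-aware First-Fit under these semantics: each server tracks the instances, view-sharing groups, and users it hosts, so loading another user of an already-hosted group consumes no additional task slot, while the user dimension (bandwidth) is never shared. The class and function names are illustrative assumptions.

```python
class Server:
    """Occupancy of one edge server, tracked at the sharing level."""

    def __init__(self, p_I, p_K, p_B):
        self.p_I, self.p_K, self.p_B = p_I, p_K, p_B
        self.instances, self.groups, self.users = set(), set(), set()

    def fits(self, inst, groups, users):
        """Sharing-aware check of capacity constraints (9), (13), (14)."""
        ok_i = len(self.instances | {inst}) <= self.p_I
        ok_k = len(self.groups | set(groups)) <= self.p_K
        ok_u = len(self.users) + len(users) <= self.p_B  # bandwidth not shared
        return ok_i and ok_k and ok_u

    def load(self, inst, groups, users):
        self.instances.add(inst)
        self.groups.update(groups)
        self.users.update(users)

def first_fit(items, servers, capacity=(5, 10, 20)):
    """Pack items (inst, groups, users); return number of newly started servers."""
    started = 0
    for inst, groups, users in items:
        for s in servers:
            if s.fits(inst, groups, users):
                s.load(inst, groups, users)
                break
        else:  # no started server fits, so start a new one
            s = Server(*capacity)
            s.load(inst, groups, users)
            servers.append(s)
            started += 1
    return started  # the bandit reward is the negative of this count

# Example: one instance's two view-sharing groups loaded at group granularity.
servers = []
items = [(0, [0], ["u1", "u2", "u3", "u4"]), (0, [1], ["u5", "u6", "u7", "u8"])]
print(first_fit(items, servers), "server(s) newly started")
```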
Consider a special case of the problem above, in which each application instance has exactly one user group. User groups and instances then correspond one-to-one, the two variables y_kj and z_ij in model (17) coincide, and the two corresponding capacity constraints collapse into one:

∑_{k∈K} y_kj ≤ min{p_K, p_I}·w_j, ∀j∈S

As a result, the server loading problem becomes a cardinality-constrained bin-packing problem, which is still NP-hard; several approximation algorithms have been proposed for it.
The algorithms for solving the problems modeled above are introduced below.
1) Sharing-aware user assignment algorithm
For the set partition formulation above, the cardinality of the set Ω of two-tuples defined in formula (22) is exponential, so solving the problem directly is inefficient. In one embodiment, an efficient heuristic (the sharing-aware user assignment algorithm) is therefore proposed. Rather than choosing a partition from the exponentially large set Ω, the algorithm starts from a subset of Ω, namely:

Ω₀ = {⟨v, R_v⟩ | v∈V}
Specifically, the sharing-aware user assignment algorithm comprises the following steps; a minimal code sketch follows the step list.
Step S101: From the given user set U of an instance and a set of edge sites V, obtain the two-tuple set Ω₀, in which the first element of each tuple is a site v and the second element is the subset R_v of U, i.e., the users whose latency requirement is satisfied at site v.
Step S102: From the current set of two-tuples, select the tuple with the smallest cost-effectiveness c_v·|A|^(−θ), denoted ⟨v*, A*⟩, as the partition block of the user set U for the corresponding site.
Step S103: Update the set by removing the users already assigned in step S102 (i.e., A*) from every two-tuple, ensuring that the selected two-tuples are pairwise disjoint.
Step S105: Repeat steps S102 to S104 until the union of the selected two-tuples equals U.
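A minimal Python sketch of this greedy selection under the cost-effectiveness rule c_v·|A|^(−θ); the function and parameter names are illustrative assumptions, and since step S104 is not detailed here, the sketch folds the set update of S103 into the bookkeeping of unassigned users.

```python
def assign_users(U, sites, t, c, tau, theta=1.0):
    """Sharing-aware user assignment of one instance (steps S101-S105).

    U     -- iterable of user ids of one instance
    sites -- iterable of edge site ids
    t     -- dict mapping (user, site) to network latency t_uv
    c     -- dict mapping site to per-server cost c_v
    tau   -- latency limit
    """
    # S101: tuples <v, R_v> with the users feasible at each site
    R = {v: {u for u in U if t[u, v] <= tau} for v in sites}
    unassigned, partition = set(U), {}
    while unassigned:
        # S102: pick the tuple with minimal cost-effectiveness c_v * |A|^(-theta)
        candidates = [(v, R[v] & unassigned) for v in sites if R[v] & unassigned]
        if not candidates:
            raise ValueError("some user is feasible at no site")
        v_star, A_star = min(candidates, key=lambda vA: c[vA[0]] * len(vA[1]) ** -theta)
        partition.setdefault(v_star, set()).update(A_star)
        unassigned -= A_star   # S103: drop assigned users from every tuple
    return partition           # S105 exit: the blocks now cover all of U

# Toy run: 3 users, 2 sites; user 2 is only feasible at the expensive site.
t = {(0, "a"): 10, (1, "a"): 20, (2, "a"): 99,
     (0, "b"): 15, (1, "b"): 15, (2, "b"): 15}
print(assign_users({0, 1, 2}, ["a", "b"], t, c={"a": 1, "b": 5}, tau=30))
```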
2) Granularity decision algorithm
In one embodiment, the granularity decision problem is modeled as a multi-armed bandit problem and solved with the action-value method of reinforcement learning. Let Q(a) denote the value function of action a, used to evaluate how good it is to select each action; Q(a) is computed as the average reward obtained over the steps in which action a was selected. Since the initial value of Q(a) strongly affects convergence, Q(a) is first initialized and only then used to select actions. In the initialization phase, action a is selected in m steps to initialize Q(a). Let N(a) denote the number of times action a has been selected.
Specifically, the granularity decision algorithm comprises the following steps; a minimal code sketch follows the parameter discussion after the steps.
Step S201: Initialize the value function Q(a) as follows. In each step, take the three actions, user-group granularity, instance granularity, and user granularity, in turn. Repeat this cycle until each action has been selected m times, ending the initialization phase. Assign the average reward that action a obtained over its m steps to Q(a), and assign m to N(a).
Step S202: After the initialization phase, select the action with the highest value Q(a) with probability 1−ξ, and select an action uniformly at random with probability ξ.
Step S203: Use action a to load the next k arriving instances onto servers. Count the number of servers newly started during this period and take its negative as the reward R of action a.
Step S204: Increment the count N(a) by 1.
Step S205: Update the value function using N(a): Q(a) = Q(a) + [R − Q(a)]/N(a).
Step S206: Repeat steps S202 to S205.
In the algorithm above, the parameter ξ balances exploration and exploitation. The parameter m affects the quality of the initialization and hence the learning process: the larger m, the closer the value function is to its true value, but an overly large m slows convergence. The parameter k is the number of instances offloaded per step and affects the quality of the reward: if k is too small (in the extreme, 1), the rewards of the actions are so similar, or even all 0, that the actions cannot be distinguished; if k is too large, each value-function update requires offloading many instances, which also slows convergence.
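A minimal sketch of steps S201–S206 as an ε-greedy action-value learner; the incremental update of step S205 reproduces the initialization rule of S201 (after m selections of an action, Q(a) equals the average of its m rewards and N(a) = m). Class and method names are illustrative assumptions.

```python
import random

ACTIONS = ("group", "instance", "user")   # the three granularity levels

class GranularityBandit:
    """Action-value learner for the granularity decision (steps S201-S206)."""

    def __init__(self, xi=0.0, m=1):
        self.xi = xi                                   # exploration probability
        self.Q = {a: 0.0 for a in ACTIONS}             # value function Q(a)
        self.N = {a: 0 for a in ACTIONS}               # selection count N(a)
        # S201: round-robin schedule so each action is selected m times first
        self._init = [a for _ in range(m) for a in ACTIONS]

    def select(self):
        if self._init:
            return self._init.pop(0)                   # still initializing
        if random.random() < self.xi:
            return random.choice(ACTIONS)              # S202: explore
        return max(ACTIONS, key=self.Q.get)            # S202: exploit

    def update(self, a, newly_started):
        R = -newly_started                             # S203: reward R = -C
        self.N[a] += 1                                 # S204
        self.Q[a] += (R - self.Q[a]) / self.N[a]       # S205: running average

# Each selected action loads the next k instances; see the First-Fit sketch.
bandit = GranularityBandit(xi=0.0, m=1)
a = bandit.select()
bandit.update(a, newly_started=7)   # e.g. 7 servers started over k instances
```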
3) Sharing-aware online offloading method
The online offloading method is now built from the sub-algorithms above. The present invention offloads instances one by one in arrival order. For each instance, the user assignment problem is solved first, then the server loading problem of each site. A site may, however, be unable to accommodate all users assigned to it, because user assignment does not account for the number of servers per site. The sites are therefore sorted in non-decreasing order of cost and loaded in turn, so that the cheapest sites are used first; among sites of equal cost, the site with more servers takes priority, to avoid running out of resources during server loading.
Specifically, the sharing-aware online offloading method offloads any single instance through the following steps; a minimal code sketch follows the step list.
Step S301: Sort the sites in non-decreasing order of cost; if costs are equal, the site with more servers has higher priority. Let C denote the site with the lowest current cost.
Step S302: Based on the site sequence of step S301, let V' denote the set of all sites from site C onward in the sequence. Given the instance's user set U and the site set V', the user assignment algorithm computes a set partition {U_v : v∈V'} of U, in which the user set corresponding to the current site C is U_C.
Step S303: Obtain the current best granularity level from the granularity decision algorithm.
Step S304: Using the granularity level from step S303, apply the First-Fit algorithm in the current site C to load the tasks of the users in the set U_C obtained in step S302, until the site's servers run out.
Step S305: Remove from the instance's user set U the users whose tasks were successfully loaded in step S304.
Step S306: Following the site sequence of step S301, make the site after the current site C the new site C.
Step S307: Repeat steps S302 to S306 until the tasks of all users in U have been loaded or all sites have been tried.
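Tying the pieces together, the sketch below outlines steps S301–S307 for one instance. It assumes the assign_users and GranularityBandit sketches above, plus a hypothetical pack_on_site helper that runs First-Fit on a site's remaining servers at the chosen granularity and returns the set of users whose tasks were successfully loaded.

```python
def offload_instance(U, sites, servers_at, t, c, tau, bandit, pack_on_site):
    """Sharing-aware online offloading of one instance (steps S301-S307).

    servers_at   -- dict mapping a site to its list of servers
    pack_on_site -- hypothetical helper: (users, granularity, servers) ->
                    set of users actually loaded (First-Fit at that granularity)
    """
    # S301: non-decreasing cost; among equal costs, more servers first
    order = sorted(sites, key=lambda v: (c[v], -len(servers_at[v])))
    remaining = set(U)
    for idx, site_C in enumerate(order):               # S306/S307: advance C
        if not remaining:
            break
        V_prime = order[idx:]                          # S302: sites from C on
        partition = assign_users(remaining, V_prime, t, c, tau)
        U_C = partition.get(site_C, set())
        if not U_C:
            continue
        granularity = bandit.select()                  # S303: pick the level
        loaded = pack_on_site(U_C, granularity, servers_at[site_C])  # S304
        remaining -= loaded                            # S305: drop loaded users
    return remaining   # users still unloaded after all sites were tried
```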
In summary, the sharing-aware offloading method of the present invention (SAO) accounts for resource sharing when offloading rendering tasks. For comparison, a sharing-blind offloading algorithm (SBO) solves the user assignment and server loading sub-problems alternately and iteratively just as the present method does, but with different user assignment and server loading algorithms: for each instance, SBO assigns each user to the cheapest edge site satisfying the latency requirement (among sites of equal cost, the site with the most servers takes priority), and during server loading it uses First-Fit at user granularity. Note that for either method, when First-Fit decides whether an item (a single user, a group of users sharing a viewing perspective, or the users of one instance) fits within a server's capacity, the resource sharing of model (17) must be taken into account.
To verify the effect of the present invention, simulation experiments were conducted comparing it with server loading fixed at user granularity (SAO-U), user-group granularity (SAO-G), and instance granularity (SAO-I). Performance is evaluated with the cost reduction rate, defined as the relative cost saving over the SBO baseline, i.e., (cost of SBO − cost of the algorithm) / cost of SBO.

Unless stated otherwise, the simulation parameters are set as in Table 1. Instances start in sequence and keep running until the end of the experiment. Each result is averaged over 20 randomized simulation runs. The algorithm parameters are set as θ = 1, ξ = 0, k = 200, and m = 1; initializing the value function in the granularity decision algorithm therefore requires 3mk = 600 instances. Moreover, initialization starts from the second step, because at the first step no servers have been started yet, and starting from zero could bias the rewards considerably.
Table 1. Parameter settings of the simulation environment

| Parameter | Value |
|---|---|
| Number of view-sharing user groups per instance | 2 |
| Number of users per view-sharing user group | 4 |
| Number of edge sites | 50 |
| Number of servers per edge site | Uniform on [50, 100] |
| Server cost in each edge site (c_v) | Uniform on [1, 10] |
| ⟨p_I, p_K, p_B⟩ | ⟨5, 10, 20⟩ |
| Latency t_uv (ms) | Uniform on [10, 50] |
| Latency limit τ (ms) | 30 |
1) Cost reduction
Compared with the SBO algorithm, the present invention reduces cost significantly. As shown in Figure 4, the method of the present invention (SAO) cuts cost by up to 52% relative to SBO. The improvement comes from two sources: the sharing-aware user assignment algorithm and the granularity decision algorithm. First, the SAO-U algorithm already costs 50% less than SBO, showing that even at the same user granularity of server loading, exploiting resource sharing lowers cost. In addition, SAO outperforms SAO-U by about two percentage points, showing that the granularity decision algorithm reduces cost further.
Table 2 shows, over 20 randomized simulation runs of 4000 instances each, the granularity levels to which the granularity decision algorithm converged in various settings.

Table 2. Experimental results

| Setting | Instance | User group | User |
|---|---|---|---|
| Default latency limit τ, default server costs | 16 | 4 | 0 |
| Default latency limit τ, identical server costs | 0 | 4 | 16 |
| Latency limit τ = 40 ms, default server costs | 0 | 20 | 0 |
As the first row of Table 2 shows, in 20 randomized runs of 4000 instances, the granularity decision algorithm converged to the instance granularity level in 16 runs and to the user-group granularity level in 4 runs.
Next, the costs of all servers are set to the same value. As Figure 5 shows, the method of the present invention (SAO) reduces cost by as much as 30%. In this setting, the SAO-I algorithm performs very poorly: with equal server costs, the factor |A|^(−θ) dominates formula (23), so the users of an instance are aggregated more tightly, and loading servers at instance granularity, although it maximizes resource sharing, wastes resources because the items are large. In contrast, SAO-U performs best, because user-granularity items are small and utilize resources fully. Through learning, SAO achieves nearly the best performance. As the second row of Table 2 shows, in 20 randomized runs of 4000 instances, the granularity decision algorithm converged to the user granularity level in 16 runs and to the user-group granularity level in 4 runs, successfully avoiding the worst-performing instance granularity and nearly converging to the optimal level.
2) Performance in a dynamic environment
When an instance starts, the method of the present invention allocates resources to it; when the instance finishes, the occupied resources are released and later reused. The algorithms are evaluated in this dynamic environment with 4000 instances arriving and departing in sequence: inter-arrival times follow an exponential distribution with mean 500 ms, and instance runtimes follow a uniform distribution between 10 and 20 minutes. The other parameters are set as in Table 1, and the results are averaged over 20 randomized simulation runs; a sketch of this workload generation follows below. Figure 6(a) shows the average number of running instances over time: in each run, the number of running instances first grows as instances arrive, peaks before the first instance departs, stays there for a while, and then declines after the last instance has arrived. Figure 6(b) shows the average cost reduction rate of each algorithm over time: compared with SBO, the method of the present invention (SAO) reduces cost by more than 40% and learns the best granularity for server loading.
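For reproducibility, the dynamic workload described here is straightforward to generate. A minimal NumPy sketch under the stated distributions (the seed and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 4000
inter_arrival = rng.exponential(scale=0.5, size=n)   # mean 500 ms, in seconds
arrival = np.cumsum(inter_arrival)
runtime = rng.uniform(10 * 60, 20 * 60, size=n)      # 10 to 20 minutes
departure = arrival + runtime

# Number of instances running at the midpoint of the arrival process,
# mirroring the plateau visible in Figure 6(a).
t0 = arrival[n // 2]
running = int(np.sum((arrival <= t0) & (departure > t0)))
print(f"instances running at t0: {running}")
```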
3) Impact of the latency limit τ
To evaluate the impact of the latency limit τ on performance, server costs are compared under three values of τ: 20 ms, 30 ms, and 40 ms. Because SBO consumes a large number of servers when τ is 20 ms, in this set of experiments the number of servers per site is drawn uniformly from [100, 150]. As Figure 7 shows, the costs of both SAO and SBO decrease as the latency limit grows: the higher the limit, the more users have their latency requirement satisfied at each edge site, and the greater the opportunity for resource sharing and cost reduction.
Figure 8 shows the cost reduction rate when the latency limit τ is 40 ms. Unlike the case of τ = 30 ms, the SAO-I algorithm performs poorly here: the higher the latency limit, the more users satisfy the latency requirement at each edge site, so more users may be clustered together after assignment, and loading servers at instance granularity wastes resources because the items are large. Through learning, SAO achieves the best performance. As the third row of Table 2 shows, in 20 randomized runs of 4000 instances, the granularity decision algorithm converged to the user-group granularity level in every run.
4) Impact of the weight θ
Further, the impact of θ on performance is evaluated, i.e., the weight of |A| (the cardinality of a partition block) relative to c_v (the server cost). Figure 9 shows the empirical cumulative distribution function of the partition-block cardinality in a simulation with 2000 instances: for larger values of θ, the sharing-aware user assignment algorithm tends to select partition blocks containing more users.
Figure 10 shows the cost reduction rate of SAO under different values of θ. At larger problem scales, performance improves as θ grows from 0 to 1, because weighting |A| raises the aggregation of users within a site and increases the opportunity for resource sharing among them. As θ increases further, however, performance worsens: the weight on |A| becomes so large that the server cost is neglected, and although users become more tightly aggregated, they may be assigned to edge sites with expensive servers. In one embodiment, θ is set to 1 to strike a good balance between server cost and user aggregation.
In summary, the present invention proposes a sharing-aware online offloading method for the task offloading problem of multiple instances of a single application, which satisfies the latency requirement of every user in each instance while minimizing the cost of the servers used. The invention exploits the multi-dimensional resource sharing of tasks, reducing resource consumption and hence system cost. A sharing-aware user assignment algorithm is proposed that assigns users to edge sites so that every user's latency meets the requirement while resource sharing is maximized; the algorithm satisfies the users' latency requirements while aggregating users together, minimizing server cost and increasing the opportunity for resource sharing among users. In addition, a reinforcement learning method is used to learn from experience the best granularity level for loading tasks: since different granularity levels incur different server loading costs, using the best level lowers the loading cost.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement aspects of the present invention.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. It may be, for example but without limitation, an electrical, magnetic, optical, electromagnetic, or semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, or Python, and conventional procedural programming languages such as the C language or similar languages. The instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the remote-computer case, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA) may be personalized by utilizing state information of the computer-readable program instructions, and the circuitry may execute those instructions to implement aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks therein, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These instructions may also be stored in a computer-readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the medium having the instructions stored therein comprises an article of manufacture including instructions implementing aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The embodiments of the present invention have been described above. The foregoing description is illustrative, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.
Claims (10)
- A resource sharing-aware online task offloading method, comprising: sorting sites to obtain a site sequence, wherein C denotes the first-ranked site; and performing the following steps until the user tasks in the user set have been loaded or the target sites have been traversed: based on an instance's user set U and a target site set V', pre-assigning users to sites by solving a modeled user assignment problem, wherein V' denotes the set of all sites from site C onward in the site sequence, and the optimization objective of the user assignment problem is to satisfy each user's latency requirement and maximize the opportunity for resource sharing; and in the current site C, determining, for the user tasks in the resulting user set, the servers onto which the tasks are to be loaded by solving a modeled server loading problem, wherein the optimization objective of the server loading problem is to load the tasks of the users pre-assigned to each site using the fewest servers while satisfying the servers' resource capacity limits.
- The method according to claim 1, wherein the user assignment problem is modeled as a set partition problem in which a given user set is partitioned into disjoint subsets, each subset is assigned to one site, and the union of the subsets equals the entire user set.
- The method according to claim 2, wherein the user assignment problem of a single instance is defined as: given a user set U and a set Ω, selecting a subset of Ω that forms a partition of U, where Ω is the set of candidate two-tuples, expressed as: Ω = {⟨v, A⟩ | v∈V, A ⊆ R_v}.
- The method according to claim 3, wherein the user assignment problem of the single instance is solved by the following steps: step S41: for a given instance's user set U and a set of sites V, obtaining a two-tuple set Ω₀, which is a subset of Ω; step S42: selecting from the set the two-tuple with the smallest cost-effectiveness, denoted ⟨v*, A*⟩, as the partition block of the user set U for the corresponding site; step S43: updating the set by deleting the already-assigned users A* from each two-tuple; and step S45: repeating steps S42 to S44 until the union of the selected two-tuples equals U.
- The method according to claim 4, wherein the cost of a two-tuple ⟨v, A⟩ is defined as: c(⟨v, A⟩) = c_v·|A|^(−θ), where θ is the weight of |A|, balancing the server cost against the cardinality of the set, and the value of θ affects the degree of resource sharing.
- The method according to claim 1, wherein the server loading problem is modeled as a three-dimensional bin-packing problem, and a user granularity level, a user-group granularity level, and an instance granularity level are provided for decision-making, wherein the user granularity level treats each user as one item, the user-group granularity level treats a group of users sharing a viewing perspective as one item, and the instance granularity level treats the users belonging to one instance as one item.
- The method according to claim 5, wherein the granularity-level decision problem is modeled as a multi-armed bandit problem and solved using action values in reinforcement learning, wherein the actions in reinforcement learning correspond to the granularity levels, the goal of the reinforcement learning is set to maximizing the expected total reward in action selection, a selected action is used to load servers while offloading a plurality of instances in sequence, and the reward of an action is defined as the negative of the number of newly started servers.
- The method according to claim 6, wherein the constraints for solving the server loading problem include:

∑_{j∈S} x_uj = 1, ∀u∈U;

∑_{j∈S} t_uj·x_uj ≤ τ, ∀u∈U;

∑_{k∈K} y_kj ≤ p_K·w_j, ∑_{i∈I} z_ij ≤ p_I·w_j, and ∑_{u∈U} x_uj ≤ p_B·w_j, ∀j∈S;

x_uj ≤ y_{k(u),j} and y_kj ≤ z_{i(k),j};

wherein the Boolean variable x_uj denotes whether server j hosts the task of user u; t_uv denotes the network latency incurred by offloading user u's task to site v; the latency of user u is ∑_{j∈S} t_uj·x_uj; τ denotes the allowed latency upper bound; U denotes a set of users; S denotes a set of edge servers; the Boolean variable w_j denotes whether server j has been started; the Boolean variable y_kj denotes whether server j hosts the rendering task of view-sharing user group k; the number of tasks on server j is ∑_{k∈K} y_kj; p_K denotes the maximum number of tasks allowed per server; the Boolean variable z_ij denotes whether server j hosts instance i∈I; p_I denotes the maximum number of instances allowed per server; k(u) denotes the view-sharing user group to which user u belongs; c_v denotes the cost of each server in site v; and the vector p = (p_C, p_G, p_M, p_GM, p_B) denotes the CPU, GPU, memory, GPU-memory, and bandwidth capacities of each server.
- A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
- A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/133572 (WO2023092466A1) | 2021-11-26 | 2021-11-26 | Resource sharing-aware online task unloading method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023092466A1 (en) | 2023-06-01 |
Family
ID=86538639
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/133572 (WO2023092466A1) | Resource sharing-aware online task unloading method | 2021-11-26 | 2021-11-26 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023092466A1 (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190366208A1 (en) * | 2018-06-01 | 2019-12-05 | At&T Intellectual Property I, L.P. | Virtualized gaming emulation as a network service |
US20200154459A1 (en) * | 2018-11-13 | 2020-05-14 | Verizon Patent And Licensing Inc. | Systems and methods for assignment of multi-access edge computing resources based on network performance indicators |
US11032164B1 (en) * | 2019-05-30 | 2021-06-08 | Cox Communications, Inc. | Edge-based cloud application acceleration |
US20210266834A1 (en) * | 2020-02-25 | 2021-08-26 | South China University Of Technology | METHOD OF MULTI-ACCESS EDGE COMPUTING TASK OFFLOADING BASED ON D2D IN INTERNET OF VEHICLES (IoV) ENVIRONMENT |
CN112637290A (en) * | 2020-12-14 | 2021-04-09 | 厦门宏泰科技研究院有限公司 | Global communication network system based on micro base station and edge calculation |
CN114265630A (en) * | 2021-11-26 | 2022-04-01 | 深圳大学 | Resource sharing perception online task unloading method |
Similar Documents
Publication | Title | |
---|---|---|
CN113950066B (en) | Single server part calculation unloading method, system and equipment under mobile edge environment | |
CN110543336B (en) | Edge calculation task unloading method and device based on non-orthogonal multiple access technology | |
CN113242568A (en) | Task unloading and resource allocation method in uncertain network environment | |
CN110505644B (en) | User task unloading and resource allocation joint optimization method | |
CN109918201B (en) | Task unloading control method and system | |
CN113268341B (en) | Distribution method, device, equipment and storage medium of power grid edge calculation task | |
US20180321981A1 (en) | System and method for self organizing data center | |
CN110519370B (en) | Edge computing resource allocation method based on facility site selection problem | |
CN114338504A (en) | Micro-service deployment and routing method based on network edge system | |
CN113645273B (en) | Internet of vehicles task unloading method based on service priority | |
Alghayadh et al. | Ubiquitous learning models for 5G communication network utility maximization through utility-based service function chain deployment | |
CN110493317B (en) | Method for processing cloud platform resource fragments and related equipment | |
CN111342883A (en) | Resource allocation method and device | |
Lin et al. | Distributed deep neural network deployment for smart devices from the edge to the cloud | |
CN114356585B (en) | Optimization method and device for mobile edge computing and unloading and computer equipment | |
Lorido-Botran et al. | ImpalaE: Towards an optimal policy for efficient resource management at the edge | |
Kiran et al. | Optimising resource allocation for virtual network functions in SDN/NFV‐enabled MEC networks | |
CN110719335A (en) | Resource scheduling method, system and storage medium under space-based cloud computing architecture | |
CN114265630A (en) | Resource sharing perception online task unloading method | |
JP7367257B1 (en) | Communication management device and communication management method | |
CN111694670B (en) | Resource allocation method, apparatus, device and computer readable medium | |
WO2023092466A1 (en) | Resource sharing-aware online task unloading method | |
CN111158893A (en) | Task unloading method, system, equipment and medium applied to fog computing network | |
CN116744364A (en) | DQN-based multi-edge node system joint calculation unloading and resource allocation method | |
CN115988572A (en) | Caching and routing strategy optimization and network content distribution method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21965200; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |