WO2017065629A1 - Task scheduler and method for scheduling a plurality of tasks - Google Patents
Task scheduler and method for scheduling a plurality of tasks
- Publication number
- WO2017065629A1 (PCT/RU2015/000664)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- task
- cores
- tasks
- slow
- fast
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/4887—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues involving deadlines, e.g. rate based, periodic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5094—Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/501—Performance criteria
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present invention relates to a task scheduler for scheduling a plurality of tasks on a multi-core processor and to a method for scheduling a plurality of tasks on a processor.
- the present invention also relates to a processor and to a computer-readable storage medium storing program code, the program code comprising instructions for carrying out the above method.
- Heterogeneous multi-core computing systems are widely used in networked mobile systems such as mobile phones, tablets and even subnotebook computers. These systems contain two types of processor cores: fast cores intended for high performance operation and low power cores intended for power aware operation.
- the first set is sometimes also called the hot set, the pool of hot cores or the pool of fast cores.
- the second set comprises low performance cores with low power consumption and is also called the cold set, the pool of cold cores or the pool of slow cores.
- Carrying out tasks on the set of slow cores instead of the set of fast cores reduces the overall power consumption. This is of particular importance for mobile systems because it prolongs battery life between recharges.
- the usual system software for operation of HMCCS comprises a compiler and a scheduler.
- the compiler is responsible for creation of programs running on such devices and the scheduler is responsible for loading of such devices during run-time.
- the main question in software development for these systems is which kind of core should be used for a given program block or task in an HMCCS. With modern compilers, this decision is made by the programmer.
- Another approach consists in automatically changing, at the scheduler level, the affiliation of tasks, processes, threads or program blocks with the sets of different core types.
- a lot of different techniques have been proposed.
- Various types of approaches for optimizing usage of HMCCS have been proposed.
- One direction is devoted to maximizing the performance of HMCCS, another is concerned with optimizing performance within an established power consumption budget, and so on.
- the objective of the present invention is to provide a task scheduler and a method for task scheduling, wherein the task scheduler and the method overcome one or more problems of the prior art.
- an objective of the present invention can include increasing the efficiency of using computational systems with heterogeneous multi-core (HMC) architectures which comprise at least two types of cores.
- a first aspect of the invention provides a task scheduler for scheduling a plurality of tasks on a multi-core processor comprising a set of slow cores and a set of fast cores, the task scheduler comprising:
- a timing unit configured to compare a slow core runtime of at least one candidate task that is not on the critical path with a fast core runtime of one or more critical path tasks
- a task assigning unit configured to assign the candidate task to the set of fast cores if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, and otherwise to assign the candidate task to the set of slow cores.
- a slow core runtime of a task is the runtime of the task on a core of the set of slow cores.
- the slow core runtime can be an estimate of the runtime on a slow core; in particular, it can be an estimated minimum or maximum runtime on a core of the set of slow cores.
- the fast core runtime can be defined correspondingly.
- each application is considered as a set of tasks and a special task diagram describes this set of tasks, the hierarchy of tasks in the set and the sequence of task execution.
- the one or more critical path tasks on one or more levels of the critical path that correspond to the candidate task can comprise the range of tasks on the critical path which are operating in the same time range with the candidate task.
- the method of the first aspect ensures that the execution of the candidate task does not prolong the runtime of the entire program.
- the method of the first aspect ensures that tasks are preferably assigned to the slow cores, thus saving energy consumption and leaving the set of fast cores available for the execution of more urgent tasks.
- the task scheduler further comprises:
- a graph construction unit configured to construct a task graph of the plurality of tasks
- a path finding unit configured to determine the critical path of the task graph.
- the task scheduler can have as an input the program code (which in embodiments can be in source code form or in compiled, binary form) and derive, using the graph construction unit and the path finding unit, the necessary information for scheduling the tasks of the program.
- the task scheduler of the first implementation can be configured to have as input a program code, which defines a plurality of tasks, and derive (as output) a scheduling for these tasks.
- a task graph can comprise a set of vertices connected by edges.
- the edges carry no latencies, because latencies are included in the durations of the corresponding tasks.
- in contrast with a plain task graph, the vertices of the task diagram carry several data items: $t_1(v)$, $t_2(v)$, $p_1(v)$ and $p_2(v)$.
- $t_1(v)$ denotes the duration of task $v$ on fast set cores
- $t_2(v)$ denotes the duration of task $v$ on slow set cores
- $p_1(v)$ denotes the power consumption of task $v$ on fast set cores
- $p_2(v)$ denotes the power consumption of task $v$ on slow set cores.
- the task scheduler can be configured to obtain the task graph and the critical path of the task graph as input from an external unit.
- the task graph can be determined during compilation of the program.
- the task scheduler further comprises a power computation unit configured to determine a power consumption gain of assigning a candidate task to the set of slow cores, wherein the task assigning unit is configured to assign candidate tasks in an order of decreasing power consumption gain.
- the task scheduler itself is configured to determine the power consumption gain. This means that the task scheduler can be independent of other devices and places fewer requirements on other units that provide information regarding the tasks to be executed.
- the power computation unit is configured to determine the power consumption gain as the difference between a power consumption of the candidate task on the set of fast cores and a power consumption of the candidate task on the set of slow cores.
- the task scheduler further comprises a preliminary execution unit configured to determine a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.
- the preliminary execution unit is configured to determine the slow core and/or fast core runtime before the execution of a program.
- the task scheduler can be configured to determine the slow core and/or fast core runtime of the tasks of a program during the installation of the program.
- a second aspect of the invention relates to a processor comprising a set of fast cores, a set of slow cores and a task scheduler according to the first aspect of the invention or one of its implementations.
- the task scheduler can be integrated into the processor.
- the task scheduler can be integrated into the hardware of the processor. This has the advantage that external components do not need to be modified in order to achieve the performance gain.
- a third aspect of the invention relates to a method for scheduling a plurality of tasks on a processor comprising a set of fast cores and a set of slow cores, the method comprising:
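- comparing a slow core runtime of a candidate task that is not on a critical path with a fast core runtime of one or more critical path tasks
- if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, assigning the task to the set of fast cores, otherwise assigning the task to the set of slow cores.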
- the methods according to the third aspect of the invention can be performed by the task scheduler according to the first aspect of the invention. Further features or implementations of the method according to the third aspect of the invention can perform the functionality of the task scheduler according to the first aspect of the invention and its different implementation forms.
- the method further comprises initial steps of constructing a task graph of the plurality of tasks and determining the critical path of the task graph.
- the method further comprises determining a power consumption gain of assigning the candidate task to the set of slow cores.
- the power consumption gain is determined as the difference between a power consumption of the candidate task on the set of fast cores and a power consumption of the candidate task on the set of slow cores.
- the method further comprises an initial step of determining a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.
- the preliminary runs are carried out for collecting information on task execution time and latency by executing the candidate task on different sets of cores, and wherein the slow core runtime and/or the fast core runtime are determined based on the collected information.
- the task scheduler can thus determine the required information by carrying out the preliminary runs. This can involve additional computation time, but can still lead to a reduction of overall computation time, in particular for long execution times of a program.
- a fourth aspect of the invention refers to a computer-readable storage medium storing program code, the program code comprising instructions for carrying out the method of the third aspect or one of the implementations of the third aspect.
- FIG 1 is a block diagram illustrating a task scheduler in accordance with an embodiment of the present invention
- FIG 2 is a flow chart illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention
- FIG 3 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention
- FIG. 4 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention
- FIG. 5 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention
- FIG. 6 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention
- FIG. 7 is a flow chart illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention.
- FIG 1 is a block diagram illustrating a task scheduler 100 in accordance with an embodiment of the present invention.
- the task scheduler 100 comprises a timing unit 110 and a task assigning unit 120. Further, the task scheduler 100 can optionally, as indicated with dashed lines in FIG 1, comprise a graph construction unit 130, a path finding unit 140, a power computation unit 150, and a preliminary computation unit 160.
- the task scheduler 100 can be implemented as part of a processor (not shown in FIG 1) or can be implemented in a hardware device that is located outside the processor.
- FIG 2 is a flow chart illustrating a method 200 for scheduling a plurality of tasks in accordance with a further embodiment of the present invention.
- the method 200 comprises a step 210 of comparing a slow core runtime of a candidate task that is not on the critical path with a fast core runtime of one or more critical path tasks on one or more levels of the critical path that correspond to the candidate task.
- the method comprises a further step 220 of, if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, assigning the task to the set of fast cores, otherwise assigning the task to the set of slow cores.
- the method optionally further comprises three initial or preliminary steps: a first initial step 202 of constructing a task graph of the plurality of tasks, a second initial step 204 of determining a critical path of the task graph, and a third initial step 206 of determining a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.
- the method steps are carried out in the order as shown in FIG. 2. However, in other embodiments of the invention, the method steps can be carried out in a different order.
- FIG 3 is a schematic diagram which illustrates the problem addressed by the task scheduler and method of the present invention.
- Shown in FIG 3 are a plurality of tasks, a first, second and third task 310, 320, 330 that are on a critical path 305, and a candidate task 340.
- the tasks 310, 320, 330 are allocated to a set of fast cores 302.
- for the candidate task 340, the task scheduler should decide whether to assign it to the set of fast cores 302 or to a set of slow cores 304.
- the time of program execution corresponds to the longest path (critical path) through the task graph, evaluated on task execution times.
- the performance of the program is the inverse value of the program execution time.
- To maximize the performance of the program means minimizing the program execution time or minimizing the critical path of the task diagram.
- a minimal value of a critical path corresponds to execution of the tasks of the critical path on cores of the fast set. All other tasks (not included in the critical path) should migrate among the sets to minimize power consumption (this option is denoted by the "?" sign in the diagram shown in FIG 3).
- a first task 410, a fourth task 440, a fifth task 450 and a sixth task 460 are located on a critical path, indicated by dashed line 405, wherein these tasks are assigned to a set of fast or hot cores 402.
- a second task 420 and a third task 430 are located outside of the critical path, but on the same level as the fourth task 440, indicated as "Level 2" in FIG. 4.
- Second task 420 and third task 430 are considered as candidate tasks in the following.
- A, B, C, D, and E denote the first task, the second task, the third task, the fourth task, and the fifth task, respectively.
- the second task B and the third task C can be affiliated with the set of slow cores, without exceeding the total runtime, and hence, without the loss of performance if
- the first inequality according to task diagram is valid only for tasks of the same level, namely Level 2, but the second inequality according to the task diagram shown on FIG 4 is valid for the tasks of the range of levels, namely Level 2 and Level 3, because the second task 420 (task B) operates not only on one level, but on a few.
- the second and third tasks 420, 430 should be affiliated with the set of slow cores.
- FIG 5 an example is shown, where a plurality of tasks 510, 540, 550, and
- the migration of the third task 530 to the slow set of cores is available, but migration of the second task 520 is not available. In this case the power consumption will decrease by the following value:
- $p_{\text{profit}}(\text{Level 2}) = p_1(C) - p_2(C)$.
- FIG 6 shows a similar example, where a plurality of tasks 610, 640, 650, and
- $p_{\text{profit}}(\text{Level 2}) = p_1(B) - p_2(B) + p_1(C) - p_2(C)$
- the order of migration is not essential to get a better result in terms of minimization of power consumption. But for example, it is better that, if
- the second task 620 migrates first; otherwise, the third task 630 migrates first.
- FIG 7 is a flow chart of an example method for migrating tasks between the set of fast cores and the set of slow cores, wherein the task B belongs to only one fixed level.
- a list of candidate tasks L is provided to the task scheduler.
- the task scheduler sorts the list L in order of decreasing power consumption profit (computed e.g. as $p_1 - p_2$).
- the result is stored in an ordered list $L_1$.
- a task D which is on a (e.g. previously determined) critical path is taken from the list and, in step 708, put into the "hot pool", i.e. assigned to the fast set of cores.
- in step 710, it is checked whether D is the last task on the data layer. If so, there are no more tasks to process and the method stops in step 722.
- otherwise, the method proceeds to step 712 and takes task B (the first task in the ordered list $L_1$).
- in step 714, the condition is checked.
- if the condition is fulfilled, the method proceeds with step 716 and puts B into the "cold pool", i.e., assigns it to the set of slow cores. If the condition is not fulfilled, the method proceeds to step 718 and task B is put into the "hot pool", i.e., it is assigned to the set of fast cores.
- in step 720, it is checked whether task B is the last task in the ordered list $L_1$. If so, the method ends in step 722. Otherwise, the method continues with step 724, taking as task B the next task in the ordered list $L_1$.
- D, E, ..., S is the range of tasks of the critical path which operate in the same time range as task B.
- embodiments of the present invention comprise mapping tasks to sets of cores. Preliminary runs can be performed to collect information on the execution times on the different core types and on the corresponding power consumption. After that it is possible to construct the task diagram, evaluate the critical path of this diagram corresponding to the maximal performance, split the diagram into levels and, on each level, solve the problem of migrating the tasks that do not belong to the critical path. Potentially, these can be assigned to the set of slow cores, thus reducing overall power consumption.
- the method can comprise further steps.
- a heterogeneous multi-core computing system consists of cores $c_1, c_2, \ldots, c_k$ of the fast set type (with high energy consumption and high performance) and cores $c_{k+1}, c_{k+2}, \ldots, c_n$ of the slow set type (with low energy consumption and low performance), $n$ cores in total. Now let us consider how to bind tasks in complicated software to processor cores of the different sets:
- the task diagram is constructed.
- An evaluation of the power consumption is provided according to the affiliation of the tasks with the sets of cores. A gain in power consumption can be reached even if only one task is affiliated with the slow set cores. If many tasks are affiliated with slow set cores, the power consumption profit will be substantially greater.
- Effects of a method in accordance with the present invention can include that a HMCCS performance is improved and/or the power consumption is decreased.
- a method can solve an optimization problem in order to minimize total completion time of each particular application. This can include finding an optimal mapping of tasks to cores that will make completion time reach its potential minimum and simultaneously decrease the power consumption of HMCCS as much as it is possible.
- Embodiments of the present invention can be used in a system with signal processors of SoC type in which the same software is running permanently. Thus, a particularly high power saving is achieved.
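The FIG 7 flow described above (critical path tasks go to the fast set, candidates are sorted by decreasing power consumption profit and then tested one by one) can be illustrated with a minimal Python sketch. The `Task` class, the function name `schedule_level` and the numeric values are hypothetical. The condition of step 714 is not reproduced in this text, so the sketch assumes it has the form $t_2(B) \le t_1(D) + t_1(E) + \ldots + t_1(S)$, which is one reading of the time-range description rather than the confirmed formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    t1: float  # duration on fast set cores
    t2: float  # duration on slow set cores
    p1: float  # power consumption on fast set cores
    p2: float  # power consumption on slow set cores


def schedule_level(critical, candidates):
    """Assign tasks of one level range to the 'fast' or 'slow' set.

    Returns (assignment, profit), where profit is the summed p1 - p2
    of every candidate that was moved to the slow set.
    """
    # Critical path tasks always go to the fast set.
    assignment = {task.name: "fast" for task in critical}
    # ASSUMPTION: the time window is the sum of fast-core durations of the
    # critical path tasks that run in the same time range as the candidate.
    window = sum(task.t1 for task in critical)
    # Candidates are handled in order of decreasing power profit p1 - p2.
    ordered = sorted(candidates, key=lambda t: t.p1 - t.p2, reverse=True)
    profit = 0.0
    for task in ordered:
        if task.t2 <= window:
            assignment[task.name] = "slow"
            profit += task.p1 - task.p2
        else:
            assignment[task.name] = "fast"
    return assignment, profit


# Hypothetical numbers: D and E are on the critical path, B and C are candidates.
critical = [Task("D", 4.0, 8.0, 10.0, 3.0), Task("E", 3.0, 6.0, 9.0, 3.0)]
candidates = [Task("C", 5.0, 9.0, 7.0, 3.0), Task("B", 3.0, 6.0, 8.0, 2.0)]
assignment, profit = schedule_level(critical, candidates)
print(assignment, profit)
```

With these numbers only B fits into the window of 7.0 time units, so the profit reduces to $p_1(B) - p_2(B)$, mirroring the single-migration case of FIG 5 with the roles of B and C swapped.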
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Devices For Executing Special Programs (AREA)
- Power Sources (AREA)
Abstract
The present invention discloses a task scheduler for scheduling a plurality of tasks on a multi-core processor comprising a set of slow cores and a set of fast cores, the task scheduler comprising: a timing unit configured to compare a slow core runtime of at least one candidate task that is not on the critical path with a fast core runtime of one or more critical path tasks on one or more levels of the critical path that correspond to the candidate task, and a task assigning unit configured to assign the candidate task to the set of fast cores if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, and otherwise to assign the candidate task to the set of slow cores.
Description
TASK SCHEDULER AND METHOD FOR SCHEDULING A PLURALITY OF TASKS
TECHNICAL FIELD
The present invention relates to a task scheduler for scheduling a plurality of tasks on a multi-core processor and to a method for scheduling a plurality of tasks on a processor.
The present invention also relates to a processor and to a computer-readable storage medium storing program code, the program code comprising instructions for carrying out the above method.
BACKGROUND
Heterogeneous multi-core computing systems (HMCCS) are widely used in networked mobile systems such as mobile phones, tablets and even subnotebook computers. These systems contain two types of processor cores: fast cores intended for high performance operation and low power cores intended for power aware operation. The first set is sometimes also called the hot set, the pool of hot cores or the pool of fast cores. The second set comprises low performance cores with low power consumption and is also called the cold set, the pool of cold cores or the pool of slow cores.
Carrying out tasks on the set of slow cores instead of the set of fast cores reduces the overall power consumption. This is of particular importance for mobile systems because it prolongs battery life between recharges. The usual system software for operation of HMCCS comprises a compiler and a scheduler. The compiler is responsible for creating the programs that run on such devices and the scheduler is responsible for loading such devices during run-time. The main question in software development for these systems is which kind of core should be used for a given program block or task in an HMCCS. With modern compilers, this decision is made by the programmer.
Another approach consists in automatically changing, at the scheduler level, the affiliation of tasks, processes, threads or program blocks with the sets of different core types. In this context, a lot of different techniques have been proposed. Various types of approaches for optimizing usage of HMCCS have been proposed. One direction is devoted to maximizing the performance of HMCCS, another is concerned with optimizing performance within an established power consumption budget, and so on. However, there is still a need for a more efficient execution of programs on HMCCS.
SUMMARY OF THE INVENTION
The objective of the present invention is to provide a task scheduler and a method for task scheduling, wherein the task scheduler and the method overcome one or more problems of the prior art.
In particular, an objective of the present invention can include increasing the efficiency of using computational systems with heterogeneous multi-core (HMC) architectures which comprise at least two types of cores.
A first aspect of the invention provides a task scheduler for scheduling a plurality of tasks on a multi-core processor comprising a set of slow cores and a set of fast cores, the task scheduler comprising:
a timing unit configured to compare a slow core runtime of at least one candidate task that is not on the critical path with a fast core runtime of one or more critical path tasks, and
a task assigning unit configured to assign the candidate task to the set of fast cores if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, and otherwise to assign the candidate task to the set of slow cores.
In general, a slow core runtime of a task is the runtime of the task on a core of the set of slow cores. The slow core runtime can be an estimate of the runtime on a slow core; in particular, it can be an estimated minimum or maximum runtime on a core of the set of slow cores. The fast core runtime can be defined correspondingly.
In embodiments of the invention, each application is considered as a set of tasks and a special task diagram describes this set of tasks, the hierarchy of tasks in the set and the sequence of task execution.
Each task diagram is divided into levels in hierarchical order. Each lower level corresponds to tasks which depend only on data of higher level tasks. The runtimes of tasks are compared with each other on a same level basis. That is, the execution time of a task not belonging to the critical path is compared with the execution times of tasks on the critical path within the same level in the task diagram. In other words, the timing unit is configured to compare a slow core runtime of at least one candidate task that is not on the critical path with a fast core runtime of one or more critical path tasks on one or more levels of the critical path that correspond to the candidate task.
The one or more critical path tasks whose runtimes are compared with the runtime of the candidate task, which is not on the critical path, are the tasks on the same levels of the critical path as the candidate task. In other words, the one or more critical path tasks are on one or more levels of the critical path, the levels corresponding to the level of the candidate task.
The one or more critical path tasks on one or more levels of the critical path that correspond to the candidate task can comprise the range of tasks on the critical path which are operating in the same time range with the candidate task.
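As a rough illustration of the level structure, the following Python sketch assigns each task the level one below its deepest predecessor, so that a task depends only on data from higher levels. The function name `assign_levels` and the example graph are hypothetical, and the patent's notion of a task spanning several levels in time is not modeled here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def assign_levels(predecessors):
    """Map each task to a level number, with Level 1 at the top.

    `predecessors` maps a task name to the set of tasks whose data it needs.
    """
    levels = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(predecessors).static_order():
        deps = predecessors.get(task, ())
        levels[task] = 1 + max((levels[d] for d in deps), default=0)
    return levels


# Hypothetical task diagram: A feeds B, C and D; D feeds E; B, C and E feed F.
graph = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"A"},
    "E": {"D"},
    "F": {"B", "C", "E"},
}
levels = assign_levels(graph)
print(levels)
```

In this example B, C and D share Level 2, so a candidate such as B or C would be compared against the critical path task on that level.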
By assigning the candidate task to the set of fast cores if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, the method of the first aspect ensures that the execution of the candidate task does not prolong the runtime of the entire program.
On the other hand, by assigning the candidate task to the set of slow cores if the slow core runtime of the candidate task is no longer than a fast core runtime of the one or more critical path tasks, the method of the first aspect ensures that tasks are preferably assigned to the slow cores, thus saving energy consumption and leaving the set of fast cores available for the execution of more urgent tasks.
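The assignment rule of the first aspect can be written down in a few lines. This is a minimal sketch: the function name `choose_core_set` is hypothetical, and it assumes that "a fast core runtime of the one or more critical path tasks" means the sum of their fast core runtimes over the corresponding levels.

```python
def choose_core_set(candidate_slow_runtime, critical_fast_runtimes):
    """Return 'fast' or 'slow' for a candidate task off the critical path."""
    # ASSUMPTION: the comparison value is the total fast-core runtime of the
    # critical path tasks on the levels that correspond to the candidate.
    window = sum(critical_fast_runtimes)
    # A candidate that would outlast the window on a slow core must run fast,
    # otherwise it would prolong the runtime of the entire program.
    return "fast" if candidate_slow_runtime > window else "slow"


print(choose_core_set(6.0, [4.0, 3.0]))
print(choose_core_set(9.0, [4.0, 3.0]))
```

A candidate whose slow core runtime exactly equals the window is sent to the slow set, matching the "no longer than" wording above.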
In a first implementation of the apparatus according to the first aspect, the task scheduler further comprises:
a graph construction unit configured to construct a task graph of the plurality of tasks, and
a path finding unit configured to determine the critical path of the task graph.
Thus, the task scheduler can have as an input the program code (which in embodiments can be in source code form or in compiled, binary form) and derive, using the graph construction unit and the path finding unit, the necessary information for scheduling the tasks of the program.
In other words, the task scheduler of the first implementation can be configured to have as input a program code, which defines a plurality of tasks, and derive (as output) a scheduling for these tasks.
A task graph can comprise a set of vertices connected by edges. In a preferred embodiment, the edges carry no latencies, because latencies are included in the durations of the corresponding tasks. Also, in contrast with a plain task graph, the vertices of the task diagram carry several data items: $t_1(v)$, $t_2(v)$, $p_1(v)$ and $p_2(v)$. Here $t_1(v)$ denotes the duration of task $v$ on fast set cores, $t_2(v)$ denotes the duration of task $v$ on slow set cores, $p_1(v)$ denotes the power consumption of task $v$ on fast set cores, and $p_2(v)$ denotes the power consumption of task $v$ on slow set cores.
In alternative embodiments, also in accordance with the present invention, the task scheduler can be configured to obtain the task graph and the critical path of the task graph as input from an external unit. For example, the task graph can be determined during compilation of the program.
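One way a path finding unit could determine the critical path is a longest-path search over the task graph, with the task durations as vertex weights and no weights on the edges. The sketch below is an illustration under assumptions, not the patented implementation: the function name `critical_path`, the example graph and the choice of fast-core durations $t_1(v)$ as weights are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(predecessors, duration):
    """Return (path, length) of the longest path through a task graph.

    `predecessors` maps a task to the set of tasks it depends on and
    `duration` maps a task to its runtime (latencies already included).
    """
    finish = {}     # earliest finish time of each task
    best_pred = {}  # predecessor on the longest path into each task
    for task in TopologicalSorter(predecessors).static_order():
        deps = predecessors.get(task, ())
        prev = max(deps, key=lambda d: finish[d], default=None)
        best_pred[task] = prev
        finish[task] = duration[task] + (finish[prev] if prev is not None else 0.0)
    # The critical path ends at the task that finishes last.
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    path.reverse()
    return path, length


# Hypothetical graph and fast-core durations t1(v).
graph = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"A"}, "E": {"D"}, "F": {"B", "C", "E"}}
t1 = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 4.0, "E": 3.0, "F": 2.0}
path, length = critical_path(graph, t1)
print(path, length)
```

Tasks that do not appear in the returned path (B and C here) are the candidates for migration to the slow set.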
In a second implementation of the apparatus according to the first aspect, the task scheduler further comprises a power computation unit configured to determine a power consumption gain of assigning a candidate task to the set of slow cores, wherein the task assigning unit is configured to assign candidate tasks in an order of decreasing power consumption gain.
Thus, the task scheduler itself is configured to determine the power consumption gain. This means that the task scheduler can be independent of other devices and places fewer requirements on other units that provide information regarding the tasks to be executed.
In a third implementation of the apparatus according to the first aspect, the power computation unit is configured to determine the power consumption gain as the difference between a power consumption of the candidate task on the set of fast cores and a power consumption of the candidate task on the set of slow cores.
This represents a particularly simple and efficient way of computing a power consumption gain.
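The gain computation and the resulting processing order can be sketched as follows. The function name `order_by_power_gain`, the input layout and the numbers are hypothetical and serve only to show the difference $p_1(v) - p_2(v)$ being used as a sort key.

```python
def order_by_power_gain(tasks):
    """Order candidate tasks by decreasing power consumption gain.

    `tasks` maps a task name to a (p_fast, p_slow) pair; the gain of moving
    a task to the slow set is p_fast - p_slow.
    """
    gains = {name: p_fast - p_slow for name, (p_fast, p_slow) in tasks.items()}
    ordered = sorted(gains, key=gains.get, reverse=True)
    return ordered, gains


# Hypothetical power figures for three candidate tasks.
ordered, gains = order_by_power_gain({"C": (7.0, 3.0), "G": (5.0, 4.5), "B": (8.0, 2.0)})
print(ordered, gains)
```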
In a fourth implementation of the apparatus according to the first aspect, the task scheduler further comprises a preliminary execution unit configured to determine a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.
This represents a practical way of computing the power consumption gain. In embodiments of the invention, the preliminary execution unit is configured to determine the slow core and/or fast core runtime before the execution of a program. For example, the task scheduler can be configured to determine the slow core and/or fast core runtime of the tasks of a program during the installation of the program.
A second aspect of the invention relates to a processor comprising a set of fast cores, a set of slow cores and a task scheduler according to the first aspect of the invention or one of its implementations.
According to this aspect, the task scheduler can be integrated into the processor. For example, the task scheduler can be integrated into the hardware of the processor. This has the advantage that external components do not need to be modified in order to achieve the performance gain.
A third aspect of the invention relates to a method for scheduling a plurality of tasks on a processor comprising a set of fast cores and a set of slow cores, the method comprising:
comparing a slow core runtime of a candidate task that is not on a critical path with a fast core runtime of one or more critical path tasks, and
if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, assigning the task to the set of fast cores, otherwise assigning the task to the set of slow cores.
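The two method steps can be sketched as a single decision function. Summing the fast core runtimes when more than one critical path task is involved is an assumption of this sketch, motivated by the multi-level inequality given later in the description; the function name is hypothetical.

```python
def assign_core_set(t_slow_candidate, t_fast_critical):
    """Decide where a non-critical candidate task should run.

    t_slow_candidate: slow core runtime of the candidate task.
    t_fast_critical:  fast core runtimes of the critical path task(s)
                      the candidate is compared against.
    """
    budget = sum(t_fast_critical)
    # Longer than the critical path budget: the candidate would delay the program.
    return "fast" if t_slow_candidate > budget else "slow"
```

With this rule a candidate is moved to the slow set only when doing so cannot lengthen the critical path, so performance is unchanged while power consumption can drop.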
The methods according to the third aspect of the invention can be performed by the task scheduler according to the first aspect of the invention. Further features or implementations of the method according to the third aspect of the invention can perform the functionality of the task scheduler according to the first aspect of the invention and its different implementation forms.
In a first implementation of the method of the third aspect, the method further comprises initial steps of:
constructing a task graph of the plurality of tasks, and
determining the critical path of the task graph.
Thus, it is possible that the task graph is not determined in advance, but is determined e.g. by the task scheduler. If the structure of the task graph depends e.g. on decisions that are made after compile time, the method can determine the task graph at a later point, e.g. at runtime.
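One possible way to determine the critical path is a longest-path search over the task graph in topological order, weighted by fast core durations. The sketch below assumes the graph is acyclic and given as a successor map; the graph, the durations and the function name are hypothetical, and the application does not prescribe this particular algorithm.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(successors, t_fast):
    """Return (path, length) of the longest path, weighted by fast core durations."""
    preds = {v: set() for v in successors}
    for v, outs in successors.items():
        for w in outs:
            preds.setdefault(w, set()).add(v)
    finish, best_pred = {}, {}
    # static_order() yields every task after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        start, best = 0.0, None
        for u in preds[v]:
            if finish[u] > start:
                start, best = finish[u], u
        finish[v] = start + t_fast[v]
        best_pred[v] = best
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:  # walk back from the latest-finishing task
        path.append(node)
        node = best_pred[node]
    return path[::-1], length


# Hypothetical six-task graph with fast core durations t1(v).
successors = {"A": ["B", "C", "D"], "B": ["F"], "C": ["E"], "D": ["E"], "E": ["F"], "F": []}
t_fast = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 4.0, "E": 3.0, "F": 1.0}
path, length = critical_path(successors, t_fast)
```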
In a second implementation of the method of the third aspect, the method further comprises:
for at least two candidate tasks: determining a power consumption gain of assigning the candidate task to the set of slow cores, and
assigning the at least two tasks in an order of decreasing power consumption gain.
In a third implementation of the method of the third aspect, the power consumption gain is determined as the difference between a power consumption of the candidate task on the set of fast cores and a power consumption of the candidate task on the set of slow cores.
In a fourth implementation of the method of the third aspect, the method further comprises an initial step of determining a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.
In a fifth implementation of the method of the third aspect, the preliminary runs are carried out for collecting information on task execution time and latency by executing the candidate task on different sets of cores, and wherein the slow core runtime and/or the fast core runtime are determined based on the collected information.
If the information on task execution time and latency is not provided (e.g. by the compiler), the task scheduler can thus determine the required information by carrying out the preliminary runs. This can involve additional computation time, but can still lead to a reduction of overall computation time, in particular for long execution times of a program.
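A preliminary run can be approximated in software by timing repeated executions of a task. The sketch below is an assumption-laden illustration: `run_task` is a hypothetical callable that is presumed to execute already on the desired set of cores (pinning to a core set is platform specific and not shown), and taking the median of the samples is a choice of this sketch, not of the application.

```python
import statistics
import time


def measure_runtime(run_task, runs=5, clock=time.perf_counter):
    """Return the median wall-clock duration of `runs` executions of run_task()."""
    samples = []
    for _ in range(runs):
        start = clock()
        run_task()
        samples.append(clock() - start)
    return statistics.median(samples)
```

Calling `measure_runtime` once with the task running on the fast set and once on the slow set would yield estimates for t1(v) and t2(v). The `clock` parameter exists only so that the measurement can be tested with a deterministic clock.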
A fourth aspect of the invention refers to a computer-readable storage medium storing program code, the program code comprising instructions for carrying out the method of the third aspect or one of the implementations of the third aspect.
BRIEF DESCRIPTION OF THE DRAWINGS
To illustrate the technical features of embodiments of the present invention more clearly, the accompanying drawings provided for describing the embodiments are introduced briefly in the following. The accompanying drawings in the following description are merely some embodiments of the present invention, but modifications on these embodiments are possible without departing from the scope of the present invention as defined in the claims.
FIG 1 is a block diagram illustrating a task scheduler in accordance with an embodiment of the present invention,
FIG 2 is a flow chart illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention,
FIG 3 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention,
FIG. 4 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention,
FIG. 5 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention,
FIG. 6 is a schematic diagram illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention, and
FIG. 7 is a flow chart illustrating a method for scheduling a plurality of tasks in accordance with a further embodiment of the present invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS
FIG 1 is a block diagram illustrating a task scheduler 100 in accordance with an embodiment of the present invention. The task scheduler 100 comprises a timing unit 110 and a task assigning unit 120. Further, the task scheduler 100 can optionally, as indicated with dashed lines in FIG 1, comprise a graph construction unit 130, a path finding unit 140, a power computation unit 150, and a preliminary execution unit 160.
In embodiments of the invention, the task scheduler 100 can be implemented as part of a processor (not shown in FIG 1) or can be implemented in a hardware device that is located outside the processor.
FIG 2 is a flow chart illustrating a method 200 for scheduling a plurality of tasks in accordance with a further embodiment of the present invention.
The method 200 comprises a step 210 of comparing a slow core runtime of a candidate task that is not on the critical path with a fast core runtime of one or more critical path tasks on one or more levels of the critical path that correspond to the candidate task.
The method comprises a further step 220 of, if the slow core runtime of the candidate task is longer than a fast core runtime of the one or more critical path tasks, assigning the task to the set of fast cores, otherwise assigning the task to the set of slow cores.
As shown with dashed lines in FIG. 2, the method optionally further comprises three initial or preliminary steps: a first initial step 202 of constructing a task graph of the plurality of tasks, a second initial step 204 of determining a critical path of the task graph, and a third initial step 206 of determining a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.
In embodiments of the invention, the method steps are carried out in the order as shown in FIG. 2. However, in other embodiments of the invention, the method steps can be carried out in a different order.
FIG 3 is a schematic diagram which illustrates the problem addressed by the task scheduler and method of the present invention.
Shown in FIG 3 are a plurality of tasks, a first, second and third task 310, 320, 330 that are on a critical path 305, and a candidate task 340. The tasks 310, 320, 330 are allocated to a set of fast cores 302. For the candidate task 340, the task scheduler should decide whether to assign it to the set of fast cores 302 or a set of slow cores 304.
Here ti is the time of execution on cores of i-type (wherein i is 1 or 2) and pi is the power consumption on cores of i-type (i=1 for fast cores and i=2 for slow cores).
The time of program execution corresponds to the longest path (critical path) through the task graph, evaluated on task execution times. The performance of the program is the inverse of the program execution time. Maximizing the performance of the program therefore means minimizing the program execution time, i.e. minimizing the critical path of the task diagram. The minimal length of the critical path is obtained when the tasks of the critical path are executed on cores of the fast set. All other tasks (not included in the critical path) should migrate among the sets so as to minimize power consumption (this option is denoted by the "?" sign in the diagram shown in FIG 3). Now let us consider the problem of maximizing performance and minimizing power consumption. We will solve this problem step by step. In the first step we construct the solution with maximal performance, and in the second step, keeping the maximal performance value, we minimize power consumption.
Let us assume we found the critical path K, e.g. as determined by a critical path finding unit as described above. After that, all tasks are divided or organized into levels with respect to tasks of the critical path. Each lower level corresponds to tasks which are dependent on data only of higher level tasks. This is illustrated by the example shown in FIG 4.
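One plausible reading of the leveling is that a task sits one level below its deepest predecessor, with tasks that have no predecessors on Level 1. The sketch below implements that reading; the rule itself, the function name and the example graph are assumptions, since the text does not fix an exact leveling procedure.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def assign_levels(predecessors):
    """Map each task to its level: 1 for source tasks, else 1 + deepest predecessor."""
    level = {}
    for v in TopologicalSorter(predecessors).static_order():
        level[v] = 1 + max((level[u] for u in predecessors.get(v, ())), default=0)
    return level


# Hypothetical graph: B, C and D depend only on A; E depends on C and D; F on B and E.
predecessors = {"A": [], "B": ["A"], "C": ["A"], "D": ["A"], "E": ["C", "D"], "F": ["B", "E"]}
levels = assign_levels(predecessors)
```

Under this rule the non-critical tasks B and C land on the same level as the critical path task D, which is the situation the migration check is applied to.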
In FIG. 4, first task 410, fourth task 440, fifth task 450 and sixth task 460 are located on a critical path, indicated by dashed line 405, wherein the tasks are assigned to a set of fast or hot cores, 402. A second task 420 and a third task 430 are located outside of the critical path, but on the same level as the fourth task 440, indicated as "Level 2" in FIG. 4. Second task 420 and third task 430 are considered as candidate tasks in the following.
When searching for the critical path 405, all operations are evaluated on the set of fast cores, because only in this way can one obtain the maximal performance of the HMCCS under consideration.
Then the critical path is fixed and all tasks belonging to it are affiliated with the set of fast cores. Let us consider tasks on intermediate levels that do not belong to the critical path. Since the goal now is to minimize power consumption, it is checked:
Is it possible to affiliate the second task 420 and the third task 430 with the set of slow cores without extending the total runtime?
In the following equations, A, B, C, D, and E denote the first task, the second task, the third task, the fourth task, and the fifth task, respectively.
The second task B and the third task C can be affiliated with the set of slow cores, without exceeding the total runtime, and hence, without the loss of performance if
t2(C) ≤ t1(D)
and
t2(B) < t1(D) + t1(E).
The first inequality, according to the task diagram, applies only to tasks of the same level, namely Level 2, whereas the second inequality, according to the task diagram shown in FIG 4, applies to tasks over a range of levels, namely Level 2 and Level 3, because the second task 420 (task B) operates not on one level only, but on several.
In the case presented in FIG 4, the second and third tasks 420, 430 should be affiliated with the set of slow cores.
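For illustration, assume hypothetical runtimes (not taken from the application) of t1(D) = 4, t1(E) = 3, t2(C) = 3.5 and t2(B) = 6, in arbitrary time units. Both conditions for FIG 4 then hold:

```latex
\begin{align*}
t_2(C) = 3.5 &\le t_1(D) = 4, \\
t_2(B) = 6 &< t_1(D) + t_1(E) = 4 + 3 = 7 .
\end{align*}
```

With these assumed values neither task would finish later on the slow cores than the critical path tasks it runs alongside, so the total runtime is unchanged.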
In FIG 5 an example is shown, where a plurality of tasks 510, 540, 550, and 560 are located on a critical path 505 and for a second and third task 520, 530 it needs to be decided whether to assign these to the set of slow cores 504 or the set of fast cores 502. The potential placement of the second and third task on the set of cold cores is indicated with reference numbers 520' and 530'.
In the example of FIG 5, the migration of the third task 530 to the slow set of cores is available, but migration of the second task 520 is not available. In this case
and
t2(C) > t1(D) .
Since the critical path does not change, the performance keeps its maximal value.
Moreover, migration of any task into the slow set decreases the power consumption. In this case, the power consumption will decrease by the following value
Pprofit(Level2) = p1(C) - p2(C).
FIG 6 shows a similar example, where a plurality of tasks 610, 640, 650, and 660 are located on a critical path 605 and for a second and third task 620, 630 it needs to be decided whether to assign these to the set of slow cores 604 or the set of fast cores 602. The potential placement of the second and third task on the set of cold cores is indicated with reference numbers 620' and 630'. In this case the following inequalities are valid
t2(B) ≤ t1(D)
and
t2(C) ≤ t1(D) + t1(E).
Here the second inequality is also valid in the range of levels 2 and 3.
In this case the effect of decreasing power consumption will be greater than in the previous example and be equal to the following value
Pprofit(Level2) = p1(B) - p2(B) + p1(C) - p2(C)
The order of migration is not essential for minimizing power consumption. However, it is preferable, for example, that if
p1(B) - p2(B) ≥ p1(C) - p2(C),
then the second task 620 migrates first; otherwise the third task 630 migrates first.
FIG 7 is a flow chart of an example method for migrating tasks, wherein the task B belongs to only one fixed level and is to be placed on either the set of fast cores or the set of slow cores.
In a first step 702, a list of candidate tasks L is provided to the task scheduler. In a second step 704, the task scheduler sorts the list L in order of decreasing power consumption profit (computed e.g. as p1 - p2). The result is stored in an ordered list L1.
In a third step 706, a task D, which is on a (e.g. previously determined) critical path is taken from the list and, in step 708, put "into the hot pull", i.e. assigned to the fast set of cores.
In step 710, it is checked whether D is the last task on the data layer. If so, there are no more tasks to process and the method stops in step 722.
If there are more tasks to process, the method proceeds in step 712 and takes task B (the first task in ordered list L1). In step 714, the condition
t1(D) ≥ t2(B)
is checked. If the condition is fulfilled, the method proceeds with step 716 and puts B into the "cold pull", i.e., assigns it to the set of slow cores. If the condition is not fulfilled, the method proceeds in step 718 and task B is put into the "hot pull", i.e., it is assigned to the set of fast cores.
In step 720 it is checked whether task B is the last task in the ordered list L1. If so, the method ends in step 722. Otherwise, the method continues with step 724, taking task B as the next task in the ordered list L1.
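The flow of FIG 7 for a single level can be sketched as follows. The function name, the tuple layout and the numeric values are hypothetical; the step numbers in the comments refer to the flow chart as described above, and the mapping of an empty candidate list to step 710 is an interpretation made for this sketch.

```python
def schedule_level(critical, candidates):
    """Assign the tasks of one level to the hot (fast) or cold (slow) pool.

    critical:   (name, t_fast) of the critical path task D on this level.
    candidates: dict mapping a task name to (t_slow, p_fast, p_slow).
    """
    d_name, d_t_fast = critical
    hot, cold = [d_name], []  # steps 706 and 708: D always goes to the fast set
    # Step 704: order candidates by decreasing power consumption profit p1 - p2.
    ordered = sorted(
        candidates,
        key=lambda name: candidates[name][1] - candidates[name][2],
        reverse=True,
    )
    # Step 710: with no candidates the loop is skipped and the method stops.
    for name in ordered:  # steps 712, 720 and 724
        t_slow = candidates[name][0]
        if d_t_fast >= t_slow:  # step 714: t1(D) >= t2(B)
            cold.append(name)  # step 716
        else:
            hot.append(name)  # step 718
    return hot, cold


# Illustrative values: C fits within t1(D) on the slow set, B does not.
hot, cold = schedule_level(("D", 4.0), {"B": (5.0, 3.0, 0.5), "C": (3.5, 5.0, 1.0)})
```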
If task B belongs to several levels, the control inequality in this algorithm should be replaced by the following more complicated inequality:
t1(D) + t1(E) + ... + t1(S) > t2(B).
Here D, E, ..., S are the tasks of the critical path that operate in the same time range as task B.
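The multi-level check can be sketched by summing the fast core durations of the critical path tasks over the levels that the candidate spans. The sketch assumes exactly one critical path task per level and represents the span as a pair of level numbers; both are assumptions, as are the names and values used.

```python
def can_run_on_slow(t_slow_b, span, critical_by_level, t_fast):
    """Check t1(D) + t1(E) + ... + t1(S) > t2(B) for a candidate spanning several levels.

    span:              (first_level, last_level) occupied by the candidate, inclusive.
    critical_by_level: dict mapping a level number to its critical path task.
    t_fast:            dict mapping a task name to its fast core duration t1.
    """
    first, last = span
    budget = sum(t_fast[critical_by_level[lvl]] for lvl in range(first, last + 1))
    return budget > t_slow_b  # strict comparison, as written in the inequality above


critical_by_level = {1: "A", 2: "D", 3: "E", 4: "F"}
t_fast = {"A": 2.0, "D": 4.0, "E": 3.0, "F": 1.0}
fits = can_run_on_slow(6.0, (2, 3), critical_by_level, t_fast)  # 4.0 + 3.0 > 6.0
```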
The foregoing descriptions are only implementation manners of the present invention; the scope of protection of the present invention is not limited to them. Variations or replacements can easily be made by a person skilled in the art. Therefore, the protection scope of the present invention should be subject to the protection scope of the attached claims.
To summarize, embodiments of the present invention comprise mapping tasks to sets of cores. Preliminary runs can be performed to collect information on execution times on the different types of cores and the corresponding power consumptions. After that it is possible to construct the task diagram, evaluate the critical path on this diagram corresponding to the maximal value of performance, split this diagram into levels and, on each level, solve the problem of migrating tasks that do not belong to the critical path. Potentially, these can be assigned to the set of slow cores, thus reducing overall power consumption.
In embodiments of the invention, the method can comprise further steps. Let us consider a heterogeneous multi-core computing system that consists of cores c1, c2, ..., ck of the fast set type (with high energy consumption and high performance) and cores ck+1, ck+2, ..., cn of the slow set type (with low energy consumption and low performance), n cores in total. Now let us consider how to bind the tasks of complicated software to processor cores of the different sets:
1. Static monitoring of the HMCCS is performed. As a result, we evaluate the execution times t1 and t2 of all tasks on the different cores and the corresponding power consumption values.
2. The task diagram is constructed.
3. We evaluate the critical path on the task diagram, assuming that all evaluations are performed on the fast set cores. This defines the maximal performance of the HMCCS under consideration.
4. We divide the task diagram into levels, starting from the initial node down to the last node.
5. On all intermediate levels we solve the migration problem for tasks not belonging to the critical path at the data level. Tasks of the critical path are always affiliated with fast set cores.
6. The power consumption is evaluated according to the affiliation of the tasks with the sets of cores. A gain in power consumption can be reached even if only one task is affiliated with the slow set cores. If many tasks are affiliated with slow set cores, the power consumption profit will be substantially greater.
7. All tasks are executed according to their migration among the fast and slow set cores.
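Step 6 of the list above can be sketched as a comparison between the power consumption of a given affiliation and that of running every task on the fast set. Treating the total as a plain sum of per-task values is an assumption of this sketch, and the names and numbers are hypothetical.

```python
def total_power(assignment, p_fast, p_slow):
    """Sum the per-task power consumption for a given task-to-set affiliation."""
    return sum(
        p_fast[task] if core_set == "fast" else p_slow[task]
        for task, core_set in assignment.items()
    )


def power_profit(assignment, p_fast, p_slow):
    """Profit relative to affiliating every task with the fast set."""
    all_fast = {task: "fast" for task in assignment}
    return total_power(all_fast, p_fast, p_slow) - total_power(assignment, p_fast, p_slow)


# Illustrative values: A and D are on the critical path, B and C were migrated.
p_fast = {"A": 4.0, "B": 3.0, "C": 5.0, "D": 6.0}
p_slow = {"A": 1.0, "B": 0.5, "C": 1.0, "D": 2.0}
assignment = {"A": "fast", "B": "slow", "C": "slow", "D": "fast"}
profit = power_profit(assignment, p_fast, p_slow)
```

Here the profit equals (p1(B) - p2(B)) + (p1(C) - p2(C)), so each additional migrated task adds its own gain to the total.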
Effects of a method in accordance with the present invention can include that the performance of an HMCCS is improved and/or its power consumption is decreased.
Here we consider one of the most common goals, minimal completion time (another one, maximal throughput, is not considered here). A method can solve an optimization problem in order to minimize the total completion time of each particular application. This can include finding an optimal mapping of tasks to cores that makes the completion time reach its potential minimum and simultaneously decreases the power consumption of the HMCCS as much as possible.
Furthermore, with a task scheduler or method in accordance with the present invention, developing parallel applications for heterogeneous hardware requires considerably less effort from the developer. This makes the process of developing parallel applications for HMCCS hardware easier. Finally, it decreases the labor costs of both developing software and effectively porting existing code to a specific architecture.
Embodiments of the present invention can be used in a system with signal processors of SoC type in which the same software is running permanently. Thus, a particularly high power saving is achieved.
The system exploits a heterogeneous multi-core architecture with cores that differ in performance and power consumption. Aspects of the present invention can involve:
• Preliminary static estimation of execution time and power consumption on a set of fast cores and a set of slow cores.
• Usage of a task diagram for designing a performance- and energy-efficient scheduler for heterogeneous multi-core devices.
• Evaluating the critical path on a task diagram.
• Leveling the task diagram to provide the maximal profit in power consumption.
• Evaluation of power consumption for the obtained task distribution among sets of cores according to the task diagram, to minimize power consumption while keeping the maximal performance value.
Claims
1. Task scheduler (100) for scheduling a plurality of tasks (310-340; 410-460; 510-560; 610-660) on a multi-core processor comprising a set of slow cores (304; 504; 604) and a set of fast cores (302; 402; 502; 602), the task scheduler comprising:
a timing unit (110) configured to compare a slow core runtime of at least one candidate task (340; 420, 430; 520, 530; 620, 630) that is not on a critical path (305; 405; 505; 605) with a fast core runtime of one or more critical path tasks, and
a task assigning unit (120) configured to assign the candidate task to the set of fast cores if the slow core runtime of the candidate task is longer than the fast core runtime of the one or more critical path tasks, and otherwise to assign the candidate task to the set of slow cores.
2. The task scheduler of claim 1, further comprising:
a graph construction unit (130) configured to construct a task graph of the plurality of tasks,
a path finding unit (140) configured to determine the critical path of the task graph.
3. The task scheduler of claim 1 or 2, further comprising a power computation unit (150) configured to determine a power consumption gain of assigning a candidate task to the set of slow cores, wherein the task assigning unit is configured to assign candidate tasks in an order of decreasing power consumption gain.
4. The task scheduler of one of the previous claims, wherein the power computation unit is configured to determine the power consumption gain as the difference between a power consumption of the candidate task on the set of fast cores and a power consumption of the candidate task on the set of slow cores.
5. The task scheduler of one of the previous claims, further comprising a preliminary execution unit (160) configured to determine a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.
6. A processor, comprising a set of fast cores (302; 402; 502; 602), a set of slow cores (304; 504; 604) and a task scheduler (100) according to one of claims 1 to 5.
7. Method for scheduling a plurality of tasks (310-340; 410-460; 510-560; 610-660) on a processor comprising a set of fast cores (302; 402; 502; 602) and a set of slow cores (304; 504; 604), the method comprising:
comparing (210) a slow core runtime of a candidate task that is not on a critical path (305; 405; 505; 605) with a fast core runtime of one or more critical path tasks, and
if the slow core runtime of the candidate task is longer than the fast core runtime of the one or more critical path tasks, assigning (220) the task to the set of fast cores, otherwise assigning the task to the set of slow cores.
8. The method of claim 7, further comprising initial steps of:
constructing (202) a task graph of the plurality of tasks, and
determining (204) a critical path of the task graph.
9. The method of claim 7 or 8, further comprising:
for at least two candidate tasks: determining a power consumption gain of assigning the candidate task to the set of slow cores, and
assigning the at least two tasks in an order of decreasing power consumption gain.
10. The method of one of claims 7 to 9, wherein the power consumption gain is determined as the difference between a power consumption of the candidate task on the set of fast cores and a power consumption of the candidate task on the set of slow cores.
11. The method of one of claims 7 to 10, further comprising an initial step of determining (206) a slow core runtime and/or a fast core runtime of the candidate task by carrying out one or more preliminary runs of the candidate task.
12. The method of claim 11, wherein the preliminary runs are carried out for collecting information on task execution time and latency by executing the candidate task on different sets of cores, and wherein the slow core runtime and/or the fast core runtime are determined based on the collected information.
13. A computer-readable storage medium storing program code, the program code comprising instructions for carrying out the method of one of claims 7 to 12.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201580083785.6A CN108139929B (en) | 2015-10-12 | 2015-10-12 | Task scheduling apparatus and method for scheduling a plurality of tasks |
PCT/RU2015/000664 WO2017065629A1 (en) | 2015-10-12 | 2015-10-12 | Task scheduler and method for scheduling a plurality of tasks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/RU2015/000664 WO2017065629A1 (en) | 2015-10-12 | 2015-10-12 | Task scheduler and method for scheduling a plurality of tasks |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017065629A1 (en) | 2017-04-20 |
Family
ID=55967386
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/RU2015/000664 WO2017065629A1 (en) | 2015-10-12 | 2015-10-12 | Task scheduler and method for scheduling a plurality of tasks |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108139929B (en) |
WO (1) | WO2017065629A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102641520B1 (en) * | 2018-11-09 | 2024-02-28 | 삼성전자주식회사 | System on chip including multi-core processor and task scheduling method thereof |
-
2015
- 2015-10-12 WO PCT/RU2015/000664 patent/WO2017065629A1/en active Application Filing
- 2015-10-12 CN CN201580083785.6A patent/CN108139929B/en active Active
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170371720A1 (en) * | 2016-06-23 | 2017-12-28 | Advanced Micro Devices, Inc. | Multi-processor apparatus and method of detection and acceleration of lagging tasks |
US10592279B2 (en) * | 2016-06-23 | 2020-03-17 | Advanced Micro Devices, Inc. | Multi-processor apparatus and method of detection and acceleration of lagging tasks |
CN111198757A (en) * | 2020-01-06 | 2020-05-26 | 北京小米移动软件有限公司 | CPU kernel scheduling method, CPU kernel scheduling device and storage medium |
CN111198757B (en) * | 2020-01-06 | 2023-11-28 | 北京小米移动软件有限公司 | CPU kernel scheduling method, CPU kernel scheduling device and storage medium |
CN114691326A (en) * | 2022-03-16 | 2022-07-01 | 中国船舶重工集团公司第七一一研究所 | Multi-task scheduling method, multi-core processor and machine-side monitoring system |
Also Published As
Publication number | Publication date |
---|---|
CN108139929B (en) | 2021-08-20 |
CN108139929A (en) | 2018-06-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 15860017; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 15860017; Country of ref document: EP; Kind code of ref document: A1 |