
CN113296788B - Instruction scheduling method, device, equipment and storage medium - Google Patents

Instruction scheduling method, device, equipment and storage medium

Info

Publication number
CN113296788B
CN113296788B (application CN202110650043.4A)
Authority
CN
China
Prior art keywords
node
instruction
scheduling
preset
directed acyclic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110650043.4A
Other languages
Chinese (zh)
Other versions
CN113296788A (en)
Inventor
殷闻强
王瑶池
张圣铭
陈扬洋
陈光胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eastsoft Microelectronics Co ltd
Original Assignee
Shanghai Eastsoft Microelectronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eastsoft Microelectronics Co ltd filed Critical Shanghai Eastsoft Microelectronics Co ltd
Priority to CN202110650043.4A priority Critical patent/CN113296788B/en
Publication of CN113296788A publication Critical patent/CN113296788A/en
Application granted granted Critical
Publication of CN113296788B publication Critical patent/CN113296788B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/44Encoding
    • G06F8/443Optimisation
    • G06F8/4434Reducing the memory space required by the program code
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3005Arrangements for executing specific machine instructions to perform operations for flow control
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The application provides an instruction scheduling method, an instruction scheduling device, a storage medium and a program product. A plurality of instructions are first divided into a plurality of basic blocks according to a preset dividing rule; a directed acyclic graph is then determined for each basic block according to the data dependency relationships among the instructions in that block, yielding the data dependency graph of each basic block. The nodes in each directed acyclic graph are scheduled based on a preset scheduling algorithm until a node ordering is obtained for each graph. Because the instructions in the same basic block share data dependency relationships, the node ordering obtained by scheduling each directed acyclic graph according to the preset scheduling algorithm effectively reduces the memory capacity occupied by the program and the demand for memory and registers, avoids the "register overflow" phenomenon, requires a lower development threshold and cost, and is more readily realizable.

Description

Instruction scheduling method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, a storage medium, and a program product for instruction scheduling.
Background
In the field of computer technology, a compiler is a computer program that translates source code into a computer-executable program. Such a program is composed of a plurality of instructions, which can be grouped into basic blocks that are executed in a certain order.
For some computer systems, memory space is very limited, so the size of the program is strictly constrained. For example, the number of registers in an embedded system is usually small, and instructions use registers as operands; when the registers are insufficient to hold the variables in the program, register values must be temporarily stored to memory, a phenomenon known as "register overflow" (register spilling). Its occurrence tends to increase the frequency with which the processor accesses memory, which reduces program execution efficiency and increases program code space.
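To make the effect concrete, the toy sketch below (not taken from the patent) compares the peak number of simultaneously live values for two orderings of the same computation; fewer simultaneously live values means lower register pressure and less spilling:

```python
# Illustrative only: a toy measure of register pressure for two orderings
# of the same instructions. Each instruction is (dest, operands); a value
# is live from its definition until its last use.
def max_live(instrs):
    """Return the peak number of values simultaneously live."""
    last_use = {}
    for i, (_, ops) in enumerate(instrs):
        for op in ops:
            last_use[op] = i
    live, peak = set(), 0
    for i, (dest, ops) in enumerate(instrs):
        live.add(dest)
        peak = max(peak, len(live))
        for op in ops:
            if last_use.get(op) == i:     # operand dies here
                live.discard(op)
        if last_use.get(dest, i) == i:    # result never used later
            live.discard(dest)
    return peak

# Same computation, two schedules: interleaving definitions and uses
# keeps fewer values alive than defining everything up front.
eager = [("t1", []), ("t2", []), ("t3", []),
         ("t4", ["t1", "t2"]), ("t5", ["t4", "t3"])]
tight = [("t1", []), ("t2", []), ("t4", ["t1", "t2"]),
         ("t3", []), ("t5", ["t4", "t3"])]
print(max_live(eager), max_live(tight))  # → 4 3
```

With four registers the first schedule just fits; with three it spills, while the reordered schedule does not — this is exactly the pressure the method aims to reduce.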
To avoid the "register overflow" phenomenon, the code needs to be optimized to reduce the space the program occupies; however, optimizing source code raises several problems. For example, the technical threshold is often high: it places strong professional demands on source code developers and requires a deep understanding of the compiler and the instruction set, so a great deal of manpower is usually needed to analyze the program and modify the source code. An effective solution for avoiding the "register overflow" phenomenon is therefore needed.
Disclosure of Invention
The application provides an instruction scheduling method, an instruction scheduling device, a storage medium and a program product, so as to reduce the demand for registers and memory and avoid the occurrence of the "register overflow" phenomenon.
In a first aspect, the present application provides an instruction scheduling method, including:
dividing a plurality of instructions into a plurality of basic blocks according to a preset dividing rule, wherein the preset dividing rule comprises a preset starting instruction type and a preset ending instruction type, the preset ending instruction type comprises any one of an unconditional jump instruction, a conditional jump instruction, a function call instruction and a last instruction of a function, and the function call instruction does not comprise a specified function call instruction;
determining a directed acyclic graph of each basic block according to the data dependency relationship among the instructions in each basic block, and obtaining the data dependency graph of each basic block, wherein each node in each directed acyclic graph corresponds to each instruction in each corresponding basic block one by one;
and scheduling the nodes in each directed acyclic graph based on a preset scheduling algorithm until the node ordering of each directed acyclic graph is obtained.
In one possible design, after obtaining the node ordering of each directed acyclic graph, the method further includes:
determining the instruction execution sequence of the corresponding basic blocks according to the node ordering of each directed acyclic graph, and executing each instruction in each basic block according to the instruction execution sequence so as to complete the ordered execution of the plurality of instructions.
In one possible design, the preset starting instruction type includes any one of the following:
the first instruction of the function, the target address of the preset instruction and the next instruction after the preset instruction;
the preset instructions comprise unconditional jump instructions and conditional jump instructions.
In one possible design, the determining the directed acyclic graph of each basic block according to the data dependency relationship among the instructions in each basic block, and obtaining the data dependency graph of each basic block includes:
determining the data dependency relationship among the instructions in each basic block to generate a User chain of a node corresponding to each instruction, and generating the data dependency graph of each basic block;
obtaining the directed acyclic graph of each basic block according to the User chains of the corresponding nodes in each basic block;
The data dependency graph of each basic block comprises a node set and a directed edge set, wherein each node subset in the node set is used for representing the data dependency relationship of each node, each directed edge subset in the directed edge set is used for representing constraint data of the data dependency relationship of each node, and the constraint data is represented by a weight value.
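The structure described above might be sketched, in simplified form, as follows; the names (`Node`, `users`, `add_dependency`) and the unit weights are illustrative assumptions, not identifiers from the patent:

```python
# A minimal sketch of the data dependency graph: one node per
# instruction, a "User chain" listing the nodes that consume a node's
# result, and a weighted directed edge set as the constraint data.
from dataclasses import dataclass, field

@dataclass
class Node:
    instr: str
    users: list = field(default_factory=list)  # the node's User chain

edges = {}  # directed edge set: (producer, consumer) -> weight

def add_dependency(producer, consumer, weight=1):
    """Record that `consumer` depends on `producer`'s result."""
    producer.users.append(consumer)
    edges[(producer.instr, consumer.instr)] = weight  # constraint data

a = Node("t1 = load x")
b = Node("t2 = t1 + 1")
c = Node("store t2, y")
add_dependency(a, b)
add_dependency(b, c)

print([u.instr for u in a.users])  # the User chain of the load
```

Here the weight value is fixed at 1 for simplicity; in the patent the constraint data could encode any per-edge cost.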
In one possible design, the data dependency includes: any one or more of alias analysis dependencies, usage-definition (Use-Def) chain dependencies, and usage (Use) chain dependencies.
In one possible design, the scheduling nodes in each directed acyclic graph based on a preset scheduling algorithm includes:
aiming at the schedulable node in each directed acyclic graph, determining the node corresponding to the ending instruction as the first completed scheduling node in the current directed acyclic graph;
scheduling other schedulable nodes except the current completed scheduling node according to a preset counting rule, a preset priority scheduling rule and a User chain of the current completed scheduling node;
the preset scheduling algorithm comprises the preset counting rule and the preset priority scheduling rule.
In one possible design, the scheduling of the schedulable nodes other than the current completed scheduling node according to the preset counting rule, the preset priority scheduling rule, and the User chain of the current completed scheduling node includes:
obtaining a node to be selected according to the User chain of the current completed scheduling node, and updating the count value of the node to be selected according to the preset counting rule and the constraint data corresponding to the node to be selected so as to generate a Next chain;
deleting the current completed scheduling node and the data dependency relationship of the current completed scheduling node in the current directed acyclic graph to obtain a new current directed acyclic graph;
repeating the steps on the new current directed acyclic graph based on the preset priority scheduling rule until all schedulable nodes in each directed acyclic graph are completely scheduled;
the preset priority scheduling rule is used for representing that the node to be selected with the largest count value in the Next chain is scheduled preferentially.
In one possible design, when the preferential scheduling is performed, if the number of nodes to be selected with the largest count value in the Next chain is greater than 1, merging the User chain of the nodes to be selected into the Next chain to obtain the lengths of the merged chains, and performing the preferential scheduling on the nodes to be selected corresponding to the minimum chain length.
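Read together, the designs above suggest a list-scheduling loop that starts from the node of the ending instruction. The sketch below is one speculative way to realize it; the direction of the User chain, the count update, and the tie-break are our reading of the text, and all names are assumptions:

```python
# Speculative sketch of the scheduling loop: the end-instruction node is
# scheduled first, candidates enter the Next chain with count values
# accumulated from edge weights (constraint data), the largest count is
# scheduled first, and ties go to the candidate whose User chain merged
# into the Next chain gives the shortest merged chain.
def schedule(users, weights, end):
    """users[n]: User chain of node n (here read as the nodes n depends on);
    weights[(n, m)]: constraint data of the n -> m dependency."""
    order = [end]                # end-instruction node completes scheduling first
    next_chain = {}              # candidate node -> count value

    def add_candidates(node):
        for m in users.get(node, []):
            if m not in order:
                next_chain[m] = next_chain.get(m, 0) + weights.get((node, m), 1)

    add_candidates(end)
    while next_chain:
        top = max(next_chain.values())
        tied = [n for n, c in next_chain.items() if c == top]
        # tie-break: merge each tied node's User chain into the Next chain
        # and prefer the candidate yielding the shortest merged chain
        pick = min(tied, key=lambda n: len(set(next_chain) | set(users.get(n, []))))
        del next_chain[pick]
        order.append(pick)       # pick is scheduled; its edges drop out of the DAG
        add_candidates(pick)
    return order[::-1]           # reverse to obtain an execution order

users = {"store": ["add"], "add": ["load_a", "load_b"]}
weights = {("store", "add"): 1, ("add", "load_a"): 1, ("add", "load_b"): 1}
print(schedule(users, weights, "store"))
```

Deleting a scheduled node is modeled implicitly here (it simply never re-enters the Next chain); a faithful implementation would mutate the graph as the claims describe.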
In one possible design, after completing the scheduling for each schedulable node of each directed acyclic graph, further comprising:
the order in which the schedulable nodes in each directed acyclic graph complete scheduling is determined as the node ordering for each directed acyclic graph.
In a second aspect, the present application provides an instruction scheduling apparatus, including:
the first processing module is used for dividing the plurality of instructions into a plurality of basic blocks according to a preset dividing rule, wherein the preset dividing rule comprises a preset starting instruction type and a preset ending instruction type, the preset ending instruction type comprises any one of an unconditional jump instruction, a conditional jump instruction, a function call instruction and a last instruction of a function, and the function call instruction does not comprise a specified function call instruction;
the second processing module is used for determining a directed acyclic graph of each basic block according to the data dependency relationship among the instructions in each basic block and obtaining the data dependency graph of each basic block, and each node in each directed acyclic graph corresponds to each instruction in each corresponding basic block one by one;
and the third processing module is used for scheduling each node in each directed acyclic graph based on a preset scheduling algorithm until the node ordering of each directed acyclic graph is obtained.
In one possible design, the instruction scheduling apparatus further includes: a fourth processing module; the fourth processing module is configured to:
determining the instruction execution sequence of the corresponding basic blocks according to the node ordering of each directed acyclic graph, and executing each instruction in each basic block according to the instruction execution sequence so as to complete the ordered execution of the plurality of instructions.
In one possible design, the preset starting instruction type includes any one of the following:
the first instruction of the function, the target address of the preset instruction and the next instruction after the preset instruction;
the preset instructions comprise unconditional jump instructions and conditional jump instructions.
In one possible design, the second processing module is specifically configured to:
determining the data dependency relationship among the instructions in each basic block to generate a User chain of a node corresponding to each instruction, and generating the data dependency graph of each basic block;
obtaining the directed acyclic graph of each basic block according to the User chains of the corresponding nodes in each basic block;
the data dependency graph of each basic block comprises a node set and a directed edge set, wherein each node subset in the node set is used for representing the data dependency relationship of each node, each directed edge subset in the directed edge set is used for representing constraint data of the data dependency relationship of each node, and the constraint data is represented by a weight value.
In one possible design, the data dependency includes: any one or more of alias analysis dependencies, usage-definition (Use-Def) chain dependencies, and usage (Use) chain dependencies.
In one possible design, the third processing module includes:
the first processing sub-module is used for determining, for the schedulable nodes in each directed acyclic graph, the node corresponding to the ending instruction as the first completed scheduling node in the current directed acyclic graph;
the second processing submodule is used for scheduling other schedulable nodes except the current completed scheduling node according to a preset counting rule, a preset priority scheduling rule and a User chain of the current completed scheduling node;
the preset scheduling algorithm comprises the preset counting rule and the preset priority scheduling rule.
In one possible design, the second processing sub-module is specifically configured to:
obtaining a node to be selected according to the User chain of the current completed scheduling node, and updating the count value of the node to be selected according to the preset counting rule and the constraint data corresponding to the node to be selected so as to generate a Next chain;
Deleting the current completed scheduling node and the data dependency relationship of the current completed scheduling node in the current directed acyclic graph to obtain a new current directed acyclic graph;
repeating the steps on the new current directed acyclic graph based on the preset priority scheduling rule until all schedulable nodes in each directed acyclic graph are completely scheduled;
the preset priority scheduling rule is used for representing that the node to be selected with the largest count value in the Next chain is scheduled preferentially.
In one possible design, when the preferential scheduling is performed, if the number of nodes to be selected with the largest count value in the Next chain is greater than 1, the second processing sub-module is further configured to:
and merging the User chain of the node to be selected into the Next chain to obtain the length of each merged chain, and carrying out the priority scheduling on the node to be selected corresponding to the minimum chain length.
In one possible design, the instruction scheduling apparatus further includes: a fifth processing module; the fifth processing module is configured to:
the order in which the schedulable nodes in each directed acyclic graph complete scheduling is determined as the node ordering for each directed acyclic graph.
In a third aspect, the present application provides an electronic device, comprising:
a processor; and,
a memory for storing a computer program of the processor;
wherein the processor is configured to perform any one of the possible instruction scheduling methods provided in the first aspect via execution of the computer program.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements any one of the possible instruction scheduling methods provided in the first aspect.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements any one of the possible instruction scheduling methods provided in the first aspect.
The application provides an instruction scheduling method, an instruction scheduling device, a storage medium and a program product. A plurality of instructions are first divided into a plurality of basic blocks according to a preset dividing rule; a directed acyclic graph is then determined for each basic block according to the data dependency relationships among the instructions in that block, yielding the data dependency graph of each basic block, where the nodes in each directed acyclic graph correspond one-to-one with the instructions in the corresponding basic block. The nodes in each directed acyclic graph are scheduled based on a preset scheduling algorithm until a node ordering is obtained for each graph. According to this instruction scheduling method, a plurality of instructions are divided into a plurality of basic blocks, and scheduling of the instructions is then realized on the basis of the directed acyclic graph obtained from the data dependency relationships of the instructions in each basic block. Because the instructions in the same basic block share data dependency relationships, the node ordering obtained by scheduling each directed acyclic graph effectively reduces the memory capacity occupied by the program and the demand for memory and registers, avoids the "register overflow" phenomenon, requires a lower development threshold and cost, and is more readily realizable.
Drawings
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
fig. 2 is a flow chart of an instruction scheduling method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating another instruction scheduling method according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a User chain according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a directed acyclic graph provided by an embodiment of the present application;
FIG. 6 is a flowchart illustrating another instruction scheduling method according to an embodiment of the present disclosure;
FIG. 7 is a flowchart illustrating another instruction scheduling method according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a Next chain and directed acyclic graph according to an embodiment of the present application;
FIG. 9 is a schematic diagram of another Next chain and directed acyclic graph provided by an embodiment of the present application;
fig. 10 is a schematic structural diagram of an instruction scheduling apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of another instruction scheduling apparatus according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, it is merely an example of a method and apparatus that is consistent with some aspects of the present application as detailed in the accompanying claims.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims of this application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data used in this manner may be interchanged where appropriate to describe embodiments of the application such as those capable of being practiced otherwise than as shown or described. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The memory space of the computer system is very limited, when the register is insufficient to store the variable in the program, the corresponding value of the register is temporarily stored in the memory so as to release the available register, and then the register is transferred back to the register from the memory again when needed, so that the phenomenon of 'register overflow' occurs. This tends to increase the frequency of memory accesses by the processor, which in turn results in inefficient program execution and increased program code space. To avoid the occurrence of the "register overflow" phenomenon, code needs to be optimized to reduce the space occupied by the program. However, there are many problems in optimizing the source code, for example, the technical threshold of optimizing the source code is often high, so that not only is the technical requirement on the developer of the source code high, but also a compiler and an instruction set need to be deeply understood, and thus a great deal of manpower is generally required to perform program analysis to modify the source code to realize optimization. In view of this, it can be seen that an effective solution is needed for avoiding the occurrence of the "register overflow" phenomenon.
In view of the foregoing problems in the prior art, the present application provides a method, an apparatus, a device, a storage medium, and a program product for scheduling instructions. The instruction scheduling method provided by the application has the following inventive concept: firstly, dividing a plurality of instructions into a plurality of basic blocks through a preset dividing rule, wherein the preset dividing rule comprises a preset starting instruction type and a preset ending instruction type, so that after a starting instruction in one basic block is determined, a subsequent instruction can be continuously added into the basic block until the instruction of the preset ending instruction type appears, and the range of the basic block is enlarged. Further, aiming at the data dependency relationship among the instructions in each basic block, the scheduling of each node in the directed acyclic graph is realized based on a preset scheduling algorithm on the basis of obtaining the directed acyclic graph, and the node ordering of each node in the directed acyclic graph is obtained, namely, the scheduling of each instruction in each basic block is realized. Because each instruction with the data dependency relationship is in the same basic block, node ordering is achieved by scheduling, the life cycle of variables in a program is shortened, the capacity of a memory occupied by the program can be effectively reduced, the number of demands on the memory and registers is reduced, the phenomenon of 'register overflow' is effectively avoided, the required development technical threshold and cost are lower, and the realizability is stronger.
In the following, an exemplary application scenario of the embodiments of the present application is described.
Fig. 1 is a schematic view of an application scenario provided in the embodiment of the present application, as shown in fig. 1, a single chip microcomputer (Microcontroller Unit, MCU) 11 may be configured in any terminal capable of implementing corresponding control by executing computer instructions, for example, an air conditioner, a smart meter, a printer, etc., where the computer instructions executed by the MCU 11 may be scheduled by the instruction scheduling method provided in the embodiment of the present application. The instruction scheduling method provided in the embodiment of the present application may be performed by the instruction scheduling apparatus provided in the embodiment of the present application, where the computer 12 may be an electronic device corresponding to the instruction scheduling apparatus provided in the embodiment of the present application, and the processor is configured to execute the instruction scheduling method provided in the embodiment of the present application. Specifically, the multiple computer instructions to be executed by the MCU 11 are firstly divided to obtain multiple basic blocks, then, on the basis of obtaining the directed acyclic graph of each basic block according to the data dependency relationship among the instructions in each basic block, the nodes in each directed acyclic graph are scheduled according to a designed preset scheduling algorithm, so as to obtain the node ordering of each directed acyclic graph, that is, the scheduling of the instructions in each basic block is realized. 
Therefore, when the MCU 11 executes the plurality of computer instructions, the execution process of the instructions can be completed for each computer instruction in each basic block according to the node sequence, the life cycle of variables in the program can be shortened, the capacity of a memory occupied by the program can be effectively reduced, the number of demands on the memory and registers is reduced, and the phenomenon of overflowing of the registers is effectively avoided.
In addition, it should be noted that the above application scenario is merely illustrative, and the instruction scheduling method, apparatus, device, storage medium and program product provided in the embodiments of the present application include, but are not limited to, the above application scenario.
The following describes the technical solutions of the present application and how the technical solutions of the present application solve the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments.
Fig. 2 is a flow chart of an instruction scheduling method according to an embodiment of the present application. As shown in fig. 2, the present embodiment includes:
s101: dividing the plurality of instructions into a plurality of basic blocks according to a preset dividing rule.
The preset dividing rule comprises a preset starting instruction type and a preset ending instruction type.
The compiler may translate the source code into a computer executable program, for example, for the source code, the compiler may generate corresponding intermediate codes for the input source code, each of the generated intermediate codes may be understood as one instruction, and thus, a plurality of instructions corresponding to the source code may be obtained first according to the source code. Wherein intermediate code may be understood as a different representation of source code, or intermediate representation.
After obtaining a plurality of instructions, the plurality of instructions are divided into a plurality of basic blocks according to the preset dividing rule, each basic block consisting of an ordered group of instructions. Each basic block has only one entry and one exit: the entry is the starting instruction of the basic block, and the exit is its ending instruction.
The preset division rule may include a preset start instruction type and a preset end instruction type.
The preset starting instruction type may be the first instruction of a function, or the target address of a preset instruction, where the preset instructions include unconditional jump instructions and conditional jump instructions, that is, the instruction pointed to by the PC (Program Counter) register after the jump completes. It may also be the next instruction after a conditional or unconditional jump instruction. In other words, the preset starting instruction type includes any of the instructions or target addresses listed above.
The preset ending instruction type may be an unconditional jump instruction, a conditional jump instruction, a function call instruction, or a last instruction of a function, where the function call instruction does not include a specified function call instruction. The specified function call instruction refers to a partial system library function or a function having a certain attribute. Thus, the preset end instruction type includes any one of an unconditional jump instruction, a conditional jump instruction, a function call instruction, and a last instruction of a function.
For example, among the plurality of instructions corresponding to the source code, an instruction conforming to the preset start instruction type is determined as the start instruction of a basic block to be divided; subsequent instructions are then incorporated into that basic block until an instruction conforming to the preset end instruction type appears, which is determined as the end instruction of the basic block, completing the division of one basic block. Because the preset end instruction type excludes specified function call instructions, a block that would otherwise end there can continue to incorporate instructions; in this way, the preset division rule in the instruction scheduling method provided by the embodiment of the present application can merge a plurality of adjacent basic blocks into one larger basic block, enlarging the scope of the basic block. Since instructions can only be scheduled within a basic block, if the basic blocks as they existed before division by the preset division rule are regarded as small basic blocks, enlarging the scope of the basic block allows instruction scheduling to cross a plurality of the original small basic blocks, which further helps shorten the life cycle of variables when the source code is executed.
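As an illustrative sketch only (not part of the claimed embodiment), the basic block division described above can be modeled as a single pass over the instruction list; the predicates `is_start` and `is_end` are hypothetical stand-ins for the preset start and end instruction types:

```python
def split_basic_blocks(instrs, is_start, is_end):
    """Partition a linear list of instructions into basic blocks.

    is_start / is_end are predicates standing in for the preset start
    and end instruction types (hypothetical names for illustration).
    """
    blocks, current = [], []
    for ins in instrs:
        # A start-type instruction opens a new block, unless the
        # current block is still empty.
        if is_start(ins) and current:
            blocks.append(current)
            current = []
        current.append(ins)
        # An end-type instruction closes the current block.
        if is_end(ins):
            blocks.append(current)
            current = []
    if current:  # trailing instructions form the last block
        blocks.append(current)
    return blocks
```

Under this sketch, excluding the specified function call instructions from `is_end` is precisely what lets adjacent small blocks merge into one larger block.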
S102: and determining the directed acyclic graph of each basic block according to the data dependency relationship among the instructions in each basic block, and obtaining the data dependency graph of each basic block.
The nodes in each directed acyclic graph are in one-to-one correspondence with the instructions in each corresponding basic block.
After the plurality of instructions is divided into basic blocks, the directed acyclic graph of each basic block is determined according to the data dependency relationships among the instructions in that basic block. Each instruction consists of one operator and zero or more operands; therefore, according to the operators and operands in each instruction, the data dependency relationships among the instructions can be determined, forming the directed acyclic graph (Directed Acyclic Graph, DAG) corresponding to each basic block. The nodes in each formed directed acyclic graph correspond one-to-one with the instructions in the corresponding basic block, and the edges of the directed acyclic graph carry the constraint data of the data dependency relationships between the instructions corresponding to adjacent nodes, the constraint data being represented by weight values.
When an instruction needs more than one operand, different operands may introduce different data dependency relationships; if two instructions have data dependency relationships on a plurality of operands, constraint data exists between them, and the corresponding constraint data can be represented by a weight value.
In addition, the data dependency between instructions is determined by the data dependency between the operands of the instructions. For example, the data dependencies of the operands may be any one or more of alias analysis dependencies, usage-definition (Use-Def) chain dependencies, and usage (Use) chain dependencies, which embodiments of the present application are not limited in this regard.
Alias analysis dependency is used to determine whether a memory location may be accessed in multiple ways. If this cannot be analyzed, or it is determined that a memory location may indeed be accessed in multiple ways, the order of the two related consecutive instructions cannot be changed; that is, the node corresponding to the later instruction points to the node corresponding to the earlier instruction.
Use-Def chain dependency means that an instruction defines (assigns a value to) exactly one operand. If an operand is used without having been defined, the value read from it is undefined; therefore, the instruction that defines the operand must be executed before any instruction that uses it.
In Use chain dependency, the same operand may be used in multiple instructions, so the multiple instructions involving that operand can be linked together by the operand to form the corresponding User chain.
By determining the data dependency relationships among the instructions in each basic block, generating the User chain of the node corresponding to each instruction, and then forming the directed acyclic graph for instruction scheduling, a plurality of instructions sharing the same operand can be executed together, thereby reducing memory access operations.
When the directed acyclic graph of each basic block is determined, the data dependency relationship among the instructions in each basic block can be recorded in the form of a data dependency graph.
In one possible design, a possible implementation manner of the step S102 is shown in fig. 3, and fig. 3 is a flow chart of another instruction scheduling method provided in the embodiment of the present application. As shown in fig. 3, the present embodiment includes:
s1021: and determining the data dependency relationship among the instructions in each basic block to generate a User chain of the node corresponding to each instruction.
S1022: and obtaining a directed acyclic graph of each basic block according to the User chain of each node corresponding to each basic block, and generating and recording a data dependency graph of each basic block.
The data dependency relationships among the instructions can be determined based on the operators and operands of the instructions in each basic block, and the data dependency relationships among all nodes related to the node corresponding to an instruction are represented by a User chain; that is, the User chain of the node corresponding to each instruction is generated. The directed acyclic graph corresponding to each basic block can then be constructed according to the User chains of the nodes corresponding to the instructions in that basic block.
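As a hedged sketch of this step — the tuple format `(dest, operands)` is an assumption made purely for illustration — Use-Def information can be turned into per-node User chains whose weights play the role of the constraint data W:

```python
def build_user_chains(instrs):
    """instrs: list of (dest, operands) tuples in program order.

    Returns, for each node, its User chain as a dict mapping the node
    that defines a used operand to a weight: the number of operand
    dependencies between the two instructions (the constraint data W).
    """
    last_def = {}                       # operand name -> defining node
    chains = [dict() for _ in instrs]
    for i, (dest, ops) in enumerate(instrs):
        for op in ops:
            if op in last_def:          # Use-Def chain dependency
                d = last_def[op]
                chains[i][d] = chains[i].get(d, 0) + 1
        if dest is not None:
            last_def[dest] = i
    return chains
```

The resulting User chains are the weighted adjacency of the directed acyclic graph; alias analysis dependencies would add further edges and are omitted in this sketch.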
Meanwhile, to facilitate recording the data dependency relationships among the instructions in each basic block, a data dependency graph may be generated to record the data dependency relationships. The data dependency graph may be represented as G = (I, E), where I represents the node set and E represents the directed edge set. The data dependency relationships of the nodes in the directed acyclic graph corresponding to each basic block are subsets of the node set (I), i.e., node subsets; the constraint data of the data dependency relationships of the nodes are subsets of the directed edge set (E), i.e., directed edge subsets. For example, the constraint data of the data dependency relationship between nodes i6 and i1 in the directed acyclic graph shown in fig. 5 is 2.
Table 1 lists the instructions in a basic block (i.e., the instructions corresponding to the nodes i0 to i 10), fig. 4 shows a User chain corresponding to the nodes (i 0 to i 10) generated by determining the data dependency relationship between the instructions in the basic block shown in table 1, and fig. 4 is a schematic diagram of a User chain provided in the embodiment of the present application. On the basis of fig. 4, a directed acyclic graph of the basic block shown in fig. 5 can be obtained, and fig. 5 is a schematic diagram of a directed acyclic graph according to an embodiment of the application.
TABLE 1
Under the condition that the instructions corresponding to the nodes in the basic block are known, as with the instructions corresponding to nodes i0 to i10 in table 1, the data dependency relationships between the instructions corresponding to the nodes can be determined according to the operators and operands constituting each instruction, and represented by the User chain of the node corresponding to each instruction as shown in fig. 4. The value corresponding to W in fig. 4 is a weight value used to represent the constraint data between nodes (i.e., between the instructions corresponding to the nodes). For example, as shown in fig. 4, the User chain of node i1 is node i0, and the constraint data between nodes i0 and i1 is represented by the weight value W, whose value is 1. In addition, the User chains of nodes i0, i2, and i5 are all empty. On the basis of fig. 4, the directed acyclic graph of the basic block corresponding to table 1, shown in fig. 5, can be constructed. As shown in fig. 5, the nodes of the directed acyclic graph correspond one-to-one with the instructions of the basic block, i.e., nodes i0 to i10 correspond one-to-one with the instructions shown in table 1. The edges connecting the directed acyclic graph carry the constraint data of the data dependency relationships between nodes that have such relationships.
According to the instruction scheduling method provided by the embodiment of the application, for each basic block, a User chain of a node corresponding to each instruction is obtained based on the data dependency relationship among the instructions in each basic block, and then a directed acyclic graph of each basic block is constructed based on the User chain, so that instruction scheduling is performed based on the directed acyclic graph, and the node sequence of each basic block is determined.
S103: and scheduling the nodes in each directed acyclic graph based on a preset scheduling algorithm until the node ordering of each directed acyclic graph is obtained.
After the directed acyclic graph of each basic block is obtained, scheduling each node by using a preset scheduling algorithm provided by the embodiment of the application to obtain the node ordering of each directed acyclic graph. The preset scheduling algorithm is used for representing a scheduling principle and a scheduling mode which are followed when each node in each directed acyclic graph is scheduled.
The node ordering of each directed acyclic graph refers to the node order determined after each node in each directed acyclic graph is scheduled.
In one possible design, a possible implementation of this step S103 is shown in fig. 6. Fig. 6 is a flowchart of another instruction scheduling method according to an embodiment of the present application. As shown in fig. 6, the present embodiment includes:
s1031: and determining the node corresponding to the ending instruction as the first completed scheduling node in the current directed acyclic graph according to the schedulable nodes in each directed acyclic graph.
When the nodes in each directed acyclic graph are scheduled by the preset scheduling algorithm, either a bottom-up or a top-down manner can be adopted. A node that can be scheduled must have an in-degree of 0, i.e., it is not depended on by other nodes; nodes in each directed acyclic graph meeting this condition are defined as schedulable nodes. Whether bottom-up or top-down, the position of the node corresponding to the end instruction in the basic block cannot change; that is, the end instruction after scheduling is still the end instruction of the basic block before scheduling, while the positions of the other instructions may change before and after scheduling. The data dependency relationships corresponding to the bottom-up and top-down scheduling manners are opposite, i.e., the directions of the directed edge sets (E) are opposite.
It should be noted that, in the embodiment of the present application, the directed acyclic graph is formed from the data dependency relationships in a manner matching bottom-up scheduling, so the embodiment shown in fig. 6 performs scheduling bottom-up. Specifically, among the schedulable nodes in the directed acyclic graph, the node corresponding to the end instruction may be selected first and determined as the first completed scheduling node in the directed acyclic graph currently being scheduled.
It can be understood that, the scheduling of the instruction can be regarded as ordering the nodes corresponding to the instruction, the order of selecting the nodes is the order of scheduling, and the process of selecting the nodes is the process of scheduling the nodes.
Continuing with the directed acyclic graph of FIG. 5 as an example, if the bottom-up scheduling is performed, it can be seen from Table 1 that the instruction corresponding to node i10 is an end instruction, and thus node i10 is the first completed scheduling node in the current directed acyclic graph of FIG. 5.
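The schedulable-node condition (in-degree 0, i.e., not depended on by any unscheduled node) can be sketched as follows; the per-node User chain representation (`node -> {depended-on node: weight}`) is an assumption for illustration:

```python
def schedulable(chains, scheduled):
    """Return the nodes that no unscheduled node depends on.

    chains: per-node User chains (node -> {depended-on node: weight});
    scheduled: set of nodes already scheduled (removed from the graph).
    """
    depended = set()
    for i, chain in enumerate(chains):
        if i in scheduled:
            continue                    # scheduled nodes are deleted
        depended.update(d for d in chain if d not in scheduled)
    return [i for i in range(len(chains))
            if i not in scheduled and i not in depended]
```

Before any scheduling, only the node corresponding to the end instruction satisfies this condition, matching the choice of the first completed scheduling node above.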
S1032: and scheduling other schedulable nodes except the current completed scheduling node according to the preset counting rule, the preset priority scheduling rule and the User chain of the current completed scheduling node.
After the current completed scheduling node is obtained, scheduling other schedulable nodes except the current completed scheduling node according to a preset counting rule, a preset priority scheduling rule and a User chain of the current completed scheduling node. In other words, other schedulable nodes are selected one by one according to the preset counting rule, the preset priority scheduling rule and the User chain of the currently completed scheduling node, so that the scheduling of the schedulable nodes is completed. It is understood that the preset scheduling algorithm includes a bottom-up or top-down scheduling manner, a preset count rule, and a preset priority scheduling rule.
Specifically, a possible implementation manner of step S1032 is shown in fig. 7, and fig. 7 is a flow chart of another instruction scheduling method according to an embodiment of the present application. As shown in fig. 7, the present embodiment includes:
s201: and obtaining a node to be selected according to the User chain of the current completed scheduling node, and updating the count value of the node to be selected according to a preset count rule and constraint data corresponding to the node to be selected so as to generate a Next chain.
The node to be selected is a schedulable node with a data dependency relationship with the current completed scheduling node in a User chain of the current completed scheduling node. For example, if the currently completed scheduling node in step S1031 is node i10, it may be determined that the node having a data dependency relationship with i10 is i9 according to the User chain of each node shown in fig. 4, where i9 is the currently obtained node to be selected.
And then, updating the count value of the node to be selected according to a preset count rule and constraint data corresponding to the node to be selected so as to generate a Next chain.
The node to be selected is taken as the head of the Next chain, and the count value updated by the preset count rule is taken as the tail of the Next chain, so that the current Next chain is generated.
In addition, the preset count rule may be represented by the following relation (1):

Count' = Count + W    (1)

where Count' refers to the updated count value of the node to be selected, Count refers to its count value before the update, and W refers to the constraint data of the data dependency relationship between the node to be selected and the currently completed scheduling node, i.e., the weight of the edge between the two nodes in the directed acyclic graph.
In addition, it should be noted that, the initial count value of all schedulable nodes in the current directed acyclic graph is 0, and the node to be selected is a node in the Next chain.
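Relation (1) amounts to accumulating the edge weight into each candidate's count value; a short sketch makes this concrete (the dict-based representation is an assumption for illustration):

```python
def update_counts(counts, user_chain):
    """Apply Count' = Count + W for each candidate node in the User
    chain of the node just scheduled.

    counts: current count values (missing entries are the initial
    value 0); user_chain: candidate node -> constraint weight W.
    """
    for node, w in user_chain.items():
        counts[node] = counts.get(node, 0) + w
    return counts
```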
S202: and deleting the current completed scheduling node and the data dependency relationship of the current completed scheduling node in the current directed acyclic graph to obtain a new current directed acyclic graph.
After the current completed scheduling node is obtained, the current completed scheduling node and its data dependency relationships are deleted from the current directed acyclic graph to obtain a new current directed acyclic graph.
S203: repeating the steps for the new current directed acyclic graph based on a preset priority scheduling rule until all schedulable nodes in each directed acyclic graph are completely scheduled.
The preset priority scheduling rule is used to represent that the node to be selected with the largest count value in the Next chain is scheduled preferentially.
After obtaining the new current directed acyclic graph and the Next chain, repeating steps S201 to S203 on the new current directed acyclic graph based on a preset priority scheduling rule until all schedulable nodes in each directed acyclic graph are completely scheduled.
When the Next chain is not empty, the preset priority scheduling rule performs priority scheduling on the node to be selected with the largest count value in the Next chain. Specifically, when the number of nodes to be selected in the Next chain is one, that node becomes the current completed scheduling node. If the number of nodes to be selected with the largest count value in the Next chain is greater than 1, the User chain of each such node is merged into the Next chain and the lengths of the merged chains are obtained; the node to be selected corresponding to the shortest merged chain length is scheduled preferentially, i.e., determined as the current completed scheduling node.
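The selection rule just described — largest count value first, ties broken by the shortest merged chain — can be sketched as follows (the representations of the Next chain and the User chains are assumptions for illustration):

```python
def pick_next(next_chain, counts, user_chains):
    """Select the next node to schedule from the Next chain.

    next_chain: candidate nodes; counts: their count values;
    user_chains: node -> its User chain (dict of dependencies).
    """
    best = max(counts[n] for n in next_chain)
    tied = [n for n in next_chain if counts[n] == best]
    if len(tied) == 1:
        return tied[0]
    # Tie: merge each tied candidate's User chain into the Next chain
    # and prefer the candidate whose merged chain is shortest.
    def merged_len(n):
        return len(set(next_chain) | set(user_chains[n]))
    return min(tied, key=merged_len)
```

In the fig. 9 situation — candidates i6 and i8 with equal counts, the User chain of i6 containing i1 and that of i8 containing i7 and i4 — the merged lengths are 3 and 4, so i6 is chosen, matching the embodiment.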
In one possible design, after all schedulable nodes in each directed acyclic graph have completed scheduling by the embodiments shown in fig. 6 and 7, the order in which all schedulable nodes in each directed acyclic graph completed scheduling is determined as the node ordering for each directed acyclic graph.
The specific implementation of the embodiment shown in fig. 6 and 7 is described in detail below in connection with the instructions of each node (i 0 to i 10) corresponding to one basic block shown in table 1, the User chain of each node shown in fig. 4, and the directed acyclic graph of the basic block shown in fig. 5.
First, according to the instructions shown in table 1, the instruction corresponding to the node i10 is an end instruction, so the node i10 is the first completed scheduling node, in other words, the node i10 is the current completed scheduling node.
Then, according to the User chain of each node shown in fig. 4, the node to be selected obtained from the User chain of the currently completed scheduling node i10 is i9, and the constraint data W between nodes i9 and i10 is 1. Based on the initial count value 0 of node i9 and the preset count rule, the count value of node i9 is updated; with node i9 as the head and the updated count value (0+1=1, i.e., C: 1) as the tail, the Next chain shown in fig. 8 (a) is obtained. Fig. 8 is a schematic diagram of a Next chain and a directed acyclic graph provided in an embodiment of the present application. Further, the currently completed scheduling node i10 and its data dependency relationships are deleted from the current directed acyclic graph, i.e., from fig. 5, to obtain the new current directed acyclic graph shown in fig. 8 (b). Since the number of nodes to be selected in the current Next chain is one, node i9 becomes the current completed scheduling node.
Further, the corresponding steps of generating a Next chain, obtaining a new current directed acyclic graph, and following a preset priority scheduling rule are repeatedly performed to continue scheduling other schedulable nodes (i.e., nodes other than node i10 and node i 9).
The Next chain generated according to the User chain of node i9 is shown in fig. 9 (a); fig. 9 is a schematic diagram of another Next chain and directed acyclic graph according to an embodiment of the application. The new current directed acyclic graph resulting from deleting the completed scheduling node i9 and its data dependency relationships is shown in fig. 9 (b). At this point the count values of the nodes to be selected, i6 and i8, in the Next chain are equal, so according to the priority scheduling rule, the User chains of i6 and i8 are merged into the Next chain and the two chain lengths are obtained: merging the User chain of node i6 with the Next chain yields nodes i6, i8 and i1, a chain length of 3, while merging the User chain of node i8 with the Next chain yields nodes i6, i8, i7 and i4, a chain length of 4. The former is the shortest chain length, so node i6 is scheduled preferentially, i.e., node i6 becomes the current completed scheduling node.
The corresponding steps of generating a Next chain, obtaining a new current directed acyclic graph, and following the preset priority scheduling rule are repeatedly executed to continue scheduling the other schedulable nodes, and the completed scheduling nodes obtained in sequence are node i1, node i0, node i8, node i7, node i5, node i4, node i3, and node i2. At this point, all schedulable nodes in the directed acyclic graph have completed scheduling. Further, the order in which all schedulable nodes (node i0 to node i10) in the directed acyclic graph completed scheduling (i10-i9-i6-i1-i0-i8-i7-i5-i4-i3-i2) is determined as the node ordering of the directed acyclic graph.
As can be seen from the above description of the embodiments, after the node ordering of each directed acyclic graph is obtained, the scheduling of the instructions corresponding to the nodes of each directed acyclic graph is completed. The instruction scheduling method provided by the embodiment of the present application performs instruction scheduling on the intermediate code of the source code; it does not reduce the number of instructions but only reorders them, so that when the object code for a specific target machine is generated based on the instructions corresponding to the intermediate code, the life cycles of variables are shortened, the number of registers required is reduced, and the possibility of 'register overflow' is reduced or avoided.
According to the instruction scheduling method provided by the embodiment of the present application, a plurality of instructions is first divided into a plurality of basic blocks, and scheduling of the instructions is then realized on the basis of the directed acyclic graph obtained from the data dependency relationships of the instructions in each basic block. Because the instructions in the same basic block have data dependency relationships, executing them according to the node ordering obtained after scheduling effectively reduces the memory occupied by the program and the demands on memory and registers, effectively avoiding 'register overflow', while requiring a lower development threshold and cost and offering strong realizability.
In one possible design, after the node ordering of each directed acyclic graph is obtained, the instruction execution order of the basic block corresponding to the directed acyclic graph may also be determined according to the node ordering. If a bottom-up manner is adopted when obtaining the completed scheduling nodes of each directed acyclic graph, that is, when the first completed scheduling node is the node corresponding to the end instruction, the node ordering of the directed acyclic graph is the reverse of the instruction execution order of the corresponding basic block. Conversely, if a top-down manner is adopted, that is, when the first completed scheduling node is the node corresponding to the start instruction, the node ordering of the directed acyclic graph is the instruction execution order of the corresponding basic block. After the instruction execution order is obtained, each instruction in each basic block is executed in that order, completing the ordered execution of the plurality of instructions corresponding to the source code.
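A minimal sketch of this final step: a bottom-up node ordering is reversed to obtain the execution order, while a top-down ordering is used as-is.

```python
def execution_order(node_ordering, bottom_up=True):
    """Derive a basic block's instruction execution order from the
    node ordering produced by scheduling."""
    return list(reversed(node_ordering)) if bottom_up else list(node_ordering)
```

Applied to the node ordering obtained in the example above (i10-i9-i6-i1-i0-i8-i7-i5-i4-i3-i2), this would yield the execution order i2-i3-i4-i5-i7-i8-i0-i1-i6-i9-i10.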
Because different architectures have different numbers of available registers, the advantages of the instruction scheduling method provided in the embodiments of the present application cannot be quantified directly. Assuming an unlimited number of pseudo registers on the target machine, the variables in the instructions are allocated to pseudo registers; the greater the number of pseudo registers used, the greater the hardware resources required. Therefore, counting the number of variables each instruction must keep alive can be used as a measure to evaluate the instruction scheduling method provided by the embodiments of the present application, as follows. Taking the instruction fragment shown in table 1 above as an example, the life cycles of variables without and with the instruction scheduling method provided in the embodiment of the present application are shown in tables 2 and 3 below, respectively:
TABLE 2
TABLE 3 Table 3
As can be seen from table 2, the number of variables to be saved is largest when the instruction of node i5 is executed: 6 variables need to be saved. If the number of registers in the target machine is insufficient to hold these 6 variables, 'register overflow' will occur, which increases Load/Store instructions and memory usage; the occurrence of 'register overflow' should therefore be minimized. The instruction scheduling method provided by the embodiment of the present application can reduce the probability of 'register overflow'; the variable life cycles after applying it are shown in table 3. As can be seen from table 3, after instruction scheduling with the method provided in the embodiment of the present application, at most 4 variables need to be saved at any instruction. Compared with the unscheduled case in table 2, the demands on registers and memory are greatly reduced, and the life cycles of variables become as short as possible, so that the limited register resources can be fully utilized by multiplexing registers, improving register utilization.
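The measurement used above — the peak number of variables that must be kept alive at any instruction — can be sketched as follows, under the simplifying assumption that a variable is live from its definition to its last use within the block (the `(dest, operands)` tuple format is, again, only an illustration):

```python
def max_live(instrs):
    """Peak number of simultaneously live variables for an in-order
    execution of instrs, given as (dest, operands) tuples."""
    last_use = {}
    for i, (_dest, ops) in enumerate(instrs):
        for op in ops:
            last_use[op] = i            # remember each operand's last use
    live, peak = set(), 0
    for i, (dest, ops) in enumerate(instrs):
        if dest is not None:
            live.add(dest)              # defined value becomes live
        peak = max(peak, len(live))
        for op in ops:
            if last_use.get(op) == i:   # last use reached: variable dies
                live.discard(op)
    return peak
```

Two orderings of the same instructions can then be compared by this metric, mirroring the comparison between tables 2 and 3.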
Fig. 10 is a schematic structural diagram of an instruction scheduling apparatus according to an embodiment of the present application. As shown in fig. 10, the instruction scheduling apparatus 300 provided in this embodiment includes:
the first processing module 301 is configured to divide the plurality of instructions into a plurality of basic blocks according to a preset division rule.
The preset dividing rule comprises a preset starting instruction type and a preset ending instruction type.
The preset ending instruction type comprises any one of an unconditional jump instruction, a conditional jump instruction, a function call instruction and a last instruction of a function, wherein the function call instruction does not comprise a specified function call instruction.
And the second processing module 302 is configured to determine a directed acyclic graph of each basic block according to a data dependency relationship between instructions in each basic block, and obtain a data dependency graph of each basic block.
The nodes in each directed acyclic graph are in one-to-one correspondence with the corresponding instructions in each basic block;
and a third processing module 303, configured to schedule the nodes in each directed acyclic graph based on a preset scheduling algorithm until a node ordering of each directed acyclic graph is obtained.
Fig. 11 is a schematic structural diagram of another instruction scheduling apparatus according to an embodiment of the present application on the basis of fig. 10. As shown in fig. 11, the instruction scheduling apparatus 300 provided in this embodiment further includes: a fourth processing module 304, the fourth processing module 304 configured to:
Determining the instruction execution sequence of the corresponding basic blocks according to the node ordering of each directed acyclic graph, and executing each instruction in each basic block according to the instruction execution sequence so as to complete the ordered execution of a plurality of instructions.
In one possible design, the preset initiation instruction type includes any one of the following:
the first instruction of the function, the target address of the preset instruction and the next instruction after the preset instruction; the preset instructions comprise unconditional jump instructions and conditional jump instructions.
In one possible design, the second processing module 302 is specifically configured to:
and determining the data dependency relationship among the instructions in each basic block to generate a User chain of the node corresponding to each instruction, and generating a data dependency graph of each basic block.
And obtaining the directed acyclic graph of each basic block according to the User chain of each node corresponding to each basic block.
The data dependency graph of each basic block comprises a node set and a directed edge set, wherein node subsets in the node set are used for representing data dependency relations of each node, directed edge subsets in the directed edge set are used for representing constraint data of the data dependency relations of each node, and the constraint data are represented by weight values.
In one possible design, the data dependencies include: any one or more of alias analysis dependencies, usage-definition (Use-Def) chain dependencies, and usage (Use) chain dependencies.
In one possible design, the third processing module 303 includes:
and the first processing sub-module is used for determining the node corresponding to the ending instruction as the finished scheduling node of the first one in the current directed acyclic graph aiming at the schedulable node in each directed acyclic graph.
And the second processing sub-module is used for scheduling other schedulable nodes except the current completed scheduling node according to the preset counting rule, the preset priority scheduling rule and the User chain of the current completed scheduling node.
The preset scheduling algorithm comprises a preset counting rule and a preset priority scheduling rule.
In one possible design, the second processing sub-module is specifically configured to:
and obtaining a node to be selected according to the User chain of the current completed scheduling node, and updating the count value of the node to be selected according to a preset count rule and constraint data corresponding to the node to be selected so as to generate a Next chain.
And deleting the current completed scheduling node and the data dependency relationship of the current completed scheduling node in the current directed acyclic graph to obtain a new current directed acyclic graph.
Repeating the steps for the new current directed acyclic graph based on a preset priority scheduling rule until all schedulable nodes in each directed acyclic graph are completely scheduled.
The preset priority scheduling rule is used for representing that the node to be selected with the largest count value in the Next chain is scheduled preferentially.
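The loop just described can be sketched as follows. The exact counting rule is an assumption for the example (a candidate's count grows by the weight of the edge from the completed node), and the tie-break approximates the merged-chain-length rule by preferring the candidate with the shorter User chain.

```python
# Hedged sketch of the scheduling loop: start from the ending-instruction
# node, feed candidates from the completed node's User chain into the
# "Next chain" (counts), and preferentially schedule the largest count.
def schedule(nodes, user, weight, end):
    """nodes: iterable of node ids; user[n]: n's User chain (candidate nodes);
    weight[(a, b)]: constraint data on edge a->b; end: ending-instruction node."""
    order = [end]                        # the ending instruction is scheduled first
    remaining = set(nodes) - {end}
    counts = {}                          # the "Next chain": candidate -> count value
    current = end
    while remaining:
        for cand in user.get(current, []):      # candidates from the User chain
            if cand in remaining:
                counts[cand] = counts.get(cand, 0) + weight.get((current, cand), 1)
        if counts:
            # preset priority scheduling rule: largest count value first
            best = max(counts, key=lambda n: (counts[n], -len(user.get(n, []))))
            del counts[best]
        else:
            best = next(iter(remaining))        # no reachable candidate: fall back
        remaining.discard(best)                 # delete the node and its dependencies
        order.append(best)                      # scheduling-completion order
        current = best
    return order
```

On a three-node chain `0 -> 1 -> 2` with node 0 as the ending instruction, this yields the scheduling-completion order `[0, 1, 2]`, which later serves as the node ordering of the graph.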
In one possible design, when performing the preferential scheduling, if the number of nodes to be selected with the largest count value in the Next chain is greater than 1, the second processing sub-module is further configured to:
And merging the User chain of each such node to be selected into the Next chain to obtain the length of each merged chain, and preferentially scheduling the node to be selected corresponding to the shortest merged chain length.
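This tie-break can be sketched in isolation as follows; treating the merge as a set union is an assumption about what "merged chain length" means here.

```python
# Hedged sketch of the tie-break above: among candidates tied on the
# largest count value, merge each candidate's User chain into the Next
# chain and prefer the shortest merged chain.
def pick_on_tie(tied, user_chain, next_chain):
    """tied: candidate node ids sharing the largest count value;
    user_chain[n]: n's User chain; next_chain: nodes currently in the Next chain."""
    merged_len = {n: len(set(next_chain) | set(user_chain.get(n, []))) for n in tied}
    return min(tied, key=lambda n: merged_len[n])

# Candidates 1 and 2 are tied; node 2's merged chain is shorter, so it wins.
print(pick_on_tie([1, 2], {1: [3, 4], 2: [3]}, [1, 2]))  # → 2
```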
In one possible design, the instruction scheduling apparatus 300 further includes: a fifth processing module. The fifth processing module is configured to:
and determining the scheduling completion sequence of each schedulable node in each directed acyclic graph as the node ordering of each directed acyclic graph.
It should be noted that the instruction scheduling apparatus provided in the foregoing fig. 10 and fig. 11 and the optional embodiments may be used to execute the corresponding steps of the instruction scheduling method provided in any one of the foregoing embodiments; the specific implementation manners and technical effects are similar and are not repeated here.
The above apparatus embodiments provided in the present application are merely illustrative, where the module division is merely a logical function division, and other division manners may be implemented in practice. For example, multiple modules may be combined or may be integrated into another system. The coupling of the individual modules to each other may be achieved through interfaces, such as electrical communication interfaces, without excluding the possibility of mechanical interfaces or other forms of interfaces.
Fig. 12 is a schematic structural diagram of an electronic device provided in the present application. As shown in fig. 12, the electronic device 400 may include: at least one processor 401 and a memory 402. Fig. 12 shows an electronic device using one processor as an example.
A memory 402 for storing a computer program of the processor 401. In particular, the program may include program code including computer-operating instructions.
Memory 402 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The processor 401 is configured to execute a computer program stored in the memory 402 to implement the steps of the instruction scheduling method in the above method embodiments.
The processor 401 may be a central processing unit (central processing unit, abbreviated as CPU), or an application specific integrated circuit (application specific integrated circuit, abbreviated as ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
Alternatively, the memory 402 may be separate or integrated with the processor 401. When the memory 402 is a device separate from the processor 401, the electronic apparatus 400 may further include:
bus 403 for connecting processor 401 and memory 402. The bus may be an industry standard architecture (industry standard architecture, abbreviated ISA) bus, a peripheral component interconnect (peripheral component interconnect, abbreviated PCI) bus, or an extended industry standard architecture (extended industry standard architecture, abbreviated EISA) bus, among others. The bus may be divided into an address bus, a data bus, a control bus, and the like, but this does not mean that there is only one bus or one type of bus.
Alternatively, in a specific implementation, if the memory 402 and the processor 401 are integrated on a chip, the memory 402 and the processor 401 may complete communication through an internal interface.
The present application also provides a computer-readable storage medium, which may include: a USB flash disk, a removable hard disk, a read-only memory (read-only memory, ROM), a random access memory (random access memory, RAM), a magnetic disk, an optical disk, or the like, on which program code may be stored. Specifically, the computer-readable storage medium stores a computer program, and when the computer program is executed by at least one processor of an electronic device, the electronic device executes the steps of the instruction scheduling method provided in the foregoing embodiments.
Embodiments of the present application also provide a computer program product comprising a computer program stored in a readable storage medium. The computer program may be read from the readable storage medium by at least one processor of an electronic device, and execution of the computer program by the at least one processor causes the electronic device to implement the steps of the instruction scheduling method provided by the various embodiments described above.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (18)

1. An instruction scheduling method, comprising:
dividing a plurality of instructions into a plurality of basic blocks according to a preset dividing rule, wherein the preset dividing rule comprises a preset starting instruction type and a preset ending instruction type, the preset ending instruction type comprises any one of an unconditional jump instruction, a conditional jump instruction, a function call instruction and a last instruction of a function, and the function call instruction does not comprise a specified function call instruction;
determining a directed acyclic graph of each basic block according to the data dependency relationship among the instructions in each basic block, and obtaining the data dependency graph of each basic block, wherein each node in each directed acyclic graph corresponds to each instruction in each corresponding basic block one by one;
scheduling each node in each directed acyclic graph based on a preset scheduling algorithm until node ordering of each directed acyclic graph is obtained;
the determining the directed acyclic graph of each basic block according to the data dependency relationship among the instructions in each basic block, and obtaining the data dependency graph of each basic block comprises the following steps:
determining the data dependency relationship among the instructions in each basic block to generate a User chain of the node corresponding to each instruction;
obtaining the directed acyclic graph of each basic block according to the User chain of each node corresponding to each basic block, and generating the data dependency graph of each basic block;
the data dependency graph of each basic block comprises a node set and a directed edge set, wherein each node subset in the node set is used for representing the data dependency relationship of each node, each directed edge subset in the directed edge set is used for representing constraint data of the data dependency relationship of each node, and the constraint data is represented by a weight value.
2. The instruction scheduling method according to claim 1, further comprising, after the node ordering of each directed acyclic graph is obtained:
determining the instruction execution sequence in the corresponding basic blocks according to the node ordering of each directed acyclic graph, and executing each instruction in each basic block according to the instruction execution sequence so as to complete the ordered execution of the plurality of instructions.
3. The instruction scheduling method according to claim 1, wherein the preset starting instruction type includes any one of the following:
The first instruction of the function, the target address of the preset instruction and the next instruction after the preset instruction;
the preset instructions comprise unconditional jump instructions and conditional jump instructions.
4. The instruction scheduling method of claim 1, wherein the data dependency relationship comprises: any one or more of alias analysis dependencies, usage-definition (Use-Def) chain dependencies, and usage (Use) chain dependencies.
5. The instruction scheduling method according to claim 1, wherein the scheduling of each node in each directed acyclic graph based on the preset scheduling algorithm comprises:
aiming at the schedulable node in each directed acyclic graph, determining the node corresponding to the ending instruction as the first completed scheduling node in the current directed acyclic graph;
scheduling other schedulable nodes except the current completed scheduling node according to a preset counting rule, a preset priority scheduling rule and a User chain of the current completed scheduling node;
the preset scheduling algorithm comprises the preset counting rule and the preset priority scheduling rule.
6. The instruction scheduling method according to claim 5, wherein the scheduling of other schedulable nodes than the current completed scheduling node according to a preset counting rule, a preset priority scheduling rule, and a User chain of the current completed scheduling node comprises:
obtaining a node to be selected according to the User chain of the current completed scheduling node, and updating the count value of the node to be selected according to the preset counting rule and the constraint data corresponding to the node to be selected so as to generate a Next chain;
deleting the current completed scheduling node and the data dependency relationship of the current completed scheduling node in the current directed acyclic graph to obtain a new current directed acyclic graph;
repeating the steps on the new current directed acyclic graph based on the preset priority scheduling rule until all schedulable nodes in each directed acyclic graph are completely scheduled;
the preset priority scheduling rule is used for representing that the node to be selected with the largest count value in the Next chain is scheduled preferentially.
7. The method for scheduling instructions according to claim 6, wherein when performing the preferential scheduling, if the number of nodes to be selected with the largest count value in the Next chain is greater than 1, merging the User chain of the nodes to be selected into the Next chain to obtain the lengths of the merged chains, and performing the preferential scheduling on the nodes to be selected corresponding to the shortest chain length.
8. The method of instruction scheduling according to claim 6, further comprising, after said scheduling is completed for all schedulable nodes in each directed acyclic graph:
the order in which all schedulable nodes in each directed acyclic graph complete scheduling is determined as the node ordering for each directed acyclic graph.
9. An instruction scheduling apparatus, comprising:
the first processing module is used for dividing the plurality of instructions into a plurality of basic blocks according to a preset dividing rule, wherein the preset dividing rule comprises a preset starting instruction type and a preset ending instruction type, the preset ending instruction type comprises any one of an unconditional jump instruction, a conditional jump instruction, a function call instruction and a last instruction of a function, and the function call instruction does not comprise a specified function call instruction;
the second processing module is used for determining a directed acyclic graph of each basic block according to the data dependency relationship among the instructions in each basic block and obtaining the data dependency graph of each basic block, and each node in each directed acyclic graph corresponds to each instruction in each corresponding basic block one by one;
the third processing module is used for scheduling each node in each directed acyclic graph based on a preset scheduling algorithm until the node ordering of each directed acyclic graph is obtained;
The second processing module is specifically configured to:
determining the data dependency relationship among the instructions in each basic block to generate a User chain of the node corresponding to each instruction;
obtaining the directed acyclic graph of each basic block according to the User chain of each node corresponding to each basic block, and generating the data dependency graph of each basic block;
the data dependency graph of each basic block comprises a node set and a directed edge set, wherein each node subset in the node set is used for representing the data dependency relationship of each node, each directed edge subset in the directed edge set is used for representing constraint data of the data dependency relationship of each node, and the constraint data is represented by a weight value.
10. The instruction scheduling apparatus of claim 9, wherein the instruction scheduling apparatus further comprises: a fourth processing module; the fourth processing module is configured to:
determining the instruction execution sequence of the corresponding basic blocks according to the node ordering of each directed acyclic graph, and executing each instruction in each basic block according to the instruction execution sequence so as to complete the ordered execution of the plurality of instructions.
11. The instruction scheduling apparatus of claim 9, wherein the preset initiation instruction type comprises any one of:
the first instruction of the function, the target address of the preset instruction and the next instruction after the preset instruction;
the preset instructions comprise unconditional jump instructions and conditional jump instructions.
12. The instruction scheduling apparatus of claim 9, wherein the data dependency comprises: any one or more of alias analysis dependencies, usage-definition (Use-Def) chain dependencies, and usage (Use) chain dependencies.
13. The instruction scheduling apparatus of claim 9, wherein the third processing module comprises:
the first processing sub-module is used for determining a node corresponding to the ending instruction as a finished scheduling node of the first one in the current directed acyclic graph aiming at the schedulable node in each directed acyclic graph;
the second processing submodule is used for scheduling other schedulable nodes except the current completed scheduling node according to a preset counting rule, a preset priority scheduling rule and a User chain of the current completed scheduling node;
The preset scheduling algorithm comprises the preset counting rule and the preset priority scheduling rule.
14. The instruction scheduling apparatus of claim 13, wherein the second processing sub-module is specifically configured to:
obtaining a node to be selected according to the User chain of the current completed scheduling node, and updating the count value of the node to be selected according to the preset counting rule and the constraint data corresponding to the node to be selected so as to generate a Next chain;
deleting the current completed scheduling node and the data dependency relationship of the current completed scheduling node in the current directed acyclic graph to obtain a new current directed acyclic graph;
repeating the steps on the new current directed acyclic graph based on the preset priority scheduling rule until all schedulable nodes in each directed acyclic graph are completely scheduled;
the preset priority scheduling rule is used for representing that the node to be selected with the largest count value in the Next chain is scheduled preferentially.
15. The instruction scheduling apparatus of claim 14, wherein, in performing the priority scheduling, if the number of nodes to be selected with the largest count value in the Next chain is greater than 1, the second processing sub-module is further configured to:
merging the User chain of the node to be selected into the Next chain to obtain the length of each merged chain, and carrying out the priority scheduling on the node to be selected corresponding to the shortest chain length.
16. The instruction scheduling apparatus of claim 14, wherein the instruction scheduling apparatus further comprises: a fifth processing module; the fifth processing module is configured to:
the order in which the schedulable nodes in each directed acyclic graph complete scheduling is determined as the node ordering for each directed acyclic graph.
17. An electronic device, comprising:
a processor; the method comprises the steps of,
a memory for storing a computer program of the processor;
wherein the processor is configured to perform the instruction scheduling method of any one of claims 1 to 8 via execution of the computer program.
18. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the instruction scheduling method of any one of claims 1 to 8.
CN202110650043.4A 2021-06-10 2021-06-10 Instruction scheduling method, device, equipment and storage medium Active CN113296788B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110650043.4A CN113296788B (en) 2021-06-10 2021-06-10 Instruction scheduling method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113296788A CN113296788A (en) 2021-08-24
CN113296788B (en) 2024-04-12

Family

ID=77328054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110650043.4A Active CN113296788B (en) 2021-06-10 2021-06-10 Instruction scheduling method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113296788B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114548655A (en) * 2022-01-04 2022-05-27 国能黄骅港务有限责任公司 Method, device, equipment and computer readable storage medium for scheduling bulk cargo wharf
CN117827287A (en) * 2022-09-29 2024-04-05 深圳市中兴微电子技术有限公司 Instruction-level parallel scheduling method and device, electronic equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
US5894576A (en) * 1996-11-12 1999-04-13 Intel Corporation Method and apparatus for instruction scheduling to reduce negative effects of compensation code
CN102830954A (en) * 2012-08-24 2012-12-19 北京中科信芯科技有限责任公司 Method and device for instruction scheduling
CN105843660A (en) * 2016-03-21 2016-08-10 同济大学 Code optimization scheduling method for encoder
CN110333857A (en) * 2019-07-12 2019-10-15 辽宁工程技术大学 A kind of custom instruction automatic identifying method based on constraint planning

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10685331B2 (en) * 2015-12-08 2020-06-16 TCL Research America Inc. Personalized FUNC sequence scheduling method and system

Also Published As

Publication number Publication date
CN113296788A (en) 2021-08-24

Similar Documents

Publication Publication Date Title
JP5733860B2 (en) Efficient parallel computation of dependency problems
JP6605573B2 (en) Parallel decision tree processor architecture
CN107871301B (en) Task scheduling in a GPU
EP3066560B1 (en) A data processing apparatus and method for scheduling sets of threads on parallel processing lanes
JP4965995B2 (en) Program processing method, processing program, and information processing apparatus
JP2011527788A5 (en)
JP2004302706A (en) Program parallelization device, program parallelization method, and program parallelization program
EP3908920B1 (en) Optimizing hardware fifo instructions
CN113296788B (en) Instruction scheduling method, device, equipment and storage medium
EP3534266B1 (en) Method, apparatus and system for prefetching data
US9043582B2 (en) Enhanced instruction scheduling during compilation of high level source code for improved executable code
EP4040295A1 (en) Memory bandwidth allocation for multi-tenant fpga cloud infrastructures
Haaß et al. Automatic custom instruction identification in memory streaming algorithms
CN117539548A (en) Method, device, equipment and storage medium for executing instruction based on wait mechanism
US20220300322A1 (en) Cascading of Graph Streaming Processors
US20140013312A1 (en) Source level debugging apparatus and method for a reconfigurable processor
CN113704687B (en) Tensor calculation operation method, device and operation system
JP2008250838A (en) Software generation device, method and program
US9934035B2 (en) Device and method for tracing updated predicate values
Xiao et al. Optimization on operation sorting for HLS scheduling algorithms
CN115004150A (en) Method and apparatus for predicting and scheduling duplicate instructions in software pipelining loops
CN118642826A (en) API calling method and device
CN117313595B (en) Random instruction generation method, equipment and system for function verification
CN116804915B (en) Data interaction method, processor, device and medium based on memory
CN117591242B (en) Compiling optimization method, system, storage medium and terminal based on bottom virtual machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant