WO2024001411A1

WO2024001411A1 - Multi-thread scheduling method and device

Info

Publication number: WO2024001411A1
Application number: PCT/CN2023/087477
Authority: WO
Inventors: 沈洋; 徐金林; 牛新伟; 韩建辉; 李铮
Original assignee: 深圳市中兴微电子技术有限公司
Priority date: 2022-06-27
Filing date: 2023-04-11
Publication date: 2024-01-04
Also published as: CN117331655A

Abstract

Embodiments of the present application provide a multi-thread scheduling method and device. The method comprises: after each message enters a processor core, sequentially storing a thread number carried by each message in a thread management linked list corresponding to a thread group to which the message belongs, and establishing a mapping relationship between the thread number and a node of the thread management linked list; and according to the mapping relationship and the state of a thread state machine corresponding to each thread, scheduling a target thread in an executable state from the thread group according to the order in which the messages enter the processor core, and inputting the target thread into a pipeline corresponding to the target thread.

Description

Multi-thread scheduling method and device

Related applications

This application claims priority to the Chinese patent application with application number 202210738293.8 filed on June 27, 2022, the entire content of which is incorporated into this application by reference.

Technical field

The embodiments of the present application relate to the technical field of core network processors, and specifically, to a multi-thread scheduling method and device.

Background art

With the rapid development of communication technology, network processors, as the core component of data forwarding in the field of digital communication, are specifically used in various tasks in the communication field such as packet processing, protocol analysis, route lookup, voice/data aggregation, and firewalls.

In order to adapt to ever-evolving network technology, increasingly high requirements are placed on the processing capabilities of network processors. Traditional network processors adopt a fine-grained multi-thread structure. While using parallel processing technology to improve the parallelism of data processing in the micro-engine core, they also use multi-thread switching to hide pipeline and memory latency, thereby improving processor throughput. Fine-grained multi-threading switches between threads every clock cycle, so that the instruction execution of multiple threads is interleaved. This interleaving usually uses the Round-Robin polling scheduling algorithm to schedule ready threads according to their serial numbers, which cannot guarantee that the packets that enter the micro-engine first are forwarded first, in the order in which the packets enter the micro-engine. As a result, a thread that is ready and has no stall may be delayed by the execution of other threads, which slows down the execution of individual threads.
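
As a non-limiting illustration of this ordering problem, the following Python sketch compares the service order produced by round-robin polling over thread serial numbers with the order in which packets arrived. The function name, the thread count of 4, and the arrival pattern are assumptions chosen for the example; they are not taken from the related technology.

```python
def round_robin_order(ready_threads, n_threads):
    """Return the order in which round-robin polling serves ready threads.

    Polling starts at thread 0 and always continues from the thread that
    was served last, so the result depends only on thread serial numbers.
    """
    ready = set(ready_threads)
    order = []
    last = -1
    while ready:
        for step in range(1, n_threads + 1):
            candidate = (last + step) % n_threads
            if candidate in ready:
                ready.remove(candidate)
                order.append(candidate)
                last = candidate
                break
        else:
            break  # remaining numbers are outside 0..n_threads-1
    return order


# Hypothetical case: packets arrived on threads 3, 0 and 2, in that order.
arrival_order = [3, 0, 2]
service_order = round_robin_order(arrival_order, n_threads=4)
print(service_order)  # [0, 2, 3]: thread 3 arrived first but is served last
```

In this example the packet carried by thread 3 entered first but is served last, because the polling order follows serial numbers rather than arrival order.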

Summary of the invention

Embodiments of the present application provide a multi-thread scheduling method and device to at least solve the problem in related technologies that it is impossible to ensure that messages entering the micro engine first are forwarded first in the order in which the messages enter the micro engine.

According to an embodiment of the present application, a multi-thread scheduling method is provided, including: after each message enters the processor core, sequentially storing the thread number carried by each message in the thread management linked list corresponding to the thread group to which the message belongs, and establishing a mapping relationship between the thread number and a node of the thread management linked list;

According to the mapping relationship and the state of the thread state machine corresponding to each thread, scheduling the target thread in the executable state from the thread group in the order in which messages enter the processor core, and inputting the target thread into the pipeline corresponding to the target thread.

According to another embodiment of the present application, a multi-thread scheduling device is provided, including: an establishment module configured to sequentially store, after each message enters the processor core, the thread number carried by each message in the thread management linked list corresponding to the thread group to which the message belongs, and to establish a mapping relationship between the thread number and a node of the thread management linked list; and a scheduling module configured to schedule, according to the mapping relationship and the state of the thread state machine corresponding to each thread, the target thread in the executable state from the thread group in the order in which messages enter the processor core, and to input the target thread into the pipeline corresponding to the target thread.

According to yet another embodiment of the present application, a computer-readable storage medium is also provided, wherein a computer program is stored in the computer-readable storage medium, and the computer program is configured to execute the steps in any of the above method embodiments when running.

According to yet another embodiment of the present application, an electronic device is also provided, including a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.

Through this application, the thread scheduling method is optimized by introducing a thread management linked list, which ensures that messages that enter the processor core first are scheduled and executed first. This solves the problem in the related technology that it cannot be guaranteed that the packets that enter the processor core first are forwarded first, in the order in which they enter the processor core, thereby reducing the execution delay of the packets.

Brief description of the drawings

Figure 1 is a hardware structure block diagram of a computer terminal running the multi-thread scheduling method according to the embodiment of the present application;

Figure 2 is a flow chart of a multi-thread scheduling method according to an embodiment of the present application;

Figure 3 is a structural block diagram of a multi-thread scheduling device according to an embodiment of the present application;

Figure 4 is a structural block diagram of a multi-thread scheduling device according to another embodiment of the present application;

Figure 5 is a structural block diagram of a multi-thread scheduling device according to yet another embodiment of the present application;

Figure 6 is a schematic structural diagram of a coarse-grained multi-thread scheduling device according to an embodiment of the present application;

Figure 7 is a schematic diagram corresponding to threads and thread management linked lists according to an embodiment of the present application;

Figure 8 is a schematic diagram of thread state switching according to an embodiment of the present application;

Figure 9 is a flowchart for executing coarse-grained multi-thread scheduling according to an embodiment of the present application.

Detailed description

The embodiments of the present application will be described in detail below with reference to the accompanying drawings and in combination with the embodiments.

It should be noted that the terms "first", "second", etc. in the description and claims of this application and the above-mentioned drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

The method embodiments provided in the embodiments of this application can be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking running on a computer terminal as an example, Figure 1 is a hardware structure block diagram of a computer terminal running the multi-thread scheduling method according to the embodiment of the present application. As shown in Figure 1, the computer terminal may include one or more (only one is shown in Figure 1) processors 102 (the processor 102 may include but is not limited to a processing device such as a microprocessor (Microcontroller Unit, MCU) or a programmable logic device (Field Programmable Gate Array, FPGA)) and a memory 104 for storing data, wherein the above-mentioned computer terminal may also include a transmission device 106 for communication functions and an input and output device 108. Persons of ordinary skill in the art can understand that the structure shown in Figure 1 is only illustrative and does not limit the structure of the above-mentioned computer terminal. For example, the computer terminal may also include more or fewer components than shown in Figure 1, or have a different configuration than shown in Figure 1.

The memory 104 can be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the multi-thread scheduling method in the embodiment of the present application. By running the computer program stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the above method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely relative to the processor 102, and these remote memories may be connected to the computer terminal through a network. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

The transmission device 106 is used to receive or send data via a network. Specific examples of the above-mentioned network may include a wireless network provided by a communication provider of the computer terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC for short), which can be connected to other network devices through a base station to communicate with the Internet. In one example, the transmission device 106 may be a radio frequency (Radio Frequency, RF for short) module, which is used to communicate with the Internet wirelessly.

This embodiment provides a multi-thread scheduling method running on the above-mentioned computer terminal. Figure 2 is a flow chart of the multi-thread scheduling method according to the embodiment of the present application. As shown in Figure 2, the process includes the following steps:

Step S202: After each message enters the processor core, the thread number carried by each message is sequentially stored in the thread management linked list corresponding to the thread group to which the message belongs, and a mapping relationship between the thread number and a node of the thread management linked list is established;

Step S204: According to the mapping relationship and the state of the thread state machine corresponding to each thread, the target thread in the executable state is scheduled from the thread group in the order in which messages enter the processor core, and the target thread is input into the pipeline corresponding to the target thread.
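
As a hedged illustration of step S202 only, the following Python sketch stores each arriving thread number in the first empty node and records which node holds it. Modeling the linked list as a fixed-length Python list, using a dictionary for the mapping relationship, and the group size of 4 are all assumptions made for the example; the embodiments describe the mapping as a bitmap.

```python
def store_thread(nodes, node_of, thread_no):
    """Store thread_no in the first empty node and record the mapping.

    nodes   : fixed-length list, None marks an empty node (assumed model)
    node_of : dict mapping thread number -> node index (assumed model)
    """
    idx = nodes.index(None)  # first empty node; ValueError if the list is full
    nodes[idx] = thread_no
    node_of[thread_no] = idx
    return idx


nodes = [None] * 4   # hypothetical thread group with 4 threads
node_of = {}
for thread_no in (7, 2, 5):   # thread numbers in packet-arrival order
    store_thread(nodes, node_of, thread_no)

print(nodes)     # [7, 2, 5, None]
print(node_of)   # {7: 0, 2: 1, 5: 2}
```

Because earlier arrivals occupy lower-numbered nodes, the node index alone is enough to recover the order in which the messages entered.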

Before step S202 in this embodiment, the method further includes: assigning a thread number to each message entering the processor core, and dividing all threads into thread groups corresponding to the number of pipelines.
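
The grouping can be sketched as follows. This is only an assumed partitioning rule (contiguous, equally sized ranges of thread numbers); the embodiments state that the number of thread groups corresponds to the number of pipelines but do not prescribe how individual thread numbers are assigned to groups. The function name is hypothetical.

```python
def divide_into_groups(num_threads, num_pipelines):
    """Split thread numbers 0..num_threads-1 into one group per pipeline."""
    if num_threads % num_pipelines != 0:
        raise ValueError("this sketch assumes equally sized groups")
    per_group = num_threads // num_pipelines
    return [
        list(range(g * per_group, (g + 1) * per_group))
        for g in range(num_pipelines)
    ]


# 20 threads and 2 pipelines give 2 groups of 10 threads each.
groups = divide_into_groups(20, 2)
print(len(groups), len(groups[0]))  # 2 10
```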

In step S202 of this embodiment, each thread group corresponds to a thread management linked list, and the number of nodes in each thread management linked list is the same as the number of threads included in each thread group.

In step S202 of this embodiment, the mapping relationship between the nodes of the thread management linked list and the thread numbers is represented by a bitmap.

Step S204 of this embodiment includes: calculating the launch request corresponding to each node according to the value of the bitmap and the readiness status of each thread in the thread group, and performing priority scheduling on the threads with a launch request, so that the thread that entered the processor core first and is in the ready state is authorized and converted to the executable state, and the thread in the executable state is scheduled as the target thread; and obtaining the instruction corresponding to the target thread, and inputting the target thread into the pipeline corresponding to the target thread to execute the instruction.
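
One possible reading of this calculation is sketched below in Python. It assumes that each node's bitmap value is a one-hot bit vector identifying the thread stored in that node (0 for an empty node) and that the readiness status is a bit vector with one bit per thread; neither encoding is specified in the embodiments, and the function names are hypothetical.

```python
def launch_requests(node_bitmaps, rdy):
    """Return one flag per node: True when the node's thread is ready."""
    return [(bitmap & rdy) != 0 for bitmap in node_bitmaps]


def grant(node_bitmaps, rdy):
    """Pick the lowest-numbered node with a request (earliest arrival)."""
    for node, request in enumerate(launch_requests(node_bitmaps, rdy)):
        if request:
            return node
    return None


# Nodes 0..2 hold threads 4, 1 and 3 in arrival order; node 3 is empty.
node_bitmaps = [1 << 4, 1 << 1, 1 << 3, 0]
rdy = (1 << 1) | (1 << 3)   # threads 1 and 3 are ready, thread 4 is not

granted_node = grant(node_bitmaps, rdy)
granted_thread = node_bitmaps[granted_node].bit_length() - 1
print(granted_node, granted_thread)  # 1 1
```

Thread 4 arrived first but is not ready, so the authorization goes to thread 1, the earliest arrival that is in the ready state.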

In this embodiment, after the thread is launched into the corresponding pipeline to execute the instruction, the method further includes: scheduling the message whose instruction has finished executing out of the processor core, and releasing the thread corresponding to the message; and clearing the thread number of the message from the node of the thread management linked list, and moving the other thread numbers stored in the nodes of the thread management linked list forward by one node in sequence.
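
The clear-and-shift operation can be illustrated with the following minimal Python sketch, which again models the linked list as a fixed-length list with None marking an empty node. The list model and the function name are assumptions made for the example.

```python
def release(nodes, thread_no):
    """Clear thread_no from its node and move later entries forward by one."""
    idx = nodes.index(thread_no)  # raises ValueError if thread_no is absent
    for i in range(idx, len(nodes) - 1):
        nodes[i] = nodes[i + 1]
    nodes[-1] = None              # the last node is always empty afterwards
    return nodes


nodes = [4, 1, 3, None]   # threads 4, 1, 3 in arrival order
release(nodes, 1)
print(nodes)              # [4, 3, None, None]
```

After the shift, the remaining thread numbers keep their relative arrival order and occupy the lowest-numbered nodes.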

In this embodiment, each pipeline corresponds to a main control state machine, and each thread corresponds to a thread state machine. Each pipeline transitions between two states, idle and authorized, and each thread transitions between four states: idle, ready, executable, and waiting.

In this embodiment, each pipeline transitioning between the two states of idle and authorized, and each thread transitioning between the four states of idle, ready, executable, and waiting, includes: when the main control state machine is in the idle state, new messages are allowed to enter the processor core, and after a new message enters the processor core, the corresponding thread is in the idle state; after the thread number of the message is stored in a node of the thread management linked list and the instruction corresponding to the thread is fetched from the instruction storage module, the thread transfers from the idle state to the ready state; in the authorized state of the main control state machine, a thread in the ready state is authorized, and the authorized thread transfers from the ready state to the executable state; after the thread in the executable state finishes executing the corresponding instruction, it transfers from the executable state to the idle state; when the thread in the executable state has to wait for data, a table lookup, or an instruction re-fetch during the execution of the instruction, the thread transfers from the executable state to the waiting state; after the data wait of the thread in the waiting state ends, or the table lookup result is returned, or the re-fetched instruction is returned, the thread transfers from the waiting state back to the ready state; and after the thread number of the thread that has completed the instruction is released, the main control state machine enters the idle state.
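
The thread state transitions described above can be summarized, purely as an illustrative sketch, in a lookup table. The event names used as keys are hypothetical labels introduced for this example; only the four states and the direction of each transition are taken from the description.

```python
from enum import Enum


class ThreadState(Enum):
    IDLE = "idle"
    READY = "ready"
    EXECUTABLE = "executable"
    WAITING = "waiting"


# (current state, event) -> next state; event names are hypothetical.
TRANSITIONS = {
    (ThreadState.IDLE, "instruction_fetched"): ThreadState.READY,
    (ThreadState.READY, "authorized"): ThreadState.EXECUTABLE,
    (ThreadState.EXECUTABLE, "instruction_done"): ThreadState.IDLE,
    (ThreadState.EXECUTABLE, "long_wait"): ThreadState.WAITING,
    (ThreadState.WAITING, "wait_finished"): ThreadState.READY,
}


def step(state, event):
    """Return the next state, or raise if the event is not valid here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state.name}") from None


# Walk one thread through a run that stalls once before finishing.
state = ThreadState.IDLE
for event in ("instruction_fetched", "authorized", "long_wait",
              "wait_finished", "authorized", "instruction_done"):
    state = step(state, event)
print(state.name)  # IDLE
```

Keeping the transitions in a table makes it easy to see that a thread in the waiting state can only return to the ready state, where it must be authorized again before it executes.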

Through the above steps, the thread scheduling method is optimized by introducing a thread management linked list, which ensures that the packets that enter the processor core first are scheduled and executed first. This solves the problem in the related technology that it cannot be guaranteed that the packets entering the micro-engine first are forwarded first, in the order in which the packets enter the micro-engine, thereby reducing the execution delay of the packets.

Through the description of the above embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform. Of course, it can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a read-only memory/random access memory (Read-Only Memory/Random Access Memory, ROM/RAM), a magnetic disk, or an optical disk) and includes several instructions to cause a terminal device (which can be a mobile phone, a computer, a server, a network device, etc.) to execute the methods described in the various embodiments of this application.

This embodiment also provides a multi-thread scheduling device, which is used to implement the above embodiments and optional implementations. What has been described will not be described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.

Figure 3 is a structural block diagram of a multi-thread scheduling device according to an embodiment of the present application. As shown in Figure 3, the device includes: an establishment module 10 and a scheduling module 20.

The establishment module 10 is used to sequentially store, after each message enters the processor core, the thread number carried by each message in the thread management linked list corresponding to the thread group to which the message belongs, and to establish a mapping relationship between the thread number and a node of the thread management linked list;

The scheduling module 20 is configured to schedule, according to the mapping relationship and the state of the thread state machine corresponding to each thread, the target thread in the executable state from the thread group in the order in which messages enter the processor core, and to input the target thread into the pipeline corresponding to the target thread.

Figure 4 is a structural block diagram of a multi-thread scheduling device according to another embodiment of the present application. As shown in Figure 4, in addition to all the modules shown in Figure 3, the device also includes:

The allocation module 30 is configured to allocate a thread number to each message entering the processor core, and divide all threads into thread groups corresponding to the number of pipelines.

In an exemplary embodiment, each thread group corresponds to a thread management linked list, and the number of nodes in each thread management linked list is the same as the number of threads included in each thread group.

Figure 5 is a structural block diagram of a multi-thread scheduling device according to yet another embodiment of the present application. As shown in Figure 5, in addition to all the modules shown in Figure 4, the device also includes:

The release module 40 is configured to schedule the message for which execution of the instruction corresponding to the target thread has completed out of the processor core, release the thread corresponding to the message, clear the thread number of the message from the node of the thread management linked list, and move the other thread numbers stored in the nodes of the thread management linked list forward by one node in sequence.

In an exemplary embodiment, each pipeline corresponds to a main control state machine, and each thread corresponds to a thread state machine. Each pipeline transitions between two states, idle and authorized, and each thread transitions between four states: idle, ready, executable, and waiting.

It should be noted that each of the above modules can be implemented through software or hardware. For the latter, it can be implemented in the following manner, but is not limited to this: the above-mentioned modules are all located in the same processor; or, the above-mentioned modules are located in different processors in any combination.

Embodiments of the present application also provide a computer-readable storage medium that stores a computer program, wherein the computer program is configured to execute the steps in any of the above method embodiments when running.

In an exemplary embodiment, the computer-readable storage medium may include but is not limited to: a USB flash drive (U disk), read-only memory (Read-Only Memory, ROM for short), random access memory (Random Access Memory, RAM for short), a mobile hard disk, a magnetic disk, an optical disk, or other media that can store computer programs.

An embodiment of the present application also provides an electronic device, including a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.

In an exemplary embodiment, the above-mentioned electronic device may further include a transmission device and an input-output device, wherein the transmission device is connected to the above-mentioned processor, and the input-output device is connected to the above-mentioned processor.

For specific examples in this embodiment, reference may be made to the examples described in the above-mentioned embodiments and exemplary implementations, and details will not be described again in this embodiment.

In order to facilitate understanding of the technical solutions provided in this application, a detailed description will be given below with reference to embodiments of specific scenarios.

In related technologies, when a micro-engine receives a new message, it first allocates a thread number to the new message, and the priority of the message is not distinguished when the thread number is allocated. The traditional least recently used (Least Recently Used, LRU) scheduling algorithm can give the most frequently used threads the highest priority, but when the number of threads is large, it cannot guarantee that the packets that enter the processor core first are executed and forwarded first.

In order to solve the problem that messages that enter the processor core first cannot be executed first, an embodiment of the present application provides a coarse-grained multi-thread scheduling device. Figure 6 is a schematic structural diagram of a coarse-grained multi-thread scheduling device according to an embodiment of the present application. As shown in Figure 6, the multi-thread scheduling system includes: a thread scheduling module 11, an instruction storage module 12 and a completion scheduling module 13.

The thread scheduling module 11 is used to allocate thread numbers to new messages and divide all threads into thread groups corresponding to the number of pipelines. Each thread group schedules ready executable threads in the order in which the messages enter, obtains the instruction corresponding to the thread from the instruction storage module, and launches the thread into the pipeline corresponding to the thread group for execution. After the execution is completed, the completion scheduling module 13 is notified. In this implementation, the thread scheduling module 11 functionally includes the functions of the establishment module 10, the scheduling module 20 and the allocation module 30 in the above embodiments.

The instruction storage module 12 is used to store instructions used for thread execution, including an instruction level 2 cache and an instruction level 1 cache;

The completion scheduling module 13 is used to receive the message execution completion signal sent by the thread scheduling module, schedule the corresponding message out of the processor core, and release the thread number information. In this implementation, the completion scheduling module 13 is functionally equivalent to the release module 40 in the above embodiment.

Specifically, a thread management linked list is introduced in the thread scheduling module 11 to manage the thread number information in the order in which packets enter, and to schedule, from each thread group, the executable thread that belongs to the earliest-arriving message and is ready, and launch it into the corresponding pipeline for execution. Each thread group corresponds to one thread management linked list, and the number of linked list nodes is the same as the number of threads contained in each thread group. Figure 7 is a schematic diagram of the correspondence between threads and thread management linked lists according to an embodiment of the present application. As shown in Figure 7, 20 threads are divided into 2 thread groups, and each thread group contains 10 threads. The corresponding thread management linked list has 10 nodes, node0 to node9, which are used to store the thread number information assigned to incoming packets. After a message enters the processor core, the thread number information it carries is stored in the nodes node0 to node9 from left to right. There is a layer of mapping between the thread management linked list nodes and the thread numbers, and this mapping can be maintained through a bitmap: each thread management linked list has 10 bitmap values, one per node. The launch request corresponding to each node is calculated from the bitmap value and the readiness status (rdy) of each thread in the thread group, and the launch requests participate in priority scheduling (SP). The executable thread that belongs to the earliest-arriving message and is ready is authorized and launched into the pipeline corresponding to the thread group.
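
The embodiments do not state how the priority scheduling (SP) over the 10 per-node requests is realized. One common way to express a fixed-priority selection, shown here only as an assumed sketch, is to pack the requests into an integer (bit i standing for node i) and isolate the lowest set bit, so that node0 always wins over node1, and so on. The function name and the integer encoding are assumptions.

```python
NUM_NODES = 10  # node0..node9, as in the Figure 7 example


def sp_grant(request_vector):
    """Return the index of the lowest set bit (node0 has highest priority)."""
    request_vector &= (1 << NUM_NODES) - 1      # keep only node0..node9
    if request_vector == 0:
        return None                             # no node is requesting
    lowest = request_vector & -request_vector   # isolate the lowest set bit
    return lowest.bit_length() - 1


requests = 0b0000101000   # node3 and node5 raise a request
print(sp_grant(requests))  # 3
```

Because nodes are filled from left to right in arrival order, granting the lowest-numbered requesting node is equivalent to granting the earliest-arriving ready thread.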

After executing their respective instructions, the messages are scheduled out of the processor core and the corresponding threads are released. The corresponding thread number information is looked up in the thread management linked list, and the thread number information of the matching node is cleared; at the same time, the thread number information stored in all nodes to the right of the matching node is shifted one node to the left.

In this embodiment, the above two thread groups correspond to two pipelines respectively; the number of threads in each thread group can be 10 or any other number; and each pipeline can be a five-stage pipeline or a pipeline with another number of stages (e.g., seven stages).

In this embodiment, the thread scheduling module 11 can also control the state transition of each thread. Each pipeline corresponds to a main control state machine, and each thread corresponds to a thread state machine. As shown in Figure 8, the specific conversion includes the following steps:

First, when the main control state machine is in the idle (IDLE) state, new packets are allowed to enter. When a new packet enters the processor core, its thread is first in the IDLE state;

Second, an instruction fetch request is sent to the instruction storage module, the thread management linked list of the corresponding thread in Figure 7 is maintained according to the assigned thread number, and the thread number information is stored in the corresponding node. After the instruction returns from the instruction storage module, the corresponding thread transfers from the IDLE state to the rdy state;

Third, several threads in the rdy state in the same thread group undergo SP scheduling (GRANT) based on the packet arrival order mapped by the nodes of the thread management linked list, so that the thread of the earliest-arriving packet is authorized;

The authorized thread transfers from the rdy state to the running (exe) state. Only one thread in each thread group can be authorized at a time. The two thread groups can each schedule the executable thread of the earliest-arriving packet from their own group and launch it into the corresponding pipeline (pipeline 0 or pipeline 1) for execution;

Fourth, after the thread in the exe state executes the packet sending instruction, the corresponding thread transfers from the exe state to the idle state, the message is scheduled out of the processor core, the thread number information of the matching node in the thread management linked list is looked up and deleted, and the corresponding thread number is released;

Fifth, when a thread in the exe state, during instruction execution, encounters an instruction with a data dependency, or an instruction that depends on data returned by a table lookup, or needs to re-fetch instructions, or otherwise faces a long wait, the corresponding thread transfers from the exe state to the wait state;

Sixth, GRANT authorizes the thread of the earliest-arriving packet among the remaining threads in the rdy state to enter the exe state. After the data waiting period of the thread that previously transferred to the wait state is completed, or the table lookup data has been returned, or the re-fetched instruction has returned, that thread transfers from the wait state back to the rdy state. Since the thread management linked list saves its packet arrival order information, when the thread currently in the exe state transfers to another state, the thread that transferred from the wait state to the rdy state can still receive priority scheduling, until it finishes executing the packet sending instruction and transfers from the exe state to the idle state;

Seventh, after releasing the corresponding thread number, the main control state machine enters the IDLE state.
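
The effect of the steps above, in particular that a thread returning from the wait state regains its place ahead of later arrivals, can be illustrated with the following small simulation. It is a sketch under simplifying assumptions: the instruction fetch is treated as already returned when a packet arrives, the arrival-ordered list replaces the bitmap, and the class and method names are hypothetical.

```python
class Group:
    """One thread group with an arrival-ordered list of thread numbers."""

    def __init__(self):
        self.order = []   # thread numbers in packet-arrival order
        self.state = {}   # thread number -> "rdy", "exe" or "wait"

    def arrive(self, thread_no):
        self.order.append(thread_no)
        self.state[thread_no] = "rdy"   # fetch assumed already returned

    def grant(self):
        # Only one thread per group may be in the exe state at a time.
        if "exe" in self.state.values():
            return None
        for thread_no in self.order:    # earliest arrival first
            if self.state[thread_no] == "rdy":
                self.state[thread_no] = "exe"
                return thread_no
        return None

    def stall(self, thread_no):
        self.state[thread_no] = "wait"  # e.g. table lookup outstanding

    def resume(self, thread_no):
        self.state[thread_no] = "rdy"   # lookup result has returned

    def finish(self, thread_no):
        self.order.remove(thread_no)    # later entries move forward
        del self.state[thread_no]


group = Group()
for thread_no in (0, 1, 2):
    group.arrive(thread_no)

trace = [group.grant()]      # thread 0 runs first
group.stall(0)               # thread 0 waits for a table lookup
trace.append(group.grant())  # thread 1 runs in the meantime
group.resume(0)              # lookup returns while thread 1 is running
group.finish(1)
trace.append(group.grant())  # thread 0 is granted again, ahead of thread 2
group.finish(0)
trace.append(group.grant())  # thread 2 runs last
print(trace)                 # [0, 1, 0, 2]
```

Thread 2 is never granted before thread 0 completes, even though thread 0 spent time in the wait state, because the grant always scans from the earliest arrival.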

Figure 9 is a flow chart for executing coarse-grained multi-thread scheduling according to an embodiment of the present application. As shown in Figure 9, when a new packet enters, it is first determined whether the thread group is in the IDLE state. If not, the new packet must wait until an idle thread is available for allocation; if so, a thread i is selected from the idle threads and assigned to the incoming message. An instruction fetch request is then sent to the instruction storage module. After the instruction fetch returns, thread i transfers from the IDLE state to the rdy state. Several threads in the rdy state in the same thread group undergo SP scheduling (GRANT), so that the thread of the earliest-arriving packet obtains the authorization GRANTi. After obtaining authorization, thread i transfers to the exe state. When thread i in the exe state, during instruction execution, encounters an instruction with a data dependency, or an instruction that depends on data returned by a table lookup, or needs to re-fetch instructions, or otherwise faces a long wait, thread i transfers to the wait state. After the data waiting period is completed, or the table lookup data has been returned, or the re-fetched instruction has returned, thread i re-enters the rdy state from the wait state. Because SP scheduling based on the packet arrival order is used, once the thread currently in the exe state transfers to another state, thread i can receive priority scheduling, until it finishes executing the packet sending instruction, transfers to the idle state, and releases the thread.
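
The first decision in this flow, assigning an idle thread or making the packet wait, can be sketched as a simple allocator. The class, its method names, and the choice to serve waiting packets in first-in-first-out order are assumptions for the example; the embodiments only state that a new packet waits until an idle thread is available.

```python
from collections import deque


class ThreadAllocator:
    """Hands idle thread numbers to incoming packets (illustrative only)."""

    def __init__(self, thread_numbers):
        self.free = deque(thread_numbers)   # idle thread numbers
        self.pending = deque()              # packets waiting for a thread
        self.assigned = {}                  # packet id -> thread number

    def packet_in(self, packet_id):
        if self.free:
            self.assigned[packet_id] = self.free.popleft()
            return self.assigned[packet_id]
        self.pending.append(packet_id)      # no idle thread: packet waits
        return None

    def release(self, packet_id):
        thread_no = self.assigned.pop(packet_id)
        if self.pending:
            # Assumed policy: the oldest waiting packet gets the freed thread.
            self.assigned[self.pending.popleft()] = thread_no
        else:
            self.free.append(thread_no)
        return thread_no


alloc = ThreadAllocator([0, 1])
print(alloc.packet_in("p0"))  # 0
print(alloc.packet_in("p1"))  # 1
print(alloc.packet_in("p2"))  # None: both threads are busy, p2 waits
alloc.release("p0")           # thread 0 is freed and handed to p2
print(alloc.assigned["p2"])   # 0
```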

In this embodiment, the main control state machine being in the IDLE state indicates that new packets are allowed to enter. When a new packet enters the processor core, its thread is first in the idle state; an instruction fetch request is sent to the instruction storage module, and, according to the assigned thread number, the thread number information is stored in the corresponding node of the thread management linked list of the corresponding thread group, as maintained in Figure 7. After the instruction returns from the instruction storage module, the corresponding thread transfers from the idle state to the rdy state. The threads in the rdy state in the same thread group undergo SP scheduling (GRANT) according to the packet entry order obtained from the mapping between threads and nodes in the thread management linked list, so that the thread of the packet that entered first is authorized. The authorized thread transfers from the rdy state to the exe state. Only one thread in each thread group can be authorized at a time. The two thread groups can each schedule, from their own group, the executable thread of the packet that entered first and launch it into the corresponding pipeline (pipeline 0 or pipeline 1) for execution. When a thread in the exe state finishes executing the packet sending instruction, it transfers from the exe state to the idle state, the packet is scheduled out of the processor core, the thread number information of the matching node in the thread management linked list is looked up and deleted, and the corresponding thread number is released.

When a thread in the exe state encounters, during instruction execution, an instruction with a data dependency, or an instruction that depends on data returned by a table lookup, or when instructions need to be re-fetched and a long wait is required, the thread transfers from the exe state to the wait state, and GRANT authorizes, among the remaining threads in the rdy state, the thread of the packet that entered first to enter the exe state. Once the data waiting period of the thread previously transferred to the wait state is completed, or the table lookup data has been returned, or the re-fetched instruction has returned, that thread transfers from the wait state back to the rdy state. Because the thread management linked list preserves the packet entry order, when the thread currently in the exe state transfers to another state, the thread that returned from the wait state to the rdy state can still receive priority scheduling until it finishes executing the packet sending instruction; the thread then transfers from the exe state to the idle state, and the main control state machine enters the IDLE state.
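The interaction between the entry-ordered thread list and GRANT can be modeled in software as follows. This is a minimal sketch under stated assumptions, not the hardware design of the application: a Python list stands in for the thread management linked list, and the class name `ThreadGroup` and its method names are hypothetical.

```python
class ThreadGroup:
    """One thread group feeding one pipeline (illustrative model only)."""

    def __init__(self, num_threads):
        self.state = {t: "idle" for t in range(num_threads)}
        # Thread numbers in packet entry order, oldest first. A Python list
        # stands in for the thread management linked list.
        self.order = []

    def admit(self):
        """Assign a free thread to a new packet; return None if none is free."""
        for t, s in self.state.items():
            if s == "idle" and t not in self.order:
                self.order.append(t)
                return t
        return None

    def fetch_returned(self, t):
        self._move(t, "idle", "rdy")

    def grant(self):
        """Authorize the oldest rdy thread; at most one exe thread per group."""
        if "exe" in self.state.values():
            return None
        for t in self.order:
            if self.state[t] == "rdy":
                self.state[t] = "exe"
                return t
        return None

    def stall(self, t):
        self._move(t, "exe", "wait")

    def resume(self, t):
        self._move(t, "wait", "rdy")

    def release(self, t):
        self._move(t, "exe", "idle")
        self.order.remove(t)  # later entries move forward by one position

    def _move(self, t, src, dst):
        if self.state[t] != src:
            raise ValueError(f"thread {t} is {self.state[t]}, expected {src}")
        self.state[t] = dst


group = ThreadGroup(num_threads=4)
a, b, c = group.admit(), group.admit(), group.admit()  # packets enter in this order
for t in (a, b, c):
    group.fetch_returned(t)
first = group.grant()    # thread a: its packet entered first
group.stall(a)           # a waits, for example on a table lookup
second = group.grant()   # thread b runs while a waits
group.resume(a)          # a is rdy again
group.stall(b)
third = group.grant()    # thread a again, ahead of the younger thread c
```

In this model a thread that returns from the wait state regains priority over younger ready threads as soon as the executing thread leaves the exe state, which mirrors the behavior described above.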

Through the above embodiments of the present application, the coarse-grained multi-thread scheduling method ensures that the packets that enter the processor core first are scheduled first, and execution switches to other threads only when a costly stall occurs (such as an instruction re-fetch or a table lookup). This greatly reduces the possibility of slowing down the execution of any packet and reduces the execution delay of any packet.
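The difference from the Round-Robin selection mentioned in the background can be illustrated by comparing the two selection rules on the same set of ready threads. The function names and the example values below are hypothetical and serve only as a sketch; they are not taken from the application.

```python
def pick_round_robin(ready, last_granted, num_threads):
    """Pick the next ready thread by serial number after the last granted one."""
    for offset in range(1, num_threads + 1):
        t = (last_granted + offset) % num_threads
        if t in ready:
            return t
    return None


def pick_entry_order(ready, entry_order):
    """Pick the ready thread whose packet entered the processor core first."""
    for t in entry_order:
        if t in ready:
            return t
    return None


# Packets entered on threads 3, 0, 2 (in that order); all three are ready,
# and thread 3 was the last one granted before it stalled.
ready = {0, 2, 3}
rr_choice = pick_round_robin(ready, last_granted=3, num_threads=4)
sp_choice = pick_entry_order(ready, entry_order=[3, 0, 2])
```

Here the Round-Robin rule moves on to thread 0, whereas the entry-order rule returns to thread 3, whose packet entered first.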

Obviously, those skilled in the art should understand that the above-mentioned modules or steps of the present application can be implemented using general-purpose computing devices, and they can be concentrated on a single computing device or distributed across a network composed of multiple computing devices. They may be implemented in program code executable by a computing device, such that they may be stored in a storage device for execution by the computing device, and in some cases the steps may be executed in a sequence different from that shown herein. Alternatively, they may be made into individual integrated circuit modules respectively, or multiple modules or steps among them may be made into a single integrated circuit module. As such, the application is not limited to any specific combination of hardware and software.

The above are only optional embodiments of the present application and are not intended to limit the present application. For those skilled in the art, the present application may have various modifications and changes. Any modifications, equivalent replacements, improvements, etc. made within the principles of this application shall be included in the protection scope of this application.

Claims

A multi-thread scheduling method including:

After each message enters the processor core, sequentially storing the thread number carried by each message in the thread management linked list corresponding to the thread group to which the message belongs, and establishing a mapping relationship between the thread number and a node of the thread management linked list;

According to the mapping relationship and the state of the thread state machine corresponding to each thread, scheduling the target thread in the executable state from the thread group in the order in which messages enter the processor core, and inputting the target thread into the pipeline corresponding to the target thread.
The method according to claim 1, wherein before sequentially storing the thread number carried by each message in the thread management linked list corresponding to the thread group to which the message belongs, the method further includes:

Each message entering the processor core is assigned a thread number, and all threads are divided into thread groups corresponding to the number of pipelines.
The method according to claim 1, wherein each thread group corresponds to a thread management linked list, and the number of nodes in each thread management linked list is the same as the number of threads contained in each thread group.
The method according to claim 1, wherein the mapping relationship between the nodes of the thread management linked list and the thread number is represented by a bitmap.
The method according to claim 4, wherein scheduling, according to the mapping relationship and the state of the thread state machine corresponding to each thread, the target thread in the executable state from the thread group in the order in which messages enter the processor core, and inputting the target thread into the pipeline corresponding to the target thread, includes:

Calculating the launch request corresponding to each node based on the value of the bitmap and the ready state of each thread in the thread group;

Performing priority scheduling on the threads with launch requests, so that the thread that entered the processor core first and is in the ready state is authorized and converted to the executable state, and scheduling the thread in the executable state as the target thread;

Obtaining the instruction corresponding to the target thread, and inputting the target thread into the pipeline corresponding to the target thread to execute the instruction.
The method according to claim 5, wherein after inputting the target thread into the pipeline corresponding to the target thread to execute the instruction, it further includes:

Scheduling the message for which the instruction has been executed out of the processor core, and releasing the thread corresponding to the message;

Clearing the thread number of the message from the node of the thread management linked list, and moving the other thread numbers stored in the nodes of the thread management linked list forward by one node in sequence.
The method of claim 1, wherein,

Each pipeline corresponds to a main control state machine, and each thread corresponds to a thread state machine. Each pipeline transitions between idle and authorized states, and each thread transitions between idle, ready, executable and waiting states.
The method according to claim 7, wherein each pipeline transitioning between the two states of idle and authorized, and each thread transitioning between the four states of idle, ready, executable and waiting, includes:

When the main control state machine is in the idle state, it means that new messages are allowed to enter the processor core. After the new messages enter the processor core, the corresponding thread is in the idle state;

After the thread number of the message is stored in the node of the thread management linked list and the instruction corresponding to the thread is retrieved from the instruction storage module, the thread transfers from the idle state to the ready state;

When the main control state machine is in the authorized state, a thread in the ready state is authorized, and the authorized thread transfers from the ready state to the executable state;

The thread in the executable state transitions from the executable state to the idle state after executing the corresponding instructions;

When a thread in the executable state is waiting for data, waiting for table lookup, or re-fetching instructions during the execution of instructions, the thread will be transferred from the executable state to the waiting state;

The thread in the waiting state transfers from the waiting state to the ready state again after the data wait ends, or the table lookup result returns, or the re-fetched instruction returns;

After the thread number of the thread that has completed the execution of the instruction is released, the main control state machine enters the idle state.
A multi-thread scheduling device, including:

An establishing module, configured to store, after each message enters the processor core, the thread number carried by each message in the thread management linked list corresponding to the thread group to which the message belongs, and to establish a mapping relationship between the thread number and a node of the thread management linked list;

A scheduling module, configured to schedule, according to the mapping relationship and the state of the thread state machine corresponding to each thread, the target thread in the executable state from the thread group in the order in which messages enter the processor core, and to input the target thread into the pipeline corresponding to the target thread.
The device of claim 9, further comprising:

An allocation module is configured to allocate a thread number to each message entering the processor core, and divide all threads into thread groups corresponding to the number of pipelines.
The device according to claim 10, wherein each thread group corresponds to a thread management linked list, and the number of nodes in each thread management linked list is the same as the number of threads included in each thread group.
The device of claim 10, further comprising:

A release module, configured to schedule the message for which the instruction corresponding to the target thread has been executed out of the processor core, release the thread corresponding to the message, clear the thread number of the message from the node of the thread management linked list, and move the other thread numbers stored in the nodes of the thread management linked list forward by one node in sequence.
The device according to claim 10, wherein each pipeline corresponds to a main control state machine, each thread corresponds to a thread state machine, each pipeline transitions between two states, idle and authorized, and each thread transitions between four states: idle, ready, executable and waiting.
A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and wherein when the computer program is executed by a processor, the steps of the method described in any one of claims 1 to 8 are implemented.
An electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method described in any one of claims 1 to 8 when executing the computer program.