CN115878550A - Data processing method, chip, equipment and system - Google Patents
- Publication number
- CN115878550A (application number CN202111152703.2A)
- Authority
- CN
- China
- Prior art keywords
- processor core
- data processing
- processing request
- thread
- data
- Legal status
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the application disclose a data processing method for reducing data processing latency. The method is applied to a data processing system that includes a first device and a second device. The first device sends a data processing request to the second device, and information carried in the request lets the second device determine whether the request is to be executed on a first processor core or on a second processor core. When the information indicates the first processor core, the second device schedules the first processor core to execute the request; when the information indicates the second processor core, the second device schedules the second processor core to execute it.
Description
Technical Field
The embodiments of the application relate to the field of communications, and in particular to a data processing method, a chip, a device, and a system.
Background
Remote Direct Memory Access (RDMA) is a technique created to address device-internal data processing latency in network transmission. RDMA transfers data directly from a user application into a storage area of a device and moves it quickly over the network into the memory of another device. This eliminates multiple data copy operations during the transfer, requires no intervention from the operating systems of either device, and reduces the load on each device's Central Processing Unit (CPU).
However, for some kinds of data processing, for example Key-Value data access in a relational database, using RDMA may require multiple RDMA accesses, which results in high data access latency. In another technique, a device calls the CPU of a server through a Remote Procedure Call (RPC) to process data (e.g., to access Key-Value data). In this technique, one device sends a request to another device; the receiving device typically fetches the request by running multiple polling (poll) threads on its CPU and then invokes an execution thread on the CPU to execute the request. Although this scheme avoids the multiple network accesses that direct RDMA would incur, the overhead of the poll threads on the CPU is large, which reduces the CPU capacity left for execution threads and therefore slows down data processing.
Disclosure of Invention
The embodiment of the application provides a data processing method, a chip, equipment and a system, which are used for reducing processing time delay.
A first aspect of the embodiments of the present application provides a data processing method applied to a data processing system that includes a first device and a second device. The method includes: the second device receives a data processing request sent by the first device, where the second device includes a processor comprising a first processor core and a second processor core, and the processing capability of the first processor core is greater than that of the second processor core; and the second device determines, according to information carried in the data processing request, whether to dispatch the data processing request to the first processor core or to the second processor core for processing.
In the first aspect, the information carried in the data processing request lets the second device determine whether the request sent by the first device is executed on its first processor core or on its second processor core. A suitable processor core can therefore be selected according to the type of the data processing request, which increases the speed of data processing.
In a possible implementation, the second processor core runs a polling thread and a scheduling thread, and the receiving, by the second device, of the data processing request sent by the first device specifically includes: the polling thread obtains, by polling, the data processing request sent by the first device from a receive queue of the second device; and the polling thread sends the data processing request to the scheduling thread.
In this implementation, the polling thread and the scheduling thread run on the second processor core. Because the computing capability of the second processor core is lower, the overhead of running them there is correspondingly small. Even when many polling threads are running, the first processor core's processing of data processing requests is therefore not affected, which reduces processing latency.
In a possible implementation, the first processor core and the second processor core each include an execution thread, and the determining, by the second device according to the information carried in the data processing request, of whether to dispatch the request to the first processor core or to the second processor core specifically includes: the scheduling thread in the second processor core determines, according to the information carried in the request, whether to schedule the request to the execution thread in the first processor core or to the execution thread in the second processor core for processing.
In one possible embodiment, the data processing request is a database access request for accessing a database in the second device, and the database access request includes any one of the following: data write requests, data read requests, data update requests, data delete requests, file lock requests, data retrieval requests.
In this implementation, the data processing request may be a data write request, a data read request, a data update request, a data delete request, a file lock request, a data retrieval request, a request to use a remote resource, or the like.
In a possible implementation, the information carried in the data processing request includes a function identifier, and the determining, by the second device according to that information, of whether to dispatch the request to the first processor core or to the second processor core specifically includes: the second device determines, according to the function identifier and preset information stored in the second device, whether to dispatch the data processing request to the first processor core or to the second processor core for processing.
In this implementation, the second device stores preset information that is combined with the function identifier to decide whether the request is executed on the first processor core or on the second processor core. The preset information may be registration information that records the identifiers of functions requiring low computing power; the second device uses it to distinguish data processing requests with different computing-power requirements and to process them on the first or the second processor core accordingly. Because the first device does not need to be modified, cost is reduced.
In a possible implementation manner, before the second device receives the data processing request sent by the first device in the above step, the method further includes: the second device registers the function identifier to generate preset information.
In this implementation, the second device registers some of the function identifiers, and the preset information consists of the registered function identifiers, which makes the scheme practical to implement.
In a possible implementation, the obtaining, by the polling thread, of the data processing request sent by the first device from the receive queue of the second device includes: the polling thread polls at least one doorbell register in the processor and, when a first doorbell register among them has been rung, obtains the data processing request from the receive queue bound to the first doorbell register.
In this implementation, the polling thread needs to poll only the doorbell registers rather than the large number of receive queues of the second device, which reduces polling overhead.
In a possible implementation, after the scheduling thread in the second processor core determines, according to the information carried in the data processing request, to schedule the request to the execution thread in the second processor core, the method further includes: when a preset condition is met, the scheduling thread of the second processor core calls the execution thread of the first processor core to process the request, where the preset condition indicates that the execution thread of the second processor core cannot execute the request.
In this implementation, when the data processing request cannot continue to execute in the execution thread of the second processor core, the scheduling thread of the second processor core can hand it over to the execution thread of the first processor core. Compared with having the operating system schedule work between the first and second processor cores, this reduces scheduling latency.
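The fallback in this implementation can be sketched as follows. This is a minimal Python sketch, not the claimed implementation: the text does not specify the preset condition, so the hypothetical exception `HopBudgetExceeded` stands in for "the execution thread of the second processor core cannot execute the request", and `little_exec`/`big_exec` are hypothetical callables representing the two cores' execution threads.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Request:
    function_id: int
    key: str


class HopBudgetExceeded(Exception):
    """Hypothetical signal: the second-core execution thread cannot finish the request."""


def run_with_fallback(
    req: Request,
    little_exec: Callable[[Request], str],
    big_exec: Callable[[Request], str],
) -> Tuple[str, str]:
    """Run on the second (little) core; if the preset condition is met,
    the scheduling thread hands the request to the first (big) core."""
    try:
        return "second", little_exec(req)
    except HopBudgetExceeded:
        # Preset condition met: call the first core's execution thread directly
        # instead of asking the operating system to migrate the work.
        return "first", big_exec(req)
```

Handling the fallback inside the scheduling thread keeps the decision in user space, which is the latency argument made above.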
A second aspect of the embodiments of the present application provides a processor chip that can implement the method of the first aspect or any possible implementation of the first aspect. The processor chip includes the first processor core and the second processor core for performing the method.
A third aspect of the embodiments of the present application provides a computing device, including a processor coupled to a memory, where the memory stores instructions that, when executed by the processor, cause the computing device to implement the method of the first aspect or any possible implementation of the first aspect. The computing device may be, for example, a network device, or a chip or chip system that supports a network device in implementing the foregoing method.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, which stores instructions that, when executed, cause a computer to perform the method provided by the foregoing first aspect or any one of the possible implementations of the first aspect.
A fifth aspect of embodiments of the present application provides a computer program product, which includes computer program code, and when the computer program code is executed, causes a computer to execute the method provided by the foregoing first aspect or any one of the possible implementation manners of the first aspect.
A sixth aspect of the embodiments of the present application provides a system including a first device and a second device that are communicatively connected. The first device is configured to send a data processing request to the second device, and the second device can perform the method of the first aspect or any possible implementation of the first aspect.
Drawings
Fig. 1 is a schematic diagram of a system architecture according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a data processing system according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a data processing method according to an embodiment of the present application;
Fig. 4 is an internal flow diagram according to an embodiment of the present application;
Fig. 5 is another internal flow diagram according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a processor chip according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides a data processing method, a chip, equipment and a system, which are used for reducing processing time delay.
The embodiments of the present application are described below with reference to the accompanying drawings. It should be understood that the described embodiments are only some, not all, of the embodiments of the present application. Those skilled in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
The terms "first," "second," and the like in the description and claims of this application and in the foregoing drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
First, some concepts related to a data processing method provided in the embodiments of the present application are explained.
Remote Direct Memory Access (RDMA) is a technique created to address server-side data processing latency in network transmission. RDMA transfers data in a user application directly to a storage area of a device (e.g., a server) and quickly moves the data over the network from one device into the memory of another. This eliminates multiple data copy operations within a device during the transfer, removes the need for intervention by the operating systems of both devices, and reduces the load on the Central Processing Unit (CPU). RDMA currently supports several data read/write modes, including Send/Receive, RDMA Read, and RDMA Write.
RDMA provides message-queue-based point-to-point communication in which each device can retrieve messages from other devices directly, without intervention by the operating system or protocol stack. The messaging service is built on a channel connection created between the two communicating parties: when two parties need to communicate, a channel connection is created, and its two endpoints are queue pairs (QPs), one at each end. Each QP consists of a Send Queue (SQ) and a Receive Queue (RQ). In addition to these two basic queues, RDMA provides a Completion Queue (CQ), which records the results of sending messages from the SQ and of receiving messages into the RQ.
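A toy model of the queue pair semantics above may help. The sketch below is a simplified Python simulation, not the verbs API: the class `Endpoint` and the function `transfer` are hypothetical. Each endpoint has one SQ, one RQ of pre-posted receive buffers, and one CQ, and a Send is delivered only if the receiver has posted a receive buffer beforehand.

```python
from collections import deque


class Endpoint:
    """One end of a channel: a queue pair (SQ + RQ) plus a completion queue."""

    def __init__(self) -> None:
        self.sq = deque()  # send work requests waiting to go out
        self.rq = deque()  # pre-posted receive buffers (lists used as buffers)
        self.cq = deque()  # completion entries for both SQ and RQ

    def post_recv(self) -> None:
        self.rq.append([])

    def post_send(self, msg: str) -> None:
        self.sq.append(msg)


def transfer(src: Endpoint, dst: Endpoint) -> None:
    """Deliver one Send from src's SQ into a receive buffer taken from dst's RQ."""
    if not dst.rq:
        # Real RDMA treats this as a receiver-not-ready condition.
        raise RuntimeError("receiver has no posted receive buffer")
    msg = src.sq.popleft()
    buf = dst.rq.popleft()
    buf.append(msg)
    src.cq.append(("SEND", msg))  # send completion on the sender's CQ
    dst.cq.append(("RECV", msg))  # receive completion on the receiver's CQ
```

For example, a server that calls `post_recv()` once can accept exactly one `GET` message from a client before it must post another buffer.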
Fig. 1 is a schematic diagram of a system architecture to which the embodiments of the present application are applicable. As shown in Fig. 1, a remote data processing system consists of multiple devices: device 0, device 1, device 2, device 3, …, device N. Each device is equipped with a Network Interface Controller (NIC), and the NICs communicate with one another over a network. Each device may act as a client that initiates remote access requests and also as a server that receives access requests; a device may also act only as a client or only as a server.
For data stored in and retrieved from a linked-list or tree structure (e.g., key-value data), accessing a single item requires multiple pointer traversals, i.e., multiple main-memory accesses. Because such a service request must access main memory many times, remote data access is usually implemented with a Remote Procedure Call (RPC), so that the pointer traversal and data access are carried out on the server.
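The cost difference can be illustrated with a linked list held in remote memory. In this hypothetical Python sketch, each node read in `lookup_one_sided` stands in for one one-sided RDMA read (one network round trip per hop), while `lookup_rpc` performs the same traversal on the server and charges the client a single request/response round trip. The names and the round-trip accounting are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    key: str
    value: str
    next: Optional[int]  # "address" of the next node, or None at the tail


def traverse(memory: Dict[int, Node], head: int, key: str) -> Tuple[Optional[str], int]:
    """Follow next pointers from head; return (value, number of nodes read)."""
    addr, reads = head, 0
    while addr is not None:
        node = memory[addr]
        reads += 1
        if node.key == key:
            return node.value, reads
        addr = node.next
    return None, reads


def lookup_one_sided(memory, head, key):
    """Client-side traversal: every node read is a separate RDMA read,
    so network round trips equal nodes read."""
    value, reads = traverse(memory, head, key)
    return value, reads


def lookup_rpc(memory, head, key):
    """Server-side traversal behind one RPC: node reads are local memory
    accesses, and the client pays one request/response round trip."""
    value, _local_reads = traverse(memory, head, key)
    return value, 1
```

For a key at the end of a three-node list, the one-sided lookup costs three round trips and the RPC lookup costs one; the price of the RPC approach is that the server CPU must receive and execute the request.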
Taking two-sided RPC as an example, the client invokes the server's CPU through a two-sided RPC to perform data access, for example Key-Value access. That is, the client sends a request to the server; the server obtains the request through multiple polling (poll) threads, then invokes an execution thread to execute it, and returns the data obtained from the Key-Value accesses to the client. Although this scheme avoids the many network round trips of repeated one-sided RDMA accesses, the CPU must spend a large amount of overhead on poll threads, which limits the CPU time available to execution threads and therefore increases processing latency.
To solve the above problem, an embodiment of the present application provides a data processing method. For a data processing system that executes the method, refer to Fig. 2. The system includes a first device and a second device that communicate with each other; there may be multiple first devices, for example first device 0, first device 1, …, first device N. The second device includes a communication component, a memory, and a processor. The processor includes a first processor core with high computing power and a second processor core capable of running multiple threads. It should be understood that the first processor core may comprise multiple high-compute processor cores and can therefore also be called the first processor core cluster; likewise, the second processor core may comprise multiple processor cores and can be called the second processor core cluster. The computing capability of the first processor core is higher than that of the second processor core. The first processor core runs a poll thread and an execution thread. The second processor core may run a poll thread, a scheduling thread, and an execution thread; it further includes one or more doorbell registers bound to the poll thread and an on-chip buffer that holds requests for the execution thread. The on-chip buffer is optional, which is not limited herein.
The memory contains a first processor core queue that receives requests for the execution thread of the first processor core, a second processor core queue that receives requests for the execution thread of the second processor core, and queues for receiving requests (REQ) and sending results (RSLT), such as REQ/RSLT 0, REQ/RSLT 1, …, REQ/RSLT N. The communication component supports the transfer of requests and results between the REQ/RSLT queues of the first device and those of the second device.
For example, the processor in the second device may be a big.LITTLE processor. big.LITTLE is a heterogeneous multi-core architecture for ARM (Advanced RISC Machines) processors, which are core components of embedded systems built on the Reduced Instruction Set Computer (RISC) architecture. In this architecture, a "big" cluster (the first processor core) of processor cores with relatively high power consumption but high computing power is combined with a "LITTLE" cluster (the second processor core) of processor cores with low power consumption and low computing power. The cores share a memory region, and load can be scheduled and switched between the CPU clusters in real time.
The communication component in the second device may be a network card, a Peripheral Component Interconnect Express (PCIe) physical link, a Huawei Cache Coherence System (HCCS) physical link, or another communication component.
It should be understood that the first device and the second device shown in Fig. 2 may be independent servers, computing modules within one server, or relatively independent computing modules located in different servers. When both devices are servers, the data processing system may be a machine room, a rack, or two remote data centers.
Fig. 3 is a schematic diagram illustrating a data processing method according to an embodiment of the present application, where the method includes:
301. The first device initiates a data processing request to the second device; correspondingly, the second device receives the data processing request from the first device.
In this embodiment of the application, the first device may generate a data processing request according to the function of the second device that it needs to invoke and send the request to the second device, which then executes the corresponding function. For example, the data processing request may be a database access request for accessing a database in the second device, and the database access request may be any one of the following: a data write request, a data read request, a data update request, a data delete request, a file lock request, or a data retrieval request.
In this embodiment of the present application, a data read request is taken as an example, and the data read request is an RDMA request. In this case, the REQ/RSLT queues of the first device and the second device in Fig. 2 may be RQs/SQs, and the communication component of each device is a NIC. The first device sends the data read request to the second device through its SQ. The data read request includes the key of the target data that the first device wants to read from the second device, and may further include a first identifier or a second identifier: the first identifier indicates that the request is to be executed on the first processor core, and the second identifier indicates that it is to be executed on the second processor core. Alternatively, when the data read request does not include the first identifier, the second device may determine that the request is to be executed on the second processor core; or, when the request does not include the second identifier, the second device may determine that it is to be executed on the first processor core. This is not limited herein.
In this embodiment of the present application, the data read request includes, for example, the first identifier or the second identifier, where the first identifier indicates execution on the first processor core and the second identifier indicates execution on the second processor core. The first and second identifiers may also be derived from the function identifier (function ID) already contained in the RDMA request, where the function identifier identifies the function that the first device wants to call on the second device; the function identified by the first identifier requires more computing capability than the function identified by the second identifier. The second device stores preset information, which is combined with the function identifier to determine whether the data read request is executed on the first processor core or the second processor core. The preset information may be registration information that records the identifiers of functions requiring low computing capability; correspondingly, the preset information contains only second identifiers. The second device uses the preset information to distinguish data read requests with different computing-capability requirements: when the function identifier in a data read request is contained in the preset information, the second device executes the request on the second processor core; when it is not, the second device executes the request on the first processor core.
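The registration-based dispatch described above can be sketched in a few lines. This is a minimal Python sketch under the assumption that the preset information is simply a set of registered (low-compute) function identifiers; the class `Dispatcher` and the string labels for the cores are hypothetical.

```python
FIRST_CORE = "first"    # big cluster: high computing power
SECOND_CORE = "second"  # LITTLE cluster: low computing power


class Dispatcher:
    def __init__(self) -> None:
        # "Preset information": identifiers of registered low-compute functions.
        self.preset: set = set()

    def register(self, function_id: int) -> None:
        self.preset.add(function_id)

    def target_core(self, function_id: int) -> str:
        # Registered -> second core; anything not registered -> first core.
        return SECOND_CORE if function_id in self.preset else FIRST_CORE
```

Because unregistered identifiers default to the first processor core, a first device that knows nothing about registration still gets correct, if slower, handling.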
The registration may be performed by the second device, which then notifies the first device of the registered function identifiers; by the first device, which then notifies the second device; or by a third-party device, which notifies both. Alternatively, the first device need not be aware of the registration process or its result. This is not limited in the embodiments of the present application.
The target data is stored in a linked-list or tree structure, i.e., its location in the memory of the second device must be reached through multiple pointers. Target data of different sizes requires different amounts of computing power on the big.LITTLE processor: requests for target data that needs little computing power can be allocated to the second processor core, and requests for target data that needs high computing power can be allocated to the first processor core.
The data read request may further include a read address or a write address of the target data, which is not limited herein. A user process on the first device may initiate an acquisition request for the target data and ring the doorbell register of the first device's NIC so that the NIC picks up the acquisition request; the acquisition request, carrying the function identifier, forms the data read request, which is then sent to the second device through the SQ. In this embodiment of the present application, the user process may also ring the doorbell register through middleware such as Unified Communication X (UCX), which speeds up development by providing an Application Programming Interface (API) that hides low-level details while maintaining high performance and scalability.
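For illustration only, the sketch below shows how a client might pack a function identifier and a key into one request payload. The text does not define a wire format: the `!IH` header (32-bit function ID, 16-bit key length, network byte order) and the helper names are assumptions, and a real RDMA or UCX transport would carry this payload in its own message buffers.

```python
import struct
from typing import Tuple

# Hypothetical header: 32-bit function ID + 16-bit key length, network byte order.
HEADER = struct.Struct("!IH")


def encode_request(function_id: int, key: bytes) -> bytes:
    """Build the data read request payload: header followed by the key bytes."""
    if len(key) > 0xFFFF:
        raise ValueError("key too long for a 16-bit length field")
    return HEADER.pack(function_id, len(key)) + key


def decode_request(payload: bytes) -> Tuple[int, bytes]:
    """Recover (function_id, key) on the second device."""
    function_id, key_len = HEADER.unpack_from(payload, 0)
    start = HEADER.size
    return function_id, payload[start:start + key_len]
```

Putting the function identifier at a fixed offset lets the second device's scheduling thread read it without parsing the rest of the request.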
In this embodiment of the application, the NIC of the second device receives the data read request through the RQ. After step 301, the NIC of the second device rings a doorbell register in the second processor core of the big.LITTLE processor according to the mapping between QPs and doorbell registers, where the mapping binds each QP in the second device to one doorbell register. The second processor core also runs multiple poll threads, and the number of doorbell registers is related to the number of poll threads: there may be as many doorbell registers as poll threads, or each poll thread may correspond to several doorbell registers. Each poll thread needs to poll only the doorbell registers bound to it; when one of them is rung, the poll thread obtains the data read request from the QP bound to that doorbell register.
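The doorbell-based polling above can be simulated as follows. In this hypothetical Python sketch, `Nic` binds each QP to a doorbell by a modulo rule (an assumption; the text only says a mapping exists), and `poll_once` models one pass of a poll thread that checks only its own doorbells and drains only the receive queues bound to a rung doorbell, instead of scanning every receive queue.

```python
from collections import deque
from typing import Dict, List


class Nic:
    """Second-device NIC model: per-QP receive queues plus doorbell registers."""

    def __init__(self, num_qps: int, num_doorbells: int) -> None:
        self.rqs = [deque() for _ in range(num_qps)]
        # Hypothetical static binding: QP i -> doorbell i % num_doorbells.
        self.qp_to_db = {qp: qp % num_doorbells for qp in range(num_qps)}
        self.db_to_qps: Dict[int, List[int]] = {db: [] for db in range(num_doorbells)}
        for qp, db in self.qp_to_db.items():
            self.db_to_qps[db].append(qp)
        self.doorbells = [0] * num_doorbells  # nonzero means "rung"

    def receive(self, qp: int, request: str) -> None:
        self.rqs[qp].append(request)
        self.doorbells[self.qp_to_db[qp]] += 1  # ring the bound doorbell


def poll_once(nic: Nic, my_doorbells: List[int]) -> List[str]:
    """One pass of a poll thread over its own doorbells only."""
    got = []
    for db in my_doorbells:
        if nic.doorbells[db] == 0:
            continue  # not rung: skip every QP bound to this doorbell
        nic.doorbells[db] = 0
        for qp in nic.db_to_qps[db]:
            while nic.rqs[qp]:
                got.append(nic.rqs[qp].popleft())
    return got
```

The per-pass cost of an idle poll thread is one check per owned doorbell, independent of how many QPs exist, which is the overhead reduction claimed for this design.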
302. The second device determines, according to the information carried in the data processing request, whether to dispatch the request to the first processor core or to the second processor core for processing.
The information carried by the data processing request may instruct the second device to process the request on the first processor core or on the second processor core. After the second device receives the data processing request, it schedules the first processor core to process the request when the information indicates the first processor core. It schedules the second processor core when the information indicates the second processor core.
Take a data read request for accessing the database of the second device as an example. When the function identifier included in the data processing request matches one of the registered function identifiers, the second device may execute the data read request directly on the second processor core. The second processor core locates the target data by following pointers several times (pointer chasing) based on the key of the target data, and then reads it. When the function identifier does not match any registered function identifier, the second device may execute the data read request on the first processor core.
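Here is a minimal sketch of the dispatch rule. It assumes that the "registered function identifiers" (the preset information generated by registration) are simply held in a set; the source does not specify the data structure. `Target` and `FunctionRegistry` are hypothetical names.

```python
from enum import Enum


class Target(Enum):
    FIRST_CORE = "first processor core (high-performance)"
    SECOND_CORE = "second processor core (multithreaded, low-overhead)"


class FunctionRegistry:
    """Preset information produced by registering function identifiers (assumed to be a set)."""

    def __init__(self) -> None:
        self._registered = set()

    def register(self, function_id: int) -> None:
        self._registered.add(function_id)

    def dispatch(self, function_id: int) -> Target:
        # A matched identifier runs on the second core; anything else falls back to the first core.
        if function_id in self._registered:
            return Target.SECOND_CORE
        return Target.FIRST_CORE
```

A set lookup keeps the match cheap enough to run on the lightweight second core before any heavier work is scheduled.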
The functions of the second processor core can be divided among a poll thread, a scheduling thread, and an execution thread. The poll thread sends the data read request to the scheduling thread. After receiving it, the scheduling thread matches the function identifier in the request against the registered function identifiers. If the match succeeds, the scheduling thread checks whether the on-chip buffer of the second processor core is full:

- When the buffer is full, the data read request is placed in the queue of the second processor core.
- When the buffer is not full and the queue of the second processor core is empty, the read command is written directly into the on-chip buffer.

The execution thread fetches the data read request from the on-chip buffer. Using the key of the target data in the request, it locates the read position of the target data by following multiple pointers.
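The sketch below shows the scheduling thread's buffer-or-queue decision. The buffer capacity is hypothetical. Two behaviors are assumptions that the source does not state:

- A request is queued when the buffer has space but older requests are still waiting. This preserves FIFO order.
- The buffer is refilled from the queue whenever an execution thread takes a request.

```python
from collections import deque


class OnChipScheduler:
    """Sketch of the scheduling thread's buffer/queue decision; capacity is hypothetical."""

    def __init__(self, buffer_capacity: int) -> None:
        self.capacity = buffer_capacity
        self.buffer = deque()  # on-chip buffer read by execution threads
        self.queue = deque()   # overflow queue of the second processor core

    def submit(self, request) -> str:
        if len(self.buffer) < self.capacity and not self.queue:
            self.buffer.append(request)  # buffer not full and queue empty: write directly
            return "buffer"
        # Buffer full, or (assumption) older requests still waiting: keep FIFO order.
        self.queue.append(request)
        return "queue"

    def take(self):
        """Called by an execution thread; refills the buffer from the queue (assumption)."""
        if not self.buffer:
            return None
        request = self.buffer.popleft()
        if self.queue:
            self.buffer.append(self.queue.popleft())
        return request
```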
For the execution process when the function identifier matches, refer to the internal flowchart shown in fig. 4. The first device includes a user thread and a NIC. The second device includes a NIC, a second processor core, and a memory. The second processor core includes a poll thread, a scheduling thread, and an execution thread. The memory stores tree-structured data consisting of a root node, intermediate nodes, and leaf nodes.

The user thread of the first device rings the doorbell register of the first device's NIC, so that this NIC sends the data read request to the NIC of the second device. The NIC of the second device places the data read request in the RQ and rings the doorbell register of the poll thread. The poll thread then forwards the data read request from the RQ to the scheduling thread, which schedules an execution thread according to the request.

The execution thread performs pointer chasing based on the key of the target data. Through several memory accesses, it obtains the intermediate node pointed to by the root node and then the leaf node pointed to by that intermediate node. It finds the entry containing the key of the target data in that leaf node and thereby obtains the target data.

The NIC of the second device is triggered through its doorbell register and transmits the target data to the NIC of the first device over the connection between the two devices. The user thread of the first device receives the target data through the RQ, and the receive result is stored in the CQ. The read completes when the user thread successfully polls the completion from its CQ.
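A small B+-tree-style lookup can illustrate the pointer chasing performed by the execution thread. In this model, each hop from the root toward a leaf stands for one memory access. The `Node` layout and the child-selection convention are assumptions; the source only states that the tree has root, intermediate, and leaf nodes.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # non-empty for root/intermediate nodes
    values: list = field(default_factory=list)    # non-empty for leaf nodes

    @property
    def is_leaf(self) -> bool:
        return not self.children


def lookup(root: Node, key) -> tuple:
    """Follow child pointers from the root to a leaf; each hop models one memory access."""
    node, accesses = root, 1
    while not node.is_leaf:
        # B+-tree convention (an assumption): child i holds keys in [keys[i-1], keys[i]).
        node = node.children[bisect_right(node.keys, key)]
        accesses += 1
    for k, v in zip(node.keys, node.values):
        if k == key:
            return v, accesses
    return None, accesses
```

With a three-level tree (root, intermediate, leaf), every lookup costs three dependent memory accesses. These dependent, latency-bound accesses are what a multithreaded small core can overlap across many concurrent requests.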
In this embodiment, the second processor core may take multiple implementation forms, for example an area-optimized ARM or RISC-V (fifth-generation reduced instruction set computer) core. Each hardware thread is fully functional and can perform the poll, schedule, and execute functions. Switching between threads is implemented in hardware by the microarchitecture. For example, access to shared resources such as the load/store (L/S) interface and the arithmetic logic unit (ALU) is scheduled through hardware arbitration. The high-performance first processor core may also take various forms, such as a high-performance x86, ARM, or RISC-V core. The multithreaded second processor core may be placed on the same CPU die as the high-performance big cores or on a different die.
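The source says that shared resources such as the L/S interface and ALU are arbitrated in hardware, but it does not name a policy. The following software model of an arbiter assumes a round-robin policy purely for illustration. It is not a hardware description, and `RoundRobinArbiter` is a hypothetical name.

```python
class RoundRobinArbiter:
    """Software model of a round-robin arbiter granting one shared unit (e.g. an ALU) per cycle."""

    def __init__(self, num_threads: int) -> None:
        self.num_threads = num_threads
        self.last_grant = num_threads - 1  # so thread 0 has priority in the first cycle

    def grant(self, requests: list) -> int:
        """requests[i] is True if hardware thread i wants the unit this cycle; returns the winner or -1."""
        for offset in range(1, self.num_threads + 1):
            candidate = (self.last_grant + offset) % self.num_threads
            if requests[candidate]:
                self.last_grant = candidate
                return candidate
        return -1
```

Round-robin keeps any single poll, schedule, or execute thread from starving the others. This matters when many small requests are in flight at once.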
When the function identifier in the data read request does not match any registered function identifier, the scheduling thread of the second processor core may schedule the first processor core to process the request. This keeps the weaker processing performance of the second processor core from degrading the execution of coarse-grained (large) RPCs.

Specifically, once the scheduling thread of the second processor core decides that the first processor core should process the data read request, it forwards the request to a queue of the first processor core and records a use flag for that queue. The scheduling thread maintains an empty/non-empty flag for every queue of the first processor core, and a non-empty flag indicates that the corresponding queue is in use. The first processor core therefore only needs to poll these flags held by the scheduling thread of the second processor core, instead of polling all of its command queues. Compared with polling the queues directly, this greatly reduces the number of poll threads required.

When the first processor core finds a non-empty flag, it fetches the data read request from the corresponding command queue and executes it with its execution thread. The execution thread then looks up the target data based on the key.
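A bitmask makes the saving of this flag scheme concrete: the first processor core reads a single word to learn whether any command queue has work. Packing the flags into one integer and the class layout are assumptions. The model is single-threaded, whereas real hardware would need atomic updates to the flag word.

```python
from collections import deque


class BigCoreQueues:
    """Command queues of the first processor core plus the non-empty flags that the
    second core's scheduling thread maintains for them (flag layout is an assumption)."""

    def __init__(self, num_queues: int) -> None:
        self.queues = [deque() for _ in range(num_queues)]
        self.flags = 0  # bit i set => command queue i is non-empty

    # Scheduling thread (second processor core): forward a request and record the use flag.
    def forward(self, queue_id: int, request) -> None:
        self.queues[queue_id].append(request)
        self.flags |= 1 << queue_id

    # Poll thread (first processor core): check one flag word instead of every queue.
    def poll_flags(self) -> list:
        drained = []
        pending = self.flags
        if pending == 0:
            return drained  # a single read shows there is nothing to do
        self.flags = 0
        while pending:
            qid = (pending & -pending).bit_length() - 1  # index of the lowest set bit
            pending &= pending - 1                       # clear that bit
            q = self.queues[qid]
            while q:
                drained.append(q.popleft())
        return drained
```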
In this embodiment of the application, even when the function identifier matches, the second processor core may find while executing the data processing request that a preset condition is met. The preset condition may be one of the following:

- A service or function to be invoked is not supported on the second processor core.
- The service or function is supported on the second processor core but performs poorly there.
- The second processor core lacks the computing power that the request requires.

An example is a B+ tree split triggered by an insert operation. The second processor core supports the insert function, but it lacks the computing power for the tree split that the insert causes. In this case, the second processor core may call an execution thread of the first processor core to perform the corresponding operation.
Refer to another internal flowchart shown in fig. 5, which takes a data write request as the data processing request. The first device includes a user thread and a NIC. The second device includes a NIC, a second processor core, a first processor core, and a memory. The second processor core includes a poll thread, a scheduling thread, and an execution thread, and the first processor core includes a poll thread and an execution thread.

The user thread of the first device initiates a data write request and rings the doorbell register of the first device's NIC. That NIC sends the data write request to the second device over the connection between the two devices. The NIC of the second device receives the data write request through the RQ and rings the doorbell register bound to that RQ. The poll thread in the second processor core that is bound to this doorbell register then obtains the data write request from the RQ.

The data write request includes a function identifier indicating processing on the second processor core, so the scheduling thread of the second processor core schedules its execution thread to execute the request. The execution thread locates the insert position of the target data in memory by following multiple pointers. When the node in memory can hold all of the target data, the target data is inserted directly, and a request completion response is sent to the NIC of the second device. The NIC of the second device forwards the response to the NIC of the first device. The user thread of the first device polls the NIC of the first device, obtains the request completion response, and determines that the data write is complete.
The node in memory may be unable to hold all of the target data, so that part of it must be split off into other nodes; in other words, the write would cause a tree split. In that case, the scheduling thread of the second processor core may schedule the first processor core to execute the data write request. The handoff may proceed as follows:

1. The execution thread of the second processor core returns the data write request, together with a reorganization command for the current insert position, to the scheduling thread.
2. The scheduling thread places this reorganized request in a queue of the big-core cluster (the first processor core), records a use flag, and waits for a big core to pick up the request.
3. When the poll thread of the first processor core detects the use flag, it fetches the data write request and the reorganization command from the queue of the first processor core. It then forwards them to an execution thread of the first processor core.
4. That execution thread performs the tree split and reorganization according to the data write request and the reorganization command. It then returns a request completion response to the NIC of the second device, which passes it on to the user thread of the first device.
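The write path in fig. 5 can be sketched as follows: the second core inserts directly when the leaf has room and otherwise hands off to the first processor core for the split. `LEAF_CAPACITY`, the `Leaf` type, the tuple pushed onto the queue, and the split-in-half policy are hypothetical. Propagating the separator key into the parent node is not modeled.

```python
from bisect import insort
from dataclasses import dataclass, field

LEAF_CAPACITY = 4  # hypothetical maximum number of keys per leaf node


@dataclass
class Leaf:
    keys: list = field(default_factory=list)


def small_core_insert(leaf: Leaf, key, big_core_queue: list) -> str:
    """Second-core execution thread: insert directly if the leaf has room; otherwise
    hand the request plus the located insert position to the first core."""
    if len(leaf.keys) < LEAF_CAPACITY:
        insort(leaf.keys, key)  # fits: no tree split needed
        return "done-on-second-core"
    # Preset condition met: the insert would split the node.
    big_core_queue.append(("insert", key, leaf))  # request + insert position for reorganization
    return "forwarded-to-first-core"


def big_core_split_insert(leaf: Leaf, key) -> tuple:
    """First-core execution thread: split the full leaf in half, then insert the key."""
    mid = len(leaf.keys) // 2
    right = Leaf(keys=leaf.keys[mid:])
    del leaf.keys[mid:]
    insort(leaf.keys if key < right.keys[0] else right.keys, key)
    return leaf, right, right.keys[0]  # separator key for the parent (not modeled)
```

Only the uncommon split case pays the cost of crossing to the big core. The common case, an insert into a leaf with free space, stays on the lightweight core.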
In this embodiment of the application, the first device sends a data processing request to the second device. The information carried in the request lets the second device decide whether to execute the request on the first processor core or on the second processor core. When the information indicates the first processor core, the second device schedules the first processor core; when it indicates the second processor core, the second device schedules the second processor core. For requests that need little computing power, the second processor core has low overhead, supports more concurrent requests, and reduces processing latency. For requests that need a lot of computing power, the high processing capability of the first processor core ensures good processing performance.
The method of data processing is explained above, and a processor chip that executes the method of data processing is described below.
Fig. 6 is a schematic structural diagram of a processor chip according to an embodiment of the present application. The processor chip 60 includes a first processor core 601 and a second processor core 602, where the processing capability of the first processor core 601 is greater than that of the second processor core 602;
the second processor core 602 is configured to receive a data processing request from a first device, where the first device is a device communicatively connected to a second device where the processor chip is located;
the second processor core 602 is further configured to determine, according to the information carried in the data processing request, whether to schedule the data processing request to the first processor core 601 or to the second processor core 602 for processing.
Optionally, the second processor core 602 includes a polling thread and a scheduling thread, and the second processor core 602 is specifically configured to:
poll with the polling thread to obtain, from a receive queue of the second device, the data processing request sent by the first device;
and sending the data processing request to the scheduling thread by utilizing the polling thread.
Optionally, the first processor core 601 and the second processor core 602 each include an execution thread, and the second processor core 602 is specifically configured to determine, by using the scheduling thread and according to the information carried by the data processing request, whether to schedule the data processing request to the execution thread in the first processor core 601 or to the execution thread in the second processor core 602 for processing.
Optionally, the data processing request is a database access request for accessing a database in the second device, where the database access request includes any one of: data write requests, data read requests, data update requests, data delete requests, file lock requests, data retrieval requests.
Optionally, the information carried by the data processing request includes a function identifier;
the second processor core 602 is specifically configured to determine, according to the function identifier and information preset in the second device, whether to schedule the data processing request to the first processor core 601 or to the second processor core 602 for processing.
Optionally, the processor chip further includes at least one doorbell register 603, and the second processor core 602 is configured to poll the at least one doorbell register by using the polling thread and to obtain the data processing request from the receive queue bound to a first doorbell register of the at least one doorbell register.
Optionally, the second processor core 602 is further configured to:
after determining, by using the scheduling thread and according to the information carried in the data processing request, that the data processing request is scheduled to the execution thread in the second processor core 602, call the execution thread of the first processor core 601 by using the scheduling thread to process the data processing request when a preset condition is met, where the preset condition indicates that the execution thread of the second processor core 602 cannot execute the data processing request.
Fig. 7 is a schematic diagram of one possible logical structure of a computing device 70 according to an embodiment of the present application. The computing device 70 includes a processor 701, a communication interface 702, a storage system 703, and a bus 704. The processor 701, the communication interface 702, and the storage system 703 are connected to each other through the bus 704. In this embodiment of the present application, the processor 701 is configured to control and manage the actions of the computing device 70; for example, it executes the steps performed by the second device in the method embodiment of fig. 3. The communication interface 702 supports communication of the computing device 70. The storage system 703 stores program code and data of the computing device 70.
The processor 701 may be any of the following, or any combination thereof: a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a transistor logic device, or a hardware component. It may implement or execute the various illustrative logical blocks, modules, and circuits described with reference to the disclosure of this application. The processor 701 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The bus 704 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 7, but this does not mean that there is only one bus or one type of bus.
The first processor core 601, the second processor core 602, and the doorbell register 603 in the processor chip 60 correspond to components in the processor 701 in the computing device 70.
The computing device 70 of this embodiment may correspond to the second device in the above-described method embodiment of fig. 3, and the communication interface 702 in the computing device 70 may implement the functions and/or various steps implemented by the second device in the above-described method embodiment of fig. 3, which are not described herein again for brevity.
Another embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions. When a processor of a device executes the computer-executable instructions, the device performs the steps of the data processing method performed by the second device in the method embodiment of fig. 3.
Another embodiment of the present application further provides a computer program product including computer-executable instructions stored in a computer-readable storage medium. When a processor of a device executes the computer-executable instructions, the device performs the steps of the data processing method performed by the second device in the method embodiment of fig. 3.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a logical division, and other divisions may be used in practice; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application may be embodied in the form of a software product. This applies to the solution in essence, to the part that contributes to the prior art, or to all or part of the solution. The software product is stored in a storage medium and includes instructions for causing a computing device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods in the embodiments of the present application. The foregoing storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
Claims (17)
1. A method of data processing, the method being applied to a data processing system comprising a first device and a second device, the method comprising:
the second device receives a data processing request sent by the first device, wherein the second device comprises a processor, the processor comprises a first processor core and a second processor core, and the processing capacity of the first processor core is greater than that of the second processor core;
and the second device determines, according to information carried by the data processing request, whether to schedule the data processing request to the first processor core or to the second processor core for processing.
2. The method of claim 1, wherein the second processor core comprises a polling thread and a scheduling thread,
the receiving, by the second device, the data processing request sent by the first device specifically includes:
the polling thread obtains, by polling, the data processing request sent by the first device from a receive queue of the second device;
and the polling thread sends the data processing request to the scheduling thread.
3. The method of claim 2, wherein the first processor core and the second processor core each comprise an execution thread;
the second device determines to dispatch the data processing request to the first processor core for processing according to the information carried by the data processing request, or determines to dispatch the data processing request to the second processor core for processing, which specifically includes:
the scheduling thread in the second processor core determines, according to the information carried by the data processing request, whether to schedule the data processing request to the execution thread in the first processor core or to the execution thread in the second processor core for processing.
4. The method according to any one of claims 1 to 3, wherein the data processing request is a database access request for accessing a database in the second device, and the database access request includes any one of: data write requests, data read requests, data update requests, data delete requests, file lock requests, data retrieval requests.
5. The method according to claim 1 or 2, wherein the information carried by the data processing request comprises a function identifier;
the second device determines to dispatch the data processing request to the first processor core for processing according to the information carried by the data processing request, or determines to dispatch the data processing request to the second processor core for processing, which specifically includes:
the second device determines, according to the function identifier and preset information in the second device, whether to schedule the data processing request to the first processor core or to the second processor core for processing.
6. The method of claim 5, wherein before the second device receives the data processing request sent by the first device, the method further comprises:
the second device registers the function identifier to generate the preset information.
7. The method of claim 2, wherein the polling thread obtaining the data processing request sent by the first device from the receive queue of the second device by polling comprises:
the polling thread polls at least one doorbell register in the processor and obtains the data processing request from a receive queue bound to a first doorbell register of the at least one doorbell register.
8. The method according to claim 3, wherein the scheduling thread in the second processor core determines, according to information carried by the data processing request, that the data processing request is scheduled to be processed by an execution thread in the second processor core, and the method further comprises:
when a preset condition is met, the scheduling thread of the second processor core calls the execution thread of the first processor core to process the data processing request, wherein the preset condition indicates that the execution thread of the second processor core cannot execute the data processing request.
9. A processor chip, comprising a first processor core and a second processor core, wherein the processing capability of the first processor core is greater than that of the second processor core;
the second processor core is configured to receive a data processing request from a first device, wherein the first device is a device communicatively connected to a second device in which the processor chip is located;
and the second processor core is further configured to determine to dispatch the data processing request to the first processor core for processing according to information carried by the data processing request, or determine to dispatch the data processing request to the second processor core for processing.
10. The processor chip of claim 9, wherein the second processor core comprises a polling thread and a scheduling thread, the second processor core being configured to:
poll by using the polling thread to obtain, from a receive queue of the second device, the data processing request sent by the first device;
and sending the data processing request to the scheduling thread by utilizing the polling thread.
11. The processor chip of claim 10, wherein the first processor core and the second processor core each comprise an execution thread, and the second processor core is specifically configured to determine, by using the scheduling thread and according to the information carried by the data processing request, whether to schedule the data processing request to an execution thread in the first processor core or to an execution thread in the second processor core for processing.
12. The processor chip according to any one of claims 9 to 11, wherein the data processing request is a database access request for accessing a database in the second device, and the database access request includes any one of: data write requests, data read requests, data update requests, data delete requests, file lock requests, data retrieval requests.
13. The processor chip according to any of claims 9-12, wherein the information carried by the data processing request comprises a function identification;
the second processor core is specifically configured to determine, according to the function identifier and information preset in the second device, whether to schedule the data processing request to the first processor core or to the second processor core for processing.
14. The processor chip of claim 9, further comprising at least one doorbell register, wherein the second processor core is configured to poll the at least one doorbell register by using the polling thread and to obtain the data processing request from a receive queue bound to a first doorbell register of the at least one doorbell register.
15. The processor chip of claim 10, wherein the second processor core is further configured to:
after determining, by using the scheduling thread and according to the information carried by the data processing request, to schedule the data processing request to an execution thread in the second processor core, call, by using the scheduling thread, the execution thread of the first processor core to process the data processing request when a preset condition is met, wherein the preset condition indicates that the execution thread of the second processor core cannot execute the data processing request.
16. A computing device, comprising a processor and a memory, the processor being coupled with the memory, wherein
the processor is configured to execute instructions stored in the memory to cause the computing device to perform the method of any one of claims 1 to 8.
17. A system, characterized in that the system comprises a first device and a second device, the first device and the second device being communicatively connected, the first device being configured to send a data processing request to the second device, the second device being configured to perform the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111152703.2A CN115878550A (en) | 2021-09-29 | 2021-09-29 | Data processing method, chip, equipment and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115878550A true CN115878550A (en) | 2023-03-31 |
Family
ID=85756230
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111152703.2A Pending CN115878550A (en) | 2021-09-29 | 2021-09-29 | Data processing method, chip, equipment and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115878550A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118550736A (en) * | 2024-07-30 | 2024-08-27 | 鹏钛存储技术(南京)有限公司 | Communication method among multiple CPUs |
CN118550736B (en) * | 2024-07-30 | 2024-10-15 | 鹏钛存储技术(南京)有限公司 | Communication method among multiple CPUs |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110402568B (en) | Communication method and device | |
US8131814B1 (en) | Dynamic pinning remote direct memory access | |
US9244881B2 (en) | Facilitating, at least in part, by circuitry, accessing of at least one controller command interface | |
CN113918101B (en) | Method, system, equipment and storage medium for writing data cache | |
CN111431757B (en) | Virtual network flow acquisition method and device | |
CN110119304B (en) | Interrupt processing method and device and server | |
CN113986791B (en) | Method, system, equipment and terminal for designing intelligent network card fast DMA | |
CN109857545B (en) | Data transmission method and device | |
CN115964319A (en) | Data processing method for remote direct memory access and related product | |
CN115934625B (en) | Doorbell knocking method, equipment and medium for remote direct memory access | |
WO2023104194A1 (en) | Service processing method and apparatus | |
CN114461593B (en) | Log writing method and device, electronic device and storage medium | |
CN115878550A (en) | Data processing method, chip, equipment and system | |
CN114006946B (en) | Method, device, equipment and storage medium for processing homogeneous resource request | |
CN109101439B (en) | Message processing method and device | |
CN117312229B (en) | Data transmission device, data processing equipment, system, method and medium | |
CN210804421U (en) | Server system | |
CN116601616A (en) | Data processing device, method and related equipment | |
CN117909087B (en) | Data processing method and device, central processing unit and electronic equipment | |
CN116775510B (en) | Data access method, device, server and computer readable storage medium | |
CN117407356B (en) | Inter-core communication method and device based on shared memory, storage medium and terminal | |
CN117666921A (en) | Data processing method, accelerator and computing device | |
CN115509771A (en) | Event monitoring method and device and electronic equipment | |
CN118860290A (en) | NVMe write data processing method, terminal and storage medium | |
CN118093225A (en) | Subscription message transmission method and device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||