
CN118519924A - Cache control method and device, electronic equipment and readable storage medium - Google Patents

Cache control method and device, electronic equipment and readable storage medium

Info

Publication number
CN118519924A
CN118519924A
Authority
CN
China
Prior art keywords
request
cache
access
register
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410742980.6A
Other languages
Chinese (zh)
Inventor
金炳丞
王齐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Open Source Chip Research Institute
Original Assignee
Beijing Open Source Chip Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Open Source Chip Research Institute filed Critical Beijing Open Source Chip Research Institute
Priority to CN202410742980.6A priority Critical patent/CN118519924A/en
Publication of CN118519924A publication Critical patent/CN118519924A/en
Pending legal-status Critical Current

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

An embodiment of the invention provides a cache control method and device, electronic equipment, and a readable storage medium. The method includes: when a memory access request from a memory access module is received, recording request information of the memory access request into a memory access register of a control module, and feeding the request information into a cache pipeline unit after arbitration by an arbitration unit; querying, with the cache pipeline unit, whether the access request hits the cache; executing the access request if it hits the cache; and, if the access request misses the cache, adjusting the state of the access register to a dormant state and sending a first request to a master node, where the first request is used to obtain the cache data required by the access request. The embodiment of the invention can ensure cache coherency in a multi-core processor system without excessive hardware overhead or bus overhead.

Description

Cache control method and device, electronic equipment and readable storage medium
Technical Field
The present invention relates to the field of computer technology, and in particular to a cache control method and device, electronic equipment, and a readable storage medium.
Background
With the continued development of processor technology, many-core processors have become mainstream; each processor core can independently execute instructions and access memory. However, such parallel processing also raises the problem of data consistency. In a many-core processor, each core has its own local cache (Cache) for storing recently accessed data and instructions to increase access speed.
Thus, when multiple processor cores need to access and modify the same data at the same time, data inconsistencies may arise. To address this problem, cache coherency techniques are typically used to ensure coherency between the different processor cores. The directory-based coherence protocol (Directory-Based Coherence Protocol) was adopted in early multiprocessor systems; its core idea is to record the state and location information of each data block in the system in a centralized directory structure, so as to guarantee data consistency when data is accessed and updated across processors. This approach has obvious drawbacks: first, for a many-core architecture the hardware cost of the directory structure is unacceptable; second, the way cache coherency is maintained is inflexible, which often leads to low data-access efficiency in large-scale processor structures. Bus-snooping (Bus Snoop) coherence protocols are currently the most widely used: each processor's cache controller records the state of updated cache data and monitors communication on the system bus to track and respond to other processors' operations on memory data blocks; common protocols include MESI, MOESI, and the like. However, bus snooping requires every processor core to monitor the bus state in real time, and its large bus consumption affects system performance.
Disclosure of Invention
The embodiment of the invention provides a cache control method and device, electronic equipment, and a readable storage medium, which can solve the problems of high hardware cost and high bus consumption of cache coherency techniques in the related art.
To solve the above problems, an embodiment of the invention discloses a cache control method applied to a local cache controller, where the local cache controller includes a bus interface module, a control module and a cache pipeline unit. The method includes the following steps:
when a memory access request from a memory access module is received, recording request information of the memory access request into a memory access register of the control module, and feeding the request information into the cache pipeline unit after arbitration by an arbitration unit;
querying, with the cache pipeline unit, whether the access request hits the cache;
executing the access request if it hits the cache;
if the access request misses the cache, adjusting the state of the access register to a dormant state and sending a first request to a master node, where the first request is used to obtain the cache data required by the access request.
Optionally, the memory access request includes an access read request; adjusting the state of the access register to a dormant state and sending a first request to a master node when the access request misses the cache includes:
if the access read request misses the cache, adjusting the state of the access register to dormant, setting a read transaction register in the control module, and initiating a first read request to the master node, where the first read request is used to request first target data required by the access read request;
after the first target data returned by the master node has been read from the read transaction register, activating the access register and setting a first flag bit of the read transaction register, where the first flag bit indicates that a response message is to be sent;
sending a first response message to the master node, where the first response message indicates that the response of the master node to the first read request has been received.
Optionally, executing the access request if it hits the cache includes:
if the access read request hits the cache, writing first target data corresponding to a first request address of the access read request back to the access register.
Optionally, the memory access request includes an access write request; executing the access request if it hits the cache includes:
if the access write request hits the cache, querying a second cache line state corresponding to a second request address of the access write request;
if the second cache line state is the modified state or the exclusive state, writing second target data carried by the access write request back to the cache.
Optionally, the method further includes:
if the second cache line state is the shared state, adjusting the state of the access register to the dormant state, setting a non-data transaction register, and sending a second request to the master node, where the second request is used to adjust the second cache line state to the exclusive state;
sending a third response message to the master node when a second response message of the master node for the second request is received, where the third response message indicates that the second response message has been received.
Optionally, adjusting the state of the access register to the dormant state and sending the first request to the master node when the access request misses the cache includes:
if the access write request misses the cache, adjusting the state of the access register to dormant, writing the parsing result of the access write request into a second flag bit of a read transaction register through the cache pipeline unit, setting the second flag bit, and initiating a second read request to the master node, where the second read request is used to read the cache data corresponding to the second request address and to mark the cache data corresponding to the second request address in other processor cores as invalid;
receiving the cache data returned by the master node, and replacing the cache data with the second target data carried in the access write request;
setting a first flag bit of the read transaction register and sending a fourth response message to the master node, where the fourth response message indicates that the response of the master node to the second read request has been received.
Optionally, the method further includes:
when a third request sent by the master node is received, judging whether an address entry identical to a third request address of the third request exists in the memory access register of the control module;
if an address entry identical to the third request address exists in the access register, setting the priority of the third request to the lowest priority, and feeding the third request into the cache pipeline unit after arbitration by the arbitration unit;
if no address entry identical to the third request address exists in the access register, setting the priority of the third request to the highest priority.
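By way of illustrative sketch (not part of the claimed embodiment), the priority rule above — demoting a request from the master node when its address collides with a pending entry in the access register — can be modeled as follows. The function name and numeric priority values are assumptions for illustration only:

```python
def third_request_priority(request_addr, pending_access_addrs,
                           lowest=0, highest=7):
    """Assign an arbitration priority to a request from the master node.

    If the request address matches an address entry already pending in
    the access register, the request is demoted to the lowest priority
    so the local transaction can complete first; otherwise it enters
    arbitration at the highest priority, as described above.
    """
    if request_addr in pending_access_addrs:
        return lowest
    return highest
```

Here a larger number means higher priority; the real hardware would encode this in the arbitration unit rather than in software.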
In another aspect, an embodiment of the invention discloses a cache control device applied to a local cache controller, where the local cache controller includes a bus interface module, a control module and a cache pipeline unit. The device includes:
a request receiving module, configured to record the request information of an access request into the access register of the control module when the access request from the access module is received, and to feed the request information into the cache pipeline unit after arbitration by the arbitration unit;
a query module, configured to query, with the cache pipeline unit, whether the access request hits the cache;
an execution module, configured to execute the access request if it hits the cache;
an adjusting module, configured to adjust the state of the access register to a dormant state and send a first request to a master node when the access request misses the cache, where the first request is used to obtain the cache data required by the access request.
In still another aspect, the embodiment of the invention also discloses an electronic device, which comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus; the memory is used for storing executable instructions, and the executable instructions enable the processor to execute the cache control method.
The embodiment of the invention also discloses a readable storage medium, which enables the electronic equipment to execute the cache control method when the instructions in the readable storage medium are executed by the processor of the electronic equipment.
Embodiments of the invention have the following advantages:
The embodiment of the invention provides a cache control method in which the bus interface module, the control module and the cache pipeline unit in the local cache controller cooperate to process access requests in a non-blocking, pipelined manner. In addition, on a cache miss, the embodiment adjusts the state of the access register to a dormant state and sends a first request to the master node; the master node processes requests from all nodes and maintains cache coherency and memory management, so cache coherency in the multi-core processor system is ensured without excessive hardware overhead or bus overhead.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of steps of an embodiment of a cache control method of the present invention;
FIG. 2 is a schematic diagram of a local cache controller architecture according to the present invention;
FIG. 3 is a flow chart of a process of a processor read request according to the present invention;
FIG. 4 is a schematic flow diagram of a process of a processor write request according to the present invention;
FIG. 5 is a block diagram illustrating an embodiment of a cache control apparatus according to the present invention;
FIG. 6 is a block diagram of an electronic device according to an example of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms "first", "second", and the like in the specification and claims are used to distinguish between similar objects and do not necessarily describe a particular sequence or chronological order. It should be understood that such terms may be interchanged where appropriate, so that embodiments of the invention can be implemented in orders other than those illustrated or described herein. Objects identified by "first", "second", etc. are generally of one type, and the number of objects is not limited; for example, the first object may be one or more. Furthermore, the term "and/or" in the specification and claims describes an association between associated objects and covers three relationships: for example, "A and/or B" may mean A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the surrounding objects. The term "plurality" in embodiments of the invention means two or more; other quantifiers are similar.
Method embodiment
First, some terms of art that may be involved in the present invention will be explained:
The Coherent Hub Interface (CHI) protocol is a protocol for building high-performance, low-power multiprocessor systems. It provides an efficient, low-latency way to connect multiple processors, accelerators, memories and other system components, enabling shared memory access and cache coherency. The CHI protocol, part of Arm's AMBA bus family, is an evolution of the AXI Coherency Extensions (ACE) protocol and aims to solve the cache coherency problem in multi-core processor systems.
MESI (Modified, Exclusive, Shared, Invalid) protocol: a common cache coherency protocol used for cache coherency management in multiprocessor systems. It ensures that the cache contents of multiple processors remain consistent by attaching state information to each cache line.
Cache: a high-speed memory whose access speed is much faster than that of system memory and close to the processing speed of the CPU, although its capacity is much smaller than that of system memory. The cache exploits the locality of program execution and data access: within a given window of execution time, the accessed code and data are concentrated in a limited region of memory.
Network-on-Chip (NoC): a communication network architecture implemented on an integrated circuit chip for connecting the various functional blocks, processor cores, memory units and other important components on the chip. With the increase in on-chip integration and the rise of multi-core processors, NoCs have become increasingly important because they provide an efficient, low-latency way of communicating within the chip.
Request Node (RN): refers to a node responsible for generating transaction requests and sending those requests to other nodes in the system. In the ARM AMBA CHI protocol, the RN is a critical component responsible for initiating transaction requests and interacting with other system nodes. Its main functions include generating transaction requests, address translation and routing, transaction management, and maintaining cache coherency. Through these functions, the RN ensures efficiency and consistency of data transmission in a high performance computing system.
Cache Pipeline (CPL): when the processor accesses data, if the data hits in the cache it can be read directly from the cache while other tasks are processed in parallel using pipeline techniques; if the data misses the cache, it must be read from main memory and stored into the cache for subsequent accesses. In this process, pipelining helps improve the efficiency of transferring data from main memory to the cache.
Master Node (Home Node, HN): the node responsible for managing and storing data within a particular address range and for handling requests from other nodes. The HN plays a vital role in maintaining cache coherency and managing memory accesses.
Direct Cache Transfer (DCT): in the CHI protocol, the action of an RN sending cache data directly to the requester; it is a data transfer from one RN to another RN.
Load Exclusive (LE, also known as Load-Reserved or LR): a special instruction provided in processor architectures such as RISC-V and ARMv8 to implement atomic operations and memory synchronization. The Load Exclusive instruction reads a value from memory and stores it in a register. Unlike a normal load, it also marks the memory address as "reserved" or "exclusive", meaning that other processors or threads should not modify the value at that address until a subsequent Store Exclusive instruction is executed.
Store Exclusive (SE, also known as Store-Conditional or SC): a special instruction used together with the Load Exclusive (LE) instruction, commonly found in the ARMv8 (AArch64) architecture and RISC-V processors, for implementing atomic operations in lock-free programming. The Store Exclusive instruction attempts to store a value to a memory address previously marked as "exclusive" or "reserved" by a Load Exclusive instruction. If the memory address has not been modified by other processors or threads since the Load Exclusive instruction was executed (i.e., it is still in the "exclusive" state), the Store Exclusive instruction stores the value to the address and clears its "exclusive" state. If the state of the memory address has changed (i.e., it is no longer "exclusive"), the Store Exclusive instruction fails.
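The LR/SC semantics described above can be sketched as a toy software model (purely illustrative; the class and method names are assumptions, and real hardware tracks reservations in the exclusive monitor, not in software):

```python
class ExclusiveMonitor:
    """Toy model of Load-Exclusive / Store-Exclusive semantics.

    load_exclusive sets a reservation on an address; any other write to
    that address clears the reservation; store_exclusive succeeds only
    if the reservation survived, otherwise the caller must retry.
    """

    def __init__(self):
        self.mem = {}        # addr -> value
        self.reserved = {}   # addr -> id of the reserving owner

    def load_exclusive(self, owner, addr):
        self.reserved[addr] = owner
        return self.mem.get(addr, 0)

    def plain_store(self, addr, value):
        self.mem[addr] = value
        self.reserved.pop(addr, None)  # a foreign write breaks the reservation

    def store_exclusive(self, owner, addr, value):
        if self.reserved.get(addr) == owner:
            self.mem[addr] = value
            del self.reserved[addr]
            return True    # store succeeded
        return False       # reservation lost; retry the LR/SC sequence
```

A typical lock-free increment would loop: load-exclusive the value, compute the new value, store-exclusive, and retry on failure.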
Snoop messages: play an important role in maintaining the coherency of shared data in multiprocessor systems. By informing other processors of modifications to shared data, they keep the cache states of all processors consistent, improving the reliability and performance of the system. During system design and optimization, the transmission mode, type and application scenario of Snoop messages need to be considered to achieve efficient cache coherency management.
In-order processor: a processor that executes instructions strictly in the order specified by the program.
Out-of-order processor: a processor that dynamically adjusts the execution order of instructions, detecting data dependencies and other potential conflicts between instructions, to maximize throughput and efficiency.
The Miss Status Handling Register (MSHR) is a hardware structure for handling cache misses. The main function of the MSHR is to track and manage the status of cache misses, ensuring that these outstanding requests are properly handled and resumed as data is returned from higher-level caches or main memory.
Referring to fig. 1, a flowchart illustrating steps of an embodiment of a cache control method according to the present invention may specifically include the following steps:
Step 101: when a memory access request from a memory access module is received, recording request information of the memory access request into a memory access register of a control module, and feeding the request information into a cache pipeline unit after arbitration by an arbitration unit;
Step 102: querying, with the cache pipeline unit, whether the access request hits the cache;
Step 103: executing the access request if it hits the cache;
Step 104: if the access request misses the cache, adjusting the state of the access register to a dormant state and sending a first request to a master node, where the first request is used to obtain the cache data required by the access request.
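As an illustrative sketch only (all names and data structures here are hypothetical, not the claimed hardware), steps 101 to 104 can be modeled as follows:

```python
from enum import Enum


class EntryState(Enum):
    IDLE = "idle"
    ACTIVE = "active"
    SLEEP = "sleep"    # dormant: waiting for the master node, no re-arbitration


def handle_access_request(request, lsu_entry, cache, master_node):
    """Model of steps 101-104: record, look up, and split on hit/miss.

    `master_node` stands in for the HN: it is called with a first
    request and returns the cache data for the requested address.
    """
    # Step 101: record the request into the access (LSU) register entry.
    lsu_entry["info"] = request
    lsu_entry["state"] = EntryState.ACTIVE
    # Step 102: the cache pipeline checks whether the request hits.
    if request["addr"] in cache:
        # Step 103: on a hit, execute the request directly.
        return {"hit": True, "data": cache[request["addr"]]}
    # Step 104: on a miss, put the entry to sleep and ask the master node.
    lsu_entry["state"] = EntryState.SLEEP
    data = master_node({"type": "read", "addr": request["addr"]})
    cache[request["addr"]] = data      # fill the line with the returned data
    lsu_entry["state"] = EntryState.ACTIVE
    return {"hit": False, "data": data}
```

The sketch omits arbitration and the flag-bit / response-message handshake, which the embodiments below describe in detail.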
The cache control method provided by the embodiment of the invention can be applied to a local cache controller, wherein the local cache controller comprises a bus interface module, a control module and a cache pipeline unit.
It should be noted that the present invention provides a CHI-protocol-based local cache controller for a multi-core processor system, which can be applied to an in-order processor. The local cache controller and the processor core together form an RN node and exist in a CHI-based NoC system. One or more HN nodes exist in the NoC system; the RN initiates requests to the HN to read and write back data and to change cache states. Meanwhile, the RN receives Snoop messages from the HN and responds to them.
Referring to fig. 2, an architecture diagram of a local cache controller according to the present invention is shown. As shown in fig. 2, the local cache controller includes three main modules: a bus interface module, a control module, and a Cache Pipeline (CPL) unit.
The bus interface module sends and receives CHI messages; interaction with the HN is realized through this module. It contains six channels: 1. RXSNP, receiving Snoop messages; 2. RXRSP, receiving Response messages; 3. RXDAT, receiving Data messages; 4. TXREQ, sending Request messages; 5. TXDAT, sending Data messages; 6. TXRSP, sending Response messages. The bus interface module can be regarded as a bridge between the CHI bus interface and the MSHRs, with data transferred in both directions in a non-blocking manner. The module has two further functions: 1. link management, implementing link handshake and activation; 2. port arbitration, performed when a single sending channel has multiple transactions to transmit. The arbitration policy is fixed-priority: transactions with the snoop attribute have the highest priority, followed by exclusive access transactions, and finally normal transactions.
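The fixed-priority port arbitration just described can be sketched as follows (an illustrative model only; the numeric ranks and dictionary-based transaction representation are assumptions):

```python
# Fixed arbitration priority, highest first: snoop-attribute transactions,
# then exclusive accesses, then normal transactions (lower rank wins).
PRIORITY_RANK = {"snoop": 0, "exclusive": 1, "normal": 2}


def arbitrate(pending):
    """Pick the pending transaction with the highest fixed priority.

    Ties are broken by arrival order (earlier entries in `pending` win),
    which models a simple fixed-priority port arbiter.
    """
    if not pending:
        return None
    return min(pending, key=lambda txn: PRIORITY_RANK[txn["kind"]])
```

In hardware this would be a priority encoder on the send channel rather than a software loop.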
The control module, which may also be called the MSHR controller, is the core control module of the local cache controller, coordinating and completing requests from the processor and the HN. The MSHRs in the present invention fall into five types: 1. an access register (LSU req entry) for recording read/write requests from the processor core; 2. a snoop register (Snp entry) for recording Snoop requests from the HN; 3. a read transaction register (Read entry) for recording read transaction requests sent to the HN; 4. a write transaction register (Write entry) for recording write transaction requests sent to the HN; 5. a non-data transaction register (Dataless entry) for recording non-data transaction requests sent to the HN. Each MSHR contains multiple entries to support concurrent processing. Each MSHR holds two parts: a data part (the request content) and a control part (for flow control). The data parts of the different MSHR types differ, but the control parts are identical, giving a unified control framework. Their data parts are defined as follows:
1. The read/write request of the processor core includes the following fields:
Rd_data: read data
Wr_data: written data
Addr: address of the read/write request
Wr_en: write enable, with strobe function
Rd_en: read enable
Excl_flag: whether the access is exclusive
2. Snoop request from HN
RxSnp: received snoop message
TxRsp: response message sent
TxDat: data for DCT
3. Read transaction request sent to HN
TxReq: transmitted request message
RxRsp: received response message
TxRsp: response message sent
RxDat: received data message
4. Write transaction request sent to HN
TxReq: transmitted request message
RxRsp: received response message
TxDat: data message sent
5. Non-data transaction request sent to HN
TxReq: transmitted request message
RxRsp: received response message
TxRsp: response message sent
The MSHR control part is defined as follows:
Valid: when set (pulled high), indicates that the entry is valid
Busy: when set (pulled high), indicates that the entry is being processed
Done: when set (pulled high), indicates that the entry has been processed
The MSHR control part adopts the valid-busy-done control mechanism above, which is quite robust.
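The valid-busy-done lifecycle of one MSHR entry can be sketched as a small state model (illustrative only; the class and method names are assumptions, and real entries are flip-flops rather than objects):

```python
class MshrEntry:
    """Valid-busy-done handshake for one MSHR entry.

    valid: the entry holds a live request.
    busy:  the request is being processed (or sleeping on the HN),
           so it must not re-enter arbitration.
    done:  processing has finished; the entry can be released.
    """

    def __init__(self):
        self.valid = False
        self.busy = False
        self.done = False

    def allocate(self, request):
        """Record a new request (e.g., step 101)."""
        self.request = request
        self.valid, self.busy, self.done = True, False, False

    def grant(self):
        """Arbitration passed: the pipeline takes the entry."""
        self.busy = True

    def complete(self):
        """Processing finished (hit executed, or HN data consumed)."""
        self.done = True

    def release(self):
        """Free the entry for reuse."""
        self.valid = self.busy = self.done = False

    @property
    def may_arbitrate(self):
        # Only valid entries that are neither in flight nor finished
        # present a request to the arbitration unit.
        return self.valid and not self.busy and not self.done
```

The dormant state described later corresponds to busy held high with done low: the entry waits for the master node and raises no new arbitration request.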
The cache pipeline module processes tasks from the MSHR controller in a pipelined manner and is responsible for recording and updating cache state and data; pipelined processing improves data access efficiency in concurrent scenarios.
It should be noted that, in the embodiment of the present invention, the memory access module may include a processor core and a last-level cache (LLC) controller, and the memory access request may include a read request and a write request.
When the local cache controller receives a memory access request from the memory access module, the request information is recorded into the access register of the MSHR controller and the MSHR control bit valid of the LSU register is set; the request then enters the arbitration unit to wait for processing by the CPL module, and once arbitration passes, its MSHR control bit busy is set. The arbitration unit (Arb) processes concurrent request messages from the access register, the read transaction register, the write transaction register, the snoop register and the non-data register. The request information recorded into the access register may include the request address of the access request and an exclusive identifier indicating whether the access is exclusive; when the exclusive identifier indicates that the access request is exclusive, Excl_flag in the data part of the access register may be set (e.g., to "1") or pulled high. If the memory access request is a read request, Rd_en in the data part of the access register may be set or pulled high, indicating a read request. Similarly, if it is a write request, Wr_en may be set or pulled high, indicating a write request.
The MSHR controller sends requests to the CPL unit for processing according to their arbitration priority. In one possible application scenario, the priority of memory access requests from the memory access module is the lowest; in other words, requests recorded in the access register have the lowest priority.
After the CPL unit receives the access request, it queries whether the request hits the cache. If it hits, the memory access request is executed directly. For example, if the memory access request is a read request and hits the cache, the CPL returns the requested data directly to the access register, writing it to the Rd_data field. If the access request is a write request and hits the cache, the CPL writes the data carried in the write request directly into the cache and completes the request.
If the memory access request misses the cache, the access register in the MSHR controller is set to a dormant state: for example, busy in the control part of the access register is set or kept high, and done is cleared or kept low (busy high indicates that the transaction in the LSU register is waiting for CPL processing to complete and therefore initiates no new request to the arbitration unit; done low indicates that the transaction in the LSU register has not yet been completed). Then a first request is initiated to the HN node, requesting the required cache data.
It can be understood that the CPL unit may query the cache according to the request address of the memory access request: if data corresponding to the request address exists in the cache, the access request is determined to hit the cache; otherwise, if no data corresponding to the request address exists in the cache, the access request is determined to miss the cache.
Of course, whether the access request hits the cache may also be determined according to the cache line state.
Optionally, the method further includes:
Step S11: when a memory access request from the memory access module is received, querying the cache line state corresponding to the request address of the access request;
Step S12: determining that the access request hits the cache when the cache line state is any one of the modified, exclusive and shared states;
Step S13: determining that the access request misses the cache when the cache line state is the invalid state.
The cache line states in the embodiments of the present invention comply with the MESI-based coherency protocol. In the MESI protocol, each cache line has four states: modified (M), exclusive (E), shared (S) and invalid (I).
The M state indicates that the contents of the cache line have been modified and the line is cached only in this CPU. The data in a cache line in this state differs from that in memory and will be written back to memory at some future time (for example, when another CPU reads the contents of the cache line, or when another CPU is about to modify the memory corresponding to the cache line).
The E state indicates that the memory contents corresponding to the cache line are cached only by this CPU; no other CPU caches them. The contents of a cache line in this state are consistent with memory. The line changes to the S state when any other CPU reads the corresponding memory, and changes to the M state when the local processor writes the line.
The S state indicates that the data exists not only in the local CPU cache but also in the caches of other CPUs. The data of a cache line in this state is consistent with memory. When another CPU modifies the memory contents corresponding to the cache line, the local copy changes to the I state.
The I state represents that the contents of the cache line are invalid.
In the embodiment of the invention, if the state of the cache line corresponding to the request address of the access request is any one of M, E, S, the access request can be determined to hit in the cache; if the cache line state is I-state, it may be determined that the access request misses the cache.
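The hit/miss rule of steps S11–S13 reduces to a simple state check. The sketch below assumes a flat address-to-state mapping for illustration; the state names follow the MESI protocol described above, but the lookup structure is an assumption.

```python
# Minimal sketch of the cache-line-state hit test: a request hits the cache
# when the line state is M, E, or S, and misses when it is I (or absent).
MODIFIED, EXCLUSIVE, SHARED, INVALID = "M", "E", "S", "I"

def hits_cache(line_states, request_address):
    state = line_states.get(request_address, INVALID)
    return state in (MODIFIED, EXCLUSIVE, SHARED)

lines = {0x1000: SHARED, 0x2000: MODIFIED}
assert hits_cache(lines, 0x1000)       # S state: hit
assert hits_cache(lines, 0x2000)       # M state: hit
assert not hits_cache(lines, 0x3000)   # absent / I state: miss
```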
If the access request misses the cache, the local cache controller sets the state of the memory access register to dormant and then sends a first request to the HN node. After receiving the first request, the HN node feeds back the cache data corresponding to the access request to the local cache controller. If a read request misses the cache, the HN node, after receiving the first request, feeds back the cache data corresponding to the request address of the read request; the local cache controller writes that data back to the Rd_data portion of the memory access register and feeds it back to the memory access module. If a write request misses the cache, the HN node, after receiving the first request, modifies the cache data corresponding to the request address of the write request to a state accessible by the local cache controller; the local cache controller then writes the data to be written carried in the write request back to the cache, replacing the corresponding cache data at the request address.
According to the cache control method provided by the embodiment of the invention, the non-blocking pipeline processing of the access request is cooperatively realized through the bus interface module, the controller and the cache pipeline unit in the local cache controller; in addition, under the condition of cache miss, the embodiment of the invention adjusts the state of the access register into the dormant state, and sends the first request to the master node, the master node processes the requests from all nodes, and the cache consistency and the memory management are maintained, so that the cache consistency in the multi-core processor system is ensured.
In an alternative embodiment of the present invention, the memory access request comprises a memory access read request; step 104, in the case that the access request misses the cache, adjusting the state of the access register to the sleep state, and sending a first request to the master node, including:
Step S21, under the condition that the access read request does not hit the cache, the state of the access register is adjusted to be a dormant state, a read transaction register in the control module is set, and a first read request is initiated to the master node; the first read request is used for requesting first target data required by the access read request;
Step S22, after the first target data returned by the master node is read from the read transaction register, activating the memory access register, and setting a first flag bit of the read transaction register; the first flag bit is used for indicating to send a response message;
step S23, a first response message is sent to the master node; the first response message is used for indicating that a response of the master node to the first read request is received.
The read request in the embodiment of the invention may be a memory access read request, such as a Load-Exclusive instruction. When the memory access module initiates a read request, the local cache controller stores it into a memory access register in the MSHR controller, where it waits for arbitration before being sent to the CPL unit; memory access register requests have the lowest priority. The CPL queries whether the cache hits. On a miss, the memory access register in the MSHR controller enters a sleep state (e.g., busy in the register's control portion is pulled high), and the read transaction (READ) register in the MSHR is set in preparation for initiating a first read request to the Home Node (HNF). The first read request may be, for example, a ReadNotSharedDirty request, which requests the first target data required by the memory access read request.
After receiving the first read request, the HNF returns the first target data, which is sent to the RxDat entry in the MSHR read transaction register; the RxDat entry is set (i.e., its valid is pulled or held high) and waits for arbitration before being sent to the CPL unit. After the CPL reads the first target data from the read transaction register and stores it in the cache, the memory access register in the MSHR is activated (e.g., busy in the register's control portion is pulled or held low). When the reactivated read request enters the CPL unit again, it should now hit the cache, and the CPL unit returns the read cache data to the memory access register, i.e., writes it into Rd_data in the register's data portion. The TxRsp entry of the read transaction register is then set, and the CPL's resolution result is written into it. The TxRsp entry indicates that a response message should be sent: once TxRsp is set, the local cache controller sends a first response message, for example CompAck, to the HNF, informing the master node that its response to the first read request has been received.
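The read-miss sequence above can be sketched end to end. This is a hedged software model under the assumption of a simple dictionary cache: the transaction field names (TxReq, RxDat, TxRsp) and message names (ReadNotSharedDirty, CompAck) come from the text, but all class and function names are illustrative, not taken from any real RTL.

```python
# Sketch of the read-miss flow: sleep the access register, set the read
# transaction register, fetch the line from the HNF into RxDat, install it,
# reactivate the register so the retry hits, then acknowledge with CompAck.
class AccessReg:
    busy = False

class Mshr:
    def __init__(self):
        self.access = AccessReg()
        self.read_txn = {}

def handle_read_miss(mshr, cache, hnf_read, addr):
    mshr.access.busy = True                          # sleep the access register
    mshr.read_txn = {"TxReq": "ReadNotSharedDirty", "addr": addr}
    mshr.read_txn["RxDat"] = hnf_read(addr)          # HNF returns the line into RxDat
    cache[addr] = mshr.read_txn["RxDat"]             # CPL installs the line in the cache
    mshr.access.busy = False                         # reactivate: the retry will now hit
    mshr.read_txn["TxRsp"] = "CompAck"               # acknowledge the HNF's response
    return cache[addr]                               # data written back to Rd_data

memory = {0x100: 0xDEAD}
cache = {}
mshr = Mshr()
assert handle_read_miss(mshr, cache, memory.get, 0x100) == 0xDEAD
assert mshr.read_txn["TxRsp"] == "CompAck" and not mshr.access.busy
```

In real hardware the HNF response is asynchronous and the retry passes through arbitration again; the sketch collapses those steps into sequential calls for clarity.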
Optionally, in step 103, executing the access request if the access request hits in the cache, including:
and under the condition that the access read request hits in the cache, writing back first target data corresponding to a first request address of the access read request to the access register.
In the embodiment of the present invention, if the memory access read request hits the cache, the CPL unit directly returns the first target data at the first request address of the read request to the memory access register, i.e., writes the first target data into Rd_data in the register's data portion. The local cache controller returns the data in Rd_data to the memory access module at an appropriate time.
Referring to fig. 3, a schematic flow chart of the processing of a processor read request according to an embodiment of the present invention is shown. As shown in fig. 3, after receiving a read request from the processor core, the local cache controller sets the MSHR memory access register and then determines whether the memory access register request passes arbitration. After the request passes arbitration, it is sent to the CPL unit for processing; specifically, the CPL unit judges the request type. If the request type is RxDat, i.e., the target data returned by the master node in a response message has been placed in the RxDat item of the read transaction and the RxDat item is set, the received cache line data in the RxDat item is written into the cache. If the memory access register is in a dormant state, it is activated, i.e., the busy bit of its control portion is pulled or kept low. The activated memory access register sends a request to the arbitration unit again; the request is successfully executed by the cache pipeline, and the target data is provided to the local processor, i.e., written into Rd_data of the register's data portion, completing the processing flow.
If the request type is a processor read and the cache misses, the analysis result of the cache pipeline is written into the TxReq item of the read transaction register, the TxReq item is set, and a TxReq is initiated to the master node, requesting it to send the cache data. The cache data returned by the master node, i.e., RxDat, is received, sent into the cache pipeline, and stored in the local cache, and the memory access register that was kept in the dormant state is activated. The activated memory access register request is arbitrated and then processed by the CPL unit. At this point the cache is guaranteed to hit, and the CPL unit returns the read cache data to the memory access register.
In another alternative embodiment of the present invention, the memory request includes a memory write request; step 103, executing the access request if the access request hits in the cache, including:
step S31, under the condition that the access write request hits in the cache, inquiring a second cache line state corresponding to a second request address of the access write request;
step S32, if the second cache line state is a shared state, the state of the access register is adjusted to be a dormant state, a non-data transaction register is set, and a second request is sent to the master node; the second request is used for adjusting the second cache line state to an exclusive state;
Step S33, under the condition that a second response message of the master node aiming at the second request is received, a third response message is sent to the master node; the third response message is used for indicating that the second response message is received;
Step S34, if the second cache line status is a modified status or an exclusive status, writing the second target data carried by the access write request back to the cache.
The write request in the embodiment of the invention may be a memory access write request, such as a Store-Exclusive instruction. When the memory access module initiates a write request, the local cache controller stores it into a memory access register in the MSHR, where it waits for arbitration before being sent to the CPL unit; memory access register requests have the lowest priority. The CPL queries whether the cache hits. If the cache hits and the cache state corresponding to the second request address of the write request (i.e., the second cache line state) is M/E, the CPL writes the data directly into the cache and ends the request. Otherwise, the memory access register in the MSHR enters a sleep state; if the cache state is S, a second request, such as CleanUnique, needs to be initiated. After receiving the CleanUnique request, the master node sends a snoop message, SnpCleanUnique, to the other cores, clears the cache line copies of the other processor cores, changes the cache state to E, and returns a second response message to the local cache controller that initiated the second request, informing it that the cache state has been changed. After receiving the second response message, the local cache controller returns a third response message to the master node, informing it that the second response message for the second request has been received.
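The write-hit decision of steps S31–S34 can be sketched as follows. This is a hedged model: the state tuples, the `send_clean_unique` callback, and the function name are assumptions for illustration; only the M/E-write-directly and S-upgrade-via-CleanUnique behavior comes from the text.

```python
# Sketch of write-hit handling: a line in M or E is written directly and
# marked M; a line in S first requires a CleanUnique exchange so the master
# node invalidates other copies and the line is upgraded to E.
def handle_write_hit(cache, send_clean_unique, addr, data):
    state, _ = cache[addr]
    if state == "S":
        send_clean_unique(addr)   # HN snoops other cores (SnpCleanUnique) and replies Comp
        state = "E"               # second response received: line is now exclusive
    # M or E (possibly after the upgrade): write the data and mark the line modified
    cache[addr] = ("M", data)

cache = {0x40: ("S", 1), 0x80: ("E", 2)}
sent = []
handle_write_hit(cache, sent.append, 0x40, 10)
handle_write_hit(cache, sent.append, 0x80, 20)
assert cache[0x40] == ("M", 10) and cache[0x80] == ("M", 20)
assert sent == [0x40]   # CleanUnique issued only for the shared line
```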
Optionally, in step 104, in the case that the access request misses the cache, the state of the access register is adjusted to a dormant state, and a first request is sent to the master node, including:
Step S41, under the condition that the access and write request does not hit the cache, the state of the access register is adjusted to be in a dormant state, the analysis result of the access and write request is written into a second flag bit of a read transaction register through the cache pipeline unit, the second flag bit is set, and a second read request is initiated to the master node; the second read request is used for reading cache data corresponding to the second request address, and setting the cache data corresponding to the second request address in other processor cores as invalid data;
step S42, receiving the cache data returned by the master node, and replacing the cache data by using second target data carried in the access write request;
Step S43, setting a first flag bit of the read transaction register and sending a fourth response message to the master node; the fourth response message is used for indicating that the response of the master node to the second read request is received.
In the embodiment of the present invention, if the cache misses, the local cache controller adjusts the state of the memory access register to the dormant state and sends a second read request, for example a ReadUnique request, to the master node to obtain the cache data in the M/E state. After receiving the ReadUnique request, the master node reads the cache data corresponding to the second request address and sets the cache data corresponding to that address in the other processor cores to invalid, for example by clearing the corresponding cache line copies. Next, the local cache controller may execute the memory access write request: the current cache data at the second request address is replaced with the second target data carried in the write request. At the same time, the MSHR controller sets the first flag bit (the TxRsp item) of the read transaction register and sends a fourth response message to the master node; the fourth response message indicates that the response of the master node to the second read request has been received.
Specifically, if the memory access write request misses the cache, the MSHR controller adjusts the state of the memory access register to the dormant state; the cache pipeline writes the analysis result of the memory access register request into the second flag bit (the TxReq item) of the read transaction register, sets the TxReq item, and initiates a second read request to the master node. The second read request is used to read the cache data corresponding to the second request address and to set the cache data corresponding to that address in the other processor cores to invalid.
The MSHR controller writes the received cache data returned by the master node into the RxDat item of the corresponding read transaction register and sets the RxDat item. The set RxDat item initiates a request to the arbitration unit; after arbitration passes, the RxDat request enters the cache pipeline, and the received cache line data, in the exclusive state, is written into the local cache. After the RxDat request is successfully executed by the cache pipeline, the execution result is written into the TxRsp item of the read transaction register, and the memory access register that was kept in the dormant state is activated. The activated memory access register request is arbitrated and then processed by the CPL unit. At this point, the CPL unit replaces the cache data with the second target data carried in the write request and releases the memory access register, completing the memory access request.
Then the TxRsp item is set; the TxRsp item indicates that a response message should be sent. With TxRsp set, the local cache controller sends a fourth response message, for example CompAck, to the HNF, informing the master node that its response to the second read request has been received. The read transaction thus ends, and the read transaction register is released, for example by pulling its done bit high in a pulsed manner.
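The write-miss path just described can be sketched as a single sequence. As before, this is a hedged model: the ReadUnique/CompAck message names and the TxReq/RxDat/TxRsp fields come from the text, while the dictionary-based cache and the function name are illustrative assumptions.

```python
# Sketch of the write-miss flow: issue ReadUnique (other cores' copies are
# invalidated at the master node), install the returned line in the exclusive
# state, perform the write on the retry, and end the transaction with CompAck.
def handle_write_miss(cache, hnf_read_unique, addr, data):
    txn = {"TxReq": "ReadUnique", "addr": addr}   # second flag bit set, request issued
    txn["RxDat"] = hnf_read_unique(addr)          # exclusive-state line from the HNF
    cache[addr] = ("E", txn["RxDat"])             # install in the local cache
    cache[addr] = ("M", data)                     # retry hits; write the target data
    txn["TxRsp"] = "CompAck"                      # fourth response message to the HNF
    return txn

cache = {}
txn = handle_write_miss(cache, lambda a: 0, 0x200, 99)
assert cache[0x200] == ("M", 99)
assert txn["TxReq"] == "ReadUnique" and txn["TxRsp"] == "CompAck"
```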
Referring to fig. 4, a schematic flow chart of the processing of a processor write request according to an embodiment of the present invention is shown. As shown in fig. 4, after receiving a write request from the processor core, the local cache controller sets the MSHR memory access register and then determines whether the write request passes arbitration. After the write request passes arbitration, it is sent to the CPL unit for processing; specifically, the CPL unit judges the request type. If the request type is Comp, i.e., the response message returned by the master node has been placed in the RxRsp item of the non-data transaction and the RxRsp item is set, the cache state is changed to the exclusive state, and the cache pipeline's execution result for the request is written into the TxRsp item of the non-data register. If the memory access register is in a dormant state, it is activated, i.e., the busy bit of its control portion is pulled or kept low. The activated memory access register sends a request to the arbitration unit again; the request is successfully executed by the cache pipeline, and the target data carried by the memory access request is written into the local cache, completing the processing flow.
After the cache pipeline successfully executes the access request, the access register is released, the TxRsp items in the read transaction register are set, txRsp, such as CompAck, is initiated to the master node, cleanUnique is ended, and the non-data transaction register of the MSHR is released.
If the request type is a memory access write request and the MSHR memory access register is dormant, no processing is performed.
If the request type is a memory access write request, the Cache hits, and the Cache state is M/E, the data is written to the Cache, and the Cache line state is changed to M. At this point, the MSHR register need not be triggered.
If the request type is a memory access write request, the cache hits, and the cache state is S, the cache pipeline adjusts the memory access register state to the dormant state, writes the analysis result of the memory access register request into the TxReq item of the non-data transaction register, sets the TxReq item, and initiates a second request to the master node. The second request is used to adjust the second cache line state to the exclusive state and to set the cache data corresponding to the second request address in the other processor cores to invalid. The MSHR controller writes the received response message for the second request returned by the master node into the RxRsp item of the corresponding non-data transaction register and sets the RxRsp item; the set RxRsp item initiates a request to the arbitration unit, and after arbitration passes, the RxRsp request asks the cache pipeline to modify the target cache line state of the memory access request to the exclusive state. After the RxRsp request is successfully executed by the cache pipeline, the execution result is written into the TxRsp item of the non-data transaction register and the TxRsp item is set. The TxRsp item indicates that a response message should be sent: with TxRsp set, the local cache controller sends a first response message, for example CompAck, to the HNF, informing the master node that the response to the second request has been received. The transaction thus ends, and the non-data transaction register is released (its done bit is pulled high in a pulsed manner).
If the request type is a memory access write request and the cache misses, the MSHR read transaction register is set and a TxReq is initiated; after the returned data (RxDat) is received, whether the request passes arbitration is judged again, and after arbitration passes, the data carried by the write request is written into the local cache.
In addition, the local cache controller may also receive snoop messages. When the local cache controller receives a snoop message request, the message is sent to the snoop register queue and enters the CPL unit for pipeline processing in first-in-first-out order. When a snoop register request is arbitrated, it is first judged whether the snoop address matches any entry in the MSHR memory access registers; if so, the request priority of the snoop register is set to the lowest. Snoop requests can be divided into three categories: cache invalidation requests, requests to forward read data/instructions to other processor cores, and requests to write back to the master node.
If a cache invalidation request is received, the local cache line can be invalidated directly, regardless of its state.
If a request to forward read data/instructions for another processor core is received, and the local cache has a copy of the address, the data is sent directly to the RN.
If a snoop request to write back to the master node is received, and the local cache has a copy of the address in the M state, the data is sent back to the master node; otherwise, the cache data is directly invalidated.
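The three snoop categories above can be sketched as one dispatch routine. This is a hedged model of the described behavior only; the category names, the state-tuple cache, and the return conventions (returning the data to be forwarded or written back) are illustrative assumptions.

```python
# Sketch of snoop handling: invalidate, forward a copy to the requesting
# node (RN), or write a dirty (M-state) line back to the master node.
def handle_snoop(kind, cache, addr):
    state, data = cache.get(addr, ("I", None))
    if kind == "invalidate":
        cache[addr] = ("I", None)                # drop the copy regardless of state
        return None
    if kind == "forward_read":
        return data if state != "I" else None    # send the copy to the RN if present
    if kind == "writeback":
        cache[addr] = ("I", None)
        return data if state == "M" else None    # only dirty data goes back to the master node

cache = {0x10: ("M", 7), 0x20: ("S", 8)}
assert handle_snoop("forward_read", cache, 0x20) == 8
assert handle_snoop("writeback", cache, 0x10) == 7     # M line carries its data home
assert cache[0x10] == ("I", None)
assert handle_snoop("invalidate", cache, 0x20) is None and cache[0x20] == ("I", None)
```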
Optionally, the method further comprises:
step S51, judging whether an address item which is the same as a third request address of the third request exists in a memory access register of the control module under the condition that the third request sent by the master node is received;
Step S52, if the address item which is the same as the third request address exists in the access register, setting the priority of the third request as the lowest priority, waiting for arbitration by the arbitration unit and then feeding the third request into the cache pipeline unit;
Step S53, if the address item which is the same as the third request address does not exist in the access register, the priority of the third request is set as the highest priority.
It should be noted that, in the embodiment of the present invention, the third request is a snoop request.
In the embodiment of the invention, the arbitration priorities of the various requests may be divided into 3 levels. Processor requests have the lowest priority, and processor requests of the same level are treated equally; a processor request that has been reactivated after CPL processing (i.e., entering the pipeline a second time) has the highest priority among all processor requests, and such reactivated requests are likewise treated equally among themselves.
Read-write requests received by the local cache controller have higher priority than processor requests, and snoop register requests have the highest priority. When the address of a snoop request equals one of the addresses in the MSHR memory access registers, the request priority of the snoop register is set to the lowest, and the priority of the matching MSHR memory access register is raised to the highest, until no matching address remains in the MSHR memory access registers.
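The 3-level priority scheme above can be sketched as a lookup with one special case for address conflicts. The numeric levels and names are assumptions for illustration; only the ordering (snoop > bus read-write > processor, with a conflicting snoop demoted) comes from the text.

```python
# Illustrative sketch of the arbitration priority levels: snoop register
# requests highest, bus read-write transaction requests next, processor
# (memory access register) requests lowest -- except that a snoop whose
# address matches a pending MSHR access register is demoted below everything
# until the conflict clears.
SNOOP, BUS_TXN, PROCESSOR = 3, 2, 1

def request_priority(kind, snoop_addr=None, pending_addrs=()):
    if kind == "snoop" and snoop_addr in pending_addrs:
        return 0   # address conflict: the snoop waits behind the matching register
    return {"snoop": SNOOP, "bus": BUS_TXN, "processor": PROCESSOR}[kind]

assert request_priority("snoop", 0x40, pending_addrs=[0x40]) == 0   # conflicting snoop demoted
assert request_priority("snoop", 0x80, pending_addrs=[0x40]) == 3   # normal snoop: highest
assert request_priority("bus") == 2
assert request_priority("processor") == 1
```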
In summary, the embodiment of the invention provides a cache control method, which realizes non-blocking pipeline processing of access requests cooperatively through a bus interface module, a controller and a cache pipeline unit in a local cache controller; in addition, under the condition of cache miss, the embodiment of the invention adjusts the state of the access register into the dormant state, and sends the first request to the master node, the master node processes the requests from all nodes, and the cache consistency and the memory management are maintained, so that the cache consistency in the multi-core processor system is ensured.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Device embodiment
Referring to FIG. 5, there is shown a block diagram of a cache control apparatus of the present invention, applied to a local cache controller including a bus interface module, a control module, and a cache pipeline unit; the device may specifically include:
The request receiving module 501 is configured to record, when a memory request of the memory module is received, request information of the memory request into a memory register of the control module, wait for arbitration by the arbitration unit, and then send the request information into the cache pipeline unit;
a query module 502, configured to query, using the cache pipeline unit, whether the access request hits in a cache;
An executing module 503, configured to execute the access request if the access request hits in a cache;
an adjusting module 504, configured to adjust, when the access request misses in the cache, a state of the access register to a sleep state, and send a first request to a master node; the first request is used for obtaining cache data required by the access request.
Optionally, the memory access request includes a memory access read request; the adjustment module comprises:
the first adjusting sub-module is used for adjusting the state of the access register to a dormant state under the condition that the access read request does not hit the cache, setting a read transaction register in the control module and initiating a first read request to the master node; the first read request is used for requesting first target data required by the access read request;
The second adjusting sub-module is used for activating the memory access register after the first target data returned by the main node is read from the read transaction register, and setting a first flag bit of the read transaction register; the first flag bit is used for indicating to send a response message;
The first sending submodule is used for sending a first response message to the main node; the first response message is used for indicating that a response of the master node to the first read request is received.
Optionally, the execution module includes:
and the read request processing sub-module is used for writing the first target data corresponding to the first request address of the access read request back to the access register under the condition that the access read request hits in the cache.
Optionally, the memory access request includes a memory access write request; the execution module comprises:
The first inquiring submodule is used for inquiring a second cache line state corresponding to a second request address of the access and write request under the condition that the access and write request hits in the cache;
and the first write request processing submodule is used for writing the second target data carried by the access write request back to the cache if the second cache line state is a modified state or an exclusive state.
Optionally, the execution module further includes:
The second write request processing submodule is used for adjusting the state of the memory access register to a dormant state if the second cache line state is a shared state, setting a non-data transaction register, and sending a second request to the master node; the second request is used for adjusting the second cache line state to an exclusive state;
a second sending sub-module, configured to send a third response message to the master node when receiving a second response message of the master node for the second request; the third response message is used for indicating that the second response message is received.
Optionally, the adjusting module includes:
The third adjusting sub-module is used for adjusting the state of the access register to a dormant state under the condition that the access and write request does not hit the cache, writing the analysis result of the access and write request into a second flag bit of a read transaction register through the cache pipeline unit, setting the second flag bit and initiating a second read request to the master node; the second read request is used for reading cache data corresponding to the second request address, and setting the cache data corresponding to the second request address in other processor cores as invalid data;
The third write request processing sub-module is used for receiving the cache data returned by the main node and replacing the cache data by using second target data carried in the memory access write request;
The third sending submodule is used for setting a first flag bit of the read transaction register and sending a fourth response message to the master node; the fourth response message is used for indicating that the response of the master node to the second read request is received.
Optionally, the apparatus further comprises:
The judging module is used for judging whether an address item which is the same as a third request address of the third request exists in a memory access register of the control module under the condition that the third request sent by the master node is received;
The first setting module is used for setting the priority of the third request as the lowest priority if the address item which is the same as the third request address exists in the access register, and sending the third request to the cache pipeline unit after waiting for arbitration by the arbitration unit;
and the second setting module is used for setting the priority of the third request as the highest priority if the address item which is the same as the third request address does not exist in the access register.
In summary, the embodiment of the invention provides a cache control device, which cooperatively realizes non-blocking pipeline processing of access requests through a bus interface module, a controller and a cache pipeline unit in a local cache controller; in addition, under the condition of cache miss, the embodiment of the invention adjusts the state of the access register into a dormant state, and sends a first request to the master node, the master node processes the requests from all nodes, and the cache consistency and the memory management are maintained, so that the cache consistency in the multi-core processor system is ensured under the condition of no excessive hardware cost and bus cost.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
The specific manner in which the various modules perform the operations in relation to the processor of the above-described embodiments have been described in detail in relation to the embodiments of the method and will not be described in detail herein.
Referring to fig. 6, a block diagram of an electronic device for access according to an embodiment of the present invention is shown. As shown in fig. 6, the electronic device includes: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus; the memory is configured to store executable instructions that cause the processor to perform the cache control method of the foregoing embodiment.
The processor may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor may also be a combination that performs computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The communication bus may include a path for transferring information between the memory and the communication interface. The communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like, and may be classified into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one line is shown in fig. 6, but this does not mean that there is only one bus or only one type of bus.
The memory may be a ROM (Read-Only Memory) or another type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, or the like.
The embodiments of the present invention also provide a non-transitory computer-readable storage medium. When the instructions in the storage medium are executed by a processor of an electronic device (a server or a terminal), the processor is enabled to perform the cache control method shown in fig. 1.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems) and computer program products according to the embodiments of the invention. It will be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between these entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or terminal device that comprises the element.
The cache control method, device, electronic equipment and readable storage medium provided by the present invention have been described in detail above, and specific examples are applied herein to illustrate the principle and implementation of the present invention; the above description of the embodiments is only intended to help understand the method and core idea of the present invention. Meanwhile, those skilled in the art may, according to the idea of the present invention, make changes to the specific embodiments and the application scope. In summary, the content of this description should not be construed as limiting the present invention.

Claims (10)

1. A cache control method, characterized in that the method is applied to a local cache controller, the local cache controller comprising a bus interface module, a control module and a cache pipeline unit; the method comprises the following steps:
Under the condition that a memory access request of a memory access module is received, recording request information of the memory access request into a memory access register of the control module, waiting for arbitration by an arbitration unit and then feeding the request information into the cache pipeline unit;
querying whether the access request hits in a cache or not by using the cache pipeline unit;
executing the access request under the condition that the access request hits in a cache;
Under the condition that the access request does not hit in a cache, the state of the access register is adjusted to be a dormant state, and a first request is sent to a master node; the first request is used for obtaining cache data required by the access request.
2. The method of claim 1, wherein the memory access request comprises a memory access read request; and under the condition that the access request misses the cache, adjusting the state of the access register to be a dormant state, and sending a first request to a master node, wherein the method comprises the following steps:
Under the condition that the access read request does not hit in a cache, the state of the access register is adjusted to be dormant, a read transaction register in the control module is set, and a first read request is initiated to the master node; the first read request is used for requesting first target data required by the access read request;
after first target data returned by the master node is read from the read transaction register, activating the memory access register, and setting a first flag bit of the read transaction register; the first flag bit is used for indicating to send a response message;
Sending a first response message to the master node; the first response message is used for indicating that a response of the master node to the first read request is received.
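The read-miss handshake of claim 2 (sleep the access register, set the read transaction register, fetch the data, wake and set the first flag bit, then acknowledge the master node) can be sketched as follows. All names here (`ReadTransaction`, `master.read`, `master.ack`) are illustrative assumptions, not part of the claim:

```python
class ReadTransaction:
    """Sketch of the read-miss handshake (claim 2); names are illustrative."""

    def __init__(self, master):
        self.master = master
        self.flag_send_response = False   # the "first flag bit"
        self.data = None

    def read_miss(self, ctrl, addr):
        ctrl.access_state = "dormant"        # 1. sleep the access register
        self.data = self.master.read(addr)   # 2. first read request for the data
        ctrl.access_state = "active"         # 3. wake once the data has returned
        self.flag_send_response = True       # 4. set the first flag bit
        self.master.ack(addr)                # 5. first response message
        return self.data
```

The flag bit lets the response message be decoupled from the data return, so the pipeline need not stall while the acknowledgement is outstanding.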
3. The method of claim 2, wherein the executing the memory request if the memory request hits in a cache comprises:
and under the condition that the access read request hits in the cache, writing back first target data corresponding to a first request address of the access read request to the access register.
4. The method of claim 1, wherein the memory access request comprises a memory access write request; and executing the access request under the condition that the access request hits in a cache comprises:
under the condition that the access and write request hits in a cache, inquiring a second cache line state corresponding to a second request address of the access and write request;
And if the second cache line state is a modified state or an exclusive state, writing the second target data carried by the access and write request back to the cache.
5. The method according to claim 4, wherein the method further comprises:
If the second cache line state is the shared state, the state of the access register is adjusted to the dormant state, a non-data transaction register is set, and a second request is sent to the master node; the second request is used for adjusting the second cache line state to an exclusive state;
Transmitting a third response message to the master node under the condition that a second response message of the master node aiming at the second request is received; the third response message is used for indicating that the second response message is received.
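The write-hit handling of claims 4 and 5 follows a MESI-style state decision: modified or exclusive lines can be written directly, while a shared line first needs an upgrade request to the master node. A simplified sketch, assuming the state letters `'M'`/`'E'`/`'S'` and the helper `master.upgrade` (both illustrative):

```python
def write_hit(state, addr, data, cache, master):
    """Write-hit handling per claims 4-5 (MESI-style; a simplified sketch).

    state: 'M' (modified), 'E' (exclusive) or 'S' (shared).
    Returns the new cache-line state.
    """
    if state in ("M", "E"):
        cache[addr] = data        # line is owned: safe to write back directly
        return "M"
    if state == "S":
        # Shared: the "second request" asks the master node to upgrade the
        # line to exclusive (invalidating other sharers), then the write
        # completes locally.
        master.upgrade(addr)
        cache[addr] = data
        return "M"
    raise ValueError(f"unexpected cache line state {state!r}")
```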
6. The method of claim 4, wherein the adjusting the state of the access register to the dormant state and sending the first request to the master node if the access request misses the cache comprises:
Under the condition that the access and write request does not hit the cache, the state of the access register is adjusted to be dormant, the analysis result of the access and write request is written into a second flag bit of a read transaction register through the cache pipeline unit, the second flag bit is set, and a second read request is initiated to the master node; the second read request is used for reading cache data corresponding to the second request address, and setting the cache data corresponding to the second request address in other processor cores as invalid data;
Receiving the cache data returned by the master node, and replacing the cache data by using second target data carried in the access and write request;
Setting a first flag bit of the read transaction register and sending a fourth response message to the master node; the fourth response message is used for indicating that the response of the master node to the second read request is received.
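The write-miss flow of claim 6 combines a read with an invalidation of the other cores' copies before the new data replaces the fetched line. A sketch under stated assumptions; `read_invalidate`, `flag1`/`flag2` and the attribute names on `ctrl` are all illustrative:

```python
def write_miss(ctrl, master, addr, new_data, cache):
    """Write-miss flow per claim 6 (a sketch; names are assumptions)."""
    ctrl.access_state = "dormant"        # sleep while the line is fetched
    ctrl.flag2 = True                    # second flag bit: pending write recorded
    old = master.read_invalidate(addr)   # second read request; other cores'
                                         # copies of this line become invalid
    cache[addr] = new_data               # replace the fetched data with the
                                         # data carried by the write request
    ctrl.access_state = "active"         # wake once the data has returned
    ctrl.flag1 = True                    # first flag bit: send response
    master.ack(addr)                     # fourth response message
    return old
```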
7. The method according to claim 1, wherein the method further comprises:
Judging whether an address item which is the same as a third request address of the third request exists in a memory access register of the control module under the condition that the third request sent by the master node is received;
If the address item which is the same as the third request address exists in the access register, setting the priority of the third request as the lowest priority, waiting for arbitration by the arbitration unit and then feeding the third request into the cache pipeline unit;
and if the address item which is the same as the third request address does not exist in the access register, setting the priority of the third request as the highest priority.
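The priority rule of claim 7 is simple: a request from the master node that collides with an address already pending in the access register must wait behind the local request, while a non-colliding request can be serviced immediately. A sketch, assuming an illustrative numeric priority encoding (`0` = lowest, `7` = highest):

```python
def snoop_priority(pending_addrs, third_request_addr, lowest=0, highest=7):
    """Priority selection per claim 7 (a sketch; the encoding is assumed).

    pending_addrs: set of request addresses currently in the access register.
    Returns the arbitration priority assigned to the master node's request.
    """
    if third_request_addr in pending_addrs:
        return lowest    # same address pending: wait for arbitration last
    return highest       # no conflict: service the snoop immediately
```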
8. A cache control device, characterized in that the device is applied to a local cache controller, the local cache controller comprising a bus interface module, a control module and a cache pipeline unit; the device comprises:
The request receiving module is used for recording the request information of the access request into the access register of the control module under the condition that the access request of the access module is received, and sending the request information into the cache pipeline unit after waiting for arbitration by the arbitration unit;
The query module is used for querying whether the access request hits the cache or not by utilizing the cache pipeline unit;
The execution module is used for executing the access request under the condition that the access request hits in the cache;
The adjusting module is used for adjusting the state of the access register to a dormant state and sending a first request to a master node under the condition that the access request does not hit the cache; the first request is used for obtaining cache data required by the access request.
9. An electronic device, comprising a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with each other via the communication bus; the memory is configured to store executable instructions that cause the processor to perform the cache control method according to any one of claims 1 to 7.
10. A readable storage medium, wherein instructions in the readable storage medium, when executed by a processor of an electronic device, enable the processor to perform the cache control method of any one of claims 1 to 7.
Publications (1)

Publication Number Publication Date
CN118519924A true CN118519924A (en) 2024-08-20



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination