
CN113157602A - Method and device for distributing memory and computer readable storage medium - Google Patents


Info

Publication number
CN113157602A
CN113157602A
Authority
CN
China
Prior art keywords
memory
channel
allocation
memory allocation
channels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010014955.8A
Other languages
Chinese (zh)
Other versions
CN113157602B (en)
Inventor
Inventor not disclosed (不公告发明人)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd filed Critical Cambricon Technologies Corp Ltd
Priority to CN202010014955.8A priority Critical patent/CN113157602B/en
Priority to PCT/CN2021/070708 priority patent/WO2021139733A1/en
Publication of CN113157602A publication Critical patent/CN113157602A/en
Application granted granted Critical
Publication of CN113157602B publication Critical patent/CN113157602B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0646Configuration or reconfiguration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F12/0871Allocation or management of cache space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877Cache access modes
    • G06F12/0882Page mode

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System (AREA)

Abstract

The invention relates to a method, an apparatus, and a computer-readable storage medium for allocating memory. The apparatus may comprise a combined processing device, which may further comprise a universal interconnection interface and other processing devices. The master device of the apparatus interacts with the other processing devices to jointly complete specified computing operations. The combined processing device may also comprise a storage device, connected to the master device and the other processing devices respectively, for storing their data.

Description

Method and device for distributing memory and computer readable storage medium
Technical Field
The present disclosure relates generally to the field of computers. More particularly, the present disclosure relates to methods, apparatus, and computer-readable storage media for allocating memory.
Background
Double data rate synchronous dynamic random access memory (DDR SDRAM) is increasingly widely used in modern computers. Combined with multi-channel memory control technology, it can effectively increase total memory bandwidth and meet the data-transfer and processing demands of high-speed processors. However, some multi-channel DDR technologies cannot interleave memory allocation across channels; some cannot interleave at the memory-block level within a channel; and some run only two channels in parallel, limiting access bandwidth. How to allocate memory efficiently therefore remains an unsolved problem in the prior art.
Disclosure of Invention
To at least partially solve the technical problems mentioned in the background, aspects of the present disclosure provide a method, an apparatus, and a computer-readable storage medium for allocating memory.
In one aspect, the present disclosure provides a method for allocating memory, comprising: receiving a memory allocation application for a multi-channel DDR; performing inter-channel interleaved memory allocation on a plurality of channels of the multi-channel DDR; and performing intra-channel interleaved memory allocation on each channel to which memory is allocated.
In another aspect, the present disclosure provides an apparatus for performing data read and write operations, comprising: a transceiver configured to receive a memory allocation application from a master device forming a master-slave relationship with the apparatus; a multi-channel DDR configured to store data; and a processor configured to perform the following memory allocation operations on the multi-channel DDR according to the received memory allocation application: performing inter-channel interleaved memory allocation on a plurality of channels of the multi-channel DDR; and performing intra-channel interleaved memory allocation on each channel to which memory is allocated.
In another aspect, the present disclosure provides a computer readable storage medium having stored thereon computer program code for allocating memory, which when executed by a processor, may perform the aforementioned method.
By using the method, the apparatus, and the computer-readable storage medium of the present disclosure, memory can be allocated in an interleaved manner across multiple channels and multiple memory blocks, greatly improving access bandwidth.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar or corresponding parts and in which:
FIG. 1 is a schematic diagram illustrating a multi-channel interleaving scheme of the present disclosure, taking a two-channel example;
FIG. 2 is a schematic diagram illustrating 2 adjacent memory blocks within the same channel according to an embodiment of the disclosure;
FIG. 3 is a flow chart illustrating a method of allocating memory in accordance with an embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating multi-channel interleaving in accordance with an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating multiple memory block interleaving according to an embodiment of the present disclosure;
FIG. 6 is a flow diagram illustrating a method of allocating memory in accordance with another embodiment of the present disclosure;
FIG. 7 is a flow diagram illustrating a method of allocating memory in accordance with another embodiment of the present disclosure;
FIG. 8 is a schematic diagram illustrating multi-channel interleaving between adjacent two channels according to an embodiment of the present disclosure;
FIG. 9 is a flow diagram illustrating a method of allocating memory in accordance with another embodiment of the present disclosure;
FIG. 10 is a schematic diagram illustrating multiple memory block interleaving according to another embodiment of the present disclosure;
FIG. 11 is an architecture diagram illustrating an apparatus for memory allocation according to another embodiment of the present disclosure;
FIG. 12 is a block diagram illustrating an integrated circuit device according to an embodiment of the present disclosure; and
fig. 13 is a block diagram illustrating a board card according to an embodiment of the present disclosure.
Detailed Description
By using the method, the apparatus, and the computer-readable storage medium of the present disclosure, a multi-channel, multi-memory-block memory can be allocated in an interleaved manner both across channels and across memory blocks, further improving access bandwidth and increasing parallelism.
Memories are indispensable in present day computer devices, and with the development of technology, many memories can support multiple channels or multiple memory blocks (banks), such as DDR SDRAM, which is abbreviated as DDR in the industry.
DDR may support multi-channel access. By interleaving a single block of memory across multiple DDR channels, the bandwidth of several DDRs in parallel can be aggregated, improving memory access performance. If data is distributed over memory blocks on different channels, the memory controller can read it in parallel across the channels; with a four-channel DDR, access speed can increase by nearly a factor of four.
When storing data, DDR may distribute the same block of memory across different channels (interleaving), so that the block is accessed in parallel and system efficiency is improved.
Fig. 1 illustrates a multi-channel interleaving scheme of the present disclosure, taking two channels as an example. As shown, this embodiment has a first channel 102 and a second channel 104, each divided into pages of a fixed page size; for ease of illustration, only 32 pages 106 are shown per channel, each 16KB in size. A 16KB page is a sufficient inter-channel interleaving granularity and does not disturb normal jump instructions in the upper-layer software. When performing multi-channel interleaved allocation, memory space is allocated first to the first page 108 of the first channel 102, then to the first page 110 of the second channel 104, and then to the second page 112 of the first channel 102.
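The round-robin order just described can be sketched as a simple index mapping. This is an illustrative sketch, not the patent's implementation; the function name is hypothetical.

```python
PAGE_SIZE = 16 * 1024   # 16KB inter-channel interleaving granularity
NUM_CHANNELS = 2        # two-channel example of Fig. 1

def page_to_channel(linear_page):
    """Round-robin page placement: consecutive pages of an allocation
    alternate between the channels, as in the Fig. 1 scheme."""
    channel = linear_page % NUM_CHANNELS
    page_in_channel = linear_page // NUM_CHANNELS
    return channel, page_in_channel

# The first three pages follow the order described above:
assert page_to_channel(0) == (0, 0)  # first page of the first channel (108)
assert page_to_channel(1) == (1, 0)  # first page of the second channel (110)
assert page_to_channel(2) == (0, 1)  # second page of the first channel (112)
```

A byte offset can be turned into a linear page index with `offset // PAGE_SIZE` before applying the mapping.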
When the DDR of the present disclosure implements the multiple-memory-block interleaving scheme, memory blocks are accessed in an interleaved manner within the same channel; when upper-layer services contain instruction jumps, each computing unit accesses only memory blocks in its own channel. FIG. 2 shows 2 adjacent memory blocks within the same channel: the first memory block 202 and the second memory block 204, between which access is interleaved in this disclosure. Each memory block is shown with 8 address spaces, addresses addr0 through addr7. In the embodiments of the present disclosure, the address space is smaller than the page size; for example, with a 16KB page, the address space may be 1KB, i.e., when two memory blocks are interleaved within the same channel, the memory unit accessed each time is one address space. For example: at 0x000, address addr0 of the first memory block 202 is allocated; at 0x400, address addr0 of the second memory block 204; at 0x800, address addr1 of the first memory block 202; at 0xc00, address addr1 of the second memory block 204. Allocation continues in this way until all pages are allocated.
When multiple computing units access memory and the upper-layer service contains instruction jumps, interleaved memory-block access lets each computing unit access only memory blocks within its own channel, which reduces accesses to remote channels as well as access conflicts between the channels used by different computing units.
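The intra-channel example offsets above (0x000, 0x400, 0x800, 0xc00) can be checked with a small mapping, assuming the 1KB address-space granularity mentioned for FIG. 2; the function is illustrative only.

```python
ADDR_SPACE = 1024   # 1KB intra-channel interleaving unit (assumed from the text)
NUM_BLOCKS = 2      # two adjacent memory blocks, 202 and 204

def offset_to_block(offset):
    """Map a byte offset within the channel to (memory block, address index),
    alternating between the two blocks every address space."""
    unit = offset // ADDR_SPACE
    return unit % NUM_BLOCKS, unit // NUM_BLOCKS

assert offset_to_block(0x000) == (0, 0)  # addr0 of the first block (202)
assert offset_to_block(0x400) == (1, 0)  # addr0 of the second block (204)
assert offset_to_block(0x800) == (0, 1)  # addr1 of the first block
assert offset_to_block(0xc00) == (1, 1)  # addr1 of the second block
```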
One embodiment of the present disclosure is a method for allocating memory, in particular when a host side issues a command to an accelerator card device; the command may be a dynamic memory allocation (malloc) function.
The memory of this embodiment has 4 channels, and the allocation is performed by interleaving the 4 channels, and the flow of the allocation method is shown in fig. 3.
In step 302, a memory allocation application for a 4-channel DDR is received. More specifically, a command of the dynamic memory allocation function is received; the command may dynamically allocate part of the memory in the accelerator card device, and the device side applies for physical memory after aligning the application to a specified size.
In step 304, inter-channel interleaved memory allocation is performed on the 4-channel DDR. In more detail, the device side allocates memory space evenly to the 4 channels on the basis of the page size. Fig. 4 is a schematic diagram of this step. As shown, the 4-channel DDR of this embodiment includes a first channel 402, a second channel 404, a third channel 406, and a fourth channel 408, each comprising a plurality of address spaces; for ease of illustration, only 32 address spaces are shown per channel, addresses addr00 through addr31 (not shown), each 8KB in size. With a 16KB page size, every 2 address spaces form one page.
When performing inter-channel interleaved memory allocation, the device allocates memory to each channel cyclically at an inter-channel granularity of one page. For example: at 0x000, the first page 410 of the first channel 402 is allocated, covering addr00 and addr01; at 0x400, the first page 412 of the second channel 404, covering addr00 and addr01; at 0x800, the first page 414 of the third channel 406; at 0xc00, the first page 416 of the fourth channel 408; at 0x1000, the second page 418 of the first channel 402, covering addr02 and addr03. Allocation continues in this manner.
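The four-channel cycle can be sketched the same way as the two-channel case, just with a channel count of 4; a minimal sketch, with an illustrative function name.

```python
PAGE = 16 * 1024   # inter-channel granularity; each page spans two 8KB address spaces
CHANNELS = 4

def allocate_pages(n_pages):
    """Return the (channel, page_in_channel) placement for n_pages
    consecutive pages, cycling over the four channels."""
    return [(i % CHANNELS, i // CHANNELS) for i in range(n_pages)]

order = allocate_pages(5)
# the first four pages land on channels 402, 404, 406, 408, then wrap around
assert [c for c, _ in order] == [0, 1, 2, 3, 0]
assert order[4] == (0, 1)   # second page of the first channel (418)
```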
It should be noted that, with a 16KB page size, if only 14KB of space is required, a full page is still allocated for alignment. If 20KB is required, 2 pages are allocated.
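The page-rounding rule above is ordinary round-up alignment; sketched here for concreteness, with an illustrative helper name.

```python
PAGE = 16 * 1024

def pages_needed(nbytes):
    """Round a request up to whole pages, as required for alignment."""
    return (nbytes + PAGE - 1) // PAGE

assert pages_needed(14 * 1024) == 1   # 14KB still occupies one full 16KB page
assert pages_needed(20 * 1024) == 2   # 20KB occupies two pages
```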
In step 306, intra-channel interleaved memory allocation is performed on each channel to which memory has been allocated. In this embodiment, interleaved allocation, e.g., in units of one address space, may be performed cyclically over two contiguous memory blocks at an intra-channel granularity smaller than the page size. FIG. 5 illustrates the first channel 402 of FIG. 4, which may be divided into 4 memory blocks. Assuming the first 4 pages of the first memory block 502 were already occupied in step 304 (the grey block), this step interleaves allocation between the first memory block 502 and the second memory block 504: at 0x000, address space 510 of the first memory block 502 is allocated; at 0x400, address space 512 of the second memory block 504; at 0x800, address space 514 of the first memory block 502; at 0xc00, address space 516 of the second memory block 504, as indicated by the sequence of dashed arrows. Allocation repeats in this way until all the requested memory is allocated.
In another embodiment of the present disclosure, the memory also has 4 channels, the allocation is performed by interleaving the 4 channels, and the flow of the allocation method is shown in fig. 6.
In step 602, a memory allocation application for a 4-channel DDR is received.
In step 604, it is determined whether the number of computing units involved in the command of the dynamic memory allocation function is 1.
According to the memory allocation application, if the number of computing units involved in the command of the dynamic memory allocation function is 1, multi-channel interleaved allocation is more efficient, so step 606 is executed: inter-channel interleaved memory allocation is performed on the 4-channel DDR, in the same manner as step 304 of the previous embodiment, which is not repeated here.
If step 604 determines that the number of computing units involved exceeds 1, combining multi-channel and multi-memory-block interleaving is more efficient, so step 608 performs both inter-channel and intra-channel interleaved memory allocation, in the same manner as the embodiment of fig. 3, which is not repeated here.
In another embodiment, step 608 may perform only interleaved memory-block access. For example, with 4 computing units and 4 channels, the present disclosure may allocate each channel to one computing unit, with each computing unit accessing only memory blocks in its own channel; in this case, accesses to remote channels and access conflicts between the channels of different computing units are both reduced.
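The branch in steps 604-608 amounts to a small policy decision. This sketch only mirrors the decision described above; the function and the strategy labels are hypothetical names, not the patent's API.

```python
def pick_strategy(num_compute_units):
    """Hypothetical policy mirroring steps 604-608: a single computing
    unit gets pure multi-channel interleaving; several units combine
    channel-level and memory-block-level interleaving."""
    if num_compute_units == 1:
        return "inter-channel"
    return "inter-channel + intra-channel"

assert pick_strategy(1) == "inter-channel"
assert pick_strategy(4) == "inter-channel + intra-channel"
```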
The embodiments of the present disclosure can perform interleaved access between channels and between memory blocks: when only one computing unit performs accesses, it can enjoy the interleaved bandwidth of multiple channels; when multiple computing units perform accesses, each can access the memory blocks of a specific channel and/or enjoy multi-channel interleaving, which not only improves access bandwidth but also increases parallelism.
The DDR channel was tested by issuing cluster accesses from a single image computing unit (IPU), each cluster having 64 fields of 16KB each. Without any interleaving, the allocation took 55876 microseconds, for a bandwidth of 18.76 GB/s; with multi-channel interleaving only, it took 71080 microseconds, for 14.756 GB/s; with the multi-channel, multi-memory-block interleaving of the present disclosure, it took 44057 microseconds, for 23.8 GB/s, a significant speedup.
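As a sanity check on the three reported measurements, bandwidth times elapsed time should give the same transferred data volume in each configuration; the check below confirms they agree to within about 1% (roughly 1.05 GB in each case).

```python
# (elapsed seconds, reported bandwidth in bytes/s) for the three configurations
cases = [
    (55876e-6, 18.76e9),    # no interleaving
    (71080e-6, 14.756e9),   # multi-channel interleaving only
    (44057e-6, 23.8e9),     # multi-channel + multi-memory-block interleaving
]
volumes = [t * bw for t, bw in cases]   # implied data volume per run
for v in volumes:
    # all three runs moved the same data, so the volumes should match closely
    assert abs(v - volumes[0]) / volumes[0] < 0.01
```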
Another embodiment of the present disclosure is a method for allocating memory, where the memory of this embodiment has 4 channels, and the allocation is performed between two channels, and the flow of the allocation method is shown in fig. 7.
In step 702, a plurality of service commands to be allocated to the 4-channel DDR are received, where the service commands include service data and are to be stored in the accelerator card device. In this embodiment, the plurality of service commands includes first service data, second service data, third service data and fourth service data, and each service data has a size of 30 KB.
In execution of step 704, a memory allocation request for a 4-channel DDR is received. More specifically, in response to the service command, a command of the dynamic memory allocation function is received, where the command requests to store the 4 service data into the memory of the accelerator card device.
In step 706, the plurality of service commands are allocated to the 4 channels one by one at inter-channel granularity. The inter-channel granularity (i.e., the page size) of the DDR in this embodiment is 16KB, so each piece of service data requires 2 pages; each piece of service data is therefore divided into first sub-data (16KB) and second sub-data (14KB), and allocated to the 4 channels as shown in the following table.
                 First page                            Second page
First channel    first sub-data of first service data  first sub-data of second service data
Second channel   second sub-data of first service data second sub-data of second service data
Third channel    first sub-data of third service data  first sub-data of fourth service data
Fourth channel   second sub-data of third service data second sub-data of fourth service data
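The 30KB-to-(16KB, 14KB) split described above is just a page-granularity chop of each piece of service data; a minimal sketch, assuming the 16KB page size stated earlier.

```python
PAGE = 16 * 1024

def split_into_subdata(nbytes):
    """Split one piece of service data into page-granularity sub-data:
    full 16KB chunks, with any remainder as the final chunk."""
    chunks = []
    while nbytes > 0:
        chunks.append(min(PAGE, nbytes))
        nbytes -= chunks[-1]
    return chunks

# 30KB of service data becomes first sub-data (16KB) and second sub-data (14KB)
assert split_into_subdata(30 * 1024) == [16 * 1024, 14 * 1024]
```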
In performing step 708, inter-channel interleaved memory allocation is performed on 2 channels of the 4-channel DDR. In more detail, this embodiment performs interleaving between channels in units of page size when interleaving is performed, and allocation only occurs between two adjacent channels. As shown in fig. 8, the DDR of the present embodiment includes: a first channel 802, a second channel 804, a third channel 806, and a fourth channel 808, each channel comprising a plurality of page sizes.
When performing inter-channel interleaved memory allocation, memory is applied for cyclically over each pair of channels at the page-size granularity supported by the current device; that is, the first channel 802 and the second channel 804 are interleaved with each other, and the third channel 806 and the fourth channel 808 are interleaved with each other.
As shown in the above table, the first service data and the second service data are distributed interleaved between the first channel 802 and the second channel 804, and the third service data and the fourth service data between the third channel 806 and the fourth channel 808. In more detail: the first sub-data of the first service data is allocated to the first page 810 of the first channel 802; the second sub-data of the first service data to the first page 812 of the second channel 804; the first sub-data of the second service data to the second page 814 of the first channel 802; the second sub-data of the second service data to the second page 816 of the second channel 804; the first sub-data of the third service data to the first page 818 of the third channel 806; the second sub-data of the third service data to the first page 820 of the fourth channel 808; the first sub-data of the fourth service data to the second page 822 of the third channel 806; and the second sub-data of the fourth service data to the second page 824 of the fourth channel 808.
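The pairwise placement just listed can be expressed as a small index mapping; the function and its zero-based indexing are illustrative, not part of the patent.

```python
def place_subdata(data_idx, sub_idx):
    """Pairwise interleave: service data 0,1 go to channel pair (0,1),
    service data 2,3 to channel pair (2,3). Within a pair, sub-data 0
    lands on the even channel and sub-data 1 on the odd one."""
    pair = 0 if data_idx < 2 else 1   # which channel pair serves this data
    channel = 2 * pair + sub_idx      # even or odd channel of that pair
    page = data_idx % 2               # first or second page of the channel
    return channel, page

assert place_subdata(0, 0) == (0, 0)  # first sub of first data: page 810
assert place_subdata(0, 1) == (1, 0)  # second sub of first data: page 812
assert place_subdata(3, 1) == (3, 1)  # second sub of fourth data: page 824
```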
In step 710, memory allocation for intra-channel interleaving is performed on a per-channel basis of the allocated memory. For example, in the case of 4 calculation units, each channel may be allocated to 1 calculation unit.
In step 712, within each channel, the allocated service commands are allocated to the memory blocks one by one at intra-channel granularity. In this embodiment, if other service data remains unallocated, for example fifth service data, sixth service data, and so on, memory-block interleaved allocation may be performed in this step in the manner shown in fig. 5, which is not repeated here.
In other embodiments, different memory interleaving allocation manners may be adopted for different numbers of computing units. For example, after step 704, a step may be added to determine whether the number of computing units involved in the command of the dynamic memory allocation function is 1. If it is 1, multi-channel interleaved allocation is more efficient, and pairwise inter-channel interleaved memory allocation is performed on the 4-channel DDR. If it exceeds 1, interleaved allocation over multiple channels and multiple memory blocks is more efficient, so both pairwise inter-channel and intra-channel interleaved memory allocation are performed. The operation is similar to the embodiments of fig. 3 and 6, and those skilled in the art can readily implement it based on those descriptions, so the details are omitted.
Another embodiment of the present disclosure is a method for allocating memory of 4 channels, wherein the allocation is performed among the 4 channels, and a flow of the allocation method is shown in fig. 9.
In execution of step 902, a memory allocation application for a 4-channel DDR is received.
According to the memory allocation application, in step 904, inter-channel interleaved memory allocation is performed on the 4 channels. As in the previous embodiment, the inter-channel interleaved memory allocation is performed on each channel cyclically at the inter-channel granularity of the page size. Taking FIG. 10 as an example, two adjacent memory blocks of a single channel 1002 are shown: a first memory block 1004 and a second memory block 1006. In step 904, it is assumed that part of the first memory block 1004 has already been allocated, i.e., the grey occupied space 1008.
According to the memory allocation application, in step 906, intra-channel interleaved memory allocation is performed on each channel to which memory has been allocated. More specifically, memory is allocated one unit of intra-channel granularity at a time from the unallocated space of a channel, and unused memory blocks are preferred for interleaved allocation. In this embodiment, the second memory block 1006 is the primary memory area; since the first memory block 1004 already participated in the allocation of step 904, it serves only as a spare memory area.
If the primary memory area (i.e., the second memory block 1006) is insufficient, then in step 908 the spare memory area (i.e., the first memory block 1004) is additionally used for memory allocation according to the memory allocation application, i.e., allocation proceeds within the unoccupied space of the first memory block 1004.
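The primary/spare fallback of steps 906-908 can be sketched as a two-block allocator. The class and function names, and the unit-free capacities, are illustrative assumptions.

```python
class Block:
    """A memory block tracked by capacity and used space (units arbitrary)."""
    def __init__(self, capacity, used=0):
        self.capacity, self.used = capacity, used
    def free(self):
        return self.capacity - self.used

def allocate(primary, spare, amount):
    """Fill the unused primary block first; overflow goes into the spare
    block (which may already be partly occupied), as in steps 906-908."""
    take = min(amount, primary.free())
    primary.used += take
    rest = amount - take
    if rest > spare.free():
        raise MemoryError("not enough space in primary + spare")
    spare.used += rest
    return take, rest

primary = Block(capacity=8)          # second memory block 1006, initially empty
spare = Block(capacity=8, used=3)    # first memory block 1004, partly occupied
assert allocate(primary, spare, 10) == (8, 2)  # overflow of 2 lands in the spare
```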
This embodiment can perform interleaved access both across channels and across memory blocks, enjoying the interleaved bandwidth of multiple channels while making better use of the space in each memory block.
Another embodiment of the present disclosure is a computer-readable storage medium, on which a computer program code for allocating memory is stored, when the computer program code is executed by a processor, the method of the foregoing embodiments may be performed, for example, the technical solutions shown in fig. 3, fig. 6, fig. 7, and fig. 9.
Fig. 11 is a system 1100 for memory allocation according to another embodiment of the disclosure, the system 1100 including a host device 1102 and a device 1104, the host device 1102 being a host. Device 1104 may be an accelerator card that includes multiple computing units 1106, a transceiver 1108, a processor 1110, a buffer 1112, and multiple channel DDRs 1114, where 4 computing units 1106 are illustrated as an example and not limiting to only 4 computing units 1106, and similarly 4 DDRs 1114 are illustrated as an example and not limiting to only 4 DDRs 1114. The transceiver 1108 is configured to receive a memory allocation request from the master device 1102 in a master-slave relationship with the device 1104; the processor 1110 is configured to perform the following memory allocation operations on the multi-channel DDR 1114 according to the received memory allocation application: performing inter-channel interleaved Memory allocation on multiple channels of multi-channel DDR 1114 and intra-channel interleaved Memory allocation on each channel to which Memory is allocated, wherein processor 1110 includes a System Memory Management Unit (SMMU); the buffer 1112 may be a Last Level Cache (LLC) configured to implement memory allocation for interleaving in the channel; the multi-channel DDR 1114 is configured to store data.
The processor 1110 receives commands from the dynamic memory allocation function from the master device 1102, and when applying for memory space, targets a block of memory addresses in the device 1104. After receiving the memory application, the transceiver 1108 converts the memory application into a physical address through the processor 1110, obtains a real memory address on the DDR, and performs interleaving and allocation to the multi-channel DDR 1114 according to the memory address.
More specifically, this embodiment implements memory allocation through the processor 1110 and the buffer 1112. Through a driver, the processor 1110 implements interleaved allocation of the requested memory across the multiple channels of the DDR 1114 at page-size granularity, and implements interleaved allocation over multiple memory blocks within a channel through the buffer 1112. Furthermore, during a memory application, allocating the memory address requires the memory management module in the driver and the system memory management unit to cooperate: the memory management module is responsible for aligning the size of the application and allocates physical memory to the channels of the DDR 1114 according to the channel information of the current application, such as single channel or multiple channels, while the system memory management unit is responsible for managing virtual addresses and, by laying out a page table, for the mapping between the applied physical addresses and virtual addresses.
Each multi-channel DDR 1114 includes a DDR controller (not shown), coupled to buffer 1112. After sending the physical address to the buffer 1112, the processor 1110 applies for a corresponding virtual address according to the management of the virtual address, and implements mapping from the physical address to the virtual address by laying a page table, and then sends the virtual address related to the memory allocation application to the host 1102 through the transceiver 1108.
After receiving the virtual address, the host device 1102 can access the memory: it sends the data to the transceiver 1108 according to the returned virtual address, and the processor 1110 converts the virtual address into a physical address and obtains the real memory address on the multi-channel DDR 1114 through the buffer 1112, so as to perform inter-channel and/or inter-memory-block interleaved allocation.
If data needs to be written from the device 1104 to the host device 1102, the processor 1110 is configured to apply for memory from the host device 1102 through the transceiver 1108, and after receiving a virtual address from the host device 1102, the processor 1110 converts the virtual address into a physical address, extracts data stored in the multi-channel DDR 1114, and sends the data to the host device 1102 through the transceiver 1108.
In the case where the host device 1102 writes data to the device 1104, taking interleaved allocation among the channels of a 4-channel DDR 1114 as an example, the transceiver 1108 receives a memory allocation application for the 4-channel DDR, and the processor 1110 selects, based on the state of the computing units 1106, inter-channel interleaved memory allocation on the 4-channel DDR when multi-channel interleaved allocation is more efficient, for example, when only one computing unit 1106 is in use. The processor 1110 converts the virtual address into a physical address and obtains the real memory address on the multi-channel DDR 1114 from the buffer 1112 to perform the multi-channel interleaved allocation. The processor 1110 may then continue to perform intra-channel interleaved memory allocation, using multi-memory-block interleaved allocation, on each channel to which memory has been allocated.
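The two-level scheme of this example — pages round-robined across the channels, then each page's sub-page chunks round-robined across a channel's memory blocks — can be sketched as follows. The granularities, defaults, and the function name are assumptions made for illustration only:

```python
def two_level_interleave(size, num_channels, blocks_per_channel,
                         page_size=4096, intra_granularity=256):
    """Return the (channel, block) placement of each intra-channel chunk.

    Level 1: pages are assigned to channels in round-robin order.
    Level 2: within a page's channel, sub-page chunks are assigned to
    that channel's memory blocks in round-robin order.
    """
    placements = []
    num_pages = -(-size // page_size)  # ceil division: pages after alignment
    chunks_per_page = page_size // intra_granularity
    for page in range(num_pages):
        channel = page % num_channels           # inter-channel interleaving
        for chunk in range(chunks_per_page):
            block = chunk % blocks_per_channel  # intra-channel interleaving
            placements.append((channel, block))
    return placements
```

For an 8 KB application over 2 channels with 2 memory blocks each and a 2 KB intra-channel granularity, the chunks land on the (channel, block) pairs (0, 0), (0, 1), (1, 0), (1, 1), so both levels of interleaving are exercised by a single application.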
The architecture of this embodiment can implement the technical solutions shown in fig. 3, fig. 6, fig. 7 and fig. 9; those skilled in the art can readily understand the technical details without creative effort, so a detailed description is omitted here.
This embodiment enables interleaved access both between channels and between memory blocks: when a single computing unit performs accesses, it can enjoy the interleaved bandwidth of multiple channels; when multiple computing units perform accesses, each computing unit can concurrently access the memory of a specific channel, thereby reducing accesses to remote channels and access conflicts between computing units.
Fig. 12 is a block diagram illustrating an integrated circuit device 1200 according to an embodiment of the disclosure. As shown, the integrated circuit apparatus 1200 includes a host device 1202, and the host device 1202 may be the host device 1102 of fig. 11. Additionally, integrated circuit device 1200 also includes a general interconnect interface 1204 and a device 1206, and device 1206 may be device 1104 of fig. 11.
In this embodiment, the host device 1202 may be one or more types of general-purpose and/or special-purpose processors, such as a central processing unit, a graphics processing unit or an artificial intelligence processing unit; their number is not limited and is determined by actual needs.
According to the technical solution of this embodiment, the universal interconnect interface 1204 may be used for transmitting data and control instructions between the host device 1202 and the device 1206. For example, the host device 1202 may retrieve the required input data from the device 1206 via the universal interconnect interface 1204 and write it to a storage device on the host device 1202 chip. Further, the host device 1202 may obtain control instructions from the device 1206 via the universal interconnect interface 1204 and write them to a control cache on the host device 1202 chip. Alternatively, the universal interconnect interface 1204 may also read data from a storage module of the host device 1202 and transmit it to the device 1206.
Optionally, the integrated circuit device 1200 may also include a storage device 1208, which may be coupled to the host device 1202 and the device 1206, respectively. In one or more embodiments, the storage device 1208 may be used to store data of the host device 1202 and the device 1206, and is particularly suited to data to be operated on that cannot be stored entirely in the internal storage of the host device 1202 or the device 1206.
According to different application scenarios, the integrated circuit device 1200 of the present disclosure can be used as an SoC (system-on-chip) of a mobile phone, a robot, an unmanned aerial vehicle, a video capture device, and the like, thereby effectively reducing the core area of the control portion, increasing the processing speed, and reducing the overall power consumption. In this case, the universal interconnect interface 1204 of the integrated circuit device 1200 is connected to certain components of the apparatus. The components referred to here may be, for example, a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface.
In some embodiments, the present disclosure also discloses a chip or integrated circuit chip comprising integrated circuit device 1200. In other embodiments, the present disclosure also discloses a chip packaging structure, which includes the above chip.
In some embodiments, the present disclosure also discloses a board card comprising the above chip package structure. Referring to fig. 13, which provides the aforementioned exemplary board card 1300, the board card 1300 may include, in addition to the chip 1302, other supporting components, including but not limited to: a memory device 1304, an interface device 1306 and a control device 1308.
The memory device 1304 is coupled to the chip 1302 within the chip package structure via a bus 1314 and is used for storing data. The memory device 1304 may include multiple sets of memory 1310. Each set of memory 1310 is coupled to the chip 1302 through the bus 1314. Each set of memory 1310 may be a DDR SDRAM ("Double Data Rate SDRAM").
Unlike that shown in FIG. 13, in one embodiment, the memory device 1304 may include 4 sets of memory 1310. Each set of memory 1310 may include a plurality of DDR4 particles (chips). In one embodiment, the chip 1302 may include four 72-bit DDR4 controllers, of which 64 bits are used for data transmission and 8 bits for ECC checking, and the DDR4 controller may be the processor 1110.
In one embodiment, each set of memory 1310 may include a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice per clock cycle. A controller for controlling the DDR is provided in the chip 1302 to control the data transfer and data storage of each memory 1310. Interleaved allocation between the chip 1302 and the memory 1310 may be performed in the same manner as in the previous embodiments.
The interface device 1306 is electrically connected to the chip 1302 within the chip package structure and is used to realize data transmission between the chip 1302 and an external device 1312 (e.g., a server or a computer). In one embodiment, the interface device 1306 may be a standard PCIe interface; for example, the data to be processed is transmitted from the server to the chip 1302 through the standard PCIe interface, thereby implementing the data transfer. In another embodiment, the interface device 1306 may also be another interface; the present disclosure does not limit the specific form of such other interfaces, as long as the transfer function can be realized. In addition, the computation results of the chip 1302 are transmitted back to the external device 1312 by the interface device 1306.
The control device 1308 is electrically connected to the chip 1302 to monitor the state of the chip 1302. Specifically, the chip 1302 and the control device 1308 may be electrically connected through an SPI interface. The control device 1308 may include a single-chip microcomputer ("MCU"). The chip 1302 may include multiple processing chips, multiple processing cores, or multiple processing circuits, and may drive multiple loads; thus, the chip 1302 can be in different operating states such as heavy load and light load. The control device 1308 may be utilized to regulate the operating states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits within the chip 1302.
In some embodiments, the present disclosure also discloses an electronic device or apparatus including the above board card 1300. According to different application scenarios, the electronic device or apparatus may include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device. The vehicles include an airplane, a ship and/or a car; the household appliances include a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical devices include a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus and/or an electrocardiograph.
The foregoing may be better understood in light of the following clauses:
Clause A1, a method for allocating memory, comprising: receiving a memory allocation application for a multi-channel DDR; and performing the following: performing inter-channel interleaved memory allocation on a plurality of channels of the multi-channel DDR; and performing intra-channel interleaved memory allocation on each channel to which memory is allocated.
Clause a2, the method of clause a1, wherein performing the inter-channel interleaved memory allocation comprises performing the inter-channel interleaved memory allocation on every two channels of the multi-channel DDR.
Clause A3, the method of clause a1, wherein performing the inter-channel interleaved memory allocation comprises performing the inter-channel interleaved memory allocation for the plurality of channels based on page size.
Clause a4, the method of clause A3, wherein performing inter-channel interleaved memory allocation for a plurality of channels based on page size comprises performing inter-channel interleaved memory allocation on each channel cyclically at an inter-channel granularity of page size.
Clause A5, the method of clause A4, wherein cyclically performing inter-channel interleaved memory allocation on each channel at the inter-channel granularity comprises: sequentially allocating memory of the inter-channel granularity on the plurality of channels one by one; and repeatedly performing the sequential allocation until the applied-for memory is fully allocated.
Clause A6, the method of clause A3, wherein performing intra-channel interleaved memory allocation comprises interleaving the memory allocated on each channel based on the page size onto a plurality of memory blocks within the channel.
Clause A7, the method of clause A6, wherein interleaving the memory onto the plurality of memory blocks within a channel comprises performing the interleaved memory allocation on the respective memory blocks cyclically at an intra-channel granularity that is less than the page size.
Clause A8, the method of clause A7, wherein cyclically performing the interleaved memory allocation on the respective memory blocks comprises: sequentially allocating memory of the intra-channel granularity on the plurality of memory blocks within the channel one by one; and repeatedly performing the sequential allocation until the memory obtained by inter-channel interleaving is fully allocated.
Clause a9, the method of clause a6, wherein the channel further comprises one or more memory blocks participating in memory allocation as spare memory areas, the method further comprising additionally using the spare memory areas for memory allocation according to the memory allocation application.
Clause A10, the method of any one of clauses A1-A9, further comprising: receiving a plurality of service instructions to be distributed over the plurality of channels of the multi-channel DDR; distributing the plurality of service instructions to the plurality of channels one by one at the inter-channel granularity; and, within each channel, allocating the distributed service instructions to the plurality of memory blocks one by one at the intra-channel granularity.
Clause A11, the method of clause A10, wherein the service instructions include service data, the method further comprising interleaving the service data in the service instructions onto the plurality of memory blocks one by one at the intra-channel granularity.
Clause A12, an apparatus for performing data read and write operations, comprising: a transceiver configured to receive a memory allocation application from a master device forming a master-slave relationship with the apparatus; a multi-channel DDR configured to store data; and a processor configured to perform the following memory allocation operations on the multi-channel DDR according to the received memory allocation application: performing inter-channel interleaved memory allocation on a plurality of channels of the multi-channel DDR; and performing intra-channel interleaved memory allocation on each channel to which memory is allocated.
Clause a13, the apparatus of clause a12, wherein in a memory allocation operation, the processor is further configured to implement the inter-channel interleaved memory allocation using a driver, and in a memory allocation operation, the processor is further configured to implement the intra-channel interleaved memory allocation using a buffer.
Clause a14, the device of any of clauses a12-13, wherein upon completion of the memory allocation, the processor is configured to send a virtual address associated with the memory allocation application to the master device via the transceiver, the transceiver is configured to receive data sent by the master device to the device based on the virtual address, and the processor is configured to store the data on a multi-channel DDR to which corresponding memory is allocated.
Clause a15, the device of clause a13, wherein during the writing of data to the master device, the processor is configured to apply for memory to the master device through the transceiver, and upon receiving a virtual address from the master device, send data stored in the DDR memory to the master device through the transceiver.
Clause a16, a computer-readable storage medium having stored thereon computer program code for allocating memory, the computer program code, when executed by a processor, performing the method of any of clauses a 1-11.
The foregoing detailed description of the embodiments of the present disclosure has been presented for purposes of illustration and description only; it is intended to be exemplary and is not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Meanwhile, those skilled in the art may, based on the ideas of the present disclosure, make changes or modifications to the specific embodiments and application scope. In view of the above, this description should not be construed as limiting the present disclosure.

Claims (16)

1. A method for allocating memory, comprising:
receiving a memory allocation application for multi-channel DDR;
executing:
performing inter-channel interleaved memory allocation on a plurality of channels of the multi-channel DDR; and
performing intra-channel interleaved memory allocation on each channel to which memory is allocated.
2. The method of claim 1, wherein performing the inter-channel interleaved memory allocation comprises performing the inter-channel interleaved memory allocation on every two channels of the multi-channel DDR.
3. The method of claim 1, wherein performing the inter-channel interleaved memory allocation comprises performing the inter-channel interleaved memory allocation for the plurality of channels based on page size.
4. The method of claim 3, wherein performing inter-channel interleaved memory allocation for a plurality of channels based on page size comprises performing inter-channel interleaved memory allocation on each channel cyclically at an inter-channel granularity of page size.
5. The method of claim 4, wherein performing inter-channel interleaved memory allocation on each channel cyclically at the inter-channel granularity comprises:
sequentially allocating memory of the inter-channel granularity on the plurality of channels one by one; and
repeatedly performing the sequential allocation until the applied-for memory is fully allocated.
6. The method of claim 3, wherein performing intra-channel interleaved memory allocation comprises interleaving the memory allocated on each channel based on the page size onto a plurality of memory blocks within the channel.
7. The method of claim 6, wherein interleaving the memory onto the plurality of memory blocks within a channel comprises performing the interleaved memory allocation on the respective memory blocks cyclically at an intra-channel granularity that is less than the page size.
8. The method of claim 7, wherein cyclically performing the interleaved memory allocation on the respective memory blocks comprises:
sequentially allocating memory of the intra-channel granularity on the plurality of memory blocks within the channel one by one; and
repeatedly performing the sequential allocation until the memory obtained by inter-channel interleaving is fully allocated.
9. The method of claim 6, wherein the channel further comprises one or more memory blocks participating in memory allocation as a spare memory region, the method further comprising additionally using the spare memory region for memory allocation according to the memory allocation application.
10. The method of any of claims 1-9, further comprising:
receiving a plurality of service instructions to be distributed on a plurality of channels of a multi-channel DDR;
distributing the plurality of service instructions to a plurality of channels one by one with the inter-channel granularity; and
within each channel, allocating the distributed service instructions to the plurality of memory blocks one by one at the intra-channel granularity.
11. The method of claim 10, wherein the service instructions include service data, the method further comprising interleaving the service data in the service instructions onto the plurality of memory blocks one by one at the intra-channel granularity.
12. An apparatus for performing data read and write operations, comprising:
a transceiver configured to receive a memory allocation application from a master device forming a master-slave relationship with the device;
a multi-channel DDR configured to store data;
a processor configured to perform the following memory allocation operations on the multi-channel DDR according to the received memory allocation application:
performing inter-channel interleaved memory allocation on a plurality of channels of the multi-channel DDR; and
performing intra-channel interleaved memory allocation on each channel to which memory is allocated.
13. The apparatus of claim 12, wherein in a memory allocation operation, the processor is further configured to implement the inter-channel interleaved memory allocation using a driver, and in a memory allocation operation, the processor is further configured to implement the intra-channel interleaved memory allocation using a buffer.
14. The device of any of claims 12-13, wherein upon completion of the memory allocation, the processor is configured to send a virtual address associated with the memory allocation application to the master device via the transceiver, the transceiver is configured to receive data sent by the master device to the device based on the virtual address, and the processor is configured to store the data on a multi-channel DDR to which corresponding memory is allocated.
15. The device of claim 13, wherein during the writing of data to the host device, the processor is configured to apply for memory to the host device via the transceiver and send data stored in the DDR memory to the host device via the transceiver after receiving a virtual address from the host device.
16. A computer-readable storage medium having stored thereon computer program code for allocating memory, which when executed by a processor performs the method according to any of claims 1-11.
CN202010014955.8A 2020-01-07 2020-01-07 Method, equipment and computer readable storage medium for distributing memory Active CN113157602B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010014955.8A CN113157602B (en) 2020-01-07 2020-01-07 Method, equipment and computer readable storage medium for distributing memory
PCT/CN2021/070708 WO2021139733A1 (en) 2020-01-07 2021-01-07 Memory allocation method and device, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010014955.8A CN113157602B (en) 2020-01-07 2020-01-07 Method, equipment and computer readable storage medium for distributing memory

Publications (2)

Publication Number Publication Date
CN113157602A true CN113157602A (en) 2021-07-23
CN113157602B CN113157602B (en) 2024-01-26

Family

ID=76787755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010014955.8A Active CN113157602B (en) 2020-01-07 2020-01-07 Method, equipment and computer readable storage medium for distributing memory

Country Status (2)

Country Link
CN (1) CN113157602B (en)
WO (1) WO2021139733A1 (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105452986A (en) * 2013-08-08 2016-03-30 高通股份有限公司 System and method for memory channel interleaving with selective power or performance optimization
US20170108911A1 (en) * 2015-10-16 2017-04-20 Qualcomm Incorporated System and method for page-by-page memory channel interleaving
CN108845958A (en) * 2018-06-19 2018-11-20 中国科学院软件研究所 A kind of mapping of interleaver and dynamic EMS memory management system and method
US20190251034A1 (en) * 2019-04-26 2019-08-15 Intel Corporation Architectural enhancements for computing systems having artificial intelligence logic disposed locally to memory


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116434821A (en) * 2023-03-14 2023-07-14 深圳市晶存科技有限公司 System and method for testing LPDDR4 particles
CN116434821B (en) * 2023-03-14 2024-01-16 深圳市晶存科技有限公司 System and method for testing LPDDR4 particles

Also Published As

Publication number Publication date
CN113157602B (en) 2024-01-26
WO2021139733A1 (en) 2021-07-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant