
CN117271112A - Memory pool-based model deployment data storage management method and device - Google Patents


Info

Publication number
CN117271112A
Authority
CN
China
Prior art keywords
memory
target
model deployment
memory unit
data storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311064559.6A
Other languages
Chinese (zh)
Inventor
屈顺娇
于钧
徐文东
邓皓匀
钱少华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Changan Automobile Co Ltd
Original Assignee
Chongqing Changan Automobile Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Changan Automobile Co Ltd filed Critical Chongqing Changan Automobile Co Ltd
Priority to CN202311064559.6A
Publication of CN117271112A
Legal status: Pending (Current)


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Neurology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of intelligent driving, and in particular to a memory pool-based model deployment data storage management method and device. The method comprises the following steps: when a memory application instruction is received during model deployment operation, acquiring current memory demand information and searching a preset hash table, in which the memory block information, starting address and idle identifier of each memory unit in a preset memory pool are encapsulated; determining the target memory unit corresponding to the matched target memory block information; and if the idle identifier of the target memory unit is in the idle state, adjusting it to the occupied state and calling the target memory unit according to its starting address to store the model deployment data. By encapsulating the memory block information, starting addresses and idle identifiers of the memory pool into a hash table and obtaining starting addresses by hash-table lookup, the application improves the response speed of the memory pool and reduces the frequency of memory applications to the operating system.

Description

Memory pool-based model deployment data storage management method and device
Technical Field
The application relates to the technical field of intelligent driving, in particular to a memory pool-based model deployment data storage management method and device.
Background
Deep learning neural network models in the intelligent driving field must ultimately be deployed on embedded devices. As the number and size of deployed models grow, their computing-power and memory demands increase, and memory is frequently applied for from and released to the operating system at run time, generating a large amount of memory fragmentation. Because the memory and computing resources of an embedded device's operating system are limited, memory resource management becomes ineffective, which increases the probability of problems such as resource preemption, unstable system operation, and reduced real-time performance of model deployment.
Accordingly, there is a need for improvement and advancement in the art.
Disclosure of Invention
The application provides a memory pool-based model deployment data storage management method and device to solve the technical problem in the related art that frequently applying for and releasing memory to the operating system during model deployment operation increases the probability of problems such as resource preemption, unstable system operation, and reduced real-time performance of model deployment.
In order to achieve the above purpose, the present application adopts the following technical scheme:
an embodiment of a first aspect of the present application provides a memory pool-based model deployment data storage management method, including the following steps:
when a memory application instruction is received in the model deployment operation process, current memory demand information is acquired;
searching a preset hash table according to the current memory demand information, wherein memory block information, a starting address and an idle identifier corresponding to each memory unit in a preset memory pool are packaged in the hash table;
if target memory block information matched with the current memory demand information is searched in the hash table, determining a target memory unit corresponding to the target memory block information;
and if the idle identifier of the target memory unit is in an idle state, adjusting the idle identifier of the target memory unit to be in an occupied state, and calling the target memory unit according to the starting address of the target memory unit so as to store the model deployment data.
According to the above technical means, the application presets a memory pool and encapsulates its memory block information, starting addresses and idle identifiers into a hash table, so that the starting address of an available memory unit can be searched directly in the hash table to call the target memory unit. This improves the response speed of the memory pool, improves the efficiency of memory block application, and facilitates the recovery and reuse of memory blocks.
Optionally, the memory requirement information includes: memory requirement type and memory requirement size; the memory block information includes: memory size and memory type.
According to the above technical means, the starting address, memory size, memory type and idle identifier of each memory unit are encapsulated in the hash table. When the hash table is searched according to the memory requirement type and memory requirement size, they are compared against the memory size and memory type in the table; a matching entry identifies a target memory unit that meets the requirement. By managing the memory units of the memory pool uniformly in the hash table, the embodiment of the application improves the response speed of the memory pool and the running performance of the program.
Optionally, after searching the preset hash table according to the memory requirement information, the method further includes:
if target memory block information matching the current memory demand information is not found in the hash table, or the idle identifier of the target memory unit is in an occupied state, determining a memory allocation interface according to the current memory demand information;
and applying for the target memory block matched with the memory demand information to an operating system through the memory allocation interface.
According to the above technical means, different types of memory allocation interfaces are designed for memory allocation; when a memory application is made, memory blocks are acquired through the memory allocation interface of the corresponding type, improving allocation efficiency.
Optionally, the memory allocation interface is a CPU memory allocation interface, a GPU memory allocation interface, or a page lock memory allocation interface;
the CPU memory allocation interface is used for allocating CPU memory, and the CPU memory is used for storing data transmitted on the host;
the GPU memory allocation interface is used for allocating GPU memory, and the GPU memory is used for storing data transmitted between devices;
the page-locked memory allocation interface is used for allocating page-locked memory, and the page-locked memory is used for storing data transmitted between the device and the host.
According to the above technical means, the embodiment of the application is simultaneously compatible with GPU and CPU memory management, facilitates unified management, and improves data transmission efficiency.
Optionally, after adjusting the idle identifier of the target memory unit to the occupied state if it is in the idle state, and calling the target memory unit according to its starting address to store the model deployment data, the method further includes:
and when a recovery instruction for the target memory unit is received, adjusting the idle mark corresponding to the target memory unit in the hash table to be in an idle state.
According to the above technical means, the idle identifier of the memory unit is adjusted directly in the hash table, improving the efficiency and convenience of memory pool management.
Optionally, the memory pool-based model deployment data storage management method further includes:
when a basic communication instruction is received in the model deployment operation process, determining a target communication type according to the basic communication instruction;
searching a corresponding relation between a preset communication type and a basic communication interface, determining a target basic communication interface corresponding to the target communication type, and communicating through the target basic communication interface;
wherein the communication types include: communication between CPUs, communication between CPUs and GPUs, and communication between GPUs and GPUs.
According to the technical means, the communication is performed by using the target basic communication interface corresponding to the communication type, so that the function template of the target basic communication interface can be directly applied, and the communication efficiency is improved.
Optionally, the memory pool-based model deployment data storage management method further includes:
when a data processing instruction is received in the model deployment running process, determining a target function type according to the data processing instruction;
searching a corresponding relation between a preset function type and a basic function interface, determining a target basic function interface corresponding to the target function type, and performing data processing through the target basic function interface;
wherein the function types include a transpose function, a screening function, and a sorting function.
According to the technical means, the data processing efficiency is improved by using the preset basic function interface to process the data.
An embodiment of a second aspect of the present application provides a memory pool-based model deployment data storage management device, including:
the acquisition module is used for acquiring current memory demand information when a memory application instruction is received in the model deployment running process;
the searching module is used for searching a preset hash table according to the current memory demand information, and memory block information, a starting address and an idle identifier corresponding to each memory unit in a preset memory pool are packaged in the hash table;
the determining module is used for determining a target memory unit corresponding to the target memory block information if the target memory block information matched with the current memory demand information is searched in the hash table;
and the calling module is used for adjusting the idle identifier of the target memory unit to be in an occupied state if the idle identifier of the target memory unit is in the idle state, and calling the target memory unit according to the starting address of the target memory unit so as to store the model deployment data.
An embodiment of a third aspect of the present application provides an embedded device, where the embedded device includes a memory, a processor, and a memory pool-based model deployment data storage management program stored in the memory and capable of running on the processor, where when the processor executes the memory pool-based model deployment data storage management program, the steps of the memory pool-based model deployment data storage management method described above are implemented.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a memory pool based model deployment data storage management program, which when executed by a processor, implements the steps of a memory pool based model deployment data storage management method as described above.
The beneficial effects of this application:
(1) By encapsulating the memory block information, starting addresses and idle identifiers of the memory pool into a hash table and searching the hash table directly for the starting address of an available memory unit, the application calls the target memory unit efficiently. This improves the response speed of the memory pool and the efficiency of memory block application, facilitates the recovery and reuse of memory blocks, and reduces the number of times memory is applied for and released to the operating system, thereby relieving system pressure, improving the stability and real-time performance of model operation, and avoiding the increased probability of problems such as resource preemption, unstable system operation, and reduced real-time performance of model deployment.
(2) The application designs different types of memory allocation interfaces for memory allocation; when a memory application is made, memory blocks are acquired through the interface of the corresponding type, improving allocation efficiency. The design is compatible with both GPU and CPU memory management, facilitates unified management, and improves data transmission efficiency.
(3) The application communicates through the target basic communication interface corresponding to the communication type, so that the function template of the target basic communication interface can be applied directly, improving communication efficiency. Performing data processing through preset basic function interfaces likewise improves data processing efficiency.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart of a method for managing data storage of a memory pool-based model deployment according to an embodiment of the present application;
FIG. 2 is a flowchart of an embodiment of a neural network model deployment memory management of the present application;
FIG. 3 is a schematic structural diagram of a memory pool-based model deployment data storage management device according to an embodiment of the present application;
fig. 4 is a schematic block diagram of an internal structure of an embedded device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present application and are not to be construed as limiting the present application.
The following describes the memory pool-based model deployment data storage management method and device of the embodiments of the present application with reference to the accompanying drawings. Aiming at the problem mentioned in the background that frequently applying for and releasing memory to the operating system during model deployment operation increases the probability of resource preemption, unstable system operation and reduced real-time performance of model deployment, the application provides a memory pool-based model deployment data storage management method. In this method, when a memory application instruction is received during model deployment operation, current memory demand information is acquired; a preset hash table is searched according to the current memory demand information, the hash table encapsulating the memory block information, starting address and idle identifier of each memory unit in a preset memory pool; if target memory block information matching the current memory demand information is found in the hash table, the target memory unit corresponding to that information is determined; and if the idle identifier of the target memory unit is in the idle state, it is adjusted to the occupied state and the target memory unit is called according to its starting address to store the model deployment data. By encapsulating the memory block information, starting addresses and idle identifiers of the memory pool into a hash table and searching the hash table directly for the starting address of an available memory unit, the application improves the response speed of the memory pool and the efficiency of memory block application, facilitates the recovery and reuse of memory blocks, reduces the number of times memory is applied for and released to the operating system, relieves system pressure, and improves the stability and real-time performance of model operation.
Specifically, fig. 1 is a flow chart of a method for managing model deployment data storage based on a memory pool according to an embodiment of the present application.
As shown in fig. 1, the memory pool-based model deployment data storage management method includes the following steps:
in step S101, when a memory application instruction is received during the model deployment operation, current memory requirement information is obtained.
In the embodiment of the application, when the embedded device receives a memory application instruction during model deployment operation, it acquires the current memory demand information; the embedded device may be an embedded development board. The memory demand information is derived from the execution information of the model deployed to the development board, the hardware resource information, and the running process, so that memory can be scheduled in a planned way and an optimal scheduling strategy obtained.
In step S102, a preset hash table is searched according to the current memory requirement information, and memory block information, a start address and an idle identifier corresponding to each memory unit in a preset memory pool are encapsulated in the hash table.
When a neural network model needs to be deployed, the memory pool is used for unified management and allocation, which reduces the frequency of applying for and releasing memory to the operating system, relieves system pressure, and improves the running stability and real-time performance of the neural network model. To further facilitate management of the memory pool, the memory block information, starting addresses and idle identifiers are encapsulated in a hash table, which makes memory blocks convenient to recover and reuse.
In one embodiment, the memory requirement information includes: memory requirement type and memory requirement size; the memory block information includes: memory size and memory type.
In the embodiment of the application, when a new memory space requirement arises during model deployment operation, an application is made to the memory pool with the required memory requirement size and memory requirement type; when there is a data synchronization requirement, a CUDA stream must also be provided. The memory pool puts the starting address, memory size, memory type and idle identifier of each memory unit into the hash table for unified management and maintenance, and can search for matching information according to the requirements set by the caller. Specifically, a structure is constructed to store information such as the starting address, memory size, memory type, idle identifier and use frequency, and the structures are stored, classified by memory type and memory size, in a two-stage hash table under unified maintenance and management, improving the response speed of the memory pool and the running performance of the program.
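As a concrete illustration, the structure and two-stage hash table described above might look as follows in C++. This is a minimal sketch under stated assumptions: the type names, the use of std::unordered_map and std::list, and the exact field layout are illustrative choices, not the patent's verbatim implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Hypothetical labels for the three memory types managed by the pool.
enum class MemType { Cpu, Gpu, PageLocked };

// Structure storing the per-unit information the patent lists:
// starting address, memory size, memory type, idle identifier,
// and use frequency.
struct MemBlock {
    void*    addr      = nullptr;       // starting address
    size_t   size      = 0;             // memory size in bytes
    MemType  type      = MemType::Cpu;  // memory type
    bool     idle      = true;          // idle identifier (true = free)
    uint64_t use_count = 0;             // use frequency
};

// Two-stage hash table: first level keyed by memory type, second
// level keyed by memory size; each bucket holds the memory units
// of that (type, size) class.
using SizeTable = std::unordered_map<size_t, std::list<MemBlock>>;
using PoolTable = std::unordered_map<MemType, SizeTable>;
```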
In step S103, if the target memory block information matching the current memory requirement information is searched in the hash table, determining a target memory unit corresponding to the target memory block information.
When the memory pool receives a new memory application, the embodiment of the application first searches the hash table, by memory type and memory size, for a memory unit that meets the requirement; if such a unit exists and is idle, the idle identifier in its structure is set to the occupied state to indicate that the memory unit is occupied.
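Continuing the sketch above, the lookup path could be expressed as follows; finding an idle unit flips its idle identifier to the occupied state before its starting address is handed to the caller. The function name and return convention are assumptions for illustration.

```cpp
// Search the two-stage hash table for an idle memory unit matching
// the requested memory type and size; claim it if found.
MemBlock* find_idle_block(PoolTable& pool, MemType type, size_t size) {
    auto byType = pool.find(type);
    if (byType == pool.end()) return nullptr;
    auto bySize = byType->second.find(size);
    if (bySize == byType->second.end()) return nullptr;
    for (MemBlock& b : bySize->second) {
        if (b.idle) {
            b.idle = false;   // adjust idle identifier to occupied
            ++b.use_count;
            return &b;        // caller uses b.addr as the start address
        }
    }
    return nullptr;           // every matching unit is occupied
}
```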
In this embodiment of the present application, after searching the preset hash table according to the memory requirement information, the method further includes: if the target memory block information matched with the current memory demand information is not searched in the hash table or the idle mark of the target memory unit is in an occupied state, determining a memory allocation interface according to the current memory demand information; and applying for the target memory block matched with the memory demand information to the operating system through the memory allocation interface.
The embodiment of the present application also covers two other cases: a memory unit in the memory pool meets the requirement but is already occupied, or no memory unit in the memory pool meets the requirement. In either case, the memory pool applies to the operating system for a target memory block matching the memory demand information. The embodiment designs different types of memory allocation interfaces for this purpose; when an application is made, memory blocks are acquired through the memory allocation interface of the corresponding type, improving allocation efficiency.
In the embodiment of the present application, the memory allocation interface is a CPU memory allocation interface, a GPU memory allocation interface, or a page lock memory allocation interface. The CPU memory allocation interface is used for allocating CPU memory, and the CPU memory is used for storing data transmitted on the host; the GPU memory allocation interface is used for allocating GPU memory, and the GPU memory is used for storing data transmitted between devices; the page lock memory allocation interface is used for allocating page lock memory, and the page lock memory is used for storing data transmitted between the device and the host.
In the embodiment of the application, the memalign, cudaMalloc and cudaMallocHost interfaces are used as the underlying allocators, providing three memory allocation modes. Memory allocated by memalign is pageable memory, used mainly to store data transferred only on the host during deployment, and is labeled CPU memory. cudaMalloc allocates GPU video memory, used mainly to store data transferred only between devices during deployment, and is labeled GPU memory. cudaMallocHost allocates page-locked memory, labeled page-locked (cudaHost) memory, used to store data transferred between the device and the host during deployment. Data transmission between host and device is unavoidable in deployment; the page-locked memory allocated by cudaMallocHost offers high data transmission bandwidth between device and host, and on some devices the GPU can access this memory space directly, saving copies between host and device, improving data transmission efficiency, and improving the running performance of the neural network on the embedded development board. The embodiment of the application is simultaneously compatible with GPU and CPU memory management, can be used directly as a plug-in, and is suitable for neural network model deployment frameworks.
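The three allocation paths named here (memalign, cudaMalloc, cudaMallocHost) could be wrapped behind a single dispatch as sketched below. The alignment constant and the error handling are simplifying assumptions; only the three underlying calls come from the text.

```cpp
#include <cuda_runtime.h>
#include <malloc.h>  // memalign (POSIX/glibc)

// Dispatch to the underlying allocator according to memory type:
// pageable CPU memory, GPU device memory, or page-locked host memory.
void* allocate_raw(MemType type, size_t size) {
    void* p = nullptr;
    switch (type) {
    case MemType::Cpu:          // pageable host memory
        p = memalign(64, size); // 64-byte alignment is an assumption
        break;
    case MemType::Gpu:          // GPU video memory
        if (cudaMalloc(&p, size) != cudaSuccess) p = nullptr;
        break;
    case MemType::PageLocked:   // page-locked (pinned) host memory
        if (cudaMallocHost(&p, size) != cudaSuccess) p = nullptr;
        break;
    }
    return p;
}
```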
In step S104, if the idle identifier of the target memory unit is in the idle state, the idle identifier of the target memory unit is adjusted to be in the occupied state, and the target memory unit is called according to the starting address of the target memory unit, so as to store the model deployment data.
The embodiment of the application searches the hash table directly for the starting address of an available memory unit, so the target memory unit can be called and the efficiency of memory block application is improved. Considering the limited hardware resources of in-vehicle deployment platforms in practical engineering applications, the embodiment improves memory utilization and memory access speed during model deployment operation, is pluggable, and can be quickly ported to different in-vehicle deployment platforms.
In this embodiment, after step S104, the method further includes: when a recovery instruction for the target memory unit is received, the idle mark corresponding to the target memory unit in the hash table is adjusted to be in an idle state.
When a memory unit applied for from the preset memory pool is no longer in use and needs to be recovered, the memory is not released to the operating system directly through operations such as free and cudaFree; instead, the memory unit is recovered into the memory pool. Specifically, the hash table maintained in the memory pool is first searched for the starting address of the memory unit. If it is not found, the memory unit to be recovered is considered outside the memory pool's management scope; otherwise, the memory unit belongs to the memory pool, and its corresponding idle identifier is set to the idle state. By adjusting the idle identifier of the memory unit in the hash table, the embodiment improves the efficiency and convenience of memory pool management.
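The recovery path might then look as follows, assuming the PoolTable sketch from above: the starting address is searched in the maintained tables, and a hit merely flips the idle identifier back rather than calling free or cudaFree.

```cpp
// Recover a memory unit into the pool instead of releasing it to the
// operating system. Returns false if the address is not pool-managed.
bool recycle_block(PoolTable& pool, void* addr) {  // requires C++17
    for (auto& [type, sizeTable] : pool)
        for (auto& [size, blocks] : sizeTable)
            for (MemBlock& b : blocks)
                if (b.addr == addr) {
                    b.idle = true;  // idle identifier back to idle state
                    return true;
                }
    return false;  // outside the memory pool's management scope
}
```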
In the embodiment of the application, when a basic communication instruction is received in the model deployment operation process, determining a target communication type according to the basic communication instruction; searching a corresponding relation between a preset communication type and a basic communication interface, determining a target basic communication interface corresponding to the target communication type, and communicating through the target basic communication interface; wherein, the communication type includes: communication between CPUs, communication between CPUs and GPUs, and communication between GPUs and GPUs.
The embodiment of the application designs a basic communication interface. Because communication between devices is ever-present in deep learning deployment, an interface supporting basic communication is designed: CUDA-related storage functions ensure that data can be quickly migrated and copied among CPUs and GPUs, and corresponding conversion functions ensure data synchronization during model operation. By communicating through the target basic communication interface corresponding to the communication type, the function template of the target basic communication interface can be applied directly, improving communication efficiency.
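A basic communication interface of this kind could reduce, in the simplest case, to choosing the right CUDA copy kind and stream. The sketch below is one hypothetical realization using the standard CUDA runtime call, not the patent's exact interface.

```cpp
// Map the three communication types onto CUDA copy kinds:
//   CPU <-> CPU : cudaMemcpyHostToHost
//   CPU <-> GPU : cudaMemcpyHostToDevice / cudaMemcpyDeviceToHost
//   GPU <-> GPU : cudaMemcpyDeviceToDevice
cudaError_t basic_copy(void* dst, const void* src, size_t bytes,
                       cudaMemcpyKind kind, cudaStream_t stream) {
    // An asynchronous copy on the caller's stream keeps data
    // synchronization under the caller's control.
    return cudaMemcpyAsync(dst, src, bytes, kind, stream);
}
```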
In the embodiment of the application, when a data processing instruction is received in the model deployment running process, determining a target function type according to the data processing instruction; searching a corresponding relation between a preset function type and a basic function interface, determining a target basic function interface corresponding to the target function type, and performing data processing through the target basic function interface; the function types comprise a transposition function, a screening function and a sorting function.
The embodiment of the application also designs a basic function interface. Because large amounts of tensor data need to be transposed, filtered and sorted during deep learning model operation, an interface implementing functions such as transpose (permute), filter (mask) and sort is designed, and developers are free to add to and integrate with these interfaces as needed. Performing data processing through the preset basic function interfaces improves data processing efficiency.
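As one hedged example of what could sit behind such a basic function interface, a sorting function over device-resident tensor data can be realized with Thrust, which ships with the CUDA toolkit; this is an illustrative choice, not necessarily the patent's implementation.

```cpp
#include <thrust/device_ptr.h>
#include <thrust/sort.h>

// Sort a float tensor already resident in GPU memory, in place.
void sort_on_device(float* dev_data, size_t n) {
    thrust::device_ptr<float> p(dev_data);
    thrust::sort(p, p + n);  // ascending sort executed on the GPU
}
```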
The basic communication interface and the basic function interface designed by the embodiment of the application together form a basic storage class used to adapt to the deployment of various deep learning models. The adaptive storage class is constructed according to the execution information of the neural network model; it supports, but is not limited to, data of types such as float, double, int32_t and int8_t, and supports data transferred between CPU and GPU as well as within each. The communication interface and the extension interface are designed to support related basic functions, including but not limited to communication between CPUs, communication between GPU and CPU, sorting, and transposition. Storage classes adapted to different data formats and scenarios are combined and encapsulated according to their probability of use during deep learning model deployment and operation; constructors and overloaded functions of the storage classes are provided, and a universal template class is constructed from common parameters including the device serial number, memory type, memory size and CUDA stream, to improve the speed and performance of data storage and migration.
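The universal template class described here might be skeletonized as below; the member set mirrors the common parameters the text names (device serial number, memory type, memory size, stream), while the class and member names are assumptions.

```cpp
// Sketch of a storage-class template parameterized by element type
// (float, double, int32_t, int8_t, ...), carrying the common
// parameters named in the text.
template <typename T>
class Storage {
public:
    Storage(int device, MemType type, size_t count, cudaStream_t stream)
        : device_(device), type_(type), count_(count), stream_(stream) {}

    size_t bytes() const { return count_ * sizeof(T); }
    // data_ would be obtained from the memory pool rather than
    // allocated here, so construction stays cheap.

private:
    int          device_;          // device serial number
    MemType      type_;            // memory type
    size_t       count_;           // number of elements
    cudaStream_t stream_;          // stream for synchronization
    T*           data_ = nullptr;  // starting address from the pool
};
```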
In one embodiment, a flowchart of a particular neural network model deployment memory management is shown in FIG. 2.
A1, receiving a memory application;
a2, judging whether a memory block meeting the requirement exists in the memory pool or not; if yes, executing the step A3; if not, executing the step B1;
step B1, applying for a new memory block from an operating system; continuing to execute the step A3;
step A3, distributing to a caller;
step A4, when the caller does not use the memory block any more, the memory block is recycled to the memory pool;
step A5, calling a designed basic communication interface during communication;
step A6, calling a designed basic function interface during data processing;
and A7, calling a designed storage class construction function when storing data.
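Tying steps A1-A4 together with the earlier sketches, the application path could read as follows: the pool is consulted first (A2/A3) and the operating system is only reached on a miss (B1). This remains a sketch built on the assumed helpers above.

```cpp
// A1: receive a memory application for (type, size).
void* apply_memory(PoolTable& pool, MemType type, size_t size) {
    // A2: does the pool hold a matching idle unit?
    if (MemBlock* b = find_idle_block(pool, type, size))
        return b->addr;                       // A3: allocate to caller
    // B1: apply to the operating system for a new memory block.
    void* p = allocate_raw(type, size);
    if (p)                                    // register the new unit
        pool[type][size].push_back({p, size, type, /*idle=*/false, 1});
    return p;                                 // A3: allocate to caller
}
// A4: when the caller is done, recycle_block(pool, addr) returns the
// unit to the pool instead of releasing it to the operating system.
```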
Next, a memory pool-based model deployment data storage management device according to an embodiment of the present application will be described with reference to the accompanying drawings.
As shown in fig. 3, the memory pool-based model deployment data storage management device 10 includes:
the obtaining module 100 is configured to obtain current memory requirement information when a memory application instruction is received during a model deployment operation process;
the searching module 200 is configured to search a preset hash table according to the current memory requirement information, where the hash table is encapsulated with memory block information, a starting address and an idle identifier corresponding to each memory unit in the preset memory pool;
a determining module 300, configured to determine a target memory unit corresponding to the target memory block information if target memory block information matched with the current memory requirement information is searched in the hash table;
and the calling module 400 is configured to adjust the idle identifier of the target memory unit to an occupied state if the idle identifier of the target memory unit is in the idle state, and call the target memory unit according to the starting address of the target memory unit to store the model deployment data.
Optionally, the memory requirement information includes: memory requirement type and memory requirement size; the memory block information includes: memory size and memory type.
Optionally, the memory pool based model deployment data storage management device 10 further includes:
the allocation interface module is used for determining an internal memory allocation interface according to the current internal memory demand information if the target internal memory block information matched with the current internal memory demand information is not searched in the hash table or the idle mark of the target internal memory unit is in an occupied state; and applying for the target memory block matched with the memory demand information to the operating system through the memory allocation interface.
Optionally, the memory allocation interface is a CPU memory allocation interface, a GPU memory allocation interface, or a page lock memory allocation interface;
the CPU memory allocation interface is used for allocating CPU memory, and the CPU memory is used for storing data transmitted on the host;
the GPU memory allocation interface is used for allocating GPU memory, and the GPU memory is used for storing data transmitted between devices;
the page lock memory allocation interface is used for allocating page lock memory, and the page lock memory is used for storing data transmitted between the device and the host.
Optionally, the memory pool based model deployment data storage management device 10 further includes:
and the searching module is used for adjusting the idle mark corresponding to the target memory unit in the hash table into an idle state when receiving the recovery instruction to the target memory unit.
Optionally, the memory pool based model deployment data storage management device 10 further includes:
the basic communication module is used for determining a target communication type according to the basic communication instruction when the basic communication instruction is received in the model deployment operation process; searching a corresponding relation between a preset communication type and a basic communication interface, determining a target basic communication interface corresponding to the target communication type, and communicating through the target basic communication interface; wherein, the communication type includes: communication between CPUs, communication between CPUs and GPUs, and communication between GPUs and GPUs.
Optionally, the memory pool based model deployment data storage management device 10 further includes:
the data processing module is used for determining the type of the target function according to the data processing instruction when the data processing instruction is received in the model deployment running process; searching a corresponding relation between a preset function type and a basic function interface, determining a target basic function interface corresponding to the target function type, and performing data processing through the target basic function interface; the function types comprise a transposition function, a screening function and a sorting function.
It should be noted that the foregoing explanation of the embodiment of the memory pool based model deployment data storage management method is also applicable to the memory pool based model deployment data storage management device of this embodiment, and will not be repeated herein.
With the memory pool-based model deployment data storage management device provided by the embodiment of the application, when a memory application instruction is received during model deployment operation, current memory demand information is acquired; a preset hash table is searched according to the current memory demand information, the hash table encapsulating the memory block information, starting address and idle identifier of each memory unit in a preset memory pool; if target memory block information matching the current memory demand information is found in the hash table, the corresponding target memory unit is determined; and if the idle identifier of the target memory unit is in the idle state, it is adjusted to the occupied state and the target memory unit is called according to its starting address to store the model deployment data. By encapsulating the memory block information, starting addresses and idle identifiers of the memory pool into a hash table and searching the hash table directly for the starting address of an available memory unit, the device improves the response speed of the memory pool and the efficiency of memory block application, facilitates the recovery and reuse of memory blocks, reduces the number of times memory is applied for and released to the operating system, relieves system pressure, and improves the stability and real-time performance of model operation.
Fig. 4 is a schematic structural diagram of an embedded device according to an embodiment of the present application. The embedded device may include:
memory 501, processor 502, and a computer program stored on memory 501 and executable on processor 502.
The processor 502 implements the memory pool-based model deployment data storage management method provided in the above embodiments when executing a program.
Further, the embedded device further includes:
a communication interface 503 for communication between the memory 501 and the processor 502.
Memory 501 for storing a computer program executable on processor 502.
The memory 501 may include high-speed RAM memory and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
If the memory 501, the processor 502, and the communication interface 503 are implemented independently, the communication interface 503, the memory 501, and the processor 502 may be connected to each other via a bus and communicate with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, among others. Buses may be divided into address buses, data buses, control buses, etc. For ease of illustration, the figures show only one line, but this does not mean there is only one bus or one type of bus.
Alternatively, in a specific implementation, if the memory 501, the processor 502, and the communication interface 503 are integrated on a chip, the memory 501, the processor 502, and the communication interface 503 may perform communication with each other through internal interfaces.
The processor 502 may be a central processing unit (Central Processing Unit, abbreviated as CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, abbreviated as ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the memory pool-based model deployment data storage management method as described above.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "N" is at least two, such as two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and additional implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order from that shown or discussed, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., a ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can read instructions from and execute instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or N wires, a portable computer cartridge (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium may even be paper or other suitable medium upon which the program is printed, as the program may be electronically captured, via optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, may be implemented using any one or combination of the following techniques, as known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or part of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, and the program may be stored in a computer readable storage medium, where the program when executed includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented as software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, alternatives, and substitutions may be made to the above embodiments by those of ordinary skill in the art within the scope of the application.

Claims (10)

1. The memory pool-based model deployment data storage management method is characterized by comprising the following steps of:
when a memory application instruction is received in the model deployment operation process, current memory demand information is acquired;
searching a preset hash table according to the current memory demand information, wherein memory block information, a starting address and an idle identifier corresponding to each memory unit in a preset memory pool are packaged in the hash table;
if target memory block information matched with the current memory demand information is searched in the hash table, determining a target memory unit corresponding to the target memory block information;
and if the idle identifier of the target memory unit is in an idle state, adjusting the idle identifier of the target memory unit to be in an occupied state, and calling the target memory unit according to the starting address of the target memory unit so as to store the model deployment data.
2. The memory pool based model deployment data storage management method of claim 1, wherein the memory requirement information comprises: memory requirement type and memory requirement size; the memory block information includes: memory size and memory type.
3. The memory pool based model deployment data storage management method of claim 1, further comprising, after searching a preset hash table according to the memory requirement information:
if the target memory block information matched with the current memory demand information is not searched in the hash table or the idle identifier of the target memory unit is in an occupied state, determining a memory allocation interface according to the current memory demand information;
and applying for the target memory block matched with the memory demand information to an operating system through the memory allocation interface.
4. The memory pool based model deployment data storage management method of claim 3, wherein the memory allocation interface is a CPU memory allocation interface, a GPU memory allocation interface, or a page lock memory allocation interface;
the CPU memory allocation interface is used for allocating CPU memory, and the CPU memory is used for storing data transmitted on the host;
the GPU memory allocation interface is used for allocating GPU memory, and the GPU memory is used for storing data transmitted between devices;
the page-locked memory allocation interface is used for allocating page-locked memory, and the page-locked memory is used for storing data transmitted between the device and the host.
5. The memory pool based model deployment data storage management method of claim 1, wherein if the idle identifier of the target memory unit is in an idle state, adjusting the idle identifier of the target memory unit to an occupied state, and calling the target memory unit according to the start address of the target memory unit to store the model deployment data, further comprising:
and when a recovery instruction for the target memory unit is received, adjusting the idle identifier corresponding to the target memory unit in the hash table to be in an idle state.
6. The memory pool based model deployment data storage management method of claim 1, wherein the memory pool based model deployment data storage management method further comprises:
when a basic communication instruction is received in the model deployment operation process, determining a target communication type according to the basic communication instruction;
searching a corresponding relation between a preset communication type and a basic communication interface, determining a target basic communication interface corresponding to the target communication type, and communicating through the target basic communication interface;
wherein the communication types include: communication between CPUs, communication between CPUs and GPUs, and communication between GPUs and GPUs.
7. The memory pool based model deployment data storage management method of claim 1, wherein the memory pool based model deployment data storage management method further comprises:
when a data processing instruction is received in the model deployment running process, determining a target function type according to the data processing instruction;
searching a corresponding relation between a preset function type and a basic function interface, determining a target basic function interface corresponding to the target function type, and performing data processing through the target basic function interface;
wherein the function types include a transpose function, a screening function, and a sorting function.
8. A memory pool based model deployment data storage management device, comprising:
the acquisition module is used for acquiring current memory demand information when a memory application instruction is received in the model deployment running process;
the searching module is used for searching a preset hash table according to the current memory demand information, and memory block information, a starting address and an idle identifier corresponding to each memory unit in a preset memory pool are packaged in the hash table;
the determining module is used for determining a target memory unit corresponding to the target memory block information if the target memory block information matched with the current memory demand information is searched in the hash table;
and the calling module is used for adjusting the idle identifier of the target memory unit to be in an occupied state if the idle identifier of the target memory unit is in the idle state, and calling the target memory unit according to the starting address of the target memory unit so as to store the model deployment data.
9. An embedded device comprising a memory, a processor, and a memory pool based model deployment data storage management program stored in the memory and executable on the processor, the processor implementing the steps of the memory pool based model deployment data storage management method of any of claims 1-7 when executing the memory pool based model deployment data storage management program.
10. A computer readable storage medium, wherein a memory pool based model deployment data storage management program is stored on the computer readable storage medium, which when executed by a processor, implements the steps of the memory pool based model deployment data storage management method of any of claims 1-7.
CN202311064559.6A 2023-08-22 2023-08-22 Memory pool-based model deployment data storage management method and device Pending CN117271112A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311064559.6A CN117271112A (en) 2023-08-22 2023-08-22 Memory pool-based model deployment data storage management method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311064559.6A CN117271112A (en) 2023-08-22 2023-08-22 Memory pool-based model deployment data storage management method and device

Publications (1)

Publication Number Publication Date
CN117271112A true CN117271112A (en) 2023-12-22

Family

ID=89209452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311064559.6A Pending CN117271112A (en) 2023-08-22 2023-08-22 Memory pool-based model deployment data storage management method and device

Country Status (1)

Country Link
CN (1) CN117271112A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118095413A (en) * 2024-03-01 2024-05-28 暗物质(北京)智能科技有限公司 Multi-mode model deployment system, method, electronic equipment and readable storage medium
CN118095413B (en) * 2024-03-01 2024-09-06 暗物质(北京)智能科技有限公司 Multi-mode model deployment system, method, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
US7624247B2 (en) Method and apparatus for executing dynamic memory management with object-oriented program
CN117271112A (en) Memory pool-based model deployment data storage management method and device
US20240111549A1 (en) Method and apparatus for constructing android running environment
CN106776395A (en) A kind of method for scheduling task and device of shared cluster
CN105700877A (en) Application deployment method and apparatus
CN114168490A (en) Method for determining memory recovery threshold and related equipment
CN109783221B (en) Virtual machine resource allocation method and device and resource server
CN1967500A (en) Resource using method in automatic testing process
CN112486642A (en) Resource scheduling method and device, electronic equipment and computer readable storage medium
CN113849260A (en) Instance processing core allocation method and device
CN110795234A (en) Resource scheduling method and device
US8528007B1 (en) Firmware downloading through process file system
CN111158875A (en) Multi-module-based multi-task processing method, device and system
CN115576706A (en) Method and device for interfacing with third-party system, electronic equipment and readable medium
CN110308914A (en) Upgrade processing method, device, equipment, system and computer readable storage medium
CN115601221B (en) Resource allocation method and device and artificial intelligent training system
WO2021218384A1 (en) Algorithm update method, system and device
CN116521266A (en) Management method and device for vehicle-mounted application starting configuration, vehicle and storage medium
US20170090820A1 (en) Method and device for operating a many-core system
CN108279982B (en) Method, system and equipment for managing pbs resources and hadoop resources
CN113760379B (en) Method and device for adding parameters in published program
CN117093345B (en) Task linked list execution method and device, terminal equipment and storage medium
CN115858013B (en) Multi-research and development project parallel resource allocation method, system, device and medium
CN117742977B (en) Method for copying memory data of chip, electronic equipment and medium
CN1442800A (en) Dynamic SNMF network equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination