CN111079925B - Operation method, device and related product - Google Patents
- Publication number
- CN111079925B (application CN201811221806.8A)
- Authority
- CN
- China
- Prior art keywords
- instruction
- macro
- data
- instructions
- operating
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
The disclosure relates to an operation method, an operation device, and a related product. The device comprises a device determining module and an instruction generating module. The device determining module is configured to determine, according to a received macro instruction, the operating device that executes the macro instruction. The instruction generating module is configured to generate an operation instruction according to the macro instruction and the operating device. The operation method, the operation device, and the related products provided by the embodiments of the disclosure can be used across platforms and offer good applicability, fast instruction conversion, high processing efficiency, a low error probability, and low development costs in labor and materials.
Description
Technical Field
The present disclosure relates to the field of information processing technologies, and in particular, to a method and an apparatus for generating a neural network instruction, and a related product.
Background
With the continuous development of science and technology, neural network algorithms are used more and more widely and have been applied successfully in fields such as image recognition, speech recognition, and natural language processing. However, as the complexity of neural network algorithms grows, so does their scale. A large-scale neural network model running on a Graphics Processing Unit (GPU) and a Central Processing Unit (CPU) takes a long time to compute and consumes a lot of power. In the related art, methods for accelerating neural network models suffer from problems such as the inability to work across platforms, low processing efficiency, high development cost, and proneness to error.
Disclosure of Invention
In view of this, the present disclosure provides a method and an apparatus for generating a neural network instruction, and a related product, so that the neural network instruction can be used across platforms, the processing efficiency is improved, and the error probability and the development cost are reduced.
According to a first aspect of the present disclosure, there is provided a neural network instruction generating apparatus, the apparatus comprising:
a device determining module configured to determine, according to a received macro instruction, an operating device that executes the macro instruction;
and an instruction generating module configured to generate an operation instruction according to the macro instruction and the operating device.
According to a second aspect of the present disclosure, there is provided a machine learning arithmetic device, the device including:
one or more neural network instruction generating devices according to the first aspect, configured to acquire data to be operated and control information from another processing device, execute a specified machine learning operation, and transmit an execution result to the other processing device through an I/O interface;
when the machine learning arithmetic device comprises a plurality of neural network instruction generating devices, the plurality of neural network instruction generating devices can be connected through a specific structure and transmit data;
the plurality of neural network instruction generating devices are interconnected and transmit data through a Peripheral Component Interconnect Express (PCIe) bus so as to support larger-scale machine learning operations; the plurality of neural network instruction generating devices share one control system or have their own control systems; the plurality of neural network instruction generating devices share a memory or have their own memories; and the plurality of neural network instruction generating devices may be interconnected in any interconnection topology.
According to a third aspect of the present disclosure, there is provided a combined processing apparatus, the apparatus comprising:
the machine learning arithmetic device, the universal interconnect interface, and the other processing device according to the second aspect;
and the machine learning arithmetic device interacts with the other processing devices to jointly complete the calculation operation designated by the user.
According to a fourth aspect of the present disclosure, there is provided a machine learning chip including the machine learning arithmetic device of the second aspect or the combined processing apparatus of the third aspect.
According to a fifth aspect of the present disclosure, there is provided a machine learning chip package structure, which includes the machine learning chip of the fourth aspect.
According to a sixth aspect of the present disclosure, a board card is provided, which includes the machine learning chip packaging structure of the fifth aspect.
According to a seventh aspect of the present disclosure, there is provided an electronic device, which includes the machine learning chip of the fourth aspect or the board card of the sixth aspect.
According to an eighth aspect of the present disclosure, there is provided a neural network instruction generating method, the method comprising:
determining, according to a received macro instruction, an operating device that executes the macro instruction;
and generating an operation instruction according to the macro instruction and the operating device.
In some embodiments, the electronic device comprises a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a driving recorder, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a camcorder, a projector, a watch, a headset, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
In some embodiments, the vehicle comprises an aircraft, a ship, and/or a car; the household appliance comprises a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and/or a range hood; the medical device comprises a nuclear magnetic resonance apparatus, a B-mode ultrasound scanner, and/or an electrocardiograph.
The device comprises a device determining module and an instruction generating module. The device determining module is configured to determine, according to a received macro instruction, the operating device that executes the macro instruction. The instruction generating module is configured to generate an operation instruction according to the macro instruction and the operating device. The method, the device, and the related products can be used across platforms and offer good applicability, fast instruction conversion, high processing efficiency, a low error probability, and low development costs in labor and materials.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a block diagram of a neural network instruction generating device according to an embodiment of the present disclosure.
Fig. 2 illustrates a block diagram of a neural network instruction generating device, according to an embodiment of the present disclosure.
Fig. 3a and 3b are schematic diagrams illustrating an application scenario of a neural network instruction generating device according to an embodiment of the present disclosure.
Fig. 4a, 4b show block diagrams of a combined processing device according to an embodiment of the present disclosure.
Fig. 5 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure.
Fig. 6 illustrates a flow diagram of a neural network instruction generation method according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 shows a block diagram of a neural network instruction generating device according to an embodiment of the present disclosure. As shown in fig. 1, the apparatus includes a device determining module 11 and an instruction generating module 12. The device determining module 11 is configured to determine, according to the received macro instruction, an operating device that executes the macro instruction. The instruction generating module 12 is configured to generate an operation instruction according to the macro instruction and the operating device.
In this implementation, a macro is a named batch of operations: a rule, pattern, or syntactic substitution that is expanded automatically wherever the macro is encountered. A macro instruction can be formed by integrating commonly used instructions to be executed that process data, such as computation, control, and data-handling instructions.
In one possible implementation, the macro instruction may include at least one of: compute macro instructions, control macro instructions, and data-handling macro instructions. The compute macro instructions may include at least one of neural network compute macro instructions, vector logic compute macro instructions, matrix vector compute macro instructions, scalar compute macro instructions, and scalar logic compute macro instructions. The control macro instructions may include at least one of unconditional jump macro instructions and conditional jump macro instructions. The data-handling macro instructions may include at least one of read macro instructions and write macro instructions. The read macro instructions may include at least one of read neuron macro instructions, read synapse macro instructions, and read scalar macro instructions. The write macro instructions may include at least one of write neuron macro instructions, write synapse macro instructions, and write scalar macro instructions.
In one possible implementation, the macro instruction may contain at least one of the following options: the identifier of a specified device for executing the macro instruction, operation type, input address, output address, input amount, output amount, operands, and instruction parameters. The operation instruction may contain at least one of the following options: operation type, input address, output address, operands, and instruction parameters.
The identifier of the specified device may be its physical address, IP address, name, number, or the like, and may consist of digits, letters, symbols, or any combination of them. When the identifier position in the macro instruction is empty, or when the macro instruction does not contain the field "identification of specified device" at all, it is determined that the macro instruction has no specified device.
The operation type refers to the kind of operation the macro instruction performs on the data and identifies the specific type of the macro instruction. For example, when the operation type of a macro instruction is "XXX", the kind of operation it performs on the data can be determined from "XXX". The instruction set required for executing the macro instruction can also be determined from the operation type: when the operation type is "XXX", the required instruction set is every instruction set needed to carry out the processing that corresponds to "XXX".
The input address is the address from which data is obtained, such as a read address, and the output address is the address at which processed data is stored, such as a write address. The input amount is information indicating the size of the input data, such as its input size or input length. The output amount is information indicating the size of the output data, such as its output size or output length.
The operands may include the length of a register, the address of a register, the identifier of a register, an immediate, and the like. An immediate is a value given directly in an instruction that uses immediate addressing.
Instruction parameters are parameters associated with the execution of the macro instruction. For example, the instruction parameters may be the address and length of a second operand, or the size, stride, and padding of a convolution kernel.
In this implementation, a macro instruction must include an operation code, that is, the operation type, and at least one operation field. The operation fields include the identifier of the specified device, the input address, the output address, the input amount, the output amount, the operands, and the instruction parameters. The operation code is the part of an instruction, usually expressed as a code, that specifies the operation to be performed; it tells the device executing the instruction which operation to carry out. An operation field supplies the data required to execute the corresponding instruction, including parameter data, the data to be operated on or processed, and the corresponding operation method, or the addresses at which these are stored.
It should be understood that the instruction format and the contained content of the macro instructions may be set as desired by those skilled in the art, and the present disclosure is not limited thereto.
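As an illustration of the fields listed above, the following minimal Python sketch models a macro instruction as a record. The class name, field names, and the operation types "CONV" and "RELU" are assumptions made for this example and do not come from the disclosure, and the sketch does not enforce the requirement of at least one operation field.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MacroInstruction:
    """Hypothetical in-memory layout; all field names are illustrative."""
    op_type: str                        # operation code (mandatory)
    device_id: Optional[str] = None     # identifier of the specified device
    input_addr: Optional[int] = None    # address the data is read from
    output_addr: Optional[int] = None   # address the result is written to
    input_amount: Optional[int] = None  # size or length of the input data
    output_amount: Optional[int] = None # size or length of the output data
    operands: list = field(default_factory=list)
    params: dict = field(default_factory=dict)

    def has_specified_device(self) -> bool:
        # A missing field and an empty identifier both mean
        # "no specified device".
        return bool(self.device_id)


# "CONV" and "RELU" are made-up operation types for illustration.
conv = MacroInstruction(op_type="CONV", device_id="09", input_amount=4096,
                        params={"kernel": 3, "stride": 1, "padding": 1})
relu = MacroInstruction(op_type="RELU", device_id="")
```

Here `conv` names device 09 as its specified device, while `relu` carries an empty identifier and is therefore treated as having no specified device.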
In this embodiment, the device determining module 11 may determine one or more operating devices according to the macro instruction. The instruction generating module 12 may generate one or more operation instructions. When multiple operation instructions are generated, they may be executed on the same operating device or on different operating devices, and the present disclosure is not limited thereto.
The neural network instruction generating device provided by the embodiment of the disclosure comprises a device determining module and an instruction generating module. The device determining module is configured to determine, according to the received macro instruction, the operating device that executes the macro instruction. The instruction generating module is configured to generate an operation instruction according to the macro instruction and the operating device. The device can be used across platforms and offers good applicability, fast instruction conversion, high processing efficiency, a low error probability, and low development costs in labor and materials.
Fig. 2 illustrates a block diagram of a neural network instruction generating device, according to an embodiment of the present disclosure. In a possible implementation manner, as shown in fig. 2, the apparatus may further include a macro instruction generating module 13. The macro instruction generating module 13 is configured to receive an instruction to be executed, and to generate a macro instruction according to the determined identifier of the specified device and the instruction to be executed.
In this implementation, the specified device may be determined according to the operation type, input amount, output amount, and the like of the instruction to be executed. One or more instructions to be executed may be received.
The instructions to be executed may include at least one of: computation instructions to be executed, control instructions to be executed, and data-handling instructions to be executed. The computation instructions to be executed may include at least one of neural network computation instructions, vector logic computation instructions, matrix vector computation instructions, scalar computation instructions, and scalar logic computation instructions to be executed. The control instructions to be executed may include at least one of unconditional jump instructions and conditional jump instructions to be executed. The data-handling instructions to be executed may include at least one of read instructions and write instructions to be executed. The read instructions to be executed may include at least one of read neuron instructions, read synapse instructions, and read scalar instructions to be executed. The write instructions to be executed may include at least one of write neuron instructions, write synapse instructions, and write scalar instructions to be executed.
The instructions to be executed may contain at least one of the following options: operation type, input address, output address, input quantity, output quantity, operands, and instruction parameters.
In this implementation, when there is one instruction to be executed, the determined identifier of the specified device may be added to the instruction to be executed to generate the macro instruction. For example, suppose an instruction m to be executed is "XXX … … param", where XXX is the operation type and param is the instruction parameter. The specified device m-1 may be determined according to the operation type "XXX" of the instruction m. The identifier of the specified device m-1 (e.g., 09) is then added to the instruction m, generating the macro instruction M "XXX 09, … … param" that corresponds to the instruction m. When there are multiple instructions to be executed, the identifier of the specified device determined for each instruction may be added to that instruction, and one macro instruction or multiple corresponding macro instructions may be generated from the instructions carrying those identifiers.
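The single-instruction case can be sketched in Python as follows. The dictionary representation, the key names, and the lookup table that maps an operation type to a device identifier are all assumptions for illustration. The disclosure also allows the input amount and output amount to influence the choice of device, which this sketch leaves out.

```python
def choose_specified_device(op_type: str, device_table: dict) -> str:
    """Pick the specified device from the operation type.

    `device_table` is a hypothetical lookup table, not part of the
    disclosure.
    """
    return device_table[op_type]


def to_macro_instruction(instruction: dict, device_table: dict) -> dict:
    """Return a copy of the instruction with the device identifier added."""
    macro = dict(instruction)  # leave the original instruction untouched
    macro["device_id"] = choose_specified_device(
        instruction["op_type"], device_table)
    return macro


device_table = {"XXX": "09"}
m = {"op_type": "XXX", "params": "param"}   # instruction m to be executed
M = to_macro_instruction(m, device_table)   # macro instruction M
```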
It should be understood that the instruction format and the content of the instructions to be executed can be set by those skilled in the art according to the needs, and the present disclosure is not limited thereto.
In one possible implementation, as shown in fig. 2, the device determining module 11 may include a first determining sub-module 111. The first determining sub-module 111 is configured to determine the specified device as an operating device when it is determined that the macro instruction includes an identifier of the specified device and a resource of the specified device meets an execution condition for executing the macro instruction. Wherein, the execution condition may include: the designated device contains a set of instructions corresponding to the macroinstructions.
In this implementation, the macro instruction may include the identifiers of one or more specified devices that execute it. When the macro instruction includes the identifier of a specified device and the resources of that device meet the execution condition, the first determining sub-module 111 may directly determine the specified device as the operating device. This shortens the time needed to generate the operation instruction from the macro instruction and ensures that the generated operation instruction can be executed by the corresponding operating device.
In one possible implementation, as shown in fig. 2, the apparatus may further include a resource acquisition module 14. The device determination module 11 may also include a second determination submodule 112. The resource obtaining module 14 is configured to obtain resource information of the alternative device. The second determining submodule 112 is configured to, when it is determined that the macro instruction does not include the identifier of the specified device, determine, according to the received macro instruction and resource information of the alternative device, an operating device for executing the macro instruction from the alternative device. Wherein the resource information may comprise a set of instructions contained by the alternative device. The instruction set included in the alternative device may be a set of instructions corresponding to the type of operation of one or more macro-instructions. The more instruction sets that the alternative device contains, the more types of macro instructions the alternative device is able to execute.
In this implementation, when it is determined that the macro instruction does not include the identifier of a specified device, the second determining submodule 112 may determine one or more operating devices capable of executing the macro instruction from the alternative devices. The instruction sets of each operating device so determined include the instruction set corresponding to the macro instruction. For example, when the received macro instruction is a neural network compute macro instruction, an alternative device containing the instruction set corresponding to neural network compute macro instructions may be determined as the operating device, which ensures that it can execute the generated operation instruction.
In one possible implementation, as shown in fig. 2, the device determining module 11 may further include a third determining sub-module 113. When it is determined that the macro instruction includes the identifier of the specified device and the resource of the specified device does not satisfy the execution condition for executing the macro instruction, the third determining sub-module 113 determines the operating device according to the macro instruction and the resource information of the alternative device.
In this implementation, when it is determined that the macro instruction includes the identifier of the specified device and the resources of the specified device do not satisfy the execution condition, the third determining sub-module 113 may conclude that the specified device is not capable of executing the macro instruction. The third determining sub-module 113 may then select the operating device from the alternative devices, determining an alternative device that contains the instruction set corresponding to the macro instruction as the operating device.
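The three cases handled by the first, second, and third determining sub-modules can be summarized in one short Python sketch. The `Device` class, the function names, and the operation types "NN", "VEC", and "MAT" are hypothetical. The sketch also assumes that the specified device appears in the list of alternative devices and that the only execution condition is the presence of the matching instruction set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Device:
    device_id: str
    instruction_sets: frozenset  # operation types the device can execute


def can_execute(device: Device, op_type: str) -> bool:
    # Execution condition: the device contains the matching instruction set.
    return op_type in device.instruction_sets


def determine_operating_devices(op_type: str, specified_id: Optional[str],
                                candidates: list) -> list:
    by_id = {d.device_id: d for d in candidates}
    specified = by_id.get(specified_id) if specified_id else None
    # First sub-module: the specified device meets the execution condition.
    if specified is not None and can_execute(specified, op_type):
        return [specified]
    # Second sub-module (no specified device) and third sub-module (the
    # specified device cannot execute the macro instruction): choose from
    # the alternative devices that hold the required instruction set.
    return [d for d in candidates if can_execute(d, op_type)]


devices = [Device("01", frozenset({"NN", "VEC"})),
           Device("09", frozenset({"VEC"}))]
```

With these example devices, a "VEC" macro instruction that names device 09 stays on device 09, while an "NN" macro instruction that names device 09 falls back to device 01.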
In a possible implementation manner, as shown in fig. 2, the macro instruction may include at least one of an input amount and an output amount, and the instruction generating module 12 is further configured to determine the data amount of the macro instruction and to generate the operation instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the operating device. The data amount of the macro instruction may be determined according to at least one of the input amount and the output amount, and the resource information of the operating device may further include at least one of a storage capacity and a remaining storage capacity.
The storage capacity of the operating device refers to the amount of binary information that the memory of the operating device can hold. The remaining storage capacity of the operating device refers to the storage capacity currently available for instruction execution after the occupied storage capacity is deducted. The resource information of the operating device characterizes its operating capability: the larger the storage capacity and the remaining storage capacity, the stronger the operating capability of the operating device.
In this implementation, the instruction generating module 12 may determine a specific manner of splitting the macro instruction according to the resource information of each operating device, the data size of the macro instruction, and the like, so as to split the macro instruction and generate the operating instruction corresponding to the operating device.
In one possible implementation, as shown in fig. 2, the instruction generating module 12 may include a first instruction generating sub-module 121. The first instruction generating sub-module 121 is configured to, when it is determined that there is one operating device and the resources of that operating device do not meet the capacity condition for executing the macro instruction, split the macro instruction into a plurality of operation instructions according to the operation data amount of the operating device and the data amount of the macro instruction, so that the operating device executes the plurality of operation instructions in sequence. The operation data amount of the operating device may be determined according to the resource information of the operating device, each operation instruction may include at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount may be determined according to the operation data amount.
In this implementation, the operation data amount of the operation device may be determined according to the storage capacity or the remaining storage capacity of the operation device. The capacity condition may be that the amount of operation data of the operation device is greater than or equal to the amount of data of the macro instruction, in other words, that the resource of the operation device does not satisfy the capacity condition for executing the macro instruction may mean that: the amount of operating data of the operating device is less than the amount of data of the macro instruction. The operation input quantity and the operation output quantity are required to be less than or equal to the operation data quantity so as to ensure that the generated operation instruction can be executed by the operation equipment. The operation input amount (or operation output amount) of different operation instructions in the plurality of operation instructions may be the same or different, and the disclosure does not limit this.
In this implementation, when it is determined that there is one operating device and the resources of that operating device meet the capacity condition for executing the macro instruction, the first instruction generating sub-module 121 may either convert the macro instruction directly into one operation instruction or split it into multiple operation instructions, which is not limited in this disclosure.
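The single-device case can be sketched as a simple chunking routine. The function name and the decision to fill every chunk up to the operation data amount are assumptions; the disclosure only requires that each operation input amount and operation output amount not exceed the operation data amount of the operating device.

```python
def split_for_one_device(macro_data_amount: int,
                         operation_data_amount: int) -> list:
    """Return the data amounts of the operation instructions to generate."""
    if operation_data_amount <= 0:
        raise ValueError("operation data amount must be positive")
    if macro_data_amount <= operation_data_amount:
        # Capacity condition met: one operation instruction is enough.
        return [macro_data_amount]
    amounts = []
    remaining = macro_data_amount
    while remaining > 0:
        # Each chunk is at most what the operating device can process.
        size = min(operation_data_amount, remaining)
        amounts.append(size)
        remaining -= size
    return amounts


chunks = split_for_one_device(10, 4)
```

The operating device would then execute the resulting operation instructions one after another.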
In one possible implementation, as shown in fig. 2, the instruction generating module 12 may include a second instruction generating submodule 122. The second instruction generating submodule 122 is configured to, when it is determined that there are a plurality of operating devices, split the macro instruction according to the operation data amount of each operating device and the data amount of the macro instruction, and generate the operation instruction corresponding to each operating device. The operation data amount of each operating device may be determined according to the resource information of that operating device, the operation instruction may include at least one of an operation input amount and an operation output amount, and the operation input amount and the operation output amount are determined according to the operation data amount of the operating device that executes the operation instruction.
In this implementation, the operation input amount and the operation output amount need to be less than or equal to the operation data amount to ensure that the generated operation instruction can be executed by the operating device. The second instruction generating sub-module 122 may generate one or more operation instructions for each operating device according to the operation data amount of that operating device, to be executed by the corresponding operating device.
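One possible way to distribute a macro instruction over several operating devices is sketched below. The round-robin policy, the function name, and the device identifiers "A" and "B" are assumptions, since the disclosure does not prescribe how the data is apportioned. It only requires that each operation instruction stay within the operation data amount of the device that executes it.

```python
def split_across_devices(macro_data_amount: int,
                         operation_data_amounts: dict) -> dict:
    """Map each device identifier to the data amounts of its instructions."""
    if not operation_data_amounts:
        raise ValueError("at least one operating device is required")
    if any(v <= 0 for v in operation_data_amounts.values()):
        raise ValueError("operation data amounts must be positive")
    plan = {dev: [] for dev in operation_data_amounts}
    remaining = macro_data_amount
    while remaining > 0:
        # Hand out one chunk per device in turn (assumed policy).
        for dev, limit in operation_data_amounts.items():
            if remaining == 0:
                break
            size = min(limit, remaining)
            plan[dev].append(size)
            remaining -= size
    return plan


plan = split_across_devices(10, {"A": 4, "B": 2})
```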
In the above implementations, including at least one of the operation input amount and the operation output amount in the operation instruction not only limits the data amount of the operation instruction so that the corresponding operating device can execute it, but also accommodates any special constraints that particular operation instructions place on the operation input amount and/or the operation output amount.
In a possible implementation manner, some operation instructions place no special constraints on the operation input amount and/or the operation output amount and may therefore omit them. A default operation input amount and a default operation output amount may be preset for such instructions, so that when the operating device determines that a received operation instruction contains neither, it uses the defaults as the operation input amount and the operation output amount of that instruction. Presetting these defaults simplifies the generation of operation instructions and shortens their generation time.
In one possible implementation, a default input amount and a default output amount may be preset for each type of macro instruction. When a macro instruction does not include an input amount and an output amount, the corresponding preset defaults may be used as its input amount and output amount. The data amount of the macro instruction is then determined according to the default input amount and/or the default output amount, and the operation instruction is generated according to the data amount of the macro instruction, the macro instruction and the resource information of the operating device. When the macro instruction does not include the input amount and the output amount, the generated operation instruction may omit the operation input amount and the operation output amount, or may include at least one of them. When the operation instruction does not include the operation input amount and/or the operation output amount, the operating device may execute it according to the preset default operation input amount and/or default operation output amount.
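The fallback to preset defaults can be sketched as below. The table `DEFAULT_AMOUNTS`, its values and the function `resolve_amounts` are hypothetical and chosen only for illustration; the disclosure does not specify concrete default values.

```python
# Hypothetical preset defaults per operation type: (input amount, output amount).
DEFAULT_AMOUNTS = {
    "VAND": (64, 64),
    "MADD": (128, 128),
}


def resolve_amounts(op_type, input_amount=None, output_amount=None):
    """Return (input amount, output amount), using presets for missing fields."""
    default_in, default_out = DEFAULT_AMOUNTS[op_type]
    resolved_in = default_in if input_amount is None else input_amount
    resolved_out = default_out if output_amount is None else output_amount
    return resolved_in, resolved_out
```

An instruction that carries both amounts is left unchanged, while an instruction that omits one or both of them picks up the preset value for its operation type.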
In a possible implementation, the instruction generation module 12 may also split the macro instruction according to a preset macro instruction splitting rule to generate the operation instructions. The splitting rule may be determined by combining a conventional way of splitting macro instructions (for example, splitting according to the processing procedure of the macro instruction) with an operation data amount threshold, that is, a data amount that instructions executable by all the candidate devices must not exceed. The macro instruction is split into operation instructions whose operation input amount and operation output amount are each less than or equal to the operation data amount threshold, so that every generated operation instruction can be executed by its operating device, which may be any one of the candidate devices. The threshold may be obtained by comparing the storage capacities (or remaining storage capacities) of all the candidate devices and taking the minimum.
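A minimal sketch of the threshold rule, assuming storage capacities and data amounts are expressed in the same integer unit. The function names are hypothetical, and the even chunking shown is only one possible splitting rule.

```python
def data_amount_threshold(candidate_capacities):
    """Smallest (remaining) storage capacity among all candidate devices."""
    return min(candidate_capacities)


def split_to_threshold(total, threshold):
    """Split `total` into chunk sizes that never exceed `threshold`."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    full_chunks, rest = divmod(total, threshold)
    # Full-size chunks first, followed by one smaller chunk for the remainder.
    return [threshold] * full_chunks + ([rest] if rest else [])
```

Because every chunk is bounded by the smallest capacity, any candidate device can execute any of the resulting operation instructions.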
It should be understood that, the person skilled in the art can set the generation mode of the operation instruction according to the actual needs, and the present disclosure does not limit this.
In this embodiment, the operation instruction generated by the instruction generating module according to the macro instruction may be an instruction to be executed, or may be one or more analyzed instructions obtained by analyzing the instruction to be executed, which is not limited in this disclosure.
In one possible implementation, as shown in fig. 2, the apparatus may further include a queue building module 15. The queue building module 15 is configured to sort the operation instructions according to a queue sorting rule, and build an instruction queue corresponding to the operation device according to the sorted operation instructions.
In this implementation, an instruction queue uniquely corresponding to each operating device may be constructed. The operation instructions may be sent one by one, in queue order, to the operating device that uniquely corresponds to the instruction queue; alternatively, the whole instruction queue may be sent to the operating device, which then executes the operation instructions in queue order. In this way the operating device executes the operation instructions according to the instruction queue, which avoids operation instructions being executed incorrectly, delayed or omitted.
In this implementation, the queue sorting rule may be determined according to information such as a predicted execution time for executing the operation instruction, a generation time of the operation instruction, an operation input amount, an operation output amount, and an operation type related to the operation instruction itself, which is not limited by this disclosure.
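The queue construction can be sketched as follows. The sorting key used here (generation time first, predicted execution time as a tie-breaker) is only one assumed queue sorting rule, and the `RunInstruction` record and its fields are hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class RunInstruction:
    text: str              # textual operation instruction
    device_id: int         # operating device that will execute it
    generated_at: int      # generation time (any monotonically increasing value)
    predicted_time: float  # predicted execution time


def build_queues(instructions):
    """Sort the instructions and build one instruction queue per device."""
    queues = defaultdict(deque)
    ordered = sorted(instructions, key=lambda i: (i.generated_at, i.predicted_time))
    for instruction in ordered:
        queues[instruction.device_id].append(instruction)
    return dict(queues)
```

Each device then receives either its whole queue or the instructions of its queue one at a time, in queue order.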
In one possible implementation, as shown in FIG. 2, the apparatus may also include an instruction dispatch module 16. The instruction dispatch module 16 is configured to send the operation instruction to the operating device, so that the operating device executes the operation instruction.
In this implementation, when the operating device is to execute a single operation instruction, the operation instruction may be sent directly to the operating device. When the operating device is to execute multiple operation instructions, all of them may be sent to the operating device at once, so that the operating device executes them in sequence. Alternatively, the operation instructions may be sent to the corresponding operating device one at a time, the next operation instruction being sent each time the operating device completes the current one. Those skilled in the art may set the manner in which operation instructions are sent to the operating device according to actual needs, and the present disclosure does not limit this.
In one possible implementation, as shown in FIG. 2, the instruction dispatch module 16 may include an instruction assembly submodule 161, an assembly translation submodule 162, and an instruction issue submodule 163. The instruction assembling sub-module 161 is used for generating an assembling file according to the operation instruction. The assembly translation sub-module 162 is used to translate the assembly file into a binary file. The instruction sending submodule 163 is configured to send the binary file to the operating device, so that the operating device executes the operating instruction according to the binary file.
In this way, the data volume of the operation instruction can be reduced, the time for sending the operation instruction to the operating device is saved, and the conversion and execution speed of the macro instruction is improved.
In this implementation manner, after the binary file is sent to the running device, the running device may decode the received binary file to obtain a corresponding running instruction, and execute the obtained running instruction to obtain an execution result.
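The assemble, translate and decode steps can be sketched as a round trip. The binary layout used here (one byte of opcode, one byte of operand count, then little-endian 32-bit operands) and the opcode numbers are assumptions made purely for illustration; the disclosure does not define the binary format.

```python
import struct

# Hypothetical opcode numbering.
OPCODES = {"CONV": 1, "VAND": 2, "MADD": 3, "SADD": 4}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def assemble(op_type, operands):
    """Produce one line of the assembly file, e.g. '@SADD #503, #504, #3'."""
    return "@" + op_type + " " + ", ".join(f"#{value}" for value in operands)


def translate(asm_line):
    """Translate one assembly line into its (assumed) binary encoding."""
    head, _, tail = asm_line.lstrip("@").partition(" ")
    operands = [int(tok.strip().lstrip("#")) for tok in tail.split(",") if tok.strip()]
    return struct.pack(f"<BB{len(operands)}I", OPCODES[head], len(operands), *operands)


def decode(blob):
    """Device side: recover the operation type and operands from the binary."""
    opcode, count = struct.unpack_from("<BB", blob, 0)
    operands = struct.unpack_from(f"<{count}I", blob, 2)
    return MNEMONICS[opcode], list(operands)
```

Decoding the translated bytes yields the same operation type and operands that were assembled, which is the property the operating device relies on when it executes the received binary file.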
In a possible implementation manner, the running device may be one or any combination of a CPU, a GPU, and an embedded Neural-Network Processing Unit (NPU). In this way, the speed at which the device generates the run instruction from the macro instruction is increased.
In one possible implementation, the apparatus may be provided in the CPU and/or the NPU. The process of generating the operation instruction according to the macroinstruction is realized by the CPU and/or the NPU, and more possible modes are provided for realizing the device.
In one possible implementation, the compute macroinstruction refers to a macroinstruction for performing data computations, which may include at least one of machine learning computations, neural network computations, vector logic computations, matrix vector computations, scalar computations, and scalar logic computations.
A neural network computation macro instruction may refer to a macro instruction for computing a neural network algorithm, for example a convolution computation macro instruction or a pooling computation macro instruction, which compute neural network operations such as convolution and pooling. Different types of neural network computation macro instructions correspond to different operation types. For example, the operation type corresponding to the convolution computation macro instruction may be CONV.
A vector logic computation macro instruction may refer to a macro instruction used to perform a logical operation on vectors, for example AND, compare or OR. Different types of vector logic computation macro instructions correspond to different operation types. For example, the operation type corresponding to the vector AND computation macro instruction may be VAND, and the operation type corresponding to the vector OR computation macro instruction may be VOR.
A matrix vector computation macro instruction may refer to a macro instruction used to perform computations on matrices and vectors, for example matrix-times-vector, vector-times-matrix, tensor, matrix addition and matrix subtraction computations. Different types of matrix vector computation macro instructions correspond to different operation types. For example, the operation type corresponding to the matrix addition computation macro instruction is MADD, and the operation type corresponding to the matrix-times-vector computation macro instruction is MMV.
A scalar compute macro instruction may refer to a macro instruction used to perform arithmetic operations on scalars. For example, a macro instruction that performs calculation such as scalar addition, scalar subtraction, scalar multiplication, and scalar division on scalars. Different types of scalar compute macro instructions correspond to different operation types. For example, the operation type corresponding to the scalar subtraction macro is SSUB, and the operation type corresponding to the scalar addition macro is SADD.
A scalar logic computation macro instruction may refer to a macro instruction for performing a logical operation on scalars, for example AND, compare, OR or NOT. Different types of scalar logic computation macro instructions correspond to different operation types. For example, the operation type corresponding to the scalar AND computation macro instruction may be SAND, and the operation type corresponding to the scalar OR computation macro instruction may be SOR.
In one possible implementation, a control macro instruction refers to a macro instruction that controls the instruction stream to jump to a target jump position. An unconditional jump macro instruction may be a macro instruction that controls the instruction stream to jump unconditionally to a specified position. A conditional jump macro instruction may be a macro instruction that controls the instruction stream to jump to a specified position when the jump condition is true.
In one possible implementation, a data transfer macro instruction is a macro instruction for performing transfer processing such as reading and writing data. A read macro instruction may refer to a macro instruction for reading in data, for example reading data from memory into the location where it is to be stored. Depending on the type of data, read macro instructions may include a read neuron macro instruction for reading in neuron data, a read synapse macro instruction for reading in synapse data, and a read scalar macro instruction for reading in scalar data. A write macro instruction may refer to a macro instruction for writing data, for example writing data from its storage location into memory. Depending on the type of data, write macro instructions may include a write neuron macro instruction for writing neuron data, a write synapse macro instruction for writing synapse data, and a write scalar macro instruction for writing scalar data. The neuron data are the input neurons and output neurons of the neural network algorithm, and the synapse data are the weights of the neural network algorithm.
The present disclosure provides an operating device, which is configured to execute the operating instruction generated by the neural network instruction generating apparatus. The operating device comprises a control module and an execution module. The control module is used for acquiring data, the neural network model and the operation instruction, can also be used for analyzing the operation instruction, acquiring a plurality of analysis instructions and sending the plurality of analysis instructions and the data to the execution module. The execution module is used for executing a plurality of analysis instructions according to the data to obtain an execution result.
In one possible implementation, the operating device further includes a storage module. The storage module may include at least one of a register and a cache, and the cache may include a scratch pad cache. The cache may be used to store data. The register may be used to store scalar data within the data.
In one possible implementation, the control module may include an instruction storage sub-module and an instruction processing sub-module. The instruction storage submodule is used for storing the operation instruction. The instruction processing submodule is used for analyzing the operation instruction to obtain a plurality of analysis instructions.
In one possible implementation, the control module may further include a store queue submodule. The storage queue submodule is used for storing an operation instruction queue, and the operation instruction queue comprises an operation instruction and a plurality of analysis instructions which are required to be executed by the operation equipment. And all the instructions in the operation instruction queue are sequentially arranged according to the execution sequence.
In one possible implementation, the execution module may further include a dependency processing submodule. When the dependency processing submodule determines that a first parsed instruction has an association relationship with a zeroth parsed instruction that precedes it, it caches the first parsed instruction in the instruction storage submodule and, after the zeroth parsed instruction has finished executing, extracts the first parsed instruction from the instruction storage submodule and sends it to the execution module.
The first parsed instruction has an association relationship with the preceding zeroth parsed instruction when the first storage address interval, which stores the data required by the first parsed instruction, overlaps the zeroth storage address interval, which stores the data required by the zeroth parsed instruction. Conversely, the two instructions have no association relationship when the first storage address interval and the zeroth storage address interval do not overlap.
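The overlap test that defines the association relationship can be sketched as follows, assuming each storage address interval is given as a half-open pair (start, end); the function name and the interval representation are hypothetical.

```python
def has_association(first_interval, zeroth_interval):
    """True when the two half-open address intervals [start, end) overlap."""
    first_start, first_end = first_interval
    zeroth_start, zeroth_end = zeroth_interval
    # Two intervals overlap when each one starts before the other one ends.
    return first_start < zeroth_end and zeroth_start < first_end
```

When the function returns True, the first parsed instruction would be held back until the zeroth parsed instruction has finished executing; otherwise it can be sent to the execution module directly.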
The present disclosure provides a neural network instruction processing system, which includes the above neural network instruction generating device and the above operating device.
It should be noted that, although the neural network instruction generating device, the operating device, and the neural network instruction processing system are described above by taking the above embodiments as examples, those skilled in the art can understand that the present disclosure should not be limited thereto. In fact, the user can flexibly set each module according to personal preference and/or actual application scene, as long as the technical scheme of the disclosure is met.
Application example
An application example according to the embodiment of the present disclosure is given below in conjunction with "a working process of generating an operation instruction by a neural network instruction generating device according to a macro instruction" as an exemplary application scenario to facilitate understanding of a flow of the neural network instruction generating device. It is to be understood by those skilled in the art that the following application examples are for the purpose of facilitating understanding of the embodiments of the present disclosure only and are not to be construed as limiting the embodiments of the present disclosure.
First, the instruction format of the macro instruction, the instruction format of the instruction to be executed, and the process of the execution device executing the execution instruction are described, and the following is a specific example.
The instruction format of the macro instruction may be the following format example.
The instruction format of the neural network computing macro may be:
Type device_id,input_addr,output_addr,input_h,input_w,input_c,output_h,output_w,output_c,[param1,param2,…]
where Type is the operation type, device_id is the identifier of the designated device, input_addr is the input address, output_addr is the output address, input_h, input_w and input_c are the input neuron scales (i.e., the input amount), output_h, output_w and output_c are the output neuron scales (i.e., the output amount), and param1 and param2 are instruction parameters.
For the neural network computing macro instruction, the operation type, the input address and the output address must be contained, and the operation instruction generated according to the neural network computing macro instruction also must contain the operation type, the operation input address and the operation output address, wherein the operation input address and the operation output address are respectively determined according to the input address and the output address.
Taking the convolution computation macro instruction as an example, its instruction format is: CONV device_id, input_addr, output_addr, input_h, input_w, input_c, output_h, output_w, output_c, kernel, stride, pad. When invoked, the convolution computation macro instruction may look as follows:
@CONV#0,#4,#500,#5,#5,#32,#3,#3,#16,#3,#1,#0
where the operation type of the convolution computation macro instruction is CONV, the designated device is device 0, the input address of the data is address 4, and the output address of the data is address 500. The input amount is 5x5x32 and the output amount is 3x3x16. The convolution kernel size is 3, the stride is 1, and the padding is 0.
In this example, the operation instruction generated from the convolution computation macro instruction is "@CONV #4, #500, #5, #5, #32, #3, #3, #16, #3, #1, #0". After the operating device receives the operation instruction, it executes it as follows: data with an input amount of 5x5x32 is fetched from address 4; a convolution is performed on that data according to the kernel size (3), the stride (1) and the padding (0) in the operation instruction, giving an execution result with a data amount of 3x3x16; and the execution result is stored at output address 500.
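The conversion in this example can be sketched as below for the simple case where no splitting is needed, so that the operation instruction is the macro instruction without its device identifier. The function names are hypothetical. The helper `conv_output_size` applies the usual convolution size formula, which is not stated in the disclosure but reproduces the 5-to-3 reduction of the example.

```python
import re


def macro_to_run(macro):
    """Return (device_id, operation instruction) for a single-device macro."""
    op_type = re.match(r"@\s*([A-Za-z]+)", macro).group(1)
    # Collect every '#<number>' field, tolerating optional spaces after '#'.
    fields = [int(number) for number in re.findall(r"#\s*(\d+)", macro)]
    device_id, operands = fields[0], fields[1:]
    run = "@" + op_type + " " + ", ".join(f"#{value}" for value in operands)
    return device_id, run


def conv_output_size(size, kernel, stride, pad):
    """Standard output extent of a convolution along one spatial dimension."""
    return (size + 2 * pad - kernel) // stride + 1
```

With a 5-wide input, kernel 3, stride 1 and padding 0, `conv_output_size` gives 3, matching the 3x3 output height and width declared in the macro instruction.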
The instruction format of the vector logic computation macro instruction may be:
Type device_id,input_addr,output_addr,input_size,output_size,[param1,param2,…]
where Type is the operation type, device_id is the identifier of the designated device, input_addr is the input address, output_addr is the output address, input_size is the size of the input vector (i.e., the input amount), output_size is the size of the output vector (i.e., the output amount), and param1 and param2 are instruction parameters. The instruction parameters may be the address and length of the second operand.
A vector logic computation macro instruction must contain an operation type, an input address and an output address, and the operation instruction generated from it must likewise contain the operation type, an operation input address and an operation output address, where the operation input address and the operation output address are determined according to the input address and the output address, respectively.
For example, suppose the operation instruction generated from a certain vector logic computation macro instruction is "@VAND #501, #7, #33, #4". After the operating device receives the operation instruction, it executes it as follows: an input vector of size 33 is fetched from input address 501, a logical operation is performed on the input vector to obtain an output vector of size 4, and the output vector of size 4 is stored as the execution result at output address 7.
The instruction format of the matrix vector compute macro may be:
Type device_id,input_addr,output_addr,input_size,output_size,[param1,param2,…]
where Type is the operation type, device_id is the identifier of the designated device, input_addr is the input address, output_addr is the output address, input_size is the size of the input vector (i.e., the input amount), output_size is the size of the output vector (i.e., the output amount), and param1 and param2 are instruction parameters. The instruction parameters may be the address and length of the second operand.
A matrix vector computation macro instruction must contain an operation type, an input address and an output address, and the operation instruction generated from it must likewise contain the operation type, an operation input address and an operation output address, where the operation input address and the operation output address are determined according to the input address and the output address, respectively.
For example, suppose the operation instruction generated from a certain matrix vector computation macro instruction is "@MADD #502, #8, #34, #5". After the operating device receives the operation instruction, it executes it as follows: an input matrix vector of size 34 is fetched from input address 502, a matrix addition is performed on the input matrix vector to obtain an output matrix vector of size 5, and the output matrix vector of size 5 is stored as the execution result at storage address 8.
The instruction format of the scalar compute macro may be:
Type device_id,op1,op2,ans
where Type is the operation type, device_id is the identifier of the designated device, and op1 and op2 are the two operands. ans is the storage address of the computation result of the scalar computation macro instruction, or the identifier of the register that stores the result. The size of the scalars fetched and the size of the scalar produced by the computation may be preset.
A scalar computation macro instruction must contain an operation type, a first operand, a second operand and an output address. The operation instruction generated from the scalar computation macro instruction likewise contains the operation type, a first operation operand, a second operation operand and an operation output address, which are determined according to the first operand, the second operand and the output address, respectively.
For example, suppose the operation instruction generated from a certain scalar computation macro instruction is "@SADD #503, #504, #3". After the operating device receives the operation instruction, it executes it as follows: a first scalar is fetched from register address 503 and a second scalar is fetched from register address 504, the two scalars are added, and the sum is stored as the execution result at register storage address 3.
The instruction format of the scalar logic compute macro instruction may be:
Type device_id,op1,op2,ans
where Type is the operation type, device_id is the identifier of the designated device, and op1 and op2 are the two operands. ans is the storage address of the computation result of the scalar logic computation macro instruction, or the identifier of the register that stores the result. The size of the scalars fetched and the size of the scalar produced by the computation may be preset.
A scalar logic computation macro instruction must contain an operation type, a first operand, a second operand and an output address. The operation instruction generated from the scalar logic computation macro instruction likewise contains the operation type, a first operation operand, a second operation operand and an operation output address, which are determined according to the first operand, the second operand and the output address, respectively.
For example, suppose the operation instruction generated from a certain scalar logic computation macro instruction is "@SAND #703, #704, #8". After the operating device receives the operation instruction, it executes it as follows: a first scalar is fetched from register address 703 and a second scalar is fetched from register address 704, an AND logical operation is performed on the two scalars, and the result is stored as the execution result at register storage address 8.
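The execution of the scalar instructions above can be sketched with a register file modelled as a dictionary from register address to value. That model, the function name `execute_scalar`, and the interpretation of SAND and SOR as bitwise operations on integers are assumptions made for illustration.

```python
import operator

# Operation types named in the text, mapped to Python operators (assumed semantics).
SCALAR_OPS = {
    "SADD": operator.add,
    "SSUB": operator.sub,
    "SAND": operator.and_,
    "SOR": operator.or_,
}


def execute_scalar(op_type, op1, op2, ans, registers):
    """Read two scalars, apply the operation, store and return the result."""
    result = SCALAR_OPS[op_type](registers[op1], registers[op2])
    registers[ans] = result
    return result
```

For "@SADD #503, #504, #3", the call `execute_scalar("SADD", 503, 504, 3, registers)` reads registers 503 and 504 and leaves their sum in register 3.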
The instruction format of the unconditional jump macro may be:
Jump device_id,src
where Jump is the operation type corresponding to the unconditional jump macro instruction, device_id is the identifier of the designated device, and src is the target jump position to which the instruction stream needs to jump. The target jump position may be the length of a register, the address of a register, the identifier of a register, an immediate, etc.
An unconditional jump macro instruction must contain an operation type and a target jump position, and the operation instruction generated from it must likewise contain the operation type and an operation target jump position, where the operation target jump position is determined according to the target jump position.
For example, suppose the operation instruction generated from a certain unconditional jump macro instruction is "@Jump #505". After the operating device receives the operation instruction, it executes it as follows: the current instruction stream jumps to address 505 and continues executing from there.
The instruction format of the conditional jump macro instruction may be:
CB device_id,src,condition
where CB is the operation type corresponding to the conditional jump macro instruction, device_id is the identifier of the designated device, src is the target jump position to which the instruction stream needs to jump, and condition is the jump condition. For example, the condition may be "the value of a register is zero"; when the value of the register is zero, the instruction stream jumps to the target jump position. The target jump position may be the length of a register, the address of a register, the identifier of a register, an immediate, etc.
A conditional jump macro instruction must contain an operation type and a target jump position, and the operation instruction generated from it must likewise contain the operation type and an operation target jump position, where the operation target jump position is determined according to the target jump position.
For example, suppose the operation instruction generated from a certain conditional jump macro instruction is "@CB #506, #h". After the operating device receives the operation instruction, it executes it as follows: it determines whether the jump condition "h" is true and, when "h" is true, the current instruction stream jumps to address 506 and continues executing from there.
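The effect of the two jump instructions on the instruction stream can be sketched as a next-address function. The tuple encoding of instructions is hypothetical, and the jump condition is assumed to be "the named register holds zero", following the example condition given above.

```python
def next_pc(pc, instruction, registers):
    """Return the address of the instruction to execute after `instruction`."""
    op_type = instruction[0]
    if op_type == "Jump":
        # Unconditional jump: always continue at the target position.
        return instruction[1]
    if op_type == "CB":
        _, target, condition_register = instruction
        # Assumed condition: jump when the condition register holds zero.
        if registers[condition_register] == 0:
            return target
        return pc + 1
    # Any other instruction falls through to the next address.
    return pc + 1
```

For "@Jump #505" the function returns 505 regardless of the register contents, whereas for a CB instruction with target 506 it returns 506 only when the condition holds.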
The instruction format of the read neuron macro instruction may be:
NLOAD device_id,src_addr,des_addr,size
where NLOAD is the operation type corresponding to the read neuron macro instruction, device_id is the identifier of the designated device, src_addr is the data read-in address from which neuron data is read, des_addr is the data encryption mode address, which stores the encryption mode required for reading the neuron data, and size is the read-in amount of the neuron data.
For example, suppose the operation instruction generated from a certain read neuron macro instruction is "@NLOAD #505, #506, #9". After the operating device receives the operation instruction, it executes it as follows: the encryption mode of the neuron data is obtained from address 506, and neuron data with a read-in amount of 9 is read from address 505 according to that encryption mode.
The instruction format of the read synapse macro may be:
WLOAD device_id,src_addr,des_addr,size
where WLOAD is the operation type corresponding to the read synapse macro instruction, device_id is the identifier of the designated device, src_addr is the data read-in address from which synapse data is read, des_addr is the data encryption mode address, which stores the encryption mode required for reading the synapse data, and size is the read-in amount of the synapse data.
For example, suppose the operation instruction generated from a certain read synapse macro instruction is "@WLOAD #507, #508, #10". After the operating device receives the operation instruction, it executes it as follows: the encryption mode of the synapse data is obtained from address 508, and synapse data with a read-in amount of 10 is read from address 507 according to that encryption mode.
The instruction format of the read scalar macro instruction may be:
SLOAD device_id,src,des
where SLOAD is the operation type corresponding to the read scalar macro instruction, device_id is the identifier of the designated device, src is the data read-in address from which scalar data is read, and des is the data encryption mode address, which stores the encryption mode required for reading the scalar data.
For example, suppose the operation instruction generated from a certain read scalar macro instruction is "@SLOAD #601, #602". After the operating device receives the operation instruction, it executes it as follows: the encryption mode of the scalar data is obtained from address 602, and the scalar data stored at address 601 is read according to that encryption mode.
The data read-in address and the data encryption mode address contained in the read neuron macro instruction, the read synapse macro instruction and the read scalar macro instruction may be identifiers of registers, such as addresses, numbers or names. A read neuron, read synapse or read scalar macro instruction must contain an operation type, a data read-in address and a data encryption mode address, and the operation instruction generated from it must likewise contain the operation type, an operation data read-in address and an operation data encryption mode address. The operation data read-in address and the operation data encryption mode address are determined according to the data read-in address and the data encryption mode address, respectively.
The instruction format of the write neuron macroinstruction may be:
NSTORE device_id,src_addr,des_addr,size
where NSTORE is the operation type corresponding to the write neuron macro instruction, device_id is the identifier of the designated device, src_addr is the data write address to which neuron data is written, des_addr is the data encryption mode address, which stores the encryption mode required for writing the neuron data, and size is the write amount of the data.
For example, suppose the operation instruction generated from a certain write neuron macro instruction is "@NSTORE #603, #604, #14". After the operating device receives the operation instruction, it executes it as follows: the encryption mode of the neuron data is obtained from address 604, and neuron data with a write amount of 14 is written to address 603 according to that encryption mode.
The instruction format of the write synapse macro instruction may be:
WSTORE device_id,src_addr,des_addr,size
where WSTORE is the operation type corresponding to the write synapse macro instruction, device_id is the identifier of the specified device, src_addr is the data write address for writing synapse data, des_addr is the data encryption mode address storing the encryption mode required for writing the synapse data, and size is the write amount of the data.
For example, suppose the operation instruction generated from a write synapse macro instruction is "@WSTORE #605#606#15". After the operating device receives the operation instruction, execution proceeds as follows: the encryption mode of the synapse data is obtained from address 606, and synapse data with a write amount of 15 is written to address 605 according to that encryption mode.
The instruction format of the write scalar macro instruction may be:
SSTORE device_id,src,des
where SSTORE is the operation type corresponding to the write scalar macro instruction, device_id is the identifier of the specified device, src is the data write address for writing scalar data, and des is the data encryption mode address storing the encryption mode required for writing the scalar data.
For example, suppose the operation instruction generated from a write scalar macro instruction is "@SSTORE #607#608". After the operating device receives the operation instruction, execution proceeds as follows: the encryption mode of the scalar data is obtained from address 608, and the scalar data to be written is written to address 607 according to that encryption mode.
The data write address and the data encryption mode address contained in the write neuron, write synapse, and write scalar macro instructions may be register addresses, numbers, names, or other identifiers. Each of these macro instructions must contain an operation type, a data write address, and a data encryption mode address, and the operation instruction generated from it must likewise contain the operation type, an operation data write address, and an operation data encryption mode address. The operation data write address and the operation data encryption mode address are determined from the data write address and the data encryption mode address, respectively.
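As an illustration of how a write macro instruction might be turned into an operation instruction, the Python sketch below drops the device_id field and carries the remaining fields over unchanged. Two points are assumptions made only for this example: the macro instruction is written as text in the form "NSTORE 01,603,604,14", and the operation addresses are taken to be identical to the macro instruction's addresses (the disclosure only says they are determined from them). The function name `to_operation_instruction` is hypothetical.

```python
# Number of operands that remain after device_id is removed.
WRITE_ARITY = {"NSTORE": 3, "WSTORE": 3, "SSTORE": 2}


def to_operation_instruction(macro: str) -> str:
    """Convert 'OP device_id,a,b[,size]' into '@OP #a#b[#size]'."""
    op, _, operands = macro.strip().partition(" ")
    if op not in WRITE_ARITY:
        raise ValueError(f"unsupported operation type: {op!r}")
    fields = [f.strip() for f in operands.split(",")]
    device_id, *rest = fields  # device_id selects the device and is not forwarded
    if len(rest) != WRITE_ARITY[op]:
        raise ValueError(f"{op} expects {WRITE_ARITY[op]} operands after device_id")
    return "@" + op + " " + "".join("#" + f for f in rest)


print(to_operation_instruction("NSTORE 01,603,604,14"))
print(to_operation_instruction("SSTORE 01,607,608"))
```

Dropping device_id is consistent with the formats of the instructions to be executed, which list the same fields as the macro instructions without the device identifier.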
The instructions to be executed may use the following example formats.
The instruction format of the neural network computation instruction to be executed may be:
Type input_addr,output_addr,input_h,input_w,input_c,output_h,output_w,output_c,[param1,param2,…]
where Type is the operation type, input_addr is the input address, output_addr is the output address, input_h, input_w, and input_c are the input neuron dimensions (i.e., the input amount), output_h, output_w, and output_c are the output neuron dimensions (i.e., the output amount), and param1 and param2 are instruction parameters.
Taking the convolution instruction to be executed as an example, the instruction format is CONV input_addr,output_addr,input_h,input_w,input_c,output_h,output_w,output_c,kernel,stride,pad. When called, the convolution instruction to be executed may be:
@CONV#6,#500,#5,#5,#32,#3,#3,#16,#3,#1,#0
Here the operation type of the convolution instruction to be executed is convolutional neural network calculation. The input address of the data is address 6, and the output address is address 500. The input amount is 5x5x32 and the output amount is 3x3x16. The convolution kernel size is 3, the stride is 1, and the padding is 0.
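The fields of this example can be checked against each other. The Python sketch below decodes the instruction text and applies the standard convolution output-size formula, (input + 2 x pad - kernel) / stride + 1. That formula is general convolution arithmetic rather than something stated in the disclosure, and the names `ConvInstruction`, `parse_conv`, and `expected_output_size` are illustrative.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ConvInstruction:
    input_addr: int
    output_addr: int
    input_shape: Tuple[int, int, int]    # (input_h, input_w, input_c)
    output_shape: Tuple[int, int, int]   # (output_h, output_w, output_c)
    kernel: int
    stride: int
    pad: int


def parse_conv(text: str) -> ConvInstruction:
    """Decode '@CONV#6,#500,...' into named fields."""
    compact = text.replace(" ", "")
    if not compact.startswith("@CONV#"):
        raise ValueError("not a CONV instruction")
    values = [int(v.lstrip("#")) for v in compact[len("@CONV"):].split(",")]
    if len(values) != 11:
        raise ValueError("CONV expects 11 operands")
    return ConvInstruction(values[0], values[1], tuple(values[2:5]),
                           tuple(values[5:8]), values[8], values[9], values[10])


def expected_output_size(size: int, kernel: int, stride: int, pad: int) -> int:
    """Standard convolution output size along one spatial dimension."""
    return (size + 2 * pad - kernel) // stride + 1


conv = parse_conv("@CONV#6,#500,#5,#5,#32,#3,#3,#16,#3,#1,#0")
out_h = expected_output_size(conv.input_shape[0], conv.kernel, conv.stride, conv.pad)
print(conv.output_shape[0] == out_h)
```

With a 5x5 input, kernel 3, stride 1, and padding 0, the formula gives 3, which agrees with the 3x3 output size carried by the instruction.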
The instruction format of the vector logic computation instruction to be executed may be:
Type input_addr,output_addr,input_size,output_size,[param1,param2,…]
where Type is the operation type, input_addr is the input address, output_addr is the output address, input_size is the size of the input vector (i.e., the input amount), output_size is the size of the output vector (i.e., the output amount), and param1 and param2 are instruction parameters. The instruction parameters may be the address and length of the second operand.
The instruction format of the matrix vector calculation instruction to be executed may be:
Type input_addr,output_addr,input_size,output_size,[param1,param2,…]
where Type is the operation type, input_addr is the input address, output_addr is the output address, input_size is the size of the input vector (i.e., the input amount), output_size is the size of the output vector (i.e., the output amount), and param1 and param2 are instruction parameters. The instruction parameters may be the address and length of the second operand.
The instruction format of the scalar compute instruction to be executed may be:
Type op1,op2,ans
where Type is the operation type, op1 and op2 are the two operands, and ans is the storage address of the calculation result of the scalar calculation instruction to be executed, or the identifier of the register that stores the result.
The instruction format of the scalar logic calculation instruction to be executed may be:
Type op1,op2,ans
where Type is the operation type, op1 and op2 are the two operands, and ans is the storage address of the calculation result of the scalar logic calculation instruction to be executed, or the identifier of the register that stores the result.
The instruction format of the unconditional jump instruction to be executed may be:
Jump src
where Jump is the operation type corresponding to the unconditional jump instruction to be executed, and src is the target jump position to which the instruction stream needs to jump.
The instruction format of the conditional jump instruction to be executed may be:
CB src,condition
where CB is the operation type corresponding to the conditional jump instruction to be executed, src is the target jump position to which the instruction stream needs to jump, and condition is the jump condition. For example, the condition may be "the value of the register is zero"; when the value of the register is zero, the condition holds and the instruction stream jumps to the target jump position.
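The effect of the two jump instructions on the instruction stream can be illustrated with a very small interpreter. The Python sketch below assumes that the jump condition is "the named register is zero", as in the example above. The DEC and NOP operations and the register name r0 are hypothetical helpers that exist only to make the loop observable; they are not part of the disclosed instruction set.

```python
def run(program, registers, max_steps=100):
    """Execute the program and return the list of visited positions."""
    pc = 0
    trace = []
    while pc < len(program) and len(trace) < max_steps:
        trace.append(pc)
        op, *args = program[pc]
        if op == "Jump":                 # unconditional: always go to src
            pc = args[0]
        elif op == "CB":                 # conditional: go to src if register is zero
            src, reg = args
            pc = src if registers[reg] == 0 else pc + 1
        elif op == "DEC":                # hypothetical helper: decrement a register
            registers[args[0]] -= 1
            pc += 1
        else:                            # hypothetical NOP
            pc += 1
    return trace


PROGRAM = [
    ("CB", 3, "r0"),    # 0: leave the loop once r0 reaches zero
    ("DEC", "r0"),      # 1: loop body
    ("Jump", 0),        # 2: back to the condition check
    ("NOP",),           # 3: first instruction after the loop
]

print(run(PROGRAM, {"r0": 2}))
```

Starting with r0 equal to 2, the loop body runs twice before the CB instruction transfers control to position 3.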
The instruction format for the read neuron instruction to be executed may be:
NLOAD src_addr,des_addr,size
where NLOAD is the operation type corresponding to the read neuron instruction to be executed, src_addr is the data read address for reading neuron data, des_addr is the data encryption mode address storing the encryption mode required for reading the neuron data, and size is the read amount of the neuron data.
The instruction format of the read synapse instruction to be executed may be:
WLOAD src_addr,des_addr,size
where WLOAD is the operation type corresponding to the read synapse instruction to be executed, src_addr is the data read address for reading synapse data, des_addr is the data encryption mode address storing the encryption mode required for reading the synapse data, and size is the read amount of the synapse data.
The instruction format of the read scalar instruction to be executed may be:
SLOAD src,des
where SLOAD is the operation type corresponding to the read scalar instruction to be executed, src is the data read address for reading scalar data, and des is the data encryption mode address storing the encryption mode required for reading the scalar data.
The instruction format of the write neuron instruction to be executed may be:
NSTORE src_addr,des_addr,size
where NSTORE is the operation type corresponding to the write neuron instruction to be executed, src_addr is the data write address for writing neuron data, des_addr is the data encryption mode address storing the encryption mode required for writing the neuron data, and size is the write amount of the neuron data.
The instruction format of the write synapse instruction to be executed may be:
WSTORE src_addr,des_addr,size
where WSTORE is the operation type corresponding to the write synapse instruction to be executed, src_addr is the data write address for writing synapse data, des_addr is the data encryption mode address storing the encryption mode required for writing the synapse data, and size is the write amount of the synapse data.
The instruction format of the write scalar instruction to be executed may be:
SSTORE src,des
where SSTORE is the operation type corresponding to the write scalar instruction to be executed, src is the data write address for writing scalar data, and des is the data encryption mode address storing the encryption mode required for writing the scalar data.
Figs. 3a and 3b are schematic diagrams illustrating an application scenario of a neural network instruction generating device according to an embodiment of the present disclosure. As shown in Figs. 3a and 3b, there may be a plurality of candidate devices for executing macro instructions, for example CPU-1, CPU-2, …, CPU-n, NPU-1, NPU-2, …, NPU-n, and GPU-1, GPU-2, …, GPU-n. The working process and principle of generating operation instructions from a macro instruction are as follows.
The resource obtaining module 14 acquires resource information of the candidate devices, including each candidate device's remaining storage capacity, storage capacity, and the instruction set it contains, and sends the obtained resource information to the device determining module 11 and the instruction generating module 12.
The device determination module 11 (including a first determination sub-module 111, a second determination sub-module 112, and a third determination sub-module 113)
When a macro instruction is received, the operating device for executing it is determined according to the received macro instruction. For example, the following macro instructions are received; they may come from different platforms.
Macro instruction 1: @ XXX #01 … …
Macro instruction 2: @ SSS #02 … …
Macro instruction 3: @ DDD #04 … …
Macro instruction 4: @ NNN … …
When the first determining sub-module 111 determines that the macro instruction contains the identifier of a specified device and that the specified device contains the instruction set corresponding to the macro instruction, it may determine the specified device as the operating device for executing the macro instruction and send the identifier of the determined operating device to the instruction generating module 12. For example, the first determining sub-module 111 may determine the specified device corresponding to identifier 01, such as CPU-2 (which contains the instruction set corresponding to macro instruction 1), as the operating device for executing macro instruction 1. Likewise, the specified device corresponding to identifier 02, such as CPU-1 (which contains the instruction set corresponding to macro instruction 2), may be determined as the operating device for executing macro instruction 2.
When the third determining sub-module 113 determines that the macro instruction contains the identifier of a specified device but that the specified device does not contain the instruction set corresponding to the macro instruction, it may determine a candidate device that contains the corresponding instruction set as the operating device, and send the identifier of the determined operating device to the instruction generating module 12. For example, when it is determined that the specified device corresponding to identifier 04 does not contain the instruction set corresponding to macro instruction 3, the third determining sub-module 113 may determine candidate devices that contain the instruction set corresponding to the operation type DDD of macro instruction 3, such as NPU-n and NPU-2, as the operating devices for executing macro instruction 3.
When the second determining sub-module 112 determines that the macro instruction carries no identifier of a specified device (the position for the identifier is empty, or the macro instruction has no "identifier of the specified device" field), it may determine the operating device from the candidate devices according to the macro instruction and the resource information of the candidate devices (the determination process is detailed in the description of the second determining sub-module 112), and send the identifier of the determined operating device to the instruction generating module 12. For example, since macro instruction 4 carries no identifier of a specified device, the second determining sub-module 112 may determine, according to the operation type NNN of macro instruction 4 and the resource information of the candidate devices (the instruction sets they contain), an operating device for executing macro instruction 4 from the candidate devices, for example GPU-n (which contains the instruction set corresponding to the operation type NNN).
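The three cases handled by the determination sub-modules can be summarized as a single selection function. The Python sketch below is a simplification under stated assumptions: the only resource information considered is the instruction set, every candidate that supports the operation type is returned in the second and third cases, and the device carrying identifier 04 (called GPU-1 here) is invented for the example, since the text does not name it. `Device` and `determine_operating_devices` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Device:
    name: str
    device_id: Optional[str]       # identifier a macro instruction may specify
    instruction_set: frozenset     # operation types the device can execute


def determine_operating_devices(op_type: str,
                                device_id: Optional[str],
                                candidates: List[Device]) -> List[Device]:
    supporting = [d for d in candidates if op_type in d.instruction_set]
    if device_id is not None:
        specified = next((d for d in candidates if d.device_id == device_id), None)
        if specified is not None and op_type in specified.instruction_set:
            return [specified]     # case of the first determining sub-module 111
        return supporting          # case of the third determining sub-module 113
    return supporting              # case of the second determining sub-module 112


CANDIDATES = [
    Device("CPU-1", "02", frozenset({"SSS"})),
    Device("CPU-2", "01", frozenset({"XXX"})),
    Device("GPU-1", "04", frozenset()),          # hypothetical holder of identifier 04
    Device("NPU-2", None, frozenset({"DDD"})),
    Device("NPU-n", None, frozenset({"DDD"})),
    Device("GPU-n", None, frozenset({"NNN"})),
]


def names(op_type: str, device_id: Optional[str]) -> List[str]:
    return [d.name for d in determine_operating_devices(op_type, device_id, CANDIDATES)]


for macro in [("XXX", "01"), ("SSS", "02"), ("DDD", "04"), ("NNN", None)]:
    print(macro, names(*macro))
```

A fuller implementation would also weigh storage capacity and remaining storage capacity when choosing among supporting candidates, which this sketch leaves out.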
Instruction generation module 12 (including first instruction generation module 121 and second instruction generation module 122)
When it is determined that there is one operating device and its resources do not satisfy the capacity condition for executing the macro instruction, the first instruction generating module 121 splits the macro instruction into a plurality of operation instructions according to the operation data amount of the operating device and the data amount of the macro instruction, and sends the operation instructions to the queue building module 15. For example, operation instructions 2-1, 2-2, …, 2-n are generated according to the data amount of macro instruction 2 and the operation data amount of the operating device CPU-1, and operation instructions 4-1, 4-2, …, 4-n are generated according to the data amount of macro instruction 4 and the operation data amount of the operating device GPU-n.
When it is determined that there is one operating device and its resources satisfy the capacity condition for executing the macro instruction, the first instruction generating module 121 may generate one operation instruction from the macro instruction and send it to the queue building module 15. For example, one operation instruction 1-1 is generated according to the data amount of macro instruction 1 and the operation data amount of the operating device CPU-2.
When it is determined that there are a plurality of operating devices, the second instruction generating module 122 splits the macro instruction according to the operation data amount of each operating device and the data amount of the macro instruction, generates the operation instructions corresponding to each operating device, and sends them to the queue building module 15. For example, according to the data amount of macro instruction 3, the operation data amount of the operating device NPU-n, and the operation data amount of the operating device NPU-2, operation instructions 3-1, 3-2, …, 3-n are generated for NPU-n and operation instructions 3'-1, 3'-2, …, 3'-n are generated for NPU-2.
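The single-device cases can be sketched as a chunking loop. The Python example below assumes that the capacity condition simply means "the macro instruction's data amount does not exceed the device's operation data amount" and that a split produces contiguous pieces; neither detail is fixed by the text, and the function name `split_macro` and the sample numbers are illustrative. How data is divided among several operating devices is not specified here, so that case is left out.

```python
from typing import List, Tuple


def split_macro(macro_id: str, data_amount: int, operation_amount: int) -> List[Tuple[str, int, int]]:
    """Return (instruction name, offset, size) pieces covering data_amount."""
    if data_amount <= 0 or operation_amount <= 0:
        raise ValueError("amounts must be positive")
    pieces = []
    offset = 0
    index = 1
    while offset < data_amount:
        # Each operation instruction handles at most one operation data amount.
        size = min(operation_amount, data_amount - offset)
        pieces.append((f"{macro_id}-{index}", offset, size))
        offset += size
        index += 1
    return pieces


# Capacity condition met: a single operation instruction is produced.
print(split_macro("1", 3, 4))
# Capacity condition not met: the macro instruction is split.
print(split_macro("2", 10, 4))
```

In the second call a data amount of 10 with an operation data amount of 4 yields three operation instructions of sizes 4, 4, and 2.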
When the queue building module 15 receives the operation instructions, it sorts all the operation instructions to be executed by each operating device according to the queue sorting rule, builds a unique instruction queue for each operating device from the sorted operation instructions, and sends the instruction queues to the instruction dispatching module 16. Specifically:
For the operation instruction 1-1 to be executed by the operating device CPU-2, the instruction queue CPU-2″ built for CPU-2 contains only the operation instruction 1-1.
For the operation instructions 2-1, 2-2, …, 2-n to be executed by the operating device CPU-1, the instructions are sorted according to the queue sorting rule, and the instruction queue CPU-1″ for CPU-1 is built from the sorted operation instructions 2-1, 2-2, …, 2-n.
For the operation instructions 3-1, 3-2, …, 3-n to be executed by the operating device NPU-n, the instructions are sorted according to the queue sorting rule, and the instruction queue NPU-n″ for NPU-n is built from the sorted operation instructions 3-n, …, 3-2, 3-1.
For the operation instructions 3'-1, 3'-2, …, 3'-n to be executed by the operating device NPU-2, the instructions are sorted according to the queue sorting rule, and the instruction queue NPU-2″ for NPU-2 is built from the sorted operation instructions 3'-n, …, 3'-2, 3'-1.
For the operation instructions 4-1, 4-2, …, 4-n to be executed by the operating device GPU-n, the instructions are sorted according to the queue sorting rule, and the instruction queue GPU-n″ for GPU-n is built from the sorted operation instructions 4-1, 4-2, …, 4-n.
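Queue construction amounts to grouping operation instructions by operating device and sorting each group. The Python sketch below leaves the queue sorting rule as a pluggable key function, because the rule itself is not given in this passage; the ascending-index rule `index_of` is only a stand-in, and `build_queues` is an illustrative name.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def build_queues(instructions: Iterable[Tuple[str, str]],
                 sort_key: Callable[[str], int]) -> Dict[str, List[str]]:
    """Group (device, instruction) pairs per device and sort each group."""
    grouped = defaultdict(list)
    for device, name in instructions:
        grouped[device].append(name)
    # One queue per operating device, ordered by the supplied sorting rule.
    return {device: sorted(names, key=sort_key) for device, names in grouped.items()}


def index_of(name: str) -> int:
    """Stand-in sorting rule: order by the number after the last hyphen."""
    return int(name.rsplit("-", 1)[1])


received = [("CPU-1", "2-3"), ("CPU-2", "1-1"), ("CPU-1", "2-1"), ("CPU-1", "2-2")]
queues = build_queues(received, index_of)
print(queues)
```

A rule that yields the reversed order shown for NPU-n and NPU-2 could be expressed by passing a key such as `lambda name: -index_of(name)`.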
After the instruction dispatching module 16 receives the instruction queues, it sends the operation instructions in each instruction queue, in order, to the corresponding operating device so that the operating device executes them. For example, the operation instruction 1-1 in the instruction queue CPU-2″ is sent to the corresponding operating device CPU-2. The operation instructions 2-1, 2-2, …, 2-n in the instruction queue CPU-1″ are sent in order to the corresponding operating device CPU-1. The operation instructions 3-n, …, 3-2, 3-1 in the instruction queue NPU-n″ are sent in order to the corresponding operating device NPU-n. The operation instructions 3'-n, …, 3'-2, 3'-1 in the instruction queue NPU-2″ are sent in order to the corresponding operating device NPU-2. The operation instructions 4-1, 4-2, …, 4-n in the instruction queue GPU-n″ are sent in order to the corresponding operating device GPU-n.
After receiving their instruction queues, the operating devices CPU-2, CPU-1, NPU-n, NPU-2, and GPU-n execute the operation instructions in the order in which they are arranged in the queue. The operating device CPU-2 is taken as an example to describe how a received operation instruction is executed. The operating device CPU-2 includes a control module, an execution module, and a storage module. The control module includes an instruction storage submodule, an instruction processing submodule, and a storage queue submodule, and the execution module includes a dependency relationship processing submodule; for details, see the description of the operating device above.
Assume that the operation instruction 1-1 generated from macro instruction 1 is "@ XXX … …". After receiving the operation instruction 1-1, the operating device CPU-2 executes it as follows:
The control module of the operating device CPU-2 obtains the data, the neural network model, and the operation instruction 1-1. The instruction storage submodule stores the operation instruction 1-1. The instruction processing submodule parses the operation instruction 1-1 to obtain a plurality of parsed instructions, such as parsed instruction 0, parsed instruction 1, and parsed instruction 2, and sends them to the storage queue submodule and the execution module. The storage queue submodule stores an operation instruction queue, which contains parsed instruction 0, parsed instruction 1, parsed instruction 2, and the other operation instructions that the operating device CPU-2 needs to execute, all arranged in execution order. For example, the execution order of the parsed instructions is parsed instruction 0, parsed instruction 1, parsed instruction 2, and there is an association relationship between parsed instruction 1 and parsed instruction 0.
After the execution module of the operating device CPU-2 receives the parsed instructions, the dependency relationship processing submodule judges whether an association relationship exists among them. Having determined that parsed instruction 1 has an association relationship with parsed instruction 0, the dependency relationship processing submodule caches parsed instruction 1 in the instruction storage submodule and, after determining that parsed instruction 0 has been executed, extracts parsed instruction 1 from the cache and sends it to the execution module for execution.
The execution module receives and executes parsed instruction 0, parsed instruction 1, and parsed instruction 2 to complete the operation of the operation instruction 1-1.
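The dependency handling described above can be modelled with a small cycle-based simulation in Python. Several modelling choices are assumptions made for illustration: one instruction may start per cycle, each instruction has a fixed latency, an instruction that depends on an unfinished instruction waits (standing in for being cached), and independent later instructions are allowed to start while it waits. The latencies and the function name `schedule` are hypothetical.

```python
from typing import Dict, List


def schedule(instructions: List[str],
             depends_on: Dict[str, str],
             latency: Dict[str, int]) -> Dict[str, int]:
    """Return the start cycle of each instruction."""
    for position, name in enumerate(instructions):
        prerequisite = depends_on.get(name)
        # A prerequisite must appear earlier in the queue, which guarantees termination.
        if prerequisite is not None and prerequisite not in instructions[:position]:
            raise ValueError(f"{name} depends on {prerequisite}, which is not queued earlier")
    start: Dict[str, int] = {}
    finish: Dict[str, int] = {}
    waiting = list(instructions)
    cycle = 0
    while waiting:
        for name in waiting:
            prerequisite = depends_on.get(name)
            ready = prerequisite is None or (
                prerequisite in finish and finish[prerequisite] <= cycle)
            if ready:
                start[name] = cycle
                finish[name] = cycle + latency[name]
                waiting.remove(name)
                break                     # at most one instruction starts per cycle
        cycle += 1
    return start


starts = schedule(
    ["parsed0", "parsed1", "parsed2"],
    depends_on={"parsed1": "parsed0"},
    latency={"parsed0": 3, "parsed1": 1, "parsed2": 1},
)
print(starts)
```

Here parsed instruction 1 is held back until parsed instruction 0 finishes at cycle 3, while parsed instruction 2, which has no association with either, starts at cycle 1.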
For the working process of the above modules, refer to the related description above.
In this way, the device can be used across platforms and has good applicability, fast instruction conversion, high processing efficiency, a low probability of errors, and low development cost in manpower and material resources.
The present disclosure provides a machine learning arithmetic device, which may include one or more of the above neural network instruction generating devices and is configured to acquire data to be operated on and control information from other processing devices and to execute a specified machine learning operation. The machine learning arithmetic device may obtain macro instructions or instructions to be executed from other machine learning arithmetic devices or non-machine-learning arithmetic devices, and transmit the execution result to peripheral devices (also called other processing devices) through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, WiFi interfaces, and servers. When more than one neural network instruction generating device is included, the devices may be linked and transmit data through a specific structure, for example interconnected over a PCIE bus, to support larger-scale neural network operations. In this case, the devices may share one control system or have separate control systems, and they may share memory or each accelerator may have its own memory. In addition, the interconnection may use any interconnection topology.
The machine learning arithmetic device has high compatibility and can be connected with various types of servers through PCIE interfaces.
Fig. 4a shows a block diagram of a combined processing device according to an embodiment of the present disclosure. As shown in fig. 4a, the combined processing device includes the machine learning arithmetic device, the universal interconnection interface, and other processing devices. The machine learning arithmetic device interacts with other processing devices to jointly complete the operation designated by the user.
Other processing devices include one or more types of general-purpose or special-purpose processors, such as central processing units (CPUs), graphics processing units (GPUs), and neural network processors. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning arithmetic device and external data and control: they handle data transfer and perform basic control of the machine learning arithmetic device, such as starting and stopping it. The other processing devices may also cooperate with the machine learning arithmetic device to complete computing tasks.
The universal interconnection interface is used to transmit data and control instructions between the machine learning arithmetic device and the other processing devices. The machine learning arithmetic device acquires required input data from the other processing devices and writes it into the on-chip storage device of the machine learning arithmetic device; it may obtain control instructions from the other processing devices and write them into the on-chip control cache of the machine learning arithmetic device; and it may read data from its storage module and transmit the data to the other processing devices.
Fig. 4b shows a block diagram of a combined processing device according to an embodiment of the present disclosure. In a possible implementation, as shown in Fig. 4b, the combined processing device may further include a storage device connected to the machine learning arithmetic device and to the other processing devices. The storage device is used for storing data of the machine learning arithmetic device and the other processing devices, and is particularly suitable for data required for computation that cannot be entirely held in the internal storage of the machine learning arithmetic device or the other processing devices.
The combined processing device can serve as the SOC (system on chip) of equipment such as mobile phones, robots, unmanned aerial vehicles, and video monitoring equipment, effectively reducing the core area of the control part, increasing the processing speed, and reducing the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, display, mouse, keyboard, network card, or WiFi interface.
The present disclosure provides a machine learning chip, which includes the above machine learning arithmetic device or combined processing device.
The present disclosure provides a machine learning chip package structure, which includes the above machine learning chip.
Fig. 5 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure. As shown in Fig. 5, the board card includes the above machine learning chip package structure or the above machine learning chip. In addition to the machine learning chip 389, the board card may include other supporting components, including but not limited to a memory device 390, an interface device 391, and a control device 392.
The memory device 390 is connected via a bus to the machine learning chip 389 (or to the machine learning chip within the machine learning chip package structure) and is used for storing data. The memory device 390 may include multiple groups of memory cells 393. Each group of memory cells 393 is connected to the machine learning chip 389 via a bus. It is understood that each group of memory cells 393 may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random-Access Memory).
DDR doubles the data rate of SDRAM without increasing the clock frequency, because it allows data to be transferred on both the rising and falling edges of the clock pulse.
In one embodiment, the memory device 390 may include 4 groups of memory cells 393. Each group of memory cells 393 may include a plurality of DDR4 chips (granules). In one embodiment, the machine learning chip 389 may include four 72-bit DDR4 controllers; of the 72 bits, 64 bits are used for data transmission and 8 bits for ECC checking. It is understood that when DDR4-3200 chips are used in each group of memory cells 393, the theoretical data transfer bandwidth may reach 25600 MB/s.
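The 25600 MB/s figure follows from the transfer rate and the data width. The short Python sketch below reproduces the arithmetic, assuming the figure refers to one 64-bit data path (one controller) and using the 1600 MHz I/O clock of DDR4-3200; the function name is illustrative.

```python
def ddr_bandwidth_mb_per_s(io_clock_mhz: int, data_bits: int) -> int:
    """Theoretical bandwidth of one DDR data path in MB/s."""
    transfers_per_clock = 2                      # rising and falling clock edges
    mega_transfers_per_s = io_clock_mhz * transfers_per_clock
    return mega_transfers_per_s * data_bits // 8  # bits to bytes


# DDR4-3200: 1600 MHz I/O clock, 3200 MT/s; 64 of the 72 bits carry data.
bandwidth = ddr_bandwidth_mb_per_s(1600, 64)
print(bandwidth)
```

The 8 ECC bits are excluded because they carry check information rather than payload data.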
In one embodiment, each group of memory cells 393 includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A DDR controller is provided in the machine learning chip 389 to control the data transfer and data storage of each group of memory cells 393.
The control device 392 is electrically connected to the machine learning chip 389 and is used to monitor the state of the machine learning chip 389. Specifically, the machine learning chip 389 and the control device 392 may be electrically connected through an SPI interface. The control device 392 may include a microcontroller unit (MCU). The machine learning chip 389 may include multiple processing chips, multiple processing cores, or multiple processing circuits, and may drive multiple loads. The machine learning chip 389 can therefore be in different operating states, such as heavy load and light load. The control device can regulate the operating states of the processing chips, processing cores, and/or processing circuits in the machine learning chip.
The present disclosure provides an electronic device, which includes the above machine learning chip or board card.
The electronic device may include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a tachograph, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle may include an aircraft, a ship, and/or a road vehicle. The household appliances may include televisions, air conditioners, microwave ovens, refrigerators, electric rice cookers, humidifiers, washing machines, electric lamps, gas cookers, and range hoods. The medical device may include a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus, and/or an electrocardiograph.
Fig. 6 illustrates a flowchart of a neural network instruction generation method according to an embodiment of the present disclosure. As shown in Fig. 6, the method is applied to the neural network instruction generating device described above and includes step S41 and step S42. In step S41, a running device that executes a macro instruction is determined according to the received macro instruction. In step S42, a running instruction is generated according to the macro instruction and the running device.
In one possible implementation, step S41 may include: when it is determined that the macro instruction contains an identifier of a specified device and the resources of the specified device meet the execution condition for executing the macro instruction, determining the specified device as the running device. The execution condition may include: the specified device contains an instruction set corresponding to the macro instruction.
In one possible implementation, the method may further include: acquiring resource information of candidate devices. Step S41 may further include: when it is determined that the macro instruction does not contain an identifier of a specified device, determining the running device for executing the macro instruction from the candidate devices according to the received macro instruction and the resource information of the candidate devices. The resource information may include the instruction set contained in each candidate device.
In one possible implementation, step S41 may further include: when it is determined that the macro instruction contains the identifier of the specified device and the resources of the specified device do not meet the execution condition for executing the macro instruction, determining the running device according to the macro instruction and the resource information of the candidate devices.
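The three branches of step S41 can be illustrated with a minimal Python sketch. The `Device` record, the `supports` helper, and the "first candidate that qualifies" fallback are assumptions made for illustration; the disclosure does not prescribe a data layout or a tie-breaking rule among candidate devices.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Device:
    """Hypothetical resource record for one device."""
    device_id: str
    instruction_set: FrozenSet[str]  # operation types the device can execute


def supports(device: Device, op_type: str) -> bool:
    # Execution condition from the text: the device contains an instruction
    # set corresponding to the macro instruction.
    return op_type in device.instruction_set


def determine_running_device(op_type: str,
                             specified_id: Optional[str],
                             devices: Sequence[Device]) -> Device:
    """Pick the running device for a macro instruction of type op_type."""
    by_id = {d.device_id: d for d in devices}
    specified = by_id.get(specified_id) if specified_id is not None else None
    # Branch 1: a device is specified and meets the execution condition.
    if specified is not None and supports(specified, op_type):
        return specified
    # Branches 2 and 3: no device specified, or the specified device does not
    # qualify. Assumption: take the first candidate that qualifies.
    for device in devices:
        if supports(device, op_type):
            return device
    raise LookupError(f"no device can execute {op_type!r}")
```

For example, with a CPU record that supports only `"scalar_add"` and an NPU record that supports `"conv"` and `"scalar_add"`, a `"conv"` macro instruction that names the CPU falls through to the NPU.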
In one possible implementation, the macro instruction may contain at least one of an input amount and an output amount. Step S42 may include: determining the data amount of the macro instruction, and generating the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device. The data amount may be determined according to at least one of the input amount and the output amount, and the resource information of the running device may further include at least one of a storage capacity and a remaining storage capacity.
In one possible implementation, generating the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device may include: when it is determined that there is one running device and the resources of the running device do not meet the capacity condition for executing the macro instruction, splitting the macro instruction into a plurality of running instructions according to the running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence. The running data amount of the running device may be determined according to the resource information of the running device. Each running instruction may contain at least one of a running input amount and a running output amount, both of which are determined according to the running data amount.
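As a rough illustration of the single-device case, the sketch below splits a data amount into pieces no larger than the running data amount. The function name, the `(offset, amount)` representation, and the greedy "fill each piece to the limit" policy are assumptions; the text only requires that each running amount be determined from the running data amount.

```python
from typing import List, Tuple


def split_for_single_device(data_amount: int,
                            running_amount: int) -> List[Tuple[int, int]]:
    """Return (offset, amount) pairs, one per running instruction.

    data_amount:    total amount the macro instruction has to process.
    running_amount: largest amount the device handles in one instruction.
    """
    if data_amount <= 0 or running_amount <= 0:
        raise ValueError("amounts must be positive")
    pieces = []
    offset = 0
    while offset < data_amount:
        # The last piece may be smaller than the running data amount.
        amount = min(running_amount, data_amount - offset)
        pieces.append((offset, amount))
        offset += amount
    return pieces
```

With a data amount of 5 and a running data amount of 2, this yields three pieces of sizes 2, 2, and 1, which the device would execute in sequence.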
In one possible implementation, generating the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device may include: when it is determined that there are a plurality of running devices, splitting the macro instruction according to the running data amount of each running device and the data amount, and generating the running instruction corresponding to each running device. The running data amount of each running device may be determined according to the resource information of that running device. Each running instruction may contain at least one of a running input amount and a running output amount, both of which are determined according to the running data amount of the running device that executes the running instruction.
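The multi-device case can be sketched in the same spirit. The round-robin assignment below is an assumption chosen for simplicity; the disclosure states only that each device's share is bounded by its own running data amount. The device names and the `(device_id, offset, amount)` tuples are hypothetical.

```python
from typing import Dict, List, Tuple


def split_across_devices(data_amount: int,
                         running_amounts: Dict[str, int]
                         ) -> List[Tuple[str, int, int]]:
    """Return (device_id, offset, amount) triples covering data_amount.

    running_amounts maps each running device to the largest amount it can
    handle in one running instruction.
    """
    if data_amount <= 0:
        raise ValueError("data_amount must be positive")
    if not running_amounts or any(v <= 0 for v in running_amounts.values()):
        raise ValueError("every device needs a positive running amount")
    pieces = []
    offset = 0
    remaining = data_amount
    while remaining > 0:
        # Assumption: visit devices round-robin until all data is assigned.
        for device_id, limit in running_amounts.items():
            if remaining == 0:
                break
            amount = min(limit, remaining)
            pieces.append((device_id, offset, amount))
            offset += amount
            remaining -= amount
    return pieces
```

A capacity-weighted or load-aware policy would fit the same interface; only the loop body would change.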
In one possible implementation, the method may further include: sorting the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device from the sorted running instructions.
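A small sketch of queue construction follows. The text leaves the queue sorting rule open, so it is passed in as a key function; the `(device_id, offset, amount)` instruction tuples and the sort-by-offset rule in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical running instruction: (device_id, offset, amount).
Instruction = Tuple[str, int, int]


def build_instruction_queues(instructions: Iterable[Instruction],
                             sort_key: Callable[[Instruction], int]
                             ) -> Dict[str, List[Instruction]]:
    """Sort running instructions, then group them into one queue per device."""
    queues: Dict[str, List[Instruction]] = defaultdict(list)
    # sorted() is stable, so instructions with equal keys keep their order.
    for instruction in sorted(instructions, key=sort_key):
        queues[instruction[0]].append(instruction)
    return dict(queues)
```

Calling `build_instruction_queues(instrs, sort_key=lambda i: i[1])` orders each device's queue by data offset, which is one plausible rule among many.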
In one possible implementation, the method may further include: receiving an instruction to be executed, and generating the macro instruction according to the determined identifier of the specified device and the instruction to be executed.
In one possible implementation, the method may further include: sending the running instruction to the running device so that the running device executes the running instruction.
In one possible implementation, sending the running instruction to the running device so that the running device executes the running instruction may include: generating an assembly file according to the running instruction; translating the assembly file into a binary file; and sending the binary file to the running device so that the running device executes the running instruction according to the binary file.
In one possible implementation, the resource information may include at least one of the storage capacity of the candidate device, the remaining storage capacity of the candidate device, and the instruction set contained in the candidate device.
In one possible implementation, the running device may be one or any combination of a CPU, a GPU and an NPU.
In one possible implementation, the method may be applied in a CPU and/or NPU.
In one possible implementation, the macro instruction may include at least one of the following: a computation macro instruction, a control macro instruction, and a data transfer macro instruction.
The computation macro instruction may include at least one of a neural network computation macro instruction, a vector logic computation macro instruction, a matrix-vector computation macro instruction, a scalar computation macro instruction, and a scalar logic computation macro instruction. The control macro instruction may include at least one of an unconditional jump macro instruction and a conditional jump macro instruction. The data transfer macro instruction may include at least one of a read macro instruction and a write macro instruction. The read macro instruction may include at least one of a read-neuron macro instruction, a read-synapse macro instruction, and a read-scalar macro instruction. The write macro instruction may include at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction.
In one possible implementation, the macro instruction may contain at least one of the following fields: an identifier of a specified device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, an operand, and an instruction parameter. The running instruction may contain at least one of the following fields: an operation type, an input address, an output address, an operand, and an instruction parameter.
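The two field lists can be pictured as a pair of record types. The sketch below is only one possible layout: the field names, the types, and the choice to make every field mandatory except the device identifier are assumptions, since the text says each instruction contains "at least one of" the listed fields.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class MacroInstruction:
    """Hypothetical macro instruction carrying the fields listed in the text."""
    op_type: str
    input_addr: int
    output_addr: int
    input_amount: int
    output_amount: int
    operands: Tuple[int, ...] = ()
    params: Dict[str, int] = field(default_factory=dict)
    device_id: Optional[str] = None  # identifier of the specified device


@dataclass(frozen=True)
class RunningInstruction:
    """Hypothetical running instruction: no device id and no amounts."""
    op_type: str
    input_addr: int
    output_addr: int
    operands: Tuple[int, ...] = ()
    params: Dict[str, int] = field(default_factory=dict)


def to_running_instruction(macro: MacroInstruction) -> RunningInstruction:
    # Device-independent fields (device id, input/output amounts) are dropped;
    # this models the case where the macro instruction needs no splitting.
    return RunningInstruction(macro.op_type, macro.input_addr,
                              macro.output_addr, macro.operands,
                              dict(macro.params))
```

The running instruction omits the device identifier and the amounts because, by the time it is generated, the running device has already been chosen and the amounts have been resolved against that device's resources.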
In the neural network instruction generation method provided by the embodiments of the present disclosure, the running device for executing a macro instruction is determined according to the received macro instruction, and a running instruction is generated according to the macro instruction and the running device. The method can be used across platforms and offers good applicability, fast instruction conversion, high processing efficiency, a low error probability, and low development labor and material costs.
The present disclosure also provides a neural network instruction execution method, which is applied to the running device described above. The method includes: acquiring, by the running device, data, a neural network model, and a running instruction; parsing the running instruction to obtain a plurality of parsed instructions; and executing the plurality of parsed instructions according to the data to obtain an execution result.
In one possible implementation, the method may further include: storing, by the running device, the data and the scalar data in the data. The running device includes a storage module, the storage module includes any combination of a register and a cache, and the cache includes a temporary cache. The cache is used for storing the data, and the register is used for storing the scalar data in the data.
In one possible implementation, the method may further include:
storing, by the running device, the running instruction;
parsing, by the running device, the running instruction to obtain a plurality of parsed instructions;
and storing, by the running device, a running instruction queue, wherein the running instruction queue contains the running instruction and the plurality of parsed instructions, arranged in the order in which they are to be executed.
In one possible implementation, the method may further include:
when the running device determines that a first parsed instruction has an association with a zeroth parsed instruction that precedes it, caching the first parsed instruction, and executing the cached first parsed instruction after execution of the zeroth parsed instruction is complete.
The first parsed instruction has an association with the zeroth parsed instruction that precedes it when: the first storage address interval, which stores the data required by the first parsed instruction, overlaps the zeroth storage address interval, which stores the data required by the zeroth parsed instruction.
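The association test reduces to an interval-overlap check. The sketch below assumes half-open address intervals `[start, end)`; the text does not state whether the interval bounds are inclusive, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical storage address interval, half-open: [start, end).
Interval = Tuple[int, int]


def has_association(first: Interval, zeroth: Interval) -> bool:
    """True if the two storage address intervals share at least one address."""
    return first[0] < zeroth[1] and zeroth[0] < first[1]


def must_be_cached(first: Interval, unfinished: Sequence[Interval]) -> bool:
    """True if the first parsed instruction has to wait for an earlier one.

    unfinished holds the address intervals of earlier parsed instructions
    whose execution is not yet complete.
    """
    return any(has_association(first, zeroth) for zeroth in unfinished)
```

Under the half-open convention, intervals that merely touch, such as `(0, 16)` and `(16, 32)`, do not overlap, so the later instruction need not be cached.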
In the neural network instruction execution method provided by the embodiments of the present disclosure, the running device acquires data, a neural network model, and a running instruction, parses the running instruction to obtain a plurality of parsed instructions, and executes the plurality of parsed instructions according to the data to obtain an execution result. The method can be used across platforms and offers good applicability, fast instruction conversion, high processing efficiency, a low error probability, and low development labor and material costs.
The present disclosure also provides a neural network instruction processing method, which is applied to a neural network instruction processing system that includes the neural network instruction generating device and the running device described above. The method includes the neural network instruction generation method applied to the neural network instruction generating device and the neural network instruction execution method applied to the running device. The method can be used across platforms and offers good applicability, fast instruction conversion, high processing efficiency, a low error probability, and low development labor and material costs.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that acts and modules referred to are not necessarily required by the disclosure.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present disclosure, it should be understood that the disclosed apparatus may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into modules is only a division by logical function, and an actual implementation may divide them differently: a plurality of modules may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or modules through some interfaces, and may be electrical or take another form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present disclosure may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a form of hardware or a form of a software program module.
The integrated modules, if implemented in the form of software program modules and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present disclosure. The aforementioned memory includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the associated hardware. The program may be stored in a computer-readable memory, which may include: a flash memory disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
Claims (27)
1. An apparatus for neural network instruction generation, the apparatus comprising:
a device determining module, configured to determine, according to a received macro instruction, a running device for executing the macro instruction;
an instruction generating module, configured to generate a running instruction according to the macro instruction and the running device, so that the running device executes the received running instruction;
wherein the device determining module includes:
a first determining submodule, configured to determine a specified device as the running device when it is determined that the macro instruction contains an identifier of the specified device and the resources of the specified device meet an execution condition for executing the macro instruction,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the macro instruction.
2. The apparatus of claim 1, further comprising:
a resource obtaining module, configured to obtain resource information of candidate devices,
wherein the device determining module further includes:
a second determining submodule, configured to determine, when it is determined that the macro instruction does not contain the identifier of the specified device, the running device for executing the macro instruction from the candidate devices according to the received macro instruction and the resource information of the candidate devices,
wherein the resource information includes the instruction set contained in each candidate device.
3. The apparatus of claim 2, wherein the device determination module further comprises:
a third determining submodule, configured to determine, when it is determined that the macro instruction contains the identifier of the specified device and the resources of the specified device do not meet the execution condition for executing the macro instruction, the running device according to the macro instruction and the resource information of the candidate devices.
4. The apparatus of any of claims 1-3, wherein the macro instruction contains at least one of an input amount and an output amount,
the instruction generating module is further configured to determine a data amount of the macro instruction, and to generate the running instruction according to the data amount of the macro instruction, the macro instruction, and resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
5. The apparatus of claim 4, wherein the instruction generation module comprises:
a first instruction generation submodule, configured to split the macro instruction into a plurality of running instructions according to a running data amount of the running device and the data amount when it is determined that there is one running device and the resources of the running device do not meet a capacity condition for executing the macro instruction, so that the running device executes the plurality of running instructions in sequence,
wherein the running data amount of the running device is determined according to the resource information of the running device, each running instruction contains at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount.
6. The apparatus of claim 4, wherein the instruction generation module comprises:
a second instruction generation submodule, configured to split the macro instruction according to a running data amount of each running device and the data amount when it is determined that there are a plurality of running devices, to generate a running instruction corresponding to each running device,
wherein the running data amount of each running device is determined according to the resource information of that running device, the running instruction contains at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
7. The apparatus of claim 1, further comprising:
a queue construction module, configured to sort the running instructions according to a queue sorting rule and to construct an instruction queue corresponding to the running device from the sorted running instructions.
8. The apparatus of claim 1, further comprising:
a macro instruction generating module, configured to receive an instruction to be executed and to generate the macro instruction according to the determined identifier of the specified device and the instruction to be executed.
9. The apparatus of claim 1, further comprising:
an instruction dispatching module, configured to send the running instruction to the running device,
wherein the instruction dispatching module includes:
an instruction assembly submodule, configured to generate an assembly file according to the running instruction;
an assembly translation submodule, configured to translate the assembly file into a binary file;
and an instruction sending submodule, configured to send the binary file to the running device so that the running device executes the running instruction according to the binary file.
10. The apparatus of claim 1, wherein:
the running device is one or any combination of a CPU, a GPU, and an NPU;
the apparatus is arranged in a CPU and/or an NPU;
the macro instruction includes at least one of the following: a computation macro instruction, a control macro instruction, and a data transfer macro instruction,
wherein the computation macro instruction includes at least one of a neural network computation macro instruction, a vector logic computation macro instruction, a matrix-vector computation macro instruction, a scalar computation macro instruction, and a scalar logic computation macro instruction,
the control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction,
the data transfer macro instruction includes at least one of a read macro instruction and a write macro instruction, the read macro instruction including at least one of a read-neuron macro instruction, a read-synapse macro instruction, and a read-scalar macro instruction, and the write macro instruction including at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction;
the macro instruction contains at least one of the following fields: an identifier of a specified device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, an operand, and an instruction parameter,
the running instruction contains at least one of the following fields: the operation type, the input address, the output address, the operand, and the instruction parameter.
11. A machine learning computing device, the device comprising:
one or more neural network instruction generating devices as claimed in any one of claims 1 to 10, configured to obtain data to be operated on and control information from other processing devices, perform a specified machine learning operation, and transmit the execution result to the other processing devices through an I/O interface;
when the machine learning computing device comprises a plurality of the neural network instruction generating devices, the plurality of neural network instruction generating devices can be connected through a specific structure and transmit data among themselves;
the plurality of neural network instruction generating devices are interconnected and transmit data through a Peripheral Component Interconnect Express (PCIe) bus, so as to support larger-scale machine learning operations; the plurality of neural network instruction generating devices share the same control system or have their own control systems; the plurality of neural network instruction generating devices share a memory or have their own memories; and the plurality of neural network instruction generating devices are interconnected in any interconnection topology.
12. A combined processing device, the device comprising:
the machine learning computing device of claim 11, a universal interconnect interface, and other processing devices;
wherein the machine learning computing device interacts with the other processing devices to jointly complete a computing operation specified by a user.
13. The combined processing device according to claim 12, further comprising: a storage device connected to the machine learning computing device and the other processing devices, respectively, for storing data of the machine learning computing device and the other processing devices.
14. A machine learning chip, the machine learning chip comprising:
the machine learning computing device according to claim 11 or the combined processing device according to claim 12.
15. An electronic device, characterized in that the electronic device comprises:
the machine learning chip of claim 14.
16. A board card, the board card comprising: a storage device, an interface device, a control device, and the machine learning chip of claim 14;
wherein the machine learning chip is connected to the storage device, the control device, and the interface device, respectively;
the storage device is used for storing data;
the interface device is used for data transmission between the machine learning chip and external equipment;
and the control device is used for monitoring the state of the machine learning chip.
17. The board card of claim 16, wherein:
the storage device includes a plurality of groups of storage units, each group of storage units being connected to the machine learning chip through a bus, and the storage units being DDR SDRAM;
the machine learning chip includes a DDR controller for controlling data transmission to and data storage in each storage unit;
the interface device is a standard PCIe interface.
18. A neural network instruction generation method, the method comprising:
determining, according to a received macro instruction, a running device for executing the macro instruction;
generating a running instruction according to the macro instruction and the running device, so that the running device executes the received running instruction;
wherein determining, according to the received macro instruction, the running device for executing the macro instruction includes:
when it is determined that the macro instruction contains an identifier of a specified device and the resources of the specified device meet an execution condition for executing the macro instruction, determining the specified device as the running device,
wherein the execution condition includes: the specified device contains an instruction set corresponding to the macro instruction.
19. The method of claim 18, further comprising:
acquiring resource information of candidate devices,
wherein determining, according to the received macro instruction, the running device for executing the macro instruction includes:
when it is determined that the macro instruction does not contain the identifier of the specified device, determining the running device for executing the macro instruction from the candidate devices according to the received macro instruction and the resource information of the candidate devices,
wherein the resource information includes the instruction set contained in each candidate device.
20. The method of claim 19, wherein determining, according to the received macro instruction, the running device for executing the macro instruction comprises:
when it is determined that the macro instruction contains the identifier of the specified device and the resources of the specified device do not meet the execution condition for executing the macro instruction, determining the running device according to the macro instruction and the resource information of the candidate devices.
21. The method of any of claims 18-20, wherein the macro instruction contains at least one of an input amount and an output amount,
and generating the running instruction according to the macro instruction and the running device comprises:
determining a data amount of the macro instruction, and generating the running instruction according to the data amount of the macro instruction, the macro instruction, and resource information of the running device,
wherein the data amount is determined according to at least one of the input amount and the output amount, and the resource information of the running device further includes at least one of a storage capacity and a remaining storage capacity.
22. The method of claim 21, wherein generating the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device comprises:
when it is determined that there is one running device and the resources of the running device do not meet a capacity condition for executing the macro instruction, splitting the macro instruction into a plurality of running instructions according to a running data amount of the running device and the data amount, so that the running device executes the plurality of running instructions in sequence,
wherein the running data amount of the running device is determined according to the resource information of the running device, each running instruction contains at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount.
23. The method of claim 21, wherein generating the running instruction according to the data amount of the macro instruction, the macro instruction, and the resource information of the running device comprises:
when it is determined that there are a plurality of running devices, splitting the macro instruction according to a running data amount of each running device and the data amount, to generate a running instruction corresponding to each running device,
wherein the running data amount of each running device is determined according to the resource information of that running device, the running instruction contains at least one of a running input amount and a running output amount, and the running input amount and the running output amount are determined according to the running data amount of the running device that executes the running instruction.
24. The method of claim 18, further comprising:
sorting the running instructions according to a queue sorting rule, and constructing an instruction queue corresponding to the running device from the sorted running instructions.
25. The method of claim 18, further comprising:
receiving an instruction to be executed, and generating the macro instruction according to the determined identifier of the specified device and the instruction to be executed.
26. The method of claim 18, further comprising:
sending the running instruction to the running device,
wherein sending the running instruction to the running device includes:
generating an assembly file according to the running instruction;
translating the assembly file into a binary file;
and sending the binary file to the running device so that the running device executes the running instruction according to the binary file.
27. The method of claim 18, wherein:
the running device is one or any combination of a CPU, a GPU, and an NPU;
the method is applied in a CPU and/or an NPU;
the macro instruction includes at least one of the following: a computation macro instruction, a control macro instruction, and a data transfer macro instruction,
wherein the computation macro instruction includes at least one of a neural network computation macro instruction, a vector logic computation macro instruction, a matrix-vector computation macro instruction, a scalar computation macro instruction, and a scalar logic computation macro instruction,
the control macro instruction includes at least one of an unconditional jump macro instruction and a conditional jump macro instruction,
the data transfer macro instruction includes at least one of a read macro instruction and a write macro instruction, the read macro instruction including at least one of a read-neuron macro instruction, a read-synapse macro instruction, and a read-scalar macro instruction, and the write macro instruction including at least one of a write-neuron macro instruction, a write-synapse macro instruction, and a write-scalar macro instruction;
the macro instruction contains at least one of the following fields: an identifier of a specified device for executing the macro instruction, an operation type, an input address, an output address, an input amount, an output amount, an operand, and an instruction parameter,
the running instruction contains at least one of the following fields: the operation type, the input address, the output address, the operand, and the instruction parameter.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811221806.8A CN111079925B (en) | 2018-10-19 | 2018-10-19 | Operation method, device and related product |
PCT/CN2019/111852 WO2020078446A1 (en) | 2018-10-19 | 2019-10-18 | Computation method and apparatus, and related product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811221806.8A CN111079925B (en) | 2018-10-19 | 2018-10-19 | Operation method, device and related product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111079925A (en) | 2020-04-28 |
CN111079925B (en) | 2021-04-09 |
Family
ID=70309258
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811221806.8A Active CN111079925B (en) | 2018-10-19 | 2018-10-19 | Operation method, device and related product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111079925B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113704687B (en) * | 2020-05-21 | 2024-04-05 | 杭州海康威视数字技术股份有限公司 | Tensor calculation operation method, device and operation system |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106390400A (en) * | 2015-07-28 | 2017-02-15 | 精工爱普生株式会社 | Calculation apparatus, calculation system, calculation method, and recording medium |
CN106557332A (en) * | 2016-11-30 | 2017-04-05 | 上海寒武纪信息科技有限公司 | Multiplexing method and device for an instruction generation process |
CN106991476A (en) * | 2016-01-20 | 2017-07-28 | 南京艾溪信息科技有限公司 | Apparatus and method for performing artificial neural network forward operation |
CN107016175A (en) * | 2017-03-23 | 2017-08-04 | 中国科学院计算技术研究所 | Automated design method, device, and optimization method applicable to neural network processors |
CN107315571A (en) * | 2016-04-27 | 2017-11-03 | 北京中科寒武纪科技有限公司 | Apparatus and method for performing fully connected layer neural network forward operation |
CN107450972A (en) * | 2017-07-04 | 2017-12-08 | 阿里巴巴集团控股有限公司 | Dispatching method, device, and electronic equipment |
CN107923741A (en) * | 2016-02-15 | 2018-04-17 | 欧姆龙株式会社 | Arithmetic unit, operation method and operation program |
CN108052347A (en) * | 2017-12-06 | 2018-05-18 | 北京中科睿芯智能计算产业研究院有限公司 | Device and method for executing instruction selection, and instruction mapping method |
CN108364061A (en) * | 2018-02-13 | 2018-08-03 | 北京旷视科技有限公司 | Arithmetic unit, operation execution equipment, and operation execution method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8909576B2 (en) * | 2011-09-16 | 2014-12-09 | International Business Machines Corporation | Neuromorphic event-driven neural computing architecture in a scalable neural network |
Also Published As
Publication number | Publication date |
---|---|
CN111079925A (en) | 2020-04-28 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111079909B (en) | Operation method, system and related product | |
CN111079925B (en) | Operation method, device and related product | |
CN111078284B (en) | Operation method, system and related product | |
CN111078291B (en) | Operation method, system and related product | |
CN111079916B (en) | Operation method, system and related product | |
CN111079924B (en) | Operation method, system and related product | |
CN111078282B (en) | Operation method, device and related product | |
CN111078283B (en) | Operation method, device and related product | |
CN111079907B (en) | Operation method, device and related product | |
CN111079912B (en) | Operation method, system and related product | |
CN111078281B (en) | Operation method, system and related product | |
CN111079913B (en) | Operation method, device and related product | |
CN111079911B (en) | Operation method, system and related product | |
CN111078280B (en) | Operation method, device and related product | |
CN111079910B (en) | Operation method, device and related product | |
CN111078285B (en) | Operation method, system and related product | |
CN111079915B (en) | Operation method, device and related product | |
CN111078125B (en) | Operation method, device and related product | |
CN111078293B (en) | Operation method, device and related product | |
CN111079914B (en) | Operation method, system and related product | |
CN111381872A (en) | Operation method, device and related product | |
CN111325331B (en) | Operation method, device and related product | |
CN111399905B (en) | Operation method, device and related product | |
CN111400341B (en) | Scalar lookup instruction processing method and device and related product | |
CN112346707A (en) | Instruction processing method and device and related product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||