CN113537476B - Computing device and related product - Google Patents
Computing device and related product
- Publication number
- CN113537476B (application CN202010301181.7A)
- Authority
- CN
- China
- Prior art keywords
- tensor
- scaling
- input
- index
- activation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The present disclosure relates to a computing device and related products. The computing device may include a processor with a plurality of processing units for executing sequences of instructions, and a memory unit for storing data, which may include a random access memory and a register file. Multiple processing units in the processor may share part of the memory space, such as part of the RAM space and the register file, while also having separate memory spaces of their own. The computing device can improve computing performance when running a neural network model.
Description
Technical Field
The present disclosure relates to the technical field of artificial intelligence, and in particular to a computing device and related products.
Background
In the field of artificial intelligence technology, neural network algorithms have recently become among the most popular machine learning algorithms, achieving very good results in many fields, such as image recognition, speech recognition, and natural language processing.
Disclosure of Invention
Accordingly, it is desirable to provide a computing device and related products that address the problems of the related art described below.
According to an aspect of the present disclosure, there is provided a computing device including:
a processing unit, a first storage unit, and a second storage unit;
the computing device receives externally input data and a binary executable file, the input data including an activation table and an input tensor to be processed;
the computing device stores the activation table in the first storage unit and the input tensor in the second storage unit;
and when the processing unit runs the binary executable file, it reads the input tensor from the second storage unit, scales the input tensor to obtain a scaled input tensor, looks up the activation table in the first storage unit to activate the scaled input tensor and obtain an intermediate activation result, performs scaling recovery on the intermediate activation result to obtain an activation result, and outputs the activation result to the second storage unit.
According to another aspect of the present disclosure, there is provided an artificial intelligence chip comprising the computing device described above.
According to another aspect of the present disclosure, there is provided an electronic device comprising the artificial intelligence chip described above.
According to another aspect of the present disclosure, there is provided a board including a storage device, an interface device, a control device, and the artificial intelligence chip described above;
wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively;
the storage device is used for storing data;
the interface device is used for implementing data transmission between the artificial intelligence chip and an external device;
and the control device is used for monitoring the state of the artificial intelligence chip.
According to the embodiments of the present disclosure, the input tensor fed to an activation operator can be scaled so that the elements of the scaled input tensor fall within a suitable range; an intermediate activation result is obtained by activating the scaled input tensor through an activation-table lookup, and the intermediate activation result is then restored to obtain the final activation result. Because the input tensor is scaled into the range that the existing activation table can cover, there is no need to set up a large number of activation tables to match the actual range of the elements in the input tensor; the operator performance degradation caused by traversing many activation tables is thereby avoided, and the performance of the activation operator is improved.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features and aspects of the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a schematic diagram of a processor according to an embodiment of the present disclosure.
Fig. 2 shows a block diagram of a computing device according to an embodiment of the present disclosure.
Fig. 3 shows a flow chart of an activation process according to an embodiment of the present disclosure.
Fig. 4 shows a block diagram of a board according to an embodiment of the present disclosure.
Fig. 5 illustrates a block diagram of an electronic device 800, according to an embodiment of the disclosure.
Fig. 6 illustrates a block diagram of an electronic device 1900 according to an embodiment of the disclosure.
Detailed Description
The following description of the technical solutions in the embodiments of the present disclosure will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are some embodiments of the present disclosure, but not all embodiments. Based on the embodiments in this disclosure, all other embodiments that a person skilled in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.
It should be understood that the terms "first," "second," "third," and "fourth," etc. in the claims, specification, and drawings of this disclosure are used for distinguishing between different objects and not for describing a particular sequential order. The terms "comprises" and "comprising" when used in the specification and claims of this disclosure are taken to specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present disclosure is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in this disclosure and in the claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the present disclosure and claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Nonlinear operators are used throughout deep neural networks: ReLU (Rectified Linear Unit), Sigmoid, and TanH (hyperbolic tangent) as activation functions; Sqrt and Rsqrt in normalization operators; Log in loss-function operators; and Sin/Cos for positional encoding. Currently, in the related art, an interpolation table is used to implement these nonlinear operators: the number axis is divided into a plurality of segments, and within each segment the curve of the nonlinear function is approximated by a straight line represented by the parameter pair (k, b), where k and b are the slope and intercept of that segment's line. The (k, b) pairs of all segments are assembled into a table, called the activation table, which is queried during computation. To cover a larger region of the number axis, multiple activation tables may be used to fit the nonlinear operator; in that case, all the activation tables must be traversed during computation to obtain the final result.
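As an illustration of the interpolation-table approach described above, the following is a minimal sketch, not the patented implementation; the segment count, the interval [-1, 1], and the use of tanh are assumptions for the example:

```python
import numpy as np

def build_activation_table(fn, lo, hi, n_segments):
    """Fit fn over [lo, hi] with n_segments straight lines, one (k, b) per segment."""
    edges = np.linspace(lo, hi, n_segments + 1)
    table = []
    for x0, x1 in zip(edges[:-1], edges[1:]):
        k = (fn(x1) - fn(x0)) / (x1 - x0)   # slope of the chord over the segment
        b = fn(x0) - k * x0                  # intercept of that chord
        table.append((x0, x1, k, b))
    return table

def lookup_activate(table, x):
    """Activate a scalar x by finding its segment and evaluating k*x + b."""
    for x0, x1, k, b in table:
        if x0 <= x <= x1:
            return k * x + b
    raise ValueError("x falls outside the range covered by the activation table")

table = build_activation_table(np.tanh, -1.0, 1.0, 64)
approx = lookup_activate(table, 0.5)   # close to tanh(0.5)
```

With 64 segments over [-1, 1] the chord approximation error is far below 1e-3, which is why a finer table (or more tables) is needed only when the input range grows.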
The range of values of the elements in the input tensor of a nonlinear operator is not fixed, so a relatively large numerical range usually has to be supported to meet the requirements of deep neural networks. But the larger the supported range, the more activation tables are needed. Since computing the final result requires traversing all the activation tables, operator performance drops noticeably as the number of activation tables grows, and the many tables also occupy more memory space, further degrading performance.
To solve the above technical problems, the present disclosure provides a computing device. The computing device may include a processor, which may include a processing unit, and the processor may be a general-purpose processor such as a CPU (Central Processing Unit) or an artificial intelligence processor (IPU) for performing artificial intelligence operations. The artificial intelligence operations may include machine learning operations, brain-like operations, and the like, where the machine learning operations include neural network operations, k-means operations, support vector machine operations, and the like. The artificial intelligence processor may include, for example, one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processor), and an FPGA (Field-Programmable Gate Array) chip. The present disclosure does not limit the specific type of processor.
In one possible implementation, the processors referred to in this disclosure may include multiple processing units, each of which may independently execute the various tasks assigned to it, such as a convolution task, a pooling task, or a fully connected task. The present disclosure does not limit the tasks the processing units run.
Fig. 1 shows a schematic diagram of a processor according to an embodiment of the present disclosure. As shown in fig. 1, the processor 100 may include a plurality of processing units 101 for executing sequences of instructions, and a memory unit 102 for storing data, which may include a random access memory (RAM) and a register file. The processing units 101 in the processor 100 may share part of the memory space, for example part of the RAM space and the register file, while also having their own separate memory spaces.
Fig. 2 illustrates a block diagram of a computing device according to an embodiment of the present disclosure. As illustrated in fig. 2, the computing device includes a processing unit, a first storage unit, and a second storage unit. The storage unit 102 in fig. 1 may serve as an example of the first storage unit; the processing unit and the first storage unit in fig. 2 may be part of a processor, and the processor may include a plurality of processing units.
In one possible implementation, the first storage unit may be an on-chip SRAM, and the second storage unit may be an off-chip DDR memory.
The computing device may receive externally input data and instructions, where the input data may include an activation table and an input tensor to be processed; the computing device stores the activation table in the first storage unit and the input tensor in the second storage unit.
When the processing unit runs the instructions, it reads the input tensor from the second storage unit, scales the input tensor to obtain a scaled input tensor, looks up the activation table in the first storage unit to activate the scaled input tensor and obtain an intermediate activation result, performs scaling recovery on the intermediate activation result to obtain the activation result, and outputs the activation result to the second storage unit.
Before the operation, the computing device may load the packed input data and instructions and store them to the corresponding locations, for example storing the activation table and the instructions in the SRAM and the input tensor in the DDR.
In one possible implementation, the input tensor may be tensor data extracted from an image, audio, or video. Specifically, an initial input tensor may be extracted from the image, audio, or video data, and the input tensor may be obtained by operating on the initial input tensor with a neural network; the specific operations may include convolution, full connection, and the like. That is, the input tensor fed to the activation operator may be the result of the layer operations performed before the activation layer of the neural network. The processing unit may execute the instructions to process the input data. As described above, the processing may include scaling, activation, and scaling recovery. Scaling refers to reducing the data in the input tensor to within the range that the activation table can cover, for example by dividing the data in the input tensor by the scaling multiple, or multiplying it by a scaling coefficient. Activation proceeds in the same way the activation table is traversed in the related art mentioned above. Scaling recovery refers to processing the activated data again according to the scaling multiple to obtain data equivalent to the unscaled case, for example by multiplying by the scaling multiple.
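The scale-activate-restore sequence just described can be sketched as follows. This is a hedged illustration: tanh stands in for the activation-table lookup, the covered interval [-1, 1] and the scaling coefficient are assumptions, and the simple reciprocal recovery shown here is only one possible recovery rule; the appropriate recovery depends on the operator.

```python
import numpy as np

ACTIVATION_LO, ACTIVATION_HI = -1.0, 1.0   # range the activation table covers (assumed)

def activate_with_scaling(x, scale):
    """Scale x into the table's range, activate, then undo the scaling."""
    scaled = x * scale                       # scaling process
    assert np.all((scaled >= ACTIVATION_LO) & (scaled <= ACTIVATION_HI))
    intermediate = np.tanh(scaled)           # stands in for the table lookup
    return intermediate / scale              # scaling recovery (operator-dependent)

x = np.array([-8.0, -2.0, 0.0, 2.0, 8.0])   # elements outside [-1, 1]
y = activate_with_scaling(x, scale=1.0 / 8.0)
```

Without the scaling step, activating the values ±8 would require tables covering a range eight times wider.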
According to the embodiments of the present disclosure, the input tensor fed to the activation operator can be scaled so that the elements of the scaled input tensor fall within a suitable range; the intermediate activation result is obtained by activating the scaled input tensor through an activation-table lookup, and the intermediate result is then restored to obtain the final activation result. With the data activation method of this embodiment, the elements of the input tensor are scaled into the range that the existing activation table can cover, and there is no need to set up a large number of activation tables to match the actual range of the elements; the operator performance degradation caused by traversing many activation tables is thereby avoided, and the performance of the activation operator is improved.
In one possible implementation, the scaling process includes:
the processing unit calculates a first scaling tensor according to the input tensor, the activation interval, and the scaling coefficient, and performs element-wise multiplication of the first scaling tensor and the input tensor to obtain the scaled input tensor;
the scaling recovery process includes: the processing unit calculates a first recovery tensor according to the input tensor, the activation interval, and the recovery coefficient, and uses the first recovery tensor to perform recovery processing on the intermediate activation result, obtaining the activation result of the input tensor;
where the activation interval represents the numerical range that the activation table can cover, the scaling coefficient represents the multiple by which the data in the input tensor is scaled, and the recovery coefficient represents the multiple by which the intermediate activation result is scaled back.
In one possible implementation, elements that fall within the range of values covered by the activation table may be activated directly by looking up the activation table; elements that do not fall within this range cannot be activated by a lookup alone and require scaling.
In one possible implementation, the activation interval may be represented as (l, h), where l is the lower boundary and h is the upper boundary; l and h may have the same magnitude and opposite signs, for example l = -1 and h = 1. It should be noted that the above representation of the activation interval is only one example of the present disclosure and does not limit it in any way; those skilled in the art will understand that l and h are not limited to the listed examples.
The scaling coefficient is a parameter that scales the elements of the input tensor that do not fall within the activation interval; as described above, it represents the multiple by which the data of the input tensor is scaled. The scaling coefficient in embodiments of the present disclosure may be preset; for example, it may be a parameter carried as an immediate in the scaling instruction. The recovery coefficient may be determined from the scaling coefficient or may likewise be preset; it is used to obtain the first recovery tensor, with which the intermediate activation result is processed to obtain the final activation result.
Fig. 3 shows a flow chart of an activation process according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 3, the procedure from scaling to recovery may include:
Step S11: calculating the first scaling tensor and the first recovery tensor according to the input tensor, the activation interval, the scaling coefficient, and the recovery coefficient;
Step S12: performing element-wise multiplication of the first scaling tensor and the input tensor to obtain the scaled input tensor;
Step S13: looking up the activation table to activate the scaled input tensor, obtaining the intermediate activation result;
Step S14: performing recovery processing on the intermediate activation result with the first recovery tensor to obtain the activation result of the input tensor.
That is, the processing unit may calculate the first scaling tensor and the first recovery tensor together. The specific process may include: determining a first index tensor and a second index tensor according to the input tensor and the activation interval; calculating a second scaling tensor according to the second index tensor and the scaling coefficient, and a second recovery tensor according to the second index tensor and the recovery coefficient; and calculating the first scaling tensor from the first index tensor and the second scaling tensor, and the first recovery tensor from the first index tensor and the second recovery tensor.
The first index tensor, the second index tensor, and the input tensor all have the same shape, where the same shape means the same number of dimensions and the same number of elements in each dimension.
In one possible implementation, the values of the elements in the first and second index tensors may be determined as follows: if an element of the input tensor lies within the activation interval, the element at the corresponding position in the first index tensor is 1 and the element at the corresponding position in the second index tensor is 0; if an element of the input tensor lies outside the activation interval, the corresponding element in the first index tensor is 0 and the corresponding element in the second index tensor is 1.
This determination is merely one example of the present disclosure and does not limit it; for instance, either the first or the second index tensor may be determined first, and its elements then flipped to obtain the other index tensor. That is, if an element of the input tensor lies within the activation interval, the element at the corresponding position in the first index tensor is 1; otherwise it is 0. The second index tensor is obtained by flipping the first, where flipping means setting elements that were 0 to 1 and elements that were 1 to 0.
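The index-tensor construction above can be sketched as follows; the activation interval [-10, 10] and the input values are hypothetical:

```python
import numpy as np

def index_tensors(x, lo, hi):
    """First index tensor: 1 where x is inside [lo, hi], 0 elsewhere.
    Second index tensor: the flip of the first (0 -> 1, 1 -> 0)."""
    first = ((x >= lo) & (x <= hi)).astype(x.dtype)
    second = 1.0 - first
    return first, second

x = np.array([[3.0, -120.0], [255.0, -7.5]])   # hypothetical input values
M, N = index_tensors(x, -10.0, 10.0)
# M marks the in-range elements 3.0 and -7.5; N marks -120.0 and 255.0
```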
For example, assume the activation interval is [-10, 10] and the input tensor is I, where I is tensor data extracted from image data and is laid out as NHWC (batch, height, width, channels); for image data the channels may be the three RGB (red, green, blue) channels. The data activation method of the present disclosure is described using one channel of one picture as an example, for example the R channel, denoted IR. From IR and the activation interval, the corresponding first index tensor M and second index tensor N can be determined as described above: M is 1 at positions where IR falls within [-10, 10] and 0 elsewhere, and N is its flip.
Each element of the second index tensor may be multiplied by the scaling coefficient to obtain the second scaling tensor P, and by the recovery coefficient to obtain the second recovery tensor Q.
For example, if the scaling coefficient is α and the recovery coefficient is β, then P = αN and Q = βN. The scaling coefficient may be a fixed preset value, such as 1/100 or 1/50, or may be adjusted according to the input tensor and the activation interval; in the example above, a pixel value ranges over 0-255 and the activation interval is [-10, 10], so the scaling coefficient α may be set to 10/255. The recovery coefficient may be set to 1/α, or to another value depending on the actual activation function; this disclosure does not limit it.
The first scaling tensor is the sum of the first index tensor and the second scaling tensor, and the first recovery tensor is the sum of the first index tensor and the second recovery tensor.
Denote the first scaling tensor by A and the first recovery tensor by B. Continuing the example above, A = M + P = M + αN and B = M + Q = M + βN; that is, A is 1 at positions whose input element already lies in the activation interval and α elsewhere, and B is 1 at those positions and β elsewhere.
Element-wise (para-position) multiplication means multiplying the elements at the same positions. In the example above, the first scaling tensor A is element-wise multiplied with the input tensor IR to obtain the scaled input tensor: elements of IR that already lie in the activation interval are multiplied by 1 and left unchanged, while the remaining elements are multiplied by α and thereby scaled into the interval.
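Putting steps S11 through S14 together, here is a hedged end-to-end sketch with hypothetical values: an identity segment stands in for the real activation-table lookup, recovery uses β = 1/α, and the interval and data are assumptions, not values from the patent.

```python
import numpy as np

LO, HI = -10.0, 10.0          # hypothetical activation interval
alpha = 10.0 / 255.0          # scaling coefficient for pixel data in 0..255
beta = 1.0 / alpha            # recovery coefficient

IR = np.array([[3.0, -120.0], [255.0, -7.5]])   # hypothetical R-channel data

# S11: index tensors and the first scaling/recovery tensors
M = ((IR >= LO) & (IR <= HI)).astype(IR.dtype)  # 1 inside the interval
N = 1.0 - M                                      # flipped: 1 outside
A = M + alpha * N                                # first scaling tensor
B = M + beta * N                                 # first recovery tensor

# S12: element-wise multiplication scales every element into the interval
scaled = A * IR

# S13: activation (an identity segment stands in for the table lookup)
intermediate = scaled

# S14: element-wise recovery undoes the scaling
result = B * intermediate
```

With the identity placeholder, recovery reproduces the original values exactly, which checks that A and B are consistent inverses of each other at every position.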
because the input tensor is scaled, elements in the input tensor can fall into an activation interval, and the scaled input tensor can be activated by looking up an activation table to obtain an activation tensor Y as an intermediate activation result.
Because the input tensor is scaled before the activation, the restoration processing can be performed after the activation to obtain a final activation result, and the whole activation process is completed.
The specific recovery processing depends on the activation operator. For some operators, such as Sigmoid and TanH (hyperbolic tangent), the activation result of the input tensor is obtained by element-wise multiplication of Y with the first recovery tensor B; for others, such as the Log used in loss-function operators, it is obtained by element-wise addition of Y and the first recovery tensor B.
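A brief check of why recovery is additive for the Log operator: since log(αx) = log(x) + log(α), adding -log(α) at the scaled positions restores the unscaled result. The values below are hypothetical; positions that were never scaled get an additive recovery value of 0.

```python
import numpy as np

alpha = 1.0 / 50.0
x = np.array([2.0, 400.0])                 # second element assumed out of range
scaled = np.array([2.0, 400.0 * alpha])    # only the out-of-range element scaled

Y = np.log(scaled)                         # intermediate activation result
B = np.array([0.0, -np.log(alpha)])        # additive first recovery tensor
result = Y + B                             # element-wise addition recovers log(x)
```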
With the computing device of the present disclosure, scaling the input tensor allows fewer activation tables to cover a larger input range, maximally extending the range of legal inputs without noticeably increasing operator storage or computation cost.
It should be noted that, for the sake of simplicity of description, the foregoing process embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the present disclosure is not limited by the order of actions described, as some steps may take place in other order or simultaneously in accordance with the present disclosure. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the acts and modules referred to are not necessarily required by the present disclosure.
It should be further noted that, although the steps in the flowchart are sequentially shown as indicated by arrows, the steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least a portion of the steps in the flowcharts may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order in which the sub-steps or stages are performed is not necessarily sequential, and may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps or other steps.
In one possible implementation manner, an artificial intelligence chip is also disclosed, which includes the above-mentioned computing device.
In one possible implementation, a board is also disclosed, which includes a storage device, an interface device, a control device, and the artificial intelligence chip described above. The artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively; the storage device is used for storing data; the interface device is used for implementing data transmission between the artificial intelligence chip and an external device; and the control device is used for monitoring the state of the artificial intelligence chip.
Fig. 4 shows a block diagram of a board according to an embodiment of the present disclosure, and referring to fig. 4, the board may further include other mating components in addition to the chip 389, where the mating components include, but are not limited to: a memory device 390, an interface device 391 and a control device 392;
The storage device 390 is connected to the artificial intelligence chip via a bus and is used for storing data. The storage device may include multiple groups of storage units 393, each group of which is connected to the artificial intelligence chip via a bus. It is understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
DDR can double the speed of SDRAM without increasing the clock frequency, since it allows data to be read on both the rising and falling edges of the clock pulse; DDR is thus twice as fast as standard SDRAM. In one embodiment, the storage device may include four groups of storage units, and each group may include a plurality of DDR4 granules (chips). In one embodiment, the artificial intelligence chip may include four 72-bit DDR4 controllers, in which 64 bits are used to transfer data and 8 bits are used for ECC checking. It is understood that when DDR4-3200 granules are employed in each group of storage units, the theoretical data-transfer bandwidth can reach 25600 MB/s.
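The 25600 MB/s figure above follows directly from the DDR4-3200 transfer rate and the 64-bit data path of each controller; a quick sanity check (a sketch, with the controller width taken from the paragraph above):

```python
# Sanity check of the quoted 25600 MB/s theoretical bandwidth per controller.
# DDR4-3200 performs 3200 mega-transfers per second; of the 72-bit
# controller width, 64 bits (8 bytes) carry data and 8 bits carry ECC.
megatransfers_per_s = 3200         # MT/s for DDR4-3200
data_bytes_per_transfer = 64 // 8  # 64 data bits -> 8 bytes
bandwidth_mb_s = megatransfers_per_s * data_bytes_per_transfer
print(bandwidth_mb_s)  # 25600
```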
In one embodiment, each group of storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling the DDR is provided in the chip and is used for controlling the data transmission and data storage of each storage unit.
The interface device is electrically connected to the artificial intelligence chip and is used for implementing data transmission between the artificial intelligence chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIe interface: the data to be processed is transferred from the server to the chip through the standard PCIe interface, thereby implementing data transfer. Preferably, when a PCIe 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may be another interface; the present disclosure does not limit the specific form of the other interface, as long as the interface unit can implement the transfer function. In addition, the computation results of the artificial intelligence chip are transmitted back to the external device (e.g., a server) by the interface device.
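The ~16000 MB/s figure quoted for PCIe 3.0 x16 is the raw-signalling number (8 GT/s per lane across 16 lanes), before the roughly 1.5% 128b/130b encoding overhead; a short check:

```python
# Raw theoretical bandwidth of a PCIe 3.0 x16 link, as quoted above.
# PCIe 3.0 signals at 8 GT/s per lane; ignoring 128b/130b encoding
# overhead, 16 lanes give 8 * 16 / 8 = 16 GB/s = 16000 MB/s.
gt_per_s_per_lane = 8   # PCIe 3.0 raw signalling rate per lane (GT/s)
lanes = 16
raw_mb_s = gt_per_s_per_lane * lanes * 1000 // 8  # bits -> bytes, GT -> MT
print(raw_mb_s)  # 16000
```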
The control device is electrically connected to the artificial intelligence chip and is used for monitoring the state of the artificial intelligence chip. Specifically, the artificial intelligence chip and the control device may be electrically connected through an SPI interface. The control device may include a micro controller unit (MCU). Since the artificial intelligence chip may include a plurality of processing chips, processing cores, or processing circuits and can drive a plurality of loads, it may be in different working states such as heavy-load and light-load. The control device can regulate the working states of the processing chips, processing cores, and/or processing circuits in the artificial intelligence chip.
In one possible implementation, an electronic device is disclosed that includes the artificial intelligence chip described above. The electronic device includes a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, an intelligent terminal, a cell phone, a vehicle recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device. The vehicle includes an aircraft, a ship, and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical device includes a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus, and/or an electrocardiograph.
Fig. 5 illustrates a block diagram of an electronic device 800, according to an embodiment of the disclosure. For example, electronic device 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 5, an electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power component 806 provides power to the various components of the electronic device 800. The power component 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect the on/off state of the electronic device 800 and the relative positioning of components such as its display and keypad; it may also detect a change in the position of the electronic device 800 or one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in its temperature. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 804 including computer program instructions executable by processor 820 of electronic device 800 to perform the above-described methods.
Fig. 6 illustrates a block diagram of an electronic device 1900 according to an embodiment of the disclosure. For example, electronic device 1900 may be provided as a server. Referring to FIG. 6, electronic device 1900 includes a processing component 1922 that further includes one or more processors and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. Further, processing component 1922 is configured to execute instructions to perform the methods described above.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of electronic device 1900 to perform the methods described above.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments. The technical features of the foregoing embodiments may be combined arbitrarily; for brevity, not all possible combinations of these technical features are described, but all such combinations should be considered as falling within the scope of this disclosure.
The foregoing may be better understood in light of the following clauses:
Clause A1. A computing device, comprising:
a processing unit, a first storage unit, and a second storage unit;
wherein the computing device receives input data and a binary executable file from an external source,
the computing device stores an activation table in the first storage unit and stores an input tensor in the second storage unit;
and when the processing unit runs the binary executable file, it reads the input tensor from the second storage unit, performs scaling processing on the input tensor to obtain a scaled input tensor, looks up the activation table in the first storage unit to activate the scaled input tensor to obtain an intermediate activation result, performs scaling recovery processing on the intermediate activation result to obtain an activation result, and outputs the activation result to the second storage unit.
Clause A2. The computing device of clause A1, wherein the scaling processing includes:
the processing unit calculates a first scaling tensor according to the input tensor, an activation interval, and a scaling coefficient, and performs element-wise multiplication of the first scaling tensor and the input tensor to obtain the scaled input tensor;
the scaling recovery processing includes: the processing unit calculates a first recovery tensor according to the input tensor, the activation interval, and a recovery coefficient, and uses the first recovery tensor to perform recovery processing on the intermediate activation result to obtain the activation result of the input tensor;
wherein the activation interval represents the numerical range covered by the activation table, the scaling coefficient represents the factor by which data in the input tensor is scaled, and the recovery coefficient represents the factor by which the intermediate activation result is scaled back.
Clause A3. The computing device of clause A2, wherein the processing unit is configured to:
determine a first index tensor and a second index tensor according to the input tensor and the activation interval;
calculate a second scaling tensor according to the second index tensor and the scaling coefficient, and calculate a second recovery tensor according to the second index tensor and the recovery coefficient;
and calculate the first scaling tensor from the first index tensor and the second scaling tensor, and calculate the first recovery tensor from the first index tensor and the second recovery tensor.
Clause A4. The computing device of clause A2 or A3, wherein the first index tensor and the second index tensor have the same shape as the input tensor;
if any element of the input tensor lies within the activation interval, the element at the corresponding position in the first index tensor is 1 and the element at the corresponding position in the second index tensor is 0;
if any element of the input tensor does not lie within the activation interval, the element at the corresponding position in the first index tensor is 0 and the element at the corresponding position in the second index tensor is 1.
Clause A5. The computing device of clause A3, wherein the processing unit is configured to multiply the elements of the second index tensor by the scaling coefficient to obtain the second scaling tensor, and to multiply the elements of the second index tensor by the recovery coefficient to obtain the second recovery tensor.
Clause A6. The computing device of any of clauses A3-A5, wherein the first scaling tensor is the sum of the first index tensor and the second scaling tensor, and the first recovery tensor is the sum of the first index tensor and the second recovery tensor.
Clause A7. The computing device of any of clauses A1-A6, wherein the input tensor is tensor data extracted from an image, audio, or video.
Clause A8. The computing device of any of clauses A1-A6, wherein the first storage unit is an on-chip static random access memory (SRAM), and the second storage unit is an off-chip double data rate (DDR) memory.
Clause A9. An artificial intelligence chip, comprising the computing device of any of clauses A1-A8.
Clause A10. An electronic device, comprising the artificial intelligence chip of clause A9.
Clause A11. A board card, comprising: a storage device, an interface device, a control device, and the artificial intelligence chip of clause A9;
wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively;
the storage device is used for storing data;
the interface device is used for implementing data transmission between the artificial intelligence chip and an external device;
and the control device is used for monitoring the state of the artificial intelligence chip.
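The scale-activate-recover flow of clauses A1-A6 can be sketched end to end in NumPy. This is an illustrative sketch, not the patented implementation: the activation function (ReLU), the table size N, the activation interval [-4, 4], and the scaling coefficient 0.25 are all assumed values. ReLU is a convenient choice here because it commutes with positive scaling, so multiplying by the recovery coefficient restores the exact result for out-of-range elements.

```python
import numpy as np

# Assumed parameters (illustrative, not taken from the patent):
T = 4.0                # activation interval [-T, T] covered by the table
N = 8193               # number of activation-table entries
scale = 0.25           # scaling coefficient for out-of-range elements
recover = 1.0 / scale  # recovery coefficient

grid = np.linspace(-T, T, N)
table = np.maximum(grid, 0.0)  # activation table (ReLU, assumed)

x = np.array([-10.0, -2.0, 0.5, 3.0, 12.0])  # input tensor

# Clause A4: index tensors with the same shape as the input tensor.
first_idx = (np.abs(x) <= T).astype(x.dtype)  # 1 inside the interval, else 0
second_idx = 1.0 - first_idx                  # complement

# Clauses A5-A6: per-element scaling tensor; clause A2: element-wise multiply.
first_scaling = first_idx + second_idx * scale
scaled_x = first_scaling * x

# Activation by table lookup: quantize the scaled input to a table index.
idx = np.round((scaled_x + T) / (2 * T) * (N - 1)).astype(int)
idx = np.clip(idx, 0, N - 1)
intermediate = table[idx]

# Scaling recovery (clauses A2, A5-A6): undo the scaling where it was applied.
first_recovery = first_idx + second_idx * recover
result = intermediate * first_recovery

print(result)  # matches ReLU(x): 0, 0, 0.5, 3, 12
```

In-range elements pass through with factor 1, while for out-of-range elements the second index tensor selects the scaling and recovery coefficients, so every element of the scaled input lands inside the range the table covers.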
The foregoing has described the embodiments of the present disclosure in detail, using specific examples to illustrate its principles and implementations; the above description of the embodiments is intended only to aid understanding of the method of the present disclosure and its core ideas. Meanwhile, those skilled in the art may, based on the ideas of the present disclosure, make modifications or variations to the specific embodiments and the scope of application, and such modifications or variations remain within the protection scope of the present disclosure. In view of the foregoing, this description should not be construed as limiting the disclosure.
Claims (9)
1. A computing device, characterized in that the computing device comprises:
a processing unit, a first storage unit, and a second storage unit;
wherein the computing device receives input data and a binary executable file from an external source,
the computing device stores an activation table in the first storage unit and stores an input tensor in the second storage unit;
when the processing unit runs the binary executable file, it reads the input tensor from the second storage unit, performs scaling processing on the input tensor to obtain a scaled input tensor, looks up the activation table in the first storage unit to activate the scaled input tensor to obtain an intermediate activation result, performs scaling recovery processing on the intermediate activation result to obtain an activation result, and outputs the activation result to the second storage unit;
the processing unit is configured to:
determine a first index tensor and a second index tensor according to the input tensor and an activation interval;
calculate a second scaling tensor according to the second index tensor and a scaling coefficient, and calculate a second recovery tensor according to the second index tensor and a recovery coefficient;
and calculate a first scaling tensor from the first index tensor and the second scaling tensor, and calculate a first recovery tensor from the first index tensor and the second recovery tensor;
wherein the first index tensor and the second index tensor have the same shape as the input tensor;
if any element of the input tensor lies within the activation interval, the element at the corresponding position in the first index tensor is 1 and the element at the corresponding position in the second index tensor is 0;
if any element of the input tensor does not lie within the activation interval, the element at the corresponding position in the first index tensor is 0 and the element at the corresponding position in the second index tensor is 1.
2. The computing device of claim 1, wherein the scaling processing comprises:
the processing unit calculates the first scaling tensor according to the input tensor, the activation interval, and the scaling coefficient, and performs element-wise multiplication of the first scaling tensor and the input tensor to obtain the scaled input tensor;
the scaling recovery processing comprises: the processing unit calculates the first recovery tensor according to the input tensor, the activation interval, and the recovery coefficient, and uses the first recovery tensor to perform recovery processing on the intermediate activation result to obtain the activation result of the input tensor;
wherein the activation interval represents the numerical range covered by the activation table, the scaling coefficient represents the factor by which data in the input tensor is scaled, and the recovery coefficient represents the factor by which the intermediate activation result is scaled back.
3. The computing device of claim 1, wherein the processing unit is configured to multiply the elements of the second index tensor by the scaling coefficient to obtain the second scaling tensor, and to multiply the elements of the second index tensor by the recovery coefficient to obtain the second recovery tensor.
4. The computing device of claim 1, wherein the first scaling tensor is the sum of the first index tensor and the second scaling tensor, and the first recovery tensor is the sum of the first index tensor and the second recovery tensor.
5. The computing device of any of claims 1-4, wherein the input tensor is tensor data extracted from an image, audio, or video.
6. The computing device according to any one of claims 1 to 4, wherein the first storage unit is an on-chip static random access memory (SRAM), and the second storage unit is an off-chip double data rate (DDR) memory.
7. An artificial intelligence chip, characterized in that the chip comprises the computing device of any one of claims 1-6.
8. An electronic device, characterized in that the electronic device comprises the artificial intelligence chip of claim 7.
9. A board card, characterized in that the board card comprises: a storage device, an interface device, a control device, and the artificial intelligence chip of claim 7;
wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively;
the storage device is used for storing data;
the interface device is used for implementing data transmission between the artificial intelligence chip and an external device;
and the control device is used for monitoring the state of the artificial intelligence chip.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010301181.7A CN113537476B (en) | 2020-04-16 | 2020-04-16 | Computing device and related product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113537476A CN113537476A (en) | 2021-10-22 |
CN113537476B true CN113537476B (en) | 2024-09-06 |
Family
ID=78120266
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108009106A (en) * | 2016-10-27 | 2018-05-08 | 谷歌公司 | Neural computing module |
CN110147879A (en) * | 2019-04-03 | 2019-08-20 | 中国科学院计算技术研究所 | A kind of activation device and method for neural network processor |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107341542B (en) * | 2016-04-29 | 2021-06-11 | 中科寒武纪科技股份有限公司 | Apparatus and method for performing recurrent neural networks and LSTM operations |
US10643297B2 (en) * | 2017-05-05 | 2020-05-05 | Intel Corporation | Dynamic precision management for integer deep learning primitives |
WO2018217829A1 (en) * | 2017-05-23 | 2018-11-29 | Intel Corporation | Methods and apparatus for enhancing a neural network using binary tensor and scale factor pairs |
US10691975B2 (en) * | 2017-07-19 | 2020-06-23 | XNOR.ai, Inc. | Lookup-based convolutional neural network |
CN108388446A (en) * | 2018-02-05 | 2018-08-10 | 上海寒武纪信息科技有限公司 | Computing module and method |
US20210216871A1 (en) * | 2018-09-07 | 2021-07-15 | Intel Corporation | Fast Convolution over Sparse and Quantization Neural Network |
US20190392296A1 (en) * | 2019-06-28 | 2019-12-26 | John Brady | Hardware agnostic deep neural network compiler |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||