CN109615073B - Neural network model construction method, device and storage medium - Google Patents
Neural network model construction method, device and storage medium
- Publication number
- CN109615073B CN109615073B CN201811463775.7A CN201811463775A CN109615073B CN 109615073 B CN109615073 B CN 109615073B CN 201811463775 A CN201811463775 A CN 201811463775A CN 109615073 B CN109615073 B CN 109615073B
- Authority
- CN
- China
- Prior art keywords
- array
- neural network
- network model
- classification result
- cell
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention discloses a method for constructing a neural network model for realizing image classification, which comprises the following steps: s1, constructing a unit structure search network, a system structure search network, an image training set and a random coding array; s2, generating a neural network model by using a unit structure search network, a system structure search network and a random coding array; s3, inputting the image training set into a neural network model to obtain an actual classification result; s4, judging whether the actual classification result meets the preset condition, if not, performing the step S5; s5, updating a unit structure search network and a system structure search network according to the actual classification result and the theoretical classification of the image training set; s6, repeating S2-S5 until a judgment that the actual classification result meets the preset condition is made at S4. The method disclosed by the invention converts the original search space into two spaces of unit structure search and system structure search, and the optimal structure of the system is searched in an automatic learning mode, so that the flexibility of the generated model architecture is enhanced.
Description
Technical Field
The present invention relates to the field of image classification, and more particularly, to a method and an apparatus for constructing a neural network model, and a readable storage medium.
Background
A neural network model is a model structure that can be stacked arbitrarily from basic components such as FC (fully connected) layers, convolution layers, pooling layers and activation functions, where the output of one component serves as the input of the next; different component connection modes and hyper-parameter configurations perform differently in different application scenarios. Neural Architecture Search (NAS) aims to search for an optimal neural network model from a collection of neural network components. Common search methods include random search, Bayesian optimization, evolutionary algorithms, reinforcement learning, gradient-based algorithms, and the like.
Zoph et al. proposed in 2016 to search for an optimal network structure using an RNN, but the search space was too large and the search took 22,400 GPU working days. In 2017 the approach was changed to using reinforcement learning to search for the best-performing convolution unit (conv cell) of a CNN and then building a better network from these conv cells, but the algorithm still needs 2,000 GPU working days to obtain the current best architecture on CIFAR-10 and ImageNet. Many acceleration methods have since been proposed, such as weight sharing among multiple architectures and differentiable architecture search based on gradient descent over a continuous search space. However, these algorithms still set the overall network architecture manually, so the flexibility of the architecture remains limited.
Therefore, the current neural architecture search algorithm has the following problems:
(1) because there are too many possible combinations, the search space is huge and the computational cost is enormous;
(2) the model architecture is designed manually and lacks flexibility.
Disclosure of Invention
In view of the above, in order to overcome at least one aspect of the above problems, an embodiment of the present invention provides a method for constructing a neural network model for implementing classification of an image, including the following steps:
S1, constructing a unit structure search network, a system structure search network, an image training set and a random coding array;
S2, generating the neural network model by using the unit structure search network, the architecture search network and the random coding array;
S3, inputting the image training set into the neural network model to obtain an actual classification result;
S4, judging whether the actual classification result meets a preset condition according to the theoretical classification of the image training set, and if not, performing step S5;
S5, updating the cell structure search network and the architecture search network according to the actual classification result and the theoretical classification;
S6, repeating the steps S2-S5 until a judgment is made at S4 that the actual classification result meets the preset condition.
In some embodiments, the neural network model that obtains the actual classification result that satisfies the preset condition is the optimal neural network model.
In some embodiments, the step S2 further includes:
S21, searching the random coding array by using the unit structure search network and the system structure search network to obtain a unit structure coding array and a system structure coding array; and
S22, decoding the unit structure coding array and the system structure coding array by using a decoder to obtain the neural network model.
In some embodiments, the cell structure encoding array includes a reduction cell (reduce cell) array and a normal cell array.
In some embodiments, the reduction cell array and the normal cell array each include a plurality of data blocks, wherein each data block includes constraint condition information, deep learning operation information, and splicing operation information.
In some embodiments, the architecture coding array is used to enable the selection of the deep learning operation information and the selection of the splicing operation information for the cell structure coding array.
In some embodiments, the step S4 further includes:
S41, calculating an error value of the actual classification result according to the theoretical classification of the image training set;
S42, judging whether the error value is smaller than a threshold, and if the error value is larger than the threshold, performing step S5.
In some embodiments, the step S5 further includes:
S51, calculating a loss function value by using the actual classification result and the theoretical classification of the image training set;
S52, updating the cell structure search network and the architecture search network with the loss function value.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer apparatus, including:
at least one processor; and
a memory storing a computer program operable on the processor, wherein the processor, when executing the program, performs the steps of any of the methods of constructing a neural network model described above.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of any one of the methods for constructing a neural network model as described above.
The invention has the following beneficial technical effects: the embodiment provided by the invention converts the original search space into two spaces of unit structure search and system structure search, and searches the optimal structure of the system in an automatic learning mode, thereby enhancing the flexibility of the generated model architecture, reducing the computational complexity and realizing the efficient search of the architecture.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that other drawings can be obtained from these drawings by those skilled in the art without creative effort.
FIG. 1 is a schematic diagram of a cell structure according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a neural network model provided in an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a method for constructing a neural network model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a cell structure search network according to an embodiment of the present invention;
FIG. 5 is a flow chart of the decoding of the unit structure coding array and the decoding of the architecture coding array according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a computer device provided in an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share the same name but are not the same. "First" and "second" are merely for convenience of description and should not be construed as limitations on the embodiments of the present invention; this is not repeated in the following embodiments.
According to one aspect of the invention, a method for constructing a neural network model for realizing image classification is provided. The general idea is as follows: a random array is generated first; the random array is then input into an encoder formed by a unit structure search network and an architecture search network to obtain encoding arrays (a unit structure encoding array and an architecture encoding array); the encoding arrays are then parsed into a corresponding actual neural network model according to a decoding rule. When the actual neural network model is trained on the data of the image training set, a corresponding loss function (loss) value is generated, and the encoder is finally updated according to this loss value, thereby updating the actual neural network model.
In the invention, a plurality of cell structures are obtained from the cell structure coding array, where each cell structure (cell) is a building block of the final architecture; the cell structures are then connected in series according to the system structure to form a convolutional network, thereby obtaining the neural network model.
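To make the overall flow concrete, a minimal pseudocode-style sketch of this loop is given below. The helper names (build_encoder, decode_model, classification_loss) and the 10 x 40 array shape are illustrative assumptions for this sketch, not the exact implementation of the invention.

```python
import numpy as np

def architecture_search(train_images, train_labels, error_threshold=0.1, max_iters=1000):
    # S1: build the encoder (unit structure search network + architecture search
    # network) and a random coding array; build_encoder is a hypothetical helper.
    encoder = build_encoder()
    random_array = np.random.rand(10, 40)

    model = None
    for _ in range(max_iters):
        # S2: encoder -> coding arrays, decoder -> candidate neural network model
        cell_code, arch_code = encoder.search(random_array)
        model = decode_model(cell_code, arch_code)

        # S3: actual classification result on the image training set
        predictions = model.classify(train_images)

        # S4: compare with the theoretical (ground-truth) classification
        error_value = np.mean(predictions != train_labels)
        if error_value < error_threshold:
            break  # preset condition met; this model is kept

        # S5: update both search networks from the loss value
        loss_value = classification_loss(predictions, train_labels)
        encoder.update(loss_value)
    return model
```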
As shown in fig. 1(a), a cell is a directed acyclic graph composed of N ordered nodes. Each node x^(i) is a feature map in the convolutional network, and each directed edge (i, j) applies a certain operation o^(i,j) to x^(i). Suppose each cell has an input node h_(i-1) and an output node h_(i+1). For a convolution cell, the input node is defined as the output of the cell in the previous layer, and node h_i is the feature map obtained after a convolution operation is applied to the input node. The output of the cell is obtained by applying a splicing operation in the channel dimension to all unused intermediate nodes.
Each intermediate node is computed from all of the nodes before it, with the combination controlled by a parameter γ taking values in {0,1}: when γ is 1 the contributions are combined by a summation operation, and when γ is 0 they are combined by a splicing operation. A special zero operation is also included to indicate that there is no connection between two nodes. The task of learning the cell is thus reduced to learning the operations on its edges.
Let O denote the set of candidate operations (e.g., convolution, max pooling, zero, etc.), where each operation is denoted by o(·). To fully represent the search space, the present invention parameterizes all possible choices of operations.
The operation mixing weights of a pair of nodes (i, j) are parameterized by a vector α^(i,j) of dimension |O|. The task of searching the unit structure is thereby converted into learning the discrete variables α = {α^(i,j)}. Finally, the unit structure parameter corresponding to the most probable discrete operation is obtained; α is the encoding of the unit structure and takes values in {0,1}. As shown in fig. 1(b), the dashed lines correspond to the zero operation.
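As a simple illustration of how the one-hot α and the flag γ determine a node, the following sketch computes one intermediate node from its predecessors; the argument names and the candidate-operation list are assumptions made only for this example.

```python
import numpy as np

def compute_intermediate_node(prev_nodes, alpha_rows, candidate_ops, gamma):
    """prev_nodes: feature maps of all earlier nodes; alpha_rows[i]: one-hot
    vector of length |O| choosing the operation o(i, j) applied on edge (i, j);
    gamma: 1 -> summation, 0 -> splicing along the channel dimension."""
    edge_outputs = []
    for x_i, alpha in zip(prev_nodes, alpha_rows):
        op = candidate_ops[int(np.argmax(alpha))]  # discrete choice encoded by alpha
        edge_outputs.append(op(x_i))               # a "zero" op would return a zero map,
                                                   # modelling a missing connection
    if gamma == 1:
        return sum(edge_outputs)                    # summation operation
    return np.concatenate(edge_outputs, axis=1)     # splicing in the channel dimension
```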
The operations in O herein include 14 in total:
Table 1: list of selectable operations in the search space of the present invention
For the i-th intermediate node, a total of (i + 1) × 14 parameters are required. With i = 5, 6 × 14 parameters are needed (the candidate inputs being 4 earlier intermediate nodes and 2 input nodes), which is excessive. Since each intermediate node is fixed to have only 2 inputs, in order to simplify the amount of computation the present invention splits the O space into two spaces O1 and O2, namely the cell structure search network and the architecture search network, where O1 represents the selection of the input nodes for an intermediate node and O2 represents the selection of an operation.
The coding module is used for representing the model structure in a parameterization mode, namely different codes correspond to different model structures, and the process of searching the optimal model structure can be simplified into the process of searching the optimal code.
In the present invention, the operations in O2 include: identity, 1x3 and 3x1 convolution, 1x7 and 7x1 convolution, 3x3 dilated convolution, 3x3 average pooling, 3x3 max pooling, 5x5 max pooling, 7x7 max pooling, 1x1 convolution, 3x3 convolution, 3x3 depthwise-separable convolution, 5x5 depthwise-separable convolution, and 7x7 depthwise-separable convolution. All operations have stride 1, and their convolved feature maps are padded to preserve the spatial resolution. Convolution operations are applied in the ReLU-Conv-BN order, and each separable convolution is always applied twice.
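A minimal PyTorch-style sketch of two representative operations is given below; it follows the ReLU-Conv-BN ordering and the doubled separable convolution described above, while the channel count C and the padding values are assumptions made for illustration.

```python
import torch.nn as nn

def relu_conv_bn(C, kernel_size, padding):
    # Stride-1 convolution applied in ReLU-Conv-BN order; padding preserves resolution.
    return nn.Sequential(
        nn.ReLU(inplace=False),
        nn.Conv2d(C, C, kernel_size, stride=1, padding=padding, bias=False),
        nn.BatchNorm2d(C),
    )

def sep_conv(C, kernel_size, padding):
    # Depthwise-separable convolution (depthwise + pointwise), always applied twice.
    def one_pass():
        return nn.Sequential(
            nn.ReLU(inplace=False),
            nn.Conv2d(C, C, kernel_size, stride=1, padding=padding, groups=C, bias=False),
            nn.Conv2d(C, C, kernel_size=1, bias=False),
            nn.BatchNorm2d(C),
        )
    return nn.Sequential(one_pass(), one_pass())
```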
In the present invention, the convolution unit contains N = 5 nodes (excluding h_(i-1), h_i and h_(i+1)), where the output node is defined as the depth-wise splicing of all unused intermediate nodes. The architecture is formed by stacking (connecting in series) a plurality of unit structures. The input node of cell k is equal to the output of cell k-1, and a 1x1 convolution needs to be inserted between them. In order to reduce the spatial dimensions of the model, the architecture needs to insert reduction cells (reduce cells) in which all operations connected to the input nodes have stride 2. The structure is thus encoded as (α_normal, α_reduce): all normal cells share α_normal and all reduce cells share α_reduce. That is, the unit structure encoding array includes a reduction cell array and a normal cell array, the reduction cell array and the normal cell array each include a plurality of data blocks, and each data block includes constraint condition information, deep learning operation information and splicing operation information. The system structure coding array is used to realize the selection of the deep learning operations and the splicing operations of the unit structure coding array; that is, the normal cells and the reduce cells are connected in series to construct the model. FIG. 2 shows a neural network model in which M, L and N represent the numbers of times the normal cells are repeated.
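The stacking just described can be sketched as follows; make_cell is a hypothetical cell factory, and the 1x1 adapter convolution and the stride-2 handling of reduce cells are assumed to be implemented inside it.

```python
import numpy as np

def build_architecture(arch_code, alpha_normal, alpha_reduce, make_cell):
    """arch_code: a (layer_num x 3) one-hot matrix over [normal, reduce, none].
    All normal cells share alpha_normal; all reduce cells share alpha_reduce."""
    cells = []
    for row in np.asarray(arch_code):
        choice = int(np.argmax(row))
        if choice == 0:
            cells.append(make_cell(alpha_normal, reduce=False))  # keeps resolution
        elif choice == 1:
            cells.append(make_cell(alpha_reduce, reduce=True))   # stride-2 ops, halves resolution
        # choice == 2 ("none"): no cell is placed at this layer
    # The cells are connected in series: the input of cell k is the output of
    # cell k-1, adapted by a 1x1 convolution (assumed to live inside make_cell).
    return cells
```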
In some embodiments, according to an aspect of the present invention, an embodiment of the present invention provides a method for constructing a neural network model for image classification, as shown in fig. 3, which may include the following steps:
S1, constructing a unit structure search network, an architecture search network, an image training set and a random coding array.
In some embodiments, the input of the cell structure search network is the output of the previous cell structure search, and the final output represents the coding matrix of the cell structure. The present invention assumes that one cell structure (as shown in fig. 4(a), where the dashed lines represent skip connections) is composed of five blocks with the same structure (as shown in fig. 4(b)). The concrete meaning of the cell coding matrix is as follows. For each block: the first 6 columns correspond to the selection of hidden layer A, and the following 13 columns correspond to the selection of one of the 13 deep learning operations in the O2 space; since each block contains two such operations, this gives a 2 x 19 matrix; finally, a 1 x 2 splicing-operation selection is appended. The 2 x 19 matrix is therefore flattened into a 1 x 38 form and followed by the 1 x 2 selection, giving a 1 x 40 coding matrix.
The invention assumes 5 blocks in total for one cell, giving a 5 x 40 matrix. There are two types of cells in total (normal and reduce), so the cell structure search finally outputs a 10 x 40 matrix.
The constraint conditions in the cell structure search are as follows (a sampling sketch that respects these constraints is given after the list):
(1) the first 6 columns correspond to the selection range of the inputs: the input of the first intermediate node can only come from h_(i-1) and h_i; the input of the second intermediate node can only be selected from h_(i-1), h_i and h_0; and so on, until the selectable input range of the fifth intermediate node is h_(i-1), h_i, h_0, h_1, h_2 and h_3; the intermediate nodes that are not selected by any block are finally spliced together and output; 0 means not used, 1 means used, and only one 1 may appear in these 6 columns.
(2) the selection range of the 13 deep learning operations: identity, 1x3 and 3x1 convolution, 1x7 and 7x1 convolution, 3x3 dilated convolution, 3x3 average pooling, 3x3 max pooling, 5x5 max pooling, 7x7 max pooling, 1x1 convolution, 3x3 convolution, 3x3 depthwise-separable convolution, 5x5 depthwise-separable convolution, and 7x7 depthwise-separable convolution; 0 means not used, 1 means used, and only one 1 may appear in these 13 columns.
(3) the selection range of the combination operation applied after the two operations in the block: element-wise addition and channel-dimension splicing; 0 means not used, 1 means used, and only one 1 may appear in these 2 columns.
(4) the reduce cell reduces the spatial dimensions and increases the number of channels relative to the normal cell; the cells are divided into these 2 classes for implementation.
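As referenced above, the constraints can be enforced directly when a coding matrix is sampled. The sketch below generates a 10 x 40 cell coding matrix with exactly one 1 per one-hot segment; the segment boundaries follow the column layout described earlier, and the growing input range is an interpretation of constraint (1) made for illustration.

```python
import numpy as np

# Column layout of one 1 x 40 block row:
#   [0, 6)   input selection for hidden layer A
#   [6, 19)  deep learning operation for hidden layer A (13 operations)
#   [19, 25) input selection for hidden layer B
#   [25, 38) deep learning operation for hidden layer B
#   [38, 40) combination operation (element addition vs. channel splicing)
SEGMENTS = [(0, 6), (6, 19), (19, 25), (25, 38), (38, 40)]

def random_cell_code(num_blocks=5, num_cell_types=2):
    code = np.zeros((num_blocks * num_cell_types, 40), dtype=int)
    for r in range(code.shape[0]):
        block_index = r % num_blocks             # blocks 0..4 within one cell
        for lo, hi in SEGMENTS:
            if lo in (0, 19):
                # constraint (1): block 0 may pick only h_(i-1), h_i (2 choices),
                # block 1 adds h_0, ..., block 4 may pick any of the 6 candidates
                hi = lo + min(2 + block_index, 6)
            code[r, np.random.randint(lo, hi)] = 1   # exactly one 1 per segment
    return code

# Example: random_cell_code() returns a 10 x 40 matrix (5 blocks x {normal, reduce}).
```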
The input of the architecture search network is the output of the previous architecture search, and its output is the encoding result of the architecture (a matrix of size number-of-architecture-layers x 3, as shown in Table 2).
Normal cell | Reduce cell | None
---|---|---
1 | 0 | 0
0 | 0 | 1
0 | 1 | 0
… | … | …

Table 2: architecture coding (assuming 10 architecture layers, a 10 x 3 matrix is formed)
Unlike previous work, which sets the architecture manually, the present invention also encodes the architecture and searches it for the optimal model, which enhances the flexibility of the model architecture; this is a major advantage of the invention.
The original search space dimension equals (cell space dimension × system space dimension), whereas the present search space dimension equals (cell space dimension + system space dimension). Since the cell space dimension is far beyond billions and the system space dimension is about 3^layer_num, the dimension of the original search space is far larger than that of the present one: the original approach would require roughly 3^layer_num times more time (for example, about 3^10 = 59,049 times more for a 10-layer system), and the advantage of the adopted encoding and decoding scheme becomes more pronounced as the number of network layers increases.
S2, generating the neural network model by using the unit structure search network, the architecture search network and the random coding array.
In some embodiments, step S2 may further include the steps of:
S21, searching the random coding array by using the unit structure search network and the system structure search network to obtain a unit structure coding array and a system structure coding array; and
S22, decoding the unit structure coding array and the system structure coding array by using a decoder to obtain the neural network model.
In some embodiments, as shown in FIG. 5, FIG. 5 illustrates the decoding rule of the unit structure coding array and the decoding rule of the architecture coding array. The decoding rule of the unit structure coding array is as follows: h_(i-1) is the input, and h_i is obtained through a convolution operation; taking the index of the 1 in M[0][0:5] gives hidden layer A according to the coding rule, and taking the index of the 1 in M[0][6:18] gives the operation applied to hidden layer A; similarly, taking the index of the 1 in M[0][19:24] gives hidden layer B, and taking the index of the 1 in M[0][25:37] gives the operation applied to hidden layer B; taking the index of the 1 in M[0][38:39] gives the hidden-layer fusion operation, so that a new hidden layer is obtained. The structure of the 2nd block is then decoded from M[1][:], the 5 blocks are decoded in sequence, and the results are finally spliced together to obtain the cell structure. The decoding rule of the architecture coding array is as follows: an image batch is the input; taking the index of the 1 in N[0] determines, according to the coding rule, which cell type this layer connects to; the decoded cell structures are connected in series in the same way, and the model structure is finally output.
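A small sketch of this block-level decoding step is shown below; the lookup lists are placeholders, and the inclusive index ranges in the description (e.g. M[0][0:5]) are written as Python half-open slices.

```python
import numpy as np

def decode_block(row, hidden_candidates, operations, fusions=("add", "splice")):
    """Decode one 1 x 40 block code: take the index of the 1 in each segment
    and look it up in the corresponding (illustrative) table."""
    a_in   = hidden_candidates[int(np.argmax(row[0:6]))]    # M[k][0:5]  -> hidden layer A
    a_op   = operations[int(np.argmax(row[6:19]))]          # M[k][6:18] -> operation on A
    b_in   = hidden_candidates[int(np.argmax(row[19:25]))]  # M[k][19:24] -> hidden layer B
    b_op   = operations[int(np.argmax(row[25:38]))]          # M[k][25:37] -> operation on B
    fusion = fusions[int(np.argmax(row[38:40]))]             # M[k][38:39] -> fusion operation
    return a_in, a_op, b_in, b_op, fusion

# Decoding M[0]..M[4] in sequence and splicing the resulting hidden layers yields one
# cell; each row N[k] of the architecture code is decoded analogously to choose between
# a normal cell, a reduce cell, or no cell at that layer.
```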
S3, inputting the image training set into the neural network model to obtain an actual classification result.
S4, judging whether the actual classification result meets the preset condition according to the theoretical classification of the image training set.
In some embodiments, step S4 may include:
S41, calculating an error value of the actual classification result according to the theoretical classification of the image training set;
In some embodiments, the error value may be the number of erroneous results divided by the total number of results in the actual classification; for example, if there are 100 classification results in total and 50 of them are correct, the error value is 0.5.
S42, judging whether the error value is smaller than a threshold, and if the error value is larger than the threshold, performing the subsequent steps.
The threshold may be set according to actual requirements, and may be 0.05-0.15. For example, if the desired result is more accurate, the threshold may be set to a lower value, such as 0.1, or lower, such as 0.05.
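For concreteness, the error computation and threshold test of steps S41 and S42 can be sketched as follows; the function names are illustrative only.

```python
def classification_error(predicted, actual):
    # error value = erroneous results / total results,
    # e.g. 50 wrong out of 100 results gives an error value of 0.5
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)

def meets_preset_condition(error_value, threshold=0.1):
    # the threshold is chosen per application, e.g. somewhere in the 0.05-0.15 range
    return error_value < threshold
```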
S5, updating the cell structure search network and the architecture search network according to the actual classification result and the theoretical classification.
In some embodiments, step S5 may further include the steps of:
S51, calculating a loss function value by using the actual classification result and the theoretical classification of the image training set;
S52, updating the cell structure search network and the architecture search network with the loss function value.
In some embodiments, prior-art techniques may be adopted to update the unit structure search network and the architecture search network with the loss function value. For example, the loss function value may be used to update the parameters of the unit structure search network and the architecture search network.
In some embodiments, the unit structure search network and the architecture search network are continuously updated with the loss function value to finally obtain an optimal loss function value, and at this time, the obtained neural network model is the optimal model.
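One conventional way to realize such an update is an ordinary gradient step, sketched below under the assumption that both search networks are differentiable PyTorch modules and that the loss is differentiable with respect to their parameters (for example via a continuous relaxation of the coding arrays); the description above only requires that the loss value drive the update, so this is one possible realization rather than the prescribed one.

```python
import torch
import torch.nn.functional as F

def update_search_networks(logits, target_labels, cell_search_net, arch_search_net, lr=1e-3):
    # Minimal sketch: one gradient step on the parameters of both search networks.
    params = list(cell_search_net.parameters()) + list(arch_search_net.parameters())
    optimizer = torch.optim.SGD(params, lr=lr)

    loss = F.cross_entropy(logits, target_labels)  # actual vs. theoretical classification
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```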
S6, repeating the steps S2-S5 until a judgment is made at S4 that the actual classification result meets the preset conditions.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 6, an embodiment of the present invention further provides a computer apparatus 501, including:
at least one processor 520; and
a memory 510 storing a computer program 511 executable on the processor, wherein the processor 520, when executing the program, performs the steps of any of the methods of constructing a neural network model described above.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 7, an embodiment of the present invention further provides a computer-readable storage medium 601, the computer-readable storage medium 601 stores a computer program 610, and the computer program 610, when executed by a processor, performs the steps of any one of the methods for constructing a neural network model as described above.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
In addition, the apparatuses, devices and the like disclosed in the embodiments of the present invention may be various electronic terminal devices, such as a mobile phone, a Personal Digital Assistant (PDA), a tablet computer (PAD), a smart television and the like, or may be a large terminal device, such as a server and the like, and therefore the scope of protection disclosed in the embodiments of the present invention should not be limited to a specific type of apparatus, device. The client disclosed in the embodiment of the present invention may be applied to any one of the above electronic terminal devices in the form of electronic hardware, computer software, or a combination of both.
Furthermore, the method disclosed according to an embodiment of the present invention may also be implemented as a computer program executed by a CPU, and the computer program may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method disclosed in the embodiments of the present invention.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Further, it should be appreciated that the computer-readable storage media (e.g., memory) described herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions described herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure of embodiments of the invention, including the claims, is limited to these examples; within the spirit of the embodiments of the invention, technical features in the above embodiments or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.
Claims (9)
1. A method of constructing a neural network model for enabling classification of images, the method comprising the steps of:
S1, constructing a unit structure search network, a system structure search network, an image training set and a random coding array;
S2, generating the neural network model by using a unit structure search network, an architecture search network and a random coding array;
S3, inputting the image training set into the neural network model to obtain an actual classification result;
S4, judging whether the actual classification result meets a preset condition according to the theoretical classification of the image training set, and if not, performing step S5;
S5, updating the cell structure search network and the architecture search network according to the actual classification result and the theoretical classification;
S6, repeating the steps S2-S5 until the judgment that the actual classification result meets the preset condition is obtained in S4;
wherein the step S2 further includes:
S21, searching the random coding array by using the unit structure search network and the system structure search network to obtain a unit structure coding array and a system structure coding array; and
S22, decoding the unit structure coding array and the system structure coding array by using a decoder to obtain the neural network model.
2. The method of claim 1, wherein the neural network model that yields the actual classification result satisfying the preset condition is an optimal neural network model.
3. The method of claim 1, wherein the cell structure encoding array includes a reduction cell (reduce cell) array and a normal cell array.
4. The method of claim 3, wherein the reduction cell array and the normal cell array each include a plurality of data blocks, wherein each data block includes constraint condition information, deep learning operation information, and splicing operation information.
5. The method of claim 4, wherein the architecture coding array is used to enable the selection of the deep learning operation information and the selection of the splicing operation information for the cell structure coding array.
6. The method of claim 1, wherein the step S4 further comprises:
S41, calculating an error value of the actual classification result according to the theoretical classification of the image training set;
S42, judging whether the error value is smaller than a threshold, and if the error value is larger than the threshold, performing step S5.
7. The method of claim 1, wherein the step S5 further comprises:
S51, calculating a loss function value by using the actual classification result and the theoretical classification of the image training set;
S52, updating the cell structure search network and the architecture search network with the loss function value.
8. A computer device, comprising:
at least one processor; and
memory storing a computer program operable on the processor, wherein the processor, when executing the program, performs the method of any of claims 1-7.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811463775.7A CN109615073B (en) | 2018-12-03 | 2018-12-03 | Neural network model construction method, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811463775.7A CN109615073B (en) | 2018-12-03 | 2018-12-03 | Neural network model construction method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109615073A CN109615073A (en) | 2019-04-12 |
CN109615073B true CN109615073B (en) | 2021-06-04 |
Family
ID=66006198
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811463775.7A Active CN109615073B (en) | 2018-12-03 | 2018-12-03 | Neural network model construction method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109615073B (en) |
Families Citing this family (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10678244B2 (en) | 2017-03-23 | 2020-06-09 | Tesla, Inc. | Data synthesis for autonomous control systems |
US10671349B2 (en) | 2017-07-24 | 2020-06-02 | Tesla, Inc. | Accelerated mathematical engine |
US11409692B2 (en) | 2017-07-24 | 2022-08-09 | Tesla, Inc. | Vector computational unit |
US11893393B2 (en) | 2017-07-24 | 2024-02-06 | Tesla, Inc. | Computational array microprocessor system with hardware arbiter managing memory requests |
US11157441B2 (en) | 2017-07-24 | 2021-10-26 | Tesla, Inc. | Computational array microprocessor system using non-consecutive data formatting |
US11561791B2 (en) | 2018-02-01 | 2023-01-24 | Tesla, Inc. | Vector computational unit receiving data elements in parallel from a last row of a computational array |
US11215999B2 (en) | 2018-06-20 | 2022-01-04 | Tesla, Inc. | Data pipeline and deep learning system for autonomous driving |
US11361457B2 (en) | 2018-07-20 | 2022-06-14 | Tesla, Inc. | Annotation cross-labeling for autonomous control systems |
US11636333B2 (en) | 2018-07-26 | 2023-04-25 | Tesla, Inc. | Optimizing neural network structures for embedded systems |
US11562231B2 (en) | 2018-09-03 | 2023-01-24 | Tesla, Inc. | Neural networks for embedded devices |
SG11202103493QA (en) | 2018-10-11 | 2021-05-28 | Tesla Inc | Systems and methods for training machine models with augmented data |
US11196678B2 (en) | 2018-10-25 | 2021-12-07 | Tesla, Inc. | QOS manager for system on a chip communications |
US11816585B2 (en) | 2018-12-03 | 2023-11-14 | Tesla, Inc. | Machine learning models operating at different frequencies for autonomous vehicles |
US11537811B2 (en) | 2018-12-04 | 2022-12-27 | Tesla, Inc. | Enhanced object detection for autonomous vehicles based on field view |
US11610117B2 (en) | 2018-12-27 | 2023-03-21 | Tesla, Inc. | System and method for adapting a neural network model on a hardware platform |
US11150664B2 (en) | 2019-02-01 | 2021-10-19 | Tesla, Inc. | Predicting three-dimensional features for autonomous driving |
US10997461B2 (en) | 2019-02-01 | 2021-05-04 | Tesla, Inc. | Generating ground truth for machine learning from time series elements |
US11567514B2 (en) | 2019-02-11 | 2023-01-31 | Tesla, Inc. | Autonomous and user controlled vehicle summon to a target |
US10956755B2 (en) | 2019-02-19 | 2021-03-23 | Tesla, Inc. | Estimating object properties using visual image data |
CN110059804B (en) * | 2019-04-15 | 2021-10-08 | 北京迈格威科技有限公司 | Data processing method and device |
DE102019206620A1 (en) * | 2019-04-18 | 2020-10-22 | Robert Bosch Gmbh | Method, device and computer program for creating a neural network |
CN110175671B (en) * | 2019-04-28 | 2022-12-27 | 华为技术有限公司 | Neural network construction method, image processing method and device |
CN110278370B (en) * | 2019-06-21 | 2020-12-18 | 上海摩象网络科技有限公司 | Method and device for automatically generating shooting control mechanism and electronic equipment |
CN112215332B (en) * | 2019-07-12 | 2024-05-14 | 华为技术有限公司 | Searching method, image processing method and device for neural network structure |
CN110659721B (en) * | 2019-08-02 | 2022-07-22 | 杭州未名信科科技有限公司 | Method and system for constructing target detection network |
CN110555514B (en) * | 2019-08-20 | 2022-07-12 | 北京迈格威科技有限公司 | Neural network model searching method, image identification method and device |
CN110428046B (en) * | 2019-08-28 | 2023-12-15 | 腾讯科技(深圳)有限公司 | Method and device for acquiring neural network structure and storage medium |
CN110659690B (en) * | 2019-09-25 | 2022-04-05 | 深圳市商汤科技有限公司 | Neural network construction method and device, electronic equipment and storage medium |
CN110751267B (en) * | 2019-09-30 | 2021-03-30 | 京东城市(北京)数字科技有限公司 | Neural network structure searching method, training method, device and storage medium |
CN111191785B (en) * | 2019-12-20 | 2023-06-23 | 沈阳雅译网络技术有限公司 | Structure searching method based on expansion search space for named entity recognition |
CN113326929A (en) * | 2020-02-28 | 2021-08-31 | 深圳大学 | Progressive differentiable network architecture searching method and system based on Bayesian optimization |
CN113469891A (en) * | 2020-03-31 | 2021-10-01 | 武汉Tcl集团工业研究院有限公司 | Neural network architecture searching method, training method and image completion method |
CN113705276B (en) * | 2020-05-20 | 2024-08-27 | 武汉Tcl集团工业研究院有限公司 | Model construction method, model construction device, computer equipment and medium |
CN111931904A (en) * | 2020-07-10 | 2020-11-13 | 华为技术有限公司 | Neural network construction method and device |
CN114926698B (en) * | 2022-07-19 | 2022-10-14 | 深圳市南方硅谷半导体股份有限公司 | Image classification method for neural network architecture search based on evolutionary game theory |
CN117707795B (en) * | 2024-02-05 | 2024-05-10 | 南京邮电大学 | Graph-based model partitioning side collaborative reasoning method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106874956A (en) * | 2017-02-27 | 2017-06-20 | 陕西师范大学 | The construction method of image classification convolutional neural networks structure |
CN107172428A (en) * | 2017-06-06 | 2017-09-15 | 西安万像电子科技有限公司 | The transmission method of image, device and system |
CN108021983A (en) * | 2016-10-28 | 2018-05-11 | 谷歌有限责任公司 | Neural framework search |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140249882A1 (en) * | 2012-10-19 | 2014-09-04 | The Curators Of The University Of Missouri | System and Method of Stochastic Resource-Constrained Project Scheduling |
CN105303252A (en) * | 2015-10-12 | 2016-02-03 | 国家计算机网络与信息安全管理中心 | Multi-stage nerve network model training method based on genetic algorithm |
CN106295803A (en) * | 2016-08-10 | 2017-01-04 | 中国科学技术大学苏州研究院 | The construction method of deep neural network |
US10019655B2 (en) * | 2016-08-31 | 2018-07-10 | Adobe Systems Incorporated | Deep-learning network architecture for object detection |
-
2018
- 2018-12-03 CN CN201811463775.7A patent/CN109615073B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108021983A (en) * | 2016-10-28 | 2018-05-11 | 谷歌有限责任公司 | Neural framework search |
CN106874956A (en) * | 2017-02-27 | 2017-06-20 | 陕西师范大学 | The construction method of image classification convolutional neural networks structure |
CN107172428A (en) * | 2017-06-06 | 2017-09-15 | 西安万像电子科技有限公司 | The transmission method of image, device and system |
Non-Patent Citations (1)
Title |
---|
Progressive Neural Architecture Search;Chenxi Liu1 等;《arXiv》;20180726;1-20 * |
Also Published As
Publication number | Publication date |
---|---|
CN109615073A (en) | 2019-04-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109615073B (en) | Neural network model construction method, device and storage medium | |
US11531889B2 (en) | Weight data storage method and neural network processor based on the method | |
KR20160117537A (en) | Hierarchical neural network device, learning method for determination device, and determination method | |
CN113128432B (en) | Machine vision multitask neural network architecture searching method based on evolution calculation | |
CN111898750A (en) | Neural network model compression method and device based on evolutionary algorithm | |
CN110009048B (en) | Method and equipment for constructing neural network model | |
CN109902808B (en) | Method for optimizing convolutional neural network based on floating point digital variation genetic algorithm | |
CN114462594A (en) | Neural network training method and device, electronic equipment and storage medium | |
CN116109920A (en) | Remote sensing image building extraction method based on transducer | |
CN115294337B (en) | Method for training semantic segmentation model, image semantic segmentation method and related device | |
CN114912578A (en) | Training method and device of structure response prediction model and computer equipment | |
CN117910518B (en) | Method and system for analyzing generated data | |
JP6795721B1 (en) | Learning systems, learning methods, and programs | |
KR102382491B1 (en) | Method and apparatus for sequence determination, device and storage medium | |
CN112561050A (en) | Neural network model training method and device | |
KR20230069578A (en) | Sign-Aware Recommendation Apparatus and Method using Graph Neural Network | |
CN112381147A (en) | Dynamic picture similarity model establishing method and device and similarity calculating method and device | |
CN111539517A (en) | Graph convolution neural network generation method based on graph structure matrix characteristic vector | |
CN111312340A (en) | SMILES-based quantitative structure effect method and device | |
CN113905066B (en) | Networking method of Internet of things, networking device of Internet of things and electronic equipment | |
CN116997911A (en) | Accelerating convolutional neural networks to perform convolutional operations | |
CN110147804B (en) | Unbalanced data processing method, terminal and computer readable storage medium | |
CN111143641A (en) | Deep learning model training method and device and electronic equipment | |
CN113435572A (en) | Construction method of self-evolution neural network model for intelligent manufacturing industry | |
CN112115914A (en) | Target detection method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |