CN111507410B - Construction method of convolutional capsule layer and classification method and device of multi-view images - Google Patents
- Publication number
- CN111507410B (application CN202010309310.7A)
- Authority
- CN
- China
- Prior art keywords
- capsule
- layer
- input
- output layer
- output
- Prior art date
- Legal status: Active (an assumption, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The application provides a construction method of a convolutional capsule layer. The convolutional capsule layer comprises at least an input layer and an output layer, each provided with a plurality of capsules, and the method comprises the following steps: S1, taking the inner product of a Gabor filter and a convolution kernel to obtain a convolutional Gabor filter from the input-layer capsules to the output-layer capsules; S2, convolving the convolutional Gabor filter with the feature map input by the input layer to obtain a prediction vector; S3, constructing a self-attention route to obtain the assignment probability from the capsules of the input layer to the capsules of the output layer; S4, obtaining the input of the capsules of the output layer according to the assignment probability; and S5, activating the input of the capsules of the output layer through the Squash activation function to obtain the output of the capsules of the output layer. A construction device for the convolutional capsule layer, a multi-view image classification method and device, and an electronic device are also provided.
Description
Technical Field
The application relates to the technical field of pattern recognition, and in particular to a construction method of a convolutional capsule layer and a classification method and device for multi-view images.
Background
Convolutional Neural Networks (CNNs) have made breakthroughs in many computer vision tasks in recent years and significantly outperform many traditional hand-crafted feature-driven models. Two common ways of improving CNN performance are increasing the depth and width of the network (i.e., the number of layers and the number of units per layer) and using as much training data as possible. Despite this success, CNNs also have many limitations, such as the invariance caused by pooling and an inability to understand the spatial relationships between elements. To address these limitations, the dynamic-routing-based CapsNet was proposed, comprising only one convolutional layer and one fully connected capsule layer; it has shown results comparable to CNNs on several standard datasets. Besides dynamic routing, the matrix capsule, which represents each entity by a pose matrix and is routed by EM routing, has many extensions, such as data augmentation using mixed hit-and-miss layers. Existing attempts to create a deep CapsNet by simply stacking fully connected capsule layers yield an architecture similar to an MLP model, with several limitations. First, the dynamic routing used in capsule networks is computationally very expensive, and multiple routing layers increase training and inference time. Second, it has recently been shown that stacking fully connected capsule layers can lead to poor learning in the intermediate layers: when there are too many capsules, the coupling coefficients tend to be too small, which attenuates gradient flow and inhibits learning. Third, it has been shown that, particularly in the lower layers, related units tend to concentrate in local regions. Although local routing could exploit this observation explicitly, it cannot be implemented with fully connected capsules.
Disclosure of Invention
Technical problem to be solved
The application provides a construction method of a convolutional capsule layer and a classification method and device for multi-view images, which at least alleviate the technical problems described in the background above.
(II) technical scheme
In a first aspect, the present application provides a method of constructing a convolutional capsule layer, the convolutional capsule layer comprising at least an input layer and an output layer, each having a plurality of capsules, the method comprising: S1, taking the inner product of a Gabor filter and a convolution kernel to obtain a convolutional Gabor filter from the input-layer capsules to the output-layer capsules; S2, convolving the convolutional Gabor filter with the feature map input by the input layer to obtain a prediction vector; S3, constructing a self-attention route to obtain the assignment probability from the capsules of the input layer to the capsules of the output layer; S4, obtaining the input of the capsules of the output layer according to the assignment probability; and S5, activating the input of the capsules of the output layer through the Squash activation function to obtain the output of the capsules of the output layer.
Optionally, the self-attention route is constructed as follows: the prediction vector $[w_j, h_j, n_i, n_j, d_j]$ is transposed to $[w_j, h_j, n_j, n_i, d_j]$, so that the number of elements $n_j$ corresponding to the j-th capsule of the output layer serves as the heads of a multi-head attention mechanism, and the correlation between the affine-transformed initial prediction vectors of the i-th capsule of the input layer is computed along the $n_i$ dimension, where $w_j$ is the width of the convolved feature map, $h_j$ its height, $n_i$ the number of elements of the i-th input-layer capsule, $n_j$ the number of elements of the j-th output-layer capsule, and $d_j$ the dimension of the capsule.
Optionally, the assignment probability is computed as follows:

the attention value of the prediction vector is obtained as $\mathrm{head}_h = \mathrm{softmax}\left(\frac{XY^{\top}}{\sqrt{\dim}}\right)Z$, where $X$ is the query vector, $Y$ the key vector, and $Z$ the value vector, each obtained from the prediction vector $\hat{u}_{j|i}$ by a linear mapping;

the attention value is taken as the weight coefficient from the input-layer capsules to the output-layer capsules;

the weight coefficients are concatenated to obtain the probability value from the input-layer capsules to the output-layer capsules: $c_{ij} = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h, \ldots, \mathrm{head}_H)$, with $H = n_j$.
Optionally, the input of the capsules of the output layer is computed as

$s_j = \sum_i c_{ij}\,\hat{u}_{j|i}$,

where $s_j$ is the input of the j-th output-layer capsule, $c_{ij}$ the probability value from input-layer capsule $i$ to output-layer capsule $j$, and $\hat{u}_{j|i}$ the prediction vector; $i$ indexes the capsules of the input layer and $j$ the capsules of the output layer.
Optionally, the output of the capsules of the output layer is computed as

$v_j = \frac{\|s_j\|^2}{1+\|s_j\|^2}\,\frac{s_j}{\|s_j\|}$,

where $v_j$ is the output vector of the j-th output-layer capsule.
In a second aspect, the present application provides a method for classifying multi-view images based on the above convolutional capsule layer, comprising: inputting the image into a convolutional neural network to obtain a main feature image; and inputting the main feature image into two of the convolutional capsule layers to obtain a classification result of the multi-view image.
Optionally, the convolutional neural network comprises an input layer, a plurality of convolutional layers, a ReLU layer for setting part of the neuron outputs to 0 so as to induce sparsity, and a max-pooling layer for compressing the feature image to obtain the main feature image.
In a third aspect, the present application provides an apparatus for constructing a convolutional capsule layer, the convolutional capsule layer comprising at least an input layer and an output layer, each having a plurality of capsules, the apparatus comprising: an inner product module for taking the inner product of the Gabor filter and the convolution kernel to obtain a convolutional Gabor filter from the input-layer capsules to the output-layer capsules; a convolution module for convolving the convolutional Gabor filter with the feature map input by the input layer to obtain a prediction vector; a construction module for constructing a self-attention route to obtain the assignment probability from the capsules of the input layer to the capsules of the output layer; an obtaining module for obtaining the input of the capsules of the output layer according to the assignment probability; and an activation module for activating the input of the capsules of the output layer through the Squash activation function to obtain the output of the capsules of the output layer.
In a fourth aspect, the present application provides a device for classifying multi-view images, comprising: a first input module for inputting the image into the convolutional neural network to obtain a main feature image; and a second input module for inputting the main feature image into two of the convolutional capsule layers to obtain a classification result of the multi-view image.
In a fifth aspect, the present application provides an electronic device, comprising: a processor; and a memory having computer readable instructions stored thereon, which when executed by the processor, cause the processor to perform the above-described method.
(III) advantageous effects
The application provides a construction method of a convolutional capsule layer and a classification method and device for multi-view images. Replacing the traditional capsule construction based on ordinary convolution with a 3-D convolution based on Gabor convolution greatly reduces the complexity of the algorithm and makes the construction of a deep capsule network practical, while modulation by the Gabor filter guides the learning of the convolutional features. A deep capsule network based on self-attention routing is finally constructed, which solves the problems of vanishing gradients caused by deep stacking and of excessive coupling between capsules in traditional capsule network construction. The method can be used for multi-view image classification in applications such as image retrieval, intelligent monitoring, intelligent transportation, and security surveillance.
Drawings
FIG. 1 schematically illustrates a step diagram of a method of constructing a convolutional capsule layer according to an embodiment of the disclosure;
FIG. 2 schematically illustrates a flow chart of a method of constructing a convolutional capsule layer according to an embodiment of the disclosure;
fig. 3 schematically shows a step diagram of a classification method of multi-view images according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a block diagram of a construction apparatus for a convolutional capsule layer according to an embodiment of the disclosure;
fig. 5 schematically shows a block diagram of a classification apparatus of a multi-view image according to an embodiment of the present disclosure;
fig. 6 schematically shows a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Embodiments of the present disclosure will be described below with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, such a construction is in general intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, such a construction is in general intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks. The techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon for use by or in connection with an instruction execution system.
Embodiments of the present disclosure provide a method of constructing a convolutional capsule layer, the convolutional capsule layer comprising at least an input layer and an output layer, each having a plurality of capsules. As shown in fig. 1, the method comprises: S1, taking the inner product of a Gabor filter and a convolution kernel to obtain a convolutional Gabor filter from the input-layer capsules to the output-layer capsules; S2, convolving the convolutional Gabor filter with the feature map input by the input layer to obtain a prediction vector; S3, constructing a self-attention route to obtain the assignment probability from the capsules of the input layer to the capsules of the output layer; S4, obtaining the input of the capsules of the output layer according to the assignment probability; and S5, activating the input of the capsules of the output layer through the Squash activation function to obtain the output of the capsules of the output layer.
The construction method of the convolutional capsule layer in the present disclosure will be described in detail below with reference to the accompanying drawings. The input layer of the convolutional capsule layer of the disclosed embodiments includes a plurality of capsules, and the output layer also includes a plurality of capsules.
S1, taking the inner product of the Gabor filter and the convolution kernel to obtain a convolutional Gabor filter from the input-layer capsules to the output-layer capsules.

As shown in FIG. 2, a Gabor filter with 4 orientations and fixed parameters is first initialized; next, a convolution kernel of fixed size with learnable parameters is initialized; the inner product of the two yields the convolutional Gabor filter $w_{ij}$, where $i$ indexes the i-th capsule of the input layer and $j$ the j-th capsule of the output layer.
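The Gabor modulation of step S1 can be sketched as follows. This is an illustrative NumPy sketch, not the patent's implementation: the filter parameters (`sigma`, `lam`, `gamma`) and the 3x3 size are assumptions for demonstration, and the learnable kernel is a random stand-in for a trained parameter.

```python
import numpy as np

def gabor_kernel(size, theta, sigma=2.0, lam=4.0, gamma=0.5):
    # real part of a 2-D Gabor filter at orientation theta; parameters stay fixed
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

# four fixed orientations (0, 45, 90, 135 degrees), as in the embodiment
gabor_bank = np.stack([gabor_kernel(3, k * np.pi / 4) for k in range(4)])  # (4, 3, 3)

# learnable 3x3 convolution kernel (random stand-in for a trained parameter)
rng = np.random.default_rng(0)
conv_kernel = rng.standard_normal((3, 3))

# element-wise ("inner") product modulates the kernel, giving the
# convolutional Gabor filters w_ij, one per orientation
w_ij = gabor_bank * conv_kernel  # broadcasting: (4, 3, 3) * (3, 3)
```

Because the Gabor bank is fixed, only the single 3x3 kernel is learned while four orientation-tuned filters are produced, which is where the parameter savings come from.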
S2, convolving the convolutional Gabor filter with the feature map input by the input layer to obtain a prediction vector.

The convolutional Gabor filter above is convolved with the i-th input feature map $u_i$ to obtain the prediction vector $\hat{u}_{j|i}$:

$\hat{u}_{j|i} = w_{ij} \ast u_i$.
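A minimal sketch of the convolution in step S2, for a single spatial slice. The 8x8 feature map and random filter are illustrative assumptions; a real implementation would run a batched, multi-channel convolution in a deep-learning framework.

```python
import numpy as np

def conv2d_valid(feature_map, kernel):
    # plain 'valid' 2-D cross-correlation, standing in for the convolution in S2
    H, W = feature_map.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(feature_map[r:r + kh, c:c + kw] * kernel)
    return out

rng = np.random.default_rng(1)
u_i = rng.standard_normal((8, 8))    # feature map from input-layer capsule i
w_ij = rng.standard_normal((3, 3))   # convolutional Gabor filter from S1
u_hat_ji = conv2d_valid(u_i, w_ij)   # one spatial slice of the prediction vector, (6, 6)
```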
S3, constructing a self-attention route to obtain the assignment probability from the capsules of the input layer to the capsules of the output layer.

The self-attention route can be constructed by transposing the prediction vector $[w_j, h_j, n_i, n_j, d_j]$ to $[w_j, h_j, n_j, n_i, d_j]$, so that the number of elements $n_j$ corresponding to the j-th capsule of the output layer serves as the heads of a multi-head attention mechanism, and the correlation between the affine-transformed initial prediction vectors of the i-th capsule of the input layer is computed along the $n_i$ dimension, where $w_j$ is the width of the convolved feature map, $h_j$ its height, $n_i$ the number of elements of the i-th input-layer capsule, $n_j$ the number of elements of the j-th output-layer capsule, and $d_j$ the dimension of the capsule.
The attention value of the prediction vector is obtained as $\mathrm{head}_h = \mathrm{softmax}\left(\frac{XY^{\top}}{\sqrt{\dim}}\right)Z$, where $X$ is the query vector, $Y$ the key vector, and $Z$ the value vector. $X$, $Y$ and $Z$ are obtained from the prediction vector by linear mappings with parameter matrices. First, the similarity between the query vector $X$ and the key vector $Y$ is computed as an inner product; the scale factor $\frac{1}{\sqrt{\dim}}$ keeps the inner-product values from growing too large, where $\dim$ is the dimension of the query and key vectors.
The attention value above is used as the weight coefficient $\mathrm{head}_h$ from the input-layer capsules to the output-layer capsules;

the weight coefficients are concatenated to obtain the probability value from the input-layer capsules to the output-layer capsules: $c_{ij} = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h, \ldots, \mathrm{head}_H)$, with $H = n_j$. Specifically, the assignment probability from the capsules of the input layer to the capsules of the output layer is obtained by concatenating the weight coefficients corresponding to the attention heads.
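The multi-head attention computation of step S3 can be sketched for a single output capsule as below. This is a simplified stand-in, not the patent's implementation: the real route operates over the full 5-D transposed tensor, and the query/key/value maps here are random placeholders for learned parameter matrices.

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_routing(u_hat, n_heads, seed=0):
    # u_hat: (n_i, d) affine-transformed prediction vectors for one output capsule.
    # Each head computes scaled dot-product attention along the n_i dimension;
    # Wq/Wk/Wv are random stand-ins for learned parameter matrices.
    n_i, d = u_hat.shape
    rng = np.random.default_rng(seed)
    heads = []
    for _ in range(n_heads):                    # n_heads = n_j in the text
        Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
        X, Y, Z = u_hat @ Wq, u_hat @ Wk, u_hat @ Wv   # query / key / value
        attn = softmax(X @ Y.T / np.sqrt(d))    # inner-product similarity, scaled
        heads.append(attn @ Z)                  # head_h, shape (n_i, d)
    return np.concatenate(heads, axis=-1)       # Concat(head_1, ..., head_H)

u_hat = np.random.default_rng(5).standard_normal((6, 4))  # n_i = 6, d = 4
c = self_attention_routing(u_hat, n_heads=3)              # (6, 12)
```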
S4, obtaining the input of the capsule of the output layer according to the distribution probability;
the formula for the input of the capsule of the output layer is:
wherein s isjInput of capsules as output layer, cijThe probability value of an input layer capsule to an output layer capsule,for the prediction vector, i is the ith capsule of the input layer and j is the jth capsule of the output layer.
S5, activating the input of the capsules of the output layer through the Squash activation function to obtain the output of the capsules of the output layer.

The output of the capsules of the output layer is computed as

$v_j = \frac{\|s_j\|^2}{1+\|s_j\|^2}\,\frac{s_j}{\|s_j\|}$,

where $v_j$ is the output vector of the j-th output-layer capsule.
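Steps S4 and S5 can be sketched for one output capsule as follows. The capsule counts and dimensions are illustrative assumptions, and the assignment probabilities are random stand-ins for the routing output; the squash function itself follows the standard CapsNet formula.

```python
import numpy as np

def squash(s, eps=1e-9):
    # shrinks short vectors toward zero and long vectors toward unit length
    norm_sq = np.sum(s**2, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

rng = np.random.default_rng(7)
u_hat = rng.standard_normal((5, 8))      # predictions from 5 input-layer capsules
c = rng.random(5)
c /= c.sum()                             # assignment probabilities c_ij for capsule j
s_j = (c[:, None] * u_hat).sum(axis=0)   # S4: weighted sum -> input of capsule j
v_j = squash(s_j)                        # S5: output of capsule j, ||v_j|| < 1
```

The length of `v_j` stays strictly below 1 and preserves the direction of `s_j`, so it can be read as the probability that the entity represented by capsule j is present.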
Building on the self-attention mechanism and on the study of Gabor convolution, the method provides a novel way of constructing a Gabor convolutional capsule based on self-attention routing, and on this basis constructs a convolutional capsule network with attention routing. This achieves parameter reduction and local routing, and applies a dynamic-routing-style mechanism inside a convolutional neural network so that deeper network structures can be built. The method solves the problems of vanishing gradients caused by deep stacking and of excessive coupling between capsules in traditional capsule network construction, and ensures the accuracy of the feature representation in multi-view image classification.
The present disclosure further discloses a method for classifying multi-view images based on the above convolutional capsule layer. As shown in fig. 3, the method comprises:
s31, inputting the image into a convolutional neural network to obtain a main characteristic image;
the convolutional neural network includes an input layer, a plurality of convolutional layers, a ReLU layer, and a max-firing layer. Converting the multi-view image X into (X)1,x2,……xm) And inputting an input layer, wherein the ReLU layer is used for enabling partial neuron output to be 0, sparseness is caused, and the max-posing layer is used for compressing the characteristic image to obtain a main characteristic image.
And S32, inputting the main feature image into two of the convolutional capsule layers to obtain a classification result of the multi-view image.
Based on the same inventive concept, the embodiments of the present disclosure further provide a device for constructing a convolutional capsule layer, introduced below with reference to fig. 4.
Fig. 4 schematically illustrates a block diagram of a construction apparatus 400 for a convolutional capsule layer, in accordance with an embodiment of the disclosure.

As shown in fig. 4, the construction apparatus 400 for the convolutional capsule layer includes an inner product module 410, a convolution module 420, a construction module 430, an obtaining module 440, and an activation module 450. The construction apparatus 400 may perform the various methods described above with reference to figs. 1 and 2.
The convolutional capsule layer includes at least an input layer having a plurality of capsules and an output layer, the apparatus comprising:
the inner product module 410 performs, for example, operation S1 described with reference to fig. 1 above, for taking the inner product of the Gabor filter and the convolution kernel to obtain a convolutional Gabor filter from the input-layer capsules to the output-layer capsules;
the convolution module 420 performs, for example, operation S2 described with reference to fig. 1 above, for convolving the convolved gabor filter with the convolved feature map input by the input layer to obtain a prediction vector;
the building module 430 performs, for example, operation S3 described with reference to fig. 1 above, for building a self-attention route to obtain an assignment probability of a capsule of the input layer to a capsule of the output layer;
the obtaining module 440 performs, for example, operation S4 described with reference to fig. 1 above, for obtaining an input of a capsule of the output layer according to the assigned probability;
the activation module 450 performs, for example, operation S5 described with reference to fig. 1 above, for activating the input of the capsules of the output layer via the Squash activation function to obtain the output of the capsules of the output layer.
The embodiment of the present disclosure further provides a device for classifying multi-view images, and the device 500 for classifying multi-view images according to the embodiment of the present disclosure is described below with reference to fig. 5.
Fig. 5 schematically shows a block diagram of a classification apparatus 500 of a multi-view image according to an embodiment of the present disclosure.
As shown in fig. 5, the apparatus 500 for classifying multi-view images includes a first input module 510 and a second input module 520. The apparatus 500 for classifying a multi-view image may perform various methods described above with reference to fig. 3.
The first input module 510 performs, for example, operation S31 described with reference to fig. 3 above, for inputting the image into a convolutional neural network to obtain a main feature image;
the second input module 520 performs, for example, operation S32 described with reference to fig. 3 above, for inputting the main feature image into two layers of the convolution capsule layer to obtain a classification result of the multi-perspective image.
Fig. 6 schematically shows a block diagram of an electronic device adapted to implement the methods of the present disclosure, in accordance with an embodiment of the present disclosure. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 includes a processor 610 and a computer-readable storage medium 620. The electronic device 600 may perform a method according to an embodiment of the disclosure.
In particular, the processor 610 may comprise, for example, a general purpose microprocessor, an instruction set processor and/or related chip set and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 610 may also include onboard memory for caching purposes. The processor 610 may be a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
Computer-readable storage medium 620 may be, for example, any medium that can contain, store, communicate, propagate, or transport the instructions. For example, a readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the readable storage medium include: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and/or wired/wireless communication links.
The computer-readable storage medium 620 may include a computer program 621, which computer program 621 may include code/computer-executable instructions that, when executed by the processor 610, cause the processor 610 to perform a method according to an embodiment of the disclosure, or any variation thereof.
The computer program 621 may be configured with, for example, computer program code comprising computer program modules. For example, in an example embodiment, the code in computer program 621 may include one or more program modules, e.g., modules 621A, 621B, etc. It should be noted that the division and number of modules are not fixed; those skilled in the art may use suitable program modules or combinations thereof according to the actual situation, so that when these program modules are executed by the processor 610, the processor 610 can perform the method according to the embodiments of the present disclosure or any variation thereof.
Those skilled in the art will appreciate that various combinations and/or combinations of features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or combinations are not expressly recited in the present disclosure.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. A method of constructing a convolutional capsule layer, the convolutional capsule layer comprising at least an input layer and an output layer, each having a plurality of capsules, the method comprising:
S1, taking the inner product of a Gabor filter and a convolution kernel to obtain a convolutional Gabor filter from the input-layer capsules to the output-layer capsules;
S2, convolving the convolutional Gabor filter with the feature map input by the input layer to obtain a prediction vector;
S3, constructing a self-attention route to obtain the assignment probability from the capsules of the input layer to the capsules of the output layer; the self-attention route is constructed as follows: the prediction vector $[w_j, h_j, n_i, n_j, d_j]$ is transposed to $[w_j, h_j, n_j, n_i, d_j]$, so that the number of elements $n_j$ corresponding to the j-th capsule of the output layer serves as the heads of a multi-head attention mechanism, and the correlation between the affine-transformed initial prediction vectors of the i-th capsule of the input layer is computed along the corresponding $n_i$ dimension, where $w_j$ is the width of the convolved feature map, $h_j$ its height, $n_i$ the number of elements of the i-th input-layer capsule, $n_j$ the number of elements of the j-th output-layer capsule, and $d_j$ the dimension of the capsule; the assignment probability is computed as follows: the attention value of the prediction vector is obtained as $\mathrm{head}_h = \mathrm{softmax}\left(\frac{XY^{\top}}{\sqrt{\dim}}\right)Z$, where $X$ is the query vector, $Y$ the key vector, and $Z$ the value vector, each obtained from the prediction vector $\hat{u}_{j|i}$; the attention value is taken as the weight coefficient from the input-layer capsules to the output-layer capsules; the weight coefficients are concatenated to obtain the probability value from the input-layer capsules to the output-layer capsules: $c_{ij} = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h, \ldots, \mathrm{head}_H)$, with $H = n_j$;
S4, obtaining the input of each output-layer capsule according to the assignment probability;
S5, activating the input of each output-layer capsule with the Squash activation function to obtain the output of the output-layer capsule.
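Steps S1 to S5 can be sketched for a single spatial position with NumPy. This is an illustrative reading, not the patented implementation: the "inner product" of S1 is interpreted here as elementwise (Hadamard) modulation of a learned kernel by a Gabor filter, as in Gabor convolutional networks; the per-head attention rows are averaged into one coupling coefficient per input capsule; and all shapes, filter parameters, and the random inputs are placeholders.

```python
import numpy as np

def gabor_kernel(size, theta, sigma=2.0, lam=4.0):
    """A simple real-valued Gabor filter; parameters are illustrative."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def squash(s, axis=-1, eps=1e-8):
    """S5: Squash activation -- keeps the direction, maps the norm into [0, 1)."""
    sq = np.sum(s**2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_route(u_hat):
    """S3-S5 at one spatial position.

    u_hat: (n_j, n_i, d_j) -- prediction vector of input capsule i for
    output capsule j; the n_j axis plays the role of the attention heads.
    """
    n_j, n_i, d_j = u_hat.shape
    # Scaled dot-product attention among the n_i predictions, per head.
    scores = np.einsum('jid,jkd->jik', u_hat, u_hat) / np.sqrt(d_j)
    attn = softmax(scores, axis=-1)          # (n_j, n_i, n_i)
    c = attn.mean(axis=1)                    # (n_j, n_i) coupling coefficients
    s = np.einsum('ji,jid->jd', c, u_hat)    # S4: weighted sum = capsule input
    return squash(s), c                      # S5: output-layer capsule outputs

# S1: Gabor-modulated convolution kernel (hypothetical 3x3 example).
rng = np.random.default_rng(0)
conv_kernel = rng.standard_normal((3, 3))
conv_gabor = conv_kernel * gabor_kernel(3, theta=np.pi / 4)

u_hat = rng.standard_normal((4, 8, 16))      # n_j=4, n_i=8, d_j=16
v, c = self_attention_route(u_hat)
```

Because Squash maps the squared norm through sq/(1+sq), every output capsule's length stays below 1 and can be read as an existence probability.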
2. The construction method according to claim 1, wherein the input of each output-layer capsule is calculated by the following formula:
4. A method for classifying multi-view images based on the method for constructing a convolutional capsule layer according to any one of claims 1 to 3, comprising:
inputting an image into a convolutional neural network to obtain a main feature image;
and inputting the main feature image into two of the convolutional capsule layers to obtain a classification result for the multi-view images.
5. The method according to claim 4, wherein the convolutional neural network comprises an input layer, a plurality of convolutional layers, a ReLU layer for setting part of the neuron outputs to 0 to produce sparsity, and a max-pooling layer for compressing the feature image to obtain the main feature image.
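The feature extractor of claim 5 (convolution, then ReLU sparsification, then max-pooling compression) can be sketched minimally in NumPy. This is an illustrative single-channel pipeline under assumed shapes (8×8 input, one 3×3 kernel, 2×2 pooling), not the patented network:

```python
import numpy as np

def relu(x):
    # Sets negative activations to 0, producing the sparsity named in claim 5.
    return np.maximum(0.0, x)

def conv2d(img, kernel):
    """Valid-mode 2-D convolution (cross-correlation) of a single channel."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool(x, k=2):
    """Compresses the feature map by taking the max over k x k windows."""
    H, W = x.shape
    Hc, Wc = (H // k) * k, (W // k) * k
    return x[:Hc, :Wc].reshape(Hc // k, k, Wc // k, k).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.standard_normal((3, 3))
main_feature = max_pool(relu(conv2d(image, kernel)))  # 8x8 -> 6x6 -> 3x3
```

In the claimed classifier, this main feature image would then feed the two stacked convolutional capsule layers.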
6. An apparatus for constructing a convolutional capsule layer, the convolutional capsule layer comprising at least an input layer having a plurality of capsules and an output layer, the apparatus comprising:
an inner-product module for computing the inner product of the Gabor filter and the convolution kernel to obtain the convolutional Gabor filter from the input-layer capsules to the output-layer capsules;
a convolution module for convolving the convolutional Gabor filter with the convolution feature map input by the input layer to obtain the prediction vectors;
a construction module for constructing a self-attention route to obtain the assignment probability from the input-layer capsules to the output-layer capsules, wherein the self-attention route is constructed as follows: transpose the prediction vector [w_j, h_j, n_i, n_j, d_j] to [w_j, h_j, n_j, n_i, d_j], so that the number of elements n_j corresponding to the j-th output-layer capsule serves as the heads of a multi-head attention mechanism, and compute, along the dimension corresponding to n_i, the correlation between the initial prediction vectors of the i-th input-layer capsule after affine transformation, where w_j is the width of the convolution feature map, h_j is the height of the convolution feature map, n_i is the number of elements of the i-th input-layer capsule, n_j is the number of elements of the j-th output-layer capsule, and d_j is the capsule dimension; the assignment probability is computed as follows: obtain the attention value of the prediction vector, head_h = Attention(X, Y, Z) = softmax(X·Yᵀ/√d_j)·Z, where the query vector X, the key vector Y, and the value vector Z are all the prediction vector û_{j|i}; take the attention value as the weight coefficient from the input-layer capsules to the output-layer capsules; and concatenate the weight coefficients to obtain the probability value from the input-layer capsules to the output-layer capsules, c_ij = Concat(head_1, …, head_h, …, head_H), with H = n_j;
an obtaining module for obtaining the input of each output-layer capsule according to the assignment probability;
and an activation module for activating the input of each output-layer capsule with the Squash activation function to obtain the output of the output-layer capsule.
7. A multi-view image classification apparatus based on the method for constructing a convolutional capsule layer according to any one of claims 1 to 3, comprising:
a first input module for inputting an image into the convolutional neural network to obtain a main feature image;
and a second input module for inputting the main feature image into two of the convolutional capsule layers to obtain the classification result for the multi-view images.
8. An electronic device, comprising:
a processor; and
a memory having computer-readable instructions stored thereon that, when executed by the processor, cause the processor to perform the method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010309310.7A CN111507410B (en) | 2020-04-17 | 2020-04-17 | Construction method of rolling capsule layer and classification method and device of multi-view images |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111507410A CN111507410A (en) | 2020-08-07 |
CN111507410B true CN111507410B (en) | 2021-02-12 |
Family
ID=71869444
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010309310.7A Active CN111507410B (en) | 2020-04-17 | 2020-04-17 | Construction method of rolling capsule layer and classification method and device of multi-view images |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111507410B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113205137B (en) * | 2021-04-30 | 2023-06-20 | 中国人民大学 | Image recognition method and system based on capsule parameter optimization |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103456014A (en) * | 2013-09-04 | 2013-12-18 | 西北工业大学 | Scene matching suitability analyzing method based on multiple-feature integrating visual attention model |
CN106097335A (en) * | 2016-06-08 | 2016-11-09 | 安翰光电技术(武汉)有限公司 | Digestive tract focus image identification system and recognition methods |
CN107909059A (en) * | 2017-11-30 | 2018-04-13 | 中南大学 | It is a kind of towards cooperateing with complicated City scenarios the traffic mark board of bionical vision to detect and recognition methods |
CN109063724A (en) * | 2018-06-12 | 2018-12-21 | 中国科学院深圳先进技术研究院 | A kind of enhanced production confrontation network and target sample recognition methods |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100046816A1 (en) * | 2008-08-19 | 2010-02-25 | Igual-Munoz Laura | Method for automatic classification of in vivo images |
Non-Patent Citations (1)
Title |
---|
BDARS_CapsNet: Bi-Directional Attention Routing Sausage Capsule Network; Xin Ning et al.; IEEE Access; 2020-03-23; pp. 59059-59068 *
Also Published As
Publication number | Publication date |
---|---|
CN111507410A (en) | 2020-08-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110175671B (en) | Neural network construction method, image processing method and device | |
US20220108546A1 (en) | Object detection method and apparatus, and computer storage medium | |
US20220092351A1 (en) | Image classification method, neural network training method, and apparatus | |
KR102302725B1 (en) | Room Layout Estimation Methods and Techniques | |
CN112446270B (en) | Training method of pedestrian re-recognition network, pedestrian re-recognition method and device | |
CN112446398B (en) | Image classification method and device | |
US12131521B2 (en) | Image classification method and apparatus | |
US20220215227A1 (en) | Neural Architecture Search Method, Image Processing Method And Apparatus, And Storage Medium | |
CN111914997B (en) | Method for training neural network, image processing method and device | |
WO2022052601A1 (en) | Neural network model training method, and image processing method and device | |
US20230153615A1 (en) | Neural network distillation method and apparatus | |
WO2021008206A1 (en) | Neural architecture search method, and image processing method and device | |
CN111797882B (en) | Image classification method and device | |
US20170083754A1 (en) | Methods and Systems for Verifying Face Images Based on Canonical Images | |
CN113570029A (en) | Method for obtaining neural network model, image processing method and device | |
CN113065645B (en) | Twin attention network, image processing method and device | |
CN111695673B (en) | Method for training neural network predictor, image processing method and device | |
CN111797970B (en) | Method and device for training neural network | |
CN110222718B (en) | Image processing method and device | |
CN112529904B (en) | Image semantic segmentation method, device, computer readable storage medium and chip | |
EP3965071A2 (en) | Method and apparatus for pose identification | |
CN111008631B (en) | Image association method and device, storage medium and electronic device | |
WO2022156475A1 (en) | Neural network model training method and apparatus, and data processing method and apparatus | |
CN113516227A (en) | Neural network training method and device based on federal learning | |
US20220222934A1 (en) | Neural network construction method and apparatus, and image processing method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 2023-01-17
Address after: Room 302, Floor 3, Building 20, No. 2, Jingyuan North Street, Daxing Economic and Technological Development Zone, Beijing, 100176 (Yizhuang Cluster, High-end Industrial Zone, Beijing Pilot Free Trade Zone)
Patentee after: Zhongke Shangyi Health Technology (Beijing) Co., Ltd.
Address before: No. 35, Qinghua East Road, Haidian District, Beijing, 100083
Patentee before: INSTITUTE OF SEMICONDUCTORS, CHINESE ACADEMY OF SCIENCES