CN110334769A - Target identification method and device - Google Patents
Target identification method and device
- Publication number
- CN110334769A CN110334769A CN201910614107.8A CN201910614107A CN110334769A CN 110334769 A CN110334769 A CN 110334769A CN 201910614107 A CN201910614107 A CN 201910614107A CN 110334769 A CN110334769 A CN 110334769A
- Authority
- CN
- China
- Prior art keywords
- layer
- pixel
- depth image
- image
- rgb
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 34
- 238000011176 pooling Methods 0.000 claims description 58
- 238000012545 processing Methods 0.000 claims description 32
- 238000013527 convolutional neural network Methods 0.000 claims description 14
- 238000012549 training Methods 0.000 claims description 9
- 238000013528 artificial neural network Methods 0.000 claims description 5
- 238000010586 diagram Methods 0.000 description 5
- 230000008878 coupling Effects 0.000 description 3
- 238000010168 coupling process Methods 0.000 description 3
- 238000005859 coupling reaction Methods 0.000 description 3
- 230000007613 environmental effect Effects 0.000 description 3
- 238000005286 illumination Methods 0.000 description 3
- 230000008569 process Effects 0.000 description 3
- 238000004364 calculation method Methods 0.000 description 2
- 238000004891 communication Methods 0.000 description 2
- 230000002776 aggregation Effects 0.000 description 1
- 238000004220 aggregation Methods 0.000 description 1
- 230000000295 complement effect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000005215 recombination Methods 0.000 description 1
- 230000006798 recombination Effects 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 230000008439 repair process Effects 0.000 description 1
- 230000001629 suppression Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/77—Retouching; Inpainting; Scratch removal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The embodiments of the present application disclose a target identification method and device. An RGB image and a depth image of a target area are acquired; hole filling is performed on the depth image to obtain a repaired depth image; the repaired depth image is encoded to obtain a three-channel depth image; and the RGB image and the three-channel depth image are input into a pre-trained recognition model to obtain a target recognition result in the RGB image. By performing target recognition with a pre-trained recognition model that combines the RGB image and the depth image, the application improves the accuracy of target recognition.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a target identification method and apparatus.
Background
Current target recognition is based on RGB images: a target is recognized by extracting color features, texture features and contour features from the RGB image. However, because imaging is affected by environmental factors such as illumination, the features extracted in existing RGB-based target recognition cannot fully reflect the usable feature information of the target, and the recognition accuracy of the target is therefore low.
Disclosure of Invention
The application aims to provide a target identification method and a target identification device so as to improve the accuracy of target identification. The technical scheme is as follows:
an object recognition method, comprising:
collecting an RGB image and a depth image of a target area;
filling holes in the depth image to obtain a repaired depth image;
coding the restored depth image to obtain a three-channel depth image;
inputting the RGB image and the three-channel depth image into a pre-trained recognition model to obtain a target recognition result in the RGB image; the identification model is obtained by training a plurality of labeled RGB images and depth images corresponding to the labeled RGB images serving as samples in advance.
In the above method, preferably, the filling holes in the depth image to obtain a repaired depth image includes:
carrying out binarization processing on the depth image to obtain a mask;
determining a hole point in the depth image according to the mask;
clustering pixel values in the grayed RGB images to obtain clustered images, wherein the clustered images identify pixel points with approximate pixel values in the grayed RGB images;
determining a first pixel corresponding to the void point and all second pixels of the same kind as the first pixel in the grayed RGB image, wherein the second pixels correspond to non-void points in the depth image;
calculating the distance between the first pixel and each second pixel;
and taking the depth value corresponding to the second pixel with the shortest distance to the first pixel as the filling value of the hole point.
In the above method, preferably, the filling holes in the depth image to obtain a repaired depth image includes:
carrying out binarization processing on the depth image to obtain a mask;
determining a hole point in the depth image according to the mask;
determining a first pixel corresponding to the void point and a second pixel in a preset neighborhood of the first pixel in the RGB image, wherein the second pixel is a pixel in the preset neighborhood corresponding to a non-void point;
calculating the distance between the first pixel and each second pixel;
and taking the depth value corresponding to the second pixel with the shortest distance to the first pixel as the filling value of the hole point.
The above method, preferably, the identification model includes:
a deep network unit and a convolutional neural network unit; wherein,
the depth network unit is used for processing the three-channel depth image so as to extract the characteristics of the three-channel depth image;
the convolution neural network unit is used for processing the RGB image, extracting the characteristics of the RGB image, processing the characteristics of the three-channel depth image and the characteristics of the RGB image, and obtaining a target identification result in the RGB image.
In the above method, preferably, the deep network unit includes: three layers of multilayer perceptron convolution layers;
the convolutional neural network unit includes: two convolution-pooling layers; a two-layer first Inception module connected to the two convolution-pooling layers; a first pooling layer connected to the two-layer first Inception module; a five-layer second Inception module connected to the first pooling layer; a second pooling layer connected to the five-layer second Inception module; a two-layer third Inception module connected to the second pooling layer; a third pooling layer connected to the two-layer third Inception module; a signal loss layer connected to the third pooling layer; a linear layer connected to the signal loss layer; a classification layer connected to the linear layer; a decision layer connected to the classification layer; and an output layer connected to the decision layer.
An object recognition apparatus comprising:
the acquisition module is used for acquiring the RGB image and the depth image of the target area;
the filling module is used for filling the hole in the depth image to obtain a repaired depth image;
the coding module is used for coding the repaired depth image to obtain a three-channel depth image;
the recognition module is used for inputting the RGB image and the three-channel depth image into a pre-trained recognition model to obtain a target recognition result in the RGB image; the identification model is obtained by training a plurality of labeled RGB images and depth images corresponding to the labeled RGB images serving as samples in advance.
The above apparatus, preferably, the filling module includes:
a binarization unit, configured to perform binarization processing on the depth image to obtain a mask;
a first determining unit, configured to determine a hole point in the depth image according to the mask;
the clustering unit is used for clustering the pixel values in the grayed RGB images to obtain clustered images, and the clustered images identify pixels with approximate pixel values in the grayed RGB images;
a second determining unit, configured to determine, in the grayed RGB image, a first pixel corresponding to the hole point and all second pixels of the same kind as the first pixel, where the second pixels correspond to non-hole points in the depth image;
a calculating unit for calculating a distance between the first pixel and each of the second pixels;
and a filling unit configured to use the depth value corresponding to the second pixel having the shortest distance to the first pixel as the filling value of the hole point.
The above apparatus, preferably, the filling module includes:
a binarization unit, configured to perform binarization processing on the depth image to obtain a mask;
a first determining unit, configured to determine a hole point in the depth image according to the mask;
a third determining unit, configured to determine, in the RGB image, a first pixel corresponding to the void point and a second pixel in a preset neighborhood of the first pixel, where the second pixel is a pixel in the preset neighborhood corresponding to a non-void point;
a calculating unit for calculating a distance between the first pixel and each of the second pixels;
and a filling unit configured to use the depth value corresponding to the second pixel having the shortest distance to the first pixel as the filling value of the hole point.
The above apparatus, preferably, the identification model includes: a deep network unit and a convolutional neural network unit; wherein,
the depth network unit is used for processing the three-channel depth image so as to extract the characteristics of the three-channel depth image;
the convolution neural network unit is used for processing the RGB image, extracting the characteristics of the RGB image, processing the characteristics of the three-channel depth image and the characteristics of the RGB image, and obtaining a target identification result in the RGB image.
The above apparatus, preferably, the deep network unit includes: three layers of multilayer perceptron convolution layers;
the convolutional neural network unit includes: two convolution-pooling layers; a two-layer first Inception module connected to the two convolution-pooling layers; a first pooling layer connected to the two-layer first Inception module; a five-layer second Inception module connected to the first pooling layer; a second pooling layer connected to the five-layer second Inception module; a two-layer third Inception module connected to the second pooling layer; a third pooling layer connected to the two-layer third Inception module; a signal loss layer connected to the third pooling layer; a linear layer connected to the signal loss layer; a classification layer connected to the linear layer; a decision layer connected to the classification layer; and an output layer connected to the decision layer.
According to the scheme, the target identification method and the target identification device collect the RGB image and the depth image of the target area; filling holes in the depth image to obtain a repaired depth image; coding the restored depth image to obtain a three-channel depth image; and inputting the RGB image and the three-channel depth image into a pre-trained recognition model to obtain a target recognition result in the RGB image. According to the method and the device, the target recognition is carried out by utilizing the pre-trained recognition model and combining the RGB image and the depth image, the accuracy of the target recognition is improved, and the problem that the recognition accuracy is low due to the influence of environmental factors such as illumination on the existing target recognition method is solved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of an implementation of a target identification method according to an embodiment of the present application;
fig. 2 is a flowchart of an implementation of filling a hole in a depth image to obtain a restored depth image according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a recognition model provided in an embodiment of the present application;
fig. 4 is an exemplary diagram of an Inception module provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of an object recognition apparatus according to an embodiment of the present application;
fig. 6 is a frame of image to be subjected to target recognition according to an embodiment of the present disclosure;
fig. 7 is a target recognition result obtained by processing the image shown in fig. 6 and the corresponding depth image based on the target recognition method provided in the embodiment of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be practiced otherwise than as specifically illustrated.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of an implementation of a target identification method according to an embodiment of the present application, which may include:
step S101: the RGB image and the depth image of the target area are collected.
An RGB-D depth camera may be employed to capture the RGB image and the depth image of the target area. When the RGB-D depth camera collects images, one frame of depth image can be collected at the same time as each frame of RGB image. For display, only the RGB image needs to be shown.
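As an illustration only, the following is a minimal capture sketch in Python assuming an Intel RealSense sensor and the pyrealsense2 package; the embodiment does not name a specific camera model, and any RGB-D device that returns paired, aligned color and depth frames could be substituted.

```python
# Minimal RGB-D capture sketch (assumes an Intel RealSense sensor and pyrealsense2;
# the embodiment only requires a camera that returns paired RGB and depth frames).
import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

# Align the depth frame to the color frame so pixels correspond one-to-one.
align = rs.align(rs.stream.color)
try:
    frames = align.process(pipeline.wait_for_frames())
    depth = np.asanyarray(frames.get_depth_frame().get_data())  # uint16 depth values
    rgb = np.asanyarray(frames.get_color_frame().get_data())    # uint8, H x W x 3 (BGR)
finally:
    pipeline.stop()
```

A depth array captured this way contains zero values at invalid pixels, which is exactly where the hole filling of step S102 applies.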
Step S102: and filling holes in the depth image to obtain a repaired depth image.
There are often holes in the depth image acquired with the depth camera that need to be repaired. In an alternative embodiment, the holes may be filled with depth values of pixels around the holes to repair the depth map.
Step S103: and coding the repaired depth image to obtain a three-channel depth image.
Optionally, the repaired depth image may be encoded using the HHA encoding method; the three channels of the resulting three-channel depth image are the horizontal disparity, the height above the ground, and the angle between the surface normal and the direction of gravity. HHA encoding emphasizes the complementary information between the channels.
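The following Python sketch illustrates a simplified HHA-style encoding under stated assumptions: the camera intrinsics (fx, fy, cx, cy) and the gravity direction are taken as known, the height channel is approximated by the coordinate along the assumed gravity axis, and the angle channel uses surface normals estimated from depth gradients. It is an approximation for illustration, not the exact encoder of the embodiment.

```python
# Simplified HHA-style encoding sketch (an approximation, not the exact encoder of the
# embodiment): channel 1 ~ horizontal disparity, channel 2 ~ height above the ground,
# channel 3 ~ angle between the surface normal and the (assumed known) gravity direction.
import numpy as np

def encode_hha(depth_m, fx, fy, cx, cy, gravity=np.array([0.0, -1.0, 0.0])):
    """depth_m: float depth map in meters, zeros/holes already repaired or masked."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = np.where(depth_m > 0, depth_m, np.nan)

    # Back-project to camera coordinates using the assumed intrinsics.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy

    # Channel 1: horizontal disparity (proportional to inverse depth).
    disparity = 1.0 / z

    # Channel 2: height above ground, approximated as the signed coordinate along the
    # assumed gravity direction, shifted so the lowest point is zero (the full HHA
    # encoding instead estimates a ground plane).
    height = -(x * gravity[0] + y * gravity[1] + z * gravity[2])
    height -= np.nanmin(height)

    # Channel 3: angle between the surface normal (from depth gradients) and gravity.
    dzdx = np.gradient(z, axis=1)
    dzdy = np.gradient(z, axis=0)
    normals = np.dstack((-dzdx, -dzdy, np.ones_like(z)))
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    angle = np.degrees(np.arccos(np.clip(normals @ gravity, -1.0, 1.0)))

    def to_uint8(c):
        c = np.nan_to_num(c, nan=0.0)
        lo, hi = c.min(), c.max()
        return np.uint8(255 * (c - lo) / (hi - lo + 1e-6))

    return np.dstack([to_uint8(disparity), to_uint8(height), to_uint8(angle)])
```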
Step S104: inputting the RGB image and the three-channel depth image into a pre-trained recognition model to obtain a target recognition result in the RGB image; the recognition model is obtained by training a plurality of RGB images marked with targets and depth images corresponding to the marked RGB images as samples in advance.
In the embodiment of the application, a plurality of pairs of RGB images and depth images acquired by an RGB-D depth camera are used in advance as training samples, and the annotation information corresponding to each RGB image is used as the label, so that the recognition model is obtained by training. The annotation information corresponding to an RGB image may include a text identifier corresponding to a specified region in the RGB image. The text identifier need not be drawn into the RGB image; instead, it is stored in association with the RGB image together with the specified-region information (the specified region is marked by a graphic in the RGB image, for example a rectangular box), where the specified-region information indicates the position of the target in the RGB image.
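Purely for illustration, one training sample as described above might be organized as follows; the field names and the bounding-box format are assumptions, not part of the embodiment.

```python
# Hypothetical layout of one training sample: the RGB image, its paired depth image,
# and annotation information stored alongside the RGB image rather than drawn into it.
sample = {
    "rgb_path": "samples/000123_rgb.png",
    "depth_path": "samples/000123_depth.png",
    "annotations": [
        {"label": "chair",               # text identifier of the target
         "bbox": [120, 80, 260, 300]},   # rectangular region: x1, y1, x2, y2 (assumed format)
    ],
}
```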
According to the target recognition method, the pre-trained recognition model is used, the depth image and the RGB image are combined to perform target recognition, the target recognition precision is improved, and the problem that the recognition accuracy is low due to the fact that the existing target recognition method is influenced by environmental factors such as illumination is solved.
In an optional embodiment, an implementation flowchart of the above hole filling for a depth image to obtain a restored depth image is shown in fig. 2, and may include:
step S201: and carrying out binarization processing on the depth image to obtain a mask.
Optionally, in the depth image, a point with a depth value of zero may be binarized to 0, and a point with a non-zero depth value may be binarized to 255, which may be expressed as:
mask(i, j) = 0 if A(i, j) = 0, and mask(i, j) = 255 if A(i, j) ≠ 0,
where mask denotes the mask and A(i, j) denotes the depth value at position (i, j).
Step S202: determining a hole point in the depth image according to the mask.
The hole point is a point in the mask with a value of 0.
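A short sketch of steps S201-S202, assuming Python with numpy and following the 0/255 convention given above:

```python
# Sketch of steps S201-S202: build a binary mask from the depth image and locate
# the hole points (pixels whose depth value is zero).
import numpy as np

def depth_mask_and_holes(depth):
    mask = np.where(depth == 0, 0, 255).astype(np.uint8)  # mask(i, j) per the formula above
    hole_rows, hole_cols = np.where(mask == 0)            # hole points: mask value 0
    return mask, list(zip(hole_rows.tolist(), hole_cols.tolist()))
```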
Step S203: and clustering the pixel values in the grayed RGB image to obtain a cluster image, where the cluster image identifies pixels with similar pixel values in the grayed RGB image.
The grayed RGB image refers to a grayscale image converted from an RGB image. Optionally, a K-means algorithm may be used to cluster pixel values in the grayed RGB image. Alternatively, other clustering algorithms may be used to cluster the pixel values in the grayed RGB images, such as hierarchical clustering algorithms. The cluster image characterizes which pixels in the grayed RGB image have similar pixel values.
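A sketch of step S203, assuming Python with OpenCV; the number of clusters k is an assumed parameter that the embodiment does not fix, and BGR channel ordering is assumed for the input image.

```python
# Sketch of step S203: gray the RGB image and cluster its pixel values with K-means
# (K-means is one of the clustering options named in the embodiment).
import cv2
import numpy as np

def cluster_gray_values(bgr, k=8):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    samples = gray.reshape(-1, 1).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, _ = cv2.kmeans(samples, k, None, criteria, 3, cv2.KMEANS_PP_CENTERS)
    # Cluster image: pixels sharing a label have similar gray values.
    return gray, labels.reshape(gray.shape)
```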
Step S204: and determining, in the grayed RGB image, the first pixel corresponding to the hole point and all second pixels belonging to the same cluster as the first pixel, where the second pixels correspond to non-hole points in the depth image.
Pixels in the RGB image correspond to pixels in the depth map one-to-one. The pixels belonging to the same cluster include both a first pixel corresponding to a hole point and a second pixel corresponding to a non-hole point.
Step S205: and calculating the distance between the first pixel and each second pixel.
In the embodiment of the present application, for the first pixel corresponding to each hole point, the distance between the first pixel and each second pixel in the same cluster is calculated from the pixel values (i.e., the gray values); the distance may be a Euclidean distance or another distance, such as a cosine similarity distance.
In an alternative embodiment, the distance between the first pixel and the second pixel may be a combined distance calculated from the Euclidean distance between their pixel values and the image pixel distance between them. The image pixel distance is the distance between two pixels measured in pixel coordinates. For example,
assume pixel a lies at row 10, column 30 of the image and pixel b lies at row 13, column 34; then the distance between the two pixels is 3 in the row direction and 4 in the column direction, so the image pixel distance between pixel a and pixel b is 5. After the Euclidean distance and the image pixel distance between pixel a and pixel b are obtained, the combined distance between them may be taken as the sum of the two (i.e., the Euclidean distance plus the image pixel distance), as a weighted sum of the two, or as the sum of the two after each is squared (optionally with weights).
Step S206: and taking the depth value corresponding to the second pixel with the shortest distance to the first pixel as the filling value of the hole point. That is, the hole point is filled with the depth value of the second pixel that is closest to the first pixel.
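Steps S204-S206 can be sketched as follows, assuming Python with numpy; the equal weighting of the gray-value distance and the image pixel distance in the combined distance is an assumption, since the embodiment also allows weighted variants.

```python
# Sketch of steps S204-S206: for each hole point, search the non-hole pixels in the
# same gray-value cluster and copy the depth of the nearest one, where "nearest" uses
# the combined distance described above (gray-value difference + image pixel distance).
import numpy as np

def fill_holes_by_cluster(depth, gray, cluster_img):
    filled = depth.copy().astype(np.float32)
    holes = np.argwhere(depth == 0)
    for (i, j) in holes:
        same_cluster = (cluster_img == cluster_img[i, j]) & (depth > 0)
        cand = np.argwhere(same_cluster)          # second pixels: non-hole, same cluster
        if cand.size == 0:
            continue
        gray_dist = np.abs(gray[cand[:, 0], cand[:, 1]].astype(np.float32) - float(gray[i, j]))
        pixel_dist = np.hypot(cand[:, 0] - i, cand[:, 1] - j)
        combined = gray_dist + pixel_dist         # equal weights assumed
        nearest = cand[np.argmin(combined)]
        filled[i, j] = depth[nearest[0], nearest[1]]  # fill with the nearest second pixel's depth
    return filled
```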
In an optional embodiment, the above-mentioned hole filling on the depth image to obtain the repaired depth image may alternatively be implemented as follows:
and carrying out binarization processing on the depth image to obtain a mask.
Determining a hole point in the depth image according to the mask.
The implementation of the above two steps can refer to the foregoing embodiments, and is not described in detail here.
And determining a first pixel corresponding to the void point and a second pixel in a preset neighborhood of the first pixel in the RGB image, wherein the second pixel is a pixel in the preset neighborhood corresponding to a non-void point in the depth image.
In this embodiment, the second pixel is a pixel in the neighborhood of the first pixel.
And calculating the distance between the first pixel and each second pixel. The calculation process can be referred to the foregoing embodiments and will not be described in detail here.
And taking the depth value corresponding to the second pixel with the shortest distance to the first pixel as the filling value of the hole point.
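A sketch of this neighborhood-based variant, assuming Python with numpy; the window radius and the use of the gray-value difference as the distance are assumptions, since the embodiment leaves both the neighborhood size and the distance measure open.

```python
# Sketch of the neighborhood variant: candidate second pixels are restricted to a
# preset window around the hole pixel instead of the whole cluster.
import numpy as np

def fill_holes_by_neighborhood(depth, gray, radius=5):
    filled = depth.copy().astype(np.float32)
    h, w = depth.shape
    for (i, j) in np.argwhere(depth == 0):
        r0, r1 = max(0, i - radius), min(h, i + radius + 1)
        c0, c1 = max(0, j - radius), min(w, j + radius + 1)
        window_depth = depth[r0:r1, c0:c1]
        window_gray = gray[r0:r1, c0:c1].astype(np.float32)
        valid = window_depth > 0                  # second pixels: non-hole neighbors
        if not valid.any():
            continue
        dist = np.abs(window_gray - float(gray[i, j]))  # gray-value distance (assumed choice)
        dist[~valid] = np.inf
        ni, nj = np.unravel_index(np.argmin(dist), dist.shape)
        filled[i, j] = window_depth[ni, nj]
    return filled
```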
In an alternative embodiment, a schematic structural diagram of the recognition model is shown in fig. 3, and may include: a deep network unit and a convolutional neural network unit; wherein,
the depth network unit (NIN network unit for short) is configured to process the three-channel depth image to extract features of the three-channel depth image. In the example shown in fig. 3, the input three-channel depth image is HHA _ Img, with a size of 300 × 300.
The convolutional neural network unit (CNN network unit for short) is used for processing the RGB image, extracting the characteristics of the RGB image, and processing the characteristics of the three-channel depth image and the characteristics of the RGB image to obtain a target identification result in the RGB image. In the example shown in fig. 3, the input RGB image is RGB _ Img, and the size is 300 × 300.
Optionally, the NIN network unit includes three multilayer-perceptron convolutional layers (i.e., three mlpconv layers, identified as NIN1, NIN2 and NIN3 in fig. 3). An mlpconv layer performs a normal convolution followed by a conventional MLP (multilayer perceptron). The multilayer perceptron here is a two-layer perceptron (input layer plus one hidden layer): it applies a weighted linear recombination to the elements at the same position across the feature maps output by the ordinary convolution, which is equivalent to the result of a 1×1 convolution on that local block; applying this operation at every position of the feature map is therefore equivalent to a 1×1 convolution. Because ordinary convolution is linear while the MLP is non-linear, the latter allows a higher level of abstraction and thus greater generalization capability. In the cross-channel case, mlpconv is equivalent to a convolutional layer followed by 1×1 convolutional layers.
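A sketch of one such mlpconv layer, assuming PyTorch; the channel counts and kernel sizes in the example stack are illustrative assumptions rather than values taken from the embodiment.

```python
# Sketch of an mlpconv layer as described above: a normal convolution followed by
# two 1x1 convolutions (the cross-channel MLP), each with a non-linearity.
import torch.nn as nn

def mlpconv(in_ch, out_ch, kernel_size, stride=1, padding=0):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=1), nn.ReLU(inplace=True),  # 1x1 conv = per-position MLP
        nn.Conv2d(out_ch, out_ch, kernel_size=1), nn.ReLU(inplace=True),
    )

# A NIN-style depth unit could then be stacked from three such layers, e.g.:
# nin_unit = nn.Sequential(mlpconv(3, 96, 11, 4), mlpconv(96, 256, 5, 1, 2), mlpconv(256, 384, 3, 1, 1))
```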
The CNN network unit comprises: two convolution-pooling layers; a two-layer first Inception module connected to the two convolution-pooling layers; a first pooling layer connected to the two-layer first Inception module; a five-layer second Inception module connected to the first pooling layer; a second pooling layer connected to the five-layer second Inception module; a two-layer third Inception module connected to the second pooling layer; a third pooling layer connected to the two-layer third Inception module; a signal loss layer connected to the third pooling layer; a linear layer connected to the signal loss layer; a classification layer connected to the linear layer; a decision layer connected to the classification layer; and an output layer connected to the decision layer.
Taking fig. 3 as an example, the 7×7 convolution layer Conv_7×7, a max pooling layer maxpool, the 3×3 convolution layer Conv_3×3 and another max pooling layer maxpool, connected in sequence, constitute the two convolution-pooling layers; Inception(3a) and Inception(3b), connected in sequence, form the two-layer first Inception module; the max pooling layer maxpool connected to Inception(3b) constitutes the first pooling layer; the sequentially connected Inception(4a)-Inception(4e) form the five-layer second Inception module; the max pooling layer maxpool connected to Inception(4e) forms the second pooling layer; Inception(5a) and Inception(5b), connected in sequence, form the two-layer third Inception module; the average pooling layer avgpool connected to Inception(5b) forms the third pooling layer; dropout is the signal loss layer; linear is the linear layer; softmax is the classification layer; detection is the decision layer; and non-maximum suppression is the output layer.
The Inception module convolves the features output by the previous layer at multiple scales in parallel and then re-aggregates them. An example of the Inception module is shown in fig. 4. The convolution kernel sizes used in the module are 1×1, 3×3 and 5×5: kernels of different sizes provide receptive fields of different sizes, and the final concatenation aggregates features at different scales. The kernel sizes 1×1, 3×3 and 5×5 are chosen mainly for convenience of alignment: with the convolution stride set to 1 and the padding parameters set to 0, 1 and 2 respectively, the convolutions produce features of the same spatial dimensions, which can then be concatenated directly. A 3×3 pooling branch is also introduced into the module. The deeper the network goes, the more abstract the features become and the larger the receptive field each feature covers; therefore, as the number of layers increases, the proportion of 3×3 and 5×5 convolutions also increases. However, 5×5 convolution kernels still incur a large amount of computation, so 1×1 convolution kernels are used for dimensionality reduction.
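A sketch of an Inception block matching this description, assuming PyTorch; the branch widths are illustrative assumptions.

```python
# Sketch of an Inception block per Fig. 4: parallel 1x1, 3x3 and 5x5 convolutions
# (with 1x1 reductions before the larger kernels) plus a 3x3 pooling branch, padded
# so all branches keep the spatial size and can be concatenated along channels.
import torch
import torch.nn as nn

class Inception(nn.Module):
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, 1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, c3_red, 1), nn.ReLU(True),
                                nn.Conv2d(c3_red, c3, 3, padding=1))   # pad 1 keeps size
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, c5_red, 1), nn.ReLU(True),
                                nn.Conv2d(c5_red, c5, 5, padding=2))   # pad 2 keeps size
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, pool_proj, 1))
    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

# Example: Inception(192, 64, 96, 128, 16, 32, 32) mirrors the widths of the GoogLeNet "3a" block.
```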
Corresponding to the embodiment of the method, the present application further provides a target identification device, and a schematic structural diagram of the target identification device provided by the present application is shown in fig. 5, and the target identification device may include:
an acquisition module 51, a filling module 52, an encoding module 53 and an identification module 54; wherein,
the acquisition module 51 is configured to acquire an RGB image and a depth image of the target area;
the filling module 52 is configured to perform hole filling on the depth image to obtain a repaired depth image;
the encoding module 53 is configured to encode the repaired depth image to obtain a three-channel depth image;
the recognition module 54 is configured to input the RGB image and the three-channel depth image into a pre-trained recognition model to obtain a target recognition result in the RGB image; the identification model is obtained by training a plurality of labeled RGB images and depth images corresponding to the labeled RGB images serving as samples in advance.
The target recognition device provided by the application acquires an RGB image and a depth image of a target area; filling holes in the depth image to obtain a repaired depth image; coding the restored depth image to obtain a three-channel depth image; and inputting the RGB image and the three-channel depth image into a pre-trained recognition model to obtain a target recognition result in the RGB image. According to the method and the device, the target recognition is carried out by utilizing the recognition model trained in advance and combining the RGB image and the depth image, and the accuracy of the target recognition is improved.
In an alternative embodiment, the filling module 52 may include:
a binarization unit, configured to perform binarization processing on the depth image to obtain a mask;
a first determining unit, configured to determine a hole point in the depth image according to the mask;
the clustering unit is used for clustering the pixel values in the grayed RGB images to obtain clustered images, and the clustered images identify pixels with approximate pixel values in the grayed RGB images;
a second determining unit, configured to determine, in the grayed RGB image, a first pixel corresponding to the hole point and all second pixels of the same kind as the first pixel, where the second pixels correspond to non-hole points in the depth image;
a calculating unit for calculating a distance between the first pixel and each of the second pixels;
and a filling unit configured to use the depth value corresponding to the second pixel having the shortest distance to the first pixel as the filling value of the hole point.
In an alternative embodiment, the filling module 52 may include:
a binarization unit, configured to perform binarization processing on the depth image to obtain a mask;
a first determining unit, configured to determine a hole point in the depth image according to the mask;
a third determining unit, configured to determine, in the RGB image, a first pixel corresponding to the void point and a second pixel in a preset neighborhood of the first pixel, where the second pixel is a pixel in the preset neighborhood corresponding to a non-void point;
a calculating unit for calculating a distance between the first pixel and each of the second pixels;
and a filling unit configured to use the depth value corresponding to the second pixel having the shortest distance to the first pixel as the filling value of the hole point.
In an alternative embodiment, the encoding module 53 may specifically be configured to: and performing HHA coding on the repaired depth image to obtain a three-channel depth image.
In an alternative embodiment, the identifying the model may include: a deep network unit and a convolutional neural network unit; wherein,
the depth network unit is used for processing the three-channel depth image so as to extract the characteristics of the three-channel depth image;
the convolution neural network unit is used for processing the RGB image, extracting the characteristics of the RGB image, processing the characteristics of the three-channel depth image and the characteristics of the RGB image, and obtaining a target identification result in the RGB image.
In an optional embodiment, the deep network unit includes: three layers of multilayer perceptron convolution layers;
the convolutional neural network unit includes: two convolution-pooling layers; a two-layer first Inception module connected to the two convolution-pooling layers; a first pooling layer connected to the two-layer first Inception module; a five-layer second Inception module connected to the first pooling layer; a second pooling layer connected to the five-layer second Inception module; a two-layer third Inception module connected to the second pooling layer; a third pooling layer connected to the two-layer third Inception module; a signal loss layer connected to the third pooling layer; a linear layer connected to the signal loss layer; a classification layer connected to the linear layer; a decision layer connected to the classification layer; and an output layer connected to the decision layer.
As shown in figs. 6-7, fig. 6 is a frame of image to be subjected to target recognition; this frame and its corresponding depth image are processed with the target recognition method provided by the present application, and the resulting target recognition result is shown in fig. 7. In this example, the target is a chair, and the recognition model is trained with training samples that contain chairs, yielding a chair recognition model.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
It should be understood that the technical problems can be solved by combining features of the embodiments and of the claims.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A method of object recognition, comprising:
collecting an RGB image and a depth image of a target area;
filling holes in the depth image to obtain a repaired depth image;
coding the restored depth image to obtain a three-channel depth image;
inputting the RGB image and the three-channel depth image into a pre-trained recognition model to obtain a target recognition result in the RGB image; the identification model is obtained by training a plurality of labeled RGB images and depth images corresponding to the labeled RGB images serving as samples in advance.
2. The method of claim 1, wherein the filling holes in the depth image to obtain a repaired depth image comprises:
carrying out binarization processing on the depth image to obtain a mask;
determining a hole point in the depth image according to the mask;
clustering pixel values in the grayed RGB images to obtain clustered images, wherein the clustered images identify pixel points with approximate pixel values in the grayed RGB images;
determining a first pixel corresponding to the void point and all second pixels of the same kind as the first pixel in the grayed RGB image, wherein the second pixels correspond to non-void points in the depth image;
calculating the distance between the first pixel and each second pixel;
and taking the depth value corresponding to the second pixel with the shortest distance to the first pixel as the filling value of the hole point.
3. The method of claim 1, wherein the filling holes in the depth image to obtain a repaired depth image comprises:
carrying out binarization processing on the depth image to obtain a mask;
determining a hole point in the depth image according to the mask;
determining a first pixel corresponding to the void point and a second pixel in a preset neighborhood of the first pixel in the RGB image, wherein the second pixel is a pixel in the preset neighborhood corresponding to a non-void point;
calculating the distance between the first pixel and each second pixel;
and taking the depth value corresponding to the second pixel with the shortest distance to the first pixel as the filling value of the hole point.
4. The method of claim 1, wherein the recognition model comprises:
a deep network unit and a convolutional neural network unit; wherein,
the depth network unit is used for processing the three-channel depth image so as to extract the characteristics of the three-channel depth image;
the convolution neural network unit is used for processing the RGB image, extracting the characteristics of the RGB image, processing the characteristics of the three-channel depth image and the characteristics of the RGB image, and obtaining a target identification result in the RGB image.
5. The method of claim 4, wherein the deep network unit comprises: three layers of multilayer perceptron convolution layers;
the convolutional neural network unit includes: two convolution-pooling layers; a two-layer first Inception module connected to the two convolution-pooling layers; a first pooling layer connected to the two-layer first Inception module; a five-layer second Inception module connected to the first pooling layer; a second pooling layer connected to the five-layer second Inception module; a two-layer third Inception module connected to the second pooling layer; a third pooling layer connected to the two-layer third Inception module; a signal loss layer connected to the third pooling layer; a linear layer connected to the signal loss layer; a classification layer connected to the linear layer; a decision layer connected to the classification layer; and an output layer connected to the decision layer.
6. An object recognition apparatus, comprising:
the acquisition module is used for acquiring the RGB image and the depth image of the target area;
the filling module is used for filling the hole in the depth image to obtain a repaired depth image;
the coding module is used for coding the repaired depth image to obtain a three-channel depth image;
the recognition module is used for inputting the RGB image and the three-channel depth image into a pre-trained recognition model to obtain a target recognition result in the RGB image; the identification model is obtained by training a plurality of labeled RGB images and depth images corresponding to the labeled RGB images serving as samples in advance.
7. The apparatus of claim 6, wherein the fill module comprises:
a binarization unit, configured to perform binarization processing on the depth image to obtain a mask;
a first determining unit, configured to determine a hole point in the depth image according to the mask;
the clustering unit is used for clustering the pixel values in the grayed RGB images to obtain clustered images, and the clustered images identify pixels with approximate pixel values in the grayed RGB images;
a second determining unit, configured to determine, in the grayed RGB image, a first pixel corresponding to the hole point and all second pixels of the same kind as the first pixel, where the second pixels correspond to non-hole points in the depth image;
a calculating unit for calculating a distance between the first pixel and each of the second pixels;
and a filling unit configured to use the depth value corresponding to the second pixel having the shortest distance to the first pixel as the filling value of the hole point.
8. The apparatus of claim 6, wherein the fill module comprises:
a binarization unit, configured to perform binarization processing on the depth image to obtain a mask;
a first determining unit, configured to determine a hole point in the depth image according to the mask;
a third determining unit, configured to determine, in the RGB image, a first pixel corresponding to the void point and a second pixel in a preset neighborhood of the first pixel, where the second pixel is a pixel in the preset neighborhood corresponding to a non-void point;
a calculating unit for calculating a distance between the first pixel and each of the second pixels;
and a filling unit configured to use the depth value corresponding to the second pixel having the shortest distance to the first pixel as the filling value of the hole point.
9. The apparatus of claim 6, wherein the recognition model comprises: a deep network unit and a convolutional neural network unit; wherein,
the depth network unit is used for processing the three-channel depth image so as to extract the characteristics of the three-channel depth image;
the convolution neural network unit is used for processing the RGB image, extracting the characteristics of the RGB image, processing the characteristics of the three-channel depth image and the characteristics of the RGB image, and obtaining a target identification result in the RGB image.
10. The apparatus of claim 9, wherein the deep network unit comprises: three layers of multilayer perceptron convolution layers;
the convolutional neural network unit includes: two convolution-pooling layers; a two-layer first Inception module connected to the two convolution-pooling layers; a first pooling layer connected to the two-layer first Inception module; a five-layer second Inception module connected to the first pooling layer; a second pooling layer connected to the five-layer second Inception module; a two-layer third Inception module connected to the second pooling layer; a third pooling layer connected to the two-layer third Inception module; a signal loss layer connected to the third pooling layer; a linear layer connected to the signal loss layer; a classification layer connected to the linear layer; a decision layer connected to the classification layer; and an output layer connected to the decision layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910614107.8A CN110334769A (en) | 2019-07-09 | 2019-07-09 | Target identification method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910614107.8A CN110334769A (en) | 2019-07-09 | 2019-07-09 | Target identification method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110334769A true CN110334769A (en) | 2019-10-15 |
Family
ID=68143410
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910614107.8A Pending CN110334769A (en) | 2019-07-09 | 2019-07-09 | Target identification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110334769A (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102447925A (en) * | 2011-09-09 | 2012-05-09 | 青岛海信数字多媒体技术国家重点实验室有限公司 | Virtual viewpoint image synthesis method and device |
CN102625127A (en) * | 2012-03-24 | 2012-08-01 | 山东大学 | Optimization method suitable for virtual viewpoint generation of 3D television |
CN103236082A (en) * | 2013-04-27 | 2013-08-07 | 南京邮电大学 | Quasi-three dimensional reconstruction method for acquiring two-dimensional videos of static scenes |
CN103248909A (en) * | 2013-05-21 | 2013-08-14 | 清华大学 | Method and system of converting monocular video into stereoscopic video |
US10062004B2 (en) * | 2015-08-20 | 2018-08-28 | Kabushiki Kaisha Toshiba | Arrangement detection apparatus and pickup apparatus |
US20170069071A1 (en) * | 2015-09-04 | 2017-03-09 | Electronics And Telecommunications Research Institute | Apparatus and method for extracting person region based on red/green/blue-depth image |
CN106651871A (en) * | 2016-11-18 | 2017-05-10 | 华东师范大学 | Automatic filling method for cavities in depth image |
CN108230380A (en) * | 2016-12-09 | 2018-06-29 | 广东技术师范学院 | Indoor entrance detection method based on the three-dimensional depth of field |
CN107977650A (en) * | 2017-12-21 | 2018-05-01 | 北京华捷艾米科技有限公司 | Method for detecting human face and device |
CN108734210A (en) * | 2018-05-17 | 2018-11-02 | 浙江工业大学 | A kind of method for checking object based on cross-module state multi-scale feature fusion |
CN109636732A (en) * | 2018-10-24 | 2019-04-16 | 深圳先进技术研究院 | A kind of empty restorative procedure and image processing apparatus of depth image |
Non-Patent Citations (1)
Title |
---|
LUKAS SCHNEIDER et al.: "Multimodal Neural Networks: RGB-D for Semantic Segmentation and Object Detection", Scandinavian Conference on Image Analysis *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112102199A (en) * | 2020-09-18 | 2020-12-18 | 贝壳技术有限公司 | Method, device and system for filling hole area of depth image |
CN112102199B (en) * | 2020-09-18 | 2024-11-08 | 贝壳技术有限公司 | Depth image cavity region filling method, device and system |
CN113393421A (en) * | 2021-05-08 | 2021-09-14 | 深圳市识农智能科技有限公司 | Fruit evaluation method and device and inspection equipment |
CN113902786A (en) * | 2021-09-23 | 2022-01-07 | 珠海视熙科技有限公司 | Depth image preprocessing method, system and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191015 |