
CN110334769A - Target identification method and device - Google Patents

Target identification method and device

Info

Publication number
CN110334769A
Authority
CN
China
Prior art keywords
layer
pixel
depth image
image
rgb
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910614107.8A
Other languages
Chinese (zh)
Inventor
郭建亚
李骊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing HJIMI Technology Co Ltd
Original Assignee
Beijing HJIMI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing HJIMI Technology Co Ltd filed Critical Beijing HJIMI Technology Co Ltd
Priority to CN201910614107.8A priority Critical patent/CN110334769A/en
Publication of CN110334769A publication Critical patent/CN110334769A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/77 Retouching; Inpainting; Scratch removal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose a target identification method and device. An RGB image and a depth image of a target area are acquired; hole filling is performed on the depth image to obtain a repaired depth image; the repaired depth image is encoded to obtain a three-channel depth image; and the RGB image and the three-channel depth image are input into a pre-trained recognition model to obtain a target recognition result in the RGB image. By performing target recognition with a pre-trained recognition model that combines the RGB image and the depth image, the present application improves the accuracy of target recognition.

Description

Target identification method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a target identification method and apparatus.
Background
Current target recognition is based on RGB images: a target is recognized by extracting color, texture, and contour features from the RGB image. However, because imaging is affected by environmental factors such as illumination, the features extracted by existing RGB-based target recognition cannot fully reflect the available feature information of the target, so the recognition accuracy is low.
Disclosure of Invention
The application aims to provide a target identification method and a target identification device so as to improve the accuracy of target identification, and the method comprises the following technical scheme:
an object recognition method, comprising:
collecting an RGB image and a depth image of a target area;
filling holes in the depth image to obtain a repaired depth image;
coding the restored depth image to obtain a three-channel depth image;
inputting the RGB image and the three-channel depth image into a pre-trained recognition model to obtain a target recognition result in the RGB image; the identification model is obtained by training a plurality of labeled RGB images and depth images corresponding to the labeled RGB images serving as samples in advance.
In the above method, preferably, the filling the hole in the depth image to obtain a restored depth image includes:
carrying out binarization processing on the depth image to obtain a mask;
determining a hole point in the depth image according to the mask;
clustering pixel values in the grayed RGB images to obtain clustered images, wherein the clustered images identify pixel points with approximate pixel values in the grayed RGB images;
determining a first pixel corresponding to the void point and all second pixels of the same kind as the first pixel in the grayed RGB image, wherein the second pixels correspond to non-void points in the depth image;
calculating the distance between the first pixel and each second pixel;
and taking the depth value corresponding to the second pixel with the shortest distance to the first pixel as the filling value of the hole point.
In the above method, preferably, the filling the hole in the depth image to obtain a restored depth image includes:
carrying out binarization processing on the depth image to obtain a mask;
determining a hole point in the depth image according to the mask;
determining a first pixel corresponding to the void point and a second pixel in a preset neighborhood of the first pixel in the RGB image, wherein the second pixel is a pixel in the preset neighborhood corresponding to a non-void point;
calculating the distance between the first pixel and each second pixel;
and taking the depth value corresponding to the second pixel with the shortest distance to the first pixel as the filling value of the hole point.
The above method, preferably, the identification model includes:
a deep network unit and a convolutional neural network unit; wherein,
the depth network unit is used for processing the three-channel depth image so as to extract the characteristics of the three-channel depth image;
the convolution neural network unit is used for processing the RGB image, extracting the characteristics of the RGB image, processing the characteristics of the three-channel depth image and the characteristics of the RGB image, and obtaining a target identification result in the RGB image.
In the above method, preferably, the deep network unit includes: three multilayer perceptron convolution layers;
the convolutional neural network unit includes: two convolution pooling layers; a two-layer first Inception module connected to the two convolution pooling layers; a first pooling layer connected to the two-layer first Inception module; a five-layer second Inception module connected to the first pooling layer; a second pooling layer connected to the five-layer second Inception module; a two-layer third Inception module connected to the second pooling layer; a third pooling layer connected to the two-layer third Inception module; a signal loss layer connected to the third pooling layer; a linear layer connected to the signal loss layer; a classification layer connected to the linear layer; a decision layer connected to the classification layer; an output layer coupled to the decision layer.
An object recognition apparatus comprising:
the acquisition module is used for acquiring the RGB image and the depth image of the target area;
the filling module is used for filling the hole in the depth image to obtain a repaired depth image;
the coding module is used for coding the repaired depth image to obtain a three-channel depth image;
the recognition module is used for inputting the RGB image and the three-channel depth image into a pre-trained recognition model to obtain a target recognition result in the RGB image; the identification model is obtained by training a plurality of labeled RGB images and depth images corresponding to the labeled RGB images serving as samples in advance.
The above apparatus, preferably, the filling module includes:
a binarization unit, configured to perform binarization processing on the depth image to obtain a mask;
a first determining unit, configured to determine a hole point in the depth image according to the mask;
the clustering unit is used for clustering the pixel values in the grayed RGB images to obtain clustered images, and the clustered images identify pixels with approximate pixel values in the grayed RGB images;
a second determining unit, configured to determine, in the grayed RGB image, a first pixel corresponding to the hole point and all second pixels of the same kind as the first pixel, where the second pixels correspond to non-hole points in the depth image;
a calculating unit for calculating a distance between the first pixel and each of the second pixels;
and a filling unit configured to use the depth value corresponding to the second pixel having the shortest distance to the first pixel as the filling value of the hole point.
The above apparatus, preferably, the filling module includes:
a binarization unit, configured to perform binarization processing on the depth image to obtain a mask;
a first determining unit, configured to determine a hole point in the depth image according to the mask;
a third determining unit, configured to determine, in the RGB image, a first pixel corresponding to the void point and a second pixel in a preset neighborhood of the first pixel, where the second pixel is a pixel in the preset neighborhood corresponding to a non-void point;
a calculating unit for calculating a distance between the first pixel and each of the second pixels;
and a filling unit configured to use the depth value corresponding to the second pixel having the shortest distance to the first pixel as the filling value of the hole point.
The above apparatus, preferably, the identification model includes: a deep network unit and a convolutional neural network unit; wherein,
the depth network unit is used for processing the three-channel depth image so as to extract the characteristics of the three-channel depth image;
the convolution neural network unit is used for processing the RGB image, extracting the characteristics of the RGB image, processing the characteristics of the three-channel depth image and the characteristics of the RGB image, and obtaining a target identification result in the RGB image.
The above apparatus, preferably, the deep network unit includes: three multilayer perceptron convolution layers;
the convolutional neural network unit includes: two convolution pooling layers; a two-layer first Inception module connected to the two convolution pooling layers; a first pooling layer connected to the two-layer first Inception module; a five-layer second Inception module connected to the first pooling layer; a second pooling layer connected to the five-layer second Inception module; a two-layer third Inception module connected to the second pooling layer; a third pooling layer connected to the two-layer third Inception module; a signal loss layer connected to the third pooling layer; a linear layer connected to the signal loss layer; a classification layer connected to the linear layer; a decision layer connected to the classification layer; an output layer coupled to the decision layer.
According to the scheme, the target identification method and the target identification device collect the RGB image and the depth image of the target area; filling holes in the depth image to obtain a repaired depth image; coding the restored depth image to obtain a three-channel depth image; and inputting the RGB image and the three-channel depth image into a pre-trained recognition model to obtain a target recognition result in the RGB image. According to the method and the device, the target recognition is carried out by utilizing the pre-trained recognition model and combining the RGB image and the depth image, the accuracy of the target recognition is improved, and the problem that the recognition accuracy is low due to the influence of environmental factors such as illumination on the existing target recognition method is solved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of an implementation of a target identification method according to an embodiment of the present application;
fig. 2 is a flowchart of an implementation of filling a hole in a depth image to obtain a restored depth image according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a recognition model provided in an embodiment of the present application;
fig. 4 is an exemplary diagram of an Inception module provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of an object recognition apparatus according to an embodiment of the present application;
fig. 6 is a frame of image to be subjected to target recognition according to an embodiment of the present disclosure;
fig. 7 is a target recognition result obtained by processing the image shown in fig. 6 and the corresponding depth image based on the target recognition method provided in the embodiment of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be practiced otherwise than as specifically illustrated.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of an implementation of a target identification method according to an embodiment of the present application, which may include:
step S101: the RGB image and the depth image of the target area are collected.
An RGB-D depth camera may be employed to capture the RGB image and the depth image of the target area. When the RGB-D depth camera collects images, one frame of depth image can be collected at the same time as each frame of RGB image. When displaying, only the RGB image may be displayed.
Step S102: and filling holes in the depth image to obtain a repaired depth image.
There are often holes in the depth image acquired with the depth camera that need to be repaired. In an alternative embodiment, the holes may be filled with depth values of pixels around the holes to repair the depth map.
Step S103: and coding the repaired depth image to obtain a three-channel depth image.
Optionally, the repaired depth image may be encoded using an HHA encoding method; the three channels in the resulting three-channel depth image may be horizontal disparity, height above ground, and the angle of the surface normal vector. The HHA encoding emphasizes the complementary information among the channels.
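For illustration, the following is a minimal Python sketch of an HHA-style encoding. It assumes a pinhole camera with known intrinsics (fx, fy, cx, cy), approximates the gravity direction by the camera's vertical axis, and rescales each channel to 8 bits; the original HHA method estimates gravity and the ground plane more carefully, so this is only a simplified approximation.

```python
import numpy as np

def hha_encode(depth_m, fx, fy, cx, cy):
    """Minimal HHA-style encoding of a repaired depth image (in meters).

    Channels: horizontal disparity, height above (approximate) ground, and
    the angle between the surface normal and an assumed up direction.
    """
    h, w = depth_m.shape
    eps = 1e-6

    # Channel 1: horizontal disparity (inverse depth).
    disparity = 1.0 / np.maximum(depth_m, eps)

    # Back-project pixels to 3-D camera coordinates.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    y = (v - cy) * depth_m / fy          # camera "down" axis
    z = depth_m

    # Channel 2: height above ground, approximated as the distance above
    # the lowest observed point along the assumed vertical axis.
    height = y.max() - y

    # Channel 3: angle between the surface normal (from depth gradients)
    # and the assumed gravity-opposite direction.
    dzdx = np.gradient(z, axis=1)
    dzdy = np.gradient(z, axis=0)
    normals = np.dstack([-dzdx, -dzdy, np.ones_like(z)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True) + eps
    up = np.array([0.0, -1.0, 0.0])      # assumed up direction in camera frame
    angle = np.degrees(np.arccos(np.clip(normals @ up, -1.0, 1.0)))

    # Rescale each channel to 8 bits so the result can be fed to the
    # network like an ordinary three-channel image.
    def to_u8(c):
        c = (c - c.min()) / (c.max() - c.min() + eps)
        return (255 * c).astype(np.uint8)

    return np.dstack([to_u8(disparity), to_u8(height), to_u8(angle)])
```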
Step S104: inputting the RGB image and the three-channel depth image into a pre-trained recognition model to obtain a target recognition result in the RGB image; the recognition model is obtained by training a plurality of RGB images marked with targets and depth images corresponding to the marked RGB images as samples in advance.
In the embodiment of the application, a plurality of pairs of RGB images and depth images acquired by an RGB-D depth camera are used in advance as training samples, and the labeling information corresponding to the RGB images is used as labels to train the recognition model. The labeling information corresponding to an RGB image may include a text identifier corresponding to a specified region in the RGB image. The text identifier need not be drawn in the RGB image; instead, it is stored in association with the RGB image together with the specified region information (the specified region is marked by a graphic in the RGB image, for example a rectangular box), where the specified region information indicates the position of the target in the RGB image.
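For illustration only, one training sample might be organized as follows; the field names and JSON-style layout are assumptions and are not specified by the application.

```python
# One training sample: the RGB/depth pair plus annotations stored separately.
# File names, field names and the box format are illustrative assumptions.
sample = {
    "rgb_path": "samples/000123_rgb.png",
    "depth_path": "samples/000123_depth.png",
    "annotations": [
        {
            "label": "chair",             # text identifier of the target
            "bbox": [120, 80, 260, 310],  # rectangular region: x1, y1, x2, y2
        },
    ],
}
```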
According to the target recognition method, the pre-trained recognition model is used, the depth image and the RGB image are combined to perform target recognition, the target recognition precision is improved, and the problem that the recognition accuracy is low due to the fact that the existing target recognition method is influenced by environmental factors such as illumination is solved.
In an optional embodiment, an implementation flowchart of the above hole filling for a depth image to obtain a restored depth image is shown in fig. 2, and may include:
step S201: and carrying out binarization processing on the depth image to obtain a mask.
Optionally, in the depth image, a point with a depth value of zero may be binarized to 0, and a point with a non-zero depth value may be binarized to 255, which may be expressed as:
mask(i, j) = 0 if A(i, j) = 0, and mask(i, j) = 255 if A(i, j) ≠ 0,
where mask denotes the mask and A(i, j) denotes the depth value at (i, j).
Step S202: determining a hole point in the depth image according to the mask.
The hole point is a point in the mask with a value of 0.
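A short Python sketch of steps S201 and S202, assuming the depth image is a single-channel array in which invalid measurements are stored as zero:

```python
import numpy as np

def find_hole_points(depth):
    """Binarize the depth image into a mask and return hole coordinates.

    A pixel with depth value 0 is treated as a hole (mask value 0); any
    non-zero depth is treated as valid (mask value 255).
    """
    mask = np.where(depth == 0, 0, 255).astype(np.uint8)
    hole_rows, hole_cols = np.where(mask == 0)
    return mask, list(zip(hole_rows, hole_cols))
```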
Step S203: and clustering the pixel values in the grayed RGB image to obtain a clustered image, wherein the clustered image identifies pixel points with approximate pixel values in the grayed RGB image.
The grayed RGB image refers to a grayscale image converted from an RGB image. Optionally, a K-means algorithm may be used to cluster pixel values in the grayed RGB image. Alternatively, other clustering algorithms may be used to cluster the pixel values in the grayed RGB images, such as hierarchical clustering algorithms. The cluster image characterizes which pixels in the grayed RGB image have similar pixel values.
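A sketch of step S203 using OpenCV's k-means on the grayscale image; the number of clusters (k = 8) and the BGR channel order of the input image are assumptions:

```python
import cv2
import numpy as np

def cluster_gray_image(rgb_image, k=8):
    """Cluster pixel gray values with k-means (step S203).

    Returns a label image of the same height/width, in which pixels with
    similar gray values share a cluster label. k = 8 is an assumed value.
    """
    gray = cv2.cvtColor(rgb_image, cv2.COLOR_BGR2GRAY)   # assumes BGR input
    samples = gray.reshape(-1, 1).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    # Positional arguments: data, K, bestLabels, criteria, attempts, flags.
    _, labels, _ = cv2.kmeans(samples, k, None, criteria, 3,
                              cv2.KMEANS_PP_CENTERS)
    return labels.reshape(gray.shape)
```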
Step S204: and determining, in the grayed RGB image, a first pixel corresponding to the hole point and all second pixels of the same kind (i.e., the same cluster) as the first pixel, wherein the second pixels correspond to non-hole points in the depth image.
Pixels in the RGB image correspond to pixels in the depth map one-to-one. The pixels belonging to the same cluster include both a first pixel corresponding to a hole point and a second pixel corresponding to a non-hole point.
Step S205: and calculating the distance between the first pixel and each second pixel.
In the embodiment of the present application, for the first pixel corresponding to each hole point, the distance between the first pixel and each second pixel in the same cluster is calculated from the pixel values (i.e., gray values); the distance may be a Euclidean distance, or another distance such as a cosine similarity distance.
In an alternative embodiment, the distance between the first pixel and the second pixel may be a combined distance calculated from the Euclidean distance between their pixel values and the image pixel distance between them. The image pixel distance refers to the distance between two pixels measured in pixels. For example,
assuming that pixel a is located at row 10, column 30 of the image and pixel b is located at row 13, column 34, the distance between the two pixels in the row direction is 3 and the distance in the column direction is 4, so the image pixel distance between pixel a and pixel b is 5. After the Euclidean distance and the image pixel distance between pixel a and pixel b are obtained, the sum of the two may be taken as the combined distance between pixel a and pixel b, or a weighted sum of the two may be taken as the combined distance, or the two may each be squared and then summed to obtain the combined distance.
Step S206: and taking the depth value corresponding to the second pixel with the shortest distance to the first pixel as the filling value of the hole point. That is, the hole point is filled with the depth value corresponding to the second pixel having the shortest distance to the first pixel.
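A sketch tying steps S204 to S206 together for the cluster-based variant; the equal weighting of the gray-value distance and the image pixel distance is an assumed choice among the options described above:

```python
import numpy as np

def fill_holes_by_cluster(depth, gray, labels, hole_points):
    """Fill each hole with the depth of the most similar same-cluster pixel.

    For every hole point, the candidates are the non-hole pixels in the
    same k-means cluster; the candidate minimizing the combined distance
    (gray-value difference plus image pixel distance, an assumed equal
    weighting) supplies the fill value (steps S204-S206).
    """
    filled = depth.copy().astype(np.float32)
    valid = depth != 0
    for (r, c) in hole_points:
        same_cluster = (labels == labels[r, c]) & valid
        rows, cols = np.nonzero(same_cluster)
        if rows.size == 0:
            continue                         # no same-cluster candidate; leave the hole
        gray_dist = np.abs(gray[rows, cols].astype(np.float32) - float(gray[r, c]))
        pixel_dist = np.hypot(rows - r, cols - c)
        combined = gray_dist + pixel_dist    # assumed unweighted sum
        best = np.argmin(combined)
        filled[r, c] = depth[rows[best], cols[best]]
    return filled
```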
In an optional embodiment, the above-mentioned performing hole filling on the depth image to obtain the repaired depth image may be implemented in a manner that:
and carrying out binarization processing on the depth image to obtain a mask.
Determining a hole point in the depth image according to the mask.
The implementation of the above two steps can refer to the foregoing embodiments, and is not described in detail here.
And determining a first pixel corresponding to the void point and a second pixel in a preset neighborhood of the first pixel in the RGB image, wherein the second pixel is a pixel in the preset neighborhood corresponding to a non-void point in the depth image.
In this embodiment, the second pixel is a pixel in the neighborhood of the first pixel.
And calculating the distance between the first pixel and each second pixel. The calculation process can be referred to the foregoing embodiments and will not be described in detail here.
And taking the depth value corresponding to the second pixel with the shortest distance to the first pixel as the filling value of the hole point.
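A sketch of this neighborhood-based variant; the square window and its radius are assumed parameters, since the application only refers to a preset neighborhood:

```python
import numpy as np

def fill_holes_by_neighborhood(depth, gray, hole_points, radius=5):
    """Fill each hole from its preset neighborhood (second variant).

    Candidates are non-hole pixels inside a (2*radius+1) square window
    around the hole; the candidate whose gray value is closest to that of
    the hole pixel supplies the fill value. The window radius is assumed.
    """
    h, w = depth.shape
    filled = depth.copy().astype(np.float32)
    for (r, c) in hole_points:
        r0, r1 = max(0, r - radius), min(h, r + radius + 1)
        c0, c1 = max(0, c - radius), min(w, c + radius + 1)
        window_depth = depth[r0:r1, c0:c1]
        window_gray = gray[r0:r1, c0:c1].astype(np.float32)
        valid = window_depth != 0
        if not valid.any():
            continue                              # nothing valid to copy from
        dist = np.abs(window_gray - float(gray[r, c]))
        dist[~valid] = np.inf                     # exclude hole pixels
        rr, cc = np.unravel_index(np.argmin(dist), dist.shape)
        filled[r, c] = window_depth[rr, cc]
    return filled
```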
In an alternative embodiment, a schematic structural diagram of the recognition model is shown in fig. 3, and may include: a deep network unit and a convolutional neural network unit; wherein,
the depth network unit (NIN network unit for short) is configured to process the three-channel depth image to extract features of the three-channel depth image. In the example shown in fig. 3, the input three-channel depth image is HHA _ Img, with a size of 300 × 300.
The convolutional neural network unit (CNN network unit for short) is used for processing the RGB image, extracting the characteristics of the RGB image, and processing the characteristics of the three-channel depth image and the characteristics of the RGB image to obtain a target identification result in the RGB image. In the example shown in fig. 3, the input RGB image is RGB _ Img, and the size is 300 × 300.
Optionally, the NIN network unit includes three multilayer perceptron convolution layers (i.e., three mlpconv network layers, identified as NIN1, NIN2, and NIN3 in fig. 3). An mlpconv layer performs a normal convolution followed by a conventional MLP (multilayer perceptron). The multilayer perceptron here is a two-layer perceptron (an input layer plus one hidden layer): it performs a weighted linear recombination of the elements at the same position across the feature maps output by an ordinary convolutional layer, which is equivalent to the result of a 1 × 1 convolution on a local block, and this operation is then applied to every element of the feature map, which is equivalent to a 1 × 1 convolution. Since convolution is linear while the MLP is non-linear, the latter allows a higher level of abstraction and thus greater generalization capability. In the cross-channel case, mlpconv is equivalent to a convolutional layer followed by 1 × 1 convolutional layers.
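A minimal PyTorch sketch of one mlpconv block, i.e., a normal convolution followed by two 1 × 1 convolutions acting as the per-pixel MLP; the channel widths and the 3 × 3 kernel size are assumptions, since the application only states that three such layers are stacked:

```python
import torch.nn as nn

def mlpconv_block(in_ch, out_ch, kernel_size=3):
    """One NIN mlpconv block: conv + two 1x1 convs (per-pixel MLP).

    Channel widths and the 3x3 kernel size are assumed values.
    """
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=1),   # 1x1 conv = cross-channel MLP
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=1),
        nn.ReLU(inplace=True),
    )

# The NIN unit NIN1-NIN3 could then be assembled, for example, as:
# nin_unit = nn.Sequential(mlpconv_block(3, 96), nn.MaxPool2d(3, 2),
#                          mlpconv_block(96, 128), nn.MaxPool2d(3, 2),
#                          mlpconv_block(128, 128))
```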
The CNN network unit comprises two convolution pooling layers; a two-layer first Inception module connected to the two convolution pooling layers; a first pooling layer connected to the two-layer first Inception module; a five-layer second Inception module connected to the first pooling layer; a second pooling layer connected to the five-layer second Inception module; a two-layer third Inception module connected to the second pooling layer; a third pooling layer connected to the two-layer third Inception module; a signal loss layer connected to the third pooling layer; a linear layer connected to the signal loss layer; a classification layer connected to the linear layer; a decision layer connected to the classification layer; an output layer coupled to the decision layer.
Taking fig. 3 as an example, the 7 × 7 convolution layer Conv_7×7, the maximum pooling layer maxpool, the 3 × 3 convolution layer Conv_3×3 and the maximum pooling layer maxpool connected in sequence in fig. 3 constitute the two convolution pooling layers; Inception (3a) and Inception (3b) connected in sequence constitute the two-layer first Inception module; the maximum pooling layer maxpool connected to Inception (3b) constitutes the first pooling layer; the sequentially connected Inception (4a) to Inception (4e) constitute the five-layer second Inception module; the maximum pooling layer maxpool connected to Inception (4e) constitutes the second pooling layer; Inception (5a) and Inception (5b) connected in sequence constitute the two-layer third Inception module; the average pooling layer avgpool connected to Inception (5b) constitutes the third pooling layer; Dropout is the signal loss layer; Linear is the linear layer; Softmax is the classification layer; Detection is the decision layer; and Non-Maximum Suppression is the output layer.
The Inception module is used to convolve the features output by the previous layer at multiple scales simultaneously and then re-aggregate them. Specifically, an example of the Inception module is shown in fig. 4, in which the convolution kernels used by the Inception module have sizes of 1 × 1, 3 × 3, and 5 × 5. Using convolution kernels of different sizes means receptive fields of different sizes, and the final concatenation means aggregating features at different scales. The kernel sizes of 1 × 1, 3 × 3, and 5 × 5 are chosen mainly for convenience of alignment: with the convolution stride set to 1, as long as the edge extension parameters pad are set to 0, 1, and 2 respectively, features with the same dimensions are obtained after convolution and can then be concatenated directly. Meanwhile, 3 × 3 pooling layers are also introduced into the module. The deeper the network, the more abstract the features and the larger the receptive field involved in each feature; therefore, as the number of layers increases, the proportion of 3 × 3 and 5 × 5 convolutions also increases. However, 5 × 5 convolution kernels still bring a huge amount of computation, so 1 × 1 convolution kernels are used for dimensionality reduction.
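A PyTorch sketch of an Inception module as in fig. 4, with 1 × 1 reductions before the 3 × 3 and 5 × 5 branches and a pooled branch concatenated at the end; the branch widths are illustrative assumptions (borrowed from GoogLeNet's Inception (3a)), not values given in the application:

```python
import torch
import torch.nn as nn

class Inception(nn.Module):
    """Inception module: parallel 1x1, 3x3, 5x5 and pooled branches,
    concatenated along the channel dimension (see fig. 4).

    The branch widths below are illustrative assumptions.
    """
    def __init__(self, in_ch, c1=64, c3_red=96, c3=128, c5_red=16, c5=32, pool_proj=32):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, 1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, c3_red, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(c3_red, c3, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, c5_red, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(c5_red, c5, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, pool_proj, 1))

    def forward(self, x):
        # Every branch preserves the spatial size, so the outputs can be concatenated.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)
```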
Corresponding to the embodiment of the method, the present application further provides a target identification device, and a schematic structural diagram of the target identification device provided by the present application is shown in fig. 5, and the target identification device may include:
an acquisition module 51, a filling module 52, an encoding module 53 and an identification module 54; wherein,
the acquisition module 51 is configured to acquire an RGB image and a depth image of the target area;
the filling module 52 is configured to perform hole filling on the depth image to obtain a repaired depth image;
the encoding module 53 is configured to encode the repaired depth image to obtain a three-channel depth image;
the recognition module 54 is configured to input the RGB image and the three-channel depth image into a pre-trained recognition model to obtain a target recognition result in the RGB image; the identification model is obtained by training a plurality of labeled RGB images and depth images corresponding to the labeled RGB images serving as samples in advance.
The target recognition device provided by the application acquires an RGB image and a depth image of a target area; filling holes in the depth image to obtain a repaired depth image; coding the restored depth image to obtain a three-channel depth image; and inputting the RGB image and the three-channel depth image into a pre-trained recognition model to obtain a target recognition result in the RGB image. According to the method and the device, the target recognition is carried out by utilizing the recognition model trained in advance and combining the RGB image and the depth image, and the accuracy of the target recognition is improved.
In an alternative embodiment, the filling module 52 may include:
a binarization unit, configured to perform binarization processing on the depth image to obtain a mask;
a first determining unit, configured to determine a hole point in the depth image according to the mask;
the clustering unit is used for clustering the pixel values in the grayed RGB images to obtain clustered images, and the clustered images identify pixels with approximate pixel values in the grayed RGB images;
a second determining unit, configured to determine, in the grayed RGB image, a first pixel corresponding to the hole point and all second pixels of the same kind as the first pixel, where the second pixels correspond to non-hole points in the depth image;
a calculating unit for calculating a distance between the first pixel and each of the second pixels;
and a filling unit configured to use the depth value corresponding to the second pixel having the shortest distance to the first pixel as the filling value of the hole point.
In an alternative embodiment, the filling module 52 may include:
a binarization unit, configured to perform binarization processing on the depth image to obtain a mask;
a first determining unit, configured to determine a hole point in the depth image according to the mask;
a third determining unit, configured to determine, in the RGB image, a first pixel corresponding to the void point and a second pixel in a preset neighborhood of the first pixel, where the second pixel is a pixel in the preset neighborhood corresponding to a non-void point;
a calculating unit for calculating a distance between the first pixel and each of the second pixels;
and a filling unit configured to use the depth value corresponding to the second pixel having the shortest distance to the first pixel as the filling value of the hole point.
In an alternative embodiment, the encoding module 53 may specifically be configured to: and performing HHA coding on the repaired depth image to obtain a three-channel depth image.
In an alternative embodiment, the identifying the model may include: a deep network unit and a convolutional neural network unit; wherein,
the depth network unit is used for processing the three-channel depth image so as to extract the characteristics of the three-channel depth image;
the convolution neural network unit is used for processing the RGB image, extracting the characteristics of the RGB image, processing the characteristics of the three-channel depth image and the characteristics of the RGB image, and obtaining a target identification result in the RGB image.
In an optional embodiment, the deep network unit includes: three multilayer perceptron convolution layers;
the convolutional neural network unit includes: two convolution pooling layers; a two-layer first Inception module connected to the two convolution pooling layers; a first pooling layer connected to the two-layer first Inception module; a five-layer second Inception module connected to the first pooling layer; a second pooling layer connected to the five-layer second Inception module; a two-layer third Inception module connected to the second pooling layer; a third pooling layer connected to the two-layer third Inception module; a signal loss layer connected to the third pooling layer; a linear layer connected to the signal loss layer; a classification layer connected to the linear layer; a decision layer connected to the classification layer; an output layer coupled to the decision layer.
As shown in fig. 6-7, fig. 6 is a frame of image to be subjected to target recognition, and the frame of image and the depth image corresponding to the frame of image are processed based on the target recognition method provided by the present application, and the obtained target recognition result is shown in fig. 7. In this example, the target is a chair, and when the recognition model is trained, the chair recognition model is obtained by training using a training sample including the chair.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
It should be understood that the technical problems can be solved by combining the features of the embodiments and the claims.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of object recognition, comprising:
collecting an RGB image and a depth image of a target area;
filling holes in the depth image to obtain a repaired depth image;
coding the restored depth image to obtain a three-channel depth image;
inputting the RGB image and the three-channel depth image into a pre-trained recognition model to obtain a target recognition result in the RGB image; the identification model is obtained by training a plurality of labeled RGB images and depth images corresponding to the labeled RGB images serving as samples in advance.
2. The method of claim 1, wherein the hole filling the depth image to obtain a restored depth image comprises:
carrying out binarization processing on the depth image to obtain a mask;
determining a hole point in the depth image according to the mask;
clustering pixel values in the grayed RGB images to obtain clustered images, wherein the clustered images identify pixel points with approximate pixel values in the grayed RGB images;
determining a first pixel corresponding to the void point and all second pixels of the same kind as the first pixel in the grayed RGB image, wherein the second pixels correspond to non-void points in the depth image;
calculating the distance between the first pixel and each second pixel;
and taking the depth value corresponding to the second pixel with the shortest distance to the first pixel as the filling value of the hole point.
3. The method of claim 1, wherein the hole filling the depth image to obtain a restored depth image comprises:
carrying out binarization processing on the depth image to obtain a mask;
determining a hole point in the depth image according to the mask;
determining a first pixel corresponding to the void point and a second pixel in a preset neighborhood of the first pixel in the RGB image, wherein the second pixel is a pixel in the preset neighborhood corresponding to a non-void point;
calculating the distance between the first pixel and each second pixel;
and taking the depth value corresponding to the second pixel with the shortest distance to the first pixel as the filling value of the hole point.
4. The method of claim 1, wherein identifying the model comprises:
a deep network unit and a convolutional neural network unit; wherein,
the depth network unit is used for processing the three-channel depth image so as to extract the characteristics of the three-channel depth image;
the convolution neural network unit is used for processing the RGB image, extracting the characteristics of the RGB image, processing the characteristics of the three-channel depth image and the characteristics of the RGB image, and obtaining a target identification result in the RGB image.
5. The method of claim 4, wherein the deep network unit comprises: three multilayer perceptron convolution layers;
the convolutional neural network unit includes: two convolution pooling layers; a two-layer first Inception module connected to the two convolution pooling layers; a first pooling layer connected to the two-layer first Inception module; a five-layer second Inception module connected to the first pooling layer; a second pooling layer connected to the five-layer second Inception module; a two-layer third Inception module connected to the second pooling layer; a third pooling layer connected to the two-layer third Inception module; a signal loss layer connected to the third pooling layer; a linear layer connected to the signal loss layer; a classification layer connected to the linear layer; a decision layer connected to the classification layer; an output layer coupled to the decision layer.
6. An object recognition apparatus, comprising:
the acquisition module is used for acquiring the RGB image and the depth image of the target area;
the filling module is used for filling the hole in the depth image to obtain a repaired depth image;
the coding module is used for coding the repaired depth image to obtain a three-channel depth image;
the recognition module is used for inputting the RGB image and the three-channel depth image into a pre-trained recognition model to obtain a target recognition result in the RGB image; the identification model is obtained by training a plurality of labeled RGB images and depth images corresponding to the labeled RGB images serving as samples in advance.
7. The apparatus of claim 6, wherein the fill module comprises:
a binarization unit, configured to perform binarization processing on the depth image to obtain a mask;
a first determining unit, configured to determine a hole point in the depth image according to the mask;
the clustering unit is used for clustering the pixel values in the grayed RGB images to obtain clustered images, and the clustered images identify pixels with approximate pixel values in the grayed RGB images;
a second determining unit, configured to determine, in the grayed RGB image, a first pixel corresponding to the hole point and all second pixels of the same kind as the first pixel, where the second pixels correspond to non-hole points in the depth image;
a calculating unit for calculating a distance between the first pixel and each of the second pixels;
and a filling unit configured to use the depth value corresponding to the second pixel having the shortest distance to the first pixel as the filling value of the hole point.
8. The apparatus of claim 6, wherein the fill module comprises:
a binarization unit, configured to perform binarization processing on the depth image to obtain a mask;
a first determining unit, configured to determine a hole point in the depth image according to the mask;
a third determining unit, configured to determine, in the RGB image, a first pixel corresponding to the void point and a second pixel in a preset neighborhood of the first pixel, where the second pixel is a pixel in the preset neighborhood corresponding to a non-void point;
a calculating unit for calculating a distance between the first pixel and each of the second pixels;
and a filling unit configured to use the depth value corresponding to the second pixel having the shortest distance to the first pixel as the filling value of the hole point.
9. The apparatus of claim 6, wherein the recognition model comprises: a deep network unit and a convolutional neural network unit; wherein,
the depth network unit is used for processing the three-channel depth image so as to extract the characteristics of the three-channel depth image;
the convolution neural network unit is used for processing the RGB image, extracting the characteristics of the RGB image, processing the characteristics of the three-channel depth image and the characteristics of the RGB image, and obtaining a target identification result in the RGB image.
10. The apparatus of claim 9, wherein the deep network unit comprises: three multilayer perceptron convolution layers;
the convolutional neural network unit includes: two convolution pooling layers; a two-layer first Inception module connected to the two convolution pooling layers; a first pooling layer connected to the two-layer first Inception module; a five-layer second Inception module connected to the first pooling layer; a second pooling layer connected to the five-layer second Inception module; a two-layer third Inception module connected to the second pooling layer; a third pooling layer connected to the two-layer third Inception module; a signal loss layer connected to the third pooling layer; a linear layer connected to the signal loss layer; a classification layer connected to the linear layer; a decision layer connected to the classification layer; an output layer coupled to the decision layer.
CN201910614107.8A 2019-07-09 2019-07-09 Target identification method and device Pending CN110334769A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910614107.8A CN110334769A (en) 2019-07-09 2019-07-09 Target identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910614107.8A CN110334769A (en) 2019-07-09 2019-07-09 Target identification method and device

Publications (1)

Publication Number Publication Date
CN110334769A true CN110334769A (en) 2019-10-15

Family

ID=68143410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910614107.8A Pending CN110334769A (en) 2019-07-09 2019-07-09 Target identification method and device

Country Status (1)

Country Link
CN (1) CN110334769A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102199A (en) * 2020-09-18 2020-12-18 贝壳技术有限公司 Method, device and system for filling hole area of depth image
CN113393421A (en) * 2021-05-08 2021-09-14 深圳市识农智能科技有限公司 Fruit evaluation method and device and inspection equipment
CN113902786A (en) * 2021-09-23 2022-01-07 珠海视熙科技有限公司 Depth image preprocessing method, system and related device

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102447925A (en) * 2011-09-09 2012-05-09 青岛海信数字多媒体技术国家重点实验室有限公司 Virtual viewpoint image synthesis method and device
CN102625127A (en) * 2012-03-24 2012-08-01 山东大学 Optimization method suitable for virtual viewpoint generation of 3D television
CN103236082A (en) * 2013-04-27 2013-08-07 南京邮电大学 Quasi-three dimensional reconstruction method for acquiring two-dimensional videos of static scenes
CN103248909A (en) * 2013-05-21 2013-08-14 清华大学 Method and system of converting monocular video into stereoscopic video
US20170069071A1 (en) * 2015-09-04 2017-03-09 Electronics And Telecommunications Research Institute Apparatus and method for extracting person region based on red/green/blue-depth image
CN106651871A (en) * 2016-11-18 2017-05-10 华东师范大学 Automatic filling method for cavities in depth image
CN107977650A (en) * 2017-12-21 2018-05-01 北京华捷艾米科技有限公司 Method for detecting human face and device
CN108230380A (en) * 2016-12-09 2018-06-29 广东技术师范学院 Indoor entrance detection method based on the three-dimensional depth of field
US10062004B2 (en) * 2015-08-20 2018-08-28 Kabushiki Kaisha Toshiba Arrangement detection apparatus and pickup apparatus
CN108734210A (en) * 2018-05-17 2018-11-02 浙江工业大学 A kind of method for checking object based on cross-module state multi-scale feature fusion
CN109636732A (en) * 2018-10-24 2019-04-16 深圳先进技术研究院 A kind of empty restorative procedure and image processing apparatus of depth image

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102447925A (en) * 2011-09-09 2012-05-09 青岛海信数字多媒体技术国家重点实验室有限公司 Virtual viewpoint image synthesis method and device
CN102625127A (en) * 2012-03-24 2012-08-01 山东大学 Optimization method suitable for virtual viewpoint generation of 3D television
CN103236082A (en) * 2013-04-27 2013-08-07 南京邮电大学 Quasi-three dimensional reconstruction method for acquiring two-dimensional videos of static scenes
CN103248909A (en) * 2013-05-21 2013-08-14 清华大学 Method and system of converting monocular video into stereoscopic video
US10062004B2 (en) * 2015-08-20 2018-08-28 Kabushiki Kaisha Toshiba Arrangement detection apparatus and pickup apparatus
US20170069071A1 (en) * 2015-09-04 2017-03-09 Electronics And Telecommunications Research Institute Apparatus and method for extracting person region based on red/green/blue-depth image
CN106651871A (en) * 2016-11-18 2017-05-10 华东师范大学 Automatic filling method for cavities in depth image
CN108230380A (en) * 2016-12-09 2018-06-29 广东技术师范学院 Indoor entrance detection method based on the three-dimensional depth of field
CN107977650A (en) * 2017-12-21 2018-05-01 北京华捷艾米科技有限公司 Method for detecting human face and device
CN108734210A (en) * 2018-05-17 2018-11-02 浙江工业大学 A kind of method for checking object based on cross-module state multi-scale feature fusion
CN109636732A (en) * 2018-10-24 2019-04-16 深圳先进技术研究院 A kind of empty restorative procedure and image processing apparatus of depth image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LUKAS SCHNEIDER et al.: "Multimodal Neural Networks: RGB-D for Semantic Segmentation and Object Detection", Scandinavian Conference on Image Analysis *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102199A (en) * 2020-09-18 2020-12-18 贝壳技术有限公司 Method, device and system for filling hole area of depth image
CN112102199B (en) * 2020-09-18 2024-11-08 贝壳技术有限公司 Depth image cavity region filling method, device and system
CN113393421A (en) * 2021-05-08 2021-09-14 深圳市识农智能科技有限公司 Fruit evaluation method and device and inspection equipment
CN113902786A (en) * 2021-09-23 2022-01-07 珠海视熙科技有限公司 Depth image preprocessing method, system and related device

Similar Documents

Publication Publication Date Title
CN111027547B (en) Automatic detection method for multi-scale polymorphic target in two-dimensional image
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
US9633282B2 (en) Cross-trained convolutional neural networks using multimodal images
CN109840556B (en) Image classification and identification method based on twin network
CN107944450B (en) License plate recognition method and device
CN109241871A (en) A kind of public domain stream of people's tracking based on video data
CN109886330B (en) Text detection method and device, computer readable storage medium and computer equipment
CN110675408A (en) High-resolution image building extraction method and system based on deep learning
CN111681273A (en) Image segmentation method and device, electronic equipment and readable storage medium
CN114155527A (en) Scene text recognition method and device
CN110765833A (en) Crowd density estimation method based on deep learning
CN110569856A (en) sample labeling method and device, and damage category identification method and device
CN109272487A (en) The quantity statistics method of crowd in a kind of public domain based on video
CN114049356B (en) Method, device and system for detecting structure apparent crack
CN113887472B (en) Remote sensing image cloud detection method based on cascade color and texture feature attention
CN110334769A (en) Target identification method and device
CN114648669A (en) Motor train unit fault detection method and system based on domain-adaptive binocular parallax calculation
CN109117723A (en) Blind way detection method based on color mode analysis and semantic segmentation
CN115546466A (en) Weak supervision image target positioning method based on multi-scale significant feature fusion
CN114494786A (en) Fine-grained image classification method based on multilayer coordination convolutional neural network
CN108154199B (en) High-precision rapid single-class target detection method based on deep learning
CN111898671B (en) Target identification method and system based on fusion of laser imager and color camera codes
CN112418262A (en) Vehicle re-identification method, client and system
Yuan et al. Graph neural network based multi-feature fusion for building change detection
CN116994024A (en) Method, device, equipment, medium and product for identifying parts in container image

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20191015