CN113838135B - Pose estimation method, system and medium based on LSTM double-flow convolutional neural network - Google Patents
Pose estimation method, system and medium based on LSTM double-flow convolutional neural network
- Publication number
- CN113838135B (application CN202111181525.6A)
- Authority
- CN
- China
- Prior art keywords
- depth
- flow
- color
- feature map
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a pose estimation method, system and medium based on an LSTM double-flow convolutional neural network. The method comprises the following steps: S1, preprocessing the color images and depth images: cascading two adjacent frames of the color image and of the depth image respectively, further encoding the depth image with MND (minimum normal + depth), and finally normalizing the color and depth images; S2, inputting the preprocessed color image and depth image into the color flow and the depth flow of the double-flow convolutional neural network respectively for feature extraction; S3, fusing the RGB feature map output by the color flow with the depth feature map output by the depth flow to generate a new fusion feature map; S4, applying global average pooling to the newly generated fusion feature map; S5, predicting the current pose through training with the LSTM neural network. The results show that the pose estimation model provided by the method has higher precision and robustness under motion blur and insufficient light.
Description
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a pose estimation method of an LSTM double-flow convolutional neural network.
Background
The intelligent manufacturing proposed by Industry 4.0 is oriented to the full life cycle of products and realizes informatized manufacturing under ubiquitous sensing conditions. Intelligent manufacturing technology is based on modern sensing technology, network technology, automation technology and artificial intelligence; through perception, human-machine interaction, decision-making, execution and feedback, it makes product design, manufacturing and enterprise management and service intelligent, and represents the deep fusion and integration of information technology and manufacturing technology. Indoor mobile robots are one of the representative products that incorporate the modern sensing, networking and automation technologies proposed by Industry 4.0.
Mobile robots are widely used in resource exploration and development, medical services, home entertainment, military and aerospace applications; for example, Automated Guided Vehicles (AGVs) and cleaning robots have been deployed in logistics transportation and household cleaning. In intelligent mobile robots, simultaneous localization and mapping (SLAM) is the core technology. The navigation process of a mobile robot can be broken down into three modules: localization, mapping and path planning. Localization determines the pose of the robot in the environment at the current moment; mapping integrates local, continuous observations of the surrounding environment into a globally consistent model; path planning determines the optimal navigation path in the map.
Artificial intelligence techniques, which can now simulate human reasoning, judgment and memory, are widely applied, for example to face recognition and object classification. Similar to the application of deep learning in face recognition, a visual odometer based on the feature-point method also needs to detect, match and screen feature points. Applying deep learning to the visual odometry component of SLAM is therefore feasible; a visual odometer based on deep learning is closer to the human mode of perception and has broad research potential and value. Most existing visual odometry methods go through feature extraction and matching, motion estimation, local optimization and similar steps, and are strongly affected by camera parameters, motion blur and insufficient light.
The prior art includes: a target positioning method based on depth-image double-flow convolutional neural network regression learning (patent application number: 201910624713.8, publication number: CN 110443849A). In that method, a binocular camera captures two pictures at the same time, a depth image is recovered through image preprocessing, and the color image is converted into a gray image during preprocessing. After preprocessing, the two kinds of images are input into convolutional neural networks to extract features, the two features are fused by convolutional feature fusion, and finally input into a fully connected layer for regression. The present invention instead adopts an RGB-D camera as the sensor, which directly acquires the RGB image and the corresponding depth image, so the RGB image does not need to be converted into a gray image. After preprocessing, the RGB image and the depth image are input into a double-flow convolutional neural network to obtain the color features of the RGB image and the depth features of the depth image respectively; the two feature maps are input into a feature fusion unit for splicing (concatenation) fusion, and finally the fused features are input into a long short-term memory recurrent neural network (LSTM) for time-series modeling to obtain pose information. Compared with 201910624713.8, the present method differs in sensor, preprocessing method, convolutional neural network structure, feature fusion method and pose estimation method.
The closest prior art found by search is 201910624713.8, a target positioning method based on double-flow convolutional neural network regression learning on depth images: S1, at each reference position, a binocular camera collects gray images and their corresponding depth images; S2, the gray image and the depth image are converted into three-channel images by image preprocessing; S3, a double-flow CNN with shared weights performs offline regression learning to obtain a distance-based regression model; S4, after preprocessing of the gray image and the depth image, the final distance is estimated by the distance-based regression model. It makes some useful attempts, but still suffers from poor robustness and large estimation errors.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art and provides a pose estimation method, system and medium based on an LSTM double-flow convolutional neural network. The technical scheme of the invention is as follows:
a pose estimation method based on an LSTM double-flow convolutional neural network comprises the following steps:
s1, preprocessing a color image and a depth image acquired by an RGB-D camera, respectively cascading two adjacent frames of color images and depth images, preprocessing the depth images by adopting a MND (minimum normal+depth) method, and finally normalizing the color images and the depth images; s2, respectively inputting the preprocessed color image and the preprocessed depth image into a color flow and a depth flow of a double-flow convolutional neural network to perform feature extraction; s3, fusing the color feature map rgb feature map output by the color flow and the depth feature map depth feature map output by the depth flow to generate a new fused feature map fusion feature map; s4, carrying out global average pooling treatment on the newly generated fusion feature map; s5, predicting the current pose by training through the LSTM neural network.
Further, the color image preprocessing specifically consists of cascading adjacent frames of the color image to generate a 640 x 960 color image; the depth image preprocessing first applies MND encoding to the depth image: two channels n_x and n_y are obtained by scaling, and the depth d is taken as the third channel of the image, so that the scaled surface normal [n_x, n_y, d] satisfies the MND normalization constraint; adjacent frames of the depth image are then concatenated to generate a 640 x 960 depth image.
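For illustration only, the following is a minimal preprocessing sketch in Python/NumPy under stated assumptions: `mnd_encode` approximates surface normals from depth gradients (the exact MND formula is not given here), frames are assumed to be 640 x 480 so that vertical concatenation of two adjacent frames yields 640 x 960, and all function names are hypothetical.

```python
import numpy as np

def mnd_encode(depth: np.ndarray) -> np.ndarray:
    """Encode a depth map as a 3-channel [n_x, n_y, d] image (assumed MND variant)."""
    # Approximate surface normals from depth gradients (illustrative, not the patent's formula).
    dz_dx = np.gradient(depth, axis=1)
    dz_dy = np.gradient(depth, axis=0)
    normal = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    normal /= np.linalg.norm(normal, axis=-1, keepdims=True) + 1e-8
    d = depth / (depth.max() + 1e-8)          # depth scaled to [0, 1] as the third channel
    return np.stack([normal[..., 0], normal[..., 1], d], axis=-1).astype(np.float32)

def preprocess_pair(rgb_t, rgb_t1, depth_t, depth_t1):
    """Concatenate two adjacent frames vertically (640x480 -> 640x960) and normalize."""
    rgb = np.concatenate([rgb_t, rgb_t1], axis=0).astype(np.float32) / 255.0
    dep = np.concatenate([mnd_encode(depth_t), mnd_encode(depth_t1)], axis=0)
    return rgb, dep
```

In such a pipeline, `preprocess_pair` would be applied to each pair of adjacent frames before they are fed to the two network streams.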
Further, in step S2 the preprocessed color image and depth image are input into the color flow and the depth flow of the double-flow convolutional neural network respectively for feature extraction, specifically: a double-flow convolutional neural network architecture is adopted in which the color flow and the depth flow have the same structure of 5 convolutional layers that extract features at different levels of the image, with ReLU activation units after the first four layers; the preprocessed color image I_rgb is taken as the input of the color flow and the preprocessed depth image I_depth as the input of the depth flow, and a color feature map and a depth structure feature map are obtained through the convolution operations, respectively.
Further, the dual-flow convolutional neural network adopts a parallel structure, each branch of the parallel structure is composed of five convolutional layers, the first four convolutional layers of each branch pass through a ReLU activation unit, and the formula is expressed as follows:
f(x)=max(0,x) (1)
where x is the input and f (x) is the output after passing through the ReLU unit.
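As an illustrative sketch of the parallel five-layer streams described above (written in PyTorch; channel widths, kernel sizes and strides are assumptions not fixed by the text, which only specifies five convolutional layers with ReLU after the first four):

```python
import torch
import torch.nn as nn

def make_stream(in_channels: int) -> nn.Sequential:
    """One branch (color or depth): 5 conv layers, ReLU after the first four only."""
    channels = [64, 128, 256, 256, 512]           # assumed widths
    layers, c_in = [], in_channels
    for i, c_out in enumerate(channels):
        layers.append(nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1))
        if i < 4:                                 # no activation after conv5
            layers.append(nn.ReLU(inplace=True))
        c_in = c_out
    return nn.Sequential(*layers)

color_stream = make_stream(in_channels=3)   # input: concatenated RGB frame pair
depth_stream = make_stream(in_channels=3)   # input: concatenated MND-encoded depth pair
```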
Further, in step S3 the RGB feature map output by the color flow and the depth feature map output by the depth flow are fused to generate a new fusion feature map, specifically: the feature maps output by conv5 in the two data-stream networks are concatenated to form a new fusion feature map, which is passed through batch normalization and a ReLU nonlinear activation unit before global average pooling; the generated fusion feature is expressed as:
X_k = [X_k^rgb, X_k^depth]    (2)
where X_k is the fusion feature map, X_k^rgb is the RGB feature map and X_k^depth is the depth feature map.
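A possible realization of this fusion step, again as a non-authoritative PyTorch sketch that assumes the two conv5 outputs each have 512 channels (so the fused map has 1024):

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self, channels: int = 1024):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        self.gap = nn.AdaptiveAvgPool2d(1)        # global average pooling

    def forward(self, rgb_map: torch.Tensor, depth_map: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([rgb_map, depth_map], dim=1)   # splice along channels
        fused = self.relu(self.bn(fused))
        return self.gap(fused).flatten(1)                # (batch, channels) feature vector
```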
Further, the step S5 predicts the current pose by training using the LSTM neural network, and specifically includes:
Time-series modeling is performed on the image sequence with the LSTM neural network to predict the current pose information. The LSTM neural network consists of a forget gate, an input gate and an output gate; through learning it memorizes information useful for estimating the current pose and forgets information that is useless. The forget gate controls how much useless information from the previous state is forgotten; its formula is:
f_k = σ(W_f · [h_{k-1}, x_k] + b_f)    (3)
where f_k is the output of the forget gate, σ is the sigmoid function, W_f is the forget-gate weight, h_{k-1} is the hidden state of the previous time step, x_k is the input at the current time step, and b_f is the bias of the forget gate;
wherein the input gate determines what information is added to the current state; it consists of the input selection layer i_k and the candidate layer C̃_k, with the formulas:
i_k = σ(W_i · [h_{k-1}, x_k] + b_i)    (4)
C̃_k = tanh(W_C · [h_{k-1}, x_k] + b_C)    (5)
where W_i is the input-gate weight, tanh is the hyperbolic tangent function, W_C is the candidate-layer weight, b_i is the input-gate bias and b_C is the candidate-layer bias;
wherein the output gate decides what to output as the prediction, with the formula:
o_k = σ(W_o · [h_{k-1}, x_k] + b_o)    (6)
where W_o is the output-gate weight and b_o is the bias of the output gate;
finally, a loss function is designed by minimizing Euclidean distance of the real pose and the estimated pose, and the loss function is as follows:
loss = (1/N) Σ_{i=1}^{N} ( ||p̂_i − p_i||² + w · ||φ̂_i − φ_i||² )    (7)
where N is the number of samples; w is the weight coefficient between position and attitude; (p̂_i, φ̂_i) is the estimated pose; (p_i, φ_i) is the actual pose.
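The pose regression stage could be sketched as follows (PyTorch, illustrative only): the fused feature vectors of an image sequence are fed to an LSTM whose output is mapped to a 6-DoF pose, and the loss weights position and attitude errors with the coefficient w. Hidden size, layer count, the 6-DoF parameterization and the default value of w are assumptions.

```python
import torch
import torch.nn as nn

class PoseLSTM(nn.Module):
    def __init__(self, feat_dim: int = 1024, hidden: int = 512):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, 6)             # [t_x, t_y, t_z, roll, pitch, yaw]

    def forward(self, seq: torch.Tensor) -> torch.Tensor:   # seq: (batch, time, feat_dim)
        out, _ = self.lstm(seq)
        return self.fc(out)                        # one pose prediction per time step

def pose_loss(pred: torch.Tensor, target: torch.Tensor, w: float = 100.0) -> torch.Tensor:
    """Euclidean loss with weight w between position (first 3) and attitude (last 3)."""
    pos_err = (pred[..., :3] - target[..., :3]).pow(2).sum(-1)
    rot_err = (pred[..., 3:] - target[..., 3:]).pow(2).sum(-1)
    return (pos_err + w * rot_err).mean()
```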
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the pose estimation method based on the LSTM double-flow convolutional neural network described above.
An LSTM double-flow convolutional neural network pose estimation system based on the above method comprises the following modules:
A preprocessing module: used for preprocessing the color images and depth images acquired by an RGB-D camera, cascading two adjacent frames of the color image and of the depth image respectively, preprocessing the depth images with the minimum normal + depth (MND) method, and finally normalizing the color and depth images;
A feature extraction module: used for inputting the preprocessed color image and depth image into the color flow and the depth flow of the double-flow convolutional neural network respectively for feature extraction;
A fusion module: used for fusing the RGB feature map output by the color flow with the depth feature map output by the depth flow to generate a new fusion feature map, and applying global average pooling to the newly generated fusion feature map;
A prediction module: used for predicting the current pose through training with the LSTM neural network.
The invention has the advantages and beneficial effects as follows:
aiming at the problems that a visual odometer is sensitive to camera parameters and is greatly influenced by motion blur and insufficient light, the invention provides a convolutional neural network based on LSTM double flow, and the contour features extracted by depth flow supplement the color features extracted by color flow so as to improve the robustness of a pose estimation system in the motion blur and insufficient light environment.
Experiments on the public TUM dataset show that pose estimation that fuses the contour features extracted from the depth image is more robust under motion blur and insufficient light. Compared with other pose estimation methods based on convolutional neural networks, the proposed model has smaller estimation error and achieves superior performance.
In the pose estimation method based on the LSTM double-flow convolutional neural network, step S2 inputs the preprocessed color image and depth image into the color flow and the depth flow of the double-flow convolutional neural network respectively for feature extraction. The method provides a new double-flow convolutional neural network structure in which the color flow and the depth flow have the same structure of 5 convolutional layers, with ReLU activation units after the first four layers. By introducing depth features through the double-flow architecture, the method achieves higher precision and robustness than other convolutional-neural-network-based pose regression systems, and performs especially well in challenging environments.
According to claims 4-6, the double-flow convolutional neural network extracts color features and depth features, the two are fused, and the fused features are finally fed as input to a long short-term memory recurrent neural network (LSTM) for time-series modeling to estimate the current pose. Pose estimation amounts to finding temporal regularities in the image stream; the LSTM can memorize previous states and relate the current moment to past moments, which makes it well suited to the pose regression problem. The common approach of predicting pose with a fully connected layer is better suited to object recognition and classification problems.
Drawings
FIG. 1 is a diagram of a pose estimation framework based on an LSTM dual-flow convolutional neural network in accordance with a preferred embodiment of the present invention;
FIG. 2 is a block diagram of an LSTM dual-flow convolutional neural network.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and specifically described below with reference to the drawings in the embodiments of the present invention. The described embodiments are only a few embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
s1, preprocessing a color image and a depth image, respectively cascading two adjacent frames of color images and depth images, further preprocessing the depth image by MND coding, and finally normalizing the color image and the depth image.
S2, respectively inputting the preprocessed color image and depth image into the color flow and the depth flow of the double-flow convolutional neural network for feature extraction, and then merging the RGB feature map output by the color flow with the depth feature map output by the depth flow to generate a new fusion feature map. The double-flow convolutional neural network adopts a parallel structure; each branch consists of five convolutional layers, and the first four convolutional layers of each branch are followed by a ReLU activation unit. The formula is as follows:
f(x)=max(0,x) (1)
where x is the input and f (x) is the output after passing through the ReLU unit.
The generated fusion features are expressed as:
X_k = [X_k^rgb, X_k^depth]    (2)
where X_k is the fusion feature map, X_k^rgb is the RGB feature map and X_k^depth is the depth feature map.
S3, predicting the current pose by training with an LSTM neural network. The LSTM neural network consists of a forget gate, an input gate and an output gate; through learning it memorizes information useful for estimating the current pose and forgets information that is useless. The forget gate controls how much useless information from the previous state is forgotten; its formula is:
f_k = σ(W_f · [h_{k-1}, x_k] + b_f)    (3)
where f_k is the output of the forget gate, σ is the sigmoid function, W_f is the forget-gate weight, h_{k-1} is the hidden state of the previous time step, x_k is the input at the current time step, and b_f is the bias of the forget gate.
The input gate determines what information is added to the current state; it consists of the input selection layer i_k and the candidate layer C̃_k, with the formulas:
i_k = σ(W_i · [h_{k-1}, x_k] + b_i)    (4)
C̃_k = tanh(W_C · [h_{k-1}, x_k] + b_C)    (5)
where W_i is the input-gate weight, tanh is the hyperbolic tangent function, W_C is the candidate-layer weight, b_i is the input-gate bias and b_C is the candidate-layer bias.
The output gate decides what to output as the prediction, with the formula:
o_k = σ(W_o · [h_{k-1}, x_k] + b_o)    (6)
where W_o is the output-gate weight and b_o is the bias of the output gate.
Finally, a loss function is designed by minimizing Euclidean distance of the real pose and the estimated pose, and the loss function is as follows:
loss = (1/N) Σ_{i=1}^{N} ( ||p̂_i − p_i||² + w · ||φ̂_i − φ_i||² )    (7)
where N is the number of samples; w is the weight coefficient between position and attitude; (p̂_i, φ̂_i) is the estimated pose; (p_i, φ_i) is the actual pose.
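Tying the sketches above together, a minimal training step might look like the following (illustrative only; it reuses the `color_stream`, `depth_stream`, `FusionHead`, `PoseLSTM` and `pose_loss` sketches from earlier, and the optimizer settings are assumptions):

```python
import torch

fusion, pose_net = FusionHead(), PoseLSTM()
params = (list(color_stream.parameters()) + list(depth_stream.parameters())
          + list(fusion.parameters()) + list(pose_net.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-4)

def train_step(rgb_seq, depth_seq, gt_pose):
    """rgb_seq/depth_seq: (batch, time, 3, H, W), each time step one concatenated
    adjacent-frame pair; gt_pose: (batch, time, 6)."""
    b, t = rgb_seq.shape[:2]
    # Run both streams frame-pair by frame-pair, fuse, then restore the sequence shape.
    fused_feat = fusion(color_stream(rgb_seq.flatten(0, 1)),
                        depth_stream(depth_seq.flatten(0, 1))).view(b, t, -1)
    pred = pose_net(fused_feat)                 # per-step 6-DoF pose predictions
    loss = pose_loss(pred, gt_pose)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```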
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The above examples should be understood as illustrative only and not limiting the scope of the invention. Various changes and modifications to the present invention may be made by one skilled in the art after reading the teachings herein, and such equivalent changes and modifications are intended to fall within the scope of the invention as defined in the appended claims.
Claims (7)
1. The pose estimation method based on the LSTM double-flow convolutional neural network is characterized by comprising the following steps of:
s1, preprocessing a color image and a depth image acquired by an RGB-D camera, respectively cascading two adjacent frames of color images and depth images, preprocessing the depth images by adopting a MND (minimum normal+depth) method, and finally normalizing the color images and the depth images; s2, respectively inputting the preprocessed color image and the preprocessed depth image into a color flow and a depth flow of a double-flow convolutional neural network to perform feature extraction; s3, fusing the color feature map rgb feature map output by the color flow and the depth feature map depth feature map output by the depth flow to generate a new fused feature map fusion feature map; s4, carrying out global average pooling treatment on the newly generated fusion feature map; s5, predicting the current pose by training through an LSTM neural network;
the color image preprocessing specifically comprises the steps of cascading adjacent frames of color images to generate color images with 640 x 960 size; the depth image preprocessing is specifically that firstly, MND encoding processing is carried out on the depth image, and the width and the height of the depth image are scaled to be n x And n y Taking depth d as the third channel of the image, for scaled surface normal [ n ] x ,n y ,d]Satisfy the following requirementsAdjacent frames of the depth image are then concatenated to generate a 640 x 960 size depth image.
2. The pose estimation method based on the LSTM dual-flow convolutional neural network according to claim 1, wherein step S2 inputs the preprocessed color image and depth image into the color flow and the depth flow of the dual-flow convolutional neural network respectively for feature extraction, specifically: a dual-flow convolutional neural network architecture is adopted in which the color flow and the depth flow have the same structure of 5 convolutional layers that extract features at different levels of the image, with ReLU activation units after the first four layers; the preprocessed color image I_rgb is taken as the input of the color flow and the preprocessed depth image I_depth as the input of the depth flow, and a color feature map and a depth structure feature map are obtained through the convolution operations, respectively.
3. The pose estimation method based on LSTM dual-flow convolutional neural network according to any one of claims 1-2, wherein the dual-flow convolutional neural network adopts a parallel structure, each branch of the parallel structure is composed of five convolutional layers, the first four convolutional layers of each branch are subjected to a ReLU activation unit, and the formula is expressed as:
f(x)=max(0,x) (1)
where x is the input and f (x) is the output after passing through the ReLU unit.
4. The pose estimation method based on the LSTM dual-flow convolutional neural network according to claim 3, wherein step S3 fuses the RGB feature map output by the color flow with the depth feature map output by the depth flow to generate a new fusion feature map, specifically: the feature maps output by conv5 in the two data-stream networks are concatenated to form a new fusion feature map, which is passed through batch normalization and a ReLU nonlinear activation unit before global average pooling; the generated fusion feature is expressed as:
X_k = [X_k^rgb, X_k^depth]    (2)
where X_k is the fusion feature map, X_k^rgb is the RGB feature map and X_k^depth is the depth feature map.
5. The method for estimating pose based on LSTM dual-flow convolutional neural network according to claim 4, wherein said step S5 predicts the current pose by training using LSTM neural network, and specifically comprises:
performing time-series modeling on the image sequence with the LSTM neural network to predict the current pose information; the LSTM neural network consists of a forget gate, an input gate and an output gate; through learning it memorizes information useful for estimating the current pose and forgets information that is useless; the forget gate controls how much useless information from the previous state is forgotten, with the formula:
f_k = σ(W_f · [h_{k-1}, x_k] + b_f)    (3)
where f_k is the output of the forget gate, σ is the sigmoid function, W_f is the forget-gate weight, h_{k-1} is the hidden state of the previous time step, x_k is the input at the current time step, and b_f is the bias of the forget gate;
wherein the input gate determines what information is added to the current state; it consists of the input selection layer i_k and the candidate layer C̃_k, with the formulas:
i_k = σ(W_i · [h_{k-1}, x_k] + b_i)    (4)
C̃_k = tanh(W_C · [h_{k-1}, x_k] + b_C)    (5)
where W_i is the input-gate weight, tanh is the hyperbolic tangent function, W_C is the candidate-layer weight, b_i is the input-gate bias and b_C is the candidate-layer bias;
wherein the output gate decides what to output as the prediction, with the formula:
o_k = σ(W_o · [h_{k-1}, x_k] + b_o)    (6)
where W_o is the output-gate weight and b_o is the bias of the output gate;
finally, a loss function is designed by minimizing Euclidean distance of the real pose and the estimated pose, and the loss function is as follows:
loss = (1/N) Σ_{i=1}^{N} ( ||p̂_i − p_i||² + w · ||φ̂_i − φ_i||² )    (7)
where N is the number of samples; w is the weight coefficient between position and attitude; (p̂_i, φ̂_i) is the estimated pose; (p_i, φ_i) is the actual pose.
6. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the pose estimation method based on the LSTM dual-flow convolutional neural network according to any one of claims 1 to 5 is implemented.
7. A pose estimation system of an LSTM dual-flow convolutional neural network based on the method of any one of claims 1-5, comprising the following modules:
a preprocessing module, used for preprocessing the color images and depth images acquired by an RGB-D camera, cascading two adjacent frames of the color image and of the depth image respectively, preprocessing the depth images with the minimum normal + depth (MND) method, and finally normalizing the color and depth images;
a feature extraction module, used for inputting the preprocessed color image and depth image into the color flow and the depth flow of the dual-flow convolutional neural network respectively for feature extraction;
a fusion module, used for fusing the RGB feature map output by the color flow with the depth feature map output by the depth flow to generate a new fusion feature map, and applying global average pooling to the newly generated fusion feature map;
a prediction module, used for predicting the current pose through training with an LSTM neural network;
wherein the color image preprocessing specifically consists of cascading adjacent frames of the color image to generate a 640 x 960 color image; the depth image preprocessing first applies MND encoding to the depth image: two channels n_x and n_y are obtained by scaling, and the depth d is taken as the third channel of the image, so that the scaled surface normal [n_x, n_y, d] satisfies the MND normalization constraint; adjacent frames of the depth image are then concatenated to generate a 640 x 960 depth image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111181525.6A CN113838135B (en) | 2021-10-11 | 2021-10-11 | Pose estimation method, system and medium based on LSTM double-flow convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111181525.6A CN113838135B (en) | 2021-10-11 | 2021-10-11 | Pose estimation method, system and medium based on LSTM double-flow convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113838135A CN113838135A (en) | 2021-12-24 |
CN113838135B true CN113838135B (en) | 2024-03-19 |
Family
ID=78968495
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111181525.6A Active CN113838135B (en) | 2021-10-11 | 2021-10-11 | Pose estimation method, system and medium based on LSTM double-flow convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113838135B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114615183B (en) * | 2022-03-14 | 2023-09-05 | 广东技术师范大学 | Routing method, device, computer equipment and storage medium based on resource prediction |
CN115577755A (en) * | 2022-11-28 | 2023-01-06 | 中环服(成都)科技有限公司 | Robot posture correction method, apparatus, computer device, and storage medium |
CN116704026A (en) * | 2023-05-24 | 2023-09-05 | 国网江苏省电力有限公司南京供电分公司 | Positioning method, positioning device, electronic equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110163909A (en) * | 2018-02-12 | 2019-08-23 | 北京三星通信技术研究有限公司 | For obtaining the method, apparatus and storage medium of equipment pose |
CN110473254A (en) * | 2019-08-20 | 2019-11-19 | 北京邮电大学 | A kind of position and orientation estimation method and device based on deep neural network |
CN111796681A (en) * | 2020-07-07 | 2020-10-20 | 重庆邮电大学 | Self-adaptive sight estimation method and medium based on differential convolution in man-machine interaction |
CN111833400A (en) * | 2020-06-10 | 2020-10-27 | 广东工业大学 | Camera position and posture positioning method |
CN112819853A (en) * | 2021-02-01 | 2021-05-18 | 太原理工大学 | Semantic prior-based visual odometer method |
WO2021098766A1 (en) * | 2019-11-20 | 2021-05-27 | 北京影谱科技股份有限公司 | Orb feature visual odometer learning method and device based on image sequence |
-
2021
- 2021-10-11 CN CN202111181525.6A patent/CN113838135B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110163909A (en) * | 2018-02-12 | 2019-08-23 | 北京三星通信技术研究有限公司 | For obtaining the method, apparatus and storage medium of equipment pose |
CN110473254A (en) * | 2019-08-20 | 2019-11-19 | 北京邮电大学 | A kind of position and orientation estimation method and device based on deep neural network |
WO2021098766A1 (en) * | 2019-11-20 | 2021-05-27 | 北京影谱科技股份有限公司 | Orb feature visual odometer learning method and device based on image sequence |
CN111833400A (en) * | 2020-06-10 | 2020-10-27 | 广东工业大学 | Camera position and posture positioning method |
CN111796681A (en) * | 2020-07-07 | 2020-10-20 | 重庆邮电大学 | Self-adaptive sight estimation method and medium based on differential convolution in man-machine interaction |
CN112819853A (en) * | 2021-02-01 | 2021-05-18 | 太原理工大学 | Semantic prior-based visual odometer method |
Also Published As
Publication number | Publication date |
---|---|
CN113838135A (en) | 2021-12-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113838135B (en) | Pose estimation method, system and medium based on LSTM double-flow convolutional neural network | |
CN111127513B (en) | Multi-target tracking method | |
CN107369166B (en) | Target tracking method and system based on multi-resolution neural network | |
CN110781262B (en) | Semantic map construction method based on visual SLAM | |
CN111079780B (en) | Training method for space diagram convolution network, electronic equipment and storage medium | |
CN112489081B (en) | Visual target tracking method and device | |
CN112749726B (en) | Training method and device for target detection model, computer equipment and storage medium | |
CN112258565B (en) | Image processing method and device | |
CN114708435B (en) | Obstacle size prediction and uncertainty analysis method based on semantic segmentation | |
CN111754546A (en) | Target tracking method, system and storage medium based on multi-feature map fusion | |
Kadim et al. | Deep-learning based single object tracker for night surveillance | |
Yu et al. | LiDAR-based localization using universal encoding and memory-aware regression | |
Panda et al. | Kernel density estimation and correntropy based background modeling and camera model parameter estimation for underwater video object detection | |
CN114861859A (en) | Training method of neural network model, data processing method and device | |
CN112529025A (en) | Data processing method and device | |
Dong et al. | Combination of modified U‐Net and domain adaptation for road detection | |
Alvar et al. | Mixture of merged gaussian algorithm using RTDENN | |
Martinez et al. | Comparative study of optimization algorithms on convolutional network for autonomous driving | |
Duan | [Retracted] Deep Learning‐Based Multitarget Motion Shadow Rejection and Accurate Tracking for Sports Video | |
CN111578956A (en) | Visual SLAM positioning method based on deep learning | |
CN116958041A (en) | Product defect detection method and device, electronic equipment and storage medium | |
CN115239974A (en) | Vision synchronous positioning and map construction closed-loop detection method integrating attention mechanism | |
CN116958624A (en) | Method, device, equipment, medium and program product for identifying appointed material | |
CN112906724B (en) | Image processing device, method, medium and system | |
CN114372999A (en) | Object detection method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |