CN109345588B - Tag-based six-degree-of-freedom attitude estimation method - Google Patents
Tag-based six-degree-of-freedom attitude estimation method
- Publication number: CN109345588B (application CN201811101406.3A)
- Authority: CN (China)
- Prior art keywords: frame, camera, pose, tag, camera pose
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06T7/70 — Determining position or orientation of objects or cameras
- G01C11/02 — Picture taking arrangements specially adapted for photogrammetry or photographic surveying
- G01C21/20 — Instruments for performing navigational calculations
- G06K7/1417 — 2D bar codes
- G06K7/1443 — Optical code recognition including locating of the code in an image
- G06T7/80 — Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
- G06T2207/30204 — Marker
- G06T2207/30244 — Camera pose
Abstract
The invention discloses a Tag-based six-degree-of-freedom attitude estimation method. A Tag is attached to the object to assist detection, and the camera identifies the Tag on the object to help SLAM complete initialization. After initialization, feature points are continuously extracted from each image frame, and the camera pose is estimated according to whether the velocity matrix corresponding to the previous frame is empty. The estimated value then serves as the initial value for camera pose optimization, with the reprojection error of the map points corresponding to the feature points onto the image coordinate system as the objective function, yielding the optimized camera pose and the map points corresponding to the feature points; finally the camera pose is converted into the object pose. The method of the invention is robust and achieves high attitude estimation precision even when the imaging quality is poor and the object moves at high speed.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a Tag-based six-degree-of-freedom attitude estimation method.
Background
The posture of a three-dimensional object intuitively reflects its characteristics, so the three-dimensional pose is one of the important features of a three-dimensional object and has long been a research focus of researchers at home and abroad. If the three-dimensional pose of a space target can be determined, its purpose can be roughly inferred and the target can be classified accordingly.
The attitude estimation problem is that of determining the position and orientation of a three-dimensional target object. Pose estimation has applications in many areas, such as robot vision, motion tracking, and single-camera calibration. With the widespread application of computer vision technology, photographing a freely moving object in a scene with a fixed camera and estimating the object's pose has become an important research direction. The prior art offers a number of solutions to the pose estimation problem, such as model-based detection and infrared-based detection.
Model-based detection requires modeling the detected object beforehand and then matching the target model against the object in the actual image; during matching, the average distance from the model point set to the image edge point set serves as the matching measure, from which the pose of the target in the image is solved. Model-based attitude estimation compares the real image with a synthetic image to compute similarity and update the object pose. To avoid an optimization search over the global state space, existing model-based methods generally reduce the optimization problem to matching a number of local features and depend on the accurate detection of those features. When noise is large and accurate local features cannot be extracted, the robustness of such methods suffers greatly. Infrared-based detection, in turn, is easily affected by the ambient temperature, which likewise greatly degrades robustness.
Disclosure of Invention
The invention aims to provide a Tag-based six-degree-of-freedom posture estimation method. A Tag is added to the object to assist detection; the camera identifies the Tag on the object to help SLAM complete initialization, ORB feature points in the image are detected to complete the pose estimation of the camera, and the camera pose is then converted into the object pose. The method achieves good robustness and high posture estimation accuracy when the imaging quality is poor and the object moves at high speed.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a Tag-based six-degree-of-freedom attitude estimation method comprises the following steps:
receiving an input image stream, detecting a two-dimensional code label attached to a moving object in an image, finishing initialization, and taking an image frame after initialization as an initial key frame;
after initialization, continuously extracting feature points of each frame of image, and estimating the pose of the camera according to whether a speed matrix corresponding to the previous frame is empty or not;
taking the value obtained by the estimation of the camera pose as an initial value, and adopting a re-projection error function of the map points corresponding to the feature points to the image coordinate system as an objective function to optimize the camera pose to obtain the optimized camera pose and the map points corresponding to the feature points;
comparing the common map point of the current frame and the current key frame, and taking the current frame as a new current key frame when the common map point of the current frame and the current key frame does not exceed a set threshold value;
and after the optimized camera pose is obtained, solving the camera pose of the current frame from the camera pose of the current frame relative to the current key frame and the camera poses of the current key frame and the initialization key frame, and converting the camera pose into the pose of the moving object.
Further, the method for estimating the pose in six degrees of freedom based on Tag further comprises the following steps:
when a new current key frame is generated, all key frames are also detected in sequence, and if a common map point between a key frame and at least three other key frames is detected to exceed a set threshold value, the key frame is rejected.
Further, the performing camera pose estimation includes:
if the speed matrix corresponding to the previous frame is empty, assigning the camera pose of the current frame to the camera pose of the current key frame;
if the speed matrix corresponding to the previous frame is not empty, the camera pose of the current frame is the speed matrix corresponding to the previous frame multiplied by the camera pose of the previous frame;
and the speed matrix corresponding to the previous frame is the camera pose of the previous frame multiplied by the inverse of the camera pose of the frame before it.
Further, the objective function is:

{R, t, X} = argmin over (R, t, X) of ( EcPro + EtagPro )
EcPro = Σ_{i∈η} ρ( ‖x_i − π(R·X_i + t)‖²_Σ )
EtagPro = Σ_i ρ( ‖π(R_tag·X_i + t_tag) − π(R·X_i + t)‖²_Σ )

wherein η represents the set of all feature matching points between the current frame and the previous frame; the initial values of R and t are the camera pose obtained by estimation, namely the rotation matrix and translation vector of the camera; ρ denotes the Huber energy function; Σ denotes the covariance matrix of the key frame corresponding to the optimized frame; x_i denotes the image coordinates of the i-th feature point and X_i the map point coordinates of the i-th feature point; EcPro denotes the reprojection error between the feature points in the image and the three-dimensional points reprojected back into the image through the pose transformation estimated by the camera; EtagPro denotes the reprojection error between the image points obtained by reprojecting through the camera-estimated pose transformation and through the acquired Tag pose transformation, respectively; R_tag and t_tag denote the acquired Tag rotation matrix and translation vector; and π is defined as:

π(X, Y, Z) = ( f_x·X/Z + c_x, f_y·Y/Z + c_y )

wherein (f_x, f_y) denotes the camera focal length, (c_x, c_y) denotes the principal point, and (X, Y, Z) denotes the world-coordinate-system coordinates of the spatial point.
ρ(a) is the Huber energy function:

ρ(a) = a²/2 for |a| ≤ δ; ρ(a) = δ(|a| − δ/2) for |a| > δ

where a represents the error value and δ is a constant.
The invention provides a Tag-based six-degree-of-freedom attitude estimation method, which is characterized in that the Tag is added on an object to assist detection, the camera identifies the Tag on the object to help SLAM complete initialization, ORB feature points in an image are detected to complete pose estimation of the camera, and then the pose of the camera is converted into the pose of the object.
Drawings
FIG. 1 is a flow chart of a Tag-based six-DOF attitude estimation method of the present invention;
fig. 2 is a diagram of a two-dimensional code Tag;
FIG. 3 is a schematic diagram of Tag label coordinates;
FIG. 4 is a schematic view of a camera coordinate system;
FIG. 5 is a schematic diagram of a camera coordinate system relative to a tag coordinate system during initialization.
Detailed Description
The technical solutions of the present invention are further described in detail below with reference to the drawings and examples, which should not be construed as limiting the present invention.
As shown in fig. 1, an embodiment of a Tag-based six-degree-of-freedom attitude estimation method is provided, in this embodiment, a two-dimensional code Tag is attached to a moving object to be estimated, and the two-dimensional code Tag carries two-dimensional code information of a Tag identification number, including the following steps:
and step S1, receiving the input image stream, detecting a two-dimensional code label attached to a moving object in the image, and finishing initialization, wherein the image frame after initialization is taken as an initial key frame.
The initialization process of this embodiment includes: detecting the angular points of two-dimensional code labels attached to moving objects in the images, and establishing a world coordinate system; and extracting the feature points of the image frame during initialization, and calculating the map points corresponding to the feature points.
According to the technical scheme of the invention, a moving object in a scene is photographed with a fixed camera, an image of the moving object is obtained, and an image stream is input. In this embodiment, a two-dimensional code label is attached to the moving object and detected (Tag detection): first, the photographed moving-object image is binarized to obtain a gray-scale image; candidate labels are then searched for in the gray-scale image, and each candidate found is decoded to determine whether it is a valid two-dimensional code label.
In an embodiment of the present invention, a process of detecting a two-dimensional code tag is as follows:
In the grayscale map, I(i, j) is the value of the image pixel at the coordinate (i, j) point, whose gradient value is:

G(i, j) = Ix + Iy (1)

wherein Ix and Iy represent the gradients of the image in the horizontal and vertical directions, respectively:

Ix = I(i+1, j) − I(i, j), Iy = I(i, j+1) − I(i, j) (2)

Square corner points in the picture are extracted according to the Harris algorithm. First, the gradients of the image pixel points in the horizontal and vertical directions and their products are calculated to obtain a 2 × 2 matrix M:

M = [ Ix², Ix·Iy ; Ix·Iy, Iy² ] (3)

wherein Ix² = Ix × Ix and Iy² = Iy × Iy.
Then, the image is Gaussian filtered to obtain a new M. The corner response function Re of each pixel is next calculated using M:

Re = [ Ix² × Iy² − (Ix·Iy)² ] − k(Ix² + Iy²)² (4)

wherein k is a constant coefficient, generally in the range 0.04–0.06. Local maximum points are then searched for, and a point whose response value Re is greater than a threshold is determined to be a corner point.
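The gradient and corner-response computations above can be sketched as follows. This is a simplification: the per-pixel response below omits the Gaussian-weighted windowing of M that the full Harris detector uses, and the function names are illustrative, not from the patent.

```python
def gradients(img, i, j):
    # Forward-difference gradients of formula (2):
    # Ix = I(i+1, j) - I(i, j),  Iy = I(i, j+1) - I(i, j)
    ix = img[i + 1][j] - img[i][j]
    iy = img[i][j + 1] - img[i][j]
    return ix, iy

def harris_response(ix, iy, k=0.04):
    # Corner response of formula (4):
    # Re = [Ix^2 * Iy^2 - (Ix*Iy)^2] - k * (Ix^2 + Iy^2)^2
    # Note: without summing Ix^2, Iy^2 and Ix*Iy over a smoothed window,
    # the determinant term vanishes for a single pixel; the real detector
    # Gaussian-filters M over a neighbourhood first.
    ix2, iy2, ixy = ix * ix, iy * iy, ix * iy
    return (ix2 * iy2 - ixy * ixy) - k * (ix2 + iy2) ** 2
```

In a full detector these two helpers run per pixel, followed by the windowed accumulation, thresholding, and non-maximum suppression described in the text.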
In this embodiment, straight lines in the image are detected by the Hough transform to confirm that the detected corner points are corner points of a Tag edge, so that the image coordinates of the four corner points can be obtained. As shown in fig. 2, a Tag is a black-and-white two-dimensional code square, and different Tags contain different digitally encoded information. It is then determined from the detected straight lines whether the region is a square area; if so, it is a candidate area, which is matched against the Tag library to judge whether it is a Tag. It is easy to understand that detecting a two-dimensional code label in an image is a relatively mature prior-art technique and is not described again here.
As shown in fig. 3, the Tag itself has a Tag coordinate system, and the camera coordinate system is a coordinate system with the camera optical center (the center of projection) as the origin (as shown in fig. 4). According to the external parameters, the camera coordinate system established on the camera is brought into coincidence with the Tag coordinate system through a rotation and translation, thereby establishing the unique world coordinate system of the system. The world coordinate system is the absolute coordinate system of the system, and the coordinates of all points on the image are determined relative to its origin.
SLAM (Simultaneous Localization And Mapping) means simultaneous positioning and map construction. Two frames of images are required for initialization. The real size of the Tag is measured as 2h × 2h; taking the center of the Tag as the origin of the tag coordinate system, the 3D coordinates of the four corners of the Tag are as shown in fig. 3. From the 2D image coordinates corresponding to the four corners (obtained when the Tag is detected), the external parameters between the Tag and the camera in the two frames, namely the rotation matrix and translation vector between the Tag and the camera, are obtained through the PnP algorithm, so that the rotation-translation can be performed and a unique world coordinate system established.
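A minimal sketch of the tag-frame 3D corner coordinates that feed the PnP step. The corner ordering below (top-left, top-right, bottom-right, bottom-left) is an assumption and must match whatever order the detector reports; the function name is illustrative.

```python
def tag_corners_3d(h):
    # 3D corners of a 2h x 2h tag in the tag coordinate system:
    # origin at the tag centre, tag plane at Z = 0.
    return [(-h,  h, 0.0),   # top-left
            ( h,  h, 0.0),   # top-right
            ( h, -h, 0.0),   # bottom-right
            (-h, -h, 0.0)]   # bottom-left

# These four 3D points, paired with the four detected 2D corner
# coordinates, are what a PnP solver consumes to recover the rotation
# matrix and translation vector between the tag and the camera.
```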
In this embodiment, for each frame in the input image stream, feature points need to be extracted, and a map point corresponding to the feature points is obtained through subsequent steps, where the map point is a three-dimensional coordinate point of a two-dimensional feature point on an SLAM calculation image, and the depth of the two-dimensional feature point is calculated by the map point.
For the moving-object pictures, feature extraction is likewise performed in this embodiment. Algorithms such as FAST, SURF and SIFT are commonly used for extracting feature points; the FAST algorithm extracts feature points quickly and can meet the real-time requirement of the system, so this embodiment uses FAST for feature point detection. A candidate point is generally considered a feature point when the gray values of the circle of pixels around it differ sufficiently from the gray value of the candidate:
N = Σ_{x ∈ circle(p)} ( |I(x) − I(p)| > ε ) (5)

wherein I(x) is the gray value of any point on the circle circle(p), I(p) is the gray value of the circle center, and ε is the gray-difference threshold; if N is larger than a given threshold, p is considered a feature point.
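The segment-test criterion of formula (5) can be sketched as follows; the circle is passed in as a flat list of intensities, and the names are illustrative:

```python
def is_fast_corner(center_gray, circle_grays, eps, n_required):
    # Formula (5): count the circle pixels whose gray value differs from
    # the centre intensity by more than eps; the candidate p is a corner
    # when that count exceeds the given threshold n_required.
    n = sum(1 for g in circle_grays if abs(g - center_gray) > eps)
    return n > n_required
```

With the usual 16-pixel Bresenham circle, `n_required` on the order of 9 to 12 is typical.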
After the feature points are obtained, descriptors of the feature points are calculated by the BRIEF algorithm. In this embodiment, an S × S neighborhood window is taken centered on a feature point; a pair of points is selected at random in the window, their pixel values are compared, and the binary test τ is defined as:

τ = 1 if p(x) < p(y), otherwise τ = 0 (6)

wherein p(x) and p(y) are the pixel values of the random points x and y, respectively. Repeating formula (6) for 256 point pairs yields a binary code, which is the descriptor of the feature point.
For the image frames at initialization, as shown in fig. 5, from the external parameter R_1 between the first frame and the Tag and the external parameter R_2 between the second frame and the Tag, the external parameter between the first frame and the second frame, namely the camera pose R_12 between the two frames, can be obtained:

R_12 = R_1^(−1) × R_2 (7)
Knowing the two-dimensional coordinates of the two frames of feature points and the poses between the two frames, calculating the depth value information of the feature points through triangulation, namely Z-axis data of corresponding points of the feature points in a world coordinate system.
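As an illustration of the triangulation step, consider the special case where the two frames differ only by a pure horizontal translation b (stereo-like geometry): the depth then follows from similar triangles. The general two-view case additionally involves the rotation between the frames and is usually solved linearly; the function name here is illustrative.

```python
def depth_from_disparity(f, baseline, x1, x2):
    # Z = f * b / d, where d = x1 - x2 is the disparity of a matched
    # feature between the two initialization frames (pure-translation
    # special case of triangulation).
    d = x1 - x2
    if d <= 0:
        return None  # point at infinity or a bad match
    return f * baseline / d
```

Because the Tag supplies the true metric baseline, the depths recovered this way are in real-world units, which is exactly how the scale ambiguity of monocular SLAM is resolved here.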
According to the invention, the scale problem of monocular SLAM (the real size of the generated map in the real world cannot be known) can be solved through the real size of Tag, and high-precision map points are formed during initialization. When the system is operated for a long time, the translation distance of the object can still be measured with high precision.
In addition, the present embodiment also takes the image frame when the initialization is completed as the initial key frame.
And step S2, after initialization, continuously extracting feature points of each frame of image, and estimating the pose of the camera according to whether the speed matrix corresponding to the previous frame is empty.
After initialization, the present embodiment continuously processes an image stream, an image frame currently processed is referred to as a current frame, and feature point extraction is performed on the current frame, which has been described above with respect to feature point extraction, and is not described here again.
For camera pose estimation, if the map were created or updated for every frame, the amount of calculation would be too large; this embodiment therefore calculates by key frames. The image frame after initialization is used as the initial key frame, and after the map points corresponding to the feature points of each image frame are obtained, the current key frame is updated for subsequent pose estimation.
The current key frame refers to a corresponding key frame when the current frame is processed, and after initialization, the image frame after initialization is an initial key frame, and the embodiment manages the key frame specifically as follows:
in this embodiment, for each frame of moving object image, after obtaining the map point corresponding to each frame of image feature point, the map point common to the current frame and the current key frame is compared, and when the map point common to the current frame and the current key frame does not exceed a set threshold (e.g., 90%), the current frame is taken as a new current key frame.
In addition, for all key frames, when a new current key frame is generated, all key frames are also detected in sequence, and if a common map point between a key frame and at least three other key frames is detected to exceed a set threshold (for example, at least 90% of map points are repeated), the key frame is eliminated, so that the calculation amount of maintaining the key frames is reduced.
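The key-frame bookkeeping described above can be sketched with map points modelled as integer IDs and the 90% figure from the text. All names are illustrative, and the one-pass culling loop is a simplification (a real system would cull iteratively so that mutually redundant frames are not all dropped at once):

```python
def covisibility(frame_pts, other_pts):
    # Fraction of this frame's map points also observed by the other frame.
    if not frame_pts:
        return 0.0
    return len(frame_pts & other_pts) / len(frame_pts)

def needs_new_keyframe(cur_pts, key_pts, thresh=0.9):
    # Promote the current frame when shared map points do not exceed 90%.
    return covisibility(cur_pts, key_pts) <= thresh

def cull_redundant(keyframes, thresh=0.9, min_others=3):
    # Reject a key frame if at least three other key frames each share
    # more than `thresh` of its map points.
    kept = []
    for i, kf in enumerate(keyframes):
        redundant_with = sum(
            1 for j, other in enumerate(keyframes)
            if j != i and covisibility(kf, other) > thresh)
        if redundant_with < min_others:
            kept.append(kf)
    return kept
```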
In this embodiment, the camera pose estimation is performed, and the estimated camera pose is used as an initial value for subsequent optimization.
In one embodiment of the present invention, performing camera pose estimation includes:
and if the speed matrix corresponding to the previous frame is empty, performing camera pose estimation according to the current key frame, otherwise performing camera pose estimation according to the motion model. And multiplying the camera pose of the previous frame by the camera pose of the previous frame after the speed matrix corresponding to the previous frame is obtained.
In the embodiment, when the camera pose estimation is performed, if the speed matrix of the previous frame is not empty, the camera pose estimation is performed according to the motion model. The motion model assumes that the camera motion speed is unchanged, that is, when estimating the camera pose of the current frame, the motion model is: the camera pose of the current frame is equal to the speed matrix corresponding to the previous frame multiplied by the camera pose of the previous frame.
It should be noted that the speed matrix of the previous frame is obtained by multiplying the camera pose of the previous frame by the inverse of the camera pose of the frame before it. In this embodiment, after the camera pose of a frame is calculated, the speed matrix is likewise updated for the camera pose estimation of subsequent frames.
And if the speed matrix of the previous frame is empty, camera pose estimation is carried out according to the key frame, namely, the camera pose of the current frame is directly assigned to be the camera pose of the current key frame and used as an initial value for subsequent camera pose optimization.
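The constant-velocity prediction and the velocity update can be sketched with 4 × 4 homogeneous transforms in pure Python. The names are illustrative, and the demo uses identity-rotation (translation-only) poses to keep the check easy to follow:

```python
def mat_mul(A, B):
    # 4x4 homogeneous matrix product.
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def se3_inverse(T):
    # Inverse of a rigid transform [R | t] is [R^T | -R^T t].
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[0][3], T[1][3], T[2][3]]
    ti = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return [Rt[0] + [ti[0]], Rt[1] + [ti[1]], Rt[2] + [ti[2]],
            [0.0, 0.0, 0.0, 1.0]]

def update_velocity(T_curr, T_prev):
    # Velocity matrix once the current pose is known: V = T_curr * T_prev^-1.
    return mat_mul(T_curr, se3_inverse(T_prev))

def predict_pose(V, T_prev):
    # Constant-velocity motion model: T_curr is approximately V * T_prev.
    return mat_mul(V, T_prev)

def translation(x, y, z):
    # Identity-rotation pose, used for the demo below.
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]
```

For a camera moving 1 unit per frame along x, the velocity matrix is a 1-unit translation, and the prediction extrapolates the motion one frame forward.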
And S3, taking the value obtained by camera pose estimation as an initial value, and performing camera pose optimization by taking a re-projection error function of the map points corresponding to the feature points re-projected to an image coordinate system as an objective function to obtain the optimized camera pose and the map points corresponding to the feature points.
In this embodiment, bundle adjustment is applied to the currently estimated camera pose: the camera pose of the single frame is optimized to improve its accuracy and obtain an accurate camera pose for the current frame.
In this embodiment, the descriptors of the feature points are used to match the feature points of the current frame with those of the corresponding key frame image. With the coordinates of the matched 2D feature points (image coordinates) and the map points they form (3D world-coordinate-system points) known, the 3D coordinates are reprojected back onto the 2D image according to the current camera pose; the reprojected points are compared with the originally corresponding 2D points on the image to optimize the camera pose of the current frame, and this 3D-2D reprojection error forms the optimization objective function.
The method introduces a new error term E_tagPro to fuse the new data: the reprojection error is combined with the Tag pose to jointly optimize the camera pose, yielding the rotation matrix R, translation vector t and map point set X that minimize the reprojection error:

{R, t, X} = argmin over (R, t, X) of ( EcPro + EtagPro ) (8)
EcPro = Σ_{i∈η} ρ( ‖x_i − π(R·X_i + t)‖²_Σ )
EtagPro = Σ_i ρ( ‖π(R_tag·X_i + t_tag) − π(R·X_i + t)‖²_Σ )

wherein η represents the set of all feature matching points between the current frame and the previous frame; the initial values of R and t are the camera pose obtained by estimation, namely the rotation matrix and translation vector of the camera; ρ denotes the Huber energy function; Σ denotes the covariance matrix of the key frame corresponding to the optimized frame; x_i denotes the image coordinates of the i-th feature point and X_i the map point coordinates of the i-th feature point; EcPro denotes the reprojection error between the feature points in the image and the three-dimensional points reprojected back into the image through the pose transformation estimated by the camera; EtagPro denotes the reprojection error between the image points obtained by reprojecting through the camera-estimated pose transformation and through the acquired Tag pose transformation, respectively; R_tag and t_tag denote the acquired Tag rotation matrix and translation vector; and π is defined as:

π(X, Y, Z) = ( f_x·X/Z + c_x, f_y·Y/Z + c_y ) (9)

wherein (f_x, f_y) denotes the camera focal length, (c_x, c_y) denotes the principal point, and (X, Y, Z) denotes the world-coordinate-system coordinates of the spatial point.
ρ(a) is the Huber energy function:

ρ(a) = a²/2 for |a| ≤ δ; ρ(a) = δ(|a| − δ/2) for |a| > δ (10)

where a represents the error value and δ is a constant, typically 1.
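The projection π and the Huber energy ρ used in the objective are straightforward to implement; this is an illustrative sketch, not the patent's implementation:

```python
def project(point, fx, fy, cx, cy):
    # Pinhole projection pi(X, Y, Z) = (fx * X / Z + cx, fy * Y / Z + cy),
    # mapping a 3D point to pixel coordinates; (cx, cy) is the principal point.
    X, Y, Z = point
    return (fx * X / Z + cx, fy * Y / Z + cy)

def huber(a, delta=1.0):
    # Huber energy rho(a): quadratic for |a| <= delta, linear beyond,
    # so large reprojection errors (outliers) are down-weighted.
    if abs(a) <= delta:
        return 0.5 * a * a
    return delta * (abs(a) - 0.5 * delta)
```

In the full optimizer these two functions are evaluated inside the residuals of the objective and minimized over R, t and the map point set X.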
By the optimization mode, the optimized camera pose is obtained, and the optimized camera pose is the camera pose with the minimum re-projection error.
It should be noted that the new error term E_tagPro added to the above optimization improves the optimization accuracy. Optimization can also be performed without the new error term E_tagPro, but the accuracy is then lower than when the new error term is added. The specific optimization process without the new error term is not described again here.
SLAM uses the rotation matrix and translation vector of the current frame relative to the world coordinate system origin to represent the current camera pose, i.e. the external reference of the current frame relative to the initial key frame, which is not described herein again.
In this embodiment, the world-coordinate-system coordinates of the feature points after initialization, namely the corresponding map points, are obtained through the above optimization. For feature points with unknown depth values, when the current key frame is newly formed the depth of each such feature point is set to 1; the feature points are then tracked over subsequent frames according to the reprojection error of formula (8), and their depth values are obtained by optimization. If a depth value converges, new world-coordinate-system coordinates are formed; otherwise the feature point is discarded and no world coordinates are formed.
It is easy to understand that after the map point corresponding to the feature point of the current frame is obtained, the map point common to the current frame and the current key frame is compared, and the current key frame is updated, which is not described herein again.
And step S4, after the optimized camera pose is obtained, obtaining the camera pose of the current frame from the camera pose of the current frame relative to the current key frame and the camera poses of the current key frame and the initialization key frame, and converting the camera pose into the pose of the moving object.
After the optimized camera pose is obtained, the camera pose of the current frame is calculated according to the camera pose of the current frame to the current frame key frame and the camera poses of the current key frame and the initialization key frame, so that the influence of the camera pose estimation error of the previous frame on the camera pose estimation of the current frame is avoided.
T_camera = T_ref × T_camToRef (12)
where T_camera is the camera pose of the current frame, T_ref is the camera pose of the current key frame relative to the initialization key frame, and T_camToRef is the camera pose of the current frame relative to the current key frame.
It should be noted that after the camera pose of the current frame is calculated, it can be used to compute the speed matrix, which is not described herein again.
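The speed matrix mentioned above can be sketched as a constant-velocity motion model on 4×4 homogeneous transforms. This is an illustrative assumption consistent with claim 3: the speed matrix is taken to be the previous pose composed with the inverse of the pose before it, and that velocity is extrapolated one frame forward; the helper names and numeric poses are made up.

```python
import numpy as np

def predict_pose(T_prev, T_prev2, T_keyframe):
    """Constant-velocity pose prediction (illustrative sketch).

    T_prev, T_prev2: 4x4 camera poses of the previous two frames, or None.
    T_keyframe: camera pose of the current key frame, used as the fallback
    when the speed matrix is "empty" (no previous motion available)."""
    if T_prev is None or T_prev2 is None:
        return T_keyframe
    V = T_prev @ np.linalg.inv(T_prev2)   # speed matrix from the last two frames
    return V @ T_prev                     # extrapolate one frame forward

def translate(x):
    """Pure translation along x, as a homogeneous transform."""
    T = np.eye(4)
    T[0, 3] = x
    return T

# Under uniform motion the prediction reproduces the true next pose exactly:
T0, T1 = translate(0.0), translate(0.1)   # camera advancing 0.1 per frame
T2 = predict_pose(T1, T0, T_keyframe=np.eye(4))
print(np.allclose(T2, translate(0.2)))  # True
```

The predicted pose only serves as the initial value for the reprojection-error optimization; it does not need to be exact, just close enough for the optimizer to converge.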
After the camera pose of the current frame is calculated, it is converted into the pose of the object:
T_object = T_camera^(-1) (13)
where T_object represents the pose of the object.
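Equations (12) and (13) are plain compositions of 4×4 homogeneous transforms. A minimal numpy sketch, with assumed illustrative rotations and translations:

```python
import numpy as np

def make_T(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def yaw(deg):
    """Rotation about the z-axis by the given angle in degrees."""
    c, s = np.cos(np.radians(deg)), np.sin(np.radians(deg))
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# T_ref: pose of the current key frame w.r.t. the initialization key frame;
# T_camToRef: pose of the current frame w.r.t. the current key frame.
# (Both values are made up for illustration.)
T_ref = make_T(yaw(30), np.array([1.0, 0.0, 0.0]))
T_camToRef = make_T(yaw(15), np.array([0.0, 0.5, 0.0]))

# Equation (12): chain the two relative poses to get the current camera pose.
T_camera = T_ref @ T_camToRef

# Equation (13): the object (e.g. UAV) pose is the inverse of the camera pose.
T_object = np.linalg.inv(T_camera)

# Sanity check: composing the pose with its inverse yields the identity.
print(np.allclose(T_camera @ T_object, np.eye(4)))  # True
```

Anchoring T_camera to the key-frame chain rather than to the previous frame is what keeps per-frame estimation errors from accumulating, as noted above.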
In practical application, the moving object may be an unmanned aerial vehicle: the camera pose is converted into the UAV pose by the method of the technical scheme, and a three-dimensional point cloud map of the current UAV is output.
The invention provides a purely vision-based six-degree-of-freedom pose estimation method for an object that maintains excellent performance even when the object moves rapidly. Experiments yield the technical indicators shown in Table 1:
Yaw angle error | Pitch angle error | Roll angle error | x-axis error | y-axis error | z-axis error
---|---|---|---|---|---
0.1° | 0.5° | 0.5° | 1 mm | 1 mm | 0.3 mm
TABLE 1 Measured experimental errors
The invention introduces Tags to eliminate the accumulated error of the system and completes single-frame initialization of the SLAM via the Tag, thereby solving the scale drift problem of monocular SLAM. Because feature point detection is efficient, the method achieves real-time operation at up to 50 FPS.
The above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and those skilled in the art can make various corresponding changes and modifications according to the present invention without departing from the spirit and the essence of the present invention, but these corresponding changes and modifications should fall within the protection scope of the appended claims.
Claims (4)
1. A Tag-based six-degree-of-freedom attitude estimation method, characterized by comprising the following steps:
receiving an input image stream, detecting a two-dimensional code label attached to a moving object in an image, finishing initialization, and taking an image frame after initialization as an initial key frame;
after initialization, continuously extracting feature points from each image frame, and estimating the camera pose according to whether the speed matrix corresponding to the previous frame is empty;
taking the estimated camera pose as an initial value, and optimizing the camera pose with the reprojection error of the map points corresponding to the feature points onto the image coordinate system as the objective function, to obtain the optimized camera pose and the map points corresponding to the feature points;
comparing the map points common to the current frame and the current key frame, and taking the current frame as the new current key frame when the common map points do not exceed a set threshold;
and after the optimized camera pose is obtained, solving the camera pose of the current frame from the camera pose of the current frame relative to the current key frame and the camera poses of the current key frame and the initialization key frame, and converting the camera pose into the pose of the moving object.
2. The Tag-based six-degree-of-freedom attitude estimation method of claim 1, further comprising:
when a new current key frame is generated, sequentially checking all key frames, and culling a key frame if the map points it shares with at least three other key frames exceed a set threshold.
3. The Tag-based six-degree-of-freedom pose estimation method of claim 1, wherein the performing camera pose estimation comprises:
if the speed matrix corresponding to the previous frame is empty, assigning the camera pose of the current key frame to the camera pose of the current frame;
if the speed matrix corresponding to the previous frame is not empty, the camera pose of the current frame is the speed matrix corresponding to the previous frame multiplied by the camera pose of the previous frame;
and the speed matrix corresponding to the previous frame is the camera pose of the previous frame multiplied by the inverse of the camera pose of the frame before it.
4. The Tag-based six-degree-of-freedom attitude estimation method of claim 1, wherein the objective function is:
wherein η represents the set of all feature matching points between the current frame and the previous frame, X is the map point set, the initial values of R and t are the estimated camera pose, namely the rotation matrix and translation vector of the camera, ρ represents the Huber energy function, Σ represents the covariance matrix of the key frame corresponding to the optimized frame, x_i represents the image coordinates of the i-th feature point, X_i represents the map point coordinates of the i-th feature point, E_cPro represents the reprojection error between the feature points in the image and the three-dimensional points transformed by the camera's estimated pose and reprojected back into the image, E_tagPro represents the reprojection error between the camera's estimated pose transform and the acquired Tag pose transform after each is reprojected onto the image, R_tag and t_tag respectively represent the acquired Tag rotation matrix and translation vector, and π is defined as follows:
π(X, Y, Z) = (f_x·X/Z + c_x, f_y·Y/Z + c_y)
wherein (f_x, f_y) denotes the camera focal length, (c_x, c_y) denotes the camera principal point (optical center), and (X, Y, Z) denotes the world coordinate system coordinates of the spatial point;
ρ(a) is the Huber energy function:
ρ(a) = a²/2 for |a| ≤ δ, and ρ(a) = δ(|a| − δ/2) for |a| > δ,
wherein a represents an error value and δ is a constant.
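The projection π and the Huber energy ρ of claim 4 correspond to the standard pinhole model and robust loss. A minimal sketch, with illustrative (assumed) intrinsic values rather than anything specified by the patent:

```python
import numpy as np

def pi(P, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of a 3-D point (X, Y, Z) to pixel coordinates."""
    X, Y, Z = P
    return np.array([fx * X / Z + cx, fy * Y / Z + cy])

def huber(a, delta=1.0):
    """Huber energy: quadratic near zero, linear in the tails."""
    a = abs(a)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

def reproj_residual(x_i, X_i, R, t):
    """Robustified reprojection error of map point X_i against pixel x_i."""
    return huber(np.linalg.norm(x_i - pi(R @ X_i + t)))

# A perfect observation yields zero error; outliers are damped by the
# linear tail of the Huber function instead of growing quadratically.
R, t = np.eye(3), np.zeros(3)
X_i = np.array([0.2, -0.1, 2.0])
x_i = pi(X_i)                              # simulated exact measurement
print(reproj_residual(x_i, X_i, R, t))     # 0.0
```

The linear tail is what lets the optimization of the objective function tolerate feature mismatches: a gross outlier contributes δ(|a| − δ/2) rather than a²/2, so it cannot dominate the sum over all matched points.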
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811101406.3A CN109345588B (en) | 2018-09-20 | 2018-09-20 | Tag-based six-degree-of-freedom attitude estimation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109345588A CN109345588A (en) | 2019-02-15 |
CN109345588B true CN109345588B (en) | 2021-10-15 |
Family
ID=65305834
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811101406.3A Active CN109345588B (en) | 2018-09-20 | 2018-09-20 | Tag-based six-degree-of-freedom attitude estimation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109345588B (en) |
Families Citing this family (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110047108B (en) * | 2019-03-07 | 2021-05-25 | 中国科学院深圳先进技术研究院 | Unmanned aerial vehicle pose determination method and device, computer equipment and storage medium |
CN109993793B (en) * | 2019-03-29 | 2021-09-07 | 北京易达图灵科技有限公司 | Visual positioning method and device |
CN112150547B (en) * | 2019-06-28 | 2024-03-12 | 北京魔门塔科技有限公司 | Method and device for determining vehicle body pose and looking around vision odometer system |
CN110349213B (en) * | 2019-06-28 | 2023-12-12 | Oppo广东移动通信有限公司 | Pose determining method and device based on depth information, medium and electronic equipment |
CN110458889A (en) * | 2019-08-09 | 2019-11-15 | 东北大学 | A kind of video camera method for relocating based on semantic road sign |
CN112444242B (en) * | 2019-08-31 | 2023-11-10 | 北京地平线机器人技术研发有限公司 | Pose optimization method and device |
CN110703188B (en) * | 2019-09-10 | 2022-03-25 | 天津大学 | Six-degree-of-freedom attitude estimation system based on RFID |
CN110962128B (en) * | 2019-12-11 | 2021-06-29 | 南方电网电力科技股份有限公司 | Substation inspection and stationing method and inspection robot control method |
CN111179342B (en) * | 2019-12-11 | 2023-11-17 | 上海非夕机器人科技有限公司 | Object pose estimation method and device, storage medium and robot |
CN113034538B (en) * | 2019-12-25 | 2023-09-05 | 杭州海康威视数字技术股份有限公司 | Pose tracking method and device of visual inertial navigation equipment and visual inertial navigation equipment |
CN113031582A (en) * | 2019-12-25 | 2021-06-25 | 北京极智嘉科技股份有限公司 | Robot, positioning method, and computer-readable storage medium |
CN111242996B (en) * | 2020-01-08 | 2021-03-16 | 郭轩 | SLAM method based on Apriltag and factor graph |
CN111667535B (en) * | 2020-06-04 | 2023-04-18 | 电子科技大学 | Six-degree-of-freedom pose estimation method for occlusion scene |
CN111667539B (en) * | 2020-06-08 | 2023-08-29 | 武汉唯理科技有限公司 | Camera calibration and plane measurement method |
CN111739071B (en) * | 2020-06-15 | 2023-09-05 | 武汉尺子科技有限公司 | Initial value-based rapid iterative registration method, medium, terminal and device |
CN111862200B (en) * | 2020-06-30 | 2023-04-28 | 同济大学 | Unmanned aerial vehicle positioning method in coal shed |
CN111735446B (en) * | 2020-07-09 | 2020-11-13 | 上海思岚科技有限公司 | Laser and visual positioning fusion method and device |
CN111623773B (en) * | 2020-07-17 | 2022-03-04 | 国汽(北京)智能网联汽车研究院有限公司 | Target positioning method and device based on fisheye vision and inertial measurement |
CN112101145B (en) * | 2020-08-28 | 2022-05-17 | 西北工业大学 | SVM classifier based pose estimation method for mobile robot |
CN112734843B (en) * | 2021-01-08 | 2023-03-21 | 河北工业大学 | Monocular 6D pose estimation method based on regular dodecahedron |
CN112734844B (en) * | 2021-01-08 | 2022-11-08 | 河北工业大学 | Monocular 6D pose estimation method based on octahedron |
CN112857215B (en) * | 2021-01-08 | 2022-02-08 | 河北工业大学 | Monocular 6D pose estimation method based on regular icosahedron |
CN113239072B (en) * | 2021-04-27 | 2024-09-06 | 华为技术有限公司 | Terminal equipment positioning method and related equipment thereof |
CN113298879B (en) * | 2021-05-26 | 2024-04-16 | 北京京东乾石科技有限公司 | Visual positioning method and device, storage medium and electronic equipment |
CN113361400B (en) * | 2021-06-04 | 2024-09-17 | 清远华奥光电仪器有限公司 | Head posture estimation method, device and storage medium |
CN113506369B (en) * | 2021-07-13 | 2024-09-06 | 南京瓦尔基里网络科技有限公司 | Method, device, electronic equipment and medium for generating map |
CN115661247B (en) * | 2022-10-28 | 2024-07-02 | 南方电网电力科技股份有限公司 | Real-time 6DoF algorithm precision measurement method and device |
CN116051630B (en) * | 2023-04-03 | 2023-06-16 | 慧医谷中医药科技(天津)股份有限公司 | High-frequency 6DoF attitude estimation method and system |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9390344B2 (en) * | 2014-01-09 | 2016-07-12 | Qualcomm Incorporated | Sensor-based camera motion detection for unconstrained slam |
CN106803261A (en) * | 2015-11-20 | 2017-06-06 | 沈阳新松机器人自动化股份有限公司 | robot relative pose estimation method |
CN105928505B (en) * | 2016-04-19 | 2019-01-29 | 深圳市神州云海智能科技有限公司 | The pose of mobile robot determines method and apparatus |
CN107564012B (en) * | 2017-08-01 | 2020-02-28 | 中国科学院自动化研究所 | Augmented reality method and device for unknown environment |
CN107328420B (en) * | 2017-08-18 | 2021-03-02 | 上海智蕙林医疗科技有限公司 | Positioning method and device |
CN107862720B (en) * | 2017-11-24 | 2020-05-22 | 北京华捷艾米科技有限公司 | Pose optimization method and pose optimization system based on multi-map fusion |
- 2018-09-20 CN CN201811101406.3A patent/CN109345588B/en active Active
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||