Disclosure of Invention
Embodiments of the invention provide a method for constructing a robot map, and a method and an apparatus for positioning a robot, which are used to construct a map with high precision and global consistency.
In a first aspect, an embodiment of the present invention provides a method for constructing a robot map, including:
determining key frame data according to a preset key frame data determination rule, based on image data and inertial measurement unit (IMU) data synchronously acquired by the robot in real time; wherein the key frame data comprises: pose information of the key frame and feature point data in the image corresponding to the key frame, wherein a feature point is a corner point in the image that matches a corner point in an adjacent frame image;
determining a closed-loop key frame and closed-loop feature point pairs corresponding to the key frame according to the key frame data; wherein a closed-loop feature point pair comprises: a first feature point located in the image corresponding to the key frame and a second feature point located in the image corresponding to the closed-loop key frame, wherein the image information described by the first feature point matches that described by the second feature point;
when the key frames and the corresponding closed-loop key frames meet a preset closed-loop condition, performing first bundle adjustment (BA) optimization processing on the feature point data, the closed-loop feature point pairs, and the IMU data corresponding to all the key frames to obtain a map for representing the environment where the robot is located;
wherein, after the matching condition of corner points in the left-eye image and the right-eye image included in each acquired frame image and the state increment information between two consecutively acquired frame images are determined, and after sliding window optimization processing is performed on each frame image, the key frame is determined according to the number of sliding window optimization processing passes and the reprojection errors of the feature points, and the pose information of the key frame and the feature point data in the image corresponding to the key frame are determined; the state increment information comprises: position increment information, angle increment information, and speed increment information.
Optionally, in an embodiment of the present invention, the closed loop condition includes:
when a forward PnP relative pose and a backward PnP relative pose between the key frame and the corresponding closed-loop key frame are determined according to the closed-loop feature point pairs and a preset PnP algorithm, the difference between the forward PnP relative pose and the backward PnP relative pose is smaller than a preset first threshold;
wherein the forward PnP relative pose comprises:
a relative pose between the three-dimensional position information of the second feature points in the image corresponding to the closed-loop key frame and the two-dimensional position information of the corresponding first feature points in the image corresponding to the key frame;
and the backward PnP relative pose comprises:
a relative pose between the three-dimensional position information of the first feature points in the image corresponding to any key frame and the two-dimensional position information of the corresponding second feature points in the image corresponding to the closed-loop key frame.
Optionally, in an embodiment of the present invention, when determining that the key frame and the corresponding closed-loop key frame do not meet the closed-loop condition, the method further includes:
performing the first BA optimization processing on the feature point data and the IMU data corresponding to the key frames to obtain the map.
Optionally, in the embodiment of the present invention, performing the first BA optimization processing on the feature point data and the IMU data corresponding to the key frames specifically includes:
performing the first BA optimization processing on the feature point data and the IMU data corresponding to the key frames according to the following formula:
$$\min_{\chi,\,P_l}\left\{\sum_{k\in B}\left\|r_B\left(\hat{z}^{b_k}_{b_{k+1}},\chi\right)\right\|^{2}_{P^{b_k}_{b_{k+1}}}+\sum_{(l,j)\in C}\left\|r_L\left(\hat{z}^{c_jL}_{l},\chi,P_l\right)\right\|^{2}_{\Sigma_L}+\sum_{(l,j)\in C}\left\|r_R\left(\hat{z}^{c_jR}_{l},\chi,P_l\right)\right\|^{2}_{\Sigma_R}\right\}$$
where k represents the k-th key frame; B represents the list of key frames for which pre-integration was performed when determining the state increment information; C represents the list of feature point observations in the key frame images; $\hat{z}^{c_jL}_{l}$ represents the observed value of the feature point on the acquired left-eye image and $\hat{z}^{c_jR}_{l}$ represents the observed value of the feature point on the acquired right-eye image; j represents the j-th key frame; $P_l$ represents the coordinates of feature point l in the world coordinate system; $\hat{z}^{b_k}_{b_{k+1}}$ represents the pre-integration result of the IMU data obtained when determining the state increment information; $\chi$ represents the state of the sliding window to be optimized; $J^{b_k}_{b_{k+1}}$ represents the Jacobian matrix corresponding to the state increment information, used together with the variance matrix $P^{b_k}_{b_{k+1}}$ to evaluate and weight the IMU residual; $\Sigma_L$ represents the variance matrix of the left-eye image and $\Sigma_R$ represents the variance matrix of the right-eye image; $r_B$ represents the residual of the IMU data; $r_L$ represents the residual of the left-eye image in the key frame corresponding image; and $r_R$ represents the residual of the right-eye image in the key frame corresponding image.
Optionally, in the embodiment of the present invention, performing the first bundle adjustment (BA) optimization processing on the feature point data, the closed-loop feature point pairs, and the IMU data corresponding to all the key frames specifically includes:
performing the first bundle adjustment (BA) optimization processing on the feature point data, the closed-loop feature point pairs, and the IMU data corresponding to all the key frames according to the following formula:
$$\min_{\chi,\,P_l}\Big\{\sum_{k\in B}\left\|r_B\left(\hat{z}^{b_k}_{b_{k+1}},\chi\right)\right\|^{2}_{P^{b_k}_{b_{k+1}}}+\sum_{(l,j)\in C}\left\|r_L\left(\hat{z}^{c_jL}_{l},\chi,P_l\right)\right\|^{2}_{\Sigma_L}+\sum_{(l,j)\in C}\left\|r_R\left(\hat{z}^{c_jR}_{l},\chi,P_l\right)\right\|^{2}_{\Sigma_R}$$
$$\qquad+\sum_{(l,j)\in C_{loopL}}\left\|r_{loopL}\left(\hat{z}^{c_jL}_{l},\chi,P_l\right)\right\|^{2}_{\Sigma_L}+\sum_{(l,j)\in C_{loopR}}\left\|r_{loopR}\left(\hat{z}^{c_jR}_{l},\chi,P_l\right)\right\|^{2}_{\Sigma_R}\Big\}$$
where k represents the k-th key frame; B represents the list of key frames for which pre-integration was performed when determining the state increment information; C represents the list of feature point observations in the key frame images; $\hat{z}^{c_jL}_{l}$ represents the observed value of the feature point on the acquired left-eye image and $\hat{z}^{c_jR}_{l}$ represents the observed value of the feature point on the acquired right-eye image; j represents the j-th key frame; $P_l$ represents the coordinates of feature point l in the world coordinate system; $\hat{z}^{b_k}_{b_{k+1}}$ represents the pre-integration result of the IMU data obtained when determining the state increment information; $\chi$ represents the state of the sliding window to be optimized; $J^{b_k}_{b_{k+1}}$ represents the Jacobian matrix corresponding to the state increment information; $\Sigma_L$ represents the variance matrix of the left-eye image and $\Sigma_R$ represents the variance matrix of the right-eye image; $C_{loopL}$ represents the list of left-eye closed-loop frame images that accept the closed loop and $C_{loopR}$ represents the list of right-eye closed-loop frame images that accept the closed loop; $r_B$ represents the residual of the IMU data; $r_L$ represents the residual of the left-eye image in the key frame corresponding image; $r_R$ represents the residual of the right-eye image in the key frame corresponding image; $r_{loopL}$ represents the residual of the left-eye image in the closed-loop frame image corresponding to the key frame; and $r_{loopR}$ represents the residual of the right-eye image in the closed-loop frame image corresponding to the key frame.
Optionally, in an embodiment of the present invention, determining, according to the key frame data, the closed-loop key frame and the closed-loop feature point pairs corresponding to any one of the key frames includes:
extracting feature point data and descriptor data from the key frames, and constructing a bag-of-words data set;
calculating, according to the bag-of-words data set, the similarity between a first key frame and each second key frame, and determining a second key frame whose similarity is greater than a preset second threshold as the closed-loop key frame corresponding to the first key frame; wherein the first key frame is any one of the key frames, and the second key frames are the key frames other than the first key frame among all the key frames;
and determining, according to the bag-of-words data set, matched feature point pairs between the feature points of the image corresponding to the first key frame and the feature points of the image corresponding to the closed-loop key frame, calculating the Hamming distance between the matched feature point pairs, and determining the matched feature point pairs whose Hamming distance is smaller than a preset third threshold as the closed-loop feature point pairs.
Optionally, in an embodiment of the present invention, the process of sliding window optimization includes:
performing second BA optimization processing on the (N+1)-th frame image and the IMU data corresponding to the (N+1)-th frame image according to the determined matching condition of the corner points in each frame image and the state increment information between two consecutive frame images, and determining the pose information of the (N+1)-th frame image and the coordinate information of the corner points in the (N+1)-th frame image; wherein N is an integer greater than 0;
when it is determined, according to the pose information of the (N+1)-th frame image and the coordinate information of the corner points in the (N+1)-th frame image, that the (N+1)-th frame image needs to be input into a preset sliding window, determining the i-th frame image with the earliest acquisition time in the current sliding window as a frame to be determined, and removing the i-th frame image from the sliding window; wherein the sliding window comprises N frame images, and i is an integer greater than 0;
wherein determining the key frame according to the number of sliding window optimization processing passes and the reprojection errors of the feature points specifically includes:
determining, as the key frame, a frame to be determined whose number of sliding window optimization processing passes is not smaller than a first preset value and whose feature point reprojection errors are not larger than a second preset value.
Optionally, in the embodiment of the present invention, performing the second BA optimization processing on the (N+1)-th frame image and the IMU data corresponding to the (N+1)-th frame image according to the determined matching condition of the corner points in each frame image and the state increment information between two consecutive frame images specifically includes:
performing the second BA optimization processing on the (N+1)-th frame image and the IMU data corresponding to the (N+1)-th frame image according to the following formula, based on the determined matching condition of the corner points in each frame image and the state increment information between two consecutive frame images:
$$\min_{\chi}\left\{\left\|r_p-H_p\chi\right\|^{2}+\sum_{k\in B}\left\|r_B\left(\hat{z}^{b_k}_{b_{k+1}},\chi\right)\right\|^{2}_{P^{b_k}_{b_{k+1}}}+\sum_{(l,j)\in C}\left\|r_C\left(\hat{z}^{c_j}_{l},\chi\right)\right\|^{2}_{P^{c_j}_{l}}\right\}$$
where $r_B$ represents the residual of the IMU data; $r_C$ represents the residual of the image data; $\hat{z}^{b_k}_{b_{k+1}}$ represents the pre-integration result of the IMU data obtained when determining the state increment information; $\hat{z}^{c_j}_{l}$ represents the image data (the observation of feature point l in the j-th frame); $J^{b_k}_{b_{k+1}}$ represents the Jacobian matrix corresponding to the state increment information; $P^{b_k}_{b_{k+1}}$ represents the variance matrix corresponding to the state increment information; $\chi$ represents the state of the sliding window to be optimized; $r_p-H_p\chi$ represents the prior residual of the marginalized frame to be determined; k represents the k-th frame; B represents the list of images for which pre-integration was performed when determining the state increment information; l represents feature point l; j represents the j-th frame; and C represents the list of acquired images.
In a second aspect, an embodiment of the present invention provides a positioning method for a robot, including:
Acquiring a pre-constructed map, wherein the map is constructed by adopting the construction method provided by the embodiment of the invention;
determining key frame data according to a preset key frame data determination rule, based on image data and inertial measurement unit (IMU) data synchronously acquired by the robot in real time; wherein the key frame data comprises: pose information of a key frame and feature point data in the image corresponding to the key frame, wherein a feature point is a corner point in the image that matches a corner point in an adjacent frame image;
Determining a closed-loop key frame corresponding to any key frame according to the map and the key frame data;
when the key frame and the corresponding closed-loop key frame meet a preset closed-loop condition, performing pose optimization processing on the closed-loop key frames and then positioning the robot;
wherein, after the matching condition of corner points in the left-eye image and the right-eye image included in each acquired frame image and the state increment information between two consecutively acquired frame images are determined, and after sliding window optimization processing is performed on each frame image, the pose information of the key frame and the feature point data in the image corresponding to the key frame are determined; the state increment information comprises: position increment information, angle increment information, and speed increment information.
Optionally, in an embodiment of the present invention, the closed loop condition includes:
if the image corresponding to the closed-loop key frame is one of the images used to construct the map, a first PnP relative pose between the key frame and the corresponding closed-loop key frame, determined according to the closed-loop feature point pairs and a preset PnP algorithm, is smaller than a preset fourth threshold;
or, if the image corresponding to the closed-loop key frame is an image acquired by the robot after obtaining the map, a second PnP relative pose between the key frame and the corresponding closed-loop key frame, determined according to the closed-loop feature point pairs and the PnP algorithm, is smaller than a preset fifth threshold.
Optionally, in the embodiment of the present invention, performing pose optimization processing on the closed-loop key frames specifically includes:
performing pose graph optimization processing on all the determined closed-loop key frames according to the following formula:
$$\min_{P,\,\psi}\left\{\sum_{(i,j)\in S}\left\|r_{i,j}\right\|^{2}+\sum_{(i,j)\in L}\rho\left(\left\|r_{i,j}\right\|^{2}\right)\right\}$$
where S represents the key frame list with sliding window constraints; L represents the key frame list with closed-loop constraints; i represents the i-th key frame and j represents the j-th key frame; $r_{i,j}$ represents the residual between the i-th key frame and the j-th key frame; $\rho(\cdot)$ represents a robust kernel function; P represents the poses of all the optimized key frames; and $\psi$ represents the angle information of all the optimized key frames.
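For illustration only, the following is a minimal sketch of how such a pose graph cost could be assembled. The 4-DoF parameterization (position plus yaw), the Huber kernel used for ρ(·), and every function and variable name are assumptions of this sketch rather than details specified by the embodiment:

```python
import numpy as np

def yaw_rot(yaw: float) -> np.ndarray:
    """Rotation about the z axis by yaw (an assumed 4-DoF convention)."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def edge_residual(p_i, yaw_i, p_j, yaw_j, rel_p, rel_yaw):
    """Residual r_ij for one constraint edge between key frames i and j.

    rel_p / rel_yaw are the measured relative translation and yaw, taken
    from sliding-window odometry (set S) or from a closed loop (set L).
    """
    pred_p = yaw_rot(yaw_i).T @ (p_j - p_i)   # predicted relative translation
    r_p = pred_p - rel_p
    r_yaw = (yaw_j - yaw_i) - rel_yaw
    return np.hstack([r_p, r_yaw])

def huber(sq_norm: float, delta: float = 1.0) -> float:
    """Robust kernel rho(.), applied to closed-loop edges only."""
    return sq_norm if sq_norm <= delta**2 else 2 * delta * np.sqrt(sq_norm) - delta**2

def pose_graph_cost(poses, yaws, odo_edges, loop_edges):
    """Total cost: sum over S of ||r_ij||^2 plus sum over L of rho(||r_ij||^2)."""
    cost = 0.0
    for i, j, rel_p, rel_yaw in odo_edges:    # set S: sliding window constraints
        r = edge_residual(poses[i], yaws[i], poses[j], yaws[j], rel_p, rel_yaw)
        cost += float(r @ r)
    for i, j, rel_p, rel_yaw in loop_edges:   # set L: closed-loop constraints
        r = edge_residual(poses[i], yaws[i], poses[j], yaws[j], rel_p, rel_yaw)
        cost += huber(float(r @ r))
    return cost
```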
In a third aspect, an embodiment of the present invention provides a robot map building apparatus, including:
a first unit, configured to determine key frame data according to a preset key frame data determination rule, based on image data and inertial measurement unit (IMU) data synchronously acquired by the robot in real time; wherein the key frame data comprises: pose information of the key frame and feature point data in the image corresponding to the key frame, wherein a feature point is a corner point in the image that matches a corner point in an adjacent frame image;
a second unit, configured to determine a closed-loop key frame and closed-loop feature point pairs corresponding to the key frame according to the key frame data; wherein a closed-loop feature point pair comprises: a first feature point located in the image corresponding to the key frame and a second feature point located in the image corresponding to the closed-loop key frame, wherein the image information described by the first feature point matches that described by the second feature point;
a third unit, configured to, when it is determined that the key frames and the corresponding closed-loop key frames meet a preset closed-loop condition, obtain a map for representing the environment where the robot is located after performing first bundle adjustment (BA) optimization processing on the feature point data, the closed-loop feature point pairs, and the IMU data corresponding to all the key frames;
wherein, after the matching condition of corner points in the left-eye image and the right-eye image included in each acquired frame image and the state increment information between two consecutively acquired frame images are determined, and after sliding window optimization processing is performed on each frame image, the key frame is determined according to the number of sliding window optimization processing passes and the reprojection errors of the feature points, and the pose information of the key frame and the feature point data in the image corresponding to the key frame are determined; the state increment information comprises: position increment information, angle increment information, and speed increment information.
In a fourth aspect, an embodiment of the present invention provides a positioning device for a robot, including:
a first unit, configured to obtain a pre-constructed map, where the map is constructed by using the construction method provided by the embodiment of the present invention;
a second unit, configured to determine key frame data according to a preset key frame data determination rule, based on image data and inertial measurement unit (IMU) data synchronously acquired by the robot in real time; wherein the key frame data comprises: pose information of a key frame and feature point data in the image corresponding to the key frame, wherein a feature point is a corner point in the image that matches a corner point in an adjacent frame image;
a third unit, configured to determine a closed-loop key frame corresponding to any key frame according to the map and the key frame data;
a fourth unit, configured to, when it is determined that the key frame and the corresponding closed-loop key frame meet a preset closed-loop condition, perform pose optimization processing on the closed-loop key frames and then position the robot;
wherein, after the matching condition of corner points in the left-eye image and the right-eye image included in each acquired frame image and the state increment information between two consecutively acquired frame images are determined, and after sliding window optimization processing is performed on each frame image, the pose information of the key frame and the feature point data in the image corresponding to the key frame are determined; the state increment information comprises: position increment information, angle increment information, and speed increment information.
In a fifth aspect, an embodiment of the present invention provides a robot, including: the construction device, the positioning device and the image acquisition device provided by the embodiment of the invention;
wherein the image acquisition device is configured to:
acquire, in real time, images for representing the environment where the robot is located, and send the acquired images to the construction device and the positioning device, so that the construction device constructs a map according to the acquired images and the positioning device performs positioning according to the acquired images and the constructed map.
The invention has the following beneficial effects:
According to the method and the apparatus for constructing a robot map provided by the embodiments of the invention, when key frame data are determined, the matching condition of corner points in the left-eye image and the right-eye image included in each acquired frame image and the state increment information between two consecutively acquired frame images are first determined, and sliding window optimization processing is performed on each frame image. The key frame is then determined according to the number of sliding window optimization processing passes and the reprojection errors of the feature points, and the pose information of the key frame and the feature point data in the image corresponding to the key frame are determined. In this way, the closed-loop key frame and the closed-loop feature point pairs corresponding to any key frame can be determined according to the key frame data, and a map for representing the environment where the robot is located can be obtained after the first BA optimization processing is performed on the feature point data corresponding to the key frames, the closed-loop feature point pairs, and the IMU data. The obtained map therefore better matches the actual situation, so that a map with high precision and global consistency can be generated, providing an effective global map for later positioning of the robot. Moreover, by determining the closed-loop key frames and the closed-loop feature point pairs, that is, by executing a closed-loop detection process, feature points can be associated through closed-loop detection and the reprojection errors of the closed-loop points can be constructed, so that accumulated errors can be effectively eliminated and the constructed map is more accurate.
Detailed Description
Specific embodiments of a method for constructing a map of a robot, a method for positioning a robot, and a device according to embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the described embodiments are only some embodiments of the present invention, and not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The inventors have found in research that the main SLAM techniques at present include laser SLAM and visual SLAM. Laser SLAM is mature and widely used, while the advantages of visual SLAM over laser SLAM are mainly manifested in two aspects: on the one hand, the cost of a camera is far lower than that of a laser sensor; on the other hand, compared with 2D laser SLAM, visual SLAM can obtain a 6-DOF pose. Therefore, the relatively low-cost visual SLAM technology has developed rapidly in autonomous mobile robots (AMR).
In practical application, AMR operation mainly includes a mapping stage and a positioning stage (which may also be referred to as a map multiplexing stage). The targets of the two stages are different, so their requirements for the visual SLAM technology are different. In the mapping stage, the main goal is to construct a map with high precision and global consistency by considering as much information as possible, while real-time performance, real-time positioning precision, and positioning frequency can be ignored. In the positioning stage, the AMR mainly aims to obtain real-time, high-precision, and high-frequency global positioning after loading the offline global map; generally, a high-precision map does not need to be reconstructed, and the poses of the current trajectory at past moments can be ignored.
However, when the visual SLAM technology is currently applied to AMR, the requirements of the mapping stage and the positioning stage cannot both be met, so the AMR cannot be effectively controlled.
On this basis, the embodiments of the invention provide methods for constructing a robot map and for positioning a robot, which are used to construct a map with high precision and global consistency and, in the subsequent positioning process of the robot, can meet the requirements of both the mapping stage and the positioning stage when the visual SLAM technology is adopted.
Specifically, an embodiment of the present invention provides a method for constructing a robot map, as shown in fig. 1, may include:
S101, determining key frame data according to a preset key frame data determination rule, based on image data and inertial measurement unit (IMU) data synchronously acquired by the robot in real time; wherein the key frame data includes: pose information of the key frame and feature point data in the image corresponding to the key frame, wherein a feature point is a corner point in the image that matches a corner point in an adjacent frame image.
The key frame data determination rule includes: after the matching condition of corner points in the left-eye image and the right-eye image included in each acquired frame image and the state increment information between two consecutively acquired frame images are determined, and after sliding window optimization processing is performed on each frame image, determining a key frame according to the number of sliding window optimization processing passes and the reprojection errors of the feature points, and determining the pose information of the key frame and the feature point data in the image corresponding to the key frame; the state increment information includes: position increment information, angle increment information, and speed increment information.
Specifically, in determining key frame data, the following procedure may be adopted:
process 1: acquiring image data and IMU data synchronously acquired by a robot in real time;
The image data may be understood as multiple frame images; the IMU data are likewise organized frame by frame, and one frame of IMU data may be understood as the IMU data corresponding to one frame image.
Also, as shown in fig. 2, each acquired frame image may include a left-eye image (e.g., the "left-eye image" in fig. 2) and a right-eye image (e.g., the "right-eye image" in fig. 2), and the left-eye image and the right-eye image in one frame image have the same timestamp information.
Process 2: determining feature point matching information in each frame image according to the matching condition of corner points in the left-eye image and the right-eye image included in each acquired frame image;
The feature point matching information may include: the number of the feature points and the position information of the feature points.
In determining the feature point matching information in each frame image, the following steps may be adopted:
Step 2.1, as shown in fig. 2, performing feature point matching processing (e.g., the "feature matching" in fig. 2) between the left-eye image in each frame image and the left-eye image in the adjacent previous frame image, and performing feature point matching processing between the right-eye image in each frame image and the right-eye image in the adjacent previous frame image, to obtain the feature points in the left-eye image and the feature points in the right-eye image of each frame image;
Step 2.2, performing feature point replenishment processing on the left-eye image or the right-eye image;
Illustratively, the purpose of step 2.2 is as follows:
Taking the left-eye image as an example, for the i-th frame image, assume that the number of corner points in the left-eye image is 150 before step 2.1 is executed, and that the number of matched feature points determined in the left-eye image is 100 after step 2.1 is executed.
Then, for the (i+1)-th frame image, when feature point matching is performed between the left-eye images of the i-th and (i+1)-th frame images, the number of feature points determined in the left-eye image of the (i+1)-th frame image is likely to be less than 100, say 50; likewise, for the (i+2)-th frame image, the number of feature points determined after feature point matching will be less than 50. The number of feature points determined in subsequently acquired images thus keeps shrinking, and as more images are acquired the number of determined feature points may drop to zero, which is obviously inconsistent with reality.
Therefore, in order to avoid the above situation, step 2.2 needs to be performed; that is, on the basis of the 100 feature points determined in the left-eye image of the i-th frame image after step 2.1 is executed, the feature points may be replenished, for example by adding 50 feature points to restore the total to 150, so as to ensure that a sufficient number of feature points can be determined in subsequently acquired images.
The replenished feature points may be re-extracted from the left-eye image of the i-th frame image after step 2.1 is executed; the extraction process may follow the prior art, as long as feature points can be extracted, and is not limited herein.
Step 2.3, as shown in fig. 2, performing feature point matching processing (e.g., the "feature matching" in fig. 2) between the left-eye image and the right-eye image of each frame image after steps 2.1 and 2.2 are executed, to obtain the matching relationship of the feature points in the left-eye image and the right-eye image of each frame image;
Step 2.4, determining the feature point matching information in each frame image according to the obtained matching relationship, and outputting the matching information to the sliding window; a sketch of steps 2.1 to 2.3 is given below.
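For illustration only, the following is a minimal Python sketch of steps 2.1 to 2.3 using OpenCV's KLT optical flow. The tracker choice, the 150-feature target, and all function names are assumptions of this sketch, not details specified by the embodiment:

```python
import cv2
import numpy as np

MAX_FEATURES = 150   # target corner count per image; an assumed value

def track_and_replenish(prev_img, cur_img, prev_pts):
    """Steps 2.1 and 2.2: track corners from the previous frame with KLT
    optical flow, then top the set back up to MAX_FEATURES with freshly
    extracted corners kept away from the tracked ones."""
    tracked = np.empty((0, 1, 2), np.float32)
    if prev_pts is not None and len(prev_pts) > 0:
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_img, cur_img, prev_pts, None)
        tracked = nxt[status.ravel() == 1]
    need = MAX_FEATURES - len(tracked)
    if need > 0:
        mask = np.full(cur_img.shape[:2], 255, np.uint8)
        for x, y in tracked.reshape(-1, 2):
            cv2.circle(mask, (int(x), int(y)), 20, 0, -1)   # avoid duplicates
        fresh = cv2.goodFeaturesToTrack(cur_img, need, 0.01, 20, mask=mask)
        if fresh is not None:
            tracked = np.vstack([tracked, fresh.astype(np.float32)])
    return tracked

def match_left_right(left_img, right_img, left_pts):
    """Step 2.3: match the replenished left-eye corners into the right-eye
    image of the same frame to obtain the left/right correspondence."""
    right_pts, status, _ = cv2.calcOpticalFlowPyrLK(left_img, right_img,
                                                    left_pts, None)
    ok = status.ravel() == 1
    return left_pts[ok], right_pts[ok]
```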
Process 3: determining the state increment information between two consecutively acquired frame images;
Wherein the state increment information includes: position increment information, angle increment information, and speed increment information.
Illustratively, the state increment information represents a relationship between two frames of images, not between two frames of IMU data.
Optionally, when specifically performing process 3, the following steps may be adopted:
Step 3.1, determining the inertial measurement unit (Inertial measurement unit, IMU) data (e.g., the "IMU" in fig. 2) between two consecutively acquired frame images, and storing the data in a preset cache list;
Step 3.2, aligning the integration start time and the integration end time of the IMU data with the time information of the corresponding frame images;
Step 3.3, performing iterative integration processing (e.g., the "IMU pre-integration" in fig. 2) on the aligned IMU data according to the median method, to obtain the state increment information between two consecutive frame images;
Step 3.4, determining the corresponding Jacobian matrix and variance matrix according to the determined state increment information, and outputting them to the sliding window.
Specifically, for the implementation of steps 3.1 to 3.4, reference may be made to the prior art, which is not specifically limited herein; a simplified sketch of step 3.3 is given below.
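For illustration only, the following is a simplified Python sketch of the median-method pre-integration of step 3.3. Bias correction, noise terms, and the Jacobian and variance propagation of step 3.4 are omitted, and all names are illustrative assumptions:

```python
import numpy as np

def preintegrate_median(imu_samples):
    """Median-method pre-integration of the IMU samples buffered between two
    consecutive image frames. Bias and gravity handling are omitted for
    brevity; real systems fold them into the residual.

    imu_samples: list of (dt, accel, gyro) tuples already aligned to the
    image timestamps. Returns (position, velocity, angle) increments.
    """
    dp, dv = np.zeros(3), np.zeros(3)
    q = np.array([1.0, 0.0, 0.0, 0.0])   # angle increment, quaternion (w, x, y, z)
    prev = None
    for dt, acc, gyr in imu_samples:
        if prev is not None:
            acc_prev, gyr_prev = prev
            gyr_mid = 0.5 * (gyr_prev + gyr)            # midpoint (median) gyro
            q_new = quat_mul(q, small_angle_quat(gyr_mid * dt))
            acc_mid = 0.5 * (rotate(q, acc_prev) + rotate(q_new, acc))
            dp += dv * dt + 0.5 * acc_mid * dt * dt
            dv += acc_mid * dt
            q = q_new / np.linalg.norm(q_new)           # keep unit length
        prev = (acc, gyr)
    return dp, dv, q

def quat_mul(a, b):
    """Hamilton product of two (w, x, y, z) quaternions."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def small_angle_quat(theta):
    """First-order quaternion for a small rotation vector theta."""
    return np.array([1.0, 0.5 * theta[0], 0.5 * theta[1], 0.5 * theta[2]])

def rotate(q, v):
    """Rotate vector v by unit quaternion q."""
    qv = np.array([0.0, v[0], v[1], v[2]])
    q_conj = np.array([q[0], -q[1], -q[2], -q[3]])
    return quat_mul(quat_mul(q, qv), q_conj)[1:]
```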
Process 4: performing sliding window optimization processing on each frame image, and screening frames to be determined out of all the acquired frame images;
wherein, when screening the frames to be determined, the following factors may specifically be considered:
the number of feature point matches from frame to frame;
average disparity from frame to frame;
acquisition time interval from frame to frame;
The number of newly added feature points from frame to frame.
In process 4, when the frames to be determined are screened, speed and robustness in the subsequent sliding window processing must be ensured, so as to improve the effectiveness of the sliding window processing results.
Process 5: determining key frames from the frames to be determined according to the number of sliding window optimization processing passes and the reprojection errors of the feature points, and determining the pose information of the key frames and the feature point data in the key frames, e.g., the "sliding window optimization processing" and the judgment of whether a frame is a key frame in fig. 2.
Illustratively, the key frame data referred to in process 5 can be understood as: the data corresponding to the important images found among the acquired multi-frame images.
For the sliding window, a strategy of marginalizing old frames may be adopted.
For example, assume that the sliding window contains K1 frame images to be determined, and that the frame with the earliest acquisition time, according to the acquisition order of the frames, is the i-th frame image; if the robot acquires a new frame image (say, the (i+a)-th frame image) and, after processes 1 to 4, it is determined that the (i+a)-th frame image needs to be input into the sliding window, the i-th frame image can be determined as a frame to be determined.
In this way, the sliding window always contains K1 frame images to be determined, and the images in the sliding window are the latest ones, so that the subsequently constructed map is more accurate; in addition, the amount of calculation and the calculation cost can be reduced.
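For illustration only, a minimal sketch of this fixed-size window maintenance, assuming a simple frame container (the marginalization of the removed frame's states into a prior is omitted):

```python
from collections import deque

class SlidingWindow:
    """Fixed-size window holding the K1 most recent frames.

    When a new frame must enter a full window, the frame with the earliest
    acquisition time is removed (marginalized) and handed to key frame
    selection as a frame to be determined (process 5).
    """

    def __init__(self, size: int):
        self.size = size          # K1, the fixed number of frames kept
        self.frames = deque()

    def push(self, frame):
        """Insert a frame; return the marginalized oldest frame, or None."""
        removed = None
        if len(self.frames) == self.size:
            removed = self.frames.popleft()   # the i-th frame: earliest acquired
        self.frames.append(frame)
        return removed
```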
Optionally, in the sliding window, the states of the K1 frame images to be determined and of the K2 feature points included in them, that is, the state that the sliding window needs to optimize, may be expressed as:
$$\chi=\left[x_0,\ x_1,\ \cdots,\ x_{K1},\ x^b_c,\ \lambda_0,\ \lambda_1,\ \cdots,\ \lambda_{K2}\right]$$
$$x_i=\left[p^w_{b_i},\ v^w_{b_i},\ q^w_{b_i},\ b_a,\ b_g\right],\qquad x^b_c=\left[p^b_{cL},\ q^b_{cL},\ p^b_{cR},\ q^b_{cR}\right]$$
where $x_{K1}$ represents the pose information of the K1-th key frame; w represents the world coordinate system; $p^w_{b_i}$ represents the displacement of the i-th key frame in the world coordinate system; $v^w_{b_i}$ represents the speed information of the i-th key frame in the world coordinate system; $q^w_{b_i}$ represents the angle information (or rotation information) of the i-th key frame in the world coordinate system; $b_a$ represents the bias of the accelerometer and $b_g$ represents the bias of the gyroscope; $x^b_c$ represents the extrinsic vector, that is, the conversion between the coordinate system of the image acquisition device (such as a camera) and the coordinate system of the IMU; $\lambda_{K2}$ represents the K2-th feature point; $p^b_{cL}$ represents the position conversion relation of the left-eye camera coordinate system relative to the IMU coordinate system; $q^b_{cL}$ represents the relative rotation relation of the left-eye camera coordinate system relative to the IMU coordinate system; $p^b_{cR}$ represents the position conversion relation of the right-eye camera coordinate system relative to the IMU coordinate system; and $q^b_{cR}$ represents the relative rotation relation of the right-eye camera coordinate system relative to the IMU coordinate system.
Optionally, in the embodiment of the present invention, one pass of sliding window optimization processing specifically includes:
performing second BA optimization processing on the (N+1)-th frame image and the IMU data corresponding to the (N+1)-th frame image according to the determined matching condition of the corner points in each frame image and the state increment information between two consecutive frame images, and determining the pose information of the (N+1)-th frame image and the coordinate information of the corner points in the (N+1)-th frame image; wherein N is an integer greater than 0;
when it is determined, according to the pose information of the (N+1)-th frame image and the coordinate information of the corner points in the (N+1)-th frame image, that the (N+1)-th frame image needs to be input into a preset sliding window, determining the i-th frame image with the earliest acquisition time in the current sliding window as a frame to be determined, and removing it from the sliding window; wherein the sliding window comprises N frame images, and i is an integer greater than 0.
Accordingly, determining the key frame according to the number of sliding window optimization processing passes and the reprojection errors of the feature points specifically includes:
determining, as a key frame, a frame to be determined whose number of sliding window optimization processing passes is not smaller than a first preset value and whose feature point reprojection errors are not larger than a second preset value.
Determining, according to the pose information of the (N+1)-th frame image and the coordinate information of the corner points in the (N+1)-th frame image, that the (N+1)-th frame image needs to be input into a preset sliding window can be understood as follows:
if the pose information of the (N+1)-th frame image is relatively close to that of the N-th frame image, the pose has not changed greatly between the two, and the (N+1)-th frame image does not need to be input into the sliding window;
if the pose information of the (N+1)-th frame image differs greatly from that of the N-th frame image, the pose has changed greatly between the two, and the (N+1)-th frame image needs to be input into the sliding window.
Since the number of images included in the sliding window is fixed, if it is determined that the (N+1)-th frame image is to be input into the sliding window, the earliest frame image in the window (e.g., the i-th frame image) needs to be deleted from the sliding window, and at this time the i-th frame image may be determined as a frame to be determined.
Taking a sliding window containing 10 frame images as an example:
second BA optimization processing may be performed on the 1st to 11th frame images; if it is then determined that the 11th frame image does not need to be input into the sliding window and the 1st to 10th frame images are input into the sliding window, the first pass of sliding window optimization processing is completed for the 1st frame image in the sliding window;
second BA optimization processing is performed on the 1st to 10th frame images and the 12th frame image; if it is then determined that the 12th frame image does not need to be input into the sliding window, the images included in the sliding window are unchanged, and the second pass of sliding window optimization processing is completed for the 1st frame image in the sliding window;
second BA optimization processing is performed on the 1st to 10th frame images and the 13th frame image; if it is then determined that the 13th frame image needs to be input into the sliding window, the images included in the sliding window change to the 2nd to 10th frame images plus the 13th frame image, and the 1st frame image is removed from the sliding window; at this time, the third pass of sliding window optimization processing is completed for the 1st frame image.
Therefore, sliding window optimization processing has been performed three times on the 1st frame image, and the 1st frame image becomes a frame to be determined.
Assuming that the first preset value is 3, and that the reprojection errors of the feature points included in the 1st frame image are determined to be not greater than the second preset value, the 1st frame image may be determined as a key frame image.
Of course, the specific values of the first preset value and the second preset value may be set according to the actual situation, and are not limited herein.
In this way, more accurate key frames can be selected according to the number of sliding window optimization processing passes and the reprojection errors, so that a map with high precision and global consistency can be constructed in the subsequent back-end processing.
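For illustration only, a minimal sketch of this selection rule. The threshold values (3 passes, taken from the worked example above, and a purely illustrative reprojection error bound) and the frame attributes are assumptions of this sketch:

```python
OPT_COUNT_MIN = 3      # first preset value; 3 is taken from the worked example
REPROJ_ERR_MAX = 1.5   # second preset value, in pixels; purely illustrative

def is_key_frame(frame) -> bool:
    """A frame to be determined becomes a key frame when it has survived
    enough sliding window optimization passes and its feature points
    re-project accurately enough (process 5)."""
    return (frame.opt_count >= OPT_COUNT_MIN
            and frame.mean_reprojection_error <= REPROJ_ERR_MAX)
```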
Optionally, in the embodiment of the present invention, performing the second BA optimization processing on the (N+1)-th frame image and the IMU data corresponding to the (N+1)-th frame image according to the determined matching condition of the corner points in each frame image and the state increment information between two consecutive frame images includes:
performing the second BA optimization processing on the (N+1)-th frame image and the IMU data corresponding to the (N+1)-th frame image according to the following formula, based on the determined matching condition of the corner points in each frame image and the state increment information between two consecutive frame images:
$$\min_{\chi}\left\{\left\|r_p-H_p\chi\right\|^{2}+\sum_{k\in B}\left\|r_B\left(\hat{z}^{b_k}_{b_{k+1}},\chi\right)\right\|^{2}_{P^{b_k}_{b_{k+1}}}+\sum_{(l,j)\in C}\left\|r_C\left(\hat{z}^{c_j}_{l},\chi\right)\right\|^{2}_{P^{c_j}_{l}}\right\}$$
where $r_B$ represents the residual of the IMU data; $r_C$ represents the residual of the image data; $\hat{z}^{b_k}_{b_{k+1}}$ represents the pre-integration result of the IMU data obtained when determining the state increment information; $\hat{z}^{c_j}_{l}$ represents the image data (the observation of feature point l in the j-th frame); $J^{b_k}_{b_{k+1}}$ represents the Jacobian matrix corresponding to the state increment information; $P^{b_k}_{b_{k+1}}$ represents the variance matrix corresponding to the state increment information; $\chi$ represents the state of the sliding window to be optimized; $r_p-H_p\chi$ represents the prior residual of the marginalized frame to be determined; k represents the k-th frame; B represents the list of images for which pre-integration was performed when determining the state increment information; l represents feature point l; j represents the j-th frame; and C represents the list of acquired images.
Illustratively, the "residual error of IMU data" mentioned in the foregoing may be understood as: the IMU pre-integration of the observations results in error with the state to be optimized.
That is, in the second BA optimization process (which may also be referred to as a partial BA optimization process), optimization is performed in conjunction with a nonlinear least squares algorithm to determine pose information of the key frame and feature point data (e.g., without limitation, depth information of feature points) in the corresponding image of the key frame.
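For illustration only, the following sketch shows how the residual terms of the above cost could be stacked and handed to an off-the-shelf nonlinear least-squares solver; a production system would more likely use an incremental solver such as Ceres, and all interfaces here are assumptions of this sketch:

```python
import numpy as np
from scipy.optimize import least_squares

def stacked_residuals(chi, r_p, H_p, imu_terms, visual_terms):
    """Stack the prior, IMU, and re-projection residuals of the second BA
    cost so their squared sum equals the objective being minimized.

    Each term is (residual_fn, sqrt_info), where sqrt_info whitens the
    residual so that its plain squared norm equals the Mahalanobis norm
    in the formula above.
    """
    res = [r_p - H_p @ chi]                        # prior of the marginalized frame
    for residual_fn, sqrt_info in imu_terms:       # one term per k in B
        res.append(sqrt_info @ residual_fn(chi))
    for residual_fn, sqrt_info in visual_terms:    # one term per (l, j) in C
        res.append(sqrt_info @ residual_fn(chi))
    return np.concatenate(res)

def run_second_ba(chi0, r_p, H_p, imu_terms, visual_terms):
    """Minimize the stacked cost starting from the current window state."""
    sol = least_squares(stacked_residuals, chi0,
                        args=(r_p, H_p, imu_terms, visual_terms))
    return sol.x
```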
It should be noted that, when feature points are selected from the images corresponding to the key frames, after sliding window optimization processing has been performed on the frames to be determined, the feature points may be selected from the 3D corner points in the sliding window. The selection principles may be: the average reprojection error of each key frame image in the sliding window is small enough, the number of observations is large enough, and the depth estimation has sufficiently converged. Of course, the feature points may also be determined and selected according to other principles, as long as feature points can be selected that facilitate the subsequent map construction, which is not specifically limited herein.
S102, determining a closed-loop key frame and closed-loop feature point pairs corresponding to any key frame according to the key frame data; wherein a closed-loop feature point pair comprises: a first feature point located in the image corresponding to the key frame and a second feature point located in the image corresponding to the closed-loop key frame, wherein the image information described by the first feature point matches that described by the second feature point;
It should be noted that key frames and key frame images are in one-to-one correspondence, and the key frames and closed-loop key frames mentioned in S102 can be understood as key frame data and closed-loop key frame data.
The process of determining the closed-loop key frame and the closed-loop feature point pairs in S102 may be understood as closed-loop detection; the specific determination process is described below.
S103, when the key frames and the corresponding closed-loop key frames meet the preset closed-loop condition, performing first BA optimization processing on the feature point data, the closed-loop feature point pairs, and the IMU data corresponding to all the key frames, to obtain a map for representing the environment where the robot is located.
As shown in fig. 2, the "global BA optimization based on extrinsic estimation" is the first BA optimization processing; after the optimization processing, a map may be obtained (e.g., "get map" in fig. 2), and the obtained map may be, but is not limited to, a three-dimensional point cloud map.
In the embodiment of the invention, first, the matching condition of corner points in the left-eye image and the right-eye image included in each acquired frame image and the state increment information between two consecutively acquired frame images are determined, and sliding window optimization processing is performed on each frame image. A key frame is then determined from the frames to be determined according to the number of sliding window optimization processing passes and the reprojection errors of the feature points, and the pose information of the key frame and the feature point data in the image corresponding to the key frame are determined, so that the closed-loop key frame and the closed-loop feature point pairs corresponding to any key frame can be determined according to the key frame data. Then, after the first BA optimization processing is performed on the feature point data corresponding to the key frames, the closed-loop feature point pairs, and the IMU data, a map for representing the environment where the robot is located is obtained. The obtained map therefore better matches the actual situation, so that a map with high precision and global consistency can be generated, providing an effective global map for later positioning of the robot.
Secondly, by determining the closed-loop key frames and the closed-loop feature point pairs, that is, by executing a closed-loop detection process, feature points can be associated through closed-loop detection and the reprojection errors of the closed-loop points can be constructed, so that accumulated errors can be effectively eliminated and the constructed map is more accurate.
It should be noted that the processing of S101 may be understood as a front-end processing procedure (e.g., the procedure within dashed box 1 in fig. 2), and the processing of S102 and S103 may be understood as a back-end processing procedure (e.g., the procedure within dashed box 2 in fig. 2). That is, key frame data are determined through the front-end processing and then transmitted to the back-end, where the map is constructed by processing the key frame data. A powerful binocular visual odometry based on tightly coupled sliding window optimization is thus combined with the first BA optimization processing of the back-end, thereby constructing a map with high precision and global consistency.
Optionally, in the embodiment of the present invention, determining the closed-loop key frame and the closed-loop feature point pairs corresponding to the key frame according to the key frame data specifically includes the following process:
Step 1, extracting feature point data and descriptor data from the key frames, and constructing a bag-of-words data set;
The key frames are determined through S101, and the key frame data determined in S101 include not only the pose information of the key frames and the feature point data in the images corresponding to the key frames, but also the three-dimensional position information of the feature points and the matching information of the feature points in the sliding window, so that the closed-loop key frames and the closed-loop feature point pairs can be determined subsequently.
When extracting the feature point data and the descriptor data, the feature point data may be extracted based on the existing FAST algorithm, and the descriptor data may be extracted based on the BRIEF algorithm; however, the feature point data and the descriptor data may also be extracted based on other algorithms, which is not limited herein.
Step 2, calculating the similarity between a first key frame and each second key frame according to the bag-of-words data set, and determining a second key frame whose similarity is greater than a preset second threshold as the closed-loop key frame corresponding to the first key frame; wherein the first key frame is any key frame, and the second key frames are the key frames other than the first key frame among all the key frames;
The second threshold may be set according to actual needs, and is not limited herein.
When determining the closed-loop key frame, the determination may be performed based on the bag-of-words data set of a DBoW model; however, the invention is not limited thereto, and other models capable of determining the closed-loop key frame may be used.
Step 3, determining matched feature point pairs between the feature points of the image corresponding to the first key frame and the feature points of the image corresponding to the closed-loop key frame according to the bag-of-words data set, calculating the Hamming distance between the matched feature point pairs, and determining the matched feature point pairs whose Hamming distance is smaller than a preset third threshold as the closed-loop feature point pairs.
The third threshold may be set according to actual needs, and is not limited herein.
Moreover, the Hamming distance between the matched feature point pairs may be calculated based on the BRIEF descriptors; for the specific calculation process, reference may be made to the prior art, which is not described in detail herein.
In this way, the closed-loop key frame and the closed-loop feature point pairs corresponding to any key frame can be determined through the above process, which facilitates the subsequent processes and ensures the effective construction of the map.
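For illustration only, a minimal sketch of steps 2 and 3. The `db.query` similarity search stands in for a DBoW-style interface, and the threshold values and descriptor layout (256-bit BRIEF stored as uint8) are assumptions of this sketch:

```python
import numpy as np

SIM_MIN = 0.015     # second threshold; an assumed value
HAMMING_MAX = 50    # third threshold for 256-bit BRIEF; an assumed value

def hamming(d1: np.ndarray, d2: np.ndarray) -> int:
    """Hamming distance between two binary descriptors stored as uint8."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

def find_closed_loop_key_frames(db, first_kf):
    """Step 2: query the bag-of-words data set for second key frames whose
    similarity to the first key frame exceeds the second threshold."""
    return [kf for kf, score in db.query(first_kf.bow_vector)
            if score > SIM_MIN and kf.id != first_kf.id]

def closed_loop_pairs(first_kf, loop_kf, matches):
    """Step 3: keep only matched feature point pairs whose BRIEF descriptor
    Hamming distance is below the third threshold."""
    pairs = []
    for i, j in matches:   # indices into the two key frames' feature lists
        if hamming(first_kf.descriptors[i], loop_kf.descriptors[j]) < HAMMING_MAX:
            pairs.append((first_kf.features[i], loop_kf.features[j]))
    return pairs
```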
Optionally, in an embodiment of the present invention, when determining that the key frame and the corresponding closed-loop key frame do not meet the closed-loop condition, the method further includes:
performing the first BA optimization processing on the feature point data and the IMU data corresponding to the key frames to obtain the map.
In this way, when the closed-loop condition is not met, only the feature point data and the IMU data corresponding to the key frames undergo the first BA processing. Optimization is thus carried out according to the circumstances, which improves the accuracy of the constructed map while reducing the computational load of the device and improving the efficiency of map construction.
Specifically, in an embodiment of the present invention, the closed loop condition includes:
when a forward PnP relative pose and a backward PnP relative pose between the key frame and the corresponding closed-loop key frame are determined according to the closed-loop feature point pairs and a preset PnP algorithm, the difference between the forward PnP relative pose and the backward PnP relative pose is smaller than a preset first threshold;
wherein the forward PnP relative pose comprises:
a relative pose between the three-dimensional position information of the second feature points in the image corresponding to the closed-loop key frame and the two-dimensional position information of the corresponding first feature points in the image corresponding to the key frame;
and the backward PnP relative pose comprises:
a relative pose between the three-dimensional position information of the first feature points in the image corresponding to any key frame and the two-dimensional position information of the corresponding second feature points in the image corresponding to the closed-loop key frame.
For ease of distinguishing the feature points in the key frame corresponding images from those in the closed-loop key frame corresponding images, the feature points in the closed-loop key frame corresponding images are all referred to as second feature points, and the feature points in the key frame corresponding images are all referred to as first feature points.
Also, for the specific processes of calculating the forward PnP relative pose and the backward PnP relative pose, reference may be made to the prior art, which is not described in detail herein.
The first threshold may be set according to actual needs, and is not limited herein.
In this way, through the PnP RANSAC algorithm, it can be determined whether the key frame and the corresponding closed-loop key frame accept the closed loop (e.g., the "closed-loop detection" in fig. 2). If the closed loop is accepted, the closed-loop feature point pairs can be regarded as the same corner points, the matched feature points can be used in the subsequent first BA optimization processing, and the reprojection errors (e.g., the "reprojection error of the closed-loop points in the closed-loop key frames" in fig. 2) can be calculated, which facilitates the construction of the map. Mismatches in closed-loop detection can also be eliminated, thereby improving the accuracy of the constructed map.
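For illustration only, the following sketch shows one way the forward/backward PnP consistency check could be implemented with OpenCV. The pose-difference metric (a combined translation and rotation gap) and the threshold are assumptions of this sketch, not details specified by the embodiment:

```python
import cv2
import numpy as np

POSE_DIFF_MAX = 0.2   # first threshold; an assumed value

def pnp_pose(points_3d, points_2d, K):
    """Solve a PnP problem with RANSAC; returns (rvec, tvec) or None."""
    ok, rvec, tvec, _ = cv2.solvePnPRansac(
        np.asarray(points_3d, np.float32),
        np.asarray(points_2d, np.float32), K, None)
    return (rvec, tvec) if ok else None

def accept_closed_loop(kf_3d, kf_2d, loop_3d, loop_2d, K):
    """Accept the closed loop only if the forward PnP pose (loop-frame 3D
    points against key-frame 2D points) and the backward PnP pose
    (key-frame 3D points against loop-frame 2D points) are consistent."""
    fwd = pnp_pose(loop_3d, kf_2d, K)   # forward PnP relative pose
    bwd = pnp_pose(kf_3d, loop_2d, K)   # backward PnP relative pose
    if fwd is None or bwd is None:
        return False
    R_f, _ = cv2.Rodrigues(fwd[0])
    R_b, _ = cv2.Rodrigues(bwd[0])
    # The backward pose should be (approximately) the inverse of the
    # forward pose, so compare fwd against the inverse of bwd.
    t_gap = np.linalg.norm(fwd[1].ravel() - (-R_b.T @ bwd[1]).ravel())
    r_gap = np.linalg.norm(cv2.Rodrigues(R_f @ R_b)[0])   # rotation angle gap
    return t_gap + r_gap < POSE_DIFF_MAX
```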
Specifically, in the embodiment of the present invention, performing the first bundle adjustment (BA) optimization processing on the feature point data, the closed-loop feature point pairs, and the IMU data corresponding to all the key frames specifically includes:
performing the first bundle adjustment (BA) optimization processing on the feature point data, the closed-loop feature point pairs, and the IMU data corresponding to all the key frames according to the following formula:
$$\min_{\chi,\,P_l}\Big\{\sum_{k\in B}\left\|r_B\left(\hat{z}^{b_k}_{b_{k+1}},\chi\right)\right\|^{2}_{P^{b_k}_{b_{k+1}}}+\sum_{(l,j)\in C}\left\|r_L\left(\hat{z}^{c_jL}_{l},\chi,P_l\right)\right\|^{2}_{\Sigma_L}+\sum_{(l,j)\in C}\left\|r_R\left(\hat{z}^{c_jR}_{l},\chi,P_l\right)\right\|^{2}_{\Sigma_R}$$
$$\qquad+\sum_{(l,j)\in C_{loopL}}\left\|r_{loopL}\left(\hat{z}^{c_jL}_{l},\chi,P_l\right)\right\|^{2}_{\Sigma_L}+\sum_{(l,j)\in C_{loopR}}\left\|r_{loopR}\left(\hat{z}^{c_jR}_{l},\chi,P_l\right)\right\|^{2}_{\Sigma_R}\Big\}$$
where k represents the k-th key frame; B represents the list of key frames for which pre-integration was performed when determining the state increment information; C represents the list of feature point observations in the key frame images; $\hat{z}^{c_jL}_{l}$ represents the observed value of the feature point on the acquired left-eye image and $\hat{z}^{c_jR}_{l}$ represents the observed value of the feature point on the acquired right-eye image; j represents the j-th key frame; $P_l$ represents the coordinates of feature point l in the world coordinate system; $\hat{z}^{b_k}_{b_{k+1}}$ represents the pre-integration result of the IMU data obtained when determining the state increment information; $\chi$ represents the state of the sliding window to be optimized; $J^{b_k}_{b_{k+1}}$ represents the Jacobian matrix corresponding to the state increment information; $\Sigma_L$ represents the variance matrix of the left-eye image and $\Sigma_R$ represents the variance matrix of the right-eye image; $C_{loopL}$ represents the list of left-eye closed-loop frame images that accept the closed loop and $C_{loopR}$ represents the list of right-eye closed-loop frame images that accept the closed loop; $r_B$ represents the residual of the IMU data; $r_L$ represents the residual of the left-eye image in the key frame corresponding image; $r_R$ represents the residual of the right-eye image in the key frame corresponding image; $r_{loopL}$ represents the residual of the left-eye image in the closed-loop frame image corresponding to the key frame; and $r_{loopR}$ represents the residual of the right-eye image in the closed-loop frame image corresponding to the key frame.
That is, when the first bundle adjustment (BA) optimization processing is performed on the feature point data, the closed-loop feature point pairs, and the IMU data corresponding to all the key frames:
1. Considering the instability of the extrinsic parameters estimated in the front-end processing, extrinsic parameter optimization (there is a constant conversion relation between the IMU and the cameras) is added to the first BA optimization processing;
2. At the same time, in order to further use more information to improve the mapping precision, an IMU constraint (or IMU time-domain constraint) is added to the first BA optimization processing;
3. All the key frames obtained through the front-end processing are used in the first BA optimization processing, so as to construct a high-precision map.
In the first BA optimization processing, five residuals are mainly considered, namely:
the projection residual of the left-eye image, that is, the error between the projection of the three-dimensional coordinates of the feature points observed in the image corresponding to the key frame and the actual observation in the left-eye image, together with the projection residual of the right-eye image, shown as the "reprojection error of the observed feature points" in fig. 2;
the projection residual of the left-eye image and the projection residual of the right-eye image for the closed-loop points in the images corresponding to the closed-loop key frames (applicable when the closed loop is accepted, as described above), shown as the "reprojection error of the closed-loop points in the closed-loop key frames" in fig. 2;
the inter-frame IMU constraint residual, shown as the "IMU constraint from frame to frame" in fig. 2.
The inter-frame IMU constraint residual mainly embodies the temporal constraint and can be used to constrain the scale.
Moreover, because the time interval between key frames in the back-end processing is long, the pose difference within the sliding window can be used to approximate the IMU constraint residual.
In addition, when the first bundle adjustment (BA) optimization is performed on the key frames, the closed-loop feature point pairs and the IMU data pairs, it has already been determined that the key frame and the corresponding closed-loop key frame satisfy the preset closed-loop condition, that is, the closed loop between them is accepted; therefore, all five residuals need to be considered in the first BA optimization.
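For illustration only, the following Python sketch shows how such a five-block cost vector might be assembled for a nonlinear least-squares solver. The `KeyFrame` attributes and the residual helpers (`imu_residual`, `reproj_left`, `loop_reproj_left`, and so on) are hypothetical placeholders, not the implementation of the embodiment.

```python
# A minimal sketch of assembling the five residual blocks of the first BA
# optimization into one stacked cost vector. All data structures and
# residual helpers are hypothetical placeholders.
import numpy as np

def whiten(r, cov):
    """Weight a residual by its covariance (Mahalanobis whitening)."""
    return np.linalg.solve(np.linalg.cholesky(cov), r)

def ba_residuals(chi, keyframes, loop_pairs, accept_loop=True):
    """Stack the residual blocks described in the text.

    chi        -- sliding-window state (poses, extrinsics, landmarks)
    keyframes  -- list B of pre-integrated key frames
    loop_pairs -- lists C_loopL / C_loopR of accepted closed-loop matches
    """
    blocks = []
    for k, kf in enumerate(keyframes[:-1]):
        # Inter-frame IMU constraint r_B; because key frames are far apart
        # in time, the pose difference inside the sliding window is used
        # to approximate the pre-integration residual.
        blocks.append(whiten(kf.imu_residual(keyframes[k + 1], chi), kf.imu_cov))
    for kf in keyframes:
        for l in kf.landmarks:
            # Reprojection residuals r_CL / r_CR of observed feature
            # points in the left-eye and right-eye images.
            blocks.append(whiten(kf.reproj_left(l, chi), kf.cov_left))
            blocks.append(whiten(kf.reproj_right(l, chi), kf.cov_right))
    if accept_loop:  # closed loop accepted: add the last two blocks
        for kf, loop_kf, l in loop_pairs:
            # Reprojection residuals of the closed-loop points.
            blocks.append(whiten(kf.loop_reproj_left(loop_kf, l, chi), kf.cov_left))
            blocks.append(whiten(kf.loop_reproj_right(loop_kf, l, chi), kf.cov_right))
    return np.concatenate(blocks)
```

With `accept_loop=False`, the same assembly reduces to the three residual blocks discussed next; the stacked vector could then be handed to a nonlinear least-squares solver such as `scipy.optimize.least_squares`.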
Optionally, in the embodiment of the present invention, performing a first BA optimization process on feature point data corresponding to a keyframe and IMU data pairs specifically includes:
and carrying out the first BA optimization processing on the feature point data and the IMU data pairs corresponding to the key frames according to the following formula:

$$
\min_{\chi}\left\{\sum_{k\in B}\left\|r_B\!\left(\hat{z}_{b_k}^{b_{k+1}},\chi\right)\right\|_{P_{b_k}^{b_{k+1}}}^{2}
+\sum_{(l,j)}\left\|r_{C_L}\!\left(\hat{z}_{l}^{c_j^{L}},\chi\right)\right\|_{P^{c^{L}}}^{2}
+\sum_{(l,j)}\left\|r_{C_R}\!\left(\hat{z}_{l}^{c_j^{R}},\chi\right)\right\|_{P^{c^{R}}}^{2}\right\}
$$

where $k$ denotes the $k$-th key frame; $B$ denotes the list of key frames that have been pre-integrated in determining the state increment information; $\hat{z}_{l}^{c_j^{L}}$ denotes the observed value of feature point $l$ on the acquired left-eye image and $\hat{z}_{l}^{c_j^{R}}$ that on the acquired right-eye image; $j$ denotes the $j$-th key frame; $P_l$ denotes the coordinates of feature point $l$ in the world coordinate system; $\hat{z}_{b_k}^{b_{k+1}}$ denotes the pre-integration result of the IMU data in determining the state increment information; $\chi$ denotes the sliding-window state to be optimized; $H$ denotes the Jacobian matrix corresponding to the state increment information; $P^{c^{L}}$ and $P^{c^{R}}$ denote the covariance matrices of the left-eye and right-eye images; $r_B$ denotes the residual of the IMU data; and $r_{C_L}$ and $r_{C_R}$ denote the residuals of the left-eye and right-eye images in the images corresponding to the key frames.
When the first BA optimization is performed only on the feature point data and the IMU data pairs corresponding to the key frames, it has been determined that the key frame and the corresponding closed-loop key frame do not meet the preset closed-loop condition, that is, the closed loop between them is not accepted; therefore, only the following three residuals are considered in the first BA optimization, namely:
the projection residual of the left-eye image, that is, the error between the projection of the three-dimensional coordinates of the feature points observed in the image corresponding to the key frame and the actual observation in the left-eye image, together with the projection residual of the right-eye image;
the inter-frame IMU constraint residual.
Moreover, whichever data are subjected to the first BA optimization processing, the optimization is solved by a nonlinear least-squares method, and the state $\chi$ mentioned in the above formulas can be expressed as:

$$
\chi=\left[x_0,\;x_1,\;\ldots,\;x_n,\;x_{c_L}^{b},\;x_{c_R}^{b},\;P_0,\;P_1,\;\ldots,\;P_m\right],\qquad
x_k=\left[p_{b_k}^{w},\;q_{b_k}^{w}\right],\qquad
x_{c_L}^{b}=\left[p_{c_L}^{b},\;q_{c_L}^{b}\right],\quad
x_{c_R}^{b}=\left[p_{c_R}^{b},\;q_{c_R}^{b}\right]
$$

where $x_k$ denotes the pose information of the $k$-th key frame; $w$ denotes the world coordinate system; $p_{b_k}^{w}$ denotes the displacement of the $k$-th key frame in the world coordinate system; $q_{b_k}^{w}$ denotes the angle information (also called rotation information) of the $k$-th key frame in the world coordinate system; $x_c^{b}$ denotes an extrinsic vector, that is, the transformation between the coordinate system of the image acquisition device (for example, a camera) and the IMU coordinate system; $P_m$ denotes the position of the $m$-th feature point in the world coordinate system; $p_{c_L}^{b}$ and $q_{c_L}^{b}$ denote the position and relative rotation of the left-eye camera coordinate system relative to the IMU coordinate system; and $p_{c_R}^{b}$ and $q_{c_R}^{b}$ denote the position and relative rotation of the right-eye camera coordinate system relative to the IMU coordinate system.
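As a concrete illustration of this state layout, the following sketch mirrors the composition of $\chi$ described above; the container types and field names are readability assumptions, not the embodiment's data structures.

```python
# A sketch of the optimized state chi: n+1 key-frame poses, the left/right
# camera-IMU extrinsics, and m+1 landmark positions. Names are illustrative.
from dataclasses import dataclass
import numpy as np

@dataclass
class KeyFrameState:
    p_w: np.ndarray   # displacement p^w_{b_k} of the key frame in the world frame
    q_w: np.ndarray   # rotation q^w_{b_k} (quaternion) in the world frame

@dataclass
class Extrinsic:
    p_bc: np.ndarray  # position of the camera frame relative to the IMU frame
    q_bc: np.ndarray  # rotation of the camera frame relative to the IMU frame

@dataclass
class WindowState:           # chi = [x_0..x_n, x^b_{cL}, x^b_{cR}, P_0..P_m]
    frames: list             # n+1 KeyFrameState entries x_k = [p^w_{b_k}, q^w_{b_k}]
    left_ext: Extrinsic      # x^b_{c_L}
    right_ext: Extrinsic     # x^b_{c_R}
    landmarks: np.ndarray    # (m+1, 3) positions P_m of feature points in the world frame
```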
Optionally, in an embodiment of the present invention, after obtaining a map for representing an environment where the robot is located, the method further includes:
and storing the obtained map.
When the map is stored, information related to the map is stored at the same time, which specifically may include:
1. Pose information of all key frames subjected to the first BA optimization processing;
2. The position information, in the three-dimensional world coordinate system, of the feature points observable in the images corresponding to the key frames, their position information in the normalized two-dimensional coordinate system, and the BRIEF descriptors corresponding to those feature points;
3. The position information, in the normalized two-dimensional coordinate system, of the feature points extracted from the images corresponding to the key frames and used for constructing the bag-of-words data set, and the corresponding BRIEF descriptors;
4. The extrinsic parameters after the first BA optimization processing (i.e., the extrinsic parameters between the left-eye image and the IMU, and between the right-eye image and the IMU).
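A minimal persistence sketch covering these four points might look as follows; the pickle format and the key-frame attribute names are illustrative assumptions rather than the storage scheme of the embodiment.

```python
# A sketch of persisting the four kinds of map information listed above.
# Schema and file format are illustrative assumptions.
import pickle

def save_map(path, keyframes):
    record = []
    for kf in keyframes:
        record.append({
            "pose": kf.pose,                     # 1. BA-optimized key-frame pose
            "points_w": kf.points_world,         # 2. 3-D world positions of observed feature points,
            "points_norm": kf.points_normalized, #    their normalized 2-D coordinates,
            "brief": kf.brief_descriptors,       #    and the corresponding BRIEF descriptors
            "bow_points": kf.bow_points_norm,    # 3. normalized 2-D points for the bag-of-words set
            "bow_brief": kf.bow_brief,           #    and their BRIEF descriptors
            "extrinsics": kf.extrinsics,         # 4. optimized left/right camera-IMU extrinsics
        })
    with open(path, "wb") as f:
        pickle.dump(record, f)
```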
It should be noted that the first BA optimization processing is performed when the mapping process is about to be completed, that is, after S101 and S102 have been performed; the obtained map is therefore an optimized map with higher accuracy and global consistency.
To reduce the amount of calculation, a down-sampling process may be performed before the first BA optimization processing so as to delete some key frames, namely key frames that are too close in distance or very similar in direction to their neighbors; the spatial density between adjacent key frames determines whether some of them need to be deleted. This reduces the number of key frames, reduces the processing load of the first BA optimization processing, and improves its processing efficiency.
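This down-sampling step could be sketched as follows, assuming each key frame exposes a position and a yaw angle; the thresholds are illustrative values, not those of the embodiment.

```python
# A sketch of down-sampling: drop a key frame when it is too close in
# position or too similar in heading to the previously kept one.
import numpy as np

def downsample(keyframes, min_dist=0.3, min_angle_rad=0.1):
    kept = [keyframes[0]]
    for kf in keyframes[1:]:
        d = np.linalg.norm(kf.position - kept[-1].position)
        a = abs(kf.yaw - kept[-1].yaw)
        if d >= min_dist or a >= min_angle_rad:  # keep only sufficiently distinct frames
            kept.append(kf)
    return kept
```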
Based on the same inventive concept, the embodiment of the invention provides a positioning method of a robot, as shown in fig. 3, which may include:
S301, acquiring a pre-constructed map, wherein the map is constructed by adopting the construction method provided by the embodiment of the invention;
The map may be obtained by loading it, and its format is the same as that of the map stored after construction.
In addition, all states related to the map are regarded as constant in the subsequent processing, and when the same sensor suite is adopted, the optimized extrinsic parameters can be loaded as initial values in the positioning stage of the AMR, thereby ensuring positioning accuracy while enabling the subsequent positioning processing.
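Correspondingly, loading could be sketched as below, matching the hypothetical save_map format above; in particular, the stored extrinsics are returned so they can serve as initial values for the localization stage.

```python
# A sketch of S301: load the stored map and treat its states as constants.
# The file format is assumed to match the save_map sketch above.
import pickle

def load_map(path):
    with open(path, "rb") as f:
        record = pickle.load(f)
    extrinsics = record[0]["extrinsics"]  # optimized extrinsics reused as
    return record, extrinsics             # initial values when the same sensor suite is used
```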
S302, determining key frame data according to preset key frame data determining rules according to image data and inertial measurement unit IMU data synchronously acquired by a robot in real time; wherein the key frame data includes: pose information of the key frame and characteristic point data in the image corresponding to the key frame, wherein the characteristic point is a corner point in the image and is matched with a corner point in an adjacent frame of image;
When the matching condition of angular points in a left eye image and a right eye image included in each acquired frame image and the state increment information between two acquired continuous frame images are determined, after sliding window optimization processing is carried out on each frame image, pose information of a key frame and feature point data in a corresponding image of the key frame are determined; the state increment information includes: position increment information, angle increment information, and speed increment information.
It should be noted that, optionally, the sliding window optimization processing may be performed only once, so as to obtain the global pose of the AMR and eliminate the accumulated error of the current trajectory as soon as possible.
For the specific embodiment of S302, reference may be made to the foregoing specific embodiment of S102, and the repetition is not repeated.
S303, determining a closed-loop key frame corresponding to any key frame according to the map and the key frame data;
S304, when the key frame and the corresponding closed-loop key frame meet the preset closed-loop condition, performing pose graph optimization processing on the closed-loop key frame, and then positioning the robot.
Here, the processing of S302 may be understood as front-end processing, and the processing of S303 and S304 as back-end processing; that is, the front end determines the key frame data and sends both the key frame data and the map to the back end, where they are processed. In the back-end processing, a binocular visual odometry based on tightly coupled sliding-window optimization is combined with the back-end pose graph optimization processing, so as to achieve real-time, high-precision positioning.
Moreover, after the key frame data is determined in S302, the key frame is still subjected to multiple sliding window optimization processing at the front end, similarly to the front-end processing in map construction, so that the most recently acquired image is processed together with the key frames that have undergone sliding window optimization in order to determine whether that image is a key frame. That is, after the key frame data corresponding to the i-th key frame is determined, the front-end processing for the i-th key frame is not finished and continues to be executed.
Optionally, in an embodiment of the present invention, when determining that the key frame and the corresponding closed-loop key frame do not meet the preset closed-loop condition, the method further includes:
and stopping the pose graph optimization processing on the closed-loop key frame pair.
In this way, unnecessary operations are avoided, positioning efficiency is improved, and the computational load of the device is reduced.
Specifically, in an embodiment of the present invention, the closed loop condition includes:
If the image corresponding to the closed-loop key frame is one of the images used for constructing the map, a first PNP relative pose between the key frame and the corresponding closed-loop key frame, determined according to the closed-loop feature point pairs and a preset PNP algorithm, is smaller than a preset fourth threshold;
or, if the image corresponding to the closed-loop key frame is an image acquired by the robot after obtaining the map, a second PNP relative pose between the key frame and the corresponding closed-loop key frame, determined according to the closed-loop feature point pairs and the PNP algorithm, is smaller than a preset fifth threshold.
If the image corresponding to the closed-loop key frame is one of the images of the constructed map, that image is derived from a certain frame in the pre-constructed map. In this case, the calculation of the closed-loop constraint is independent of the state of the front-end processing (namely, the pose can be determined directly, making the calculation simpler and more accurate); and because the pose of the current key frame mainly depends on the high-precision map, it is not affected by errors accumulated in the front-end processing.
For the calculation process and specific meaning of the PNP relative pose mentioned in the second condition, reference may be made to the prior art, and they are not described herein.
Therefore, whether the closed loop between the key frame and the corresponding closed-loop key frame is accepted can be determined through a PNP RANSAC algorithm. If the closed loop is accepted, the closed-loop feature point pair can be regarded as the same corner point, and the matched feature points can be used in the subsequent pose graph optimization processing to calculate the reprojection error, so as to position the robot; mismatches in closed-loop detection are thereby eliminated, improving the positioning accuracy.
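As an illustration of this acceptance test, the sketch below uses OpenCV's solvePnPRansac; the inlier floor and the norm-based threshold comparison are assumptions standing in for the preset thresholds mentioned above.

```python
# A sketch of the closed-loop acceptance test with PnP + RANSAC.
import cv2
import numpy as np

def accept_loop(pts3d_loop, pts2d_key, K, pose_threshold):
    """pts3d_loop: 3-D feature points from the closed-loop key frame;
    pts2d_key: matched 2-D points in the current key frame; K: intrinsics."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d_loop.astype(np.float64), pts2d_key.astype(np.float64),
        K, distCoeffs=None)
    if not ok or inliers is None or len(inliers) < 15:  # illustrative inlier floor
        return False, None
    # Accept the loop when the recovered relative pose stays below the
    # preset threshold (the fourth/fifth threshold in the text); comparing
    # the translation norm is an illustrative simplification.
    return bool(np.linalg.norm(tvec) < pose_threshold), (rvec, tvec)
```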
Optionally, in the embodiment of the present invention, performing pose graph optimization processing on the closed-loop key frames specifically includes:
and carrying out pose graph optimization processing on all the determined closed-loop key frames according to the following formula:

$$
\min_{p,\psi}\left\{\sum_{(i,j)\in S}\left\|r_{i,j}\right\|^{2}+\sum_{(i,j)\in L}\rho\!\left(\left\|r_{i,j}\right\|^{2}\right)\right\}
$$

where $S$ denotes the list of key frames with sliding-window constraints; $L$ denotes the list of key frames with closed-loop constraints; $i$ denotes the $i$-th key frame and $j$ the $j$-th key frame; $r_{i,j}$ denotes the residual between the $i$-th and $j$-th key frames; $\rho(\cdot)$ denotes a robust kernel function; $p$ denotes the poses of all optimized key frames; and $\psi$ denotes the angle information of all optimized key frames.
That is, in order to achieve stability of the output pose, a small-scale pose graph is maintained in the back-end processing to constrain and smooth the global pose, so that pose jumps caused by mismatched closed-loop detections can be effectively avoided.
Specifically, a key frame may be regarded as a vertex in the pose graph, and every two vertices are connected by a constraint edge. The constraint edges mainly include two types: sequential constraint edges and closed-loop constraint edges. A sequential constraint edge represents a relative transformation constraint between two vertices in the front-end processing result, and a key frame may be connected by sequential constraint edges to several previously determined neighboring key frames; a closed-loop constraint edge represents a relative transformation constraint between a key frame and its corresponding closed-loop key frame, and the closed-loop key frame may be connected with the corresponding key frame by a closed-loop constraint edge.
Therefore, when the key frame and the corresponding closed-loop key frame meet the preset closed-loop condition, the closed loop between them is accepted, so two factors need to be considered in the pose graph optimization processing of the closed-loop key frames: the residual of the closed-loop constraint edges and the residual of the sequential constraint edges.
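The two edge types and their residuals could be combined as in the following sketch; the Huber function is an illustrative choice for the robust kernel $\rho(\cdot)$, and the residual callback is a hypothetical placeholder.

```python
# A sketch of the pose-graph cost: key frames are vertices, connected by
# sequential edges and closed-loop edges; a robust kernel on the loop
# edges suppresses mismatched closed loops.
import numpy as np

def huber(e, delta=1.0):  # an illustrative robust kernel rho(.)
    n = np.linalg.norm(e)
    return 0.5 * n**2 if n <= delta else delta * (n - 0.5 * delta)

def pose_graph_cost(poses, seq_edges, loop_edges, residual):
    """seq_edges: list S of (i, j) pairs with sliding-window constraints;
    loop_edges: list L of (i, j) pairs with closed-loop constraints;
    residual(poses, i, j): hypothetical callback returning r_{i,j}."""
    cost = sum(np.linalg.norm(residual(poses, i, j))**2 for i, j in seq_edges)
    cost += sum(huber(residual(poses, i, j)) for i, j in loop_edges)
    return cost
```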
It should be noted that if the closed-loop key frame image corresponding to a key frame comes from the constructed map, the pose graph does not need to be optimized and positioning is carried out directly; if it does not come from the constructed map, pose graph optimization processing is needed to position the robot.
In addition, as the AMR moves, the size of the pose graph gradually increases, which affects the real-time performance of positioning. Therefore, before or after the pose graph optimization processing, when the motion trajectory of the AMR is still within the range of the constructed map, only the N latest key frames may be retained so as to reduce the amount of calculation; if the motion trajectory of the AMR goes beyond the range of the constructed map, down-sampling is performed according to the distance between neighboring key frames, that is, key frames that are too close in distance or very similar in direction to their neighbors are deleted, the spatial density between adjacent key frames determining whether some of them need to be deleted. This reduces the number of key frames, reduces the processing load of the pose graph optimization, and improves its processing efficiency.
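A sketch of this bookkeeping, reusing the hypothetical downsample helper from the mapping section above, might be:

```python
# A sketch of bounding the pose-graph size: keep only the N latest key
# frames while the trajectory stays inside the mapped area, otherwise
# fall back to distance/heading down-sampling.
def trim_pose_graph(keyframes, inside_map, n_latest=200):
    if inside_map:
        return keyframes[-n_latest:]  # N is an illustrative value
    return downsample(keyframes)      # reuse the earlier down-sampling sketch
```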
Based on the same inventive concept, the embodiment of the invention provides a robot map construction device, the implementation principle of which is similar to that of the aforementioned robot map construction method, so that the implementation of the construction device can refer to the specific embodiment of the aforementioned construction method, and the repetition is omitted.
Specifically, the device for constructing a robot map according to the embodiment of the present invention, as shown in fig. 4, may include:
A first unit 401, configured to determine key frame data according to a preset key frame data determining rule according to image data and inertial measurement unit IMU data acquired by the robot in real time and synchronously; wherein the key frame data includes: pose information of the key frame and characteristic point data in the image corresponding to the key frame, wherein the characteristic point is a corner point in the image and is matched with a corner point in an adjacent frame of image;
A second unit 402, configured to determine a closed-loop key frame corresponding to the key frame and a closed-loop feature point pair according to the key frame data; wherein the closed-loop feature point pair comprises a first feature point located in the image corresponding to the key frame and a second feature point located in the image corresponding to the closed-loop key frame, the first feature point matching the image information described by the second feature point;
A third unit 403, configured to, when it is determined that the key frame and the corresponding closed-loop key frame meet the preset closed-loop condition, obtain a map for representing the environment where the robot is located after performing the first bundle adjustment (BA) optimization processing on the feature point data, the closed-loop feature point pairs, and the IMU data pairs corresponding to all the key frames;
When the matching condition of angular points in a left eye image and a right eye image included in each acquired frame image and the state increment information between two acquired continuous frames of images are determined, determining a key frame according to the number of times of sliding window optimization processing and the reprojection error of characteristic points after the sliding window optimization processing is carried out on each frame image, and determining pose information of the key frame and characteristic point data in the image corresponding to the key frame; the state increment information includes: position increment information, angle increment information, and speed increment information.
Based on the same inventive concept, the embodiment of the invention provides a positioning device of a robot, the implementation principle of the positioning device is similar to that of the positioning method of the robot, so that the implementation of the positioning device can refer to the specific embodiment of the positioning method, and the repetition is omitted.
Specifically, as shown in fig. 5, the positioning device for a robot provided by the embodiment of the present invention may include:
a first unit 501, configured to obtain a pre-constructed map, where the map is constructed by using the construction method provided by the embodiment of the present invention;
A second unit 502, configured to determine key frame data according to a preset key frame data determining rule according to image data and inertial measurement unit IMU data acquired by the robot in real time and synchronously; wherein the key frame data includes: pose information of the key frame and characteristic point data in the image corresponding to the key frame, wherein the characteristic point is a corner point in the image and is matched with a corner point in an adjacent frame of image;
A third unit 503, configured to determine a closed-loop keyframe corresponding to any keyframe according to the map and the keyframe data;
A fourth unit 504, configured to, when it is determined that the key frame and the corresponding closed-loop key frame meet the preset closed-loop condition, perform pose graph optimization processing on the closed-loop key frame and then position the robot;
When the matching condition of angular points in a left eye image and a right eye image included in each acquired frame image and the state increment information between two acquired continuous frame images are determined, after sliding window optimization processing is carried out on each frame image, pose information of a key frame and feature point data in a corresponding image of the key frame are determined; the state increment information includes: position increment information, angle increment information, and speed increment information.
Based on the same inventive concept, an embodiment of the present invention provides a robot, as shown in fig. 6, which may include: the above-mentioned construction device 601 provided by the embodiment of the present invention, the above-mentioned positioning device 602 provided by the embodiment of the present invention, and the image acquisition device 603;
wherein, image acquisition device 603 is used for:
the method comprises the steps of acquiring images used for representing the environment of the robot in real time, and sending the acquired images to the construction device 601 and the positioning device 602, so that the construction device 601 constructs a map according to the acquired images, and the positioning device 602 performs positioning according to the acquired images and the constructed map.
Optionally, in the embodiment of the present invention, the image acquisition device may be, but is not limited to, a camera; it may also be another form of vision-based image acquisition device, which is not limited herein.
Optionally, in the embodiment of the present invention, the robot may further include other structures for implementing the functions of the robot besides the construction device, the positioning device and the image acquisition device, which may specifically refer to the prior art, and is not limited herein.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.