CN112507778B - Loop detection method of improved bag-of-words model based on line characteristics - Google Patents


Info

Publication number
CN112507778B
Authority
CN
China
Prior art keywords
visual
bag
words
loop
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011111454.8A
Other languages
Chinese (zh)
Other versions
CN112507778A (en)
Inventor
孟庆浩 (Meng Qinghao)
史佳豪 (Shi Jiahao)
戴旭阳 (Dai Xuyang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202011111454.8A priority Critical patent/CN112507778B/en
Publication of CN112507778A publication Critical patent/CN112507778A/en
Application granted granted Critical
Publication of CN112507778B publication Critical patent/CN112507778B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a loop closure detection method based on an improved bag-of-words model built from line features, comprising the following steps. LSD (Line Segment Detector) features are extracted from an offline image dataset and the corresponding LBD (Line Band Descriptor) descriptors are computed; these descriptors serve as the raw data from which the dictionary is generated by clustering. An LSD-feature bag-of-words model is constructed with an improved construction method, yielding a visual dictionary tree with adaptive branching. Images are converted into bag-of-words vectors, and the visual word weights are optimized. Similarity calculation: the similarity between the bag-of-visual-words vectors of the current frame and each historical keyframe is computed with the L1 norm, giving an appearance similarity score between images. Loop closure candidate frames are then acquired and grouped, and isolated candidates that merely look similar are discarded. Continuity verification: a loop closure is considered a reliable candidate, and retained, only if it is detected continuously. Finally, geometric consistency is verified.

Description

A Loop Closure Detection Method Based on an Improved Bag-of-Words Model with Line Features

Technical Field

The present invention relates to the field of visual SLAM (Simultaneous Localization And Mapping), and in particular to a visual SLAM loop closure detection method based on an improved bag-of-words model built from line features.

Background

Loop closure detection is an indispensable part of visual SLAM: it eliminates the accumulated error produced by the visual odometry front end, making it possible to build a globally consistent map. Bag-of-words-based loop closure detection is currently the dominant approach; it decides whether a loop closure exists by building a bag-of-words model and comparing the similarity between images. The bag-of-words model originated in text analysis, where the similarity of documents is determined by comparing the frequencies of the words they contain. Correspondingly, the bag-of-visual-words model measures the similarity of two images by comparing the frequencies of the "visual words" appearing in them.

In 2008, Cummins et al. (Cummins M, Newman P. FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance. Sage Publications, Inc., 2008) proposed a bag-of-words model based on SURF (Speeded Up Robust Features) features and a Chow-Liu tree, and with it achieved good appearance-based camera place recognition. However, its bag-of-words vector is binary: it records only whether a visual word appears in the image, without accounting for how frequently different words occur.

In 2011, Galvez-Lopez et al. (Galvez-Lopez D, Tardos J D. Real-time loop detection with bags of binary words. International Conference on Intelligent Robots and Systems, 2011: 25-30) used FAST (Features from Accelerated Segment Test) keypoints and BRIEF (Binary Robust Independent Elementary Features) binary descriptors to extract and describe point features, and introduced a k-d tree data structure for dictionary construction, building a bag-of-visual-words model over binary point-feature descriptors with hierarchical K-means clustering. The k-d tree dictionary structure, however, forces every K-means clustering performed during dictionary construction to use the same parameter k, yet no single k value yields the best clustering result for every data set.

Subsequently, the ORB-SLAM system proposed by Mur-Artal et al. in 2015 (Mur-Artal R, Montiel J M M, Tardos J D. ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 2015, 31(5): 1147-1163) built a bag-of-visual-words model on ORB (Oriented FAST and Rotated BRIEF) point features. ORB features give FAST keypoints rotation and scale invariance, and achieved good experimental results. However, the visual dictionary still uses K-means clustering and the k-d tree dictionary structure; the bag-of-words construction process itself was not improved.

The detection performance of the point-feature bag-of-words models above depends on the number of point features that can be extracted from the environment. When the environment does not yield enough point features, or the extracted points cluster together, the bag-of-words vector of a video frame, and hence the appearance similarity between frames, cannot be computed reliably.

In structured, low-texture environments, although enough point features often cannot be extracted, such scenes contain abundant line features that can be exploited.

Lee et al. (Lee J H, Zhang G, Lim J, et al. Place recognition using straight lines for vision-based SLAM. 2013 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2013: 3799-3806) proposed a bag-of-words model based on the MSLD (Mean-Standard deviation Line Descriptor) line descriptor and obtained good experimental results. However, the MSLD descriptor is not scale invariant and is computationally expensive, which hinders real-time operation.

Lin Limeng et al. (Lin Limeng, Wang Mei. Improved binocular vision SLAM algorithm using point and line features. Computer Measurement and Control, 2019(9): 156-162) combined point and line features in a single bag-of-visual-words model: point and line features are extracted from the image, both feature types are converted into one bag-of-words vector, and image similarity is computed from that vector.

Patent 201811250049.7 (a tightly coupled binocular visual-inertial SLAM method with point-line feature fusion) instead builds separate point-feature and line-feature bag-of-words models, computes point-feature and line-feature similarity scores between two frames, and takes their weighted sum as the final similarity score. Both methods use line features to build bag-of-words models, but the construction process still relies on K-means clustering and the k-d tree dictionary structure; it does not differ essentially from the models above and likewise cannot obtain good visual word clustering results. Moreover, the word weights in these bag-of-words models are computed with the TF-IDF (Term Frequency-Inverse Document Frequency) method, which considers the frequency of a visual word in the current image and its importance on the training dataset, but not its importance on the query dataset used during loop closure detection.

In summary, line features are local features that can replace point features in structured environments. Building a real-time visual SLAM loop closure detection algorithm on a line-feature bag-of-words model effectively solves the problem that point-feature-based loop closure detection fails in structured, low-texture environments. This document proposes a visual SLAM loop closure detection algorithm based on an improved bag-of-words model built from line features, improving both the construction process of the bag-of-words model and the weighting of visual words.

Summary of the Invention

Aiming at the problem that, in structured low-texture environments, it is difficult to extract enough point features for visual SLAM loop closure detection, the invention proposes a loop closure detection method based on an improved bag-of-words model built from line features. In structured environments, the algorithm uses abundant line features as local visual features to achieve vision-based loop closure detection, and improves the precision and recall of loop closure detection by improving both the bag-of-words construction method and the visual word weight calculation. The technical solution is as follows:

A loop closure detection method based on an improved bag-of-words model built from line features, comprising the following steps:

Step 1: Extract LSD (Line Segment Detector) features from an offline image dataset and compute the corresponding LBD (Line Band Descriptor) descriptors; these descriptors serve as the raw data from which the dictionary is generated by clustering.

Step 2: Build the LSD-feature bag-of-words model with the improved construction method: before each clustering step of dictionary tree construction, first determine the optimal cluster number k′ for the current data, then cluster the current data into k′ classes. Repeat until a visual dictionary tree with adaptive branching is finally constructed.

Step 3: Bag-of-words vector conversion: extract LSD-LBD line features from the image; using the constructed LBD-descriptor bag-of-words model and the Hamming distance between line feature descriptors and visual words, quantize each line feature in the image to its corresponding visual word, thereby converting the whole image into a numerical vector.

Step 4: Visual word weight optimization: introduce a weight optimization parameter (a repetition factor, denoted δi below) into loop closure detection. According to the distribution of visual words over the historical keyframe dataset, optimize the visual word weights in the bag-of-words vector: compute the weight optimization parameter of each visual word and combine it with the word weight computed by the TF-IDF method to obtain the weight-optimized bag-of-visual-words vector.

Step 5: Similarity calculation: compute the similarity between the bag-of-visual-words vectors of the current frame and each historical keyframe with the L1 norm, obtaining an appearance similarity score between the images.

Step 6: Acquire and group loop closure candidate frames: set the historical keyframes that meet the similarity threshold as loop closure candidate frames, group candidates that are close in time into the same group, and then, using the group similarity score and a given threshold, discard isolated candidate frames that merely look similar.

Step 7: Continuity verification: at this stage, check whether a loop closure is detected continuously over a period of time among these candidate frames. Only a loop closure that is detected continuously is considered a reliable candidate and retained.

Step 8: Geometric consistency verification: to guarantee the accuracy of the loop closure, verify the distribution of visual words in the current frame and the candidate frame; only if the line features corresponding to these visual words are distributed consistently are the two frames considered to form a loop closure.

Current visual SLAM still mainly uses point features as visual features. Compared with point-feature loop closure detection, the invention uses line features, which are more abundant in structured environments, as the local visual features. The key points of the invention are: 1) a visual dictionary tree with an adaptive number of branches is built from line features, which improves the distinctiveness of visual words and reduces the quantization error of converting local features into visual words; 2) according to the distribution of visual words in the loop closure query dataset, a weight optimization parameter is computed for each word and used to optimize the bag-of-visual-words vector, so that the similarity scores computed from the vectors are more discriminative. Compared with the unoptimized bag-of-visual-words model, the invention achieves a higher recall at 100% precision, showing that it detects loop closures more accurately and effectively.

Description of the Drawings

Figure 1 is a flow chart of the improved bag-of-words model construction.

Figure 2 shows LSD line feature extraction results in a structured, low-texture environment.

Figure 3 shows ORB point feature extraction results in a structured, low-texture environment.

Figure 4 is a schematic diagram of building the visual dictionary from LBD descriptors in an example of the invention.

Detailed Description

The invention is further described below with reference to specific examples. It should be noted that the described embodiments are intended only to facilitate understanding of the invention and do not limit it in any way.

Step 1: For a large amount of image data acquired offline, extract line segment features with the LSD algorithm and compute the descriptors of the line features with the LBD algorithm. The LSD algorithm detects line features quickly, and the LBD descriptor is binary and supports fast matching; this combination of line feature extraction and description meets the real-time requirements of loop closure detection.
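As a concrete illustration of this step, the sketch below detects LSD segments with OpenCV. It is a minimal sketch under stated assumptions, not the patent's implementation: cv2.createLineSegmentDetector is core OpenCV (absent from some 4.1-4.5.0 builds for license reasons), while the LBD descriptor lives in the opencv-contrib line_descriptor module, whose Python bindings vary by build, so only the detection half is shown here.

```python
import cv2
import numpy as np

def extract_lsd_lines(gray: np.ndarray) -> np.ndarray:
    """Detect LSD line segments; returns an (N, 4) array of [x1, y1, x2, y2]."""
    lsd = cv2.createLineSegmentDetector()        # core OpenCV LSD detector
    lines, _width, _prec, _nfa = lsd.detect(gray)
    if lines is None:                            # frame so low-textured that nothing was found
        return np.zeros((0, 4), dtype=np.float32)
    return lines.reshape(-1, 4)
```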

Step 2: Build the line-feature bag-of-visual-words model with an adaptive number of branches. The traditional bag-of-words construction uses the K-means clustering algorithm and a k-d tree data structure, and therefore inherits K-means' drawback of a manually specified k. Before each clustering step of dictionary tree construction, the invention adds a stage that determines the optimal cluster number for the current data by introducing the silhouette coefficient, a clustering quality index. First, compute the silhouette coefficient of the current data for each candidate k (k from 5 to 15) and select the most reasonable k for the data, i.e., the value k′ at which the silhouette coefficient is largest; then use k′ as the number of clusters at the current node of the visual dictionary tree. Repeat this process until the fifth level of the bag-of-words model has been built.

The silhouette coefficient combines two factors: the intra-cluster cohesion a(i) and the inter-cluster separation b(i). Assuming the current data have been clustered into k classes, the current silhouette coefficient S is obtained as follows:

First, compute the silhouette coefficient of each element after clustering:

s(i) = (b(i) - a(i)) / max{a(i), b(i)}   (1)

Here the cohesion a(i) is the average distance from the current element mi to the other elements mj in its own cluster, and the separation b(i) is the minimum over the other clusters of the average distance from mi to that cluster. Clearly s(i) ∈ [-1, 1]: if s(i) is close to 1, element mi is clustered reasonably; if s(i) is close to -1, mi should rather be assigned to another cluster; and if s(i) is close to 0, mi lies on the boundary between two clusters.

After the silhouette coefficient of each element has been computed, use the average of the silhouette coefficients of all elements as the silhouette coefficient of the current clustering result, i.e., S = (1/k) · Σ0<i≤k s(i).

Finally, according to the silhouette coefficients S computed for the different values of k, select the k at which the silhouette coefficient is largest as the final number of cluster branches at the current node of the visual dictionary tree. In the same way, compute the optimal number of branches at every intermediate node during dictionary construction, obtaining comparatively good clusterings and more distinctive visual words.
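A minimal sketch of this adaptive-k selection follows, assuming scikit-learn's KMeans and silhouette_score are acceptable stand-ins; the patent clusters binary LBD descriptors, for which a Hamming metric would be more faithful than the Euclidean one used here. Building the dictionary tree then amounts to calling best_k recursively on the members of each resulting cluster until the fifth level is reached.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k(descriptors: np.ndarray, k_range=range(5, 16)):
    """Return (k', labels) maximizing the mean silhouette coefficient S."""
    best_s, best_k_val, best_labels = -1.0, None, None
    for k in k_range:
        if k >= len(descriptors):          # cannot form more clusters than points
            break
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(descriptors)
        s = silhouette_score(descriptors, labels)   # mean of s(i) over all elements
        if s > best_s:
            best_s, best_k_val, best_labels = s, k, labels
    return best_k_val, best_labels
```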

Step 3: Bag-of-words vector conversion. Using the constructed LBD-descriptor bag-of-words model and the Hamming distance between line feature descriptors and visual words, quantize each line feature in the image to its corresponding visual word, thereby converting the whole image into a numerical vector such as

va = {(w1, η1), (w2, η2), ..., (wN, ηN)}   (2)

where wi denotes the i-th visual word and ηi its corresponding weight, computed with the TF-IDF method, i.e., ηi = TFi * IDFi. In practice each image contains only a small fraction of the words in the visual dictionary, so most entries have ηi = 0; that is, va is a sparse vector.
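The quantization in this step can be pictured as a greedy descent of the dictionary tree. The sketch below is illustrative only: the Node layout and the packing of binary LBD cluster centers into Python integers are assumptions, not the patent's data structures.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    center: int                           # binary LBD cluster center, packed into an int
    children: list = field(default_factory=list)
    word_id: int = -1                     # valid at leaves (visual words)

def hamming(a: int, b: int) -> int:
    """Hamming distance between two packed binary descriptors."""
    return bin(a ^ b).count("1")

def quantize(desc: int, root: Node) -> int:
    """Map one binary LBD descriptor to its visual word id by tree descent."""
    node = root
    while node.children:                  # follow the nearest child at each level
        node = min(node.children, key=lambda c: hamming(desc, c.center))
    return node.word_id
```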

Step 4: Visual word weight optimization. The TF-IDF weight considers the frequency of a visual word in the current image and its importance on the training dataset, but not its importance on the historical keyframe dataset queried during loop closure detection. In TF-IDF, the smaller the document frequency of a word (the number of documents containing it), the greater its power to distinguish documents of different classes. By the same reasoning, the fewer historical keyframes a visual word appears in, the greater its power to distinguish different images in the historical keyframe dataset.

A repetition factor δi is therefore introduced into the weight computation. Over the historical keyframe dataset, count the number Ii of keyframes in which each visual word appears, and make the parameter δi decrease as Ii increases, thereby reducing the weight of visual words that appear repeatedly. Specifically:

1) During loop closure detection, while building the mutual index between visual words and keyframes, count the number of keyframes Ii in which each visual word appears, and compute the repetition factor δi of the visual word from that keyframe count, where n is the number of keyframes in the historical keyframe dataset and Ii is the number of them in which visual word wi appears.

2) Combine the repetition factor δi with TF-IDF to generate the new weight η′i of visual word wi, and from these weights generate the new bag-of-words model vector v′a:

v′a = {(w1, η′1), (w2, η′2), ..., (wN, η′N)}   (3)

where η′i is the optimized weight of visual word i, δi is the weight optimization parameter of word i, TFi is the term frequency of word i in the current image, and IDFi is the inverse document frequency of word i on the training dataset.
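A minimal sketch of the weight update follows. Two assumptions are flagged: the text above fixes only that δi decreases as Ii grows, so the concrete log form used here is illustrative, and the multiplicative combination η′i = δi * TFi * IDFi is the natural reading of "combine" but likewise an assumption.

```python
import math
from collections import Counter

def optimized_weights(frame_words, keyframe_count_per_word, n_keyframes, idf):
    """frame_words: visual word ids observed in the current frame.
    keyframe_count_per_word: Ii, keyframes containing word i (dict id -> int).
    idf: inverse document frequency from the training set (dict id -> float).
    Returns the weight-optimized sparse vector v'a as {word id: eta'_i}."""
    tf = Counter(frame_words)
    total = sum(tf.values())
    v = {}
    for w, count in tf.items():
        i_i = keyframe_count_per_word.get(w, 0)
        delta = math.log(1 + n_keyframes / (1 + i_i))     # assumed form: decreases as Ii grows
        v[w] = delta * (count / total) * idf.get(w, 0.0)  # eta'_i = delta_i * TF_i * IDF_i (assumed product)
    return v
```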

Step 5: Image similarity calculation. Compute image similarity from the new bag-of-words vectors of the current image and each historical keyframe. For the bag-of-words vectors of any two images, evaluate similarity with the L1 norm, as follows:

s(v1, v2) = 1 - (1/2) · ||v̂1 - v̂2||1, where v̂ = v / ||v||1   (4)

The similarity lies between 0 and 1: when the two images are completely unrelated the score is 0, and when they are identical the score is 1.
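A minimal sketch of this score, operating on the sparse {word id: weight} dictionaries produced by the previous sketch; the normalized-L1 form itself is the reconstruction of Eq. (4) given above.

```python
def l1_similarity(va: dict, vb: dict) -> float:
    """Appearance similarity between two sparse bag-of-words vectors, in [0, 1]."""
    na = sum(abs(x) for x in va.values()) or 1.0   # ||va||_1 (guard empty vectors)
    nb = sum(abs(x) for x in vb.values()) or 1.0   # ||vb||_1
    words = set(va) | set(vb)
    dist = sum(abs(va.get(w, 0.0) / na - vb.get(w, 0.0) / nb) for w in words)
    return 1.0 - 0.5 * dist                        # 0 for disjoint vectors, 1 for identical ones
```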

Step 6: Acquire and group loop closure candidate frames. Any historical keyframe whose similarity with the current keyframe satisfies a threshold α can be set as a loop closure candidate frame. After all candidate frames have been obtained, group them, placing candidates that are close in time into the same group, and compute a group similarity score. For each candidate group, let I1, I2, I3, ..., In denote its keyframes and s1, s2, s3, ..., sn their similarities with the current keyframe; the group similarity score can then be expressed as the sum of these similarities, i.e.,

Sg = s1 + s2 + ... + sn = Σ(k=1..n) s(vk, vc)

where vk is the bag-of-words model vector of the k-th keyframe in the group and vc is the bag-of-words model vector of the current keyframe.

By grouping the candidate frames, computing the corresponding group similarity scores, and applying a given group score threshold β, candidate keyframes in low-scoring groups are rejected. For a correct loop closure keyframe, temporally adjacent keyframes typically also have high similarity with the current keyframe and are themselves candidates, so this step removes some incorrect candidates.
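The grouping and pruning can be sketched as below; the 3-frame gap used to split groups is an illustrative choice, not a value fixed by the text, and the threshold α is assumed to have been applied before the call.

```python
def group_candidates(cands, beta, max_gap=3):
    """cands: list of (keyframe index, similarity), sorted by index.
    Returns the candidates whose group score sum(s_k) reaches beta."""
    groups, cur = [], []
    for idx, s in cands:
        if cur and idx - cur[-1][0] > max_gap:   # temporal break starts a new group
            groups.append(cur)
            cur = []
        cur.append((idx, s))
    if cur:
        groups.append(cur)
    kept = []
    for g in groups:
        if sum(s for _, s in g) >= beta:         # group similarity score Sg
            kept.extend(g)
    return kept
```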

Step 7: Continuity check. At this stage, a loop closure is considered reliable only if it is detected simultaneously over several consecutive frames; only then is the candidate retained.
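A minimal sketch of this check; requiring 3 consecutive detections is an illustrative choice, as the text above does not fix the window length.

```python
def is_consistent(recent_hits, required=3):
    """recent_hits: booleans, newest last; True if the last frames all detected a loop."""
    return len(recent_hits) >= required and all(recent_hits[-required:])
```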

Step 8: Geometric consistency verification. Because the bag-of-visual-words model ignores the spatial information of visual features, a final geometric consistency check between the loop closure candidate frame and the current keyframe is required to ensure the accuracy of loop closure detection.

Compute the line feature reprojection error from the line features matched between the current frame and the candidate frame, then obtain the pose transformation between the two frames by local BA (bundle adjustment) optimization. By counting the number of line feature inliers under this pose transformation, judge whether the transformation is reasonable and hence whether the candidate frame passes geometric consistency verification.
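A minimal sketch of the inlier count follows. It assumes 3D line endpoints already triangulated and matched, and scores each match by the reprojection distance of its endpoints; the patent does not specify the error metric or threshold, so both are assumptions (a point-to-line distance against the detected 2D segment would be an equally plausible choice).

```python
import numpy as np

def count_line_inliers(P, lines3d, lines2d, thresh_px=5.0):
    """P: 3x4 projection matrix K[R|t] of the candidate pose.
    lines3d: (N, 2, 3) 3D endpoints; lines2d: (N, 2, 2) matched 2D endpoints."""
    inliers = 0
    for (A, B), (a, b) in zip(lines3d, lines2d):
        pa = P @ np.append(A, 1.0)
        pb = P @ np.append(B, 1.0)
        pa, pb = pa[:2] / pa[2], pb[:2] / pb[2]   # project and dehomogenize
        err = 0.5 * (np.linalg.norm(pa - a) + np.linalg.norm(pb - b))
        if err < thresh_px:
            inliers += 1
    return inliers
```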

Once the appearance similarity has reached the threshold α so that candidate frames are extracted, and the sequence of verification stages guaranteeing loop closure accuracy has been passed, a loop closure is declared, and the global map can be corrected and updated according to the detected loop closure.

Claims (1)

1. A loop closure detection method based on an improved bag-of-words model built from line features, comprising the following steps:
step 1: extracting LSD (Line Segment Detector) features from an offline image dataset, computing the corresponding LBD (Line Band Descriptor) line feature descriptors, and taking the LBD line feature descriptors as the raw data from which the dictionary is generated by clustering;
step 2: constructing the bag-of-words model based on LBD descriptors: before each clustering step of dictionary tree construction, determining the optimal cluster number k′ for the current data, then clustering the current data into k′ classes; repeating these steps until a visual dictionary tree with adaptive branches is finally constructed;
step 3: bag-of-words model vector conversion: extracting LSD line features from the image, quantizing each line feature in the image into its corresponding visual word according to the constructed LBD-descriptor bag-of-words model and the Hamming distance between the line feature descriptors and the visual words, and converting the whole image into a corresponding numerical vector;
step 4: visual word weight optimization: in loop closure detection, establishing the mutual index between visual words and keyframes while counting the number of keyframes Ii in which each visual word appears, and computing the repetition factor δi of the visual word from the corresponding keyframe count, where n is the number of keyframes in the historical keyframe dataset and Ii is the number of keyframes in which visual word wi appears;
optimizing the visual word weights in the bag-of-words model vector according to the distribution of the visual words over the historical keyframe dataset, computing the weight optimization parameters of the visual words, and combining them with the word weights computed by the TF-IDF method to obtain the weight-optimized bag-of-visual-words vector;
combining the repetition factor δi of a visual word with the TF-IDF algorithm to generate the new weight η′i of visual word wi, and generating therefrom the new bag-of-words model vector
v′a = {(w1, η′1), (w2, η′2), ..., (wN, η′N)}
where TFi is the term frequency of visual word wi in the current image and IDFi is the inverse document frequency of visual word wi on the training dataset;
step 5: similarity calculation: computing the similarity with the L1 norm from the bag-of-visual-words vectors of the current frame and the historical keyframes, and obtaining an appearance similarity score between the images;
step 6: acquiring and grouping loop closure candidate frames: setting the historical keyframes meeting the similarity threshold requirement as loop closure candidate frames, grouping the candidate frames, placing candidates close in time into the same group, and then rejecting the candidate frames whose group score is low according to the similarity score of the whole group and a given threshold;
step 7: continuity verification: at this stage, retaining a loop closure candidate frame only when a loop closure is continuously detected among the candidate frames;
step 8: geometric consistency verification: performing geometric consistency verification between the loop closure candidate frame and the current keyframe to ensure the accuracy of loop closure detection.
CN202011111454.8A 2020-10-16 2020-10-16 Loop detection method of improved bag-of-words model based on line characteristics Active CN112507778B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011111454.8A CN112507778B (en) 2020-10-16 2020-10-16 Loop detection method of improved bag-of-words model based on line characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011111454.8A CN112507778B (en) 2020-10-16 2020-10-16 Loop detection method of improved bag-of-words model based on line characteristics

Publications (2)

Publication Number Publication Date
CN112507778A CN112507778A (en) 2021-03-16
CN112507778B true CN112507778B (en) 2022-10-04

Family

ID=74953814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011111454.8A Active CN112507778B (en) 2020-10-16 2020-10-16 Loop detection method of improved bag-of-words model based on line characteristics

Country Status (1)

Country Link
CN (1) CN112507778B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991448B * 2021-03-22 2023-09-26 South China University of Technology Loop closure detection method, device and storage medium based on color histograms
CN115063715A * 2022-05-30 2022-09-16 Hangzhou Dianzi University An acceleration method for ORB-SLAM3 loop closure detection based on gray histograms
CN115240115B * 2022-07-27 2023-04-07 Henan University of Technology Visual SLAM loop closure detection method combining semantic features and a bag-of-words model
CN117409388A * 2023-12-11 2024-01-16 Tianjin Sino-German University of Applied Sciences An improved bag-of-words model method for intelligent vehicle visual SLAM loop closure detection

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106909877A * 2016-12-13 2017-06-30 Zhejiang University A simultaneous visual mapping and localization method based on combined point-line features
CN109409418A * 2018-09-29 2019-03-01 Sun Yat-sen University A loop closure detection method based on a bag-of-words model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108682027A * 2018-05-11 2018-10-19 北京华捷艾米科技有限公司 VSLAM implementation method and system based on point-line feature fusion
CN109886065A * 2018-12-07 2019-06-14 Wuhan University of Technology An online incremental loop closure detection method
CN109656545B * 2019-01-17 2022-03-25 Yunnan Normal University Event-log-based software development activity clustering analysis method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106909877A * 2016-12-13 2017-06-30 Zhejiang University A simultaneous visual mapping and localization method based on combined point-line features
CN109409418A * 2018-09-29 2019-03-01 Sun Yat-sen University A loop closure detection method based on a bag-of-words model

Also Published As

Publication number Publication date
CN112507778A (en) 2021-03-16

Similar Documents

Publication Publication Date Title
CN112507778B (en) Loop detection method of improved bag-of-words model based on line characteristics
CN107515895B (en) A visual target retrieval method and system based on target detection
Van Gemert et al. APT: Action localization proposals from dense trajectories.
Zheng et al. Scalable person re-identification: A benchmark
CN107330397B (en) A Pedestrian Re-identification Method Based on Large-Interval Relative Distance Metric Learning
CN109063649B (en) Pedestrian re-identification method based on twin pedestrian alignment residual error network
CN112784929B (en) Small sample image classification method and device based on double-element group expansion
CN109522853A (en) Face datection and searching method towards monitor video
Yue et al. Robust loop closure detection based on bag of superpoints and graph verification
CN112069940A (en) A cross-domain person re-identification method based on staged feature learning
CN107169117B (en) A Human Motion Retrieval Method in Hand Drawing Based on Autoencoder and DTW
CN112633051B (en) Online face clustering method based on image search
CN110991321B (en) Video pedestrian re-identification method based on tag correction and weighting feature fusion
CN110516533B (en) Pedestrian re-identification method based on depth measurement
CN108960142B (en) Pedestrian re-identification method based on global feature loss function
CN104036296B (en) A kind of expression of image and processing method and processing device
CN114926742B (en) A loop detection and optimization method based on second-order attention mechanism
Yang et al. Multi-scale bidirectional fcn for object skeleton extraction
CN106845375A (en) A kind of action identification method based on hierarchical feature learning
CN105930792A (en) Human action classification method based on video local feature dictionary
CN110880010A (en) Visual SLAM closed loop detection algorithm based on convolutional neural network
CN118015539A (en) Improved YOLOv8 dense pedestrian detection method based on GSConv+VOV-GSCSP
CN109785387A (en) Winding detection method, device and the robot of robot
CN111723600A (en) A feature descriptor for person re-identification based on multi-task learning
CN105678349B (en) A kind of sub- generation method of the context-descriptive of visual vocabulary

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant