CN110852327A - Image processing method, image processing device, electronic equipment and storage medium - Google Patents
- Publication number
- CN110852327A CN110852327A CN201911084122.2A CN201911084122A CN110852327A CN 110852327 A CN110852327 A CN 110852327A CN 201911084122 A CN201911084122 A CN 201911084122A CN 110852327 A CN110852327 A CN 110852327A
- Authority
- CN
- China
- Prior art keywords
- image
- static
- processed
- local
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Probability & Statistics with Applications (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The embodiments of the disclosure disclose an image processing method, an image processing apparatus, an electronic device, and a storage medium, wherein the method comprises the following steps: filtering the dynamic objects out of the images to be processed in the image set to be processed to obtain static images; extracting local features from the static images with a CNN network and clustering the local features to obtain a plurality of cluster center points; and constructing a visual bag-of-words model of the static images from the plurality of cluster center points, mapping the local features of the static images onto the constructed visual bag-of-words model, representing the images in the set as feature vectors according to the visual bag-of-words model, and performing closed-loop detection on the results of the feature vector representation.
Description
Technical Field
The present disclosure relates to the field of image technologies, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.
Background
In recent years, with the rapid development of robotics and autonomous driving technologies, Visual Simultaneous Localization and Mapping (vSLAM) plays an increasingly important role. Closed-loop detection is an important component of a vSLAM system: it is mainly used to determine whether the robot's current position lies in a region of space it has visited before, and it addresses problems such as failed map creation, lost localization, and redundant or repeated map structures caused by accumulated computational errors in large-scale complex environments (sensor errors, noise, and environmental changes), thereby improving the accuracy and robustness of vSLAM.
Existing closed-loop detection methods can be divided into two categories: traditional closed-loop detection methods based on hand-crafted features and closed-loop detection methods based on deep learning. Traditional closed-loop detection algorithms rely on hand-crafted feature descriptors to perform image similarity matching and can be divided into methods based on local feature descriptors (SIFT, SURF, and ORB) and methods based on global feature descriptors (GIST). The most common approach among those based on local feature descriptors is the Visual Bag of Words (BoVW): its main idea is to extract local features from the image and cluster the extracted feature descriptors to build a visual dictionary. The image is then represented as a K-dimensional numerical vector according to the visual dictionary tree, and whether a closed loop exists is judged by measuring the distance between image feature vectors. Because traditional hand-crafted closed-loop detection algorithms extract regional features such as edges, corners, lines, curves, and other special attributes from local regions of the image, they are accurate when the surrounding environment changes little. However, because hand-crafted feature algorithms mostly rely on human expertise and experience and cannot express images accurately, they struggle to provide accurate and robust image feature descriptions when the surrounding environment changes significantly (for example, in scenes with strong illumination changes) and have poor stability, so the mismatch rate increases in complex environments.
With the rapid development of deep learning in recent years, CNNs (convolutional neural networks), which can extract deep-level features of an image with richer data information and stronger feature expression capability, have been used to learn features from large amounts of data in place of traditionally hand-designed features to solve the closed-loop detection problem. When illumination changes significantly, CNN-based image descriptors outperform hand-designed descriptors; however, because CNN-based closed-loop detection algorithms extract global image features, their accuracy in a stable environment without illumination changes is lower than that of traditional closed-loop detection algorithms based on local features.
Disclosure of Invention
The embodiment of the disclosure provides an image processing method and device, electronic equipment and a computer-readable storage medium.
In a first aspect, an embodiment of the present disclosure provides an image processing method, including:
filtering dynamic objects in the images to be processed in the image set to be processed to obtain static images;
extracting local features in the static image by using a CNN network, and clustering the local features to obtain a plurality of clustering central points;
and constructing a visual bag-of-words model of the static image according to the plurality of clustering center points, mapping local features of the static image to the constructed visual bag-of-words model, expressing feature vectors of the image set to be processed according to the visual bag-of-words model, and performing closed-loop detection according to the result expressed by the feature vectors.
Further, filtering the dynamic object in the to-be-processed image set to obtain a static image, including:
detecting a dynamic object in the image to be processed, and extracting the region information of the dynamic object;
and filtering the dynamic object from the image to be processed according to the region information to obtain the static image.
Further, extracting local features in the static image by using a CNN network, comprising:
processing the static image by utilizing a multi-scale dense full convolution network, and returning key point information in the static image;
locally cutting the static image according to the key point information to obtain a plurality of local image blocks;
and respectively extracting the features of the local image blocks by using a local feature network to obtain the feature point positions and the feature descriptors of the local image blocks.
Further, mapping local features of the static image onto the constructed visual bag-of-words model, comprising:
determining a first similarity between the local features of the static image and the cluster center point in the visual bag of words model;
and mapping the local features of the static images to a bag of words category to which one of the cluster center points belongs according to the first similarity.
Further, performing feature vector representation on the image set to be processed according to the visual bag-of-words model, including:
counting the distribution condition of mapping the local features of the static images to each bag type in the visual bag-of-words model;
and obtaining the characteristic vector of the static image according to the distribution condition.
Further, performing closed-loop detection according to the result represented by the feature vector, including:
determining a second similarity between the feature vectors of the static images corresponding to the two images to be processed in the image set to be processed;
and determining the closed-loop detection result of the two images to be processed according to the second similarity.
In a second aspect, an embodiment of the present invention provides an image processing apparatus, including:
the filtering module is configured to filter dynamic objects in the images to be processed in the image set to be processed to obtain static images;
the characteristic extraction module is configured to extract local characteristics in the static image by using a CNN network, and cluster the local characteristics to obtain a plurality of cluster central points;
the closed-loop detection module is configured to construct a visual bag-of-words model of the static image according to the plurality of clustering center points, map local features of the static image to the constructed visual bag-of-words model, perform feature vector representation on the image set to be processed according to the visual bag-of-words model, and perform closed-loop detection according to a result of the feature vector representation.
These functions can be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
In one possible design, the image processing apparatus includes a memory and a processor, the memory is used for storing one or more computer instructions for supporting the image processing apparatus to execute the method in the first aspect, and the processor is configured to execute the computer instructions stored in the memory. The image processing apparatus may further include a communication interface for the image processing apparatus to communicate with other devices or a communication network.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of the first aspect.
In a fourth aspect, the disclosed embodiments provide a computer-readable storage medium storing computer instructions used by the image processing apparatus, which contains computer instructions for performing the method according to the first aspect.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
the method comprises the steps of firstly utilizing a convolutional neural network to carry out local region segmentation and feature extraction on an image, combining an extracted image local CNN feature descriptor with a traditional closed-loop detection bag-of-words model algorithm, clustering the obtained image local CNN feature descriptor in the bag-of-words model by utilizing a clustering center algorithm, constructing a visual dictionary tree according to a plurality of clustered central points, further obtaining a feature vector representing the image, carrying out similarity comparison on the image, and judging whether a loop is generated. The method can simultaneously improve the accuracy and stability of closed-loop detection in a complex environment.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 shows a flow diagram of an image processing method according to an embodiment of the present disclosure;
FIG. 2 shows a flow chart of step S101 according to the embodiment shown in FIG. 1;
FIG. 3 shows a flowchart of step S102 according to the embodiment shown in FIG. 1;
FIG. 4 shows a flow diagram of a method for combining a deep learning advantage with a conventional closed loop detection advantage in an embodiment of the present disclosure;
fig. 5 shows a block diagram of the structure of an image processing apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device suitable for implementing an image processing method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.
It should be further noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in fig. 1, the image processing method includes the steps of:
in step S101, filtering the dynamic objects in the to-be-processed image set to obtain a static image;
in step S102, a CNN network is used to extract local features in the static image, and the local features are clustered to obtain a plurality of cluster center points;
in step S103, a visual bag-of-words model of the static image is constructed according to the plurality of cluster center points, local features of the static image are mapped onto the constructed visual bag-of-words model, feature vector representation is performed on the image set to be processed according to the visual bag-of-words model, and closed-loop detection is performed according to a result of the feature vector representation.
In this embodiment, the image set to be processed may be a set of a plurality of images to be processed for closed-loop detection. A dynamic object is an object that changes dynamically in the image to be processed, such as a person, an animal, or a vehicle.
According to the embodiment of the disclosure, before closed-loop detection is performed on the images to be processed in the image set, the dynamic objects in the images are filtered out to obtain static images. In some embodiments, the pixels in the image region where a dynamic object is located may be set to a preset color, for example black; because an all-black region contains no feature points, interference from feature points on the dynamic object is eliminated.
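As an illustration (not part of the claimed method), a minimal Python sketch of this masking step, assuming the dynamic-object regions are given as bounding boxes; the function name and box format are illustrative:

```python
import numpy as np

def mask_dynamic_objects(image, boxes, fill=0):
    """Set every pixel inside the detected dynamic-object boxes to a preset
    color (black by default), so the region contributes no feature points.

    image: H x W x 3 uint8 array; boxes: iterable of (x1, y1, x2, y2)."""
    static = image.copy()
    for x1, y1, x2, y2 in boxes:
        static[y1:y2, x1:x2] = fill  # an all-black patch yields no keypoints
    return static
```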
For the plurality of static images derived from the image set to be processed, a CNN network may be used to extract local features. In the process of extracting local features from a static image with the CNN network, the first part of the CNN network first extracts keypoint information from the static image, and the static image is cropped into a plurality of local-region image blocks based on these keypoints, each local-region image block containing one keypoint extracted from the static image. Then, for each local-region image block, the second part of the CNN network extracts keypoint information from the block, and the keypoint information extracted from the plurality of local-region image blocks corresponding to the static image is used as the local features of the static image; for example, the feature vectors of the keypoints extracted from the local-region image blocks can be used as the local features of the static image.
After the local features of the static image corresponding to each image to be processed in the image set have been obtained, the local features can be clustered with a center clustering algorithm to obtain a plurality of cluster center points. The center clustering algorithm may adopt the K-means++ algorithm, which is well known, so the specific clustering process is not described again here.
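For illustration, a minimal Python sketch of this clustering step, assuming the descriptors of all static images have been stacked into one N x D array; the vocabulary size of 500 and the use of scikit-learn are illustrative choices, not part of the disclosure:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptors, n_words=500, seed=0):
    """Cluster local CNN descriptors (N x D) into n_words visual words using
    k-means++ initialization; the cluster centers become the vocabulary of
    the visual bag-of-words model."""
    kmeans = KMeans(n_clusters=n_words, init="k-means++", n_init=10, random_state=seed)
    kmeans.fit(descriptors)
    return kmeans.cluster_centers_  # shape: (n_words, D)
```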
After clustering is completed, a visual bag-of-words model of the static images corresponding to all the images to be processed in the image set can be constructed from the cluster center points. The constructed visual bag-of-words model comprises a plurality of bag-of-words categories; each category represents one visual word, and the center point of each bag-of-words category is one of the cluster center points. When a local feature is similar to a cluster center point, the local feature may be classified into the bag-of-words category in which that cluster center point is located.
After the visual bag-of-words model is constructed, the local features of the static images can be mapped onto the visual bag-of-words model according to the similarity between the local features and the cluster center points, i.e. the local features of the static image are classified into the various bag-of-words categories comprised by the visual bag-of-words model. After the mapping is completed, a feature vector representation of the static image is obtained from the mapping result, and this feature vector representation can be used as the feature vector representation of the image to be processed corresponding to the static image. In some embodiments, the feature vector representation of the static image may be derived from the distribution of its local features across the various bag-of-words categories after these local features are mapped onto the visual bag-of-words model.
After the feature vector representation of each to-be-processed image in the to-be-processed image set is obtained, a closed-loop detection result between the two to-be-processed images can be determined according to the feature vector representation.
In an optional implementation manner of this embodiment, as shown in fig. 2, the step S101, namely, the step of filtering the dynamic object in the to-be-processed image set to obtain the static image, further includes the following steps:
in step S201, detecting a dynamic object in the image to be processed, and extracting area information of the dynamic object;
in step S202, the dynamic object is filtered from the image to be processed according to the region information, so as to obtain the static image.
In the image preprocessing stage, scene recognition and dynamic/static scene separation are performed on the image. Before scene separation, a YOLOv3 model is pre-trained with the PASCAL VOC2007 and PASCAL VOC2012 data sets. The data sets contain a number of outdoor moving objects such as people, animals, and vehicles, together with the positions of the objects in the pictures and their class labels. PASCAL VOC2007 contains 20 classes, 9963 images, and 24640 annotated objects; PASCAL VOC2012 contains 20 classes, 11530 images, and 27450 annotated objects.
The object classes in the PASCAL data sets fully cover the moving objects an autonomous mobile robot may encounter while working outdoors, and most of the outdoor moving objects in the data sets used in the embodiment of the present disclosure are people, vehicles, and the like, so a convolutional neural network trained on this database can easily identify the dynamic objects, which are then filtered out of the image to be processed.
After the training of the convolutional neural network for identifying the dynamic objects in the images is completed, the dynamic and static scene identification and separation can be performed on the images to be processed in the image set to be processed by using a dynamic and static scene separation algorithm based on target detection.
In the embodiment of the present disclosure, the pre-trained YOLOv3 target-detection model is used to detect dynamic objects in the image to be processed and to extract the dynamic-object region information, which includes the rectangular region where the dynamic object is located, the region size, and the like. The positions corresponding to this region information are then separated out of the original image, eliminating dynamic-object information such as pedestrians and vehicles.
In some embodiments, the scene-information separation operates on the matrix of image pixels: the dynamic-object region is blacked out in the original image, and because an all-black region contains no feature points, interference from feature points on the dynamic object is eliminated. Therefore, when local features are subsequently extracted from the image, no feature extraction is performed on the separated dynamic-scene region. The remaining region is the desired filtered, purely static scene image with the dynamic objects removed.
In other embodiments of the present disclosure, in the image preprocessing stage, a dynamic and static scene separation algorithm based on image semantic segmentation is used to perform scene semantic segmentation and separation on the image. Before scene separation, a DeepLabv3+ semantic segmentation model is pre-trained with the PASCAL VOC2007 and PASCAL VOC2012 data sets.
Then, semantic segmentation is performed on the input image to be processed with the pre-trained DeepLabv3+ model, and the result yields semantic information about the dynamic and static scenes in the image. From this result, the semantic information of the dynamic objects in the image to be processed (including the image pixels of the regions where the dynamic objects are located, and the like) is extracted. The semantic information of the whole image is traversed on the basis of the pixel matrix, and where the pixel labels correspond to the object classes annotated during training, the semantic information of the dynamic objects is removed from the original image; that is, the dynamic-object semantics such as pedestrians and vehicles are simply blacked out at the corresponding positions in the original image, yielding a static image with only the dynamic-object information removed.
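As an illustration, a minimal Python sketch of this pixel-level removal, assuming the segmentation model returns an integer label map; the specific label ids for the dynamic classes are dataset-dependent and illustrative here:

```python
import numpy as np

# Illustrative label ids for dynamic classes (e.g. person, car); dataset-dependent.
DYNAMIC_CLASSES = {15, 7}

def remove_dynamic_semantics(image, label_map, dynamic_classes=DYNAMIC_CLASSES):
    """Black out every pixel whose semantic label belongs to a dynamic-object
    class, leaving a static image with only the dynamic-object information removed.

    image: H x W x 3 uint8; label_map: H x W int array from the segmentation model."""
    static = image.copy()
    mask = np.isin(label_map, list(dynamic_classes))
    static[mask] = 0
    return static
```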
In some embodiments, in order to avoid incomplete separation of the dynamic object's edge information and to ensure that the dynamic-object semantics are removed more cleanly and thoroughly, image dilation may further be applied to the separated dynamic-object region information (that is, after the image region where the dynamic object is located has been blacked out, dilation is applied to that black region), with 2 iterations.
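A minimal sketch of this dilation step with OpenCV, assuming a binary mask of the dynamic-object pixels; the 5 x 5 kernel is an illustrative assumption, only the 2 iterations come from the description above:

```python
import cv2
import numpy as np

def expand_dynamic_mask(mask, kernel_size=5, iterations=2):
    """Dilate the binary mask of dynamic-object pixels so that edge pixels
    left behind by the separation are also removed (2 iterations, as above)."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    return cv2.dilate(mask, kernel, iterations=iterations)
```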
In an optional implementation manner of this embodiment, as shown in fig. 3, the step S102, namely the step of extracting the local feature in the static image by using the CNN network, further includes the following steps:
in step S301, the static image is processed by using a multi-scale dense full convolution network, and the key point information in the static image is returned;
in step S302, locally clipping the static image according to the key point information to obtain a plurality of local image blocks;
in step S303, a local feature network is used to perform feature extraction on the plurality of local image blocks respectively, so as to obtain feature point positions and feature descriptors of the local image blocks.
In this optional implementation, the CNN network may use the LF-Net network, a sparse matching method with a deep architecture, and can be divided into two parts. The first part may be a multi-scale dense fully convolutional network, which returns the keypoint positions, scales, orientations, and so on of the static image; the static image is then cropped into local regions according to the keypoint positions, yielding a plurality of local-region image blocks for each static image. The second part may be a local descriptor network used to extract features from the keypoint-based cropped image blocks produced by the first part. When extracting local features from the static image, a fully convolutional network first generates a rich feature map o from the image I to be processed. Scale-invariant keypoint detection is performed on the generated feature map o, the top M pixels are selected as feature points according to the scale-invariant map, and the feature point positions and feature descriptors of the local image blocks are finally obtained.
The multi-scale dense fully convolutional network is a fully convolutional network structure designed according to the MSDNet (multi-scale dense network) idea. The network has no fully connected layer and uses three simple ResNet-style blocks; each block comprises 5 x 5 convolution filters, followed by batch normalization, Leaky-ReLU activation, and another group of 5 x 5 convolutions. It adopts the MSDNet concept of using multiple scales to obtain abstract features of the image and generate a feature map, assembled from two kinds of connections in series: 1. convolution of the same-scale features of the previous level and down-sampling (diagonal connection) of the larger-scale feature map of the previous level; 2. each layer has connections to other convolutional layers, and during backpropagation the weights are updated in the direction that best improves the extraction result of each feature.
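For illustration, a minimal PyTorch sketch of one such block as described above; the channel count, the negative slope of the Leaky ReLU, and the exact placement of the residual connection are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DetectorBlock(nn.Module):
    """One block of the keypoint-detector backbone described above: a 5x5
    convolution, batch normalization, Leaky-ReLU activation, another 5x5
    convolution, and a residual (ResNet-style) connection."""

    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.1),
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
        )

    def forward(self, x):
        return x + self.body(x)  # residual connection keeps the block ResNet-style
```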
After the keypoints have been selected, image blocks around the keypoint positions are cropped from the normalized image (the image obtained by normalizing the static image) using bilinear sampling (to preserve differentiability). Bilinear sampling, i.e. the bilinear interpolation algorithm, is a good image scaling method: it uses the four real pixel values surrounding a virtual point in the image to jointly determine one pixel value in the target image, thereby accomplishing the cropping.
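A minimal Python sketch of bilinear sampling used to crop a patch around a keypoint on a grayscale image; the 32 x 32 patch size is an illustrative assumption:

```python
import numpy as np

def bilinear_sample(image, ys, xs):
    """Sample a grayscale image (H x W) at real-valued coordinates, using the
    four surrounding integer pixels weighted bilinearly."""
    h, w = image.shape[:2]
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(ys - y0, 0.0, 1.0)
    wx = np.clip(xs - x0, 0.0, 1.0)
    return ((1 - wy) * (1 - wx) * image[y0, x0] + (1 - wy) * wx * image[y0, x1]
            + wy * (1 - wx) * image[y1, x0] + wy * wx * image[y1, x1])

def crop_patch(image, center_y, center_x, size=32):
    """Crop a size x size patch centered on a keypoint with bilinear sampling."""
    offsets = np.arange(size) - (size - 1) / 2.0
    ys, xs = np.meshgrid(center_y + offsets, center_x + offsets, indexing="ij")
    return bilinear_sample(image, ys, xs)
```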
After the first part of the network yields a plurality of local-region image blocks of the static image, the second part of the network extracts features from the cropped blocks (including keypoint positions, scales, feature vectors, and so on), finally producing the set of local feature descriptors of the static image used for subsequent clustering and construction of the visual bag-of-words model. Compared with traditional hand-crafted feature extraction algorithms, the LF-Net network extracts richer features, is hardly affected by illumination and other environmental factors, and has stronger stability and higher accuracy.
When local features are extracted from the image by the above method in the embodiment of the present disclosure, no feature extraction is performed on the dynamic-scene regions separated from the original image: when a region is detected as a dynamic-scene region during feature extraction, it is skipped, so the dynamic-scene regions no longer participate in the feature representation of the image. The feature point positions and feature descriptors of the filtered image are finally determined.
In an optional implementation manner of this embodiment, the step of mapping the local features of the static image onto the constructed visual bag-of-words model in step S103 further includes the following steps:
determining a first similarity between the local features of the static image and the cluster center point in the visual bag of words model;
and mapping the local features of the static images to a bag of words category to which one of the cluster center points belongs according to the first similarity.
In this optional implementation, the obtained local features of the static images are combined with the closed-loop detection bag-of-words model algorithm; that is, the extracted set of local feature descriptors of the static images is clustered with the K-means++ center clustering algorithm, and a visual bag-of-words model is constructed from the resulting cluster center points.
In some embodiments, after the plurality of cluster center points are obtained by running the center clustering algorithm on the local features of the static images corresponding to the images to be processed, the local features of each static image can be mapped to the cluster center points. The mapping can be done by calculating a first similarity between a local feature and the cluster center points and mapping according to that similarity. For example, for a local feature B of static image A, the first similarity between the feature descriptor of B and the feature descriptor of each cluster center point Ci can be calculated (the first similarity may be determined from the Euclidean distance), and B is mapped to the bag-of-words category corresponding to the cluster center point Ci with the largest first similarity. In this way, all local features extracted from all static images can be mapped onto the visual bag-of-words model, finally yielding the visual dictionary tree.
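A minimal Python sketch of this nearest-word assignment, using the Euclidean distance as the (inverse) similarity measure; the array shapes are illustrative:

```python
import numpy as np

def assign_to_words(descriptors, centers):
    """Map each local descriptor (N x D) to the index of the nearest cluster
    center (K x D) under Euclidean distance, i.e. the most similar visual word."""
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)  # N x K
    return np.argmin(dists, axis=1)
```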
In an optional implementation manner of this embodiment, the step of performing, in step S103, feature vector representation on the image set to be processed according to the visual bag-of-words model further includes the following steps:
counting the distribution condition of mapping the local features of the static images to each bag type in the visual bag-of-words model;
and obtaining the characteristic vector of the static image according to the distribution condition.
In this optional implementation, after the local features of the static images have been mapped onto the visual bag-of-words model, the distribution of the local features of each static image over the bag-of-words categories (for example, how often the local features of one static image fall into the different categories) can be counted from the mapping result, and the feature vector of the static image is then obtained from this distribution. For example, if the constructed visual bag-of-words model includes 3 bag-of-words categories and static image A has 10 local features in total, and after mapping 1 local feature falls into the first category, 6 into the second, and 3 into the third, then the feature vector of static image A may be represented as (1, 6, 3); of course, this vector may also be normalized and used as the feature vector of static image A. It can be understood that the dimension of the static image's feature vector equals the number of bag-of-words categories in the visual bag-of-words model, which is also the number of cluster center points.
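A minimal Python sketch of this counting and normalization step; the L1 normalization is one reasonable choice and is not mandated by the description:

```python
import numpy as np

def bow_vector(word_ids, n_words):
    """Count how many local features fall into each bag-of-words category and
    L1-normalize the histogram to get the image's feature vector."""
    hist = np.bincount(word_ids, minlength=n_words).astype(float)
    return hist / hist.sum() if hist.sum() > 0 else hist

# For the example above, 10 features distributed as (1, 6, 3) over 3 categories
# give the normalized feature vector (0.1, 0.6, 0.3).
```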
In an optional implementation manner of this embodiment, the step of performing closed-loop detection according to the result represented by the feature vector in step S103 further includes the following steps:
determining a second similarity between the feature vectors of the static images corresponding to the two images to be processed in the image set to be processed;
and determining the closed-loop detection result of the two images to be processed according to the second similarity.
In this optional implementation, the feature vector of each image to be processed in the image set can be represented by the feature vector of its corresponding static image. Therefore, when determining the closed-loop detection result between two images to be processed, the second similarity between the feature vectors of the static images corresponding to the two images is calculated; when the second similarity is greater than or equal to a preset threshold, the closed-loop detection result of the two images is positive, that is, a closed loop is formed between them; otherwise, no closed loop is formed. The second similarity can be determined from the cosine distance between the feature vectors of the two images to be processed.
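A minimal Python sketch of this decision, using cosine similarity; the threshold value is an illustrative assumption:

```python
import numpy as np

def is_loop_closure(vec_a, vec_b, threshold=0.85):
    """Second similarity: cosine similarity between the two feature vectors.
    A value at or above the (illustrative) threshold is treated as a closed loop."""
    denom = np.linalg.norm(vec_a) * np.linalg.norm(vec_b)
    if denom == 0:
        return False
    return float(np.dot(vec_a, vec_b) / denom) >= threshold
```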
The traditional closed-loop detection algorithm based on local feature extraction has high accuracy in a stable environment, and the deep-learning closed-loop detection algorithm based on global feature extraction has better stability in a complex environment. In order to improve both the stability and the accuracy of closed-loop detection in complex environments, the embodiment of the present disclosure provides a closed-loop detection method based on dynamic/static scene separation and local CNN feature representation. First, in the image preprocessing stage, scene separation is performed on the input image with a dynamic/static scene separation algorithm based on target detection or semantic segmentation, eliminating interference from dynamic objects such as pedestrians and vehicles and producing a filtered, purely static scene image. Then, local feature extraction is performed on the filtered image with a CNN feature extraction algorithm; during this extraction, no features are extracted from the dynamic-scene regions separated from the original image. Finally, the feature point positions and feature descriptors of the filtered image are determined and combined with the traditional closed-loop detection bag-of-words model algorithm: the extracted local CNN feature descriptors are clustered in the bag-of-words model, a visual dictionary tree is constructed, the feature vector of each image is obtained from the constructed dictionary tree, the similarity of feature vectors is compared, and it is determined whether a closed loop is formed.
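Putting the pieces together, a sketch of the whole flow in Python that reuses the hypothetical helper functions from the earlier sketches (mask_dynamic_objects, build_vocabulary, assign_to_words, bow_vector, is_loop_closure); the two callables passed in stand for the YOLOv3 detector and the local CNN feature extractor, and all names are illustrative:

```python
import numpy as np

def closed_loop_detection(images, detect_dynamic_boxes, extract_local_descriptors,
                          n_words=500, threshold=0.85):
    """End-to-end sketch of the pipeline described above."""
    # 1. Image preprocessing: dynamic/static scene separation.
    static_images = [mask_dynamic_objects(img, detect_dynamic_boxes(img)) for img in images]
    # 2. Local CNN feature extraction on the static images.
    per_image_desc = [extract_local_descriptors(img) for img in static_images]
    # 3. Cluster all descriptors to build the visual bag-of-words vocabulary.
    centers = build_vocabulary(np.vstack(per_image_desc), n_words=n_words)
    # 4. Represent every image as a normalized bag-of-words feature vector.
    vectors = [bow_vector(assign_to_words(d, centers), n_words) for d in per_image_desc]
    # 5. Compare every pair of images and report the detected closed loops.
    return [(i, j) for i in range(len(vectors)) for j in range(i + 1, len(vectors))
            if is_loop_closure(vectors[i], vectors[j], threshold)]
```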
The beneficial effects produced by the embodiments of the present disclosure mainly lie in:
1) The closed-loop detection algorithm based on dynamic/static scene separation solves the problems that, under interference from movable objects such as pedestrians and vehicles in the environment, the visual bag-of-words model's grasp of image semantic information is degraded and the global scene information in the image cannot be effectively extracted, thereby improving the accuracy of closed-loop detection.
2) A closed-loop detection method based on local CNN feature representation is provided. Because the traditional closed-loop detection algorithm extracts local features of the image while the deep-learning closed-loop detection algorithm extracts global features, the traditional algorithm based on local features has higher accuracy and precision in the same stable environment. However, in a complex environment (such as severe illumination change), the deep-learning closed-loop detection algorithm is clearly superior to the traditional algorithm.
The embodiment of the disclosure provides a method for combining the advantages of deep learning with those of traditional closed-loop detection. The image is first subjected to local region segmentation and feature extraction with a convolutional neural network; the extracted local CNN feature descriptors are combined with the traditional closed-loop detection bag-of-words model algorithm; the obtained local CNN feature descriptors are clustered in the bag-of-words model with the K-means++ algorithm; a visual dictionary tree is constructed from the K cluster center points; the feature vectors representing the images are then obtained, the similarity between images is compared, and it is determined whether a loop is formed. This method can simultaneously improve the accuracy and stability of closed-loop detection in complex environments.
Fig. 4 shows a flow chart of a method for combining the advantages of deep learning with those of traditional closed-loop detection in an embodiment of the present disclosure. As shown in fig. 4, in the image preprocessing stage, the original image is subjected to dynamic/static scene separation to obtain a static image. Local CNN feature detection is then performed on the static image to obtain the local feature set of each image, the feature points are merged and clustered, the frequency distribution of feature points over the visual words of each image is counted with a histogram statistic in the bag-of-words model, the similarity of any two images is compared, and it is determined whether a closed loop is formed.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods.
Fig. 5 shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 5, the image processing apparatus includes:
a filtering module 501, configured to filter dynamic objects in the to-be-processed images in the to-be-processed image set to obtain static images;
a feature extraction module 502, configured to extract local features in the static image by using a CNN network, and perform clustering on the local features to obtain a plurality of clustering center points;
a closed-loop detection module 503, configured to construct a visual bag-of-words model of the static image according to the plurality of cluster center points, map local features of the static image to the constructed visual bag-of-words model, perform feature vector representation on the image set to be processed according to the visual bag-of-words model, and perform closed-loop detection according to a result of the feature vector representation.
The image processing apparatus provided in the embodiment of the present disclosure corresponds to the image processing method described above, and specific details may be referred to the description of the image processing method, which is not described herein again.
Fig. 6 is a schematic structural diagram of an electronic device suitable for implementing an image processing method according to an embodiment of the present disclosure.
As shown in fig. 6, the electronic device 600 includes a Central Processing Unit (CPU)601, which can execute various processes in the embodiments of the above-described method of the present disclosure according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The CPU601, ROM602, and RAM603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is mounted in the storage section 608 as necessary.
In particular, according to embodiments of the present disclosure, the methods described in the embodiments above may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the methods of the embodiments of the present disclosure. In such embodiments, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the above-described embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.
The foregoing description covers only the preferred embodiments of the disclosure and illustrates the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above features, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example technical solutions formed by replacing the above features with (but not limited to) features disclosed in this disclosure that have similar functions.
Claims (9)
1. An image processing method, comprising:
filtering dynamic objects in the images to be processed in the image set to be processed to obtain static images;
extracting local features in the static image by using a CNN network, and clustering the local features to obtain a plurality of clustering central points;
and constructing a visual bag-of-words model of the static image according to the plurality of clustering center points, mapping local features of the static image to the constructed visual bag-of-words model, expressing feature vectors of the image set to be processed according to the visual bag-of-words model, and performing closed-loop detection according to the result expressed by the feature vectors.
2. The method of claim 1, wherein filtering dynamic objects in the to-be-processed images in the to-be-processed image set to obtain a static image comprises:
detecting a dynamic object in the image to be processed, and extracting the region information of the dynamic object;
and filtering the dynamic object from the image to be processed according to the region information to obtain the static image.
3. The method according to claim 1 or 2, wherein extracting local features in the static image using a CNN network comprises:
processing the static image by utilizing a multi-scale dense full convolution network, and returning key point information in the static image;
locally cutting the static image according to the key point information to obtain a plurality of local image blocks;
and respectively extracting the features of the local image blocks by using a local feature network to obtain the feature point positions and the feature descriptors of the local image blocks.
4. The method of claim 1 or 2, wherein mapping local features of the static image onto the constructed visual bag-of-words model comprises:
determining a first similarity between the local features of the static image and the cluster center point in the visual bag of words model;
and mapping the local features of the static images to a bag of words category to which one of the cluster center points belongs according to the first similarity.
5. The method of claim 4, wherein performing feature vector representation on the set of images to be processed according to the visual bag-of-words model comprises:
counting the distribution condition of mapping the local features of the static images to each bag type in the visual bag-of-words model;
and obtaining the characteristic vector of the static image according to the distribution condition.
6. The method of claim 5, wherein performing closed-loop detection based on the result of the eigenvector representation comprises:
determining a second similarity between the feature vectors of the static images corresponding to the two images to be processed in the image set to be processed;
and determining the closed-loop detection result of the two images to be processed according to the second similarity.
7. An image processing apparatus characterized by comprising:
the filtering module is configured to filter dynamic objects in the images to be processed in the image set to be processed to obtain static images;
the characteristic extraction module is configured to extract local characteristics in the static image by using a CNN network, and cluster the local characteristics to obtain a plurality of cluster central points;
the closed-loop detection module is configured to construct a visual bag-of-words model of the static image according to the plurality of clustering center points, map local features of the static image to the constructed visual bag-of-words model, perform feature vector representation on the image set to be processed according to the visual bag-of-words model, and perform closed-loop detection according to a result of the feature vector representation.
8. An electronic device comprising a memory and a processor; wherein,
the memory is to store one or more computer instructions, wherein the one or more computer instructions are to be executed by the processor to implement the method of any one of claims 1-6.
9. A computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, implement the method of any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911084122.2A CN110852327A (en) | 2019-11-07 | 2019-11-07 | Image processing method, image processing device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911084122.2A CN110852327A (en) | 2019-11-07 | 2019-11-07 | Image processing method, image processing device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110852327A true CN110852327A (en) | 2020-02-28 |
Family
ID=69598753
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911084122.2A Pending CN110852327A (en) | 2019-11-07 | 2019-11-07 | Image processing method, image processing device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110852327A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111582447A (en) * | 2020-04-30 | 2020-08-25 | 电子科技大学 | Closed loop detection method based on multiple network characteristics |
CN111738299A (en) * | 2020-05-27 | 2020-10-02 | 完美世界(北京)软件科技发展有限公司 | Scene static object merging method and device, storage medium and computing equipment |
CN111882002A (en) * | 2020-08-06 | 2020-11-03 | 桂林电子科技大学 | MSF-AM-based low-illumination target detection method |
CN112182275A (en) * | 2020-09-29 | 2021-01-05 | 神州数码信息系统有限公司 | Trademark approximate retrieval system and method based on multi-dimensional feature fusion |
CN112396596A (en) * | 2020-11-27 | 2021-02-23 | 广东电网有限责任公司肇庆供电局 | Closed loop detection method based on semantic segmentation and image feature description |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102831446A (en) * | 2012-08-20 | 2012-12-19 | 南京邮电大学 | Image appearance based loop closure detecting method in monocular vision SLAM (simultaneous localization and mapping) |
CN106919920A (en) * | 2017-03-06 | 2017-07-04 | 重庆邮电大学 | Scene recognition method based on convolution feature and spatial vision bag of words |
CN109902619A (en) * | 2019-02-26 | 2019-06-18 | 上海大学 | Image closed loop detection method and system |
- 2019-11-07 CN CN201911084122.2A patent/CN110852327A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102831446A (en) * | 2012-08-20 | 2012-12-19 | 南京邮电大学 | Image appearance based loop closure detecting method in monocular vision SLAM (simultaneous localization and mapping) |
CN106919920A (en) * | 2017-03-06 | 2017-07-04 | 重庆邮电大学 | Scene recognition method based on convolution feature and spatial vision bag of words |
CN109902619A (en) * | 2019-02-26 | 2019-06-18 | 上海大学 | Image closed loop detection method and system |
Non-Patent Citations (3)
Title |
---|
YUKI ONO et al.: "LF-Net: Learning Local Features from Images", 32nd Conference on Neural Information Processing Systems (NeurIPS 2018) *
LIU Qiang et al.: "A Survey of Loop Closure Detection Methods for Visual SLAM in Complex Environments", Robot *
LIN Hui: "Loop Closure Detection Based on the Fusion of CNN and VLAD", Research and Development *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111582447A (en) * | 2020-04-30 | 2020-08-25 | 电子科技大学 | Closed loop detection method based on multiple network characteristics |
CN111738299A (en) * | 2020-05-27 | 2020-10-02 | 完美世界(北京)软件科技发展有限公司 | Scene static object merging method and device, storage medium and computing equipment |
CN111738299B (en) * | 2020-05-27 | 2023-10-27 | 完美世界(北京)软件科技发展有限公司 | Scene static object merging method and device, storage medium and computing equipment |
CN111882002A (en) * | 2020-08-06 | 2020-11-03 | 桂林电子科技大学 | MSF-AM-based low-illumination target detection method |
CN111882002B (en) * | 2020-08-06 | 2022-05-24 | 桂林电子科技大学 | MSF-AM-based low-illumination target detection method |
CN112182275A (en) * | 2020-09-29 | 2021-01-05 | 神州数码信息系统有限公司 | Trademark approximate retrieval system and method based on multi-dimensional feature fusion |
CN112396596A (en) * | 2020-11-27 | 2021-02-23 | 广东电网有限责任公司肇庆供电局 | Closed loop detection method based on semantic segmentation and image feature description |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Xie et al. | Multilevel cloud detection in remote sensing images based on deep learning | |
Chen et al. | Vehicle detection in high-resolution aerial images via sparse representation and superpixels | |
CN112966691B (en) | Multi-scale text detection method and device based on semantic segmentation and electronic equipment | |
CN110334762B (en) | Feature matching method based on quad tree combined with ORB and SIFT | |
Chen et al. | Vehicle detection in high-resolution aerial images based on fast sparse representation classification and multiorder feature | |
CN111145209B (en) | Medical image segmentation method, device, equipment and storage medium | |
CN102968637B (en) | Complicated background image and character division method | |
CN110852327A (en) | Image processing method, image processing device, electronic equipment and storage medium | |
CN111898432B (en) | Pedestrian detection system and method based on improved YOLOv3 algorithm | |
CN112116599B (en) | Sputum smear tubercle bacillus semantic segmentation method and system based on weak supervised learning | |
CN110175615B (en) | Model training method, domain-adaptive visual position identification method and device | |
CN114359851A (en) | Unmanned target detection method, device, equipment and medium | |
CN105528595A (en) | Method for identifying and positioning power transmission line insulators in unmanned aerial vehicle aerial images | |
Zhang et al. | Road recognition from remote sensing imagery using incremental learning | |
CN113361495A (en) | Face image similarity calculation method, device, equipment and storage medium | |
CN116704490B (en) | License plate recognition method, license plate recognition device and computer equipment | |
CN114332473A (en) | Object detection method, object detection device, computer equipment, storage medium and program product | |
CN113850136A (en) | Yolov5 and BCNN-based vehicle orientation identification method and system | |
CN113223037A (en) | Unsupervised semantic segmentation method and unsupervised semantic segmentation system for large-scale data | |
CN113657196B (en) | SAR image target detection method, SAR image target detection device, electronic equipment and storage medium | |
CN112241736A (en) | Text detection method and device | |
CN113822134A (en) | Instance tracking method, device, equipment and storage medium based on video | |
CN113378837A (en) | License plate shielding identification method and device, electronic equipment and storage medium | |
CN116071625B (en) | Training method of deep learning model, target detection method and device | |
CN112668662A (en) | Outdoor mountain forest environment target detection method based on improved YOLOv3 network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20200228 |