
CN110852327A - Image processing method, image processing device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN110852327A
Authority
CN
China
Prior art keywords
image
static
processed
local
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911084122.2A
Other languages
Chinese (zh)
Inventor
施智平
付超凡
邵振洲
关永
韩旭
张永祥
姜那
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Capital Normal University
Original Assignee
Capital Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Capital Normal University
Priority to CN201911084122.2A priority Critical patent/CN110852327A/en
Publication of CN110852327A publication Critical patent/CN110852327A/en
Legal status: Pending (Current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 - Non-hierarchical techniques using statistics or function optimisation, with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the disclosure provide an image processing method, an image processing apparatus, an electronic device, and a storage medium. The method includes: filtering dynamic objects in the to-be-processed images of a to-be-processed image set to obtain static images; extracting local features from the static images with a CNN network and clustering the local features to obtain a plurality of cluster center points; and constructing a visual bag-of-words model of the static images from the plurality of cluster center points, mapping the local features of the static images onto the constructed visual bag-of-words model, representing the to-be-processed image set as feature vectors according to the visual bag-of-words model, and performing closed-loop detection on the result of the feature vector representation.

Description

Image processing method, image processing device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image technologies, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.
Background
In recent years, with the rapid development of robotics and autonomous driving, Visual Simultaneous Localization and Mapping (vSLAM) has played an increasingly important role. Closed-loop (loop closure) detection is an important link in a vSLAM system: it is mainly used to determine whether the robot's current position lies in a spatial region it has visited before, and it mitigates map-creation failure, position loss, redundant map data, and repeated structures caused by accumulated computation errors in large-scale complex environments (sensor errors, noise, and environmental changes), thereby improving the accuracy and robustness of vSLAM.
Existing closed-loop detection methods fall into two categories: traditional methods based on hand-crafted features and methods based on deep learning. Traditional closed-loop detection algorithms rely on hand-crafted feature descriptors to match image similarity and can be divided into methods based on local feature descriptors (SIFT, SURF, and ORB) and methods based on global feature descriptors (GIST). The most common local-descriptor approach is the Visual Bag of Words (BoVW): local features are extracted from an image, the extracted descriptors are clustered to build a visual dictionary, each image is then represented as a K-dimensional numerical vector according to the visual dictionary tree, and the presence of a closed loop is judged by measuring the distance between image feature vectors. Because traditional hand-crafted feature algorithms extract region features such as edges, corners, lines, curves, and special attributes from local regions of an image, they are accurate when the surrounding environment does not change much. However, because hand-crafted feature algorithms largely depend on human expertise and experience and cannot express an image precisely, they struggle to provide an accurate and robust image feature description when the surrounding environment changes significantly (for example, under strong illumination changes); their stability is poor, which increases the mismatch rate in complex environments.
With the rapid development of deep learning in recent years, CNNs (convolutional neural networks) have been used to learn features from large amounts of data in place of traditionally hand-designed features for closed-loop detection, because a CNN can extract deep-level image features, carries richer data information, and has stronger feature expression capability. When illumination changes significantly, CNN-based image descriptors outperform hand-designed descriptors; however, because CNN-based closed-loop detection algorithms extract global image features, their accuracy in a stable environment without illumination change is lower than that of traditional closed-loop detection algorithms based on local features.
Disclosure of Invention
The embodiment of the disclosure provides an image processing method and device, electronic equipment and a computer-readable storage medium.
In a first aspect, an embodiment of the present disclosure provides an image processing method, including:
filtering dynamic objects in the images to be processed in the image set to be processed to obtain static images;
extracting local features in the static image by using a CNN network, and clustering the local features to obtain a plurality of clustering central points;
and constructing a visual bag-of-words model of the static image according to the plurality of clustering center points, mapping local features of the static image to the constructed visual bag-of-words model, expressing feature vectors of the image set to be processed according to the visual bag-of-words model, and performing closed-loop detection according to the result expressed by the feature vectors.
Further, filtering the dynamic object in the to-be-processed image set to obtain a static image, including:
detecting a dynamic object in the image to be processed, and extracting the region information of the dynamic object;
and filtering the dynamic object from the image to be processed according to the region information to obtain the static image.
Further, extracting local features in the static image by using a CNN network, comprising:
processing the static image by utilizing a multi-scale dense full convolution network, and returning key point information in the static image;
locally cutting the static image according to the key point information to obtain a plurality of local image blocks;
and respectively extracting the features of the local image blocks by using a local feature network to obtain the feature point positions and the feature descriptors of the local image.
Further, mapping local features of the static image onto the constructed visual bag-of-words model, comprising:
determining a first similarity between the local features of the static image and the cluster center point in the visual bag of words model;
and mapping the local features of the static images to a bag of words category to which one of the cluster center points belongs according to the first similarity.
Further, performing feature vector representation on the image set to be processed according to the visual bag-of-words model, including:
counting the distribution condition of mapping the local features of the static images to each bag type in the visual bag-of-words model;
and obtaining the characteristic vector of the static image according to the distribution condition.
Further, performing closed-loop detection according to the result represented by the feature vector, including:
determining a second similarity between the feature vectors of the static images corresponding to the two images to be processed in the image set to be processed;
and determining the closed-loop detection results of the two images to be processed according to the second similarity.
In a second aspect, an embodiment of the present disclosure provides an image processing apparatus, including:
the filtering module is configured to filter dynamic objects in the images to be processed in the image set to be processed to obtain static images;
the characteristic extraction module is configured to extract local characteristics in the static image by using a CNN network, and cluster the local characteristics to obtain a plurality of cluster central points;
the closed-loop detection module is configured to construct a visual bag-of-words model of the static image according to the plurality of clustering center points, map local features of the static image to the constructed visual bag-of-words model, perform feature vector representation on the image set to be processed according to the visual bag-of-words model, and perform closed-loop detection according to a result of the feature vector representation.
These functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
In one possible design, the image processing apparatus includes a memory and a processor, where the memory is configured to store one or more computer instructions that support the image processing apparatus in performing the method of the first aspect, and the processor is configured to execute the computer instructions stored in the memory. The image processing apparatus may further include a communication interface for communicating with other devices or a communication network.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of the first aspect.
In a fourth aspect, an embodiment of the disclosure provides a computer-readable storage medium storing computer instructions for the image processing apparatus, including computer instructions for performing the method according to the first aspect.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
the method comprises the steps of firstly utilizing a convolutional neural network to carry out local region segmentation and feature extraction on an image, combining an extracted image local CNN feature descriptor with a traditional closed-loop detection bag-of-words model algorithm, clustering the obtained image local CNN feature descriptor in the bag-of-words model by utilizing a clustering center algorithm, constructing a visual dictionary tree according to a plurality of clustered central points, further obtaining a feature vector representing the image, carrying out similarity comparison on the image, and judging whether a loop is generated. The method can simultaneously improve the accuracy and stability of closed-loop detection in a complex environment.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 shows a flow diagram of an image processing method according to an embodiment of the present disclosure;
FIG. 2 shows a flow chart of step S101 according to the embodiment shown in FIG. 1;
FIG. 3 shows a flowchart of step S102 according to the embodiment shown in FIG. 1;
FIG. 4 shows a flow diagram of a method for combining a deep learning advantage with a conventional closed loop detection advantage in an embodiment of the present disclosure;
fig. 5 shows a block diagram of the structure of an image processing apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device suitable for implementing an image processing method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.
It should be further noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates a flowchart of an image processing method according to an embodiment of the present disclosure. As shown in fig. 1, the image processing method includes the steps of:
in step S101, filtering the dynamic objects in the to-be-processed image set to obtain a static image;
in step S102, a CNN network is used to extract local features in the static image, and the local features are clustered to obtain a plurality of cluster center points;
in step S103, a visual bag-of-words model of the static image is constructed according to the plurality of cluster center points, local features of the static image are mapped onto the constructed visual bag-of-words model, feature vector representation is performed on the image set to be processed according to the visual bag-of-words model, and closed-loop detection is performed according to a result of the feature vector representation.
In this embodiment, the image set to be processed may be a set formed by a plurality of images to be processed for closed-loop detection. The dynamic object is an object which is in dynamic change in the image to be processed, such as a person, an animal, a vehicle, and the like.
According to the embodiment of the disclosure, before closed-loop detection is performed on the to-be-processed images in the to-be-processed image set, the dynamic objects in the to-be-processed images are filtered out to obtain static images. In some embodiments, the pixels in the image region where a dynamic object is located may be set to a preset color, for example black; because a fully black region contains no feature points, interference from feature points on the dynamic object is eliminated.
For the plurality of static images derived from the to-be-processed image set, a CNN network may be used to extract local features. In the process of extracting local features from a static image with the CNN network, a first part of the CNN network first extracts key point information from the static image, and the static image is cropped into a plurality of local region image blocks based on the key points, each local region image block containing one key point extracted from the static image. Then, for each local region image block, a second part of the CNN network extracts key point information; the key point information extracted from the plurality of local region image blocks corresponding to the static image is used as the local features of the static image. For example, the feature vectors of the key points extracted from the plurality of local region image blocks corresponding to the static image may be used as the local features of the static image.
After the local features of the static image corresponding to each to-be-processed image in the to-be-processed image set are obtained, the local features may be clustered with a center clustering algorithm to obtain a plurality of cluster center points. The center clustering algorithm may be the K-means++ algorithm, which is a known algorithm, so the specific clustering process is not repeated here.
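As a non-limiting illustration only, the following Python sketch clusters a stack of local CNN feature descriptors using scikit-learn's KMeans with k-means++ initialization; the descriptor dimension, the number of visual words, and all variable names are assumptions introduced for this example rather than parameters fixed by the embodiment.

```python
# Illustrative sketch (not the patent's reference implementation): clustering
# local CNN descriptors with k-means++ initialization via scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

def build_cluster_centers(descriptors, num_words=500, seed=0):
    """descriptors: (N, D) array stacking the local CNN descriptors of all
    static images; returns a (num_words, D) array of cluster center points."""
    kmeans = KMeans(n_clusters=num_words, init="k-means++", n_init=10,
                    random_state=seed)
    kmeans.fit(descriptors)
    return kmeans.cluster_centers_

# Example with random stand-in descriptors (256-D is a placeholder dimension).
descriptors = np.random.rand(10000, 256).astype(np.float32)
centers = build_cluster_centers(descriptors, num_words=500)
print(centers.shape)  # (500, 256)
```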
After clustering is completed, a visual bag-of-words model of the static images corresponding to all the to-be-processed images in the to-be-processed image set can be constructed from the cluster center points. The constructed visual bag-of-words model includes a plurality of bag-of-words categories; each bag-of-words category represents one visual word, and its center point is one of the cluster center points. When a local feature is similar to a cluster center point, the local feature may be classified into the bag-of-words category to which that cluster center point belongs.
After the visual bag-of-words model is constructed, the local features of a static image can be mapped onto the visual bag-of-words model according to the similarity between the local features and the cluster center points; that is, the local features of the static image are classified into the various bag-of-words categories included in the visual bag-of-words model. After the mapping is completed, the static image is represented as a feature vector according to the mapping result, and the feature vector representation of the static image can be used as the feature vector representation of the corresponding to-be-processed image. In some embodiments, the feature vector representation of the static image may be derived from the distribution of its local features across the bag-of-words categories after those local features are mapped onto the visual bag-of-words model.
After the feature vector representation of each to-be-processed image in the to-be-processed image set is obtained, the closed-loop detection result between any two to-be-processed images can be determined from their feature vector representations.
In an optional implementation manner of this embodiment, as shown in fig. 2, the step S101, namely, the step of filtering the dynamic object in the to-be-processed image set to obtain the static image, further includes the following steps:
in step S201, detecting a dynamic object in the image to be processed, and extracting area information of the dynamic object;
in step S202, the dynamic object is filtered from the image to be processed according to the region information, so as to obtain the static image.
In the image preprocessing stage, scene recognition and dynamic/static scene separation are performed on the image, and a YOLOv3 model is pre-trained with the PASCAL VOC2007 and VOC2012 data sets before the scene separation. These data sets contain many outdoor moving objects such as people, animals, and vehicles, together with the positions and class labels of the objects in the pictures. PASCAL VOC2007 contains 20 classes, 9963 images, and 24640 annotated objects, and PASCAL VOC2012 contains 20 classes, 11530 images, and 27450 annotated objects.
The object classes in the PASCAL data sets cover the moving objects that an autonomous mobile robot may encounter during outdoor operation, and most of the outdoor moving objects in the data sets used in the embodiment of the disclosure are people, vehicles, and the like, so a convolutional neural network trained on these data sets can easily identify the dynamic objects, which are then filtered out of the to-be-processed image.
After the training of the convolutional neural network for identifying the dynamic objects in the images is completed, the dynamic and static scene identification and separation can be performed on the images to be processed in the image set to be processed by using a dynamic and static scene separation algorithm based on target detection.
In the embodiment of the present disclosure, a pre-trained YOLOv3 target detection model is used to perform dynamic object detection on the to-be-processed image, and the dynamic-object region information in the image is extracted, where the region information includes the rectangular region where a dynamic object is located, the region size, and the like. The corresponding positions of the dynamic-object regions are then separated in the original image, and dynamic objects such as pedestrians and vehicles are removed from the original image.
In some embodiments, the scene information separation is performed on the matrix corresponding to the image pixels: the dynamic-object region is blacked out in the original image, and because a fully black region contains no feature points, feature-point interference from the dynamic object is eliminated. Consequently, when local features are extracted from the image later, no feature extraction is performed in the separated dynamic scene region. The remaining region is the filtered, purely static scene image with the dynamic objects removed, which is what we need.
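A minimal sketch of this region-blackening step is given below, assuming the detector returns axis-aligned bounding boxes and class names; the box format, the set of dynamic classes, and the function names are illustrative assumptions rather than parts of the embodiment.

```python
# Sketch: black out detected dynamic-object regions so that no feature points
# can be extracted there. Box layout and class names are assumptions.
import numpy as np

DYNAMIC_CLASSES = {"person", "car", "bus", "bicycle", "motorbike"}  # example subset

def filter_dynamic_objects(image, detections):
    """image: (H, W, 3) uint8 array; detections: list of (class_name, x1, y1, x2, y2).
    Returns a copy of the image with dynamic-object regions set to black."""
    static_image = image.copy()
    for class_name, x1, y1, x2, y2 in detections:
        if class_name in DYNAMIC_CLASSES:
            static_image[y1:y2, x1:x2, :] = 0  # fully black region has no feature points
    return static_image
```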
In the embodiment of the disclosure, a dynamic/static scene separation algorithm based on image semantic segmentation may also be used in the image preprocessing stage to perform scene semantic segmentation and separation on the image; a DeepLabv3+ semantic segmentation model is pre-trained with the PASCAL VOC2007 and PASCAL VOC2012 data sets before the scene separation.
The pre-trained DeepLabv3+ model is then used to semantically segment the input to-be-processed image, and the result provides semantic information about the dynamic and static scenes of the image. The semantic information of the dynamic objects (including the image pixels of the regions where the dynamic objects are located) is extracted from this result, the semantic information of the whole image is traversed over the pixel matrix, and wherever a pixel carries an object class annotated during training, the dynamic-object semantic information is removed from the original image; that is, the pixels belonging to dynamic objects such as pedestrians and vehicles are directly blacked out at the corresponding positions in the original image, yielding a static image from which only the dynamic-object information has been removed.
In some embodiments, to avoid incomplete separation of the edge information of a dynamic object and to ensure that the dynamic-object semantic information is removed cleanly and thoroughly, image dilation may further be applied to the separated dynamic-object region information (that is, after the image region where the dynamic object is located has been blacked out, the black region is dilated), with the number of iterations set to 2.
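The following sketch illustrates the semantic-segmentation variant together with the dilation step, assuming a per-pixel class-id label map and OpenCV's dilate; the class ids, the 3 x 3 kernel, and the function names are assumptions made only for this example.

```python
# Sketch: black out dynamic-class pixels from a semantic label map, then
# dilate the mask (2 iterations, as described above) to remove object edges.
import cv2
import numpy as np

def filter_by_semantic_mask(image, label_map, dynamic_class_ids, iterations=2):
    """image: (H, W, 3) uint8; label_map: (H, W) integer class ids per pixel."""
    mask = np.isin(label_map, list(dynamic_class_ids)).astype(np.uint8)
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.dilate(mask, kernel, iterations=iterations)  # expand dynamic regions
    static_image = image.copy()
    static_image[mask.astype(bool)] = 0  # black out dynamic-object pixels
    return static_image
```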
In an optional implementation manner of this embodiment, as shown in fig. 3, the step S102, namely the step of extracting the local feature in the static image by using the CNN network, further includes the following steps:
in step S301, the static image is processed by using a multi-scale dense full convolution network, and the key point information in the static image is returned;
in step S302, locally clipping the static image according to the key point information to obtain a plurality of local image blocks;
in step S303, a local feature network is used to perform feature extraction on the plurality of local image blocks respectively, so as to obtain feature point positions and feature descriptors of the local image blocks.
In this optional implementation, the CNN network may be the LF-Net network, a sparse matching method with a deep architecture, and it can be divided into two parts. The first part may be a multi-scale dense full convolution network that returns the key point positions, scales, orientations, and the like of the static image; the static image is cropped into a plurality of local region image blocks around the key point positions. The second part may be a local descriptor network that extracts descriptors from the key-point-based cropped image blocks produced by the first part. When local features are extracted from a static image, a full convolution network first generates a rich feature map o from the input image I. Scale-invariant key point detection is performed on the generated feature map o, the top M pixels are selected as feature points according to the scale-invariant mapping, and the feature point positions and feature descriptors of the local image blocks are finally obtained.
The multi-scale dense full convolution network is a full convolution network structure designed along the lines of MSDNet (multi-scale dense network). The network has no fully connected layer and uses three simple ResNet-style blocks; each block consists of 5 x 5 convolution filters, followed by batch normalization, Leaky ReLU activation, and another group of 5 x 5 convolutions. It adopts the MSDNet idea of using multiple scales to obtain abstract features of the image and generate a feature map, combining two kinds of connections in series: 1. convolution of the same-scale features of the previous level and down-sampling (diagonal connection) of the larger-scale feature map of the previous level; 2. each layer is connected to the other convolutional layers, so that during back-propagation the weights are updated in the direction that best extracts the result of each feature.
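Purely as an interpretation of the block described above, the following PyTorch sketch shows one residual block consisting of a 5 x 5 convolution, batch normalization, Leaky ReLU, and another 5 x 5 convolution; the channel count, padding, negative slope, and residual wiring are assumptions and not a reproduction of the LF-Net implementation.

```python
# Rough sketch of one keypoint-detector block as described above; channel
# counts, padding and the skip connection are assumptions for illustration.
import torch
import torch.nn as nn

class KeypointBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.1),
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
        )

    def forward(self, x):
        return x + self.body(x)  # ResNet-style skip connection

# Example: apply the block to a 64-channel feature map.
x = torch.randn(1, 64, 120, 160)
print(KeypointBlock()(x).shape)  # torch.Size([1, 64, 120, 160])
```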
After the key points are selected, image blocks around each key point are cropped from the normalized image (the image obtained by normalizing the static image) using bilinear sampling (so as to preserve differentiability). Bilinear sampling, i.e., the bilinear interpolation algorithm, is a good image scaling algorithm: it uses the four real pixel values surrounding a virtual point in the image to jointly determine one pixel value in the target image, thereby achieving the image cropping.
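The following plain-NumPy sketch illustrates bilinear sampling and key-point-centered patch cropping; the patch size and function names are assumptions introduced only for illustration.

```python
# Sketch: each resampled value is a weighted combination of the four real
# pixels surrounding the virtual sample point (bilinear interpolation).
import numpy as np

def bilinear_sample(image, x, y):
    """image: (H, W) float array; (x, y): continuous coordinates."""
    h, w = image.shape
    x = float(np.clip(x, 0, w - 1))
    y = float(np.clip(y, 0, h - 1))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * image[y0, x0] + wx * (1 - wy) * image[y0, x1]
            + (1 - wx) * wy * image[y1, x0] + wx * wy * image[y1, x1])

def crop_patch(image, cx, cy, size=32):
    """Resample a size x size patch centred on key point (cx, cy)."""
    offsets = np.arange(size) - (size - 1) / 2.0
    return np.array([[bilinear_sample(image, cx + dx, cy + dy) for dx in offsets]
                     for dy in offsets])
```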
After a plurality of local region image blocks of the static image are obtained with the first part of the network, the second part of the network extracts features from the cropped local region image blocks (including the key point positions, sizes, feature vectors, and the like of the local region image blocks), and a set of local feature descriptors of the static image is finally obtained for the subsequent clustering and construction of the visual bag-of-words model. Compared with traditional hand-crafted feature extraction algorithms, the LF-Net network extracts richer features, is hardly affected by conditions such as illumination, and has stronger stability and higher accuracy.
When local features are extracted from the image with the above method of the embodiment of the disclosure, no feature extraction is performed in the dynamic scene region separated from the original image; that is, when a region is detected to be a dynamic scene region during feature extraction, it is skipped and no local features are extracted there, so the dynamic scene region no longer participates in the feature representation of the image. The feature point positions and feature descriptors of the filtered image are finally determined.
In an optional implementation manner of this embodiment, the step of mapping the local features of the static image onto the constructed visual bag-of-words model in step S103 further includes the following steps:
determining a first similarity between the local features of the static image and the cluster center point in the visual bag of words model;
and mapping the local features of the static images to a bag of words category to which one of the cluster center points belongs according to the first similarity.
In this optional implementation, the obtained local features of the static images are combined with the closed-loop detection bag-of-words model algorithm; that is, the extracted set of local feature descriptors of the static images is clustered with the K-means++ center clustering algorithm, and a visual bag-of-words model is constructed from the resulting cluster center points.
In some embodiments, after the plurality of cluster center points are obtained by applying the center clustering algorithm to the local features of the static images corresponding to the to-be-processed images in the to-be-processed image set, the local features of each static image may be mapped to the cluster center points. The mapping may be performed by calculating a first similarity between a local feature and the cluster center points and mapping according to that similarity. For example, for a local feature B of static image A, a first similarity between the feature descriptor of B and the feature descriptor of each cluster center point Ci may be calculated; the first similarity may be determined from the Euclidean distance, and B is mapped to the bag-of-words category corresponding to the cluster center point Ci with the largest first similarity. In this way, all local features extracted from all static images can be mapped onto the visual bag-of-words model, and the visual dictionary tree is finally obtained.
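Under the interpretation above (largest first similarity corresponds to smallest Euclidean distance), a minimal sketch of the mapping step might look as follows; the array shapes and names are assumptions.

```python
# Sketch: assign each local descriptor to the bag-of-words category whose
# cluster center is nearest in Euclidean distance (i.e. most similar).
import numpy as np

def assign_to_words(descriptors, centers):
    """descriptors: (N, D); centers: (K, D). Returns (N,) word indices."""
    # Pairwise Euclidean distances between descriptors and cluster centers.
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(dists, axis=1)  # smallest distance = largest first similarity
```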
In an optional implementation manner of this embodiment, the step of performing, in step S103, feature vector representation on the image set to be processed according to the visual bag-of-words model further includes the following steps:
counting the distribution condition of mapping the local features of the static images to each bag type in the visual bag-of-words model;
and obtaining the characteristic vector of the static image according to the distribution condition.
In this optional implementation, after the local features of the static images have been mapped onto the visual bag-of-words model, the distribution of the local features of each static image over the bag-of-words categories (for example, the frequency with which the local features of one static image fall into the different bag-of-words categories) may be counted from the mapping result, and the feature vector of the static image is then obtained from this distribution. For example, suppose the constructed visual bag-of-words model includes 3 bag-of-words categories and static image A has 10 local features in total; after the 10 local features are mapped onto the 3 categories, 1 local feature falls into the first category, 6 into the second, and 3 into the third, so the feature vector of static image A may be represented as (1, 6, 3). Of course, this feature vector may also be normalized and used as the feature vector of static image A. It can be understood that the dimension of a static image's feature vector equals the number of bag-of-words categories in the visual bag-of-words model, which is also the number of cluster center points.
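The worked (1, 6, 3) example above can be reproduced with the following sketch, which counts the local features falling into each bag-of-words category and normalizes the resulting histogram; the L2 normalization is an assumption, since the embodiment does not fix a particular normalization.

```python
# Sketch: bag-of-words histogram as the image's feature vector.
import numpy as np

def bow_feature_vector(word_indices, num_words):
    hist = np.bincount(word_indices, minlength=num_words).astype(np.float64)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# 10 local features mapped onto 3 bag-of-words categories: 1, 6 and 3 features.
word_indices = np.array([0] + [1] * 6 + [2] * 3)
print(bow_feature_vector(word_indices, num_words=3))  # normalised (1, 6, 3)
```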
In an optional implementation manner of this embodiment, the step of performing closed-loop detection according to the result represented by the feature vector in step S103 further includes the following steps:
determining a second similarity between the feature vectors of the static images corresponding to the two images to be processed in the image set to be processed;
and determining the closed-loop detection results of the two images to be processed according to the second similarity.
In this optional implementation, the feature vector of each to-be-processed image in the to-be-processed image set may be represented by the feature vector of its corresponding static image. Therefore, when determining the closed-loop detection result between two to-be-processed images, a second similarity between the feature vectors of their corresponding static images is calculated; when the second similarity is greater than or equal to a preset threshold, the closed-loop detection result of the two to-be-processed images is positive, that is, a closed loop is formed between them, and otherwise no closed loop is formed. The second similarity may be determined from the cosine distance between the feature vectors of the two to-be-processed images.
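A minimal sketch of this decision rule is given below; the threshold value of 0.8 is an assumption chosen only for illustration and is not specified by the embodiment.

```python
# Sketch: loop-closure decision via cosine similarity against a preset threshold.
import numpy as np

def is_loop_closure(vec_a, vec_b, threshold=0.8):
    denom = np.linalg.norm(vec_a) * np.linalg.norm(vec_b)
    if denom == 0:
        return False
    cosine_similarity = float(np.dot(vec_a, vec_b) / denom)
    return cosine_similarity >= threshold
```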
Traditional closed-loop detection algorithms based on local feature extraction achieve high accuracy in stable environments, while deep-learning closed-loop detection algorithms based on global feature extraction have better stability in complex environments. To improve both the stability and the accuracy of closed-loop detection in complex environments, the embodiment of the disclosure provides a closed-loop detection method based on dynamic/static scene separation and local CNN feature representation. First, in the image preprocessing stage, the input image is separated into dynamic and static scenes with a separation algorithm based on target detection or semantic segmentation, interference from dynamic objects such as pedestrians and vehicles is eliminated, and a filtered, purely static scene image is obtained. Local features are then extracted from the filtered image with a CNN feature extraction algorithm; during this extraction, no features are extracted from the dynamic scene regions separated from the original image. Finally, the feature point positions and feature descriptors of the filtered image are determined and combined with the traditional closed-loop detection bag-of-words model algorithm: the extracted local CNN feature descriptors are clustered in the bag-of-words model, a visual dictionary tree is constructed, the feature vector of each image is obtained from the constructed dictionary tree, the similarity of the feature vectors is compared, and it is judged whether a closed loop is formed.
The beneficial effects produced by the embodiments of the disclosure mainly lie in the following:
1) The closed-loop detection algorithm based on dynamic/static scene separation solves the problems that, under interference from movable objects such as pedestrians and vehicles in the environment, the visual bag-of-words model's grasp of the image's semantic information is reduced and the global scene information in the image cannot be effectively extracted, thereby improving the accuracy of closed-loop detection.
2) A closed-loop detection method based on local CNN feature representation is provided. Because traditional closed-loop detection algorithms extract local image features while deep-learning-based closed-loop detection algorithms extract global image features, the traditional local-feature algorithms have higher accuracy and precision than the deep-learning algorithms in the same stable environment; in complex environments (such as drastic illumination changes), however, the deep-learning-based algorithms are clearly superior to the traditional ones.
The embodiment of the disclosure provides a method that combines the advantages of deep learning with those of traditional closed-loop detection. A convolutional neural network is first used to perform local region segmentation and feature extraction on an image; the extracted local CNN feature descriptors are combined with the traditional closed-loop detection bag-of-words model algorithm, the local CNN feature descriptors are clustered in the bag-of-words model with the K-means++ algorithm, a visual dictionary tree is constructed from the K cluster center points, feature vectors representing the images are obtained, image similarity is compared, and it is judged whether a loop is produced. The method improves both the accuracy and the stability of closed-loop detection in complex environments.
Fig. 4 shows a flow chart of the method combining the advantages of deep learning with those of traditional closed-loop detection in the embodiment of the disclosure. As shown in Fig. 4, in the image preprocessing stage, the original image undergoes dynamic/static scene separation to obtain a scene-separated static image. Local CNN feature detection is performed on the static image to obtain the sets of local image features, the merged feature points are clustered, the frequency distribution of the feature points over the bags of words of each image is counted with the histogram statistics of the bag-of-words model, the similarity of any two images is compared, and it is judged whether a closed loop is formed.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods.
Fig. 5 shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 5, the image processing apparatus includes:
a filtering module 501, configured to filter dynamic objects in the to-be-processed images in the to-be-processed image set to obtain static images;
a feature extraction module 502, configured to extract local features in the static image by using a CNN network, and perform clustering on the local features to obtain a plurality of clustering center points;
a closed-loop detection module 503, configured to construct a visual bag-of-words model of the static image according to the plurality of cluster center points, map local features of the static image to the constructed visual bag-of-words model, perform feature vector representation on the image set to be processed according to the visual bag-of-words model, and perform closed-loop detection according to a result of the feature vector representation.
The image processing apparatus provided in the embodiment of the present disclosure corresponds to the image processing method described above; for specific details, reference may be made to the description of the image processing method, which is not repeated here.
Fig. 6 is a schematic structural diagram of an electronic device suitable for implementing an image processing method according to an embodiment of the present disclosure.
As shown in fig. 6, the electronic device 600 includes a central processing unit (CPU) 601, which can perform the various processes in the above method embodiments of the disclosure according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores the various programs and data necessary for the operation of the electronic device 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as needed.
In particular, according to embodiments of the present disclosure, the methods described above with reference to the embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the methods of the embodiments of the present disclosure. In such embodiments, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the above-described embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.
The foregoing description covers only the preferred embodiments of the disclosure and explains the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above features, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, technical solutions in which the above features are replaced with features having similar functions disclosed in (but not limited to) this disclosure.

Claims (9)

1. An image processing method, comprising:
filtering dynamic objects in the images to be processed in the image set to be processed to obtain static images;
extracting local features in the static image by using a CNN network, and clustering the local features to obtain a plurality of clustering central points;
and constructing a visual bag-of-words model of the static image according to the plurality of clustering center points, mapping local features of the static image to the constructed visual bag-of-words model, expressing feature vectors of the image set to be processed according to the visual bag-of-words model, and performing closed-loop detection according to the result expressed by the feature vectors.
2. The method of claim 1, wherein filtering dynamic objects in the to-be-processed images in the to-be-processed image set to obtain a static image comprises:
detecting a dynamic object in the image to be processed, and extracting the region information of the dynamic object;
and filtering the dynamic object from the image to be processed according to the region information to obtain the static image.
3. The method according to claim 1 or 2, wherein extracting local features in the static image using a CNN network comprises:
processing the static image by utilizing a multi-scale dense full convolution network, and returning key point information in the static image;
locally cutting the static image according to the key point information to obtain a plurality of local image blocks;
and respectively extracting the features of the local image blocks by using a local feature network to obtain the feature point positions and the feature descriptors of the local image.
4. The method of claim 1 or 2, wherein mapping local features of the static image onto the constructed visual bag-of-words model comprises:
determining a first similarity between the local features of the static image and the cluster center point in the visual bag of words model;
and mapping the local features of the static images to a bag of words category to which one of the cluster center points belongs according to the first similarity.
5. The method of claim 4, wherein performing feature vector representation on the set of images to be processed according to the visual bag-of-words model comprises:
counting the distribution condition of mapping the local features of the static images to each bag type in the visual bag-of-words model;
and obtaining the characteristic vector of the static image according to the distribution condition.
6. The method of claim 5, wherein performing closed-loop detection based on the result of the eigenvector representation comprises:
determining a second similarity between the feature vectors of the static images corresponding to the two images to be processed in the image set to be processed;
and determining the closed-loop detection results of the two images to be processed according to the second similarity.
7. An image processing apparatus characterized by comprising:
the filtering module is configured to filter dynamic objects in the images to be processed in the image set to be processed to obtain static images;
the characteristic extraction module is configured to extract local characteristics in the static image by using a CNN network, and cluster the local characteristics to obtain a plurality of cluster central points;
the closed-loop detection module is configured to construct a visual bag-of-words model of the static image according to the plurality of clustering center points, map local features of the static image to the constructed visual bag-of-words model, perform feature vector representation on the image set to be processed according to the visual bag-of-words model, and perform closed-loop detection according to a result of the feature vector representation.
8. An electronic device comprising a memory and a processor; wherein,
the memory is to store one or more computer instructions, wherein the one or more computer instructions are to be executed by the processor to implement the method of any one of claims 1-6.
9. A computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, implement the method of any of claims 1-6.
CN201911084122.2A 2019-11-07 2019-11-07 Image processing method, image processing device, electronic equipment and storage medium Pending CN110852327A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911084122.2A CN110852327A (en) 2019-11-07 2019-11-07 Image processing method, image processing device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911084122.2A CN110852327A (en) 2019-11-07 2019-11-07 Image processing method, image processing device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110852327A true CN110852327A (en) 2020-02-28

Family

ID=69598753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911084122.2A Pending CN110852327A (en) 2019-11-07 2019-11-07 Image processing method, image processing device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110852327A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582447A (en) * 2020-04-30 2020-08-25 电子科技大学 Closed loop detection method based on multiple network characteristics
CN111738299A (en) * 2020-05-27 2020-10-02 完美世界(北京)软件科技发展有限公司 Scene static object merging method and device, storage medium and computing equipment
CN111882002A (en) * 2020-08-06 2020-11-03 桂林电子科技大学 MSF-AM-based low-illumination target detection method
CN112182275A (en) * 2020-09-29 2021-01-05 神州数码信息系统有限公司 Trademark approximate retrieval system and method based on multi-dimensional feature fusion
CN112396596A (en) * 2020-11-27 2021-02-23 广东电网有限责任公司肇庆供电局 Closed loop detection method based on semantic segmentation and image feature description

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831446A (en) * 2012-08-20 2012-12-19 南京邮电大学 Image appearance based loop closure detecting method in monocular vision SLAM (simultaneous localization and mapping)
CN106919920A (en) * 2017-03-06 2017-07-04 重庆邮电大学 Scene recognition method based on convolution feature and spatial vision bag of words
CN109902619A (en) * 2019-02-26 2019-06-18 上海大学 Image closed loop detection method and system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831446A (en) * 2012-08-20 2012-12-19 南京邮电大学 Image appearance based loop closure detecting method in monocular vision SLAM (simultaneous localization and mapping)
CN106919920A (en) * 2017-03-06 2017-07-04 重庆邮电大学 Scene recognition method based on convolution feature and spatial vision bag of words
CN109902619A (en) * 2019-02-26 2019-06-18 上海大学 Image closed loop detection method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YUKI ONO et al.: "LF-Net: Learning Local Features from Images", 32nd Conference on Neural Information Processing Systems (NeurIPS 2018) *
LIU Qiang et al.: "A Survey of Loop Closure Detection Methods for Visual SLAM in Complex Environments", Robot *
LIN Hui: "Loop Closure Detection Based on Fusion of CNN and VLAD", Research and Development *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582447A (en) * 2020-04-30 2020-08-25 电子科技大学 Closed loop detection method based on multiple network characteristics
CN111738299A (en) * 2020-05-27 2020-10-02 完美世界(北京)软件科技发展有限公司 Scene static object merging method and device, storage medium and computing equipment
CN111738299B (en) * 2020-05-27 2023-10-27 完美世界(北京)软件科技发展有限公司 Scene static object merging method and device, storage medium and computing equipment
CN111882002A (en) * 2020-08-06 2020-11-03 桂林电子科技大学 MSF-AM-based low-illumination target detection method
CN111882002B (en) * 2020-08-06 2022-05-24 桂林电子科技大学 MSF-AM-based low-illumination target detection method
CN112182275A (en) * 2020-09-29 2021-01-05 神州数码信息系统有限公司 Trademark approximate retrieval system and method based on multi-dimensional feature fusion
CN112396596A (en) * 2020-11-27 2021-02-23 广东电网有限责任公司肇庆供电局 Closed loop detection method based on semantic segmentation and image feature description

Similar Documents

Publication Publication Date Title
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
Chen et al. Vehicle detection in high-resolution aerial images via sparse representation and superpixels
CN112966691B (en) Multi-scale text detection method and device based on semantic segmentation and electronic equipment
CN110334762B (en) Feature matching method based on quad tree combined with ORB and SIFT
Chen et al. Vehicle detection in high-resolution aerial images based on fast sparse representation classification and multiorder feature
CN111145209B (en) Medical image segmentation method, device, equipment and storage medium
CN102968637B (en) Complicated background image and character division method
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN111898432B (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN112116599B (en) Sputum smear tubercle bacillus semantic segmentation method and system based on weak supervised learning
CN110175615B (en) Model training method, domain-adaptive visual position identification method and device
CN114359851A (en) Unmanned target detection method, device, equipment and medium
CN105528595A (en) Method for identifying and positioning power transmission line insulators in unmanned aerial vehicle aerial images
Zhang et al. Road recognition from remote sensing imagery using incremental learning
CN113361495A (en) Face image similarity calculation method, device, equipment and storage medium
CN116704490B (en) License plate recognition method, license plate recognition device and computer equipment
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN113850136A (en) Yolov5 and BCNN-based vehicle orientation identification method and system
CN113223037A (en) Unsupervised semantic segmentation method and unsupervised semantic segmentation system for large-scale data
CN113657196B (en) SAR image target detection method, SAR image target detection device, electronic equipment and storage medium
CN112241736A (en) Text detection method and device
CN113822134A (en) Instance tracking method, device, equipment and storage medium based on video
CN113378837A (en) License plate shielding identification method and device, electronic equipment and storage medium
CN116071625B (en) Training method of deep learning model, target detection method and device
CN112668662A (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20200228