CN116563538A - Image segmentation method and system - Google Patents
- Publication number
- CN116563538A, CN202310474603.4A
- Authority
- CN
- China
- Prior art keywords
- image
- feature
- module
- segmentation
- multiple scales
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention provides an image segmentation method and system, relating to the technical field of image processing. The method comprises: acquiring an image to be segmented; and inputting the image to be segmented into a target segmentation network to obtain a segmentation result, wherein the target segmentation network is trained based on a sample data set and comprises an encoder module, a super-pixel pooling module, an attention mechanism module and a decoder module. The system performs the method. According to the invention, the image to be segmented is downsampled based on the super-pixel pooling module to obtain first images at multiple scales; the first feature map of the image to be segmented, extracted by the encoder module, is pooled based on the first images at multiple scales; and the final segmentation result is obtained by combining the attention mechanism with a feature fusion operation, which improves the accuracy of segmenting the image to be segmented.
Description
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image segmentation method and system.
Background
Image segmentation is the technique and process of dividing an image into several specific regions with unique properties and extracting objects of interest. It is a key step from image processing to image analysis; without accurate segmentation, images cannot be analyzed accurately.
Most existing image segmentation methods rely on the brightness and color of pixels in an image and have difficulty extracting the global semantic information in image features, so the accuracy of image segmentation is low.
Disclosure of Invention
The image segmentation method and system provided by the invention are used for solving the problem of low accuracy of image segmentation in the prior art.
The invention provides an image segmentation method, which comprises the following steps:
acquiring an image to be segmented;
inputting an image to be segmented into a target segmentation network to obtain a segmentation result;
the target segmentation network is trained based on a sample data set and comprises an encoder module, a super-pixel pooling module, an attention mechanism module and a decoder module;
wherein obtaining the segmentation result includes:
extracting a first feature map of the image to be segmented based on the encoder module;
downsampling the image to be segmented based on the super-pixel pooling module to obtain first images at multiple scales, and pooling the first feature map based on the first images at multiple scales to obtain second feature maps at multiple scales;
inputting the second feature maps at the multiple scales into the attention mechanism module, respectively, to obtain third feature maps at the multiple scales;
and fusing, based on the decoder module, the third feature maps at the multiple scales with the first feature map to obtain the segmentation result.
According to the image segmentation method provided by the invention, inputting the second feature maps at the multiple scales into the attention mechanism module, respectively, to obtain third feature maps at the multiple scales comprises:
transposing, based on a plurality of attention mechanism modules, the second feature map of each scale to obtain a fourth feature map;
multiplying the fourth feature map with the second feature map to obtain a fifth feature map;
convolving the second feature map to obtain a sixth feature map;
multiplying the sixth feature map by the fifth feature map to obtain the third feature maps at the multiple scales.
According to the image segmentation method provided by the invention, fusing, based on the decoder module, the third feature maps at the multiple scales with the first feature map to obtain the segmentation result comprises:
performing inverse pooling on each of the third feature maps at the multiple scales based on the decoder module to obtain a plurality of seventh feature maps;
superimposing the plurality of seventh feature maps and then fusing them with the first feature map to obtain a fused feature map;
deconvolving the fused feature map to obtain the segmentation result.
According to the image segmentation method provided by the invention, the encoder module is determined according to the trained encoder module in the preset segmentation network, and the encoder module in the preset segmentation network is obtained after training the convolutional neural network by using the ImageNet data set.
According to the image segmentation method provided by the invention, the acquisition mode of the target segmentation network comprises the following steps:
acquiring a plurality of sample images;
cutting each sample image to obtain a second image corresponding to each sample image;
labeling the second image to obtain a third image;
enhancing the second image to obtain a fourth image;
obtaining the sample data set according to the third image and the fourth image;
and inputting the sample data set into a preset segmentation network for training to obtain the target segmentation network.
According to the image segmentation method provided by the invention, the sample data set is input into a preset segmentation network for training to obtain the target segmentation network, and the method comprises the following steps:
and inputting the sample data set into a preset segmentation network for training until the value of a target loss function of the preset segmentation network tends to be stable, stopping training, and obtaining the target segmentation network, wherein the target loss function is determined according to a cross entropy function and a boundary similarity function.
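The stopping rule above ("until the value of the target loss function tends to be stable") can be sketched as a simple plateau check. This is a minimal illustration, not the patented training procedure: the names `has_stabilized` and `target_loss`, the window length, the tolerance, the weighted-sum combination and the synthetic loss values are all assumptions, and the boundary similarity function is treated as an opaque number because its formula is not given here.

```python
def has_stabilized(losses, window=5, tolerance=1e-3):
    """Return True when the last `window` loss values vary by less than `tolerance`."""
    if len(losses) < window:
        return False
    recent = losses[-window:]
    return max(recent) - min(recent) < tolerance


def target_loss(cross_entropy, boundary_term, weight=1.0):
    """Hypothetical combination: cross entropy plus a weighted boundary term."""
    return cross_entropy + weight * boundary_term


history = []
for epoch in range(100):
    # Stand-in for one training epoch: both loss terms decay towards a floor.
    cross_entropy = 0.2 + 0.8 * 0.5 ** epoch
    boundary_term = 0.1 + 0.4 * 0.5 ** epoch
    history.append(target_loss(cross_entropy, boundary_term))
    if has_stabilized(history):
        break  # the loss has flattened out, so training stops here
```

In practice the window and tolerance would be tuned to the noise level of the real loss curve.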
According to the image segmentation method provided by the invention, the second image is enhanced to obtain a fourth image, which comprises the following steps:
performing data enhancement on the second image to obtain the fourth image;
wherein the data enhancement comprises any one of the following:
random angle rotation, horizontal flip, vertical flip, color adjustment, photometric distortion, or class-balanced sampling.
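A few of the geometric enhancements listed above can be sketched on small nested-list images. This is only an illustration under stated assumptions: the function names are hypothetical, rotation is restricted to 90 degrees rather than a random angle, and applying the same transform to the annotation mask is an assumption made here so that labels stay aligned with pixels.

```python
import random


def horizontal_flip(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Reverse the order of the rows."""
    return [list(row) for row in image[::-1]]


def rotate_90(image):
    """Rotate 90 degrees clockwise; a simplification of random-angle rotation."""
    return [list(row) for row in zip(*image[::-1])]


def augment_pair(image, mask, rng):
    """Pick one geometric transform at random and apply it to image and mask alike."""
    transform = rng.choice([horizontal_flip, vertical_flip, rotate_90])
    return transform(image), transform(mask)


image = [[1, 2], [3, 4]]
mask = [[0, 1], [0, 0]]   # marks the pixel whose value is 2
aug_image, aug_mask = augment_pair(image, mask, random.Random(0))
```

Whichever transform is drawn, the marked pixel in `aug_mask` still sits on the pixel with value 2 in `aug_image`.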
The present invention also provides an image segmentation system, comprising an acquisition module and a segmentation module;
the acquisition module is used for acquiring an image to be segmented;
the segmentation module is used for inputting the image to be segmented into a target segmentation network and obtaining a segmentation result;
the target segmentation network is trained based on a sample data set and comprises an encoder module, a super-pixel pooling module, an attention mechanism module and a decoder module;
wherein obtaining the segmentation result includes:
extracting a first feature map of the image to be segmented based on the encoder module;
downsampling the image to be segmented based on the super-pixel pooling module to obtain first images at multiple scales, and pooling the first feature map based on the first images at multiple scales to obtain second feature maps at multiple scales;
inputting the second feature maps at the multiple scales into the attention mechanism module, respectively, to obtain third feature maps at the multiple scales;
and fusing, based on the decoder module, the third feature maps at the multiple scales with the first feature map to obtain the segmentation result.
The invention also provides an electronic device comprising a processor and a memory storing a computer program, the processor implementing the image segmentation method as described in any one of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image segmentation method as described in any of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the image segmentation method as described in any one of the above.
According to the image segmentation method and system provided by the invention, the image to be segmented is downsampled based on the super-pixel pooling module to obtain first images at multiple scales; the first feature map of the image to be segmented, extracted by the encoder module, is pooled based on the first images at multiple scales; and the final segmentation result is obtained by combining the attention mechanism with a feature fusion operation, which improves the accuracy of segmenting the image to be segmented.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an image segmentation method provided by the invention;
FIG. 2 is a schematic diagram of a super pixel pooling module according to the present invention;
FIG. 3 is a schematic diagram of an attention mechanism module according to the present invention;
FIG. 4 is a schematic diagram of a decoder module provided by the present invention;
FIG. 5 is a schematic diagram of a target segmentation network according to the present invention;
FIG. 6 is a schematic diagram of an image segmentation system according to the present invention;
FIG. 7 is a schematic diagram of the physical structure of the electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
FIG. 1 is a flow chart of the image segmentation method provided by the present invention. As shown in FIG. 1, the method includes:
step 110, obtaining an image to be segmented;
step 120, inputting an image to be segmented into a target segmentation network to obtain a segmentation result;
the target segmentation network is trained based on a sample data set and comprises an encoder module, a super-pixel pooling module, an attention mechanism module and a decoder module;
wherein obtaining the segmentation result includes:
extracting a first feature map of the image to be segmented based on the encoder module;
downsampling the image to be segmented based on the super-pixel pooling module to obtain first images at multiple scales, and pooling the first feature map based on the first images at multiple scales to obtain second feature maps at multiple scales;
inputting the second feature maps at the multiple scales into the attention mechanism module, respectively, to obtain third feature maps at the multiple scales;
and fusing, based on the decoder module, the third feature maps at the multiple scales with the first feature map to obtain the segmentation result.
It should be noted that the above method may be executed by a computer device.
Optionally, the image to be segmented may be an image containing an object to be extracted. It may be taken from a test set in the sample data set or acquired in real time. The image may be an RGB image, for example a remote sensing image, and the object to be extracted may be a road intersection, an airplane, a vehicle, a ship, or the like.
The target segmentation network may be trained based on the sample data set, and may include an encoder module, a super-pixel pooling module, an attention mechanism module, and a decoder module.
The image to be segmented is input into the encoder module, which extracts a feature map of the image, i.e., the first feature map. The encoder module may employ a trained feature extraction network based on a complete convolutional mask, such as a convolutional neural network.
The super-pixel pooling module may include a super-pixel segmentation branch and a pooling layer. The image to be segmented is input into the super-pixel pooling module, where the super-pixel segmentation branch downsamples it to obtain first images at multiple scales. The first images are then input into the pooling layer, which pools the first feature map using the first images to obtain feature maps at multiple scales, i.e., the second feature maps. The downsampling may include 2-fold, 4-fold, or 8-fold downsampling, and so on. The super-pixel segmentation branch may be built on the simple linear iterative clustering (SLIC) algorithm.
For example, a super-pixel segmentation map is obtained using the SLIC algorithm, and the resulting division into super-pixel blocks serves as the partition for feature pooling.
Super-pixels are small areas composed of a series of adjacent pixel points with similar color, brightness, texture and other characteristics, and most of the small areas keep effective information for further image segmentation and generally do not destroy the boundary information of objects in the image. The super-pixel algorithm groups pixels by utilizing the similarity of the features among the pixels, and uses a small amount of super-pixels to replace a large amount of pixels to express the image features, so that the complexity of image processing is greatly reduced, and the super-pixel algorithm is commonly used in the field of image segmentation.
The second feature maps at the multiple scales are input into attention mechanism modules, respectively, to obtain third feature maps at the multiple scales: the second feature map of each scale is input into its own attention mechanism module, which outputs the third feature map of the corresponding scale.
For example, FIG. 2 is a schematic structural diagram of the super-pixel pooling module provided by the present invention. As shown in FIG. 2, the size of the image to be segmented is H×W×C, where H represents the height of the image to be segmented, W represents its width, and C represents its number of channels.
The image to be segmented is input into an encoder module trained from a convolutional neural network, which outputs the first feature map, whose size is H×W×C'.
The image to be segmented is also input into the super-pixel pooling module, where the super-pixel segmentation branch downsamples it to obtain a super-pixel segmentation image, i.e., a first image. The first image and the first feature map are input into the pooling layer, and super-pixel pooling is performed on the first feature map using the first image, in place of ordinary pooling, to obtain a second feature map. The size of the second feature map is K×C', where K represents the number of super-pixel blocks in the super-pixel pooling module.
The second feature map is input into the corresponding attention mechanism module, which outputs a third feature map of size K×C'.
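The pooling step described above, which turns an H×W×C' feature map into a K×C' map with one row per super-pixel, can be sketched as follows. This is a minimal illustration under stated assumptions: `grid_labels` is a hypothetical stand-in that tiles the image into square blocks instead of running SLIC, average pooling is assumed, and plain nested lists are used to keep the sketch dependency-free.

```python
def grid_labels(height, width, block):
    """Hypothetical stand-in for SLIC: label each pixel by the square block it falls in."""
    blocks_per_row = (width + block - 1) // block
    return [[(r // block) * blocks_per_row + (c // block) for c in range(width)]
            for r in range(height)]


def superpixel_pool(features, labels):
    """Average the C'-dimensional feature vectors inside each super-pixel.

    features: H x W x C' nested lists; labels: H x W integers in [0, K).
    Returns a K x C' nested list (the "second feature map").
    """
    num_regions = max(max(row) for row in labels) + 1
    channels = len(features[0][0])
    sums = [[0.0] * channels for _ in range(num_regions)]
    counts = [0] * num_regions
    for feat_row, label_row in zip(features, labels):
        for vector, k in zip(feat_row, label_row):
            counts[k] += 1
            for ch, value in enumerate(vector):
                sums[k][ch] += value
    return [[s / max(n, 1) for s in row] for row, n in zip(sums, counts)]


# Toy 4 x 4 feature map with C' = 2, pooled at two scales (block sizes 2 and 4).
features = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
pooled_by_scale = {block: superpixel_pool(features, grid_labels(4, 4, block))
                   for block in (2, 4)}
```

With block size 2 the toy map is reduced to K = 4 rows, and with block size 4 to a single row, which mirrors how coarser segmentations yield fewer super-pixels.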
Based on the decoder module, the third feature maps at the multiple scales output by the attention mechanism modules are fused with the first feature map output by the encoder module, and the segmented image, i.e., the segmentation result, is output. The segmented image may be a binary mask representing the object to be extracted.
For example, the image to be segmented is a remote sensing image of a road intersection, and the remote sensing image of the road intersection is input into a target segmentation network to obtain a segmentation result.
Extracting road intersections from remote sensing images is important for applications such as automatic driving and map construction. Although road extraction methods based on end-to-end convolutional neural networks are good at distinguishing roads from other ground features, existing work still has the following problems: existing methods focus on pixel-level segmentation of roads and pay little attention to recognizing road intersections; and a simple convolution operator cannot attend closely to the spatial relationships between feature points and has difficulty fully perceiving the global semantic information of road features, so the extraction effect is poor. The invention obtains multi-scale road intersection information simultaneously through a multi-branch network, which overcomes the shortcomings of super-pixel pooling methods in the field of road intersection extraction. Consequently, when the image segmentation method provided by the invention segments an image containing a road intersection, the segmentation result is accurate.
According to the image segmentation method provided by the invention, the image to be segmented is downsampled based on the super-pixel pooling module to obtain first images at multiple scales; the first feature map of the image to be segmented, extracted by the encoder module, is pooled based on the first images at multiple scales; and the final segmentation result is obtained by combining the attention mechanism with a feature fusion operation, which improves the accuracy of segmenting the image to be segmented.
Further, in an embodiment, inputting the second feature maps at the multiple scales into the attention mechanism module to obtain third feature maps at the multiple scales may specifically include:
transposing, based on a plurality of attention mechanism modules, the second feature map of each scale to obtain a fourth feature map;
multiplying the fourth feature map with the second feature map to obtain a fifth feature map;
convolving the second feature map to obtain a sixth feature map;
multiplying the sixth feature map by the fifth feature map to obtain the third feature maps at the multiple scales.
Optionally, the second feature map of each scale is input into its own attention mechanism module, and each attention mechanism module outputs a feature map of one scale, i.e., a third feature map. Specifically:
Each attention mechanism module transposes the second feature map of its scale to obtain the corresponding fourth feature map.
The attention mechanism module multiplies the fourth feature map with the input second feature map to obtain the fifth feature map, and convolves the input second feature map to obtain the sixth feature map.
The attention mechanism module multiplies the sixth feature map by the fifth feature map to obtain the third feature map of one scale.
The third feature maps at the multiple scales are obtained from the third feature maps output by the individual attention mechanism modules.
For example, FIG. 3 is a schematic structural diagram of the attention mechanism module provided by the present invention. As shown in FIG. 3, one of the multi-scale second feature maps output by the super-pixel pooling module, of size K×C', is multiplied by its own transpose, and a feature map of size K×K (i.e., the fifth feature map) is obtained after softmax. A 1×1 convolution is applied to the second feature map of size K×C' to obtain a feature map of size K×C' (i.e., the sixth feature map), which is multiplied by the fifth feature map of size K×K to obtain an output feature map of size K×C' (i.e., the third feature map).
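The shapes in this example can be traced with a short sketch. It is an illustration under stated assumptions rather than the patented module: the multiplication order (second feature map times its transpose, then the K×K result times the convolved map) is chosen so that the stated sizes work out, softmax is applied row-wise, the 1×1 convolution is modeled as a C'×C' weight matrix applied to every super-pixel, and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(matrix):
    """Apply a numerically stable softmax to every row."""
    result = []
    for row in matrix:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def superpixel_attention(second, weight):
    """second: K x C' pooled features; weight: C' x C' matrix of the 1x1 convolution."""
    transposed = [list(col) for col in zip(*second)]      # fourth feature map, C' x K
    affinity = softmax_rows(matmul(second, transposed))   # fifth feature map, K x K
    projected = matmul(second, weight)                    # sixth feature map, K x C'
    return matmul(affinity, projected)                    # third feature map, K x C'


second = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # K = 3 super-pixels, C' = 2 channels
identity = [[1.0, 0.0], [0.0, 1.0]]             # hypothetical 1x1 convolution weights
third = superpixel_attention(second, identity)
```

Each output row is a weighted mix of all super-pixel features, which is how the module lets every super-pixel draw on global context.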
Further, in an embodiment, fusing, based on the decoder module, the third feature maps at the multiple scales with the first feature map to obtain the segmentation result may specifically include:
performing inverse pooling on each of the third feature maps at the multiple scales based on the decoder module to obtain a plurality of seventh feature maps;
superimposing the plurality of seventh feature maps and then fusing them with the first feature map to obtain a fused feature map;
deconvolving the fused feature map to obtain the segmentation result.
Optionally, the third feature maps at the multiple scales are input into the decoder module, and the third feature map of each scale is inverse pooled to obtain a plurality of seventh feature maps of the same size.
The seventh feature maps of the same size are superimposed and then spliced/fused with the first feature map output by the encoder module to obtain a fused feature map, and the fused feature map is deconvolved to obtain the final segmentation result.
In other words, an inverse average pooling operation restores the third feature maps at the multiple scales to the size of the input image; the restored maps are superimposed into one, spliced/fused with the first feature map generated by the encoder module, and the spliced feature map is deconvolved to generate a prediction map, i.e., the segmentation result.
For example, FIG. 4 is a schematic structural diagram of the decoder module provided by the present invention. As shown in FIG. 4, the decoder module performs inverse pooling on the third feature maps of sizes K'×C'', K×C'' and K×C'', respectively, to obtain seventh feature maps of size H×W×C''.
The superimposed seventh feature maps are spliced/fused with the first feature map output by the encoder module to obtain a fused feature map, and the fused feature map is deconvolved to obtain the prediction output, i.e., the segmentation result.
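The inverse pooling and fusion steps can be sketched as follows. This is a minimal illustration under stated assumptions: inverse average pooling is modeled as copying each super-pixel's vector back to all of its pixels, "superimposing" and "splicing/fusing" are both modeled as channel-wise concatenation (element-wise addition would be another reading), the final deconvolution is omitted, and all names are hypothetical.

```python
def superpixel_unpool(pooled, labels):
    """Inverse average pooling: copy each super-pixel's vector to all of its pixels."""
    return [[list(pooled[k]) for k in label_row] for label_row in labels]


def fuse(unpooled_maps, first):
    """Concatenate the unpooled maps and the encoder map along the channel axis."""
    height, width = len(first), len(first[0])
    fused = []
    for r in range(height):
        row = []
        for c in range(width):
            vector = []
            for feature_map in unpooled_maps:
                vector.extend(feature_map[r][c])
            vector.extend(first[r][c])
            row.append(vector)
        fused.append(row)
    return fused


# Toy 2 x 2 image: one scale with two super-pixels, one with a single super-pixel.
labels_fine = [[0, 0], [1, 1]]
labels_coarse = [[0, 0], [0, 0]]
third_fine = [[1.0], [2.0]]      # K = 2, one channel
third_coarse = [[5.0]]           # K = 1, one channel
first = [[[0.1], [0.2]], [[0.3], [0.4]]]

seventh = [superpixel_unpool(third_fine, labels_fine),
           superpixel_unpool(third_coarse, labels_coarse)]
fused = fuse(seventh, first)
```

Every pixel of `fused` now carries one channel per scale plus the encoder channel, which is the input a deconvolution layer would turn into the prediction map.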
According to the image segmentation method provided by the invention, the second characteristic images of a plurality of scales output by the super-pixel segmentation module are processed based on the attention mechanism module, the third characteristic images of a plurality of scales are output, the third characteristic images of a plurality of scales and the first characteristic images output by the encoder module are combined to perform characteristic fusion, the characteristic information of the image to be segmented is extracted, the segmentation result of the image to be segmented is finally obtained, and the segmentation effect of the image to be segmented is improved.
Further, in one embodiment, the encoder module is determined according to an encoder module in a trained preset split network, where the encoder module in the preset split network is obtained after training the convolutional neural network using an ImageNet dataset.
Optionally, in recent years, with the continuous development of Deep Learning (DL), the research of convolutional neural networks (Convolutional Neural Network, CNN) in the aspect of image information interpretation has made great progress, and has a very broad application prospect. The CNN can autonomously learn the characteristics of geometry, shape and the like of the ground feature elements according to the input image, overcomes the defect of manually constructing the characteristics by the traditional method, and is widely applied to target extraction tasks. In the widely used encoder module and decoder module architecture of CNN, encoder modules with different hierarchical structures encode input data, learn and extract semantic features of a target (e.g., a road intersection); and decoding the acquired semantic features step by using a decoder module, and recovering the spatial resolution of the deep features. Meanwhile, in order to solve the problem that deep feature space detail information is not easy to recover in a decoder module stage, jump connection is introduced to perform feature fusion operation among different levels, so that shallow layer features with rich space detail information are fully utilized, deep features with finer semantic information are generated, and a good extraction effect is obtained in the extraction process of a road intersection.
Based on this, the present invention obtains the encoder module of the preset segmentation network by using a feature extraction network (e.g., a convolutional neural network) based on a fully convolutional mask that is pre-trained on the ImageNet dataset, and takes the parameters of the encoder module in the preset segmentation network as the initial parameters of the encoder module in the target segmentation network.
The pre-training uses an encoder module obtained from the feature extraction network based on a fully convolutional mask, structured as follows: the network is composed of several residual blocks; within each block, feature extraction is performed by convolution kernels of size 3×3, downsampling is performed by convolution with a stride of 2, and skip connections are provided to accelerate the convergence of the network. The encoder module in the target segmentation network has the same structure as the encoder module in the preset segmentation network.
The sample data set is input into the preset segmentation network for training, during which the initial parameters of the encoder module, the parameters of the super-pixel pooling module, the parameters of the attention mechanism module and the parameters of the decoder module in the preset segmentation network are adjusted. The trained encoder module then applies a convolution with a kernel size of 7 to the image to be segmented, applies max pooling with a stride of 2 to the resulting feature map, and passes the result through several residual blocks, in which downsampling is performed by convolution with a stride of 2 and feature extraction is performed by several convolutions with a kernel size of 3×3, to obtain the first feature map.
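To see how the spatial resolution shrinks through such an encoder, the sketch below applies the usual output-size formula for convolution and pooling layers. Only the kernel size of 7, the max-pooling stride of 2, the 3×3 kernels and the stride-2 downsampling come from the text; the stem stride, the paddings, the pooling window, the number of stages and the choice that the first stage keeps its resolution are assumptions made for illustration.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution or pooling layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_spatial_sizes(size: int, num_stages: int = 4) -> list:
    """Trace the side length of a square input through the encoder layers."""
    sizes = [size]
    # 7x7 convolution (stride 2 and padding 3 are assumed)
    size = conv_output_size(size, kernel=7, stride=2, padding=3)
    sizes.append(size)
    # max pooling with stride 2 (3x3 window and padding 1 are assumed)
    size = conv_output_size(size, kernel=3, stride=2, padding=1)
    sizes.append(size)
    # residual stages built from 3x3 convolutions; every stage after the
    # first is assumed to downsample with a stride-2 convolution
    for stage in range(num_stages):
        stride = 1 if stage == 0 else 2
        size = conv_output_size(size, kernel=3, stride=stride, padding=1)
        sizes.append(size)
    return sizes


print(encoder_spatial_sizes(512))
```

Under these assumptions a 512×512 tile is reduced by a factor of 32 by the end of the last stage.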
Further, in an embodiment, the manner of obtaining the target segmentation network may specifically include:
acquiring a plurality of sample images;
cutting each sample image to obtain a second image corresponding to each sample image;
labeling the second image to obtain a third image;
enhancing the second image to obtain a fourth image;
obtaining the sample data set according to the third image and the fourth image;
and inputting the sample data set into a preset segmentation network for training to obtain the target segmentation network.
Optionally, a plurality of remote sensing images to be interpreted (i.e. raw images) are collected as sample images, and the sample images are annotated at pixel level to form a data set containing road intersection masks. Specifically:
A plurality of remote sensing images are collected as sample images, and the sample images are cropped, randomly or on a regular grid, to obtain second images; for example, the sample images are regularly cropped into small tiles of 512×512, 1024×1024 or similar sizes.
The second images are annotated at pixel level to obtain a semantic label image, i.e. a third image, for each remote sensing image, and the second images are enhanced to obtain fourth images; the third images and fourth images together form a sample data set containing road intersection masks. For example, road intersections in a second image are labeled with one pixel value, and non-intersection regions in the second image are labeled with another pixel value.
The sample data set is obtained by dividing the image set formed by the third images and fourth images into a training set, a verification set and a test set according to a preset proportion.
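The regular cropping and the proportional split described above can be sketched as follows. The tile size of 512 comes from the text; the 8:1:1 ratio, the fixed seed, the policy of skipping partial border tiles and the function names are illustrative assumptions, since the text only speaks of a "preset proportion".

```python
import random


def tile_origins(height, width, tile):
    """Top-left corners of non-overlapping tile x tile crops.

    Partial tiles at the right and bottom borders are skipped (assumed policy).
    """
    return [(r, c)
            for r in range(0, height - tile + 1, tile)
            for c in range(0, width - tile + 1, tile)]


def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle the items and split them into training, verification and test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


origins = tile_origins(1024, 1536, 512)      # a 1024 x 1536 image gives 2 x 3 tiles
train, val, test = split_dataset(range(100))
print(len(origins), len(train), len(val), len(test))
```

Splitting by tile index rather than by source image is the simplest option; splitting by source image instead would keep tiles of one scene from appearing in both the training and test sets.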
Further, in an embodiment, the enhancing the second image to obtain a fourth image may specifically include:
performing data enhancement on the second image to obtain the fourth image;
wherein the data enhancement comprises any one of the following:
random angle rotation, horizontal flip, vertical flip, color adjustment, photometric distortion, and class-balanced sampling.
Optionally, the fourth image is obtained by performing data enhancement on the second image, where data enhancement specifically refers to applying transformations such as random angle rotation, horizontal flip, vertical flip, color adjustment, photometric distortion and class-balanced sampling to the second image.
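A few of the geometric transformations listed above can be sketched on plain nested lists. This is a simplified illustration: rotation is restricted to 90 degrees to avoid interpolation (the text mentions random angles), and applying the same transformation to the label image so that it stays aligned with the enhanced image is an assumption rather than something the text states.

```python
import random


def hflip(img):
    """Horizontal flip: reverse every row."""
    return [row[::-1] for row in img]


def vflip(img):
    """Vertical flip: reverse the order of the rows."""
    return [row[:] for row in img[::-1]]


def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment_pair(image, label, rng):
    """Apply one randomly chosen geometric transform to image and label alike."""
    op = rng.choice([hflip, vflip, rot90])
    return op(image), op(label)


image = [[1, 2], [3, 4]]
label = [[0, 1], [0, 0]]
aug_image, aug_label = augment_pair(image, label, random.Random(0))
print(aug_image, aug_label)
```

Color adjustment and photometric distortion would be applied to the image only, since they do not move pixels.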
Further, in an embodiment, the inputting the sample dataset into a preset segmentation network for training to obtain the target segmentation network may specifically include:
and inputting the sample data set into a preset segmentation network for training until the value of a target loss function of the preset segmentation network tends to be stable, stopping training, and obtaining the target segmentation network, wherein the target loss function is determined according to a cross entropy function and a boundary similarity function.
Optionally, a preset segmentation network is built and trained using the training set. The preset segmentation network comprises an encoder module that adopts the feature extraction network based on a fully convolutional mask, a super-pixel pooling module (comprising a super-pixel segmentation branch and a pooling layer), an attention mechanism module and a decoder module that fuses multi-scale information. It is built as follows:
adopting an SLIC algorithm to build a super-pixel segmentation branch;
downsampling the images in the training set by factors of 2 and 4, and performing super-pixel segmentation on each, to obtain super-pixel segmentation maps at three scales (H×W×C, H/2×W/2×C and H/4×W/4×C);
constructing an encoder module, and sending the images in the training set into the encoder module to obtain a feature map;
pooling the feature map output by the encoder module with the super-pixel segmentation maps at each of the three scales to obtain the feature maps output by the super-pixel pooling module, and passing these feature maps through the attention mechanism module to obtain feature maps at three scales;
and building a decoder module to fuse the feature maps output by the attention mechanism module and the encoder module, with a multi-scale information fusion module provided to improve the fusion of features across layers and to effectively extract the target from the image to be segmented (for example, road intersections in a remote sensing image containing road intersections).
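The pooling step in the list above reduces an H×W×C feature map to one feature vector per super-pixel. The sketch below shows the idea for a single scale, under two assumptions: a regular grid of blocks stands in for the SLIC segmentation (`grid_superpixels` is a hypothetical helper, not SLIC), and the pooling is an average, which the text does not specify.

```python
def grid_superpixels(height, width, cell):
    """Stand-in for SLIC: label each pixel by the cell x cell block it falls in."""
    cols = (width + cell - 1) // cell
    return [[(r // cell) * cols + (c // cell) for c in range(width)]
            for r in range(height)]


def superpixel_pool(features, labels):
    """Average the C-dimensional vectors inside each super-pixel.

    features: H x W x C nested lists; labels: H x W integer labels in [0, K).
    Returns a K x C nested list.
    """
    k = max(max(row) for row in labels) + 1
    channels = len(features[0][0])
    sums = [[0.0] * channels for _ in range(k)]
    counts = [0] * k
    for feat_row, label_row in zip(features, labels):
        for vec, lab in zip(feat_row, label_row):
            counts[lab] += 1
            for ch, value in enumerate(vec):
                sums[lab][ch] += value
    # Empty super-pixels keep their zero vector.
    return [[s / n for s in row] if n else row for row, n in zip(sums, counts)]


# 4 x 4 single-channel feature map pooled over four 2 x 2 super-pixels.
features = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]
labels = grid_superpixels(4, 4, 2)
pooled = superpixel_pool(features, labels)
print(pooled)
```

Repeating the call with the label maps of the other two scales would give the three pooled feature maps that are passed to the attention mechanism module.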
The images of the training set in the sample data set are input into the preset segmentation network for training. After training is finished and the network has converged (i.e. the value of the target loss function of the preset segmentation network has stabilized), the trained parameters of the encoder module, the super-pixel pooling module, the attention mechanism module and the decoder module are saved, and the target segmentation network is obtained from these trained parameters and the preset segmentation network. The verification set is then fed into the trained target segmentation network to verify its segmentation accuracy.
It should be noted that the value of the target loss function is recorded in each training iteration, and it is judged whether the change in the value over several consecutive iterations is less than or equal to a preset threshold; if so, the value of the target loss function of the preset segmentation network is determined to have stabilized.
The images to be segmented in the test set are input into the trained target segmentation network to obtain segmentation results.
The target loss function may specifically be obtained by adding a cross entropy function and a boundary similarity function.
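The stopping rule described above can be sketched as a small check on the recorded loss values. The threshold, the number of consecutive comparisons (`patience`) and the function name are illustrative assumptions; the text only requires that the change stays at or below a preset threshold several times in a row.

```python
def loss_has_stabilized(history, threshold=1e-3, patience=3):
    """Return True when the last `patience` consecutive changes in the
    loss are all less than or equal to `threshold`."""
    if len(history) < patience + 1:
        return False  # not enough values recorded yet
    recent = history[-(patience + 1):]
    return all(abs(b - a) <= threshold for a, b in zip(recent, recent[1:]))


print(loss_has_stabilized([1.0, 0.5, 0.4, 0.4, 0.4, 0.4]))
```

In a training loop the check would be called after each iteration, and training would stop the first time it returns True.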
The cross entropy function $L_{CE}$ is defined as:
wherein $y_i$ is the true label value of image $i$ (i.e. obtained after pixel-level labeling of the second image), $y_i'$ is the predicted value of image $i$ (i.e. the image containing label values output by the preset segmentation network), $N$ is the number of samples, and $i$ denotes the $i$-th image.
The boundary similarity function is defined as:
wherein $X_i$ and $Y_i$ denote the predicted value of image $i$ and the true label value of image $i$ respectively, $X_i \cap Y_i$ denotes the overlap between the predicted value and the true label value of image $i$, and $|X_i|$ and $|Y_i|$ denote the number of predicted values and true label values of image $i$ respectively.
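As an illustration of how the two terms can be added into one target loss, the sketch below assumes the standard binary cross entropy and a Dice-style overlap ratio, 2|X ∩ Y| / (|X| + |Y|). Both forms are consistent with the symbols defined above but are assumptions, not the formulas of the invention; the function names and the small constant `eps` are likewise illustrative.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-7):
    """Binary cross entropy averaged over N values (assumed standard form)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def overlap_loss(y_true, y_pred, eps=1e-7):
    """One minus a Dice-style ratio 2|X n Y| / (|X| + |Y|) (assumed form)."""
    intersection = sum(y * p for y, p in zip(y_true, y_pred))
    return 1.0 - (2.0 * intersection + eps) / (sum(y_true) + sum(y_pred) + eps)


def target_loss(y_true, y_pred):
    """Sum of the cross entropy term and the overlap term."""
    return cross_entropy(y_true, y_pred) + overlap_loss(y_true, y_pred)


labels = [1, 0, 1, 0]
print(target_loss(labels, [0.9, 0.2, 0.8, 0.1]))
```

The overlap term responds to how well the predicted region matches the labeled region as a whole, which complements the per-pixel cross entropy when the target class covers few pixels.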
For example, Fig. 5 is a schematic structural diagram of the target segmentation network provided by the invention. As shown in Fig. 5, the image to be segmented (of size H×W×C) is downsampled by a factor of 2 and by a factor of 4, giving first images at three sizes, H/2×W/2×C, H/4×W/4×C and H×W×C; the image to be segmented is also input into the encoder module, which outputs a first feature map of size H×W×C'.
The first feature map output by the encoder module and the first images at the three sizes are input into the pooling layer of the super-pixel pooling module, where the first feature map is pooled. Third feature maps at three scales (K'×C", K×C" and K×C") are then obtained through the attention mechanism module, and inverse pooling is applied to each third feature map to obtain three seventh feature maps of size H'×W'×C'.
The feature map of scale H×W×C obtained by superimposing the seventh feature maps is concatenated/fused with the first feature map output by the encoder module to obtain a fused feature map; the fused feature map is deconvolved, and the segmentation result is output.
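The inverse pooling and fusion steps can be sketched on nested lists as follows. Inverse pooling copies each super-pixel vector back to every pixel that carries its label. Treating "superimposing" as an element-wise sum and "fusing" as channel concatenation are assumptions, and the final deconvolution is omitted.

```python
def superpixel_unpool(pooled, labels):
    """Copy each super-pixel vector back to its pixels: K x C -> H x W x C."""
    return [[list(pooled[lab]) for lab in row] for row in labels]


def add_maps(maps):
    """Element-wise sum of several H x W x C maps (assumed 'superimposing')."""
    first = maps[0]
    return [[[sum(m[r][c][ch] for m in maps) for ch in range(len(first[r][c]))]
             for c in range(len(first[r]))]
            for r in range(len(first))]


def concat_channels(a, b):
    """Concatenate two H x W x C maps along the channel axis."""
    return [[va + vb for va, vb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


labels = [[0, 0], [1, 1]]                         # 2 x 2 image, two super-pixels
unpooled_a = superpixel_unpool([[1.0], [2.0]], labels)
unpooled_b = superpixel_unpool([[10.0], [20.0]], labels)
summed = add_maps([unpooled_a, unpooled_b])
encoder_map = [[[0.5, 0.5] for _ in range(2)] for _ in range(2)]
fused = concat_channels(summed, encoder_map)
print(fused[0][0], fused[1][1])
```

In this toy case each fused pixel carries one channel from the summed super-pixel maps followed by the two encoder channels.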
According to the image segmentation method provided by the invention, the image to be segmented is downsampled by the super-pixel pooling module to obtain first images at multiple scales, the first feature map of the image to be segmented extracted by the encoder module is pooled based on the first images at multiple scales, and the final segmentation result is obtained by combining the attention mechanism with the feature fusion operation, which improves the accuracy and efficiency of segmenting the image to be segmented.
The image segmentation system provided by the invention is described below, and the image segmentation system described below and the image segmentation method described above can be referred to correspondingly.
Fig. 6 is a schematic structural diagram of an image segmentation system according to the present invention, as shown in fig. 6, including:
an acquisition module 610 and a segmentation module 611;
the acquiring module 610 is configured to acquire an image to be segmented;
the segmentation module 611 is configured to input an image to be segmented into a target segmentation network, and obtain a segmentation result;
the target segmentation network is trained based on a sample data set and comprises an encoder module, a super-pixel pooling module, an attention mechanism module and a decoder module;
and, the obtaining the segmentation result includes:
extracting a first feature map of the image to be segmented based on the encoder module;
downsampling the image to be segmented based on the super-pixel pooling module to obtain a first image with multiple scales, pooling the first feature image based on the first image with multiple scales to obtain a second feature image with multiple scales;
respectively inputting the second feature images with the multiple scales into the attention mechanism module to obtain a third feature image with the multiple scales;
and based on the decoder module, fusing the third feature images with the multiple scales and the first feature images to obtain the segmentation result.
According to the image segmentation system provided by the invention, the image to be segmented is downsampled by the super-pixel pooling module to obtain first images at multiple scales, the first feature map of the image to be segmented extracted by the encoder module is pooled based on the first images at multiple scales, and the final segmentation result is obtained by combining the attention mechanism with the feature fusion operation, which improves the accuracy of segmenting the image to be segmented.
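The data flow through the four modules can be summarized in a short sketch in which each module is passed in as a callable. The function `segment` and the stub modules below are hypothetical placeholders that only show the order of the calls and what each one receives; they are not implementations of the modules.

```python
def segment(image, encoder, superpixel_pool, attention, decoder):
    """Run the four modules in the order described for the segmentation module."""
    first = encoder(image)                            # first feature map
    second_maps = superpixel_pool(image, first)       # second feature maps, one per scale
    third_maps = [attention(m) for m in second_maps]  # third feature maps, one per scale
    return decoder(third_maps, first)                 # fused segmentation result


# Stub modules operating on flat lists, used only to trace the data flow.
result = segment(
    [1, 2, 3, 4],
    encoder=lambda img: [v * 2 for v in img],
    superpixel_pool=lambda img, feat: [feat, feat[::2], feat[::4]],
    attention=lambda m: [v + 1 for v in m],
    decoder=lambda thirds, first: (len(thirds), sum(first)),
)
print(result)
```

Passing the modules as arguments keeps the acquisition and segmentation modules independent of any particular network implementation.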
Fig. 7 is a schematic diagram of the physical structure of an electronic device provided by the invention. As shown in Fig. 7, the electronic device may include a processor 710, a communication interface 711, a memory 712 and a bus 713, wherein the processor 710, the communication interface 711 and the memory 712 communicate with one another through the bus 713. The processor 710 may call logic instructions in the memory 712 to perform the following method:
acquiring an image to be segmented;
inputting an image to be segmented into a target segmentation network to obtain a segmentation result;
the target segmentation network is trained based on a sample data set and comprises an encoder module, a super-pixel pooling module, an attention mechanism module and a decoder module;
and, the obtaining the segmentation result includes:
extracting a first feature map of the image to be segmented based on the encoder module;
downsampling the image to be segmented based on the super-pixel pooling module to obtain a first image with multiple scales, pooling the first feature image based on the first image with multiple scales to obtain a second feature image with multiple scales;
respectively inputting the second feature images with the multiple scales into the attention mechanism module to obtain a third feature image with the multiple scales;
and based on the decoder module, fusing the third feature images with the multiple scales and the first feature images to obtain the segmentation result.
Further, the logic instructions in the memory described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art or in part, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
Further, the present invention discloses a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, are capable of performing the image segmentation method provided by the above-mentioned method embodiments, for example comprising:
acquiring an image to be segmented;
inputting an image to be segmented into a target segmentation network to obtain a segmentation result;
the target segmentation network is trained based on a sample data set and comprises an encoder module, a super-pixel pooling module, an attention mechanism module and a decoder module;
and, the obtaining the segmentation result includes:
extracting a first feature map of the image to be segmented based on the encoder module;
downsampling the image to be segmented based on the super-pixel pooling module to obtain a first image with multiple scales, pooling the first feature image based on the first image with multiple scales to obtain a second feature image with multiple scales;
respectively inputting the second feature images with the multiple scales into the attention mechanism module to obtain a third feature image with the multiple scales;
and based on the decoder module, fusing the third feature images with the multiple scales and the first feature images to obtain the segmentation result.
In another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the image segmentation method provided in the above embodiments, for example, including:
acquiring an image to be segmented;
inputting an image to be segmented into a target segmentation network to obtain a segmentation result;
the target segmentation network is trained based on a sample data set and comprises an encoder module, a super-pixel pooling module, an attention mechanism module and a decoder module;
and, the obtaining the segmentation result includes:
extracting a first feature map of the image to be segmented based on the encoder module;
downsampling the image to be segmented based on the super-pixel pooling module to obtain a first image with multiple scales, pooling the first feature image based on the first image with multiple scales to obtain a second feature image with multiple scales;
respectively inputting the second feature images with the multiple scales into the attention mechanism module to obtain a third feature image with the multiple scales;
and based on the decoder module, fusing the third feature images with the multiple scales and the first feature images to obtain the segmentation result.
The system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general hardware platform, or of course by means of hardware. Based on this understanding, the foregoing technical solution, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform the method described in the various embodiments or in some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. An image segmentation method, comprising:
acquiring an image to be segmented;
inputting an image to be segmented into a target segmentation network to obtain a segmentation result;
the target segmentation network is trained based on a sample data set and comprises an encoder module, a super-pixel pooling module, an attention mechanism module and a decoder module;
and, the obtaining the segmentation result includes:
extracting a first feature map of the image to be segmented based on the encoder module;
downsampling the image to be segmented based on the super-pixel pooling module to obtain a first image with multiple scales, pooling the first feature image based on the first image with multiple scales to obtain a second feature image with multiple scales;
respectively inputting the second feature images with the multiple scales into the attention mechanism module to obtain a third feature image with the multiple scales;
and based on the decoder module, fusing the third feature images with the multiple scales and the first feature images to obtain the segmentation result.
2. The image segmentation method according to claim 1, wherein the inputting the second feature maps of the multiple scales into the attention mechanism module respectively obtains a third feature map of the multiple scales, includes:
based on a plurality of attention mechanism modules, the second feature map of each scale is transposed respectively to obtain a fourth feature map;
multiplying the fourth feature map with the second feature map to obtain a fifth feature map;
convolving the second feature map to obtain a sixth feature map;
multiplying the sixth feature map by the fifth feature map to obtain a third feature map of the multiple scales.
3. The image segmentation method according to claim 1, wherein the fusing the third feature map and the first feature map of the multiple scales based on the decoder module to obtain the segmentation result includes:
respectively carrying out inverse pooling on the third feature graphs of the multiple scales based on the decoder module to obtain a plurality of seventh feature graphs;
overlapping the seventh feature images and then fusing the overlapped seventh feature images with the first feature images to obtain fused feature images;
deconvolution is carried out on the fused feature images to obtain the segmentation result.
4. The image segmentation method according to claim 1, wherein the encoder module is determined according to an encoder module in a trained preset segmentation network, the encoder module in the preset segmentation network being obtained after training a convolutional neural network using an ImageNet dataset.
5. The image segmentation method according to any one of claims 1-4, wherein the obtaining manner of the target segmentation network includes:
acquiring a plurality of sample images;
cutting each sample image to obtain a second image corresponding to each sample image;
labeling the second image to obtain a third image;
enhancing the second image to obtain a fourth image;
obtaining the sample data set according to the third image and the fourth image;
and inputting the sample data set into a preset segmentation network for training to obtain the target segmentation network.
6. The image segmentation method according to claim 5, wherein the inputting the sample dataset into a preset segmentation network for training to obtain the target segmentation network comprises:
and inputting the sample data set into a preset segmentation network for training until the value of a target loss function of the preset segmentation network tends to be stable, stopping training, and obtaining the target segmentation network, wherein the target loss function is determined according to a cross entropy function and a boundary similarity function.
7. The image segmentation method as set forth in claim 5, wherein the enhancing the second image to obtain a fourth image comprises:
performing data enhancement on the second image to obtain the fourth image;
wherein the data enhancement comprises any one of the following:
random angle rotation, horizontal flip, vertical flip, color adjustment, photometric distortion, and class-balanced sampling.
8. An image segmentation system, comprising an acquisition module and a segmentation module;
the acquisition module is used for acquiring an image to be segmented;
the segmentation module is used for inputting the image to be segmented into a target segmentation network and obtaining a segmentation result;
the target segmentation network is trained based on a sample data set and comprises an encoder module, a super-pixel pooling module, an attention mechanism module and a decoder module;
and, the obtaining the segmentation result includes:
extracting a first feature map of the image to be segmented based on the encoder module;
downsampling the image to be segmented based on the super-pixel pooling module to obtain a first image with multiple scales, pooling the first feature image based on the first image with multiple scales to obtain a second feature image with multiple scales;
respectively inputting the second feature images with the multiple scales into the attention mechanism module to obtain a third feature image with the multiple scales;
and based on the decoder module, fusing the third feature images with the multiple scales and the first feature images to obtain the segmentation result.
9. An electronic device comprising a processor and a memory storing a computer program, characterized in that the processor implements the image segmentation method according to any one of claims 1 to 7 when executing the computer program.
10. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the image segmentation method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310474603.4A CN116563538B (en) | 2023-04-27 | 2023-04-27 | Image segmentation method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116563538A (en) | 2023-08-08 |
CN116563538B (en) | 2023-09-22 |
Family
ID=87501069
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310474603.4A Active CN116563538B (en) | 2023-04-27 | 2023-04-27 | Image segmentation method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116563538B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111127493A (en) * | 2019-11-12 | 2020-05-08 | 中国矿业大学 | Remote sensing image semantic segmentation method based on attention multi-scale feature fusion |
CN113850825A (en) * | 2021-09-27 | 2021-12-28 | 太原理工大学 | Remote sensing image road segmentation method based on context information and multi-scale feature fusion |
CN113888550A (en) * | 2021-09-27 | 2022-01-04 | 太原理工大学 | Remote sensing image road segmentation method combining super-resolution and attention mechanism |
Non-Patent Citations (4)
Title |
---|
JIE CHEN 等: "SPMF-Net: Weakly Supervised Building Segmentation by Combining Superpixel Pooling and Multi-Scale Feature Fusion", 《REMOTE SENSING》, pages 1 - 13 * |
MATHIJS SCHUURMANS等: "Efficient semantic image segmentation with superpixel pooling", 《ARXIV》, pages 1 - 11 * |
SUHA KWAK 等: "Weakly Supervised Semantic Segmentation Using Superpixel Pooling Network", 《PROCEEDINGS OF THE THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE》, pages 4111 - 4117 * |
李亚军: "基于超像素池化的快速语义分割", 《中国优秀硕士学位论文全文数据库(电子期刊)信息科技辑》 * |
Also Published As
Publication number | Publication date |
---|---|
CN116563538B (en) | 2023-09-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||