Multi-modal image segmentation method based on functional mapping
Technical Field
The invention belongs to the technical field of image segmentation in image processing, and particularly relates to a multi-modal image segmentation method based on functional mapping.
Background
The vigorous development of digital image technology has given rise to a large number of emerging industries, such as remote-sensing satellite image positioning, medical image analysis, and intelligent traffic recognition, and has promoted the maturation of the information society. Images serve as an important bridge for human perception of the world and are closely tied to the field of vision. Image processing is in ever greater demand and plays an increasingly critical role in visual applications across artificial intelligence, machine vision, physiology, medicine, meteorology, military science, and the like. Image segmentation, as an image preprocessing step, lays a solid foundation for high-level semantic analysis, and many applications, such as image recognition, target localization, and edge detection, can improve their performance by using image segmentation techniques.
Image segmentation, as the name implies, divides a given image into regions according to some rule or goal. For example, an image of a lake may be divided into regions representing different semantic categories, such as the lake surface, people, boats, houses, trees, and sky, where the people and boats may be regarded as foreground targets and the rest as background. Traditional image segmentation techniques mainly process cues such as gray level, color, texture, and shape of a single image; typical methods include threshold segmentation, region segmentation, edge segmentation, graph-based segmentation, and energy-functional segmentation. For example, threshold segmentation assigns each gray value to a class according to a set threshold; edge segmentation detects boundaries from properties of the edge gray values, such as step changes or discontinuities; region segmentation decides according to an image similarity criterion and mainly includes watershed, region splitting and merging, and region growing; graph-based segmentation treats the image as an undirected graph with pixels as vertices and edges connecting adjacent pixels, each segmented region forming a subgraph; energy-functional segmentation represents the target boundary with a continuous curve and solves for the segmentation by minimizing an energy functional, and is generally divided into parametric active contour models and geometric active contour models.
The disadvantages of the above methods are mainly the following. First, they operate directly on the original pixels of the image, which increases the time complexity of the algorithms and raises the computational cost. Second, low-level processing techniques such as thresholding and edge segmentation are difficult to relate to the semantic features of the image. Third, complementary information between images is ignored; in particular, images containing similar objects share common structural and latent information, and ignoring it directly harms the segmentation of image targets. These methods are therefore unsuitable for large-scale image segmentation tasks involving common targets, which adversely affects large-scale practical applications such as image recognition and target localization. For application fields such as intelligent traffic recognition, medical image analysis, and large-scale image recognition, there is thus an urgent need for an image segmentation technique that can link the low-level features of images with their semantic features and effectively exploit, from multiple perspectives, the latent structural information shared among images.
Disclosure of Invention
In order to effectively utilize the latent structural information between images, reduce the computational complexity of image segmentation, and improve the segmentation of targets within images, the invention provides a multi-modal image segmentation method based on functional mapping, which comprises the following steps:
1. After acquiring an image set containing targets, perform the following operations:
1) segment each image in the set into superpixel blocks, and represent the segmented superpixels with different feature descriptors to obtain a multi-modal image representation;
2) build a superpixel-based graph on each modality of each image, and construct the corresponding Laplacian matrices;
3) characterize a reduced functional space for each image, and establish functional mappings between image pairs;
4) align the functional mappings of each modality with image cues, and introduce latent functions to maintain consistency between the functional mappings;
5) obtain the functional-mapping expressions from the multi-modal mapping consistency, compute each image's segmentation function through a joint optimization objective to obtain the optimal segmentation representation of the image, and complete the segmentation.
Further, segmenting each image in the set into superpixel blocks in step 1) and characterizing the segmented superpixels with different feature descriptors to obtain a multi-modal image representation specifically comprises:
1) let the set consist of n associated images; each image contains one or more target classes, and the whole set contains C target classes;
2) taking the pixels of an image as graph vertices, divide each image in the set into q small regions (e.g., q = 100, q a positive integer) using a graph-based partitioning method; these small regions consist of pixels with similar values and are called superpixels, and the segmented block belonging to the c-th class in the i-th image is denoted $S_{ic}$, where $i \in \{1, 2, \dots, n\}$ and $c \in \{1, 2, \dots, C\}$;
3) represent each superpixel in the image with m different feature descriptors, such as the scale-invariant feature transform (SIFT), local binary patterns (LBP), and the histogram of oriented gradients (HOG), thereby obtaining a multi-modal feature representation that reflects the intrinsic information of the images from multiple perspectives; for example, the i-th image corresponds to a set of feature matrices $\{X_i^k\}_{k=1}^{m}$, i.e., the k-th feature descriptor corresponds to the k-th modality $X_i^k$.
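To make step 1) concrete, the sketch below partitions an image into roughly q regions and builds m = 2 toy per-superpixel descriptors. The uniform grid partition is only a stand-in for a real superpixel algorithm (e.g., SLIC), and the mean-color and intensity-histogram features stand in for descriptors such as SIFT, LBP, and HOG; all function names here are illustrative, not from the patent.

```python
import numpy as np

def grid_superpixels(image, q=100):
    """Partition an H x W image into ~q roughly square regions.
    A stand-in for a proper superpixel method such as SLIC."""
    h, w = image.shape[:2]
    side = int(np.ceil(np.sqrt(q)))
    rows = np.minimum(np.arange(h) * side // h, side - 1)
    cols = np.minimum(np.arange(w) * side // w, side - 1)
    return rows[:, None] * side + cols[None, :]   # label map over pixels

def modality_features(image, labels):
    """Represent each superpixel with m = 2 toy descriptors:
    mean color (modality 1) and an 8-bin intensity histogram (modality 2)."""
    n_sp = labels.max() + 1
    gray = image.mean(axis=2)
    mean_color = np.zeros((n_sp, 3))
    hist = np.zeros((n_sp, 8))
    for s in range(n_sp):
        mask = labels == s
        mean_color[s] = image[mask].mean(axis=0)
        hist[s], _ = np.histogram(gray[mask], bins=8, range=(0.0, 1.0))
        hist[s] /= mask.sum()                     # normalize counts per superpixel
    return [mean_color, hist]                     # one matrix X_i^k per modality

img = np.random.rand(60, 80, 3)
labels = grid_superpixels(img, q=100)
X = modality_features(img, labels)
print(labels.max() + 1, X[0].shape, X[1].shape)
```

Each entry of `X` plays the role of one modality matrix $X_i^k$ for the i-th image.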
Further, building superpixel-based graphs on the multi-modal images in step 2) and constructing the corresponding Laplacian matrices specifically comprises:
2.1) regard the q superpixels of each image modality as graph vertices, and construct a fully connected superpixel graph over these vertices;
2.2) construct a Laplacian matrix on the superpixel graph of each modality; each is obtained from a weight matrix W computed with a Gaussian weighting strategy, i.e., L = D − W, where D is the diagonal degree matrix whose diagonal elements are the column sums of W.
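A minimal sketch of step 2.2), assuming the Gaussian weights are computed from per-superpixel feature distances with a bandwidth sigma (an assumption; the patent does not fix the distance used):

```python
import numpy as np

def gaussian_laplacian(features, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W for a fully connected
    superpixel graph; W uses a Gaussian kernel on feature distances."""
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)           # no self-loops
    D = np.diag(W.sum(axis=0))         # degree matrix: column sums of W
    return D - W

F = np.random.rand(100, 8)             # q = 100 superpixels, one modality's features
L = gaussian_laplacian(F)
print(np.allclose(L, L.T), np.allclose(L.sum(axis=1), 0.0))
```

Since W is symmetric, L is symmetric with zero row sums, the standard sanity check for an unnormalized graph Laplacian.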
Further, characterizing the reduced (linear) functional space of each image in step 3) and establishing functional mappings between image pairs specifically comprises:
1) compute the eigenvalues and eigenvectors of the Laplacian matrix of each modality, and take the first p (p < q) eigenvectors to span a reduced functional space; the corresponding eigenvalues of each modality of each image form a diagonal matrix $\Lambda_i^k$;
2) let each image's segmentation function $f_{ic}$ correspond to $S_{ic}$; the search space of the function is the p-dimensional space spanned by a set of basis vectors $B_i$, and $f_{ic}$ is represented as a linear combination of these basis vectors with image-specific coefficients, where $B_i$ is formed by the first p eigenvectors of the Laplacian matrix;
3) the relationship between any pair of images is reflected by a linear functional mapping; e.g., the mapping from the subspace of the i-th image to the subspace of the j-th image is denoted $R_{ij}$, i.e., a function f in the subspace of the i-th image is mapped into the subspace of the j-th image by computing $R_{ij} f$.
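The reduced functional space and a functional mapping between two images can be sketched as follows. The Laplacians here are random symmetric positive semi-definite stand-ins, and R_ij is fitted by plain least squares on shared toy descriptor coefficients rather than by the alignment objective H of step 4); the setup is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
q, p = 100, 20

def basis(L, p):
    """First p Laplacian eigenvectors span the reduced functional space;
    the matching eigenvalues form the diagonal matrix Lambda."""
    vals, vecs = np.linalg.eigh(L)     # eigh returns ascending eigenvalues
    return vecs[:, :p], np.diag(vals[:p])

# toy Laplacians for images i and j (random PSD stand-ins)
A = rng.standard_normal((q, q)); Li = A @ A.T
M = rng.standard_normal((q, q)); Lj = M @ M.T
Bi, Lam_i = basis(Li, p)
Bj, Lam_j = basis(Lj, p)

# a function on image i, written in the reduced basis: f = Bi @ a
a = rng.standard_normal(p)

# fit R_ij from corresponding descriptor coefficients Xi = Bi^T G, Xj = Bj^T G
G = rng.standard_normal((q, 30))       # 30 shared descriptor functions (toy data)
Xi, Xj = Bi.T @ G, Bj.T @ G
Rij = Xj @ np.linalg.pinv(Xi)          # least-squares functional mapping, p x p
b = Rij @ a                            # coefficients of the transported function
print(Rij.shape, b.shape)
```

Transporting a function between images thus reduces to one small p x p matrix-vector product, which is the computational appeal of the functional-mapping representation.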
Further, aligning the functional mappings of each modality with image cues in step 4) and introducing latent functions to maintain consistency between the functional mappings specifically comprises:
1) the image cues correspond to the different feature descriptors, and the functional mappings of each modality are aligned with the image cues by optimizing the following expression:
$$\min_{R_{ij}}\; H(R_{ij}) \;=\; \sum_{k=1}^{m}\Bigl(\bigl\|R_{ij} X_i^{k} - X_j^{k}\bigr\|_1 \;+\; \alpha\,\bigl\|R_{ij}\Lambda_i^{k} - \Lambda_j^{k} R_{ij}\bigr\|_F^{2}\Bigr) \;+\; \beta\,\bigl\|R_{ij}\bigr\|_1,$$
where α > 0 and β > 0 are constants, $\|\cdot\|_1$ denotes the matrix L1 norm, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix;
2) the introduced latent functions are shared across the input images, and the functional-mapping consistency term effectively relates the corresponding latent functions on each image; each latent function appears only in a certain subset of the images. The indicator vector $z_i = [z_{i1}, z_{i2}, \dots, z_{il}]$, with $z_{ij} \in \{0, 1\}$, characterizes the relationship between the latent functions and the i-th image, while the continuous variables $\Phi_i = [\phi_{i1}, \phi_{i2}, \dots, \phi_{il}]$ describe the latent functions on that image.
The consistency term of the functional mappings in this step is expressed as
$$Q(R_{ij},\Phi_i,z_i) \;=\; \gamma \sum_{(i,j)\in E}\bigl\|R_{ij}\Phi_i - \Phi_j\,\mathrm{diag}(z_i)\bigr\|_2^{2} \;+\; \lambda \sum_{i=1}^{n}\bigl\|\Phi_i - \Phi_i\,\mathrm{diag}(z_i)\bigr\|_2^{2},$$
where γ > 0 and λ > 0 are constants, $\|\cdot\|_2$ denotes the L2 norm, $\mathrm{diag}(z_i)$ denotes the diagonal matrix built from $z_i$, and $(i, j) \in E$ denotes the set of neighboring image pairs (e.g., the 20 nearest-neighbor images may be used).
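The two objectives of step 4) can be evaluated directly from their definitions. The sketch below assumes the sum over k in H covers the descriptor and eigenvalue terms, and stores the per-image quantities in plain dictionaries; the toy data and constants are illustrative.

```python
import numpy as np

def H(R, Xi, Xj, Lam_i, Lam_j, alpha=0.1, beta=0.01):
    """Descriptor/eigenvalue alignment term for one functional mapping R_ij.
    Xi, Xj, Lam_i, Lam_j are lists with one entry per modality k."""
    val = sum(np.abs(R @ Xi[k] - Xj[k]).sum()
              + alpha * np.linalg.norm(R @ Lam_i[k] - Lam_j[k] @ R, 'fro') ** 2
              for k in range(len(Xi)))
    return val + beta * np.abs(R).sum()

def Q(R, Phi, z, pairs, gamma=1.0, lam=1.0):
    """Latent-function consistency term over neighboring image pairs."""
    zdiag = {i: np.diag(z[i]) for i in Phi}
    coupling = sum(np.linalg.norm(R[(i, j)] @ Phi[i] - Phi[j] @ zdiag[i]) ** 2
                   for (i, j) in pairs)
    support = sum(np.linalg.norm(Phi[i] - Phi[i] @ zdiag[i]) ** 2 for i in Phi)
    return gamma * coupling + lam * support

rng = np.random.default_rng(1)
p, l = 20, 5
Phi = {0: rng.standard_normal((p, l)), 1: rng.standard_normal((p, l))}
z = {0: np.ones(l), 1: np.ones(l)}     # all-ones init, as in the solver below
R = {(0, 1): np.eye(p)}
print(Q(R, Phi, z, [(0, 1)]) >= 0.0)
```

With z initialized to all ones, the second (support) term of Q vanishes, so only the pairwise coupling term contributes.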
Further, obtaining the functional-mapping expressions from the multi-modal mapping consistency in step 5) and computing each image's segmentation function through a joint optimization objective specifically comprises:
1) compute the functional-mapping expressions from the established multi-modal mapping-consistency relationships, i.e.
$$\min_{R_{ij},\,\Phi_i,\,z_i}\; \sum_{(i,j)\in E} H(R_{ij}) \;+\; Q(R_{ij},\Phi_i,z_i),$$
where the variables $\Phi_i$ and $\Phi_j$ are subject to orthogonality constraints. The problem is solved by alternating optimization over the variables, i.e., fixing two of the variables and optimizing the remaining one. The variable $z_i$ is initialized as an all-ones vector, and the optimal functional mappings $R_{ij}$ are obtained after several iterations, once the objective converges;
2) treat the image samples as the vertices of a graph and record the weight between every two vertices; the joint optimization objective of the image segmentation functions is then
where ζ > 0 is a constant, $c \in \{1, 2, \dots, C\}$, $(\cdot)^{T}$ denotes the transpose of a vector or matrix, the subspace $B_{ik}$ is spanned by the first p eigenvectors of the superpixel-graph Laplacian of the i-th image in the k-th modality, and the segmentation functions $f_{ic}$ of different classes satisfy mutual-exclusion constraints;
3) by solving the objective in the above step, the optimal segmentation function of the i-th image is obtained, from which the optimal segmented block in the image belonging to the c-th target class can be determined.
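One plausible way to read off the final segmentation, assuming the mutual-exclusion constraint is resolved by assigning each superpixel to its highest-scoring class (the patent does not spell out this decoding step, so the sketch below is an assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
q, p, C = 100, 20, 4

B = rng.standard_normal((q, p))        # reduced basis of one image
coef = rng.standard_normal((p, C))     # optimized coefficients, one column per class
F = B @ coef                           # segmentation functions f_ic on superpixels
assign = F.argmax(axis=1)              # each superpixel takes its best-scoring class

# the segmented block for class c is the set of superpixels assigned to c
blocks = {c: np.flatnonzero(assign == c) for c in range(C)}
print(sorted(blocks), sum(len(v) for v in blocks.values()))
```

The argmax over classes guarantees the blocks are disjoint and cover all q superpixels, matching the mutual-exclusion requirement.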
The invention provides a multi-modal image segmentation method based on functional mapping, with the following advantages: superpixels are formed by over-segmenting the original pixels of each image, which reduces the computational cost; a multi-modal superpixel representation reflects the image content from the perspective of different descriptors; and by establishing functional mappings between image pairs in the reduced functional space and maintaining their consistency with latent functions, the association between the low-level features and the high-level semantics of the images is effectively established, improving the segmentation result and laying a solid foundation for visual applications such as image recognition and target localization.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The invention is further described below with reference to FIG. 1:
1. After acquiring an image set containing targets, perform the following operations:
1) segment each image in the set into superpixel blocks, and represent the segmented superpixels with different feature descriptors to obtain a multi-modal image representation;
2) build a superpixel-based graph on each modality of each image, and construct the corresponding Laplacian matrices;
3) characterize a reduced functional space for each image, and establish functional mappings between image pairs;
4) align the functional mappings of each modality with image cues, and introduce latent functions to maintain consistency between the functional mappings;
5) obtain the functional-mapping expressions from the multi-modal mapping consistency, compute each image's segmentation function through a joint optimization objective to obtain the optimal segmentation representation of the image, and complete the segmentation.