
CN104778683A - Multi-modal image segmenting method based on functional mapping - Google Patents


Info

Publication number
CN104778683A
Authority
CN
China
Prior art keywords
image
modal
functional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510040592.4A
Other languages
Chinese (zh)
Other versions
CN104778683B (en)
Inventor
李平
李黎
李建军
俞俊
Current Assignee
Hangzhou Huicui Intelligent Technology Co ltd
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN201510040592.4A
Publication of CN104778683A
Application granted
Publication of CN104778683B
Legal status: Active
Anticipated expiration

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a multi-modal image segmentation method based on functional mapping. For an image set containing targets, the method comprises the following steps: (1) segment each image into superpixel blocks and characterize the superpixels with different feature descriptors to obtain a multi-modal image representation; (2) establish a superpixel graph on the multi-modal image and construct the corresponding Laplacian matrix; (3) characterize the reduced functional space of each image and establish functional mappings between image pairs; (4) align the functional mappings of each modality with the image cues, and introduce latent functions to keep the functional mappings consistent; (5) obtain the functional mapping expression from the multi-modal mapping consistency, and calculate the segmentation function of each image through a joint optimization objective function to obtain the optimal segmentation representation of the image. By exploiting the latent associations among targets that are shared across the different modal feature representations of the images, the method accurately identifies each target region block of an image, enhancing image segmentation performance and effect.

Description

Multi-modal image segmentation method based on functional mapping
Technical Field
The invention belongs to the technical field of image segmentation in image processing, and particularly relates to a multi-modal image segmentation method based on functional mapping.
Background
The vigorous development of digital image technology has given rise to a large number of emerging industries, such as remote-sensing satellite image positioning, medical image analysis, and intelligent traffic recognition, and has driven the maturation of the information society. Images are an important bridge for human perception of the world and are closely tied to the field of vision. Image processing is in ever-growing demand and plays an increasingly critical role in visual applications across artificial intelligence, machine vision, physiology, medicine, meteorology, military science, and other fields. As a preprocessing step, image segmentation lays a solid foundation for high-level semantic analysis of an image, and many applications, such as image recognition, target localization, and edge detection, can improve their performance by using image segmentation techniques.
Image segmentation, as the name implies, divides a given image into regions according to some rule or goal. For example, an image of a lake may be divided into regions representing different semantic categories, such as the lake surface, people, boats, houses, trees, and sky, where the people and boats can be regarded as foreground targets and the rest as background. Traditional image segmentation techniques mainly process cues such as the gray scale, color, texture, and shape of a single image; typical methods include threshold segmentation, region segmentation, edge segmentation, graph-based segmentation, and energy-functional segmentation. The threshold method assigns each pixel a class by comparing its gray value with a set threshold; the edge method detects boundaries from characteristics of the edge gray values, such as step or abrupt changes; the region method groups pixels according to an image similarity criterion and mainly includes watershed, region splitting and merging, and region growing techniques; graph-based segmentation treats the image as an undirected graph with pixels as vertices and edges connecting adjacent pixels, each segmented region being a subgraph; energy-functional segmentation represents the target boundary with a continuous curve and solves for the segmentation by minimizing an energy functional, generally with either a parametric or a geometric active contour model.
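As a minimal illustration of the threshold-segmentation idea described above (not part of the patent; the image values and the threshold are invented for illustration), the following NumPy sketch labels each pixel by comparing its gray value with a fixed threshold:

```python
import numpy as np

def threshold_segment(gray, t):
    """Label each pixel foreground (1) if its gray value exceeds t, else background (0)."""
    return (gray > t).astype(np.uint8)

# Toy 4x4 "image": a bright square on a dark background.
gray = np.array([[10,  12,  11, 13],
                 [12, 200, 210, 11],
                 [13, 205, 198, 12],
                 [11,  13,  12, 10]])
mask = threshold_segment(gray, 128)
```

Here `mask` marks exactly the bright 2x2 region, illustrating why thresholding works only when gray values alone separate the classes.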
The disadvantages of the above methods lie mainly in the following aspects. First, they operate directly on the original pixels of the image, which increases the time complexity of the algorithms and raises the computational cost. Second, low-level processing techniques such as thresholding and edge segmentation are difficult to relate to the semantic features of the image. Third, complementary information between images is ignored; in particular, images containing similar targets share common structural and latent information, which directly affects the segmentation of the image targets. These methods are therefore ill-suited to large-scale image segmentation tasks involving common targets, which in turn hampers large-scale practical applications such as image recognition and target localization. For application fields such as intelligent traffic recognition, medical image analysis, and large-scale image recognition, there is thus an urgent need for an image segmentation technique that can link the low-level features of images with their semantic features and can effectively exploit, from multiple directions, the latent structural information shared among images.
Disclosure of Invention
In order to effectively utilize the latent structural information between images, reduce the computational complexity of image segmentation, and improve the segmentation of targets in images, the invention provides a multi-modal image segmentation method based on functional mapping, comprising the following steps:
1. after acquiring an image set containing a target, performing the following operations:
1) segmenting each image in the set into superpixel blocks, and characterizing the segmented superpixels with different feature descriptors to obtain a multi-modal image representation;
2) establishing a superpixel-based graph on the multi-modal image, and constructing the corresponding Laplacian matrix;
3) characterizing the reduced functional space of each image, and establishing functional mappings between image pairs;
4) aligning the image functional mappings of each modality with the image cues, and introducing latent functions to keep the functional mappings consistent;
5) obtaining the functional mapping expression from the multi-modal mapping consistency, calculating the segmentation function of each image through a joint optimization objective function to obtain the optimal segmentation representation of the image, and completing the image segmentation.
Further, step 1) of segmenting each image in the set into superpixel blocks and characterizing the segmented superpixels with different feature descriptors to obtain a multi-modal image representation is specifically:
1) let the set consist of n associated images; each image contains one or more target classes, and the number of target classes over the whole set is C;
2) taking the pixels of an image as graph vertices, divide each image in the set into q small regions (e.g., q = 100), where q is a positive integer; each small region consists of pixels with similar values and is called a superpixel, and the segmented block belonging to the c-th class in the i-th image is denoted $S_{ic}$, where $i \in \{1, 2, \ldots, n\}$ and $c \in \{1, 2, \ldots, C\}$;
3) represent each superpixel in the image with m different feature descriptors, such as the Scale-Invariant Feature Transform (SIFT), Local Binary Patterns (LBP), and the Histogram of Oriented Gradients (HOG), to obtain a multi-modal feature representation that reflects the intrinsic information of the image from multiple directions; the i-th image thus corresponds to a set of feature matrices, in which the k-th feature descriptor corresponds to the k-th modality $X_i^k$.
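The superpixel and multi-modal representation of step 1) can be sketched as follows. This toy NumPy example is not the patent's procedure: rectangular grid blocks stand in for SLIC-style superpixels, and two simple descriptors (mean intensity and a gradient-magnitude histogram) stand in for SIFT/LBP/HOG, yielding one feature matrix per modality:

```python
import numpy as np

def grid_superpixels(img, rows, cols):
    """Partition img into rows*cols rectangular blocks -- a crude stand-in
    for SLIC-style superpixels (q = rows*cols)."""
    blocks = []
    for r in np.array_split(np.arange(img.shape[0]), rows):
        for c in np.array_split(np.arange(img.shape[1]), cols):
            blocks.append(img[np.ix_(r, c)])
    return blocks

def descriptor_mean(block):
    # Modality 1: mean intensity (stand-in for a color descriptor).
    return np.array([block.mean()])

def descriptor_grad_hist(block, bins=4):
    # Modality 2: histogram of gradient magnitudes (stand-in for HOG).
    gy, gx = np.gradient(block.astype(float))
    mag = np.hypot(gx, gy)
    h, _ = np.histogram(mag, bins=bins, range=(0, 255))
    return h / max(h.sum(), 1)

img = np.arange(64, dtype=float).reshape(8, 8)   # toy 8x8 gray image
blocks = grid_superpixels(img, 2, 2)             # q = 4 superpixels
X1 = np.stack([descriptor_mean(b) for b in blocks])       # q x d1 modality matrix
X2 = np.stack([descriptor_grad_hist(b) for b in blocks])  # q x d2 modality matrix
```

Each `Xk` plays the role of one modality matrix $X_i^k$ of the i-th image: one row of descriptors per superpixel.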
Further, step 2) of establishing a superpixel-based graph on the multi-modal image and constructing the corresponding Laplacian matrix is specifically:
2.1) regard the q superpixels of each image modality as graph vertices, and construct a fully connected superpixel graph over these vertices;
2.2) construct a Laplacian matrix on the superpixel graph of each modality as L = D − W, where the weight matrix W is computed with a Gaussian weighting strategy and D is the diagonal matrix whose diagonal elements are the column sums of W.
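A hedged sketch of step 2.2): building the fully connected superpixel graph with Gaussian weights and forming L = D − W. The feature vectors and the bandwidth `sigma` below are invented for illustration:

```python
import numpy as np

def graph_laplacian(F, sigma=1.0):
    """Build L = D - W for a fully connected superpixel graph:
    W_ij = exp(-||f_i - f_j||^2 / (2 sigma^2)) is the Gaussian weight between
    the feature vectors of superpixels i and j, and D is the diagonal matrix
    of the column sums of W."""
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                             # no self-loops
    D = np.diag(W.sum(axis=0))                           # column sums of W
    return D - W

F = np.array([[0.0], [0.1], [5.0]])   # toy features for q = 3 superpixels
L = graph_laplacian(F)
```

The resulting L is symmetric, its rows sum to zero, and it is positive semi-definite, which is what licenses the eigenvector construction of step 3).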
Step 3) of characterizing the linear reduced functional space of each image and establishing functional mappings between image pairs is specifically:
1) compute the eigenvalues and eigenvectors of the Laplacian matrix of each modality, and take the first p (p < q) eigenvectors to span a reduced functional space; the corresponding eigenvalues of each modality of each image form a diagonal matrix $\Lambda_i^k$;
2) let the segmentation function $f_{ic}$ of each image correspond to $S_{ic}$; the search space of the function is the p-dimensional space spanned by a set of basis vectors, so $f_{ic}$ is represented as a linear combination with the representation coefficients of the i-th image, where $B_i$ consists of the first p eigenvectors of the Laplacian matrix;
3) the relationship between any two images is reflected by a linear functional mapping; for example, the mapping from the subspace of the i-th image to the subspace of the j-th image is denoted $R_{ij}$, i.e., a function f in the subspace of the i-th image is carried into the subspace of the j-th image by computing $R_{ij} f$.
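Step 3) can be illustrated as follows. This sketch is not the patent's formulation: a path-graph Laplacian stands in for the superpixel graph, and the functional map is estimated by plain least squares rather than the regularized objective of step 4):

```python
import numpy as np

def path_laplacian(q):
    """Laplacian L = D - W of a q-vertex path graph (toy superpixel graph)."""
    W = np.eye(q, k=1) + np.eye(q, k=-1)
    return np.diag(W.sum(axis=0)) - W

def reduced_basis(L, p):
    """The first p eigenvectors of L span the reduced functional space."""
    _, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    return vecs[:, :p]                 # q x p basis matrix B_i

def estimate_map(Ai, Aj):
    """Least-squares functional map R_ij with R_ij @ Ai ~= Aj (an unregularized
    stand-in for the patent's alignment objective)."""
    return Aj @ np.linalg.pinv(Ai)

q, p = 6, 3
rng = np.random.default_rng(0)
Bi = reduced_basis(path_laplacian(q), p)
Bj = reduced_basis(path_laplacian(q), p)
X = rng.normal(size=(q, 4))            # toy descriptor functions on the superpixels
Ai, Aj = Bi.T @ X, Bj.T @ X            # reduced-space coefficients of those functions
R = estimate_map(Ai, Aj)               # a function f transfers as R @ (Bi.T @ f)
```

Because both toy "images" share the same Laplacian here, the estimated p x p map reproduces the target coefficients exactly; on real image pairs it only approximates them.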
Step 4) of aligning the image functional mappings of each modality with the image cues and introducing latent functions to keep the functional mappings consistent is specifically:
1) the image cues correspond to the different feature descriptors, and the image functional mappings of all modalities are aligned with the image cues by optimizing the following expression:

$$\min_{R_{ij}} H(R_{ij}) = \sum_{k=1}^{m}\left(\left\|R_{ij}X_i^k - X_j^k\right\|_1 + \alpha\left\|R_{ij}\Lambda_i^k - \Lambda_j^k R_{ij}\right\|_F^2\right) + \beta\left\|R_{ij}\right\|_1,$$

where the constants $\alpha > 0$ and $\beta > 0$; $\|\cdot\|_1$ denotes the L1 norm of a matrix and $\|\cdot\|_F$ the Frobenius norm of a matrix;
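The alignment objective H above can be evaluated directly in NumPy. The sketch below is illustrative only; the values chosen for α and β (and the toy data) are invented:

```python
import numpy as np

def H(R, Xs_i, Xs_j, Lams_i, Lams_j, alpha=1.0, beta=0.1):
    """Multi-modal alignment objective (illustrative form): per-modality
    descriptor-transfer L1 terms, Laplacian-commutativity Frobenius terms,
    and an L1 sparsity penalty on R."""
    val = 0.0
    for Xi, Xj, Li, Lj in zip(Xs_i, Xs_j, Lams_i, Lams_j):
        val += np.abs(R @ Xi - Xj).sum()                # ||R X_i^k - X_j^k||_1
        val += alpha * ((R @ Li - Lj @ R) ** 2).sum()   # ||R Lam_i^k - Lam_j^k R||_F^2
    return val + beta * np.abs(R).sum()                 # beta ||R||_1

# One modality, p = 2: identical data and spectra, identity map.
Xi = [np.array([[1.0], [2.0]])]
Lam = [np.diag([0.5, 1.5])]
val = H(np.eye(2), Xi, Xi, Lam, Lam)   # only the sparsity term survives
```

With identical inputs and the identity map, the transfer and commutativity terms vanish and only the sparsity penalty beta * ||I||_1 = 0.2 remains.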
2) the introduced latent functions are shared by the input images, and the functional-mapping consistency term effectively relates the corresponding latent functions on the images; each latent function appears only in a certain subset of the images; the indicator vector $z_i = [z_{i1}, z_{i2}, \ldots, z_{il}] \in \{0,1\}^l$ of the i-th image characterizes the relationship between the latent functions and the image, while the continuous variables $\Phi_i = [\phi_{i1}, \phi_{i2}, \ldots, \phi_{il}]$ describe each latent function on the image;
the consistency term of the functional mappings in the previous step is expressed as

$$Q(R_{ij}, \Phi_i, z_i) = \gamma\sum_{(i,j)\in E}\left\|R_{ij}\Phi_i - \Phi_j\,\mathrm{diag}(z_i)\right\|_2^2 + \lambda\sum_{i=1}^{n}\left\|\Phi_i - \Phi_i\,\mathrm{diag}(z_i)\right\|_2^2,$$

where the constants $\gamma > 0$ and $\lambda > 0$; $\|\cdot\|_2$ denotes the L2 norm, $\mathrm{diag}(z_i)$ the diagonal matrix formed from $z_i$, and $(i,j) \in E$ the neighbor set of image pairs (e.g., 20 neighbor images may be used).
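Likewise, the consistency term Q can be written out directly; the weights γ and λ below are illustrative values, not the patent's:

```python
import numpy as np

def Q(R_maps, Phi, z, edges, gamma=1.0, lam=1.0):
    """Latent-function consistency term: R_ij should carry Phi_i onto
    Phi_j diag(z_i), and the columns of Phi_i for absent latent functions
    (z_i entries equal to 0) are pushed toward zero."""
    val = 0.0
    for (i, j) in edges:
        val += gamma * ((R_maps[(i, j)] @ Phi[i] - Phi[j] @ np.diag(z[i])) ** 2).sum()
    for i in range(len(Phi)):
        val += lam * ((Phi[i] - Phi[i] @ np.diag(z[i])) ** 2).sum()
    return val

# Two images, l = 2 latent functions, identity map, identical Phi, all-ones z.
Phi = [np.eye(2), np.eye(2)]
z = [np.ones(2), np.ones(2)]
val = Q({(0, 1): np.eye(2)}, Phi, z, edges=[(0, 1)])
```

With perfectly consistent inputs the term is exactly zero; any disagreement between `R @ Phi[i]` and `Phi[j]` raises it.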
Step 5) of obtaining the functional mapping expression from the multi-modal mapping consistency and calculating the segmentation function of each image through a joint optimization objective function is specifically:
1) compute the functional mapping expression from the established multi-modal mapping consistency relationship, i.e.

$$\min_{R_{ij},\,\Phi_i,\,z_i}\ \sum_{(i,j)\in E} H(R_{ij}) + Q(R_{ij}, \Phi_i, z_i),$$

where the variables $\Phi_i$ and $\Phi_j$ are subject to an orthogonality constraint; the problem is solved by alternating optimization over the variables, i.e., fixing two of them and optimizing the remaining one; the variable $z_i$ is initialized to the all-ones vector, and the optimal functional mapping expression $R_{ij}$ is obtained by iterating until the objective converges;
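The "fix two variables, optimize the remaining one" alternation can be sketched on a single image pair. This toy is not the patent's full solver: z_i is held at its all-ones initialization, the R update is a closed-form least squares, and the Φ update is a simple damped step:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
Phi_i = rng.normal(size=(p, p))        # latent-function descriptions on image i
Phi_j = rng.normal(size=(p, p))        # ... and on image j
z_i = np.ones(p)                       # z_i initialized to the all-ones vector
R = np.eye(p)                          # initial functional map

def consistency(R, Phi_i, Phi_j, z_i):
    # One-pair consistency term (gamma = 1 for illustration).
    return ((R @ Phi_i - Phi_j @ np.diag(z_i)) ** 2).sum()

history = [consistency(R, Phi_i, Phi_j, z_i)]
for _ in range(5):
    # Fix Phi and z: closed-form least-squares update of R.
    R = (Phi_j @ np.diag(z_i)) @ np.linalg.pinv(Phi_i)
    # Fix R and z: damped update of Phi_j toward R @ Phi_i.
    Phi_j = 0.5 * Phi_j + 0.5 * R @ Phi_i
    history.append(consistency(R, Phi_i, Phi_j, z_i))
```

Each sweep can only lower the tracked term, illustrating why alternating (block-coordinate) optimization converges on this objective.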
2) regard the image samples as the vertices of a graph and denote the weight between two vertices accordingly; the joint optimization objective of the image segmentation functions is then formed,
where the constant $\zeta > 0$, $c \in \{1, 2, \ldots, C\}$, the symbol $(\cdot)^T$ denotes the transpose of a vector or matrix, the subspace $B_{ik}$ is spanned by the first p eigenvectors of the superpixel-graph Laplacian matrix of the i-th image in the k-th modality, and the segmentation functions $f_{ic}$ of different classes satisfy mutual-exclusion constraints;
3) by solving the objective function of the previous step for its optimal solution, the optimal segmentation function of the i-th image is obtained, from which the optimal segmented blocks belonging to the c-th target class in the image are determined.
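Once each segmentation function is expanded in the basis $B_i$, reading off the optimal segmented blocks amounts to an argmax over classes per superpixel, as in this toy sketch (the basis and coefficients here are invented for illustration):

```python
import numpy as np

def segment_labels(B, A):
    """Given basis B (q x p) and coefficients A (p x C) for the C segmentation
    functions f_c = B @ a_c, assign each superpixel to the class whose
    segmentation function takes the largest value there."""
    F = B @ A                  # q x C: value of each class function on each superpixel
    return F.argmax(axis=1)

B = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])     # toy basis over q = 3 superpixels, p = 2
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])     # C = 2 classes
labels = segment_labels(B, A)  # per-superpixel class assignment
```

Grouping the superpixels that share a label c then yields the segmented block of the c-th target class.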
The multi-modal image segmentation method based on functional mapping provided by the invention has the following advantages: it reduces the computational cost by over-segmenting the original pixels of an image into superpixels; it reflects the content of the image from the perspective of different descriptors by constructing a multi-modal superpixel representation; and, by establishing functional mappings between image pairs in the reduced functional space and keeping them consistent by means of latent functions, it effectively links the low-level features of an image to its high-level semantics, thereby improving the image segmentation effect and laying a solid foundation for visual applications such as image recognition and target localization.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The invention is further illustrated with reference to FIG. 1:
1. after acquiring an image set containing a target, performing the following operations:
1) segmenting each image in the set into superpixel blocks, and characterizing the segmented superpixels with different feature descriptors to obtain a multi-modal image representation;
2) establishing a superpixel-based graph on the multi-modal image, and constructing the corresponding Laplacian matrix;
3) characterizing the reduced functional space of each image, and establishing functional mappings between image pairs;
4) aligning the image functional mappings of each modality with the image cues, and introducing latent functions to keep the functional mappings consistent;
5) obtaining the functional mapping expression from the multi-modal mapping consistency, calculating the segmentation function of each image through a joint optimization objective function to obtain the optimal segmentation representation of the image, and completing the image segmentation.
The step 1) of segmenting each image in the set into superpixel blocks, and representing the segmented superpixels by using different feature descriptors to obtain a multi-modal image representation specifically comprises the following steps:
1) let the set consist of n associated images; each image contains one or more target classes, and the number of target classes over the whole set is C;
2) taking the pixels of an image as graph vertices, divide each image in the set into q small regions (e.g., q = 100), where q is a positive integer; each small region consists of pixels with similar values and is called a superpixel, and the segmented block belonging to the c-th class in the i-th image is denoted $S_{ic}$, where $i \in \{1, 2, \ldots, n\}$ and $c \in \{1, 2, \ldots, C\}$;
3) represent each superpixel in the image with m different feature descriptors, such as the Scale-Invariant Feature Transform (SIFT), Local Binary Patterns (LBP), and the Histogram of Oriented Gradients (HOG), to obtain a multi-modal feature representation that reflects the intrinsic information of the image from multiple directions; the i-th image thus corresponds to a set of feature matrices, in which the k-th feature descriptor corresponds to the k-th modality $X_i^k$.
Establishing a superpixel-based graph on the multi-modal image in step 2) and constructing the corresponding Laplacian matrix is specifically:
2.1) regard the q superpixels of each image modality as graph vertices, and construct a fully connected superpixel graph over these vertices;
2.2) construct a Laplacian matrix on the superpixel graph of each modality as L = D − W, where the weight matrix W is computed with a Gaussian weighting strategy and D is the diagonal matrix whose diagonal elements are the column sums of W.
The linear reduced functional space of each image is characterized in step 3), and functional mappings between image pairs are established, specifically:
1) compute the eigenvalues and eigenvectors of the Laplacian matrix of each modality, and take the first p (p < q) eigenvectors to span a reduced functional space; the corresponding eigenvalues of each modality of each image form a diagonal matrix $\Lambda_i^k$;
2) let the segmentation function $f_{ic}$ of each image correspond to $S_{ic}$; the search space of the function is the p-dimensional space spanned by a set of basis vectors, so $f_{ic}$ is represented as a linear combination with the representation coefficients of the i-th image, where $B_i$ consists of the first p eigenvectors of the Laplacian matrix;
3) the relationship between any two images is reflected by a linear functional mapping; for example, the mapping from the subspace of the i-th image to the subspace of the j-th image is denoted $R_{ij}$, i.e., a function f in the subspace of the i-th image is carried into the subspace of the j-th image by computing $R_{ij} f$.
Aligning the image functional mappings of each modality with the image cues in step 4) and introducing latent functions to keep the functional mappings consistent is specifically:
1) the image cues correspond to the different feature descriptors, and the image functional mappings of all modalities are aligned with the image cues by optimizing the following expression:

$$\min_{R_{ij}} H(R_{ij}) = \sum_{k=1}^{m}\left(\left\|R_{ij}X_i^k - X_j^k\right\|_1 + \alpha\left\|R_{ij}\Lambda_i^k - \Lambda_j^k R_{ij}\right\|_F^2\right) + \beta\left\|R_{ij}\right\|_1,$$

where the constants $\alpha > 0$ and $\beta > 0$; $\|\cdot\|_1$ denotes the L1 norm of a matrix and $\|\cdot\|_F$ the Frobenius norm of a matrix;
2) the introduced latent functions are shared by the input images, and the functional-mapping consistency term effectively relates the corresponding latent functions on the images; each latent function appears only in a certain subset of the images; the indicator vector $z_i = [z_{i1}, z_{i2}, \ldots, z_{il}] \in \{0,1\}^l$ of the i-th image characterizes the relationship between the latent functions and the image, while the continuous variables $\Phi_i = [\phi_{i1}, \phi_{i2}, \ldots, \phi_{il}]$ describe each latent function on the image;
the consistency term of the functional mappings in the previous step is expressed as

$$Q(R_{ij}, \Phi_i, z_i) = \gamma\sum_{(i,j)\in E}\left\|R_{ij}\Phi_i - \Phi_j\,\mathrm{diag}(z_i)\right\|_2^2 + \lambda\sum_{i=1}^{n}\left\|\Phi_i - \Phi_i\,\mathrm{diag}(z_i)\right\|_2^2,$$

where the constants $\gamma > 0$ and $\lambda > 0$; $\|\cdot\|_2$ denotes the L2 norm, $\mathrm{diag}(z_i)$ the diagonal matrix formed from $z_i$, and $(i,j) \in E$ the neighbor set of image pairs (e.g., 20 neighbor images may be used).
Obtaining the functional mapping expression from the multi-modal mapping consistency in step 5) and calculating the segmentation function of each image through the joint optimization objective function is specifically:
1) compute the functional mapping expression from the established multi-modal mapping consistency relationship, i.e.

$$\min_{R_{ij},\,\Phi_i,\,z_i}\ \sum_{(i,j)\in E} H(R_{ij}) + Q(R_{ij}, \Phi_i, z_i),$$

where the variables $\Phi_i$ and $\Phi_j$ are subject to an orthogonality constraint; the problem is solved by alternating optimization over the variables, i.e., fixing two of them and optimizing the remaining one; the variable $z_i$ is initialized to the all-ones vector, and the optimal functional mapping expression $R_{ij}$ is obtained by iterating until the objective converges;
2) regard the image samples as the vertices of a graph and denote the weight between two vertices accordingly; the joint optimization objective of the image segmentation functions is then formed,
where the constant $\zeta > 0$, $c \in \{1, 2, \ldots, C\}$, the symbol $(\cdot)^T$ denotes the transpose of a vector or matrix, the subspace $B_{ik}$ is spanned by the first p eigenvectors of the superpixel-graph Laplacian matrix of the i-th image in the k-th modality, and the segmentation functions $f_{ic}$ of different classes satisfy mutual-exclusion constraints;
3) by solving the objective function of the previous step for its optimal solution, the optimal segmentation function of the i-th image is obtained, from which the optimal segmented blocks belonging to the c-th target class in the image are determined.

Claims (6)

1. A multi-modal image segmentation method based on functional mapping is characterized in that the following operations are carried out on an image set containing a target:
1) segmenting each image in the set into superpixel blocks, and characterizing the segmented superpixels with different feature descriptors to obtain a multi-modal image representation;
2) establishing a superpixel-based graph on the multi-modal image, and constructing the corresponding Laplacian matrix;
3) characterizing the reduced functional space of each image, and establishing functional mappings between image pairs;
4) aligning the image functional mappings of each modality with the image cues, and introducing latent functions to keep the functional mappings consistent;
5) obtaining the functional mapping expression from the multi-modal mapping consistency, calculating the segmentation function of each image through a joint optimization objective function to obtain the optimal segmentation representation of the image, and completing the image segmentation.
2. The multi-modal functional mapping-based image segmentation method of claim 1, wherein: in the step 1), each image in the set is segmented into superpixel blocks, and the segmented superpixels are characterized by different feature descriptors to obtain a multi-modal image representation, specifically:
1.1) the set is composed of n associated images; each image contains one or more target classes, and the number of target classes over the whole set is C;
1.2) taking the pixels of an image as graph vertices, divide each image in the set into q small regions, where q is a positive integer; each small region consists of pixels with similar values and is called a superpixel, and the segmented block belonging to the c-th class in the i-th image is denoted $S_{ic}$, where $i \in \{1, 2, \ldots, n\}$ and $c \in \{1, 2, \ldots, C\}$;
1.3) use m different feature descriptors to represent each superpixel in the image, thereby obtaining a multi-modal feature representation that reflects the intrinsic information of the image from multiple directions; the i-th image corresponds to a set of feature matrices, in which the k-th feature descriptor corresponds to the k-th modality $X_i^k$.
3. The multi-modal functional mapping-based image segmentation method of claim 1, wherein: establishing a superpixel-based graph on the multi-modal image in step 2) and constructing the corresponding Laplacian matrix is specifically:
2.1) regard the q superpixels of each image modality as graph vertices, and construct a fully connected superpixel graph over these vertices;
2.2) construct a Laplacian matrix on the superpixel graph of each modality as L = D − W, where the weight matrix W is computed with a Gaussian weighting strategy and D is the diagonal matrix whose diagonal elements are the column sums of W.
4. The multi-modal functional mapping-based image segmentation method of claim 1, wherein: the reduction functional space of each image is characterized in the step 3), and functional mapping between image pairs is established, specifically:
3.1) computing the Laplace matrix of the multimodal imageAnd taking the first p eigenvectors to stretch into a reduced functional spaceAnd the corresponding eigenvalues on each mode of each image form a diagonal matrix respectivelyWherein p is<q;
3.2) setting each image segmentation function as foiCorresponds to SicThe search space of the function corresponds to a set of basis vectorsFormed by stretchingSpace of p dimensionAnd f isoiRepresenting coefficients of the ith image as a linear function combinationWherein B isiIs formed by the first p eigenvectors of the Laplace matrix;
3.3) reflecting the pairwise relation between any two images through a linear functional mapping: the mapping from the subspace of the ith image to the subspace of the jth image is denoted R_ij, i.e., a function f in the subspace of the ith image is mapped into the subspace of the jth image by computing R_ij f.
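One simple way to realize step 3.3) is a least-squares estimate of R_ij from corresponding descriptor functions expressed in the two reduced bases. This is a hedged sketch: the patent instead obtains R_ij from the optimization of step 4.1), and the function and argument names here are illustrative.

```python
import numpy as np

def estimate_functional_map(Bi, Bj, Fi, Fj):
    """Least-squares functional map R_ij from span(Bi) to span(Bj):
    find R with R @ (Bi.T @ Fi) ~= Bj.T @ Fj, where Fi, Fj are
    corresponding descriptor functions (q x d) on images i and j."""
    Ai = Bi.T @ Fi                      # p x d coefficients on image i
    Aj = Bj.T @ Fj                      # p x d coefficients on image j
    R, *_ = np.linalg.lstsq(Ai.T, Aj.T, rcond=None)  # solves Ai.T @ R.T = Aj.T
    return R.T

def map_function(R, Bi, Bj, f):
    """Transport a function f from image i to image j: apply R_ij to
    the coefficients of f, then re-expand in image j's basis."""
    return Bj @ (R @ (Bi.T @ f))
```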
5. The multi-modal functional mapping-based image segmentation method of claim 1, wherein the step 4) of aligning the image functional mapping of each modality with the image cues and introducing latent functions to keep the functional mappings consistent specifically comprises:
4.1) the image cues correspond to the different feature descriptors, and the image functional mapping of each modality is aligned with the image cues by optimizing the following expression:
min_{R_ij} H(R_ij) = Σ_{k=1}^{m} ( ||R_ij X_i^k − X_j^k||_1 + α ||R_ij Λ_i^k − Λ_j^k R_ij||_F^2 ) + β ||R_ij||_1,

where the constants α > 0 and β > 0, the symbol ||·||_1 denotes the L1 norm of a matrix, and ||·||_F denotes the Frobenius norm of a matrix;
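Evaluating H(R_ij) for a candidate map can be sketched as below. Note the middle (commutativity) term is reconstructed from the garbled formula and is an assumption, as are the default values of α and β.

```python
import numpy as np

def H_value(R, Xs_i, Xs_j, Lams_i, Lams_j, alpha=0.1, beta=0.01):
    """Value of the alignment objective H(R_ij), summed over the m
    modalities: descriptor alignment (L1), commutativity with the
    Laplacian eigenvalues (Frobenius), plus an L1 sparsity penalty."""
    h = 0.0
    for Xi, Xj, Li, Lj in zip(Xs_i, Xs_j, Lams_i, Lams_j):
        h += np.abs(R @ Xi - Xj).sum()                    # ||R X_i^k - X_j^k||_1
        h += alpha * np.linalg.norm(R @ Li - Lj @ R, 'fro')**2
    return h + beta * np.abs(R).sum()                     # beta ||R||_1
```

For the identity map on identical inputs, only the sparsity penalty survives, which matches the objective's structure.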
4.2) the introduced latent functions are shared by the input images, and the corresponding latent functions on each image are effectively related by a functional-mapping consistency term; each latent function appears only in a certain subset of the images. For the ith image, the indicator vector z_i = [z_i1, z_i2, …, z_il] ∈ {0, 1}^l characterizes the relationship between the latent functions and the image, while the continuous variables Φ_i = [φ_i1, φ_i2, …, φ_il] describe the latent functions on the image;
the correspondence term of the functional mapping in the last step is expressed as
Q(R_ij, Φ_i, z_i) = γ Σ_{(i,j)∈E} ||R_ij Φ_i − Φ_j diag(z_i)||_2^2 + λ Σ_{i=1}^{n} ||Φ_i − Φ_i diag(z_i)||_2^2,

where the constants γ > 0 and λ > 0, the symbol ||·||_2 denotes the L2 norm of a matrix, diag(z_i) denotes a diagonal matrix, and (i, j) ∈ E denotes the neighbor set of image pairs.
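The consistency term can be evaluated directly from the formula above; a minimal sketch (dictionary-based bookkeeping and default γ, λ are illustrative choices):

```python
import numpy as np

def Q_value(R_pairs, Phi, z, gamma=1.0, lam=1.0):
    """Consistency term Q: active latent functions (z_i = 1) should be
    transported by R_ij onto their counterparts on neighboring images,
    and inactive ones (z_i = 0) should vanish on the image."""
    q = 0.0
    for (i, j), R in R_pairs.items():                 # (i, j) in the neighbor set E
        q += gamma * np.linalg.norm(R @ Phi[i] - Phi[j] @ np.diag(z[i]))**2
    for i, P in Phi.items():
        q += lam * np.linalg.norm(P - P @ np.diag(z[i]))**2
    return q
```

When the latent functions are perfectly transported and all indicators are active, Q vanishes; switching an indicator off penalizes any remaining mass in that latent function.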
6. The multi-modal functional mapping-based image segmentation method of claim 1, wherein the step 5) of obtaining the functional mapping representation according to the multi-modal mapping consistency and computing the segmentation function corresponding to each image through a joint optimization objective function specifically comprises:
5.1) computing the functional mapping representation according to the multi-modal mapping consistency relationship established in step 4), i.e.,
min_{R_ij, Φ_i, z_i} Σ_{(i,j)∈E} H(R_ij) + Q(R_ij, Φ_i, z_i),

where the variables Φ_i and Φ_j are subject to an orthogonality constraint; the problem is solved by a variable-alternation optimization method, i.e., fixing two of the variables and optimizing the remaining one; the variable z_i is initialized to the all-ones vector, and the optimal functional mapping representation R_ij is computed through multiple iterations until the objective converges;
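The variable-alternation strategy of step 5.1) is the standard block coordinate descent pattern: cycle through the variable blocks, exactly minimizing over one while the others are held fixed. A toy numeric illustration of the pattern (not the patent's actual subproblems):

```python
def alternating_minimization(iters=200):
    """Minimize f(x, y, z) = (x - y)**2 + (y - z)**2 + (z - 1)**2 by
    cyclically fixing two variables and exactly minimizing the third;
    the iterates converge to the global minimizer (1, 1, 1)."""
    x = y = z = 0.0
    for _ in range(iters):
        x = y                  # argmin over x with y, z fixed
        y = 0.5 * (x + z)      # argmin over y with x, z fixed
        z = 0.5 * (y + 1.0)    # argmin over z with x, y fixed
    return x, y, z
```

Each block update can only decrease the objective, which is why the iteration in 5.1) converges.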
5.2) taking each image sample as a vertex of an image graph and recording the weight between two vertices, the joint optimization objective of the image segmentation functions is
where the constant ζ > 0 and c ∈ {1, 2, …, C}, the symbol (·)^T denotes the transpose of a vector or matrix, the subspace B_ik is spanned by the first p eigenvectors of the superpixel-graph Laplacian matrix of the ith image in the kth modality, and the segmentation functions f_ic of different classes satisfy mutual-exclusion constraints;
5.3) obtaining the optimal segmentation function of the ith image by solving the objective function in 5.2), from which the optimal segmentation block belonging to the c-th object class in the image is determined.
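Reading off the final segmentation from the optimal functions of step 5.3) reduces to a per-superpixel argmax under the mutual-exclusion constraints; a minimal sketch (the stacked-matrix layout is an illustrative convention):

```python
import numpy as np

def read_off_segments(F):
    """F is q x C, column c holding the segmentation function f_ic on
    the q superpixels of one image; each superpixel is assigned to the
    class with the largest function value, yielding the blocks S_ic."""
    return np.argmax(F, axis=1)
```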
CN201510040592.4A 2015-01-27 2015-01-27 A kind of multi-modality images dividing method based on Functional Mapping Active CN104778683B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510040592.4A CN104778683B (en) 2015-01-27 2015-01-27 A kind of multi-modality images dividing method based on Functional Mapping


Publications (2)

Publication Number Publication Date
CN104778683A true CN104778683A (en) 2015-07-15
CN104778683B CN104778683B (en) 2017-06-27

Family

ID=53620129


Country Status (1)

Country Link
CN (1) CN104778683B (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093470A (en) * 2013-01-23 2013-05-08 天津大学 Rapid multi-modal image synergy segmentation method with unrelated scale feature
US20140050391A1 (en) * 2012-08-17 2014-02-20 Nec Laboratories America, Inc. Image segmentation for large-scale fine-grained recognition


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SU PO et al.: "Superpixel-based multi-modal MRI brain glioma segmentation", Journal of Northwestern Polytechnical University (西北工业大学学报) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069787A (en) * 2015-08-04 2015-11-18 浙江慧谷信息技术有限公司 Image joint segmentation algorithm based on consistency function space mapping
CN106202281A (en) * 2016-06-28 2016-12-07 广东工业大学 A kind of multi-modal data represents learning method and system
CN111382776A (en) * 2018-12-26 2020-07-07 株式会社日立制作所 Object recognition device and object recognition method
CN109993756A (en) * 2019-04-09 2019-07-09 中康龙马(北京)医疗健康科技有限公司 A kind of general medical image cutting method based on graph model Yu continuous successive optimization
CN109993756B (en) * 2019-04-09 2022-04-15 中康龙马(北京)医疗健康科技有限公司 General medical image segmentation method based on graph model and continuous stepwise optimization

Also Published As

Publication number Publication date
CN104778683B (en) 2017-06-27

Similar Documents

Publication Publication Date Title
Liu et al. RoadNet: Learning to comprehensively analyze road networks in complex urban scenes from high-resolution remotely sensed images
Lahoud et al. 3d instance segmentation via multi-task metric learning
Melekhov et al. Dgc-net: Dense geometric correspondence network
Giraud et al. SuperPatchMatch: An algorithm for robust correspondences using superpixel patches
Choong et al. Image segmentation via normalised cuts and clustering algorithm
Nguyen et al. Satellite image classification using convolutional learning
CN107146219B (en) Image significance detection method based on manifold regularization support vector machine
CN109840518B (en) Visual tracking method combining classification and domain adaptation
Khan et al. A modified adaptive differential evolution algorithm for color image segmentation
CN104778683A (en) Multi-modal image segmenting method based on functional mapping
CN111091129A (en) Image salient region extraction method based on multi-color characteristic manifold sorting
Sima et al. Bottom-up merging segmentation for color images with complex areas
CN112330639A (en) Significance detection method for color-thermal infrared image
Wu et al. A cascaded CNN-based method for monocular vision robotic grasping
Xiang et al. Turbopixel segmentation using eigen-images
CN114638953B (en) Point cloud data segmentation method and device and computer readable storage medium
Geng et al. A novel color image segmentation algorithm based on JSEG and Normalized Cuts
Hassan et al. Salient object detection based on CNN fusion of two types of saliency models
CN109636818A A kind of Laplace's canonical constrains the Target Segmentation method of lower low-rank sparse optimization
Gao et al. SAMM: surroundedness and absorption Markov model based visual saliency detection in images
Mhamdi et al. A local approach for 3D object recognition through a set of size functions
Rahimi et al. Single image ground plane estimation
Cai et al. Higher level segmentation: Detecting and grouping of invariant repetitive patterns
Xing et al. Improving Reliability of Heterogeneous Change Detection by Sample Synthesis and Knowledge Transfer
Sun et al. An enhanced affinity graph for image segmentation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220808

Address after: Room 406, building 19, haichuangyuan, No. 998, Wenyi West Road, Yuhang District, Hangzhou City, Zhejiang Province

Patentee after: HANGZHOU HUICUI INTELLIGENT TECHNOLOGY CO.,LTD.

Address before: 310018 No. 2 street, Xiasha Higher Education Zone, Hangzhou, Zhejiang

Patentee before: HANGZHOU DIANZI University

TR01 Transfer of patent right
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Multimodal Image Segmentation Method Based on Functional Mapping

Granted publication date: 20170627

Pledgee: Guotou Taikang Trust Co.,Ltd.

Pledgor: HANGZHOU HUICUI INTELLIGENT TECHNOLOGY CO.,LTD.

Registration number: Y2024980004919

PE01 Entry into force of the registration of the contract for pledge of patent right