
CN104778683A - Multi-modal image segmenting method based on functional mapping - Google Patents


Info

Publication number
CN104778683A
Authority
CN
China
Prior art keywords
image
modal
functional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510040592.4A
Other languages
Chinese (zh)
Other versions
CN104778683B (en)
Inventor
李平
李黎
李建军
俞俊
Current Assignee
Hangzhou Huicui Intelligent Technology Co ltd
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN201510040592.4A
Publication of CN104778683A
Application granted
Publication of CN104778683B
Legal status: Active
Anticipated expiration

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a multi-modal image segmentation method based on functional mapping. For an image set containing targets, the method comprises the following steps: (1) segment each image into superpixel blocks and characterize the superpixels with different feature descriptors to obtain a multi-modal image representation; (2) establish a superpixel graph on the multi-modal image and construct the corresponding Laplacian matrix; (3) characterize the reduced functional space of each image and establish functional mappings between image pairs; (4) align the functional mappings of each modality with the image cues, and introduce latent functions to keep the functional mappings consistent; (5) obtain the functional mapping expression from the multi-modal mapping consistency, and calculate the segmentation function of each image through a joint optimization objective function to obtain the optimal segmentation representation of the image. By exploiting the latent associations among targets that are shared across the different modal feature representations of the images, the method accurately identifies each target region block of an image, enhancing image segmentation performance and effect.

Description

Multi-modal image segmentation method based on functional mapping
Technical Field
The invention belongs to the technical field of image segmentation in image processing, and particularly relates to a multi-modal image segmentation method based on functional mapping.
Background
The vigorous development of digital image technology has given rise to a large number of emerging industries, such as remote-sensing satellite image positioning, medical image analysis, and intelligent traffic recognition, and has driven the maturation of the information society. Images are an important bridge for human perception of the world and are closely tied to the field of vision. Image processing is in ever-growing demand and plays an increasingly critical role in visual applications across artificial intelligence, machine vision, physiology, medicine, meteorology, military science, and other fields. As a preprocessing step, image segmentation lays a solid foundation for high-level semantic analysis of an image, and many applications, such as image recognition, target localization, and edge detection, can improve their performance by using image segmentation techniques.
Image segmentation, as the name implies, divides a given image into regions according to some rule or goal. For example, an image of a lake may be divided into regions representing different semantic categories, such as the lake surface, people, boats, houses, trees, and sky, where the people and boats can be regarded as foreground targets and the rest as background. Traditional image segmentation techniques mainly process cues such as the gray scale, color, texture, and shape of a single image; typical methods include threshold segmentation, region segmentation, edge segmentation, graph-based segmentation, and energy-functional segmentation. The threshold method assigns each pixel a class by comparing its gray value with a set threshold; the edge method detects boundaries from characteristics of the edge gray values, such as step or abrupt changes; the region method groups pixels according to an image similarity criterion and mainly includes watershed, region splitting and merging, and region growing techniques; graph-based segmentation treats the image as an undirected graph with pixels as vertices and edges connecting adjacent pixels, each segmented region being a subgraph; energy-functional segmentation represents the target boundary with a continuous curve and solves for the segmentation by minimizing an energy functional, generally with either a parametric or a geometric active contour model.
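As a minimal illustration of the threshold-segmentation idea described above (not part of the patent; the image values and the threshold are invented for illustration), the following NumPy sketch labels each pixel by comparing its gray value with a fixed threshold:

```python
import numpy as np

def threshold_segment(gray, t):
    """Label each pixel foreground (1) if its gray value exceeds t, else background (0)."""
    return (gray > t).astype(np.uint8)

# Toy 4x4 "image": a bright square on a dark background.
gray = np.array([[10,  12,  11, 13],
                 [12, 200, 210, 11],
                 [13, 205, 198, 12],
                 [11,  13,  12, 10]])
mask = threshold_segment(gray, 128)
```

Here `mask` marks exactly the bright 2x2 region, illustrating why thresholding works only when gray values alone separate the classes.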
The disadvantages of the above methods lie mainly in the following aspects. First, they operate directly on the original pixels of the image, which increases the time complexity of the algorithms and raises the computational cost. Second, low-level processing techniques such as thresholding and edge segmentation are difficult to relate to the semantic features of the image. Third, complementary information between images is ignored; in particular, images containing similar targets share common structural and latent information, which directly affects the segmentation of the image targets. These methods are therefore ill-suited to large-scale image segmentation tasks involving common targets, which in turn hampers large-scale practical applications such as image recognition and target localization. For application fields such as intelligent traffic recognition, medical image analysis, and large-scale image recognition, there is thus an urgent need for an image segmentation technique that can link the low-level features of images with their semantic features and can effectively exploit, from multiple directions, the latent structural information shared among images.
Disclosure of Invention
In order to effectively utilize the latent structural information between images, reduce the computational complexity of image segmentation, and improve the segmentation of targets in images, the invention provides a multi-modal image segmentation method based on functional mapping, comprising the following steps:
1. after acquiring an image set containing a target, performing the following operations:
1) segmenting each image in the set into superpixel blocks, and characterizing the segmented superpixels with different feature descriptors to obtain a multi-modal image representation;
2) establishing a superpixel-based graph on the multi-modal image, and constructing the corresponding Laplacian matrix;
3) characterizing the reduced functional space of each image, and establishing functional mappings between image pairs;
4) aligning the image functional mappings of each modality with the image cues, and introducing latent functions to keep the functional mappings consistent;
5) obtaining the functional mapping expression from the multi-modal mapping consistency, calculating the segmentation function of each image through a joint optimization objective function to obtain the optimal segmentation representation of the image, and completing the image segmentation.
Further, step 1) of segmenting each image in the set into superpixel blocks and characterizing the segmented superpixels with different feature descriptors to obtain a multi-modal image representation is specifically:
1) let the set consist of n associated images; each image contains one or more target classes, and the number of target classes over the whole set is C;
2) taking the pixels of an image as graph vertices, divide each image in the set into q small regions (e.g., q = 100), where q is a positive integer; each small region consists of pixels with similar values and is called a superpixel, and the segmented block belonging to the c-th class in the i-th image is denoted $S_{ic}$, where $i \in \{1, 2, \ldots, n\}$ and $c \in \{1, 2, \ldots, C\}$;
3) represent each superpixel in the image with m different feature descriptors, such as the Scale-Invariant Feature Transform (SIFT), Local Binary Patterns (LBP), and the Histogram of Oriented Gradients (HOG), to obtain a multi-modal feature representation that reflects the intrinsic information of the image from multiple directions; the i-th image thus corresponds to a set of feature matrices, in which the k-th feature descriptor corresponds to the k-th modality $X_i^k$.
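The superpixel and multi-modal representation of step 1) can be sketched as follows. This toy NumPy example is not the patent's procedure: rectangular grid blocks stand in for SLIC-style superpixels, and two simple descriptors (mean intensity and a gradient-magnitude histogram) stand in for SIFT/LBP/HOG, yielding one feature matrix per modality:

```python
import numpy as np

def grid_superpixels(img, rows, cols):
    """Partition img into rows*cols rectangular blocks -- a crude stand-in
    for SLIC-style superpixels (q = rows*cols)."""
    blocks = []
    for r in np.array_split(np.arange(img.shape[0]), rows):
        for c in np.array_split(np.arange(img.shape[1]), cols):
            blocks.append(img[np.ix_(r, c)])
    return blocks

def descriptor_mean(block):
    # Modality 1: mean intensity (stand-in for a color descriptor).
    return np.array([block.mean()])

def descriptor_grad_hist(block, bins=4):
    # Modality 2: histogram of gradient magnitudes (stand-in for HOG).
    gy, gx = np.gradient(block.astype(float))
    mag = np.hypot(gx, gy)
    h, _ = np.histogram(mag, bins=bins, range=(0, 255))
    return h / max(h.sum(), 1)

img = np.arange(64, dtype=float).reshape(8, 8)   # toy 8x8 gray image
blocks = grid_superpixels(img, 2, 2)             # q = 4 superpixels
X1 = np.stack([descriptor_mean(b) for b in blocks])       # q x d1 modality matrix
X2 = np.stack([descriptor_grad_hist(b) for b in blocks])  # q x d2 modality matrix
```

Each `Xk` plays the role of one modality matrix $X_i^k$ of the i-th image: one row of descriptors per superpixel.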
Further, step 2) of establishing a superpixel-based graph on the multi-modal image and constructing the corresponding Laplacian matrix is specifically:
2.1) regard the q superpixels of each image modality as graph vertices, and construct a fully connected superpixel graph over these vertices;
2.2) construct a Laplacian matrix on the superpixel graph of each modality as L = D − W, where the weight matrix W is computed with a Gaussian weighting strategy and D is the diagonal matrix whose diagonal elements are the column sums of W.
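A hedged sketch of step 2.2): building the fully connected superpixel graph with Gaussian weights and forming L = D − W. The feature vectors and the bandwidth `sigma` below are invented for illustration:

```python
import numpy as np

def graph_laplacian(F, sigma=1.0):
    """Build L = D - W for a fully connected superpixel graph:
    W_ij = exp(-||f_i - f_j||^2 / (2 sigma^2)) is the Gaussian weight between
    the feature vectors of superpixels i and j, and D is the diagonal matrix
    of the column sums of W."""
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                             # no self-loops
    D = np.diag(W.sum(axis=0))                           # column sums of W
    return D - W

F = np.array([[0.0], [0.1], [5.0]])   # toy features for q = 3 superpixels
L = graph_laplacian(F)
```

The resulting L is symmetric, its rows sum to zero, and it is positive semi-definite, which is what licenses the eigenvector construction of step 3).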
Step 3) of characterizing the linear reduced functional space of each image and establishing functional mappings between image pairs is specifically:
1) compute the eigenvalues and eigenvectors of the Laplacian matrix of each modality, and take the first p (p < q) eigenvectors to span a reduced functional space; the corresponding eigenvalues of each modality of each image form a diagonal matrix $\Lambda_i^k$;
2) let the segmentation function $f_{ic}$ of each image correspond to $S_{ic}$; the search space of the function is the p-dimensional space spanned by a set of basis vectors, so $f_{ic}$ is represented as a linear combination with the representation coefficients of the i-th image, where $B_i$ consists of the first p eigenvectors of the Laplacian matrix;
3) the relationship between any two images is reflected by a linear functional mapping; for example, the mapping from the subspace of the i-th image to the subspace of the j-th image is denoted $R_{ij}$, i.e., a function f in the subspace of the i-th image is carried into the subspace of the j-th image by computing $R_{ij} f$.
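Step 3) can be illustrated as follows. This sketch is not the patent's formulation: a path-graph Laplacian stands in for the superpixel graph, and the functional map is estimated by plain least squares rather than the regularized objective of step 4):

```python
import numpy as np

def path_laplacian(q):
    """Laplacian L = D - W of a q-vertex path graph (toy superpixel graph)."""
    W = np.eye(q, k=1) + np.eye(q, k=-1)
    return np.diag(W.sum(axis=0)) - W

def reduced_basis(L, p):
    """The first p eigenvectors of L span the reduced functional space."""
    _, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    return vecs[:, :p]                 # q x p basis matrix B_i

def estimate_map(Ai, Aj):
    """Least-squares functional map R_ij with R_ij @ Ai ~= Aj (an unregularized
    stand-in for the patent's alignment objective)."""
    return Aj @ np.linalg.pinv(Ai)

q, p = 6, 3
rng = np.random.default_rng(0)
Bi = reduced_basis(path_laplacian(q), p)
Bj = reduced_basis(path_laplacian(q), p)
X = rng.normal(size=(q, 4))            # toy descriptor functions on the superpixels
Ai, Aj = Bi.T @ X, Bj.T @ X            # reduced-space coefficients of those functions
R = estimate_map(Ai, Aj)               # a function f transfers as R @ (Bi.T @ f)
```

Because both toy "images" share the same Laplacian here, the estimated p x p map reproduces the target coefficients exactly; on real image pairs it only approximates them.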
Step 4) of aligning the image functional mappings of each modality with the image cues and introducing latent functions to keep the functional mappings consistent is specifically:
1) the image cues correspond to the different feature descriptors, and the image functional mappings of all modalities are aligned with the image cues by optimizing the following expression:

$$\min_{R_{ij}} H(R_{ij}) = \sum_{k=1}^{m}\left(\left\|R_{ij}X_i^k - X_j^k\right\|_1 + \alpha\left\|R_{ij}\Lambda_i^k - \Lambda_j^k R_{ij}\right\|_F^2\right) + \beta\left\|R_{ij}\right\|_1,$$

where the constants $\alpha > 0$ and $\beta > 0$; $\|\cdot\|_1$ denotes the L1 norm of a matrix and $\|\cdot\|_F$ the Frobenius norm of a matrix;
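The alignment objective H above can be evaluated directly in NumPy. The sketch below is illustrative only; the values chosen for α and β (and the toy data) are invented:

```python
import numpy as np

def H(R, Xs_i, Xs_j, Lams_i, Lams_j, alpha=1.0, beta=0.1):
    """Multi-modal alignment objective (illustrative form): per-modality
    descriptor-transfer L1 terms, Laplacian-commutativity Frobenius terms,
    and an L1 sparsity penalty on R."""
    val = 0.0
    for Xi, Xj, Li, Lj in zip(Xs_i, Xs_j, Lams_i, Lams_j):
        val += np.abs(R @ Xi - Xj).sum()                # ||R X_i^k - X_j^k||_1
        val += alpha * ((R @ Li - Lj @ R) ** 2).sum()   # ||R Lam_i^k - Lam_j^k R||_F^2
    return val + beta * np.abs(R).sum()                 # beta ||R||_1

# One modality, p = 2: identical data and spectra, identity map.
Xi = [np.array([[1.0], [2.0]])]
Lam = [np.diag([0.5, 1.5])]
val = H(np.eye(2), Xi, Xi, Lam, Lam)   # only the sparsity term survives
```

With identical inputs and the identity map, the transfer and commutativity terms vanish and only the sparsity penalty beta * ||I||_1 = 0.2 remains.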
2) the introduced latent functions are shared by the input images, and the functional-mapping consistency term effectively relates the corresponding latent functions on the images; each latent function appears only in a certain subset of the images; the indicator vector $z_i = [z_{i1}, z_{i2}, \ldots, z_{il}] \in \{0,1\}^l$ of the i-th image characterizes the relationship between the latent functions and the image, while the continuous variables $\Phi_i = [\phi_{i1}, \phi_{i2}, \ldots, \phi_{il}]$ describe each latent function on the image;
the consistency term of the functional mappings in the previous step is expressed as

$$Q(R_{ij}, \Phi_i, z_i) = \gamma\sum_{(i,j)\in E}\left\|R_{ij}\Phi_i - \Phi_j\,\mathrm{diag}(z_i)\right\|_2^2 + \lambda\sum_{i=1}^{n}\left\|\Phi_i - \Phi_i\,\mathrm{diag}(z_i)\right\|_2^2,$$

where the constants $\gamma > 0$ and $\lambda > 0$; $\|\cdot\|_2$ denotes the L2 norm, $\mathrm{diag}(z_i)$ the diagonal matrix formed from $z_i$, and $(i,j) \in E$ the neighbor set of image pairs (e.g., 20 neighbor images may be used).
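Likewise, the consistency term Q can be written out directly; the weights γ and λ below are illustrative values, not the patent's:

```python
import numpy as np

def Q(R_maps, Phi, z, edges, gamma=1.0, lam=1.0):
    """Latent-function consistency term: R_ij should carry Phi_i onto
    Phi_j diag(z_i), and the columns of Phi_i for absent latent functions
    (z_i entries equal to 0) are pushed toward zero."""
    val = 0.0
    for (i, j) in edges:
        val += gamma * ((R_maps[(i, j)] @ Phi[i] - Phi[j] @ np.diag(z[i])) ** 2).sum()
    for i in range(len(Phi)):
        val += lam * ((Phi[i] - Phi[i] @ np.diag(z[i])) ** 2).sum()
    return val

# Two images, l = 2 latent functions, identity map, identical Phi, all-ones z.
Phi = [np.eye(2), np.eye(2)]
z = [np.ones(2), np.ones(2)]
val = Q({(0, 1): np.eye(2)}, Phi, z, edges=[(0, 1)])
```

With perfectly consistent inputs the term is exactly zero; any disagreement between `R @ Phi[i]` and `Phi[j]` raises it.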
Step 5) of obtaining the functional mapping expression from the multi-modal mapping consistency and calculating the segmentation function of each image through a joint optimization objective function is specifically:
1) compute the functional mapping expression from the established multi-modal mapping consistency relationship, i.e.

$$\min_{R_{ij},\,\Phi_i,\,z_i}\ \sum_{(i,j)\in E} H(R_{ij}) + Q(R_{ij}, \Phi_i, z_i),$$

where the variables $\Phi_i$ and $\Phi_j$ are subject to an orthogonality constraint; the problem is solved by alternating optimization over the variables, i.e., fixing two of them and optimizing the remaining one; the variable $z_i$ is initialized to the all-ones vector, and the optimal functional mapping expression $R_{ij}$ is obtained by iterating until the objective converges;
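The "fix two variables, optimize the remaining one" alternation can be sketched on a single image pair. This toy is not the patent's full solver: z_i is held at its all-ones initialization, the R update is a closed-form least squares, and the Φ update is a simple damped step:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
Phi_i = rng.normal(size=(p, p))        # latent-function descriptions on image i
Phi_j = rng.normal(size=(p, p))        # ... and on image j
z_i = np.ones(p)                       # z_i initialized to the all-ones vector
R = np.eye(p)                          # initial functional map

def consistency(R, Phi_i, Phi_j, z_i):
    # One-pair consistency term (gamma = 1 for illustration).
    return ((R @ Phi_i - Phi_j @ np.diag(z_i)) ** 2).sum()

history = [consistency(R, Phi_i, Phi_j, z_i)]
for _ in range(5):
    # Fix Phi and z: closed-form least-squares update of R.
    R = (Phi_j @ np.diag(z_i)) @ np.linalg.pinv(Phi_i)
    # Fix R and z: damped update of Phi_j toward R @ Phi_i.
    Phi_j = 0.5 * Phi_j + 0.5 * R @ Phi_i
    history.append(consistency(R, Phi_i, Phi_j, z_i))
```

Each sweep can only lower the tracked term, illustrating why alternating (block-coordinate) optimization converges on this objective.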
2) regard the image samples as the vertices of a graph and denote the weight between two vertices accordingly; the joint optimization objective of the image segmentation functions is then formed,
where the constant $\zeta > 0$, $c \in \{1, 2, \ldots, C\}$, the symbol $(\cdot)^T$ denotes the transpose of a vector or matrix, the subspace $B_{ik}$ is spanned by the first p eigenvectors of the superpixel-graph Laplacian matrix of the i-th image in the k-th modality, and the segmentation functions $f_{ic}$ of different classes satisfy mutual-exclusion constraints;
3) by solving the objective function of the previous step for its optimal solution, the optimal segmentation function of the i-th image is obtained, from which the optimal segmented blocks belonging to the c-th target class in the image are determined.
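Once each segmentation function is expanded in the basis $B_i$, reading off the optimal segmented blocks amounts to an argmax over classes per superpixel, as in this toy sketch (the basis and coefficients here are invented for illustration):

```python
import numpy as np

def segment_labels(B, A):
    """Given basis B (q x p) and coefficients A (p x C) for the C segmentation
    functions f_c = B @ a_c, assign each superpixel to the class whose
    segmentation function takes the largest value there."""
    F = B @ A                  # q x C: value of each class function on each superpixel
    return F.argmax(axis=1)

B = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])     # toy basis over q = 3 superpixels, p = 2
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])     # C = 2 classes
labels = segment_labels(B, A)  # per-superpixel class assignment
```

Grouping the superpixels that share a label c then yields the segmented block of the c-th target class.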
The multi-modal image segmentation method based on functional mapping provided by the invention has the following advantages: it reduces the computational cost by over-segmenting the original pixels of an image into superpixels; it reflects the content of the image from the perspective of different descriptors by constructing a multi-modal superpixel representation; and, by establishing functional mappings between image pairs in the reduced functional space and keeping them consistent by means of latent functions, it effectively links the low-level features of an image to its high-level semantics, thereby improving the image segmentation effect and laying a solid foundation for visual applications such as image recognition and target localization.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The invention is further illustrated with reference to FIG. 1:
1. after acquiring an image set containing a target, performing the following operations:
1) segmenting each image in the set into superpixel blocks, and characterizing the segmented superpixels with different feature descriptors to obtain a multi-modal image representation;
2) establishing a superpixel-based graph on the multi-modal image, and constructing the corresponding Laplacian matrix;
3) characterizing the reduced functional space of each image, and establishing functional mappings between image pairs;
4) aligning the image functional mappings of each modality with the image cues, and introducing latent functions to keep the functional mappings consistent;
5) obtaining the functional mapping expression from the multi-modal mapping consistency, calculating the segmentation function of each image through a joint optimization objective function to obtain the optimal segmentation representation of the image, and completing the image segmentation.
The step 1) of segmenting each image in the set into superpixel blocks, and representing the segmented superpixels by using different feature descriptors to obtain a multi-modal image representation specifically comprises the following steps:
1) let the set consist of n associated images; each image contains one or more target classes, and the number of target classes over the whole set is C;
2) taking the pixels of an image as graph vertices, divide each image in the set into q small regions (e.g., q = 100), where q is a positive integer; each small region consists of pixels with similar values and is called a superpixel, and the segmented block belonging to the c-th class in the i-th image is denoted $S_{ic}$, where $i \in \{1, 2, \ldots, n\}$ and $c \in \{1, 2, \ldots, C\}$;
3) represent each superpixel in the image with m different feature descriptors, such as the Scale-Invariant Feature Transform (SIFT), Local Binary Patterns (LBP), and the Histogram of Oriented Gradients (HOG), to obtain a multi-modal feature representation that reflects the intrinsic information of the image from multiple directions; the i-th image thus corresponds to a set of feature matrices, in which the k-th feature descriptor corresponds to the k-th modality $X_i^k$.
Establishing a superpixel-based graph on the multi-modal image in step 2) and constructing the corresponding Laplacian matrix is specifically:
2.1) regard the q superpixels of each image modality as graph vertices, and construct a fully connected superpixel graph over these vertices;
2.2) construct a Laplacian matrix on the superpixel graph of each modality as L = D − W, where the weight matrix W is computed with a Gaussian weighting strategy and D is the diagonal matrix whose diagonal elements are the column sums of W.
The linear reduced functional space of each image is characterized in step 3), and functional mappings between image pairs are established, specifically:
1) compute the eigenvalues and eigenvectors of the Laplacian matrix of each modality, and take the first p (p < q) eigenvectors to span a reduced functional space; the corresponding eigenvalues of each modality of each image form a diagonal matrix $\Lambda_i^k$;
2) let the segmentation function $f_{ic}$ of each image correspond to $S_{ic}$; the search space of the function is the p-dimensional space spanned by a set of basis vectors, so $f_{ic}$ is represented as a linear combination with the representation coefficients of the i-th image, where $B_i$ consists of the first p eigenvectors of the Laplacian matrix;
3) the relationship between any two images is reflected by a linear functional mapping; for example, the mapping from the subspace of the i-th image to the subspace of the j-th image is denoted $R_{ij}$, i.e., a function f in the subspace of the i-th image is carried into the subspace of the j-th image by computing $R_{ij} f$.
Aligning the image functional mappings of each modality with the image cues in step 4) and introducing latent functions to keep the functional mappings consistent is specifically:
1) the image cues correspond to the different feature descriptors, and the image functional mappings of all modalities are aligned with the image cues by optimizing the following expression:

$$\min_{R_{ij}} H(R_{ij}) = \sum_{k=1}^{m}\left(\left\|R_{ij}X_i^k - X_j^k\right\|_1 + \alpha\left\|R_{ij}\Lambda_i^k - \Lambda_j^k R_{ij}\right\|_F^2\right) + \beta\left\|R_{ij}\right\|_1,$$

where the constants $\alpha > 0$ and $\beta > 0$; $\|\cdot\|_1$ denotes the L1 norm of a matrix and $\|\cdot\|_F$ the Frobenius norm of a matrix;
2) the introduced latent functions are shared by the input images, and the functional-mapping consistency term effectively relates the corresponding latent functions on the images; each latent function appears only in a certain subset of the images; the indicator vector $z_i = [z_{i1}, z_{i2}, \ldots, z_{il}] \in \{0,1\}^l$ of the i-th image characterizes the relationship between the latent functions and the image, while the continuous variables $\Phi_i = [\phi_{i1}, \phi_{i2}, \ldots, \phi_{il}]$ describe each latent function on the image;
the consistency term of the functional mappings in the previous step is expressed as

$$Q(R_{ij}, \Phi_i, z_i) = \gamma\sum_{(i,j)\in E}\left\|R_{ij}\Phi_i - \Phi_j\,\mathrm{diag}(z_i)\right\|_2^2 + \lambda\sum_{i=1}^{n}\left\|\Phi_i - \Phi_i\,\mathrm{diag}(z_i)\right\|_2^2,$$

where the constants $\gamma > 0$ and $\lambda > 0$; $\|\cdot\|_2$ denotes the L2 norm, $\mathrm{diag}(z_i)$ the diagonal matrix formed from $z_i$, and $(i,j) \in E$ the neighbor set of image pairs (e.g., 20 neighbor images may be used).
Obtaining the functional mapping expression from the multi-modal mapping consistency in step 5) and calculating the segmentation function of each image through the joint optimization objective function is specifically:
1) compute the functional mapping expression from the established multi-modal mapping consistency relationship, i.e.

$$\min_{R_{ij},\,\Phi_i,\,z_i}\ \sum_{(i,j)\in E} H(R_{ij}) + Q(R_{ij}, \Phi_i, z_i),$$

where the variables $\Phi_i$ and $\Phi_j$ are subject to an orthogonality constraint; the problem is solved by alternating optimization over the variables, i.e., fixing two of them and optimizing the remaining one; the variable $z_i$ is initialized to the all-ones vector, and the optimal functional mapping expression $R_{ij}$ is obtained by iterating until the objective converges;
2) regard the image samples as the vertices of a graph and denote the weight between two vertices accordingly; the joint optimization objective of the image segmentation functions is then formed,
where the constant $\zeta > 0$, $c \in \{1, 2, \ldots, C\}$, the symbol $(\cdot)^T$ denotes the transpose of a vector or matrix, the subspace $B_{ik}$ is spanned by the first p eigenvectors of the superpixel-graph Laplacian matrix of the i-th image in the k-th modality, and the segmentation functions $f_{ic}$ of different classes satisfy mutual-exclusion constraints;
3) by solving the objective function of the previous step for its optimal solution, the optimal segmentation function of the i-th image is obtained, from which the optimal segmented blocks belonging to the c-th target class in the image are determined.

Claims (6)

1. A multi-modal image segmentation method based on functional mapping is characterized in that the following operations are carried out on an image set containing a target:
1) segmenting each image in the set into superpixel blocks, and characterizing the segmented superpixels with different feature descriptors to obtain a multi-modal image representation;
2) establishing a superpixel-based graph on the multi-modal image, and constructing the corresponding Laplacian matrix;
3) characterizing the reduced functional space of each image, and establishing functional mappings between image pairs;
4) aligning the image functional mappings of each modality with the image cues, and introducing latent functions to keep the functional mappings consistent;
5) obtaining the functional mapping expression from the multi-modal mapping consistency, calculating the segmentation function of each image through a joint optimization objective function to obtain the optimal segmentation representation of the image, and completing the image segmentation.
2. The multi-modal functional mapping-based image segmentation method of claim 1, wherein: in the step 1), each image in the set is segmented into superpixel blocks, and the segmented superpixels are characterized by different feature descriptors to obtain a multi-modal image representation, specifically:
1.1) the set is composed of n associated images; each image contains one or more target classes, and the number of target classes over the whole set is C;
1.2) taking the pixels of an image as graph vertices, divide each image in the set into q small regions, where q is a positive integer; each small region consists of pixels with similar values and is called a superpixel, and the segmented block belonging to the c-th class in the i-th image is denoted $S_{ic}$, where $i \in \{1, 2, \ldots, n\}$ and $c \in \{1, 2, \ldots, C\}$;
1.3) use m different feature descriptors to represent each superpixel in the image, thereby obtaining a multi-modal feature representation that reflects the intrinsic information of the image from multiple directions; the i-th image corresponds to a set of feature matrices, in which the k-th feature descriptor corresponds to the k-th modality $X_i^k$.
3. The multi-modal functional mapping-based image segmentation method of claim 1, wherein: establishing a superpixel-based graph on the multi-modal image in step 2) and constructing the corresponding Laplacian matrix is specifically:
2.1) regard the q superpixels of each image modality as graph vertices, and construct a fully connected superpixel graph over these vertices;
2.2) construct a Laplacian matrix on the superpixel graph of each modality as L = D − W, where the weight matrix W is computed with a Gaussian weighting strategy and D is the diagonal matrix whose diagonal elements are the column sums of W.
4. The multi-modal functional mapping-based image segmentation method of claim 1, wherein: the reduction functional space of each image is characterized in the step 3), and functional mapping between image pairs is established, specifically:
3.1) computing the Laplace matrix of the multimodal imageAnd taking the first p eigenvectors to stretch into a reduced functional spaceAnd the corresponding eigenvalues on each mode of each image form a diagonal matrix respectivelyWherein p is<q;
3.2) setting each image segmentation function as foiCorresponds to SicThe search space of the function corresponds to a set of basis vectorsFormed by stretchingSpace of p dimensionAnd f isoiRepresenting coefficients of the ith image as a linear function combinationWherein B isiIs formed by the first p eigenvectors of the Laplace matrix;
3.3) reflecting the pairwise relation between any two images through a linear functional mapping: the mapping from the subspace of the ith image to the subspace of the jth image is denoted R_ij, i.e., a function f in the subspace of the ith image is mapped into the subspace of the jth image by computing R_ij f.
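One simple way to realize step 3.3) is a least-squares estimate of R_ij from corresponding descriptor functions expressed in the two reduced bases. This is a hedged sketch: the patent instead obtains R_ij from the optimization of step 4.1), and the function and argument names here are illustrative.

```python
import numpy as np

def estimate_functional_map(Bi, Bj, Fi, Fj):
    """Least-squares functional map R_ij from span(Bi) to span(Bj):
    find R with R @ (Bi.T @ Fi) ~= Bj.T @ Fj, where Fi, Fj are
    corresponding descriptor functions (q x d) on images i and j."""
    Ai = Bi.T @ Fi                      # p x d coefficients on image i
    Aj = Bj.T @ Fj                      # p x d coefficients on image j
    R, *_ = np.linalg.lstsq(Ai.T, Aj.T, rcond=None)  # solves Ai.T @ R.T = Aj.T
    return R.T

def map_function(R, Bi, Bj, f):
    """Transport a function f from image i to image j: apply R_ij to
    the coefficients of f, then re-expand in image j's basis."""
    return Bj @ (R @ (Bi.T @ f))
```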
5. The multi-modal functional mapping-based image segmentation method of claim 1, wherein the step 4) of aligning the image functional mapping of each modality with the image cues and introducing latent functions to keep the functional mappings consistent specifically comprises:
4.1) the image cues correspond to the different feature descriptors, and the image functional mapping of each modality is aligned with the image cues by optimizing the following expression:
min_{R_ij} H(R_ij) = Σ_{k=1}^{m} ( ||R_ij X_i^k − X_j^k||_1 + α ||R_ij Λ_i^k − Λ_j^k R_ij||_F^2 ) + β ||R_ij||_1,

where the constants α > 0 and β > 0, the symbol ||·||_1 denotes the L1 norm of a matrix, and ||·||_F denotes the Frobenius norm of a matrix;
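Evaluating H(R_ij) for a candidate map can be sketched as below. Note the middle (commutativity) term is reconstructed from the garbled formula and is an assumption, as are the default values of α and β.

```python
import numpy as np

def H_value(R, Xs_i, Xs_j, Lams_i, Lams_j, alpha=0.1, beta=0.01):
    """Value of the alignment objective H(R_ij), summed over the m
    modalities: descriptor alignment (L1), commutativity with the
    Laplacian eigenvalues (Frobenius), plus an L1 sparsity penalty."""
    h = 0.0
    for Xi, Xj, Li, Lj in zip(Xs_i, Xs_j, Lams_i, Lams_j):
        h += np.abs(R @ Xi - Xj).sum()                    # ||R X_i^k - X_j^k||_1
        h += alpha * np.linalg.norm(R @ Li - Lj @ R, 'fro')**2
    return h + beta * np.abs(R).sum()                     # beta ||R||_1
```

For the identity map on identical inputs, only the sparsity penalty survives, which matches the objective's structure.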
4.2) the introduced latent functions are shared by the input images, and the corresponding latent functions on each image are effectively related by a functional-mapping consistency term; each latent function appears only in a certain subset of the images. For the ith image, the indicator vector z_i = [z_i1, z_i2, …, z_il] ∈ {0, 1}^l characterizes the relationship between the latent functions and the image, while the continuous variables Φ_i = [φ_i1, φ_i2, …, φ_il] describe the latent functions on the image;
the correspondence term of the functional mapping in the last step is expressed as
Q(R_ij, Φ_i, z_i) = γ Σ_{(i,j)∈E} ||R_ij Φ_i − Φ_j diag(z_i)||_2^2 + λ Σ_{i=1}^{n} ||Φ_i − Φ_i diag(z_i)||_2^2,

where the constants γ > 0 and λ > 0, the symbol ||·||_2 denotes the L2 norm of a matrix, diag(z_i) denotes a diagonal matrix, and (i, j) ∈ E denotes the neighbor set of image pairs.
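The consistency term can be evaluated directly from the formula above; a minimal sketch (dictionary-based bookkeeping and default γ, λ are illustrative choices):

```python
import numpy as np

def Q_value(R_pairs, Phi, z, gamma=1.0, lam=1.0):
    """Consistency term Q: active latent functions (z_i = 1) should be
    transported by R_ij onto their counterparts on neighboring images,
    and inactive ones (z_i = 0) should vanish on the image."""
    q = 0.0
    for (i, j), R in R_pairs.items():                 # (i, j) in the neighbor set E
        q += gamma * np.linalg.norm(R @ Phi[i] - Phi[j] @ np.diag(z[i]))**2
    for i, P in Phi.items():
        q += lam * np.linalg.norm(P - P @ np.diag(z[i]))**2
    return q
```

When the latent functions are perfectly transported and all indicators are active, Q vanishes; switching an indicator off penalizes any remaining mass in that latent function.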
6. The multi-modal functional mapping-based image segmentation method of claim 1, wherein the step 5) of obtaining the functional mapping representation according to the multi-modal mapping consistency and computing the segmentation function corresponding to each image through a joint optimization objective function specifically comprises:
5.1) computing the functional mapping representation according to the multi-modal mapping consistency relationship established in step 4), i.e.,
min_{R_ij, Φ_i, z_i} Σ_{(i,j)∈E} H(R_ij) + Q(R_ij, Φ_i, z_i),

where the variables Φ_i and Φ_j are subject to an orthogonality constraint; the problem is solved by a variable-alternation optimization method, i.e., fixing two of the variables and optimizing the remaining one; the variable z_i is initialized to the all-ones vector, and the optimal functional mapping representation R_ij is computed through multiple iterations until the objective converges;
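The variable-alternation strategy of step 5.1) is the standard block coordinate descent pattern: cycle through the variable blocks, exactly minimizing over one while the others are held fixed. A toy numeric illustration of the pattern (not the patent's actual subproblems):

```python
def alternating_minimization(iters=200):
    """Minimize f(x, y, z) = (x - y)**2 + (y - z)**2 + (z - 1)**2 by
    cyclically fixing two variables and exactly minimizing the third;
    the iterates converge to the global minimizer (1, 1, 1)."""
    x = y = z = 0.0
    for _ in range(iters):
        x = y                  # argmin over x with y, z fixed
        y = 0.5 * (x + z)      # argmin over y with x, z fixed
        z = 0.5 * (y + 1.0)    # argmin over z with x, y fixed
    return x, y, z
```

Each block update can only decrease the objective, which is why the iteration in 5.1) converges.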
5.2) taking each image sample as a vertex of an image graph and recording the weight between two vertices, the joint optimization objective of the image segmentation functions is
where the constant ζ > 0 and c ∈ {1, 2, …, C}, the symbol (·)^T denotes the transpose of a vector or matrix, the subspace B_ik is spanned by the first p eigenvectors of the superpixel-graph Laplacian matrix of the ith image in the kth modality, and the segmentation functions f_ic of different classes satisfy mutual-exclusion constraints;
5.3) obtaining the optimal segmentation function of the ith image by solving the objective function in 5.2), from which the optimal segmentation block belonging to the c-th object class in the image is determined.
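Reading off the final segmentation from the optimal functions of step 5.3) reduces to a per-superpixel argmax under the mutual-exclusion constraints; a minimal sketch (the stacked-matrix layout is an illustrative convention):

```python
import numpy as np

def read_off_segments(F):
    """F is q x C, column c holding the segmentation function f_ic on
    the q superpixels of one image; each superpixel is assigned to the
    class with the largest function value, yielding the blocks S_ic."""
    return np.argmax(F, axis=1)
```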
CN201510040592.4A 2015-01-27 2015-01-27 A kind of multi-modality images dividing method based on Functional Mapping Active CN104778683B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510040592.4A CN104778683B (en) 2015-01-27 2015-01-27 A kind of multi-modality images dividing method based on Functional Mapping


Publications (2)

Publication Number Publication Date
CN104778683A true CN104778683A (en) 2015-07-15
CN104778683B CN104778683B (en) 2017-06-27

Family

ID=53620129


Country Status (1)

Country Link
CN (1) CN104778683B (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093470A (en) * 2013-01-23 2013-05-08 天津大学 Rapid multi-modal image synergy segmentation method with unrelated scale feature
US20140050391A1 (en) * 2012-08-17 2014-02-20 Nec Laboratories America, Inc. Image segmentation for large-scale fine-grained recognition


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SU PO et al.: "Superpixel-based multi-modal MRI brain glioma segmentation", Journal of Northwestern Polytechnical University (西北工业大学学报) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069787A (en) * 2015-08-04 2015-11-18 浙江慧谷信息技术有限公司 Image joint segmentation algorithm based on consistency function space mapping
CN106202281A (en) * 2016-06-28 2016-12-07 广东工业大学 A kind of multi-modal data represents learning method and system
CN111382776A (en) * 2018-12-26 2020-07-07 株式会社日立制作所 Object recognition device and object recognition method
CN109993756A (en) * 2019-04-09 2019-07-09 中康龙马(北京)医疗健康科技有限公司 A kind of general medical image cutting method based on graph model Yu continuous successive optimization
CN109993756B (en) * 2019-04-09 2022-04-15 中康龙马(北京)医疗健康科技有限公司 General medical image segmentation method based on graph model and continuous stepwise optimization

Also Published As

Publication number Publication date
CN104778683B (en) 2017-06-27

Similar Documents

Publication Publication Date Title
Liu et al. RoadNet: Learning to comprehensively analyze road networks in complex urban scenes from high-resolution remotely sensed images
Lahoud et al. 3d instance segmentation via multi-task metric learning
Melekhov et al. Dgc-net: Dense geometric correspondence network
Giraud et al. SuperPatchMatch: An algorithm for robust correspondences using superpixel patches
Choong et al. Image segmentation via normalised cuts and clustering algorithm
Nguyen et al. Satellite image classification using convolutional learning
CN107146219B (en) Image significance detection method based on manifold regularization support vector machine
CN109840518B (en) Visual tracking method combining classification and domain adaptation
Khan et al. A modified adaptive differential evolution algorithm for color image segmentation
CN104778683A (en) Multi-modal image segmenting method based on functional mapping
CN111091129A (en) Image salient region extraction method based on multi-color characteristic manifold sorting
Sima et al. Bottom-up merging segmentation for color images with complex areas
CN112330639A (en) Significance detection method for color-thermal infrared image
Wu et al. A cascaded CNN-based method for monocular vision robotic grasping
Xiang et al. Turbopixel segmentation using eigen-images
CN114638953B (en) Point cloud data segmentation method and device and computer readable storage medium
Geng et al. A novel color image segmentation algorithm based on JSEG and Normalized Cuts
Hassan et al. Salient object detection based on CNN fusion of two types of saliency models
CN109636818A A kind of Laplace's canonical constrains the Target Segmentation method of lower low-rank sparse optimization
Gao et al. SAMM: surroundedness and absorption Markov model based visual saliency detection in images
Mhamdi et al. A local approach for 3D object recognition through a set of size functions
Rahimi et al. Single image ground plane estimation
Cai et al. Higher level segmentation: Detecting and grouping of invariant repetitive patterns
Xing et al. Improving Reliability of Heterogeneous Change Detection by Sample Synthesis and Knowledge Transfer
Sun et al. An enhanced affinity graph for image segmentation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220808

Address after: Room 406, building 19, haichuangyuan, No. 998, Wenyi West Road, Yuhang District, Hangzhou City, Zhejiang Province

Patentee after: HANGZHOU HUICUI INTELLIGENT TECHNOLOGY CO.,LTD.

Address before: 310018 No. 2 street, Xiasha Higher Education Zone, Hangzhou, Zhejiang

Patentee before: HANGZHOU DIANZI University

TR01 Transfer of patent right
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Multimodal Image Segmentation Method Based on Functional Mapping

Granted publication date: 20170627

Pledgee: Guotou Taikang Trust Co.,Ltd.

Pledgor: HANGZHOU HUICUI INTELLIGENT TECHNOLOGY CO.,LTD.

Registration number: Y2024980004919

PE01 Entry into force of the registration of the contract for pledge of patent right