CN104778683A - Multi-modal image segmenting method based on functional mapping - Google Patents
- Publication number: CN104778683A
- Application number: CN201510040592.4A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Image Analysis (AREA)
Abstract
The invention relates to a multimodal image segmentation method based on functional mapping. Given an image set containing the targets, the method performs the following operations: 1) each image is segmented into superpixel blocks, which are characterized with different feature descriptors to obtain a multimodal image representation; 2) a superpixel graph is built on each modality and the corresponding Laplacian matrix is constructed; 3) the reduced functional space of each image is characterized, and functional maps between image pairs are established; 4) the functional map of each modality is aligned with the image cues, and latent (implicit) functions are introduced to keep the functional maps consistent; 5) the functional map representation is obtained from the multimodal map consistency, and the segmentation function of each image is computed by jointly optimizing an objective function, which yields the optimal segmentation of the image. By exploiting the feature representations of the different modalities and the latent associations between targets shared across images, the invention accurately determines the target region blocks of each image and improves segmentation performance.
Description
Technical field
The invention belongs to the technical field of image segmentation within image processing, and in particular relates to a multimodal image segmentation method based on functional mapping.
Background
The rapid development of digital imaging technology has given rise to many emerging industries, such as remote-sensing satellite image positioning, medical image analysis, and intelligent traffic recognition, and has helped the information society mature. Images are an important bridge through which humans perceive the world and are closely tied to the field of vision. Demand for image processing keeps growing in visual applications across artificial intelligence, machine vision, physiology, medicine, meteorology, military science, and other fields, where it plays an increasingly critical role. Image segmentation, as an image preprocessing step, lays the foundation for mid-level and high-level semantic analysis of images; applications such as image recognition, target localization, and edge detection can all use it to improve performance.
Image segmentation, as the name implies, divides a given image into regions according to some rule or target. For example, a photograph taken beside a lake can be divided into regions that represent different semantic categories, such as the lake surface, people, boats, houses, bushes, and sky; the people and boats can be regarded as foreground target objects and the rest as background. Traditional image segmentation techniques mainly process cues from a single image, such as grayscale, color, texture, and shape. Typical methods include threshold segmentation, region segmentation, edge segmentation, graph segmentation, and energy-functional-based segmentation. Threshold segmentation assigns each gray value to a category according to a preset threshold. Edge segmentation detects edges from the step-like or abrupt changes of gray values at edges. Region segmentation decides according to an image similarity criterion and mainly includes watershed, region splitting and merging, and seeded region growing. Graph segmentation treats the image as an undirected graph whose vertices are pixels and whose edges connect adjacent pixels, with each segmented region regarded as a subgraph. Energy-functional-based segmentation represents the target edge with a continuous curve and obtains the segmentation by minimizing an energy functional; it is generally divided into parametric active contour models and geometric active contour models.
The shortcomings of the above methods lie mainly in the following aspects. First, they operate directly on the raw pixels of the image, which raises the time complexity of the algorithm and the computational cost. Second, low-level techniques such as threshold segmentation and edge segmentation are hard to relate to the semantic features of the image. Third, they ignore the complementary information between images; in particular, images containing similar targets share common structure and latent information, and ignoring it directly degrades the segmentation of the targets. These methods are therefore ill-suited to large-scale segmentation of images that contain common targets, which adversely affects practical applications such as large-scale image recognition and target localization. For application fields such as intelligent traffic recognition, medical image analysis, and large-scale image recognition, there is thus a pressing need for an image segmentation technique that links low-level image features with semantic features and makes effective, multi-faceted use of the latent structural information between images.
Summary of the invention
To make effective use of the latent structural information between images, reduce the computational complexity of image segmentation, and improve the segmentation of targets in images, the invention proposes a multimodal image segmentation method based on functional mapping. The method comprises the following steps:
1. After obtaining an image set containing the targets, perform the following operations:
1) Segment each image in the set into superpixel blocks, and characterize the resulting superpixels with different feature descriptors to obtain a multimodal image representation;
2) Build a superpixel-based graph on the multimodal images and construct the corresponding Laplacian matrix;
3) Characterize the reduced functional space of each image and establish functional maps between image pairs;
4) Align the functional map of each modality with the image cues, and introduce latent functions to keep the functional maps consistent;
5) Obtain the functional map representation from the multimodal map consistency, compute the segmentation function of each image by jointly optimizing an objective function, and obtain the optimal segmentation of the image, completing the segmentation.
Further, segmenting each image in the set into superpixel blocks and characterizing the superpixels with different feature descriptors to obtain a multimodal image representation, as described in step 1), is specifically:
1) Let the set consist of n related images, denoted as [symbol omitted in source]. Each image contains one or more target classes, and the total number of target classes in the whole set is C;
2) Treat the pixels of an image as the vertices of a graph and use a graph segmentation method to divide each image in the set into q small regions (for example, q = 100), where q is a positive integer. These regions consist of pixels with similar values and are called superpixels. The segment of the i-th image that belongs to the c-th class is denoted $S_{ic}$, where i = {1, 2, …, n} and c = {1, 2, …, C};
3) Characterize each superpixel of the image with m different feature descriptors, such as the scale-invariant feature transform (SIFT), local binary patterns (LBP), and the histogram of oriented gradients (HOG), to obtain a multimodal feature representation that reflects the intrinsic information of the image from several perspectives. For example, the i-th image corresponds to a set of matrices [symbol omitted in source]; that is, the k-th image feature descriptor corresponds to the k-th modality [symbol omitted in source].
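As a rough illustration of the multimodal representation described in step 1), the sketch below computes two simple per-superpixel descriptors with NumPy. It is a minimal example under stated assumptions, not the patented procedure: the superpixel label map is assumed to be already available (the graph-based segmentation is not implemented here), and mean color and a gray-level histogram stand in for SIFT, LBP, and HOG. The function name `superpixel_features` and its parameters are hypothetical.

```python
import numpy as np

def superpixel_features(image, labels, n_bins=8):
    """Return one feature matrix per modality, each with one row per superpixel.

    image  : H x W x 3 float array with values in [0, 1]
    labels : H x W integer array, labels[y, x] = superpixel id (assumed given)
    """
    ids = np.unique(labels)
    mean_color = np.zeros((ids.size, image.shape[2]))  # modality 1 (stand-in)
    histogram = np.zeros((ids.size, n_bins))           # modality 2 (stand-in)
    gray = image.mean(axis=2)
    for row, sp in enumerate(ids):
        mask = labels == sp
        mean_color[row] = image[mask].mean(axis=0)
        counts, _ = np.histogram(gray[mask], bins=n_bins, range=(0.0, 1.0))
        histogram[row] = counts / counts.sum()          # normalize to sum 1
    return [mean_color, histogram]

# Toy 4 x 4 RGB image split into four 2 x 2 superpixels.
rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))
labels = np.repeat(np.repeat(np.arange(4).reshape(2, 2), 2, axis=0), 2, axis=1)
features = superpixel_features(image, labels)
```

Each entry of `features` plays the role of one modality's feature matrix for a single image; a real pipeline would substitute the descriptors named in the text.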
Further, building a superpixel-based graph on the multimodal images and constructing the corresponding Laplacian matrix, as described in step 2), is specifically:
2.1) Treat the q superpixels of each image modality as the vertices of a graph, and build the superpixel graph by fully connecting these vertices;
2.2) Construct a Laplacian matrix [symbol omitted in source] on the superpixel graph of each modality. Each is obtained from a weight matrix W computed with a Gaussian weighting strategy, namely $L = D - W$, where D is a diagonal matrix whose diagonal entries are the column sums of W.
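The construction $L = D - W$ on a fully connected superpixel graph can be sketched as follows. This is a minimal example; the Gaussian bandwidth `sigma` and the use of squared Euclidean distance between descriptor rows are assumptions, since the text only says that W comes from a Gaussian weighting strategy.

```python
import numpy as np

def gaussian_laplacian(features, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W of a fully connected graph.

    features : q x d array, one descriptor row per superpixel
    sigma    : Gaussian bandwidth (assumed parameter)
    """
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))  # Gaussian edge weights
    np.fill_diagonal(W, 0.0)                   # drop self-loops; L is unchanged by them
    D = np.diag(W.sum(axis=0))                 # diagonal matrix of column sums of W
    return D - W

# Three superpixels with one-dimensional descriptors.
L = gaussian_laplacian(np.array([[0.0], [1.0], [3.0]]))
```

One such matrix would be built per modality and per image, using that modality's feature matrix.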
Characterizing the linear reduced functional space of each image and establishing functional maps between image pairs, as described in step 3), is specifically:
1) Compute the eigenvalues and eigenvectors of the multimodal Laplacian matrices [symbol omitted in source], and take the first p (p < q) eigenvectors to span the reduced functional space [symbol omitted in source]; the corresponding eigenvalues on each modality of each image form diagonal matrices [symbol omitted in source];
2) Let the segmentation function of each image be $f_{oi}$, corresponding to $S_{ic}$. The search space of this function is the p-dimensional space [symbol omitted in source] spanned by a set of basis vectors [symbol omitted in source], and the coefficients of $f_{oi}$ on the i-th image are expressed as a linear combination [formula omitted in source], where $B_i$ consists of the first p eigenvectors of the Laplacian matrix;
3) The relationship between any pair of images is captured by a linear functional map. For example, the functional map from the subspace [symbol omitted in source] of the i-th image to the subspace [symbol omitted in source] of the j-th image is represented by a matrix $R_{ij}$; that is, the image in the second subspace of a function f in the first subspace is obtained by computing $R_{ij} f$.
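The reduced functional space and the action of a functional map can be illustrated with a short NumPy sketch. It assumes that "the first p eigenvectors" means those with the smallest eigenvalues, which is the usual convention for Laplacian bases but is not stated in the text; the helper names `reduced_basis` and `transfer` are hypothetical.

```python
import numpy as np

def reduced_basis(L, p):
    """First p eigenvectors of a symmetric Laplacian and their eigenvalues.

    Assumption: "first" means smallest eigenvalues (eigh sorts ascending).
    """
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, :p], np.diag(eigvals[:p])

def transfer(f_i, B_i, B_j, R_ij):
    """Map a function on image i's superpixels to image j's superpixels."""
    coeffs_i = B_i.T @ f_i       # coefficients of f in the reduced basis of image i
    coeffs_j = R_ij @ coeffs_i   # functional map acts on the coefficient vector
    return B_j @ coeffs_j        # back to a function on image j's superpixels

# Toy graph: a ring of q = 6 superpixels, reduced to p = 3 basis functions.
q, p = 6, 3
A = np.roll(np.eye(q), 1, axis=1) + np.roll(np.eye(q), -1, axis=1)
L = 2.0 * np.eye(q) - A
B, Lam = reduced_basis(L, p)
f = np.ones(q)                                  # constant function lies in the span
f_mapped = transfer(f, B, B, np.eye(p))         # identity map between identical bases
```

Working with p coefficients instead of q superpixel values is what makes the map $R_{ij}$ a small p × p matrix.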
Aligning the functional map of each modality with the image cues and introducing latent functions to keep the functional maps consistent, as described in step 4), is specifically:
1) The image cues correspond to different descriptors, and the functional map of each modality is aligned with the image cues by optimizing the following expression, namely
[formula omitted in source]
where the constants α > 0 and β > 0, $\|\cdot\|_1$ denotes the L1 norm of a matrix, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix;
2) The introduced latent functions are shared by the input images, and the functional-map consistency term lets the functional maps relate the corresponding latent functions on each image. Each latent function appears only in some subset of the images. For the i-th image, the binary vector $z_i = [z_{i1}, z_{i2}, \dots, z_{il}]$, with entries in {0, 1}, characterizes the relationship between the latent functions and the image, while the continuous variable $\Phi_i = [\varphi_{i1}, \varphi_{i2}, \dots, \varphi_{il}]$ describes the latent functions on the image;
The functional-map consistency term of the previous step is expressed as
[formula omitted in source]
where the constants γ > 0 and λ > 0, $\|\cdot\|_2$ denotes the L2 norm of a matrix, $\mathrm{diag}(z_i)$ denotes the diagonal matrix formed from $z_i$, and $(i, j) \in E$ ranges over the set of neighbouring image pairs; for example, 20 neighbour images may be used in the computation.
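The alignment expression itself is not reproduced in this text, so the following is only a simplified stand-in that conveys the idea of fitting a functional map to descriptor cues. It replaces the L1 data term mentioned above with a squared Frobenius term so that a closed-form ridge solution exists; the objective, the function name `fit_functional_map`, and the inputs `A_i`, `A_j` (descriptor functions of two images expressed as columns in their reduced bases) are all assumptions for illustration.

```python
import numpy as np

def fit_functional_map(A_i, A_j, alpha=1e-3):
    """Closed-form minimizer of ||R A_i - A_j||_F^2 + alpha * ||R||_F^2.

    A_i : p_i x d array, descriptor coefficients on image i (assumed input)
    A_j : p_j x d array, corresponding coefficients on image j (assumed input)
    Note: a Frobenius-only stand-in, not the L1-based expression of the source.
    """
    p = A_i.shape[0]
    gram = A_i @ A_i.T + alpha * np.eye(p)
    # R = A_j A_i^T (A_i A_i^T + alpha I)^{-1}, computed via a linear solve.
    return np.linalg.solve(gram, A_i @ A_j.T).T

# Synthetic check: recover a known map from noiseless correspondences.
rng = np.random.default_rng(1)
A_i = rng.standard_normal((3, 10))
R_true = rng.standard_normal((3, 3))
A_j = R_true @ A_i
R_est = fit_functional_map(A_i, A_j, alpha=1e-10)
```

An L1 data term, as named in the text, would be more robust to outlying descriptors but requires an iterative solver rather than a single linear solve.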
Obtaining the functional map representation from the multimodal map consistency and computing the segmentation function of each image by jointly optimizing an objective function, as described in step 5), is specifically:
1) Compute the functional map representation from the established multimodal map consistency relationship, namely
[formula omitted in source]
where the variables $\Phi_i$ and $\Phi_j$ are subject to an orthogonality constraint. The problem is solved by alternating optimization: two of the variables are fixed while the remaining one is optimized, with $z_i$ initialized to the all-ones vector. Iterating until the objective converges yields the optimal functional map representation $R_{ij}$;
2) Take the image samples as the vertices of a graph [symbol omitted in source], with the weight between two vertices denoted [symbol omitted in source]. The joint optimization objective for the image segmentation functions is then
[formula omitted in source]
where the constant ζ > 0, c = {1, 2, …, C}, $(\cdot)^T$ denotes the transpose of a vector or matrix, the subspace $B_{ik}$ is spanned by the first p eigenvectors of the Laplacian matrix of the superpixel graph of the k-th modality of the i-th image, and the segmentation functions $f_{ic}$ of different classes satisfy a mutual-exclusion constraint;
3) Solving for the optimum of the objective function in the above step yields the optimal segmentation function [symbol omitted in source] of the i-th image, from which the optimal segment of the image belonging to the c-th target class can be determined, denoted [symbol omitted in source].
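How segments are read off from the optimal segmentation functions is not spelled out in this text. One plausible reading of the mutual-exclusion constraint is that each superpixel is assigned to the class whose segmentation function is largest there; the sketch below illustrates that assumption only. The name `assign_classes` and the layout of `coeffs` (one column of basis coefficients per class) are hypothetical.

```python
import numpy as np

def assign_classes(B_i, coeffs):
    """Assign each superpixel of image i to exactly one class.

    B_i    : q x p reduced basis of image i
    coeffs : p x C array, column c holds the coefficients of f_ic (assumed layout)
    Returns a length-q array of class indices (argmax rule, an assumption).
    """
    scores = B_i @ coeffs          # q x C: value of each f_ic on each superpixel
    return scores.argmax(axis=1)   # one class per superpixel (mutual exclusion)

# Toy example: 4 superpixels, 2 basis functions, 2 classes.
B_i = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]) / np.sqrt(2.0)
coeffs = np.eye(2)
labels = assign_classes(B_i, coeffs)
segments = [np.flatnonzero(labels == c) for c in range(coeffs.shape[1])]
```

Here `segments[c]` lists the superpixels of the image that would form the segment for class c under this rule.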
The invention proposes a multimodal image segmentation method based on functional mapping, with the following advantages: forming superpixels by graph segmentation of the raw image pixels reduces the computational cost; the multimodal superpixel representation reflects the image content from the perspectives of different descriptors; and establishing functional maps between image pairs in the reduced functional space, with latent functions to keep them consistent, effectively links the low-level features of the images with their high-level semantics. This improves the segmentation results and lays a solid foundation for visual applications such as image recognition and target localization.
Description of drawings
Fig. 1 is a flow chart of the method of the invention.
Detailed description
The invention is further described with reference to Fig. 1:
1. After obtaining an image set containing the targets, perform the following operations:
1) Segment each image in the set into superpixel blocks, and characterize the resulting superpixels with different feature descriptors to obtain a multimodal image representation;
2) Build a superpixel-based graph on the multimodal images and construct the corresponding Laplacian matrix;
3) Characterize the reduced functional space of each image and establish functional maps between image pairs;
4) Align the functional map of each modality with the image cues, and introduce latent functions to keep the functional maps consistent;
5) Obtain the functional map representation from the multimodal map consistency, compute the segmentation function of each image by jointly optimizing an objective function, and obtain the optimal segmentation of the image, completing the segmentation.
Segmenting each image in the set into superpixel blocks and characterizing the superpixels with different feature descriptors to obtain a multimodal image representation, as described in step 1), is specifically:
1) Let the set consist of n related images, denoted as [symbol omitted in source]. Each image contains one or more target classes, and the total number of target classes in the whole set is C;
2) Treat the pixels of an image as the vertices of a graph and use a graph segmentation method to divide each image in the set into q small regions (for example, q = 100), where q is a positive integer. These regions consist of pixels with similar values and are called superpixels. The segment of the i-th image that belongs to the c-th class is denoted $S_{ic}$, where i = {1, 2, …, n} and c = {1, 2, …, C};
3) Characterize each superpixel of the image with m different feature descriptors, such as the scale-invariant feature transform (SIFT), local binary patterns (LBP), and the histogram of oriented gradients (HOG), to obtain a multimodal feature representation that reflects the intrinsic information of the image from several perspectives. For example, the i-th image corresponds to a set of matrices [symbol omitted in source]; that is, the k-th image feature descriptor corresponds to the k-th modality [symbol omitted in source].
Building a superpixel-based graph on the multimodal images and constructing the corresponding Laplacian matrix, as described in step 2), is specifically:
2.1) Treat the q superpixels of each image modality as the vertices of a graph, and build the superpixel graph by fully connecting these vertices;
2.2) Construct a Laplacian matrix [symbol omitted in source] on the superpixel graph of each modality. Each is obtained from a weight matrix W computed with a Gaussian weighting strategy, namely $L = D - W$, where D is a diagonal matrix whose diagonal entries are the column sums of W.
Characterizing the linear reduced functional space of each image and establishing functional maps between image pairs, as described in step 3), is specifically:
1) Compute the eigenvalues and eigenvectors of the multimodal Laplacian matrices [symbol omitted in source], and take the first p (p < q) eigenvectors to span the reduced functional space [symbol omitted in source]; the corresponding eigenvalues on each modality of each image form diagonal matrices [symbol omitted in source];
2) Let the segmentation function of each image be $f_{oi}$, corresponding to $S_{ic}$. The search space of this function is the p-dimensional space [symbol omitted in source] spanned by a set of basis vectors [symbol omitted in source], and the coefficients of $f_{oi}$ on the i-th image are expressed as a linear combination [formula omitted in source], where $B_i$ consists of the first p eigenvectors of the Laplacian matrix;
3) The relationship between any pair of images is captured by a linear functional map. For example, the functional map from the subspace [symbol omitted in source] of the i-th image to the subspace [symbol omitted in source] of the j-th image is represented by a matrix $R_{ij}$; that is, the image in the second subspace of a function f in the first subspace is obtained by computing $R_{ij} f$.
Aligning the functional map of each modality with the image cues and introducing latent functions to keep the functional maps consistent, as described in step 4), is specifically:
1) The image cues correspond to different descriptors, and the functional map of each modality is aligned with the image cues by optimizing the following expression, namely
[formula omitted in source]
where the constants α > 0 and β > 0, $\|\cdot\|_1$ denotes the L1 norm of a matrix, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix;
2) The introduced latent functions are shared by the input images, and the functional-map consistency term lets the functional maps relate the corresponding latent functions on each image. Each latent function appears only in some subset of the images. For the i-th image, the binary vector $z_i = [z_{i1}, z_{i2}, \dots, z_{il}]$, with entries in {0, 1}, characterizes the relationship between the latent functions and the image, while the continuous variable $\Phi_i = [\varphi_{i1}, \varphi_{i2}, \dots, \varphi_{il}]$ describes the latent functions on the image;
The functional-map consistency term of the previous step is expressed as
[formula omitted in source]
where the constants γ > 0 and λ > 0, $\|\cdot\|_2$ denotes the L2 norm of a matrix, $\mathrm{diag}(z_i)$ denotes the diagonal matrix formed from $z_i$, and $(i, j) \in E$ ranges over the set of neighbouring image pairs; for example, 20 neighbour images may be used in the computation.
Obtaining the functional map representation from the multimodal map consistency and computing the segmentation function of each image by jointly optimizing an objective function, as described in step 5), is specifically:
1) Compute the functional map representation from the established multimodal map consistency relationship, namely
[formula omitted in source]
where the variables $\Phi_i$ and $\Phi_j$ are subject to an orthogonality constraint. The problem is solved by alternating optimization: two of the variables are fixed while the remaining one is optimized, with $z_i$ initialized to the all-ones vector. Iterating until the objective converges yields the optimal functional map representation $R_{ij}$;
2) Take the image samples as the vertices of a graph [symbol omitted in source], with the weight between two vertices denoted [symbol omitted in source]. The joint optimization objective for the image segmentation functions is then
[formula omitted in source]
where the constant ζ > 0, c = {1, 2, …, C}, $(\cdot)^T$ denotes the transpose of a vector or matrix, the subspace $B_{ik}$ is spanned by the first p eigenvectors of the Laplacian matrix of the superpixel graph of the k-th modality of the i-th image, and the segmentation functions $f_{ic}$ of different classes satisfy a mutual-exclusion constraint;
3) Solving for the optimum of the objective function in the above step yields the optimal segmentation function [symbol omitted in source] of the i-th image, from which the optimal segment of the image belonging to the c-th target class can be determined, denoted [symbol omitted in source].
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510040592.4A CN104778683B (en) | 2015-01-27 | 2015-01-27 | A Multimodal Image Segmentation Method Based on Functional Mapping |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104778683A (en) | 2015-07-15 |
CN104778683B (en) | 2017-06-27 |
Family
ID=53620129
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510040592.4A Active CN104778683B (en) | 2015-01-27 | 2015-01-27 | A Multimodal Image Segmentation Method Based on Functional Mapping |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104778683B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105069787A (en) * | 2015-08-04 | 2015-11-18 | 浙江慧谷信息技术有限公司 | Image joint segmentation algorithm based on consistency function space mapping |
CN106202281A (en) * | 2016-06-28 | 2016-12-07 | 广东工业大学 | A kind of multi-modal data represents learning method and system |
CN109993756A (en) * | 2019-04-09 | 2019-07-09 | 中康龙马(北京)医疗健康科技有限公司 | A kind of general medical image cutting method based on graph model Yu continuous successive optimization |
CN111382776A (en) * | 2018-12-26 | 2020-07-07 | 株式会社日立制作所 | Object recognition device and object recognition method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103093470A (en) * | 2013-01-23 | 2013-05-08 | 天津大学 | Rapid multi-modal image synergy segmentation method with unrelated scale feature |
US20140050391A1 (en) * | 2012-08-17 | 2014-02-20 | Nec Laboratories America, Inc. | Image segmentation for large-scale fine-grained recognition |
Non-Patent Citations (1)
Title |
---|
苏坡等: ""基于超像素的多模态MRI脑胶质瘤分割"", 《西北工业大学学报》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105069787A (en) * | 2015-08-04 | 2015-11-18 | 浙江慧谷信息技术有限公司 | Image joint segmentation algorithm based on consistency function space mapping |
CN106202281A (en) * | 2016-06-28 | 2016-12-07 | 广东工业大学 | A kind of multi-modal data represents learning method and system |
CN111382776A (en) * | 2018-12-26 | 2020-07-07 | 株式会社日立制作所 | Object recognition device and object recognition method |
CN109993756A (en) * | 2019-04-09 | 2019-07-09 | 中康龙马(北京)医疗健康科技有限公司 | A kind of general medical image cutting method based on graph model Yu continuous successive optimization |
CN109993756B (en) * | 2019-04-09 | 2022-04-15 | 中康龙马(北京)医疗健康科技有限公司 | General medical image segmentation method based on graph model and continuous stepwise optimization |
Also Published As
Publication number | Publication date |
---|---|
CN104778683B (en) | 2017-06-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Liu et al. | RoadNet: Learning to comprehensively analyze road networks in complex urban scenes from high-resolution remotely sensed images | |
Lu et al. | Deep feature-preserving normal estimation for point cloud filtering | |
CN105574534A (en) | Salient object detection method based on sparse subspace clustering and low-rank representation | |
CN106203430A (en) | Salient object detection method based on foreground focus degree and background prior | |
Huang et al. | Saliency and co-saliency detection by low-rank multiscale fusion | |
Giraud et al. | SuperPatchMatch: An algorithm for robust correspondences using superpixel patches | |
Choong et al. | Image segmentation via normalised cuts and clustering algorithm | |
CN106780582B (en) | Image saliency detection method based on fusion of texture features and color features | |
CN109101981B (en) | A loop closure detection method based on global image stripe code in street scenes | |
CN106530338A (en) | Method and system for matching MR image feature points before and after nonlinear deformation of biological tissue | |
CN106815842A (en) | An improved superpixel-based image saliency detection method | |
WO2023142602A1 (en) | Image processing method and apparatus, and computer-readable storage medium | |
CN106650744A (en) | Image object co-segmentation method guided by local shape migration | |
CN104778683B (en) | A Multimodal Image Segmentation Method Based on Functional Mapping | |
Khan et al. | A modified adaptive differential evolution algorithm for color image segmentation | |
Vinoth Kumar et al. | A decennary survey on artificial intelligence methods for image segmentation | |
CN111091129A (en) | Image salient region extraction method based on multi-color feature manifold ranking | |
CN111062274B (en) | Context-aware embedded crowd counting method, system, medium and electronic equipment | |
Li et al. | Arbitrary body segmentation in static images | |
Xie et al. | 3D surface segmentation from point clouds via quadric fits based on DBSCAN clustering | |
CN112668662B (en) | Target detection method in wild mountain forest environment based on improved YOLOv3 network | |
Ding et al. | Image segmentation as learning on hypergraphs | |
Hassan et al. | Salient object detection based on CNN fusion of two types of saliency models | |
Zhang et al. | Automatic superpixel generation algorithm based on a quadric error metric in 3D space | |
Sun et al. | An enhanced affinity graph for image segmentation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by SIPO to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20220808; Address after: Room 406, building 19, haichuangyuan, No. 998, Wenyi West Road, Yuhang District, Hangzhou City, Zhejiang Province; Patentee after: HANGZHOU HUICUI INTELLIGENT TECHNOLOGY CO.,LTD.; Address before: 310018 No. 2 street, Xiasha Higher Education Zone, Hangzhou, Zhejiang; Patentee before: HANGZHOU DIANZI University |
|
PE01 | Entry into force of the registration of the contract for pledge of patent right | ||
Denomination of invention: A Multimodal Image Segmentation Method Based on Functional Mapping; Granted publication date: 20170627; Pledgee: Guotou Taikang Trust Co.,Ltd.; Pledgor: HANGZHOU HUICUI INTELLIGENT TECHNOLOGY CO.,LTD.; Registration number: Y2024980004919 |