CN117953208A - Graph-based edge attention gate medical image segmentation method and device
- Publication number: CN117953208A
- Application number: CN202311697571.0A
- Authority: CN (China)
- Legal status: Pending
Classifications
- G06V10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
- G06N3/042: Knowledge-based neural networks; logical representations of neural networks
- G06N3/0455: Auto-encoder networks; encoder-decoder networks
- G06N3/0464: Convolutional networks [CNN, ConvNet]
- G06N3/048: Activation functions
- G06N3/084: Backpropagation, e.g. using gradient descent
- G06V10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/52: Scale-space analysis, e.g. wavelet analysis
- G06V10/764: Image or video recognition using classification, e.g. of video objects
- G06V10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82: Image or video recognition using neural networks
- G06V20/70: Labelling scene content, e.g. deriving syntactic or semantic representations
- G06V2201/03: Recognition of patterns in medical or anatomical images
- Y02T10/40: Engine management systems
Abstract
The invention discloses a graph-based edge attention gate medical image segmentation method and device. The device comprises: a data collection unit; a data preprocessing unit; an EAGC_IUNet model building unit, which constructs a graph-based edge attention gate medical image segmentation model denoted EAGC_IUNet; an EAGC_IUNet model training unit, which trains the constructed EAGC_IUNet; and a medical image segmentation unit, which outputs a target region segmentation result according to the medical image segmentation method. The invention improves the edge attention gate structure, using the horizontal and vertical Sobel operators to extract edge features of the feature map in two directions respectively, so that the high-frequency edge information of the feature map is easier to extract. The improved UNet3+ is used as the backbone network, which reduces the model parameter count while retaining the advantages of UNet++, and the full-scale skip connections are more conducive to capturing both fine-grained and coarse-grained semantic features in the image.
Description
Technical Field
The invention relates to the technical field of medical image processing, in particular to a medical image segmentation method and device based on an edge attention gate of a graph structure.
Background
Medical images are widely used by doctors as important data for clinical diagnosis, such as finding diseases, making treatment plans, and judging prognosis. Accurately locating a lesion and delineating its severity from medical images can significantly improve the disease detection rate and diagnostic accuracy. Common medical imaging techniques include X-ray imaging, computed tomography (CT), magnetic resonance imaging (MRI), ultrasound, positron emission tomography (PET), angiography, and optical imaging.
In the early days, scholars usually adopted traditional methods such as contour detection, thresholding, filtering, clustering, priors, and machine learning algorithms for medical image segmentation tasks. With the rapid development of deep learning, research in the medical imaging field has gradually shifted from these traditional methods to deep learning methods. Significant progress has been made in medical image segmentation, especially owing to the strong generalization ability of convolution operations in extracting high-dimensional features, which is a major advantage in visual tasks. However, these developments are accompanied by two unsolved problems. The first is that this strong generalization ability comes with a degree of local position information loss: for example, when large receptive fields are used to extract image semantic information, compressing the extracted features into a small feature map causes pixel-level local position information to be lost. The second is a common difficulty in image segmentation, namely the loss of segmentation edge information.
Disclosure of Invention
In view of the defects of the prior art, the technical problem to be solved by the invention is to provide a graph-based edge attention gate medical image segmentation method and device.
The technical scheme of the invention is as follows:
the invention provides a graph-based edge attention gate medical image segmentation method; the overall flow is shown in fig. 1. The method comprises steps S110 to S150, specifically as follows:
S110: collecting original medical image data and corresponding manual segmentation result data of a target area to respectively form an original image data set I and a label data set L;
S120: the original image data set and the label data set are subjected to data preprocessing, and test and training data sets are constructed;
S130: a graph-based edge attention gate medical image segmentation model is constructed and denoted EAGC_IUNet;
S140: the constructed EAGC_IUNet is trained;
S150: a target region segmentation result is given according to the medical image segmentation method.
The flow of the data preprocessing method in step S120 is shown in fig. 2 and comprises steps S210 to S230: S210 three-dimensional medical image slicing, S220 two-dimensional image normalization, and S230 two-dimensional image scaling. The respective steps are described below.
S210: If the medical image is a three-dimensional image, each piece of original medical image data and its corresponding target region manual segmentation result data are sliced two-dimensionally along the cross section.
S220: To accelerate neural network training convergence, all two-dimensional images are first normalized, i.e., each pixel value of the image is mapped from [0,255] to [0,1]. The normalization formula is as follows:

x̂_i = (x_i - min(x)) / (max(x) - min(x))

where x_i denotes the i-th pixel value, and min(x) and max(x) denote the minimum and maximum pixel values, respectively.
Secondly, the normalized original medical image data and the corresponding target region manual segmentation result data are each split in a ratio of 80% to 20% to construct an original image training data set I_train, an original image test data set I_test, a manual segmentation result training data set L_train, and a manual segmentation result test data set L_test.
S230: All image data are scaled to 256×256 using the resize() function in the PIL package.
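As an illustration of steps S210 to S230, the following is a minimal preprocessing sketch assuming NumPy and Pillow; the function names and the random split are illustrative, not taken from the patent.

```python
import numpy as np
from PIL import Image

def normalize(img: np.ndarray) -> np.ndarray:
    """Min-max normalize pixel values from [0, 255] to [0, 1]."""
    x_min, x_max = img.min(), img.max()
    return (img - x_min) / (x_max - x_min + 1e-8)  # epsilon guards flat slices

def preprocess_slice(img: np.ndarray, size: int = 256) -> np.ndarray:
    """Normalize a 2D slice and rescale it to size x size."""
    img = normalize(img.astype(np.float32))
    pil = Image.fromarray((img * 255).astype(np.uint8))
    pil = pil.resize((size, size), Image.BILINEAR)
    return np.asarray(pil, dtype=np.float32) / 255.0

def split_indices(n: int, train_ratio: float = 0.8, seed: int = 0):
    """80%/20% train/test split of slice indices (illustrative)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    cut = int(n * train_ratio)
    return idx[:cut], idx[cut:]
```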
In step S130, a graph-based edge attention gate medical image segmentation model (EAGC_IUNet) is constructed; the network structure is shown in fig. 3. The improved UNet3+ is adopted as the backbone network, comprising a graph encoder module, a convolutional encoder module, and a decoder part, as follows:
310: The graph encoder module comprises construction of a weighted adjacency matrix, construction of node annotation features, and two improved residual graph convolution modules (denoted MCMCRes_GCN). The specific calculation process is as follows: first, the adjacency matrix encoding the edge relations and the node feature matrix are computed from the original input image. Since the graph convolution operation is a form of Laplacian smoothing, neighboring nodes tend to acquire similar features as information propagates between them; to prevent over-smoothing caused by stacking too many graph convolution layers, a two-layer graph convolution network structure is adopted. Finally, the adjacency matrix and the node feature matrix are taken as inputs of the improved residual graph convolution module, and the two-dimensional graph convolution features are obtained through calculation by the two improved residual graph convolution modules.
The graph structure data is defined as a triple G(N, E, F). N denotes the node annotation vector set of the graph, of size |N|×S, where |N| is the number of nodes in the graph and S is the dimension of a node annotation vector; E is the edge set of the graph; and F denotes the graph features. Unlike data represented in Euclidean space, the matrix N and the edge set E of graph structure data are not unique; the matrix N corresponds to the set E, and both N and E are arranged in node order. The construction of the cosine weighted adjacency matrix, the construction of node annotation features, and the improved residual graph convolution module referred to in 310 are specifically calculated as follows:
(1) Construct the weighted adjacency matrix. Considering the influence of distance and pixel value on the correlation between nodes, the inter-node distances and pixel values are used as vectors, and the weighted adjacency matrix is calculated through the cosine distance:

cos(A, B) = (A · B) / (‖A‖ ‖B‖)

where the vectors A and B consist of the inter-node distance and the pixel values.
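One way to realize the cosine-weighted adjacency matrix in NumPy; the exact composition of the node vectors (two coordinates plus one pixel value) is an assumption based on the description above.

```python
import numpy as np

def cosine_weighted_adjacency(coords: np.ndarray, pixels: np.ndarray) -> np.ndarray:
    """coords: (|N|, 2) node coordinates; pixels: (|N|,) pixel values.
    Each node vector combines position and pixel value; edge weights are
    cosine similarities between node vectors."""
    feats = np.concatenate([coords, pixels[:, None]], axis=1)   # (|N|, 3)
    norms = np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8
    unit = feats / norms
    return unit @ unit.T                                        # A[i, j] = cos(v_i, v_j)
```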
(2) The node annotation N is initialized from the pyramid multi-scale features of a pre-trained U-Net. Feature maps of decoders at different depths are extracted from the U-Net model, upsampled by different factors to the same size as the feature map output by the last decoder layer, and concatenated along the channel dimension to obtain the number of feature channels S. Assuming the side length in pixels of the concatenated multi-layer feature map is ξ, the number of feature map pixels is ξ×ξ = |N|. Let the pixel in the i-th row and j-th column of the k-th channel of the feature map be denoted x_{i,j,k}; the η-th node annotation is then

n_η = (x_{i,j,1}, x_{i,j,2}, …, x_{i,j,S}), i, j = 1, 2, …, ξ

where η is calculated as η = (i-1)×λ + j, with λ the number of pixels in the second dimension of the image. Concatenating the n_η along the first dimension in node order η yields the node annotation N.
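A sketch of assembling the node annotation N from pre-trained U-Net decoder features, assuming PyTorch; the bilinear upsampling mode and the single-image batch are assumptions.

```python
import torch
import torch.nn.functional as F

def build_node_annotations(decoder_feats: list) -> torch.Tensor:
    """decoder_feats: list of (1, C_k, H_k, W_k) maps from decoders of
    different depths. Upsample all to the last decoder's spatial size,
    concatenate on channels (S = sum of C_k), then flatten pixels to nodes."""
    target = decoder_feats[-1].shape[-2:]                    # (xi, xi)
    ups = [F.interpolate(f, size=target, mode="bilinear", align_corners=False)
           for f in decoder_feats]
    stacked = torch.cat(ups, dim=1)                          # (1, S, xi, xi)
    _, S, xi, lam = stacked.shape
    # node eta = (i-1)*lambda + j  ->  row-major flatten of the pixel grid
    return stacked.permute(0, 2, 3, 1).reshape(xi * lam, S)  # (|N|, S)
```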
(3) Graph convolution is calculated on a graph whose normalized Laplacian matrix L is

L = I - D^{-1/2} A D^{-1/2}

where D is the degree matrix. Since the Laplacian matrix L is a real symmetric positive semi-definite matrix, it has a set of orthogonal eigenvectors and can be diagonalized by the Fourier basis U = [u_0, u_1, …, u_{n-1}]:

L = U Λ U^T

where Λ is the diagonal matrix of eigenvalues, Λ = diag([λ_0, …, λ_{n-1}]) ∈ R^{n×n}. Filtering the graph signal x with a filter g_θ is then

g_θ(L) * x = U g_θ(Λ) U^T x
For the non-parametric filter g_θ(Λ) = diag(θ), the parameter θ is a vector of Fourier coefficients. To address the limitations of non-parametric filters, namely that they are not localized in the vertex domain and have high time complexity, polynomial filters are used instead. The polynomial filter formula is as follows:

g_θ(Λ) = Σ_{k=0}^{K-1} θ_k Λ^k

Substituting into the filtering expression gives

g_θ(L) * x = Σ_{k=0}^{K-1} θ_k L^k x

After matrix transformation, the GCN output value can be written as

y = Σ_{k=1}^{K} α_k L^k x

where (α_1, α_2, …, α_K) are arbitrarily (randomly) initialized values whose parameter values are updated by backpropagation.
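A minimal polynomial-filter graph convolution layer in PyTorch following the propagation above; the order K, the linear projection, and the ReLU activation are assumptions.

```python
import torch
import torch.nn as nn

class PolyGraphConv(nn.Module):
    """y = sum_k alpha_k * L^k x, followed by a learned projection;
    the alpha_k are learned by backpropagation."""
    def __init__(self, in_dim: int, out_dim: int, K: int = 2):
        super().__init__()
        self.alpha = nn.Parameter(torch.randn(K))   # (alpha_1 .. alpha_K)
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, L: torch.Tensor) -> torch.Tensor:
        # x: (|N|, S) node features; L: (|N|, |N|) normalized Laplacian
        out, h = torch.zeros_like(x), x
        for k in range(self.alpha.shape[0]):
            h = L @ h                                # L^{k+1} x
            out = out + self.alpha[k] * h
        return torch.relu(self.lin(out))
```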
The improved residual graph convolution module (MCMCRes_GCN Block) is based on two graph convolution layers and introduces the ideas of residual connection from ResNet and Dropout; its structure is shown in fig. 4. Inspired by the Dropout idea, a method of MCMC Dropout is presented herein. Unlike Dropout, MCMC Dropout performs MCMC sampling screening on the node feature vectors in the graph structure. The residual structure is then applied in the graph convolution module. The improved residual graph convolution module takes two graph convolution layers as its basic structure: the input x_0 of the graph convolution layers is sampled by MCMC to obtain x_1, and x_1 is added to the output F(x_0) of the graph convolution layers to obtain the final feature H(x_0), as follows:

H(x_0) = F(x_0) + x_1
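A sketch of the MCMCRes_GCN block, reusing the PolyGraphConv layer from the previous sketch; since the MCMC sampling procedure is not specified here, a stochastic node-feature mask stands in for MCMC Dropout and is purely an assumption.

```python
import torch
import torch.nn as nn

class MCMCResGCNBlock(nn.Module):
    """H(x0) = F(x0) + x1, where F is two graph convolution layers and
    x1 is a stochastic sampling of the input node features."""
    def __init__(self, dim: int, keep_prob: float = 0.9):
        super().__init__()
        self.gc1 = PolyGraphConv(dim, dim)
        self.gc2 = PolyGraphConv(dim, dim)
        self.keep_prob = keep_prob

    def mcmc_sample(self, x: torch.Tensor) -> torch.Tensor:
        # Placeholder for MCMC sampling screening of node feature vectors.
        mask = (torch.rand_like(x) < self.keep_prob).float()
        return x * mask / self.keep_prob

    def forward(self, x0: torch.Tensor, L: torch.Tensor) -> torch.Tensor:
        x1 = self.mcmc_sample(x0)            # MCMC-sampled input features
        f = self.gc2(self.gc1(x0, L), L)     # F(x0): two graph conv layers
        return f + x1                        # H(x0) = F(x0) + x1
```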
320: The convolutional encoder module comprises five convolutional encoding blocks, each consisting of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer. The specific calculation process is as follows: the original input image first undergoes five convolution block calculations and four downsampling operations. The second dimension of the resulting graph convolution features from the graph encoder is reshaped into a two-dimensional square matrix, i.e., the graph convolution features are converted into a three-dimensional matrix. The two-dimensional graph convolution features, upsampled at different scales, are then channel-concatenated with the convolution features of the last four convolutional encoders respectively.
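A convolutional encoding block matching this description, assuming PyTorch; the 3×3 kernel size, the placement of batch normalization after each convolution, and max-pooling as the downsampling operation are assumptions.

```python
import torch.nn as nn

class ConvEncodeBlock(nn.Module):
    """Two 2D convolutions, each followed by batch normalization and ReLU."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

downsample = nn.MaxPool2d(2)  # one of the four downsampling operations
```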
330: The decoder comprises four convolutional decoding modules, four edge attention gates, a two-dimensional convolutional layer, and a Sigmoid layer. The first three convolutional decoding modules each consist of two 2D convolutional layers, a batch normalization layer, a ReLU activation layer, a deconvolution layer, and a concatenation layer; the fourth consists of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer. The specific calculation process is as follows: image features are first calculated through the edge attention gates and the convolutional decoding modules; the calculation results of the first to fourth convolutional decoding modules are then weighted and concatenated; and the final segmentation result is obtained through a two-dimensional convolutional layer and a Sigmoid layer.
The edge attention gate structure in 330 is shown in fig. 5 and is as follows: x_l is the feature map output by the l-th layer, with feature size C_x×H_x×W_x, where C_x is the number of channels of the l-th layer feature map and H_x×W_x is the size of each feature map. The gating signal u is the upsampled feature map of the previous layer, with feature size C_u×H_u×W_u.

G_x is introduced as the horizontal Sobel operator and G_y as the vertical Sobel operator. Padding and convolution operations with the horizontal and vertical Sobel operators are carried out, and the directional responses are combined by point-wise addition to obtain F_u and F_x:

F_u = G_x * u + G_y * u, F_x = G_x * x_l + G_y * x_l

where * denotes the convolution operation. F_u and F_x are passed through 1×1 convolution kernels to obtain features W_u and W_x, whose point-wise addition yields a feature map of size C_u×H_u×W_u, enhancing the contour features. After passing sequentially through a linear transformation and a nonlinear activation function, grid resampling is performed using bilinear interpolation. The original feature maps extracted over multiple scales and the edge-enhanced feature map weighted by the attention coefficient α are combined by a skip connection, where the attention coefficient α ∈ [0,1] preserves only task-relevant features by identifying salient feature regions and adjusting the attention weight distribution.
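A sketch of the edge attention gate under the description above, assuming PyTorch; the depthwise Sobel filtering, the intermediate channel count c_mid, and the exact resampling points are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

SOBEL_GX = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
SOBEL_GY = SOBEL_GX.t().contiguous()

def sobel_edges(x: torch.Tensor) -> torch.Tensor:
    """Depthwise Sobel filtering in both directions, added point-wise."""
    c = x.shape[1]
    gx = SOBEL_GX.view(1, 1, 3, 3).repeat(c, 1, 1, 1).to(x)
    gy = SOBEL_GY.view(1, 1, 3, 3).repeat(c, 1, 1, 1).to(x)
    ex = F.conv2d(x, gx, padding=1, groups=c)
    ey = F.conv2d(x, gy, padding=1, groups=c)
    return ex + ey

class EdgeAttentionGate(nn.Module):
    def __init__(self, c_x: int, c_u: int, c_mid: int):
        super().__init__()
        self.wx = nn.Conv2d(c_x, c_mid, 1)   # 1x1 convolution -> W_x
        self.wu = nn.Conv2d(c_u, c_mid, 1)   # 1x1 convolution -> W_u
        self.psi = nn.Conv2d(c_mid, 1, 1)    # linear transformation

    def forward(self, x_l: torch.Tensor, u: torch.Tensor) -> torch.Tensor:
        f_x, f_u = sobel_edges(x_l), sobel_edges(u)
        f_x = F.interpolate(f_x, size=u.shape[-2:], mode="bilinear",
                            align_corners=False)    # align to gating size
        g = torch.relu(self.wx(f_x) + self.wu(f_u))  # point-wise addition
        alpha = torch.sigmoid(self.psi(g))           # attention in [0, 1]
        alpha = F.interpolate(alpha, size=x_l.shape[-2:], mode="bilinear",
                              align_corners=False)   # grid resampling
        return x_l * alpha                           # edge-enhanced features
```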
In step S140, the model is trained using the original image training dataset I_train and the manual segmentation result training dataset L_train from step S120. The specific steps are as follows: first, I_train is input into the network to compute one round of iterative results, which are compared with the corresponding manual segmentation results in L_train, and a loss value is calculated with the loss function; second, gradients are computed with a stochastic gradient descent optimizer and the network weights are updated by backpropagation; this process is then iterated until the error requirement is met, yielding the trained network model; finally, the model is verified using the original image test dataset I_test and the manual segmentation result test dataset L_test. The loss function in the above steps is a weighted loss function, calculated as

L_s = α·Loss_BBCE + β·Loss_DICE + γ·Loss_MIoU

where α, β, γ are the weights of the three loss functions; Loss_BBCE denotes the balanced binary cross-entropy loss, Loss_DICE the Dice loss, and Loss_MIoU the MIoU loss. Here μ denotes the balance hyper-parameter of positive and negative samples, usually taken as the ratio of positive samples to the total sample size; y is the prediction result, ỹ is the label image, and K is the number of classes.
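Since the balanced BCE, Dice, and MIoU formulas are only named here, the sketch below assumes their standard forms in PyTorch; predictions are assumed to be post-Sigmoid probabilities, and the default weights follow the embodiments (α=2, β=4, γ=4).

```python
import torch

def balanced_bce(pred, target, mu):
    """mu: ratio of positive samples to total sample size; pred in (0, 1)."""
    eps = 1e-7
    pred = pred.clamp(eps, 1 - eps)
    return -(mu * target * pred.log()
             + (1 - mu) * (1 - target) * (1 - pred).log()).mean()

def dice_loss(pred, target):
    inter = (pred * target).sum()
    return 1 - (2 * inter + 1e-7) / (pred.sum() + target.sum() + 1e-7)

def miou_loss(pred, target):
    inter = (pred * target).sum()
    union = pred.sum() + target.sum() - inter
    return 1 - (inter + 1e-7) / (union + 1e-7)

def weighted_loss(pred, target, mu, alpha=2.0, beta=4.0, gamma=4.0):
    """L_s = alpha * Loss_BBCE + beta * Loss_DICE + gamma * Loss_MIoU."""
    return (alpha * balanced_bce(pred, target, mu)
            + beta * dice_loss(pred, target)
            + gamma * miou_loss(pred, target))
```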
The invention further discloses a graph-based edge attention gate medical image segmentation device, which comprises:
a data collection unit: collecting original medical image data and corresponding manual segmentation result data of a target area to respectively form an original image data set I and a label data set L;
a data preprocessing unit: the original image data set and the label data set are subjected to data preprocessing, and a test and training data set is constructed;
EAGC_IUNet model building unit: constructing a graph-based edge attention gate medical image segmentation model, denoted EAGC_IUNet;
EAGC_IUNet model training unit: training the constructed EAGC_IUNet;
a medical image segmentation unit: and according to the medical image segmentation method, a target region segmentation result is given.
In the device, the data preprocessing unit comprises a three-dimensional medical image slicing subunit, a two-dimensional image normalization subunit, and a two-dimensional image scaling subunit:

Three-dimensional medical image slicing subunit: if the medical image is a three-dimensional image, each piece of original medical image data and its corresponding target region manual segmentation result data are sliced two-dimensionally along the cross section;
Two-dimensional image normalization subunit: to accelerate neural network training convergence, all two-dimensional images are first normalized, i.e., each pixel value of the image is mapped from [0,255] to [0,1]; the normalization formula is as follows:

x̂_i = (x_i - min(x)) / (max(x) - min(x))

where x_i denotes the i-th pixel value, and min(x) and max(x) denote the minimum and maximum pixel values, respectively;
Secondly, the normalized original medical image data and the corresponding target region manual segmentation result data are each split in a ratio of 80% to 20% to construct an original image training data set I_train, an original image test data set I_test, a manual segmentation result training data set L_train, and a manual segmentation result test data set L_test;
Two-dimensional image scaling subunit: all image data are scaled to 256×256 using the resize() function in the PIL package.
In the device, the EAGC_IUNet model building unit adopts the improved UNet3+ as the backbone network, comprising a graph encoder module, a convolutional encoder module, and a decoder part, as follows:
A graph encoder module: comprising construction of a weighted adjacency matrix, construction of node annotation features, and two improved residual graph convolution modules MCMCRes_GCN; the specific process is as follows: first, the adjacency matrix encoding the edge relations and the node feature matrix are computed from the original input image; since the graph convolution operation is a form of Laplacian smoothing, neighboring nodes tend to acquire similar features as information propagates between them; to prevent over-smoothing caused by stacking too many graph convolution layers, a two-layer graph convolution network structure is adopted; finally, the adjacency matrix and the node feature matrix are taken as inputs of the improved residual graph convolution module, and two-dimensional graph convolution features are obtained through calculation by the two improved residual graph convolution modules;
The graph structure data is defined as a triple G(N, E, F); N denotes the node annotation vector set of the graph, of size |N|×S, where |N| is the number of nodes in the graph and S is the dimension of a node annotation vector; E is the edge set of the graph; and F denotes the graph features. Unlike data represented in Euclidean space, the matrix N and the edge set E of graph structure data are not unique; the matrix N corresponds to the set E, and both N and E are arranged in node order;
A convolutional encoder module: comprising five convolutional encoding blocks, each consisting of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer; the specific calculation process is as follows: first, the original input image undergoes five convolution block calculations and four downsampling operations; the second dimension of the resulting graph convolution features from the graph encoder is reshaped into a two-dimensional square matrix, i.e., the graph convolution features are converted into a three-dimensional matrix; the two-dimensional graph convolution features, upsampled at different scales, are then channel-concatenated with the convolution features of the last four convolutional encoders respectively;
A decoder: comprising four convolutional decoding modules, four edge attention gates, a two-dimensional convolutional layer, and a Sigmoid layer; the first three convolutional decoding modules each consist of two 2D convolutional layers, a batch normalization layer, a ReLU activation layer, a deconvolution layer, and a concatenation layer, and the fourth consists of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer; the specific calculation process is as follows: image features are first calculated through the edge attention gates and the convolutional decoding modules, the calculation results of the first to fourth convolutional decoding modules are then weighted and concatenated, and the final segmentation result is obtained through a two-dimensional convolutional layer and a Sigmoid layer.
In the graph encoder module of the device, the construction of the cosine weighted adjacency matrix, the construction of node annotation features, and the improved residual graph convolution module are specifically as follows:

(1) Construct the weighted adjacency matrix: considering the influence of distance and pixel value on the correlation between nodes, the inter-node distances and pixel values are used as vectors, and the weighted adjacency matrix is calculated through the cosine distance:

cos(A, B) = (A · B) / (‖A‖ ‖B‖)

where the vectors A and B consist of the inter-node distance and the pixel values;

(2) The node annotation N is initialized from the pyramid multi-scale features of a pre-trained U-Net; feature maps of decoders at different depths are extracted from the U-Net model, upsampled by different factors to the same size as the feature map output by the last decoder layer, and concatenated along the channel dimension to obtain the number of feature channels S; assuming the side length in pixels of the concatenated multi-layer feature map is ξ, the number of feature map pixels is ξ×ξ = |N|; let the pixel in the i-th row and j-th column of the k-th channel of the feature map be denoted x_{i,j,k}; the η-th node annotation is then

n_η = (x_{i,j,1}, x_{i,j,2}, …, x_{i,j,S}), i, j = 1, 2, …, ξ

where η is calculated as η = (i-1)×λ + j, with λ the number of pixels in the second dimension of the image; concatenating the n_η along the first dimension in node order η yields the node annotation N;
(3) Graph convolution is calculated on a graph whose normalized Laplacian matrix L is

L = I - D^{-1/2} A D^{-1/2}

where D is the degree matrix. Since the Laplacian matrix L is a real symmetric positive semi-definite matrix, it has a set of orthogonal eigenvectors and can be diagonalized by the Fourier basis U = [u_0, u_1, …, u_{n-1}]:

L = U Λ U^T

where Λ is the diagonal matrix of eigenvalues, Λ = diag([λ_0, …, λ_{n-1}]) ∈ R^{n×n}. Filtering the graph signal x with a filter g_θ is then

g_θ(L) * x = U g_θ(Λ) U^T x
For the non-parametric filter g_θ(Λ) = diag(θ), the parameter θ is a vector of Fourier coefficients; to address the limitations of non-parametric filters, namely that they are not localized in the vertex domain and have high time complexity, polynomial filters are used instead; the polynomial filter formula is as follows:

g_θ(Λ) = Σ_{k=0}^{K-1} θ_k Λ^k

Substituting into the filtering expression gives

g_θ(L) * x = Σ_{k=0}^{K-1} θ_k L^k x

After matrix transformation, the GCN output value can be written as

y = Σ_{k=1}^{K} α_k L^k x

where (α_1, α_2, …, α_K) are arbitrarily (randomly) initialized values whose parameter values are updated by backpropagation.
The improved residual graph convolution module (MCMCRes_GCN Block) is based on two graph convolution layers, introduces the ideas of residual connection from ResNet and Dropout, and performs MCMC sampling screening on the node feature vectors in the graph structure; the residual structure is then applied in the graph convolution module. The improved residual graph convolution module takes two graph convolution layers as its basic structure: the input x_0 of the graph convolution layers is sampled by MCMC to obtain x_1, and x_1 is added to the output F(x_0) of the graph convolution layers to obtain the final feature H(x_0), as follows:

H(x_0) = F(x_0) + x_1.
In the device, the edge attention gate is specifically as follows: x_l is the feature map output by the l-th layer, with feature size C_x×H_x×W_x, where C_x is the number of channels of the l-th layer feature map and H_x×W_x is the size of each feature map; the gating signal u is the upsampled feature map of the previous layer, with feature size C_u×H_u×W_u;

G_x is introduced as the horizontal Sobel operator and G_y as the vertical Sobel operator; padding and convolution operations with the horizontal and vertical Sobel operators are carried out, and the directional responses are combined by point-wise addition to obtain F_u and F_x:

F_u = G_x * u + G_y * u, F_x = G_x * x_l + G_y * x_l

where * denotes the convolution operation. F_u and F_x are passed through 1×1 convolution kernels to obtain features W_u and W_x, whose point-wise addition yields a feature map of size C_u×H_u×W_u, enhancing the contour features; after passing sequentially through a linear transformation and a nonlinear activation function, grid resampling is performed using bilinear interpolation; the original feature maps extracted over multiple scales and the edge-enhanced feature map weighted by the attention coefficient α are combined by a skip connection, where the attention coefficient α ∈ [0,1] preserves only task-relevant features by identifying salient feature regions and adjusting the attention weight distribution.
In the device, the EAGC_IUNet model training unit trains the model using the original image training dataset I_train and the manual segmentation result training dataset L_train; the specific steps are as follows: first, I_train is input into the network to compute one round of iterative results, which are compared with the corresponding manual segmentation results in L_train, and a loss value is calculated with the loss function; second, gradients are computed with a stochastic gradient descent optimizer and the network weights are updated by backpropagation; this process is then iterated until the error requirement is met, yielding the trained network model; finally, the model is verified using the original image test dataset I_test and the manual segmentation result test dataset L_test. The loss function in the above steps is a weighted loss function, calculated as

L_s = α·Loss_BBCE + β·Loss_DICE + γ·Loss_MIoU

where α, β, γ are the weights of the three loss functions; Loss_BBCE denotes the balanced binary cross-entropy loss, Loss_DICE the Dice loss, and Loss_MIoU the MIoU loss. Here μ denotes the balance hyper-parameter of positive and negative samples, usually taken as the ratio of positive samples to the total sample size; y is the prediction result, ỹ is the label image, and K is the number of classes.
The beneficial effects and innovations of the invention are as follows:
1. The invention improves the extraction method for the input features of the graph encoder's graph convolutional network: the inter-node distances and pixel values are used as vectors, the weighted adjacency matrix is calculated through the cosine distance, and the node annotation is initialized with the pyramid multi-scale features of a pre-trained U-Net.
2. The improved residual graph convolution module uses MCMC Dropout to perform MCMC screening on the node feature vectors in the graph structure, so that the sampling points and the proposal distribution can be dynamically adjusted during node feature sampling for a better sampling effect.
3. The invention improves the edge attention gate structure, using the horizontal and vertical Sobel operators to extract edge features of the feature map in two directions respectively, so that the high-frequency edge information of the feature map is easier to extract.
4. The invention uses the improved UNet3+ as the backbone network, reducing the model parameter count while retaining the advantages of UNet++; the full-scale skip connections are more conducive to capturing fine-grained and coarse-grained semantic features in the image.
Drawings
Fig. 1: overall flow chart;
Fig. 2: flow chart of the data preprocessing method;
Fig. 3: graph-based edge attention gate medical image segmentation network structure diagram;
Fig. 4: improved residual graph convolution module structure diagram;
Fig. 5: edge attention gate structure diagram;
Fig. 6: specific calculation flow chart of the embodiments.
Detailed Description
The following describes the implementation of the present invention, with reference to the accompanying drawings and examples, on an MRI ischemic dark band (penumbra) dataset for acute ischemic stroke and on a DR chest radiograph lung nodule segmentation dataset for pneumoconiosis-assisted screening.
Example 1: Image segmentation on an MRI ischemic dark band (penumbra) dataset for acute ischemic stroke
The specific flow in this embodiment is shown in fig. 6, and the specific steps are as follows:
The raw data in the MRI ischemic dark band dataset and the corresponding target region manual segmentation result data form an MRI dataset I and a label dataset L, respectively, which are then preprocessed. Each MRI volume and its corresponding target region manual segmentation result data are sliced two-dimensionally along the cross section, and all two-dimensional slices are normalized, i.e., each pixel value of the image is mapped from [0,255] to [0,1]. The normalization formula is as follows:

x̂_i = (x_i - min(x)) / (max(x) - min(x))

where x_i denotes the i-th pixel value, and min(x) and max(x) denote the minimum and maximum pixel values, respectively.
The normalized MRI data and the corresponding target region manual segmentation result data are each split in a ratio of 80% to 20% to construct an MRI training dataset I_train, an MRI test dataset I_test, a manual segmentation result training dataset L_train, and a manual segmentation result test dataset L_test. All image data are scaled to 256×256 using the resize() function in the PIL package.
Taking the MRI data input as an example, the graph-based edge attention gate medical image segmentation model (EAGC_IUNet) is constructed and trained; the specific steps are as follows:
First, the graph encoder module is constructed. The inter-node distances and pixel values are used as vectors, the weighted adjacency matrix is calculated through the cosine distance, and the node annotation N is initialized with the pyramid multi-scale features of a pre-trained U-Net. Feature maps of decoders at different depths are extracted from the U-Net model, upsampled by different factors to the same size as the feature map output by the last decoder layer, and concatenated along the channel dimension to obtain the number of feature channels S. Assuming the side length in pixels of the concatenated multi-layer feature map is ξ, the number of feature map pixels is ξ×ξ = |N|. Let the pixel in the i-th row and j-th column of the k-th channel of the feature map be denoted x_{i,j,k}; the η-th node annotation is n_η = (x_{i,j,1}, x_{i,j,2}, …, x_{i,j,S}), i, j = 1, 2, …, ξ, where η = (i-1)×λ + j and λ is the number of pixels in the second dimension of the image. Concatenating the n_η along the first dimension in node order η yields the node annotation N. The improved residual graph convolution module (denoted MCMCRes_GCN) is then constructed, with the residual structure applied in the graph convolution module: taking two graph convolution layers as the basic structure, the input x_0 of the graph convolution layers is sampled by MCMC to obtain x_1, and x_1 is added to the output F(x_0) of the graph convolution layers to obtain the final feature H(x_0), i.e., H(x_0) = F(x_0) + x_1. The adjacency matrix and the node feature matrix are taken as inputs of the improved residual graph convolution module, and the two-dimensional graph convolution features are obtained through calculation by the two improved residual graph convolution modules.
Next, the convolutional encoder is constructed. The convolutional encoder module comprises five convolutional encoding blocks, each consisting of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer. The specific calculation is as follows: five convolution block calculations and four downsampling operations are performed on the original input image, then the second dimension of the resulting graph convolution features from the graph encoder is reshaped into a two-dimensional square matrix, i.e., the graph convolution features are converted into a three-dimensional matrix. Finally, the two-dimensional graph convolution features are upsampled at different scales and channel-concatenated with the convolution features of the last four convolutional encoders respectively.
Then, the convolutional decoding module is constructed. It comprises four convolutional decoding modules, four edge attention gates, a two-dimensional convolutional layer, and a Sigmoid layer. The first three convolutional decoding modules each consist of two 2D convolutional layers, a batch normalization layer, a ReLU activation layer, a deconvolution layer, and a concatenation layer; the fourth consists of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer. The specific calculation process is as follows: image features are first calculated through the edge attention gates and the convolutional decoding modules, the calculation results of the first to fourth convolutional decoding modules are then weighted and concatenated, and the final segmentation result is obtained through a two-dimensional convolutional layer and a Sigmoid layer.
Finally, the model is trained using the MRI training dataset I_train and the manual segmentation result training dataset L_train. The specific steps are as follows: I_train is input into the network to compute one round of iterative results, which are compared with the corresponding manual segmentation results in L_train, and a loss value is calculated with the loss function L_s = α·Loss_BBCE + β·Loss_DICE + γ·Loss_MIoU, where α, β, γ are the weights of the three loss functions; Loss_BBCE denotes the balanced binary cross-entropy loss, Loss_DICE the Dice loss, and Loss_MIoU the MIoU loss. In this embodiment the values are α=2, β=4, and γ=4, respectively. Gradients are then calculated by a stochastic gradient descent optimizer, and the network weights are updated by backpropagation. This process is iterated until the error requirement is met, yielding the trained network model. Finally, the MRI test dataset I_test is input into the model to obtain the final segmentation results, which are verified against the manual segmentation result test dataset L_test.
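A minimal illustration of this training procedure, reusing the weighted_loss sketch from the description section; the optimizer settings, batch size, and epoch count are assumptions, and only the loss weights α=2, β=4, γ=4 come from this embodiment.

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_ds, epochs=100, lr=1e-3, mu=0.1):
    """model is assumed to output post-Sigmoid probability maps."""
    loader = DataLoader(train_ds, batch_size=8, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for epoch in range(epochs):
        for img, label in loader:
            pred = model(img)                       # one round of iteration
            loss = weighted_loss(pred, label, mu,
                                 alpha=2.0, beta=4.0, gamma=4.0)
            opt.zero_grad()
            loss.backward()                         # backpropagation
            opt.step()                              # SGD weight update
    return model
```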
Example 2: DR chest radiograph lung nodule segmentation for pneumoconiosis-assisted screening
The specific flow in this embodiment is shown in fig. 6, and the specific steps are as follows:
The raw data in the DR chest radiograph lung nodule dataset and the corresponding target region manual segmentation result data form a DR chest radiograph image dataset I and a label dataset L, respectively, which are then preprocessed. All images are normalized, i.e., each pixel value of the image is mapped from [0,255] to [0,1]. The normalization formula is as follows:

x̂_i = (x_i - min(x)) / (max(x) - min(x))

where x_i denotes the i-th pixel value, and min(x) and max(x) denote the minimum and maximum pixel values, respectively.
The normalized original medical image data and the corresponding target region manual segmentation result data are each split in a ratio of 80% to 20% to construct a DR chest radiograph image training dataset I_train, a DR chest radiograph image test dataset I_test, a manual segmentation result training dataset L_train, and a manual segmentation result test dataset L_test. All image data are scaled to 256×256 using the resize() function in the PIL package.
Using the same calculation procedure as in Example 1, and taking a DR chest radiograph image as an example, the graph-based edge attention gate medical image segmentation model (EAGC_IUNet) is constructed and trained; the specific steps are as follows:
First, the graph encoder module is constructed. The inter-node distances and pixel values are used as vectors, the weighted adjacency matrix is calculated through the cosine distance, and the node annotation N is initialized with the pyramid multi-scale features of a pre-trained U-Net. Feature maps of decoders at different depths are extracted from the U-Net model, upsampled by different factors to the same size as the feature map output by the last decoder layer, and concatenated along the channel dimension to obtain the number of feature channels S. Assuming the side length in pixels of the concatenated multi-layer feature map is ξ, the number of feature map pixels is ξ×ξ = |N|. Let the pixel in the i-th row and j-th column of the k-th channel of the feature map be denoted x_{i,j,k}; the η-th node annotation is n_η = (x_{i,j,1}, x_{i,j,2}, …, x_{i,j,S}), i, j = 1, 2, …, ξ, where η = (i-1)×λ + j and λ is the number of pixels in the second dimension of the image. Concatenating the n_η along the first dimension in node order η yields the node annotation N. The improved residual graph convolution module (denoted MCMCRes_GCN) is then constructed, with the residual structure applied in the graph convolution module: taking two graph convolution layers as the basic structure, the input x_0 of the graph convolution layers is sampled by MCMC to obtain x_1, and x_1 is added to the output F(x_0) of the graph convolution layers to obtain the final feature H(x_0), i.e., H(x_0) = F(x_0) + x_1. The adjacency matrix and the node feature matrix are taken as inputs of the improved residual graph convolution module, and the two-dimensional graph convolution features are obtained through calculation by the two improved residual graph convolution modules.
Next, the convolutional encoder is constructed. The convolutional encoder module comprises five convolutional encoding blocks, each consisting of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer. The specific calculation is as follows: five convolution block calculations and four downsampling operations are performed on the original input image, then the second dimension of the resulting graph convolution features from the graph encoder is reshaped into a two-dimensional square matrix, i.e., the graph convolution features are converted into a three-dimensional matrix. Finally, the two-dimensional graph convolution features are upsampled at different scales and channel-concatenated with the convolution features of the last four convolutional encoders respectively.
Then, the convolutional decoding module is constructed. It comprises four convolutional decoding modules, four edge attention gates, a two-dimensional convolutional layer, and a Sigmoid layer. The first three convolutional decoding modules each consist of two 2D convolutional layers, a batch normalization layer, a ReLU activation layer, a deconvolution layer, and a concatenation layer; the fourth consists of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer. The specific calculation process is as follows: image features are first calculated through the edge attention gates and the convolutional decoding modules, the calculation results of the first to fourth convolutional decoding modules are then weighted and concatenated, and the final segmentation result is obtained through a two-dimensional convolutional layer and a Sigmoid layer.
Finally, the model is trained using the DR chest radiograph image training dataset I_train and the manual segmentation result training dataset L_train. The specific steps are as follows: I_train is input into the network to compute one round of iterative results, which are compared with the corresponding manual segmentation results in L_train, and a loss value is calculated with the loss function L_s = α·Loss_BBCE + β·Loss_DICE + γ·Loss_MIoU, where α, β, γ are the weights of the three loss functions; Loss_BBCE denotes the balanced binary cross-entropy loss, Loss_DICE the Dice loss, and Loss_MIoU the MIoU loss. In this embodiment the values are α=2, β=4, and γ=4, respectively. Gradients are then calculated by a stochastic gradient descent optimizer, and the network weights are updated by backpropagation. This process is iterated until the error requirement is met, yielding the trained network model. Finally, the DR chest radiograph image test dataset I_test is input into the model to obtain the final segmentation results, which are verified against the manual segmentation result test dataset L_test.
It will be understood that modifications and variations will be apparent to those skilled in the art from the foregoing description, and it is intended that all such modifications and variations be included within the scope of the following claims.
Claims (10)
1. A graph-based edge attention gate medical image segmentation method, characterized by comprising the following steps S110 to S150:
S110: collecting original medical image data and corresponding manual segmentation result data of a target area to respectively form an original image data set I and a label data set L;
S120: the original image data set and the label data set are subjected to data preprocessing, and a test and training data set is constructed;
S130: constructing a graph-based edge attention gate medical image segmentation model, denoted EAGC_IUNet;
S140: training the constructed EAGC_IUNet;
S150: giving a target region segmentation result according to the medical image segmentation method.
2. The method according to claim 1, wherein the data preprocessing in step S120 comprises steps S210 to S230, namely S210 three-dimensional medical image slicing, S220 two-dimensional image normalization, and S230 two-dimensional image scaling:
S210: if the medical image is three-dimensional, slicing each original medical image and the corresponding manual segmentation result of the target area into two-dimensional slices along the cross section;
S220: in order to accelerate the neural network training convergence and ensure the network to converge rapidly, firstly, normalizing all two-dimensional images, namely changing each pixel value of the images from [0,255] to [0,1]; the normalization formula is as follows:
Wherein x i denotes an ith pixel value, and max (x) denote maximum and minimum values of the pixels, respectively;
secondly, the normalized original medical image data and the corresponding manual segmentation result data of the target area are split at a ratio of 80% to 20% to construct an original image training data set I_train, an original image test data set I_test, a manual segmentation result training data set L_train, and a manual segmentation result test data set L_test, respectively;
S230: all image data are scaled using the resize() function in the PIL package, with the image size scaled to 256×256.
3. The method according to claim 1, wherein the graph-based edge attention gate medical image segmentation model (EAGC_IUNet) constructed in step S130 uses an improved UNet3+ as the backbone network and comprises a graph encoder module, a convolutional encoder module, and a decoder section, specifically as follows:
310: the graph encoder module comprises weighted adjacency matrix construction, node annotation feature construction, and two improved residual graph convolution modules MCMCRes_GCN; the specific calculation process is as follows: firstly, an adjacency matrix of edge relations and a node feature matrix are obtained from the original input image; since the graph convolution operation is a Laplacian smoothing, neighboring nodes tend to have similar features as information propagates between them; in order to prevent the over-smoothing caused by stacking too many graph convolution layers, a two-layer graph convolution network structure is adopted; finally, the adjacency matrix and the node feature matrix are taken as input to the improved residual graph convolution module, and two-dimensional graph convolution features are obtained by calculation through the two improved residual graph convolution modules;
the graph structure data is defined as a triple G(N, E, F); N represents the node annotation vector set of the graph, of size |N| × S, where |N| is the number of nodes in the graph and S is the dimension of a node annotation vector; E is the edge set of the graph, and F represents the graph feature; unlike data represented in Euclidean space, the matrix N and the edge set E of graph structure data are not unique; the matrix N corresponds to the set E, with N and E arranged in node order;
320: the convolutional encoder module comprises five convolutional encoding blocks, each consisting of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer; the specific calculation process is as follows: firstly, five convolution block calculations and four downsampling operations are performed on the original input image; the second dimension of the graph convolution feature output by the graph encoder is reconstructed into a two-dimensional square matrix, i.e., the graph convolution feature is converted into a three-dimensional matrix; the two-dimensional graph convolution features, after upsampling at different scales, are channel-spliced with the convolution features of the last four convolutional encoding blocks, respectively;
330: the decoder comprises four convolution decoding modules, four edge attention gates, a two-dimensional convolution layer, and a Sigmoid layer; the first through third convolution decoding modules each consist of two 2D convolutional layers, a batch normalization layer, a ReLU activation layer, a deconvolution layer, and a splicing layer, and the fourth convolution decoding module consists of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer; the specific calculation process is as follows: image features are first computed through the edge attention gates and the convolution decoding modules; the calculation results of the first through fourth convolution decoding modules are then weighted and spliced; the final segmentation result is obtained through the two-dimensional convolution layer and the Sigmoid layer.
4. The method according to claim 3, wherein step 310 involves constructing a cosine-weighted adjacency matrix, constructing node annotation features, and the improved residual graph convolution module, calculated specifically as follows:
(1) constructing the weighted adjacency matrix: considering the influence of distance and pixel value on the correlation between nodes, the inter-node distances and pixel values are used as vectors, and the weighted adjacency matrix is calculated through the cosine distance
cos(A, B) = (A · B) / (‖A‖ ‖B‖)
where the vectors A and B consist of the inter-node distances and pixel values;
(2) the node annotation N is initialized with pyramid multi-scale features of a pre-trained U-Net; feature maps of decoders at different depths are extracted from the U-Net model and upsampled by different factors to the same size as the feature map output by the last decoder layer, and the feature maps are channel-spliced to obtain the number of feature channels S; assuming the side length of the spliced multi-layer feature map is ξ pixels, the number of feature map pixels is ξ × ξ = |N|; let the pixel in row i, column j of the k-th channel of the feature map be denoted x_{i,j,k}; then the η-th node annotation is
n_η = (x_{i,j,1}, x_{i,j,2}, …, x_{i,j,S}), i, j = 1, 2, …, ξ
where η is calculated as η = (i - 1) × λ + j, and λ is the number of pixels in the second dimension of the image; the n_η are spliced along the first dimension in node order η to obtain the node annotation N;
(3) the graph convolution calculation is carried out on the graph; the normalized Laplacian matrix L is as follows
L = I - D^(-1/2) A D^(-1/2)
where D is the degree matrix (a diagonal matrix); since the Laplacian matrix L is a real symmetric positive semi-definite matrix, it has a set of orthogonal eigenvectors and is diagonalized on the Fourier basis U = [u_0, u_1, …, u_{n-1}]:
L = U Λ U^T
where Λ is the diagonal matrix of eigenvalues, Λ = diag([λ_0, …, λ_{n-1}]) ∈ R^{n×n}; filtering the graph signal x with the filter g_θ proceeds as follows
g_θ(L) * x = U g_θ(Λ) U^T x
for the non-parametric filter g_θ(Λ) = diag(θ), the parameter θ is a vector of Fourier coefficients; to overcome the limitations of the non-parametric filter, namely that it is not localized in the vertex domain and has high time complexity, a polynomial filter is used instead; the polynomial filter formula is as follows
g_θ(Λ) = Σ_{k=0}^{K-1} θ_k Λ^k
substituting this into the filtering expression gives
g_θ(L) * x = Σ_{k=0}^{K-1} θ_k L^k x
and by matrix transformation the GCN output is as follows
y = Σ_{k=1}^{K} α_k L^k x
where (α_1, α_2, …, α_K) are arbitrary values; the randomly initialized values are updated by back propagation;
the improved residual graph convolution module (MCMCRes_GCN Block) takes two graph convolution layers as its basic structure, introduces the ideas of residual connection and Dropout from ResNet, and performs MCMC sampling screening on the node feature vectors in the graph structure; the residual structure is then applied to the graph convolution module; the input x_0 of the graph convolution layers is MCMC-sampled to obtain x_1, and x_1 is added to the output F(x_0) of the graph convolution layers to obtain the final feature H(x_0), with the formula
H(x_0) = F(x_0) + x_1.
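As a rough sketch of the residual structure H(x_0) = F(x_0) + x_1 described above: the MCMC sampling screening is shown only as a placeholder random node mask, since the claim does not specify the sampler, and the normalized adjacency A_hat is assumed precomputed.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """Plain graph convolution layer: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, a_hat, h):
        return torch.relu(a_hat @ self.lin(h))

class MCMCResGCNBlock(nn.Module):
    """Two graph convolution layers with Dropout and a residual connection,
    H(x0) = F(x0) + x1, where x1 is an MCMC-screened copy of the input
    (approximated here by random node masking -- an assumption)."""
    def __init__(self, dim: int, p: float = 0.1):
        super().__init__()
        self.gc1 = GraphConv(dim, dim)
        self.gc2 = GraphConv(dim, dim)
        self.drop = nn.Dropout(p)

    def mcmc_sample(self, x0):
        # Placeholder for the MCMC sampling screening of node features.
        keep = (torch.rand(x0.shape[0], 1, device=x0.device) > 0.1).float()
        return x0 * keep

    def forward(self, a_hat, x0):
        x1 = self.mcmc_sample(x0)
        f = self.gc2(a_hat, self.drop(self.gc1(a_hat, x0)))  # F(x0)
        return f + x1                                        # H(x0)
```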
5. The method according to claim 3, wherein the edge attention gate in step 330 is specifically as follows: x_l is the feature map output by the l-th layer, with feature size C_x × H_x × W_x, where C_x is the number of channels of the l-th layer feature map and H_x × W_x is the size of each feature map; the gating signal u is the upsampled feature mapping of the previous layer, with feature size C_u × H_u × W_u;
G_x is introduced as the lateral Sobel operator and G_y as the longitudinal Sobel operator; padding and convolution with the lateral and longitudinal Sobel operators are applied to the gating signal u and the feature map x_l, and the responses are combined point by point to obtain F_u and F_x, respectively;
where * denotes the convolution operation; F_u and F_x are passed through 1 × 1 convolution kernels to obtain the features W_u and W_x, which are added point by point to give a feature mapping of size C_v × H_u × W_u, thereby enhancing the contour features; after sequentially passing through a linear transformation and a nonlinear activation function, grid resampling is carried out using bilinear interpolation; the original feature maps extracted over multiple scales and the edge-enhanced feature map weighted by the attention coefficient α are combined through a skip connection, where the attention coefficient α ∈ [0, 1] preserves only task-relevant features by identifying salient feature regions and modifying the attention weight distribution.
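For the Sobel part of the edge attention gate, a minimal sketch of the padding-plus-convolution step follows; merging the two directional responses as a gradient magnitude is an assumption, since the claim only states that the responses are combined point by point.

```python
import torch
import torch.nn.functional as F

# Lateral (G_x) and longitudinal (G_y) Sobel operators
GX = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
GY = torch.tensor([[-1., -2., -1.], [0., 0., 0.], [1., 2., 1.]]).view(1, 1, 3, 3)

def sobel_edges(x: torch.Tensor) -> torch.Tensor:
    """Apply padding and depthwise convolution with both Sobel operators
    to a (B, C, H, W) feature map and merge the responses point-wise."""
    c = x.shape[1]
    xp = F.pad(x, (1, 1, 1, 1), mode='replicate')
    gx = F.conv2d(xp, GX.to(x.device).repeat(c, 1, 1, 1), groups=c)
    gy = F.conv2d(xp, GY.to(x.device).repeat(c, 1, 1, 1), groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)  # assumed merge rule
```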
6. The method according to claim 1, wherein in step S140 the model is trained using the original image training data set I_train and the manual segmentation result training data set L_train from step S120; the specific steps are as follows: firstly, I_train is input into the network to obtain one round of iterative calculation results, the results are compared with the corresponding manual segmentation results in L_train, and a loss value is calculated using the loss function; secondly, gradients are calculated by a stochastic gradient descent optimizer and the network weights are updated by back propagation; the process is then iterated until the error requirement is met, yielding the trained network model; finally, the model is verified using the original image test data set I_test and the manual segmentation result test data set L_test; the loss function in the above step is a weighted loss function, calculated as follows
L_s = α·Loss_BBCE + β·Loss_DICE + γ·Loss_MIoU
where α, β, γ represent the weights of the three loss functions; Loss_BBCE represents the balanced binary cross-entropy loss function, Loss_DICE represents the Dice loss function, and Loss_MIoU represents the MIoU loss function, with calculation formulas as follows
Loss_BBCE = -(1/n) Σ_i [ μ · ŷ_i · log y_i + (1 - μ) · (1 - ŷ_i) · log(1 - y_i) ]
Loss_DICE = 1 - 2 |Y ∩ Ŷ| / (|Y| + |Ŷ|)
Loss_MIoU = 1 - (1/K) Σ_{k=1}^{K} |Y_k ∩ Ŷ_k| / |Y_k ∪ Ŷ_k|
where μ represents the positive/negative sample balance hyperparameter, usually taken as the ratio of positive samples to the total sample size; y is the prediction result, ŷ is the label image, and K is the number of categories.
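A hedged sketch of the three loss terms and their weighted sum follows; the formulas are reconstructions of the standard definitions (in particular, the placement of μ in the balanced term is an assumption), with pred the network output after the Sigmoid and target the label image.

```python
import torch

def bbce_loss(pred, target, mu, eps=1e-6):
    # Balanced binary cross-entropy; mu = positive fraction (assumed weighting)
    pred = pred.clamp(eps, 1 - eps)
    return -(mu * target * pred.log()
             + (1 - mu) * (1 - target) * (1 - pred).log()).mean()

def dice_loss(pred, target, eps=1e-6):
    inter = (pred * target).sum()
    return 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def miou_loss(pred, target, eps=1e-6):
    inter = (pred * target).sum()
    union = pred.sum() + target.sum() - inter
    return 1 - (inter + eps) / (union + eps)

def combined_loss(pred, target, mu, alpha=2.0, beta=4.0, gamma=4.0):
    # L_s with the embodiment's weights alpha=2, beta=4, gamma=4
    return (alpha * bbce_loss(pred, target, mu)
            + beta * dice_loss(pred, target)
            + gamma * miou_loss(pred, target))
```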
7. An edge attention gate medical image segmentation apparatus based on a graph, comprising:
a data collection unit: collecting original medical image data and corresponding manual segmentation result data of a target area to respectively form an original image data set I and a label data set L;
a data preprocessing unit: the original image data set and the label data set are subjected to data preprocessing, and a test and training data set is constructed;
an EAGC_IUNet model building unit: constructing a graph-based edge attention gate medical image segmentation model, denoted EAGC_IUNet;
an EAGC_IUNet model training unit: training the constructed EAGC_IUNet;
a medical image segmentation unit: giving a target region segmentation result according to the trained medical image segmentation model.
8. The apparatus of claim 7, wherein the data preprocessing unit comprises: a three-dimensional medical image slicing subunit, a two-dimensional image normalization subunit, and a two-dimensional image scaling subunit:
a three-dimensional medical image slicing subunit: if the medical image is three-dimensional, slicing each original medical image and the corresponding manual segmentation result of the target area into two-dimensional slices along the cross section;
a two-dimensional image normalization subunit: in order to accelerate neural network training convergence and ensure that the network converges rapidly, firstly, all two-dimensional images are normalized, i.e., each pixel value of an image is mapped from [0, 255] to [0, 1]; the normalization formula is as follows:
x_i' = (x_i - min(x)) / (max(x) - min(x))
where x_i denotes the i-th pixel value, and max(x) and min(x) denote the maximum and minimum pixel values, respectively;
secondly, the normalized original medical image data and the corresponding manual segmentation result data of the target area are split at a ratio of 80% to 20% to construct an original image training data set I_train, an original image test data set I_test, a manual segmentation result training data set L_train, and a manual segmentation result test data set L_test, respectively;
a two-dimensional image scaling subunit: scaling all image data using the resize() function in the PIL package, with the image size scaled to 256×256.
9. The apparatus of claim 7, wherein the EAGC_IUNet model building unit uses an improved UNet3+ as the backbone network and comprises a graph encoder module, a convolutional encoder module, and a decoder section, as follows:
a graph encoder module: comprising weighted adjacency matrix construction, node annotation feature construction, and two improved residual graph convolution modules MCMCRes_GCN; the specific process is as follows: firstly, an adjacency matrix of edge relations and a node feature matrix are obtained from the original input image; since the graph convolution operation is a Laplacian smoothing, neighboring nodes tend to have similar features as information propagates between them; in order to prevent the over-smoothing caused by stacking too many graph convolution layers, a two-layer graph convolution network structure is adopted; finally, the adjacency matrix and the node feature matrix are taken as input to the improved residual graph convolution module, and two-dimensional graph convolution features are obtained by calculation through the two improved residual graph convolution modules;
the graph structure data is defined as a triple G(N, E, F); N represents the node annotation vector set of the graph, of size |N| × S, where |N| is the number of nodes in the graph and S is the dimension of a node annotation vector; E is the edge set of the graph, and F represents the graph feature; unlike data represented in Euclidean space, the matrix N and the edge set E of graph structure data are not unique; the matrix N corresponds to the set E, with N and E arranged in node order;
a convolutional encoder module: comprising five convolutional encoding blocks, each consisting of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer; the specific calculation process is as follows: firstly, five convolution block calculations and four downsampling operations are performed on the original input image; the second dimension of the graph convolution feature output by the graph encoder is reconstructed into a two-dimensional square matrix, i.e., the graph convolution feature is converted into a three-dimensional matrix; the two-dimensional graph convolution features, after upsampling at different scales, are channel-spliced with the convolution features of the last four convolutional encoding blocks, respectively;
a decoder: comprising four convolution decoding modules, four edge attention gates, a two-dimensional convolution layer, and a Sigmoid layer; the first through third convolution decoding modules each consist of two 2D convolutional layers, a batch normalization layer, a ReLU activation layer, a deconvolution layer, and a splicing layer, and the fourth convolution decoding module consists of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer; the specific calculation process is as follows: image features are first computed through the edge attention gates and the convolution decoding modules; the calculation results of the first through fourth convolution decoding modules are then weighted and spliced; the final segmentation result is obtained through the two-dimensional convolution layer and the Sigmoid layer.
10. The apparatus of claim 9, wherein in the graph encoder module, the cosine weighted adjacency matrix is constructed, the node annotation feature is constructed, and the residual graph convolution module is modified, in particular as follows:
(1) constructing the weighted adjacency matrix: considering the influence of distance and pixel value on the correlation between nodes, the inter-node distances and pixel values are used as vectors, and the weighted adjacency matrix is calculated through the cosine distance
cos(A, B) = (A · B) / (‖A‖ ‖B‖)
where the vectors A and B consist of the inter-node distances and pixel values;
(2) the node annotation N is initialized with pyramid multi-scale features of a pre-trained U-Net; feature maps of decoders at different depths are extracted from the U-Net model and upsampled by different factors to the same size as the feature map output by the last decoder layer, and the feature maps are channel-spliced to obtain the number of feature channels S; assuming the side length of the spliced multi-layer feature map is ξ pixels, the number of feature map pixels is ξ × ξ = |N|; let the pixel in row i, column j of the k-th channel of the feature map be denoted x_{i,j,k}; then the η-th node annotation is
n_η = (x_{i,j,1}, x_{i,j,2}, …, x_{i,j,S}), i, j = 1, 2, …, ξ
where η is calculated as η = (i - 1) × λ + j, and λ is the number of pixels in the second dimension of the image; the n_η are spliced along the first dimension in node order η to obtain the node annotation N;
(3) the graph convolution calculation is carried out on the graph; the normalized Laplacian matrix L is as follows
L = I - D^(-1/2) A D^(-1/2)
where D is the degree matrix (a diagonal matrix); since the Laplacian matrix L is a real symmetric positive semi-definite matrix, it has a set of orthogonal eigenvectors and is diagonalized on the Fourier basis U = [u_0, u_1, …, u_{n-1}]:
L = U Λ U^T
where Λ is the diagonal matrix of eigenvalues, Λ = diag([λ_0, …, λ_{n-1}]) ∈ R^{n×n}; filtering the graph signal x with the filter g_θ proceeds as follows
g_θ(L) * x = U g_θ(Λ) U^T x
for the non-parametric filter g_θ(Λ) = diag(θ), the parameter θ is a vector of Fourier coefficients; to overcome the limitations of the non-parametric filter, namely that it is not localized in the vertex domain and has high time complexity, a polynomial filter is used instead; the polynomial filter formula is as follows
g_θ(Λ) = Σ_{k=0}^{K-1} θ_k Λ^k
substituting this into the filtering expression gives
g_θ(L) * x = Σ_{k=0}^{K-1} θ_k L^k x
and by matrix transformation the GCN output is as follows
y = Σ_{k=1}^{K} α_k L^k x
where (α_1, α_2, …, α_K) are arbitrary values; the randomly initialized values are updated by back propagation;
the improved residual graph convolution module (MCMCRes_GCN Block) takes two graph convolution layers as its basic structure, introduces the ideas of residual connection and Dropout from ResNet, and performs MCMC sampling screening on the node feature vectors in the graph structure; the residual structure is then applied to the graph convolution module; the input x_0 of the graph convolution layers is MCMC-sampled to obtain x_1, and x_1 is added to the output F(x_0) of the graph convolution layers to obtain the final feature H(x_0), with the formula
H(x_0) = F(x_0) + x_1.