CN117953208A - Graph-based edge attention gate medical image segmentation method and device
- Publication number: CN117953208A
- Application number: CN202311697571.0A
- Authority: CN (China)
- Legal status: Pending
Classifications
- G06V10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
- G06N3/042: Knowledge-based neural networks; logical representations of neural networks
- G06N3/0455: Auto-encoder networks; encoder-decoder networks
- G06N3/0464: Convolutional networks [CNN, ConvNet]
- G06N3/048: Activation functions
- G06N3/084: Backpropagation, e.g. using gradient descent
- G06V10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/52: Scale-space analysis, e.g. wavelet analysis
- G06V10/764: Image or video recognition using classification, e.g. of video objects
- G06V10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82: Image or video recognition using neural networks
- G06V20/70: Labelling scene content, e.g. deriving syntactic or semantic representations
- G06V2201/03: Recognition of patterns in medical or anatomical images
- Y02T10/40: Engine management systems
Abstract
The invention discloses a graph-based edge attention gate medical image segmentation method and device. The device comprises: a data collection unit; a data preprocessing unit; an EAGC_IUNet model building unit, which constructs a graph-based edge attention gate medical image segmentation model denoted EAGC_IUNet; an EAGC_IUNet model training unit, which trains the constructed EAGC_IUNet; and a medical image segmentation unit, which outputs a target region segmentation result according to the medical image segmentation method. The invention improves the edge attention gate structure, using the horizontal and vertical Sobel operators to extract edge features of the feature map in two directions respectively, so that the high-frequency edge information of the feature map is easier to extract. The improved UNet3+ is used as the backbone network, which reduces the model parameter count while retaining the advantages of UNet++, and the full-scale skip connections are more conducive to capturing both fine-grained and coarse-grained semantic features in the image.
Description
Technical Field
The invention relates to the technical field of medical image processing, in particular to a medical image segmentation method and device based on an edge attention gate of a graph structure.
Background
Medical images are widely used by doctors as important data for clinical diagnosis, such as finding diseases, making treatment plans, and judging prognosis. Accurately locating a lesion and delineating its severity from medical images can significantly improve the disease detection rate and diagnostic accuracy. Common medical imaging techniques include X-ray imaging, computed tomography (CT), magnetic resonance imaging (MRI), ultrasound, positron emission tomography (PET), angiography, and optical imaging.
In the early days, scholars usually adopted traditional methods such as contour detection, thresholding, filtering, clustering, priors, and machine learning algorithms for medical image segmentation tasks. With the rapid development of deep learning, research in the medical imaging field has gradually shifted from these traditional methods to deep learning methods. Significant progress has been made in medical image segmentation, especially owing to the strong generalization ability of convolution operations in extracting high-dimensional features, which is a major advantage in visual tasks. However, these developments are accompanied by two unsolved problems. The first is that this strong generalization ability comes with a degree of local position information loss: for example, when large receptive fields are used to extract image semantic information, compressing the extracted features into a small feature map causes pixel-level local position information to be lost. The second is a common difficulty in image segmentation, namely the loss of segmentation edge information.
Disclosure of Invention
In view of the defects of the prior art, the technical problem to be solved by the invention is to provide a graph-based edge attention gate medical image segmentation method and device.
The technical scheme of the invention is as follows:
the invention provides a graph-based edge attention gate medical image segmentation method; the overall flow is shown in fig. 1. The method comprises steps S110 to S150, specifically as follows:
S110: collecting original medical image data and corresponding manual segmentation result data of a target area to respectively form an original image data set I and a label data set L;
S120: the original image data set and the label data set are subjected to data preprocessing, and test and training data sets are constructed;
S130: a graph-based edge attention gate medical image segmentation model is constructed and denoted EAGC_IUNet;
S140: the constructed EAGC_IUNet is trained;
S150: a target region segmentation result is given according to the medical image segmentation method.
The flow of the data preprocessing method in step S120 is shown in fig. 2 and comprises steps S210 to S230: S210 three-dimensional medical image slicing, S220 two-dimensional image normalization, and S230 two-dimensional image scaling. The respective steps are described below.
S210: If the medical image is a three-dimensional image, each piece of original medical image data and its corresponding target region manual segmentation result data are sliced two-dimensionally along the cross section.
S220: To accelerate neural network training convergence, all two-dimensional images are first normalized, i.e., each pixel value of the image is mapped from [0,255] to [0,1]. The normalization formula is as follows:

x̂_i = (x_i - min(x)) / (max(x) - min(x))

where x_i denotes the i-th pixel value, and min(x) and max(x) denote the minimum and maximum pixel values, respectively.
Secondly, the normalized original medical image data and the corresponding target region manual segmentation result data are each split in a ratio of 80% to 20% to construct an original image training data set I_train, an original image test data set I_test, a manual segmentation result training data set L_train, and a manual segmentation result test data set L_test.
S230: All image data are scaled to 256×256 using the resize() function in the PIL package.
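As an illustration of steps S210 to S230, the following is a minimal preprocessing sketch assuming NumPy and Pillow; the function names and the random split are illustrative, not taken from the patent.

```python
import numpy as np
from PIL import Image

def normalize(img: np.ndarray) -> np.ndarray:
    """Min-max normalize pixel values from [0, 255] to [0, 1]."""
    x_min, x_max = img.min(), img.max()
    return (img - x_min) / (x_max - x_min + 1e-8)  # epsilon guards flat slices

def preprocess_slice(img: np.ndarray, size: int = 256) -> np.ndarray:
    """Normalize a 2D slice and rescale it to size x size."""
    img = normalize(img.astype(np.float32))
    pil = Image.fromarray((img * 255).astype(np.uint8))
    pil = pil.resize((size, size), Image.BILINEAR)
    return np.asarray(pil, dtype=np.float32) / 255.0

def split_indices(n: int, train_ratio: float = 0.8, seed: int = 0):
    """80%/20% train/test split of slice indices (illustrative)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    cut = int(n * train_ratio)
    return idx[:cut], idx[cut:]
```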
In step S130, a graph-based edge attention gate medical image segmentation model (EAGC_IUNet) is constructed; the network structure is shown in fig. 3. The improved UNet3+ is adopted as the backbone network, comprising a graph encoder module, a convolutional encoder module, and a decoder part, as follows:
310: The graph encoder module comprises construction of a weighted adjacency matrix, construction of node annotation features, and two improved residual graph convolution modules (denoted MCMCRes_GCN). The specific calculation process is as follows: first, the adjacency matrix encoding the edge relations and the node feature matrix are computed from the original input image. Since the graph convolution operation is a form of Laplacian smoothing, neighboring nodes tend to acquire similar features as information propagates between them; to prevent over-smoothing caused by stacking too many graph convolution layers, a two-layer graph convolution network structure is adopted. Finally, the adjacency matrix and the node feature matrix are taken as inputs of the improved residual graph convolution module, and the two-dimensional graph convolution features are obtained through calculation by the two improved residual graph convolution modules.
The graph structure data is defined as a triple G(N, E, F). N denotes the node annotation vector set of the graph, of size |N|×S, where |N| is the number of nodes in the graph and S is the dimension of a node annotation vector; E is the edge set of the graph; and F denotes the graph features. Unlike data represented in Euclidean space, the matrix N and the edge set E of graph structure data are not unique; the matrix N corresponds to the set E, and both N and E are arranged in node order. The construction of the cosine weighted adjacency matrix, the construction of node annotation features, and the improved residual graph convolution module referred to in 310 are specifically calculated as follows:
(1) Construct the weighted adjacency matrix. Considering the influence of distance and pixel value on the correlation between nodes, the inter-node distances and pixel values are used as vectors, and the weighted adjacency matrix is calculated through the cosine distance:

cos(A, B) = (A · B) / (‖A‖ ‖B‖)

where the vectors A and B consist of the inter-node distance and the pixel values.
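One way to realize the cosine-weighted adjacency matrix in NumPy; the exact composition of the node vectors (two coordinates plus one pixel value) is an assumption based on the description above.

```python
import numpy as np

def cosine_weighted_adjacency(coords: np.ndarray, pixels: np.ndarray) -> np.ndarray:
    """coords: (|N|, 2) node coordinates; pixels: (|N|,) pixel values.
    Each node vector combines position and pixel value; edge weights are
    cosine similarities between node vectors."""
    feats = np.concatenate([coords, pixels[:, None]], axis=1)   # (|N|, 3)
    norms = np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8
    unit = feats / norms
    return unit @ unit.T                                        # A[i, j] = cos(v_i, v_j)
```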
(2) The node annotation N is initialized from the pyramid multi-scale features of a pre-trained U-Net. Feature maps of decoders at different depths are extracted from the U-Net model, upsampled by different factors to the same size as the feature map output by the last decoder layer, and concatenated along the channel dimension to obtain the number of feature channels S. Assuming the side length in pixels of the concatenated multi-layer feature map is ξ, the number of feature map pixels is ξ×ξ = |N|. Let the pixel in the i-th row and j-th column of the k-th channel of the feature map be denoted x_{i,j,k}; the η-th node annotation is then

n_η = (x_{i,j,1}, x_{i,j,2}, …, x_{i,j,S}), i, j = 1, 2, …, ξ

where η is calculated as η = (i-1)×λ + j, with λ the number of pixels in the second dimension of the image. Concatenating the n_η along the first dimension in node order η yields the node annotation N.
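A sketch of assembling the node annotation N from pre-trained U-Net decoder features, assuming PyTorch; the bilinear upsampling mode and the single-image batch are assumptions.

```python
import torch
import torch.nn.functional as F

def build_node_annotations(decoder_feats: list) -> torch.Tensor:
    """decoder_feats: list of (1, C_k, H_k, W_k) maps from decoders of
    different depths. Upsample all to the last decoder's spatial size,
    concatenate on channels (S = sum of C_k), then flatten pixels to nodes."""
    target = decoder_feats[-1].shape[-2:]                    # (xi, xi)
    ups = [F.interpolate(f, size=target, mode="bilinear", align_corners=False)
           for f in decoder_feats]
    stacked = torch.cat(ups, dim=1)                          # (1, S, xi, xi)
    _, S, xi, lam = stacked.shape
    # node eta = (i-1)*lambda + j  ->  row-major flatten of the pixel grid
    return stacked.permute(0, 2, 3, 1).reshape(xi * lam, S)  # (|N|, S)
```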
(3) Graph convolution is calculated on a graph whose normalized Laplacian matrix L is

L = I - D^{-1/2} A D^{-1/2}

where D is the degree matrix. Since the Laplacian matrix L is a real symmetric positive semi-definite matrix, it has a set of orthogonal eigenvectors and can be diagonalized by the Fourier basis U = [u_0, u_1, …, u_{n-1}]:

L = U Λ U^T

where Λ is the diagonal matrix of eigenvalues, Λ = diag([λ_0, …, λ_{n-1}]) ∈ R^{n×n}. Filtering the graph signal x with a filter g_θ is then

g_θ(L) * x = U g_θ(Λ) U^T x
For the non-parametric filter g_θ(Λ) = diag(θ), the parameter θ is a vector of Fourier coefficients. To address the limitations of non-parametric filters, namely that they are not localized in the vertex domain and have high time complexity, polynomial filters are used instead. The polynomial filter formula is as follows:

g_θ(Λ) = Σ_{k=0}^{K-1} θ_k Λ^k

Substituting into the filtering expression gives

g_θ(L) * x = Σ_{k=0}^{K-1} θ_k L^k x

After matrix transformation, the GCN output value can be written as

y = Σ_{k=1}^{K} α_k L^k x

where (α_1, α_2, …, α_K) are arbitrarily (randomly) initialized values whose parameter values are updated by backpropagation.
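A minimal polynomial-filter graph convolution layer in PyTorch following the propagation above; the order K, the linear projection, and the ReLU activation are assumptions.

```python
import torch
import torch.nn as nn

class PolyGraphConv(nn.Module):
    """y = sum_k alpha_k * L^k x, followed by a learned projection;
    the alpha_k are learned by backpropagation."""
    def __init__(self, in_dim: int, out_dim: int, K: int = 2):
        super().__init__()
        self.alpha = nn.Parameter(torch.randn(K))   # (alpha_1 .. alpha_K)
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, L: torch.Tensor) -> torch.Tensor:
        # x: (|N|, S) node features; L: (|N|, |N|) normalized Laplacian
        out, h = torch.zeros_like(x), x
        for k in range(self.alpha.shape[0]):
            h = L @ h                                # L^{k+1} x
            out = out + self.alpha[k] * h
        return torch.relu(self.lin(out))
```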
The improved residual graph convolution module (MCMCRes_GCN Block) is based on two graph convolution layers and introduces the ideas of residual connection from ResNet and Dropout; its structure is shown in fig. 4. Inspired by the Dropout idea, a method of MCMC Dropout is presented herein. Unlike Dropout, MCMC Dropout performs MCMC sampling screening on the node feature vectors in the graph structure. The residual structure is then applied in the graph convolution module. The improved residual graph convolution module takes two graph convolution layers as its basic structure: the input x_0 of the graph convolution layers is sampled by MCMC to obtain x_1, and x_1 is added to the output F(x_0) of the graph convolution layers to obtain the final feature H(x_0), as follows:

H(x_0) = F(x_0) + x_1
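A sketch of the MCMCRes_GCN block, reusing the PolyGraphConv layer from the previous sketch; since the MCMC sampling procedure is not specified here, a stochastic node-feature mask stands in for MCMC Dropout and is purely an assumption.

```python
import torch
import torch.nn as nn

class MCMCResGCNBlock(nn.Module):
    """H(x0) = F(x0) + x1, where F is two graph convolution layers and
    x1 is a stochastic sampling of the input node features."""
    def __init__(self, dim: int, keep_prob: float = 0.9):
        super().__init__()
        self.gc1 = PolyGraphConv(dim, dim)
        self.gc2 = PolyGraphConv(dim, dim)
        self.keep_prob = keep_prob

    def mcmc_sample(self, x: torch.Tensor) -> torch.Tensor:
        # Placeholder for MCMC sampling screening of node feature vectors.
        mask = (torch.rand_like(x) < self.keep_prob).float()
        return x * mask / self.keep_prob

    def forward(self, x0: torch.Tensor, L: torch.Tensor) -> torch.Tensor:
        x1 = self.mcmc_sample(x0)            # MCMC-sampled input features
        f = self.gc2(self.gc1(x0, L), L)     # F(x0): two graph conv layers
        return f + x1                        # H(x0) = F(x0) + x1
```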
320: The convolutional encoder module comprises five convolutional encoding blocks, each consisting of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer. The specific calculation process is as follows: the original input image first undergoes five convolution block calculations and four downsampling operations. The second dimension of the resulting graph convolution features from the graph encoder is reshaped into a two-dimensional square matrix, i.e., the graph convolution features are converted into a three-dimensional matrix. The two-dimensional graph convolution features, upsampled at different scales, are then channel-concatenated with the convolution features of the last four convolutional encoders respectively.
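A convolutional encoding block matching this description, assuming PyTorch; the 3×3 kernel size, the placement of batch normalization after each convolution, and max-pooling as the downsampling operation are assumptions.

```python
import torch.nn as nn

class ConvEncodeBlock(nn.Module):
    """Two 2D convolutions, each followed by batch normalization and ReLU."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

downsample = nn.MaxPool2d(2)  # one of the four downsampling operations
```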
330: The decoder comprises four convolutional decoding modules, four edge attention gates, a two-dimensional convolutional layer, and a Sigmoid layer. The first three convolutional decoding modules each consist of two 2D convolutional layers, a batch normalization layer, a ReLU activation layer, a deconvolution layer, and a concatenation layer; the fourth consists of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer. The specific calculation process is as follows: image features are first calculated through the edge attention gates and the convolutional decoding modules; the calculation results of the first to fourth convolutional decoding modules are then weighted and concatenated; and the final segmentation result is obtained through a two-dimensional convolutional layer and a Sigmoid layer.
The edge attention gate structure in 330 is shown in fig. 5 and is as follows: x_l is the feature map output by the l-th layer, with feature size C_x×H_x×W_x, where C_x is the number of channels of the l-th layer feature map and H_x×W_x is the size of each feature map. The gating signal u is the upsampled feature map of the previous layer, with feature size C_u×H_u×W_u.

G_x is introduced as the horizontal Sobel operator and G_y as the vertical Sobel operator. Padding and convolution operations with the horizontal and vertical Sobel operators are carried out, and the directional responses are combined by point-wise addition to obtain F_u and F_x:

F_u = G_x * u + G_y * u, F_x = G_x * x_l + G_y * x_l

where * denotes the convolution operation. F_u and F_x are passed through 1×1 convolution kernels to obtain features W_u and W_x, whose point-wise addition yields a feature map of size C_u×H_u×W_u, enhancing the contour features. After passing sequentially through a linear transformation and a nonlinear activation function, grid resampling is performed using bilinear interpolation. The original feature maps extracted over multiple scales and the edge-enhanced feature map weighted by the attention coefficient α are combined by a skip connection, where the attention coefficient α ∈ [0,1] preserves only task-relevant features by identifying salient feature regions and adjusting the attention weight distribution.
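A sketch of the edge attention gate under the description above, assuming PyTorch; the depthwise Sobel filtering, the intermediate channel count c_mid, and the exact resampling points are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

SOBEL_GX = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
SOBEL_GY = SOBEL_GX.t().contiguous()

def sobel_edges(x: torch.Tensor) -> torch.Tensor:
    """Depthwise Sobel filtering in both directions, added point-wise."""
    c = x.shape[1]
    gx = SOBEL_GX.view(1, 1, 3, 3).repeat(c, 1, 1, 1).to(x)
    gy = SOBEL_GY.view(1, 1, 3, 3).repeat(c, 1, 1, 1).to(x)
    ex = F.conv2d(x, gx, padding=1, groups=c)
    ey = F.conv2d(x, gy, padding=1, groups=c)
    return ex + ey

class EdgeAttentionGate(nn.Module):
    def __init__(self, c_x: int, c_u: int, c_mid: int):
        super().__init__()
        self.wx = nn.Conv2d(c_x, c_mid, 1)   # 1x1 convolution -> W_x
        self.wu = nn.Conv2d(c_u, c_mid, 1)   # 1x1 convolution -> W_u
        self.psi = nn.Conv2d(c_mid, 1, 1)    # linear transformation

    def forward(self, x_l: torch.Tensor, u: torch.Tensor) -> torch.Tensor:
        f_x, f_u = sobel_edges(x_l), sobel_edges(u)
        f_x = F.interpolate(f_x, size=u.shape[-2:], mode="bilinear",
                            align_corners=False)    # align to gating size
        g = torch.relu(self.wx(f_x) + self.wu(f_u))  # point-wise addition
        alpha = torch.sigmoid(self.psi(g))           # attention in [0, 1]
        alpha = F.interpolate(alpha, size=x_l.shape[-2:], mode="bilinear",
                              align_corners=False)   # grid resampling
        return x_l * alpha                           # edge-enhanced features
```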
In step S140, the model is trained using the original image training dataset I_train and the manual segmentation result training dataset L_train from step S120. The specific steps are as follows: first, I_train is input into the network to compute one round of iterative results, which are compared with the corresponding manual segmentation results in L_train, and a loss value is calculated with the loss function; second, gradients are computed with a stochastic gradient descent optimizer and the network weights are updated by backpropagation; this process is then iterated until the error requirement is met, yielding the trained network model; finally, the model is verified using the original image test dataset I_test and the manual segmentation result test dataset L_test. The loss function in the above steps is a weighted loss function, calculated as

L_s = α·Loss_BBCE + β·Loss_DICE + γ·Loss_MIoU

where α, β, γ are the weights of the three loss functions; Loss_BBCE denotes the balanced binary cross-entropy loss, Loss_DICE the Dice loss, and Loss_MIoU the MIoU loss. Here μ denotes the balance hyper-parameter of positive and negative samples, usually taken as the ratio of positive samples to the total sample size; y is the prediction result, ỹ is the label image, and K is the number of classes.
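Since the balanced BCE, Dice, and MIoU formulas are only named here, the sketch below assumes their standard forms in PyTorch; predictions are assumed to be post-Sigmoid probabilities, and the default weights follow the embodiments (α=2, β=4, γ=4).

```python
import torch

def balanced_bce(pred, target, mu):
    """mu: ratio of positive samples to total sample size; pred in (0, 1)."""
    eps = 1e-7
    pred = pred.clamp(eps, 1 - eps)
    return -(mu * target * pred.log()
             + (1 - mu) * (1 - target) * (1 - pred).log()).mean()

def dice_loss(pred, target):
    inter = (pred * target).sum()
    return 1 - (2 * inter + 1e-7) / (pred.sum() + target.sum() + 1e-7)

def miou_loss(pred, target):
    inter = (pred * target).sum()
    union = pred.sum() + target.sum() - inter
    return 1 - (inter + 1e-7) / (union + 1e-7)

def weighted_loss(pred, target, mu, alpha=2.0, beta=4.0, gamma=4.0):
    """L_s = alpha * Loss_BBCE + beta * Loss_DICE + gamma * Loss_MIoU."""
    return (alpha * balanced_bce(pred, target, mu)
            + beta * dice_loss(pred, target)
            + gamma * miou_loss(pred, target))
```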
The invention further discloses a graph-based edge attention gate medical image segmentation device, which comprises:
a data collection unit: collecting original medical image data and corresponding manual segmentation result data of a target area to respectively form an original image data set I and a label data set L;
a data preprocessing unit: the original image data set and the label data set are subjected to data preprocessing, and a test and training data set is constructed;
EAGC_IUNet model building unit: constructing a graph-based edge attention gate medical image segmentation model, denoted EAGC_IUNet;
EAGC_IUNet model training unit: training the constructed EAGC_IUNet;
a medical image segmentation unit: and according to the medical image segmentation method, a target region segmentation result is given.
In the device, the data preprocessing unit comprises a three-dimensional medical image slicing subunit, a two-dimensional image normalization subunit, and a two-dimensional image scaling subunit:

Three-dimensional medical image slicing subunit: if the medical image is a three-dimensional image, each piece of original medical image data and its corresponding target region manual segmentation result data are sliced two-dimensionally along the cross section;
Two-dimensional image normalization subunit: to accelerate neural network training convergence, all two-dimensional images are first normalized, i.e., each pixel value of the image is mapped from [0,255] to [0,1]; the normalization formula is as follows:

x̂_i = (x_i - min(x)) / (max(x) - min(x))

where x_i denotes the i-th pixel value, and min(x) and max(x) denote the minimum and maximum pixel values, respectively;
Secondly, the normalized original medical image data and the corresponding target region manual segmentation result data are each split in a ratio of 80% to 20% to construct an original image training data set I_train, an original image test data set I_test, a manual segmentation result training data set L_train, and a manual segmentation result test data set L_test;
Two-dimensional image scaling subunit: all image data are scaled to 256×256 using the resize() function in the PIL package.
In the device, the EAGC_IUNet model building unit adopts the improved UNet3+ as the backbone network, comprising a graph encoder module, a convolutional encoder module, and a decoder part, as follows:
A graph encoder module: comprising construction of a weighted adjacency matrix, construction of node annotation features, and two improved residual graph convolution modules MCMCRes_GCN; the specific process is as follows: first, the adjacency matrix encoding the edge relations and the node feature matrix are computed from the original input image; since the graph convolution operation is a form of Laplacian smoothing, neighboring nodes tend to acquire similar features as information propagates between them; to prevent over-smoothing caused by stacking too many graph convolution layers, a two-layer graph convolution network structure is adopted; finally, the adjacency matrix and the node feature matrix are taken as inputs of the improved residual graph convolution module, and two-dimensional graph convolution features are obtained through calculation by the two improved residual graph convolution modules;
The graph structure data is defined as a triple G(N, E, F); N denotes the node annotation vector set of the graph, of size |N|×S, where |N| is the number of nodes in the graph and S is the dimension of a node annotation vector; E is the edge set of the graph; and F denotes the graph features. Unlike data represented in Euclidean space, the matrix N and the edge set E of graph structure data are not unique; the matrix N corresponds to the set E, and both N and E are arranged in node order;
A convolutional encoder module: comprising five convolutional encoding blocks, each consisting of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer; the specific calculation process is as follows: first, the original input image undergoes five convolution block calculations and four downsampling operations; the second dimension of the resulting graph convolution features from the graph encoder is reshaped into a two-dimensional square matrix, i.e., the graph convolution features are converted into a three-dimensional matrix; the two-dimensional graph convolution features, upsampled at different scales, are then channel-concatenated with the convolution features of the last four convolutional encoders respectively;
A decoder: comprising four convolutional decoding modules, four edge attention gates, a two-dimensional convolutional layer, and a Sigmoid layer; the first three convolutional decoding modules each consist of two 2D convolutional layers, a batch normalization layer, a ReLU activation layer, a deconvolution layer, and a concatenation layer, and the fourth consists of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer; the specific calculation process is as follows: image features are first calculated through the edge attention gates and the convolutional decoding modules, the calculation results of the first to fourth convolutional decoding modules are then weighted and concatenated, and the final segmentation result is obtained through a two-dimensional convolutional layer and a Sigmoid layer.
In the graph encoder module of the device, the construction of the cosine weighted adjacency matrix, the construction of node annotation features, and the improved residual graph convolution module are specifically as follows:

(1) Construct the weighted adjacency matrix: considering the influence of distance and pixel value on the correlation between nodes, the inter-node distances and pixel values are used as vectors, and the weighted adjacency matrix is calculated through the cosine distance:

cos(A, B) = (A · B) / (‖A‖ ‖B‖)

where the vectors A and B consist of the inter-node distance and the pixel values;

(2) The node annotation N is initialized from the pyramid multi-scale features of a pre-trained U-Net; feature maps of decoders at different depths are extracted from the U-Net model, upsampled by different factors to the same size as the feature map output by the last decoder layer, and concatenated along the channel dimension to obtain the number of feature channels S; assuming the side length in pixels of the concatenated multi-layer feature map is ξ, the number of feature map pixels is ξ×ξ = |N|; let the pixel in the i-th row and j-th column of the k-th channel of the feature map be denoted x_{i,j,k}; the η-th node annotation is then

n_η = (x_{i,j,1}, x_{i,j,2}, …, x_{i,j,S}), i, j = 1, 2, …, ξ

where η is calculated as η = (i-1)×λ + j, with λ the number of pixels in the second dimension of the image; concatenating the n_η along the first dimension in node order η yields the node annotation N;
(3) Graph convolution is calculated on a graph whose normalized Laplacian matrix L is

L = I - D^{-1/2} A D^{-1/2}

where D is the degree matrix. Since the Laplacian matrix L is a real symmetric positive semi-definite matrix, it has a set of orthogonal eigenvectors and can be diagonalized by the Fourier basis U = [u_0, u_1, …, u_{n-1}]:

L = U Λ U^T

where Λ is the diagonal matrix of eigenvalues, Λ = diag([λ_0, …, λ_{n-1}]) ∈ R^{n×n}. Filtering the graph signal x with a filter g_θ is then

g_θ(L) * x = U g_θ(Λ) U^T x
For the non-parametric filter g_θ(Λ) = diag(θ), the parameter θ is a vector of Fourier coefficients; to address the limitations of non-parametric filters, namely that they are not localized in the vertex domain and have high time complexity, polynomial filters are used instead; the polynomial filter formula is as follows:

g_θ(Λ) = Σ_{k=0}^{K-1} θ_k Λ^k

Substituting into the filtering expression gives

g_θ(L) * x = Σ_{k=0}^{K-1} θ_k L^k x

After matrix transformation, the GCN output value can be written as

y = Σ_{k=1}^{K} α_k L^k x

where (α_1, α_2, …, α_K) are arbitrarily (randomly) initialized values whose parameter values are updated by backpropagation.
The improved residual graph convolution module (MCMCRes_GCN Block) is based on two graph convolution layers, introduces the ideas of residual connection from ResNet and Dropout, and performs MCMC sampling screening on the node feature vectors in the graph structure; the residual structure is then applied in the graph convolution module. The improved residual graph convolution module takes two graph convolution layers as its basic structure: the input x_0 of the graph convolution layers is sampled by MCMC to obtain x_1, and x_1 is added to the output F(x_0) of the graph convolution layers to obtain the final feature H(x_0), as follows:

H(x_0) = F(x_0) + x_1.
In the device, the edge attention gate is specifically as follows: x_l is the feature map output by the l-th layer, with feature size C_x×H_x×W_x, where C_x is the number of channels of the l-th layer feature map and H_x×W_x is the size of each feature map; the gating signal u is the upsampled feature map of the previous layer, with feature size C_u×H_u×W_u;

G_x is introduced as the horizontal Sobel operator and G_y as the vertical Sobel operator; padding and convolution operations with the horizontal and vertical Sobel operators are carried out, and the directional responses are combined by point-wise addition to obtain F_u and F_x:

F_u = G_x * u + G_y * u, F_x = G_x * x_l + G_y * x_l

where * denotes the convolution operation. F_u and F_x are passed through 1×1 convolution kernels to obtain features W_u and W_x, whose point-wise addition yields a feature map of size C_u×H_u×W_u, enhancing the contour features; after passing sequentially through a linear transformation and a nonlinear activation function, grid resampling is performed using bilinear interpolation; the original feature maps extracted over multiple scales and the edge-enhanced feature map weighted by the attention coefficient α are combined by a skip connection, where the attention coefficient α ∈ [0,1] preserves only task-relevant features by identifying salient feature regions and adjusting the attention weight distribution.
In the device, the EAGC_IUNet model training unit trains the model using the original image training dataset I_train and the manual segmentation result training dataset L_train; the specific steps are as follows: first, I_train is input into the network to compute one round of iterative results, which are compared with the corresponding manual segmentation results in L_train, and a loss value is calculated with the loss function; second, gradients are computed with a stochastic gradient descent optimizer and the network weights are updated by backpropagation; this process is then iterated until the error requirement is met, yielding the trained network model; finally, the model is verified using the original image test dataset I_test and the manual segmentation result test dataset L_test. The loss function in the above steps is a weighted loss function, calculated as

L_s = α·Loss_BBCE + β·Loss_DICE + γ·Loss_MIoU

where α, β, γ are the weights of the three loss functions; Loss_BBCE denotes the balanced binary cross-entropy loss, Loss_DICE the Dice loss, and Loss_MIoU the MIoU loss. Here μ denotes the balance hyper-parameter of positive and negative samples, usually taken as the ratio of positive samples to the total sample size; y is the prediction result, ỹ is the label image, and K is the number of classes.
The beneficial effects and innovations of the invention are as follows:
1. The invention improves the extraction method for the input features of the graph encoder's graph convolutional network: the inter-node distances and pixel values are used as vectors, the weighted adjacency matrix is calculated through the cosine distance, and the node annotation is initialized with the pyramid multi-scale features of a pre-trained U-Net.
2. The improved residual graph convolution module uses MCMC Dropout to perform MCMC screening on the node feature vectors in the graph structure, so that the sampling points and the proposal distribution can be dynamically adjusted during node feature sampling for a better sampling effect.
3. The invention improves the edge attention gate structure, using the horizontal and vertical Sobel operators to extract edge features of the feature map in two directions respectively, so that the high-frequency edge information of the feature map is easier to extract.
4. The invention uses the improved UNet3+ as the backbone network, reducing the model parameter count while retaining the advantages of UNet++; the full-scale skip connections are more conducive to capturing fine-grained and coarse-grained semantic features in the image.
Drawings
Fig. 1: overall flow chart;
Fig. 2: flow chart of the data preprocessing method;
Fig. 3: graph-based edge attention gate medical image segmentation network structure diagram;
Fig. 4: improved residual graph convolution module structure diagram;
Fig. 5: edge attention gate structure diagram;
Fig. 6: specific calculation flow chart of the embodiments.
Detailed Description
The following describes the implementation of the present invention, with reference to the accompanying drawings and examples, on an MRI ischemic dark band (penumbra) dataset for acute ischemic stroke and on a DR chest radiograph lung nodule segmentation dataset for pneumoconiosis-assisted screening.
Example 1: Image segmentation on an MRI ischemic dark band (penumbra) dataset for acute ischemic stroke
The specific flow in this embodiment is shown in fig. 6, and the specific steps are as follows:
The raw data in the MRI ischemic dark band dataset and the corresponding target region manual segmentation result data form an MRI dataset I and a label dataset L, respectively, which are then preprocessed. Each MRI volume and its corresponding target region manual segmentation result data are sliced two-dimensionally along the cross section, and all two-dimensional slices are normalized, i.e., each pixel value of the image is mapped from [0,255] to [0,1]. The normalization formula is as follows:

x̂_i = (x_i - min(x)) / (max(x) - min(x))

where x_i denotes the i-th pixel value, and min(x) and max(x) denote the minimum and maximum pixel values, respectively.
The normalized MRI data and the corresponding target region manual segmentation result data are each split in a ratio of 80% to 20% to construct an MRI training dataset I_train, an MRI test dataset I_test, a manual segmentation result training dataset L_train, and a manual segmentation result test dataset L_test. All image data are scaled to 256×256 using the resize() function in the PIL package.
Taking the MRI data input as an example, the graph-based edge attention gate medical image segmentation model (EAGC_IUNet) is constructed and trained; the specific steps are as follows:
First, the graph encoder module is constructed. The inter-node distances and pixel values are used as vectors, the weighted adjacency matrix is calculated through the cosine distance, and the node annotation N is initialized with the pyramid multi-scale features of a pre-trained U-Net. Feature maps of decoders at different depths are extracted from the U-Net model, upsampled by different factors to the same size as the feature map output by the last decoder layer, and concatenated along the channel dimension to obtain the number of feature channels S. Assuming the side length in pixels of the concatenated multi-layer feature map is ξ, the number of feature map pixels is ξ×ξ = |N|. Let the pixel in the i-th row and j-th column of the k-th channel of the feature map be denoted x_{i,j,k}; the η-th node annotation is n_η = (x_{i,j,1}, x_{i,j,2}, …, x_{i,j,S}), i, j = 1, 2, …, ξ, where η = (i-1)×λ + j and λ is the number of pixels in the second dimension of the image. Concatenating the n_η along the first dimension in node order η yields the node annotation N. The improved residual graph convolution module (denoted MCMCRes_GCN) is then constructed, with the residual structure applied in the graph convolution module: taking two graph convolution layers as the basic structure, the input x_0 of the graph convolution layers is sampled by MCMC to obtain x_1, and x_1 is added to the output F(x_0) of the graph convolution layers to obtain the final feature H(x_0), i.e., H(x_0) = F(x_0) + x_1. The adjacency matrix and the node feature matrix are taken as inputs of the improved residual graph convolution module, and the two-dimensional graph convolution features are obtained through calculation by the two improved residual graph convolution modules.
Next, the convolutional encoder is constructed. The convolutional encoder module comprises five convolutional encoding blocks, each consisting of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer. The specific calculation is as follows: five convolution block calculations and four downsampling operations are performed on the original input image, then the second dimension of the resulting graph convolution features from the graph encoder is reshaped into a two-dimensional square matrix, i.e., the graph convolution features are converted into a three-dimensional matrix. Finally, the two-dimensional graph convolution features are upsampled at different scales and channel-concatenated with the convolution features of the last four convolutional encoders respectively.
Then, the convolutional decoding module is constructed. It comprises four convolutional decoding modules, four edge attention gates, a two-dimensional convolutional layer, and a Sigmoid layer. The first three convolutional decoding modules each consist of two 2D convolutional layers, a batch normalization layer, a ReLU activation layer, a deconvolution layer, and a concatenation layer; the fourth consists of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer. The specific calculation process is as follows: image features are first calculated through the edge attention gates and the convolutional decoding modules, the calculation results of the first to fourth convolutional decoding modules are then weighted and concatenated, and the final segmentation result is obtained through a two-dimensional convolutional layer and a Sigmoid layer.
Finally, the model is trained using the MRI training dataset I_train and the manual segmentation result training dataset L_train. The specific steps are as follows: I_train is input into the network to compute one round of iterative results, which are compared with the corresponding manual segmentation results in L_train, and a loss value is calculated with the loss function L_s = α·Loss_BBCE + β·Loss_DICE + γ·Loss_MIoU, where α, β, γ are the weights of the three loss functions; Loss_BBCE denotes the balanced binary cross-entropy loss, Loss_DICE the Dice loss, and Loss_MIoU the MIoU loss. In this embodiment the values are α=2, β=4, and γ=4, respectively. Gradients are then calculated by a stochastic gradient descent optimizer, and the network weights are updated by backpropagation. This process is iterated until the error requirement is met, yielding the trained network model. Finally, the MRI test dataset I_test is input into the model to obtain the final segmentation results, which are verified against the manual segmentation result test dataset L_test.
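A minimal illustration of this training procedure, reusing the weighted_loss sketch from the description section; the optimizer settings, batch size, and epoch count are assumptions, and only the loss weights α=2, β=4, γ=4 come from this embodiment.

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_ds, epochs=100, lr=1e-3, mu=0.1):
    """model is assumed to output post-Sigmoid probability maps."""
    loader = DataLoader(train_ds, batch_size=8, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for epoch in range(epochs):
        for img, label in loader:
            pred = model(img)                       # one round of iteration
            loss = weighted_loss(pred, label, mu,
                                 alpha=2.0, beta=4.0, gamma=4.0)
            opt.zero_grad()
            loss.backward()                         # backpropagation
            opt.step()                              # SGD weight update
    return model
```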
Example 2: DR chest radiograph lung nodule segmentation for pneumoconiosis-assisted screening
The specific flow in this embodiment is shown in fig. 6, and the specific steps are as follows:
The raw data in the DR chest radiograph lung nodule dataset and the corresponding target region manual segmentation result data form a DR chest radiograph image dataset I and a label dataset L, respectively, which are then preprocessed. All images are normalized, i.e., each pixel value of the image is mapped from [0,255] to [0,1]. The normalization formula is as follows:

x̂_i = (x_i - min(x)) / (max(x) - min(x))

where x_i denotes the i-th pixel value, and min(x) and max(x) denote the minimum and maximum pixel values, respectively.
The normalized original medical image data and the corresponding target region manual segmentation result data are each split in a ratio of 80% to 20% to construct a DR chest radiograph image training dataset I_train, a DR chest radiograph image test dataset I_test, a manual segmentation result training dataset L_train, and a manual segmentation result test dataset L_test. All image data are scaled to 256×256 using the resize() function in the PIL package.
Using the same calculation procedure as in Example 1, and taking a DR chest radiograph image as an example, the graph-based edge attention gate medical image segmentation model (EAGC_IUNet) is constructed and trained; the specific steps are as follows:
First, the graph encoder module is constructed. The inter-node distances and pixel values are used as vectors, the weighted adjacency matrix is calculated through the cosine distance, and the node annotation N is initialized with the pyramid multi-scale features of a pre-trained U-Net. Feature maps of decoders at different depths are extracted from the U-Net model, upsampled by different factors to the same size as the feature map output by the last decoder layer, and concatenated along the channel dimension to obtain the number of feature channels S. Assuming the side length in pixels of the concatenated multi-layer feature map is ξ, the number of feature map pixels is ξ×ξ = |N|. Let the pixel in the i-th row and j-th column of the k-th channel of the feature map be denoted x_{i,j,k}; the η-th node annotation is n_η = (x_{i,j,1}, x_{i,j,2}, …, x_{i,j,S}), i, j = 1, 2, …, ξ, where η = (i-1)×λ + j and λ is the number of pixels in the second dimension of the image. Concatenating the n_η along the first dimension in node order η yields the node annotation N. The improved residual graph convolution module (denoted MCMCRes_GCN) is then constructed, with the residual structure applied in the graph convolution module: taking two graph convolution layers as the basic structure, the input x_0 of the graph convolution layers is sampled by MCMC to obtain x_1, and x_1 is added to the output F(x_0) of the graph convolution layers to obtain the final feature H(x_0), i.e., H(x_0) = F(x_0) + x_1. The adjacency matrix and the node feature matrix are taken as inputs of the improved residual graph convolution module, and the two-dimensional graph convolution features are obtained through calculation by the two improved residual graph convolution modules.
Next, the convolutional encoder is constructed. The convolutional encoder module comprises five convolutional encoding blocks, each consisting of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer. The specific calculation is as follows: five convolution block calculations and four downsampling operations are performed on the original input image, then the second dimension of the resulting graph convolution features from the graph encoder is reshaped into a two-dimensional square matrix, i.e., the graph convolution features are converted into a three-dimensional matrix. Finally, the two-dimensional graph convolution features are upsampled at different scales and channel-concatenated with the convolution features of the last four convolutional encoders respectively.
Then, the convolutional decoding module is constructed. It comprises four convolutional decoding modules, four edge attention gates, a two-dimensional convolutional layer, and a Sigmoid layer. The first three convolutional decoding modules each consist of two 2D convolutional layers, a batch normalization layer, a ReLU activation layer, a deconvolution layer, and a concatenation layer; the fourth consists of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer. The specific calculation process is as follows: image features are first calculated through the edge attention gates and the convolutional decoding modules, the calculation results of the first to fourth convolutional decoding modules are then weighted and concatenated, and the final segmentation result is obtained through a two-dimensional convolutional layer and a Sigmoid layer.
Finally, the model is trained using the DR chest radiograph image training dataset I_train and the manual segmentation result training dataset L_train. The specific steps are as follows: I_train is input into the network to compute one round of iterative results, which are compared with the corresponding manual segmentation results in L_train, and a loss value is calculated with the loss function L_s = α·Loss_BBCE + β·Loss_DICE + γ·Loss_MIoU, where α, β, γ are the weights of the three loss functions; Loss_BBCE denotes the balanced binary cross-entropy loss, Loss_DICE the Dice loss, and Loss_MIoU the MIoU loss. In this embodiment the values are α=2, β=4, and γ=4, respectively. Gradients are then calculated by a stochastic gradient descent optimizer, and the network weights are updated by backpropagation. This process is iterated until the error requirement is met, yielding the trained network model. Finally, the DR chest radiograph image test dataset I_test is input into the model to obtain the final segmentation results, which are verified against the manual segmentation result test dataset L_test.
It will be understood that modifications and variations will be apparent to those skilled in the art from the foregoing description, and it is intended that all such modifications and variations be included within the scope of the following claims.
Claims (10)
1. A graph-based edge attention gate medical image segmentation method, characterized by comprising the following steps S110 to S150:
S110: collecting original medical image data and corresponding manual segmentation result data of a target area to respectively form an original image data set I and a label data set L;
S120: the original image data set and the label data set are subjected to data preprocessing, and a test and training data set is constructed;
S130: constructing a graph-based edge attention gate medical image segmentation model, denoted EAGC_IUNet;
S140: training the constructed EAGC_IUNet;
S150: giving a target region segmentation result according to the medical image segmentation method.
2. The method according to claim 1, wherein the data preprocessing in step S120 comprises steps S210 to S230, namely S210 three-dimensional medical image slicing, S220 two-dimensional image normalization, and S230 two-dimensional image scaling:
S210: if the medical image is three-dimensional, slicing each original medical image and the corresponding manual segmentation result of the target area into two-dimensional slices along the cross section;
S220: in order to accelerate the neural network training convergence and ensure the network to converge rapidly, firstly, normalizing all two-dimensional images, namely changing each pixel value of the images from [0,255] to [0,1]; the normalization formula is as follows:
Wherein x i denotes an ith pixel value, and max (x) denote maximum and minimum values of the pixels, respectively;
secondly, the normalized original medical image data and the corresponding manual segmentation result data of the target area are split at a ratio of 80% to 20% to construct an original image training data set I_train, an original image test data set I_test, a manual segmentation result training data set L_train, and a manual segmentation result test data set L_test, respectively;
S230: all image data are scaled using the resize() function in the PIL package, with the image size scaled to 256×256.
3. The method according to claim 1, wherein the graph-based edge attention gate medical image segmentation model (EAGC_IUNet) constructed in step S130 uses an improved UNet3+ as the backbone network and comprises a graph encoder module, a convolutional encoder module, and a decoder section, specifically as follows:
310: the graph encoder module comprises weighted adjacency matrix construction, node annotation feature construction, and two improved residual graph convolution modules MCMCRes_GCN; the specific calculation process is as follows: firstly, an adjacency matrix of edge relations and a node feature matrix are obtained from the original input image; since the graph convolution operation is a Laplacian smoothing, neighboring nodes tend to have similar features as information propagates between them; in order to prevent the over-smoothing caused by stacking too many graph convolution layers, a two-layer graph convolution network structure is adopted; finally, the adjacency matrix and the node feature matrix are taken as input to the improved residual graph convolution module, and two-dimensional graph convolution features are obtained by calculation through the two improved residual graph convolution modules;
the graph structure data is defined as a triple G(N, E, F); N represents the node annotation vector set of the graph, of size |N| × S, where |N| is the number of nodes in the graph and S is the dimension of a node annotation vector; E is the edge set of the graph, and F represents the graph feature; unlike data represented in Euclidean space, the matrix N and the edge set E of graph structure data are not unique; the matrix N corresponds to the set E, with N and E arranged in node order;
320: the convolutional encoder module comprises five convolutional encoding blocks, each consisting of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer; the specific calculation process is as follows: firstly, five convolution block calculations and four downsampling operations are performed on the original input image; the second dimension of the graph convolution feature output by the graph encoder is reconstructed into a two-dimensional square matrix, i.e., the graph convolution feature is converted into a three-dimensional matrix; the two-dimensional graph convolution features, after upsampling at different scales, are channel-spliced with the convolution features of the last four convolutional encoding blocks, respectively;
330: the decoder comprises four convolution decoding modules, four edge attention gates, a two-dimensional convolution layer, and a Sigmoid layer; the first through third convolution decoding modules each consist of two 2D convolutional layers, a batch normalization layer, a ReLU activation layer, a deconvolution layer, and a splicing layer, and the fourth convolution decoding module consists of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer; the specific calculation process is as follows: image features are first computed through the edge attention gates and the convolution decoding modules; the calculation results of the first through fourth convolution decoding modules are then weighted and spliced; the final segmentation result is obtained through the two-dimensional convolution layer and the Sigmoid layer.
4. The method according to claim 3, wherein step 310 involves constructing a cosine-weighted adjacency matrix, constructing node annotation features, and the improved residual graph convolution module, calculated specifically as follows:
(1) constructing the weighted adjacency matrix: considering the influence of distance and pixel value on the correlation between nodes, the inter-node distances and pixel values are used as vectors, and the weighted adjacency matrix is calculated through the cosine distance
cos(A, B) = (A · B) / (‖A‖ ‖B‖)
where the vectors A and B consist of the inter-node distances and pixel values;
(2) the node annotation N is initialized with pyramid multi-scale features of a pre-trained U-Net; feature maps of decoders at different depths are extracted from the U-Net model and upsampled by different factors to the same size as the feature map output by the last decoder layer, and the feature maps are channel-spliced to obtain the number of feature channels S; assuming the side length of the spliced multi-layer feature map is ξ pixels, the number of feature map pixels is ξ × ξ = |N|; let the pixel in row i, column j of the k-th channel of the feature map be denoted x_{i,j,k}; then the η-th node annotation is
n_η = (x_{i,j,1}, x_{i,j,2}, …, x_{i,j,S}), i, j = 1, 2, …, ξ
where η is calculated as η = (i - 1) × λ + j, and λ is the number of pixels in the second dimension of the image; the n_η are spliced along the first dimension in node order η to obtain the node annotation N;
(3) the graph convolution calculation is carried out on the graph; the normalized Laplacian matrix L is as follows
L = I - D^(-1/2) A D^(-1/2)
where D is the degree matrix (a diagonal matrix); since the Laplacian matrix L is a real symmetric positive semi-definite matrix, it has a set of orthogonal eigenvectors and is diagonalized on the Fourier basis U = [u_0, u_1, …, u_{n-1}]:
L = U Λ U^T
where Λ is the diagonal matrix of eigenvalues, Λ = diag([λ_0, …, λ_{n-1}]) ∈ R^{n×n}; filtering the graph signal x with the filter g_θ proceeds as follows
g_θ(L) * x = U g_θ(Λ) U^T x
for the non-parametric filter g_θ(Λ) = diag(θ), the parameter θ is a vector of Fourier coefficients; to overcome the limitations of the non-parametric filter, namely that it is not localized in the vertex domain and has high time complexity, a polynomial filter is used instead; the polynomial filter formula is as follows
g_θ(Λ) = Σ_{k=0}^{K-1} θ_k Λ^k
substituting this into the filtering expression gives
g_θ(L) * x = Σ_{k=0}^{K-1} θ_k L^k x
and by matrix transformation the GCN output is as follows
y = Σ_{k=1}^{K} α_k L^k x
where (α_1, α_2, …, α_K) are arbitrary values; the randomly initialized values are updated by back propagation;
the improved residual graph convolution module (MCMCRes_GCN Block) takes two graph convolution layers as its basic structure, introduces the ideas of residual connection and Dropout from ResNet, and performs MCMC sampling screening on the node feature vectors in the graph structure; the residual structure is then applied to the graph convolution module; the input x_0 of the graph convolution layers is MCMC-sampled to obtain x_1, and x_1 is added to the output F(x_0) of the graph convolution layers to obtain the final feature H(x_0), with the formula
H(x_0) = F(x_0) + x_1.
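As a rough sketch of the residual structure H(x_0) = F(x_0) + x_1 described above: the MCMC sampling screening is shown only as a placeholder random node mask, since the claim does not specify the sampler, and the normalized adjacency A_hat is assumed precomputed.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """Plain graph convolution layer: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, a_hat, h):
        return torch.relu(a_hat @ self.lin(h))

class MCMCResGCNBlock(nn.Module):
    """Two graph convolution layers with Dropout and a residual connection,
    H(x0) = F(x0) + x1, where x1 is an MCMC-screened copy of the input
    (approximated here by random node masking -- an assumption)."""
    def __init__(self, dim: int, p: float = 0.1):
        super().__init__()
        self.gc1 = GraphConv(dim, dim)
        self.gc2 = GraphConv(dim, dim)
        self.drop = nn.Dropout(p)

    def mcmc_sample(self, x0):
        # Placeholder for the MCMC sampling screening of node features.
        keep = (torch.rand(x0.shape[0], 1, device=x0.device) > 0.1).float()
        return x0 * keep

    def forward(self, a_hat, x0):
        x1 = self.mcmc_sample(x0)
        f = self.gc2(a_hat, self.drop(self.gc1(a_hat, x0)))  # F(x0)
        return f + x1                                        # H(x0)
```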
5. The method according to claim 3, wherein the edge attention gate in step 330 is specifically as follows: x_l is the feature map output by the l-th layer, with feature size C_x × H_x × W_x, where C_x is the number of channels of the l-th layer feature map and H_x × W_x is the size of each feature map; the gating signal u is the upsampled feature mapping of the previous layer, with feature size C_u × H_u × W_u;
G_x is introduced as the lateral Sobel operator and G_y as the longitudinal Sobel operator; padding and convolution with the lateral and longitudinal Sobel operators are applied to the gating signal u and the feature map x_l, and the responses are combined point by point to obtain F_u and F_x, respectively;
where * denotes the convolution operation; F_u and F_x are passed through 1 × 1 convolution kernels to obtain the features W_u and W_x, which are added point by point to give a feature mapping of size C_v × H_u × W_u, thereby enhancing the contour features; after sequentially passing through a linear transformation and a nonlinear activation function, grid resampling is carried out using bilinear interpolation; the original feature maps extracted over multiple scales and the edge-enhanced feature map weighted by the attention coefficient α are combined through a skip connection, where the attention coefficient α ∈ [0, 1] preserves only task-relevant features by identifying salient feature regions and modifying the attention weight distribution.
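For the Sobel part of the edge attention gate, a minimal sketch of the padding-plus-convolution step follows; merging the two directional responses as a gradient magnitude is an assumption, since the claim only states that the responses are combined point by point.

```python
import torch
import torch.nn.functional as F

# Lateral (G_x) and longitudinal (G_y) Sobel operators
GX = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
GY = torch.tensor([[-1., -2., -1.], [0., 0., 0.], [1., 2., 1.]]).view(1, 1, 3, 3)

def sobel_edges(x: torch.Tensor) -> torch.Tensor:
    """Apply padding and depthwise convolution with both Sobel operators
    to a (B, C, H, W) feature map and merge the responses point-wise."""
    c = x.shape[1]
    xp = F.pad(x, (1, 1, 1, 1), mode='replicate')
    gx = F.conv2d(xp, GX.to(x.device).repeat(c, 1, 1, 1), groups=c)
    gy = F.conv2d(xp, GY.to(x.device).repeat(c, 1, 1, 1), groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)  # assumed merge rule
```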
6. The method according to claim 1, wherein in step S140 the model is trained using the original image training data set I_train and the manual segmentation result training data set L_train from step S120; the specific steps are as follows: firstly, I_train is input into the network to obtain one round of iterative calculation results, the results are compared with the corresponding manual segmentation results in L_train, and a loss value is calculated using the loss function; secondly, gradients are calculated by a stochastic gradient descent optimizer and the network weights are updated by back propagation; the process is then iterated until the error requirement is met, yielding the trained network model; finally, the model is verified using the original image test data set I_test and the manual segmentation result test data set L_test; the loss function in the above step is a weighted loss function, calculated as follows
L_s = α·Loss_BBCE + β·Loss_DICE + γ·Loss_MIoU
where α, β, γ represent the weights of the three loss functions; Loss_BBCE represents the balanced binary cross-entropy loss function, Loss_DICE represents the Dice loss function, and Loss_MIoU represents the MIoU loss function, with calculation formulas as follows
Loss_BBCE = -(1/n) Σ_i [ μ · ŷ_i · log y_i + (1 - μ) · (1 - ŷ_i) · log(1 - y_i) ]
Loss_DICE = 1 - 2 |Y ∩ Ŷ| / (|Y| + |Ŷ|)
Loss_MIoU = 1 - (1/K) Σ_{k=1}^{K} |Y_k ∩ Ŷ_k| / |Y_k ∪ Ŷ_k|
where μ represents the positive/negative sample balance hyperparameter, usually taken as the ratio of positive samples to the total sample size; y is the prediction result, ŷ is the label image, and K is the number of categories.
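A hedged sketch of the three loss terms and their weighted sum follows; the formulas are reconstructions of the standard definitions (in particular, the placement of μ in the balanced term is an assumption), with pred the network output after the Sigmoid and target the label image.

```python
import torch

def bbce_loss(pred, target, mu, eps=1e-6):
    # Balanced binary cross-entropy; mu = positive fraction (assumed weighting)
    pred = pred.clamp(eps, 1 - eps)
    return -(mu * target * pred.log()
             + (1 - mu) * (1 - target) * (1 - pred).log()).mean()

def dice_loss(pred, target, eps=1e-6):
    inter = (pred * target).sum()
    return 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def miou_loss(pred, target, eps=1e-6):
    inter = (pred * target).sum()
    union = pred.sum() + target.sum() - inter
    return 1 - (inter + eps) / (union + eps)

def combined_loss(pred, target, mu, alpha=2.0, beta=4.0, gamma=4.0):
    # L_s with the embodiment's weights alpha=2, beta=4, gamma=4
    return (alpha * bbce_loss(pred, target, mu)
            + beta * dice_loss(pred, target)
            + gamma * miou_loss(pred, target))
```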
7. An edge attention gate medical image segmentation apparatus based on a graph, comprising:
a data collection unit: collecting original medical image data and corresponding manual segmentation result data of a target area to respectively form an original image data set I and a label data set L;
a data preprocessing unit: the original image data set and the label data set are subjected to data preprocessing, and a test and training data set is constructed;
an EAGC_IUNet model building unit: constructing a graph-based edge attention gate medical image segmentation model, denoted EAGC_IUNet;
an EAGC_IUNet model training unit: training the constructed EAGC_IUNet;
a medical image segmentation unit: giving a target region segmentation result according to the trained medical image segmentation model.
8. The apparatus of claim 7, wherein the data preprocessing unit comprises: a three-dimensional medical image slicing subunit, a two-dimensional image normalization subunit, and a two-dimensional image scaling subunit:
a three-dimensional medical image slicing subunit: if the medical image is three-dimensional, slicing each original medical image and the corresponding manual segmentation result of the target area into two-dimensional slices along the cross section;
a two-dimensional image normalization subunit: in order to accelerate neural network training convergence and ensure that the network converges rapidly, firstly, all two-dimensional images are normalized, i.e., each pixel value of an image is mapped from [0, 255] to [0, 1]; the normalization formula is as follows:
x_i' = (x_i - min(x)) / (max(x) - min(x))
where x_i denotes the i-th pixel value, and max(x) and min(x) denote the maximum and minimum pixel values, respectively;
secondly, the normalized original medical image data and the corresponding manual segmentation result data of the target area are split at a ratio of 80% to 20% to construct an original image training data set I_train, an original image test data set I_test, a manual segmentation result training data set L_train, and a manual segmentation result test data set L_test, respectively;
a two-dimensional image scaling subunit: scaling all image data using the resize() function in the PIL package, with the image size scaled to 256×256.
9. The apparatus of claim 7, wherein the EAGC_IUNet model building unit uses an improved UNet3+ as the backbone network and comprises a graph encoder module, a convolutional encoder module, and a decoder section, as follows:
a graph encoder module: comprising weighted adjacency matrix construction, node annotation feature construction, and two improved residual graph convolution modules MCMCRes_GCN; the specific process is as follows: firstly, an adjacency matrix of edge relations and a node feature matrix are obtained from the original input image; since the graph convolution operation is a Laplacian smoothing, neighboring nodes tend to have similar features as information propagates between them; in order to prevent the over-smoothing caused by stacking too many graph convolution layers, a two-layer graph convolution network structure is adopted; finally, the adjacency matrix and the node feature matrix are taken as input to the improved residual graph convolution module, and two-dimensional graph convolution features are obtained by calculation through the two improved residual graph convolution modules;
the graph structure data is defined as a triple G(N, E, F); N represents the node annotation vector set of the graph, of size |N| × S, where |N| is the number of nodes in the graph and S is the dimension of a node annotation vector; E is the edge set of the graph, and F represents the graph feature; unlike data represented in Euclidean space, the matrix N and the edge set E of graph structure data are not unique; the matrix N corresponds to the set E, with N and E arranged in node order;
a convolutional encoder module: comprising five convolutional encoding blocks, each consisting of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer; the specific calculation process is as follows: firstly, five convolution block calculations and four downsampling operations are performed on the original input image; the second dimension of the graph convolution feature output by the graph encoder is reconstructed into a two-dimensional square matrix, i.e., the graph convolution feature is converted into a three-dimensional matrix; the two-dimensional graph convolution features, after upsampling at different scales, are channel-spliced with the convolution features of the last four convolutional encoding blocks, respectively;
a decoder: comprising four convolution decoding modules, four edge attention gates, a two-dimensional convolution layer, and a Sigmoid layer; the first through third convolution decoding modules each consist of two 2D convolutional layers, a batch normalization layer, a ReLU activation layer, a deconvolution layer, and a splicing layer, and the fourth convolution decoding module consists of two 2D convolutional layers, a batch normalization layer, and a ReLU activation layer; the specific calculation process is as follows: image features are first computed through the edge attention gates and the convolution decoding modules; the calculation results of the first through fourth convolution decoding modules are then weighted and spliced; the final segmentation result is obtained through the two-dimensional convolution layer and the Sigmoid layer.
10. The apparatus of claim 9, wherein in the graph encoder module, the cosine weighted adjacency matrix is constructed, the node annotation feature is constructed, and the residual graph convolution module is modified, in particular as follows:
(1) constructing the weighted adjacency matrix: considering the influence of distance and pixel value on the correlation between nodes, the inter-node distances and pixel values are used as vectors, and the weighted adjacency matrix is calculated through the cosine distance
cos(A, B) = (A · B) / (‖A‖ ‖B‖)
where the vectors A and B consist of the inter-node distances and pixel values;
(2) the node annotation N is initialized with pyramid multi-scale features of a pre-trained U-Net; feature maps of decoders at different depths are extracted from the U-Net model and upsampled by different factors to the same size as the feature map output by the last decoder layer, and the feature maps are channel-spliced to obtain the number of feature channels S; assuming the side length of the spliced multi-layer feature map is ξ pixels, the number of feature map pixels is ξ × ξ = |N|; let the pixel in row i, column j of the k-th channel of the feature map be denoted x_{i,j,k}; then the η-th node annotation is
n_η = (x_{i,j,1}, x_{i,j,2}, …, x_{i,j,S}), i, j = 1, 2, …, ξ
where η is calculated as η = (i - 1) × λ + j, and λ is the number of pixels in the second dimension of the image; the n_η are spliced along the first dimension in node order η to obtain the node annotation N;
(3) the graph convolution calculation is carried out on the graph; the normalized Laplacian matrix L is as follows
L = I - D^(-1/2) A D^(-1/2)
where D is the degree matrix (a diagonal matrix); since the Laplacian matrix L is a real symmetric positive semi-definite matrix, it has a set of orthogonal eigenvectors and is diagonalized on the Fourier basis U = [u_0, u_1, …, u_{n-1}]:
L = U Λ U^T
where Λ is the diagonal matrix of eigenvalues, Λ = diag([λ_0, …, λ_{n-1}]) ∈ R^{n×n}; filtering the graph signal x with the filter g_θ proceeds as follows
g_θ(L) * x = U g_θ(Λ) U^T x
for the non-parametric filter g_θ(Λ) = diag(θ), the parameter θ is a vector of Fourier coefficients; to overcome the limitations of the non-parametric filter, namely that it is not localized in the vertex domain and has high time complexity, a polynomial filter is used instead; the polynomial filter formula is as follows
g_θ(Λ) = Σ_{k=0}^{K-1} θ_k Λ^k
substituting this into the filtering expression gives
g_θ(L) * x = Σ_{k=0}^{K-1} θ_k L^k x
and by matrix transformation the GCN output is as follows
y = Σ_{k=1}^{K} α_k L^k x
where (α_1, α_2, …, α_K) are arbitrary values; the randomly initialized values are updated by back propagation;
the improved residual graph convolution module (MCMCRes_GCN Block) takes two graph convolution layers as its basic structure, introduces the ideas of residual connection and Dropout from ResNet, and performs MCMC sampling screening on the node feature vectors in the graph structure; the residual structure is then applied to the graph convolution module; the input x_0 of the graph convolution layers is MCMC-sampled to obtain x_1, and x_1 is added to the output F(x_0) of the graph convolution layers to obtain the final feature H(x_0), with the formula
H(x_0) = F(x_0) + x_1.