Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate the present invention and, together with the description, serve to explain the principles of the invention and its practical application, and to enable others skilled in the pertinent art to understand the invention and its various advantages and features. The innovative concepts embodied in the present invention can be combined with other embodiments in a wide variety of design forms. The embodiments described herein should therefore be used to help those skilled in the art fully understand the innovative concepts and the scope of the present invention, and should not be construed as limiting the scope of the claims presented herein.
Quotation marks are used to set off words and phrases whose definitions or usage are given herein, so as to avoid obscuring the description.
Ordinal terms may be used herein to facilitate listing the objects being described. Such terms should not be construed as limiting the composition or structure of the method of the invention; they are merely temporary labels added to distinguish the objects being introduced, and their order may be interchanged, for example by swapping the terms "first" and "second", without affecting the description herein of the innovative concepts of the invention.
When the term "and/or" is used herein to link a set of statements, no limitation is placed on the order or combination of the linked objects; their arrangement may be changed arbitrarily.
The embodiments of the invention are described herein using terminology that is not intended to be limiting of the inventive concepts of the present invention.
Reference herein to embodiments of the invention, or to objects and elements contained therein, may be made in the singular or the plural. However, unless the context clearly dictates otherwise, a singular reference may also broadly denote the plural, and a plural reference may also broadly denote the singular.
When describing the present invention herein, the word "comprising" or variations thereof is used to refer to a group of objects encompassed by the present invention. Unless explicitly stated otherwise in the context, such references are non-exhaustive, meaning that one or more further objects and elements may be added, including: features, integers, steps, operations, elements, components, phases, parts, and/or groups. Phrases such as "including" used to reference a set of objects, elements, or features are likewise open-ended; when a phrase such as "includes but is not limited to" is used, items not mentioned may also be included; when a phrase such as "consisting essentially of" is used, some content may be omitted, provided that the omitted content does not affect the basic features and novelty of the concepts proposed by the present invention.
Unless otherwise defined herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs; specific terms that are expressly defined herein carry the definitions so given.
The present invention will be described herein using a few specialized or everyday terms. Where the intended meaning is not specifically expressed by the context, such terms should be understood according to their common usage in the art to which this invention pertains, rather than in an idealized or overly formal sense.
"radiation therapy" or "radiotherapy" refers to the use of a directable radiation therapy device to emit a "radiation beam" of energetic particles to a region of a patient's body occupied by cancerous cells or tumors so as to produce a radiation absorbed dose in the region to directly destroy the DNA of the cells in the region, or to indirectly destroy the DNA by generating charged particles in the cells. Cells repair DNA damage and when the repair capacity is insufficient to restore DNA damage, cells stop dividing or die. But this process can also cause damage to healthy cells of vital organs and anatomical structures surrounding the area. Therefore, one of the important links in the radiation therapy planning process is to make accurate segmentation based on high-resolution medical images in order to ensure avoidance of vital organs when planning radiotherapy, minimize the amount of absorbtion that forms in healthy tissue, and completely cover the entire target area in order to reduce the probability of recurrence.
"radiotherapy planning" or "radiotherapy planning" is a step in the radiation therapy process, which is performed by a group of professionals, including: a radiation oncologist, a radiation therapist, a medical physicist, a medical dosimeter to design a plan for external radiation therapy of a patient with a tumor; the resulting treatment plan is referred to as a "radiotherapy plan". Generally, a medical image of a patient is first processed to obtain a "segmentation" and then a radiotherapy plan is designed based thereon. "segmentation" as used herein refers to the use of a set of regions of interest to describe the correspondence of pixels in a medical image to target areas, vital organs, and other human anatomy within the human body.
"two-dimensional medical image for radiotherapy" or "medical image" refers to data obtained using a certain imaging technique for medical or medical purposes in the field of radiotherapy. In practical applications, such imaging modalities include, but are not limited to: computed Tomography (CT), magnetic resonance imaging (MR), Positron Emission Tomography (PET) and Single Photon Emission Computed Tomography (SPECT). Among them, the CT image data is usually used as a main image data set in the radiotherapy planning; MRI image data may be used as a primary or secondary set of image data for soft tissue imaging; the PET image data and SPECT image data may reflect the metabolism of a particular substance in the human body, and thus may be used to improve the delineation accuracy of a planned target region involving such regions when the diseased tissue absorbs a particular substance differently than surrounding tissue. For the convenience of doctors, data obtained by certain imaging technologies need to comply with "digital imaging and communications in medicine (DICOM)" or "digital imaging and communications in medicine (DICOM-RT) for radiotherapy" when being transmitted among imaging devices, storage devices, and radiotherapy devices. These two protocols do not depend on a particular device to function.
In particular, "CT" scanning generates images reflecting structures of the human body by means of a computer-controlled X-ray irradiation process. In CT scanning, an X-ray tube is moved around a patient and emits X-rays toward the body under computer control. The emitted X-rays may be attenuated during transmission. A set of linear detectors is positioned on the other side of the body in the path of the X-rays for receiving a portion of the X-rays that pass through the body. The intensity distribution of the X-rays reaching the detector is not uniform due to the different attenuation that occurs when the X-rays pass through different tissues. The X-ray intensity distribution data received by the detector may constitute projection data, and then a set of images is reconstructed using a back-projection algorithm to reflect the attenuation of the X-rays in the human body, i.e., a set of images reflecting the tissue density distribution of the human body. Typically, the resulting three-dimensional image consists of a set of slices, each slice reflecting the density distribution within a slice of human tissue of a certain thickness.
An "MRI" scan uses a device moving on a circular orbit capable of generating a strong magnetic field to form radio waves for irradiating a patient located near the center of the circle, causing the patient's body tissues to emit radio waves reflecting the information itself. The frequency distribution of radio waves emitted by different human tissues including tumors, depending on their chemical composition, will have a relatively large intensity at the frequency corresponding to the specific composition. From which images depicting organs of the human body can be obtained. MRI can be used to generate three-dimensional images depicting designated parts of the human body; unlike CT scanning, however, MRI images can be used to more finely distinguish between soft tissues of close density.
"PET" scanning can be used to generate images that reflect dynamic chemical processes occurring within human tissue. For example, sugar metabolism occurs in human tissues. Typically, a small amount of powdered sugar is radiolabeled prior to PET scanning and mixed with normal sugar, and the mixture is prepared as a solution and injected into the patient. Since tumor cells consume or absorb sugars at a higher rate than normal human tissue, more radioactive material accumulates near the tumor site. At this time, a PET scanning device may be used to track the distribution of the sugar inside the tumor or other parts of the body. In some embodiments, fusing the CT scan results with PET images may help the physician to better distinguish normal tissue from abnormal tissue.
"SPECT" uses radiotracers and scanning equipment to record data and a computer to reconstruct two-dimensional or three-dimensional images. In SPECT scanning, a small amount of a radiopharmaceutical, which is harmless to the human body, is injected into a blood vessel, and then a detector is used to track the radioactive material; the obtained data can reflect the absorption condition of the human cells to the radioactive substances. The SPECT scan results can be used to describe the blood flow and metabolism of human tissue.
A "region of interest" is a subset of a medical image designated for medical or medical purposes. For discrete medical images, the region of interest refers to a sub-region composed of a portion of a bin of a two-dimensional medical image, or a sub-region composed of a portion of a voxel of a set of three-dimensional medical images. For medical images in a continuous format, the region of interest refers to a region within a closed curve in a two-dimensional medical image or a region within a closed curve in a set of three-dimensional medical images. For example, a region of interest referred to as "Gross Tumor Volume (GTV)" refers to the approximate area occupied by a harmful object that is clearly visible in medical images. GTV is determined, and generally, a physician needs to determine the location of the GTV in conjunction with information from imaging modalities (CT, MRI, ultrasound, etc.), diagnostic modalities (pathology and past cases, etc.), and clinical examinations. The present invention also uses the region of interest to refer to the region of two-dimensional medical image that is composed of pixels corresponding to an important organ, such as: the region occupied by the heart in a medical image of the breast. Thus, segmenting the medical image means delineating a GTV in the medical image to determine the extent of the target cells so that the beams can be set for these cells when planning the radiotherapy. At the same time, accurate delineation of the anatomical structures surrounding the GTV also helps to avoid vital organs in radiotherapy planning and minimize the radiation absorbed dose formed in the anatomical structures surrounding the GTV.
"Forward planning" is one way to plan in external radiation radiotherapy. In this manner, a physician (e.g., a dosimeter) may set the beam in a radiation therapy planning system to produce a sufficient radiation absorbed dose in the tumor region while avoiding vital organs and minimizing the radiation absorbed dose to healthy tissue. Accurate segmentation of medical images is a key to forward planning.
"inverse planning" is one way to design a radiotherapy plan. In this manner, the radiation oncologist is responsible for locating vital organs and tumors within the patient; the dosimeter gives the target dose and the importance factor to these regions. These inputs serve as conditions to assist an optimizer in generating and screening a radiotherapy plan, i.e., placing a set of external beams to the target volume. Unlike "forward planning" known in the field of oncology, inverse planning does not require manual planning and adjustment of the plan in a trial and error manner, but rather an optimization procedure can be used to solve the inverse problem set by the dosimeter. Similarly, accurate segmentation of medical images is a prerequisite for inverse planning.
"image segmentation" refers to identifying objects contained in an arbitrary given image and locating the contours thereof. In the field of radiotherapy, "segmenting a medical image" refers to providing a set of regions of interest for a medical image used in radiotherapy planning to describe an object contained in the image, including but not limited to: a target region containing a tumor that requires a higher radiation absorbed dose to be formed by irradiation, vital organs that need to be avoided in radiotherapy, other healthy tissues that need to minimize the radiation absorbed dose in order to reduce complications, and/or other anatomical structures. The output of "segmenting the medical image" is a set of regions of interest, i.e. pixels in the medical image are marked with non-repeating region of interest numbers, and the marked pixels only belong to the region of interest corresponding to the numbers. Such a component of interest is a "segmentation" of the medical image. In one aspect, accurate "segmentation" can help the physician in designing a radiotherapy plan to completely cover the target area to reduce the probability of recurrence and at the same time reduce damage to surrounding healthy tissue. Meanwhile, the automation degree of the radiotherapy plan is improved, the time consumption of the radiotherapy plan is reduced, and the efficiency of the planning process is improved. This is beneficial and not harmful to the patient to be treated.
A non-deep-learning segmentation method can be used to pre-segment a two-dimensional radiotherapy medical image, so that semantic segmentation can be performed on the medical image with the trained DCNN at a subsequent stage. Here, the image segmentation methods of the present invention are divided into two classes: a segmentation method that does not use deep learning is called a "pre-segmentation method", while "segmentation method" refers to segmenting an image using the DCNN. Although the information provided by a "pre-segmentation method" may not suffice for an unambiguous delineation of the regions of interest, it can be used to generate an annotated image containing positioning information of the objects as an initial input to the DCNN. Such initial labeled images may be fused one by one with the medical images to form a set of "channel count plus one" fused images as input to the trained DCNN. For example, a non-deep-learning segmentation method can quickly identify the skin and fat layers at the body surface, and can thus be used to determine the contour of the human body or, further, other body structures. The information obtained by pre-segmentation with a non-deep-learning method can therefore reduce the workload of the DCNN, allowing the DCNN to segment the regions of interest in a targeted manner. Non-deep-learning image segmentation methods include but are not limited to: threshold-based methods, derivative-based methods, morphological methods, region growing, watershed methods, and level set methods.
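A minimal sketch of one such threshold-based pre-segmentation follows; the function name and the input array `slice_hu` are hypothetical, and Otsu thresholding stands in for whichever non-deep-learning method is chosen.

```python
# Minimal sketch of a threshold-based "pre-segmentation": Otsu
# thresholding to find the body contour in a CT slice. The function and
# the input array `slice_hu` are hypothetical.
import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import label

def presegment_body(slice_hu: np.ndarray) -> np.ndarray:
    """Return an initial label map: 1 inside the body contour, 0 for air."""
    mask = slice_hu > threshold_otsu(slice_hu)  # separate body from air
    regions = label(mask)                       # connected components
    sizes = np.bincount(regions.ravel())
    sizes[0] = 0                                # ignore the background component
    return (regions == sizes.argmax()).astype(np.uint8)  # largest = the body
```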
"deep convolutional neural network" (DCNN) refers to a feedforward artificial neural network designed with a weight-sharing architecture and applying convolution operation and pooling operation to process data in the field of machine learning, and can be used for image recognition and object classification. Through simple adjustment, the DCNN can be used for performing semantic segmentation on two-dimensional medical images for radiotherapy. Such DCNN can generate a multichannel probability "score map" pixel by performing a set of transformations on the input image in successive layers. Each channel in the "score map" contains a probability distribution describing whether the corresponding region of interest is likely to appear in the input image, and the location where the region of interest is likely to appear. Such a multi-channel "score map" can further be used to generate a final annotation image describing which vital organs and anatomical structures are present in the input image, and the location of the present objects in the input image.
The "reference data" refers to a labeled image obtained by accurately delineating a two-dimensional medical image. Such data can be used to compare with the score map generated by the DCNN under training and calculate a loss function based on the difference between the two.
The "initial annotation image" refers to an annotation image obtained by processing a two-dimensional medical image by using a pre-segmentation method. Such images may be fused with corresponding two-dimensional medical images to generate fused images, i.e., medical images with labeling information. The fused image may be used as an input to the DCNN, and the information contained therein may help reduce the workload borne by the DCNN during the segmentation phase of the radiotherapy planning process.
The "feature map" is the output of a layer of transforms in the DCNN, which is the result of processing the output from a previous layer of transforms or the labeled medical image input to the DCNN. Here, the "feature map" refers to a feature map or an input image input to the layer, and a sub-region extracted in the traversal process is divided by a kernel matrix included in the layer in a transformation manner to obtain a pixel value, so that a "feature map" can be generated pixel by pixel as an output of the layer. When traversing the input image or feature map, the distance between adjacent traversal coordinates can be expressed as the number of pixels between adjacent coordinates, i.e. the "step length"; each time a sub-region is extracted from the input with respect to the traversal coordinates, the region is typically a neighborhood of the traversal coordinates, i.e., the "receptive field"; here, the step size may be larger than one pixel. With respect to the feature map of the layer output, a relatively ideal result is that the map should contain features of one or more aspects of the input to the layer; meanwhile, for the last layer of the DCNN, the output feature map should be sorted so that the information of each "channel" can be used to segment the corresponding region of interest.
"channel" refers to a feature map generated by one of the "filters" contained in one layer of the DCNN. When the transformation included in one layer of the DCNN includes a plurality of filters, each filter can independently generate a feature map; these feature maps may be combined into a multi-channel feature map as the output of the layer. When the DCNN is used to "segment medical images for radiotherapy", the ideal result is that, in the multi-channel feature map generated by the last layer of transformation of the DCNN, each channel only contains information that can be used to segment an area of interest; when a plurality of channels correspond to the same region of interest, the channels can always be integrated into one channel. Thus, a final multi-channel feature map can be obtained, so that each channel can be used for and can only be used for labeling a segmentation situation of a region of interest.
"layers" are the units that make up the DCNN. Each layer in the DCNN may contain one or more successive transforms for processing sub-regions extracted from the image or feature map input to the layer, i.e., "receptive fields", and the processing results make up the output of the layer, i.e., "feature maps". When the step length of extracting the receptive field is one pixel, the output characteristic graph has the same size with the input of the layer; when the step size is larger than one pixel, the size of the output is smaller than the input. Some of the transformations may be represented in the form of a three-dimensional matrix, a "kernel matrix".
A "filter" is a function that is used to process a sub-region, the "receptive field". The computed value of the function is a scalar quantity, which is the pixel value in one channel of the multi-channel pixels corresponding to the receptive field in the "feature map" output by the layer in which the filter is located. Here, the filters of one layer of the DCNN correspond one-to-one to the channels of the feature map output by the layer. In a DCNN, the first layer ideally contains filters that can process as input sub-regions taken from the input image; and the filter included in the last layer can be set appropriately, so that the output of the layer can obtain a 'multi-channel score map' through an up-sampling method, so that each channel can be used and can be used only for describing whether the input of the DCNN includes the region of interest corresponding to the channel and where the region of interest is located if the input of the DCNN includes the region of interest.
"receptive field" refers to a sub-region of a multi-channel signature. This region can be used as input to a "filter" and computed to obtain a scalar value. This value may be taken as the pixel value in one channel corresponding to the filter used, among the multi-channel pixels in the output feature map corresponding to the receptive field.
The "step length" refers to the component of the displacement between adjacent traverse coordinates in the direction of the spatial coordinate axis when the filter traverses the input image or feature map. If the step size is 1, one pixel value can be obtained as an output for each pixel in the input, and thus a feature map that is equal to the input in spatial dimension can be obtained as an output of the layer where the filter is located. When the step size is larger than or equal to 2, the output characteristic diagram is smaller than the input characteristic diagram in space size.
The "score map" or "rough score map" is a kind of feature map, which contains information extracted from previous layers step by step, and can be used to describe the position of an object appearing in the input image of the DCNN. When the number of the interested regions is larger, the graph is a multi-channel scoring graph. Processing the map using a probabilistic map model (PGM) results in a set of well-defined regions of interest describing the spatial position of objects appearing in the input image. Each channel of the graph contains a two-dimensional score map, where the "score" of each pixel represents the probability that the pixel belongs to the region of interest corresponding to the channel. The score can also be regarded as the confidence of the DCNN in the judgment that the pixel belongs to the region of interest, that is, for a multi-channel pixel in the score map, the higher the score in one channel of the pixel is, the higher the confidence of the DCNN in the judgment that the pixel belongs to the region of interest corresponding to the channel is. Ideally, the DCNN should be trained with appropriate reference data and medical images, so that the scores of each pixel in each channel in the multi-channel score map output by the DCNN can satisfy the appropriate preference order. For example, if a DCNN is trained using medical images of healthy tissue containing tumor sites and associated fiducial data, then after proper training, it should be seen in the score map output by the DCNN that the probability that a pixel of the site-occupying region belongs to the tumor region should be higher than the probability that the pixel belongs to the original healthy tissue.
The "probability map model" (PGM) is a method of describing inter-pixel correlations by means of the energy function of a particle system. For a given pixel map, when an energy function is calculated according to the pixel relation of the map, if the energy function gives a minimum value, a segmentation of the map can be obtained according to the current pixel relation, namely a group of interested regions are found in the process. When PGM is used for segmentation of medical images, the model can be used to process DCNN-generated multi-channel scoring maps to obtain well-defined target regions containing tumor or cancer cells, or regions of interest containing vital organs and other anatomical structures, i.e., "post-segmentation".
"multipass label graph" refers to a multipass graph obtained by processing a "score graph" using a PGM method. Each of which corresponds to a region of interest. The pixel value of a multi-channel pixel in a channel in the figure is a number, which indicates that the pixel belongs to the region of interest corresponding to the number.
The "final labeled image" is an output labeled image obtained by processing the input image by combining the DCNN and PGM methods, and the image is obtained by fusing the multi-channel labeled images. When the graph is fused, if the pixel values of a plurality of channels are not zero, the number of the interesting region corresponding to the channel with the preference order being more priority is used for marking the pixel.
A multidimensional quantity will be referred to herein in some instances in a simplified form. For example, a two-dimensional medical image for radiotherapy may also be referred to as: 2D medical images for radiotherapy; the four-dimensional tensor can also be referred to as a 4D tensor.
The embodiments of the invention are mainly used to segment two-dimensional radiotherapy images accurately and efficiently. Using the method provided by the invention and the matching equipment, a user can automatically complete the segmentation of radiotherapy medical images as required. In some embodiments, software and hardware designed according to the present invention may also assist the user in manually segmenting the medical images. Furthermore, the segmentation results obtained from the patient's medical images may be exported to a treatment planning system (TPS) of choice and submitted to the physician. The physician may use the selected system to manually or automatically design and optimize a radiotherapy plan, arriving at a plan that may be used to treat the patient.
Figure 1 shows a radiotherapy workflow 100 that the present invention is designed to follow, which may be used to design and optimize a radiotherapy plan. A specific embodiment is as follows. When the physician confirms that a radiotherapy plan needs to be designed for a patient and radiotherapy administered:
in stage 101, a medical image 110 for diagnosis is taken of a patient using an imaging device and passed on to subsequent steps. The data transmission should conform to the DICOM protocol;
at stage 102, the physician segments and delineates a region of interest from the medical image 110. The processing results are passed to subsequent steps as part of the medical image 110;
in stage 103, the doctor designs the radiotherapy prescription according to the delineation results for the medical image 110, which includes but is not limited to: a radiation dose for killing tumor or cancer cells within the target area;
at stage 104, the physician designs a radiotherapy plan according to the prescription set forth at stage 103;
at stage 105, a senior physician is responsible for reviewing the radiotherapy plan. An approved treatment plan 112 is passed on to subsequent steps for administering radiation therapy 106 to the patient. Treatment plans that fail review are returned; the physician may choose to repeat one or more of the preceding stages and then submit the modified plan for review again, or may choose to redesign a plan, and so on until a plan passes review.
Ideally, the physician can use appropriate equipment to perform the various stages of the radiation therapy workflow 100, and the equipment is suitably integrated into a system that can be used to design and optimize a complete radiation therapy plan and to deliver radiation therapy.
FIG. 2 is a schematic diagram of one embodiment of the present invention, including the hardware and software components of a medical image segmentation system 200, together with imaging and radiotherapy apparatus; the medical image segmentation system 200 plays a central role in segmenting medical images. The components of the system include: a segmentation engine 220, a memory 211, a user interface 212, a set of processors 213, and external storage devices 214. The external storage devices serve various uses, including: a device 250 for storing training set data, i.e., two-dimensional medical images for training the DCNN model and the reference data labeling these images; a device 251 for storing a DCNN settings file, generally the settings of the trained DCNN, so that the trained DCNN model can be reconstructed at next use; a device 252 for storing target set data, i.e., medical images for radiotherapy; and a device 253 for storing the programs of the software components.

A segmentation engine 220 designed according to the proposed method can be used to segment medical images for radiotherapy using DCNN and PGM models, and comprises four software components: a training component 221, a pre-segmentation component 222, a segmentation component 223, and a post-segmentation component 224. The training component 221 is used to "train" a DCNN model to segment two-dimensional medical images for radiotherapy. The structure of the DCNN model is predefined according to framework 500 prior to training. The parameters of the various layers of the DCNN model are adjusted during training, and the training data is obtained from the storage device 250 through process 260; it includes the two-dimensional medical images for training and the reference data labeling those images. The output of the training component 221 is a trained DCNN model, whose settings are stored in the storage device 251 through process 261. In use, the settings of the model may be read from the storage device 251 by process 262, and the trained DCNN model may be reconstructed in the segmentation component 223. The task of the pre-segmentation component 222 is to read the medical images from the storage device 252 as input, generate a set of initial labeled images corresponding to the input images, fuse the two sets of images in one-to-one correspondence, and then pass the fused images to the segmentation component 223 as its input. The segmentation component 223 contains a trained DCNN, reconstructed via process 262 from the settings read from the storage device 251. This component processes the fused images from the pre-segmentation component 222 using the reconstructed DCNN to obtain a set of multi-channel score maps as output; these maps represent the DCNN's prediction of the locations of the regions of interest in the input images. The post-segmentation component 224 takes the score maps containing the region-of-interest predictions as input and generates a set of multi-channel label maps describing the precise positions of the regions of interest in the input images. Ideally, the information in the channels of a multi-channel label map can be fused, in order, into a single-channel label image, which can be regarded as a segmentation of the corresponding two-dimensional medical image in the target set.
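How these four components hand data to one another might be sketched as follows; the class, its methods, and the trivial stand-in callables are hypothetical illustrations, not the actual implementation of engine 220.

```python
# Hypothetical wiring of the four software components of segmentation
# engine 220; all names and the trivial stand-ins are illustrative only.
import numpy as np

class SegmentationEngine:
    def __init__(self, presegment, dcnn, postsegment):
        self.presegment = presegment    # component 222: image -> initial label map
        self.dcnn = dcnn                # component 223: fused image -> score map
        self.postsegment = postsegment  # component 224: score map -> label map

    def segment(self, image: np.ndarray) -> np.ndarray:
        initial = self.presegment(image)     # pre-segmentation
        fused = np.stack([image, initial])   # "channel count plus one" fusion
        scores = self.dcnn(fused)            # DCNN prediction (score map)
        return self.postsegment(scores)      # post-segmentation: defined ROIs

# Usage with trivial stand-ins for the three components:
engine = SegmentationEngine(
    presegment=lambda im: (im > im.mean()).astype(np.float32),
    dcnn=lambda f: f,                                   # placeholder model
    postsegment=lambda s: (s[1] > 0.5).astype(np.uint8),
)
labels = engine.segment(np.random.rand(512, 512).astype(np.float32))
```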
When the medical image segmentation system 210 in an embodiment is first started, the segmentation engine 220 is loaded into the memory 211. At this point, or when the user needs it in actual use, the training component 221 can be used to "train" a DCNN defined according to the architecture 500. The training set required for training needs to be organized in advance and stored in the storage device 250. When training is complete, the settings for the DCNN are stored in the storage device 251. Ideally, when the system 210 is powered off and powered on again, the data stored in the storage device 251 should be usable for reconstructing a DCNN model, i.e., when the system 210 is powered on, the segmentation component 223 reconstructs a DCNN model that can be used for segmenting two-dimensional medical images for radiotherapy according to a predefined structure 500 and the settings stored in the storage device 251. Thus, in future use, when conditions permit or require, the trained models can be retrained using a new training set, or the retrained models can be used instead of the old models.
Ideally:
the medical image received by the segmentation system 210 is in accordance with DICOM protocol 240, and the output image is in accordance with DICOM-RT protocol 241 for the doctor to use;
medical images and corresponding reference data for training the DCNN are taken from the database 201 and stored in the storage device 250 for the training component 221 to acquire and use through the process 260 for training. The medical image data used for training should conform to DICOM protocol 240; the corresponding reference data can be manually drawn by a doctor or drawn by using a semi-automatic or automatic tool;
the settings of the trained DCNN are saved in the storage device 251 through process 261. After training is complete, these settings can be read and used by the segmentation component 223 at any time through process 262;
medical images of a patient to be treated may be acquired from database 201 or acquired using imaging device 202 and then stored in storage device 252. These data are processed by a pre-segmentation component 222, a segmentation component 223, and a post-segmentation component 224 to obtain a segmentation of the input image. The physician can use the segmentation results herein to complete the subsequent steps of radiotherapy planning.
The segmentation engine 220 may use the trained DCNN to process the medical images taken from the storage device 252, originating from the database 201 or the imaging device 202, resulting in a segmentation of each image. The segmentation results can be further processed to obtain a set of final labeled images corresponding one-to-one to the input images. Ideally, the final labeled images should conform to the DICOM-RT protocol 241 for delivery to the radiotherapy planning apparatus 230, or to the radiotherapy apparatus 231 for delivery of radiotherapy.
When the segmentation engine 220 is first read into the memory 211, the training component 221 should be used to train a DCNN model on the training set images stored on the storage device 250. The training set includes a set of two-dimensional medical images accumulated in the past and the corresponding reference data labeling those images.
Referring to fig. 3, a flowchart 300 of training a DCNN for segmentation of medical images according to the present invention begins at 301:
at step 310, a set of two-dimensional medical images is acquired from the database 201 and saved to the storage device 250. Ideally, the database 201 should contain a large number of two-dimensional medical images accumulated in the past, so that the training set can be used for fully training the DCNN model;
in step 311, a set of reference data is obtained from the database 201 and stored in the storage device 250, wherein the reference data corresponds to the medical image obtained in step 310. The training component 221 will consider the reference data as an ideal segmentation of the medical images in the training set and train the DCNN model accordingly;
at step 312, a DCNN model is initialized according to the predefined framework 500;
at step 313, the pre-segmentation component 222 generates initial labeled images corresponding one-to-one to the medical images in the training set using one or more non-deep learning methods;
in step 314, the medical images in the training set are fused with the corresponding initial labeled images to obtain a set of fused images; each fused image has one channel more than the corresponding medical image, the extra channel storing the labeling information;
in step 320, the DCNN is used to process the fused images to obtain multi-channel rough score maps corresponding to the fused images as a prediction, i.e., a heuristic segmentation, of the region-of-interest layout in each image. Ideally, the rough score maps obtained here can be expanded, by upsampling and interpolation, into score maps with the same spatial size as the medical images in the training set;
in step 321, the score map generated by the DCNN is compared with the labeled image included in the reference data, and a loss function is calculated according to the difference between the two. The value of the loss function reflects the difference between the predicted effect of the DCNN in training and the ideal segmentation provided by the reference data;
at step 322, the multi-channel probability map generated by the DCNN under training is evaluated to determine whether the resulting description of the region-of-interest locations is acceptable. If the prediction is acceptable, the flow proceeds to step 330; if not, it proceeds to step 323 for one round of training. Early in training, the evaluation result is usually negative.
In step 323, the DCNN is trained by back-propagation, so that the value of the loss function for the heuristic segmentation obtained through step 320 can be reduced at the next round of evaluation. Ideally, one round of training should slightly refine the settings of the model parameters so as to obtain a reduced loss value at the next evaluation in step 322. When this round of training is complete, steps 320, 321, and 322 are repeated.
At step 330, the DCNN has been trained. At this point, the segmentation component 223 may use the model to complete the work of segmentation stage 102 in the radiotherapy workflow 100. Ideally, the parameter settings of the trained DCNN are stored in the storage device 251 so that, in subsequent use, the segmentation system 210 may invoke the segmentation component 223 on startup to reconstruct the trained DCNN model.
The training process ends at step 302.
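Flow 300 can be condensed into the following PyTorch-style sketch, assuming a model built according to framework 500 and a prepared list of fused-image/reference pairs; the names and the acceptability tolerance are hypothetical.

```python
# Condensed sketch of training flow 300 in PyTorch terms. `model`,
# `train_pairs`, and the acceptance tolerance `tol` are hypothetical.
import torch
import torch.nn as nn

def train_dcnn(model, train_pairs, lr=1e-4, tol=0.05, max_rounds=100):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()              # difference from reference data
    for _ in range(max_rounds):
        total = 0.0
        for fused, reference in train_pairs:     # fused: (1,2,H,W); reference: (1,H,W) long
            scores = model(fused)                # step 320: rough score map
            loss = loss_fn(scores, reference)    # step 321: loss function
            opt.zero_grad()
            loss.backward()                      # step 323: back-propagation
            opt.step()
            total += loss.item()
        if total / len(train_pairs) < tol:       # step 322: prediction acceptable?
            break                                # step 330: training complete
    return model
```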
The segmentation engine 220 may use a trained DCNN to segment two-dimensional medical images for radiotherapy.
The flow chart 400 of the segmentation process begins at step 401, which includes:
at step 402, the target set is retrieved and saved to the storage device 252. The target set includes medical images of the patient acquired from the imaging device 202, and the target set medical images are schematically illustrated in 411.
In step 403, the pre-segmentation component 222 reads the target set, processes the images in the target set using a set of non-deep-learning methods, and generates a set of initial labeled images. The initial labeled images correspond one-to-one to the medical images in the target set; a schematic diagram is shown in 412.
In step 404, each medical image in the target set is fused with the corresponding initial labeled image to obtain a fused image. When the medical images in the target set are two-dimensional, each fused image is a two-channel image: one channel stores the two-dimensional medical image, and the other stores its labeling information. The fused image is schematically shown in 413.
In step 405, the segmentation component 223 reads the parameter settings of the DCNN model from the storage device 251 through the process 262, and reconstructs a trained DCNN therefrom.
In step 406, the segmentation component 223 reads the fused image as input, processes the fused image using the reconstructed DCNN to generate a set of multi-channel score maps as output. Each channel of the multi-channel score map comprises a group of probability distributions for describing the prediction of the DCNN model about the position of the region of interest corresponding to the channel in the target set medical image corresponding to the map. A schematic of the multi-channel score map is shown at 414, where the coloring of each pixel reflects the score of the multi-channel score map with respect to that pixel.
In step 407, the post-segmentation component 224 takes the multi-channel score maps as input and generates multi-channel label maps in one-to-one correspondence. Each channel of a multi-channel label map gives a clearly outlined position, in the corresponding target set medical image, of the region of interest corresponding to that channel. A schematic of the multi-channel label map is shown at 415, where the coloring of each pixel corresponds to the region-of-interest number, the preference order of the numbers being determined against the scores of the multi-channel score map for that pixel.
In step 408, the channels of each multi-channel label map are fused in order, generating in one-to-one correspondence the single-channel final labeled images used to represent the segmentations of the corresponding target set medical images. Ideally, each pixel in a final labeled image is labeled with one and only one region-of-interest number. Where the regions of interest described by different channels overlap, the overlapping pixels are labeled with the region of interest having the higher preference order. All of the single-channel final labeled images taken together thus represent a segmentation of the target set medical images acquired in step 402.
The segmentation of the medical image for radiation therapy ends at step 410. The medical image and its segmentation results obtained in step 402 are put into an appropriate form and passed to the subsequent steps of the radiotherapy workflow, for example, treatment planning is performed according to the segmentation results.
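Taken together, steps 402 through 408 might be sketched as follows; `model` stands for the trained DCNN reconstructed in step 405 and `presegment` for the non-deep-learning method of step 403, both assumed to be supplied, and the final-labeling step is simplified to a per-pixel argmax.

```python
# Sketch of flow 400: pre-segment, fuse, run the reconstructed DCNN, and
# reduce the score maps to labels. `model` and `presegment` are assumed.
import numpy as np
import torch

def segment_target_set(model: torch.nn.Module, target_set, presegment):
    """target_set: iterable of 2D float arrays; returns final label maps."""
    model.eval()
    results = []
    with torch.no_grad():
        for image in target_set:                        # step 402 data
            initial = presegment(image)                 # step 403
            fused = np.stack([image, initial]).astype(np.float32)
            x = torch.from_numpy(fused).unsqueeze(0)    # step 404: (1, 2, H, W)
            scores = model(x)                           # step 406: score map
            probs = torch.softmax(scores, dim=1)        # per-channel probabilities
            results.append(probs.argmax(dim=1)[0].numpy())  # steps 407-408, simplified
    return results
```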
The architecture of the DCNN model implemented in the segmentation component 223 may vary with clinical needs. In general, a DCNN model is composed of multiple layers, each layer containing one or more successive transforms, and each transform consisting of a set of filters. The transforms share the same structure but may have different parameters, and each transform converts the image or feature map input to the layer into a single-channel feature map or image. When a layer contains multiple transforms, the input to the layer is converted into a multi-channel feature map or image as output. The DCNN model as a whole generates a set of successive feature maps by passing the fused image input at part 510 through its successive layers, finally producing a pixel-accurate multi-channel probability distribution predicting the segmentation of the target set medical image, i.e., a "multi-channel score map" 580.
Fig. 5 shows one of the DCNN architectures that may be employed by the present invention. The segmentation engine 220 may operate using a DCNN trained accordingly.
Fig. 5 shows a framework 500 comprising seventeen layers, which sequentially generate 16 multi-channel intermediate feature maps from an input two-channel fused image and finally generate a two-channel score map as the prediction of the segmentation of the input image. These layers collectively contain three kinds of transforms. The first is the convolution + ReLU transform 501; the layer that transforms feature map 531 into feature map 532 serves as an example. In this layer, a receptive field of size 3 × 3 × channel count is extracted from feature map 531 by traversal with a step length of 1, and the receptive field is processed by a convolution filter, i.e., convolved with a kernel matrix of the same size. The convolution result is then processed by the ReLU filter, and the result becomes the pixel value, in the channel corresponding to this group of filters, of the multi-channel pixel corresponding to the receptive field in the layer's output feature map. By successively applying the layer's transform to every receptive field traversed in feature map 531, a multi-channel feature map with the same spatial size as the input is obtained as feature map 532, the output of the layer. The second transform is the pooling transform 502; the layer that converts feature map 543 into feature map 551 serves as an example. In this layer, a receptive field of size 2 × 2 × channel count is extracted from feature map 543 by traversal with a step length of 2, generating feature map 551, whose spatial size is one quarter that of the input map, as the output of the layer. The third transform occurs only in the last layer and uses an upsampling method to convert the rough score map 573 into a multi-channel score map with the same spatial size as the fused image input to the DCNN, which is the output of the layer and of the DCNN. Thus, the segmentation component 223 processes an input fused image using a DCNN trained according to framework 500 to generate a multi-channel score map as output.
Assuming that the input two-dimensional medical image for radiotherapy has a size of [512 × 512 × 1] and is accompanied by an additional channel storing the initial label map of the image generated by the pre-segmentation component 222, the specific steps by which the segmentation component 223 processes the input include:
in part 510 of the DCNN,
the DCNN reads the two-channel fused image 510 as input. The image has a spatial size and channel count of [512 × 512 × 2]. One of the channels is the medical image 512, and the other channel stores the initial label map 511 of the image. When the training component 221 trains the DCNN, the medical images 512 are taken from the training set stored on the storage device 250, and the initial label map 511 is generated by the pre-segmentation component 222. When the segmentation component uses the DCNN to segment medical images for radiotherapy planning, the medical images are taken from the target set stored on the storage device 252, and the initial label map 511 is likewise generated by the pre-segmentation component 222.
Three layers of transforms are included in part 520 of the DCNN:
the first layer of part 520 includes 64 sets of convolution + ReLU filters, receiving the fused image 510 as input and generating a [512 × 512 × 64] feature map 521 as output;
the second layer of part 520 includes 64 sets of convolution + ReLU filters, receiving feature map 521 as input and generating a [512 × 512 × 64] feature map 522 as output;
the third layer of part 520 includes 128 sets of pooling filters, receiving feature map 522 as input and generating a [256 × 256 × 128] feature map 531 as output.
Two layers of transforms are included in part 530 of the DCNN:
the first layer of part 530 includes 128 sets of convolution + ReLU filters, receiving feature map 531 as input and generating a [256 × 256 × 128] feature map 532 as output;
the second layer of part 530 includes 256 sets of pooling filters, receiving feature map 532 as input and generating a [128 × 128 × 256] feature map 541 as output.
Three layers of transforms are included in part 540 of the DCNN:
the first layer of part 540 includes 256 sets of convolution + ReLU filters, receiving feature map 541 as input and generating a [128 × 128 × 256] feature map 542 as output;
the second layer of part 540 includes 256 sets of convolution + ReLU filters, receiving feature map 542 as input and generating a [128 × 128 × 256] feature map 543 as output;
the third layer of part 540 includes 512 sets of pooling filters, receiving feature map 543 as input and generating a [64 × 64 × 512] feature map 551 as output.
Three layers of transforms are included in part 550 of the DCNN:
the first layer of part 550 includes 512 sets of convolution + ReLU filters, receiving feature map 551 as input and generating a [64 × 64 × 512] feature map 552 as output;
the second layer of part 550 includes 512 sets of convolution + ReLU filters, receiving feature map 552 as input and generating a [64 × 64 × 512] feature map 553 as output;
the third layer of part 550 includes 512 sets of pooling filters, receiving feature map 553 as input and generating a [32 × 32 × 512] feature map 561 as output.
Three layers of transforms are included in part 560 of the DCNN:
the first layer of part 560 includes 512 sets of convolution + ReLU filters, receiving feature map 561 as input and generating a [32 × 32 × 512] feature map 562 as output;
the second layer of part 560 includes 512 sets of convolution + ReLU filters, receiving feature map 562 as input and generating a [32 × 32 × 512] feature map 563 as output;
the third layer of part 560 includes 4096 sets of pooling filters, receiving feature map 563 as input and generating a [16 × 16 × 4096] feature map 571 as output.
Two layers of transforms are included in part 570 of the DCNN:
the first layer of part 570 includes 4096 sets of convolution + ReLU filters, receiving feature map 571 as input and generating a [16 × 16 × 4096] feature map 572 as output;
the second layer of part 570 includes a set of convolution + ReLU filters equal in number to the number of regions of interest, receiving feature map 572 as input and generating a [16 × 16 × number of regions of interest] feature map 573 as output. The convolution + ReLU filters of this layer generate one rough score map per region of interest, in one-to-one correspondence. For example, with two regions of interest, this layer's transform generates a two-channel feature map; one channel may be used to generate, via upsampling and interpolation, one channel 581 of the score map 580, and the other to generate the other channel 582 of the score map 580.
In part 580 of the DCNN,
the rough score map 573 is received as input, and the score map 580, having the same spatial dimensions as the fused image 510, is obtained using upsampling and interpolation.
In part 590 of the DCNN,
the score map 580 is received as input, and for each multi-channel pixel in the map, the region-of-interest number corresponding to a channel is selected according to the scores to label the pixel, so that a single-channel image is obtained. Here, a score represents the probability given by the DCNN model for the judgment that "the pixel belongs to the corresponding region of interest".
Thus, the DCNN model takes the medical image 512 and the corresponding initial annotation map 511 as input, and generates a multi-channel score map 580 as a prediction for the occurrence and location of the region of interest in the medical image 512.
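A minimal PyTorch sketch loosely following framework 500 is given below. The channel progression and the five 2× poolings (512 → 16) mirror the walkthrough above; since pooling alone cannot change the channel count, each stage's channel increase is folded here into the first convolution of the following stage. This is one interpretation offered for illustration, not the patented implementation itself.

```python
# Minimal PyTorch sketch loosely following framework 500; one
# interpretation for illustration, not the patented implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_relu(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class SegmentationDCNN(nn.Module):
    def __init__(self, n_rois: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            conv_relu(2, 64), conv_relu(64, 64), nn.MaxPool2d(2),      # part 520
            conv_relu(64, 128), nn.MaxPool2d(2),                        # part 530
            conv_relu(128, 256), conv_relu(256, 256), nn.MaxPool2d(2),  # part 540
            conv_relu(256, 512), conv_relu(512, 512), nn.MaxPool2d(2),  # part 550
            conv_relu(512, 512), conv_relu(512, 512), nn.MaxPool2d(2),  # part 560
            conv_relu(512, 4096),                                       # part 570, layer 1
            nn.Conv2d(4096, n_rois, 1),                                 # rough score map 573
        )

    def forward(self, x):                      # x: (N, 2, 512, 512)
        rough = self.features(x)               # (N, n_rois, 16, 16)
        # part 580: upsample by interpolation back to the input size
        return F.interpolate(rough, size=x.shape[2:],
                             mode="bilinear", align_corners=False)

# scores = SegmentationDCNN()(torch.randn(1, 2, 512, 512))  # (1, 2, 512, 512)
```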
During training, the training component 221 compares the multi-channel score map 580 with the labeled image for the medical image 512 produced from the reference data, calculates a loss function representing the prediction quality, and then adjusts the parameter settings of the DCNN using back-propagation; the training step is repeated until the prediction quality of the DCNN is acceptable.
In actual use, the segmentation component 223 processes the medical image 512 and the initial annotation map 511 using the reconstructed DCNN model, generates a score map 580 as an output of the component, and passes it to the post-segmentation component 224. The post-segmentation component 224 receives the multi-channel score map 580 as an input, generates multi-channel labeled images in a one-to-one correspondence, and fuses the multi-channel labeled images into a final labeled image according to the scores, and outputs the final labeled image as a segmentation result of the medical image 512.
FIG. 6 shows one implementation of the invention: a two-dimensional chest medical image is segmented using the pre-segmentation component 222, the segmentation component 223, and the post-segmentation component 224 to obtain a region of interest (heart). The DCNN model used by the segmentation component 223 has been trained and can be used to identify the region of interest (heart) and the region of interest (body) in chest medical images, with the region of interest (heart) given a higher preference order than the region of interest (body). In addition, the post-segmentation component 224 uses a Grab-cut model to determine the boundary of the region of interest (heart) based on the multi-channel score map output by the segmentation component 223. The specific steps of this embodiment 600 include:
beginning with part 610, the pre-segmentation component 222 receives a two-dimensional medical image as input, generates the corresponding initial annotation map, and fuses the two to form a two-channel fused image 611 as output. Here, the fused image 611 displays the label information in an overlaid manner: the pixels labeled as region of interest (heart) are covered by a red region 613, the parts belonging to other regions inside the body are covered by a green region 614, and the blue border 612 represents the boundary given by the pre-segmentation component 222 to the range where the heart may appear, i.e., when predicting the position of the region of interest (heart), pixels outside this boundary need not be considered. Ideally, the annotations provided by the pre-segmentation component 222 ease the workload of the segmentation component 223.
At part 620, the segmentation component 223 processes the fused image 611 using a DCNN model to generate a multi-channel feature map as output. This map can be viewed as a rough score map, of relatively low accuracy, predicting the location of the region of interest (heart) in the input image 610.
At part 630, the figure is a rough score map obtained by fusing the multi-channel score maps in order. The map uses color to depict the predicted borders of the regions of interest and their preference order, i.e., the region of interest (heart) is highest, followed by the region of interest (body), with the part outside the body lowest. This figure is intended only as a schematic.
At part 640, an image is obtained by processing the rough score map 630 using upsampling and bilinear interpolation. Ideally, the pre-segmentation component 222 can already complete the segmentation of some regions of interest, for example the region of interest (body) 641. This figure is intended only as a schematic.
At part 650 is a schematic diagram of the use of a Grab-cut model for post-processing to determine the boundary of the region of interest (heart). The score map is divided by two thresholds, the regions immediately above and below each threshold being denoted by one letter each, so that one pair of letters corresponds to one threshold, namely: a-b and c-d. A pixel value in the score map, i.e., its "score", represents the probability that the pixel belongs to the region of interest (heart). For example, the pixels in region a 651 have a relatively high probability of belonging to the region of interest (heart), and thus a relatively low probability of belonging to other regions of interest; the probability for pixels in region b 652 is lower but close to the upper threshold 655; that for pixels in region c 653 is lower still and close to the lower threshold 656; and region d 654 has the lowest probability of belonging to the region of interest (heart) among the four regions. The post-segmentation component 224, implemented according to the Grab-cut model, automatically searches for a boundary within regions b-c while keeping region a intact, so that regions with similar scores lie, as far as possible, on the same side of the boundary. Ideally, the information in the initial annotation map can be used to settle the attribution of some pixels quickly; for example, the green region 614 and the pixels outside the blue border 612 do not belong to the region of interest (heart).
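The threshold-seeded search described above might be realized with OpenCV's GrabCut implementation roughly as follows; the threshold values, the mapping of regions a-d onto mask labels, and the function name are assumptions made for illustration.

```python
# Sketch: seeding OpenCV's GrabCut from score-map regions a-d. The
# thresholds `hi`/`lo` and the region-to-label mapping are hypothetical;
# image_u8 is an HxWx3 uint8 view of the medical image.
import cv2
import numpy as np

def grabcut_heart(image_u8: np.ndarray, heart_prob: np.ndarray,
                  hi: float = 0.8, lo: float = 0.2) -> np.ndarray:
    mask = np.full(heart_prob.shape, cv2.GC_PR_BGD, np.uint8)  # region d side
    mask[heart_prob > lo] = cv2.GC_PR_FGD   # regions b-c: boundary searched here
    mask[heart_prob > hi] = cv2.GC_FGD      # region a: kept intact as foreground
    # pixels known not to be heart (e.g. outside border 612) could be
    # set to cv2.GC_BGD here, using the initial annotation map
    bgd = np.zeros((1, 65), np.float64)     # internal GMM state
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_u8, mask, None, bgd, fgd, 5, cv2.GC_INIT_WITH_MASK)
    heart = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
    return heart.astype(np.uint8)           # 1 = region of interest (heart)
```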
In part 660, the post-segmentation component 224, implemented according to the Grab-cut method, generates from the multi-channel score map 640 a final labeled image including the region of interest (heart), the region of interest (the part of the body interior not belonging to the heart), and the part outside the body. The regions of interest in the figure have defined boundaries.
Thus, the pre-segmentation component 222 generates an initial annotation map from the input medical image 610 and fuses it with the medical image 610; the fused image 611 is processed by the DCNN model in the segmentation component 223 and then further processed by the Grab-cut model in the post-segmentation component 224, completing the segmentation of the region of interest (heart) in the input image 610.
FIG. 7 shows another implementation of the invention: a two-dimensional chest medical image is segmented using the pre-segmentation component 222, the segmentation component 223, and the post-segmentation component 224 to obtain a region of interest (heart). The DCNN model used by the segmentation component 223 has been trained and can be used to identify the region of interest (heart) and the region of interest (body) in chest medical images, with the region of interest (heart) given a higher preference order than the region of interest (body). In this embodiment, the post-segmentation component 224 uses a fully connected conditional random field model to determine the boundary of the region of interest (heart) based on the multi-channel score map output by the segmentation component 223. The specific steps of this embodiment 700 include:
beginning with part 710, the pre-segmentation component 222 receives a two-dimensional medical image as input, generates the corresponding initial annotation map, and fuses the two into a two-channel fused image 711 as output. Here, the fused image 711 displays the label information in an overlaid, colored manner: the pixels labeled as region of interest (heart) are covered by a red region 713, the parts belonging to other regions inside the body are covered by a green region 714, and the blue border 712 represents the boundary given by the pre-segmentation component 222 to the range where the heart may appear, i.e., when predicting the position of the region of interest (heart), pixels outside this boundary need not be considered. Ideally, the annotations provided by the pre-segmentation component 222 ease the workload of the segmentation component 223.
At part 720, the segmentation component 223 processes the fused image 711 using a DCNN model to generate a multi-channel feature map as output. This map can be viewed as a rough score map, of relatively low accuracy, predicting the location of the region of interest (heart) in the input image 710.
At part 730, the figure is a rough score map obtained by fusing the multi-channel score maps in order. The map uses color to depict the predicted borders of the regions of interest and their preference order, i.e., the region of interest (heart) is highest, followed by the region of interest (body), with the part outside the body lowest. This figure is intended only as a schematic.
At part 740, an image is obtained by processing the rough score map 730 using upsampling and bilinear interpolation. Ideally, the pre-segmentation component 222 can already complete the segmentation of some regions of interest, for example the region of interest (body) 741. This figure is intended only as a schematic.
At part 750 is a schematic diagram of the determination of the boundary of the region of interest (heart) using a fully connected conditional random field model for post-processing. When processing the pixel map with this model, the pixel map is treated as a system of particles with "spin" located at the pixel positions, and the scores of pixels in the score map are treated as the energies of these spin particles. Treating the pixel map as a spin particle system, the model computes an "energy" for it using a physics-style energy function. Ideally, the spin state distribution corresponding to the minimum of the energy function can be regarded as a segmentation of the input image 710.
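In code, this energy-minimization post-processing might be sketched with the third-party pydensecrf package (assuming it is available); all parameter values are illustrative, not tuned.

```python
# Sketch: refining a softmax score map with a fully connected CRF via the
# third-party pydensecrf package (assumed available). Parameters are
# illustrative only.
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image_u8: np.ndarray, probs: np.ndarray) -> np.ndarray:
    """image_u8: HxWx3 uint8; probs: (n_labels, H, W) softmax scores."""
    n_labels, h, w = probs.shape
    d = dcrf.DenseCRF2D(w, h, n_labels)
    d.setUnaryEnergy(unary_from_softmax(probs))    # scores become unary energies
    d.addPairwiseGaussian(sxy=3, compat=3)         # smoothness term
    d.addPairwiseBilateral(sxy=80, srgb=13,        # appearance term
                           rgbim=np.ascontiguousarray(image_u8), compat=10)
    q = d.inference(5)                             # approximate minimization
    return np.argmax(q, axis=0).reshape(h, w)      # lowest-energy labeling
```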
At part 760, the post-segmentation component 224, implemented with the fully connected conditional random field method, generates from the multi-channel score map 740 a final labeled image including the region of interest (heart), the region of interest (the part of the body interior not belonging to the heart), and the part outside the body. The regions of interest in the figure have defined boundaries. Thus, the pre-segmentation component 222 generates an initial annotation map from the input medical image 710 and fuses it with the medical image 710; the fused image 711 is processed by the DCNN model in the segmentation component 223 and then further processed by the fully connected conditional random field model in the post-segmentation component 224, completing the segmentation of the region of interest (heart) in the input image 710.
Although specific embodiments of the invention have been described herein with reference to specific examples, it will be understood by those skilled in the art that various embodiments usable in the same context can be made by substituting equivalent components or other methods in the embodiments, without departing from the innovative concepts and methods of the present invention, and that the embodiments can be adapted to the specific environment of use, the requirements, the available materials, the composition of the objects being processed, or the requirements of the workflow. Such modifications are intended to fall within the scope of the appended claims without violating or departing from the innovative concepts and methods presented herein.