CN112085067A - Method for high-throughput screening of DNA damage response inhibitor

Info

Publication number: CN112085067A
Application number: CN202010829597.6A
Authority: CN
Inventors: 王毅; 王锐; 荀德金; 陈雪纯
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2020-08-17
Filing date: 2020-08-17
Publication date: 2020-12-15
Anticipated expiration: 2040-08-17
Also published as: CN112085067B

Abstract

The invention discloses a method for high-throughput screening of DNA damage response inhibitors, comprising the following steps: S1, training a cell nucleus segmentation network model based on the U-Net network; S2, constructing and training a cell nucleus type judgment network model; S3, imaging cells treated with a DNA damage response inhibitor using high-content imaging equipment to obtain an image to be analyzed; inputting the image to be analyzed into the cell nucleus segmentation network model and then into the cell nucleus type judgment network model; and calculating the proportion of damaged cell nuclei for each DNA damage response inhibitor, where a smaller proportion of damaged nuclei indicates a more effective inhibitor. The method can automatically segment and classify, in batches, the images acquired by high-content imaging equipment, and can preliminarily screen out compounds that merit further research through statistical analysis.

Description

Method for high-throughput screening of DNA damage response inhibitor

Technical Field

The invention relates to the technical field of DNA damage, drug screening and deep learning, in particular to a method for screening a DNA damage response inhibitor in a high-throughput manner.

Background

DNA damage is caused when organisms are subjected to various endogenous and exogenous factors (e.g., reactive oxygen species, DNA replication errors, ultraviolet radiation, ionizing radiation and genotoxic agents). The accumulation of DNA damage has been shown to be closely related to organ aging and cancer progression.

Although it remains an open question whether inhibiting DNA damage or optimizing the DNA repair process slows aging in humans, evidence suggests that prevention of DNA damage and promotion of DNA repair are key therapeutic targets for age-related diseases, including vascular, metabolic and neurodegenerative diseases.

In addition, DNA damage response (DDR) inhibitors are also useful in the treatment of cancer, because tumor tissue is highly likely to accumulate DNA damage. Therefore, developing a rapid and accurate high-throughput screening method for DDR inhibitors has important academic value.

The occurrence of nuclear foci is a common indicator of DNA damage and is widely used in biodosimetry, individual radiosensitivity assessment, and toxicity assessment. Nuclear foci form when certain DDR proteins accumulate or are modified at double-strand breaks.

DDR proteins include γH2AX, 53BP1, RAD51, the MRE11/RAD50/NBS1 complex and the like. Foci can be visualized under a fluorescence microscope by immunofluorescence, immunohistochemical analysis or labeling with fluorescent proteins. In general, the number of foci is closely related to the radiation dose, and researchers can quantify DNA damage by counting foci and calculating the number of foci per nucleus or per DNA region.

Several automated methods that support batch processing are currently available, but they are not satisfactory in all situations.

Among current open-source software, FoCo has a friendly graphical user interface, but because brightness varies between individual cells and between batches in the acquisition setup, its intensity parameters must be adjusted manually, which often introduces large errors.

Focinator is an ImageJ-based macro that detects foci using only a maximum criterion, and it has limitations similar to those of FoCo.

FindFoci allows parameters to be trained manually, but manually marking foci is laborious and error-prone, especially when background interference is strong. In addition, when the cell density is high, some cell nuclei adhere to each other, and such nuclei cannot be segmented well by threshold-based segmentation.

Therefore, a method capable of processing a large amount of image data acquired by a high-content imaging platform in batch, performing image segmentation rapidly and accurately, determining whether cell nuclei are damaged, and finally performing drug screening by using statistical analysis is urgently needed.

Disclosure of Invention

The invention provides a method for screening a DNA damage response inhibitor in a high-throughput manner, which can automatically perform single cell nuclear segmentation and class decision on images acquired by high content imaging equipment in batches, and can preliminarily screen out compounds with further research value through statistical analysis.

A method for high-throughput screening of DNA damage response inhibitors comprises the following steps:

S1, training a cell nucleus segmentation network model based on the U-Net network;

S2, constructing and training a cell nucleus type judgment network model;

S3, imaging cells treated with a DNA damage response inhibitor using high-content imaging equipment to obtain an image to be analyzed; inputting the image to be analyzed into the cell nucleus segmentation network model and then into the cell nucleus type judgment network model; and calculating the proportion of damaged cell nuclei for each DNA damage response inhibitor, where a smaller proportion of damaged nuclei indicates a more effective inhibitor.

The cell nucleus segmentation model can automatically segment images captured by high-content imaging platforms of different models, adapts automatically to image differences between batches, improves the separation of adherent cell nuclei, and improves segmentation robustness when background interference is strong.

The cell nucleus segmentation model provided by the invention uses a deep learning method and adopts the U-Net network architecture, which comprises an encoder and a decoder. The encoder automatically extracts features; as the number of layers increases, the extracted features become more abstract and reflect higher-dimensional information. The input of the encoder is an image acquired by a high-content imaging platform, and its output is the extracted feature map.

The decoder gradually restores the details and spatial dimensions of the objects, and shortcut connections between the encoder and the decoder help the decoder restore target details better. The input of the decoder is the feature map extracted by the encoder, and its output is a mask image of the same size as the input image. The mask image can be used to segment individual nuclei in the image.

The cell nucleus type judgment model determines with high accuracy whether an input single-nucleus image shows damage, reduces the time consumed by manual counting, and outputs the probability of each nucleus belonging to each class to facilitate subsequent statistical analysis.

The cell nucleus type judgment model uses a deep learning method with the VGG-19 network architecture. Convolutional layers extract features and pooling layers downscale the image; after several groups of convolution and pooling, higher-dimensional feature information is obtained and finally used to classify the image. The input is an image of a single cell nucleus, and the output is the type judgment result for that nucleus.

In the method, the nucleus type judgment results are analyzed statistically and compared with the results of the control group and the positive drug: the proportion of damaged nuclei in the images acquired for each DNA damage response inhibitor is calculated, and the inhibitors are then ranked. The top-ranked compounds are selected for subsequent efficacy verification experiments, such as dose-response curve experiments and comet assays.

Compared with the prior art, the invention has the following effects:

The method for high-throughput screening of DNA damage response inhibitors can automatically process images from different sources with high accuracy, processes in batches the large amount of image data acquired by a high-content imaging platform, and provides a foundation for drug efficacy experiments.

Drawings

FIG. 1 is a general flow chart of the method for high-throughput screening of DNA damage response inhibitors according to the present invention, in which FociNet refers to the cell nucleus segmentation model and the cell nucleus type judgment model.

FIG. 2 is a schematic diagram of the network architecture of the cell nucleus segmentation model according to the present invention.

FIG. 3 is a schematic diagram of the network architecture of the cell nucleus type judgment model according to the present invention.

FIG. 4 is a flow chart of DDR drug screening using the established cell nucleus segmentation model and cell nucleus type judgment model of the present invention, in which FociNet refers to the cell nucleus segmentation model and the cell nucleus type judgment model.

Detailed Description

The technical solution of the present invention is further described in detail below with reference to the accompanying drawings and examples. The following examples are carried out on the premise of the technical solution of the invention, and detailed embodiments and processes are given, but the scope of the invention is not limited to the following examples.

As shown in FIG. 1, this example provides a method for high throughput screening of DNA damage response inhibitors.

S1, training the cell nucleus segmentation network model based on the U-Net network.

The cell nucleus segmentation network model performs image segmentation on the image data captured by the high-content imaging equipment to obtain the corresponding mask image; the mask image is then used to crop the original image into single-nucleus images.

S11, constructing a first U-Net network and a second U-Net network.

Each U-Net network comprises an encoder and a decoder with shortcut connections between them. The encoder comprises 4 to 5 sub-blocks. Every sub-block except the last comprises two convolutional layers and a pooling layer, uses ELU as the activation function, and has a Dropout layer between the two convolutional layers; the last sub-block comprises only two convolutional layers, with a Dropout layer between them and ELU as the activation function.

The encoder automatically extracts features; as the number of layers increases, the extracted features become more abstract and reflect higher-dimensional information. The input of the encoder is an image acquired by a high-content imaging platform, and its output is the extracted feature map. The decoder gradually restores the details and spatial dimensions of the objects, and the shortcut connections between the encoder and the decoder help the decoder restore target details better. The input of the decoder is the feature map extracted by the encoder, and its output is a mask image of the same size as the input image.

The number of sub-blocks in the encoder affects the segmentation performance of the model. With too few sub-blocks, the model is insufficiently trained and cannot extract higher-dimensional features; with too many, training is slow and the model accumulates redundant parameters. The number of sub-blocks is typically 4 to 5.

As shown in FIG. 2, the encoder of this embodiment comprises 5 sub-blocks, each containing a certain number of convolutional layers and pooling layers.

The first sub-block contains two convolutional layers and a pooling layer and uses ELU as the activation function. A Dropout layer is added between the two convolutional layers to randomly drop some features during training, which prevents over-fitting and increases the robustness of the model. The convolutional layers extract features, and the pooling layer downscales the image so that higher-dimensional features can be extracted.

Similarly, the second, third and fourth sub-blocks each contain two convolutional layers and a pooling layer, use ELU as the activation function, and have a Dropout layer between the two convolutional layers.

The fifth sub-block contains only two convolutional layers, again with a Dropout layer between them and ELU as the activation function.

The decoder of this embodiment comprises 4 sub-blocks, each containing a transposed convolutional layer and a shortcut connection layer. The transposed convolutional layer scales the feature map back to the previous size, and the shortcut connection layer joins the encoder feature map with the upscaled output of the transposed convolutional layer of the corresponding size; this information sharing helps the decoder restore target details better. The decoder follows the encoder. In each sub-block, the features are first upscaled by a transposed convolutional layer, then joined with the encoder feature map of the corresponding size, and finally passed through two convolutional layers with a Dropout layer between them, both using ELU as the activation function. A final convolutional layer after the fourth sub-block outputs the mask image.
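The size bookkeeping implied by this encoder and decoder layout can be checked with a short sketch. It is only an illustration under stated assumptions: the convolutions are assumed to preserve spatial size, and each pooling layer and transposed convolution is assumed to change the side length by a factor of 2; neither detail is stated in the text, and the function name `trace_unet_sizes` is hypothetical.

```python
def trace_unet_sizes(input_size=512, encoder_blocks=5, factor=2):
    """Return (encoder_sizes, decoder_sizes) as feature-map side lengths.

    Assumptions (not stated in the source): size-preserving convolutions,
    and pooling / transposed convolution that scale the side by `factor`.
    """
    encoder_sizes = []
    size = input_size
    for block in range(encoder_blocks):
        encoder_sizes.append(size)          # size seen by the two conv layers
        if block < encoder_blocks - 1:      # the last sub-block has no pooling
            if size % factor:
                raise ValueError("size is not divisible by the pooling factor")
            size //= factor
    decoder_sizes = []
    for _ in range(encoder_blocks - 1):     # one decoder sub-block per pooling
        size *= factor                      # transposed convolution upscales
        decoder_sizes.append(size)
    return encoder_sizes, decoder_sizes


enc, dec = trace_unet_sizes()
print(enc)  # [512, 256, 128, 64, 32]
print(dec)  # [64, 128, 256, 512]
```

Under these assumptions each decoder sub-block has an encoder feature map of matching size to join with, and the final output returns to the 512-pixel input size.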

Once the encoder and decoder structures are built, they learn from the input samples, that is, their parameters are optimized, yielding an encoder and decoder capable of cell nucleus segmentation. Because the images acquired by high-content imaging have no corresponding mask images and manual labeling is very time-consuming, existing datasets were first sought online, and two training sets were found.

S12, the DATA-SCIENCE-BOWL-2018 dataset is first passed through a resize function and then used to train the first U-Net network.

The DATA-SCIENCE-BOWL-2018 dataset is derived from https: com/kamalkraj/DATA-SCIENCE-BOWL-2018/tree/master/dada. It is characterized by large background differences and complex image sources, so it can be used to train a coarse network that suppresses strong background interference. The loss function is a cross-entropy loss function.

The images come from high-content imaging platforms of different sources, and image sizes differ between platforms and between the settings chosen by different experimenters, so a resize function is added before the U-Net network model. When the length and width are unequal, the resize function first crops the original image so that its length and width are equal, then scales the cropped image, so that all images are unified to a size of 512 before being input into the U-Net network model.
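The crop-then-scale behavior of the resize function might look like the following minimal sketch. The text does not say where the crop is taken or which interpolation is used, so the center crop and the nearest-neighbor sampling here are assumptions, and the helper names are hypothetical. Images are represented as plain lists of rows to keep the sketch dependency-free.

```python
def crop_to_square(image):
    """Center-crop a 2-D image (list of rows) so length and width are equal.

    The center position is an assumption; the source only says "cut".
    """
    height, width = len(image), len(image[0])
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return [row[left:left + side] for row in image[top:top + side]]


def scale_nearest(image, target):
    """Scale a square image to target x target by nearest-neighbor sampling."""
    side = len(image)
    return [
        [image[(r * side) // target][(c * side) // target] for c in range(target)]
        for r in range(target)
    ]


def resize(image, target=512):
    """Crop to a square, then scale to the unified input size."""
    return scale_nearest(crop_to_square(image), target)


# Usage: a 4 x 6 toy image is cropped to 4 x 4 and scaled up to 8 x 8.
toy = [[r * 10 + c for c in range(6)] for r in range(4)]
resized = resize(toy, target=8)
print(len(resized), len(resized[0]))
```

In practice an image library would normally do the interpolation; the sketch only makes the order of operations explicit.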

S13, the BBBC039 picture data set is subjected to a resize function and then used for training a second U-Net network.

In the BBBC039 dataset, the contrast between the background and the regions of interest is very pronounced. A network trained on this dataset alone would therefore apply poorly to images with strong background interference, but preprocessing by the network trained on the former dataset avoids this problem. At the same time, many nuclei in the BBBC039 images adhere to each other, which makes the dataset well suited to this segmentation scenario. The model trained on it can perform a finer segmentation of the image preprocessed by the first network, finally yielding the mask image of the original input image.

An image input to the cell nucleus segmentation network model first passes through the first U-Net network to obtain a first mask image; the image preprocessed by the first network then passes through the second U-Net network to obtain a second mask image; after the image to be analyzed is multiplied by the second mask image, the positions of all pixels of each connected region are obtained by a connected-component algorithm, and each connected region is then cropped out individually.

Using the mask image, a connected-component algorithm gives the positions of all pixels contained in each connected region of the image, so that each connected region can be cropped out individually. Because cell nuclei differ in size, cropping each nucleus to its own length and width would complicate the subsequent cell nucleus type judgment model. Based on prior knowledge of cell biology, each connected region (usually a single nucleus, and occasionally a small group of nuclei that could not be separated) is therefore placed in a 256 × 256 container in which all pixels outside the extracted nucleus region have the value 0. Each connected region is placed in the middle of the container. In this way each individual nucleus region is cropped from the original image.
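The connected-region extraction and the fixed-size container can be sketched as follows. This is an illustrative sketch, not the patented implementation: the use of 4-connectivity and a breadth-first search is an assumption (the text only names a connected-component algorithm), and the function names are hypothetical.

```python
from collections import deque


def connected_regions(mask):
    """Return one list of (row, col) pixels per 4-connected region of the mask."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for r0 in range(height):
        for c0 in range(width):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < height and 0 <= nc < width
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            regions.append(pixels)
    return regions


def place_in_container(image, pixels, size=256):
    """Copy one region into the middle of a zero-filled size x size container."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    top, left = min(rows), min(cols)
    region_h = max(rows) - top + 1
    region_w = max(cols) - left + 1
    if region_h > size or region_w > size:
        raise ValueError("region does not fit in the container")
    off_r, off_c = (size - region_h) // 2, (size - region_w) // 2
    container = [[0] * size for _ in range(size)]
    for r, c in pixels:
        # Only pixels inside the region are copied; everything else stays 0.
        container[r - top + off_r][c - left + off_c] = image[r][c]
    return container
```

Copying only the pixels that belong to the region has the same effect as multiplying the image by the mask before cropping, so each container holds one region on a zero background.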

S2, constructing and training the cell nucleus type judgment network model.

S21, constructing the cell nucleus type judgment network model based on the VGG-19 network.

As shown in FIG. 3, the network architecture of VGG-19 is adopted.

The number of sub-blocks affects the classification performance of the model. With too few sub-blocks, the model is insufficiently trained and cannot extract the high-dimensional features needed for the subsequent classification decision; with too many, training is slow and many redundant parameters are generated. The number of sub-blocks is usually 4 to 5.

The cell nucleus type judgment network model consists of 5 sub-blocks and 2 fully connected layers connected in sequence. Each of the first two sub-blocks comprises two convolutional layers and a pooling layer, and each of the last three sub-blocks comprises 4 convolutional layers and a pooling layer; all convolutional layers use ReLU as the activation function. The first fully connected layer uses ReLU as its activation function, and the second uses softmax.

The input of the model is an image of an individual cell nucleus cropped out using the cell nucleus segmentation model. After the 5 sub-blocks, the resulting feature map is flattened into a one-dimensional vector.

The number of fully connected layers also affects the classification performance of the model. In general, with too few layers the model is insufficiently trained and prone to under-fitting, so it cannot classify well; with too many layers it is prone to over-fitting and cannot be applied to real classification scenarios. While optimizing the network structure, we found that the features extracted by the preceding 5 sub-blocks suit our classification scenario very well, so two fully connected layers are enough to achieve good classification and no further fully connected layers were added. Once the framework is constructed, the network learns from the input samples, that is, its parameters are optimized, and training finally yields a model capable of classifying cell nuclei.
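The layer layout described above can be written out as a small sketch that lists the layers in order and tracks the spatial size. The 256-pixel input follows the container size used for single-nucleus images; the halving of the side length at each pooling layer is an assumption, and the layer names are hypothetical labels rather than identifiers from any library.

```python
def build_classifier_layout(input_size=256,
                            conv_per_block=(2, 2, 4, 4, 4),
                            dense_layers=2):
    """List the layers of the VGG-19-style classifier and the final map size.

    Assumes each pooling layer halves the side length (not stated in the text).
    """
    layers = []
    size = input_size
    for block, n_conv in enumerate(conv_per_block, start=1):
        layers += [f"block{block}_conv{i}(relu)" for i in range(1, n_conv + 1)]
        layers.append(f"block{block}_pool")
        size //= 2
    layers.append("flatten")  # feature map -> one-dimensional vector
    layers += [f"dense{i}(relu)" for i in range(1, dense_layers)]
    layers.append(f"dense{dense_layers}(softmax)")
    return layers, size


layers, final_size = build_classifier_layout()
n_conv = sum("conv" in name for name in layers)
print(n_conv, final_size, layers[-2:])
```

The five sub-blocks contain 16 convolutional layers in total, which matches the convolutional stack of the standard VGG-19; the difference in this embodiment is that only two fully connected layers follow.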

S22, obtaining the single cell image data set to train the cell nucleus type judgment network model.

The loss function is a cross entropy loss function.
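For a three-class classifier with a softmax output, the cross-entropy loss for one nucleus can be sketched as below. The class order and the example scores are assumptions made for illustration only.

```python
import math

# Assumed class order, for illustration only.
CLASSES = ("damaged", "undamaged", "no_signal")


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index, eps=1e-12):
    """Cross-entropy loss for one sample: -log of the true-class probability."""
    return -math.log(max(probs[true_index], eps))


probs = softmax([2.0, 1.0, 0.1])
loss_if_damaged = cross_entropy(probs, CLASSES.index("damaged"))
loss_if_no_signal = cross_entropy(probs, CLASSES.index("no_signal"))
print(loss_if_damaged < loss_if_no_signal)
```

The loss is small when the model assigns a high probability to the labeled class and grows as that probability falls, which is what drives the parameter updates during training.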

Images of the control group and the positive drug group captured by high-content imaging are segmented with the cell nucleus segmentation network to obtain the corresponding single-nucleus images. From these, 2000 images are manually selected, each strictly screened and reviewed, and assigned to one of three categories: damaged, undamaged and no-signal. Nuclei with a diffuse EGFP signal and no aggregated fluorescent spots, or with 1 to 4 EGFP foci, are labeled as undamaged; nuclei with more than 4 EGFP foci are labeled as damaged. Nuclei without an EGFP signal or showing pan-nuclear noise correspond to cells that do not express EGFP or are poorly illuminated, and are therefore labeled as no-signal.
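The manual labeling rule can be summarized as a small decision function. This is only a restatement of the criteria for illustration; in the method itself the labels are assigned by human annotators, and the function and parameter names are hypothetical.

```python
def label_nucleus(has_egfp_signal, foci_count, pan_nuclear_noise=False):
    """Apply the labeling criteria to one nucleus.

    - no EGFP signal or pan-nuclear noise -> "no_signal"
    - more than 4 EGFP foci               -> "damaged"
    - diffuse signal with 0 to 4 foci     -> "undamaged"
    """
    if not has_egfp_signal or pan_nuclear_noise:
        return "no_signal"
    if foci_count > 4:
        return "damaged"
    return "undamaged"


print(label_nucleus(True, 0))   # diffuse signal, no aggregated spots
print(label_nucleus(True, 4))   # boundary case: still undamaged
print(label_nucleus(True, 5))   # more than 4 foci
print(label_nucleus(False, 0))  # no EGFP signal
```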

The labeled dataset is expanded by data augmentation: each original image is rotated by 90, 180 and 270 degrees, which enlarges the final dataset to 24000 images, (2000 × 3) × (1 + 3), and the dataset is then randomly divided into a training set and a validation set at a ratio of 4:1.
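The augmentation and split arithmetic can be reproduced with a minimal sketch. It reads the expression (2000 × 3) × (1 + 3) as 2000 images in each of three classes, each kept once and rotated three times; the shuffling seed and the helper names are assumptions.

```python
import random


def rotate90(image):
    """Rotate a 2-D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image plus its 90, 180 and 270 degree rotations."""
    versions = [image]
    for _ in range(3):
        versions.append(rotate90(versions[-1]))
    return versions


def split_dataset(samples, train_parts=4, val_parts=1, seed=0):
    """Shuffle and split samples into training and validation sets (4:1)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + val_parts)
    return shuffled[:cut], shuffled[cut:]


total = (2000 * 3) * (1 + 3)
train, val = split_dataset(range(total))
print(total, len(train), len(val))
```

With these numbers the 4:1 split gives 19200 training images and 4800 validation images.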

The training set participates directly in model training and is used to adjust the model parameters. The validation set participates indirectly: after each batch of training is completed, the model is evaluated on the validation set, which is used to adjust the hyper-parameters and to make a preliminary assessment of the model's capability. In addition, 300 further single-nucleus images were labeled as a test set, which takes no part in training and is used directly to evaluate the final model. The final model reaches an accuracy of 99.03% on the training set, 99.15% on the validation set and 99.02% on the test set. The trained model can then predict the class of each single-nucleus image input subsequently.

S31, imaging cells treated with a DNA damage response inhibitor using high-content imaging equipment to obtain an image to be analyzed;

S32, inputting the image to be analyzed into the trained cell nucleus segmentation network model to obtain single-nucleus images;

S33, inputting the single-nucleus images into the trained cell nucleus type judgment network model for classification;

S34, calculating the proportion of damaged cell nuclei for each DNA damage response inhibitor, where a smaller proportion of damaged nuclei indicates a more effective inhibitor.
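The final counting and ranking step can be sketched as follows. Whether no-signal nuclei are counted in the denominator is not specified in the text, so the sketch makes it a parameter and excludes them by default as an assumption; the compound names and label lists are invented placeholders, not experimental data.

```python
from collections import Counter


def damaged_ratio(labels, exclude_no_signal=True):
    """Proportion of damaged nuclei among the classified nuclei of one compound."""
    counts = Counter(labels)
    total = counts["damaged"] + counts["undamaged"]
    if not exclude_no_signal:
        total += counts["no_signal"]
    if total == 0:
        raise ValueError("no usable nuclei for this compound")
    return counts["damaged"] / total


def rank_compounds(predictions):
    """Sort compounds by damaged ratio, smallest (most effective) first."""
    ratios = {name: damaged_ratio(labels) for name, labels in predictions.items()}
    return sorted(ratios.items(), key=lambda item: item[1])


# Placeholder predictions for three hypothetical compounds.
predictions = {
    "compound_A": ["damaged"] * 8 + ["undamaged"] * 2,
    "compound_B": ["damaged"] * 2 + ["undamaged"] * 8,
    "compound_C": ["damaged"] * 5 + ["undamaged"] * 5 + ["no_signal"] * 10,
}
for name, ratio in rank_compounds(predictions):
    print(name, ratio)
```

Compounds at the top of this ranking are the candidates that would go on to follow-up verification.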

As shown in FIG. 4, a control group, a radiation damage group and a positive drug group were selected and imaged with high-content imaging equipment. The images were segmented with the trained cell nucleus segmentation model to obtain a series of single-nucleus images, which were then input into the cell nucleus type judgment model for classification. The proportion of damaged cell nuclei across all images of each group was then calculated.

There was a significant difference between the control group and the radiation damage group: the control group had a lower proportion of damaged nuclei and the radiation damage group a higher one. After intervention with the positive drug WR-1065, the proportion of damaged nuclei was comparable to that of the control group and significantly different from that of the radiation damage group, indicating that the DNA damage response inhibitor can inhibit the DNA damage response process to a certain extent. This can then be verified further by a dose-response curve, a comet assay and the like.

Claims

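1. A method for high-throughput screening of DNA damage response inhibitors, characterized by comprising the following steps:

S1, training a cell nucleus segmentation network model based on the U-Net network;

S2, constructing and training a cell nucleus type judgment network model;

S3, imaging cells treated with a DNA damage response inhibitor using high-content imaging equipment to obtain an image to be analyzed; inputting the image to be analyzed into the cell nucleus segmentation network model and then into the cell nucleus type judgment network model; and calculating the proportion of damaged cell nuclei for each DNA damage response inhibitor, where a smaller proportion of damaged nuclei indicates a more effective inhibitor.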

2. The method for high-throughput screening of DNA damage response inhibitors according to claim 1, wherein the structure of the cell nucleus segmentation network model comprises two U-Net networks connected in series, and the image to be analyzed that is input into the cell nucleus segmentation network model first passes through the first U-Net network to obtain a first mask image; and after the image to be analyzed is multiplied by the second mask image, the positions of all pixels of each connected region are obtained by a connected-component algorithm, and each connected region is then cropped out individually.

3. The method for high-throughput screening of DNA damage response inhibitors according to claim 1, wherein the U-Net network-based cell nucleus segmentation network model is trained as follows:

S11, constructing a first U-Net network and a second U-Net network;

S12, passing the DATA-SCIENCE-BOWL-2018 dataset through a resize function and then using it to train the first U-Net network;

S13, passing the BBBC039 image dataset through a resize function and then using it to train the second U-Net network.

4. The method for high throughput screening of DNA damage response inhibitors according to claim 3, wherein the first U-Net network and the second U-Net network each comprise an encoder and a decoder, and a shortcut connection exists between the encoder and the decoder;

the structure of the encoder comprises 4-5 sub-blocks; each sub-block except the last comprises two convolutional layers and a pooling layer connected in sequence, uses ELU as the activation function, and has a Dropout layer between the two convolutional layers; the last sub-block comprises two convolutional layers, with ELU as the activation function and a Dropout layer between the two convolutional layers;

the structure of the decoder comprises 4-5 sub-blocks; each sub-block first uses a transposed convolutional layer and then connects two convolutional layers with a Dropout layer between them, both convolutional layers using ELU as the activation function; and a convolutional layer connected after the last sub-block outputs the final mask image.

5. The method for high throughput screening of DNA damage response inhibitors of claim 1, wherein said cell nucleus class determination network model is based on VGG-19 network, ResNet or DenseNet.

6. The method for high-throughput screening of DNA damage response inhibitors according to claim 1 or 5, wherein the method for constructing and training the nuclear class judgment network model comprises the following steps:

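S21, constructing the cell nucleus type judgment network model based on the VGG-19 network;

S22, obtaining a single cell image dataset to train the cell nucleus type judgment network model.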

7. The method for high-throughput screening of DNA damage response inhibitors according to claim 6, wherein obtaining the single cell image dataset to train the cell nucleus type judgment network model is performed as follows:

S221, segmenting the images captured by the high-content equipment using the cell nucleus segmentation network model trained in S1 to obtain the corresponding single-nucleus images;

S222, manually selecting 1800-2200 images of damaged cell nuclei, undamaged cell nuclei and no-signal nuclei from the single-nucleus images and labeling them;

S223, expanding the original three classes of images by data augmentation, and then dividing them into a training set and a validation set in proportion;

S224, using the training set to train the cell nucleus type judgment network model and adjust the model parameters, while the validation set participates in model training indirectly: after each batch of training is completed, the model is verified on the validation set and the hyper-parameters of the model are adjusted.

8. The method for high throughput screening of DNA damage response inhibitors according to claim 1, wherein the image to be analyzed is input into the cell nucleus segmentation network model and then input into the cell nucleus type determination network model, specifically as follows: inputting an image to be analyzed into the trained cell nucleus segmentation network model to obtain a single cell nucleus image; and then inputting the single cell nucleus image into a trained cell nucleus type judgment network model for classification decision.