CN113611367A - CRISPR/Cas9 off-target prediction method based on VAE data enhancement - Google Patents
CRISPR/Cas9 off-target prediction method based on VAE data enhancement
- Publication number
- CN113611367A CN113611367A CN202110898820.7A CN202110898820A CN113611367A CN 113611367 A CN113611367 A CN 113611367A CN 202110898820 A CN202110898820 A CN 202110898820A CN 113611367 A CN113611367 A CN 113611367A
- Authority
- CN
- China
- Prior art keywords
- data
- vae
- training
- layer
- sampling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a CRISPR/Cas9 off-target prediction method based on variational autoencoder (VAE) data enhancement, which comprises the steps of: S1, processing the training data with Pair coding; S2, pre-training the data processed in step S1 with an H-VAE model to obtain the parameters of the latent-variable distribution; S3, sampling new positive samples from the given posterior distribution using the parameters of the latent-variable distribution; S4, fusing the newly sampled positive samples with the original training data, replacing the last fully connected layer while keeping the information-extraction module of the original model, and performing joint training on the fused data; and S5, using the trained classification model to perform off-target prediction on new inputs. The invention alleviates problems such as the unstable learning caused by class-imbalanced data.
Description
Technical Field
The invention relates to the technical field of computer science, in particular to a CRISPR/Cas9 off-target prediction method based on VAE data enhancement.
Background
CRISPR/Cas9 off-target data must be obtained through biological experiments, which suffer from inherent drawbacks such as high cost, slow throughput and many uncontrollable factors. As a result, very little off-target data is available, which makes model training difficult. A further problem with CRISPR/Cas9 off-target data is the large imbalance between the numbers of positive and negative samples, which poses a serious challenge for training conventional deep learning algorithms. Models trained on such imbalanced datasets easily achieve high accuracy on the majority class, but this accuracy is of little practical value, since these models tend to perform poorly on the truly important positive class. In previous work, DeepCRISPR used oversampling, duplicating positive samples until their number matched the negatives, or generated new positive samples with the SMOTE algorithm to compensate for the shortage of positives.
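As a point of reference, the duplication-based oversampling mentioned above can be sketched in a few lines (a minimal illustration under assumed list-of-samples inputs, not the DeepCRISPR implementation):

```python
import random

def oversample_positives(samples, labels, seed=0):
    """Duplicate positive samples (label 1) at random until the
    positive class matches the negative class in size."""
    rng = random.Random(seed)
    positives = [s for s, y in zip(samples, labels) if y == 1]
    negatives = [s for s, y in zip(samples, labels) if y == 0]
    extra = [rng.choice(positives) for _ in range(len(negatives) - len(positives))]
    return samples + extra, labels + [1] * len(extra)
```

Duplication balances the class counts but adds no new information, which is exactly the gap that generation-based augmentation is meant to fill.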
Disclosure of Invention
The invention aims to provide a CRISPR/Cas9 off-target prediction method based on VAE data enhancement, so as to overcome the defects in the prior art.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a CRISPR/Cas9 off-target prediction method based on VAE data enhancement, comprising the steps of:
s1, processing the training data by using Pair codes;
s2, pre-training the data processed in the step S1 by adopting an H-VAE model to obtain parameters of hidden variable distribution;
s3, sampling a new positive sample by adopting the given posterior distribution and combining the parameters of the hidden variable distribution;
S4, fusing the newly sampled positive samples with the original training data, replacing the last fully connected layer while keeping the information-extraction module of the original model, and performing joint training on the fused data;
and S5, using the trained classification model to perform off-target prediction on new inputs.
Further, in step S1, the sgRNAs and the target DNAs in the training data are specifically paired in a one-to-one correspondence using Pair coding.
Further, the framework of the H-VAE model in step S2 comprises an Embedding layer, an Encoder layer and a Decoder layer; the Embedding layer consists of a word-embedding matrix and maps the N×24 input to an N×24×d_h tensor, which serves as the input to the Encoder layer; the Encoder layer consists of four Blocks, each Block comprising the three operations convolution - batch normalization - activation; the Decoder layer consists of four Blocks, each Block comprising the three operations deconvolution - batch normalization - activation.
Further, the training step of the H-VAE model comprises the following steps:
S21, for any batch of input samples {x_1, ..., x_n}, denoted X, the dimension of X is R^(N×24), where N is the batch size; each sample is the output of the sequence-coding module, has length 24, and covers sequence samples for the three cases of mismatch, insertion and deletion; the sample X is input to the word-embedding layer to obtain a tensor X_1 of size N×24×d_e, where d_e is the dimension of the word-embedding layer;
S22, the word-embedded tensor X_1 is passed through a series of convolution operations in the Encoder layer to obtain the mean μ and variance σ² of the posterior distribution; data are sampled from this posterior distribution, and, using the reparameterization trick, the sampling operation is converted into a sampled result that participates in the computation, with the formula:

z = μ + ξ × σ, ξ ~ N(0, I)

where ξ is a Gaussian variable with mean 0 and variance 1; sampling z from N(μ, σ²) is equivalent to sampling ξ from N(0, I) and letting z = μ + ξ × σ;
S23, after the sampled result is obtained, it is input to the Decoder layer, and the deconvolution operations yield x̂_k = g(z_k), where k indicates the sample z corresponding to each sample x, g can be regarded as the deconvolution process, and x̂_k is the reconstructed x;
S24, a reconstruction loss D(x̂_k, x_k) is adopted to constrain the generator to recover the original input data from the latent variables, where:

D(x̂_k, x_k) = ‖x̂_k − x_k‖²
further, step S24 includes adding a loss function to constrain the generator, the formula of the loss function is as follows:
where d is the dimension of the hidden variable, μ(i)Andrespectively representing the mean and variance of the ith component.
Further, in step S3, a plurality of different probability distributions is selected using the parameters of the latent-variable distribution, and positive samples are drawn from the combination of these distributions.
Compared with the prior art, the invention has the following advantages. To address the weak ability of existing models to extract base-pair matching information, the invention provides a deep learning framework based on Pair coding, so that the model can fully exploit the matching information of sgRNA-DNA base pairs; this coding can also handle off-target types other than mismatches. To address the extreme instability of model training caused by severe class imbalance, a CRISPR/Cas9 off-target prediction method based on variational autoencoder (VAE) data enhancement is proposed. After training converges, the mean and variance of the Gaussian distribution over the latent-space information of the minority class are obtained. In the data-expansion stage, random numbers from the corresponding Gaussian distribution are generated to determine the sampled latent variables; these are fed into the decoder of the variational autoencoder to generate similar samples, and the final classification model is trained on a mixture of the generated samples and the real data, thereby alleviating the learning instability caused by class-imbalanced data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flow chart of the CRISPR/Cas9 off-target prediction method based on VAE data enhancement of the present invention.
FIG. 2 is a diagram showing a method of expressing Pair sequence in the present invention.
FIG. 3 is a diagram of the H-VAE pre-training module of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings so that the advantages and features of the present invention can be more easily understood by those skilled in the art, and the scope of the present invention will be more clearly and clearly defined.
Referring to fig. 1, the present embodiment discloses a CRISPR/Cas9 off-target prediction method based on VAE data enhancement, comprising the following steps:
step S1, processing the training data by using Pair coding.
As shown in fig. 2, base sequences and text sequences are naturally similar, so a model using word-embedding representations can work very well, and word embeddings have strong representational power; the sequences are therefore represented with word embeddings. Unlike conventional methods, the two different sequences, sgRNA and DNA, are not encoded separately; instead, the sgRNA and DNA are encoded jointly as pairing information. In this embodiment, the sgRNA and the target DNA are placed in one-to-one correspondence, and, taking indels into account, 25 different base combinations are possible. By considering the matching information between the sequences, this embodiment obtains an efficient pairing representation. After the coding information is obtained, it is fed into the word-embedding layer to obtain the representation of each base pair in a high-dimensional space, giving the model a larger hypothesis space in the pre-training module and improving its expressive power.
And S2, pre-training the data processed in the step S1 by adopting an H-VAE model to obtain parameters of hidden variable distribution.
In image-generation and sequence-generation tasks, a VAE usually uses a single convolutional or recurrent neural network for encoding and decoding. Because encoding the sequence pair as an image makes the information too uniform, this embodiment enhances the expressive power of the model by learning the latent variables of the positive samples with a VAE based on hybrid word embedding and a convolutional neural network (H-VAE), thereby obtaining the parameters of the positive samples' latent-variable distribution. During training, data generated from the learned distribution then supplement the original data to alleviate the class-imbalance problem.
The pre-training framework of the H-VAE model is divided into an Embedding layer, an Encoder layer and a Decoder layer, which are introduced in turn below.
Embedding layer: the Embedding layer is composed of a word Embedding matrix, and the mapping of the input passing through the Embedding layer is changed from the input of Nx24 to Nx24xdhAs input to the Encoder layer.
Encoder layer: the Encoder layer consists of four blocks, each of which consists of three operations of convolution-batch normalization-activation function. The convolution operation is to extract data features by a convolution kernel (convolution kernel). The region inside the convolution kernel is called the "perceptual field", and the size of the perceptual field is the size of the convolution kernel. The convolution operation is divided into two steps, local aggregation and window sliding. During local aggregation, the data in the receptive field is multiplied by matrix elements by using parameters in the convolution kernel, and then added and output to a Feature map (Feature map). After local aggregation, the convolution kernel is slid to the next region, and the step size of the sliding is specified in advance.
After the convolution comes the LeakyReLU activation, which maps the result of the linear convolution non-linearly. Unlike the conventional ReLU activation function, LeakyReLU does not set values below 0 to 0 but scales them:

LeakyReLU(x) = x for x ≥ 0, and a_i · x for x < 0,

where a_i is a preset value controlling the scaling ratio. This alleviates, to a certain extent, problems such as vanishing gradients caused by ReLU.
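A one-line numerical sketch of this activation (the slope a_i = 0.01 is a common default assumed here, not a value taken from the patent):

```python
import numpy as np

def leaky_relu(x, a_i=0.01):
    """LeakyReLU: keep non-negative values, scale negative ones by a_i."""
    return np.where(x >= 0, x, a_i * x)
```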
The last operation is Batch Normalization, which normalizes the hidden-layer input so that the output lies in the non-saturated region of the activation function; this aids gradient descent and speeds up network training.
In this embodiment, the posterior distribution of the latent variable is assumed to be a normal distribution, and the goal of the Encoder layer is to learn this distribution. The following Decoder layer then restores a z sampled from p(z|x_k) to x_k. Once the posterior distribution of the latent variables is obtained, a series of samples can be drawn at random from p(z|x_k), and these samples are similar to x_k. After the four Block operations, the final output is passed through two fully connected layers, which output the mean and the variance of the posterior distribution of the latent variables.
Decoder layer: the Decoder layer also consists of four blocks, each of which consists of three operations of deconvolution-batch normalization-activation function. Because data needs to be generated at a decoding layer, the obtained hidden layer input needs to be subjected to deconvolution operation, and the batch normalization and activation functions are consistent with those of an Encoder layer.
In this embodiment, the training step of the H-VAE model includes:
S21, for a batch of input samples {x_1, ..., x_n}, denoted X as a whole, the dimension of X is R^(N×24), where N is the batch size. Each sample is the output of the sequence-coding module, has length 24, and covers sequence samples for the three cases of mismatch, insertion and deletion. X is input to the word-embedding layer to obtain a tensor X_1 of size N×24×d_e, where d_e is the dimension of the word-embedding layer.
S22, obtaining the mean value mu and the variance sigma of the posterior distribution through a series of convolution operations of Encoder layers by the tensor X1 subjected to word embedding2To get the input to the Decoder layer, the data needs to be sampled over this distribution, since the sampling operation is not conducive. To train the network, a transformation is performed using a heavy parameter technique (reparameterization technique) such that transforming the sampling operation into a sampling result participates in the computation:
whereinIs subject to a Gaussian distribution with a mean of 0 and a variance of 1, and is therefore from N (μ, σ)2) Middle sampling z, N (mu, sigma)2) Is a gaussian (normal) distribution giving mean and variance, a distribution commonly used by many models, which is equivalent to sampling ξ from N (0, I) and making z ═ μ + ξ × σ. Therefore, the original distributed sampling data is changed into a series of data sampled in N (0, I) distribution, and the result of the original distributed sampling is obtained through transformation, so that the sampling operation does not need to participate in gradient descent, and the sampling result is changed to participate, so that the model can be normally trained.
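The reparameterization step can be sketched as follows (the log-variance parameterization is a common convention assumed here, not stated in the patent):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z ~ N(mu, sigma^2) as z = mu + xi * sigma with xi ~ N(0, I),
    moving the randomness outside the parameters being trained.
    log_var is log(sigma^2)."""
    sigma = np.exp(0.5 * log_var)
    xi = rng.standard_normal(mu.shape)
    return mu + xi * sigma
```

Because ξ is drawn independently of μ and σ, gradients can flow through μ and σ while the random draw stays outside the computation being differentiated.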
S23, after obtaining the result after sampling, inputting the result into the Decoder layer because of the obtained zkIs specific to xkThus, through a series of deconvolution operations in the generator, can be obtained
S24, for the generator to learn p (x)k|zk) Similar to the AE model, reconstruction loss is requiredThe generator is constrained so that it can recover the original input data from the hidden variables. For the training of the model, the L2 distance function is chosen herein as the reconstruction loss D. In addition, unlike conventional AE models, the process of reconstructing the VAE can be noisy. If the model is optimized by simply using the reconstruction loss, the variance of the hidden variable is finally reduced to 0 by the model so as to reduce the influence of noise as much as possible, and therefore, the model is degraded into a common AE model. Thus, in addition to reconstruction loss, the VAE also tends all p (z | x) to a normal distribution, and to achieve this goal, an additional loss function, i.e., KL divergence of two normal distributions, is added in addition to the reconstruction loss:
L_KL = (1/2) Σ_{i=1}^{d} ( (μ^(i))² + (σ^(i))² − log (σ^(i))² − 1 )

where d is the dimension of the latent variable, and μ^(i) and (σ^(i))² denote the mean and variance of the i-th component. The final loss function is therefore:

L = D(x̂, x) + L_KL
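Under these definitions, the two loss terms can be computed numerically as below (a sketch; equal weighting of the two terms is assumed):

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var):
    """L2 reconstruction loss plus KL divergence of N(mu, sigma^2) from
    N(0, I), summed over latent dimensions, averaged over the batch."""
    recon = np.sum((x_hat - x) ** 2, axis=-1)
    kl = 0.5 * np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0, axis=-1)
    return np.mean(recon + kl)
```

With a perfect reconstruction and a posterior equal to N(0, I), both terms vanish and the loss is zero, which matches the role of the KL term as a pull toward the standard normal.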
and (5) stopping the machine after certain training steps until the loss value is not reduced any more. After training is completed, the mean and variance of the hidden variable distribution of the positive sample can be obtained.
Step S3, sampling new positive samples from the given posterior distribution using the parameters of the latent-variable distribution, that is: a plurality of different probability distributions is selected with the latent-variable parameters, and positive samples are drawn from the combination of these distributions, thereby alleviating the shortage of positive samples.
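This sampling step can be sketched as a draw from the mixture of per-sample Gaussians learned for the positive class (the mixture construction is one plausible reading of "combining a plurality of distributions"; decoding the latents back to sequences with the trained Decoder is omitted):

```python
import numpy as np

def sample_positive_latents(mus, log_vars, n_new, rng):
    """Draw n_new latent vectors from a mixture of the Gaussians
    N(mu_j, sigma_j^2) learned for the positive samples: each draw
    picks one component at random, then samples from it."""
    idx = rng.integers(0, len(mus), size=n_new)       # pick mixture components
    xi = rng.standard_normal((n_new, mus.shape[1]))   # xi ~ N(0, I)
    return mus[idx] + xi * np.exp(0.5 * log_vars[idx])
```

The returned latent vectors would then be fed through the trained Decoder, and the generated samples mixed into each training batch.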
And step S4, fusing the newly sampled positive sample with the previous training data, replacing the last full connection layer on the basis of keeping the information extraction module of the original information model, and performing combined training by using the fused data.
Specifically, in this embodiment, after the H-VAE pre-training is complete, in order to train the final CRISPR/Cas9 off-target prediction task, the last fully connected layer is replaced while the information-extraction module of the original model is retained, so that the model can predict the off-target activity of CRISPR/Cas9. Meanwhile, during the training of each batch, generated samples drawn from the positive-sample distribution obtained in pre-training are added for joint training.
Step S5, using the trained classification model to perform off-target prediction on new inputs.
Specifically, this embodiment uses the finally trained model obtained from the preceding steps, combined with manual features, to process and predict new data.
To address the weak ability of existing models to extract base-pair matching information, the invention provides a deep learning framework based on Pair coding, so that the model can fully exploit the matching information of sgRNA-DNA base pairs; this coding can also handle off-target types other than mismatches. To address the extreme instability of model training caused by severe class imbalance, a CRISPR/Cas9 off-target prediction method based on variational autoencoder (VAE) data enhancement is proposed. After training converges, the mean and variance of the Gaussian distribution over the latent-space information of the minority class are obtained. In the data-expansion stage, random numbers from the corresponding Gaussian distribution are generated to determine the sampled latent variables; these are fed into the decoder of the variational autoencoder to generate similar samples, and the final classification model is trained on a mixture of the generated samples and the real data, thereby alleviating the learning instability caused by class-imbalanced data.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, the patentee may make various changes or modifications within the scope of the appended claims; as long as such changes do not exceed the scope of the invention described in the claims, they shall fall within the protection scope of the invention.
Claims (6)
1. A CRISPR/Cas9 off-target prediction method based on VAE data enhancement is characterized by comprising the following steps:
s1, processing the training data by using Pair codes;
s2, pre-training the data processed in the step S1 by adopting an H-VAE model to obtain parameters of hidden variable distribution;
s3, sampling a new positive sample by adopting the given posterior distribution and combining the parameters of the hidden variable distribution;
S4, fusing the newly sampled positive samples with the original training data, replacing the last fully connected layer while keeping the information-extraction module of the original model, and performing joint training on the fused data;
and S5, using the trained classification model to perform off-target prediction on new inputs.
2. The CRISPR/Cas9 off-target prediction method based on VAE data enhancement according to claim 1, wherein in step S1, the sgRNAs and the target DNAs in the training data are specifically paired in a one-to-one correspondence using Pair coding.
3. The CRISPR/Cas9 off-target prediction method based on VAE data enhancement according to claim 1, wherein the framework of the H-VAE model in step S2 comprises an Embedding layer, an Encoder layer and a Decoder layer; the Embedding layer consists of a word-embedding matrix and maps the N×24 input to an N×24×d_h tensor, which serves as the input to the Encoder layer; the Encoder layer consists of four Blocks, each Block comprising the three operations convolution - batch normalization - activation; the Decoder layer consists of four Blocks, each Block comprising the three operations deconvolution - batch normalization - activation.
4. The CRISPR/Cas9 off-target prediction method based on VAE data enhancement according to claim 3, wherein the training step of the H-VAE model comprises:
S21, for any batch of input samples {x_1, ..., x_n}, denoted X, the dimension of X is R^(N×24), where N is the batch size; each sample is the output of the sequence-coding module, has length 24, and covers sequence samples for the three cases of mismatch, insertion and deletion; the sample X is input to the word-embedding layer to obtain a tensor X_1 of size N×24×d_e, where d_e is the dimension of the word-embedding layer;
S22, the word-embedded tensor X_1 is passed through a series of convolution operations in the Encoder layer to obtain the mean μ and variance σ² of the posterior distribution; data are sampled from this posterior distribution, and, using the reparameterization trick, the sampling operation is converted into a sampled result that participates in the computation, with the formula:

z = μ + ξ × σ, ξ ~ N(0, I)

where ξ is a Gaussian variable with mean 0 and variance 1; sampling z from N(μ, σ²) is equivalent to sampling ξ from N(0, I) and letting z = μ + ξ × σ;
S23, after the sampled result is obtained, it is input to the Decoder layer, and the deconvolution operations yield x̂_k = g(z_k), where k indicates the sample z corresponding to each sample x, g can be regarded as the deconvolution process, and x̂_k is the reconstructed x;
S24, a reconstruction loss D(x̂_k, x_k) is adopted to constrain the generator to recover the original input data from the latent variables, where:

D(x̂_k, x_k) = ‖x̂_k − x_k‖²
5. The CRISPR/Cas9 off-target prediction method based on VAE data enhancement according to claim 4, further comprising, in step S24, adding a loss function to constrain the generator, the loss function being:

L_KL = (1/2) Σ_{i=1}^{d} ( (μ^(i))² + (σ^(i))² − log (σ^(i))² − 1 )

where d is the dimension of the latent variable, and μ^(i) and (σ^(i))² denote the mean and variance of the i-th component.
6. The CRISPR/Cas9 off-target prediction method based on VAE data enhancement according to claim 1, wherein step S3 specifically comprises selecting a plurality of different probability distributions using the parameters of the latent-variable distribution, and sampling positive samples from the combination of these distributions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110898820.7A CN113611367B (en) | 2021-08-05 | 2021-08-05 | CRISPR/Cas9 off-target prediction method based on VAE data enhancement |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110898820.7A CN113611367B (en) | 2021-08-05 | 2021-08-05 | CRISPR/Cas9 off-target prediction method based on VAE data enhancement |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113611367A true CN113611367A (en) | 2021-11-05 |
CN113611367B CN113611367B (en) | 2022-12-13 |
Family
ID=78307284
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110898820.7A Active CN113611367B (en) | 2021-08-05 | 2021-08-05 | CRISPR/Cas9 off-target prediction method based on VAE data enhancement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113611367B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114334007A (en) * | 2022-01-20 | 2022-04-12 | 腾讯科技(深圳)有限公司 | Gene off-target prediction model training method, prediction method, device and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110070912A (en) * | 2019-04-15 | 2019-07-30 | 桂林电子科技大学 | A kind of prediction technique of CRISPR/Cas9 undershooting-effect |
CN111261223A (en) * | 2020-01-12 | 2020-06-09 | 湖南大学 | CRISPR off-target effect prediction method based on deep learning |
CN111258992A (en) * | 2020-01-09 | 2020-06-09 | 电子科技大学 | Seismic data expansion method based on variational self-encoder |
US20200226475A1 (en) * | 2019-01-14 | 2020-07-16 | Cambia Health Solutions, Inc. | Systems and methods for continual updating of response generation by an artificial intelligence chatbot |
CN111581962A (en) * | 2020-05-14 | 2020-08-25 | 福州大学 | Text representation method based on subject word vector and hybrid neural network |
CN111613267A (en) * | 2020-05-21 | 2020-09-01 | 中山大学 | CRISPR/Cas9 off-target prediction method based on attention mechanism |
CN111782799A (en) * | 2020-06-30 | 2020-10-16 | 湖南大学 | Enhanced text abstract generation method based on replication mechanism and variational neural reasoning |
Non-Patent Citations (4)
Title |
---|
GAO Y et al.: "Data imbalance in CRISPR off-target", Briefings in Bioinformatics *
LIN J et al.: "Off-target predictions in CRISPR-Cas9 gene editing using deep", Bioinformatics *
ZHANG Guishan et al.: "Application of machine learning methods in the CRISPR/Cas9 system", Hereditas (Beijing) *
XU Haibo: "Machine-learning-based prediction of off-target effects and targeting efficiency of the CRISPR/Cas9 system", China Master's Theses Full-text Database *
Also Published As
Publication number | Publication date |
---|---|
CN113611367B (en) | 2022-12-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110765966B (en) | One-stage automatic recognition and translation method for handwritten characters | |
CN107506823B (en) | Construction method of hybrid neural network model for dialog generation | |
CN106650813A (en) | Image understanding method based on deep residual network and LSTM | |
Zhang et al. | Unsupervised representation learning from pre-trained diffusion probabilistic models | |
CN101310294A (en) | Method for training neural networks | |
CN110060657B (en) | SN-based many-to-many speaker conversion method | |
CN114443827A (en) | Local information perception dialogue method and system based on pre-training language model | |
CN111125333B (en) | Generative knowledge question answering method based on representation learning and a multi-layer coverage mechanism | |
CN111898689A (en) | Image classification method based on neural network architecture search | |
CN113822054A (en) | Chinese grammar error correction method and device based on data enhancement | |
Krokotsch et al. | Improving semi-supervised learning for remaining useful lifetime estimation through self-supervision | |
Chen et al. | Learning multiscale consistency for self-supervised electron microscopy instance segmentation | |
Wehenkel et al. | Diffusion priors in variational autoencoders | |
CN116740223A (en) | Method for generating image based on text | |
CN114170461A (en) | Teacher-student framework image classification method containing noise labels based on feature space reorganization | |
CN113611367B (en) | CRISPR/Cas9 off-target prediction method based on VAE data enhancement | |
EP4196918A1 (en) | System and method for generating parametric activation functions | |
Sarrouti | NLM at VQA-Med 2020: Visual Question Answering and Generation in the Medical Domain. | |
CN116955644A (en) | Knowledge fusion method, system and storage medium based on knowledge graph | |
Londt et al. | Evolving character-level densenet architectures using genetic programming | |
CN113204640B (en) | Text classification method based on attention mechanism | |
CN114757177B (en) | Text summarization method for generating network based on BART fusion pointer | |
CN110399619A (en) | Positional encoding method and computer storage medium for neural machine translation | |
CN115101122A (en) | Protein processing method, apparatus, storage medium, and computer program product | |
CN114548293A (en) | Video-text cross-modal retrieval method based on cross-granularity self-distillation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||