
CN113611367A - CRISPR/Cas9 off-target prediction method based on VAE data enhancement - Google Patents

CRISPR/Cas9 off-target prediction method based on VAE data enhancement

Info

Publication number
CN113611367A
CN113611367A (application number CN202110898820.7A)
Authority
CN
China
Prior art keywords
data
vae
training
layer
sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110898820.7A
Other languages
Chinese (zh)
Other versions
CN113611367B (en)
Inventor
彭绍亮
向伟铭
陈东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202110898820.7A priority Critical patent/CN113611367B/en
Publication of CN113611367A publication Critical patent/CN113611367A/en
Application granted granted Critical
Publication of CN113611367B publication Critical patent/CN113611367B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B 40/00 - ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Bioethics (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Public Health (AREA)
  • Biotechnology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Machine Translation (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a CRISPR/Cas9 off-target prediction method based on variational autoencoder (VAE) data enhancement, which comprises the steps of: S1, processing training data by using Pair coding; S2, pre-training the data processed in step S1 with an H-VAE model to obtain the parameters of the hidden-variable distribution; S3, sampling new positive samples from the given posterior distribution combined with the parameters of the hidden-variable distribution; S4, fusing the newly sampled positive samples with the previous training data, replacing the last fully connected layer while keeping the information extraction module of the original model, and performing joint training with the fused data; and S5, using the trained classification model to perform off-target prediction on new input tasks. The invention alleviates problems such as unstable learning caused by class-imbalanced data.

Description

CRISPR/Cas9 off-target prediction method based on VAE data enhancement
Technical Field
The invention relates to the technical field of computer science, in particular to a CRISPR/Cas9 off-target prediction method based on VAE data enhancement.
Background
CRISPR/Cas9 off-target data must be obtained through biological experiments, which have inherent disadvantages such as high cost, slow speed and many uncontrollable factors. As a result, very little CRISPR/Cas9 off-target data is available, which makes model training difficult. A further problem with CRISPR/Cas9 off-target data is that the numbers of positive and negative samples differ enormously, which makes training conventional deep learning algorithms very challenging. Conventional models trained on such unbalanced datasets easily achieve high accuracy on the majority class, but this high accuracy has little practical value, since such models tend to perform poorly on the classification accuracy of the truly important positive samples. In previous research, DeepCRISPR adopted an oversampling method, copying positive samples until their number matches the number of negative samples, or generating new positive-sample data with the SMOTE algorithm to compensate for the shortage of positive samples.
Disclosure of Invention
The invention aims to provide a CRISPR/Cas9 off-target prediction method based on VAE data enhancement, so as to overcome the defects in the prior art.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a CRISPR/Cas9 off-target prediction method based on VAE data enhancement, comprising the steps of:
S1, processing the training data by using Pair coding;
S2, pre-training the data processed in step S1 with an H-VAE model to obtain the parameters of the hidden-variable distribution;
S3, sampling new positive samples using the given posterior distribution combined with the parameters of the hidden-variable distribution;
S4, fusing the newly sampled positive samples with the previous training data, replacing the last fully connected layer while keeping the information extraction module of the original model, and performing joint training with the fused data;
and S5, using the trained classification model to perform off-target prediction on new input tasks.
Further, in step S1, Pair coding is specifically used to process the sgRNAs and the target DNAs in the training data as one-to-one corresponding pairs.
Further, the framework of the H-VAE model in step S2 includes an Embedding layer, an Encoder layer and a Decoder layer; wherein the Embedding layer is composed of a word embedding matrix, and the input mapped through the Embedding layer changes from N×24 to N×24×d_h, which serves as the input to the Encoder layer; the Encoder layer consists of four Blocks, each Block consisting of the three operations convolution, batch normalization and activation function; the Decoder layer consists of four Blocks, each Block consisting of the three operations deconvolution, batch normalization and activation function.
Further, the training step of the H-VAE model comprises the following steps:
S21, for any batch of input samples {x_1, ..., x_n}, denoted by X, the dimension of X is R^(N×24), where N is the batch size; each sample is the length-24 output of the sequence coding module and covers the three cases of mismatch, insertion and deletion; the sample X is input into the word embedding layer to obtain a tensor X_1 of dimension R^(N×24×d_e), where d_e is the dimension of the word embedding layer;
S22, the word-embedded tensor X_1 passes through a series of convolution operations in the Encoder layer to obtain the mean μ and variance σ² of the posterior distribution; data are sampled from this posterior distribution, and the reparameterization trick converts the sampling operation into a sampled result that participates in the computation:

z = μ + ξ ⊙ σ

where ξ ~ N(0, I) is a Gaussian distribution with mean 0 and variance 1; sampling z from N(μ, σ²) is thus equivalent to sampling ξ from N(0, I) and letting z = μ + ξ × σ;
S23, the sampled result is input into the Decoder layer, and the deconvolution operations yield x̂_k = g(z_k), where k indicates the sample z corresponding to each sample x and g can be regarded as the deconvolution process; x̂_k is the reconstructed x;
S24, a reconstruction loss D(x̂_k, x_k) is adopted to constrain the generator to recover the original input data from the hidden variables; in the training computation, the L2 distance is used:

D(x̂_k, x_k) = ||x_k - x̂_k||².
further, step S24 includes adding a loss function to constrain the generator, the formula of the loss function is as follows:
Figure BDA00031989641100000210
where d is the dimension of the hidden variable, μ(i)And
Figure BDA0003198964110000031
respectively representing the mean and variance of the ith component.
Further, in step S3, a plurality of different probability distributions are selected by using the parameters of the hidden variable distribution, and the plurality of different probability distributions are combined to sample the positive sample.
Compared with the prior art, the invention has the following advantages. Aiming at the weak ability of existing models to extract base-pair matching information, the invention provides a deep learning framework based on Pair coding, so that the model can make full use of the matching information of sgRNA-DNA base pairs; this coding scheme can also handle off-target types other than mismatches. Aiming at the extremely unstable model training caused by extreme class imbalance in the data, a CRISPR/Cas9 off-target prediction method based on variational autoencoder (VAE) data enhancement is provided. After training converges, the mean and variance of the Gaussian distribution of the hidden-space information of the minority class can be obtained. In the data expansion stage, random numbers of the corresponding Gaussian distribution are generated to determine the sampling variables; the sampling variables are input into the decoder of the variational autoencoder to generate similar samples, and the final classification model performs mixed training on the generated samples and the real data, thereby alleviating the learning instability and other problems caused by class-imbalanced data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flow chart of the CRISPR/Cas9 off-target prediction method based on VAE data enhancement of the present invention.
FIG. 2 is a diagram of the Pair sequence representation method of the present invention.
FIG. 3 is a diagram of the H-VAE pre-training module of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings so that the advantages and features of the present invention can be more easily understood by those skilled in the art, and the scope of the present invention will be more clearly and clearly defined.
Referring to fig. 1, the present embodiment discloses a CRISPR/Cas9 off-target prediction method based on VAE data enhancement, comprising the following steps:
step S1, processing the training data by using Pair coding.
As shown in fig. 2, base sequences and text sequences have a natural similarity, so a model using word-embedding representations can achieve very good results, and the word-embedding method has strong representation capability; the sequences are therefore represented by word embedding. Unlike conventional methods, the two different sequences, sgRNA and DNA, are not encoded separately; instead, the sgRNA and DNA are encoded together as pairing information. In this embodiment, the sgRNAs and the target DNAs correspond one to one, and 25 different base combinations can be obtained when indels are taken into account. By considering the matching information between the sequences, this embodiment obtains an efficient pairing representation. After the coding information is obtained, it is input into the word embedding layer to obtain the representation of each base pair in a high-dimensional space, so that the model has a larger hypothesis space in the pre-training module, improving its representation capability.
And S2, pre-training the data processed in the step S1 by adopting an H-VAE model to obtain parameters of hidden variable distribution.
In image generation and sequence generation tasks, a VAE usually uses a single convolutional neural network or recurrent neural network for encoding and decoding. Since image-style encoding captures only limited information about a sequence pair, in order to enhance the representation capability of the model, this embodiment learns the hidden variables of the positive samples using a VAE model based on mixed word embedding and a convolutional neural network (H-VAE), thereby obtaining the parameters of the hidden-variable distribution of the positive samples. Data generated from the learned distribution are then used during training as a supplement to the original data to alleviate the class imbalance problem.
The pre-training framework of the H-VAE model is divided into an Embedding layer, an Encoder layer and a Decoder layer, which are introduced below.
Embedding layer: the Embedding layer is composed of a word embedding matrix, and the input mapped through the Embedding layer changes from N×24 to N×24×d_h, which serves as the input to the Encoder layer.
Encoder layer: the Encoder layer consists of four Blocks, each of which consists of the three operations convolution, batch normalization and activation function. The convolution operation extracts data features with a convolution kernel. The region covered by the convolution kernel is called the receptive field, and the size of the receptive field is the size of the kernel. The convolution operation is divided into two steps: local aggregation and window sliding. During local aggregation, the data in the receptive field are multiplied elementwise by the parameters of the convolution kernel, then summed and output to a feature map. After local aggregation, the convolution kernel slides to the next region, with a step size specified in advance.
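The two steps described here, local aggregation within the receptive field followed by window sliding, can be illustrated with a minimal 1-D convolution (an illustration only; the patent's kernels operate on embedded sequence tensors):

```python
import numpy as np

def conv1d(x, kernel, stride=1):
    """Valid 1-D convolution: multiply the receptive field elementwise by the
    kernel parameters (local aggregation), sum into the feature map, then
    slide the window forward by `stride`."""
    k = len(kernel)
    out = []
    for start in range(0, len(x) - k + 1, stride):
        out.append(float(np.sum(x[start:start + k] * kernel)))
    return np.array(out)

# A length-3 kernel over a length-4 input yields a length-2 feature map.
feature_map = conv1d(np.array([1.0, 2.0, 3.0, 4.0]),
                     np.array([1.0, 0.0, -1.0]))
```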
After the convolution operation, the next operation is the LeakyReLU activation, which non-linearly maps the result of the linear convolution transformation. Unlike the conventional ReLU activation function, LeakyReLU does not set values less than 0 to 0, but scales them, which alleviates to some extent problems such as vanishing gradients caused by ReLU:

f(x) = x, if x >= 0; f(x) = a_i × x, if x < 0

where a_i is a preset value that controls the scaling ratio.
The last operation is Batch Normalization. Batch normalization normalizes the hidden-layer input so that the resulting output lies in the non-saturated region of the activation function, which aids gradient descent and speeds up network training.
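The two operations just described might be sketched as follows (the slope a_i and the ε stabilizer are illustrative values; the learnable scale and shift parameters of batch normalization are omitted for brevity):

```python
import numpy as np

def leaky_relu(x, a_i=0.01):
    """LeakyReLU: keep positive inputs, scale negative inputs by a_i
    instead of zeroing them as ReLU would."""
    return np.where(x >= 0, x, a_i * x)

def batch_norm(x, eps=1e-5):
    """Normalize a batch to zero mean and unit variance per feature, so
    activations stay in the non-saturated region of the activation."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

h = batch_norm(np.array([[1.0, 2.0], [3.0, 6.0]]))
y = leaky_relu(np.array([-2.0, 0.5]))  # negative input scaled, not zeroed
```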
In this embodiment, the posterior distribution of the hidden variable is assumed to be a normal distribution, and the goal of the Encoder layer is to learn this distribution, while the following Decoder layer reduces the z obtained by sampling from p(z|x_k) back into x_k. If a posterior distribution of the hidden variables can be obtained, a series of samples can be drawn at random from p(z|x_k), and these samples are similar to x_k. After the four Block operations, the final output passes through two fully connected layers, which output the mean and variance of the posterior distribution of the hidden variables.
Decoder layer: the Decoder layer also consists of four Blocks, each of which consists of the three operations deconvolution, batch normalization and activation function. Because data need to be generated at the decoding layer, the hidden-layer input is processed by deconvolution; the batch normalization and activation functions are the same as in the Encoder layer.
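The Embedding/Encoder/Decoder layout described above can be summarized as a small configuration sketch (a reading of the text, not the patent's code; the op order inside a Block follows the "convolution-batch normalization-activation" listing, and all shapes are symbolic):

```python
# Shape flow of the H-VAE as described: N x 24 pair-token IDs -> word
# embedding (N x 24 x d_e) -> 4 encoder Blocks -> two fully connected
# heads (mean, variance) -> reparameterized sample z -> 4 decoder Blocks.
H_VAE_SKETCH = {
    "embedding": {"in_shape": ("N", 24), "out_shape": ("N", 24, "d_e")},
    "encoder": [["conv", "batch_norm", "leaky_relu"]] * 4,
    "latent_heads": ["mu", "log_var"],  # one fully connected head each
    "decoder": [["deconv", "batch_norm", "leaky_relu"]] * 4,
}
```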
In this embodiment, the training step of the H-VAE model includes:
S21, for a batch of input samples {x_1, ..., x_n}, denoted as a whole by X, the dimension of X is R^(N×24), where N is the batch size. Each sample is the output obtained after the sequence coding module, has length 24, and covers the three cases of mismatch, insertion and deletion. Inputting X into the word embedding layer yields a tensor X_1 of dimension R^(N×24×d_e), where d_e is the dimension of the word embedding layer.
S22, obtaining the mean value mu and the variance sigma of the posterior distribution through a series of convolution operations of Encoder layers by the tensor X1 subjected to word embedding2To get the input to the Decoder layer, the data needs to be sampled over this distribution, since the sampling operation is not conducive. To train the network, a transformation is performed using a heavy parameter technique (reparameterization technique) such that transforming the sampling operation into a sampling result participates in the computation:
Figure BDA0003198964110000052
wherein
Figure BDA0003198964110000053
Is subject to a Gaussian distribution with a mean of 0 and a variance of 1, and is therefore from N (μ, σ)2) Middle sampling z, N (mu, sigma)2) Is a gaussian (normal) distribution giving mean and variance, a distribution commonly used by many models, which is equivalent to sampling ξ from N (0, I) and making z ═ μ + ξ × σ. Therefore, the original distributed sampling data is changed into a series of data sampled in N (0, I) distribution, and the result of the original distributed sampling is obtained through transformation, so that the sampling operation does not need to participate in gradient descent, and the sampling result is changed to participate, so that the model can be normally trained.
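The reparameterization step z = μ + ξ × σ can be illustrated numerically (a sketch, not the patent's implementation; the random draw ξ sits outside the differentiable path):

```python
import numpy as np

def reparameterize(mu, sigma, rng):
    """Sample z ~ N(mu, sigma^2) as z = mu + xi * sigma with xi ~ N(0, I),
    so only the deterministic transform carries gradients."""
    xi = rng.standard_normal(mu.shape)
    return mu + xi * sigma

rng = np.random.default_rng(0)
mu = np.zeros(10000)
sigma = np.full(10000, 2.0)
z = reparameterize(mu, sigma, rng)
# empirically, z has mean near 0 and standard deviation near 2
```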
S23, after obtaining the result after sampling, inputting the result into the Decoder layer because of the obtained zkIs specific to xkThus, through a series of deconvolution operations in the generator, can be obtained
Figure BDA0003198964110000054
S24, for the generator to learn p (x)k|zk) Similar to the AE model, reconstruction loss is required
Figure BDA0003198964110000055
The generator is constrained so that it can recover the original input data from the hidden variables. For the training of the model, the L2 distance function is chosen herein as the reconstruction loss D. In addition, unlike conventional AE models, the process of reconstructing the VAE can be noisy. If the model is optimized by simply using the reconstruction loss, the variance of the hidden variable is finally reduced to 0 by the model so as to reduce the influence of noise as much as possible, and therefore, the model is degraded into a common AE model. Thus, in addition to reconstruction loss, the VAE also tends all p (z | x) to a normal distribution, and to achieve this goal, an additional loss function, i.e., KL divergence of two normal distributions, is added in addition to the reconstruction loss:
Figure BDA0003198964110000061
where d is the dimension of the hidden variable, and μ^(i) and (σ^(i))² respectively represent the mean and variance of the ith component. The final loss function is therefore:

L = ||x_k - x̂_k||² + L_KL
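Under the formulas above, the two loss terms could be computed as follows (a sketch; summing the KL term over latent components as in the formula, with no batch averaging assumed):

```python
import numpy as np

def kl_to_standard_normal(mu, var):
    """KL( N(mu, var) || N(0, I) ) summed over the d latent components:
    0.5 * sum(mu^2 + var - log(var) - 1)."""
    return 0.5 * np.sum(mu ** 2 + var - np.log(var) - 1.0)

def recon_loss(x, x_hat):
    """L2 reconstruction loss ||x - x_hat||^2."""
    return float(np.sum((x - x_hat) ** 2))

# The KL term vanishes exactly when the posterior already equals N(0, I).
zero_kl = kl_to_standard_normal(np.zeros(4), np.ones(4))
total = recon_loss(np.array([1.0, 2.0]), np.array([1.0, 1.0])) + zero_kl
```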
and (5) stopping the machine after certain training steps until the loss value is not reduced any more. After training is completed, the mean and variance of the hidden variable distribution of the positive sample can be obtained.
Step S3, sampling new positive samples using the given posterior distribution combined with the parameters of the hidden-variable distribution, that is: selecting a plurality of different probability distributions using the hidden-variable parameters and combining them to sample positive samples, thereby alleviating the shortage of positive samples.
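As a sketch of this sampling step, latent vectors drawn from the learned parameters are decoded into synthetic positive samples; the identity `decode` below stands in for the trained Decoder and is purely illustrative:

```python
import numpy as np

def augment_positives(mu, var, n_new, decode, rng):
    """Draw n_new latent vectors from N(mu, var) and decode each into a
    synthetic positive sample to supplement the minority class."""
    sigma = np.sqrt(var)
    samples = []
    for _ in range(n_new):
        z = mu + rng.standard_normal(mu.shape) * sigma
        samples.append(decode(z))
    return np.stack(samples)

rng = np.random.default_rng(1)
new_pos = augment_positives(np.zeros(8), np.ones(8), n_new=50,
                            decode=lambda z: z, rng=rng)
# 50 synthetic samples, one per drawn latent vector of dimension 8
```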
Step S4, the newly sampled positive samples are fused with the previous training data; the last fully connected layer is replaced while keeping the information extraction module of the original model, and joint training is performed with the fused data.
Specifically, in this embodiment, after the H-VAE pre-training is completed, in order to train the final CRISPR/Cas9 off-target prediction task, the last fully connected layer is replaced while the information extraction module of the original model is retained, so that the model can predict the off-target activity of CRISPR/Cas9. Meanwhile, during the training of each batch, generated samples drawn from the positive-sample distribution obtained in pre-training are added for joint training.
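The per-batch fusion in this step can be sketched as follows (a minimal illustration; the ratio of generated to real samples and the shuffling are assumptions):

```python
import numpy as np

def fuse_batch(real_x, real_y, gen_x, rng):
    """Append generated positive samples (label 1) to a real batch and
    shuffle, so each training step sees a more balanced class mix."""
    x = np.concatenate([real_x, gen_x])
    y = np.concatenate([real_y, np.ones(len(gen_x))])
    order = rng.permutation(len(x))
    return x[order], y[order]

rng = np.random.default_rng(2)
# 6 real negatives plus 2 generated positives -> a fused batch of 8
x, y = fuse_batch(np.zeros((6, 4)), np.zeros(6), np.ones((2, 4)), rng)
```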
Step S5, the trained classification model is used to perform off-target prediction on new input tasks.
Specifically, this embodiment uses the final model trained in the previous steps, combined with handcrafted features, to process and predict new data.
Aiming at the weak ability of existing models to extract base-pair matching information, the invention provides a deep learning framework based on Pair coding, so that the model can make full use of the matching information of sgRNA-DNA base pairs; this coding scheme can also handle off-target types other than mismatches. Aiming at the extremely unstable model training caused by extreme class imbalance in the data, a CRISPR/Cas9 off-target prediction method based on variational autoencoder (VAE) data enhancement is provided. After training converges, the mean and variance of the Gaussian distribution of the hidden-space information of the minority class can be obtained. In the data expansion stage, random numbers of the corresponding Gaussian distribution are generated to determine the sampling variables; the sampling variables are input into the decoder of the variational autoencoder to generate similar samples, and the final classification model performs mixed training on the generated samples and the real data, thereby alleviating the learning instability and other problems caused by class-imbalanced data.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, various changes or modifications may be made by the patentees within the scope of the appended claims, as long as they do not exceed the scope of the invention described in the claims.

Claims (6)

1. A CRISPR/Cas9 off-target prediction method based on VAE data enhancement is characterized by comprising the following steps:
S1, processing the training data by using Pair coding;
S2, pre-training the data processed in step S1 with an H-VAE model to obtain the parameters of the hidden-variable distribution;
S3, sampling new positive samples using the given posterior distribution combined with the parameters of the hidden-variable distribution;
S4, fusing the newly sampled positive samples with the previous training data, replacing the last fully connected layer while keeping the information extraction module of the original model, and performing joint training with the fused data;
and S5, using the trained classification model to perform off-target prediction on new input tasks.
2. The CRISPR/Cas9 off-target prediction method based on VAE data enhancement according to claim 1, wherein in step S1, specifically, Pair processing of sgRNA and target DNA in training data is performed in a one-to-one correspondence manner by using Pair coding.
3. The CRISPR/Cas9 off-target prediction method based on VAE data enhancement according to claim 1, wherein the framework of the H-VAE model in step S2 comprises an Embedding layer, an Encoder layer and a Decoder layer; wherein the Embedding layer is composed of a word embedding matrix, and the input mapped through the Embedding layer changes from N×24 to N×24×d_h, which serves as the input to the Encoder layer; the Encoder layer consists of four Blocks, each Block consisting of the three operations convolution, batch normalization and activation function; the Decoder layer consists of four Blocks, each Block consisting of the three operations deconvolution, batch normalization and activation function.
4. The CRISPR/Cas9 off-target prediction method based on VAE data enhancement according to claim 3, wherein the training step of the H-VAE model comprises:
S21, for any batch of input samples {x_1, ..., x_n}, denoted by X, the dimension of X is R^(N×24), where N is the batch size; each sample is the length-24 output of the sequence coding module and covers the three cases of mismatch, insertion and deletion; the sample X is input into the word embedding layer to obtain a tensor X_1 of dimension R^(N×24×d_e), where d_e is the dimension of the word embedding layer;
S22, the word-embedded tensor X_1 passes through a series of convolution operations in the Encoder layer to obtain the mean μ and variance σ² of the posterior distribution; data are sampled from this posterior distribution, and the reparameterization trick converts the sampling operation into a sampled result that participates in the computation:

z = μ + ξ ⊙ σ

where ξ ~ N(0, I) is a Gaussian distribution with mean 0 and variance 1; sampling z from N(μ, σ²) is equivalent to sampling ξ from N(0, I) and letting z = μ + ξ × σ;
S23, the sampled result is input into the Decoder layer, and the deconvolution operations yield x̂_k = g(z_k), where k indicates the sample z corresponding to each sample x and g can be regarded as the deconvolution process; x̂_k is the reconstructed x;
S24, a reconstruction loss D(x̂_k, x_k) is adopted to constrain the generator to recover the original input data from the hidden variables; in the training computation, the L2 distance is used:

D(x̂_k, x_k) = ||x_k - x̂_k||².
5. The method for CRISPR/Cas9 off-target prediction based on VAE data enhancement according to claim 4, further comprising, in step S24, adding a loss function to constrain the generator, the formula of the loss function being as follows:

L_KL = (1/2) Σ_{i=1}^{d} ( (μ^(i))² + (σ^(i))² - log (σ^(i))² - 1 )

where d is the dimension of the hidden variable, and μ^(i) and (σ^(i))² respectively represent the mean and variance of the ith component.
6. The CRISPR/Cas9 off-target prediction method based on VAE data enhancement according to claim 1, wherein the step S3 is specifically to select a plurality of different probability distributions by using parameters of implicit variable distribution, and sample a positive sample by combining the plurality of different probability distributions.
CN202110898820.7A 2021-08-05 2021-08-05 CRISPR/Cas9 off-target prediction method based on VAE data enhancement Active CN113611367B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110898820.7A CN113611367B (en) 2021-08-05 2021-08-05 CRISPR/Cas9 off-target prediction method based on VAE data enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110898820.7A CN113611367B (en) 2021-08-05 2021-08-05 CRISPR/Cas9 off-target prediction method based on VAE data enhancement

Publications (2)

Publication Number Publication Date
CN113611367A true CN113611367A (en) 2021-11-05
CN113611367B CN113611367B (en) 2022-12-13

Family

ID=78307284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110898820.7A Active CN113611367B (en) 2021-08-05 2021-08-05 CRISPR/Cas9 off-target prediction method based on VAE data enhancement

Country Status (1)

Country Link
CN (1) CN113611367B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114334007A (en) * 2022-01-20 2022-04-12 腾讯科技(深圳)有限公司 Gene off-target prediction model training method, prediction method, device and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070912A (en) * 2019-04-15 2019-07-30 桂林电子科技大学 A kind of prediction technique of CRISPR/Cas9 undershooting-effect
CN111261223A (en) * 2020-01-12 2020-06-09 湖南大学 CRISPR off-target effect prediction method based on deep learning
CN111258992A (en) * 2020-01-09 2020-06-09 电子科技大学 Seismic data expansion method based on variational self-encoder
US20200226475A1 (en) * 2019-01-14 2020-07-16 Cambia Health Solutions, Inc. Systems and methods for continual updating of response generation by an artificial intelligence chatbot
CN111581962A (en) * 2020-05-14 2020-08-25 福州大学 Text representation method based on subject word vector and hybrid neural network
CN111613267A (en) * 2020-05-21 2020-09-01 中山大学 CRISPR/Cas9 off-target prediction method based on attention mechanism
CN111782799A (en) * 2020-06-30 2020-10-16 湖南大学 Enhanced text abstract generation method based on replication mechanism and variational neural reasoning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
GAO Y et al.: "Data imbalance in CRISPR off-target", Briefings in Bioinformatics *
LIN J et al.: "Off-target predictions in CRISPR-Cas9 gene editing using deep", Bioinformatics *
张桂珊 et al.: "Applications of machine learning methods in the CRISPR/Cas9 system", 《遗传》 (Hereditas) *
徐海波: "Machine learning-based prediction of off-target effects and targeting efficiency of the CRISPR/Cas9 system", China Master's Theses Full-text Database *

Also Published As

Publication number Publication date
CN113611367B (en) 2022-12-13

Similar Documents

Publication Publication Date Title
CN110765966B (en) One-stage automatic recognition and translation method for handwritten characters
CN107506823B (en) Construction method of hybrid neural network model for dialog generation
CN106650813A (en) Image understanding method based on deep residual networks and LSTM
Zhang et al. Unsupervised representation learning from pre-trained diffusion probabilistic models
CN101310294A (en) Method for training neural networks
CN110060657B (en) SN-based many-to-many speaker conversion method
CN114443827A (en) Local information perception dialogue method and system based on pre-training language model
CN111125333B (en) Generative knowledge question-answering method based on representation learning and a multi-layer coverage mechanism
CN111898689A (en) Image classification method based on neural network architecture search
CN113822054A (en) Chinese grammar error correction method and device based on data enhancement
Krokotsch et al. Improving semi-supervised learning for remaining useful lifetime estimation through self-supervision
Chen et al. Learning multiscale consistency for self-supervised electron microscopy instance segmentation
Wehenkel et al. Diffusion priors in variational autoencoders
CN116740223A (en) Method for generating image based on text
CN114170461A (en) Teacher-student framework image classification method containing noise labels based on feature space reorganization
CN113611367B (en) CRISPR/Cas9 off-target prediction method based on VAE data enhancement
EP4196918A1 (en) System and method for generating parametric activation functions
Sarrouti NLM at VQA-Med 2020: Visual Question Answering and Generation in the Medical Domain.
CN116955644A (en) Knowledge fusion method, system and storage medium based on knowledge graph
Londt et al. Evolving character-level densenet architectures using genetic programming
CN113204640B (en) Text classification method based on attention mechanism
CN114757177B (en) Text summarization method based on a BART-fused pointer-generator network
CN110399619A (en) Positional encoding method for neural machine translation, and computer storage medium
CN115101122A (en) Protein processing method, apparatus, storage medium, and computer program product
CN114548293A (en) Video-text cross-modal retrieval method based on cross-granularity self-distillation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant