CN105760712A - Copy number variation detection method based on next generation sequencing - Google Patents
Copy number variation detection method based on next generation sequencing
- Publication number
- CN105760712A, CN201610114354.8A
- Authority
- CN
- China
- Prior art keywords
- copy number
- sample
- cnv
- site
- statistic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Engineering & Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Analytical Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioethics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Chemical & Material Sciences (AREA)
- Epidemiology (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Software Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Apparatus Associated With Microorganisms And Enzymes (AREA)
Abstract
The invention discloses a copy number variation detection method based on next generation sequencing. The method comprises the following steps: preprocessing the copy number variation data, constructing a sliding window, calculating a statistic, applying a permutation strategy, constructing a null distribution, and evaluating the performance of the algorithm. The performance evaluation comprises determining whether the algorithm can achieve a relatively high true positive rate while the false positive rate is kept under control, evaluating whether the algorithm can estimate the p value relatively accurately, assessing its ability to detect the boundaries of copy number variation, and analyzing its computational complexity. The method solves the problem of copy number variation detection errors caused by differences in sequencing platform and sequencing level, so that results are more accurate; the data are normalized using the features of a multi-peak frequency histogram, so that normal regions and copy number variation regions are divided accurately; and a new model is established on the combined effect of the read count at variant sites and the correlation between variant sites, which resolves the inconsistency problem and allows the significance level of copy number variation to be estimated objectively.
Description
Technical field
The invention belongs to the field of high-throughput DNA sequencing technology, and in particular relates to a copy number variation detection method based on next generation sequencing.
Background
Copy number variation (CNV) is an important phenomenon in cancer genomes. It mainly takes two forms, amplification and deletion of copy number, and is closely tied to the onset and progression of cancer cells. Detecting CNVs that recur in the same region across multiple cancer samples, integrating the effect of CNVs on genome-wide expression, and identifying the cancer genes whose expression is affected by CNVs are of great significance for studying the onset and metastasis of cancer. Although CNV detection methods based on a single sample are increasingly mature, their sensitivity and accuracy are still insufficient for detecting CNV regions shared by multiple samples. Systematic analysis of CNVs therefore provides an important route to studying the pathogenesis of cancer at the molecular level, and its most fundamental and central problem is how to detect CNVs associated with tumor-related genes across multiple cancer samples.
Next generation sequencing (NGS) is a high-throughput sequencing technology that obtains millions of short sequences or more in a single run, with the advantages of high speed, high resolution, low cost and good repeatability. Detecting CNVs from NGS data therefore substantially improves speed and accuracy while reducing cost.
Numerous studies show that functional CNV patterns are often implied in regions of consistent variation across cancer genome samples, and that in NGS the number of sequences aligned to each genomic region is proportional to the copy number of that region. Establishing a computational method grounded in statistical theory to measure how significantly a CNV is shared (common) across multiple cancer samples therefore provides a direct and feasible technical means of identifying functional CNV patterns and discovering potential cancer genes, and in turn supplies important information for the biomedical prediction and diagnosis of cancer. Establishing a sound and effective statistical test model is thus essential.
The density of genome-wide, high-throughput CNV sites and the complexity of their structure pose great challenges for building a statistical test model and assessing CNV significance, mainly in the following two respects. First, difficulties inherent in the problem: a) the number of sites exceeds 1.8 million while the number of samples is usually small, giving a high-dimensional, small-sample data layout; b) differences in sequencing platform and sequencing level introduce systematic error, so samples sequenced at different levels must be normalized; c) the read signal (read depth, RD) at a gene site is susceptible to noise such as sequencing errors and alignment errors; d) CNV sites are strongly correlated rather than independent, so the factors being tested interact; e) detecting the amplification or deletion state of copy number must take two features into account, the read count at each site and the correlation between sites, which calls for a mechanism that balances the two sensibly. Second, challenges in theory and method: a) the data are large, so keeping time and space complexity under control is a challenge; b) how to take full account of the correlation between CNV sites, and so reduce the conservativeness of CNV significance estimates, is a difficult point; c) how to build a null distribution consistent with the statistic, and so strengthen the statistical meaning of the significance estimate, is a key problem that has not yet been solved.
In terms of sample number, existing copy number variation detection methods fall broadly into CNV detection methods based on single-sample analysis and methods based on multiple samples. In terms of technique, the main approaches are detection based on fluorescence in situ hybridization, microarray-based comparative genomic hybridization, and copy number detection based on next generation sequencing. The first two have very low resolution and have difficulty detecting short CNVs, whereas NGS-based methods stand out for their high throughput. NGS-based CNV detection follows two main technical routes: paired-end mapping (PEM) signatures and depth of coverage (DOC). PEM-based methods can detect small CNVs but have difficulty detecting large insertions (copy number amplifications) and CNVs in complex regions (such as SDs). DOC-based methods can detect large CNVs. Some methods therefore combine the two; CNVer, for example, improves the breakpoint accuracy of CNV regions by integrating DOC and PEM signatures. DOC-based methods are currently the more favored.
Segmentation-based DOC detection models rely on different segmentation methods, such as CBS and LASSO, and different segmentation methods give different results. ReadDepth, for example, uses the CBS segmentation algorithm and can identify the boundaries of copy number variation fairly accurately, retaining high sensitivity and specificity on low-coverage data. The FREEC method does not require a control sample and uses LASSO regression to locate CNV boundaries accurately, but it ignores local variation in read counts, which easily causes false detections; its GC-content normalization may also be affected by subclones, which in turn affects CNV detection. The Segseq and rSW-seq methods compare directly with a control sample and can therefore detect and identify CNV regions quickly and accurately, but they do not consider the local features of multiple samples, which leads to large errors. Because of the sequencing technology and the local features of the genome, segmentation algorithms tend to give a high false positive rate. SeqCNA likewise requires no control sample and uses LOESS or polynomial fitting; it is suited to detecting small local CNVs but not to cancer sample data.
Hypothesis-testing-based DOC statistical significance models involve two key elements, the test statistic and the null distribution, and the quality of their design directly affects the validity of the significance estimate and the ability to identify functional CNV patterns. The EWT method fits a Gaussian probability distribution model to the RD of consecutive segments (windows) and tests for CNVs with a one-sided Z-test. It can detect large copy number variant regions, but it does not consider the correlation between sites, cannot locate insertions (CNVs) precisely, and is insensitive to small CNVs. The CNV-seq method fits a Poisson distribution model to the RD ratio (against a reference sample) of non-overlapping segments (windows) and computes the significance of the Z-score while introducing a segmentation algorithm to detect CNVs; this improves sensitivity on low-coverage data but tends to raise the false positive rate. CNA-seg, based on the HMM method of segseq and JointSLM, additionally introduces a chi-square (χ²) statistic to detect CNVs.
DOC-based methods for detecting CNVs common to multiple samples are still not mature. The main ones are the CMDS method [17], the cn.MOPS method, the JointSLM method and detection methods based on penalized sparse regression models. The CMDS method builds a correlation diagonal matrix for each single site across multiple samples and computes its significance to detect CNVs; it is more accurate than single-sample detection and also improves the trade-off in time and space complexity. The cn.MOPS method reduces the influence of technical noise and biological variation; it is suited to detecting CNVs whose amplitude in the same region differs across samples, but is insensitive to CNVs of consistent amplitude. The JointSLM method extends EWT to multi-sample detection and introduces a hidden Markov model (HMM) to detect CNVs, but it fails when the common CNV is present in only some of the samples. The method based on a penalized regression model fits a penalized regression model to the RD signal of multiple samples, converts common CNV (cCNV) boundary detection into a change-point detection problem, and applies significance testing, thereby improving accuracy and lowering the false discovery rate. Its accuracy drops, however, when the samples differ in ancestry.
A comparative analysis of these existing DOC-based models [3,7,9-27] shows that most methods produce a very high false discovery rate, which is especially pronounced when no reference sample is available. Existing NGS-based significance models all take CNV structural segments as the detection primitive when designing the statistic, and use the frequency and amplitude of CNVs and the correlation between CNV sites when quantifying it. To construct the null distribution, most methods use a random permutation strategy.
From the biological characteristics of CNV data, CNV sites are not independent of one another: adjacent CNV sites form an organic whole. Taking a single site as the detection primitive therefore makes it difficult to estimate the significance level of a CNV objectively, while taking a structural segment as the primitive tends to ignore the correlation among sites inside the structure. Moreover, although several methods consider both the read count of a CNV and the correlation between sites when computing the statistic, they do not balance these two features sensibly, so CNVs are easily miscalled.
Existing methods for detecting the significance level of CNVs have the following main shortcomings:
(1) Statistics that take a single CNV site as the primitive easily make the significance estimate conservative; statistics that take a CNV structural segment as the primitive preserve the inherent structure of the copy number to some extent but ignore the correlation between internal sites, making it difficult to estimate the significance level of a CNV objectively.
(2) The frequency of CNVs and the correlation between variant sites are not balanced sensibly, so the biological manifestation of CNVs associated with cancer is difficult to locate.
(3) When methods based on single-sample detection are used to detect the cCNVs of multiple samples, systematic error or platform error is a serious problem.
(4) Samples from different sequencing platforms or sequencing levels cannot be integrated automatically, which greatly limits the detection of functional patterns of CNVs shared by multiple samples.
(5) For sample data with low coverage, the methods are insensitive and detection performs poorly.
Summary of the invention
An object of the invention is to provide a copy number variation detection method based on next generation sequencing that applies different normalization measures to data of different coverage, making the data easier to work with and reducing systematic error; integrates multiple samples and proposes a theory and method of significance detection that takes the CNV structural unit as the primitive; and, guided by a supervised learning mechanism, builds a null distribution consistent with the statistic so as to improve the accuracy of the significance estimate.
The invention is realized as follows. A copy number variation detection method based on next generation sequencing comprises the following steps:
Preprocessing of copy number variation data: filter out batch effects in the CNV signal and reads with relatively low alignment quality from the alignment step; adjust the read count at each site of the data samples by normalizing for GC content; normalize the sequencing levels of the multiple samples so that the data correspond to the same sequencing level. For data samples with low coverage depth, normalize the data directly to the same level; for data samples with high coverage depth, first define the copy number amplification and deletion states from the features of the data frequency histogram;
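The GC-content adjustment mentioned in the preprocessing step can be illustrated with a minimal sketch. The text does not specify how the adjustment is performed, so the median-ratio scheme used here, the function name `gc_normalize` and the `bin_width` parameter are all assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import median


def gc_normalize(read_counts, gc_fractions, bin_width=0.01):
    """Scale each site's read count by (global median / median of its GC bin).

    This median-ratio correction is an assumed, commonly used scheme; the
    source only states that read counts are adjusted by normalizing GC content.
    """
    if len(read_counts) != len(gc_fractions):
        raise ValueError("read_counts and gc_fractions must have equal length")
    # Group site indices by GC bin (bin_width is a hypothetical default).
    bins = defaultdict(list)
    for idx, gc in enumerate(gc_fractions):
        bins[int(gc / bin_width)].append(idx)
    global_median = median(read_counts)
    normalized = [0.0] * len(read_counts)
    for indices in bins.values():
        bin_median = median([read_counts[i] for i in indices])
        # Leave a bin unchanged if its median is zero, to avoid dividing by zero.
        scale = global_median / bin_median if bin_median > 0 else 1.0
        for i in indices:
            normalized[i] = read_counts[i] * scale
    return normalized
```

After this correction, sites in GC-rich and GC-poor bins share the same median read count, so remaining differences are more likely to reflect copy number than GC bias.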
Construction of the sliding window: integrate the normalized samples to obtain a high-dimensional matrix; construct a sliding window that, starting from the first position, computes the frequency of each site and at the same time uses the Pearson formula to compute the correlation between sites within each window, and slide the window step by step until every site has been covered;
Calculation of the statistic: compute, for each site in each sliding window, a statistic that reflects the amplification or deletion state of the copy number variation. A training set is built from known copy number variation patterns and used to learn the weights of the frequency and the correlation coefficient, w1 and w2, from which the statistic is computed:
S_test = w1*f + w2*a
where f, a and S_test denote, respectively, the frequency, the correlation and the value of the statistic of the copy number variation patterns in the training set;
Application of the permutation strategy and construction of the null distribution: for the normalized samples, compute the detection statistic corresponding to each site on the whole genome and construct the null distribution T; then randomly permute the sample data: for each sample, randomly permute the positions at which its values occur on the whole genome, until all s samples have been permuted, which yields one permuted sample set. For each permuted sample set, compute the statistic for the occurrence of concurrent copy number variation. Finally, compute the significance level of the detection statistic.
Estimation of the CNV significance level: the CNV regions that occur are assessed using the p values obtained for all sites of the samples. If the p value is below a set threshold (for example 0.05), the CNV is considered to have biological meaning or a function in cancer. For each CNV structural unit, separate null distributions are built for the amplification and deletion states so that the significance of each state is tested separately.
Performance evaluation of the algorithm: determine whether the algorithm can achieve a high true positive rate (TPR) while the false positive rate (FPR) is kept under control; evaluate whether the algorithm can estimate the p value accurately (Type I error rate); assess its ability to detect the boundaries of copy number variation; and analyze the computational complexity of the algorithm.
Further, the reads with relatively low alignment quality that are filtered out are reads with quality below Q30.
Further, the high-dimensional matrix obtained by integrating the normalized samples has dimensions s × N, where s is the number of samples and N is the number of sites per sample; within a region exhibiting copy number variation, the correlation between adjacent copy number variant sites is strong, up to 0.985, while the correlation between distant sites is weaker.
Further, for each sliding window, a statistic is computed to reflect the amplification or deletion state of the copy number variation. For low-coverage samples, the frequency of the read count at each site and the correlation coefficients between that site and the other sites in the window are computed directly, and the frequency and correlation coefficients are combined to quantify the statistic (S). For high-coverage samples, the frequency histogram is used to separate accurately the amplification and deletion states of the copy number, which have different biological functions and manifestations, and the statistic (S) is computed for each state separately.
Further, in the calculation of the statistic, S_test in the training set is assigned a relative value according to the relationship between known copy number variation patterns in public databases and gene expression levels.
Further, in the step of computing the detection statistic for each site on the whole genome for the normalized samples, constructing the null distribution T and randomly permuting the sample data, the sample data are a data matrix in which each row represents a sample and each column represents a site on the whole genome.
Further, in the estimation of the significance level based on the null distribution designed according to CNV length, if the p value is below the set threshold of 0.05, the CNV has biological meaning or a function in cancer; the amplification and deletion states of the CNV have different biological functions and manifestations.
Further, in the performance evaluation of the algorithm, evaluating whether the algorithm can estimate the p value accurately means evaluating whether its statistical model has strong statistical meaning.
The invention solves the problem that the prior art easily becomes conservative when estimating the significance of copy number variation. The invention automatically integrates multiple samples to detect regions in which they jointly exhibit copy number variation at the same location, avoiding the detection errors of the prior art, which detects copy number variant regions only in a single sample or in paired samples, and allowing the relationship between copy number variation and cancer to be studied across a patient population. The invention solves the problem of detection errors caused by differences in sequencing platform and sequencing level, making the results more accurate. For next generation sequencing data, the invention normalizes the data using the features of the multi-peak frequency histogram, so that normal regions and copy number variant regions are divided accurately. The prior art considers only the read count at copy number variant sites, and when designing the statistic treats the read count and the correlation between adjacent variant sites inconsistently; to address this, the invention considers the combined effect of the read count at variant sites and the correlation between them and establishes a new model, which resolves the inconsistency and allows the significance level of copy number variation to be estimated objectively.
When detecting multi-sample cCNVs, the invention integrates the multiple samples, reducing the systematic error or sequencing platform error produced by methods that test single samples one after another, and greatly improving detection performance.
In the initial normalization (standardization) of the data, the invention applies different processing methods to data of different sequencing levels. Whereas the prior art is insensitive on low-coverage data, the invention retains high sensitivity whether sequencing coverage is high or low, which lays a good foundation for subsequently improving the accuracy of copy number variation detection.
When detecting copy number variation in a region common to multiple samples, it is not enough to consider that the region shows the same amplification or deletion signal in those samples; the correlation between adjacent sites also carries important biological meaning for detection. A statistic and statistical test model built on both of these structural features therefore supports a more objective estimate of the significance level of copy number variation in the common region. The prior art often emphasizes only the amplitude of the copy number variant region and ignores the correlation between sites. The invention considers both features in building the statistical test model, and balances them through a supervised learning strategy in order to compute the statistic sensibly. This makes the hypothesis test model consistent with the statistic and strengthens both the statistical and the biological meaning of the significance estimate.
In data processing, the invention applies different normalization methods to data of different coverage levels. For high-coverage data in particular, it first defines the copy number amplification and deletion states from the features of the data frequency histogram, separating out a data set containing only normal (0) and amplification (1) states and a data set containing only normal (0) and deletion (-1) states. In designing the statistic, the invention takes the single site as the detection primitive, and in quantifying the statistic combines the read count at each CNV site with the correlation between sites, which fundamentally improves the accuracy of the significance estimate. The invention integrates multiple samples, balances the two features, read count (amplitude) at genome-wide sites and correlation between sites, through supervised learning so as to quantify the statistic sensibly, and constructs a hypothesis test model consistent with the statistic, thereby improving the statistical meaning of the significance estimate.
On a simulated data set of 5 samples containing 18 concurrent copy number variations (cCNVs), the invention detects 17 cCNV regions, whereas a prior-art method such as FREEC detects only 15 cCNV regions by single-sample detection followed by overall comparison. Extensive experiments also show that, compared with FREEC, the invention is more accurate in detecting the boundaries of variant regions.
Brief description of the drawings
Fig. 1 is a flow chart of the copy number variation detection method based on next generation sequencing provided by an embodiment of the invention.
Detailed description of the invention
To make the object, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
The application principle of the invention is described further below with reference to the accompanying drawing.
A copy number variation detection method based on next generation sequencing comprises the following steps:
S101: Preprocessing of copy number variation data: filter out batch effects in the CNV signal and reads with relatively low alignment quality from the alignment step; adjust the read count at each site of the data samples by normalizing for GC content; normalize the sequencing levels of the multiple samples so that the data correspond to the same sequencing level. For data samples with low coverage depth, normalize the data directly to the same level; for data samples with high coverage depth, first define the copy number amplification and deletion states from the features of the data frequency histogram;
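For low-coverage samples, step S101 normalizes the data directly to the same level. A minimal sketch of one way to do this follows; rescaling every sample to a shared median depth is an assumption, since the text does not name the scaling rule, and `normalize_depth` is a hypothetical helper name.

```python
from statistics import median


def normalize_depth(samples):
    """Rescale every sample (a list of per-site read counts) to a shared median depth.

    The shared level is taken as the median of the per-sample medians; this
    choice is an illustrative assumption, not a rule stated in the source.
    """
    sample_medians = [median(counts) for counts in samples]
    target = median(sample_medians)  # assumed common sequencing level
    normalized = []
    for counts, sample_median in zip(samples, sample_medians):
        if sample_median <= 0:
            raise ValueError("sample median read count must be positive")
        normalized.append([c * target / sample_median for c in counts])
    return normalized
```

Two samples sequenced at depths that differ by a constant factor become directly comparable after this step, which is the property the later multi-sample matrix relies on.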
S102: Construction of the sliding window: integrate the normalized samples to obtain a high-dimensional matrix; construct a sliding window that, starting from the first position, computes the frequency of each site and at the same time uses the Pearson formula to compute the correlation between sites within each window, and slide the window step by step until every site has been covered;
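Step S102 can be sketched as follows. The sketch assumes the matrix holds one row per sample and one column per site, with 0 meaning normal copy number; it defines the window frequency as the fraction of non-normal entries and the window correlation as the mean pairwise Pearson correlation between site columns. Both definitions, and the names `pearson` and `sliding_window_features`, are illustrative assumptions rather than the definitions used by the invention.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, treated as 0 here
    return cov / sqrt(var_x * var_y)


def sliding_window_features(matrix, width, step=1):
    """Return (start, frequency, correlation) for every window over the sites.

    matrix: list of samples (rows), each a list of per-site values (columns).
    """
    n_samples, n_sites = len(matrix), len(matrix[0])
    features = []
    for start in range(0, n_sites - width + 1, step):
        # Extract the site columns covered by the current window.
        cols = [[row[j] for row in matrix] for j in range(start, start + width)]
        # Frequency: share of (sample, site) entries that deviate from normal (0).
        altered = sum(1 for col in cols for v in col if v != 0)
        freq = altered / (width * n_samples)
        # Correlation: mean Pearson coefficient over all pairs of sites in the window.
        pairs = list(combinations(cols, 2))
        corr = sum(pearson(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
        features.append((start, freq, corr))
    return features
```

With `step=1` the window advances one site at a time, so every site is covered by at least one window, as the step requires.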
S103: Calculation of the statistic: compute, for each sliding window, a statistic that reflects the amplification or deletion state of the copy number variation. A training set is built from known copy number variation patterns and used to learn the weights of the frequency and the correlation coefficient, w1 and w2, from which the statistic is computed:
S_test = w1*f + w2*a
where f, a and S_test denote, respectively, the frequency, the correlation and the value of the statistic of the copy number variation patterns in the training set;
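The text states that w1 and w2 are learned from a training set but does not name the learning method. The sketch below assumes an ordinary least-squares fit without intercept, solved through the 2 × 2 normal equations; `fit_weights` and `compute_statistic` are hypothetical names.

```python
def fit_weights(freqs, corrs, targets):
    """Least-squares fit of targets ~ w1*f + w2*a (no intercept).

    Least squares is an assumed stand-in for the supervised learning step;
    the source does not specify the estimator.
    """
    sff = sum(f * f for f in freqs)
    saa = sum(a * a for a in corrs)
    sfa = sum(f * a for f, a in zip(freqs, corrs))
    sfs = sum(f * s for f, s in zip(freqs, targets))
    sas = sum(a * s for a, s in zip(corrs, targets))
    det = sff * saa - sfa * sfa
    if abs(det) < 1e-12:
        raise ValueError("frequency and correlation are collinear; weights are not identifiable")
    # Cramer's rule on the normal equations.
    w1 = (sfs * saa - sas * sfa) / det
    w2 = (sas * sff - sfs * sfa) / det
    return w1, w2


def compute_statistic(f, a, w1, w2):
    """Evaluate the statistic w1*f + w2*a for one window or site."""
    return w1 * f + w2 * a
```

Once the weights are fitted on the training patterns, `compute_statistic` can be applied to the frequency and correlation of any window in the test data.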
S104: Application of the permutation strategy and construction of the null distribution: for the normalized samples, compute the detection statistic corresponding to each site on the whole genome and construct the null distribution T; then randomly permute the sample data: for each sample, randomly permute the positions at which its values occur on the whole genome, until all s samples have been permuted, which yields one permuted sample set. For each permuted sample set, compute the statistic for the occurrence of concurrent copy number variation. Finally, compute the significance level of the detection statistic:
Here p-value denotes the p value corresponding to each site of the samples, K is the number of random permutations, and T is the statistic for the null distribution. The statistic of the i-th permutation is compared with T: whenever it exceeds T, the count is incremented by one, and the p value is finally obtained from this count. (The p-value, the statistic of the i-th permutation and T are vectors.)
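The counting procedure described above can be sketched as a per-site permutation test. Two points are assumptions: the per-site statistic is replaced by a simple placeholder (the mean signal across samples), and the final p value is taken as the count divided by K, which the text implies but does not write out. The names `site_statistic` and `permutation_p_values` are hypothetical.

```python
import random


def site_statistic(matrix):
    """Placeholder per-site statistic: mean signal across samples (hypothetical)."""
    n_samples = len(matrix)
    return [sum(row[j] for row in matrix) / n_samples for j in range(len(matrix[0]))]


def permutation_p_values(matrix, n_permutations=1000, seed=0):
    """Estimate a p value per site by permuting site positions within each sample."""
    rng = random.Random(seed)
    observed = site_statistic(matrix)  # plays the role of T in the text
    exceed = [0] * len(observed)
    for _ in range(n_permutations):
        # Shuffle the site positions independently within every sample (row).
        permuted = [rng.sample(row, len(row)) for row in matrix]
        perm_stat = site_statistic(permuted)
        for j, (t_perm, t_obs) in enumerate(zip(perm_stat, observed)):
            if t_perm > t_obs:
                exceed[j] += 1  # the permuted statistic exceeds T: count one
    # Assumed normalization: fraction of the K permutations that exceeded T.
    return [count / n_permutations for count in exceed]
```

A site altered in every sample obtains a small p value, because shuffling positions independently per sample rarely lines the alterations up again at one site.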
S105: Estimation of the CNV significance level: the CNV regions that occur are assessed using the p values obtained for all sites of the samples. If the p value is below a set threshold (for example 0.05), the CNV is considered to have biological meaning or a function in cancer. For each CNV structural unit, separate null distributions are built for the amplification and deletion states so that the significance of each state is tested separately.
S106: Performance evaluation of the algorithm: determine whether the algorithm can achieve a high true positive rate (TPR) while the false positive rate (FPR) is kept under control; evaluate whether the algorithm can estimate the p value accurately (Type I error rate); assess its ability to detect the boundaries of copy number variation; and analyze the computational complexity of the algorithm.
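The TPR and FPR named in the performance evaluation follow their standard definitions and can be computed from per-site calls against a known truth set, for example simulated data. The helper names `call_sites` and `tpr_fpr` below are hypothetical, and the 0.05 default mirrors the example threshold given in the text.

```python
def call_sites(p_values, alpha=0.05):
    """Mark a site as a CNV call when its p value is below the threshold."""
    return [p < alpha for p in p_values]


def tpr_fpr(predicted, truth):
    """Return (true positive rate, false positive rate) for per-site calls."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr
```

Sweeping `alpha` over a range of thresholds and recording the resulting (FPR, TPR) pairs is one way to check whether a high TPR is reached while the FPR stays controlled.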
When reads with relatively low alignment quality, arising from the batch effect of the CNV signal and from the alignment process, are filtered out, the reads removed are those below Q30.
After the multiple standardized samples are integrated, a high-dimensional matrix is obtained whose dimensions are the number of samples s × the number of sites N per sample. Because copy number variation presents as a contiguous region, the correlation between adjacent copy number variation sites is strong, reaching up to 0.985, whereas the correlation between distant sites is weak.
For each sliding window, a statistic is calculated to reflect the amplification or deletion state of the copy number variation. For low-coverage samples, the read-count frequency of each site and the correlation coefficients between that site and the other sites in the window are calculated directly, and the frequency and correlation coefficients are combined to quantify the statistic (S). For high-coverage samples, the frequency histogram is used to accurately separate the amplification and deletion states of the copy number, which have different biological functions and manifestations, and the statistic (S) is calculated for each of the two states.
In the calculation of the statistic, the value of $S_{test}$ in the training set is assigned a relative value based on the relationship between known copy number variation patterns in public databases and gene expression levels.
In the step of calculating the detection statistic for each site on the whole genome from the multiple standardized samples, constructing the null distribution T, and then randomly permuting the sample data, the sample data are arranged so that each row of the data matrix represents one sample and each column represents one site on the whole genome.
In the estimation based on the CNV significance level, if the p value is less than the set threshold of 0.05, the CNV has biological significance or a function in cancer; the amplification and deletion states of the CNV have different biological functions and manifestations.
In the performance evaluation of the algorithm, evaluating whether the algorithm can accurately estimate the p value means evaluating whether the statistical model of the algorithm has strong statistical significance.
The invention is further described below in conjunction with its application principle.
On the basis of a thorough study of the biological characteristics of copy number and of statistical theory, a statistical test model is established and a CNV significance-level detection algorithm is designed; the algorithm is tested repeatedly on a large amount of simulated data, and its performance is analyzed and evaluated from multiple angles.
(1) Preprocessing of copy number variation data
Appropriate preprocessing of the copy number variation sample data is important for detecting the significance of copy number variation. a) To address the batch effect of the CNV signal and quality problems in the alignment process, reads with relatively low alignment quality (below Q30) are filtered out. b) The sequencing coverage of next-generation sequencing data is affected by GC content, which in turn affects copy number variation detection. Therefore, the read count corresponding to each site of a data sample is adjusted by standardizing for GC content. c) Because the sequencing levels of multiple samples may differ considerably, the subsequent statistic cannot be calculated directly; the data are meaningful only after normalization to a common sequencing level. For data samples with low coverage depth, the data can be normalized to the same level directly; for data samples with high coverage depth, the copy number amplification and deletion states can first be defined from the features of the data frequency histogram.
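The three preprocessing steps can be illustrated with a minimal sketch. The text does not specify how GC standardization or depth normalization is carried out, so the median-based corrections below are assumptions chosen for illustration, and the function names (`filter_low_quality`, `gc_normalize`, `depth_normalize`) are hypothetical.

```python
import numpy as np

def filter_low_quality(quality_scores, min_quality=30):
    """Return a boolean mask that keeps reads whose quality is at least Q30."""
    return np.asarray(quality_scores) >= min_quality

def gc_normalize(read_counts, gc_content, n_bins=10):
    """Assumed GC correction: scale each site by (global median / median of its GC bin)."""
    read_counts = np.asarray(read_counts, dtype=float)
    gc_content = np.asarray(gc_content, dtype=float)
    # Assign each site to a GC bin on [0, 1]; a GC content of exactly 1.0 goes to the last bin.
    bins = np.minimum((gc_content * n_bins).astype(int), n_bins - 1)
    global_median = np.median(read_counts)
    corrected = read_counts.copy()
    for b in np.unique(bins):
        mask = bins == b
        bin_median = np.median(read_counts[mask])
        if bin_median > 0:
            corrected[mask] = read_counts[mask] * global_median / bin_median
    return corrected

def depth_normalize(matrix):
    """Assumed depth normalization: rescale each sample (row) so its median equals 1."""
    matrix = np.asarray(matrix, dtype=float)
    medians = np.median(matrix, axis=1, keepdims=True)
    return matrix / medians
```

In this sketch each row passed to `depth_normalize` is one sample, so samples sequenced at different depths end up on a common scale before any statistic is computed.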
(2) Construction of the sliding window
After the multiple standardized samples are integrated, a high-dimensional matrix is obtained (number of samples s × number of sites N per sample). Because copy number variation presents as a contiguous region, the correlation between adjacent copy number variation sites is generally strong, reaching up to 0.985, whereas the correlation between distant sites is weak enough to be ignored. To calculate the correlation between sites more accurately, a sliding window is constructed: starting from the initial position, the frequency of each site is calculated while the Pearson formula is used to calculate the correlation between the sites within each window, and the window is slid step by step until every site has been covered. The choice of window size has little effect on the result; a size of 10 is used provisionally here, and its effect will be examined experimentally later.
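The sliding-window computation can be sketched as follows. This is an illustration only: the text does not define "frequency" precisely, so the per-site mean across samples is used as an assumed stand-in, and the correlation of a site is taken as the mean Pearson coefficient between that site and the other sites in a forward window. The function name `window_features` is hypothetical.

```python
import numpy as np

def window_features(matrix, window=10):
    """Return per-site frequency and mean within-window Pearson correlation.

    matrix: array of shape (s samples, N sites), already normalized.
    """
    matrix = np.asarray(matrix, dtype=float)
    n_samples, n_sites = matrix.shape
    freq = matrix.mean(axis=0)          # assumed definition of the per-site frequency
    corr = np.zeros(n_sites)
    for start in range(n_sites):
        stop = min(start + window, n_sites)
        block = matrix[:, start:stop]   # the window beginning at this site
        if block.shape[1] < 2:
            continue                    # last site: no neighbours left in the window
        r = np.corrcoef(block, rowvar=False)   # Pearson matrix between sites
        corr[start] = np.nanmean(r[0, 1:])     # this site versus the others
    return freq, corr
```

A site whose values are constant across samples has an undefined Pearson coefficient; `np.nanmean` skips such entries, which is a design choice of this sketch rather than something stated in the text.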
(3) Calculation of the statistic
For each sliding window, a statistic is calculated to reflect the amplification or deletion state of the copy number variation. Because next-generation sequencing data are affected by sequencing coverage depth, the statistic is calculated separately for low-coverage and high-coverage samples, which greatly broadens the applicability of the present invention. For low-coverage samples, the read-count frequency of each site and the correlation coefficients between that site and the other sites in the window are calculated directly, and the frequency and correlation coefficients are combined to quantify the statistic (S). For high-coverage samples, the frequency histogram is used to accurately separate the amplification and deletion states of the copy number, which have different biological functions and manifestations, and the statistic (S) is calculated for each of the two states, which helps detect the significance level of copy number variation more reliably. The difficulty here is how to weigh the frequency against the correlation coefficient in a reasonable way; for this purpose, a training set is constructed from known copy number variation patterns, and the weights of the frequency and the correlation coefficient, $w_1$ and $w_2$, are learned in order to calculate the statistic.
$$S_{test} = w_1 \cdot f + w_2 \cdot a$$
Here f, a, and $S_{test}$ denote, respectively, the frequency of the copy number variation pattern, the correlation, and the value of the statistic in the training set. Because $S_{test}$ is not given explicitly in the training set, it is assigned a relative value based on the relationship between known copy number variation patterns in public databases and gene expression levels.
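One way to learn the two weights is an ordinary least-squares fit of the training values of the statistic on f and a. The text does not name the learning procedure, so least squares is an assumption made for this sketch, and `learn_weights` and `statistic` are hypothetical helper names.

```python
import numpy as np

def learn_weights(f, a, s_train):
    """Fit S_test = w1 * f + w2 * a by least squares (assumed learning method)."""
    X = np.column_stack([f, a])                     # one row per training pattern
    y = np.asarray(s_train, dtype=float)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(weights[0]), float(weights[1])

def statistic(f, a, w1, w2):
    """Evaluate the weighted statistic for new frequency/correlation values."""
    return w1 * np.asarray(f, dtype=float) + w2 * np.asarray(a, dtype=float)
```

With the learned weights, `statistic(freq, corr, w1, w2)` yields one value per site or window; the relative training values of the statistic would come from the public-database relationship described above.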
(4) Implementation of the permutation strategy and construction of the null distribution
For the multiple standardized samples, the detection statistic corresponding to each site on the whole genome is calculated and the null distribution T is constructed. The sample data (each row of the data matrix represents one sample, and each column represents one site on the whole genome) are then randomly permuted. The procedure is as follows: a) for each sample, the positions at which it occurs in the whole genome are randomly permuted, until all s samples have been permuted, which constitutes one complete permuted sample set; b) for each permuted sample set, the statistic for the occurrence of tandem copy number variation is calculated; c) finally, the significance level of the detection statistic is calculated as defined in step S104.
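The permutation procedure can be sketched as below, under the assumption that the p value of a site is the fraction of the K permutations whose statistic exceeds the statistic computed on the unpermuted data. The function name `permutation_p_values` and the callable `stat_fn` (which maps a data matrix to one statistic per site) are hypothetical.

```python
import numpy as np

def permutation_p_values(matrix, stat_fn, n_perm=1000, seed=0):
    """Estimate a per-site p value by permuting site positions within each sample."""
    rng = np.random.default_rng(seed)
    matrix = np.asarray(matrix, dtype=float)
    observed = stat_fn(matrix)                 # vector with one entry per site
    exceed = np.zeros(observed.shape)
    for _ in range(n_perm):
        # Shuffle the genomic positions of every sample (row) independently.
        permuted = np.stack([rng.permutation(row) for row in matrix])
        exceed += stat_fn(permuted) > observed  # count permutations that exceed
    return exceed / n_perm
```

For example, with `stat_fn = lambda m: m.mean(axis=0)` a site that is amplified in every sample receives a very small p value, because a permuted data set almost never places the amplified values of all samples at the same position.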
(5) Design of the null distribution based on CNV length and estimation of the significance level
The CNV regions that occur are evaluated using the p values corresponding to all sites of the sample; if a p value is less than a set threshold (for example, 0.05), the CNV is considered to have biological significance or a function in cancer. Furthermore, because the amplification and deletion states of a CNV have different biological functions and manifestations, separate null distributions are established for the amplification state and the deletion state of each CNV structural unit, so that the significance levels of amplification and deletion are detected separately.
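As a small illustration of the thresholding step, the sketch below merges consecutive sites whose p value falls under the threshold into candidate CNV regions. Merging adjacent significant sites is an assumption of this sketch, and `significant_regions` is a hypothetical name; it would be called once with the amplification p values and once with the deletion p values.

```python
def significant_regions(p_values, alpha=0.05):
    """Return (start, end) index pairs of runs of sites with p < alpha (inclusive)."""
    regions = []
    start = None
    for i, p in enumerate(p_values):
        if p < alpha:
            if start is None:
                start = i              # a new significant run begins here
        elif start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:              # close a run that reaches the last site
        regions.append((start, len(p_values) - 1))
    return regions
```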
(6) Performance evaluation of the algorithm
The present invention evaluates the performance of the algorithm from the following aspects: a) whether the algorithm can achieve a high true positive rate (TPR) while the false positive rate (FPR) is controlled; b) whether the algorithm can accurately estimate the p value (Type I error rate), that is, whether the statistical model of the algorithm has strong statistical significance; c) the boundary detection capability for copy number variation; d) the computational complexity of the algorithm.
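Aspect a) relies on the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN). A minimal sketch, assuming per-site boolean calls and a per-site ground truth from simulated data (the name `tpr_fpr` is hypothetical):

```python
import numpy as np

def tpr_fpr(called, truth):
    """Compute the true positive rate and false positive rate of per-site calls."""
    called = np.asarray(called, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(called & truth)
    fp = np.sum(called & ~truth)
    fn = np.sum(~called & truth)
    tn = np.sum(~called & ~truth)
    # Return NaN when a rate is undefined (no positives or no negatives in truth).
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return float(tpr), float(fpr)
```

Applied to data simulated without any CNV, the fraction of sites called at threshold 0.05 gives an empirical Type I error rate that can be compared with the nominal 0.05 for aspect b).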
Taking as background the normal-cell copy numbers detected with 1000 Affymetrix whole-genome SNP 6.0 chips, and considering the characteristics of NGS technology and data, a Markov CNV simulation method is built on probability theory and a non-stationary model to simulate large-scale NGS-based CNV data and test the performance of the method of the present invention. Some of the simulation experiments show that the algorithm has a strong boundary detection capability while maintaining a high TPR.
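The simulation idea can be illustrated with a deliberately simplified sketch: a three-state Markov chain (deletion, normal, amplification) generates copy numbers along the genome, and read counts are drawn from a Poisson distribution whose mean is proportional to the copy number. The homogeneous transition matrix, the Poisson read model, and every parameter name here are assumptions for illustration; the non-stationary model and the SNP-chip background mentioned in the text are not reproduced.

```python
import numpy as np

def simulate_cnv(n_sites, transition, copy_numbers=(1, 2, 3), depth=30, seed=0):
    """Simulate copy numbers with a Markov chain and Poisson read counts per site."""
    rng = np.random.default_rng(seed)
    transition = np.asarray(transition, dtype=float)  # rows must sum to 1
    states = np.empty(n_sites, dtype=int)
    states[0] = 1                                     # start in the normal state
    for i in range(1, n_sites):
        # Draw the next state from the row of the current state.
        states[i] = rng.choice(len(copy_numbers), p=transition[states[i - 1]])
    copies = np.asarray(copy_numbers)[states]
    reads = rng.poisson(depth * copies / 2.0)         # diploid sites average `depth`
    return copies, reads
```

A transition matrix with large diagonal entries, for example 0.99 on the diagonal, produces long runs of the same state, which mimics the regional nature of copy number variation.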
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (8)
1. A copy number variation detection method based on next-generation sequencing, characterized in that the method comprises the following steps:
Preprocessing of copy number variation data: filter out reads with relatively low alignment quality arising from the batch effect of the CNV signal and from the alignment process; adjust the read count corresponding to each site of a data sample by standardizing for GC content; normalize the sequencing levels of the multiple samples to data at a common sequencing level; for data samples with low coverage depth, normalize the data to the same level directly; for data samples with high coverage depth, first define the copy number amplification and deletion states from the features of the data frequency histogram;
Construction of the sliding window: integrate the multiple standardized samples to obtain a high-dimensional matrix; construct a sliding window that, starting from the initial position, calculates the frequency of each site while using the Pearson formula to calculate the correlation between the sites within each window, and slide the window step by step until every site has been covered, thereby calculating the correlation between sites;
Calculation of the statistic: for each sliding window, calculate a statistic that reflects the amplification or deletion state of the copy number variation; construct a training set from known copy number variation patterns and learn the weights of the frequency and the correlation coefficient, $w_1$ and $w_2$, in order to calculate the statistic,
$$S_{test} = w_1 \cdot f + w_2 \cdot a$$
where f, a, and $S_{test}$ denote, respectively, the frequency of the copy number variation pattern, the correlation, and the value of the statistic in the training set;
Implementation of the permutation strategy and construction of the null distribution: for the multiple standardized samples, calculate the detection statistic corresponding to each site on the whole genome and construct the null distribution T; then randomly permute the sample data: for each sample, randomly permute the positions at which it occurs in the whole genome, until all s samples have been permuted, which constitutes one complete permuted sample set; for each permuted sample set, calculate the statistic for the occurrence of tandem copy number variation; finally, calculate the significance level of the detection statistic:
$$p\text{-value} = \frac{1}{K}\sum_{i=1}^{K} I(T_i > T)$$
where p-value is the p value corresponding to each site of the sample, K is the number of random permutations, T is the statistic under the null distribution, and $T_i$ is the statistic of the i-th permutation; if $T_i$ is greater than T, the count is incremented by one, and the p value is finally obtained from the count over the K permutations (p-value, $T_i$, and T are vectors);
Estimation based on the CNV significance level: evaluate the CNV regions that occur using the p values corresponding to all sites of the sample; if a p value is less than a set threshold (for example, 0.05), the CNV is considered to have biological significance or a function in cancer; for each CNV structural unit, establish separate null distributions for the amplification state and the deletion state, so that the significance levels of amplification and deletion are detected separately;
Performance evaluation of the algorithm: evaluate whether the algorithm can achieve a high true positive rate while the false positive rate is controlled; evaluate whether the algorithm can accurately estimate the p value; evaluate the boundary detection capability for copy number variation; and analyze the computational complexity of the algorithm.
2. The copy number variation detection method based on next-generation sequencing according to claim 1, characterized in that the reads with relatively low alignment quality that are filtered out, arising from the batch effect of the CNV signal and from the alignment process, are reads below Q30.
3. The copy number variation detection method based on next-generation sequencing according to claim 1, characterized in that, after the multiple standardized samples are integrated, the high-dimensional matrix obtained has dimensions of the number of samples s × the number of sites N per sample; because copy number variation presents as a contiguous region, the correlation between adjacent copy number variation sites is strong, reaching up to 0.985, and the correlation between distant sites is weak.
4. The copy number variation detection method based on next-generation sequencing according to claim 1, characterized in that, for each sliding window, a statistic is calculated to reflect the amplification or deletion state of the copy number variation; for low-coverage samples, the read-count frequency of each site and the correlation coefficients between that site and the other sites in the window are calculated directly, and the frequency and correlation coefficients are combined to quantify the statistic (S); for high-coverage samples, the frequency histogram is used to accurately separate the amplification and deletion states of the copy number, which have different biological functions and manifestations, and the statistic (S) is calculated for each of the two states.
5. The copy number variation detection method based on next-generation sequencing according to claim 1, characterized in that, in the calculation of the statistic, the value of $S_{test}$ in the training set is assigned a relative value based on the relationship between known copy number variation patterns in public databases and gene expression levels.
6. The copy number variation detection method based on next-generation sequencing according to claim 1, characterized in that, in the step of calculating the detection statistic for each site on the whole genome from the multiple standardized samples, constructing the null distribution T, and then randomly permuting the sample data, the sample data are arranged so that each row of the data matrix represents one sample and each column represents one site on the whole genome.
7. The copy number variation detection method based on next-generation sequencing according to claim 1, characterized in that, in the design of the null distribution based on CNV length and the estimation of the significance level, if the p value is less than the set threshold of 0.05, the CNV has biological significance or a function in cancer, and the amplification and deletion states of the CNV have different biological functions and manifestations.
8. The copy number variation detection method based on next-generation sequencing according to claim 1, characterized in that, in the performance evaluation of the algorithm, evaluating whether the algorithm can accurately estimate the p value means evaluating whether the statistical model of the algorithm has strong statistical significance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610114354.8A CN105760712B (en) | 2016-03-01 | 2016-03-01 | Copy number variation detection method based on next-generation sequencing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105760712A (en) | 2016-07-13 |
CN105760712B (en) | 2019-03-26 |
Family
ID=56331603
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610114354.8A Active CN105760712B (en) | Copy number variation detection method based on next-generation sequencing | 2016-03-01 | 2016-03-01 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105760712B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN105760712B (en) | 2019-03-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||