Computer Science ›› 2017, Vol. 44 ›› Issue (1): 80-83. doi: 10.11896/j.issn.1002-137X.2017.01.015
ZHANG Xiao-dong (张晓东), LING Cheng (凌诚), GAO Jing-yang (高敬阳)
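Abstract: With the application and development of high-throughput sequencing technology, a large number of sequencing-based methods for detecting deletion variants have emerged. However, any single detection method still has limited applicability and insufficient detection precision and sensitivity. To address this, we propose a comprehensive method for detecting genomic deletion variants that integrates feature mining, based on a fusion of multiple detection theories, with machine learning algorithms. The method first applies several tools to detect deletions in an individual, yielding an initial set of variant calls. It then mines and extracts sequence features for the deletions in the initial set according to multiple detection theories. Finally, it combines the detection tools with a machine learning algorithm to obtain an integrated detection method, which removes false-positive variants from the initial set and produces the final result set. Experiments on data from the 1000 Genomes Project show that, compared with the results of any single tool, the method is superior in both detection precision and sensitivity; compared with directly combining the results of multiple tools, it significantly improves detection precision at the cost of a small loss in sensitivity.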
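The three stages described in the abstract (initial call set, feature extraction, false-positive filtering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tool names, the 50% reciprocal-overlap merge rule, the feature vector (per-tool support flags plus deletion length), and the "at least two tools" rule standing in for a trained classifier are all hypothetical, and the coordinates are toy data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Call = Tuple[str, int, int]  # (chromosome, start, end)


@dataclass
class Candidate:
    chrom: str
    start: int
    end: int
    tools: Set[str] = field(default_factory=set)  # tools that reported this deletion


def reciprocal_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Smaller of the two overlap fractions; 0.0 if the intervals are disjoint."""
    inter = min(a_end, b_end) - max(a_start, b_start)
    if inter <= 0:
        return 0.0
    return min(inter / (a_end - a_start), inter / (b_end - b_start))


def build_initial_set(calls_by_tool: Dict[str, List[Call]], min_ro: float = 0.5) -> List[Candidate]:
    """Stage 1: union of all tools' calls, greedily merging similar deletions.

    A merged candidate keeps the coordinates of the first call that created it.
    The min_ro threshold is an assumption, not a value from the paper.
    """
    merged: List[Candidate] = []
    for tool, calls in calls_by_tool.items():
        for chrom, start, end in calls:
            for cand in merged:
                if cand.chrom == chrom and reciprocal_overlap(cand.start, cand.end, start, end) >= min_ro:
                    cand.tools.add(tool)
                    break
            else:
                merged.append(Candidate(chrom, start, end, {tool}))
    return merged


def extract_features(cand: Candidate, tool_names: List[str]) -> List[float]:
    """Stage 2 (hypothetical features): one support flag per tool, then deletion length."""
    support = [1.0 if name in cand.tools else 0.0 for name in tool_names]
    return support + [float(cand.end - cand.start)]


def filter_false_positives(cands: List[Candidate], tool_names: List[str],
                           classify: Callable[[List[float]], bool]) -> List[Candidate]:
    """Stage 3: keep only candidates the classifier accepts as true deletions."""
    return [c for c in cands if classify(extract_features(c, tool_names))]


def precision_and_sensitivity(predicted: List[Candidate], truth: List[Call],
                              min_ro: float = 0.5) -> Tuple[float, float]:
    """Precision = matched predictions / predictions; sensitivity = recovered truth / truth."""
    def matches(c: Candidate, t: Call) -> bool:
        return c.chrom == t[0] and reciprocal_overlap(c.start, c.end, t[1], t[2]) >= min_ro

    true_pos = sum(any(matches(c, t) for t in truth) for c in predicted)
    found = sum(any(matches(c, t) for c in predicted) for t in truth)
    precision = true_pos / len(predicted) if predicted else 0.0
    sensitivity = found / len(truth) if truth else 0.0
    return precision, sensitivity


# Toy data: three hypothetical tools and a hypothetical truth set.
tool_names = ["tool_a", "tool_b", "tool_c"]
calls_by_tool = {
    "tool_a": [("chr1", 1000, 2000), ("chr1", 5000, 5600)],
    "tool_b": [("chr1", 1050, 2020), ("chr2", 300, 900)],
    "tool_c": [("chr1", 990, 1980)],
}
truth = [("chr1", 1000, 2000), ("chr2", 300, 900)]


def at_least_two_tools(features: List[float]) -> bool:
    """Stand-in for a trained model: accept if two or more tools support the call."""
    return sum(features[:len(tool_names)]) >= 2


initial = build_initial_set(calls_by_tool)
final = filter_false_positives(initial, tool_names, at_least_two_tools)
print("initial set:", precision_and_sensitivity(initial, truth))
print("final set:  ", precision_and_sensitivity(final, truth))
```

On this toy input the filter raises precision while lowering sensitivity, the same direction of trade-off the abstract reports against the direct combination of tool outputs; the magnitudes here come from the made-up data and say nothing about the paper's results. In practice `classify` would wrap a model trained on labeled deletions rather than a fixed rule.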