Abstract
Copy number variations (CNVs) cover 5% to 10% of the genome and are an essential pathogenic factor in human disease. Detection of large CNVs remains unreliable. Reads from third-generation sequencing (3GS) are longer than those from next-generation sequencing (NGS), so in theory they can capture long variations that short reads miss. In practice, however, the low accuracy of 3GS data makes it difficult to apply, and it serves largely as a supplement to NGS-based research. To address these problems, we developed a new mutation detection tool named AssCNV23. First, the tool corrects the 3GS data to reduce its high error rate, then combines the results of several mutation detection tools to improve the accuracy of the initial mutation set and to offset the detection bias of any single tool. AssCNV23 then uses the high-quality 3GS data to guide the assembly of the NGS data and detects CNVs once sequences of sufficient length are obtained. Finally, to improve detection efficiency, the tool generates images containing sequencing depth information based on the read depth strategy and uses a convolutional neural network to detect CNVs. The experimental results show that AssCNV23 maintains a high level of breakpoint accuracy and performs well in identifying large variations. Compared with other tools, the deep learning model has advantages in accuracy and sensitivity, and its Matthews correlation coefficient (MCC) performs well across the experiments. Overall, the algorithm is relatively reliable.
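For readers unfamiliar with the MCC named above, the following minimal sketch shows how it is computed from confusion-matrix counts. It uses the standard textbook definition and is not taken from AssCNV23. The function name `mcc` and the example counts are hypothetical, and returning 0.0 when the denominator is zero is a common convention assumed here.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives and
    false negatives, e.g. counted per candidate CNV region (an assumption
    made for illustration only).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: MCC is reported as 0 when any marginal is empty.
        return 0.0
    return numerator / denominator


# Hypothetical counts, not results from the paper.
score = mcc(tp=40, tn=45, fp=5, fn=10)
print(f"MCC = {score:.3f}")
```

The score ranges from -1 (every call wrong) through 0 (no better than chance) to 1 (every call right). Unlike accuracy, it stays informative when CNV regions are much rarer than normal regions.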
Acknowledgment
This project was supported by the Beijing Natural Science Foundation (5182018) and by the Fundamental Research Funds for the Central Universities & Research Projects on Biomedical Transformation of China-Japan Friendship Hospital (PYBZ1834).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Gao, F., Gao, L., Gao, J. (2019). Integrated Detection of Copy Number Variation Based on the Assembly of NGS and 3GS Data. In: Rojas, I., Valenzuela, O., Rojas, F., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2019. Lecture Notes in Computer Science(), vol 11465. Springer, Cham. https://doi.org/10.1007/978-3-030-17938-0_23
DOI: https://doi.org/10.1007/978-3-030-17938-0_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17937-3
Online ISBN: 978-3-030-17938-0
eBook Packages: Computer Science (R0)