Abstract
Construction of a chromosome-level assembly is a vital step towards a ‘Platinum’ genome, but assembling and anchoring sequences to chromosomes remains a major challenge in autopolyploid or highly heterozygous genomes. High-throughput chromosome conformation capture (Hi-C) technology is a robust tool that has dramatically advanced chromosome scaffolding; however, existing approaches are mostly designed for diploid genomes and often aim to reconstruct a haploid representation, which limits their power to reconstruct chromosomes for autopolyploid genomes. We developed a novel algorithm (ALLHiC) that builds allele-aware, chromosomal-scale assemblies for autopolyploid genomes from Hi-C paired-end reads using innovative ‘prune’ and ‘optimize’ steps. Application to simulated data showed that ALLHiC can phase allelic contigs and substantially improve ordering and orientation compared with other mainstream Hi-C assemblers. We applied ALLHiC to an autotetraploid and an autooctoploid sugarcane genome and successfully constructed phased chromosomal-level assemblies, revealing allelic variations present in these two genomes. The ALLHiC pipeline enables de novo chromosome-level assembly of autopolyploid genomes, separating each allele. Haplotype chromosome-level assembly of allopolyploid and heterozygous diploid genomes can also be achieved using ALLHiC, overcoming obstacles in assembling complex genomes.
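The abstract names a ‘prune’ step without detailing it here. As a rough illustration of the general idea of discarding Hi-C links between allelic contigs before grouping, a minimal Python sketch follows. It is a toy under stated assumptions, not the ALLHiC implementation: the function name `prune_allelic_links`, the contig names, the link counts and the allele-group input format are all hypothetical.

```python
from itertools import combinations


def prune_allelic_links(links, allele_groups):
    """Drop Hi-C links that join contigs listed as alleles of one another.

    links: dict mapping a frozenset of two contig names to a read-pair count.
    allele_groups: iterable of contig-name collections; contigs within one
        collection are assumed (hypothetically) to be allelic copies.
    """
    allelic_pairs = set()
    for group in allele_groups:
        # Every unordered pair inside one allele group is treated as allelic.
        for a, b in combinations(sorted(set(group)), 2):
            allelic_pairs.add(frozenset((a, b)))
    # Keep only the links whose contig pair is not allelic.
    return {pair: n for pair, n in links.items() if pair not in allelic_pairs}


# Hypothetical toy input: two loci (A, B), two haplotypes each.
links = {
    frozenset(("ctgA1", "ctgA2")): 40,  # allelic pair, to be removed
    frozenset(("ctgA1", "ctgB1")): 25,
    frozenset(("ctgA2", "ctgB2")): 30,
    frozenset(("ctgB1", "ctgB2")): 35,  # allelic pair, to be removed
}
allele_groups = [["ctgA1", "ctgA2"], ["ctgB1", "ctgB2"]]
pruned = prune_allelic_links(links, allele_groups)
```

In this toy input, only the links between contigs of different loci survive, so a downstream clustering step would no longer be pulled towards merging the two haplotypes.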
Data availability
The Hi-C data (O. sativa L. japonica cv. Nipponbare, O. sativa L. indica cv. 93-11 and S. spontaneum L. AP85-441) generated in this study have been deposited in the GSA database (http://gsa.big.ac.cn) under BioProject No. PRJCA001420 and accession No. CRA001597. Other published datasets used for ALLHiC testing are listed in Supplementary Table 1.
Acknowledgements
This work was supported by the National Key Research and Development Program of China (No. 2016YFD0100305 to H.T.), a National Natural Science Foundation of China grant (No. 31701874 to X.Z.) and the Fuzhou Science and Technology project (No. 2017-N-33 to X.Z.). We also thank the Fujian provincial government for a Fujian ‘100 Talent Plan’ award (to H.T.).
Author information
Contributions
H.T. and X.Z. designed and implemented the ALLHiC software. X.Z., H.T., S.Z. and Q.Z. tested the software on various datasets. X.Z., H.T. and R.M. wrote the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information: Nature Plants thanks Jean Marc Aury, Jay Ghurye and Yves van de Peer for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Notes 1–5, Supplementary Tables 1–6 and Supplementary Figs. 1–36.
About this article
Cite this article
Zhang, X., Zhang, S., Zhao, Q. et al. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat. Plants 5, 833–845 (2019). https://doi.org/10.1038/s41477-019-0487-8
DOI: https://doi.org/10.1038/s41477-019-0487-8