Article
Published: 09 September 2021

Performance assessment of DNA sequencing platforms in the ABRF Next-Generation Sequencing Study

Nature Biotechnology volume 39, pages 1129–1140 (2021)

An Author Correction to this article was published on 11 October 2021

This article has been updated

Abstract

Assessing the reproducibility, accuracy and utility of massively parallel DNA sequencing platforms remains an ongoing challenge. Here the Association of Biomolecular Resource Facilities (ABRF) Next-Generation Sequencing Study benchmarks the performance of a set of sequencing instruments (HiSeq/NovaSeq/paired-end 2 × 250-bp chemistry, Ion S5/Proton, PacBio circular consensus sequencing (CCS), Oxford Nanopore Technologies PromethION/MinION, BGISEQ-500/MGISEQ-2000 and GS111) on human and bacterial reference DNA samples. Among short-read instruments, HiSeq 4000 and X10 provided the most consistent, highest genome coverage, while BGI/MGISEQ provided the lowest sequencing error rates. The long-read instrument PacBio CCS had the highest reference-based mapping rate and lowest non-mapping rate. The two long-read platforms PacBio CCS and PromethION/MinION showed the best sequence mapping in repeat-rich areas and across homopolymers. NovaSeq 6000 using 2 × 250-bp read chemistry was the most robust instrument for capturing known insertion/deletion events. This study serves as a benchmark for current genomics technologies, as well as a resource to inform experimental design and next-generation sequencing variant calling.

**Fig. 1: Experimental design and mapping results.**

**Fig. 2: Distribution of genomic coverage across sequencing technologies for all replicates.**

**Fig. 3: Estimating rates of sequencing error per platform.**

**Fig. 4: Validating SNPs and INDEL events from short-read datasets against the GIAB high-confidence truth set as determined by RTG vcfeval.**

**Fig. 5: Assessing variability for the son (HG002) across HiSeq X10, 2000 and 4000, platforms that had more than one replicate per cell line to enable this analysis.**

**Fig. 6: Reproducibility of sequencing of bacterial genomes in a complex metagenomic mixture.**

Data availability

The genomes in this study are available as EBV-immortalized B lymphocyte cell lines (from Coriell) and as DNA (from Coriell and NIST). The data in this study were generated from the NIST Reference Material DNA batch. All data generated within this study from these genomes are publicly available on the NCBI Sequence Read Archive under BioProject PRJNA646948, within accessions SRR12898279–SRR12898354.
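The accession range is inclusive, so it spans 12898354 - 12898279 + 1 = 76 run accessions. Below is a small Python sketch that expands the range into a list, for example to feed a downloader such as the SRA Toolkit. It assumes that every accession in the interval is a run belonging to PRJNA646948. The statement implies this but does not enumerate the runs.

```python
def expand_accession_range(first: str, last: str) -> list[str]:
    """Expand an inclusive SRA run-accession range, e.g. SRR12898279 to SRR12898354."""
    prefix_first, prefix_last = first[:3], last[:3]
    start, stop = int(first[3:]), int(last[3:])
    if prefix_first != prefix_last or stop < start:
        raise ValueError("endpoints must share a prefix and be in ascending order")
    width = len(first) - 3  # preserve the digit width of the accessions
    return [f"{prefix_first}{n:0{width}d}" for n in range(start, stop + 1)]


runs = expand_accession_range("SRR12898279", "SRR12898354")
print(len(runs), runs[0], runs[-1])  # 76 SRR12898279 SRR12898354
```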

Code availability

All code used within this study is publicly available at https://www.github.com/jfoox/abrfngs2. This repository includes directories containing scripts for primary analyses such as alignment and variant calling (SLURM/), shell scripts to perform post-processing calculations (bin/) and R scripts used to create figures (Rmds/). All tables used to generate figures are provided in a tables/ directory.

Change history

11 October 2021
A Correction to this paper has been published: https://doi.org/10.1038/s41587-021-01122-z

Acknowledgements

We thank Illumina and ThermoFisher for providing reagents allowing the study to take place. We also thank NIST for providing the GIAB DNA samples necessary to carry out the study. We acknowledge the HudsonAlpha Institute for Biotechnology for expert assistance in Illumina DNA library preparation. The Association of Biomolecular Resource Facilities (ABRF) also provided funding, logistical support and project oversight. We thank the ABRF NGS Study members, who contributed to the design and execution of this project. We are particularly grateful for the assistance provided by multiple core facilities spending their own time and resources to participate in this research. We thank the Epigenomics Core Facility and Scientific Computing Unit at Weill Cornell Medicine, as well as the Starr Cancer Consortium (I9-A9-071), and acknowledge funding from the Irma T. Hirschl and Monique Weill-Caulier Charitable Trusts, Bert L and N Kuggie Vallee Foundation, the WorldQuant Foundation, The Pershing Square Sohn Cancer Research Alliance, NASA (NNX14AH50G, NNX17AB26G), the National Institutes of Health (R25EB020393, R01NS076465, R01AI125416, R01ES021006, 1R21AI129851, 1R01MH117406), the Bill and Melinda Gates Foundation (OPP1151054), TRISH (NNX16AO69A:0107, NNX16AO69A:0061), the Leukemia and Lymphoma Society grants (LLS 9238-16, Mak, LLS-MCL-982, Chen-Kiang) and the Alfred P. Sloan Foundation (G-2015-13964). Certain commercial equipment, instruments or materials are identified to adequately specify experimental conditions or reported results. Such identification implies neither recommendation nor endorsement by the National Institute of Standards and Technology, nor does it imply that the equipment, instruments or materials identified are necessarily the best available for the purpose. F.J.S. and M.M. are supported by the NIH (UM1 HG008898).

Author information

Authors and Affiliations

Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
Jonathan Foox, Jenny Xiang, Alicia Alonso & Christopher E. Mason
The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
Jonathan Foox & Christopher E. Mason
University of Vermont Cancer Center, Vermont Integrative Genomics Resource, University of Vermont, Burlington, VT, USA
Scott W. Tighe & Phoebe K. Laaguiby
Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Charles M. Nicolet
Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
Justin M. Zook
New York Genome Center, New York, NY, USA
Marta Byrska-Bishop, Wayne E. Clarke & Giuseppe Narzisi
Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Michael M. Khayat, Medhat Mahmoud & Fritz J. Sedlazeck
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
Michael M. Khayat, Medhat Mahmoud & Fritz J. Sedlazeck
Molecular Biology Core Facilities, Dana-Farber Cancer Institute, Boston, MA, USA
Zachary T. Herbert
DNA Sequencing Core, University of Utah, Salt Lake City, UT, USA
Derek Warner
Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL, USA
George S. Grills
Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
Jin Jen
HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
Shawn Levy
BGI-Shenzhen, Shenzhen, China
Xia Zhao, Wenwei Zhang, Fei Teng, Yonggang Zhao & Haorong Lu
MGI, BGI-Shenzhen, Shenzhen, China
Xia Zhao
Department of Biotechnology and Biomedicine, Technical University of Denmark, Lyngby, Denmark
Yonggang Zhao
Guangdong Provincial Key Laboratory of Genome Read and Write, Shenzhen, China
Haorong Lu
Illumina, Inc., San Diego, CA, USA
Gary P. Schroth
Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL, USA
William Farmerie
Department of Pathology, Fox Chase Cancer Center, Philadelphia, PA, USA
Don A. Baldwin
The Feil Family Brain and Mind Research Institute, New York, NY, USA
Christopher E. Mason
The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA
Christopher E. Mason

Authors

Jonathan Foox
Scott W. Tighe
Charles M. Nicolet
Justin M. Zook
Marta Byrska-Bishop
Wayne E. Clarke
Michael M. Khayat
Medhat Mahmoud
Phoebe K. Laaguiby
Zachary T. Herbert
Derek Warner
George S. Grills
Jin Jen
Shawn Levy
Jenny Xiang
Alicia Alonso
Xia Zhao
Wenwei Zhang
Fei Teng
Yonggang Zhao
Haorong Lu
Gary P. Schroth
Giuseppe Narzisi
William Farmerie
Fritz J. Sedlazeck
Don A. Baldwin
Christopher E. Mason

Contributions

C.E.M., S.W.T., C.M.N. and D.A.B. conceived and designed the study. C.E.M., A.A., S.W.T., Z.T.H., W.F., G.S.G., S.L., P.K.L., D.W., X.Z., W.Z., F.T., Y.Z., J.X., J.J. and H.L. implemented the protocols. J.M.Z., W.E.C., M.B.-B. and G.N. assisted with analysis design. J.F. aggregated and processed data, led data analysis and figure generation, and wrote the manuscript. F.J.S., W.E.C., M.B.-B., G.N., M.M.K., M.M. and S.W.T. performed data analysis, figure generation and manuscript editing. G.P.S. performed experimental planning, support and data analysis.

Corresponding authors

Correspondence to Fritz J. Sedlazeck, Don A. Baldwin or Christopher E. Mason.

Ethics declarations

Competing interests

G.P.S. is employed by Illumina Inc. X.Z., W.Z., F.T., Y.Z. and H.L. are employees of MGI Inc. All other authors declare no competing interests.

Additional information

Peer review information Nature Biotechnology thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Quality Control and Decoy Capture.

(a) The insert size distribution of every replicate, stratified by sequencing instrument. (b) The percentage of total reads that were mapped to decoy contigs within the GRCh38 reference genome.

Extended Data Fig. 2 Normalized Genomic Coverage.

Heatmap showing the distribution of read counts per library (rows) by GC content (columns) across human whole-genome and exome samples. Read count values are normalized by total reads per replicate, such that a value of 1 equals the maximum value for a given replicate. Annotation tracks on the right indicate the sequencing platform and cell line genome for that replicate.
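Dividing by total reads and then rescaling so that the peak equals 1 is algebraically the same as dividing each row by its own maximum, because the total cancels out. The NumPy sketch below assumes a matrix with one row per replicate and one column per GC-content bin. The counts are hypothetical toy values, not study data.

```python
import numpy as np


def normalize_gc_histogram(counts) -> np.ndarray:
    """Scale each replicate's (row's) GC-bin read counts so that its maximum is 1."""
    counts = np.asarray(counts, dtype=float)
    row_max = counts.max(axis=1, keepdims=True)
    # Guard against an all-zero replicate so we never divide by zero.
    row_max[row_max == 0] = 1.0
    return counts / row_max


toy_counts = np.array([[10, 40, 20],   # replicate 1: reads per GC bin (hypothetical)
                       [5, 5, 50]])    # replicate 2
print(normalize_gc_histogram(toy_counts))
```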

Extended Data Fig. 3 All-versus-all Genomic Coverage Comparison.

Comparisons of every platform within each UCSC RepeatMasker region. Blue bars indicate that >50% of shared sites are better represented in the given platform (column) than in the other platform (row). Red bars indicate that the other platform had higher coverage than the given platform.
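A minimal sketch of one pairwise cell of such a comparison follows. It assumes per-site depth arrays for two platforms over the same RepeatMasker region, and it treats "shared sites" as positions covered by both. Each platform is rescaled by its mean depth so that differing sequencing depths do not dominate the comparison. These definitions are illustrative assumptions, not the study's exact procedure.

```python
import numpy as np


def fraction_better_covered(cov_a, cov_b, normalize: bool = True) -> float:
    """Fraction of shared sites where platform A has strictly higher depth than B."""
    a = np.asarray(cov_a, dtype=float)
    b = np.asarray(cov_b, dtype=float)
    if normalize:
        # Assumed normalization: express depth relative to each platform's mean.
        a = a / a.mean()
        b = b / b.mean()
    shared = (a > 0) & (b > 0)  # assumption: shared = covered by both platforms
    if not shared.any():
        return float("nan")
    return float(np.mean(a[shared] > b[shared]))


def bar_colour(fraction: float) -> str:
    """Blue if A wins on more than half of shared sites, otherwise red."""
    return "blue" if fraction > 0.5 else "red"


frac = fraction_better_covered([10, 20, 0, 30], [15, 10, 5, 10])
print(frac, bar_colour(frac))
```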

Extended Data Fig. 4 Variant Detection by Context.

Precision and sensitivity scores derived from RTG vcfeval analysis, stratified by regions in (a) the ClinVar database and (b) the OMIM database. For each cell line, genes from each database were overlapped with the high-confidence regions for variant calling. (c) Scores stratified by regions in the exome, as defined by the AmpliSeq target capture regions file. For each cell line, exomic regions were overlapped with the high-confidence regions for variant calling.
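The stratification step amounts to intersecting two interval sets and then scoring only the calls inside the result. The Python sketch below does both, under stated assumptions. Intervals are sorted, non-overlapping, half-open (start, end) pairs on a single chromosome, much like BED records. The metrics follow the usual definitions, with true positives counted separately for the call and baseline sets, as vcfeval's summary reports them (as we understand it). All numbers are hypothetical.

```python
def intersect_intervals(a, b):
    """Intersect two sorted lists of non-overlapping half-open [start, end) intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first (two-pointer sweep).
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def _ratio(num, den):
    return num / den if den else float("nan")


def precision_sensitivity_f(tp_call, fp, tp_baseline, fn):
    """Precision, sensitivity (recall) and their harmonic mean (F-measure)."""
    precision = _ratio(tp_call, tp_call + fp)
    sensitivity = _ratio(tp_baseline, tp_baseline + fn)
    f_measure = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return precision, sensitivity, f_measure


gene_regions = [(100, 200), (300, 400)]  # hypothetical ClinVar/OMIM gene spans
high_conf = [(150, 350)]                 # hypothetical GIAB high-confidence span
print(intersect_intervals(gene_regions, high_conf))  # [(150, 200), (300, 350)]
print(precision_sensitivity_f(950, 50, 900, 100))
```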

Extended Data Fig. 5 Genomic Variant Heatmap.

Heatmap of genotypes (GT) of variant alleles on chromosome 1 across all human replicates within sequencing platforms, as measured against the Genome in a Bottle high-confidence variant call sets for each genome. Heterozygous variant alleles are shaded in orange (0.5), homozygous variants in red (1), missing data in blue (0) and inapplicable sites (sites outside the GIAB high-confidence region in one cell line but present in another) in gray. Hierarchical clustering reveals strong grouping by cell line, followed by less clear grouping within platforms and inter- and intra-lab replicates.
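The heatmap encoding can be sketched as a mapping from VCF GT strings to the caption's numeric scale, with NaN standing in for gray (inapplicable) sites. The sketch makes several assumptions that go beyond the caption. An absent, missing or reference (0/0) genotype at a truth-set variant counts as missing data (0). A compound heterozygous call such as 1/2 is scored as 0.5.

```python
import math
import re


def encode_gt(gt, applicable: bool = True) -> float:
    """Map a VCF GT string to the heatmap scale: 0 missing, 0.5 het, 1 hom-alt, NaN n/a."""
    if not applicable:
        return math.nan  # gray: outside the GIAB high-confidence region for this genome
    if gt is None:
        return 0.0       # variant not reported by this replicate
    alleles = re.split(r"[/|]", gt)  # handles unphased (/) and phased (|) genotypes
    if "." in alleles:
        return 0.0       # no-call
    alt = [a for a in alleles if a != "0"]
    if not alt:
        return 0.0       # 0/0: variant not detected (assumption)
    if len(alt) == len(alleles) and len(set(alt)) == 1:
        return 1.0       # homozygous alternate, e.g. 1/1 or 1|1
    return 0.5           # heterozygous, including 1/2 (assumption)


print([encode_gt(g) for g in ["0/1", "1|1", "./.", None, "1/2"]])
```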

Extended Data Fig. 6 Mendelian Violation Detection Per Context.

UpSet intersections of Mendelian violations. Each plot is stratified by variant type (SNPs on top, followed by INDELs; INS_5 = insertions 0–5 bp in size, INS_6to15 = insertions 6–15 bp in size, INS_15 = insertions >15 bp in size; likewise for deletions, 'DEL'). Events were recorded within high-confidence regions for the Ashkenazi son (HG002).
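Two ingredients of this figure can be sketched in Python. The first is a trio consistency check: a child genotype is a Mendelian violation if it cannot be formed from one paternal and one maternal allele. The second is the caption's size bins. The sketch assumes diploid, autosomal, fully called genotypes and left-anchored REF/ALT alleles as in VCF. It treats equal-length REF/ALT as an SNP, which ignores multi-nucleotide substitutions.

```python
import re
from itertools import product


def _alleles(gt: str) -> tuple:
    """Unordered allele pair from a VCF GT string such as '0/1' or '1|0'."""
    return tuple(sorted(re.split(r"[/|]", gt)))


def is_mendelian_violation(child: str, father: str, mother: str) -> bool:
    """True if no (paternal, maternal) allele combination yields the child genotype."""
    possible = {tuple(sorted(pair)) for pair in product(_alleles(father), _alleles(mother))}
    return _alleles(child) not in possible


def size_bin(ref: str, alt: str) -> str:
    """Bin a variant into SNP, INS_5/DEL_5, INS_6to15/DEL_6to15 or INS_15/DEL_15."""
    diff = len(alt) - len(ref)
    if diff == 0:
        return "SNP"  # assumption: equal-length alleles are single-base substitutions
    kind = "INS" if diff > 0 else "DEL"
    size = abs(diff)
    if size <= 5:
        return f"{kind}_5"
    if size <= 15:
        return f"{kind}_6to15"
    return f"{kind}_15"


print(is_mendelian_violation("1/1", "0/0", "0/1"), size_bin("ACGTACGT", "A"))
```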

Extended Data Fig. 7 Structural Variants per Instrument.

Comparison of the SVs identified in the six replicates from long-read sequencing instruments, showing agreement on 6,980 SVs across samples (green column).

Extended Data Fig. 8 Structural Variant Metrics.

Coverage, insert size, and read length mean and standard deviation across total SVs in sequencing runs.

Extended Data Fig. 9 SV Agreement between Callers and Instruments.

(a) Insights into SV variability by caller. Shown first is the strategy used to examine SV caller variability after stratifying for platform, replicate and center variability; next, SV call set sizes and overlap with the GIAB SV call set for the SV caller variability set of HG002; finally, the types and sizes of SVs in the SV caller variability set of HG002 (translocations are set to size 50 by default in the SURVIVOR parameters for visualization purposes). (b) Insights into SV variability by platform. The first diagrams use sequencing runs from HiSeqX10, HiSeq2000 and HiSeq4000, while the final two characterize all available platforms. Shown first is the strategy used to examine platform variability after stratifying for SV caller, center and replicate variability; next, SV call set sizes and overlap with the GIAB SV call set for the platform variability SV call set of HG002; next, types and sizes of SVs in the platform variability SV call set of HG002. The final two panels include HiSeqX10, HiSeq2000, HiSeq4000, NovaSeq, BGI and MGI for visualization purposes. The NovaSeq, BGI and MGI SV call sets were not integrated into the analysis strategy because replicate sequencing runs for each sample at different centers on different platforms were not available. On top, SV call set sizes and overlap with the GIAB SV call set for the platform variability SV call set of HG002. Below, types and sizes of SVs in the platform variability SV call set of HG002 (translocations are set to size 50 by default in the SURVIVOR parameters for visualization purposes).
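Overlap with the GIAB SV call set depends on deciding when two SV calls represent the same event. SURVIVOR's merge step is parameterized by, among other things, a maximum allowed breakpoint distance and whether SV types must agree. The plain-Python sketch below mimics that idea and is not SURVIVOR's implementation. The 1,000 bp window, the `SV` record and the type labels are assumptions for illustration. Translocations receive the fixed plotting size of 50 mentioned in the caption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "INS", "INV", "DUP", "TRA" (labels assumed)


def sv_matches(a: SV, b: SV, max_dist: int = 1000, same_type: bool = True) -> bool:
    """Two calls match if both breakpoints lie within max_dist bp (and types agree)."""
    if a.chrom != b.chrom:
        return False
    if same_type and a.svtype != b.svtype:
        return False
    return abs(a.start - b.start) <= max_dist and abs(a.end - b.end) <= max_dist


def count_in_truth(calls: list, truth: list, max_dist: int = 1000) -> int:
    """Number of calls matching at least one truth-set SV (naive O(n*m) scan)."""
    return sum(any(sv_matches(c, t, max_dist) for t in truth) for c in calls)


def plot_size(sv: SV) -> int:
    """Size used for plotting; translocations get a fixed size of 50."""
    return 50 if sv.svtype == "TRA" else abs(sv.end - sv.start)


calls = [SV("chr1", 1000, 2000, "DEL"), SV("chr1", 5000, 6000, "DEL")]
truth = [SV("chr1", 1500, 2400, "DEL")]
print(count_in_truth(calls, truth))  # 1
```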

Extended Data Fig. 10 Metagenomic Bacterial Sequencing Distribution.

(a) Heatmap showing the distribution of read counts per library (rows) by GC content (columns) across bacterial genomes and the metagenomic mixture. Read count values are normalized by total reads per replicate, such that a value of 1 equals the maximum value for a given replicate. Annotation tracks on the right indicate the sequencing platform and cell line genome for that replicate. (b) Calculations of entropy per genome/metagenomic mixture. Entropy was measured across all GC windows for all replicates for a given sample, as rowSums(-(p * log(p))).
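The R expression computes the Shannon entropy of each row of a proportion matrix. Below is a NumPy sketch of the same quantity. It assumes that p holds each replicate's fraction of reads per GC window, which the caption does not define explicitly. It uses the natural logarithm, which is R's default for `log`. Empty windows contribute 0, following the convention 0 · log 0 = 0. Without that guard they would produce NaN.

```python
import numpy as np


def gc_entropy(counts) -> np.ndarray:
    """Shannon entropy (natural log) of each replicate's read distribution over GC windows."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum(axis=1, keepdims=True)  # per-row proportions (assumed meaning of p)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, -p * np.log(p), 0.0)  # treat 0 * log(0) as 0
    return terms.sum(axis=1)


# A flat distribution over 4 windows gives log(4); a single occupied window gives 0.
print(gc_entropy([[1, 1, 1, 1], [0, 5, 0, 0]]))
```

Higher values indicate reads spread more evenly across GC windows.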

Supplementary information

Supplementary Information

Supplementary Methods, Results and Tables 1–9.

Reporting Summary

Supplementary Data 1

FastQC reports.

Supplementary Data 2

Satellite regions.

About this article

Cite this article

Foox, J., Tighe, S.W., Nicolet, C.M. et al. Performance assessment of DNA sequencing platforms in the ABRF Next-Generation Sequencing Study. Nat Biotechnol 39, 1129–1140 (2021). https://doi.org/10.1038/s41587-021-01049-5

Received: 31 July 2020
Accepted: 05 August 2021
Published: 09 September 2021
Issue Date: September 2021
DOI: https://doi.org/10.1038/s41587-021-01049-5