
Published February 29, 2024 | Version v1
Dataset Open

Variant Calls for the Inbred Nachman Mouse Strains

  • 1. Jackson Laboratory
  • 2. University of California, Berkeley

Description

SNV calling: 

Reads were mapped to the GRCm39 reference sequence using minimap2 with the “HIFI” preset. Per-sample single-nucleotide variant calling was performed using DeepVariant (v1.2.0) under the “PACBIO” model. Per-sample gVCF files were then merged using glnexus (v1.2.7) under the DeepVariantWGS configuration to produce a joint call set. Sites with missing data, sites with genotype quality <30, and indels were subsequently removed using bcftools (v0.1.19). We further eliminated sites with heterozygous calls, as these sites are potentially enriched for false positives given our modest sequencing coverage. Variant effects were predicted using the Ensembl Variant Effect Predictor (release 109.3) with the GRCm39 Mus musculus assembly.

Unfiltered SNV calls: hifi.deepvariant.jointGenotyping.vcf.gz

Filtered and annotated SNV calls: hifi.deepvariant.jointGenotyping.filtered.vep.vcf.gz
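The site-level filters described under SNV calling can be illustrated with a minimal Python sketch. This is not the bcftools command used to build the dataset; it only restates the four criteria (no indels, no missing genotypes, no heterozygous calls, genotype quality of at least 30) on a single VCF data line. The function name keep_site, the min_gq parameter, and the decision to reject a site when any single sample fails a criterion or lacks a GQ value are assumptions made for illustration.

```python
def keep_site(vcf_line, min_gq=30):
    """Return True if a tab-delimited VCF data line passes all four filters.

    Assumption: a site is dropped as soon as any one sample fails a filter.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alts = fields[3].upper(), fields[4].upper().split(",")
    bases = {"A", "C", "G", "T"}

    # Indel filter: REF and every ALT allele must be a single nucleotide.
    if ref not in bases or any(alt not in bases for alt in alts):
        return False

    fmt = fields[8].split(":")
    gt_idx = fmt.index("GT")
    gq_idx = fmt.index("GQ") if "GQ" in fmt else None

    for sample in fields[9:]:
        values = sample.split(":")
        alleles = values[gt_idx].replace("|", "/").split("/")

        # Missing-data filter: no allele may be uncalled.
        if "." in alleles:
            return False
        # Heterozygosity filter: all alleles in the genotype must be identical.
        if len(set(alleles)) > 1:
            return False
        # Genotype-quality filter (assumption: an absent GQ counts as a failure).
        if gq_idx is None or gq_idx >= len(values) or values[gq_idx] == ".":
            return False
        if float(values[gq_idx]) < min_gq:
            return False

    return True
```

Applied line by line to the unfiltered call set, a function like this would retain only homozygous, fully genotyped SNV sites; the published filtered file should be treated as the reference result.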

SV calling: 

We identified SVs in the Nachman wild-derived inbred strain genomes using both pbsv (https://github.com/PacificBiosciences/pbsv) and sniffles2 (v. 2.0.7). pbsv was first run on each sample in discover mode to identify read signatures consistent with possible SVs. SVs were then called and samples jointly genotyped by executing pbsv in call mode. Tandem repeats in the GRCm39 assembly were identified using the findTandemRepeats.py script (https://github.com/PacificBiosciences/pbsv/commit/bcec7d382f3ea40158ed9cca3c5fef9686a76641) and supplied when executing sniffles2 to improve the accuracy of calls in repetitive regions. Per-sample SV calls generated by sniffles2 were merged and filtered to include only autosomal calls using bcftools merge (v. 0.1.19). Calls with close or overlapping breakpoints across samples were collapsed using truvari (v4.0.0), with the following parameters specified: --pctsize 0.75 --pctovl 0.5 --pctseq 0.7 -s 20 -S 10000000 -k common --chain. We then intersected the pbsv and sniffles2 SV calls using truvari bench to produce a higher-confidence call set. We used the pbsv call set as the “truth” set and invoked the following command-line parameters: --pctsize 0.75 --pctovl 0.5 --pctseq 0.7 --dup-to-ins --passonly --sizemin 20. SVs were annotated using the Ensembl Variant Effect Predictor (release 109.3) and gene model annotations from the GRCm39 assembly.

pbsv calls: NachmanInbred.pbsv.filtered.vcf.gz

sniffles2 SV calls: NachmanSV.sniffles.vcf.gz

pbsv - sniffles2 merged SV calls, with VEP annotations: Nachman.pbsv.sniffles.mergeSet.vep.vcf.gz
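To make the size and overlap thresholds quoted under SV calling more concrete, the following Python sketch shows one common way such criteria are evaluated for a pair of calls. It is not truvari's implementation: the SV tuple, the function names, and the simplification of treating each call as a reference interval (which suits deletion-like events but not insertions) are assumptions for illustration, and the sequence-similarity criterion (pctseq 0.7) is not modeled at all.

```python
from typing import NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal SV record: a half-open reference interval."""
    start: int
    end: int

    @property
    def size(self):
        return self.end - self.start


def size_similarity(a, b):
    """Ratio of the smaller to the larger SV size (1.0 means equal sizes)."""
    return min(a.size, b.size) / max(a.size, b.size)


def reciprocal_overlap(a, b):
    """Shared reference span divided by the longer of the two spans."""
    shared = max(0, min(a.end, b.end) - max(a.start, b.start))
    longest = max(a.size, b.size)
    return shared / longest if longest > 0 else 0.0


def candidate_match(a, b, pctsize=0.75, pctovl=0.5, sizemin=20):
    """Return True if two calls meet the size and overlap thresholds."""
    if a.size < sizemin or b.size < sizemin:
        return False
    return size_similarity(a, b) >= pctsize and reciprocal_overlap(a, b) >= pctovl
```

Under these assumptions, two deletions spanning 1000-2000 and 1100-2000 would be treated as the same event, while calls spanning 1000-2000 and 1800-2800 would not, because they share too little of their length.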

Files (3.5 GB)

MD5 checksum, size
md5:e5f8576934417aab415503714d2688d1, 1.3 GB
md5:89b0a30cf92bc6d515f6d49c365fc88e, 1.9 MB
md5:7b37106aa7011740444123fcd2821581, 1.9 GB
md5:234a75bca70e5bf5f780f13a7c2ff36e, 2.1 MB
md5:ba991ccbce4f0dadbd26a30e1a084e66, 93.6 MB
md5:827f7c1912937748dfc4604151160206, 978.6 kB
md5:b9e759c0d6054167b64f18da311f2d7b, 141.4 MB
md5:3ac7fa78b6f8085d998ee9885d017eec, 1.4 MB
md5:96501087b54778128702a1ff59a20180, 81.1 MB
md5:aece07a2929134c94435e27d5bb477b5, 970.4 kB
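A downloaded file can be checked against its listed MD5 checksum with a short Python sketch such as the one below. The function names md5_of_file and matches_checksum are hypothetical helpers, not part of the dataset. Because this listing does not pair checksums with file names, the caller has to supply the pairing, for example from the download page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:<hex digest>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

Reading in chunks keeps memory use flat even for the multi-gigabyte VCF files in this record.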

Additional details

Related works

Is described by
Preprint: 10.1101/2023.09.21.558738 (DOI)