
Sci Data. 2024; 11: 451.
Published online 2024 May 4. https://doi.org/10.1038/s41597-024-03289-x
PMCID: PMC11069530
PMID: 38704405

Chromosome-level genome assembly of Odontothrips loti Haliday (Thysanoptera: Thripidae)


Abstract

As the predominant pest of alfalfa, Odontothrips loti Haliday causes severe damage across the major alfalfa-growing regions of China. Its strong mobility and high fecundity allow populations to build up rapidly in the field and make it hard to control. Biological information and genomic resources for O. loti are limited, which hinders the development of novel pest management strategies. In this study, we constructed a chromosome-level reference genome assembly of O. loti with a genome size of 346.59 Mb and a scaffold N50 of 18.52 Mb. The assembly was anchored onto 16 chromosomes and contains 20,128 genes, of which 93.59% were functionally annotated. The quality of the genome is supported by 99.20% complete insecta_odb10 genes in the BUSCO analysis, 91.11% of short reads mapping to the reference genome, and a distribution of gene lengths consistent with that of other thrips. Our study provides the first genome for the genus Odontothrips, offering a genomic resource for further investigation of the evolution and molecular biology of O. loti and contributing to pest management.

Subject terms: Genome assembly algorithms, Agricultural genetics

Background & Summary

Odontothrips loti Haliday (Thysanoptera: Thripidae) is a destructive, oligophagous pest that mainly feeds on leguminous crops, particularly alfalfa (Medicago sativa L.)1,2. It is the predominant pest of alfalfa, and in North China, the major alfalfa-growing region, O. loti damages 70–100% of plants on average3,4. Thrips attack host plants throughout the plants' life cycle, causing them to wilt or stop growing and their leaves to dry out (Fig. 1). This not only leads to severe reductions in yield and forage quality but also exacerbates the spread of plant viruses5–7. Several features of thrips, such as small body size, cryptic behavior, and high fecundity, make them difficult to control.

Fig. 1

Odontothrips loti (a), alfalfa with O. loti damage (b) and without O. loti damage (c).

With the low cost of next-generation sequencing (NGS) technology, researchers can construct genome maps to identify functional genes related to virus transmission or pesticide resistance at the whole-genome level, understand how pesticide resistance and virus transmission mechanisms evolve, and control pests through gene regulation, making it possible to develop new pest management strategies8–15. Because the genetic information of O. loti is still largely unknown, we aimed to characterize it to support the development of novel control strategies for O. loti.

In this study, we present a high-quality chromosome-level genome of O. loti, obtained using a combination of Oxford Nanopore Technologies (ONT) long-read sequencing, Illumina short-read sequencing and chromosome conformation capture (Hi-C) technologies. Comparative genomic analysis was also performed on O. loti and fourteen other insect species to explore their phylogenetic relationships and genomic features. We provide the first genome assembly for a thrips species in the genus Odontothrips, to facilitate a better understanding of thrips genome evolution and the development of novel control strategies for this important alfalfa pest.

Methods

Sample preparation

Odontothrips loti individuals were initially collected from an alfalfa field at the Shangzhuang Experimental Station of China Agricultural University (40°8’15”N, 116°11’18”E). A colony was established and maintained in the laboratory for approximately 10 generations on ‘Zhongmu No.1’ alfalfa at 25 ± 1 °C, 65 ± 5% relative humidity, and a 16 h:8 h light:dark cycle. The developmental stages of the thrips were examined under a light microscope. Individuals were collected, flash frozen in liquid nitrogen, and stored at −80 °C until use. Detailed sampling information for O. loti is shown in Table 1.

Table 1

Sample information of Odontothrips loti in this study.

| Sample | Nymph/Adult | Sex | Number of thrips |
|---|---|---|---|
| DNA for survey | Adult | Female | 1 |
| DNA for assembly | Adult | Female and male | 800 |
| DNA for Hi-C | Adult | Female and male | 800 |
| RNA for annotation | Nymph and adult | Female and male | 240 |

Genomic DNA sequencing

For Illumina short-read sequencing, genomic DNA was isolated from a single adult female following Chen’s protocol16: briefly, sodium dodecyl sulfate (SDS) and proteinase K digestion, followed by phenol-chloroform extraction. The library (150 bp inserts) was constructed with the Nextera DNA Flex Library Prep Kit (Illumina, San Diego, CA, USA) and sequenced on the Illumina NovaSeq 6000 (Illumina, San Diego, CA, USA), generating 43.66 Gb of raw data as 150 bp paired-end reads. Adapters and low-quality reads were removed with Fastp (v0.21.0)17 using default parameters, yielding 42.05 Gb (~123× coverage) of clean data (Table 2). The short-read data were used for the genome survey and for assembly polishing.

Table 2

Library sequencing data and methods used in this study to assemble the Odontothrips loti genome.

| Sequencing strategy | Platform | Usage | Insert size | Clean data (Gb) | Coverage (×) |
|---|---|---|---|---|---|
| Short-reads | Illumina | Survey, assembly | 150 bp | 42.05 | 123 |
| Long-reads | Oxford Nanopore | Assembly | 10–20 kb | 39.63 | 116 |
| Hi-C | Illumina | Hi-C assembly | 150 bp | 31.78 | 93 |
| RNA-seq | Oxford Nanopore | Annotation | 1–15 kb | 10.24 | 30 |
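The coverage values in Table 2 follow from dividing each clean data volume by the genome size. The sketch below reproduces them under one assumption that the source does not state explicitly: that the denominator is the k-mer-based genome size estimate of 341,303,860 bp (Table 3). The variable and function names are illustrative only.

```python
# Sequencing depth = clean bases / genome size.
# Assumption: the denominator is the k-mer-based genome size estimate.
GENOME_SIZE_BP = 341_303_860

# Clean data volumes in gigabases, taken from Table 2.
clean_data_gb = {
    "Short-reads (Illumina)": 42.05,
    "Long-reads (ONT)": 39.63,
    "Hi-C (Illumina)": 31.78,
    "RNA-seq (ONT)": 10.24,
}


def coverage(gigabases: float, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Return the fold coverage for a data volume given in gigabases."""
    return gigabases * 1e9 / genome_size_bp


coverages = {name: round(coverage(gb)) for name, gb in clean_data_gb.items()}
for name, depth in coverages.items():
    print(f"{name}: ~{depth}x")
```

Rounded to whole numbers, these values agree with the coverage column of Table 2.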

For long-read genomic DNA sequencing, we used approximately 800 mixed-sex adult thrips. Genomic DNA was extracted using the SDS method16, and the DNA fragment size and degree of degradation were checked on a 0.7% agarose gel. The purity and concentration of the extracted DNA were determined with a NanoDrop One (Thermo Fisher Scientific). The library was constructed with the SQK-LSK109 kit (Oxford Nanopore Technologies, Oxford, UK) according to the manufacturer’s instructions and sequenced on the Oxford Nanopore PromethION platform (Oxford Nanopore Technologies, Oxford, UK). We obtained 41.19 Gb (~120× coverage) of raw long-read data with a mean read length of 6,182.26 bp (N50 = 16,150 bp). We then used Oxford Nanopore GUPPY (v0.3.0, https://timkahlke.github.io/LongRead_tutorials/BS_G.html) to remove reads with a quality score < 7, leaving 39.63 Gb (~116× coverage) of clean reads. The cleaned long-read data were used for contig-level genome assembly (Table 2).
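As a rough illustration of what a read-level quality cutoff of 7 means, the sketch below computes a read's mean quality by averaging per-base error probabilities and compares it with the threshold. This is not the Guppy implementation; it assumes Phred+33 encoded FASTQ quality strings and that the cutoff applies to the mean read quality, neither of which is specified in the source. The function names are hypothetical.

```python
import math


def mean_read_quality(quality_string: str, offset: int = 33) -> float:
    """Mean Phred quality of a read, averaged in error-probability space."""
    if not quality_string:
        raise ValueError("empty quality string")
    # Convert each quality character to its error probability 10^(-Q/10).
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    mean_error = sum(error_probs) / len(error_probs)
    return -10 * math.log10(mean_error)


def passes_filter(quality_string: str, min_quality: float = 7.0) -> bool:
    """True if the read's mean quality reaches the cutoff."""
    return mean_read_quality(quality_string) >= min_quality


# In Phred+33, '+' encodes Q10 and '#' encodes Q2.
print(passes_filter("++++++++"))
print(passes_filter("########"))
```

Averaging error probabilities rather than raw Phred scores keeps a few very poor bases from being hidden by many good ones.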

Hi-C library preparation and sequencing

The Hi-C sequencing library was prepared from 800 mixed-sex adult thrips. Samples were cross-linked with a 2% formaldehyde isolation buffer, and nuclei were then digested with DpnII (New England Biolabs, Beijing, CN). Biotinylated nucleotides were used to repair the ends, and the ligated DNA was sheared into fragments of 300–700 bp. The resulting Hi-C library was sequenced on the Illumina NovaSeq 6000 to produce 150 bp paired-end reads. After applying the same filtering criteria as for the short reads, a total of 31.78 Gb (~93× coverage) of clean data was generated to assist the chromosome-level assembly (Table 2).

ONT-Transcriptome sequencing

For ONT transcriptome sequencing, approximately 240 thrips, including nymphs and adults, were pooled for RNA extraction with the RNA Easy Fast Tissue/Cell Kit (Tiangen). A NanoDrop (Thermo Fisher Scientific) and a Qubit 3.0 Fluorometer (Life Technologies, Carlsbad, CA, USA) were used to evaluate the quality of the extracted RNA. The SQK-PCS109 and SQK-PBK004 kits (Oxford Nanopore Technologies) were used for reverse transcription and cDNA library construction, and sequencing was performed on the PromethION sequencer (Oxford Nanopore Technologies, Oxford, UK). A total of 10.24 Gb of clean reads with a mean length of 1,034.61 bp (N50 = 1,238 bp) were generated and used to assist genome annotation (Table 2).

Estimation of genomic characteristics

Genomic characteristics were estimated from the 42.05 Gb of short-read data using a k-mer-based statistical analysis with Jellyfish (v2.3.0)18 and GenomeScope219 (p = 2, k = 19). Based on the 19-mer depth analysis, the genome size and heterozygosity were estimated to be 341.3 Mb and 1.49%, respectively (Fig. 2). The genome is therefore considered highly heterozygous.
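The principle behind k-mer-based size estimation can be sketched briefly: the genome size is approximately the total number of k-mer occurrences divided by the depth of the main peak of the k-mer depth histogram. The example below is a deliberately simplified illustration, not the model fitted by GenomeScope2, which additionally accounts for heterozygosity, repeats and sequencing errors. The toy histogram and the `min_depth` error cutoff are invented for illustration.

```python
def estimate_genome_size(histogram: dict[int, int], min_depth: int = 5) -> float:
    """Estimate genome size from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors and ignored.
    """
    solid = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers, taken as the main peak.
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences among the retained k-mers.
    total_kmers = sum(depth * count for depth, count in solid.items())
    return total_kmers / peak_depth


# Invented toy histogram: error k-mers at depth 1-2, main peak at depth 20.
toy_histogram = {1: 5000, 2: 800, 18: 150, 19: 300, 20: 400, 21: 300, 22: 150}
size = estimate_genome_size(toy_histogram)
print(f"Estimated size: {size:.0f} k-mer positions")
```

In a highly heterozygous genome such as this one, a second peak appears at about half the main depth, which is one reason a fitted model is preferred over this simple ratio.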

Fig. 2

Characteristics of the Illumina short-read sequencing of the Odontothrips loti genome.

Genome assembly

Contig level assembly

We first used NextDenovo (v2.5.0)20 to generate a draft assembly, which was polished with two rounds of Racon (v1.4.11, https://github.com/lbcb-sci/racon) using the ONT long reads. Illumina reads were then mapped to the assembly using BWA (v0.7.17), and a further two rounds of contig polishing were performed with Pilon (v1.23)21. Because the genome is highly heterozygous, Purge_haplotigs (v1.0.4, https://github.com/skingan/purge_haplotigs_multiBAM) was applied to remove redundant haplotigs from the draft genome. The final contig-level genome was 346.58 Mb long, similar to the estimated size, with a contig N50 of 8.59 Mb (Table 3).

Table 3

Major indicators of the Odontothrips loti genome.

| Features | Values |
|---|---|
| Estimated genome size (bp) | 341,303,860 |
| Contig-level assembly size (bp) | 346,577,358 |
| Chromosome-level assembly size (bp) | 346,592,158 |
| Anchored to chromosome (bp) | 301,277,358 |
| Contig N50 (bp) | 8,588,564 |
| Scaffold N50 (bp) | 18,519,078 |
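The contig and scaffold N50 values in Table 3 use the standard definition: N50 is the length L such that sequences of length at least L together contain at least half of the assembled bases. A minimal sketch of that calculation, using invented toy lengths rather than the actual O. loti contigs:

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    Sequences are visited from longest to shortest; N50 is the length of the
    sequence at which the running total first reaches half of the assembly.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running total always reaches half")


# Invented toy assembly of five contigs (total 300, half 150).
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))
```

Because scaffolding joins contigs into longer sequences, the scaffold N50 of an assembly is at least as large as its contig N50, as seen in Table 3.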

Hi-C scaffolding

Low-quality raw reads (quality score < 20, length shorter than 30 bp) and adaptors were removed using Fastp (v0.21.0)17. The clean reads were then mapped to the contig assembly using HiCUP (v0.8.0)22, which filtered out unmapped reads, invalid pairs, dangling ends and duplicates resulting from PCR amplification. The valid read pairs were used by ALLHiC (v0.9.8)23 to cluster, order and orient the contigs. The interactions between contig pairs were converted into binary files by 3D-DNA24 and Juicer (v1.6)25. HiCExplorer (v3.6)26 was used to generate heat maps of contig interaction intensity and location, and Juicebox (v1.11.08)27 was subsequently used to review the assembly manually. The resulting chromosome-level genome was 346.59 Mb long with a scaffold N50 of 18.52 Mb (Table 3). Around 86.93% (301.28 Mb) of the genome bases were anchored onto 16 chromosomes (Fig. 3a), and most intra-genomic syntenic blocks were located in regions of low GC content (Fig. 3b).

Fig. 3

Heatmap of genome-wide Hi-C data and circular representation of the chromosomes of Odontothrips loti. (a) Heatmap of chromosome interactions in O. loti. The frequency of Hi-C interaction links is represented by colors, which range from yellow (low) to red (high). (b) Circos plot of the distribution of genomic elements in O. loti. The tracks indicate (i) chromosome length, (ii) gene density, (iii) transposable element (TE) density, and (iv) GC density. Center: intra-genomic syntenic blocks of O. loti. The densities of genes, TEs, and GC were calculated in 500 kb windows.

Predicting repeats

We used RepeatModeler (v.1.0.11, https://github.com/Dfam-consortium/RepeatModeler) to predict repeat sequences de novo. LTR_FINDER (vOfficial, -size 1000000 -time 300)28 and LTR_retriever (v2.9.0)29 were used to identify LTR sequences and remove redundancy. These two de novo libraries were combined with RepBase30 for further prediction with RepeatMasker (v4.0.9, -nolow -no_is -norna)31. RepeatProteinMask (-noLowSimple -pvalue 0.0001) was used for homology-based prediction. All results were merged and made non-redundant to give the final repeat set. In total, 115.26 Mb of repeat sequences were identified, accounting for 33.26% of the O. loti genome (Table 4). DNA transposons were the most abundant class (18.85% of the genome), followed by long terminal repeats (LTRs, 10.13%), long interspersed nuclear elements (LINEs, 3.45%) and short interspersed nuclear elements (SINEs, only 0.40%) (Table 4).

Table 4

Statistics of the repeat sequences annotation in Odontothrips loti genome.

| Type | Length (bp) | Percentage in genome (%) |
|---|---|---|
| DNA | 65,317,630 | 18.85 |
| LTR | 35,092,753 | 10.13 |
| LINE | 11,957,062 | 3.45 |
| SINE | 1,382,412 | 0.40 |
| Unknown | 14,723,706 | 4.25 |
| Total | 115,261,572 | 33.26 |
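The percentages in Table 4 are fractions of the whole genome, not of the repeat content. The short check below recomputes them, assuming the chromosome-level assembly size from Table 3 as the denominator (the source does not state which size was used). It also shows that the per-class lengths add up to more than the reported total, presumably because overlapping annotations are counted only once in the merged, non-redundant total.

```python
# Assumption: percentages are relative to the chromosome-level assembly size.
ASSEMBLY_SIZE_BP = 346_592_158

# Per-class repeat lengths in bp, taken from Table 4.
repeat_lengths_bp = {
    "DNA": 65_317_630,
    "LTR": 35_092_753,
    "LINE": 11_957_062,
    "SINE": 1_382_412,
    "Unknown": 14_723_706,
}
TOTAL_REPEATS_BP = 115_261_572  # reported non-redundant total (Table 4)

percentages = {
    name: round(100 * length / ASSEMBLY_SIZE_BP, 2)
    for name, length in repeat_lengths_bp.items()
}
total_percentage = round(100 * TOTAL_REPEATS_BP / ASSEMBLY_SIZE_BP, 2)
class_sum_bp = sum(repeat_lengths_bp.values())

print(percentages)
print(total_percentage)
print(class_sum_bp > TOTAL_REPEATS_BP)
```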

Protein-coding genes and functional predictions

We used a pipeline combining three strategies to annotate protein-coding genes: transcriptome-based prediction, homology-based prediction, and ab initio prediction. For transcriptome-based prediction, the ONT transcriptome reads were processed with NanoFilt (v2.8.0, -q 7 -l 100 -headcrop 30 -minGC 0.3)32, Pychopper (v2.7.2, https://github.com/epi2me-labs/pychopper), racon (v1.4.11, https://github.com/lbcb-sci/racon), minimap2 (v2.17-r941)33, stringtie (v2.1.4)34 and TransDecoder (v5.1.0, https://github.com/TransDecoder/TransDecoder) to predict protein-coding genes. For homology-based prediction, tblastn (v2.7.1)35 with an E-value cutoff of 1e-5 and Exonerate (v2.4)36 were used to predict gene structures by comparison with three closely related species (Megalurothrips usitatus, Thrips palmi, Frankliniella occidentalis) and the model species Drosophila melanogaster. Before ab initio prediction, repetitive elements across the whole genome were soft-masked. Augustus (v3.3.2)37, GenScan (v1.0)38 and GlimmerHMM (v3.0.4)39 were then used for de novo prediction. Finally, MAKER (v2.31.10)40 integrated the results of the three strategies with default weighting, producing a non-redundant gene set. Overall, 20,128 protein-coding genes were obtained (Table 5).
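In the homology-based step, alignment hits are retained only if their E-value does not exceed the 1e-5 cutoff. The sketch below shows such a filter applied after the search, assuming the hits are in the standard 12-column BLAST tabular format (`-outfmt 6`), where the E-value is the eleventh column. The output format actually used is not stated in the source, and the hit lines below are invented.

```python
E_VALUE_CUTOFF = 1e-5
EVALUE_COLUMN = 10  # 0-based index of 'evalue' in default BLAST tabular output


def filter_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Yield tab-separated BLAST hits whose E-value is <= cutoff."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[EVALUE_COLUMN]) <= cutoff:
            yield fields


# Invented hits. Columns: query, subject, % identity, alignment length,
# mismatches, gap opens, q.start, q.end, s.start, s.end, evalue, bit score.
toy_hits = [
    "protA\tchr1\t85.0\t200\t30\t0\t1\t200\t1000\t1599\t1e-50\t180",
    "protB\tchr2\t40.0\t50\t30\t2\t1\t50\t500\t649\t0.5\t25.0",
]
kept = [fields[0] for fields in filter_hits(toy_hits)]
print(kept)
```

In practice the same threshold can be passed to the search itself through the BLAST+ `-evalue` option, so a post hoc filter like this is mainly useful for tightening the cutoff later.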

Table 5

Statistics for the Odontothrips loti functionally annotated protein-coding genes.

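| Database | Number | Percentage (%) |
|---|---|---|
| Protein-coding genes | 20,128 | 100.00 |
| Annotated genes | 18,837 | 93.59 |
| Interproscan | 17,895 | 88.91 |
| NR | 16,363 | 81.29 |
| Uniprot | 16,241 | 80.69 |
| Pfam | 13,932 | 69.22 |
| GO | 12,229 | 60.76 |
| KEGG | 8,527 | 42.36 |
| Pathway | 4,801 | 23.85 |
| Unannotated genes | 1,291 | 6.41 |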

For functional annotation, protein sequences were aligned to the Non-Redundant protein (NR), Universal Protein (UniProt), Protein Families Analysis and Modeling (Pfam), Clusters of Orthologous Groups of proteins (COG), Kyoto Encyclopedia of Genes and Genomes (KEGG) and evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG) databases. Gene Ontology (GO) terms were obtained from UniProt. InterProScan (v5.52-86.0)41 was used to search for conserved sequences, motifs and domains. In total, 12,229 (60.76%) and 8,527 (42.36%) genes were annotated with GO terms and KEGG pathways, respectively. A total of 18,837 genes (93.59%) were annotated in at least one public database (Table 5).
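The "at least one public database" figure is a set union over the per-database results: a gene counts as annotated if it has a hit in any database. The toy sketch below illustrates this with invented gene identifiers, then applies the same bookkeeping to the two counts reported in Table 5 (20,128 protein-coding genes, 18,837 annotated).

```python
# Invented per-database hits for a toy gene set of five genes.
hits_by_database = {
    "NR": {"g1", "g2", "g3"},
    "Pfam": {"g2", "g4"},
    "KEGG": {"g1"},
}
all_genes = {"g1", "g2", "g3", "g4", "g5"}

# A gene is annotated if it appears in the union of all database hit sets.
annotated = set().union(*hits_by_database.values())
unannotated = all_genes - annotated
print(sorted(annotated), sorted(unannotated))

# The same bookkeeping applied to the counts reported in Table 5.
TOTAL_GENES = 20_128
ANNOTATED_GENES = 18_837
annotated_pct = round(100 * ANNOTATED_GENES / TOTAL_GENES, 2)
unannotated_count = TOTAL_GENES - ANNOTATED_GENES
print(annotated_pct, unannotated_count)
```

Because databases overlap heavily, the union is much smaller than the sum of the per-database counts in Table 5.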

Data Records

The assembled genome sequence and annotation data were deposited in Figshare42 and GenBank43. Raw data from Nanopore (CRR997575)44, Illumina (CRR997573)45 and Hi-C (CRR997574)46 genome sequencing and RNA-seq (CRR997576)47 were deposited in the Genome Sequence Archive (GSA, https://ngdc.cncb.ac.cn/gsa)48 under BioProject PRJCA022165.

Technical Validation

Genome quality assessment

We assessed the quality of the chromosome-level genome in three respects: continuity, consistency, and completeness. First, the scaffold N50 of O. loti is 18.52 Mb (Table 3), indicating the continuity of the genome. Second, we evaluated consistency by calculating the mapping rate and coverage of Illumina reads with BWA (v0.7.17)49: 91.11% of short reads aligned to the reference genome and covered 94.68% of it. Third, we used BUSCO (v4.1.4)50 to estimate the completeness of the chromosome-level genome by searching for the 1,367 BUSCO genes in insecta_odb10 (https://busco-data.ezlab.org/v5/data/lineages/). The results showed high completeness, with 99.2%, 99.2%, 95.6% and 94.4% complete genes found in the contig-level genome, chromosome-level genome, annotated gene set and protein-coding gene set, respectively (Fig. 4).

Fig. 4

Benchmarking of the completeness of the Odontothrips loti genome assembly and annotation, evaluated by BUSCO based on the insecta_odb10 database, which includes 1,367 genes. C: the number of complete genes, S: the number of complete and single-copy genes, D: the number of complete and duplicated genes, F: the number of incomplete (fragmented) genes, M: the number of missing genes.

Evaluation of gene prediction

To verify the accuracy and reliability of the gene prediction, we compared the distributions of gene length, CDS length, exon length and intron length in O. loti, D. melanogaster51 and four other related species (M. usitatus8, T. palmi12, F. occidentalis14, S. biformis13). The consistent trends among the thrips support the reliability of the annotated gene set of O. loti (Fig. 5).

Fig. 5

Comparison of the distributions of (a) gene length, (b) CDS length, (c) exon length and (d) intron length of annotated genes in Odontothrips loti, Drosophila melanogaster and four closely related species. The x-axis represents the length, and the y-axis represents the density of genes.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (no. 31971759 to B.L.), the Beijing Innovation Consortium of Modern Agricultural Industry Technology System (no. BAIC02-2024 to B.L.) and the Ningxia Province Sci-Tech Innovation Demonstration Program of High-Quality Agricultural Development and Ecological Conservation (no. NGSB-2021-15-04 to W.S.). We are grateful to Chaoyang Zhao (National Soil Dynamics Laboratory, USDA-ARS, Auburn, AL, USA) for guidance on improving the language of the manuscript. The bioinformatics analysis was supported by the High-performance Computing Platform of China Agricultural University.

Author contributions

B.L. conceived of this project. L.Y. and D.W. participated in the data analysis. L.Y., D.W., M.M., W.S., W.Y. and Z.R. collected the samples. L.Y. wrote the manuscript. L.Y. and B.L. revised the manuscript. All authors have read, revised, and approved the final manuscript for submission.

Code availability

All software and pipelines were executed according to the manuals and protocols of the published bioinformatic tools. The software versions and parameters are described in the Methods section. No custom code was used.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Liu Y, Luo Y, Du L, Ban L. Antennal Transcriptome Analysis of Olfactory Genes and Characterization of Odorant Binding Proteins in Odontothrips loti (Thysanoptera: Thripidae). Int J Mol Sci. 2023;24:5284. doi:10.3390/ijms24065284
2. Liu Y, Li J, Ban L. Morphology and Distribution of Antennal Sensilla in Three Species of Thripidae (Thysanoptera) Infesting Alfalfa Medicago sativa. Insects. 2021;12:81. doi:10.3390/insects12010081
3. Buhe T, Wang X. Breeding research of the variety of anti-thrips alfalfa. Multifunctional Grasslands in a Changing World, Volume II: XXI International Grassland Congress and VIII International Rangeland Congress, Hohhot, China. 2008;29 E 5 Y:5–5.
4. Li N, Song X, Wang X. The complete mitochondrial genome of Odontothrips loti (Haliday, 1852) (Thysanoptera: Thripidae). Mitochondrial DNA B Resour. 2019;5:7–8. doi:10.1080/23802359.2019.1693296
5. Wu S, et al. A decade of a thrips invasion in China: lessons learned. Ecotoxicology. 2018;27:1032–1038. doi:10.1007/s10646-017-1864-6
6. Li J, et al. Occurrence, Distribution, and Transmission of Alfalfa Viruses in China. Viruses. 2022;14:1519. doi:10.3390/v14071519
7. Li J, et al. RNA-seq reveals plant virus composition and diversity in alfalfa, thrips, and aphids in Beijing, China. Arch Virol. 2021;166:1711–1722. doi:10.1007/s00705-021-05067-1
8. Ma L, et al. Chromosome-level genome assembly of bean flower thrips Megalurothrips usitatus (Thysanoptera: Thripidae). Sci Data. 2023;10:252. doi:10.1038/s41597-023-02164-5
9. Bao W, Kataoka Y, Fukada K, Sonoda S. Imidacloprid resistance of melon thrips, Thrips palmi, is conferred by CYP450-mediated detoxification. J. Pestic. Sci. 2015;40:65–68. doi:10.1584/jpestics.D15-004
10. Shi, P. et al. Variable resistance to spinetoram in populations of Thrips palmi across a small area unconnected to genetic similarity. Evolutionary Applications 13, (2020).
11. Xue, B. & Sonoda, S. Resistance to cypermethrin in melon thrips, Thrips palmi (Thysanoptera: Thripidae), is conferred by reduced sensitivity of the sodium channel and CYP450-mediated detoxification. Applied Entomology and Zoology 47, (2012).
12. Guo S, et al. Chromosome-level assembly of the melon thrips genome yields insights into evolution of a sap-sucking lifestyle and pesticide resistance. Molecular Ecology Resources. 2020;20:1110–1125. doi:10.1111/1755-0998.13189
13. Hu Q, Ye Z, Zhuo J, Li J-M, Zhang C. A chromosome-level genome assembly of Stenchaetothrips biformis and comparative genomic analysis highlights distinct host adaptations among thrips. Commun Biol. 2023;6:1–10. doi:10.1038/s42003-023-05187-1
14. Rotenberg, D. et al. Genome-enabled insights into the biology of thrips as crop pests. BMC Biology 18, (2020).
15. Zhang, Z. et al. The Chromosome-Level Genome Assembly of Bean Blossom Thrips (Megalurothrips usitatus) Reveals an Expansion of Protein Digestion-Related Genes in Adaption to High-Protein Host Plants. Int J Mol Sci 24, (2023).
16. Chen H, Rangasamy M, Tan SY, Wang H, Siegfried BD. Evaluation of Five Methods for Total DNA Extraction from Western Corn Rootworm Beetles. PLoS ONE. 2010;5:e11963. doi:10.1371/journal.pone.0011963
17. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–i890. doi:10.1093/bioinformatics/bty560
18. Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27:764–770. doi:10.1093/bioinformatics/btr011
19. Ranallo-Benavidez TR, Jaron KS, Schatz MC. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun. 2020;11:1432. doi:10.1038/s41467-020-14998-3
20. Hu, J. et al. An efficient error correction and accurate assembly tool for noisy long reads. Preprint at 10.1101/2023.03.09.531669 (2023).
21. Walker BJ, et al. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLOS ONE. 2014;9:e112963. doi:10.1371/journal.pone.0112963
22. Wingett S, et al. HiCUP: pipeline for mapping and processing Hi-C data. F1000Res. 2015;4:1310. doi:10.12688/f1000research.7334.1
23. Zhang X, Zhang S, Zhao Q, Ming R, Tang H. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat. Plants. 2019;5:833–845. doi:10.1038/s41477-019-0487-8
24. Dudchenko O, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–95. doi:10.1126/science.aal3327
25. Durand NC, et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 2016;3:95–98. doi:10.1016/j.cels.2016.07.002
26. Wolff J, et al. Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization. Nucleic Acids Res. 2020;48:W177–W184. doi:10.1093/nar/gkaa220
27. Durand NC, et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst. 2016;3:99–101. doi:10.1016/j.cels.2015.07.012
28. Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007;35:W265–268. doi:10.1093/nar/gkm286
29. Ou S, Jiang N. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol. 2018;176:1410–1422. doi:10.1104/pp.17.01310
30. Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 2015;6:11. doi:10.1186/s13100-015-0041-9
31. Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 2009;Chapter 4:4.10.1–4.10.14.
32. De Coster W, D’Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018;34:2666–2669. doi:10.1093/bioinformatics/bty149
33. Li H. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics. 2016;32:2103–2110. doi:10.1093/bioinformatics/btw152
34. Kovaka S, et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 2019;20:278. doi:10.1186/s13059-019-1910-1
35. Camacho C, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. doi:10.1186/1471-2105-10-421
36. Slater GSC, Birney E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. 2005;6:31. doi:10.1186/1471-2105-6-31
37. Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008;24:637–644. doi:10.1093/bioinformatics/btn013
38. Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J Mol Biol. 1997;268:78–94. doi:10.1006/jmbi.1997.0951
39. Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23:673–679. doi:10.1093/bioinformatics/btm009
40. Holt C, Yandell M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics. 2011;12:491. doi:10.1186/1471-2105-12-491
41. Blum M, et al. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res. 2021;49:D344–D354. doi:10.1093/nar/gkaa977
42. Luo Y. 2024. Chromosome-level reference genome assembly of O. loti. figshare.
43. Luo Y, Ban L. 2024. Chromosome-level genome assembly of Odontothrips loti Haliday (Thysanoptera: Thripidae). GenBank. JAZGLN000000000
44. 2024. NGDC Genome Sequence Archive (GSA) https://ngdc.cncb.ac.cn/gsa/browse/CRA014018/CRR997575
45. 2024. NGDC Genome Sequence Archive (GSA) https://ngdc.cncb.ac.cn/gsa/browse/CRA014018/CRR997573
46. 2024. NGDC Genome Sequence Archive (GSA) https://ngdc.cncb.ac.cn/gsa/browse/CRA014018/CRR997574
48. Chen T, et al. The Genome Sequence Archive Family: Toward Explosive Data Growth and Diverse Data Types. Genomics, Proteomics & Bioinformatics. 2021;19:578–583. doi:10.1016/j.gpb.2021.08.001
49. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at 10.48550/arXiv.1303.3997 (2013).
50. Seppey M, Manni M, Zdobnov EM. BUSCO: Assessing Genome Assembly and Annotation Completeness. Methods Mol Biol. 2019;1962:227–245. doi:10.1007/978-1-4939-9173-0_14
51. Hoskins RA, et al. The Release 6 reference sequence of the Drosophila melanogaster genome. Genome Res. 2015;25:445–458. doi:10.1101/gr.185579.114

Articles from Scientific Data are provided here courtesy of Nature Publishing Group
