Gene Expression Datasets for Two Versions of the Saccharum spontaneum AP85-441 Genome
Figure 1. Representation of the workflow, input files, processing steps, and outputs (i.e., databases, intermediate files, and matrices) used to combine the v2018 and v2019 datasets into a single consolidated dataset.
Figure 2. Histogram contour of the expression data (FPKM) of v2018 and v2019: (left) allele expression and (right) gene expression.
Figure 3. Boxplot of the distribution of expression values for allele (v2018a, v2019a) and gene (v2018g, v2019g) data. Vertical magnitudes are displayed on a logarithmic axis: (top) including all expression values and (bottom) including only expression values greater than 0.001.
Abstract
1. Summary
2. Data Description
3. Methods
3.1. Data Sources
3.2. Pre-Processing
3.3. Gene Expression Consolidation
3.3.1. Blast
3.3.2. Optimized Matching
- Each CDS in v2018 is represented as a node u in the group S of sources.
- Each CDS in v2019 is represented as a node v in the group T of targets.
- If node u and node v appear as a match in either of the mappings, then they are connected by an edge (u, v) with capacity 1 and a cost corresponding to the highest BLAST identity value between them.
- An additional source node is added and connected to each node u in S by an edge with capacity 1 and cost 0, ensuring that each CDS in v2018 can be used at most once.
- An additional target node is added, and each node v in T is connected to it by an edge with infinite capacity and cost 0, ensuring that each CDS in v2019 can be used several times.
- Finally, a min-cost max-flow algorithm is executed, taking the additional source node as the source and the additional target node as the sink. As a result, most nodes in S are used and each node in T has at least one possible incoming connection.
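The construction described in this list can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`min_cost_max_flow`, `match_cds`), the toy CDS identifiers, and the input layout are hypothetical. In particular, the edge cost is assumed here to be 100 minus the BLAST identity percentage (scaled to an integer), so that a higher identity yields a lower cost; the exact cost expression should be checked against the published code. The flow is computed with successive shortest paths, one of several standard min-cost max-flow methods.

```python
from collections import defaultdict

INF = float("inf")


def min_cost_max_flow(arcs, source, sink):
    """Successive shortest paths (Bellman-Ford on the residual graph).

    arcs: list of (tail, head, capacity, cost) tuples.
    Returns the flow carried by each arc, in input order.
    """
    adj = defaultdict(list)  # node -> indices into `res`
    res = []                 # residual arcs stored as [head, capacity, cost]
    for tail, head, cap, cost in arcs:
        adj[tail].append(len(res))
        res.append([head, cap, cost])
        adj[head].append(len(res))   # reverse arc lives at index ^ 1
        res.append([tail, 0, -cost])

    while True:
        dist, prev = {source: 0}, {}
        changed = True
        while changed:               # relax until no distance improves
            changed = False
            for u in list(dist):
                for i in adj[u]:
                    head, cap, cost = res[i]
                    if cap > 0 and dist[u] + cost < dist.get(head, INF):
                        dist[head] = dist[u] + cost
                        prev[head] = i
                        changed = True
        if sink not in dist:         # no augmenting path left
            break
        path, node = [], sink
        while node != source:        # walk back from sink to source
            path.append(prev[node])
            node = res[prev[node] ^ 1][0]
        push = min(res[i][1] for i in path)
        for i in path:
            res[i][1] -= push
            res[i ^ 1][1] += push

    # The flow on arc k equals the capacity accumulated on its reverse arc.
    return [res[2 * k + 1][1] for k in range(len(arcs))]


def match_cds(blast_hits):
    """blast_hits: iterable of (cds_v2018, cds_v2019, identity_percent),
    pooled from both BLAST mappings. Returns {cds_v2018: cds_v2019}."""
    best = {}
    for u, v, identity in blast_hits:
        # Keep the highest identity seen for each (u, v) pair.
        best[(u, v)] = max(identity, best.get((u, v), 0.0))

    src, snk = ("source",), ("sink",)
    pairs = sorted(best)
    arcs = []
    for u, v in pairs:
        # ASSUMPTION: cost = 100 - identity, as an integer in 1/100 units.
        cost = round((100.0 - best[(u, v)]) * 100)
        arcs.append((("S", u), ("T", v), 1, cost))
    for u in sorted({u for u, _ in pairs}):
        arcs.append((src, ("S", u), 1, 0))      # each v2018 CDS used once
    for v in sorted({v for _, v in pairs}):
        arcs.append((("T", v), snk, INF, 0))    # v2019 CDS reusable

    flows = min_cost_max_flow(arcs, src, snk)
    # The first len(pairs) arcs are the match edges; zip stops there.
    return {u: v for (u, v), f in zip(pairs, flows) if f > 0}


# Toy input (hypothetical identifiers and identity values).
hits = [("a1", "b1", 99.5), ("a2", "b1", 98.0),
        ("a2", "b2", 97.0), ("a3", "b2", 90.0)]
print(match_cds(hits))  # {'a1': 'b1', 'a2': 'b1', 'a3': 'b2'}
```

In the toy input, a2 matches both b1 and b2. The flow assigns it to b1, the higher-identity match, while b1 is also used by a1, which illustrates that a v2019 CDS can receive several v2018 CDSs.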
3.3.3. Gene Expression Consolidation
3.4. Metadata
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
CDS | Coding DNA Sequence |
ASE | Allele-Specific Expression |
CSV | Comma-Separated Value |
FPKM | Fragments Per Kilobase of exon per Million mapped fragments |
OMICAS | Optimización Multiescala In-silico de Cultivos Agrícolas Sostenibles |
References
- Henry, R.J.; Kole, C. Basic information on the sugarcane plant. In Genetics, Genomics and Breeding of Sugarcane, 1st ed.; CRC Press: Boca Raton, FL, USA, 2010; Volume 9, pp. 1–8.
- Kim, C.; Wang, X.; Lee, T.H.; Jakob, K.; Lee, G.J.; Paterson, A.H. Comparative analysis of Miscanthus and Saccharum reveals a shared whole-genome duplication but different evolutionary fates. Plant Cell 2014, 26, 2420–2429.
- Zhang, J.; Zhang, X.; Tang, H.; Zhang, Q.; Hua, X.; Ma, X.; Zhu, F.; Jones, T.; Zhu, X.; Bowers, J.; et al. Allele-defined genome of the autopolyploid sugarcane Saccharum spontaneum L. Nat. Genet. 2018, 50, 1565–1573.
- Saccharum Genome Database. 2018. Available online: http://sugarcane.zhangjisenlab.cn/sgd/html/download.html (accessed on 7 August 2022).
- The Ming Laboratory, Saccharum Spontaneum AP85-441 Genome. 2019. Available online: https://www.life.illinois.edu/ming/downloads/Spontaneum_genome/ (accessed on 7 August 2022).
- Cai, M.; Lin, J.; Li, Z.; Lin, Z.; Ma, Y.; Wang, Y.; Ming, R. Allele specific expression of Dof genes responding to hormones and abiotic stresses in sugarcane. PLoS ONE 2020, 15, 1–24.
- Ma, P.; Yuan, Y.; Shen, Q.; Jiang, Q.; Hua, X.; Zhang, Q.; Zhang, M.; Ming, R.; Zhang, J. Evolution and Expression Analysis of Starch Synthase Gene Families in Saccharum spontaneum. Trop. Plant Biol. 2019, 12, 158–173.
- Lin, J.; Zhu, M.; Cai, M.; Zhang, W.; Fatima, M.; Jia, H.; Li, F.; Ming, R. Identification and Expression Analysis of TCP Genes in Saccharum spontaneum L. Trop. Plant Biol. 2019, 12, 206–218.
- Li, Z.; Hua, X.; Zhong, W.; Yuan, Y.; Wang, Y.; Wang, Z.; Ming, R.; Zhang, J. Genome-Wide Identification and Expression Profile Analysis of WRKY Family Genes in the Autopolyploid Saccharum spontaneum. Plant Cell Physiol. 2019, 61, 616–630.
- Li, P.; Chai, Z.; Lin, P.; Huang, C.; Huang, G.; Xu, L.; Deng, Z.; Zhang, M.; Zhang, Y.; Zhao, X. Genome-wide identification and expression analysis of AP2/ERF transcription factors in sugarcane (Saccharum spontaneum L.). BMC Genom. 2020, 21, 685.
- Feng, X.; Wang, Y.; Zhang, N.; Zhang, X.; Wu, J.; Huang, Y.; Ruan, M.; Zhang, J.; Qi, Y. Systematic Identification, Evolution and Expression Analysis of the SPL Gene Family in Sugarcane (Saccharum spontaneum). Trop. Plant Biol. 2021, 14, 313–328.
- Ali, A.; Javed, T.; Zaheer, U.; Zhou, J.R.; Huang, M.T.; Fu, H.Y.; Gao, S.J. Genome-Wide Identification and Expression Profiling of the bHLH Transcription Factor Gene Family in Saccharum spontaneum Under Bacterial Pathogen Stimuli. Trop. Plant Biol. 2021, 14, 283–294.
- Trujillo-Montenegro, J.H.; Cubillos, M.J.R.; Loaiza, C.D.; Quintero, M.; Espitia-Navarro, H.F.; Villareal, F.A.S.; Valens, C.A.V.; Barrios, A.F.G.; Vega, J.D.; Duitama, J.; et al. Unraveling the genome of a high yielding colombian sugarcane hybrid. Front. Plant Sci. 2021, 12, 694859.
- Souza, G.M.; Sluys, M.A.V.; Lembke, C.G.; Lee, H.; Margarido, G.R.A.; Hotta, C.T.; Gaiarsa, J.W.; Diniz, A.L.; Oliveira, M.D.M.; Ferreira, S.D.S.; et al. Assembly of the 373k gene space of the polyploid sugarcane genome reveals reservoirs of functional diversity in the world’s leading biomass crop. GigaScience 2019, 8, giz129.
- Margarido, G.R.A.; Correr, F.H.; Furtado, A.; Botha, F.C.; Henry, R.J. Limited allele-specific gene expression in highly polyploid sugarcane. Genome Res. 2022, 32, 297–308.
- Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410.
- Zhu, T.; Wang, L.; Rimbert, H.; Rodriguez, J.C.; Deal, K.R.; Oliveira, R.D.; Choulet, F.; Keeble-Gagnère, G.; Tibbits, J.; Rogers, J.; et al. Optical maps refine the bread wheat Triticum aestivum cv. Chinese Spring genome assembly. Plant J. 2021, 107, 303–314.
- Wang, Y.; Tang, H.; Debarry, J.D.; Tan, X.; Li, J.; Wang, X.; Lee, T.H.; Jin, H.; Marler, B.; Guo, H.; et al. MCScanX: A toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012, 40, e49.
- Bray, N.L.; Pimentel, H.; Melsted, P.; Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 2016, 34, 525–527.
- Cormen, T.H.; Leiserson, C.E.; Rivest, R.L.; Stein, C. Introduction to Algorithms; MIT Press: Cambridge, MA, USA, 2022.
Matrix Name | Number of Alleles/Genes | Reference |
---|---|---|
Alleles v2018 | 112,788 | [3] |
Genes v2018 | 109,050 | This data descriptor |
Alleles v2019 | 83,821 | This data descriptor |
Genes v2019 | 35,516 | This data descriptor |
Matrix Name | Max Value | Count (All Values) | Median (All Values) | Count (Values > 0.001) | Median (Values > 0.001) |
---|---|---|---|---|---|
Alleles v2018 | 28,579 | 10,827,650 | 0.300 | 6,809,283 | 1.720 |
Genes v2018 | 28,579 | 10,468,800 | 0.320 | 6,621,587 | 1.761 |
Alleles v2019 | 20,135 | 8,046,816 | 0.127 | 4,460,747 | 2.117 |
Genes v2019 | 32,282 | 3,409,536 | 0.744 | 2,277,266 | 3.963 |
Specifications | Description |
---|---|
Subject area | Biological science, computer science |
More specific subject area | Bioinformatics, Genomics, Sugarcane, Expression analysis |
Type of data | Data spreadsheets, plain text, Python code |
How data was acquired | Compiled from open access databases and websites |
Data source location | Global |
Data accessibility | The data presented in this article are freely and publicly available for any academic, educational, and research purpose. The public repository is located at https://github.com/mauriciogeteg/sugarcane-gene-expression (accessed on 3 December 2022) |
Folders included in the dataset | 4 (blast_results; Codes; figures; inputs) |
Files included as inputs | 7 (Allele_info_2018.csv; Allele_info_2019.csv; circadian.7z; growth.xlsx; Leaf_Section.7z; Sspon.v20180123.cds.fasta.7z; Sspon.v20190103.cds.fasta.7z) |
Files included as blast_results | 2 (blastn_2018_2019.txt; blastn_2019_2018.txt) |
Files included as figures | 3 (Figure1.pdf; Figure2.pdf; Figure3.pdf) |
Codes included in the dataset | 3 (allele_transform.py; condense_alleles_18.py; condense_alleles_19.py) |
Expression matrices | 4 (Sspon18_allele_.7z; Sspon18_gene_.7z; 2 compressed matrices in: Sspon19_.7z) |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
López-Rozo, N.; Ramirez-Castrillon, M.; Romero, M.; Finke, J.; Rocha, C. Gene Expression Datasets for Two Versions of the Saccharum spontaneum AP85-441 Genome. Data 2023, 8, 1. https://doi.org/10.3390/data8010001