Effect of RNA-Seq data normalization on protein interactome mapping for Alzheimer’s disease

Published: 01 April 2024

Abstract

High-throughput RNA sequencing brings a new perspective to the elucidation of the molecular mechanisms of diseases. Normalization is the first and most important step in RNA-Seq data analysis, and the appropriate method depends on the purpose of the analysis. Within-sample normalization methods (e.g., TPM) are preferred when genes in a sample are compared with each other, and between-sample normalization methods (e.g., DESeq2, TMM, voom) are used when the samples in a dataset are compared. Normalization approaches rescale the data and therefore affect the results of the analysis. Here, we selected two of the most commonly used Alzheimer’s disease RNA-Seq datasets, from the ROSMAP and Mayo Clinic cohorts, and mapped the differentially expressed genes onto the human protein interactome to discover disease-specific subnetworks. To this end, the raw count data were first processed with four different, commonly used RNA-Seq normalization methods (DESeq2, TMM, voom and TPM). Then, covariate adjustment was applied to the normalized data for gender, age at death and post-mortem interval. Each normalized dataset was separately mapped onto the human protein-protein interaction network in either covariate-adjusted or non-adjusted form. The normalization methods were compared by how well the discovered subnetworks captured known Alzheimer’s disease genes and genes associated with disease-related functional terms. Based on our results, covariate adjustment improves the normalized data by removing confounder effects. Covariate-adjusted TMM and covariate-adjusted DESeq2 performed better in both transcriptome datasets.
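To make the two preprocessing ideas in the abstract concrete, the following is a minimal sketch and not the authors' pipeline. It shows TPM as a within-sample normalization and one common way to adjust for covariates, namely subtracting the covariate effects estimated by ordinary least squares. The abstract does not say how the adjustment was carried out, so the regression approach, the function names `tpm` and `adjust_covariates`, and the genes-by-samples matrix layout are all assumptions made for illustration.

```python
import numpy as np


def tpm(counts, gene_lengths_bp):
    """Transcripts per million for a genes x samples matrix of raw counts.

    gene_lengths_bp holds one length (in bases) per gene (assumed input).
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float) / 1000.0
    # Reads per kilobase removes the gene-length bias within a sample.
    rate = counts / lengths_kb[:, None]
    # Rescale each sample (column) so that its values sum to one million.
    return rate / rate.sum(axis=0, keepdims=True) * 1e6


def adjust_covariates(expr, covariates):
    """Remove linear covariate effects from a genes x samples matrix.

    covariates is samples x k (e.g. numerically coded gender, age at death,
    post-mortem interval). Assumption: plain OLS, one model per gene.
    """
    expr = np.asarray(expr, dtype=float)
    cov = np.asarray(covariates, dtype=float)
    # Centering the covariates keeps each gene's mean level unchanged.
    cov_centered = cov - cov.mean(axis=0)
    design = np.column_stack([np.ones(cov.shape[0]), cov_centered])
    # beta has shape (1 + k) x genes; row 0 is the per-gene intercept.
    beta, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    covariate_part = cov_centered @ beta[1:]  # samples x genes
    return expr - covariate_part.T
```

In practice such an adjustment is usually applied to log-transformed values, and the disease status is often included in the model so that its effect is not removed along with the confounders; both choices are left out here to keep the sketch short.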

Highlights

RNA-Seq normalization methods are benchmarked via PPI networks for the first time.
Covariate adjustment leads to better representation of dysregulated AD mechanisms.
Covariate-adjusted TMM performed the best for two AD datasets among 8 alternatives.

Published In

Computational Biology and Chemistry  Volume 109, Issue C
Apr 2024
207 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. RNA-Seq
  2. Data normalization
  3. Alzheimer’s disease
  4. Covariate adjustment
  5. Protein-protein interactions
