research-article

Application of data augmentation techniques towards metabolomics

Published: 01 September 2022

Abstract

Niemann–Pick type C1 (NPC1) disease is a rare and debilitating neurodegenerative lysosomal storage disease (LSD). The metabolomics datasets available for predictive analysis of NPC1 patients often contain few samples and are severely unbalanced. To improve predictive capability and identify new biomarkers in an NPC1 disease urinary dataset, data augmentation (DA) techniques based on computational intelligence were employed to create synthetic samples, namely the addition of noise, oversampling techniques and conditional generative adversarial networks. The predictive capacity of these techniques was evaluated on a set of urine samples donated by 13 untreated NPC1 disease and 47 heterozygous (parental) carrier control participants. Predictions were also obtained using different machine learning classification models and partial least squares techniques. The results provide strong evidence for the ability of DA techniques to generate good quality synthetic data, showing increases of 20%–50% in sensitivity, 6%–30% in F1 score, and 0.3 (out of 1) in predictive capacity. Additionally, more conventional forms of multivariate data analysis were employed; applied to the synthetically augmented datasets, these allowed the detection of unusual urinary metabolite profiles and the identification of biomarkers. The results indicate that urinary metabolites such as the branched-chain amino acid valine, 3-aminoisobutyrate and quinolinate may be employable as valuable biomarkers for the diagnosis and prognostic monitoring of NPC1 disease.
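
To make two of the augmentation families named in the abstract concrete, the following is a minimal sketch of noise addition and a simplified SMOTE-style interpolation, used to bring a 13-sample minority class up to 47 samples. It is not the authors' implementation: the toy data, the noise level `sigma`, the helper names and the use of a single nearest neighbour (standard SMOTE picks among k neighbours) are all assumptions made for illustration.

```python
import random

def add_gaussian_noise(sample, sigma, rng):
    """Noise-based augmentation: return a copy of one feature vector
    with zero-mean Gaussian noise added to every feature."""
    return [x + rng.gauss(0.0, sigma) for x in sample]

def interpolate_neighbor(sample, minority, rng):
    """Simplified SMOTE-style augmentation: move a random fraction of the
    way from `sample` towards its nearest other minority-class sample."""
    others = [m for m in minority if m is not sample]
    nearest = min(
        others,
        key=lambda m: sum((a - b) ** 2 for a, b in zip(sample, m)),
    )
    gap = rng.random()  # fraction in [0, 1)
    return [a + gap * (b - a) for a, b in zip(sample, nearest)]

def balance(minority, n_majority, make_sample, rng):
    """Append synthetic minority samples until the class reaches n_majority."""
    augmented = list(minority)
    while len(augmented) < n_majority:
        augmented.append(make_sample(rng.choice(minority), rng))
    return augmented

rng = random.Random(0)
# Hypothetical toy data: 13 minority samples with 4 made-up features each.
npc1 = [[rng.gauss(1.0, 0.2) for _ in range(4)] for _ in range(13)]

noisy = balance(npc1, 47, lambda s, r: add_gaussian_noise(s, 0.05, r), rng)
smote_like = balance(npc1, 47, lambda s, r: interpolate_neighbor(s, npc1, r), rng)
print(len(noisy), len(smote_like))
```

Both helpers only ever draw from the real minority samples, so synthetic points are never used as seeds for further synthetic points. In practice, augmentation should be applied to the training split only, so that evaluation is performed on real samples.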

Highlights

Niemann–Pick type C is a very rare neurodegenerative lysosomal storage disease.
Niemann–Pick type C metabolomics datasets are often scarce, containing few samples.
Data augmentation (DA) techniques were applied to create additional synthetic samples.
Prediction performance shows a significant improvement in sensitivity (20%–50%).
DA techniques allow the identification of relevant urinary metabolomics biomarkers.
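
For readers less familiar with the metrics quoted above, this short sketch shows how sensitivity and F1 score are computed from confusion-matrix counts. The counts used here are hypothetical and are not results from the study.

```python
def sensitivity(tp, fn):
    """Fraction of true positive cases that the classifier detects (recall)."""
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical counts for a 13-patient positive class (not study results).
tp, fn, fp = 8, 5, 3
print(round(sensitivity(tp, fn), 3), round(f1_score(tp, fp, fn), 3))
```

With a small positive class, a single additional correctly classified patient shifts sensitivity by several percentage points, which is one reason why class imbalance matters for this kind of dataset.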

    Published In

    Computers in Biology and Medicine, Volume 148, Issue C
    Sep 2022
    1337 pages

    Publisher

    Pergamon Press, Inc.

    United States

    Author Tags

    1. Data augmentation
    2. Machine learning
    3. Metabolomics
    4. Niemann–Pick type C disease
    5. Rare diseases
