Abstract
Background: Feature extraction and signature identification are two critical steps in understanding diverse biological processes. Signatures are defined as groups of molecular features that are sufficient to identify a certain genotype or phenotype. In particular, Non-negative Matrix Factorization (NMF) has been used to identify signatures in complex genomic datasets. However, running even a basic NMF analysis is challenging, with a steep learning curve and long computing times; furthermore, the usability of these algorithms is limited by the scarcity of resources for interpreting their results. This creates a pressing need for tools that mitigate these obstacles.
Results: In this study we developed ButchR and ShinyButchR, a fast and user-friendly toolkit to decompose datasets (slicing genomics) and learn signatures using NMF. The package can be freely installed from GitHub at https://github.com/wurst-theke/ButchRr. We used ButchR to identify a new regulatory subtype in neuroblastoma, which showed mesenchymal characteristics and was phenotypically associated with multipotent Schwann cell precursors. Additionally, we created a new workflow to infer regulatory relationships between genes and their _cis_-regulatory elements for individual cells, followed by inference of regulatory signatures.
Conclusions: ButchR/ShinyButchR is a useful toolkit for analyzing multiple types of data and inferring signatures that capture relevant biological information. This toolkit is a valuable new resource for the scientific community and can be used to understand complex biological processes.
Document type: | Dissertation |
---|---|
Supervisor: | Brors, Prof. Dr. Benedikt |
Place of Publication: | Heidelberg |
Date of thesis defense: | 21 September 2021 |
Date Deposited: | 21 Oct 2021 09:38 |
Date: | 2021 |
Faculties / Institutes: | The Faculty of Bio Sciences > Dean's Office of the Faculty of Bio Sciences |
DDC-classification: | 004 Data processing Computer science; 500 Natural sciences and mathematics |
Controlled Keywords: | Non-negative matrix factorization, Bioinformatics, Cancer <Medicine>, Neuroblastoma, Genomics |