Many biological labs use RNA-seq or other high-throughput technologies to assess gene expression changes in their experimental systems. To produce these gene expression changes, regulatory machinery in the cell must have caused changes in transcription. Assessing the effect of each component of this regulatory machinery directly is often difficult. An open question, then, is how to use gene expression data to predict which regulatory factors, such as transcription factor proteins, were at play in producing those gene expression changes.
The goal of GarNet is to use gene expression and epigenetic data to impute transcription factors (TFs) that played an important role in a biological system. Transcription factors bind in open chromatin regions to specific DNA sequences called "motifs," and affect the expression levels of nearby genes. To determine which TFs were relevant to a biological system, you supply epigenetic regions (peaks) of interest (e.g. open chromatin regions derived from ATAC-seq or DNase-seq on your cells or a similar cell line) and differential gene expression data. GarNet:
- Looks for known TF motifs (derived from cisBP) that occur within your epigenetic regions.
- Looks for known genes (derived from RefSeq) that occur near your epigenetic regions.
- Maps the TFs and genes found near the same peaks to each other, since those TFs potentially affect the expression of those genes.
- For each TF, uses linear regression to test whether the change in expression level depends on the strength of the transcription factor binding motif.
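The first three steps above can be illustrated with a minimal sketch. This is not GarNet's implementation (GarNet delegates intersections to BedTools); it is a naive pure-Python version, and the record layouts, the `map_tfs_to_genes` helper, the 2000 bp `window`, and all coordinates and names are assumptions made up for illustration.

```python
from collections import namedtuple

# Hypothetical record layouts, not GarNet's file formats.
Peak = namedtuple("Peak", "chrom start end")
Motif = namedtuple("Motif", "chrom start end tf score")
Gene = namedtuple("Gene", "chrom tss name")


def map_tfs_to_genes(peaks, motifs, genes, window=2000):
    """Pair each TF motif inside a peak with each gene whose TSS lies near that peak.

    `window` (in bp) is an assumed definition of "near"; it is not a GarNet default.
    Returns a list of (tf, gene_name, motif_score) tuples.
    """
    pairs = []
    for peak in peaks:
        # Step 1: motifs fully contained in the peak.
        motifs_in_peak = [
            m for m in motifs
            if m.chrom == peak.chrom and m.start >= peak.start and m.end <= peak.end
        ]
        # Step 2: genes whose TSS falls within `window` bp of the peak.
        genes_near_peak = [
            g for g in genes
            if g.chrom == peak.chrom
            and peak.start - window <= g.tss <= peak.end + window
        ]
        # Step 3: link every TF to every gene that shares this peak.
        for m in motifs_in_peak:
            for g in genes_near_peak:
                pairs.append((m.tf, g.name, m.score))
    return pairs


# Toy data, invented for this example.
peaks = [Peak("chr1", 1000, 1500), Peak("chr2", 500, 900)]
motifs = [
    Motif("chr1", 1100, 1110, "TF_A", 0.8),
    Motif("chr1", 5000, 5010, "TF_B", 0.6),  # outside every peak, so ignored
    Motif("chr2", 600, 612, "TF_B", 0.4),
]
genes = [
    Gene("chr1", 2500, "GENE1"),
    Gene("chr2", 100000, "GENE2"),  # too far from the chr2 peak
    Gene("chr2", 1200, "GENE3"),
]

pairs = map_tfs_to_genes(peaks, motifs, genes)
for tf, gene, score in pairs:
    print(tf, gene, score)
```

The nested loops are quadratic in the number of records, which is acceptable for a toy example but is the reason real pipelines rely on an interval tool such as BedTools.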
If a transcription factor binding motif is found inside relevant epigenetic regions for this tissue type and near genes that change in expression, and the changes in gene expression depend significantly on the strength of that motif, we predict that the TF is likely an important player in gene expression in your system. We assign a score to that TF based on the significance and slope of the regression of gene expression change on motif strength.
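The regression idea can be sketched for a single TF as follows. This is a minimal illustration using only the standard library, not GarNet's code (GarNet lists statsmodels as a dependency). The `regress_expression_on_motif` helper and the numbers are hypothetical, and how GarNet combines slope and significance into its score is not shown here.

```python
import math


def regress_expression_on_motif(motif_scores, expression_changes):
    """Ordinary least squares fit of expression change on motif score.

    Returns (slope, intercept, t_statistic), where the t-statistic tests
    whether the slope differs from zero (n - 2 degrees of freedom).
    """
    n = len(motif_scores)
    if n != len(expression_changes) or n < 3:
        raise ValueError("need at least 3 paired observations")

    mean_x = sum(motif_scores) / n
    mean_y = sum(expression_changes) / n
    sxx = sum((x - mean_x) ** 2 for x in motif_scores)
    if sxx == 0:
        raise ValueError("motif scores must not all be identical")
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(motif_scores, expression_changes)
    )

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance gives the standard error of the slope.
    residual_ss = sum(
        (y - (intercept + slope * x)) ** 2
        for x, y in zip(motif_scores, expression_changes)
    )
    std_err = math.sqrt(residual_ss / (n - 2) / sxx)
    t_stat = slope / std_err if std_err > 0 else math.inf
    return slope, intercept, t_stat


# Invented values: motif strength near five genes, and each gene's
# log fold change in expression.
motif_scores = [0.1, 0.4, 0.5, 0.8, 0.9]
expression_changes = [0.2, 0.9, 1.1, 1.5, 2.0]

slope, intercept, t_stat = regress_expression_on_motif(motif_scores, expression_changes)
print(f"slope={slope:.3f} intercept={intercept:.3f} t={t_stat:.2f}")
```

A large positive or negative t-statistic indicates that expression change tracks motif strength; converting it to a p-value requires the t distribution, which a statistics library such as statsmodels provides.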
GarNet now uses BedTools for genomic intersection calculations. BedTools installation instructions are available here. For Mac users, we recommend:
brew tap homebrew/science
brew install bedtools
GarNet is a Python 3 package available via PyPI, so a simple
pip3 install garnet
should suffice. GarNet depends on the Python packages numpy, pandas, statsmodels, and pybedtools (and matplotlib and jinja2 for figures and reports).
GarNet has 3 public methods:
- construct_garnet_file, which builds a reference file of important genomic annotations to be mapped against.
- map_peaks, which maps a file of peaks against a "GarNet file".
- TF_regression, which, given a set of mapped peaks (e.g. from the previous function) and gene expression profiles (e.g. from RNA-Seq), will regress each transcription factor's binding scores against its downstream gene expression profile.
Specific documentation about each of the functions can be found here. An example workflow using GarNet can be found in the example folder.
This repository is an updated version of Garnet, originally written by Sara Gosline and Anthony Soltis as part of OmicsIntegrator.
This repository depends heavily on pandas and pybedtools.
We're very thankful for access to cisBP and RefSeq for our motif and genome data, upon which our analyses depend.