Tejaas: Discover trans-eQTLs

Description

Tejaas is a command line tool to find trans-eQTLs from eQTL data. It is released under the GNU General Public License version 3.

Tejaas is based on the hypothesis that a trans-eQTL should regulate the expression levels of multiple genes. In brief, it implements two statistical methods to find trans-eQTLs:

RR-score (Reverse Regression): It performs a multiple linear regression with L₂-regularization using expression levels of all genes to explain the genotype of a candidate SNP. In contrast to conventional methods, the direction of the regression is reversed, with the gene expressions as explanatory variables. RR-score is a statistic which estimates whether more genes are required to explain the allele counts of a SNP than expected by chance.
JPA-score (Joint P-value Analysis): It evaluates the distribution of p-values of the pairwise linear association of a candidate SNP with all available gene expression levels. Any null SNP (no trans-effect) will have a uniform distribution of p-values, while a trans-eQTL will be associated with more genes than expected by chance, leading to overdispersion near zero. The JPA-score is a statistic which estimates whether the distribution of p-values is significantly overdispersed near zero.

Additionally, it also implements a non-linear unsupervised confounder correction using k-nearest neighbors called KNN correction.

Dependencies

Python version 3.6 or higher
C compiler
Any linear algebra routine, e.g. Intel MKL, OpenBLAS, etc.
Any flavor of Message Passing Interface (MPI), e.g. OpenMPI, MPICH, etc.
Other Python libraries:
- NumPy / array operations
- SciPy / optimization and other special functions
- statsmodel / used for ECDF calculation in JPA-score
- Pygtrie / used for reading MAF file in RR-score / maf null
- mpi4py / linked to MPI and MKL for python parallelization
- scikit-learn / used for PCA decomposition in KNN correction

Installation

See the wiki for detailed installation instructions. Here is a quick start guide:

1. Install dependencies

Installation of Tejaas depends on other dependencies. If you are using conda, then they can be installed using

conda install numpy scipy statsmodels scikit-learn
pip install mpi4py

2. Install Tejaas

2a. Tejaas can be installed directly from the PyPI repository:

pip install tejaas

2b. You can also download this repository and build Tejaas:

git clone git@github.com:soedinglab/tejaas.git
cd tejaas
pip install -e .

3. Run Tejaas!

tejaas [OPTIONS]

See below for valid options or try tejaas --help.

Run an example to check installation

An example script example/run_example.sh is provided to check the installation.

cd tejaas/example
./run_example.sh <outdir> <ncpu>

The script will download some example input files in <outdir>/data and run Tejaas on <ncpu> cores. The output will be created in <outdir>/data. Note that the data subdirectory is automatically created within the <outdir> specified by the user. Check if the output is correct:

python compare_with_gold.py --outdir <outdir>

This will check if the output in <outdir>/data (again note that data subdirectory is automatically appended and need not be specified by the user) matches with the results provided in the example/gold subdirectory.

Input Files

Gene expression file
Genotype file
- VCF
- Oxford
- Dosage
GENCODE file
Population minor allele frequency

Tejaas [OPTIONS]

Option	Argument	Description	Priority	Default value
`--vcf`	`FILEPATH`	Input genotype file in vcf.gz format	Required (vcf or oxf)	--
`--oxf`	`FILEPATH`	Input genotype file in Oxford format	Required (vcf or oxf)	--
`--dosage`		Flag for reading dosage files. The file is specified with the `--oxf` option, e.g. `--oxf FILEPATH --dosage`	Optional	`False`
`--fam`	`FILEPATH`	Input fam file for samples names of Oxford genotype	Optional	--
`--chrom`	`INT`	Chromosome number of the genotype file	Required	--
`--include-SNPs`	`START:END`	Colon-separated index of SNPs to be included	Optional	--
`--gx`	`FILEPATH`	Input gene expression file for trans-eQTL discovery	Required	--
`--gxcorr`	`FILEPATH`	Input gene expression file for target gene discovery	Optional	`--gx` file
`--gxfmt`	`OPTION`	Input gene expression file format (see format details below). Supported options: `gtex`, `cardiogenics`, `geuvadis`	Optional	`gtex`
`--gtf`	`FILEPATH`	Input GTF file from GENCODE to read gene Ensembl IDs. Used for selecting biotypes and getting genomic locations.	Required	--
`--trim`		Flag to trim version number from GENCODE Ensembl IDs	Optional	`False`
`--biotype`	`OPTION`	Which biotypes to select from the GTF file. Supported options: `protein_coding`, `lncRNA`.	Optional	`protein_coding lncRNA`
`--outprefix`	`STRING`	Full path to output file names. The extensions are generated by Tejaas.	Optional	`out`
`--method`	`OPTION`	Name of method to run. Supported options: `jpa` or `rr`	Optional	`rr`
`--null`	`OPTION`	Null model to use for RR-score. Supported options: `perm` or `maf`	Optional	`perm`
`--cismask`		Flag to mask cis-Genes within a window for each candidate SNP. Gene positions are obtained from the GENCODE annotation file.	Optional	`False`
`--window`	`FLOAT`	Window (number of base pairs) used for masking cis genes	Optional	1e6
`--prior-sigma`	`FLOAT`	Standard deviation of the normal prior for reverse multiple linear regression	Optional	0.1
`--knn`	`INT`	Number of neighbours for KNN (use 0 if you do not want to use KNN)	Optional	0
`--psnpthres`	`FLOAT`	Target genes will be reported only for trans-eQTLs below this threshold p-value for RR/JPA-score	Optional	0.0001
`--pgenethres`	`FLOAT`	Target genes will be reported only if their association with trans-eQTLs are below this threshold p-value	Optional	0.05
`--jpanull`	`FILEPATH`	File containing list of null model JPA-scores	Optional	--
`--maf-file`	`FILEPATH`	Read minor allele frequency (MAF) of SNPs from this file, e.g. to read population MAF for `maf` null (see documentation for file format)	Optional	--
`--shuffle`		Flag to randomly shuffle the genotypes to obtain a null distribution	Optional	`False`
`--shuffle-with`	`FILEPATH`	Shuffle the genotypes in the same order of donor IDs specified in `FILEPATH`	Optional	--

Usage Examples

For quick start or installation check, run Tejaas with all default options:

tejaas --vcf ${VCFFILE} --chrom ${CHRM} --gx ${GXFILE} --gtf ${GTFFILE} --cismask --outprefix ${OUTPREFIX}

This will create RR-scores at γ=0.1 and masking all genes within 1Mb of each SNP. The p-values will be computed from the permuted null model. Default format for the gene expression is the same as the GTEx format, and default gtf file is the GENCODE v26 release. For target gene discovery, it will use the same file as used for trans-eQTL discovery.

Example of running Tejaas RR-score. We recommend using the perm null model for calcuting p-values from the RR-score and a separate confounder-corrected gene expression file for target gene discovery. In this example, RR-score is calculated for first 1000 SNPs excluding the first 20 --include-SNPs 21:1000. KNN correction is performed with 20 nearest neighbors --knn 20. All cis-genes within +2MB and -2Mb are masked during analysis --cismask --window 2e6. RR-score calculation uses a prior normal distribution with standard deviation of 0.05 --prior-sigma 0.05. The output reports target genes only for SNPs with p-value < 1e-6 --psnpthres 0.000001. Here, GXFILE is the raw gene expression file, GXCORRFILE is the confounder-corrected gene expression file, VCFFILE is the genotype file in .vcf.gz format and GTFFILE is the GENCODE annotation file.

mpirun -n 8 tejaas --vcf ${VCFFILE} --chrom ${CHRM} --include-SNPs 21:1000 --gx ${GXFILE} --gxcorr ${GXCORRFILE} \
                   --gxfmt gtex --gtf ${GTFFILE} --trim  --outprefix ${OUTPREFIX} \
                   --cismask --window 2e6 --psnpthres 0.000001 \
                   --knn 20 --method rr --null perm --prior-sigma 0.05

Example of running JPA-score with no KNN correction. Empirical p-values are calculated from the null scores loaded from NULLFILE specified by the --jpanull option. If NULLFILE does not exist, then it will create 100000 null scores and write them in the NULLFILE before calculating JPA-scores. If --jpanull option is not used, then p-values for the JPA-scores are calculated from an analytical construction of null model.

mpirun -n 8 tejaas --vcf ${VCFFILE} --chrom ${CHRM} --include-SNPs 1:100 \
                   --gx ${GXFILE} --gxfmt gtex --gtf ${GTFFILE} --outprefix ${OUTPREFIX} \
                   --knn 0 --method jpa --jpanull ${NULLFILE}

Example of parallelizing job submission.

NMAX=20000 # number of SNPs per job
for CHRM in $( seq 1 22 ); do
    VCFFILE="file_path_here_${CHRM}.vcf.gz"
    NTOT=$( calculate_no_of_SNPs_in_this_chromosome )
    NJOB=$( echo $(( (NTOT + NMAX - 1)/NMAX )) )
    for (( i=0; i < ${NJOB}; i++ )); do
        STARTSNP=$(( NMAX * i + 1 ))
        ENDSNP=$(( NMAX * (i + 1) ))
        if [ ${ENDSNP} -gt ${NTOT} ]; then
            ENDSNP=${NTOT}
        fi
        mpirun -n 8 tejaas --vcf ${VCFFILE} --chrom ${CHRM} --include-SNPs ${STARTSNP}:${ENDSNP} --gx ${GXFILE} --gxcorr ${GXCORRFILE} \
                           --gxfmt gtex --gtf ${GTFFILE} --trim  --outprefix ${OUTPREFIX} \
                           --cismask --psnpthres 0.000001 --knn 20 --method rr --null perm --prior-sigma 0.05
    done
done

Contributors

Version History

1.0.2 Bump release version

Name		Name	Last commit message	Last commit date
Latest commit History 132 Commits
devtools		devtools
example		example
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
configs.ini		configs.ini
pyproject.toml		pyproject.toml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Tejaas: Discover trans-eQTLs

Description

Dependencies

Installation

1. Install dependencies

2. Install Tejaas

3. Run Tejaas!

Run an example to check installation

Input Files

Tejaas [OPTIONS]

Usage Examples

Contributors

Version History

About

Releases

Packages

Contributors 3

Languages

License

soedinglab/tejaas

Folders and files

Latest commit

History

Repository files navigation

Tejaas: Discover trans-eQTLs

Description

Dependencies

Installation

1. Install dependencies

2. Install Tejaas

3. Run Tejaas!

Run an example to check installation

Input Files

Tejaas [OPTIONS]

Usage Examples

Contributors

Version History

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages