
Pan-genomics of metagenome-assembled genomes suffers from fragmentation and incompleteness


tli14/PanMAGs


PanMAGs

This repository provides code and files to reproduce the data and figures from the manuscript "Critical assessment of pan-genomics of metagenome-assembled genomes", by Tang Li and Yanbin Yin* (*corresponding author). The Python and shell scripts cover downloading genome data, simulating metagenome-assembled genomes (MAGs) from complete genomes, analyzing pan-genomes, performing Clusters of Orthologous Groups (COG) functional annotation, and comparing phylogenetic trees. The R code covers reformatting data, generating plots, and combining plots.
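To make the idea of simulating MAGs from complete genomes concrete, the sketch below fragments a sequence at random positions and then keeps only enough fragments to reach a target completeness. It is an illustration only, not the repository's simulation script: the function name simulate_mag, its parameters, and the random-cut strategy are all assumptions made for this example.

```python
import random

def simulate_mag(genome, n_cuts, completeness, seed=0):
    """Hypothetical helper: fragment `genome` and drop fragments at random.

    genome       -- complete genome sequence as a string
    n_cuts       -- number of random cut points (gives n_cuts + 1 fragments)
    completeness -- minimum fraction of bases to retain, between 0 and 1
    """
    rng = random.Random(seed)
    # Pick distinct cut positions strictly inside the sequence.
    cuts = sorted(rng.sample(range(1, len(genome)), n_cuts))
    bounds = [0] + cuts + [len(genome)]
    fragments = [genome[a:b] for a, b in zip(bounds, bounds[1:])]

    # Keep randomly ordered fragments until the target number of bases is reached.
    target = completeness * len(genome)
    rng.shuffle(fragments)
    kept, kept_len = [], 0
    for frag in fragments:
        if kept_len >= target:
            break
        kept.append(frag)
        kept_len += len(frag)
    return kept
```

Calling simulate_mag(sequence, 9, 0.8) returns at most 10 fragments that together cover at least 80% of the input bases. Because whole fragments are kept, the retained fraction can slightly exceed the requested completeness.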

Requirements

  • FastANI: Calculate Average Nucleotide Identity (ANI).
  • Prokka: Prokaryotic genome annotation.
  • BLAST+: Compare sequences against a database.
  • Roary: Pan-genome analysis.
  • Anvi'o: Pan-genome analysis.
  • BPGA: Pan-genome analysis.
  • FastTree: Phylogenetic tree construction.
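Before running the pipeline, it can help to confirm that the command-line tools above are on your PATH. The following is a minimal sketch; the executable names in TOOLS are assumptions that may differ between installations, and BPGA is omitted because it is typically distributed as a standalone package rather than a command on PATH.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
TOOLS = {
    "FastANI": "fastANI",
    "Prokka": "prokka",
    "BLAST+": "blastp",
    "Roary": "roary",
    "Anvi'o": "anvi-pan-genome",
    "FastTree": "FastTree",
}

def missing_tools(tools=TOOLS, which=shutil.which):
    """Return the sorted names of tools whose executable is not found on PATH."""
    return sorted(name for name, exe in tools.items() if which(exe) is None)

if __name__ == "__main__":
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```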

Data

  • The full dataset generated in this study is too large to store on GitHub. Example data for Escherichia coli are available online for testing MAG simulation, generating mixed MAG datasets, extracting and comparing core genes, and evaluating downstream analysis.
  • Anaconda is used to create the conda environment for running the Python scripts. The required packages are listed in conda_list, and the environment can be created with conda create --name <env> --file conda_list.
  • Information about the R packages needed to run the R code can be found in R_packages.
  • The four supplementary tables for this manuscript can be found in the folder supplementary tables.
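The core-gene comparison mentioned in the list above can be sketched as follows. This is a simplified illustration, not the repository's implementation: the input format (a mapping from gene family to the set of genomes carrying it), the function names, and the min_fraction threshold are assumptions chosen for this example.

```python
def core_genes(presence, min_fraction=1.0):
    """Return gene families found in at least `min_fraction` of the genomes.

    presence -- dict mapping gene family name -> set of genome names carrying it
    """
    # The genome set is inferred from every carrier seen in the input.
    genomes = set().union(*presence.values())
    needed = min_fraction * len(genomes)
    return {gene for gene, carriers in presence.items() if len(carriers) >= needed}

def core_gene_loss(complete, mags, min_fraction=1.0):
    """Fraction of core genes from complete genomes that are no longer core in MAGs."""
    reference = core_genes(complete, min_fraction)
    observed = core_genes(mags, min_fraction)
    return len(reference - observed) / len(reference) if reference else 0.0

# Hypothetical toy data: geneB is missing from the simulated MAG of g2.
complete = {"geneA": {"g1", "g2", "g3"}, "geneB": {"g1", "g2", "g3"}, "geneC": {"g1"}}
mags = {"geneA": {"g1", "g2", "g3"}, "geneB": {"g1", "g3"}, "geneC": {"g1"}}
loss = core_gene_loss(complete, mags)
```

In this toy example, one of the two reference core genes drops out of the core set once a single MAG lacks it, which shows how incompleteness in even one genome can shrink a strictly defined core genome.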

Scripts

Python_Shell_scripts

R_code

  • These R scripts were used to generate the figures and supplementary figures in the manuscript. For example, "Fig2.AB.frag_data.R" was used to generate Figure 2.A and Figure 2.B in the manuscript. The input files for generating Figure 2 can be found in "Fig2.frag_incomp".
