BCH 516-1
BCH 516
BUSARI M.B
FEDERAL UNIVERSITY OF TECHNOLOGY
MINNA
busari.bola@futminna.edu.ng
https://scholar.google.com/citations?user=dxLL0ZoAAAAJ&hl=en
Course Outline
The concept of genes
Molecular biology/computational research
Biological Data and their sources; Cellular,
molecular biology, Biochemistry, evolutionary
biology, DNA and protein sequence data.
Sequence alignment
Global and local alignment
Multiple sequence alignment
Phylogenetic analysis
Applications of bioinformatics and computational
biology
• Molecular Biology
The field of biology that studies the composition, structure and interactions of cellular
molecules, such as nucleic acids and proteins, that carry out the biological
processes essential for the cell's functions and maintenance.
• Gene
Genes are segments of DNA that contain instructions for building the molecules
that make the body function.
• Genome
All the genetic material in an organism. It is made of DNA (or RNA in some
viruses) and includes genes and other elements that control the activity of those
genes.
• Genomics
The branch of molecular biology concerned with the structure, function, evolution,
and mapping of genomes.
• Bioinformatics
Collection and storage of biological information
Derives knowledge from computer analysis of biological data
• Computational biology
Development of algorithms and statistical models to analyze biological data
Data Types
According to the types of data they manage, biological
databases fall roughly into the following categories:
(1) DNA, (2) RNA, (3) protein, (4) expression, (5)
pathway, (6) disease, (7) nomenclature, (8)
literature, and (9) standard and ontology.
Sources of the data include:
• Cellular and molecular biology
• Genetics
• Biochemistry
• Evolutionary Biology
DNA SEQUENCING
• The four steps of next-generation sequencing
(NGS) are nucleic acid isolation, library
preparation, clonal amplification and
sequencing, and data analysis.
• Step 1- Nucleic Acid Extraction and
Isolation. ...
• Step 2- Library Preparation. ...
• Step 3- Clonal Amplification and
Sequencing. ...
• Step 4- Data Analysis Using Bioinformatics.
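As a small illustration of Step 4, the sketch below reads sequencing reads in FASTQ format and reports the mean Phred quality of each read. It assumes four-line FASTQ records with Phred+33 quality encoding. The two reads and the helper names (parse_fastq, mean_quality) are made up for illustration and do not come from any particular pipeline.

```python
from io import StringIO

# Two made-up reads in FASTQ format (hypothetical data, Phred+33 encoding assumed).
EXAMPLE_FASTQ = """@read1
GATTACA
+
IIIIIII
@read2
ACGT
+
!!!!
"""


def parse_fastq(handle):
    """Yield (read_id, sequence, quality_string) from four-line FASTQ records."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of input
        sequence = handle.readline().strip()
        handle.readline()  # separator line that starts with '+'
        quality = handle.readline().strip()
        yield header[1:], sequence, quality  # drop the leading '@'


def mean_quality(quality, offset=33):
    """Mean Phred score of one read, assuming the given ASCII offset."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


reads = list(parse_fastq(StringIO(EXAMPLE_FASTQ)))
for read_id, sequence, quality in reads:
    print(read_id, len(sequence), mean_quality(quality))
```

Reads with a low mean quality are commonly trimmed or discarded before alignment; the cut-off to use depends on the sequencing platform and the analysis.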
DNA DATABASES
A DNA database centers on managing DNA data
from many species or from a few specific ones. The primary
functions of human DNA databases include:
• Establishing a reference genome (e.g., NCBI RefSeq)
• Profiling human genetic variation (e.g., dbSNP)
• Associating genotype with phenotype (e.g., EGA)
• Identifying human microbiome metagenomes
(e.g., IMG/HMP).
A representative example of a DNA database is
GenBank, a collection of all publicly available DNA
sequences (http://www.ncbi.nlm.nih.gov/genbank).
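Sequences downloaded from DNA databases are often exchanged as FASTA text. The sketch below parses FASTA text and computes GC content, a simple first statistic on a DNA sequence. The record shown is invented for illustration and is not a real GenBank entry; the function names are likewise hypothetical.

```python
def parse_fasta(text):
    """Return a dict mapping each record identifier to its full sequence."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first word after '>'.
            current_id = line[1:].split()[0]
            records[current_id] = []
        elif current_id is not None:
            records[current_id].append(line.upper())
    return {rid: "".join(parts) for rid, parts in records.items()}


def gc_content(sequence):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Hypothetical record, split over two lines as FASTA files often are.
EXAMPLE_FASTA = """>example_seq invented record for illustration
ATGCGC
GGAT
"""

sequences = parse_fasta(EXAMPLE_FASTA)
for seq_id, seq in sequences.items():
    print(seq_id, len(seq), gc_content(seq))
```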
RNA DATABASES
Only a tiny proportion of the human genome is
transcribed into mRNAs, whereas the vast
majority of the genome is transcribed into
“dark matter”: non-coding RNAs (ncRNAs)
that do not encode proteins, including
microRNAs (miRNAs), small nucleolar RNAs
(snoRNAs), PIWI-interacting RNAs (piRNAs), and long
non-coding RNAs (lncRNAs).
A representative example of an RNA database is
RNAcentral (http://rnacentral.org).
Protein databases
Protein databases are constructed for purposes that include:
• collection of universal proteins (e.g., UniProt)
• Identification of protein families and domains (e.g., Pfam)
• Reconstruction of phylogenetic trees (e.g., TreeFam [24])
• Profiling of protein structures (e.g., PDB).
A representative example of a protein database is the Protein Data Bank (PDB), the
main primary database for 3D structures of biological
macromolecules determined by X-ray crystallography and
NMR.
Established in 1971, PDB contained 105,465
biological macromolecular structures as of 30 December
2014, of which 27,393 entries are human
(http://www.rcsb.org/pdb).
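To give a feel for what a 3D structure entry contains, the sketch below reads atom coordinates from ATOM records in the fixed-width legacy PDB text format and measures the distance between two atoms. The column positions follow the ATOM record layout of that format. The two atoms and their coordinates are invented (chosen so the arithmetic is easy to check), and make_atom_line is a hypothetical helper used only to build correctly aligned example lines.

```python
import math


def make_atom_line(serial, atom_name, residue, chain, residue_number, x, y, z):
    """Build a minimal ATOM record with each field in its fixed column (example data only)."""
    return (
        f"ATOM  {serial:>5} {atom_name:<4} {residue:>3} {chain}"
        f"{residue_number:>4}    {x:8.3f}{y:8.3f}{z:8.3f}"
    )


def parse_atom_line(line):
    """Extract a few fields from one fixed-width ATOM record."""
    return {
        "serial": int(line[6:11]),           # columns 7-11
        "atom_name": line[12:16].strip(),    # columns 13-16
        "residue": line[17:20].strip(),      # columns 18-20
        "chain": line[21],                   # column 22
        "residue_number": int(line[22:26]),  # columns 23-26
        "x": float(line[30:38]),             # columns 31-38
        "y": float(line[38:46]),             # columns 39-46
        "z": float(line[46:54]),             # columns 47-54
    }


# Invented coordinates, not taken from any real PDB entry.
lines = [
    make_atom_line(1, "N", "MET", "A", 1, 0.0, 0.0, 0.0),
    make_atom_line(2, "CA", "MET", "A", 1, 3.0, 4.0, 0.0),
]

atoms = [parse_atom_line(line) for line in lines if line.startswith("ATOM")]
first, second = atoms
distance = math.dist(
    (first["x"], first["y"], first["z"]),
    (second["x"], second["y"], second["z"]),
)
print(first["atom_name"], second["atom_name"], distance)
```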
Expression databases
Expression databases serve various
purposes:
• Archiving expression data (e.g., GEO)
• Detecting differential and baseline expression (e.g.,
Expression Atlas)
• Exploring tissue-specific gene expression and
regulation (e.g., TiGER)
• Profiling expression information based on both
RNA and protein data (e.g., Human Protein Atlas).
A representative example of an expression database is
the Human Protein Atlas (http://www.proteinatlas.org).
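Differential expression is often summarized as a log2 fold change between two conditions. The sketch below computes it for a few genes. The gene names and expression values are hypothetical, and the pseudocount and the threshold of 1.0 are illustrative assumptions, not settings used by any of the databases named above.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of treated to control expression; the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def classify(lfc, threshold=1.0):
    """Label a gene as up, down or unchanged from its log2 fold change."""
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical expression values: gene -> (treated, control).
EXPRESSION = {
    "geneA": (7.0, 1.0),
    "geneB": (3.0, 3.0),
    "geneC": (0.0, 3.0),
}

results = {gene: log2_fold_change(t, c) for gene, (t, c) in EXPRESSION.items()}
labels = {gene: classify(lfc) for gene, lfc in results.items()}
for gene in results:
    print(gene, results[gene], labels[gene])
```

A fold change alone is not a statistical test; real analyses use replicate samples to judge whether a difference is significant.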
Pathway databases
Pathway databases contain biological pathways for
metabolic, signaling, and regulatory pathway
analysis.
A representative example is KEGG PATHWAY, a
curated biological pathway resource on the
molecular interaction and reaction networks.
As the core of KEGG, KEGG PATHWAY
integrates many entities that are stored in KEGG
sibling databases, including genes, proteins, RNAs,
chemical compounds, and chemical reactions
(http://www.genome.jp/kegg/pathway.html).
Disease databases
There are at least 200 forms of cancer, which together cause 14.6% of
all human deaths.
Thus, obtaining complete cancer genomes and identifying molecular
mutations and abnormal genes can provide new insights for cancer
prevention, detection, and eventually, personalized treatment.
Toward this end, there are two well-known cancer projects, viz., The
Cancer Genome Atlas (TCGA) and International Cancer Genome
Consortium (ICGC).
TCGA, founded in 2006 by the National Cancer Institute and National
Human Genome Research Institute at the National Institutes of
Health, aims to collect a wide diversity of omics data for more than
20 different types of human cancer (http://cancergenome.nih.gov).
Unlike TCGA, ICGC is a voluntary collaborative organization
initiated in 2008 and open to all cancer and genomic researchers in the
world. It aims to obtain a comprehensive description of genomic,
Nomenclature Databases
A nomenclature database provides data for all
human genes that have approved symbols.
Genew is a database of human gene
symbols, managed by the HUGO Gene
Nomenclature Committee (HGNC) as a
confidential database containing over 16 000
records.
Its data are integrated with other human gene
databases, e.g. GDB, LocusLink and SWISS-PROT.
The Mouse Genome Database (MGD) is the database
of approved mouse gene symbols.
Gene Ontology Databases
The Gene Ontology (GO) is a major bioinformatics
initiative to unify the representation of gene and gene
product attributes across all species.
GO aims to:
• maintain and develop its controlled
vocabulary of gene and gene product attributes;
• annotate genes and gene products;
• assimilate and disseminate annotation data.
GO is part of a larger classification effort, the Open Biomedical Ontologies (OBO).
Whereas gene nomenclature focuses on genes and gene products,
GO focuses on the function of the genes and gene
products.
Summary
Why is bioinformatics critical?
Few people adequately trained in both biology and computer
science
3. Structural Bioinformatics
WormBase (http://www.wormbase.org/)
AceDB (http://www.acedb.org/)
FlyBase (http://flybase.bio.indiana.edu/)
Protein databases