Biological database

An ever-expanding reservoir of information

Presented By:
Mahesh Yadav
What is a biological database?

Biological databases are libraries of life-science information collected from scientific experiments, published literature, high-throughput experimental technologies, and computational analyses. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics.
A brief history of biological databases
• 1965: M. O. Dayhoff et al. publish the “Atlas of Protein Sequence and Structure”.
• 1982: EMBL initiates a DNA sequence database, followed within a year by GenBank (now maintained by NCBI) and in 1984 by the DNA Data Bank of Japan (DDBJ).
• 1988: EMBL, GenBank and DDBJ agree on a common format for data elements.
Biological Databases: specific features
• Autonomous: many independent maintainers.
• Heterogeneous data formats: e.g., various formats for the same data entities, and various types of biological data (genomic, microarray, proteomic, ...).
• Dynamic: frequent and continuous changes in data content.
• Broad domain knowledge.
• Workflow-oriented.
• Rich set of analysis tools.
• Information integration is essential: data are aggregated from several databases.
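The integration problem can be illustrated with a minimal Python sketch. It assumes two hypothetical sources (source_a and source_b) that describe the same entries under different field names. Each record is first mapped onto one common schema and then merged by accession. All field names, accessions and values are invented for illustration and do not correspond to any real database export.

```python
from typing import Dict, List

# Hypothetical per-source field mappings: each source names the same
# attributes differently (the "heterogeneous formats" problem).
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "source_a": {"acc": "accession", "org": "organism", "seq": "sequence"},
    "source_b": {"id": "accession", "species": "organism", "function": "function"},
}


def normalize(record: Dict[str, str], source: str) -> Dict[str, str]:
    """Rename a record's fields to the common schema, dropping unknown fields."""
    mapping = FIELD_MAPS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def integrate(batches: Dict[str, List[Dict[str, str]]]) -> Dict[str, Dict[str, str]]:
    """Merge normalized records from several sources, keyed by accession.

    Later sources only fill in fields that earlier ones lack;
    the first value seen for a field is kept.
    """
    merged: Dict[str, Dict[str, str]] = {}
    for source, records in batches.items():
        for rec in records:
            norm = normalize(rec, source)
            entry = merged.setdefault(norm["accession"], {})
            for key, value in norm.items():
                entry.setdefault(key, value)
    return merged


if __name__ == "__main__":
    batches = {
        "source_a": [{"acc": "X00001", "org": "Homo sapiens", "seq": "ATGC"}],
        "source_b": [{"id": "X00001", "species": "Homo sapiens", "function": "kinase"}],
    }
    # {'accession': 'X00001', 'organism': 'Homo sapiens', 'sequence': 'ATGC', 'function': 'kinase'}
    print(integrate(batches)["X00001"])
```

Keeping the first value seen for a field is only one simple conflict rule. A real integration pipeline would also need to record provenance and reconcile conflicting values.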
Biological Databases: some statistics
• More than 1000 different databases.
• 1078 databases are listed in “The Molecular Biology Database Collection: 2008 update” by Michael Y. Galperin, Nucleic Acids Research, 2008, Vol. 36, Database issue, D2-D4.
• Metabase: a database of biological databases.
• Update (adding new data) frequency: daily to annually.
• Free accessibility (almost all).
Types of databases
• Primary databases
Original submissions by experimentalists
Content controlled by the submitter
Examples: GenBank, GEO

• Derivative databases
Built from primary data
Content controlled by a third party (NCBI)
Examples: RefSeq, RefSNP, UniGene, NCBI Protein, Conserved Domain, Gene
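A derivative database is built from primary submissions. As a rough illustration only (this is not how RefSeq or any other NCBI resource is actually curated), the Python sketch below collapses hypothetical primary records with identical sequences into one representative entry that lists all the source accessions. The class names, function name and accessions are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PrimaryRecord:
    accession: str  # identifier of the original (hypothetical) submission
    sequence: str   # submitted nucleotide sequence


@dataclass
class DerivedEntry:
    representative: str  # accession chosen to represent the group
    sequence: str
    members: List[str] = field(default_factory=list)  # all merged primary accessions


def build_nonredundant(records: List[PrimaryRecord]) -> List[DerivedEntry]:
    """Collapse primary records with identical (case-insensitive) sequences.

    The first record seen for each sequence becomes the representative;
    every record, including the representative, is listed as a member.
    """
    by_sequence: Dict[str, DerivedEntry] = {}
    for rec in records:
        key = rec.sequence.upper()
        entry = by_sequence.get(key)
        if entry is None:
            entry = DerivedEntry(representative=rec.accession, sequence=key)
            by_sequence[key] = entry
        entry.members.append(rec.accession)
    return list(by_sequence.values())


if __name__ == "__main__":
    submissions = [
        PrimaryRecord("SUB001", "atgcgt"),
        PrimaryRecord("SUB002", "ATGCGT"),  # same sequence, different submitter
        PrimaryRecord("SUB003", "GGCCTA"),
    ]
    for entry in build_nonredundant(submissions):
        print(entry.representative, entry.members)
```

The primary records themselves are left untouched. The derived view only references them, which mirrors the split between submitter-controlled and third-party-controlled content described above.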
Diagram: a biological database may be a sequence database, a structure database or a genomic database.
Sequence Databases
• Sequence databases are the oldest type of biological database and also the most widely used.
Sequence Databases

Nucleotide sequence databases
• International Nucleotide Sequence Database Collaboration, which includes EMBL, DDBJ and NCBI (GenBank)
• Coding and non-coding DNA
• Gene structure: introns, exons, splice sites
• Transcriptional regulator sites and transcription factors, e.g. Plant TFDB
• RNA sequence databases

Protein sequence databases
• General sequence databases, e.g. Swiss-Prot, UniProt, RefSeq
• Protein properties, e.g. BindingDB, PPT-DB
• Protein localization and targeting, e.g. DBSubLoc (Database of protein Subcellular Localization)
• Protein sequence motifs and active sites, e.g. PROSITE
• Databases of individual protein families
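PROSITE describes motifs and active sites as patterns written in its own syntax. Elements are separated by hyphens, x stands for any residue, [..] lists allowed residues, {..} lists excluded residues, (n) or (n,m) gives a repeat count, and < or > anchors the pattern to the N- or C-terminus. The Python sketch below translates that subset into a regular expression and scans a sequence with it, using the N-glycosylation site pattern N-{P}-[ST]-{P} (PROSITE PS00001) as the example. The function names are illustrative, and features such as > inside brackets are deliberately not handled.

```python
import re
from typing import List, Tuple

# One PROSITE element: a residue, x, [alternatives] or {excluded},
# optionally followed by a repeat count such as (3) or (2,4).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a subset of PROSITE pattern syntax into a Python regex."""
    pattern = pattern.strip().rstrip(".")  # PROSITE patterns end with a period
    parts: List[str] = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):    # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"
        else:
            regex = core  # a single residue or [..] is already valid regex
        if low is not None:
            regex += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


def find_motifs(pattern: str, sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for non-overlapping hits."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    glyco = "N-{P}-[ST]-{P}."  # N-glycosylation site (PROSITE PS00001)
    print(prosite_to_regex(glyco))            # N[^P][ST][^P]
    print(find_motifs(glyco, "MKNGSATNPTL"))  # [(3, 'NGSA')]
```

Note that re.finditer reports non-overlapping matches only. Overlapping motif occurrences would need a lookahead-based search instead.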
Structure database
There are various types of structure databases:
• Small molecules, e.g. AANT: Amino Acid-Nucleotide Interaction Database.
• Carbohydrates, e.g. Glycoconjugate Data Bank: Structures, an annotated glycan structure database and N-glycan primary structure verification service.
• Nucleic acids, e.g. MeRNA (Metals in RNA), a comprehensive compilation of all metal-binding sites identified in RNA three-dimensional structures available from the Protein Data Bank (PDB) and the Nucleic Acid Database.
• Protein structures, e.g. the Protein Data Bank (PDB), the single worldwide archive of structural data of biological macromolecules.
PDB
• The Protein Data Bank (PDB) was established at Brookhaven National Laboratory in 1971 as an archive for biological macromolecular crystal structures.
• The three-dimensional structures in the PDB are primarily derived from experimental data obtained by X-ray crystallography and NMR spectroscopy.
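PDB entries have traditionally been distributed in the fixed-column PDB text format, where each atom's coordinates sit on an ATOM or HETATM line. The Python sketch below reads atom coordinates by slicing those fixed columns. It assumes the legacy PDB format rather than the PDBx/mmCIF format the archive now uses as its standard; in practice a library such as Biopython's Bio.PDB would normally be used. The Atom class and parse_atoms function are illustrative names, and the parser ignores alternate locations, insertion codes and multi-model files.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines: Iterable[str]) -> List[Atom]:
    """Extract ATOM/HETATM records from PDB-format text using fixed columns.

    1-based column ranges: serial 7-11, atom name 13-16, residue name 18-20,
    chain 22, residue number 23-26, x/y/z 31-38 / 39-46 / 47-54.
    """
    atoms: List[Atom] = []
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip HEADER, REMARK, TER, END, ...
        atoms.append(Atom(
            serial=int(line[6:11]),
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        ))
    return atoms


if __name__ == "__main__":
    record = "ATOM      1  N   MET A   1      38.198  19.582  28.998  1.00 46.83           N"
    print(parse_atoms([record]))
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in this format can run together without a separating space.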
SCOP
• The SCOP (Structural Classification of Proteins) database groups protein structures according to their evolutionary relationships. These relationships have been determined by manual inspection aided by automated methods.
• The goal of SCOP is to provide detailed information about the close relatives of proteins and to provide an evolution-based protein classification resource.
GENOMIC DATABASE
• General genomic database:
e.g. Entrez Gene, NCBI's database for gene-specific information. It does not include all known or predicted genes; instead, Entrez Gene focuses on genomes that have been completely sequenced.
• Taxonomy and identification:
e.g. NCBI Taxonomy Database, which includes names and classifications for all of the organisms represented in the protein and sequence databases.
• Prokaryotic genome database:
e.g. GeneDB
• Viral genome database:
e.g. BioHealthBase
• Fungal genome database:
e.g. Yeast Resource Center
• Genome annotation terms and nomenclature:
e.g. BioThesaurus, a web-based system that maps a comprehensive collection of protein and gene names to protein entries in the UniProt Knowledgebase (UniProtKB).
• Invertebrate genome database:
e.g. Drosophila microarray centre
• Unicellular eukaryote genome database:
e.g. TGD (Tetrahymena Genome Database); Full-malaria, a database of full-length-enriched cDNA libraries of the malaria parasites Plasmodium falciparum and P. yoelii and of Toxoplasma gondii
Some other databases
• Microarray data and gene expression databases
• Plant databases
• Immunological databases
• Human gene and disease databases
• Literature databases
• EST databases
ESTs
• EST: expressed sequence tag
• A partial, “single-pass” DNA sequence of a cDNA clone
• Provides extensive evidence for the existence of genes and their structure
• Provides an inventory of likely genes and their variants, along with information about the functional roles played by these genes and their products

e.g. dbEST; HUNT: annotated human full-length cDNA sequences
EST cluster databases
• UniGene is an NCBI database of sequence clusters (UniGene clusters), each representing a unique gene. The clusters are built automatically by partitioning GenBank sequences into a non-redundant set of gene-oriented clusters.
• Other EST cluster databases include the TIGR Gene Indices and Sputnik (annotation of clustered plant ESTs).
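UniGene's actual pipeline is considerably more elaborate, so the Python sketch below is only a simplified illustration of partitioning sequences into gene-oriented clusters. Two sequences are linked when they share at least one k-mer (k = 8 by default, an arbitrary choice), and clusters are the connected components found with a union-find structure. The sequence identifiers are hypothetical. Real EST clustering relies on alignment-based overlap criteria and also considers reverse complements, both of which this sketch ignores.

```python
from collections import defaultdict
from typing import Dict, List


class UnionFind:
    """Disjoint-set structure using path halving."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> None:
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri


def cluster_by_shared_kmers(seqs: Dict[str, str], k: int = 8) -> List[List[str]]:
    """Link sequences that share at least one k-mer and return the
    connected components as lists of sequence identifiers."""
    ids = list(seqs)
    uf = UnionFind(len(ids))
    first_owner: Dict[str, int] = {}  # k-mer -> index of first sequence containing it
    for idx, sid in enumerate(ids):
        s = seqs[sid].upper()
        for start in range(len(s) - k + 1):
            kmer = s[start:start + k]
            if kmer in first_owner:
                uf.union(first_owner[kmer], idx)
            else:
                first_owner[kmer] = idx
    groups: Dict[int, List[str]] = defaultdict(list)
    for idx, sid in enumerate(ids):
        groups[uf.find(idx)].append(sid)
    return list(groups.values())


if __name__ == "__main__":
    ests = {
        "est1": "AAAACCCCGGGG",
        "est2": "CCCCGGGGTTTT",  # shares CCCCGGGG with est1
        "est3": "ACGTACGTACGT",
    }
    print(cluster_by_shared_kmers(ests))
```

Because linking is transitive, two ESTs that share no k-mer can still land in the same cluster through a third sequence that overlaps both. This is the behaviour a gene-oriented cluster needs when ESTs cover different parts of one transcript.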
Some examples of integrated biological database resources are:
• Entrez browser (at NCBI)
• ExPASy (home of Swiss-Prot)
• Ensembl (open-source system)
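The NCBI Entrez system can also be queried programmatically through the E-utilities web service. The short Python sketch below only builds an ESearch query URL (db, term, retmax and retmode are documented ESearch parameters) and sends no request. The helper name and the example query are illustrative. Actually fetching the URL, for example with urllib.request.urlopen, should follow NCBI's usage guidelines on request rates.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an E-utilities ESearch URL for an Entrez database (no request is sent)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-escapes field tags like [Gene] and turns spaces into '+'
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    print(esearch_url("nucleotide", "BRCA1[Gene] AND human[Organism]"))
```

Building the query string with urlencode rather than by hand avoids errors with the square brackets and spaces that Entrez field-tagged queries typically contain.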
References
• Lukas K. Buehler, Hooman H. Rashidi: Bioinformatics Basics
• Marketa Zvelebil, Jeremy O. Baum: Understanding Bioinformatics
• Yi-Ping Phoebe Chen: Bioinformatics Technologies
• Maureen J. Donlin: Introduction to Genomics and Bioinformatics
• Erik Bongcam-Rudloff: Biological databases, an introduction
• Russ B. Altman (Stanford University): Building successful biological databases
• Google: http://www.google.com
• Nucleic Acids Research, Database and Web Server issues: http://nar.oupjournals.org