


Genome database & information system for Daphnia
• Don Gilbert, gilbertd@bio.indiana.edu, October 2002
• Talk doc at http://iubio.bio.indiana.edu/daphnia/docs/genome-dbs-talk.doc, .ppt
Genome database examples
• Drosophila: FlyBase, http://flybase.net/ (Indiana Univ.)
• C. elegans: Wormbase, http://www.wormbase.org/
• Mouse: MGD, http://www.informatics.jax.org/
• Saccharomyces: SGD, http://genome-www.stanford.edu/Saccharomyces/
• Human: LocusLink, http://www.ncbi.nlm.nih.gov/LocusLink/
• Human: GeneCards http://bioinfo.weizmann.ac.il/cards/
• Various eukaryotes: Ensembl http://www.ensembl.org/
• Various eukaryotes: euGenes http://eugenes.org/ (Indiana
Univ.)
• Many genome systems now in development for other organisms:
Daphnia, insects, vertebrates, newly sequenced full genomes
Anatomy of genome database & info system
Anatomy of Genome DB/IS
• Structure
– Complex document structure; tabular data; etc.
– Organize: Table of contents, Reports, Indexing
– Browse contents; Search / retrieve from biological
questions
– Bulk data search / retrieve for bioinformatics

• Content
– Literature (abstracted and curated), Sequence and
feature analyses, maps, controlled
vocabulary/ontologies, people, biologics, contacts, etc.
– Metadata describing primary data, along with
protocols, notes, sources
Anatomy of Genome DB/IS, 2
• Data exchange
– Data definitions & schema (XML)
– Controlled vocabularies of science terms, ontologies
– Minimal information for collaboration, sharing

• Informatics / software
– Backend database, data collection, management,
analyses
– Front-end services (hypertext web, search/retrieval);
ease of understanding and usage (HCI)
– Middleware software, interfaces
– Genome specialized: maps, BLAST searches,
ontologies
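
As a rough illustration of the data exchange points above (XML data definitions plus controlled-vocabulary terms), the sketch below writes one gene record to XML and reads it back. The element names, the record layout, and the example record itself are invented for illustration; they are not a schema from this talk, FlyBase, or GMOD.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout: element and attribute names are illustrative only.
def gene_to_xml(record):
    """Serialize one gene record (a dict) to an XML string."""
    gene = ET.Element("gene", id=record["id"])
    ET.SubElement(gene, "symbol").text = record["symbol"]
    ET.SubElement(gene, "species").text = record["species"]
    for term_id, term_name in record["terms"]:
        # A controlled-vocabulary term carries a stable ID plus a readable name.
        term = ET.SubElement(gene, "term", id=term_id)
        term.text = term_name
    return ET.tostring(gene, encoding="unicode")

def xml_to_gene(text):
    """Parse the XML produced by gene_to_xml back into a dict."""
    gene = ET.fromstring(text)
    return {
        "id": gene.get("id"),
        "symbol": gene.findtext("symbol"),
        "species": gene.findtext("species"),
        "terms": [(t.get("id"), t.text) for t in gene.findall("term")],
    }

# Made-up example record, not real Daphnia data.
example = {
    "id": "EX0001",
    "symbol": "exampleGene",
    "species": "Daphnia pulex",
    "terms": [("GO:0003677", "DNA binding")],
}
xml_text = gene_to_xml(example)
round_trip = xml_to_gene(xml_text)
```

Because both sides agree on the element names and on term IDs, the receiving project can rebuild the same record without knowing how the sender stores it internally.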
GMOD - Generic genome database tools
• Generic Model Organism Database
Construction Set, http://www.gmod.org/
• Database schemas
• Literature curation tools
• Gene ontology management tools
• Visualization tools
• Data processing pipelines
FlyBase and euGenes
FlyBase.net
• Distributed project (4 sites, ~6 PIs, ~15 curators,
~15 informaticians); 10 years old
• Multiple databases; project data flow and
exchange critical
• Curated and computed data, from experimental
literature and genome sequence
• Integrated database modules (for generic use w/
GMOD)
– Genetics, Sequences, Maps, Expression
– Controlled vocabularies & Ontologies
– Computational analyses
– Organism, taxonomy, phylogenetic/comparative
– Publications, General
euGenes.org
• Automated genome summaries for Human,
Fruitfly, Mouse, Mosquito, Arabidopsis, C.
elegans, Saccharomyces, Zebrafish
• 3-year computational DB project, 1 part-time
informatician (dgg)
• genome maps, sequences, gene reports,
external database links
• cross-species comparisons: similar genes,
genome features, gene function
A genome web db for Daphnia
Preliminary example
• http://iubio.bio.indiana.edu/daphnia/
• Sample data include microsatellite DNA of J.
Colbourne, GenBank Daphnia seqs, Medline
abstracts
• Blast searches, reports
• Text data searches
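
One simple way to support text data searches over abstracts is an inverted index (word to document IDs). The sketch below is a minimal version of that idea; it is not the search software used on the preliminary site, and the three "abstracts" are placeholder strings rather than real Medline records.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase a text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return sorted IDs of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Placeholder texts, invented for this example.
docs = {
    "abs1": "Microsatellite markers in Daphnia populations",
    "abs2": "Daphnia life history under predation",
    "abs3": "Microsatellite variation in insects",
}
index = build_index(docs)
```

With these placeholder texts, `search(index, "daphnia microsatellite")` returns `["abs1"]`, since only that entry contains both words.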
Requirements for a genome db/ info system
• Data components??
– biosequence types, literature, external data (insects,
others), expression info, pathways, maps, anatomy,
populations, species, ecology, organismal, stocks,
people
– Standard data structure and exchange schema
(sequences, XML)

• Architecture
– Internet-shared, standards-based, open-source preferred
– Relational database for data management
– Search and retrieval software for flat file data
– Flexible – data schema changes common
– Performance constraints
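
The architecture points above ask for a relational database that also tolerates frequent schema changes. One common way to get both is a fixed core table plus a generic property table, so that a new attribute becomes a new row rather than a new column. The sketch below assumes that approach and uses SQLite (from the Python standard library) as a stand-in for PostgreSQL or MySQL; the table names, column names, and sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real DBMS
conn.executescript("""
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    kind       TEXT NOT NULL          -- e.g. gene, microsatellite
);
CREATE TABLE feature_property (
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    prop_name  TEXT NOT NULL,
    prop_value TEXT NOT NULL
);
""")

def add_feature(name, kind, **props):
    """Insert a feature plus any number of free-form properties."""
    cur = conn.execute(
        "INSERT INTO feature (name, kind) VALUES (?, ?)", (name, kind))
    feature_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO feature_property VALUES (?, ?, ?)",
        [(feature_id, key, str(value)) for key, value in props.items()])
    return feature_id

def get_properties(name):
    """Return all properties of the named feature as a dict of strings."""
    rows = conn.execute(
        "SELECT p.prop_name, p.prop_value FROM feature f "
        "JOIN feature_property p ON p.feature_id = f.feature_id "
        "WHERE f.name = ?", (name,))
    return dict(rows.fetchall())

# Made-up rows: two feature kinds with different attributes, same tables.
add_feature("msat_example_1", "microsatellite", repeat_unit="GT", repeat_count=12)
add_feature("gene_example_1", "gene", source="made-up")
```

The trade-off is that property values are stored as untyped text and queries on them need a join, which bears on the performance constraint listed above.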
Requirements for genome system, cont.
• Analysis software
– Project uses: sequence analyses, external database
comparisons
– One-time analyses, publishing results
– Pipeline for automated analyses, rerun as needed
– Public uses (e.g. BLAST search)

• Publication interface
– Detail biological object views (sequences, genes, etc.)
– Queries: simple-common, ad-hoc/general
– Graphic viewers

• Editing / data management interface
– Interactive – document editing
– Batch data updates
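
The "pipeline for automated analyses, rerun as needed" requirement can be sketched as follows: keep a checksum of each input next to its result, and recompute only when the input has changed. This is one possible design, not the pipeline of any project named here. GC content stands in for a real analysis such as a BLAST comparison, and the `Pipeline` class and sequence IDs are hypothetical.

```python
import hashlib

def gc_content(seq):
    """Stand-in analysis: fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

class Pipeline:
    """Runs an analysis per sequence, skipping inputs that have not changed."""

    def __init__(self, analysis):
        self.analysis = analysis
        self.results = {}   # seq_id -> (checksum of input, analysis result)
        self.runs = 0       # number of times the analysis was actually run

    def update(self, sequences):
        """Process a batch {seq_id: sequence}; return all current results."""
        for seq_id, seq in sequences.items():
            checksum = hashlib.sha256(seq.encode("ascii")).hexdigest()
            cached = self.results.get(seq_id)
            if cached is not None and cached[0] == checksum:
                continue    # input unchanged since the last run
            self.results[seq_id] = (checksum, self.analysis(seq))
            self.runs += 1
        return {seq_id: result for seq_id, (_, result) in self.results.items()}

pipeline = Pipeline(gc_content)
first = pipeline.update({"seqA": "GGCC", "seqB": "ATAT"})
second = pipeline.update({"seqA": "GGCC", "seqB": "ATGC"})  # only seqB changed
```

After the second batch update, `pipeline.runs` is 3 rather than 4, because `seqA` was unchanged and its stored result was reused.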
Compute parts of system
• Web server (Apache) and modules
• FTP server for bulk data exchange
• Relational DBMS: PostgreSQL.org, MySQL.com,
Oracle, etc.
• Analysis programs: BLAST, various
bioinformatics tools
• Perl, Java middleware for data access &
analysis, search and report
• Limited, secure access for project data
management
• Public access for released data (web, ftp)
