Genome Database & Information System For Daphnia: @bio - Indiana.edu
• Content
– Literature (abstracted and curated), Sequence and
feature analyses, maps, controlled
vocabulary/ontologies, people, biologics, contacts, etc.
– Metadata describing primary data, along with
protocols, notes, sources
Anatomy of Genome DB/IS, 2
• Data exchange
– Data definitions & schema (XML)
– Controlled vocabularies of science terms, ontologies
– Minimal information for collaboration, sharing
• Informatics / software
– Backend database, data collection, management,
analyses
– Front-end services (hypertext web, search/retrieval);
ease of understanding and usage (HCI)
– Middleware software, interfaces
– Genome specialized: maps, BLAST searches,
ontologies
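A minimal sketch of the data-exchange idea above: one feature record serialized to XML and checked against a controlled vocabulary before it is shared. The element names, the `FEATURE_TYPES` set, and the record ID are hypothetical illustrations, not a schema used by FlyBase, GMOD, or the Daphnia project.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary of feature types; a real project
# would load its terms from a shared ontology file instead.
FEATURE_TYPES = {"gene", "microsatellite", "EST"}


def feature_to_xml(record):
    """Serialize one feature record to XML, rejecting unknown feature types."""
    if record["type"] not in FEATURE_TYPES:
        raise ValueError(f"unknown feature type: {record['type']}")
    elem = ET.Element("feature", id=record["id"], type=record["type"])
    ET.SubElement(elem, "organism").text = record["organism"]
    ET.SubElement(elem, "source").text = record["source"]
    return ET.tostring(elem, encoding="unicode")


def xml_to_feature(text):
    """Parse the XML produced by feature_to_xml back into a dict."""
    elem = ET.fromstring(text)
    return {
        "id": elem.get("id"),
        "type": elem.get("type"),
        "organism": elem.findtext("organism"),
        "source": elem.findtext("source"),
    }


# Made-up example record (the ID is not a real accession).
example = {
    "id": "example-0001",
    "type": "microsatellite",
    "organism": "Daphnia",
    "source": "GenBank",
}
xml_text = feature_to_xml(example)
round_trip = xml_to_feature(xml_text)
```

Checking the type against the vocabulary at export time is one way to keep exchanged records consistent between sites; a shared schema definition would normally enforce this as well.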
GMOD - Generic genome database tools
• Generic Model Organism Database
Construction Set, http://www.gmod.org/
• Database schemas
• Literature curation tools
• Gene ontology management tools
• Visualization tools
• Data processing pipelines
FlyBase and euGenes
FlyBase.net
• Distributed project (4 sites, ~6 PIs, ~15 curators,
~15 informaticians); 10 years old
• Multiple databases; project data flow and
exchange are critical
• Curated and computed data, from experimental
literature and genome sequence
• Integrated database modules (for generic use with
GMOD)
– Genetics, Sequences, Maps, Expression
– Controlled vocabularies & Ontologies
– Computational analyses
– Organism, taxonomy, phylogenetic/comparative
– Publications, General
euGenes.org
• Automated genome summaries for Human,
Fruitfly, Mouse, Mosquito, Arabidopsis, C.
elegans, Saccharomyces, Zebrafish
• 3-year computational DB project, 1 part-time
informatician (dgg)
• genome maps, sequences, gene reports,
external database links
• cross-species comparisons: similar genes,
genome features, gene function
A genome web db for Daphnia
Preliminary example
• http://iubio.bio.indiana.edu/daphnia/
• Sample data include microsatellite DNA from J.
Colbourne, GenBank Daphnia sequences, and Medline
abstracts
• Blast searches, reports
• Text data searches
Requirements for a genome db/info system
• Data components?
– biosequence types, literature, external data (insects,
others), expression info, pathways, maps, anatomy,
populations, species, ecology, organismal, stocks,
people
– Standard data structure and exchange schema
(sequences, XML)
• Architecture
– Internet-shared, standards-based, open-source preferred
– Relational database for data management
– Search and retrieval software for flat file data
– Flexible – data schema changes common
– Performance constraints
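One way to reconcile "relational database" with "data schema changes common" is a fixed core table plus a generic property table, so new attributes become rows rather than new columns. The sketch below uses Python's built-in `sqlite3`; the table names, column names, and sample values are assumptions for illustration only, not the schema of any existing genome database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    type       TEXT NOT NULL
);
-- Open-ended attributes live here, so adding a new kind of
-- attribute does not require altering the feature table.
CREATE TABLE feature_property (
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    prop_name  TEXT NOT NULL,
    prop_value TEXT NOT NULL
);
""")


def add_feature(name, ftype, **props):
    """Insert a feature and any number of free-form properties."""
    cur = conn.execute(
        "INSERT INTO feature (name, type) VALUES (?, ?)", (name, ftype)
    )
    fid = cur.lastrowid
    conn.executemany(
        "INSERT INTO feature_property (feature_id, prop_name, prop_value) "
        "VALUES (?, ?, ?)",
        [(fid, key, str(value)) for key, value in props.items()],
    )
    return fid


def get_properties(name):
    """Return all properties of the named feature as a dict."""
    rows = conn.execute(
        "SELECT p.prop_name, p.prop_value "
        "FROM feature f JOIN feature_property p USING (feature_id) "
        "WHERE f.name = ?",
        (name,),
    )
    return dict(rows.fetchall())


# Made-up sample feature with two free-form properties.
add_feature("locusA", "microsatellite", repeat_motif="GT", origin="sample")
locus_props = get_properties("locusA")
```

The trade-off is that property values are untyped text and queries need a join, which bears on the performance constraints noted above.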
Requirements for genome system, cont.
• Analysis software
– Project uses: sequence analyses, external database
comparisons
– One-time analyses, publishing results
– Pipeline for automated analyses, rerun as needed
– Public uses (e.g. BLAST search)
• Publication interface
– Detailed biological object views (sequences, genes, etc.)
– Queries: simple/common, ad hoc/general
– Graphic viewers
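The "pipeline for automated analyses, rerun as needed" item can be sketched as a step that recomputes a result only when its input sequence has changed. This is an illustrative assumption about how such a pipeline might behave, not a description of the project's software; the `Pipeline` class, the GC-content analysis, and the sequences are all hypothetical stand-ins.

```python
import hashlib


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


class Pipeline:
    """Run an analysis per sequence, rerunning only when the input changes."""

    def __init__(self, analysis):
        self.analysis = analysis
        self.cache = {}  # seq_id -> (input digest, cached result)
        self.runs = 0    # number of times the analysis was actually executed

    def run(self, sequences):
        results = {}
        for seq_id, seq in sequences.items():
            digest = hashlib.sha256(seq.encode()).hexdigest()
            cached = self.cache.get(seq_id)
            if cached is None or cached[0] != digest:
                # New or modified input: recompute and remember the digest.
                self.cache[seq_id] = (digest, self.analysis(seq))
                self.runs += 1
            results[seq_id] = self.cache[seq_id][1]
        return results


pipeline = Pipeline(gc_content)
first = pipeline.run({"a": "GGCC", "b": "ATAT"})
second = pipeline.run({"a": "GGCC", "b": "ATGC"})  # only "b" is recomputed
```

The same pattern applies to costlier steps such as similarity searches, where skipping unchanged inputs saves the most time.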