Ensembl 2014.

Nucleic Acids Research, 06 Dec 2013, 42(Database issue):D749-55
https://doi.org/10.1093/nar/gkt1196 PMID: 24316576 PMCID: PMC3964975


Abstract

Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training.

Nucleic Acids Res. 2014 Jan 1; 42(Database issue): D749–D755.

Paul Flicek,^1,^2,^* M. Ridwan Amode,² Daniel Barrell,² Kathryn Beal,¹ Konstantinos Billis,² Simon Brent,² Denise Carvalho-Silva,¹ Peter Clapham,² Guy Coates,² Stephen Fitzgerald,¹ Laurent Gil,¹ Carlos García Girón,² Leo Gordon,¹ Thibaut Hourlier,² Sarah Hunt,¹ Nathan Johnson,¹ Thomas Juettemann,¹ Andreas K. Kähäri,² Stephen Keenan,¹ Eugene Kulesha,¹ Fergal J. Martin,² Thomas Maurel,¹ William M. McLaren,¹ Daniel N. Murphy,² Rishi Nag,² Bert Overduin,¹ Miguel Pignatelli,¹ Bethan Pritchard,² Emily Pritchard,¹ Harpreet S. Riat,² Magali Ruffier,¹ Daniel Sheppard,² Kieron Taylor,¹ Anja Thormann,¹ Stephen J. Trevanion,² Alessandro Vullo,¹ Steven P. Wilder,¹ Mark Wilson,² Amonida Zadissa,¹ Bronwen L. Aken,² Ewan Birney,¹ Fiona Cunningham,¹ Jennifer Harrow,² Javier Herrero,¹ Tim J.P. Hubbard,² Rhoda Kinsella,¹ Matthieu Muffato,¹ Anne Parker,² Giulietta Spudich,¹ Andy Yates,¹ Daniel R. Zerbino,¹ and Stephen M.J. Searle²

¹European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD and ²Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK



INTRODUCTION

The Ensembl project (http://www.ensembl.org) creates and distributes genome annotations and provides integrated views of other valuable genomic data for supported chordate genomes. Our resources are intended to serve as community reference datasets on which other genomic research can be built. As such, Ensembl provides unique tools, datasets and user support compared to similar projects such as the UCSC Genome Browser (1), while supporting community standards that promote interoperability in genomics. For example, we have developed and distribute an extensive, open software infrastructure with diverse analysis pipelines supporting a variety of genome analyses (2) and the artificial intelligence inspired eHive analysis management system (3); data mining and analysis tools that include BioMart (4) and the Ensembl Variant Effect Predictor (VEP) (5); supported and robust application programming interfaces (APIs) (6) and a unique genome browser interface (7). Our software is distributed using a permissive Apache-style open-source license meaning that, unlike similar software, it is free for all potential users. Additionally, our data is provided without restriction and we have the most comprehensive suite of training options of any public genomics tool to maximize usability. In common with the UCSC Genome Browser, Ensembl supports community standard file formats such as BAM, BED, wiggle and other common file types. We have also incorporated support for track hubs over the past year to enable researchers to set up and view large-scale datasets. For example, the data produced by the ENCODE consortium (8) can be viewed by loading the ENCODE track hub (9) and users can then access an experiment matrix within the Ensembl configuration menu and quickly select datasets by cell or experiment type. However, over and above simply displaying these data, Ensembl uses them as described below in our integrative Regulatory Build analysis resulting in an evidence-based annotation of whole-genome regulation.

Ensembl resources are available for a total of 77 species as of release 73 (September 2013) with human, mouse, zebrafish, rat and various farm animals having the most extensive support. For 60 chordate species, we have full support comprising evidence-based gene annotation and comparative genomics analysis. In addition, variation resources are available for 18 of these species and regulatory annotation is available for human and mouse. At present 13 additional chordate species are accessible with basic support via Ensembl preview sites (available from http://pre.ensembl.org), which provide BLAST access to the genome data and genome visualization, but not a complete gene build. Three non-chordate model species, worm (Caenorhabditis elegans), fruit fly (Drosophila melanogaster) and yeast (Saccharomyces cerevisiae), are also fully supported by Ensembl with imported annotation from their respective genome databases in partnership with the Ensembl Genomes project (10). All fully supported species are accessible via the Ensembl BioMart, the Ensembl APIs and web displays. All data are also available for querying via our public MySQL servers, as full data downloads and as an Amazon public dataset.

Since our last report (11), we have added two new species with full gene annotation and comparative genomics support: duck (Anas platyrhynchos) (12) and collared flycatcher (Ficedula albicollis) and one new species with variation support: gibbon. New assemblies with corresponding updates to the gene annotations, alignments and variation data were also provided for rat, cat and chicken. At the same time, we added seven new species with basic support on the Ensembl preview site: blind cave fish (Astyanax mexicanus), white rhinoceros (Ceratotherium simum simum), baboon (Papio anubis), prairie vole (Microtus ochrogaster), vervet monkey (Chlorocebus sabaeus), naked mole-rat (Heterocephalus glaber) and aardvark (Orycteropus afer) and updated the preview sites for common shrew (Sorex araneus), bottle-nosed dolphin (Tursiops truncatus), American pika (Ochotona princeps) and armadillo (Dasypus novemcinctus) with new assemblies. In addition, as the human and mouse genome assemblies are updated regularly by the Genome Reference Consortium (GRC) to include alternate sequences in the form of ‘fix’ and ‘novel’ assembly patches (13), we include these additional alternate sequences and annotate them with genes, variation and other features as appropriate. Ensembl release 73 (September 2013) included the human GRCh37.p12 assembly (i.e. the twelfth patch release of the GRCh37 assembly) and the GRCm38.p1 mouse assembly.

In addition to the newly supported species and community standards reported above, the most important updates since our last report (11) include new and more comprehensive phenotype annotations most valuable to those interested in human disease research, scrollable genome browsing designed to appeal to all users of our web interface and new REST endpoints supporting more flexible analysis options for those users that interact with the Ensembl resources programmatically. These and other features are described in more detail below.

Ensembl browser

This year we significantly updated Ensembl’s main Region in Detail page with the full incorporation of the JavaScript-based, scrollable and zoomable browser, Genoverse, in place of the overview panel that had been a part of Ensembl for >10 years. Older, unsupported browsers fall back to the previous non-scrolling overview image. Genoverse allows users to scroll back and forth along the genome and then updates the main image below it to show the new region (Figure 1A). Our search engine was also upgraded from Lucene to Solr and we implemented a new search interface with features such as faceting and auto-completion.


Figure 1.

(A) The scrollable Genoverse view (with view control icons in the upper right corner) provides the overview panel on the Region in Detail page. Image from URL: http://e73.ensembl.org/Homo_sapiens/Location/View?r=6:133017695-133161157. (B) Phenotype data from DDG2P, OMIM, Orphanet, HGMD and COSMIC for the human gene ATP2A2. Image created from URL http://e73.ensembl.org/Homo_sapiens/Gene/Phenotype?db=core;g=ENSG00000174437;r=12:110718561-110788898.

Our web displays dedicated to variation and phenotype data were also markedly improved. We specifically focused on displays for structural variants, which are now coloured by class and the higher quality structural variants from the 1000 Genomes Project are provided in a separate track. We introduced a page for structural variant phenotype data and now have additional phenotype data, including the variants from NCBI’s ClinVar project that are classified as being probable-pathogenic, pathogenic, drug-response or histocompatibility. Phenotype data from multiple sources are integrated and displayed for relevant genes (Figure 1B). Variants in regulatory regions are now annotated on all tracks and variant names are visible when zoomed in on all displays. We have also improved the VEP visual output in the form of summary pie charts.

Beyond these major developments, there have been other important improvements. In particular, we improved the handling of user data with a streamlined upload interface and support for uploading VEP output files. Additionally, configuration of complex data hubs has been made easier by displaying the track options in a matrix similar to the existing configurations for regulation data.

Ensembl annotations

All Ensembl annotations, whether gene, variation or regulation, are based on integration of relevant data sources. We update the human gene set for every Ensembl release via a merge of the Ensembl evidence-based automatic annotation and Havana (14) manual annotation to produce an updated GENCODE gene set. This set also includes all current human Consensus Coding Sequence (CCDS) gene models (15). Manual annotation from Havana is additionally incorporated into our gene sets on alternate releases for zebrafish (16) and for mouse, which also includes all current CCDS gene models. Pig includes manual annotation from Havana on selected regions of the genome. The year 2013 has seen the inclusion of RNASeq data for seven species: human, chicken, cat, collared flycatcher, gibbon, rabbit and anole lizard. For gibbon, rabbit and anole lizard, the RNASeq data were used to update an existing standard gene annotation whereas for other species (except human) the data were integrated into the annotation as part of the primary gene-build process. Some of these species are provided with tissue-specific RNASeq samples, which allow users to explore tissue-specific expression.

Ensembl variation annotation integrates all publicly available variation datasets to provide a coherent and complete resource for variome interpretation across 217 million variants in the 18 Ensembl species with supported variation resources. Basic variation data including genomic location, allele changes, allele and genotype frequencies and population data are imported for SNPs and indels from dbSNP (17) and for structural variants from DGVa (18). Additional human data are imported directly from the 1000 Genomes Project (19), the Exome Sequencing Project (20) and from 14 individual genomes that provide genotype information. In addition to the newly supported gibbon data listed above, over the past year, updated variation resources were released for human, platypus, cow, mouse, pig, zebrafish, opossum, orang-utan and macaque. We also extended cross-references to new and popular genotyping chips for human, chicken, horse and cow.

We significantly expanded our support for human phenotype data in Ensembl beyond the UniProt (21), OMIM (22), EGA, HGMD Public (23) (variation location only), COSMIC somatic mutations (24) and NHGRI GWAS Catalog (25) resources that we have supported in the past. New data from ClinVar, Orphanet (26), the Developmental Disorder Genotype-Phenotype Database (DDG2P) from DECIPHER (27), dbGaP, Phencode and the MAGIC and GIANT consortia have now been fully integrated into Ensembl (Figure 1B). From these data sources, we select only the significant associations to display on the website and provide full datasets in the database and BioMart. For variants stored in LOVD (28), Ensembl queries LOVD directly and displays the information on the appropriate variation web page. In addition, we have now incorporated phenotype information for other species. Mouse phenotype data are provided from international projects including EuroPhenome (29) and IMPC (International Mouse Phenotyping Consortium) (30). For other animals, data are imported from the Online Mendelian Inheritance in Animals (OMIA) database (31). This year we have also developed a new pipeline to cross-reference publications citing variants from EuropePMC, NCBI and UCSC.

Regulatory annotation in Ensembl is currently available across multiple human and mouse cell lines. The main resource, the Ensembl Regulatory Build, is a comprehensive synthesis of functional assays provided by a number of consortia, such as ENCODE (8) and the Roadmap Epigenomics Mapping Consortium (32). Although the raw data from these projects can be displayed directly on Ensembl through dedicated track hubs, the Regulatory Build is a higher level integrated analysis that defines a collection of Regulatory Features (i.e. regions of the genome that display regulatory activity in one of 13 human or five murine cell lines). Where relevant, transcription-factor binding sites are predicted on these regions using the JASPAR binding motifs (33).

Additionally, Ensembl links out to relevant externally curated databases of regulatory data including enhancer regions from VISTA (34), miRNA binding sites from Microcosm (http://www.ebi.ac.uk/enright-srv/microcosm/) and eQTLs from Genevar (35). Several reference DNA methylation experiments are also included.


COMPARATIVE GENOMICS RESOURCES

Whole-genome alignments of vertebrate species are provided within the Ensembl Compara database. Because not all genome assemblies are sequenced to the same level of completeness, we group the assemblies into two tiers for differential processing. High quality genomes from 13 species are aligned into a progressive multiple sequence alignment using the EPO (Enredo-Pecan-Ortheus) pipeline (36,37), which also estimates the underlying ancestral genome sequences. The low coverage genomes of an additional 23 species, which are much more fragmented, are inserted into the previous alignment by mapping them onto the human assembly with LASTZ. We also produce clade-specific multiple sequence alignments for primates, birds, fish and amniotes. In particular, the fish multiple alignment has been extended to eight species this year. From these alignments, we compute the conservation at every position using GERP (38).

Our gene-based comparative genomics resources are updated every release to incorporate new species, updated assemblies and gene annotation sets. These include gene phylogenetic trees, gene families and gene dynamics. This year, the inclusion of duck and collared flycatcher and an update of the guide species tree have greatly improved the quality of the gene trees in the Sauria clade and reduced the proportion of poorly supported duplications from 74% to 30%. In close collaboration with the TreeFam (39) and Ensembl Genomes (10) projects, we will migrate to an HMM-based classification for GeneTree annotation, which will reduce a key quadratic complexity to a linear one.
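The practical difference between quadratic and linear scaling can be illustrated with a short calculation. The sketch below is only an illustration and does not describe the actual pipeline: it assumes that the quadratic step is an all-against-all comparison of n sequences and that an HMM-based classification scores each sequence once against a fixed library of k family profiles. The function names and the example sizes are hypothetical.

```python
def all_vs_all_comparisons(n_sequences: int) -> int:
    """Number of unordered sequence pairs: n * (n - 1) / 2, quadratic in n."""
    return n_sequences * (n_sequences - 1) // 2


def profile_comparisons(n_sequences: int, n_profiles: int) -> int:
    """Each sequence scored once per profile: n * k, linear in n for fixed k."""
    return n_sequences * n_profiles


if __name__ == "__main__":
    k = 10_000  # hypothetical number of family profiles
    for n in (100_000, 200_000):  # hypothetical numbers of sequences
        print(n, all_vs_all_comparisons(n), profile_comparisons(n, k))
```

Under these assumptions, doubling the number of sequences roughly quadruples the all-against-all work but only doubles the profile-based work.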

Data access, data mining and quality control

Ensembl’s REST service, available at http://beta.rest.ensembl.org, continues to be actively developed as a public beta (Figure 2). This year has seen the addition of 12 new endpoints including access to translation features, SNVs and protein domains, as well as access to whole-genome alignments by region. Additionally, the REST API is now able to query GeneTrees by their containing member stable identifier or gene symbols such as HGNC. Other new endpoints include the ability to use a stable identifier to identify overlapping features and location information as well as access to NCBI taxonomy and ontology datasets. The ontology endpoints currently provide the gene ontology (GO) (40), sequence ontology (SO) (41) and experimental factor ontology (EFO) (42) information used within Ensembl. The REST service will move out of beta during the next year coinciding with the introduction of POST requests and improved VEP integration.
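As a rough illustration of how such endpoints are addressed, the following sketch builds request URLs for a GeneTree lookup by gene symbol, an ontology term and a taxonomy node. Several details are assumptions rather than statements from the text above: the host (`https://rest.ensembl.org`, the current public server rather than the beta address), the path layouts (taken from the present-day public REST documentation, which may differ from the beta service) and the example identifiers. The helper names `build_url` and `fetch_json` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Assumption: current public host; the beta service used a different address.
BASE_URL = "https://rest.ensembl.org"


def build_url(path_parts, params=None, base=BASE_URL):
    """Join path segments (URL-escaping each one) and append query parameters."""
    path = "/".join(quote(str(part), safe=":") for part in path_parts)
    url = f"{base}/{path}"
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_json(url, timeout=30):
    """Send a GET request asking for JSON. Needs network access; not called below."""
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Assumed path layouts; the identifiers are examples only.
    print(build_url(["genetree", "member", "symbol", "homo_sapiens", "BRCA2"]))
    print(build_url(["ontology", "id", "GO:0005667"]))
    print(build_url(["taxonomy", "id", 9606]))
```

Keeping URL construction separate from the network call makes the request logic easy to check offline; passing one of the printed URLs to `fetch_json` would perform the actual query.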


Figure 2.

Usage and example output for the Ensembl REST server Fetch Variant Consequences endpoint.

The more established BioMart data-mining tool (43) provides users with a variety of ways of accessing the Ensembl data quickly and with relative ease. Users can choose to access the data via the MartView web interface or via the MartService routes including the BioMart Perl API, DAS server, SOAP, REST or BioConductor biomaRt package. The Ensembl BioMart databases (4) are built from scratch each release in order to incorporate the latest annotated and imported data, and they are current with the data resources described above.
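For the REST-style MartService route, a query is typically expressed as a small XML document naming a dataset, its filters and the attributes to return. The sketch below assembles such a document with the Python standard library. The endpoint URL, the dataset name `hsapiens_gene_ensembl`, and the filter and attribute names are assumptions based on the commonly used BioMart query format and should be checked against the live service; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed MartService endpoint; verify against the current Ensembl site.
MARTSERVICE_URL = "http://www.ensembl.org/biomart/martservice"


def build_biomart_query(dataset, attributes, filters=None):
    """Return a BioMart XML query string for one dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",      # tab-separated output
        "header": "1",           # include a header row
        "uniqueRows": "1",       # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def martservice_url(query_xml):
    """Encode the XML query as the 'query' parameter of a GET request URL."""
    return MARTSERVICE_URL + "?" + urlencode({"query": query_xml})


if __name__ == "__main__":
    xml = build_biomart_query(
        "hsapiens_gene_ensembl",                      # assumed dataset name
        ["ensembl_gene_id", "external_gene_name"],    # assumed attribute names
        {"chromosome_name": "21"},                    # assumed filter name
    )
    print(martservice_url(xml))
```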

Beyond programmatic and tool-based data-access methods, we continue to provide complete data downloads in a variety of formats including the VCF files that were introduced this year to distribute many subsets of the Ensembl variation data.
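As a minimal sketch of how such a download might be consumed, the function below splits one VCF data line into its eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) as defined by the VCF format. The example line and its INFO keys are invented for illustration and are not taken from an Ensembl file; `parse_vcf_line` is a hypothetical helper, and a dedicated VCF library would be preferable for real analyses.

```python
def parse_vcf_line(line):
    """Parse the fixed columns of a single tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, vid, ref, alt, _qual, _filter, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    if info != ".":
        for entry in info.split(";"):
            key, sep, value = entry.partition("=")
            info_dict[key] = value if sep else True

    return {
        "chrom": chrom,
        "pos": int(pos),                                # 1-based position
        "ids": [] if vid == "." else vid.split(";"),
        "ref": ref,
        "alts": [] if alt == "." else alt.split(","),   # "." means no ALT allele
        "info": info_dict,
    }


if __name__ == "__main__":
    # Made-up record for illustration only.
    example = "1\t12345\trs0000001\tA\tG,T\t.\t.\tEX=demo;FLAG"
    print(parse_vcf_line(example))
```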

To manage the increasing size and complexity of Ensembl releases, we have increased our quality control (QC) procedures over the past year. These are an essential part of each release cycle and range from validation testing of the various APIs to methods for checking data and database integrity. The Ensembl gene set is also independently analyzed using a specific curation/QC pipeline run for all updates of the human and mouse gene sets. This procedure compares the set of Ensembl translations for a particular species directly to the publicly available sequence resources UniProt (21) and RefSeq (44) and reports the percentage identity of the alignments. In addition to the above species, the pipeline has been employed for the rat, zebrafish and chicken genomes.
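The percentage identity reported by such a comparison can be computed in several ways. The sketch below shows one common definition, assumed here purely for illustration because the exact definition used by the QC pipeline is not specified above: identical residues divided by the number of alignment columns in which at least one sequence has a residue. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical positions between two pre-aligned sequences.

    Columns where both sequences have a gap are ignored; a column with a gap
    in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        columns += 1
        if a == b:  # both are residues here, since double gaps were skipped
            matches += 1
    if columns == 0:
        raise ValueError("alignment contains no residue columns")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Invented example: 5 identical positions out of 7 columns.
    print(round(percent_identity("MKT-LLV", "MKTALIV"), 2))
```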

Ensembl variation and regulatory resources also rely on comprehensive and flexible infrastructure to manage the growing amount of relevant datasets in the public domain. In preparation for the updated human reference genome (GRCh38) expected at the end of 2013, we consolidated these pipelines to work automatically over a large number of files with minimal supervision. Specifically, more of our pipelines are run using the eHive system (3) and employ both a modular structure to the analysis and on-the-fly calculation for specific data types stored in the databases.

Outreach and training

Ensembl supports users through worldwide face-to-face training workshops, our helpdesk@ensembl.org and dev@ensembl.org email lists, online training and social media. This year we made several changes to intensify and connect our user interactions on Twitter (https://twitter.com/Ensembl), Facebook (https://www.facebook.com/Ensembl.org) and the Ensembl Blog (http://www.ensembl.info/). Together these methods have proven extremely effective for supporting users beyond the traditional mailing list and FAQ model that we continue to maintain.

Distance and on-line training are provided through our Helpdesk channel on YouTube, which saw >70% growth in the number of subscribers and incorporated two new videos during the year. For those users wanting more intensive training, we filmed and now provide a one-day Ensembl browser workshop (http://www.ebi.ac.uk/training/online/course/ensembl-filmed-browser-workshop) and a three-day Ensembl API workshop (http://www.ebi.ac.uk/training/online/course/ensembl-filmed-api-workshop), both complete with videos and exercises. A Quick Ensembl course providing a short introduction to the browser (http://www.ebi.ac.uk/training/online/course/ensembl-quick-tour-0) complements the more complete workshops.


ACKNOWLEDGEMENTS

The authors wish to thank all of their users, the systems teams who maintain their computational infrastructure and those researchers who have provided data to Ensembl in advance of publication under the understandings of the Fort Lauderdale meeting discussing Community Resource Projects and the Toronto meeting on pre-publication data sharing.

FUNDING

The Wellcome Trust provides majority funding for the Ensembl project [WT095908 and WT098051] with additional funding for specific project components from the National Human Genome Research Institute [U41HG007234, 1R24RR032658 and 1R01HD074078], the BBSRC [BB/I025506/1, BB/I025360/1 and BB/K009524/1], the European Molecular Biology Laboratory and as specified: The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 222664 (“Quantomics”); This Publication reflects only the author’s views and the European Community is not liable for any use that may be made of the information contained herein. The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number [200754] - the GEN2PHEN project; The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement [n° 282510 - BLUEPRINT]; Work supported by the European Commission within the Framework Programme 7 Capacities Specific Programme, under Grant Agreement [no. 312301] (Helix Nebula - The Science Cloud); Rat genomics resources receive additional support as specified: The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement [N° HEALTH-F4-2010-241504] (EURATRANS). Funding for open access charge: The Wellcome Trust.

Conflict of interest statement. None declared.


REFERENCES

1. Meyer LR, Zweig AS, Hinrichs AS, Karolchik D, Kuhn RM, Wong M, Sloan CA, Rosenbloom KR, Roe G, Rhead B, et al. The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res. 2013;41:D64–D69.

2. Potter SC, Clarke L, Curwen V, Keenan S, Mongin E, Searle SMJ, Stabenau A, Storey R, Clamp M. The Ensembl analysis pipeline. Genome Res. 2004;14:934–941.

3. Severin J, Beal K, Vilella AJ, Fitzgerald S, Schuster M, Gordon L, Ureta-Vidal A, Flicek P, Herrero J. eHive: an artificial intelligence workflow system for genomic analysis. BMC Bioinform. 2010;11:240.

4. Kinsella RJ, Kähäri A, Haider S, Zamora J, Proctor G, Spudich G, Almeida-King J, Staines D, Derwent P, Kerhornou A, et al. Ensembl BioMarts: a hub for data retrieval across taxonomic space. Database (Oxford). 2011;2011:bar030.

5. McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics. 2010;26:2069–2070.

6. Stabenau A, McVicker G, Melsopp C, Proctor G, Clamp M, Birney E. The Ensembl core software libraries. Genome Res. 2004;14:929–933.

7. Parker A, Bragin E, Brent S, Pritchard B, Smith JA, Trevanion S. Using caching and optimization techniques to improve performance of the Ensembl website. BMC Bioinform. 2010;11:239.

8. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74.

9. Rosenbloom KR, Sloan CA, Malladi VS, Dreszer TR, Learned K, Kirkup VM, Wong MC, Maddren M, Fang R, Heitner SG, et al. ENCODE Data in the UCSC Genome Browser: year 5 update. Nucleic Acids Res. 2013;41:D56–D63.

10. Kersey PJ, Staines DM, Lawson D, Kulesha E, Derwent P, Humphrey JC, Hughes DST, Keenan S, Kerhornou A, Koscielny G, et al. Ensembl Genomes: an integrative resource for genome-scale data from non-vertebrate species. Nucleic Acids Res. 2012;40:D91–D97.

11. Flicek P, Ahmed I, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, et al. Ensembl 2013. Nucleic Acids Res. 2013;41:D48–D55.

12. Huang Y, Li Y, Burt DW, Chen H, Zhang Y, Qian W, Kim H, Gan S, Zhao Y, Li J, et al. The duck genome and transcriptome provide insight into an avian influenza virus reservoir species. Nat Genet. 2013;45:776–783.

13. Church DM, Schneider VA, Graves T, Auger K, Cunningham F, Bouk N, Chen H-C, Agarwala R, McLaren WM, Ritchie GRS, et al. Modernizing reference genome assemblies. PLoS Biol. 2011;9:e1001091.

14. Wilming LG, Gilbert JGR, Howe K, Trevanion S, Hubbard T, Harrow JL. The vertebrate genome annotation (Vega) database. Nucleic Acids Res. 2008;36:D753–D760.

15. Harte RA, Farrell CM, Loveland JE, Suner M-M, Wilming L, Aken B, Barrell D, Frankish A, Wallin C, Searle S, et al. Tracking and coordinating an international curation effort for the CCDS Project. Database (Oxford). 2012;2012:bas008.

16. Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, Collins JE, Humphray S, McLaren K, Matthews L, et al. The zebrafish reference genome sequence and its relationship to the human genome. Nature. 2013;496:498–503.

17. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29:308–311.

18. Lappalainen I, Lopez J, Skipper L, Hefferon T, Spalding JD, Garner J, Chen C, Maguire M, Corbett M, Zhou G, et al. dbVar and DGVa: public archives for genomic structural variation. Nucleic Acids Res. 2013;41:D936–D941.

19. 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65.

20. Fu W, O’Connor TD, Jun G, Kang HM, Abecasis G, Leal SM, Gabriel S, Altshuler D, Shendure J, Nickerson DA, et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature. 2013;493:216–220.

21. UniProt Consortium. Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res. 2013;41:D43–D47.

22. Amberger J, Bocchini C, Hamosh A. A new face and new challenges for Online Mendelian Inheritance in Man (OMIM®). Hum. Mutat. 2011;32:564–567.

23. Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, Thomas NST, Abeysinghe S, Krawczak M, Cooper DN. Human Gene Mutation Database (HGMD): 2003 update. Hum. Mutat. 2003;21:577–581.

24. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, Jia M, Shepherd R, Leung K, Menzies A, et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 2011;39:D945–D950.

25. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, et al. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753.

26. Rath A, Olry A, Dhombres F, Brandt MM, Urbero B, Ayme S. Representation of rare diseases in health information systems: the Orphanet approach to serve a wide range of end users. Hum. Mutat. 2012;33:803–808.

27. Swaminathan GJ, Bragin E, Chatzimichali EA, Corpas M, Bevan AP, Wright CF, Carter NP, Hurles ME, Firth HV. DECIPHER: web-based, community resource for clinical interpretation of rare variants in developmental disorders. Hum Mol Genet. 2012;21:R37–R44.

28. Fokkema IFAC, Taschner PEM, Schaafsma GCP, Celli J, Laros JFJ, den Dunnen JT. LOVD v.2.0: the next generation in gene variant databases. Hum. Mutat. 2011;32:557–563.

29. Morgan H, Beck T, Blake A, Gates H, Adams N, Debouzy G, Leblanc S, Lengger C, Maier H, Melvin D, et al. EuroPhenome: a repository for high-throughput mouse phenotyping data. Nucleic Acids Res. 2010;38:D577–D585.

30. Mallon A-M, Iyer V, Melvin D, Morgan H, Parkinson H, Brown SDM, Flicek P, Skarnes WC. Accessing data from the International Mouse Phenotyping Consortium: state of the art and future plans. Mamm. Genome. 2012;23:641–652.

31. Lenffer J, Nicholas FW, Castle K, Rao A, Gregory S, Poidinger M, Mailman MD, Ranganathan S. OMIA (Online Mendelian Inheritance in Animals): an enhanced platform and integration into the Entrez search interface at NCBI. Nucleic Acids Res. 2006;34:D599–D601.

32. Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A, Kellis M, Marra MA, Beaudet AL, Ecker JR, et al. The NIH Roadmap Epigenomics Mapping Consortium. Nat. Biotechnol. 2010;28:1045–1048.

33. Portales-Casamar E, Thongjuea S, Kwon AT, Arenillas D, Zhao X, Valen E, Yusuf D, Lenhard B, Wasserman WW, Sandelin A. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 2010;38:D105–D110.

34. Visel A, Minovitsky S, Dubchak I, Pennacchio LA. VISTA Enhancer Browser–a database of tissue-specific human enhancers. Nucleic Acids Res. 2007;35:D88–D92.

35. Yang T-P, Beazley C, Montgomery SB, Dimas AS, Gutierrez-Arcelus M, Stranger BE, Deloukas P, Dermitzakis ET. Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies. Bioinformatics. 2010;26:2474–2476. [Europe PMC free article] [Abstract] [Google Scholar]

36. Paten B, Herrero J, Beal K, Fitzgerald S, Birney E. Enredo and Pecan: Genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res. 2008;18:1814–1828. [Europe PMC free article] [Abstract] [Google Scholar]

37. Paten B, Herrero J, Fitzgerald S, Beal K, Flicek P, Holmes I, Birney E. Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Res. 2008;18:1829–1843. [Europe PMC free article] [Abstract] [Google Scholar]

38. Cooper,G.M., Stone,E.A., Asimenos,G., NISC Comparative Sequencing Program, Green,E.D., Batzoglou,S. and Sidow,A. (2005) Distribution and intensity of constraint in mammalian genomic sequence. Genome Res., 15, 901–913. [Europe PMC free article] [Abstract]

39. Ruan J, Li H, Chen Z, Coghlan A, Coin LJM, Guo Y, Hériché J-K, Hu Y, Kristiansen K, Li R, et al. TreeFam: 2008 Update. Nucleic Acids Res. 2008;36:D735–D740. [Europe PMC free article] [Abstract] [Google Scholar]

40. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–29. [Europe PMC free article] [Abstract] [Google Scholar]

41. Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol. 2005;6:R44. [Europe PMC free article] [Abstract] [Google Scholar]

42. Malone J, Holloway E, Adamusiak T, Kapushesky M, Zheng J, Kolesnikov N, Zhukova A, Brazma A, Parkinson H. Modeling sample variables with an Experimental Factor Ontology. Bioinformatics. 2010;26:1112–1118. [Europe PMC free article] [Abstract] [Google Scholar]

43. Smedley D, Haider S, Ballester B, Holland R, London D, Thorisson G, Kasprzyk A. BioMart–biological queries made easy. BMC Genom. 2009;10:22. [Europe PMC free article] [Abstract] [Google Scholar]

44. Pruitt KD, Tatusova T, Brown GR, Maglott DR. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2012;40:D130–D135. [Europe PMC free article] [Abstract] [Google Scholar]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press

Funding

Funders who supported this work.

Biotechnology and Biological Sciences Research Council (6)

Grant ID: BB/I025328/1
Grant ID: BB/I025360/1
Grant ID: BB/K009524/1
Grant ID: BB/I025360/2
Grant ID: BB/I025506/1
Grant ID: BB/E011640/1 (Pig genome annotation and analysis; Dr Tim Hubbard, The Wellcome Trust Sanger Institute)

NCRR NIH HHS (2)

Grant ID: 1R24RR032658
Grant ID: R24 RR032658

NHGRI NIH HHS (3)

Grant ID: U41HG007234
Grant ID: U41 HG007234
Grant ID: U54 HG004555

NICHD NIH HHS (2)

Grant ID: 1R01HD074078
Grant ID: R01 HD074078

Wellcome Trust (3)

Grant ID: 095908 (Ensembl; Dr Paul Flicek, European Bioinformatics Institute)
Grant ID: WT095908
Grant ID: WT098051
