The NIH Roadmap Epigenomics Program data resource
Abstract
The NIH Roadmap Reference Epigenome Mapping Consortium is developing a community resource of genome-wide epigenetic maps in a broad range of human primary cells and tissues. There are large amounts of data already available, and a number of different options for viewing and analyzing the data. This report will describe key features of the websites where users will find data, protocols and analysis tools developed by the consortium, and provide a perspective on how this unique resource will facilitate and inform human disease research, both immediately and in the future.
The completion of the Human Genome Project marked a significant milestone, one which paved the way for annotation of the full catalog of human genes. This was undeniably a huge step forward for human disease research. The sequence of a gene, however, only provides some insight into its function. Given that each of our cells possesses an identical complement of genes, what differentiates a skin cell from a heart muscle cell from a neuron? Genes must be turned on, off or become expressed at different levels to effect the changes leading to the functional differences between cell types. Therefore, it is equally important to understand how these genes are regulated – when, where and how is a given gene expressed? Epigenetic mechanisms, such as DNA methylation and a variety of post-translational histone modifications, play an important role in establishing gene-expression programs, as well as in maintaining them, as cells divide.
The NIH Roadmap Epigenomics Program [101], funded through the NIH Common Fund, was developed with the goal of investigating how epigenetic mechanisms contribute to human health and disease. This multicomponent program funds research in several relevant areas, including technology development in epigenetics and epigenetic imaging, discovery and characterization of novel epigenetic marks, and investigation of how epigenetic signatures are disrupted in human disease. One key goal of this program is to gain a better understanding of the normal pattern of epigenetic modification, which will allow for comparisons between different tissues and cell types, and will serve as a reference for comparison to diseased samples. Recent advances in sequencing technology have made it possible to move beyond gene-by-gene analyses, allowing for truly unbiased, genome-wide mapping of epigenetic modifications. The NIH Roadmap Reference Epigenome Mapping Consortium, a group comprised of four Reference Epigenome Mapping Centers and an Epigenomics Data Analysis and Coordination Center, has been charged with generating these genome-wide epigenomic maps and assembling them into a publicly available data resource (Table 1) [1].
Table 1
Website | Download data? | Browse data with… | View data on… | Updated | Other features | Can upload own data? | Ref. |
---|---|---|---|---|---|---|---|
Reference Epigenome Mapping Consortium Homepage | Links to data download | Clickable data matrix or visual data browser | Links to UCSC browser mirror (Epigenome Browser) | Data: at each data freeze (four times per year) Other items: as needed | Protocols, publications, quality metrics, project and center/group information | No (but linked Epigenome Browser supports upload) | [103] |
NCBI Epigenomics Hub | .wig | Sample (i.e., cell/tissue type) browser, experiment (i.e., epigenetic feature) browser, text search | NCBI epigenomics viewer or UCSC browser mirror | Continuously | `Compare Samples' tool to identify regions of greatest chromatin differences, suggests GO terms and pathways most associated | Being implemented | [106] |
NCBI Gene Expression Omnibus | .bed, .wig, .bam and SRA | By sample, study or data matrix | NCBI Epigenomics viewer | Continuously | N/A | | [105] |
The Human Epigenome Atlas (on Genboree) | .bed, .wig by ftp or http | By sample, assay or clickable data matrix | UCSC browser, Atlas Gene/Pathway browser (read densities across single genes or pathways) | At each data freeze | Information on metadata, data flow and data quality. Tools for analysis via Genboree workbench (independent tools and Galaxy pipelines). Data and functionality exposed via HTTP REST APIs for programmatic use and extension | | [107] |
Roadmap Epigenomics Visualization Hub | No | ENCODE style data matrix | UCSC browser mirror, or remote display at UCSC main site (see [108], click `load track hub') | At each data freeze | UCSC mirror hosts integrative analysis tracks and summary tracks, tracks viewable at UCSC main site | Being implemented | [108,109] |
Human Epigenome Browser at Washington University | Yes | Expandable data selection matrix and metadata matrix | Next-generation epigenome browser | At each data freeze | Google map-style zoom and pan, genomic data and metadata viewer, data collation view, pathway/gene set view, statistical analysis | Yes | [111] |
Epigenome Browser UCSC mirror | .bed, .wig through table browser Individual reads not available | UCSC data selection matrix (ENCODE style) | UCSC browser mirror | At each data freeze | High-utility UCSC mirror tracks | Yes | [104] |
API: Application programming interface; ENCODE: Encyclopedia of DNA Elements; GO: Gene-ontology; HTTP: Hypertext transfer protocol; N/A: Not applicable; NCBI: National Center for Biotechnology Information; REST: Representational state transfer; SRA: Short Read Archive; UCSC: University of California-Santa Cruz.
Types of data available: cell types & epigenetic features
The Reference Epigenome Mapping Consortium is focused on developing reference epigenomic maps for a variety of human primary cells and tissues. As is true with any epigenetic study, a number of considerations are involved when selecting samples for mapping. Each of the specific cell types that make up a tissue probably has a different epigenomic profile. However, it can often be nearly impossible to isolate enough material of a particular purified cell type for analysis. The consortium has made an effort to achieve balance by covering a wide range of disease-relevant tissues, while including more highly purified cell types when possible. Currently, a wide range of adult and fetal cells and tissues is represented, including cells from a number of distinct brain regions and a variety of purified blood cell types. In addition, several pluripotent cell lines are included, such as induced pluripotent stem cells and human embryonic stem cells, as well as some differentiated forms of these cells. Currently, over 120 unique human primary cells, tissues and pluripotent cells are represented in the database (Table 1).
Specific epigenetic modifications can often be associated with a particular function; for example, H3K9me3 is generally found in repressed regions of the genome, while H3K9ac is generally correlated with gene activation. However, simply determining the distribution of one mark is not sufficient, as the function of a given mark may vary depending upon the broader chromatin context in which it resides. Furthermore, these marks must be correlated with a functional outcome, such as altered gene expression. A key strength of this data resource is the fact that for each cell and tissue type represented, multiple features will ultimately be mapped, including DNA methylation, post-translational histone modifications, chromatin accessibility and RNA. DNA methylation data will be made available for all cell and tissue types represented. Detailed protocols and standards for each of these analyses have been made available to the community online [102].
The consortium employs several methods for the analysis of DNA methylation, each of which has strengths and limitations (several of these were described in some detail by Bock and colleagues [2]). These include reduced representation bisulfite sequencing [3] and a combination of complementary MeDIP-seq/MRE-seq assays [4]. More recently, the consortium has moved towards using MethylC-seq [5] wherever possible, which offers full-coverage DNA methylation maps at base pair resolution. As methods for high-throughput analysis of 5-hydroxymethylcytosine are being developed, this feature may also be added to a subset of cell types in the future. Two approaches are used for the analysis of histone modifications by chromatin immunoprecipitation with sequencing. A small number of high-value samples, including several embryonic stem cell lines and their differentiated forms, will be analyzed to significant depth, with a large panel of histone modifications. Currently, there are approximately 30 distinct modifications represented in this panel, with additional modifications added as suitable antibodies are identified. The data gained from these more comprehensive analyses are used to inform the selection of a more limited panel of histone modifications, which is applied to the majority of samples being analyzed. Currently, this panel includes H3K27me3, H3K36me3, H3K4me1, H3K4me3, H3K9me3 and H3K27ac. These are the modifications found to be the most informative, namely the ones that are most difficult to predict based on other modifications [Ernst J, Kellis M, Pers. Comm.]. In addition to DNA methylation and histone modifications, most samples will undergo DNase I hypersensitivity mapping [6], to provide a measure of chromatin accessibility. Finally, each sample will be analyzed for RNA content. In many cases, this will be accomplished with expression arrays, but the consortium is moving towards using RNA-sequencing (RNA-seq) [7] when possible.
RNA-seq offers the most comprehensive view of RNA expression. It includes small RNAs, provides a measurement of alternative splicing events and enables allelic analyses to be carried out. The most current standards in use for chromatin immunoprecipitation with sequencing, whole-genome bisulfite sequencing, and RNA-seq can be found at the Consortium homepage, under `Protocols and Data Standards' [102].
Navigating the data
In order to maximize community accessibility to the data and resources developed by the program, we have developed several websites for viewing and downloading data and other information, each with unique features. These are summarized in Table 1 and described below.
Reference Epigenome Mapping Consortium homepage
The centerpiece of the program is the Reference Epigenome Mapping Consortium homepage [103], a centralized landing page which houses consortium data, as well as a variety of additional information, including detailed experimental protocols and information about data quality metrics, downloadable tools for processing and analyzing sequencing data, as well as links to other websites associated with consortium data (Figure 1). A list of all consortium publications can also be found here. Tabs located at the top of the page facilitate navigation of the site. This site is continually evolving in an effort to maximize the user experience and facilitate use of the resource by the community.
This site also features an easy-to-use interface to browse consortium data. Clicking on the `Data' tab will open a matrix-style data browser. Cell and tissue types are grouped into anatomic categories for easy navigation. Available data types are indicated as shaded squares. Clicking on any of these squares will open a track selection window on a consortium-hosted University of California-Santa Cruz (UCSC; CA, USA) Genome Browser Mirror site [104] where the user can select specific data tracks to be viewed for that cell type. Once tracks of interest are selected, the user must change the `maximum display mode' to full, using the drop down menu, and click `submit'. Users can also choose to view all available data for a particular epigenetic feature across all cell types by using the drop down menu found at the top of the matrix browser (`select assay to view cell/tissue data'). This site also offers a unique visual data browser, which displays cell types with available data in an anatomical context.
The consortium's goal is to provide a complete data set for each cell type analyzed. As described earlier, this data set – referred to as a `complete epigenome' – would contain DNA methylation data, RNA expression data, a panel of histone modifications and DNase I hypersensitivity profiles where possible. Definitions of the various classes of complete epigenomes used by the consortium can be found on this site by clicking the `Complete Epigenomes' tab at the top of the page. This page will also be updated with a list of all cell/tissue types that have reached completed status.
National Center for Biotechnology Information
The National Center for Biotechnology Information (NCBI) serves as the long-term archive for data produced by the Reference Epigenome Mapping Consortium, as well as for the other epigenomics projects funded by the Roadmap Epigenomics Program. Rapid data release is an important goal of the consortium. While consortium data can be viewed on several websites, most are updated only after a data freeze, which occurs several times a year. By contrast, the two NCBI sites described below are updated continuously, providing a real-time picture of the data available.
Users interested in simply downloading data files for analysis with their own tools should begin at the NIH Roadmap Epigenomics page of the Gene Expression Omnibus (GEO) [105]. This site offers several options for navigating the data: listed by sample (a particular feature in a particular cell type), using a matrix-style browser, or by search terms; however, only the sample list is updated continuously. Clicking on any accession number will open the metadata associated with the sample, where the user will find specific information about the individual the sample was derived from, the experimental conditions, and data generation and processing. The data can be downloaded in several of the most commonly used file formats (e.g., .bed, .wig, .bam and Short Read Archive).
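For readers who plan to work with downloaded files in their own scripts, the following is a minimal Python sketch of reading interval data in the .bed format. It assumes the common layout of chromosome, 0-based start, exclusive end and an optional name column; the names `BedInterval`, `parse_bed`, `total_interval_length` and the `EXAMPLE_BED` records are hypothetical and are not part of any consortium tool or data set.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class BedInterval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    name: str


def parse_bed(lines: Iterable[str]) -> list[BedInterval]:
    """Parse BED-formatted lines, skipping blank, comment and header lines."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"BED line needs at least 3 fields: {line!r}")
        name = fields[3] if len(fields) > 3 else "."
        intervals.append(BedInterval(fields[0], int(fields[1]), int(fields[2]), name))
    return intervals


def total_interval_length(intervals: Iterable[BedInterval]) -> dict[str, int]:
    """Sum interval lengths per chromosome (overlapping intervals are not merged)."""
    totals: dict[str, int] = defaultdict(int)
    for interval in intervals:
        totals[interval.chrom] += interval.end - interval.start
    return dict(totals)


# Made-up records for illustration only; real files come from the download sites.
EXAMPLE_BED = (
    "track name=example_peaks\n"
    "chr1\t100\t200\tpeak1\n"
    "chr1\t500\t650\tpeak2\n"
    "chr2\t0\t50\tpeak3\n"
)

if __name__ == "__main__":
    peaks = parse_bed(EXAMPLE_BED.splitlines())
    print(total_interval_length(peaks))
```

For large files, established genomics libraries are likely a better choice than a hand-written parser; the sketch is intended only to show how simple the interval format is.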
NCBI has also developed an epigenomics portal [8,106], which hosts data generated by the Roadmap Epigenomics Program, as well as other, user-submitted epigenomic data, including data from the Encyclopedia of DNA Elements (ENCODE) and ENCODE in model systems programs [9], a complementary program focused on defining the functional elements that control gene expression. Data can be browsed either by experiment (i.e., by individual data set) or by sample (i.e., by cell or tissue type). Here, data files can be downloaded or viewed on a genome browser. This allows users to view epigenomic data in the context of other genomic data housed at the NCBI, such as SNPs, clinically associated variants and significant associations from genome-wide association studies (GWAS). The epigenomics portal also features a tool to compare the epigenetic profiles of two different samples, pulling out a list of genes and associated gene-ontology terms that differ most significantly in epigenetic profiles between the samples.
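The idea behind a two-sample comparison can be illustrated with a small Python sketch. This is not the method used by the NCBI tool, which is not described here; it simply assumes that each sample has been summarized as a per-gene signal value and ranks genes by the absolute log2 ratio between the two samples. The function name `rank_by_signal_difference`, the pseudocount and the gene labels are all hypothetical.

```python
import math


def rank_by_signal_difference(sample_a, sample_b, pseudocount=1.0, top_n=3):
    """Rank genes present in both samples by absolute log2 ratio of signal.

    sample_a, sample_b: dicts mapping gene label -> non-negative signal value.
    Returns up to top_n (gene, log2_ratio) pairs, largest difference first.
    """
    shared_genes = sample_a.keys() & sample_b.keys()
    scored = []
    for gene in shared_genes:
        # The pseudocount avoids division by zero for genes with no signal.
        ratio = (sample_a[gene] + pseudocount) / (sample_b[gene] + pseudocount)
        scored.append((gene, math.log2(ratio)))
    # Sort by decreasing magnitude; break ties by gene label for stable output.
    scored.sort(key=lambda item: (-abs(item[1]), item[0]))
    return scored[:top_n]


# Made-up signal values for illustration only.
SAMPLE_A = {"GENE_A": 7.0, "GENE_B": 1.0, "GENE_C": 3.0}
SAMPLE_B = {"GENE_A": 0.0, "GENE_B": 1.0, "GENE_C": 1.0}

if __name__ == "__main__":
    for gene, log2_ratio in rank_by_signal_difference(SAMPLE_A, SAMPLE_B):
        print(gene, round(log2_ratio, 2))
```

A real comparison would also need to account for sequencing depth, replicate variability and multiple testing, none of which this sketch attempts.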
Human Epigenome Atlas
The Human Epigenome Atlas [107] was developed by the consortium's Epigenomic Data Analysis and Coordination Center at the Baylor College of Medicine (TX, USA). As with the other sites described, one can quickly browse the available data using a matrix-style browser, download data and metadata, and view data on a genome browser, the UCSC genome browser in this case. This site also includes more extensive explanations of the metadata collected, the flow of data from Mapping Center to Epigenomic Data Analysis and Coordination Center to NCBI, and the analysis pipelines in place. This information can all be found under the `Informatics' tab at the top of the page.
One unique feature of this site is the Atlas Gene Browser. This allows users to focus on the epigenetic profile of the introns, exons, promoters and UTRs of a specific gene, set of genes or pathway of interest, to assist in identifying common regulatory themes. This site also hosts the Genboree Workbench (also found under the `Informatics' tab), which contains a number of consortium-developed tools for data analysis and quality checking of epigenomic data. Although the Genboree Workbench is open to the public, one must create an account in order to use these tools. This allows the user to upload and securely store their own data for analysis with the tools provided.
UCSC Track Hub
The UCSC Genome Browser [10,108] remains one of the most popular tools for viewing genomic data. The UCSC browser is also the home to data generated by the ENCODE program [11]. In an effort to make the Roadmap Epigenomic Mapping Consortium data available to the public within the UCSC interface, the VizHub was developed [109]. VizHub stores individual data tracks and summary tracks that can be loaded remotely onto the UCSC browser. In order to access the Roadmap Track Hub at the UCSC Human Genome Browser, the user must navigate to the UCSC Human Genome Browser Gateway [110] and click the `Track Hubs' button. The track hub of interest can be selected and loaded, allowing Roadmap Epigenomics data to be displayed on the browser. The user can navigate the available data in the same way as one would usually do on the UCSC browser. For example, clicking on `DNA methylation' would open a track selection window where available methylation tracks may be selected.
The Human Epigenome Browser
The Human Epigenome Browser [111] is a next-generation epigenome browser developed to tackle the challenge of navigating and viewing large amounts of epigenomic data and the associated metadata. The features of this novel browser have been thoroughly described in a recent publication [12], so they will not be covered in detail here.
Future perspective
The NIH Roadmap Epigenomic Mapping Consortium has made great strides in developing reasonably comprehensive epigenomic maps for a variety of normal human cell and tissue types [112]. The next challenge is to refine these maps even further. Each of our organs is made up of many distinct, specialized cell types. For some cell types, such as hematopoietic cells, one can obtain a highly purified cell type in sufficient quantities to allow for genome-wide epigenetic analyses, but for most cell types this remains a significant technical challenge. The consortium has made an effort to strike a balance between highly purified cell types and more heterogeneous tissues in selecting samples to be mapped, but as technologies continue to rapidly improve, so will the ability to define the epigenomes of these unique cell types.
As the NIH Roadmap Epigenomics Program nears completion in 2013, the data available to the community continue to grow. The next step will be to apply these data to understanding human disease. How are these normal epigenetic programs disrupted in disease, and if changes are observed, are they a cause or a consequence of disease? In a separate initiative, the NIH Roadmap Epigenomics Program has also funded a number of investigators tackling this question, who are seeking to clarify the role of epigenetics in the pathogenesis of a variety of complex human diseases, including Alzheimer's disease, glaucoma, atherosclerosis, bipolar disorder, asthma and autism [113]. As these and other similar studies begin to bear fruit, we will begin to be able to answer this question.
An additional challenge is how we can leverage the data in this resource to understand the role of genetic variants in disease. Some intriguing data arising from the ENCODE program suggest how genome-wide information about chromatin state may be used to identify functional genomic elements linked to hits arising from GWAS [13]. One of the major challenges that has arisen in these studies is that SNPs strongly associated with the disease of interest are often found in regions of the genome with no obvious functional significance. Ernst and Kellis used a novel bioinformatics method to identify six general classes of chromatin states: specific combinations of epigenetic modifications associated with a functional state, such as promoter, enhancer, insulator, transcribed, repressed and inactive [14]. They found that in a number of existing GWAS data sets, disease-associated SNPs fell into regions identified as cell-type-specific enhancers in a biologically relevant cell type. Going forward, the reference epigenomic maps developed by the Roadmap Epigenomic Mapping Consortium, which cover a far broader range of human primary cells and tissues than currently represented in ENCODE, will be incredibly valuable for investigators trying to understand how specific genetic variants contribute to disease.
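The lookup at the heart of this kind of analysis, asking which chromatin state a given SNP falls into, can be sketched in a few lines of Python. This is an illustrative sketch only, not the method of Ernst and Kellis: it assumes a segmentation supplied as non-overlapping, 0-based half-open intervals labeled with a state, and the names `build_state_index` and `state_at`, together with the example segments and positions, are hypothetical.

```python
from bisect import bisect_right


def build_state_index(segments):
    """Index (chrom, start, end, state) segments by chromosome for fast lookup.

    Assumes segments on a chromosome do not overlap, as in a genome
    segmentation; coordinates are 0-based with an exclusive end.
    """
    by_chrom = {}
    for chrom, start, end, state in segments:
        by_chrom.setdefault(chrom, []).append((start, end, state))
    index = {}
    for chrom, chrom_segments in by_chrom.items():
        chrom_segments.sort()
        starts = [segment[0] for segment in chrom_segments]
        index[chrom] = (starts, chrom_segments)
    return index


def state_at(index, chrom, position):
    """Return the state label covering a 0-based position, or None."""
    if chrom not in index:
        return None
    starts, chrom_segments = index[chrom]
    # Find the last segment that starts at or before the position.
    i = bisect_right(starts, position) - 1
    if i >= 0:
        start, end, state = chrom_segments[i]
        if start <= position < end:
            return state
    return None


# Made-up segmentation and SNP positions for illustration only.
SEGMENTS = [
    ("chr1", 0, 1000, "repressed"),
    ("chr1", 1000, 1500, "enhancer"),
    ("chr1", 1500, 3000, "transcribed"),
    ("chr2", 200, 400, "promoter"),
]
SNPS = {"snp_1": ("chr1", 1200), "snp_2": ("chr1", 1500), "snp_3": ("chr2", 100)}

if __name__ == "__main__":
    index = build_state_index(SEGMENTS)
    for snp_id, (chrom, position) in SNPS.items():
        print(snp_id, state_at(index, chrom, position))
```

Binary search over sorted segment starts keeps each lookup fast even for genome-wide segmentations. Judging whether SNPs are enriched in a given state would additionally require a suitable background model, which is beyond this sketch.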
Finally, once we have determined what constitutes a `normal' epigenome, we can begin to investigate how epigenetic profiles change in response to the environment. As the agouti viable yellow (Avy) mouse has demonstrated so strikingly [15–17], the epigenome is exquisitely sensitive to external factors, such as diet or environmental chemicals. The reference epigenomes developed by the consortium will form a foundation for future studies aimed at understanding how the epigenome is perturbed, identifying biomarkers of exposure and, ultimately, developing intervention strategies to minimize the health impact in exposed populations.
Acknowledgements
Without the hard work and collegiality of the investigators and NIH staff that make up the Reference Epigenome Mapping Consortium, this data resource would not be possible. The four Reference Epigenome Mapping Centers are headed by J Stamatoyannopoulos (University of Washington, WA, USA), B Bernstein and A Meissner (Broad Institute, MA, USA), B Ren (University of California San Diego, CA, USA) and J Costello (University of California San Francisco, CA, USA); the Epigenomics Data Analysis and Coordination Center (TX, USA) is led by A Milosavljevic (Baylor College of Medicine, TX, USA); the National Center for Biotechnology Information (NCBI; MD, USA) effort is led by G Schuler (NCBI). The program is directed by staff at the National Institute of Environmental Health Sciences (NC, USA) and the National Institute on Drug Abuse (MD, USA).
Footnotes
Disclaimer This article may be the work product of an employee or group of employees of the National Institute of Environmental Health Sciences (NIEHS), NIH; however, the statements, opinions or conclusions contained therein do not necessarily represent the statements, opinions or conclusions of NIEHS, NIH or the US government.
Financial & competing interests disclosure The author is an employee of the NIH. The author has no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
No writing assistance was utilized in the production of this manuscript.
Full text links
Read article at publisher's site: https://doi.org/10.2217/epi.12.18
Funding
Funders who supported this work.
Intramural NIH HHS (1)
Grant ID: Z99 ES999999