DOI: 10.1145/3107411.3108230
poster

Analysis of Controls in ChIP-seq

Published: 20 August 2017

Abstract

The chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) method, initially introduced a decade ago, is widely used by the scientific community to detect protein/DNA binding and histone modifications across the genome in various cell lines. Every experiment is prone to noise and bias, and ChIP-seq experiments are no exception. To alleviate bias, incorporation of control datasets in ChIP-seq analysis is an essential step. The controls are used to detect background signal, whilst the ChIP-seq experiment captures the true binding or histone modification signal. However, a recurrent issue is the existence of noise and bias in the controls themselves, as well as different types of bias in ChIP-seq experiments. Thus, depending on which controls are used, peak calling can produce different results (i.e., binding site positions) for the same ChIP-seq experiment. Consequently, generating "smart" controls, which model the non-signal effect for a specific ChIP-seq experiment, could enhance contrast and thus increase the reliability and reproducibility of the results. Our analysis aims to improve our understanding of ChIP-seq controls and their biases. We use unsupervised clustering and dimensionality reduction techniques to compare 160 controls for the K562 cell line in the ENCODE project, finding distinct groupings of controls that correlate with experimental characteristics. To customize a control for each ChIP-seq experiment, we use LASSO regression to fit a sparse set of controls to each of 500 ChIP-seq experiments (again, from ENCODE data for the K562 cell line). We look at how many controls are selected, which controls are used per ChIP-seq experiment, and how they are related to the different ChIP-seq experiment characteristics. Perhaps most surprisingly, we find that the LASSO models are not particularly sparse, often including half of the possible controls to model any given ChIP-seq experiment. Cross-validation, as well as testing with smaller sets of candidate controls, shows that such large numbers of controls are beneficial for modeling ChIP-seq background distributions. We also observe clusters of ChIP-seq experiments that tend to rely on clusters of controls, and we look at the experimental characteristics that tend to cause a given control to be useful in modeling the background of a given ChIP-seq experiment. Through these analyses, we attempt to answer largely unstudied questions regarding how much control data and of what types are useful in ChIP-seq analysis, and how suitable controls can be matched to ChIP-seq datasets.
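The idea of fitting a sparse set of controls to one ChIP-seq experiment can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic (six hypothetical control tracks over 300 bins, of which only two contribute to the simulated ChIP-seq signal), the penalty value is arbitrary, and the solver is a plain cyclic coordinate descent written with the Python standard library only.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def center(values):
    """Subtract the mean so that no separate intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def lasso_coordinate_descent(columns, target, alpha, n_sweeps=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1.

    columns: list of p centred feature columns, each of length n.
    target:  centred response of length n.
    """
    n = len(target)
    weights = [0.0] * len(columns)
    residual = list(target)  # equals y - Xw while all weights are zero
    col_sq = [sum(x * x for x in col) / n for col in columns]
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            if col_sq[j] == 0.0:
                continue  # a constant column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(x * r for x, r in zip(col, residual)) / n + col_sq[j] * weights[j]
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - weights[j]
            if delta != 0.0:
                residual = [r - delta * x for r, x in zip(residual, col)]
                weights[j] = new_w
    return weights


# Synthetic stand-in data (assumption): binned read counts for 6 controls.
rng = random.Random(0)
n_bins, n_controls = 300, 6
controls = [[rng.uniform(0.0, 10.0) for _ in range(n_bins)] for _ in range(n_controls)]
# The simulated ChIP-seq background depends on controls 0 and 3 only.
chip = [2.0 * controls[0][i] + 1.0 * controls[3][i] + rng.gauss(0.0, 0.5)
        for i in range(n_bins)]

X = [center(c) for c in controls]
y = center(chip)

weights = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, w in enumerate(weights) if w != 0.0]

# Smallest penalty at which every coefficient is shrunk to zero.
alpha_max = max(abs(sum(x * t for x, t in zip(col, y))) / len(y) for col in X)

print("weights:", [round(w, 3) for w in weights])
print("selected controls:", selected)
```

Counting the non-zero weights, as `selected` does here, corresponds to asking how many controls a fitted model uses. In practice the penalty `alpha` would be chosen by cross-validation rather than fixed by hand.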

Published In

ACM-BCB '17: Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
August 2017
800 pages
ISBN: 9781450347228
DOI: 10.1145/3107411
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bias
  2. chromatin immunoprecipitation followed by high throughput sequencing
  3. lasso regression

Qualifiers

  • Poster

Conference

BCB '17
Acceptance Rates

ACM-BCB '17 Paper Acceptance Rate 42 of 132 submissions, 32%;
Overall Acceptance Rate 254 of 885 submissions, 29%
