Abstract
Hi-C experiments capturing the 3D genome architecture have led to the discovery of topologically-associated domains (TADs), which form an important part of the 3D genome organization and appear to play a role in gene regulation and other functions. Several histone modifications have been independently suggested as possible explanations of TAD formation, but their combinatorial effects on domain formation remain poorly understood at a global scale. Here, we propose a convex semi-nonparametric approach called nTDP, based on Bernstein polynomials, to explore the joint effects of histone markers on TAD formation and to predict TADs solely from histone data. We find a small subset of modifications to be predictive of TADs across species. Using our trained model, we are able to predict TADs across different species and cell types without the use of Hi-C data, suggesting that the effect of these modifications is conserved. This work provides the first comprehensive joint model of the effect of histone markers on domain formation.
References
Bach, F.R.: Exploring large feature spaces with hierarchical multiple kernel learning. In: Advances in Neural Information Processing Systems, pp. 105–112 (2009)
Baù, D., Marti-Renom, M.A.: Structure determination of genomic domains by satisfaction of spatial restraints. Chromosome Res. 19(1), 25–35 (2011)
Bednarz, P., Wilczyński, B.: Supervised learning method for predicting chromatin boundary associated insulator elements. J. Bioinform. Comput. Biol. 12(06), 1442006 (2014)
Bernstein, B.E., et al.: The NIH roadmap epigenomics mapping consortium. Nat. Biotechnol. 28(10), 1045–1048 (2010)
Bickmore, W.A., van Steensel, B.: Genome architecture: Domain organization of interphase chromosomes. Cell 152(6), 1270–1284 (2013)
Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S., Ren, B.: Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485(7398), 376–380 (2012)
ENCODE Project Consortium, et al.: An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012)
Ernst, J., Kellis, M.: ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9(3), 215–216 (2012)
Filippova, D., Patro, R., Duggal, G., Kingsford, C.: Identification of alternative topological domains in chromatin. Alg. Mol. Biol. 9(1), 14 (2014)
Gibcus, J.H., Dekker, J.: The hierarchy of the 3D genome. Mol. Cell 49(5), 773–782 (2013)
Guelen, L., et al.: Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453(7197), 948–951 (2008)
Ho, J.W., et al.: Comparative analysis of metazoan chromatin organization. Nature 512(7515), 449–452 (2014)
Hoffman, M.M., Buske, O.J., Wang, J., Weng, Z., Bilmes, J.A., Noble, W.S.: Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nat. Methods 9, 473–476 (2012)
Hou, C., Li, L., Qin, Z.S., Corces, V.G.: Gene density, transcription, and insulators contribute to the partition of the Drosophila genome into physical domains. Mol. Cell 48(3), 471–484 (2012)
Le, T.B.K., et al.: High-resolution mapping of the spatial organization of a bacterial chromosome. Science 342(6159), 731–734 (2013)
Libbrecht, M.W., Ay, F., Hoffman, M.M., Gilbert, D.M., Bilmes, J.A., Noble, W.S.: Joint annotation of chromatin state and chromatin conformation reveals relationships among domain types and identifies domains of cell type-specific expression. Genome Res. 25, 544–557 (2015)
Lieberman-Aiden, E., et al.: Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326(5950), 289–293 (2009)
Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45(1–3), 503–528 (1989)
McKay Curtis, S., Ghosh, S.K., et al.: A variable selection approach to monotonic regression with Bernstein polynomials. J. Appl. Stat. 38(5), 961–976 (2011)
Meilă, M.: Comparing clusterings–an information based distance. J. Multivar. Anal. 98(5), 873–895 (2007)
Murphy, K.P.: Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge (2012)
Nora, E.P., et al.: Segmental folding of chromosomes: a basis for structural and regulatory chromosomal neighborhoods? BioEssays 35(9), 818–828 (2013)
Phillips-Cremins, J.E., Sauria, M.E., Sanyal, A., Gerasimova, T.I., Lajoie, B.R., Bell, J.S., Ong, C.T., Hookway, T.A., Guo, C., Sun, Y., Bland, M.J., Wagstaff, W., Dalton, S., McDevitt, T.C., Sen, R., Dekker, J., Taylor, J., Corces, V.G.: Architectural protein subclasses shape 3D organization of genomes during lineage commitment. Cell 153(6), 1281–1295 (2013)
Rao, S.S., et al.: A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159(7), 1665–1680 (2014)
Sefer, E., Duggal, G., Kingsford, C.: Deconvolution of ensemble chromatin interaction data reveals the latent mixing structures in cell subpopulations. In: Przytycka, T.M. (ed.) RECOMB 2015. LNCS, vol. 9029, pp. 293–308. Springer, Heidelberg (2015)
Sefer, E., Kingsford, C.: Metric labeling and semi-metric embedding for protein annotation prediction. In: Bafna, V., Sahinalp, S.C. (eds.) RECOMB 2011. LNCS, vol. 6577, pp. 392–407. Springer, Heidelberg (2011)
Sexton, T., Yaffe, E., Kenigsberg, E., Bantignies, F., Leblanc, B., Hoichman, M., Parrinello, H., Tanay, A., Cavalli, G.: Three-dimensional folding and functional organization principles of the Drosophila genome. Cell 148(3), 458–472 (2012)
Tolhuis, B., Palstra, R.J., Splinter, E., Grosveld, F., de Laat, W.: Looping and interaction between hypersensitive sites in the active \(\beta \)-globin locus. Mol. Cell 10(6), 1453–1465 (2002)
Wahba, G.: Spline models for observational data, vol. 59. SIAM (1990)
Yaffe, E., Tanay, A.: Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat. Genet. 43(11), 1059–1065 (2011)
Zhou, J., Troyanskaya, O.G.: Global quantitative modeling of chromatin factor interactions. PLoS Comput. Biol. 10(3), e1003525 (2014)
Funding
This research is funded in part by the Gordon and Betty Moore Foundation's Data-Driven Discovery Initiative through Grant GBMF4554 to Carl Kingsford, by the US NSF (1256087, 1319998), and by the US NIH (HG006913, HG007104). C.K. received support as an Alfred P. Sloan Research Fellow.
Appendix
\(R(f^{p}_{m})\) can be written more explicitly as in (18) according to [19]:
which turns \(R(f^{p}_{m})\) into (19):
where \(\overline{e}_{p} = \max (0,2-A+p)\), \(T^{i-q}_{j-r}(x)\) is defined below and \(\beta (i+j-q-r+1,2A-3-i-j+q+r)\) is the beta function:
\(R(f^{p}_{m})\) is convex, which follows from the semidefiniteness of the resulting polynomial.
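The convexity argument can be checked numerically on a small case. The sketch below is illustrative only and is not the paper's formula: it assumes the roughness penalty is the integrated squared second derivative on \([0,1]\) of a univariate Bernstein polynomial, a common choice that the text above does not confirm. Under that assumption the penalty is a quadratic form in the Bernstein coefficients, and the integrals of products of terms \(x^{a}(1-x)^{b}\) are the source of beta-function factors. The helper names `bernstein_basis`, `roughness_matrix`, and `roughness` are hypothetical.

```python
from math import comb

import numpy as np
from numpy.polynomial import polynomial as P


def bernstein_basis(k, n):
    """Ascending power-basis coefficients of B_{k,n}(x) = C(n,k) x^k (1-x)^(n-k)."""
    x_pow = np.zeros(k + 1)
    x_pow[k] = 1.0                                # the monomial x^k
    one_minus_x = P.polypow([1.0, -1.0], n - k)   # (1 - x)^(n - k)
    return comb(n, k) * P.polymul(x_pow, one_minus_x)


def roughness_matrix(n):
    """Gram matrix G[i, j] = integral over [0, 1] of B''_{i,n}(x) * B''_{j,n}(x).

    ASSUMPTION: the penalty is the integrated squared second derivative.
    """
    second = [P.polyder(bernstein_basis(k, n), 2) for k in range(n + 1)]
    G = np.empty((n + 1, n + 1))
    for i, di in enumerate(second):
        for j, dj in enumerate(second):
            antideriv = P.polyint(P.polymul(di, dj))  # antiderivative, zero at x = 0
            G[i, j] = P.polyval(1.0, antideriv)
    return G


def roughness(coeffs):
    """Penalty c^T G c for Bernstein coefficients c_0, ..., c_n."""
    c = np.asarray(coeffs, dtype=float)
    return float(c @ roughness_matrix(len(c) - 1) @ c)


if __name__ == "__main__":
    G = roughness_matrix(5)
    # A Gram matrix is positive semidefinite, so c -> c^T G c is convex.
    print("smallest eigenvalue:", np.linalg.eigvalsh(G).min())
```

Because `G` is a Gram matrix of the second-derivative basis functions, it is positive semidefinite by construction, which is the property that makes the quadratic penalty convex in the coefficients.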
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sefer, E., Kingsford, C. (2015). Semi-nonparametric Modeling of Topological Domain Formation from Epigenetic Data. In: Pop, M., Touzet, H. (eds) Algorithms in Bioinformatics. WABI 2015. Lecture Notes in Computer Science, vol 9289. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48221-6_11
DOI: https://doi.org/10.1007/978-3-662-48221-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48220-9
Online ISBN: 978-3-662-48221-6
eBook Packages: Computer Science, Computer Science (R0)