Interpreting cis-regulatory interactions from large-scale deep neural networks

Abstract

The rise of large-scale, sequence-based deep neural networks (DNNs) for predicting gene expression has introduced challenges in their evaluation and interpretation. Current evaluations align DNN predictions with orthogonal experimental data, providing insights into generalization but offering limited insights into their decision-making process. Existing model explainability tools focus mainly on motif analysis, which becomes complex when interpreting longer sequences. Here we present cis-regulatory element model explanations (CREME), an in silico perturbation toolkit that interprets the rules of gene regulation learned by a genomic DNN. Applying CREME to Enformer, a state-of-the-art DNN, we identify cis-regulatory elements that enhance or silence gene expression and characterize their complex interactions. CREME can provide interpretations across multiple scales of genomic organization, from cis-regulatory elements to fine-mapped functional sequence elements within them, offering high-resolution insights into the regulatory architecture of the genome. CREME provides a powerful toolkit for translating the predictions of genomic DNNs into mechanistic insights of gene regulation.

**Fig. 1: CREME overview and results for context perturbations in K562 cells using Enformer.**

**Fig. 2: CRE-level analysis in K562 using Enformer.**

**Fig. 3: Fine-tile search results for enhancing tiles in K562.**

**Fig. 4: TSS–CRE distance test schematic and results.**

**Fig. 5: Optimal CRE sets reveal complex interactions for K562 using Enformer.**

**Fig. 6: Investigation of CRE interactions.**

Data availability

Final and intermediate results for paper reproducibility are available via Zenodo at https://doi.org/10.5281/zenodo.12584210 (ref. ⁷⁵).

Code availability

Static code for reproducing the analyses in the manuscript is available via Zenodo at https://zenodo.org/records/12594513 (ref. ⁷⁶). A bleeding-edge version of CREME is available via GitHub at https://github.com/p-koo/creme-nn and https://github.com/p-koo/CREME_paper_reproducibility. A stable version of CREME is installable via pip (PyPI at https://pypi.org/project/creme-nn/). Comprehensive documentation is provided on ReadTheDocs.org (API at https://creme-nn.readthedocs.io/en/latest/index.html and tutorials at https://creme-nn.readthedocs.io/en/latest/tutorials.html).

References

Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).
Kelley, D. R. et al. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res. 28, 739–750 (2018).
Karbalayghareh, A., Sahin, M. & Leslie, C. S. Chromatin interaction–aware gene regulatory modeling with graph attention networks. Genome Res. 32, 930–944 (2022).
Linder, J., Srivastava, D., Yuan, H., Agarwal, V. & Kelley, D. R. Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation. Preprint at bioRxiv https://doi.org/10.1101/2023.08.30.555582 (2023).
Toneyan, S., Tang, Z. & Koo, P. K. Evaluating deep learning for predicting epigenomic profiles. Nat. Mach. Intell. 4, 1–13 (2022).
Karollus, A., Mauermeier, T. & Gagneur, J. Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers. Genome Biol. 24, 1–29 (2023).
Kircher, M. et al. Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution. Nat. Commun. 10, 3583 (2019).
Arnold, C. D. et al. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339, 1074–1077 (2013).
Qi, L. S. et al. Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152, 1173–1183 (2013).
Sasse, A. et al. Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings. Nat. Genet. 55, 2060–2064 (2023).
Huang, C. et al. Personal transcriptome variation is poorly explained by current genomic deep learning models. Nat. Genet. 55, 2056–2059 (2023).
Simonyan, K., Vedaldi, A. & Zisserman, A. Deep inside convolutional networks: visualising image classification models and saliency maps. In Proc. of the International Conference on Learning Representations (ICLR, 2014).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30, 4765–4774 (2017).
Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. In International Conference on Machine Learning 3145–3153 (2017).
Koo, P. K., Majdandzic, A., Ploenzke, M., Anand, P. & Paul, S. B. Global importance analysis: an interpretability method to quantify importance of genomic features in deep neural networks. PLoS Comput. Biol. 17, e1008925 (2021).
Avsec, Ž. et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354–366 (2021).
Koo, P. K. & Ploenzke, M. Improving representations of genomic sequence motifs in convolutional networks with exponential activations. Nat. Mach. Intell. 3, 258–266 (2021).
Hammelman, J. & Gifford, D. K. Discovering differential genome sequence activity with interpretable and efficient deep learning. PLoS Comput. Biol. 17, e1009282 (2021).
Liu, G., Zeng, H. & Gifford, D. K. Visualizing complex feature interactions and feature sharing in genomic deep neural networks. BMC Bioinform. 20, 401 (2019).
Greenside, P., Shimko, T., Fordyce, P. & Kundaje, A. Discovering epistatic feature interactions from neural network models of regulatory DNA sequences. Bioinformatics 34, i629–i637 (2018).
Jha, A., Aicher, J. K., Gazzara, M. R., Singh, D. & Barash, Y. Enhanced integrated gradients: improving interpretability of deep learning models using splicing codes as a case study. Genome Biol. 21, 149 (2020).
Linder, J. et al. Interpreting neural networks for biological sequences by learning stochastic masks. Nat. Mach. Intell. 4, 41–54 (2022).
Seitz, E. E., McCandlish, D. M., Kinney, J. B. & Koo, P. K. Interpreting cis-regulatory mechanisms from genomic deep neural networks using surrogate models. Nat. Mach. Intell. 6, 701–713 (2024).
Fulco, C. P. et al. Systematic mapping of functional enhancer–promoter connections with CRISPR interference. Science 354, 769–773 (2016).
Gasperini, M. et al. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell 176, 377–390 (2019).
Frankish, A. et al. GENCODE 2021. Nucleic Acids Res. 49, D916–D923 (2021).
Lin, X. et al. Nested epistasis enhancer networks for robust genome regulation. Science 377, 1077–1085 (2022).
Goel, V. Y., Huseyin, M. K. & Hansen, A. S. Region capture Micro-C reveals coalescence of enhancers and promoters into nested microcompartments. Nat. Genet. 55, 1048–1056 (2023).
Luthra, I. et al. Regulatory activity is the default DNA state in eukaryotes. Nat. Struct. Mol. Biol. 31, 559–567 (2024).
Pang, B. & Snyder, M. P. Systematic identification of silencers in human cells. Nat. Genet. 52, 254–263 (2020).
Stampfel, G. et al. Transcriptional regulators form diverse groups with context-dependent regulatory functions. Nature 528, 147–151 (2015).
Kulkarni, M. M. & Arnosti, D. N. cis-regulatory logic of short-range transcriptional repression in Drosophila melanogaster. Mol. Cell. Biol. 25, 3411–3420 (2005).
Doni Jayavelu, N., Jajodia, A., Mishra, A. & Hawkins, R. D. Candidate silencer elements for the human and mouse genomes. Nat. Commun. 11, 1061 (2020).
Martinez-Ara, M., Comoglio, F., van Arensbergen, J. & van Steensel, B. Systematic analysis of intrinsic enhancer-promoter compatibility in the mouse genome. Mol. Cell 82, 2519–2531 (2022).
Bergman, D. T. et al. Compatibility rules of human enhancer and promoter sequences. Nature 607, 176–184 (2022).
Narita, T. et al. The logic of native enhancer-promoter compatibility and cell-type-specific gene expression variation. Preprint at bioRxiv https://doi.org/10.1101/2022.07.18.500456 (2022).
Armendariz, D. A., Sundarrajan, A. & Hon, G. C. Breaking enhancers to gain insights into developmental defects. eLife 12, e88187 (2023).
Catarino, R. R. & Stark, A. Assessing sufficiency and necessity of enhancer activities for gene expression and the mechanisms of transcription activation. Genes Dev. 32, 202–223 (2018).
Luo, Y. et al. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Res. 48, D882–D889 (2020).
Igolkina, A. A. et al. H3K4me3, H3K9ac, H3K27ac, H3K27me3 and H3K9me3 histone tags suggest distinct regulatory evolution of open and condensed chromatin landmarks. Cells 8, 1034 (2019).
Monaghan, L. et al. The emerging role of H3K9me3 as a potential therapeutic target in acute myeloid leukemia. Front. Oncol. 9, 705 (2019).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. USA 117, 9451–9457 (2020).
Gao, T. & Qian, J. EnhancerAtlas 2.0: an updated resource with enhancer annotation in 586 tissue/cell types across nine species. Nucleic Acids Res. 48, D58–D64 (2020).
Zhang, Y., See, Y. X., Tergaonkar, V. & Fullwood, M. J. Long-distance repression by human silencers: chromatin interactions and phase separation in silencers. Cells 11, 1560 (2022).
Jin, Y. et al. Targeting methyltransferase PRMT5 eliminates leukemia stem cells in chronic myelogenous leukemia. J. Clin. Invest. 126, 3961–3980 (2016).
Griffin, G. K. et al. Epigenetic silencing by SETDB1 suppresses tumour intrinsic immunogenicity. Nature 595, 309–314 (2021).
Garcia-Carpizo, V. et al. CREBBP/EP300 bromodomains are critical to sustain the GATA1/MYC regulatory axis in proliferation. Epigenetics Chromatin 11, 30 (2018).
Del Gaudio, N. et al. BRD9 binds cell type-specific chromatin regions regulating leukemic cell survival via STAT5 inhibition. Cell Death Dis. 10, 338 (2019).
Lazar, J. E. et al. Global regulatory DNA potentiation by SMARCA4 propagates to selective gene expression programs via domain-level remodeling. Cell Rep. 31, 107676 (2020).
Benton, M. L., Talipineni, S. C., Kostka, D. & Capra, J. A. Genome-wide enhancer annotations differ significantly in genomic distribution, evolution, and function. BMC Genomics 20, 511 (2019).
Grant, C. E. & Bailey, T. L. XSTREME: comprehensive motif analysis of biological sequence datasets. Preprint at bioRxiv https://doi.org/10.1101/2021.09.02.458722 (2021).
Zuin, J. et al. Nonlinear control of transcription through enhancer–promoter interactions. Nature 604, 571–577 (2022).
Zhan, Y. et al. Reciprocal insulation analysis of Hi-C data shows that TADs represent a functionally but not structurally privileged scale in the hierarchical folding of chromosomes. Genome Res. 27, 479–490 (2017).
Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
Choi, J. et al. Evidence for additive and synergistic action of mammalian enhancers during cell fate determination. eLife 10, e65381 (2021).
Martinez-Ara, M., Comoglio, F. & van Steensel, B. Large-scale analysis of the integration of enhancer-enhancer signals by promoters. Preprint at bioRxiv https://doi.org/10.1101/2023.08.11.552995 (2023).
Kvon, E. Z., Waymack, R., Gad, M. & Wunderlich, Z. Enhancer redundancy in development and disease. Nat. Rev. Genet. 22, 324–336 (2021).
Frankel, N. et al. Phenotypic robustness conferred by apparently redundant transcriptional enhancers. Nature 466, 490–493 (2010).
Osterwalder, M. et al. Enhancer redundancy provides phenotypic robustness in mammalian development. Nature 554, 239–243 (2018).
Perry, M. W., Boettiger, A. N. & Levine, M. Multiple enhancers ensure precision of gap gene-expression patterns in the Drosophila embryo. Proc. Natl Acad. Sci. USA 108, 13570–13575 (2011).
Hong, C. K. Y. & Cohen, B. A. Genomic environments scale the activities of diverse core promoters. Genome Res. 32, 85–96 (2022).
Zhou, J. L., Guruvayurappan, K., Chen, H. V., Chen, A. R. & McVicker, G. P. Genome-wide analysis of CRISPR perturbations indicates that enhancers act multiplicatively and without epistatic-like interactions. Preprint at bioRxiv https://doi.org/10.1101/2023.04.26.538501 (2023).
Sanford, E. M., Emert, B. L., Coté, A. & Raj, A. Gene regulation gravitates toward either addition or multiplication when combining the effects of two signals. eLife 9, e59388 (2020).
Crocker, J., Ilsley, G. R. & Stern, D. L. Quantitatively predictable control of Drosophila transcriptional enhancers in vivo with engineered transcription factors. Nat. Genet. 48, 292–298 (2016).
Melen, G. J., Levy, S., Barkai, N. & Shilo, B.-Z. Threshold responses to morphogen gradients by zero-order ultrasensitivity. Mol. Syst. Biol. 1, 2005–0028 (2005).
Burz, D. S., Rivera-Pomar, R., Jäckle, H. & Hanes, S. D. Cooperative DNA-binding by bicoid provides a mechanism for threshold-dependent gene activation in the Drosophila embryo. EMBO J. 17, 5998–6009 (1998).
Doughty, B. R. et al. Single-molecule chromatin configurations link transcription factor binding to expression in human cells. Preprint at bioRxiv https://doi.org/10.1101/2024.02.02.578660 (2024).
Bothma, J. P. et al. Enhancer additivity and non-additivity are determined by enhancer strength in the Drosophila embryo. eLife 4, e07956 (2015).
Scholes, C., Biette, K. M., Harden, T. T. & DePace, A. H. Signal integration by shadow enhancers and enhancer duplications varies across the Drosophila embryo. Cell Rep. 26, 2407–2418 (2019).
Ovadia, Y. et al. Can you trust your model’s uncertainty? Evaluating predictive uncertainty under dataset shift. In Adv. Neural Inf. Process. Syst. https://papers.nips.cc/paper_files/paper/2019/file/8558cb408c1d76621371888657d2eb1d-Paper.pdf (2019).
Vaswani, A. et al. Attention is all you need. In Adv. Neural Inf. Process. Syst. https://papers.nips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf (2017).
Chen, P. B. et al. Systematic discovery and functional dissection of enhancers needed for cancer cell fitness and proliferation. Cell Rep. 41, 111630 (2022).
Crocker, J. et al. Low affinity binding site clusters confer Hox specificity and regulatory robustness. Cell 160, 191–203 (2015).
Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).
Toneyan, S. & Koo, P. Creme-nn data and results. Zenodo https://doi.org/10.5281/zenodo.12584210 (2024).
Toneyan, S. & Koo, P. Creme-nn code. Zenodo https://zenodo.org/records/12594513 (2023).

Acknowledgements

We thank S. Navlakha, J. Desmarais, J. Kinney and members of the Koo Lab for helpful comments on the manuscript. Research reported in this publication was supported in part by the National Human Genome Research Institute of the National Institutes of Health under award number R01HG012131 (P.K.K.), the National Institute of General Medical Sciences of the National Institutes of Health under award number R01GM149921 (S.T. and P.K.K.) and the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory. This work was performed with assistance from the US National Institutes of Health Grant S10OD028632-01. We also thank the NVIDIA GPU Grant Program for support.

Author information

Authors and Affiliations

Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, New York, NY, USA
Shushan Toneyan & Peter K. Koo

Authors

Shushan Toneyan
Peter K. Koo

Contributions

S.T. and P.K.K. conceived of the method and designed the experiments. S.T. developed code, ran the experiments and analyzed the results. S.T. and P.K.K. interpreted the results and contributed to writing the paper.

Corresponding author

Correspondence to Peter K. Koo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Results of the Context Dependence Test and Context Swap Test for GM12878 and PC-3.

a, b, Histogram of normalized context effect from the Context Dependence Test for 10,000 sequences that contain an active, annotated gene in GM12878 (a) and PC-3 (b) cells. Insets show the subset of sequences for enhancing, silencing and neutral contexts. The inset in a contains 200, 78 and 183 data points in enhancing, silencing and neutral contexts, respectively. The inset in b contains 200, 90 and 110 data points in enhancing, silencing and neutral contexts, respectively. c, Pairwise comparison of normalized context effects between cell lines for matched genes. The number of data points is 7,688, 6,946 and 7,492 from left to right. d, e, Context Swap Test results. Boxplots of normalized context effect on TSS for sequences with context perturbations given by insertion of the original TSS in different context categories. Results are organized according to the original TSS category: enhancing (left), neutral (middle) and silencing (right). The number of data points in each boxplot represents an all-versus-all comparison of each respective TSS in each possible context. The number of data points in d is 40,000, 36,600 and 15,600 in boxplots for TSS from enhancing context; 36,600, 33,489 and 14,274 for TSS from neutral context; and 15,600, 14,274 and 6,084 for TSS from silencing context. The number of data points in e is 40,000, 22,000 and 18,000 in boxplots for TSS from enhancing context; 22,000, 12,100 and 9,900 for TSS from neutral context; and 18,000, 9,900 and 8,100 for TSS from silencing context. Boxplots show the first and third quartiles, the median (central line) and the range of data with outliers removed (whiskers).
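The boxplot counts reported for the Context Swap Test follow from the all-versus-all design: each count is the number of TSSs in one category multiplied by the number of contexts in another. The short check below reproduces that arithmetic from the inset sizes given in this caption. The function name `swap_test_counts` is illustrative and is not part of CREME.

```python
from itertools import product


def swap_test_counts(n_by_category):
    """Number of TSS-context pairs for every (TSS category, context category)."""
    return {
        (tss, ctx): n_by_category[tss] * n_by_category[ctx]
        for tss, ctx in product(n_by_category, repeat=2)
    }


# Category sizes taken from the insets of panels a and b.
gm12878 = {"enhancing": 200, "neutral": 183, "silencing": 78}
pc3 = {"enhancing": 200, "neutral": 110, "silencing": 90}

gm_counts = swap_test_counts(gm12878)
pc3_counts = swap_test_counts(pc3)

for (tss, ctx), n in gm_counts.items():
    print(f"GM12878 TSS from {tss} context placed in {ctx} context: {n:,}")
```

The products match the counts listed for panels d and e, which indicates that d corresponds to the GM12878 category sizes and e to the PC-3 category sizes.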

Extended Data Fig. 2 Borzoi Context Dependence Test results.

a, Scatter plot comparing the wild-type activity predicted by Enformer versus Borzoi for the matched cell types and for matched genes. b, Histogram of normalized context effect for the 10,000 highest activity, annotated genes (according to Borzoi’s predictions) for K562, GM12878 and PC-3 cells. Inset shows the subset of sequences for enhancing, silencing and neutral contexts. The number of data points is shown in inset legend.

Extended Data Fig. 3 CRE effects on TSS activity in GM12878 and PC-3 cell lines.

a, b, Boxen plot of the normalized shuffle effect for each tile in sequences from enhancing, neutral and silencing context categories (Necessity Test) for GM12878 (a) and PC-3 (b). The number of data points in a is 7,600, 6,954 and 2,964 and in b is 7,600, 4,180 and 3,420 in enhancing, neutral and silencing contexts, respectively. c, d, Boxen plot of tile effects for each tile in sequences from enhancing, neutral and silencing context categories (Sufficiency Test) for GM12878 (c) and PC-3 (d). Normalization is with predicted TSS activity for wild-type (enhancing context) and control, that is, the intrinsic TSS activity (neutral and silencing context). Boxen plots have the same number of data points as in a and b. In a–d, center lines of boxen plots show the median and boxes in both directions always indicate half of the remaining data. e, Scatter plot between the results from the Necessity Test (y-axis) versus the results from the Sufficiency Test (x-axis) in the K562 cell line (N = 7,600 in each plot, corresponding to 200 sequences with 38 tiles in each).
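As a rough illustration of a tile-wise necessity scan, the sketch below shuffles one tile at a time and records one effect per tile, which is why 200 sequences with 38 tiles yield 7,600 points. Everything here is an assumption made for illustration: `toy_tss_activity` is a hypothetical stand-in for a DNN prediction, the normalization `(WT - mutant) / WT` is one plausible choice rather than the definition used in the paper, and the toy scan shuffles every tile rather than excluding a TSS tile. It does not use the CREME API.

```python
import random


def shuffle_tile(sequence, start, end, rng):
    """Return a copy of `sequence` with positions [start, end) randomly permuted."""
    tile = list(sequence[start:end])
    rng.shuffle(tile)
    return sequence[:start] + "".join(tile) + sequence[end:]


def toy_tss_activity(sequence, motif="GATAAG"):
    """Hypothetical stand-in for a model prediction: 1 plus the motif count."""
    return 1.0 + sequence.count(motif)


def necessity_scan(sequence, tile_size, predict, seed=0):
    """Shuffle each tile in turn and return one normalized effect per tile.

    Assumed normalization: (wild-type - mutant) / wild-type.
    """
    rng = random.Random(seed)
    wt = predict(sequence)
    effects = []
    for start in range(0, len(sequence), tile_size):
        mutant = shuffle_tile(sequence, start, start + tile_size, rng)
        effects.append((wt - predict(mutant)) / wt)
    return effects


# Three 30-bp tiles; only the middle tile carries the toy motif.
sequence = "A" * 30 + "GATAAG" * 5 + "A" * 30
effects = necessity_scan(sequence, tile_size=30, predict=toy_tss_activity)
print(effects)
```

In this toy setting, shuffling the flanking poly-A tiles leaves the sequence unchanged, so their effect is zero, while shuffling the middle tile can only reduce the motif count.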

Extended Data Fig. 4 Characterization of sufficient CREs in GM12878 and PC-3.

a, Histogram of the distance from CRE tiles to the TSS for sufficient enhancers and silencers in GM12878 and PC-3. b–d, Boxplots of mean DNase-seq coverage (b), mean ATAC-seq coverage (c) and mean histone mark coverage (d) of sufficient enhancer and silencer tiles in various cell types. The number of points in green and red boxes is 76 and 222 in K562, 41 and 57 for GM12878 and 35 and 97 for PC-3. Significance is given by the two-sided Mann-Whitney U test (*: p < 0.05; **: p < 0.01; ***: p < 0.001; ****: p < 0.0001). Boxplots show the first and third quartiles, the median (central line) and the range of data with outliers removed (whiskers).

Extended Data Fig. 5 TSS-CRE Distance Test results across cell lines.

a, b, Average plot of the fold change over max versus distance to TSS for GM12878 (a) and PC-3 (b). Max represents the maximum TSS activity across all embedded positions within each sequence using Enformer. c, d, Plot of the tile sufficiency versus distance to TSS for GM12878 (c) and PC-3 (d), respectively. Tile sufficiency is calculated as the predicted TSS activity with a TSS–CRE pair at a given distance minus the predicted activity of the control sequence (shuffled context with just the TSS), divided by the WT sequence activity for enhancers and by the control sequence activity for silencers. In a–d, shaded regions represent standard deviation of the mean.
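The tile sufficiency definition in this caption can be written as a small function. This is a minimal sketch of the stated formula only; the function and argument names are illustrative, and the numeric inputs are made-up values rather than model predictions.

```python
def tile_sufficiency(pair_activity, control_activity, wt_activity, cre_type):
    """Tile sufficiency as described in the caption.

    pair_activity: predicted TSS activity with the TSS-CRE pair embedded
        in shuffled context at a given distance.
    control_activity: predicted TSS activity of the shuffled context with
        just the TSS.
    wt_activity: predicted TSS activity of the wild-type sequence.
    cre_type: "enhancer" (normalize by WT) or "silencer" (normalize by control).
    """
    if cre_type == "enhancer":
        denominator = wt_activity
    elif cre_type == "silencer":
        denominator = control_activity
    else:
        raise ValueError("cre_type must be 'enhancer' or 'silencer'")
    return (pair_activity - control_activity) / denominator


# Made-up activities for illustration.
enhancer_score = tile_sufficiency(6.0, 2.0, 8.0, "enhancer")  # (6 - 2) / 8
silencer_score = tile_sufficiency(1.0, 2.0, 8.0, "silencer")  # (1 - 2) / 2
print(enhancer_score, silencer_score)
```

Under this definition a sufficient enhancer gives a positive value and a sufficient silencer a negative one, with zero meaning the tile does not change the intrinsic TSS activity.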

Extended Data Fig. 6 Example sequences showing individual tile effect sizes from the Higher-Order Interaction Test results.

a–i, The left panels show results of the greedy search (green) and the additive model (orange) for a particular gene; the right panels show the independent tile effect size (calculated from the first iteration) sorted according to greedy search tile order. a–c show example sequences classified as superadditivity; d–f show sequences classified as subadditivity; g–i show example sequences classified as additivity.

Extended Data Fig. 7 Optimal CRE sets reveal complex interactions in GM12878 and PC-3.

a, b, Average plot of the greedy search results for enhancer tile sets (a) and silencer tile sets (b) for sequences from different context categories for various cell lines. The fold change over wild-type (WT) is the predicted TSS activity of the shuffled CRE tiles in each round of the greedy search (indicated by the number of tiles). c, d, Sufficiency of the tile sets identified in each round of greedy search. Average fold change over wild-type (c) and control (d), which represents shuffled sequences with just the TSS tile. Sufficiency places the tile sets along with the TSS tile into shuffled sequences, averaging over 10 total shuffles. Shaded region represents the standard deviation of the mean.
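A greedy search over tile sets can be sketched as follows: in each round, every remaining tile is tried in addition to the tiles already selected, and the tile that changes the predicted TSS activity the most is kept. This is a simplified sketch under stated assumptions, not the CREME implementation. The `predict` callable is hypothetical and takes the set of shuffled tile indices instead of a real sequence, and the toy model simply sums per-tile weights.

```python
def greedy_tile_search(predict, n_tiles, n_rounds, minimize=True):
    """Greedily grow a set of shuffled tiles.

    predict: callable mapping a frozenset of shuffled tile indices to a
        predicted TSS activity (hypothetical stand-in for a DNN).
    minimize: True to look for enhancer sets (shuffling lowers activity),
        False to look for silencer sets (shuffling raises activity).
    Returns the selected tiles in order and the fold change over wild type
    after each round.
    """
    wt = predict(frozenset())
    selected, fold_changes = [], []
    for _ in range(n_rounds):
        candidates = [t for t in range(n_tiles) if t not in selected]
        scored = [(predict(frozenset(selected + [t])), t) for t in candidates]
        best_score, best_tile = min(scored) if minimize else max(scored)
        selected.append(best_tile)
        fold_changes.append(best_score / wt)
    return selected, fold_changes


# Toy model: each intact tile adds its weight to a baseline activity of 1.
WEIGHTS = [0.0, 3.0, 1.0, 0.0, 2.0]


def toy_predict(shuffled):
    return 1.0 + sum(w for i, w in enumerate(WEIGHTS) if i not in shuffled)


tiles, trace = greedy_tile_search(toy_predict, n_tiles=5, n_rounds=3)
print(tiles, trace)
```

Because the toy model is purely additive, the greedy order simply follows the tile weights; with a real DNN the order and the shape of the fold-change curve would depend on interactions between tiles.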

Extended Data Fig. 8 Comparison of enhancer sets identified by the Higher-Order Interaction Test and a hypothetical additive model for GM12878 and PC-3.

a, b, Comparison of the average fold change over wild-type (WT) for enhancer sets for sequences categorized as enhancing context versus a hypothetical additive effects model. The sequences from enhancing contexts are stratified according to interaction type: superadditivity, subadditivity and additivity. Sequences were classified using mean-squared-error-based thresholds of 0.1 for the superadditivity and subadditivity definitions and 0.05 for the additivity definition (with some ambiguous cases left out of classification). Shaded region represents standard deviation of the mean. c, e, Comparison of hypothetical additive model and hypothetical multiplicative model versus greedy search outcomes at iteration 2 of the Higher-Order Interaction Test. The number of points in each box is 69, 38 and 60 in GM12878 and 93, 37 and 36 in PC-3 for additive, superadditivity and subadditivity cases. Note that some ambiguous cases were left out of the classification if they were outside of the selected thresholds. Statistical significance was given according to the two-sided Mann-Whitney U test (*: p < 0.05; **: p < 0.01; ***: p < 0.001; ****: p < 0.0001). Boxplots show the first and third quartiles, the median (central line) and the range of data with outliers removed (whiskers). d, f, Greedy search versus hypothetical additive or multiplicative models. Scatter plots show a more detailed view of the data in c, e with the x-axis showing the Higher-Order Interaction Test outcomes and the y-axis showing the hypothetical model outputs (additive or multiplicative).
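The comparison against hypothetical additive and multiplicative models, and the threshold-based classification, can be sketched as below. Only the thresholds (0.1 and 0.05) come from the caption. The exact form of the hypothetical models, the use of inclusive comparisons, and the rule for telling superadditivity from subadditivity (here, whether the greedy curve falls below or above the additive curve on average) are assumptions made for illustration.

```python
def additive_fold_change(fold_changes):
    """Assumed additive model: individual deviations from 1 are summed."""
    return 1.0 + sum(f - 1.0 for f in fold_changes)


def multiplicative_fold_change(fold_changes):
    """Assumed multiplicative model: individual fold changes are multiplied."""
    product = 1.0
    for f in fold_changes:
        product *= f
    return product


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def classify_interaction(greedy, additive, additive_max=0.05, nonadditive_min=0.1):
    """Classify a greedy-search curve against the additive expectation.

    Both inputs are fold changes over wild type, one value per round.
    Cases between the two thresholds are left as "ambiguous".
    """
    error = mean_squared_error(greedy, additive)
    if error <= additive_max:
        return "additivity"
    if error >= nonadditive_min:
        mean_diff = sum(g - a for g, a in zip(greedy, additive)) / len(greedy)
        # Assumption: for enhancer sets, a stronger-than-additive drop in
        # activity is labeled superadditivity.
        return "superadditivity" if mean_diff < 0 else "subadditivity"
    return "ambiguous"


# Two tiles that individually reduce activity to 0.5x and 0.75x of wild type.
print(additive_fold_change([0.5, 0.75]))        # 0.25
print(multiplicative_fold_change([0.5, 0.75]))  # 0.375
print(classify_interaction([0.5, 0.1, 0.0], [0.5, 0.5, 0.5]))
```

The gap between the two thresholds is what produces the ambiguous cases mentioned in the caption.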

Extended Data Fig. 9 Comparison of silencer sets identified by the Higher-Order Interaction Test and a hypothetical additive model for K562, GM12878 and PC-3.

a–c, Comparison of the average fold change over wild-type (WT) for silencer sets for sequences categorized as silencing context versus a hypothetical additive effects model for K562 (a), GM12878 (b), PC-3 (c). The sequences from silencing contexts are stratified according to interaction type, superadditivity and additivity. Shaded region represents standard deviation of the mean. Notably, we did not identify any subadditivity cases.

Extended Data Fig. 10 Saturation behavior of TSS activity predictions by Enformer in various cell lines.

a–c, Results from the CRE Multiplicity Test applied to sequences from enhancing context (left) and silencing context (right). Each line represents a particular enhancer or silencer CRE embedded into shuffled sequences at optimal positions (according to a Greedy Search) versus the copy number of the CRE in the sequence. The number of enhancers in each plot in a–c is 200; the number of silencers is 200, 78 and 90 in a–c, respectively. The normalized TSS effect represents the predicted TSS activity of the mutated sequence divided by the control, which is the shuffled sequence with the TSS tile and the CRE in their original positions. The average across all CREs is shown with a thicker line and the shaded region represents the standard deviation of the mean.
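The normalized TSS effect used in the multiplicity analysis is a ratio of the prediction with several CRE copies to the prediction for the control with a single copy. The sketch below illustrates that normalization with a toy saturating response. The function `toy_tss_activity` and its parameters are hypothetical and are not derived from Enformer; they only mimic a curve whose gains shrink as copies are added.

```python
def toy_tss_activity(n_copies, base=1.0, vmax=4.0, k=2.0):
    """Hypothetical saturating response to the number of enhancer copies."""
    return base + vmax * n_copies / (n_copies + k)


def normalized_tss_effect(predict, max_copies):
    """Predicted activity with n copies divided by the one-copy control."""
    control = predict(1)
    return [predict(n) / control for n in range(1, max_copies + 1)]


curve = normalized_tss_effect(toy_tss_activity, max_copies=6)
gains = [b - a for a, b in zip(curve, curve[1:])]
print(curve)
print(gains)
```

With this toy response the curve starts at 1 for the control and each added copy contributes less than the previous one, which is the qualitative pattern a saturation analysis looks for.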

Supplementary information

Supplementary Information

Supplementary Tables 1–4, Figs. 1–10 and Note 1.

Reporting Summary

Peer Review File

Supplementary Data 1

Supplementary Data Tables 1–4.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Toneyan, S., Koo, P.K. Interpreting cis-regulatory interactions from large-scale deep neural networks. Nat Genet (2024). https://doi.org/10.1038/s41588-024-01923-3

Received: 28 July 2023
Accepted: 21 August 2024
Published: 16 September 2024
DOI: https://doi.org/10.1038/s41588-024-01923-3
Springer Nature America, Inc.