Abstract
Cancers evolve by accumulating genetic alterations, such as mutations and copy number changes. The chronological order of these events is important for understanding the disease, but not directly observable from cross-sectional genomic data. Cancer progression models (CPMs), such as Mutual Hazard Networks (MHNs), reconstruct the progression dynamics of tumors by learning a network of causal interactions between genetic events from their co-occurrence patterns. However, current CPMs fail to include effects of genetic events on the observation of the tumor itself and assume that observation occurs independently of all genetic events. Since a dataset contains by definition only tumors at their moment of observation, neglecting any causal effects on this event leads to the “conditioning on a collider” bias: Events that make the tumor more likely to be observed appear anti-correlated, which results in spurious suppressive effects or masks promoting effects among genetic events. Here, we extend MHNs by modeling effects from genetic progression events on the observation event, thereby correcting for the collider bias. We derive an efficient tensor formula for the likelihood function and learn two models on somatic mutation datasets from the MSK-IMPACT study. In colon adenocarcinoma, we find a strong effect on observation by mutations in TP53, and in lung adenocarcinoma by mutations in EGFR. Compared to classical MHNs, this explains away many spurious suppressive interactions and uncovers several promoting effects.
The data, code, and results are available at https://github.com/cbg-ethz/ObservationMHN.
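The collider bias described in the abstract can be illustrated with a deliberately simplified, static toy calculation. This is a sketch only and not the continuous-time MHN of the chapter: it assumes two independent binary events A and B and a hypothetical probability of observation for each genotype. The function name `observed_odds_ratio` and all numbers are illustrative assumptions.

```python
from itertools import product


def observed_odds_ratio(p_a, p_b, obs_prob):
    """Odds ratio between events A and B among observed cases only.

    p_a, p_b : marginal probabilities of A and B (independent by construction).
    obs_prob : dict mapping genotype (a, b) to the probability that a case
               with that genotype ends up in the dataset (hypothetical values).
    """
    weight = {}
    for a, b in product((0, 1), repeat=2):
        prior = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
        # Unnormalised probability of "genotype (a, b) and observed".
        weight[(a, b)] = prior * obs_prob[(a, b)]
    # The normalising constant cancels in the odds ratio.
    return (weight[(0, 0)] * weight[(1, 1)]) / (weight[(1, 0)] * weight[(0, 1)])


# Case 1: observation does not depend on the genotype.
uniform = {g: 0.5 for g in product((0, 1), repeat=2)}

# Case 2: each event makes observation more likely (made-up numbers).
biased = {(0, 0): 0.1, (1, 0): 0.5, (0, 1): 0.5, (1, 1): 0.9}

or_uniform = observed_odds_ratio(0.5, 0.5, uniform)
or_biased = observed_odds_ratio(0.5, 0.5, biased)

print(f"odds ratio, genotype-independent observation: {or_uniform:.2f}")
print(f"odds ratio, genotype-dependent observation:   {or_biased:.2f}")
```

With these made-up numbers the odds ratio equals 1 when observation ignores the genotype and falls below 1 when either event raises the chance of observation, although A and B are independent by construction. A model that ignores the observation mechanism would have to explain this pattern through a suppressive interaction between A and B.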
Notes
1. This construction is only needed for learning the model. In order to extrapolate the progression of a tumor into the future beyond its observation, one would “unfreeze” the process again by setting the outgoing effects of the observation to 1. Ideally one would include effects from the treatment of the patient instead.
2. In the independence model, events occur independently of each other with rates equal to their odds in the dataset.
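The rates of the independence model in note 2 can be sketched as follows. The sketch assumes the convention of classical MHNs that the observation time is exponentially distributed with rate 1, so that an event with rate θ has occurred at observation with probability θ/(1+θ). Setting this equal to the empirical frequency p gives θ = p/(1−p), the odds. The function name `independence_rates` and the toy dataset are hypothetical and not taken from the authors' code.

```python
def independence_rates(samples):
    """Return one rate per event, equal to the odds of that event in the data.

    samples : list of equally long 0/1 lists, one row per tumor and one
              column per genetic event.
    """
    n_samples = len(samples)
    n_events = len(samples[0])
    rates = []
    for j in range(n_events):
        freq = sum(row[j] for row in samples) / n_samples
        if not 0.0 < freq < 1.0:
            # The odds are undefined for events that are never or always present.
            raise ValueError(f"event {j} has frequency {freq}; odds undefined")
        rates.append(freq / (1.0 - freq))
    return rates


# Toy dataset: 4 tumors, 2 events (made-up values).
toy = [
    [1, 0],
    [1, 1],
    [0, 0],
    [1, 0],
]

rates = independence_rates(toy)

# Map the rates back to frequencies via theta / (1 + theta).
recovered = [theta / (1.0 + theta) for theta in rates]
print(rates)
print(recovered)
```

Mapping the rates back through θ/(1+θ) recovers the empirical frequencies of the toy data (0.75 and 0.25), which is the sense in which the independence model reproduces the marginal frequencies of the dataset.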
Acknowledgements
This work was supported by the Swiss National Science Foundation grant 179518, the Swiss Cancer League grant KFS-2977-08-2012 and the German Research Foundation grants TRR-305 and GR-3179/6-1.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Schill, R. et al. (2024). Overcoming Observation Bias for Cancer Progression Modeling. In: Ma, J. (eds) Research in Computational Molecular Biology. RECOMB 2024. Lecture Notes in Computer Science, vol 14758. Springer, Cham. https://doi.org/10.1007/978-1-0716-3989-4_14
DOI: https://doi.org/10.1007/978-1-0716-3989-4_14
Publisher Name: Springer, Cham
Print ISBN: 978-1-0716-3988-7
Online ISBN: 978-1-0716-3989-4
eBook Packages: Computer Science, Computer Science (R0)