WO2019197816A1

WO2019197816A1 - Biomarkers in clear cell renal cell carcinoma

Info

Publication number: WO2019197816A1
Application number: PCT/GB2019/051028
Authority: WO
Inventors: Charles Swanton; Kevin LITCHFIELD; Samra TURAJLIC; Hang Xu
Original assignee: The Francis Crick Institute Limited
Priority date: 2018-04-09
Filing date: 2019-04-09
Publication date: 2019-10-17
Also published as: GB201805876D0

Abstract

The present invention relates to clear cell renal cell carcinoma biomarkers. More particularly it relates to use of the biomarkers to identify patients with clear cell renal cell carcinoma who are more suitable for treatment with drug therapy and/or surgical intervention. Identification of the presence or absence of the biomarkers can be used to identify sub-groups of patients in whom drug therapy, or alternatively surgical intervention is a more appropriate first treatment. The biomarkers allow patient stratification, which can inform clinical decision making and identification of appropriate treatment regimens. The invention also allows prediction of prognosis of clear cell renal cell carcinoma.

Description

BIOMARKERS IN CLEAR CELL RENAL CELL CARCINOMA

FIELD OF INVENTION

The present invention relates to clear cell renal cell carcinoma (ccRCC) biomarkers. More particularly it relates to use of the biomarkers to identify patients with clear cell renal cell carcinoma who are more suitable for treatment with drug therapy or surgical intervention. Identification of the presence or absence of the biomarkers can be used to identify sub-groups of patients in whom drug therapy, or alternatively surgical intervention, is a more appropriate first treatment. The biomarkers allow patient stratification, which can inform clinical decision making and identification of appropriate treatment regimens. The invention also allows prediction of prognosis of clear cell renal cell carcinoma.

BACKGROUND TO THE INVENTION

Renal cell carcinoma (RCC) is the 7th most commonly diagnosed malignancy in the developed world (Znaor et al., 2015), with a rising incidence (Smittenaar et al., 2016). The most common histological subtype is clear cell renal cell carcinoma (ccRCC), which accounts for approximately 80% of all cases (Hsieh et al., 2017). Clinical outcomes of ccRCC are variable, and a number of pathological parameters (tumour size, stage, necrosis and grade, captured by the Leibovich score (Leibovich et al., 2003)) and clinical indicators (MSKCC (Motzer et al., 1999) and Heng scores (Heng et al., 2013)) estimate the risk of progression following curative or cytoreductive nephrectomy, respectively. Local vascular and fat invasion and the presence of sarcomatoid differentiation also impact the prognosis (Hatcher et al., 1991).

However, many controversies exist in the clinical management of patients with ccRCC, including the role of cytoreductive nephrectomy, metastasectomy and deferral of drug treatment, in the context of advanced disease; the role of tumour thrombus resection in the presence of concurrent metastatic disease; and management of small renal masses. Around one third of patients with localised ccRCC relapse following surgery.

Currently, tumour molecular profiling does not impact clinical decision making in any of these settings. In addition, almost half of all patients diagnosed with ccRCC die as a result of metastatic disease. The proper management of metastatic kidney cancer has been a challenge to clinicians for quite some time.

There is a need in the art for ways to identify patients that are more suitable for certain treatments, for example distinguishing patients who would benefit from initial surgical intervention from those who are less likely to benefit, and for whom drug treatment would be a more efficacious option as a first treatment (for example if there is a high likelihood of metastasis). Identifying patients in this way would improve treatment, as treatment would be tailored to an individual’s tumour type and molecular profile, rather than indiscriminate use of available treatments.

SUMMARY OF INVENTION

The present invention relates to clear cell renal cell carcinoma biomarkers, particularly to biomarkers which may identify patients as suitable for treatment with drug treatment or therapy.

As described in the Examples, the present inventors have determined that genetic diversity and chromosomal complexity are determinants of patient outcome. The inventors have identified several biomarkers that may assist in identifying patients as suitable for drug treatment at the outset, rather than deferral of systemic therapy in favour of surgical intervention.

The present invention also relates to specific biomarkers for use in prediction of metastasis, which may assist in diagnosing ccRCC as being more likely to metastasise or not. The biomarkers also have utility in the prognosis of clear cell renal cell carcinoma.

The biomarkers, methods and assays of the present invention facilitate the prediction of metastasis over time, and likely prognosis of the clear cell renal cell carcinoma.

Advantageously, the assays and methods disclosed herein are cost-effective and straightforward to put into practice, facilitating clinical decision making in the treatment of such patients. The present invention facilitates a personalised approach to treating clear cell renal cell carcinoma, and therefore represents an improvement in treating this condition.

As discussed herein, the inventors have determined a number of different biomarkers that may be assessed to determine what kind of treatment is more suitable for a patient with ccRCC. As discussed in detail below, the biomarkers include mutations in particular genes, somatic copy number alterations, intratumour heterogeneity (ITH) and weighted genome instability index (wGII).

In one aspect the invention provides a method for predicting the response of a patient with clear cell renal cell carcinoma (ccRCC) to drug treatment or surgery, wherein said method comprises analysing for a modification in a gene selected from BAP1, PBRM1, SETD2, PTEN, VHL, mTOR, PIK3CA, TSC1 and TSC2, and/or a somatic copy number alteration (SCNA) selected from loss 9p, loss 14q, gain 8q and gain 12p in a sample from said patient. The presence of:

(i) an inactivating modification in the VHL gene and in two or more of BAP1, PBRM1, SETD2 and PTEN genes;

(ii) an inactivating modification in the BAP1 gene, and optionally the VHL gene;

(iii) an absence of an inactivating modification in the VHL gene; and/or

(iv) a somatic copy number alteration (SCNA) selected from loss 9p, loss 14q, gain 8q and gain 12p

is predictive of an improved response to drug treatment.

The presence of:

(i) an inactivating modification in the PBRM1 gene, and in the SETD2 gene;

(ii) an inactivating modification in the PBRM1 gene, and in the mTOR, PIK3CA, TSC1, TSC2 or PTEN genes;

(iii) an inactivating modification in the PBRM1 gene, and any one of the following somatic cell copy number alterations: (i) loss 9p, (ii) loss 14q, (iii) gain 8q and (iv) gain 12p; or

(iv) an inactivating modification in the VHL gene

is predictive of improved response to surgery.

As such, the invention provides use of a gene selected from BAP1, PBRM1, SETD2, PTEN, VHL, mTOR, PIK3CA, TSC1 and TSC2, and/or a somatic copy number alteration (SCNA) selected from loss 9p, loss 14q, gain 8q and gain 12p as biomarkers for predicting the response of a patient with ccRCC to drug treatment or surgery.

The invention provides a method of treating ccRCC in a patient predicted to have an improved response to drug treatment by a method according to the invention, comprising administering a therapeutically effective amount of said drug treatment to said patient. The invention provides a method of treating ccRCC in a patient predicted to have an improved response to surgery by a method according to the invention, comprising surgical resection of a ccRCC tumour or metastasis. The invention also provides an anti-ccRCC drug for use in treating a patient with ccRCC, wherein said patient is predicted to have an improved response to drug treatment by a method according to the invention.

The invention also provides use of an anti-ccRCC drug in the manufacture of a medicament for use in treating a patient with ccRCC, wherein said patient is predicted to have an improved response to drug treatment by a method according to the invention.

In one aspect the invention provides a kit for use in identifying patients with ccRCC who are predicted to have an improved response to drug treatment or surgery by a method according to the invention, said kit comprising primers suitable for identifying mutations in a gene selected from BAP1, PBRM1, SETD2, PTEN, VHL, mTOR, PIK3CA, TSC1 and TSC2, and/or a somatic copy number alteration (SCNA) selected from loss 9p, loss 14q, gain 8q and gain 12p, and wherein the kit optionally comprises a set of instructions.

In a further aspect the invention provides a method of predicting the prognosis of a patient with ccRCC, wherein said method comprises analysing the weighted genome instability index (wGII) and the intratumour heterogeneity (ITH) index in a sample from said patient.

In one aspect the invention provides an anti-clear cell renal cell carcinoma drug for use in treating a patient with clear cell renal cell carcinoma, wherein said patient has been determined to have a clear cell renal cell carcinoma comprising:

(i) an inactivating modification in the VHL gene and in two or more of BAP1, PBRM1, SETD2 and PTEN genes;

(ii) an inactivating modification in the BAP1 gene, and optionally the VHL gene;

(iii) an absence of an inactivating modification in the VHL gene; or

(iv) any one of (i) loss 9p, (ii) loss 14q, (iii) gain 8q or (iv) gain 12p.

As such, in one aspect the invention provides a method of treating a patient having clear cell renal cell carcinoma comprising administering an effective amount of an anti-clear cell renal cell carcinoma drug to the patient, said patient having been determined to have a clear cell renal cell carcinoma comprising:

(ii) an inactivating modification in the BAP1 gene, and optionally the VHL gene;

(iii) an absence of an inactivating modification in the VHL gene; or

(iv) any one of (i) loss 9p, (ii) loss 14q, (iii) gain 8q or (iv) gain 12p.

In one aspect the invention provides a method of treating a patient having clear cell renal cell carcinoma comprising administering an effective amount of an anti-clear cell renal cell carcinoma drug to the patient, wherein said patient has a clear cell renal cell carcinoma comprising

(iv) any one of (i) loss 9p, (ii) loss 14q, (iii) gain 8q or (iv) gain 12p.

The invention also provides a method of treating a patient having clear cell renal cell carcinoma comprising surgical intervention of the clear cell renal cell carcinoma, said patient having been determined to have a clear cell renal cell carcinoma which does not comprise:

(i) an inactivating modification in the VHL gene and in two or more of BAP1, PBRM1, SETD2 and PTEN genes;

(iii) an absence of an inactivating modification in the VHL gene; or

(iv) any one of (i) loss 9p, (ii) loss 14q, (iii) gain 8q or (iv) gain 12p.

References to “lower” and “higher” herein may be defined as being lower or higher than the median level of a cohort, i.e. relative within a population. One skilled in the art would be aware of how to determine the median value and to assess whether the level of a particular patient is higher or lower than said median value.
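By way of illustration only, the comparison against a cohort median described above might be sketched as follows. The function name, patient identifiers and values are hypothetical, and treating a value equal to the median as “lower” is an assumption made for this sketch.

```python
from statistics import median


def label_relative_to_median(cohort_values):
    """Label each patient's value as 'higher' or 'lower' than the cohort median.

    cohort_values: dict mapping a (hypothetical) patient ID to a numeric level,
    for example a wGII or ITH index value.
    Assumption: a value exactly equal to the median is labelled 'lower'.
    """
    cohort_median = median(cohort_values.values())
    return {
        patient: "higher" if value > cohort_median else "lower"
        for patient, value in cohort_values.items()
    }


# Hypothetical cohort of three patients.
example = label_relative_to_median({"P1": 0.1, "P2": 0.4, "P3": 0.9})
print(example)
```

In this sketch the median is computed from the cohort itself, so the label of a given patient depends on the population against which that patient is compared.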

The invention also provides a method of treating a patient having clear cell renal cell carcinoma comprising surgical intervention of the clear cell renal cell carcinoma, wherein said patient has a clear cell renal cell carcinoma that does not comprise:

(iv) any one of (i) loss 9p, (ii) loss 14q, (iii) gain 8q or (iv) gain 12p.

The invention also provides an anti-clear cell renal cell carcinoma drug for use in treating a patient having clear cell renal cell carcinoma, wherein said patient has a clear cell renal cell carcinoma comprising:

(ii) an inactivating modification in the BAP1 gene, and optionally the VHL gene;

(iii) an absence of an inactivating modification in the VHL gene; or

(iv) any one of (i) loss 9p, (ii) loss 14q, (iii) gain 8q or (iv) gain 12p.

In a further aspect the invention provides the use of an anti-clear cell renal cell carcinoma drug in the manufacture of a medicament for use in treating a patient having clear cell renal cell carcinoma, said patient having been determined as having a clear cell renal cell carcinoma comprising:

(i) an inactivating modification in the VHL gene and in two or more of BAP1, PBRM1, SETD2 and PTEN genes;

(iii) an absence of an inactivating modification in the VHL gene; or

(iv) any one of (i) loss 9p, (ii) loss 14q, (iii) gain 8q or (iv) gain 12p.

In another aspect the invention provides use of an anti-clear cell renal cell carcinoma drug in the treatment of a patient having clear cell renal cell carcinoma, said patient having been determined as having a clear cell renal cell carcinoma comprising:

(i) an inactivating modification in the VHL gene and in two or more of BAP1, PBRM1, SETD2 and PTEN genes;

(ii) an inactivating modification in the BAP1 gene, and optionally the VHL gene;

(iii) an absence of an inactivating modification in the VHL gene; or

(iv) any one of (i) loss 9p, (ii) loss 14q, (iii) gain 8q or (iv) gain 12p.

Other aspects of the invention are discussed below.

LIST OF FIGURES

Figure 1 - Overview

Panel A is an overview of driver alterations, including SNVs, DNVs, INDELs and SCNAs, detected in 101 TRACERx Renal patients. Rectangles and triangles indicate clonal and subclonal mutations respectively. Parallel evolution mutations are annotated with a split indicating >2 events. Five bilateral or multi-focal cases are shown on the right, with distinct VHL mutations within pairs indicated with an asterisk. Panel B shows mutational frequency in 14 key driver genes in the TRACERx Renal cohort and three single biopsy ccRCC studies (The Cancer Genome Atlas (TCGA) KIRC, Sato, and Scelo). Clonal mutations are shown in the darker shade, subclonal in lighter. Panel C shows the frequency of SCNAs in the TRACERx Renal cohort. Copy number gains and losses are indicated above and below respectively, with clonal SCNAs darker and subclonal in lighter shade. Putative driver copy number altered regions are annotated. The dotted line indicates the frequency of the same SCNAs in the TCGA cohort.

Figure 2 - Driver phylogenetic trees

Driver phylogenetic trees for each tumour (or multiple tumours from the same patient) are shown ordered by overall tumour stage. The size of each node represents the number of SCNAs detected within that subclone. The length of the lines connecting tumour subclones is not informative.

Figure 3 - Parallel Evolution

Table shows driver gene events with >10 subclonal mutations across the cohort. These genes were tested for evidence of parallel evolution using a permutation model accounting for overall gene mutation frequency and the number of biopsies per tumour (see STAR METHODS). BAP1, SETD2 and PTEN were found to show significant evidence of parallel evolution (p<0.05, FDR<0.1). Example driver trees and accompanying tumour sampling images are presented for each significant gene: BAP1, PTEN and SETD2. Parallel events are marked on the driver trees.

Figure 4 - Conserved features of ccRCC evolution

Panel A shows event co-occurrence analysis. Values are log₂(observed no. of co-occurrences / expected no. of co-occurrences, STAR METHODS), with significant patterns marked according to the legend. Data is shown for event co-occurrence / mutual exclusivity, first in truncal clones only per case (bottom left) and second in all terminal subclones (top right), such that all clonal and subclonal interactions are considered (see STAR METHODS). P-values are calculated under a probabilistic model, as implemented in the R package ‘co-occur’, and only interactions significant in both the ‘clonal’ and ‘clonal + subclonal’ analyses are considered significant. Panel B (first boxplot) shows molecular clock timing analysis from the whole genome sequenced cohort, with time from the most recent common ancestor (MRCA) to tumour diagnosis plotted on the x-axis. On the y-axis are cases split into three groups, based on having one, two or three clonal driver events. VHL wild type cases (n=2) are excluded on account of their distinct aetiological and phenotypic profile. P-value is based on Kruskal-Wallis test. (Second boxplot) shows the same y-axis patient groups but plotted on the x-axis is tumour size (mm). P-value is based on Kruskal-Wallis test. (Third boxplot) shows on the y-axis all cases from the 100-patient cohort, again with VHL wild type cases excluded and the remaining cases split into three groups based on one, two or three clonal driver mutations. Multi-region data on % of cells staining positive for proliferation marker Ki67 is shown on the x-axis. P-value is based on a linear mixed effect model, to account for non-independence of multiple observations per tumour. Panel C (left) shows an illustrative schematic tree to demonstrate the method used to trace each tumour’s evolutionary paths. (Right) shows results from the event ordering analysis for all pairs of events with n=10 or more observations. Plotted are the counts of instances where: event 1 was found to precede event 2, and event 1 was found to follow event 2. Significance was tested via Fisher’s exact test with p-values shown after correction for multiple testing using Benjamini-Hochberg procedure.
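As a simplified illustration of the log₂ ratio plotted in Panel A of Figure 4, the sketch below computes log₂(observed / expected) for one pair of events. It assumes that the expected number of co-occurrences is the count expected if the two events were independent; the analysis referred to above uses a probabilistic model (see STAR METHODS), which this sketch does not reproduce. The function name and the case identifiers are hypothetical.

```python
import math


def log2_observed_over_expected(cases_with_a, cases_with_b, n_cases):
    """Return log2(observed / expected) co-occurrences for two events.

    Assumption: 'expected' is the count expected under independence,
    i.e. (cases with A) * (cases with B) / (total cases).
    Returns None when the ratio is undefined (zero observed or zero expected).
    """
    a, b = set(cases_with_a), set(cases_with_b)
    observed = len(a & b)
    expected = len(a) * len(b) / n_cases
    if observed == 0 or expected == 0:
        return None
    return math.log2(observed / expected)


# Hypothetical cohort of 8 cases: both events occur only in cases 1 and 2.
print(log2_observed_over_expected({1, 2}, {1, 2}, 8))  # 2.0
```

A positive value indicates more co-occurrence than expected under the independence assumption, and a negative value indicates a tendency towards mutual exclusivity.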

Figure 5 - Evolutionary subtypes

Figure 5 shows cases grouped by evolutionary subtype, with the following parameters also annotated: presence of clonal wGII, presence of subclonal wGII, ITH index score and tumour size (mm) (range [18-180], white = low, black = high). Occurrences of parallel evolution are denoted in the heatmap with “P”. Plotted next is the distribution of stages per subtype, followed by grade, and then a further six metrics are summarised as the average values for each group: i) mean number of tumour clones, ii) % of patients with grade 4 disease, iii) % of patients with microvascular invasion, iv) mean % of cells staining positive for Ki67 proliferation index (mean calculated first per case, and then across the cohort), v) % of patients with disease relapse/progression, vi) relapse/progression time. Shown next are relapse/progression free survival plots per group and lastly shown are three example driver phylogenetic trees from each group.

Figure 6 - Intratumour heterogeneity index and saturation analysis

Panel A shows the number of tumour biopsies profiled (X-axis) versus the number of driver events (i.e. all gene mutations and SCNAs shown in Figure 1A) discovered (Y-axis) for densely sampled (20+ biopsies) cases. Panel B shows saturation curves for all cases with ≥ 15 biopsies, with biopsy number plotted on x-axis and proportion of the total driver events detected (from all biopsies) on y-axis, increasing with each additional biopsy taken. Data is shown for all cases and tumours split based on low and high ITH (above/below median). Panel C shows a boxplot summary of the absolute number (top) and proportion (bottom) of biopsies needed to detect ≥ 0.75 of driver events for tumours grouped by evolutionary subtype. Panel D illustrates the potential errors arising from a two-site biopsy approach: considering all pairs of biopsies, plotted on the X-axis is the mean number of subclonal driver events misidentified as clonal (illusion of clonality), on Y-axis is the number of subclonal driver events missed entirely. Data is shown for three clinical scenarios: left, Small Renal Masses (size < 4cm); middle, tumours treated by nephrectomy with curative intent; and right, tumours treated by cytoreductive nephrectomy. The size of points within a panel is proportional to the number of biopsies available for that tumour.

Figure 7 - Clinical endpoints

Panel A shows Kaplan-Meier plots for progression free survival (PFS) in the TRACERx Renal cohort (three plots in top row) and for overall survival (OS) in TCGA KIRC cohort (three plots in bottom row). Three groupings are plotted for each cohort. Left: high (>median) versus low ITH index; middle: high (>median) versus low wGII; and right: four group high/low combination groupings of the two metrics. Log-rank and adjusted (for stage and grade as covariates in a Cox proportional hazard model) p-values are stated. Panel B shows the proportion of cases, within each of the high/low four groups, that progressed to disseminated versus solitary metastases, based on each patient’s first progression event. Counts in the highest group, “low ITH, high wGII”, were compared to all other groups through Fisher’s exact test. Panel C shows cancer related deaths OS analysis (as opposed to PFS shown in panel A) for the TRACERx Renal cohort, with patients grouped using the four-category high/low ITH/wGII system. Log-rank and adjusted (for stage and grade as covariates in a Cox proportional hazard model) p-values are stated.

Figure 8 - Consort diagram and summary of sequencing.

Figure 9 - TRACERx Renal cohort: correlation between driver events and clinical variables.

Figure 10 - SCNAs co-occurring with mutational driver events.

Figure 11 - Overview

Panel A is an overview of somatic alterations detected in matched primary and metastatic tumours across a subset of 38 TRACERx Renal patients. The top panel shows the proportion of clonal and subclonal alterations. In the middle panel alterations in primary tumours are indicated in a lighter shade of colour, those in metastases in darker. Clonal alterations are shown as rectangles and subclonal alterations as triangles. Parallel evolution is indicated in orange with a split indicating multiple events. Abbreviation for tumour sites: P - primary; TT - tumour thrombus; AD - adrenal gland, indirect metastasis; AD(D) - direct invasion of adrenal gland; AD(CL) - contralateral adrenal gland; Renal(CL) - contralateral kidney; Pr - perirenal invasion; Pf - peri-nephric fat and gerota fascia invasion. Panel B shows the number of clonal and subclonal somatic alterations in primary and metastatic tumours. Panel C shows the number of somatic alterations 1) detected in both primary tumour and the matched metastatic tumour; 2) detected in primary tumour but not the matched metastatic tumour and 3) detected in metastatic tumour but not the matched primary tumour. Panel D shows the proportions of synchronous and metachronous metastatic tumours profiled in the TRACERx Renal, HUC and MSK cohorts. Panel E shows the range of the metastatic sites sampled across the TRACERx, HUC and MSK cohorts. The total number of metastases sampled (n), and the number from each study (Tx represents TRACERx Renal; HUC and MSK are extension cohorts) are shown in brackets.

Figure 12 - Characterisation of metastasising clone(s)

Panel A illustrates the method used to categorise tumour clones. Panel B shows four violin plots summarising (starting top left and working clockwise): i) non-synonymous mutation count, ii) wGII, iii) ploidy and iv) Ki67. Values are compared between tumour clones “not selected” and “selected” for metastasis, with all region/clone values plotted per tumour (excluding MRCA “maintained” clones - see STAR METHODS). A linear mixed effect (LME) model was used to determine significance, to account for the non-independence of multiple observations from individual tumours. Panel C shows for each driver event the proportion of times it was observed in “not selected” and “selected” clones, for TRACERx, HUC and MSK cohorts. The far right panel shows the log10 p-value for each event, for enrichment in “selected” versus “not selected” clones, tested using a binomial test with meta-analysis conducted using Fisher's method of combining p-values from independent tests, and p-values corrected for multiple testing using Benjamini-Hochberg procedure. Panel D shows overall survival hazard ratios for events with P<0.05 in panel C analysis. Data is shown for TRACERx and HUC cohorts separately, with the circle representing the hazard ratio value, and lines corresponding to the 95% confidence interval estimate. Panel E shows overall survival results for TRACERx and HUC cohorts (combined), split into two groups based on SCNA status at chr 9p21.3 (either copy number loss at chr 9p21.3, or normal wildtype copy number).

Figure 13 - Tumour thrombus

Figure 13 shows tumour thrombus (TT) driver trees. Tumour TNM stage and driver events leading to TT are annotated. Where no uniquely identifiable primary clone was found to seed the TT, a dotted circle is used to represent the notional seeding clone at the TT level. Length of branches connecting clones is not informative.

Figure 14 - Lymph node and distant metastases

Panel A shows driver trees and the clinical course for cases with lymph node and distal metastases. Cases were grouped into those with “rapid progression” and “attenuated progression”. Annotated for each case are the primary tumour evolutionary subtype, primary tumour ITH/wGII classification, select driver events on the tree (VHL, BAP1, PBRM1, MTOR, SETD2, TSC1, TSC2, chr 9p loss and chr 14q loss). Metastasis-seeding subclones and any subclones private to metastasis are highlighted in blue. Clinical course is shown from the time of nephrectomy to death or last follow up. Pattern of disease progression is classified as multiple new metastases (multiple circles), solitary new metastasis (single circle) and progression of existing metastases (“PD”). Progression and follow up times are shown in months. Systemic treatments are indicated. Synchronous and metachronous metastatic sites are listed under corresponding time points. Profiled metastases are highlighted in boxes. Abbreviation for tumour sites: P - primary; TT - tumour thrombus; AD - adrenal gland; AD(D) - direct invasion of adrenal gland; AD(CL) - contralateral adrenal gland; Renal(CL) - contralateral kidney; Pr - perirenal invasion; Pf - peri-nephric fat and gerota fascia invasion. Panel B shows the number of cases with “rapid progression” or “attenuated progression” in each evolutionary subtype. Panel C shows the maximum wGII and ITH in cases with “rapid progression” and “attenuated progression”.

Figure 15 - Latent metastases

Panel A shows the distribution of times from nephrectomy to metastasis resection, split by site of metastasis. The circle represents the median value, and grey lines depict the median average deviation (MAD) value (i.e. plus/minus one MAD). Far right in brackets are the range [min to max] values. Panel B shows wGII values per region split by site of metastasis. All regions are shown per metastasis, and a linear mixed effect (LME) model was used to determine significance (for pancreas versus all other), to account for the non-independence of multiple observations from individual tumours. Panel C shows fishplot progression patterns for the three cases (SP006, SP023, SP058) with latent pancreatic metastases.

Figure 16 - Spatial resolution of metastases through post-mortem sampling

Figure 16 shows cases K548 (top) and K489 (bottom) which were sampled at post-mortem with the extent of sampling and the clinical course shown. Metastatic progression is illustrated using fish plots with the select driver events annotated (VHL, BAP1, PBRM1, MTOR, SETD2, TSC1, TSC2, chr 9p loss and chr 14q loss). Metastasising clone colour matches that of the corresponding metastatic site.

Figure 17 - consort diagram for the selection of metastatic samples.

Figure 18 - survival analysis for cases where tumour thrombus had early or late evolutionary divergence.

Figure 19 - Ki67 proliferation index data.

Figure 20 - Cases with tumour thrombus and distal metastases.

Figure 21 - Fish-plot summary of selected cases.

Figure 22 - Forest Plot demonstrating SCNAs/WGII and risk of death.

DETAILED DESCRIPTION

PATIENT STRATIFICATION AND SUB-GROUPS

The present invention is based on stratification of patients with ccRCC into particular sub-groups, based on the presence or absence of particular genetic events. Identification of patients as belonging to a particular sub-group may be informative as to which treatment, e.g. drug or surgery, may be more appropriate for said patient. This provides an improved way of selecting patients for treatment with drug therapy or surgery and identifies new patient sub-groups who are advantageously treated with drug therapy.

As discussed in the present Examples, analysis of 101 primary ccRCC tumours revealed seven distinct evolutionary subtypes, which were termed “multiple clonal driver”, “BAP1 driven”, “VHL wild type”, “PBRM1-->SETD2”, “PBRM1-->mTOR”, “PBRM1-->SCNA”, and “VHL monodriver”. The inventors were able to identify an association between the primary tumour evolutionary subtype and the pattern of metastatic spread, which is useful in identifying patients suitable for drug treatment prior to, or instead of, surgery.

Namely, primary tumours with no evidence of chromosomal instability (CIN) had low metastatic potential (“VHL monodriver”); primary tumours with subclonal CIN were associated with synchronous and metachronous oligometastatic disease (“PBRM1-->SETD2”, “PBRM1-->mTOR”, “PBRM1-->SCNA”); while primary tumours featuring clonal CIN were associated with disseminated disease and shortened progression-free survival (“multiple clonal driver”, “BAP1 driven”, “VHL wild type”).

The following sub-groups were identified:

1. “multiple clonal drivers”, characterised by the presence of two of BAP1, PBRM1, SETD2 or PTEN clonal mutations in addition to a VHL mutation. This sub-type is characterised by high levels of wGII.

2. “BAP1 driven”, characterised by the presence of BAP1 as the sole mutational driver event in addition to VHL. This sub-type is characterised by the presence of high levels of wGII.

3. “PBRM1-->SETD2 driven”. This sub-type is characterised by highly branched trees and high ITH index score.

4. PBRM1-->PI3K. This sub-type is characterised by early PBRM1 mutation followed by mutational activation of the PI3K/AKT/mTOR pathway.

5. PBRM1-->SCNA. This sub-type is characterised by early PBRM1 mutation followed by subclonal SCNAs.

6. VHL wildtype. This sub-type is characterised by high levels of wGII.

7. VHL monodriver. This sub-type is characterised by low levels of wGII.

The “multiple clonal drivers” sub-group may be defined as the presence of an inactivating modification in two or more of BAP1, PBRM1, SETD2 or PTEN genes in addition to an inactivating modification in the VHL gene. The inactivating modifications may be present clonally, wherein “clonal” is as defined herein.

An “inactivating modification” as referred to herein may be a mutation, or may be inactivation via methylation. One skilled in the art will be aware of how to identify the presence of such inactivating modifications. For example, one skilled in the art will be aware of routine sequencing methods to identify mutations relative to a standard, e.g. germline, sequence. The inactivation may alternatively be a copy number event, i.e. loss of the particular gene. By “mutation” is meant any non-synonymous mutation relative to the germline sequence.

The “BAP1 driven” sub-group may be defined as the presence of an inactivating modification in the BAP1 gene. The BAP1 inactivating modification may be present clonally or sub-clonally. An inactivating modification in the VHL gene may optionally be present. An inactivating modification in the PBRM1 gene may not be present.

The “PBRM1-->SETD2 driven” sub-group may be defined as the presence of an inactivating modification in the PBRM1 gene, and in the SETD2 gene. The PBRM1 inactivating modification may be present clonally or sub-clonally, and the SETD2 inactivating modification may be present sub-clonally. This sub-group is characterised in that the PBRM1 inactivating modification appears before the SETD2 inactivating modification in the tumour evolution.

The “PBRM1-->PI3K” sub-group may be defined as the presence of an inactivating modification in the PBRM1 gene, and a modification (which may be activating or inactivating) in the PI3K/mTOR pathway, which is defined as comprising genes selected from mTOR, PIK3CA, TSC1, TSC2 and PTEN. As such, this sub-group is characterised by the presence of an inactivating modification in the PBRM1 gene, and an activating or inactivating modification in one or more of the mTOR, PIK3CA, TSC1, TSC2 and PTEN genes. The PBRM1 inactivating modification may be present clonally or sub-clonally, and the PI3K/mTOR pathway modification may be present sub-clonally. This sub-group is characterised in that the PBRM1 inactivating modification appears before the PI3K/mTOR pathway modification in the tumour evolution.

The “PBRM1-->SCNA” sub-group may be defined as the presence of an inactivating modification in the PBRM1 gene, and also the presence of one or more of the somatic cell copy number alterations discussed below, i.e. (i) loss 9p, (ii) loss 14q, (iii) gain 8q and (iv) gain 12p.

The “VHL wildtype” sub-group may be defined as the absence of an inactivating modification in the VHL gene.

The “VHL monodriver” sub-group may be defined as the presence of an inactivating modification in the VHL gene. This sub-group is characterised in that no other inactivating modifications (driver mutations) are present. The VHL inactivating modification may be present clonally.
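By way of illustration only, the sub-group definitions above can be sketched as a simple rule-based classifier. The Python sketch below is not part of the disclosed method. The function name `assign_subgroup`, the input format (a mapping of gene symbol to "clonal" or "subclonal", plus a set of SCNA labels) and, in particular, the order in which the rules are tested are assumptions, since a tumour may satisfy more than one definition. The temporal ordering of the PBRM1 event relative to later events is not modelled.

```python
from typing import Dict, Optional, Set

# Gene sets taken from the sub-group definitions in the text.
CLONAL_DRIVER_GENES = {"BAP1", "PBRM1", "SETD2", "PTEN"}
PI3K_PATHWAY_GENES = {"MTOR", "PIK3CA", "TSC1", "TSC2", "PTEN"}
MARKER_SCNAS = {"loss 9p", "loss 14q", "gain 8q", "gain 12p"}


def assign_subgroup(modifications: Dict[str, str], scnas: Set[str]) -> Optional[str]:
    """Assign a tumour to one sub-group (hypothetical helper).

    modifications maps a gene symbol to "clonal" or "subclonal".
    The rule order below is an assumption made for this sketch.
    """
    genes = set(modifications)
    clonal = {g for g, status in modifications.items() if status == "clonal"}

    if "VHL" not in genes:
        return "VHL wildtype"
    if len(clonal & CLONAL_DRIVER_GENES) >= 2:
        return "multiple clonal drivers"
    if "BAP1" in genes and "PBRM1" not in genes:
        return "BAP1 driven"
    if "PBRM1" in genes:
        if "SETD2" in genes:
            return "PBRM1-->SETD2 driven"
        if genes & PI3K_PATHWAY_GENES:
            return "PBRM1-->PI3K"
        if scnas & MARKER_SCNAS:
            return "PBRM1-->SCNA"
    if genes == {"VHL"}:
        return "VHL monodriver"
    return None  # no rule matched under the assumptions of this sketch
```

For example, `assign_subgroup({"VHL": "clonal", "PBRM1": "clonal"}, {"loss 9p"})` falls through to the PBRM1-->SCNA rule, whereas the same tumour without any of the four SCNAs matches no rule and returns `None`.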

The “multiple clonal drivers”, “BAP1 driven” and “VHL wildtype” sub-types show an increased likelihood of metastasis to multiple sites and a shorter time to metastasis. As such, it is more appropriate to treat patients in these sub-types with drug therapy prior to, or instead of, surgery, that is, systemic treatment should not be deferred.

(ii) an inactivating modification in the BAP1 gene, and optionally the VHL gene; or

(iii) absence of an inactivating modification in the VHL gene.

The invention also provides an anti-clear cell renal cell carcinoma drug for use in treating a patient having clear cell renal cell carcinoma, said patient having been determined to have a clear cell renal cell carcinoma comprising:

(i) an inactivating modification in the VHL gene and in two or more of BAP1, PBRM1, SETD2 and PTEN genes;

(ii) an inactivating modification in the BAP1 gene, and optionally the VHL gene; or

(iii) absence of an inactivating modification in the VHL gene.

The invention also provides an anti-clear cell renal cell carcinoma drug for use in treating a patient having clear cell renal cell carcinoma, wherein said patient has a clear cell renal cell carcinoma comprising:

(i) an inactivating modification in the VHL gene and in two or more of BAP1, PBRM1, SETD2 and PTEN genes;

(ii) an inactivating modification in the BAP1 gene, and optionally the VHL gene; or

(iii) absence of an inactivating modification in the VHL gene.

In a further aspect the invention provides the use of an anti-clear cell renal cell carcinoma drug in the manufacture of a medicament for use in treating a patient having clear cell renal cell carcinoma, said patient having been determined as having a clear cell renal cell carcinoma comprising:

(i) an inactivating modification in the VHL gene and in two or more of BAP1, PBRM1, SETD2 and PTEN genes;

(ii) an inactivating modification in the BAP1 gene, and optionally the VHL gene; or

(iii) absence of an inactivating modification in the VHL gene.

In another aspect the invention provides use of an anti-clear cell renal cell carcinoma drug in the treatment of a patient having clear cell renal cell carcinoma, said patient having been determined as having a clear cell renal cell carcinoma comprising:

(i) an inactivating modification in the VHL gene and in two or more of BAP1, PBRM1, SETD2 and PTEN genes;

(ii) an inactivating modification in the BAP1 gene, and optionally the VHL gene; or

(iii) absence of an inactivating modification in the VHL gene.

Clear cell renal cell carcinomas in the other identified sub-groups are more likely to metastasise to only one organ, and hence it may be more appropriate to treat patients in these sub-types with surgery and radiotherapy before drug treatment, that is, systemic therapy may be deferred in these patients. As such, in one aspect the invention provides a method of treating a patient having clear cell renal cell carcinoma comprising surgical intervention of the clear cell renal cell carcinoma, said patient having been determined to have a clear cell renal cell carcinoma comprising:

(i) an inactivating modification in the PBRM1 gene, and in the SETD2 gene;

(ii) an inactivating modification in the PBRM1 gene, and in the mTOR gene;

(iii) an inactivating modification in the PBRM1 gene, and any one of the following somatic cell copy number alterations: (i) loss 9p, (ii) loss 14q, (iii) gain 8q and (iv) gain 12p; or

(iv) an inactivating modification in the VHL gene.

The invention also encompasses a method for stratifying patients into the seven sub-groups described above, said method comprising determining in a sample which has been obtained from a ccRCC in a patient the presence or absence of the biomarkers discussed above and allocating said sample and said patient to one of the seven sub-groups accordingly.

The invention encompasses an in vitro method for predicting the response of a patient with ccRCC to drug treatment or surgery, comprising the steps of:

(i) obtaining a sample from said patient; and

(ii) determining the presence or absence of:

(a) an inactivating modification in the VHL gene and in two or more of BAP1 , PBRM1 , SETD2 and PTEN genes;

(b) an inactivating modification in the BAP1 gene, and optionally the VHL gene;

(c) an absence of an inactivating modification in the VHL gene; or

(d) any one of (i) loss 9p, (ii) loss 14q, (iii) gain 8q or (iv) gain 12p.

The presence of any one of (a) to (d) may be indicative of an improved response to drug treatment compared with surgery. The absence of any of (a) to (d) may be indicative of a worse response to drug treatment, i.e. said patient would be better treated with surgery.
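The determination in step (ii) can be pictured as a check of four independent criteria. The following Python sketch is offered only as an illustration under stated assumptions: the names `criteria_met` and `drug_treatment_indicated`, and the representation of a sample as a set of inactivated gene symbols plus a set of SCNA labels, are hypothetical and do not come from the specification.

```python
from typing import List, Set

CO_DRIVER_GENES = {"BAP1", "PBRM1", "SETD2", "PTEN"}
MARKER_SCNAS = {"loss 9p", "loss 14q", "gain 8q", "gain 12p"}


def criteria_met(inactivated: Set[str], scnas: Set[str]) -> List[str]:
    """Return the labels of criteria (a) to (d) that a sample satisfies."""
    met = []
    # (a) VHL plus two or more of BAP1, PBRM1, SETD2 and PTEN
    if "VHL" in inactivated and len(inactivated & CO_DRIVER_GENES) >= 2:
        met.append("a")
    # (b) BAP1 inactivated (VHL optional)
    if "BAP1" in inactivated:
        met.append("b")
    # (c) no inactivating modification in VHL
    if "VHL" not in inactivated:
        met.append("c")
    # (d) any one of the four marker SCNAs
    if scnas & MARKER_SCNAS:
        met.append("d")
    return met


def drug_treatment_indicated(inactivated: Set[str], scnas: Set[str]) -> bool:
    """True if any one of (a) to (d) is present in the sample."""
    return bool(criteria_met(inactivated, scnas))
```

Returning the list of satisfied criteria, rather than only a yes/no answer, keeps the basis for the stratification visible to whoever reviews the result.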

The invention also encompasses an in vitro method for determining whether drug treatment is appropriate for a patient diagnosed with ccRCC, wherein said method comprises:

(i) obtaining a sample from said patient; and

(ii) determining the presence or absence of:

(a) an inactivating modification in the VHL gene and in two or more of BAP1, PBRM1, SETD2 and PTEN genes;

(b) an inactivating modification in the BAP1 gene, and optionally the VHL gene;

(c) an absence of an inactivating modification in the VHL gene; or

(d) any one of (i) loss 9p, (ii) loss 14q, (iii) gain 8q or (iv) gain 12p.

The presence of any one of (a) to (d) may be indicative that drug treatment is appropriate in said patient. The absence of any of (a) to (d) may be indicative that drug treatment is not appropriate for said patient, and that surgery may be more appropriate for said patient.

SOMATIC CELL COPY NUMBER ALTERATIONS (SCNA)

In one aspect, the biomarker may be a somatic cell copy number alteration. Somatic cell copy number alterations are changes in gene copy number that have arisen in somatic tissue.

As described in the present Examples, the inventors found that four particular SCNAs identify clear cell renal cell carcinomas that are more likely to metastasise than clear cell renal cell carcinomas that do not comprise the four SCNAs. These SCNAs are therefore biomarkers for predicting the likelihood of metastasis of a clear cell renal cell carcinoma and may be informative as to whether the carcinoma should be treated with drug therapy or surgery in the first instance.

The SCNAs in question are (i) loss 9p, (ii) loss 14q, (iii) gain 8q and (iv) gain 12p (corresponding to the loss of HIF1alpha, and CDKN2A and CDKN2B; and gain of MYC, respectively).

As such, in one aspect the invention provides a method for identifying a patient in whom metastasis of clear cell renal cell carcinoma is more likely to occur, said method comprising analysing the following somatic copy number alteration (SCNA) biomarkers: (i) loss 9p, (ii) loss 14q, (iii) gain 8q and (iv) gain 12p.
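As a purely illustrative sketch of how the four SCNA biomarkers might be read from copy number data, the Python example below flags them from arm-level log2 copy ratios. The function name `call_marker_scnas`, the input format and the cutoff values of +0.2 and -0.2 are all assumptions chosen for the example; the specification does not prescribe thresholds, and any real cutoff would depend on the platform and on tumour purity.

```python
from typing import Dict, Set

# Direction of change that makes each chromosome arm a marker SCNA.
MARKER_ARMS = {"9p": "loss", "14q": "loss", "8q": "gain", "12p": "gain"}


def call_marker_scnas(
    arm_log2_ratio: Dict[str, float],
    gain_cutoff: float = 0.2,   # hypothetical threshold
    loss_cutoff: float = -0.2,  # hypothetical threshold
) -> Set[str]:
    """Return the marker SCNAs present, e.g. {"loss 9p", "gain 8q"}."""
    found = set()
    for arm, direction in MARKER_ARMS.items():
        value = arm_log2_ratio.get(arm)
        if value is None:
            continue  # arm not measured in this sample
        if direction == "loss" and value <= loss_cutoff:
            found.add(f"loss {arm}")
        elif direction == "gain" and value >= gain_cutoff:
            found.add(f"gain {arm}")
    return found
```

Arms other than the four markers are ignored, and a change in the opposite direction (for example a gain of 14q) is not reported.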

In a further aspect the invention provides a method of treating a patient having clear cell renal cell carcinoma comprising administering an effective amount of an anti-clear cell renal cell carcinoma drug to the patient, said patient having been determined to have a clear cell renal cell carcinoma comprising any one of (i) loss 9p, (ii) loss 14q, (iii) gain 8q or (iv) gain 12p.

The invention also provides an anti-clear cell renal cell carcinoma drug for use in treating a patient having clear cell renal cell carcinoma, said patient having been determined to have a clear cell renal cell carcinoma comprising any one of (i) loss 9p, (ii) loss 14q, (iii) gain 8q or (iv) gain 12p.

The invention also provides an anti-clear cell renal cell carcinoma drug for use in treating a patient having clear cell renal cell carcinoma, wherein said patient has a clear cell renal cell carcinoma comprising any one of (i) loss 9p, (ii) loss 14q, (iii) gain 8q or (iv) gain 12p.

In a further aspect the invention provides the use of an anti-clear cell renal cell carcinoma drug in the manufacture of a medicament for use in treating a patient having clear cell renal cell carcinoma, said patient having been determined as having a clear cell renal cell carcinoma comprising any one of (i) loss 9p, (ii) loss 14q, (iii) gain 8q or (iv) gain 12p.

In another aspect the invention provides use of an anti-clear cell renal cell carcinoma drug in the treatment of a patient having clear cell renal cell carcinoma, said patient having been determined as having a clear cell renal cell carcinoma comprising any one of (i) loss 9p, (ii) loss 14q, (iii) gain 8q or (iv) gain 12p.

In a further aspect the invention provides a method of treating a patient having clear cell renal cell carcinoma comprising surgical intervention of the clear cell renal cell carcinoma, said patient having been determined to have a clear cell renal cell carcinoma which does not comprise any of (i) loss 9p, (ii) loss 14q, (iii) gain 8q or (iv) gain 12p.

SCNAs may be identified using methods known in the art, for example by immunohistochemistry or in situ hybridization, array-based profiling, or by detection from next generation sequencing datasets.

The methods and uses of the invention relating to SCNAs may be carried out in combination with any of the methods or uses described herein. For example, the SCNA, ITH, wGII, and mutational biomarkers (of the seven sub-groups described above) may be used in combination in order to predict whether a clear cell renal cell carcinoma will metastasise, and drug treatment or surgery can be selected accordingly. For example, ITH and wGII may be analysed in combination with stratification of a patient into one of the seven sub-groups discussed herein.

The invention provides a method of predicting whether a clear cell renal cell carcinoma in a patient is likely to metastasise, wherein said method comprises analysing in a sample obtained from said patient the presence of the following somatic cell copy number alterations: (i) loss 9p, (ii) loss 14q, (iii) gain 8q, (iv) gain 12p.

The invention also provides a method of predicting whether a clear cell renal cell carcinoma will metastasise, said method comprising analysing in a sample obtained from said patient the presence of the following somatic cell copy number alterations: (i) loss 9p, (ii) loss 14q, (iii) gain 8q, (iv) gain 12p.

In one aspect of any of the aspects of the invention as described herein, the somatic cell copy number alteration is selected from (i) loss 9p and (ii) loss 14q. In one aspect both loss 9p and loss 14q are present and/or analysed. In one aspect all of (i) loss 9p, (ii) loss 14q, (iii) gain 8q and (iv) gain 12p are present and/or analysed.

SAMPLE

Determination of whether a ccRCC from a patient comprises the biomarkers as discussed herein may be carried out in vitro on a sample from said patient. In one aspect of the invention, any of the methods or uses as described may additionally comprise a step of obtaining a ccRCC sample from a patient. In one aspect the methods or uses of the invention may comprise the step of determining the presence or absence of the biomarkers according to the invention in said sample.

Suitable samples will be known to one skilled in the art. In this regard, isolation of biopsies and samples from tumours is common practice in the art and may be performed according to any suitable method, and such methods will be known to one skilled in the art.

The sample may be a tumour sample, blood sample, tumour-associated lymph node sample or sample from a metastatic site, or tissue sample, or be peripheral blood mononuclear cells from the subject.

In certain embodiments the sample is a tumour-associated body fluid or tissue.

The sample may be a blood sample. The sample may contain a blood fraction (e.g. a serum sample or a plasma sample) or may be whole blood. Techniques for collecting samples from a subject are well known in the art.

Suitably, the sample may be circulating tumour DNA, circulating tumour cells or exosomes comprising tumour DNA. The circulating tumour DNA, circulating tumour cells or exosomes comprising tumour DNA may be isolated from a blood sample obtained from the subject using methods which are known in the art.

In one aspect more than one metastatic sample is analysed, i.e. samples from more than one metastatic site, or more than one sample from the same metastatic site.

Tumour samples and non-cancerous tissue samples can be obtained according to any method known in the art. For example, tumour and non-cancerous samples can be obtained from cancer patients that have undergone resection, or they can be obtained by extraction using a hypodermic needle, by microdissection, or by laser capture. Control (non-cancerous) samples can be obtained, for example, from a cadaveric donor or from a healthy donor.

ctDNA and circulating tumour cells may be isolated from blood samples according to e.g. Nature. 2017 Apr 26;545(7655):446-451 or Nat Med. 2017 Jan;23(1):114-119.

DNA and/or RNA suitable for downstream sequencing can be isolated from a sample using methods which are known in the art. For example DNA and/or RNA isolation may be performed using phenol-based extraction. Phenol-based reagents contain a combination of denaturants and RNase inhibitors for cell and tissue disruption and subsequent separation of DNA or RNA from contaminants. For example, extraction procedures such as those using DNAzol™, TRIZOL™ or TRI REAGENT™ may be used. DNA and/or RNA may further be isolated using solid phase extraction methods (e.g. spin columns) such as PureLink™ Genomic DNA Mini Kit or QIAGEN RNeasy™ methods. Isolated RNA may be converted to cDNA for downstream sequencing using methods which are known in the art (RT-PCR).

PROGNOSIS OF ccRCC

As described in the present Examples, the inventors have determined that certain parameters are indicative of prognosis in a patient with ccRCC.

In this regard, intratumour heterogeneity (ITH) and weighted genome instability index (wGII) may be indicative of the prognosis of a patient with ccRCC.

By “intratumour heterogeneity” (ITH) is meant heterogeneity within the same tumour. Intratumour heterogeneity (ITH) has been documented for many decades, initially from a morphological perspective. Cancers of all types are now recognised to consist of highly diverse populations of cells, where ITH is detectable at the genetic, epigenetic, and phenotypic levels. Recent advances in next-generation sequencing and microarray technology have enabled researchers to begin to appreciate the full extent and complexity of ITH.

“ITH index” as used herein is taken to mean number of subclonal drivers/number of clonal drivers. “Drivers” may include all driver mutations and driver SCNAs shown in Figure 1A.

The drivers may thus be selected from VHL, PBRM1, SETD2, PIK3CA, MTOR, PTEN, KDM5C, CSMD3, BAP1, TP53, TSC1, TSC2, ARID1A, TCEB1, gain 1q25.1, gain 2q14.3, gain 5q35.3, gain 7q22.3, gain 8q24.21, gain 12p11.21, gain 20q13.33, loss 1p36.11, loss 3p25.3, loss 4q34.3, loss 6q22.33, loss 8p23.2, loss 9p21.3 and loss 14q31.1.

ITH may be assessed by methods known in the art, for example as described in the present Examples.
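The ITH index defined above is a simple ratio, which may be illustrated as follows. This Python sketch is not the procedure of the Examples; the function name `ith_index` is hypothetical, and returning `None` when a tumour has no clonal drivers is an assumption made here to avoid division by zero, a case that the definition does not address.

```python
from typing import Iterable, Optional


def ith_index(clonal_drivers: Iterable[str],
              subclonal_drivers: Iterable[str]) -> Optional[float]:
    """Number of subclonal drivers divided by number of clonal drivers."""
    n_clonal = len(set(clonal_drivers))
    n_subclonal = len(set(subclonal_drivers))
    if n_clonal == 0:
        return None  # undefined under the stated definition (assumption)
    return n_subclonal / n_clonal
```

For instance, a tumour with clonal VHL and PBRM1 events and subclonal SETD2, loss 9p21.3 and gain 8q24.21 events would have an ITH index of 3/2 = 1.5.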

By “weighted genome instability index (wGII)” is meant the proportion of the genome with aberrant copy number compared to the median ploidy, weighted on a per chromosome basis. wGII may be determined by methods known in the art, e.g. as used in the present Examples.
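One possible reading of this definition is sketched below in Python. It is an illustration under stated assumptions, not the calculation used in the Examples: segments are assumed to be tuples of (chromosome, start, end, integer copy number), the median ploidy is estimated here as the length-weighted median copy number, and "weighted on a per chromosome basis" is interpreted as averaging the aberrant fraction of each chromosome so that large chromosomes do not dominate. The function names are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

Segment = Tuple[str, int, int, int]  # (chromosome, start, end, copy number)


def weighted_median_copy_number(segments: List[Segment]) -> int:
    """Copy number at which half of the covered length is reached."""
    ordered = sorted(segments, key=lambda s: s[3])
    total = sum(end - start for _, start, end, _ in ordered)
    running = 0
    for _, start, end, copy_number in ordered:
        running += end - start
        if running * 2 >= total:
            return copy_number
    raise ValueError("no segments supplied")


def wgii(segments: List[Segment]) -> float:
    """Mean over chromosomes of the fraction with aberrant copy number."""
    ploidy = weighted_median_copy_number(segments)
    covered = defaultdict(int)
    aberrant = defaultdict(int)
    for chrom, start, end, copy_number in segments:
        length = end - start
        covered[chrom] += length
        if copy_number != ploidy:
            aberrant[chrom] += length
    fractions = [aberrant[c] / covered[c] for c in covered]
    return sum(fractions) / len(fractions)
```

The result is a fraction between 0 and 1; multiplying by 100 gives a percentage comparable in form to the cohort values quoted in this description.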

By “high” or “low” ITH or wGII is meant high or low relative to the median value in a cohort of patients. One skilled in the art would be aware of how to determine the median value and to assess whether the level of a particular patient is higher or lower than said median value.

“Low” may be considered to be below the median value, and “high” to be equal to or above the median value.

By way of example, the median ITH index value may be 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20. In the cohort in the present Examples the median ITH index value was 1.

The median value for wGII may be, for example, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60% or values in between. The median for the cohort in the present Examples for wGII was 32.8%.

Low ITH and low wGII may be indicative of a good (or better or improved) prognosis, i.e. within a cohort of patients. Patients with both low ITH and low wGII show low metastasis and death rates.

Patients with either high ITH and low wGII, or high ITH and high wGII, may have an intermediate prognosis. That is, metastasis and death rates may be higher than in patients with both low ITH and low wGII, but lower than in patients with low ITH and high wGII.

Patients with low ITH and high wGII may have a worse prognosis, with higher death rates.
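The grouping described in the preceding paragraphs can be summarised as a small lookup. The Python sketch below is illustrative only: the function names are hypothetical, and the default medians (ITH index 1 and wGII 32.8%) are the cohort values quoted above, which a different cohort would replace with its own medians.

```python
from statistics import median
from typing import Sequence


def level(value: float, cohort_median: float) -> str:
    """'low' is below the median; 'high' is equal to or above it."""
    return "low" if value < cohort_median else "high"


def prognosis_group(ith: float, wgii_percent: float,
                    ith_median: float = 1.0,
                    wgii_median: float = 32.8) -> str:
    """Map ITH index and wGII (in percent) to a prognosis group."""
    ith_level = level(ith, ith_median)
    wgii_level = level(wgii_percent, wgii_median)
    if ith_level == "high":
        return "intermediate"  # high ITH with either low or high wGII
    return "good" if wgii_level == "low" else "worse"


def prognosis_in_cohort(ith: float, wgii_percent: float,
                        cohort_ith: Sequence[float],
                        cohort_wgii: Sequence[float]) -> str:
    """Same grouping, with medians computed from a supplied cohort."""
    return prognosis_group(ith, wgii_percent,
                           median(cohort_ith), median(cohort_wgii))
```

Note that a value exactly equal to the median is treated as "high", following the definition given above.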

In one aspect the invention provides a method for assessing the prognosis of a patient with ccRCC, said method comprising analysing the level of intratumour heterogeneity (ITH), for example using the ITH index, and weighted genome instability index (wGII) in a sample of said ccRCC from said patient.

In one aspect low ITH and low wGII in said sample is indicative of a good or improved prognosis for said patient. In one aspect high ITH and low wGII, or high ITH and high wGII, in said sample is indicative of an intermediate prognosis for said patient. In one aspect low ITH and high wGII in said sample is indicative of a bad or worse prognosis for said patient. The prognosis may be relative to other patients in a cohort. In the present Examples it is shown that patients with high ITH index (e.g. above the median value of the cohort) had significantly reduced progression free survival compared to those with low ITH index.

One skilled in the art will be able to determine a suitable cohort of patients.

In one aspect the invention provides a method for predicting the response of a patient with clear cell renal cell carcinoma (ccRCC) to drug treatment or surgery, wherein said method comprises analysing intratumour heterogeneity (ITH) and weighted genome instability index (wGII) in a sample from said patient.

In one aspect low ITH and low wGII may be predictive of an improved response to surgery (in view of low metastasis).

High ITH with either low wGII or high wGII may be predictive of a response to either surgery or drug treatment.

Low ITH and high wGII may be predictive of an improved response to drug treatment, rather than surgery, in view of increased metastasis.

In one aspect is provided use of intratumour heterogeneity and weighted genome instability index as biomarkers for predicting the response of a patient with ccRCC to drug treatment or surgery.

The method or use may identify a patient suitable for treatment with drug treatment or surgery as described herein.

The invention also provides a method of treating ccRCC in a patient predicted to have an improved response to drug treatment by the method comprising administering a therapeutically effective amount of said drug treatment to said patient.

The invention also provides a method of treating ccRCC in a patient predicted to have an improved response to surgery by the method comprising surgical resection of a ccRCC tumour or metastasis.

The invention also provides an anti-ccRCC drug for use in treating a patient with ccRCC, wherein said patient is predicted to have an improved response to drug treatment by the method, and the use of an anti-ccRCC drug in the manufacture of a medicament for use in treating a patient with ccRCC, wherein said patient is predicted to have an improved response to drug treatment by the method.

TREATMENT

The invention also provides a method of treating a patient with clear cell renal cell carcinoma, wherein said method comprises:

(a) determining whether metastasis is likely to occur according to any of the methods of the invention as discussed herein; and

(b) treating the patient with drug therapy if metastasis is likely to occur, and treating the patient with surgery if metastasis is less likely to occur.

In one aspect the invention provides a method for reducing the risk of metastases in a patient with clear cell renal cell carcinoma, wherein said method comprises administering an anti-clear cell renal cell carcinoma drug to said patient who has been determined to have a clear cell renal cell carcinoma which is more likely to metastasise according to any of the methods of the invention as described herein.

Methods according to the invention as described herein may be carried out in vitro or ex vivo, or in vivo.

In one aspect any of the methods described herein may additionally comprise a step of identifying a patient who has clear cell renal cell carcinoma.

ANTI-CLEAR CELL RENAL CELL CARCINOMA DRUGS

Anti-clear cell renal cell carcinoma drugs are known in the art and approved for treating clear cell renal cell carcinoma, and may be suitably employed according to the present invention.

In one aspect the anti-clear cell renal cell carcinoma drug may be an antibody, for example a monoclonal antibody. In one aspect the anti-clear cell renal cell carcinoma drug may be selected from Bevacizumab and Nivolumab.

In one aspect the anti-clear cell renal cell carcinoma drug is a kinase inhibitor, such as a tyrosine kinase inhibitor, which may be a VEGF inhibitor. Examples of inhibitors may include Sorafenib Tosylate, Sorafenib, Lenvatinib, Tivozanib, Sutent, Axitinib, Pazopanib, Cabozantinib or Sunitinib. The inhibitor may be a multikinase inhibitor such as Cabozantinib.

In one aspect the anti-clear cell renal cell carcinoma drug is an inhibitor of mTOR, such as Temsirolimus or Everolimus.

In one aspect the anti-clear cell renal cell carcinoma drug is an interleukin, preferably interleukin-2 (IL-2).

In one aspect the anti-clear cell renal cell carcinoma drug is an interferon.

In one aspect the anti-clear cell renal cell carcinoma drug is a checkpoint inhibitor, such as an anti-PD1, anti-PDL1 or anti-CTLA4 molecule. Such drugs are known in the art, for example, Ipilimumab, Pembrolizumab, Atezolizumab, Nivolumab, Avelumab and Durvalumab.

CLEAR CELL RENAL CELL CARCINOMA

Clear cell renal cell carcinoma (ccRCC) is a renal cortical tumor typically characterized by malignant epithelial cells with clear cytoplasm and a compact-alveolar (nested) or acinar growth pattern interspersed with intricate, arborizing vasculature. A variable proportion of cells with granular eosinophilic cytoplasm may be present. ccRCC is the most common type of kidney cancer in adults, responsible for a large majority of cases.

When ccRCC metastasises, it most commonly spreads to the lymph nodes, lungs, liver, adrenal glands, brain or bones.

Staging of clear cell renal cell carcinoma may follow the TNM staging system where the size and extent of the tumour (T), involvement of lymph nodes (N) and metastases (M) are classified separately. Also, it can use overall stage grouping into stage I to IV, with the 1997 revision of AJCC described below:

The clear cell renal cell carcinoma according to the invention may be any of stage I, stage II, stage III or stage IV.

In a preferred embodiment of the present invention, the patient described herein is a mammal, preferably a human, cat, dog, horse, donkey, sheep, pig, goat, cow, mouse, rat, rabbit or guinea pig, but most preferably the patient is a human.

An effective response of a patient or a patient's“responsiveness” to treatment refers to the clinical or therapeutic benefit imparted to a patient at risk for, or suffering from, a disease or disorder. Such benefit may include cellular or biological responses, a complete response, a partial response, a stable disease (without progression or relapse), or a response with a later relapse. For example, an effective response can be reduced tumour size or progression-free survival in a patient diagnosed with clear cell renal cell carcinoma.

Treatment outcomes can be predicted and monitored and/or patients benefiting from such treatments can be identified or selected via the methods described herein.

For the treatment of disease, the appropriate dosage of a therapeutic composition will depend on the type of disease to be treated, as defined above, the severity and course of the disease, the patient's clinical history and response to the agent, and the discretion of the attending physician. The agent is suitably administered to the patient at one time or over a series of treatments.

SURGERY

As described herein, in some aspects of the invention a patient may be advantageously treated with surgery in the first instance, rather than drug treatment. This may be applied to patients having a clear cell renal cell carcinoma which is less likely to metastasise than those with a ccRCC predicted by the methods of the invention to be likely to metastasise.

For patients with clear cell renal cell carcinoma less likely to metastasise, surgical intervention of the primary tumour in the first instance may be beneficial.

Surgery may include resection in which all or part of cancerous tissue is physically removed, excised, and/or destroyed and may be used in conjunction with other therapies, such as chemotherapy, radiotherapy, hormonal therapy, gene therapy, immunotherapy, and/or alternative therapies. Tumour resection refers to physical removal of at least part of a tumour. In addition to tumour resection, treatment by surgery includes laser surgery, cryosurgery, electrosurgery, and microscopically-controlled surgery. Surgery may comprise cytoreductive nephrectomy (i.e. resection of the primary tumour for debulking purposes).

KIT

The invention also provides a kit for use in identifying patients with clear cell renal cell carcinoma who are suitable for treatment with an anti-clear cell renal cell carcinoma drug as described herein. The kit may be used in any of the methods as described herein.

Said kit may comprise suitable means for identifying the presence or absence of modifications, such as inactivating or activating modifications, such as mutations or methylations, in the SETD2, PTEN, VHL, BAP1, PBRM1, mTOR, PIK3CA, TSC1 and/or TSC2 genes.

In one aspect the modification may be an activating modification, such as an activating mutation. In one aspect said kit may comprise suitable means for identifying the presence or absence of activating modifications, such as activating mutations, in the mTOR, PIK3CA, TSC1, TSC2 and/or PTEN genes. In an alternative aspect said kit may comprise suitable means for identifying the presence or absence of inactivating modifications, such as inactivating mutations, in the mTOR, PIK3CA, TSC1, TSC2 and/or PTEN genes.

In addition, said kit may comprise suitable means for identifying the presence or absence of inactivating modifications in the SETD2, VHL, BAP1 , and PBRM1 genes.

The mutation may be any non-synonymous mutation, i.e. a mutation that alters the amino acid sequence of the resulting protein.
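The distinction between synonymous and non-synonymous changes can be shown with a short worked example. The Python sketch below is an illustration rather than a variant-calling method: it assumes the standard genetic code, considers a single codon in isolation, and uses the hypothetical function name `is_non_synonymous`. Frameshifts, splice-site changes and other variant classes are outside its scope.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def is_non_synonymous(ref_codon: str, alt_codon: str) -> bool:
    """True if the codon change alters the encoded amino acid.

    A change to or from a stop codon ('*') also counts as altering
    the protein sequence. Unknown codons raise KeyError.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return ref_aa != alt_aa
```

For example, GAA to GAG leaves glutamate unchanged and is synonymous, whereas GAA to GTA replaces glutamate with valine and TGG to TGA introduces a stop codon, so both of the latter are non-synonymous.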

In one aspect the kit comprises means for identifying the presence or absence of inactivating or activating modifications in genes selected from the following: VHL (von Hippel-Lindau tumor suppressor), PBRM1 (polybromo 1), SETD2 (SET domain containing 2), BAP1 (BRCA1 associated protein 1), KDM5C (lysine demethylase 5C), MTOR (mechanistic target of rapamycin kinase), CSMD3 (CUB and Sushi multiple domains 3), TP53 (tumor protein p53), PTEN (phosphatase and tensin homolog), PIK3CA (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha), ARID1A (AT-rich interaction domain 1A), SMARCA4 (SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 4), MET (MET proto-oncogene, receptor tyrosine kinase), KMT2C (lysine methyltransferase 2C), TCEB1 (transcription elongation factor B, polypeptide 1), TSC1 (TSC complex subunit 1), and TSC2 (TSC complex subunit 2). In one aspect the kit comprises means suitable for identifying inactivating modifications in the following genes: VHL, PBRM1, SETD2, BAP1, KDM5C, mTOR, CSMD3, TP53, PTEN, PIK3CA, ARID1A, SMARCA4, MET, KMT2C, TCEB1, TSC1 and TSC2, and activating modifications in the mTOR, PIK3CA, TSC1, TSC2 and/or PTEN genes.

In one aspect the means suitable for identifying inactivating modifications may be nucleic acid primers. One skilled in the art would be aware of how to design suitable primers.

In one aspect the means suitable for identifying inactivating modifications may be a capture probe. Capture probes for the various genes are commercially available.

Optionally, the kit may comprise primers suitable for identifying mutations in other frequently mutated genes in ccRCC.

Optionally the kit may contain genomic regions that allow detection of driver SCNAs. The driver SCNA may be at the cytoband level.

The kit according to the invention may be used to identify patients who may benefit from drug treatment in the first instance, rather than surgery.

In one aspect the invention provides use of the kit as described herein for in vitro prediction of the likely response of a patient with ccRCC to drug treatment or surgery, for in vitro assessment of whether drug treatment or surgery is appropriate for a patient diagnosed with ccRCC, or for in vitro identification of a patient with ccRCC for drug treatment or surgery.

The invention also encompasses use of the primers of the kit for in vitro prediction of the response of a patient with ccRCC to drug treatment or surgery, for in vitro assessment of whether drug treatment or surgery is appropriate for a patient with ccRCC, and for in vitro selection of a patient with ccRCC for drug treatment or surgery.

In a preferred aspect of the invention the patient as described herein is a human patient.

As used herein, "primers" designate isolated nucleic acid molecules that can specifically hybridize or anneal to 5' or 3' regions of a target genomic region (plus and minus strands, respectively, or vice-versa). In general, they are from about 10 to 30 nucleotides in length and anneal at both extremities of a region containing about 50 to 200 nucleotides in length. Under appropriate conditions and with appropriate reagents, such primers permit the amplification of a nucleic acid molecule comprising the nucleotide sequence flanked by the primers.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Singleton, et al., DICTIONARY OF MICROBIOLOGY AND MOLECULAR BIOLOGY, 2D ED., John Wiley and Sons, New York (1994), and Hale & Marham, THE HARPER COLLINS DICTIONARY OF BIOLOGY, Harper Perennial, NY (1991) provide one of skill with a general dictionary of many of the terms used in this disclosure.

This disclosure is not limited by the exemplary methods and materials disclosed herein, and any methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of this disclosure. Numeric ranges are inclusive of the numbers defining the range. Unless otherwise indicated, any nucleic acid sequences are written left to right in 5' to 3' orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.

The headings provided herein are not limitations of the various aspects or embodiments of this disclosure which can be had by reference to the specification as a whole. Accordingly, the terms defined immediately below are more fully defined by reference to the specification as a whole.

Amino acids are referred to herein using the name of the amino acid, the three letter abbreviation or the single letter abbreviation.

The term "protein", as used herein, includes proteins, polypeptides, and peptides.

Other definitions of terms may appear throughout the specification. Before the exemplary embodiments are described in more detail, it is to be understood that this disclosure is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present disclosure will be limited only by the appended claims.

Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within this disclosure. The upper and lower limits of these smaller ranges may independently be included or excluded in the range, and each range where either, neither or both limits are included in the smaller ranges is also encompassed within this disclosure, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in this disclosure.

It must be noted that as used herein and in the appended claims, the singular forms "a", "an", and "the" include plural referents unless the context clearly dictates otherwise.

The terms "comprising", "comprises" and "comprised of" as used herein are synonymous with "including", "includes" or "containing", "contains", and are inclusive or open-ended and do not exclude additional, non-recited members, elements or method steps. The terms "comprising", "comprises" and "comprised of" also include the term "consisting of".

The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that such publications constitute prior art to the claims appended hereto.

The invention will now be further described by way of Examples, which are meant to serve to assist one of ordinary skill in the art in carrying out the invention and are not intended in any way to limit the scope of the invention.

Example 1 - Assessment of renal cancer evolution

Materials and methods

STAR Methods

Experimental model and patient details

Patients were recruited into TRACERx Renal, an ethically approved prospective cohort study (National Health Service Research Ethics Committee approval 11/LO/1996). The study sponsor is the Royal Marsden NHS Foundation Trust. The study is coordinated by the Renal Unit at the Royal Marsden Hospital NHS Foundation Trust. The study is open to recruitment at the following sites: Royal Marsden Hospital NHS Foundation Trust, Guy’s and St Thomas’ Hospital NHS Foundation Trust, Royal Free Hospital NHS Foundation Trust and Western General Hospital (NHS Lothian). Patients were recruited into the study according to the following eligibility criteria:

Inclusion criteria

• Age 18 years or older

• Patients with histologically confirmed renal cell carcinoma, or suspected renal cell carcinoma, proceeding to nephrectomy/metastasectomy

• Medical and/or surgical management in accordance with national and/or local guidelines

• Written informed consent (permitting fresh tissue sampling and blood collection; access to archived diagnostic material and anonymised clinical data)

Exclusion criteria

• Any concomitant medical or psychiatric problems which, in the opinion of the investigator, would prevent completion of treatment or follow-up

• Lack of adequate tissue

Further eligibility criteria were applied to the cohort presented in this paper (it therefore follows that these patients do not have consecutive study ID numbers from 001 to 100):

• Confirmed histological diagnosis of clear cell renal cell carcinoma.

• No documented germline renal cell carcinoma predisposition syndrome (including VHL).

• At least three primary tumour regions available for analysis.

The cohort was representative of patients eligible for curative or cytoreductive nephrectomy. Full clinical characteristics are provided in Table S1A. Demographic data include: Sex, Age and Ethnicity. Clinical data include: Presenting symptoms, Smoking status, BMI, History of Previous RCC, Family History of RCC, Bilateral or Multi-focal RCC, Neoadjuvant therapy (6 patients received systemic therapy prior to nephrectomy). Histology data include: overall TNM Stage (based on Version 7 classification), Location of nephrectomy, Number of harvested and involved lymph nodes, presence of Microvascular Invasion, presence of Renal Vein Invasion, presence of IVC tumour thrombus, Size of primary tumour, Leibovich score, Fuhrman Grade, Time to nephrectomy (days). Clinical status of patients included: Relapse free survival (months), Total follow up (months), Survival Outcome. 16 patients were lost to follow-up: 8 were stage I, 5 stage III and 3 stage IV. For clinical parameter correlation and outcome analyses for cases with multiple tumours (K114, K324, K354, K097, K265) we used the tumour with the higher stage (or, if stage was equal, the larger of the two tumours), namely: K114_L, K334_R, K352_1, K097_L, K265_1.

Classification of disease progression pattern for metastatic cases. Patterns of disease progression (Table S1B) were classified as follows: (1) Rapid: disease progression with multiple new lesions, or cancer-specific death, within 6 months of surgery; (2) Attenuated: no disease progression (for example, completely resected metastases at presentation and the patient remains disease-free); disease progression with a single new lesion within 6 months of surgery (for example, a solitary bone, brain or lung deposit); or disease progression more than 6 months after surgery.
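For illustration only, the two progression categories can be written as a small decision rule. This is a sketch in Python and not part of the study's methods: the function and argument names are hypothetical, and "within 6 months" is assumed here to include exactly 6 months.

```python
from typing import Optional


def classify_progression(progressed: bool,
                         months_to_progression: Optional[float],
                         new_lesions: int,
                         cancer_death_within_6_months: bool) -> str:
    """Return 'rapid' or 'attenuated' for a metastatic case (hypothetical helper)."""
    # Cancer-specific death within 6 months of surgery is classed as rapid.
    if cancer_death_within_6_months:
        return "rapid"
    # No progression (e.g. resected metastases, patient remains disease-free).
    if not progressed or months_to_progression is None:
        return "attenuated"
    # Multiple new lesions within 6 months of surgery is classed as rapid.
    if months_to_progression <= 6 and new_lesions > 1:
        return "rapid"
    # A single new lesion within 6 months, or any progression after 6 months.
    return "attenuated"
```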

Method details

Sample collection

All surgically resected specimens were reviewed macroscopically by a pathologist to guide multi-region sampling for this study and to avoid compromising diagnostic requirements. Tumour measurements were recorded and the specimens were photographed before and after sampling. Primary tumours were dissected along the longest axis and spatially separated regions were sampled from the "tumour slice" using a 6 mm punch biopsy needle. The punch was changed between samples to avoid contamination. The total number of samples obtained reflected the tumour size, with a minimum of 3 non-overlapping, equally spaced biopsies. Areas that were obviously fibrotic or haemorrhagic were avoided during sampling, and every attempt was made to reflect macroscopically heterogeneous tumour areas. Primary tumour regions were labelled R1, R2, R3... Rn and their locations were recorded. Normal kidney tissue was sampled from areas distant to the primary tumour and labelled N1. Each biopsy was split into two for snap freezing and formalin fixing respectively, such that the fresh frozen sample had its mirror image in the formalin-fixed sample, which was subsequently paraffin embedded. Fresh samples were placed in a 1.8 ml cryotube, immediately snap frozen in liquid nitrogen for >30 seconds and transferred to -80 °C for storage. Peripheral blood was collected at the time of surgery and processed to separate buffy coat.

Nucleic acid isolation from tissue and blood (TRACERx Renal cohort)

DNA and RNA were co-purified using the AllPrep DNA/RNA mini kit (Qiagen). Briefly, a 2 mm³ piece of tissue was added to 900 µl of lysis buffer and homogenised for five seconds using the TissueRuptor (Qiagen), with a fresh homogenisation probe used for each preparation. Each lysate was applied to a QIAshredder (Qiagen) and then sequentially purified using the DNA and RNA columns according to the manufacturer’s protocol. Germline control DNA was isolated from whole blood using the DNeasy Blood and Tissue kit (Qiagen) according to the manufacturer’s protocol. DNA quality and yield were assessed using the TapeStation (Agilent) and Qubit fluorometric quantification (ThermoFisher Scientific).

Detection of VHL mutations by Sanger sequencing

Validation of the patient VHL mutations was carried out using PCR followed by Big Dye Terminator Sanger sequencing on the ABI 3700. 20 ng of patient DNA was amplified for each VHL exon. PCR conditions involved 35 cycles of denaturation at 95°C, followed by oligonucleotide primer annealing at 55°C and sequence extension at 72°C using Qiagen Taq polymerase and reagents.

Methylation specific PCR

Methylation of the VHL promoter was detected after bisulphite treatment of 500ng of patient DNA using the EZ DNA Methylation-Direct kit (Zymo Research). Bisulphite treated DNA was amplified in the PCR using methylation specific oligonucleotides followed by Big Dye terminator Sanger sequencing. Methylation was confirmed by comparing and contrasting patient tumour and normal renal tissue for methylation protected CpG sequences.

Independent pathology review of individual tumour regions

Where available, histological sections of each region in each case (median of 7 regions per patient, range: 1-63, from 79 patients) were evaluated by the same pathologist (JIL). Tumor type was assigned to each case following the current classification of the International Society of Urologic Pathology (ISUP) (Srigley et al., 2013). Four main histological types were considered based only on hematoxylin-eosin sections: clear cell renal cell carcinoma, papillary renal cell carcinoma, chromophobe renal cell carcinoma and renal oncocytoma. Atypical cases, including unclassified tumours and tumours with mixed histology, were specifically annotated. Tumor architecture was also considered. The presence of rhabdoid and syncytial (Przybycin et al., 2014; Williamson et al., 2014) cells in any region of a tumour was also considered, since both are related to a more aggressive clinical course. Tumour grading was performed according to the most up to date ISUP classification (Delahunt et al., 2013) and the presence of necrosis, sarcomatoid changes and microvascular invasion was noted. The percentage of viable tumour cells was also estimated in every sample to provide an approximate percentage of tumour content.

Regional staining by Immunohistochemistry and Digital Image Analysis of Ki67

Tissue sections of 4 µm were mounted on slides and immunohistochemical staining for Ki67 was performed using a fully automated immunohistochemistry (IHC) system and ready-to-use optimized reagents according to the manufacturer's recommendations (Ventana Discovery Ultra, Ventana, Arizona, USA). The primary antibody was rabbit anti-Ki67 (AB16667, Abcam, Cambridge, UK) and the secondary antibody was Discovery Omnimap anti-rabbit HRP RUO (760-4311, Roche, Rotkreuz, Switzerland). The DAB kit was Discovery Chromomap DAB RUO (760-4311, Roche). After the IHC procedure, slides were first evaluated for Ki67 staining quality using mouse intestine tissue as positive control. Regions containing tumor tissue were identified and marked by a pathologist and subsequently scanned in brightfield at 20x magnification using a Zeiss Axio Scan.Z1 and ZEN lite imaging software (Carl Zeiss Microscopy GmbH, Jena, Germany). Digital images were then subjected to automated image analysis using StrataQuest version 5 (TissueGnostics, Vienna, Austria) for Ki67 quantification. Three different gates were set to quantify low, medium and high intensity DAB staining, which corresponded to Ki67 expression levels. Results were depicted as the total percentage of Ki67-positive nuclei.

Flow Cytometry Determination of DNA Content (FACS)

Fresh frozen tumour tissue samples, approximately 4 mm³ in size, were mechanically disrupted and incubated in 2 ml of 0.5% pepsin solution (Sigma, UK) at 37 °C for 40 minutes to create a suspension of nuclei. The nuclei were washed with phosphate-buffered saline (PBS) and then fixed with 70% ethanol for a minimum of 90 minutes. The nuclei were washed again with PBS and stained with 200 µl of propidium iodide (50 µg/ml) overnight. Flow cytometric analysis of DNA content was performed using the LSR Fortessa Cell Analyzer (Becton Dickinson, San Jose, USA), BD FACSDiva™ software and FlowJo software (FlowJo LLC, Oregon, USA). A minimum of 10,000 events were recorded (typically up to 20,000 and up to 100,000 in complex samples). Analysis was performed using methods derived from the European Society for Analytical Cellular Pathology DNA Consensus in Flow Cytometry guidelines and following discussions with Derek Davies (Head of Flow Cytometry Facility, The Francis Crick Institute). Gating of forward and side scatter was applied to exclude debris and cell clumping. Samples with <7,500 events after gating were excluded from further analysis. The coefficient of variation (CV) was measured on each G1 peak. Samples with a CV>10% were excluded from further analysis. Each tumour sample was assumed to contain normal cells to act as an internal standard. Where possible, the position of the diploid peak was calculated with reference to the peak of diploid cells in a case-matched normal tissue sample. The DNA index (DI) of any aneuploid peak present was calculated by dividing the G1 peak of the aneuploid population by the G1 peak of the normal diploid cells. Diploid samples were defined as having a DI of 1.00. Any additional peak was defined as aneuploid. A tetraploid peak was defined as having a DI of 1.90-2.10 and containing >15% of total events unless a second peak corresponding to G2 was clear on the histogram. Similarly, aneuploid peaks near to G1 (DI 0.90-1.10) were only considered if there was a clear second peak containing >15% of total events.
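The DNA index calculation and the peak-calling thresholds can be sketched as follows. This Python sketch is illustrative only: the function names are hypothetical, and the "not called" outcome for peaks that fail the 15% rule is an assumption, since the text does not say how such peaks were recorded.

```python
def dna_index(aneuploid_g1: float, diploid_g1: float) -> float:
    """DNA index = G1 peak of the aneuploid population / G1 peak of diploid cells."""
    return aneuploid_g1 / diploid_g1


def sample_passes_qc(events_after_gating: int, g1_cv_percent: float) -> bool:
    """Samples with <7,500 gated events or a G1 CV >10% are excluded."""
    return events_after_gating >= 7500 and g1_cv_percent <= 10.0


def classify_peak(di: float, fraction_of_events: float,
                  clear_second_peak: bool = False) -> str:
    """Label an additional peak ('not called' is an assumption of this sketch)."""
    if 1.90 <= di <= 2.10:
        # May be the G2 peak of diploid cells: require >15% of events
        # unless a second peak corresponding to G2 is clear.
        if fraction_of_events > 0.15 or clear_second_peak:
            return "tetraploid"
        return "not called"
    if 0.90 <= di <= 1.10:
        # Near-diploid peaks need a clear second peak holding >15% of events.
        if clear_second_peak and fraction_of_events > 0.15:
            return "aneuploid"
        return "not called"
    return "aneuploid"
```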

Targeted Driver Panel (DP) design and validation

Driver gene panels (Panel_v3, Panel_v5 and Panel_v6) were used in this study. Panel_v3 was designed in 2014 and included 110 putative driver genes. Panel_v5 and Panel_v6 were designed in 2015 and included 119 and 130 putative driver genes respectively. Driver genes were selected from genes that were frequently mutated in TCGA (accessed in April 2015) or highlighted in relevant studies (Arai et al., 2014; Sato et al., 2013; Scelo et al., 2014). Only alterations in driver genes represented in all three panels were considered in the overall driver mutation analyses. All panels targeted potential driver SCNA regions. To prevent inter-patient sample swaps, we included in Panel_v5 and Panel_v6 the 24 SNPs that were previously identified by Pengelly et al. Details of the 3 panels can be found in Table S2.

Renal Driver Panel Library Construction and Targeted Sequencing

Following isolated gDNA QC, depending on the available yield, samples were normalised to either 1-3 µg or 200 ng for the Agilent SureSelectXT Target Enrichment Library Protocol (standard or low input sample preparation respectively). Samples were normalised using a 1X Low TE Buffer. Samples were sheared to 150-200bp using a Covaris E220 (Covaris, Woburn, MA, USA), following the run parameters outlined in the Agilent SureSelectXT standard 3 µg and low input 200 ng DNA protocols. Library construction of samples was then performed following the SureSelectXT protocols, using 6 pre-capture PCR cycles for the standard input samples and 10 pre-capture PCR cycles for the 200 ng low input samples. Hybridisation and capture were performed for each individual sample using the Agilent custom Renal Driver Panel target-specific capture library (versions 3, 5 & 6). The same version of the capture library was used for all samples from the same patient case. Captured SureSelect-enriched DNA libraries were PCR amplified using 14 post-capture PCR cycles in PCR reactions that included the appropriate indexing primer for each sample. Amplified, captured, indexed libraries passing final QC on the TapeStation 4200 were normalised to 2nM and pooled, ensuring that unique indexes were allocated to all final libraries (up to 96 single indexes available) in the pool. QC of the final library pools was performed using the Agilent Bioanalyzer High Sensitivity DNA Assay. Library pool QC results were used to denature and dilute samples in preparation for sequencing on the Illumina HiSeq 2500 and NextSeq 500 sequencing platforms. The final libraries were sequenced 101 bp paired-end multiplexed on the Illumina HiSeq 2500 and 151 bp paired-end multiplexed on the NextSeq 500, at the Advanced Sequencing Facility at the Francis Crick Institute. Equivalent sequencing metrics, including per sample coverage, were observed between platforms.

Whole Exome Library Construction and Sequencing

gDNA isolated from each sample was normalized to 1-3 µg. Libraries were prepared using the Agilent SureSelectXT Target Enrichment Library protocol and Agilent SureSelectXT Human All Exon v4 enrichment capture library. The libraries were prepared using 6 pre-capture and 12 post-capture PCR cycles. Captured Whole Exome final libraries passing the final QC step were normalised to 2nM and pooled for sequencing on the HiSeq 2500 instrument. Dual HiSeq SBS v4 runs at 101 bp paired-end reads generated the data for analysis. Target coverage was 400-500x for the tumour regions and 100-200x for the associated normal.

SNV, and INDEL calling from multi-region DP and multi-region WE sequencing

Paired-end reads (2x100bp) in FastQ format sequenced by HiSeq or NextSeq were aligned to the reference human genome (build hg19) using the Burrows-Wheeler Aligner (BWA) v0.7.15 with seed recurrences (-c flag) set to 10000 (Li and Durbin, 2009). Intermediate processing of Sam/Bam files was performed using Samtools v1.3.1 and deduplication was performed using Picard 1.81 (http://broadinstitute.github.io/picard/) (Li and Durbin, 2009). Single Nucleotide Variant (SNV) calling was performed using Mutect v1.1.7, and small scale insertions/deletions (INDELs) were called by running VarScan v2.4.1 in somatic mode with a minimum variant frequency (--min-var-freq) of 0.005 and a tumour purity estimate (--tumor-purity) of 0.75, and then validated using Scalpel v0.5.3 (scalpel-discovery in --somatic mode), with the intersection between the two callers taken (Cibulskis et al., 2013; Fang et al., 2016; Koboldt et al., 2009). SNVs called by Mutect were further filtered using the following criteria: i) ≤5 alternative reads supporting the variant and variant allele frequency (VAF) ≤1% in the corresponding germline sample, ii) variants falling in the mitochondrial chromosome, a haplotype chromosome, HLA genes or any intergenic region were not considered, iii) presence of both forward and reverse strand reads supporting the variant, iv) >5 reads supporting the variant in at least one tumour region of a patient, v) variants were required to have a cancer cell fraction (CCF) >0.5 in at least one tumour region (see Subclonal deconstruction of mutations section for details of CCF calculation), vi) variants were required to have CCF >0.1 to be called as present in a tumour region, vii) sequencing depth in each region was required to be ≥50 and ≤3000. Finally, suspected artefact variants, based on inconsistent allelic frequencies between regions, were reviewed manually on the Integrative Genomics Viewer (IGV), and variants with poorly aligned reads were removed.
Dinucleotide substitutions (DNV) were identified when two adjacent SNVs were called and their VAFs were consistently balanced (based on proportion test, P ≥ 0.05). In such cases the start and stop positions were corrected to represent a DNV and frequency related values were recalculated to represent the mean of the SNVs. Variants were annotated using Annovar (Wang et al., 2010). Mutations were defined as deleterious if two out of three algorithms (SIFT, PolyPhen2 and MutationTaster) predicted the mutation as deleterious. Individual tumour biopsy regions were judged to have failed quality control and excluded from analysis based on the following criteria: i) sequencing coverage depth below 100X, ii) low tumour purity such that copy number calling failed. Mutations detected in high-confidence driver genes (VHL, PBRM1, SETD2, PIK3CA, MTOR, PTEN, KDM5C, CSMD3, BAP1, TP53, TSC1, TSC2, ARID1A, TCEB1) were defined as driver mutations. As TSC1 and TSC2 were not targeted in Panel_v5, to check the mutation status in these two genes, patients sequenced using Panel_v5 were re-sequenced with Panel_v6, and no new mutations were detected.
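The post-calling SNV filters and the two-of-three deleteriousness rule can be sketched as simple predicates. This is a minimal Python illustration, not the pipeline used in the study: the data classes and field names are hypothetical, the germline thresholds in criterion i) are read as "at most 5 reads" and "at most 1%", and strand support in criterion iii) is assumed to be pooled across tumour regions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RegionCall:
    alt_reads: int   # reads supporting the variant in this tumour region
    depth: int       # sequencing depth at the site in this region
    ccf: float       # cancer cell fraction estimated for this region
    fwd_alt: int     # supporting reads on the forward strand
    rev_alt: int     # supporting reads on the reverse strand


@dataclass
class Variant:
    germline_alt_reads: int
    germline_vaf: float
    excluded_locus: bool  # mitochondrial, haplotype contig, HLA gene or intergenic
    regions: Dict[str, RegionCall]


def passes_filters(v: Variant) -> bool:
    """Apply criteria i)-v) and vii) to one variant across all regions."""
    calls = list(v.regions.values())
    return (
        v.germline_alt_reads <= 5 and v.germline_vaf <= 0.01          # i)
        and not v.excluded_locus                                      # ii)
        and sum(c.fwd_alt for c in calls) > 0                         # iii)
        and sum(c.rev_alt for c in calls) > 0
        and any(c.alt_reads > 5 for c in calls)                       # iv)
        and any(c.ccf > 0.5 for c in calls)                           # v)
        and all(50 <= c.depth <= 3000 for c in calls)                 # vii)
    )


def regions_present(v: Variant) -> List[str]:
    """Criterion vi): a variant is called present where CCF > 0.1."""
    return [name for name, c in v.regions.items() if c.ccf > 0.1]


def is_deleterious(sift: bool, polyphen2: bool, mutationtaster: bool) -> bool:
    """Deleterious if at least two of the three predictors agree."""
    return sum([sift, polyphen2, mutationtaster]) >= 2
```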

SCNA calling from multi-region DP and multi-region WE sequencing

To estimate somatic copy number alterations, CNVkit v0.7.3 was run with default parameters on paired tumour-normal sequencing data (Talevich et al., 2016). Outliers of the derived log2-ratio (logR) calls from CNVkit were detected and modified using Median Absolute Deviation Winsorization before case-specific joint segmentation to identify genomic segments of constant logR (Nilsen et al., 2012). Tumour sample purity, ploidy and absolute copy number per segment were estimated using ABSOLUTE v1.0.6 (Carter et al., 2012). In line with recommended best practice, all ABSOLUTE solutions were reviewed by 3 researchers, with solutions selected based on majority vote. Copy number alterations were then called as losses or gains relative to the overall sample-wide estimated ploidy. Arm gain or loss was called when >50% of the chromosomal arm had copy number gain or loss. Driver copy number was identified by overlapping the called somatic copy number segments with putative driver copy number regions identified by Beroukhim and colleagues (Beroukhim et al., 2009). We compared SCNA calls between targeted panel and WGS datasets, and SCNA concordance was 87% (Table S5). The average proportion of the genome with aberrant copy number, weighted on each of the 22 autosomal chromosomes, was estimated as the weighted genome instability index (wGII).
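A minimal sketch of the arm-level call and of wGII follows. It is illustrative Python under stated assumptions: segments are hypothetical tuples of (chromosome, start, end, absolute copy number), a segment counts as aberrant when its copy number differs from the ploidy rounded to the nearest integer, and wGII is taken as the unweighted mean of the per-chromosome aberrant fractions, so that each chromosome contributes equally regardless of its length.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

Segment = Tuple[str, int, int, int]  # (chromosome, start, end, copy number)


def wgii(segments: Iterable[Segment], ploidy: float) -> float:
    """Mean over chromosomes of the fraction of bases with aberrant copy number."""
    baseline = round(ploidy)
    covered = defaultdict(int)
    aberrant = defaultdict(int)
    for chrom, start, end, copy_number in segments:
        length = end - start
        covered[chrom] += length
        if copy_number != baseline:
            aberrant[chrom] += length
    fractions = [aberrant[c] / covered[c] for c in covered if covered[c] > 0]
    return sum(fractions) / len(fractions)


def arm_event(arm_segments: List[Segment], ploidy: float) -> Optional[str]:
    """Call an arm-level gain or loss when >50% of the arm is gained or lost."""
    baseline = round(ploidy)
    total = sum(end - start for _, start, end, _ in arm_segments)
    gained = sum(end - start for _, start, end, cn in arm_segments if cn > baseline)
    lost = sum(end - start for _, start, end, cn in arm_segments if cn < baseline)
    if gained / total > 0.5:
        return "gain"
    if lost / total > 0.5:
        return "loss"
    return None
```

In the toy case of one chromosome that is half aberrant and a second, ten times longer chromosome that is entirely normal, this gives 0.25, whereas the plain fraction of aberrant bases would be far smaller; that difference is the point of weighting by chromosome.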

TCGA WES data analysis

To compare the mutation frequency detected in the TRACERx Renal cohort with public data (Figure 1B and 1C), event calls from 451 TCGA KIRC patients were retrieved from cBioportal (http://www.cbioportal.org/) on 2017/07/21. To investigate the clonality of mutations in the TCGA KIRC cohort, we obtained the next generation sequencing data for matched tumour and normal/blood from 338 cases in BAM format from TCGA, which were then converted into FASTQ format files using bam2fastq in the bedtools package (Quinlan and Hall, 2010). SNVs, INDELs and SCNAs were called using the same methods as for the TRACERx Renal data (STAR Methods: SNV, and INDEL calling from multi-region DP and multi-region WE sequencing, SCNA calling from multi-region DP and multi-region WE sequencing). 20 cases were excluded from the study as the ABSOLUTE v1.0.6 algorithm failed to find a stable SCNA solution; further details can be found in Table S4. Clonality of SNVs and SCNAs was estimated using ABSOLUTE v1.0.6. Cancer cell fraction for INDELs was calculated using the method described in STAR Methods: Subclonal deconstruction of mutations. INDELs with CCF>0.5 were called clonal. The ITH index for each patient was calculated as the measure of intratumour heterogeneity (ITH index = # subclonal drivers / # clonal drivers). However, due to the limitation of a single biopsy, intratumour heterogeneity was found to be underestimated (ITH index range 0-3, median=0.0, sd=0.41).
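The ITH index is a simple ratio, sketched below in Python. The helper names are hypothetical, and returning `None` when a case has no clonal driver is an assumption of this sketch, since the text does not state how a zero denominator was handled.

```python
from typing import List, Optional


def indel_clonality(ccf: float) -> str:
    """INDELs with CCF > 0.5 are called clonal; others are treated as subclonal here."""
    return "clonal" if ccf > 0.5 else "subclonal"


def ith_index(driver_clonality: List[str]) -> Optional[float]:
    """ITH index = number of subclonal drivers / number of clonal drivers."""
    n_clonal = driver_clonality.count("clonal")
    n_subclonal = driver_clonality.count("subclonal")
    if n_clonal == 0:
        return None  # ratio undefined without a clonal driver (assumption)
    return n_subclonal / n_clonal
```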

Quantification and statistical analysis

R 3.3.2 was used for all statistical analyses.

Saturation Analysis and Phenotypic Correlations

For saturation analysis, the mean number of variants observed for each subset of biopsies of a given size was calculated by exhaustive consideration of all such subsets when the total number of such subsets was less than 18 million, and by consideration of a random collection of 18 million subsets, with possible repetition, when the total number of possibilities was greater. For phenotypic correlations, comparisons were performed using Fisher's Exact test for 2x2 tables and the Friedman test ("non-parametric 2-way ANOVA") for n x m tables where at least one of n and m is greater than 2. P-values were corrected for multiple testing using the Benjamini-Hochberg procedure.
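The saturation calculation can be sketched as follows. The study used R; this Python version is only an illustration, with a hypothetical function name and with each biopsy represented as a set of variant identifiers. The `max_subsets` argument stands in for the 18 million limit so that the sampling branch can be exercised on small inputs.

```python
import itertools
import math
import random
from typing import List, Set


def mean_variants_per_subset(biopsy_variants: List[Set[str]], k: int,
                             max_subsets: int = 18_000_000,
                             seed: int = 0) -> float:
    """Mean number of distinct variants seen in subsets of k biopsies."""
    n = len(biopsy_variants)
    total = math.comb(n, k)
    if total < max_subsets:
        # Exhaustive: every subset of size k is visited exactly once.
        subsets = itertools.combinations(range(n), k)
        count = total
    else:
        # Random collection of subsets; the same subset may be drawn twice.
        rng = random.Random(seed)
        subsets = (rng.sample(range(n), k) for _ in range(max_subsets))
        count = max_subsets
    seen = 0
    for subset in subsets:
        seen += len(set().union(*(biopsy_variants[i] for i in subset)))
    return seen / count
```

Plotting this mean against k for one tumour gives a saturation curve: it flattens once additional biopsies stop revealing new variants.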

Subclonal deconstruction of mutations

To estimate the clonality of a mutation in a region, we used the following formula:

where vaf is the variant allele frequency at the mutation base; p is the estimated tumour purity; CN_t and CN_n are the tumour locus specific copy number and the normal locus specific copy number, the latter assumed to be 2 for autosomal chromosomes; and CCF is the fraction of tumour cells carrying the mutation. Let CN_mut be the number of chromosomal copies that carry the mutation; CN_mut can take any integer value from 1 to CN_t. We then assigned CCF each of the possible values 0.01, 0.02, ..., 1, together with every possible CN_mut, to find the best-fit cancer cell fraction of the mutation. Since we focused on driver genes in this study and the accuracy of the estimated CCF is limited by the size of the panel, we called mutations with CCF>0.5 clonal and mutations with CCF≤0.5 and CCF>0.1 subclonal. For a mutation to be called clonal in a tumour, we required it to be clonal in all regions of that tumour. Exceptions were made for long INDELs that affect >6 bp of the genome, due to VAF underestimation: if a long INDEL was present in all regions of a tumour, we called it clonal. For an SCNA to be called clonal in a tumour, we required it to be present in all tumour regions; otherwise it was called subclonal.

Driver tree reconstruction

A matrix with presence and absence of nonsynonymous and synonymous point mutations, DNVs, INDELs and arm level SCNAs was created for each tumour, and all the events were clustered based on the following rule: a valid cluster has to have at least two arm level SCNAs or one non-synonymous mutation. The driver event clusters were then ordered into a clonal hierarchy using TRONCO and presented as driver trees (De Sano et al., 2016). Clustering was performed on multi-region whole exome sequencing using PyClone Dirichlet process clustering (Roth et al., 2014). For each mutation, the observed variant count was used and the reference count was set such that the VAF was equal to half the pre-clustering CCF. Given that copy number and purity had already been calculated, we set the major allele copy numbers to 2 and minor allele copy numbers to 0 and purity to 0.5, allowing clustering to simply group clonal and subclonal mutations based on their pre-clustering CCF estimates. PyClone was run with 10,000 iterations and a burn-in of 1000, and default parameters, with the exception of -var_prior set to ‘BB’ and -ref_prior set to ‘normal’.
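The cluster validity rule and the reference-count adjustment used to prepare the clustering input can be sketched as below. This is illustrative Python with hypothetical helper names; it does not call PyClone, and rounding the implied total depth to the nearest integer is an assumption.

```python
def is_valid_cluster(n_arm_level_scnas: int, n_nonsynonymous: int) -> bool:
    """A valid cluster has at least two arm level SCNAs or one non-synonymous mutation."""
    return n_arm_level_scnas >= 2 or n_nonsynonymous >= 1


def pseudo_reference_count(variant_count: int, ccf: float) -> int:
    """Reference count that makes VAF = variant / (variant + reference) equal CCF / 2."""
    if not 0.0 < ccf <= 1.0:
        raise ValueError("ccf must be in (0, 1]")
    target_vaf = ccf / 2.0
    total_depth = round(variant_count / target_vaf)
    return total_depth - variant_count
```

For example, 50 variant reads with a pre-clustering CCF of 1.0 gives 50 reference reads (VAF 0.5), while a CCF of 0.5 gives 150 reference reads (VAF 0.25).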

In terms of limitations, we recognise that our Driver Panel phylogenies are based on fewer clonal markers than whole exome or genome derived phylogenetic trees. As a consequence, some tumour clones are based on only a limited number of genomic markers; however, three contingency measures are in place to mitigate against phylogenetic misconstruction: i) ultra-deep 500x sequencing coverage, which ensures stably derived cancer cell fraction estimates, ii) a bespoke gene panel which is enriched for driver events, increasing the likelihood that mutational markers are driving genuine clonal expansion, iii) cross-capture validation, with tree structures in >10 cases confirmed using exome sequencing data (Table S5). Furthermore, the panel sequencing strategy has allowed extensive tumour sampling, with >1,200 biopsies sequenced, enabling robustness in terms of spatial sampling.

Parallel evolution significance testing

All genes with >10 subclonal mutations across the cohort were tested for evidence of parallel evolution (qualifying genes: BAP1, CSMD3, KDM5C, MUC16, MTOR, PBRM1, PTEN, SETD2, TSC1, TP53). For each gene the observed number of parallel mutations across the 100 case cohort was compared to a null distribution of the expected number of subclonal mutations co-arising in different tumour regions within the same case due to chance. To simulate the null distribution, the mutation frequency of each gene per biopsy region was calculated as the total number of unique subclonal mutations for that gene (cohort wide) divided by the total number of biopsies sequenced (cohort wide). This probability was then used in a simple Bernoulli trials model simulated for each patient, with the number of trials based on the number of biopsy regions sequenced per case. This model allows for the fact that cases with a large number of sampled regions have a higher chance of co-arising mutations in different biopsy regions by chance rather than due to parallel evolution. The total count of co-arising mutations by chance was calculated across the 100 case cohort (using the specific number of biopsy regions per case) and then compared to the observed number of parallel events. Significance was determined through 1000 permutations per gene, with resulting p-values corrected for multiple testing using the Benjamini-Hochberg procedure.
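A minimal simulation of this null model might look like the following Python sketch. Several details are assumptions rather than the study's implementation: the function names are hypothetical, a case contributes its number of mutated regions to the total only when two or more regions are hit, and the empirical p-value is the fraction of simulations whose total is at least the observed count.

```python
import random
from typing import List


def simulate_null_total(p: float, regions_per_case: List[int],
                        rng: random.Random) -> int:
    """One simulated cohort: count mutations co-arising in >=2 regions of a case."""
    total = 0
    for n_regions in regions_per_case:
        # One Bernoulli trial per sequenced biopsy region.
        hits = sum(rng.random() < p for _ in range(n_regions))
        if hits >= 2:
            total += hits
    return total


def parallel_evolution_p_value(observed: int, n_unique_subclonal: int,
                               regions_per_case: List[int],
                               n_sim: int = 1000, seed: int = 0) -> float:
    """Empirical p-value for the observed number of parallel events in one gene."""
    # Per-region mutation probability: unique subclonal mutations / total biopsies.
    p = min(1.0, n_unique_subclonal / sum(regions_per_case))
    rng = random.Random(seed)
    at_least = sum(
        simulate_null_total(p, regions_per_case, rng) >= observed
        for _ in range(n_sim)
    )
    return at_least / n_sim
```

Because the number of trials follows the number of regions per case, heavily sampled cases contribute more chance co-occurrences to the null total, which is the behaviour the paragraph above describes.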

Detection of allelic imbalance

Heterozygous SNPs were identified among germline variants using VarScan v2.4.1 in mpileup2snp mode. SNPs used must be called in all regions of the tumour and have a B-allele frequency (BAF, total variant base / total reference bases at a position) of between 0.35 and 0.65 in the germline sample. The mean absolute deviation (MAD) from 0.5 was calculated for all heterozygous SNPs on each arm in all samples: mean(abs(arm_hz_BAF - 0.5)). The germline MAD was then subtracted from all tumour region MADs for each patient’s disease for all chromosome arms. Copy neutral allelic imbalance was then called if: 1) There is no copy number event (gain or loss) associated with the chromosome arm in a sample but there is a MAD of >= 0.1. 2) There is no copy number event associated with the chromosome arm in a sample but its MAD is >= the median MAD of gain/loss events in this sample and is also >= 0.05. 3) If a patient’s disease has the same chromosome arm exhibiting copy neutral allelic imbalance in 2 or more regions by the above two criteria, the same chromosome arm in the other regions is re-examined using the lowest quartile MAD of gain/loss events in each region as a cut-off, and must also have a MAD of >= 0.05.
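Criteria 1 and 2 can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the inputs are per-arm lists of BAF values for heterozygous SNPs, and criterion 3 (re-examination of the remaining regions) is left out for brevity.

```python
from statistics import mean, median
from typing import List


def arm_mad(bafs: List[float]) -> float:
    """Mean absolute deviation of heterozygous-SNP BAFs from 0.5 on one arm."""
    return mean(abs(b - 0.5) for b in bafs)


def is_cnai(tumour_bafs: List[float], germline_bafs: List[float],
            has_copy_number_event: bool,
            gain_loss_mads: List[float]) -> bool:
    """Call copy neutral allelic imbalance on one arm of one tumour region."""
    if has_copy_number_event:
        return False  # imbalance explained by a gain or loss is not copy neutral
    # Germline MAD is subtracted from the tumour region MAD.
    corrected = arm_mad(tumour_bafs) - arm_mad(germline_bafs)
    if corrected >= 0.1:                                   # criterion 1
        return True
    if gain_loss_mads:                                     # criterion 2
        return corrected >= median(gain_loss_mads) and corrected >= 0.05
    return False
```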

Calculating clonality of copy neutral allelic imbalance (CNAI): Only regions with at least one chromosome arm exhibiting a MAD score of greater than 0.05 were considered for this analysis. Regions with no MAD score greater than 0.05 are marked "low purity" on the patient specific supplementary figures. Copy neutral allelic imbalance calls are shown as diamonds in the patient specific copy number plots. The CNAI occurrences in each patient were then grouped into the following categories: Clonal CNAI - All regions of the tumour have no copy number gains or losses associated with this chromosome arm but all have been classified as exhibiting CNAI. Clonal loss and CNAI - All regions of the patient’s disease have either a loss being called or exhibit CNAI for this chromosome arm.

Detection of mirrored subclonal allelic imbalance (MSAI)

In order to detect mirrored subclonal allelic imbalance (MSAI), allele counts were generated using AlleleCounter (github.com/cancerit/alleleCount) (see companion paper, Mitchell et al 2018). The counts from whole exome sequenced samples were analysed using ASCAT (Van Loo et al., 2010) to generate copy number calls. Whole-genome samples were analysed using Battenberg (Nik-Zainal et al., 2012) to generate copy number calls (see companion paper, Mitchell et al 2018). Heterozygous SNPs among the 1000 genomes positions (Genomes Project et al., 2010) used as input for ASCAT/Battenberg analyses were identified by isolating those which had a B-allele frequency (BAF) of between 0.3 and 0.7 (calculated as variant reads over total reads) in the germline sample for each patient. The BAFs of these heterozygous SNPs were then used with the segmentation and copy number calls produced for each region by either ASCAT or Battenberg analyses to detect MSAI events for each patient’s disease using the method outlined previously (Jamal-Hanjani et al., 2017).

Using the heterozygous SNPs present in the targeted regions detected by Driver Panel sequencing we identified allelic imbalance (AI) at the level of chromosome arms. In some cases the AI was not associated with a copy number gain or loss relative to the sample’s ploidy and was classified as copy neutral allelic imbalance (CNAI) (STAR methods). In total, we identified 18 cases where one or more chromosome arms demonstrated clonal CNAIs (34 events total) and 8 patients where at least one chromosome arm was always affected by either loss relative to ploidy or CNAI (13 events total). 5 of these 8 patients also demonstrated instances of ubiquitous arm level CNAI in all regions.

Validation of MSAI

Validation of MSAI was achieved using polymorphic microsatellite markers specific to the chromosome and chromosome region being investigated. Once a polymorphic marker is identified, patient DNA is amplified by PCR, incorporating a fluorescent primer into the PCR fragment that can be accurately measured for size and fluorescent intensity. Measurement of fluorescent units under each allele peak can be used to compare and contrast variation between alleles within and between different tumour regions and the normal sample using the formula (At/Bt)/(An/Bn).
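As a worked illustration of the formula, the sketch below computes the ratio for two tumour regions. The peak values are invented for the example and are not study data; the function name is hypothetical, and A and B are taken to denote the two allele peaks in tumour (t) and normal (n) samples.

```python
def allelic_imbalance_ratio(a_t, b_t, a_n, b_n):
    """(At/Bt)/(An/Bn): tumour allele peak ratio normalised to the normal sample."""
    return (a_t / b_t) / (a_n / b_n)

# Invented fluorescence units for two regions of one tumour and its normal sample.
region_1 = allelic_imbalance_ratio(1000, 500, 800, 800)
region_2 = allelic_imbalance_ratio(500, 1000, 800, 800)
```

Ratios on opposite sides of 1 in different regions, as in this invented example, are the kind of pattern a mirrored imbalance would be expected to produce.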

Co-occurrence testing

Co-occurrence of driver events in each tumour was assessed based on the driver tree clones as determined above. Analysis was conducted on the most frequent driver mutational events (BAP1, PBRM1, SETD2, VHL, Figure 1B), the most frequent SCNAs (3p loss, 5q gain) and SCNA events with established clinically prognostic value (loss 4q, loss 9p, loss 14q and gain 8q) (Ito et al., 2016; Kojima et al., 2009; La Rochelle et al., 2010; Monzon et al., 2011; Perrino et al., 2015). For each event pairing, tumour clones were assessed to determine whether the two events co-occurred in the same clone. Analysis was first conducted using only the “MRCA” clone per case (n=100), to ensure independence of observations at the patient level (for bilateral/multi-focal cases the first/left tumour was taken in each case). Analysis was then repeated using “truncal plus subclonal” clones (total n=306 across all tumours, with the set of subclones defined as unique terminal tree nodes). R package ‘cooccur’ (Griffith A, 2016) was used to compare observed event co-occurrence frequencies to those expected by chance under a probabilistic model. The distribution of observed and expected values is shown in Figure 10. Values were plotted as enrichment scores calculated as log₂(observed count/expected count). Only patterns found to be significant in both the “truncal” and “truncal plus subclonal” analyses were considered significant overall. Correction for multiple testing was conducted using the Benjamini-Hochberg procedure.
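The enrichment score and the multiple-testing correction can be sketched in Python as below. This is not the ‘cooccur’ implementation: the expected count is approximated as the product of the marginal event frequencies under independence, which is an assumption made for illustration, and the function names are hypothetical.

```python
import math

def enrichment_score(clones, event_a, event_b):
    """log2(observed / expected) co-occurrence of two events across clones.

    clones: list of sets, each holding the driver events present in one clone.
    Expected count assumes independence: n_a * n_b / n.
    """
    n = len(clones)
    n_a = sum(event_a in c for c in clones)
    n_b = sum(event_b in c for c in clones)
    observed = sum(event_a in c and event_b in c for c in clones)
    if observed == 0:
        return float("-inf")              # never observed together
    expected = n_a * n_b / n
    return math.log2(observed / expected)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A positive score indicates co-occurrence more frequent than expected by chance; a negative score indicates a tendency towards mutual exclusivity.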

Most recent common ancestor (MRCA) and ki67 analysis

The estimated time of MRCA was calculated using multi-region whole genome sequencing data as detailed in the companion paper by Mitchell et al. (Cell 2018). From the total n=33 cases with WGS data, MRCA timing analysis was successful in n=31 cases, from which known VHL wildtype cases (n=2) were excluded on account of their distinct aetiological and phenotypic profile. Of the n=29 cases analysed, n=23 overlapped with the TRACERx Renal 101 cohort presented here, and n=6 were additional ccRCC patients recruited to the TRACERx Renal study. The association between time from MRCA to tumour diagnosis and number of clonal driver events was assessed using a linear model, adjusting for the total clonal mutation burden per tumour. The association between tumour region ki67 % of cells stained as positive and number of clonal driver events was assessed using a linear mixed effect (LME) model, to account for the non-independence of multiple samples from individual patients, using all cases with available data in the TRACERx Renal 101 cohort after exclusion of known VHL wildtype tumours.

Event ordering analysis

The ordering of driver events was based on the clonal hierarchy of each tumour, as determined by the driver tree reconstruction method detailed above. Due to dense spatial sampling (median 7 biopsies per tumour, range [3-75]) the driver tree ordering was typically robust, with evidence of sequential waves of clonal expansion between events usually confirmed across multiple biopsy regions. The set of sequential event paths (i.e. event A > event B > event C) for each tumour was captured starting with the events in the MRCA clone. For each MRCA event, evolutionary sequences were traced through each node of the tree until a terminal clone was reached. All possible sequential paths (trajectories) between MRCA and terminal clone events were recorded. To reduce the risk of multiple testing we limited further analyses to those trajectories containing the most frequent (“core”) ccRCC driver events: VHL, PBRM1, BAP1, SETD2, PI3K/AKT/mTOR pathway mutations or driver SCNAs. The list of trajectories was further reduced to ensure pairings of events were counted only once per case (e.g. in the case of K243, where a single PBRM1 mutation precedes 10 SETD2 mutations, this is counted only once), and PI3K/AKT/mTOR pathway mutations interacting with SCNAs were not considered due to the nonspecific many-to-many relationship. The final list of trajectories was analysed using R package Trajectory Miner (Gabadinho et al., 2011) to identify recurrent patterns of event pairs enriched for occurrence in a consistent direction. Event pairings observed in ten or more cases were then tested for significance in a specific ordering direction using a Binomial test, with null expected p=0.5, to reflect an equally balanced 50%:50% distribution of event ordering by random chance. As expected, VHL was found to be significantly enriched as an early event preceding all other alterations, consistent with its known timing as a universally clonal event (data not shown in figure). All p-values were corrected for multiple testing using the Benjamini-Hochberg procedure.
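The ordering test can be illustrated with an exact binomial test in Python. The sketch assumes a two-sided test, which the text does not specify, and the function name is hypothetical.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k 'A before B' orderings out of n.

    Sums the probabilities of all outcomes no more likely than the observed one.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))

# Example: an event pair seen in 11 cases, always in the same direction.
p_value = binomial_two_sided_p(11, 11)
```

With the null of p=0.5, a pairing observed in only one direction across 11 cases gives 2 x 0.5^11, before any correction for multiple testing.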

Evolutionary subtype classification

Based on the evolutionary analysis in Figures 4A-C a rule based classification was devised in order to assign cases into subgroups and allow for comparison against phenotypic and clinical outcomes. Cases were assigned to groups based on the following series of rules (applied in a hierarchical manner in the order listed): i) presence of ≥ 2 BAP1, PBRM1, SETD2 or PTEN clonal mutational events meant assignment to the “multiple clonal driver” group (the selection of these four genes is based on the timing results observed in Figure 4B), ii) presence of a tumour clone/subclone with a BAP1 mutational driver event, and no other “core” mutational driver events aside from VHL in that same clone/subclone, meant assignment to the “BAP1 driven” group, iii) presence of a tumour clone/subclone with PBRM1 mutation followed by a SETD2 mutation, meant assignment to the “PBRM1->SETD2” group, iv) presence of a tumour clone/subclone with PBRM1 mutation followed by a PI3K pathway mutation, meant assignment to the “PBRM1->PI3K” group, v) presence of a tumour clone/subclone with PBRM1 mutation followed by a driver SCNA event, meant assignment to the “PBRM1->SCNA” group, vi) absence of VHL mutation or methylation meant assignment to the “VHL wildtype” group, vii) presence of VHL as the only “core” mutational driver event meant assignment to the “VHL monodriver” group. For bilateral/multi-focal cases the evolutionary subtype was assigned based on the first/left tumour in each case. To test the stability and validity of the rule based classification an unsupervised clustering analysis was additionally performed, using R function daisy, with the distance matrix computed using Gower’s formula on account of the mixture of continuous and binary data types. Clustering was conducted based on the following measures: wGII (minimum and maximum regional values per tumour), tumour size (mm), clone number, ITH index, number of clonal driver events and presence/absence of the six observed evolutionary patterns (BAP1 lone driver clone/subclone, PBRM1->SETD2 clone/subclone, PBRM1->PI3K clone/subclone, PBRM1->SCNA clone/subclone, VHL mutational status, VHL as the only “core” mutational driver event). Clustering was performed using a partitioning around medoid method, with cluster number from 2 to 15 considered, and a 10 cluster solution resulting as the optimal solution. Overall high concordance in cluster assignment was observed between the rule based and unsupervised methods, and in the unsupervised method three additional subgroups were identified (the groups are referred to just by cluster number due to currently unclear evolutionary aetiology): cluster 5 which was characterised by low clone number (median=2) and small size (mean=6.7cm), cluster 7 which exhibited high wGII, and cluster 9 with branched structure (median 11 clones) and large size (mean=10.9cm).
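The hierarchical rules i) to vii) can be sketched in Python as below. The data encoding is hypothetical and chosen only for illustration: each clone/subclone is given as an ordered trunk-to-clone list of events, PI3K pathway mutations and driver SCNAs are encoded as the tokens "PI3K" and "SCNA", and cases matching no rule are labelled "unassigned".

```python
CLONAL_GENES = {"BAP1", "PBRM1", "SETD2", "PTEN"}
CORE_MUTATIONS = {"VHL", "PBRM1", "BAP1", "SETD2", "PI3K"}

def follows(path, first, second):
    """True if `second` occurs after `first` along one trunk-to-clone path."""
    return first in path and second in path[path.index(first) + 1:]

def classify(clonal_events, paths, vhl_altered):
    """Assign an evolutionary subtype; rules are applied in the order listed."""
    # i) two or more clonal BAP1/PBRM1/SETD2/PTEN mutational events
    if sum(e in CLONAL_GENES for e in clonal_events) >= 2:
        return "multiple clonal driver"
    # ii) a clone with BAP1 and no other core mutation apart from VHL
    for p in paths:
        other_core = (set(p) & CORE_MUTATIONS) - {"VHL", "BAP1"}
        if "BAP1" in p and not other_core:
            return "BAP1 driven"
    # iii) to v) PBRM1 followed by SETD2, a PI3K pathway mutation, or a driver SCNA
    for second, label in (("SETD2", "PBRM1->SETD2"),
                          ("PI3K", "PBRM1->PI3K"),
                          ("SCNA", "PBRM1->SCNA")):
        if any(follows(p, "PBRM1", second) for p in paths):
            return label
    # vi) no VHL mutation or methylation
    if not vhl_altered:
        return "VHL wildtype"
    # vii) VHL is the only core mutational driver in every clone
    if all(set(p) & CORE_MUTATIONS <= {"VHL"} for p in paths):
        return "VHL monodriver"
    return "unassigned"
```

Because the rules are hierarchical, a tumour satisfying several of them receives the label of the first rule it meets.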

Survival analysis

Survival analysis was conducted using the Kaplan-Meier method, with p-value determined by a log-rank test. Progression free survival (PFS) was defined as the time to recurrence or relapse, or if a patient had died without recurrence, the time to death. In the TRACERx cohort, overall survival (OS) was measured as cancer specific death. For the TCGA cohort, all death events were included in the PFS/OS analyses (consistent with the original author’s analysis of the data, on account of a lack of cause of death data). Hazard ratio and multivariate analysis adjusting for clinical parameters was determined through a Cox proportional hazards model.
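For reference, the Kaplan-Meier product-limit estimate can be sketched in a few lines of Python. This is a generic illustration and not the software used in the study; the function name and the input encoding (1 = event, 0 = censored) are assumptions.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time per patient
    events: 1 if the event (e.g. progression) occurred, 0 if censored
    Returns [(time, survival probability)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_removed = 0
        while i < len(data) and data[i][0] == t:   # group ties at time t
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed
    return curve
```

Censored patients leave the risk set without lowering the curve, which is what distinguishes the estimate from a simple proportion of events.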

Downsampling simulation

Empirical error rates were determined by exhaustive consideration of all pairs of biopsies from a given tumour sample and, for each pair, comparing the number of variants detected in one or more of the full set of biopsies not found in either member of that pair ("False negative") or determined to be subclonal in the full set but detected in both samples in that pair ("illusion of clonality"). Each tumour is then represented by the mean value of each of these estimates across all pairs.

Results
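The pairwise downsampling can be sketched in Python as follows, assuming each biopsy is represented as the set of driver events detected in it; the function name and this representation are hypothetical.

```python
from itertools import combinations
from statistics import mean

def downsample_pairs(biopsies):
    """Mean 'false negative' and 'illusion of clonality' counts over all biopsy pairs.

    biopsies: list of sets, one per biopsy, holding the variants detected in it.
    """
    all_variants = set().union(*biopsies)
    clonal = set.intersection(*biopsies)        # present in every biopsy
    subclonal = all_variants - clonal
    missed, illusion = [], []
    for a, b in combinations(biopsies, 2):
        missed.append(len(all_variants - (a | b)))   # found elsewhere, absent from the pair
        illusion.append(len(subclonal & a & b))      # subclonal overall, in both of the pair
    return mean(missed), mean(illusion)
```

For example, with the invented biopsies `[{"VHL", "PBRM1"}, {"VHL", "PBRM1", "SETD2"}, {"VHL", "BAP1"}]` the first pair misses BAP1 and makes PBRM1 appear clonal, while the other two pairs make no misclassification.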

Intratumour heterogeneity of driver events in primary ccRCC

Clinical annotation of the 101 patients under study is provided in Table S1. Demographic and stage distribution were consistent with the referral patterns of the participating centres. All the samples were profiled using a bespoke sequencing panel targeting ~110 putative ccRCC driver genes (STAR Methods: Driver Panel; Figure 8A: CONSORT diagram). This approach enabled us to maximise the sequencing depth, a critical factor for correctly inferring evolutionary trajectories (Noorbakhsh and Chuang, 2017). Single nucleotide variants (SNVs), dinucleotide variants (DNVs), small insertions and deletions (INDELs) and somatic copy number alterations (SCNAs) were successfully derived from 1206 regions across 106 primary tumours (median 7 (range 3-75) regions per tumour) from 101 patients, as five patients donated pairs of primary tumours. Within the same cohort, 107 regions from 17 tumours were profiled by whole exome sequencing (WES), 81 regions from 27 tumours by whole genome sequencing (WGS), with six further tumours from the broader TRACERx Renal cohort also profiled by WGS (Figure 8B).

Median sequencing coverage across 1206 tumour regions profiled by the Driver Panel was 612x (range 105-1520x). We identified a total of 740 somatic mutations including 538 SNVs (440 non-synonymous SNVs), 7 DNVs and 195 INDELs (Table S2). We specifically considered non-silent mutations in high-confidence ccRCC driver genes (termed driver mutations, annotated in Figure 1A; STAR Methods). The median number of driver mutations was 3, range 0-15 per tumour (Figure 1A). VHL mutations were the only consistently clonal event, present in 77/106 tumours (Figure 1A). VHL was methylated in 17 additional tumours (Figure 1A). One tumour harboured a mutation in the TCEB1 gene, a part of the VHL complex (Hakimi et al. , 2015) (Figure 1A), thus 90% (95/106) of the tumours harboured clonal disruption of the VHL pathway. 4/11 VHL wild type tumours (K206, K228, K427 and K446, Figure 1A) had evidence of sarcomatoid differentiation (Table S1A), a feature reported to be associated with a lower frequency of VHL mutations (Malouf et al., 2016; Wang et al., 2017). K255, another VHL wild-type tumour, had evidence of both clear cell and papillary histology, and we observed SCNAs specific to both, including gains of 5q and 16. We observed no mutations in the known ccRCC driver genes in K110 (Figure 1A), but the copy number profile, involving whole chromosome losses on 1 , 6, 10 and 17, was consistent with chromophobe RCC (Davis et al., 2014). Additional pathology review confirmed chromophobe histology and K110 was removed from all subsequent analyses.

The overall frequency of driver mutations was higher in our cohort compared to the published single biopsy studies (Cancer Genome Atlas Research, 2013; Sato et al., 2013; Scelo et al., 2014) (Figure 1B). Notably, the frequency of VHL mutations in our study and the Scelo study was higher than that reported in the TCGA and Sato studies, potentially due to the higher overall number of VHL INDELs called (Figure 1B). VHL INDELs in the TRACERx Renal cohort were confirmed by Sanger sequencing. The higher frequency of mutations in other driver genes was due to the detection of subclonal events through multi-region profiling in our cohort (Figure 1B).

An important goal of the TRACERx Renal study is to determine the contribution of SCNAs to clonal evolution. Recurrent SCNAs occur at a limited number of genomic sites in ccRCC (Beroukhim et al., 2009; Cancer Genome Atlas Research, 2013), usually as whole chromosome or chromosome arm events; and the rate of genome doubling in ccRCC is low (Zack et al., 2013). Therefore, recurrent SCNAs can be reliably detected by the Driver Panel, as shown by the high level of concordance with WGS results (Table S2). We measured the fraction of the tumour genome affected by SCNAs using the weighted genome instability index (wGII) (Endesfelder et al., 2014), taking the maximum observed wGII score across all regions per tumour. Maximum values were utilised in order to capture the highest risk, and hence most clinically relevant, subclones in each tumour (STAR Methods). Median wGII in the TRACERx Renal cohort was 32.8% (range 4.7% - 97.4%). All SCNAs were annotated using previously defined cytobands (Beroukhim et al., 2009) to quantify driver SCNAs (Figure 1A, STAR Methods). In total, we detected 751 driver SCNAs; median 7, range 1-14 per tumour (Figure 1A).

Loss of chromosome 3p, which is pathognomonic of ccRCC and encompasses four commonly mutated genes (VHL, PBRM1, SETD2 and BAP1), was observed in all but five tumours (K021, K375, K354, K255, K114R; Figure 1A). Three had clonal 3p copy neutral allelic imbalance (CNAI) (STAR methods) (K021, K375, K354), consistent with biallelic inactivation of mutated 3p driver genes. Driver SCNA 3p25.3 (which contains the VHL locus) was subclonal in five tumours: one with a VHL mutation (K252, Figure 1A); one with VHL methylation (K070, Figure 1A); one VHL wild type with a bi-allelic SETD2 mutation (K427, Figure 1A); and two with no mutations in any of the 3p genes (K169, K446, Figure 1A).

The overall frequency of driver SCNAs was higher compared to the published single biopsy studies (Cancer Genome Atlas Research, 2013; Sato et al., 2013; Scelo et al., 2014) due to the detection of subclonal SCNAs in our cohort (Figure 1C). Notably, the frequency of SCNAs with reported prognostic significance, such as loss of chromosomes 14q and 9p, and gain of chromosomes 8q and 12p, is markedly underestimated in single biopsy studies (Cancer Genome Atlas Research, 2013). Overall ITH was measured as an index (ITH index = # subclonal drivers / # clonal drivers, where “drivers” include all driver mutations and driver SCNAs shown in Figure 1A (STAR methods)). Median ITH index value was 1, with a high variability across the cohort (range 0-13.5; standard deviation = 2.16).
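The ITH index and a split at the cohort median (high = above the median, as used for the survival comparison) can be sketched in Python as below. The function names are hypothetical, the counts in the usage note are invented, and the handling of a tumour with no clonal drivers is not described in the text, so the sketch leaves that case to raise an error.

```python
from statistics import median

def ith_index(n_subclonal_drivers, n_clonal_drivers):
    """ITH index = number of subclonal drivers / number of clonal drivers."""
    return n_subclonal_drivers / n_clonal_drivers

def high_ith(indices):
    """Flag tumours whose ITH index is above the cohort median."""
    cutoff = median(indices)
    return [value > cutoff for value in indices]
```

A tumour with four clonal and four subclonal driver events has an index of 1; a tumour with no subclonal drivers has an index of 0.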

Clonal evolution and clinical variables in ccRCC

ccRCC prognostic variables include primary tumour size, overall tumour stage (TNM), Fuhrman grade and the presence of necrosis. Overall, the number of driver events was significantly associated with all of these parameters, with the associations specific to subclonal, and not clonal events (Figure 9). Similarly, higher ITH index values were associated with advanced tumour size, stage and grade (Figure 9). Clonal ordering techniques (see STAR methods) were used to infer clonal structures and driver phylogenetic trees (Figure 29). The median number of clones detected was 4 per tumour (range 1-23). Clone number increased with tumour stage and grade (Figure 9), but showed a non-linear association with tumour size, initially increasing in line with tumour dimensions but then plateauing at ~10cm beyond which clone number began to marginally reduce with increasing size. In conclusion, known prognostic parameters are associated with an increasing repertoire of driver alterations and subclonal driver diversification in ccRCC.

Convergent Evolution

We profiled three patients with synchronous bilateral ccRCCs and two patients with multifocal ccRCCs, with no family history of ccRCC, or germline mutations in the known ccRCC predisposition genes (Table S1A). All five tumour pairs evolved independently, but converged on the VHL pathway. K265, K352, and K334 harboured distinct mutations in VHL and 3p loss events in each of the tumours (Figure 1 A). The right-sided K097 tumour harboured a VHL mutation and VHL was methylated in the left tumour (Figure 1A, Figure S2). Left K114 tumour harboured a VHL mutation and 3p loss, while in the right tumour we detected a TCEB1 mutation with the loss of 8q21.11 , encompassing the TCEB1 locus (Figure 1A). K150 tumour was presumed to be a contralateral renal metastasis from a previously resected left high-risk ccRCC. However, the two tumours had distinct VHL mutations implying a case of bilateral metachronous ccRCCs. Our findings illustrate the importance of molecular profiling of patients presenting with multiple renal tumours to guide appropriate clinical management.

Parallel evolution

We and others have reported parallel evolution of mutations in the same genes or pathways within distinct tumour subclones in ccRCCs (Brastianos et al., 2015; Gerlinger et al., 2014). In the TRACERx Renal cohort, 13% of untreated primary tumours had evidence of parallel evolution, with SETD2, BAP1 and PTEN (all p<0.05, False Discovery Rate (FDR) < 0.1, Figure 3) significantly enriched for parallel evolution, corrected for the number of profiled regions. Certain tumours were notable for the number of parallel events they harboured, for example K243 had 10 distinct SETD2 mutations (Figure 3). In tumour K448, we observed 5 distinct BAP1 mutations, and 3 SETD2 mutations, but BAP1 and SETD2 mutations never co-occurred within the same clone.

We recently identified parallel evolution of SCNAs in non-small cell lung cancer (Jamal-Hanjani et al., 2017) through mirrored subclonal allelic imbalance. We analysed the incidence of MSAI in a subset of TRACERx Renal patients where whole genome or exome sequencing data were available (n=41) (STAR Methods) and observed MSAI events in 15/41 tumours (STAR Methods), a subset of which were validated by an orthogonal method. Parallel loss of chromosome 14q was the most common event (4 patients), encompassing the ccRCC tumour suppressor HIF1A locus (Shen et al., 2011).

Identification of conserved ccRCC evolutionary features

To understand the constraints of ccRCC evolution we analysed conserved patterns of driver event co-occurrence, mutual exclusivity and timing to identify statistically significant patterns. We utilised the clonal/phylogenetic hierarchy determined for each case (STAR methods), in order to accurately place driver events within the same tumour subclone, and establish the relative ordering of driver events across the evolutionary path of each tumour.

In our analyses of event co-occurrences at the clone level (STAR methods) we observe an enrichment for mutual exclusivity between BAP1 and SETD2/PBRM1 mutations (Figure 4A). However, at a patient level these events were found to co-occur (Figure 1A), often in separate spatially distinct major tumour subclones (e.g. K153, Figure S3). BAP1 had a propensity for being a lone additional mutational driver event in VHL- mutant clones, whilst PBRM1 and SETD2 were enriched for mutual clonal co-occurrence. Due to limited sample size these patterns did not reach formal significance, however we note the results are in agreement with previously published patient-level meta-analysis (Pena-Llopis et al., 2013).

Of all the driver mutations, BAP1 was associated with the highest number of driver SCNAs in the same clone (Figure 4A, Figure 10, p=0.032 for BAP1 versus no mutational drivers), consistent with its role in chromosomal stability (Peng et al., 2015). Overall, the strongest evidence for co-occurrence was found for the following pairs of driver SCNAs: 14q loss with 4q loss, 14q loss with 9p loss and 4q loss with 9p loss (Figure 4A, all p<0.05). These pairs of events were all found to co-occur ≥ 1.8x more frequently than expected by chance. We validated these observations in the TCGA ccRCC data (all p<0.05, Figure 10), showing that the specific event pairings co-occurred together beyond the general expected correlation between SCNAs (e.g. for 14q loss, the most common partner event genome wide was 9p loss, Figure 10). We note that these SCNAs harbour well-known tumour suppressors: 14q31.1 - HIF1A (Shen et al., 2011), 9p21.3 - CDKN2A (Beroukhim et al., 2009) and 4q - CXXC4 (Kojima et al., 2009).

In our previous report of ten ccRCC tumours (Gerlinger et al., 2014) mutations in VHL and loss of 3p were consistently clonal, and PBRM1 was an additional clonal driver mutation in three cases. In our current prospective cohort, we observed a subset of cases that harboured two or more additional clonal driver mutations, aside from VHL. Simulated models of tumour growth (Reiter et al., 2013) suggest that just one additional driver will significantly increase the growth rate, and we utilised WGS molecular clock timing data (see companion paper Mitchell et al. 2018) to test this hypothesis in our data. Time to presentation was calculated as the time elapsed from the emergence of the most recent common ancestor (MRCA) to clinical diagnosis. The median time to presentation from the emergence of the MRCA for cases with VHL as the only clonal driver mutation (n=14 cases, 48% of the WGS cohort) was 28 [4 - 49] years. The addition of one further clonal driver mutation (n=13 cases) was associated with a shortening of time to diagnosis, to 5 [1 - 34] years, while the addition of two further clonal driver mutations (n=2 cases) shortened the time to diagnosis to 5 [4 - 7] years (p=0.007, Figure 4B). Despite the shortened time of tumour growth, tumour size was found to be comparable across all the groups (Figure 4C), and we observed no difference in the mode of presentation (incidental versus symptomatic) across the three groups, suggesting there was no lead-time bias. Overall, the groups had the same total median number (n=3) of driver mutations (considering clonal and subclonal events). Assessment of proliferation by multiregional Ki67 immunohistochemistry (IHC) staining (STAR Methods) showed elevated proliferation index in cases with additional clonal driver mutations (p=0.034, Figure 4D, Table S3), consistent with the timing analysis.

Order of Events During ccRCC Evolution

The order in which driver events are acquired can have prognostic and therapeutic implications, as shown by Ortmann and colleagues with respect to the order of JAK2 and TET2 mutations in myeloproliferative neoplasms (Ortmann et al., 2015). We considered the ordering of driver events in ccRCC, assessing for recurrent patterns of driver events preceding or following one another. To conduct this analysis, we traced all possible evolutionary trajectories, starting at the base of each driver tree and tracing the path through to each terminal subclone, considering all possible sequential paths between events (Figure 4E). Due to the dense spatial sampling in this cohort the driver tree ordering was typically robust, with evidence of sequential waves of clonal expansion between events usually confirmed across multiple biopsy regions. In order to reduce the risks of multiple testing we limited further analyses to those trajectories containing the most frequent ccRCC driver events: VHL, PBRM1, SETD2, BAP1, PI3K/AKT/mTOR pathway mutations or driver SCNAs (Figure 1 B). Event combinations which we observed in ten or more cases were then tested for significance in the ordering pattern (STAR Methods). Six significantly conserved patterns were detected (all FDR<0.05), the first three of which confirmed VHL as a universally preceding event, as expected. In addition, PBRM1 mutations were found to consistently precede PI3K pathway mutations, SETD2 mutations and driver SCNA events (Figure 4E). In many of these cases the event sequences were observed exclusively in one direction, i.e. PBRM1 precedes SETD2 in 11 separate cases, but the opposite was never observed.

Evolutionary Subtypes

A pertinent question is whether conserved patterns of ccRCC evolution relate to distinct clinical or biological phenotypes; to investigate this in an exploratory context we classified all the tumours under study according to the patterns observed in the evolutionary order, timing and co-occurrence analyses (Figure 4). Seven evolutionary subtypes were defined (Figure 5A) using a rule based classification system (STAR Methods), which was validated by unsupervised clustering. Subtypes were compared across different genomic and clinical metrics (STAR Methods) including levels of wGII, percentage of cells positive for Ki67, ITH index, clonal structure and clinical parameters including stage, percentage of tumours that are Fuhrman grade 4 (%G4) or presence of microvascular invasion (%MVI) (Figure 5). The first subtype consisted of tumours with “multiple clonal drivers” (defined as ≥ two BAP1, PBRM1, SETD2 or PTEN clonal mutations), and was characterised by high levels of wGII (9 out of 12 cases with wGII > cohort wide median value), enrichment for late stage disease (all cases were stage III+) and a high level of %MVI / %G4 / %Ki67. These tumours harboured a smaller number of clones (average = 5, range (1-14)) and had little evidence of ITH (1 out of 12 cases had ITH > cohort wide median value) (Figure 5, STAR Methods). This pattern would be consistent with sufficient selective fitness being achieved within the dominant clone through fixation of multiple driver mutations and SCNAs causing a clonal sweep at an early stage of tumourigenesis.

A second and related subtype comprised “BAP1 driven” cases characterised by tumour clones with BAP1 as a lone mutational driver in addition to VHL (Figure 5). Where the tumours harboured other driver mutations, they were never found in the same subclone as the BAP1 mutation (K448, K252, K153, K136, Figure 1). This group was enriched for tumours with elevated wGII (8 out of 12 > median), fewer clones and a higher tumour grade (%G4). This pattern suggests that BAP1 mutations coupled with SCNAs afford a fitness advantage such that no additional driver events become fixed, making them terminal drivers within individual clones. The third subtype consisted of “VHL wildtype” tumours, characterised by high ki67% (highest across all groups), elevated levels of wGII, potentially compensating for a lack of driver mutations, and additional phenotypic differences such as frequent presence of sarcomatoid differentiation.

The fourth subtype was “PBRM1->SETD2” driven, a group characterised by highly branched trees (>10 clones per tumour; range (3-23)), the highest mean ITH score in the whole cohort, lower ki67%, frequent parallel evolution events and advanced disease stage (Figure 5). This pattern would be consistent with the notion of slower branched growth, with early PBRM1 mutations followed by strong and repeated selection for SETD2 mutations. Supporting this notion was the average time to progression (defined as time to progression following cytoreductive nephrectomy, or the time to relapse following nephrectomy with curative intent) in this group (11.7 months), which was more than twice as long as that for “multiple clonal driver”, “BAP1 driven” and “VHL wildtype” tumours (4.7, 5.9 and 4.5 months respectively, not formally significant). Critically, the observed features of this subtype were independent of tumour size, with no significant difference between the highly branched “PBRM1->SETD2” subtype (mean tumour size 105mm, Table S1B) and the more monoclonal “multiple clonal driver” subtype (mean tumour size 107mm, Table S1B). The fifth and sixth subtypes were “PBRM1->PI3K” and “PBRM1->SCNA”, characterised by early PBRM1 mutation followed by mutational activation of the PI3K/AKT/mTOR pathway or subclonal SCNAs, respectively, and enriched for lower grade tumours.

The final evolutionary subtype consisted of the “VHL mono-driver” tumours, which displayed limited branching and a monoclonal structure, with no additional driver mutations, and low wGII. The majority of tumours in this group presented at an early stage (mean tumour size 45mm) suggesting they may be an early evolutionary ancestor of the more complex subtypes described above. Small renal masses (SRMs) without evidence of vascular or fat invasion (T1a) are an increasingly common clinical entity, which can potentially be managed by active surveillance (Jewett et al., 2011). We note that the only ≤4 cm tumour that was upstaged due to the presence of renal vein invasion (K021) was in the “multiple clonal driver” category, consistent with this evolutionary path enhancing vascular invasion independent of tumour size.

A specific evolutionary subtype could not be assigned in 37 cases from a wide distribution of disease stages (stage I=12, II=2, III=16, IV=7). These tumours are likely to be driven by rarer evolutionary patterns not yet identifiable with current sample sizes. Several appeared to exhibit precursor subtype features, for example clonal VHL mutation, followed by PBRM1 mutation in a major subclone, that may have continued to evolve if they remained in situ. Further elucidation of the genomic and non-genomic drivers of evolutionary subtypes in larger datasets will be of major interest.

ITH index and saturation of ccRCC driver events

While pervasive ITH has been described in multiple tumour types, only one prospective study of multiregional tumour profiling has been reported to date (Jamal-Hanjani et al., 2017). TRACERx Renal, with 1206 primary tumour biopsies profiled across 101 ccRCC cases, affords an unprecedented opportunity to systematically explore the extent of ITH. In a subset of tumours (n=15) which underwent extensive sampling (>20 biopsies), we considered driver event (mutation and SCNA) saturation, measured as the proportion of events discovered with each additional tumour region profiled. Our analysis revealed a wide spectrum of saturation gradients (Figure 6A), highlighting the challenge of attempting to establish a biopsy count reliably applicable to all ccRCCs. Accepting this caveat, and considering all the tumours with ≥15 biopsies (n=20), we calculated the stepwise change in driver event discovery when using between 1 and 15 biopsies (Figure 6B). On average, two biopsies were required to detect ≥ 50% of all variants and seven for ≥ 75% of variants (Figure 6B). As expected, these values changed markedly based on tumour ITH, with homogeneous tumours (≤median ITH index) achieving ≥ 0.75 detection within four biopsies, as opposed to eight biopsies required for heterogeneous tumours (>median ITH) (Figure 6B). Splitting instead by evolutionary subtype, the fewest biopsies were needed to reach 0.75 driver detection in the “multiple clonal driver” and “VHL monodriver” groups, and the largest number for “PBRM1->SETD2” tumours (Figure 6C).
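A saturation curve of this kind can be sketched in Python as below. The text does not state how biopsy subsets were chosen, so the sketch assumes an average over all possible subsets of each size; the function name and the set-based representation of a biopsy are hypothetical.

```python
from itertools import combinations
from statistics import mean

def detection_curve(biopsies, max_k=None):
    """Mean fraction of all driver events detected using k biopsies, k = 1..max_k.

    biopsies: list of sets, one per biopsy, holding the driver events detected in it.
    """
    total = len(set().union(*biopsies))
    max_k = max_k or len(biopsies)
    return [
        mean(len(set().union(*combo)) / total for combo in combinations(biopsies, k))
        for k in range(1, max_k + 1)
    ]
```

Exhaustive enumeration is only practical for small biopsy counts; for densely sampled tumours the subsets would need to be sampled at random instead.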

We considered the utility of a radiologically guided two-site biopsy approach, for primary tumours which present as an SRM, or larger tumours without (M0) or with metastases (M1). We down-sampled our dataset to two biopsies per tumour (STAR methods), and considered the mean results across all possible combinations to simulate how many subclonal driver events would be missed and how many subclonal events would be misclassified as clonal (“illusion of clonality”). For the SRM group, 11/15 tumours had an average of ≤1 driver event missed and ≤1 driver event misclassified as clonal with a paired biopsy approach (Figure 6D, panel 1). For larger tumours, whether metastatic or not, performance was less favourable, with the majority suffering from multiple missed subclonal drivers and/or events misclassified as clonal (Figure 6D, panels 2 and 3). For these tumours, our data suggest that a range of four to eight biopsies is required to capture the majority of events (≥75% detection), although this approach may still miss some important drivers.

Clonal evolution and clinical significance

Association of the ITH index with disease progression was a pre-defined endpoint of the TRACERx Renal study (Turajlic and Swanton, 2017). Patients whose tumours had a high ITH index (>median value) had significantly reduced progression-free survival (PFS) compared to those with a low ITH index (p=0.0160 log-rank, hazard ratio (95% CI) HR = 2.4 [1.1 - 5.2]). Due to the small sample size, the association was not significant when adjusted for known prognostic variables in a Cox proportional hazards model (p=0.4800 adjusted) (Figure 7A, STAR Methods). As elevated wGII was consistently enriched in the high risk evolutionary subtypes, we also considered its association with PFS. Patients in our cohort whose tumours had high wGII (>median value) had a non-significant trend towards shorter PFS compared to those with low wGII (p=0.0717 log-rank, HR = 1.9 [0.9 - 4.0], p=0.9400 adjusted, Figure 7A). We further investigated ITH and wGII metrics in the larger and more robustly powered TCGA KIRC cohort, and found both measures to be significantly associated with PFS (p=0.0021 HR = 1.9 [1.2 - 2.8] and p=0.0004 HR = 2.1 [1.4 - 3.3] respectively, log-rank). This association remained independently significant after adjusting for stage and grade (p=0.05 HR = 1.5 [1.0 - 2.3] and p=0.02 HR = 1.7 [1.1 - 2.6] respectively, adjusted, Figure 7A), and in addition both measures were found to be significantly associated with overall survival (OS) in an adjusted analysis (p=0.04 HR = 1.7 [1.0 - 2.7] and p=0.04 HR = 1.7 [1.0 - 2.8] respectively, adjusted, Table S4). We note that the single biopsy approach is likely to have reduced the sensitivity to detect ITH and subclonal SCNAs in the TCGA cohort.

Next, we considered ITH and wGII measures in combination, to ascertain if a low score in one measure but high in the other was sufficient on its own to be associated with increased patient risk. Significantly reduced survival was observed in all groups compared to “Low ITH and Low wGII”, suggesting that either driver event intratumour heterogeneity, or a homogeneous profile with high wGII (e.g. "Multiple Clonal Driver" evolutionary subtype), were the underlying factors associated with poor prognosis (TRACERx Renal 100: p=0.0019 log-rank, p=0.7500 adjusted; TCGA PFS: p=0.0025 log-rank, p=0.0041 adjusted, Figure 7A; TCGA OS: p=0.0001 log-rank, p=0.0040 adjusted).

We finally considered whether ITH and wGII measures were associated with the pattern of metastatic progression. Within our cohort, 37 patients had metastatic disease and we classified their disease progression (following cytoreductive or curative intent nephrectomy) into “rapid” or “attenuated” (STAR Methods). 67% (n=9) of "Low ITH, High wGII" patients had rapid progression, as compared to 18% (n=28) in the other three groups (p=0.0106, Fisher’s exact) (Figure 7B). Although limited by a small number of events (n=14), overall cancer-specific survival analysis (as opposed to PFS) in our cohort also demonstrated an association between ITH / wGII metrics and patient survival (p=0.0065 log-rank). The shortest survival time was observed in the “Low ITH, High wGII" group, further highlighting the aggressive nature of homogeneous tumours with high clonal wGII, a measure reflecting early fixation of chromosomal complexity (Figure 7C).
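As a worked check of the Fisher's exact comparison above, the following sketch implements a two-sided test from first principles. The cell counts (6 of 9 versus 5 of 28 patients with rapid progression) are inferred here from the reported percentages and group sizes and are therefore an assumption, as is the function name.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables that are no more
    likely than the observed one, with the margins held fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Probability of x in the top-left cell given the margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Rows: "Low ITH, High wGII" versus the other three groups.
# Columns: rapid versus attenuated progression (counts inferred, see above).
p_value = fisher_exact_two_sided(6, 3, 5, 23)
print(round(p_value, 4))
```

With these inferred counts the result agrees with the reported p=0.0106 to the precision quoted.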

Example 2 - Determination of evolutionary sub-types of clear cell renal cell carcinoma (ccRCC)

Materials and methods

Experimental model and subject details

Patients were recruited into TRACERx Renal, an ethically approved prospective cohort study (National Health Service Research Ethics Committee approval 11/LO/1996). The study sponsor is the Royal Marsden NHS Foundation Trust. The study is coordinated by the Renal Unit at the Royal Marsden Hospital NHS Foundation Trust. The study is open to recruitment at the following sites: Royal Marsden Hospital NHS Foundation Trust, Guy’s and St Thomas’ Hospital NHS Foundation Trust, Royal Free Hospital NHS Foundation Trust and Western General Hospital (NHS Lothian). Patients were recruited into the study according to the following eligibility criteria:

Inclusion criteria

• Age 18 years or older

• Patients with histologically confirmed renal cell carcinoma, or suspected renal cell carcinoma, proceeding to nephrectomy/metastasectomy

• Medical and/or surgical management in accordance with national and/or local guidelines

• Written informed consent (permitting fresh tissue sampling and blood collection; access to archived diagnostic material and anonymised clinical data)

Exclusion criteria

• Lack of adequate tissue

Further eligibility criteria were applied to the cohort presented in this paper (it therefore follows that these patients do not have consecutive study ID numbers from 001 to 100):

• Confirmed histological diagnosis of clear cell renal cell carcinoma.

• No family history of renal cell carcinoma.

• No known germline renal cell carcinoma predisposition syndrome (including VHL).

• At least three primary tumour regions available for analysis.

The cohort was representative of patients eligible for curative or cytoreductive nephrectomy. Full clinical characteristics are provided in Table S1. Demographic data include: Sex, Age and Ethnicity. Clinical data include: Presenting symptoms, Smoking status, BMI, History of Previous RCC, Family History of RCC, Bilateral or Multi-focal RCC, Neoadjuvant therapy (6 patients received systemic therapy prior to nephrectomy). Histology data include: overall TNM Stage (based on Version 7 classification), Location of nephrectomy, Number of harvested and involved lymph nodes, presence of Microvascular Invasion, presence of Renal Vein Invasion, presence of IVC tumour thrombus, Size of primary tumour, Leibovich score, Fuhrman Grade, Time to nephrectomy (days). Clinical status of patients included: Relapse-free survival (months), Total follow up (months), Survival Outcome.

Extension cohort of primary and metastatic (P-M) pairs was accessed under the approval of Basque Country Research Ethics Committee, Hospital Universitario Cruces (Ref CEIC- Euskadi PI2015101).

Post-mortem sampling was performed in the context of the PEACE study (National Health Service Research Ethics Committee approval 13/LO/0972/AM05).

Method details

Sample collection (TRACERx cohort and post-mortem sampling)

All surgically resected specimens were reviewed macroscopically by a pathologist to guide multi-region sampling for this study and to avoid compromising diagnostic requirements. Tumour measurements were recorded and the specimens were photographed before and after sampling. Primary tumours were dissected along the longest axis and spatially separated regions were sampled from the “tumour slice” using a 6 mm punch biopsy needle. The punch was changed between samples to avoid contamination. The total number of samples obtained reflected the tumour size, with a minimum of three biopsies that were non-overlapping and equally spaced. However, areas which were obviously fibrotic or haemorrhagic were avoided during sampling and every attempt was made to reflect macroscopically heterogeneous tumour areas. Primary tumour regions were labelled R1, R2, R3... and their locations were recorded. Normal kidney tissue was sampled from areas distant to the primary tumour and labelled N1. Each biopsy was split into two for snap freezing and formalin fixing respectively, such that the fresh frozen sample has its mirror image in the formalin-fixed sample, which is subsequently paraffin embedded. Fresh samples were placed in a 1.8 ml cryotube, immediately snap frozen in liquid nitrogen for >30 seconds and transferred to -80 °C for storage. Peripheral blood was collected at the time of surgery and processed to separate the buffy coat.

Nucleic acid isolation from tissue and blood (TRACERx and PEACE cohorts)

DNA and RNA were co-purified using the AllPrep DNA/RNA mini kit (Qiagen). Briefly, a 2 mm³ piece of tissue was added to 900 µl of lysis buffer and homogenised for five seconds using the TissueRuptor (Qiagen), with a fresh homogenisation probe being used for each preparation. Each lysate was applied to a QiaShredder (Qiagen) and then sequentially purified using the DNA and RNA columns according to the manufacturer's protocol. Germline control DNA was isolated from whole blood using the DNeasy Blood and Tissue kit (Qiagen) according to the manufacturer's protocol. DNA quality and yield were assessed using the TapeStation (Agilent) and Qubit fluorometric quantification (ThermoFisher Scientific).

Purification of DNA from Formalin Fixed Paraffin Embedded (FFPE) tissue

For a minority of TRACERx Renal cases (n=8), tumour material was obtained from FFPE material. An H&E section from each patient FFPE block was reviewed by a pathologist and tumour-rich regions were identified for DNA purification. Either a 20 µm section was cut and the area of interest scraped from the slide using a blade, or alternatively a 2 mm core was punched directly from the block. DNA was purified using the GeneRead DNA FFPE kit (Qiagen), with yields and quality being determined by Qubit quantification and TapeStation analysis.

Micro-dissection and nucleic acid isolation (HUC extension cohort)

H&E slides from each case were annotated by pathologists for regions of interest (ROI). Multiple ROIs within the primary tumour were selected on the basis of good tissue preservation, avoiding areas of necrosis and haemorrhage, and to reflect microscopically distinct areas with regard to grade (high vs. low), morphology (clear vs. granular/eosinophilic) and sarcomatoid differentiation, where present, as well as areas of normal tissue. The annotated H&E was then used as a reference to guide the dissection of ROIs from serial sections. All tissue sections were cut to 10 µm thickness and deparaffinized with three five-minute incubations in xylene prior to dissection using the alpha AVENIO Millisect System (Roche Diagnostics, Indianapolis, IN) (Adey et al., 2013). The milling tip blade size for the dissection was selected based on the estimated area of the ROI, where small ROIs of less than 200 mm² used small blade sizes (200 or 400 µm) and ROIs larger than 200 mm² used larger blade sizes (800 µm). The milling buffer for all dissections was 1x TE buffer with 2% SDS, pH 7.5. Genomic DNA was isolated from each of the dissected FFPE tissue samples using a High Pure FFPE DNA Isolation kit (Roche).

Methylation specific PCR

Methylation of the VHL promoter was detected after bisulphite treatment of 500ng of patient DNA using the EZ DNA Methylation-Direct kit (Zymo Research). Bisulphite treated DNA was amplified in the PCR using methylation specific oligonucleotides (oligonucleotide sequences are detailed in Table 2), followed by Big Dye terminator Sanger sequencing. Methylation was confirmed by comparing and contrasting patient tumour and normal renal tissue for methylation protected CpG sequences.

Regional staining by Immunohistochemistry and Digital Image Analysis of Ki67

Tissue sections of 4 µm were mounted on slides and immunohistochemical staining for Ki67 was performed using a fully automated immunohistochemistry (IHC) system and ready-to-use optimized reagents according to the manufacturer's recommendations (Ventana Discovery Ultra, Ventana, Arizona, USA). The primary antibody used was rabbit anti-Ki67 (AB16667, Abcam, Cambridge, UK) and the secondary antibody was Discovery Omnimap anti-rabbit HRP RUO (760-4311, Roche, Rotkreuz, Switzerland). The DAB kit was Discovery Chromomap DAB RUO (760-4311, Roche). After the IHC procedure, slides were first evaluated for Ki67 staining quality using mouse intestine tissue as positive control. Regions containing tumour tissue were identified and marked by a pathologist and subsequently scanned in brightfield at 20x magnification using a Zeiss Axio Scan.Z1 and ZEN lite imaging software (Carl Zeiss Microscopy GmbH, Jena, Germany). Digital images were then subjected to automated image analysis using StrataQuest version 5 (TissueGnostics, Vienna, Austria) for Ki67 quantification. Three different gates were set to quantify low, medium and high intensity DAB staining, which corresponded to Ki67 expression levels. Results were depicted as the total percentage of Ki67-positive nuclei.

Flow Cytometry Determination of DNA Content (FACS)

Fresh frozen tumour tissue samples, approximately 4 mm³ in size, were mechanically disrupted and incubated in 2 ml of 0.5% pepsin solution (Sigma, UK) at 37 °C for 40 minutes to create a suspension of nuclei. The nuclei were washed with phosphate-buffered saline (PBS) and then fixed with 70% ethanol for a minimum of 90 minutes. The nuclei were washed again with PBS and stained with 200 µl of propidium iodide (50 µg/ml) overnight.

Flow cytometric analysis of DNA content was performed using the LSR Fortessa Cell Analyzer (Becton Dickinson, San Jose, USA), BD FACSDiva™ software and FlowJo software (FlowJo LLC, Oregon, USA). A minimum of 10,000 events were recorded (typically up to 20,000 and up to 100,000 in complex samples). Analysis was performed using methods derived from the European Society for Analytical Cellular Pathology DNA Consensus in Flow Cytometry guidelines. Gating of forward and side scatter was applied to exclude debris and cell clumping. Samples with <7,500 events after gating were excluded from further analysis. The coefficient of variation (CV) was measured on each G1 peak. Samples with a CV>10% were excluded from further analysis. Each tumour sample was assumed to contain normal cells to act as an internal standard. Where possible, the position of the diploid peak was calculated with reference to the peak of diploid cells in a case-matched normal tissue sample. The DNA index (DI) of any aneuploid peak present was calculated by dividing the G1 peak of the aneuploid population by the G1 peak of the normal diploid cells. Diploid samples were defined as having a DI of 1.00. Any additional peak was defined as aneuploid. A tetraploid peak was defined as having a DI of 1.90-2.10 and containing >15% of total events unless a second peak corresponding to G2 was clear on the histogram.

Similarly, aneuploid peaks near to G1 (DI 0.90-1.10) were only considered if there was a clear second peak containing >15% of total events.
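The DNA index rules above can be summarised in a short sketch. The function names, the returned labels and the reading of the two 15% rules (tetraploid if the peak holds >15% of events or a G2 peak is clear; near-diploid peaks considered only above 15% of events) are assumptions made for illustration.

```python
def dna_index(aneuploid_g1, diploid_g1):
    """DI = G1 peak position of the aneuploid population / diploid G1 peak."""
    return aneuploid_g1 / diploid_g1

def classify_peak(di, fraction_of_events, clear_g2_peak=False):
    """Label one additional histogram peak using the thresholds in the text."""
    if 1.90 <= di <= 2.10:
        # Could be the G2 peak of diploid cells, so extra evidence is needed.
        if fraction_of_events > 0.15 or clear_g2_peak:
            return "tetraploid"
        return "not called"
    if 0.90 <= di <= 1.10:
        # Peaks close to the diploid G1 peak need to be substantial.
        if fraction_of_events > 0.15:
            return "near-diploid aneuploid"
        return "not called"
    return "aneuploid"

def passes_qc(events_after_gating, g1_cv_percent):
    """Sample-level exclusions: <7,500 gated events or G1 CV above 10%."""
    return events_after_gating >= 7500 and g1_cv_percent <= 10

# Hypothetical peak positions on the propidium iodide axis.
di = dna_index(aneuploid_g1=300.0, diploid_g1=200.0)
print(di, classify_peak(di, fraction_of_events=0.30))
```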

Detection of allelic imbalance at the HLA locus

Allelic imbalance was detected using two polymorphic short tandem repeat (STR) markers located on the short arm of chr 6, close to the HLA locus (D6S248 and ATA12D05), six STR markers located downstream of the HLA locus on chr 6p (D6S1960, GATA143B11, D6S1714, D6S1573, D6S438 and D6S257), and six STR markers located upstream of the HLA locus on chr 6p (D6S410, D6S2257, D6S1034, D6S202, D6S1617, D6S1668). 20 ng of patient germline and tumour region DNA was amplified by PCR. The PCR comprised denaturing at 95 °C for 5 min, then 35 cycles of denaturing at 95 °C for 1 min, annealing at 55 °C for 1 min and extension at 72 °C for 1 min, followed by a final extension at 72 °C for 10 min. PCR products were separated on the ABI 3730xl DNA analyzer. Fragment length and area under the curve of each allele were determined using the Applied Biosystems software GeneMapper v5. When two separate alleles were identified for a particular marker, the fragments could be analyzed for allelic imbalance using the formula (A_tumor,1/A_tumor,2)/(A_normal,1/A_normal,2), where A is the area under the curve and the indices 1 and 2 denote the two alleles of the marker. The output of this formula was defined as the normalized allelic ratio.
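A minimal sketch of the normalized allelic ratio follows. It assumes the ratio compares the areas under the curve of the two alleles of a heterozygous marker in tumour against the same two alleles in matched germline DNA; the argument names and peak areas are hypothetical.

```python
def normalized_allelic_ratio(tumor_a1, tumor_a2, normal_a1, normal_a2):
    """(allele 1 / allele 2 in tumour) divided by (allele 1 / allele 2 in normal).

    Each argument is the area under the curve of one allele peak.
    A value near 1 indicates balanced alleles in the tumour region.
    """
    return (tumor_a1 / tumor_a2) / (normal_a1 / normal_a2)

# Hypothetical peak areas: allele 1 is reduced by half in the tumour region.
ratio = normalized_allelic_ratio(tumor_a1=500.0, tumor_a2=1000.0,
                                 normal_a1=1000.0, normal_a2=1000.0)
print(ratio)
```

No cut-off for calling imbalance is stated in the text, so none is applied here.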

Targeted Driver Panel (DP) design and validation

Driver gene panels (Panel_v3, Panel_v5 and Panel_v6) were used in this study. Panel_v3 was designed in 2014 and included 110 putative driver genes. Panel_v5 and Panel_v6 were designed in 2015 and included 119 and 130 putative driver genes respectively. Driver genes were selected from genes that were frequently mutated in TCGA (accessed in April 2015) or highlighted in relevant studies (Arai et al., 2014; Sato et al., 2013; Scelo et al., 2014). Only alterations in driver genes represented in all three panels were considered in the overall driver mutation analyses. All panels targeted potential driver SCNA regions. To prevent inter-patient sample swaps, we included in Panel_v5 and Panel_v6 the 24 SNPs that were previously identified by Pengelly et al.

Driver Panel Library Construction and Targeted Sequencing

Following QC of the isolated gDNA, and depending on the available yield, samples were normalised to either 1-3 µg or 200 ng for the Agilent SureSelectXT Target Enrichment Library Protocol (standard or low input sample preparation respectively). Samples were normalised using a 1X Low TE Buffer. Samples were sheared to 150-200 bp using a Covaris E220 (Covaris, Woburn, MA, USA), following the run parameters outlined in the Agilent SureSelectXT standard 3 µg and low input 200 ng DNA protocols. Library construction of samples was then performed following the SureSelectXT protocols, using 6 pre-capture PCR cycles for the standard input samples and 10 pre-capture PCR cycles for the 200 ng low input samples. Hybridisation and capture were performed for each individual sample using the Agilent custom Renal Driver Panel target-specific capture library (versions 3, 5 and 6), with the same version of the capture library being used for all samples from the same patient case. Captured SureSelect-enriched DNA libraries were PCR amplified using 14 post-capture PCR cycles in PCR reactions that included the appropriate indexing primer for each sample. Amplified, captured, indexed libraries passing final QC on the TapeStation 4200 were normalised to 2 nM and pooled, ensuring that unique indexes were allocated to all final libraries (up to 96 single indexes available) in the pool. QC of the final library pools was performed using the Agilent Bioanalyzer High Sensitivity DNA Assay. Library pool QC results were used to denature and dilute samples in preparation for sequencing on the Illumina HiSeq 2500 and NextSeq 500 sequencing platforms. The final libraries were sequenced 101 bp paired-end multiplexed on the Illumina HiSeq 2500 and 151 bp paired-end multiplexed on the NextSeq 500, at the Advanced Sequencing Facility at the Francis Crick Institute. Equivalent sequencing metrics, including per sample coverage, were observed between platforms. Single nucleotide variants (SNVs), dinucleotide variants (DNVs), small insertions and deletions (INDELs) and somatic copy number alterations (SCNAs) were derived from 463 primary tumour regions and 169 matched metastatic regions from 56 primary-metastasis pairs in 38 patients (with some patients providing multiple metastases, Figure 17, Figure 1A). Median sequencing coverage was 613x (range 166-1479x) across primary tumour regions and 567x (range 273-2661x) across metastatic regions.

Targeted DP library construction and sequencing (HUC cohort)

DP targeted hybrid-capture panel. Solution-based hybridization capture probes (Roche Sequencing Solutions) were selected from a genome-wide database of pre-scored probes, which varied in size from 50 to 100 nucleotides. Probes were filtered for repetitiveness in the human genome by building a 15-mer histogram from the entire human genome, and then calculating the average 15-mer frequency of the probe by sliding a 15 bp window across the length of each probe. Probes with a score greater than 100 were filtered as repetitive. The remaining probes were scored for uniqueness in the human genome, using SSAHA (http://www.sanger.ac.uk/science/tools/ssaha). A match in the genome was defined as any 30-mer match in the genome, allowing up to 5 mismatches or indels along the length of the match. Additional scoring parameters included penalties for simple sequence repeats and penalties for deviation from a target Tm of 80 °C. Target regions of interest were increased to a minimum size of 100 bp, and then tiled with an average overlap of 35 bp, allowing the probes to overhang the ends of the target regions. These tiled probes were selected from the aforementioned pre-scored database of probes by choosing the best scoring probe starting in a 15 bp window, moving 20 bp in the 3’ direction, and repeating. Probes were allowed to have up to 20 possible matches in the genome, though for this panel 99.5% of the probes had 5 or fewer matches. Selected probe sequences were manufactured into biotinylated sequence capture probe pools by Roche Sequencing Solutions - Madison.
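The repetitiveness filter described above (a 15-mer histogram followed by a sliding-window average) can be sketched as follows. This is a toy illustration only: the function names are hypothetical, the in-memory counter would not scale to a whole human genome, and the vendor's actual implementation is not described in the text.

```python
from collections import Counter

def kmer_histogram(genome, k=15):
    """Count every k-mer in a reference sequence."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))

def average_kmer_frequency(probe, histogram, k=15):
    """Mean genome-wide count of the k-mers seen when sliding a window over the probe."""
    windows = [probe[i:i + k] for i in range(len(probe) - k + 1)]
    return sum(histogram[w] for w in windows) / len(windows)

def is_repetitive(probe, histogram, k=15, max_score=100):
    """Probes scoring above max_score are filtered out as repetitive."""
    return average_kmer_frequency(probe, histogram, k) > max_score

# Toy example with k=3 so that the counts can be checked by eye.
toy_genome = "ACGTACGTAC"
toy_hist = kmer_histogram(toy_genome, k=3)
print(average_kmer_frequency("ACGT", toy_hist, k=3))
```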

Library construction. Libraries were constructed using the SeqCap EZ HyperCap Workflow User’s Guide, v1.0 (Roche Sequencing Solutions). The extracted DNA was enzymatically fragmented using the KAPA HyperPlus library prep kit according to the manufacturer’s instructions (Roche Sequencing Solutions). Fragmentation time for DNA isolated from FFPE was linked to the mass of input DNA, and varied from 12 to 22 minutes depending on input amount (10 to 100 ng). To increase the efficiency of library prep, adapter volume was reduced to 3 µl and the adapter ligation reaction was extended to 3 hours at 20 °C for cases with 100 ng of input DNA, and to 16 hours at 16 °C for libraries with less than 100 ng of input DNA. Sequencing. Captured samples were pooled following post-capture amplification, and sequenced using an Illumina HiSeq 2500 instrument. Dual HiSeq SBS v4 (Illumina) runs at 101 base paired-end reads generated the data for analysis.

SNV, and INDEL calling from multi-region DP sequencing

Paired-end reads (2x100bp) in FastQ format sequenced by HiSeq or NextSeq were aligned to the reference human genome (build hg19), using the Burrows-Wheeler Aligner (BWA) v0.7.15 with seed recurrences (-c flag) set to 10000 (Li and Durbin, 2009). Intermediate processing of Sam/Bam files was performed using Samtools v1.3.1 and deduplication was performed using Picard 1.81 (http://broadinstitute.github.io/picard/) (Li and Durbin, 2009). Single Nucleotide Variant (SNV) calling was performed using Mutect v1.1.7 and small scale insertions/deletions (INDELs) were called running VarScan v2.4.1 in somatic mode with a minimum variant frequency (--min-var-freq) of 0.005 and a tumour purity estimate (--tumor-purity) of 0.75, and then validated using Scalpel v0.5.3 (scalpel-discovery in --somatic mode) (intersection between the two callers taken) (Cibulskis et al., 2013; Fang et al., 2016; Koboldt et al., 2009). SNVs called by Mutect were further filtered using the following criteria: i) ≤5 alternative reads supporting the variant and variant allele frequency (VAF) ≤1% in the corresponding germline sample, ii) variants falling into the mitochondrial chromosome, haplotype chromosomes, HLA genes or any intergenic region were not considered, iii) presence of both forward and reverse strand reads supporting the variant, iv) >5 reads supporting the variant in at least one tumour region of a patient, v) variants were required to have cancer cell fraction (CCF)>0.5 in at least one tumour region (see Subclonal deconstruction of mutations section for details of CCF calculation), vi) variants were required to have CCF>0.1 to be called as present in a tumour region, vii) sequencing depth in each region needed to be ≥50 and ≤3000. Finally, suspected artefact variants, based on inconsistent allelic frequencies between regions, were reviewed manually on the Integrative Genomics Viewer (IGV), and variants with poorly aligned reads were removed. Dinucleotide substitutions (DNV) were identified when two adjacent SNVs were called and their VAFs were consistently balanced (based on proportion test, P≥0.05). In such cases the start and stop positions were corrected to represent a DNV and frequency related values were recalculated to represent the mean of the SNVs. To reduce sequencing artefacts from FFPE samples, we further filtered out variants that were significantly enriched for presence in FFPE compared with fresh frozen samples (Fisher’s exact test, P<0.001). Variants were annotated using Annovar (Wang et al., 2010).
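The read-level filters listed above can be expressed as a small predicate. This is a sketch under stated assumptions: the `RegionCall` record and function names are hypothetical, strand support is assumed to be required in at least one region, the depth criterion is assumed to apply to every region of the patient, and the genomic-location filter (criterion ii) is omitted because it depends on annotation.

```python
from dataclasses import dataclass

@dataclass
class RegionCall:
    """Read evidence for one variant in one tumour region (hypothetical record)."""
    alt_reads: int
    depth: int
    ccf: float
    fwd_alt_reads: int
    rev_alt_reads: int

def passes_snv_filters(germline_alt_reads, germline_vaf, regions):
    """Variant-level filters i, iii, iv, v and vii from the text."""
    if germline_alt_reads > 5 or germline_vaf > 0.01:                           # i
        return False
    if not any(r.fwd_alt_reads > 0 and r.rev_alt_reads > 0 for r in regions):  # iii
        return False
    if not any(r.alt_reads > 5 for r in regions):                               # iv
        return False
    if not any(r.ccf > 0.5 for r in regions):                                   # v
        return False
    return all(50 <= r.depth <= 3000 for r in regions)                          # vii

def present_in_region(region):
    """Criterion vi: a variant is called present in a region when CCF > 0.1."""
    return region.ccf > 0.1

# Hypothetical variant seen strongly in R1 and at noise level in R2.
calls = [RegionCall(30, 600, 0.8, 14, 16), RegionCall(2, 550, 0.05, 1, 1)]
print(passes_snv_filters(0, 0.0, calls), [present_in_region(r) for r in calls])
```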

Mutations were defined as deleterious if two out of three algorithms (SIFT, PolyPhen2 and MutationTaster) predicted the mutation to be deleterious. Individual tumour biopsy regions were judged to have failed quality control and were excluded from analysis based on the following criteria: i) sequencing coverage depth below 100X, ii) low tumour purity such that copy number calling failed. Mutations detected in high-confidence driver genes (VHL, PBRM1, SETD2, PIK3CA, MTOR, PTEN, KDM5C, CSMD3, BAP1, TP53) were defined as driver mutations.

SCNA calling from multi-region DP sequencing

To estimate somatic copy number alterations, CNVkit v0.7.3 was run with default parameters on paired tumour-normal sequencing data (Talevich et al., 2016). Outliers of the derived log2-ratio (logR) calls from CNVkit were detected and modified using Median Absolute Deviation Winsorization before case-specific joint segmentation to identify genomic segments of constant logR (Nilsen et al., 2012). Tumour sample purity, ploidy and absolute copy number per segment were estimated using ABSOLUTE v1.0.6 (Carter et al., 2012). In line with recommended best practice, all ABSOLUTE solutions were reviewed by 3 researchers, with solutions selected based on majority vote. Copy number alterations were then called as losses or gains relative to the overall sample-wide estimated ploidy. Arm gain or loss was called when >50% of the chromosomal arm had copy number gain or loss. Driver copy number was identified by overlapping the called somatic copy number segments with putative driver copy number regions identified by Beroukhim and colleagues (Beroukhim et al., 2009). For a subset of TRACERx Renal patients, we compared SCNA calls between targeted panel and WGS datasets, and SCNA concordance was 87% (Companion paper, Turajlic et al., 2018). The average proportion of the genome with aberrant copy number, weighted on each of the 22 autosomal chromosomes, was estimated as the weighted genome instability index (wGII).
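A minimal sketch of the wGII calculation follows, assuming it is the unweighted mean across chromosomes of the fraction of each chromosome covered by aberrant segments, so that every chromosome contributes equally regardless of length. The segment tuple layout and the pre-computed aberrant flag are illustrative assumptions; how a segment is judged aberrant relative to ploidy is left to the upstream caller.

```python
def wgii(segments):
    """Weighted genome instability index from copy number segments.

    segments: iterable of (chromosome, start, end, is_aberrant) tuples, where
    is_aberrant marks a gain or loss relative to sample ploidy.
    """
    aberrant, total = {}, {}
    for chrom, start, end, is_aberrant in segments:
        length = end - start
        total[chrom] = total.get(chrom, 0) + length
        if is_aberrant:
            aberrant[chrom] = aberrant.get(chrom, 0) + length
    # Average the per-chromosome aberrant fractions over the chromosomes supplied.
    return sum(aberrant.get(c, 0) / total[c] for c in total) / len(total)

# Toy profile: half of chr1 is aberrant, chr2 is entirely unaltered.
toy_segments = [
    ("chr1", 0, 100, True),
    ("chr1", 100, 200, False),
    ("chr2", 0, 50, False),
]
print(wgii(toy_segments))
```

For real data the input would cover all 22 autosomes, so the divisor would be 22.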

MSK validation cohort

Matched tumour and normal aligned sequencing files (BAM format) for the MSK cohort were obtained directly from the authors (Becerra et al., 2017) and were then converted into FASTQ format files using bam2fastq in the bedtools package (Quinlan and Hall, 2010). SNVs, INDELs and SCNAs were called using the same methods as for the TRACERx Renal data (STAR Methods: SNV, and INDEL calling from multi-region DP sequencing, SCNA calling from multi-region DP sequencing). Of the 49 cases with ccRCC histology, 15 cases (Pair 8, Pair 9, Pair 13, Pair 17, Pair 22, Pair 35, Pair 38, Pair 42, Pair 43, Pair 44, Pair 48, Pair 52, Pair 56, Pair 58, Pair 59) were excluded from the study as the ABSOLUTE v1.0.6 algorithm failed to find a stable SCNA solution. Clonality of SNVs and SCNAs was estimated using ABSOLUTE v1.0.6. Cancer cell fraction for INDELs was calculated using the method described in STAR Methods: Subclonal deconstruction of mutations. INDELs with CCF>0.5 were called clonal. The ITH index for each patient was calculated as the measure of intratumour heterogeneity (ITH index = # subclonal drivers / # clonal drivers).
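The ITH index defined above is a simple ratio; a sketch is given below. The input format (a mapping from driver event to its clonality call) and the error raised when no clonal driver is present are assumptions, since the text does not state how that case was handled.

```python
def ith_index(driver_clonality):
    """ITH index = number of subclonal drivers / number of clonal drivers.

    driver_clonality maps each driver event to "clonal" or "subclonal".
    """
    n_clonal = sum(1 for status in driver_clonality.values() if status == "clonal")
    n_subclonal = sum(1 for status in driver_clonality.values() if status == "subclonal")
    if n_clonal == 0:
        raise ValueError("ITH index is undefined without a clonal driver")
    return n_subclonal / n_clonal

# Hypothetical tumour with two clonal drivers and one subclonal driver.
example = {"VHL": "clonal", "loss_3p": "clonal", "SETD2": "subclonal"}
print(ith_index(example))
```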

Quantification and statistical analysis

R 3.3.2 was used for all statistical analyses. We tested for difference in driver event count between primary and metastatic samples using linear regression, including biopsy number per sample as an independent term in the regression model. The comparison of wGII, DNA index and Ki67 scores between “not selected” and “selected” clones was assessed using region values per case. Regions were classified as being within “not selected” or “selected” clones based on the clustering solution for each tumour. Regions found to be only within the founding MRCA clone, or polyclonal with both “not selected” and “selected” clones, were excluded. The comparison of non-synonymous mutation counts between “not selected” and “selected” clones was based directly on the clonal clustering solution obtained for each case, again with founding MRCA clones excluded. For all “not selected” versus “selected” comparisons a linear mixed effect (LME) model was used to determine significance, to account for the non-independence of multiple observations from individual tumours. The comparison of maximum wGII (defined as the maximum regional wGII value per primary tumour) between “Rapid” and “Attenuated” metastatic progression groups was assessed using the Mann-Whitney test. Comparison of ITH values (again one score per tumour) between “Rapid” and “Attenuated” metastatic progression groups was determined using the Mann-Whitney test. The comparison of wGII between pancreatic and all other metastatic tissue sites was conducted using region values per case, with significance determined using an LME model.

Subclonal deconstruction of mutations

where vaf is the variant allele frequency at the mutated base; p is the estimated tumour purity; CN_t and CN_n are the tumour locus-specific copy number and the normal locus-specific copy number, the latter assumed to be 2 for autosomal chromosomes; and CCF is the fraction of tumour cells carrying the mutation. Let CN_mut be the number of chromosomal copies that carry the mutation; CN_mut can take any integer value from 1 to CN_t. We then assigned CCF each of the possible values 0.01, 0.02, ..., 1, together with every possible CN_mut, to find the best-fit cancer cell fraction of the mutation. Since we focused on driver genes in this study and the accuracy of the estimated CCF is limited by the size of the panel, we called mutations with CCF>0.5 clonal and mutations with CCF≤0.5 and CCF>0.1 subclonal. For a mutation to be called clonal in a tumour, we required it to be clonal in all regions of that tumour. Exceptions were made for long INDELs affecting >6 bp of the genome, owing to VAF underestimation: if a long INDEL was present in all regions of a tumour, we called it clonal. A SCNA was called clonal in a tumour if it was present in all tumour regions; otherwise it was called subclonal.
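The grid search over CCF and CN_mut can be sketched as below. The expected-VAF relation used, vaf = p × CCF × CN_mut / (p × CN_t + (1 - p) × CN_n), is a commonly used form and is an assumption here, as are the function names and the tie-breaking rule (the first best fit found, scanning CN_mut upwards, is kept).

```python
def best_fit_ccf(vaf, purity, cn_tumour, cn_normal=2):
    """Grid search for the (CCF, CN_mut) pair whose expected VAF best matches vaf.

    Assumed model: expected = purity * CCF * CN_mut /
                              (purity * cn_tumour + (1 - purity) * cn_normal).
    cn_tumour must be an integer of at least 1.
    """
    denom = purity * cn_tumour + (1 - purity) * cn_normal
    best = None
    for cn_mut in range(1, cn_tumour + 1):
        for step in range(1, 101):          # CCF = 0.01, 0.02, ..., 1.00
            ccf = step / 100
            err = abs(purity * ccf * cn_mut / denom - vaf)
            if best is None or err < best[0]:
                best = (err, ccf, cn_mut)
    return best[1], best[2]

def clonality(ccf):
    """Thresholds from the text: CCF > 0.5 clonal, 0.1 < CCF <= 0.5 subclonal."""
    if ccf > 0.5:
        return "clonal"
    if ccf > 0.1:
        return "subclonal"
    return "absent"

# Hypothetical diploid locus in a sample of 50% purity.
ccf, cn_mut = best_fit_ccf(vaf=0.25, purity=0.5, cn_tumour=2)
print(ccf, cn_mut, clonality(ccf))
```

Because only the product CCF × CN_mut enters the model, different pairs can fit equally well; the sketch resolves such ties towards the lowest CN_mut, which is a design choice rather than a documented rule.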

Driver tree reconstruction

A matrix with presence and absence of nonsynonymous and synonymous point mutations, DNVs, INDELs and arm level SCNAs was created for each tumour, and all the events were clustered based on the following rule: a valid cluster has to have at least two arm level SCNAs or one non-synonymous mutation. The driver events clusters were then ordered into a clonal hierarchy using TRONCO and presented as driver trees (De Sano et al., 2016).

In terms of limitations, we recognise that our Driver Panel phylogenies are based on fewer clonal markers, as compared to whole exome or genome derived phylogenetic trees. As a consequence, some tumour clones are based on only a limited number of genomic markers, and similarly the inferred modes of metastatic seeding (e.g. monoclonal vs polyclonal) are also based on a limited set of markers. However, two contingency measures are in place to mitigate against phylogenetic misconstruction: i) ultra-deep 500x sequencing coverage, which ensures stably derived cancer cell fraction estimates, ii) a bespoke gene panel which is enriched for driver events, increasing the likelihood that mutational markers are driving genuine clonal expansion.

Enrichment of events in metastases

All tumour clones were categorised into three groups based on evidence of selection in the metastasis/metastases: i) clones that are not selected (“not selected”, defined as subclonal in the primary and absent from metastasis), ii) clones that are maintained (“maintained”, defined as the most recent common ancestor (MRCA) clones, clonal in both primary and metastasis), iii) clones that are selected (“selected”, defined as subclonal in the primary and clonal in metastasis; or absent in the primary and present in metastasis). In addition, we observed a small number of clones with alternative selection patterns: a) subclonal in both primary and metastases (i.e. polyclonal metastases), which we categorised as “maintained”; b) clonal in primary and subclonal in metastases, which we categorised as “maintained”; and c) clonal in primary but absent in metastases (i.e. illusion of clonality or events lost by secondary somatic changes), which we categorised as “not selected”. For each driver event (mutational or SCNA), the proportion of times it was found in “not selected”, “maintained” and “selected” clones was calculated for each of the TRACERx, HUC and MSK cohorts. For comparison purposes, a background null distribution of proportions was determined for both mutations and SCNAs, based on all passenger events in each cohort. The proportion of “selected” clones was compared to the “not selected” proportion, using a Binomial test, with probability of selection taken from the null model, and number of trials based on event counts in each cohort. Meta-analysis across the three cohorts was conducted using Fisher's method of combining p-values from independent tests, and p-values were corrected for multiple testing using the Benjamini-Hochberg procedure.
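The clone categories and the meta-analysis steps described above can be sketched as follows. This is an illustrative sketch rather than the code used in the study: all names are hypothetical, "present in metastasis" is read as clonal or subclonal, and the Binomial test is left out because the text does not state whether it was one-sided or two-sided. Fisher's method uses the closed-form chi-square tail for an even number of degrees of freedom.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

# (status in primary, status in metastasis) -> category, following the text.
CATEGORY = {
    ("subclonal", "absent"): "not selected",
    ("clonal", "clonal"): "maintained",
    ("subclonal", "clonal"): "selected",
    ("absent", "clonal"): "selected",
    ("absent", "subclonal"): "selected",      # "present" read as clonal or subclonal
    ("subclonal", "subclonal"): "maintained",  # pattern a)
    ("clonal", "subclonal"): "maintained",     # pattern b)
    ("clonal", "absent"): "not selected",      # pattern c)
}


def categorise_clone(primary: str, metastasis: str) -> str:
    """Map a clone's status in primary and metastasis to its selection category."""
    try:
        return CATEGORY[(primary, metastasis)]
    except KeyError:
        raise ValueError(f"no rule for {primary!r}/{metastasis!r}") from None


def category_proportions(clones: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Proportion of clones falling in each category."""
    counts = Counter(categorise_clone(p, m) for p, m in clones)
    total = sum(counts.values())
    return {name: counts[name] / total
            for name in ("not selected", "maintained", "selected")}


def fisher_combine(p_values: Sequence[float]) -> float:
    """Combine independent p-values (each > 0) with Fisher's method."""
    k = len(p_values)
    # -2 * sum(ln p) is chi-square with 2k degrees of freedom under the null.
    half_stat = -sum(math.log(p) for p in p_values)  # statistic divided by 2
    tail = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * tail


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In this arrangement each driver event would receive one p-value per cohort, the three values would be passed to `fisher_combine`, and the combined values for all events would then be passed together to `benjamini_hochberg`.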

Survival analysis

Survival analysis was conducted using the Kaplan-Meier method, with the p-value determined by a log-rank test. Hazard ratios and multivariate analysis adjusting for clinical parameters were determined through a Cox proportional hazards model.
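As an illustration of the Kaplan-Meier method named above, the product-limit estimate can be computed as in the following sketch. The function name and input layout are assumptions made for this example; the log-rank test and the Cox model are not shown.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time where a death occurred.

    events[i] is True for a death at times[i] and False for a censored follow-up.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

For three patients followed for 1, 2 and 3 months, where the second is censored, the estimate drops to 2/3 at month 1 and to 0 at month 3.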

Data and Software Availability

Sequencing data that support this study have been deposited at the European Genome-phenome Archive (EGA), which is hosted by the European Bioinformatics Institute (EBI); accession number EGAS00001002793.

Additional resources

Clinical trial registry numbers:

TRACERx Renal study website, detailing investigators, sponsors and collaborators:

Results

Overview of the cohorts under study

ccRCC tumours exhibit a variety of progressive phenotypes including invasion of the perirenal and renal sinus fat (T3a), direct invasion through the renal capsule (Gerota’s fascia) and into the adrenal gland (T4), intravascular tumour growth (T3a-T3c); and lymph node (N1/N2) and visceral metastases (M1), including indirect spread to the adrenal gland. In 38 patients whose primary tumours were profiled in the TRACERx Renal cohort (Table 1A), we profiled multiple regions from matched tumour thrombi, lymph node or visceral metastases using a bespoke gene panel (STAR Methods: Driver Panel).

The overall number of driver events (mutations and SCNAs, Figure 11A) was lower in metastases (mean=9) than in primary tumours (mean=12, p=0.05, adjusted for the varying number of profiled regions, STAR methods).

Consistent with evolutionary bottlenecking, metastases were significantly more homogeneous (proportion of clonal variants = 0.87) compared to primary tumours (proportion of clonal variants = 0.33, p=6.6x10^-13, adjusted for the varying number of profiled regions, Figure 11B). Across the 56 primary-metastasis pairs, 456 driver events were shared between primary and metastases, 230 were private to primary tumours and 39 driver events were private to metastases (Figure 11C). Driver phylogenies were reconstructed to infer clonal relationships between primary tumours and metastases (STAR methods).

The TRACERx Renal cohort was enriched for synchronous metastases (Figure 11D), and to widen our investigation we analysed two additional cohorts. Using the Driver Panel (STAR methods) we multi-region profiled the “HUC” (Hospital Universitario Cruces) cohort of archived formalin-fixed paraffin-embedded (FFPE) primary ccRCCs and matched synchronous (6 cases) and metachronous metastases (23 cases) (STAR methods; Table 1B). We successfully profiled 80 primary tumour regions and 54 metastatic regions in 26 patients (two patients contributed multiple metastases). For the second cohort, “MSK” (Memorial Sloan Kettering), we re-analysed the sequencing data from a study of primary-metastasis pairs (Becerra et al., 2017) (STAR methods), to obtain both mutational and SCNA events in a total of 34 cases, including 19 synchronous and 15 metachronous metastases (Table 1C). As expected, we noted a difference in the overall frequency of driver events in the HUC and MSK cohorts owing to the increased sensitivity for detecting subclonal alterations in the TRACERx Renal cohort. There was a wide temporo-spatial representation of metastases across the three cohorts, encompassing 18 distinct metastatic sites (Figure 11E) and presenting 0-17 years after the removal of the primary tumour (Tables 1A-C). Finally, we profiled a wide range of metastatic tissues sampled at post-mortem in the context of the Posthumous Evaluation of Advanced Cancer Environment (PEACE) study (NCT03004755) in two cases of metastatic ccRCC (Table 1A).

Characterisation of the metastasising clone(s)

Taking advantage of the dense spatial sampling and phylogenetic reconstruction conducted in the TRACERx Renal cohort (Companion paper, Turajlic et al. 2018), we analysed the progression of individual clones from primary to metastatic sites. Across the 38 patients we observed 250 distinct tumour clones which we categorised into three groups based on the evidence of selection in the metastasis/metastases: i) clones that are not selected (“not selected”, n=129 clones, defined as subclonal in the primary and absent in metastasis), ii) clones that are maintained (“maintained”, n=38 clones, defined as the most recent common ancestor (MRCA) clones, clonal in both primary and metastasis), iii) clones that are selected (“selected”, n=83 clones, defined as subclonal in the primary and clonal in metastasis; or absent in the primary and present in metastasis) (Figure 12A, STAR Methods). Clones that were private to the metastasis may have evaded detection as a minor subclone in the primary tumour, or arisen de novo in the metastasis. The ability to differentiate the clones that appear to be selected from those that are not, on a matched patient/tumour-specific background across the whole cohort, allowed us to characterise the features associated with metastasis. We observed no difference in the number of non-synonymous mutations between the two groups (based on Driver Panel profiling, median value = 4 for both, p=0.5295); however, wGII was significantly elevated in selected clones (median “selected” = 0.29 vs “not selected” = 0.17, p<0.001, Figure 12B). This was further supported by ploidy (determined by regional fluorescence activated cell sorting, FACS, STAR methods) also being significantly elevated in selected clones (mean DNA index “selected” = 1.29, “not selected” = 1.16, p<0.001, Figure 12B). Multi-region immunohistochemistry staining for Ki67 (STAR Methods) demonstrated a higher proliferation index in the clones that were selected, compared to those that were not (median Ki67 40% higher in selected versus non-selected clones, p=0.0317, Figure 12B). Finally, we observed increased allelic imbalance at the human leukocyte antigen (HLA) locus in selected versus non-selected clones (HLA allelic imbalance observed in n=12 “selected”, versus n=2 “not selected” clones), in concordance with the findings in non-small cell lung cancer (McGranahan et al., 2017).

Next, we considered the individual driver events, mutational or SCNAs, that are selected during progression to metastasis, by comparing, for each event, the proportion of times it was found in “selected” versus “not selected” clones (Figure 12C). We conducted this analysis across TRACERx Renal (n=38), HUC (n=26) and MSK (n=34) cohorts, providing a total dataset of 98 matched primary-metastasis pairs. Significance was calculated by comparing event selection proportions to null background rates as observed across all passenger events in each cohort (STAR Methods). “Selected” event frequencies were compared to “not selected”, and one event was found to be significantly enriched in “selected” clones: loss of chromosome 9p21.3 (p=0.0026, padj<0.1 after adjustment for multiple testing, Figure 12C). We also note that loss of chromosome 14q31.1 reached significance in the meta-analysis before correction for multiple testing (p=0.0275, padj=0.303), suggesting this and other driver events may also contribute to metastasis. We acknowledge that the risk of illusion of clonality (i.e. subclonal events appearing clonal within a single region of a primary tumour) limited our power to detect metastatic selection in the MSK, and to a lesser extent HUC, cohorts. For example, 53% of events in the TRACERx Renal cohort were subclonal, compared to only 31% in HUC and 11% in MSK cohorts.

Metastatic ccRCC has a variable spectrum of survival outcomes, with overall survival (OS) times ranging from short (<6 months) to prolonged (>5 years). Accordingly, we conducted OS analysis for the two events that were enriched in metastasising clones (P<0.05 from Figure 12C), to understand if they were also driving early ccRCC-related mortality, based on their presence/absence within the metastasising clone(s) of each case. OS data were not available for the MSK cohort. Hazard ratios (HR) were observed as follows (Figure 12D): 9p loss (HUC cohort HR=7.7, [2.8-20.8] 95% confidence interval, TRACERx cohort HR=infinity [no events in wild type group], p=0.0014 log-rank test across both cohorts, with study included as a term in the Cox model) and 14q loss (HUC cohort HR=1.5, [0.6-3.9] 95% confidence interval, TRACERx cohort HR=2.0, [0.5-8.2], non-significant). We note the strong association between reduced survival and 9p loss in the metastasising clone remained significant after correction for known clinical variables (p=0.046, adjusted for stage, grade and study) (Figure 12E). 9p deletions have been reported to confer a poor prognosis (El-Mokadem et al., 2014; La Rochelle et al., 2010); however, the hazard ratios in our analysis (HR=7.7 and HR=infinity) are higher than reported in those studies (HR=4.3 (El-Mokadem et al., 2014), HR=1.7 (La Rochelle et al., 2010)), which may reflect the greater sensitivity of profiling events within the metastasising clones.

Evolution of tumour thrombus

Intravascular tumour growth and formation of tumour thrombus (TT) is observed in ~15% of ccRCCs either in the renal vein (level I), extending to the infrahepatic inferior vena cava (IVC) (level II), retrohepatic or suprahepatic IVC (level III) or reaching the right atrium (level IV) (Psutka and Leibovich, 2015) (Figure 13). Untreated TT is associated with a poor outcome (Reese et al., 2013), but aggressive surgical management involving a thrombectomy can result in long-term survival in some patients (Psutka and Leibovich, 2015). In the TRACERx Renal cohort 33/100 ccRCC cases presented with venous tumour extension (Companion paper, Turajlic et al., 2018), only one of which was classified as a “VHL monodriver” tumour which harboured 9p loss (K253, Figure 13). Median survival in patients with TT was 17.8 months (Table 1D) with three patients dying within 6 months of surgery due to disease progression (K328, K263, K390); classified as “multiple clonal driver” (2 cases) and “VHL wt” (1 case) subtypes (Table 1D). In 24/33 cases we successfully profiled the TT along its length (Table 1D), and reconstructed driver phylogenies to infer the clonal relationship between primary tumour and the intravascular tumour extension (Figure 13). The TT was seeded directly by the most recent common ancestor (MRCA, the clone which harbours the full complement of alterations common to all the clones in the tumour; denoted by the first node in the phylogenetic tree) in ten cases (K239, K118, K250, K207, K059, K167, K276, K107, K253, K191; Figure 13). In other cases, the TT emerged from the more advanced subclones in the primary tumour, which harboured additional drivers, including 9p loss. Cases where the TT was seeded by the MRCA, suggesting intravascular growth was an early event, had an improved clinical outcome compared to the cases where a late-emerging clone seeded the TT. Whilst most primary tumours had evidence of ongoing evolution, tumour thrombi harboured limited additional alterations (94.9% of TT events were shared with the primary). Consistent with the propensity of TT to progress rapidly (Woodruff et al., 2013), we observed an elevated proliferation index within primary tumours presenting with TT compared to those without (P = 0.00095). Thus, the lack of fixation of new driver events in TTs may be due to their rapid extension and/or limited selective pressure in the intravascular space.

An interesting biological and clinical question relates to the ability of TT to act as a source of other metastases, and in this context, we profiled six patients with venous tumour extension and concurrent lymph node and/or visceral metastases. In some cases, distinct clones in the primary tumour seeded the TT and the metastasis (K326 and K390). Consistent with the worse prognosis conferred by lymph node involvement in ccRCC, in K390 the lymph node seeding clone harboured 9p loss while the TT clone did not. The same primary clone seeded both TT and metastasis in K096 and K427 (Figure 20); whilst in K107 and K263 (Figure 20) the metastasising clone appeared to first seed the thrombus, and then lymph node and adrenal sites, respectively, suggesting TT may act as a reservoir of metastases, consistent with the poor outcomes of untreated thrombus (Reese et al., 2013). The alternative explanation is that all the sites, including TT were seeded by a clone which evaded detection in the primary tumour.

Evolution of progressive disease

Within the TRACERx Renal primary-metastasis cohort of 38 patients, 25 developed progressive disease. The clinical outcomes in this group were variable, with overall survival time ranging from 1.5-54.4 months (Table 1A). Given that cytoreductive nephrectomy and metastasectomy are performed to achieve longer disease-free survival, we considered the evolutionary features of cases that progressed rapidly (i.e. multiple sites of disease progression within 6 months of surgery) versus those with attenuated progression (i.e. single site progression <6 months; or multi-site progression >6 months), capturing both the speed and the extent of metastatic spread (Table 1E, Figure 14A). One patient (K328) died from operative complications and was excluded from the analysis. Eight cases were classified as having “rapid progression”: K376, K326, K263, K107, K153, K446, K390 and K066 (Figure 14A). This group was enriched for “multiple clonal driver”, “VHL wild type” and “BAP1 driven” evolutionary subtypes (Figure 14B) and associated with lower ITH and elevated wGII relative to the cases with attenuated progression (Figure 14C). All primary tumours in this group harboured loss of 9p (Figure 14A). They were more likely to progress to liver metastases (6/8) compared to cases in the “attenuated progression” group (1/16) (p=0.0013), and had a short overall survival (Figure 14A). Particularly notable in this group was case K153 in which lymph node and lung metastases were seeded from the same “BAP1 driven” subclone, which had high wGII and harboured 9p loss, while the competing “PBRM1->SETD2” subclone failed to metastasise.
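The comparison of liver metastasis frequency between the two groups (6/8 versus 1/16) can be checked with a short worked example. The text does not name the test used; the sketch below assumes a two-sided Fisher's exact test, and the function name is hypothetical. Under that assumption the result is about 0.0013, in line with the reported value.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / total

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= observed * (1 + 1e-9))


# Rows: rapid progression (6 with liver metastases, 2 without),
#       attenuated progression (1 with liver metastases, 15 without).
p_value = fisher_exact_two_sided(6, 2, 1, 15)
print(f"{p_value:.4f}")  # 0.0013
```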

16 cases were classified within the “attenuated progression” group: K379, K096, K208, K071, K243, K206, K520, K180, K029, K228, K427, K253, K229, K386, K276, K280 (Figure 14A). This group was enriched for “PBRM1->SETD2”, “PBRM1->PI3K”, “PBRM1->SCNA” and “VHL monodriver” evolutionary subtypes, and the primary tumours were characterised by a higher ITH index and lower wGII, as compared to the “rapid progression” group (Figure 14C). Disease progression interval was longer and often limited to a single metastatic site. Consequently, in some patients, metastatic disease was successfully controlled with further surgery (K029) or radiotherapy (K096, K228, K208, K243), consistent with the lack of other occult metastases. For example, case K029 presented with spatially separate bone metastases three years apart. The metastasising clone harboured a PBRM1 mutation, but not 9p loss. Thus, although ITH is associated with metastatic disease, the pattern of metastases suggests a reduced metastatic efficiency, possibly due to increased clonal competition. This observation is consistent with the notion that heterogeneous tumours harbour clones with a wide range of metastatic competence.

Evolution of latent metastases

We compared the time from primary tumour to metastasis, by tissue site, across the combined TRACERx/HUC/MSK cohorts. In keeping with the known modes of late recurrence in ccRCC, we observed the pancreatic metastases to have the longest time to presentation (median 15 years, compared to 3 years for all other tissue sites, Figure 15A). Intriguingly, pancreatic metastases were found to have significantly lower wGII, as compared to all other metastatic tissue sites (p=0.0489, Figure 15B). A shared clonal ancestry was confirmed between primary and metastatic sites in all 3 pancreatic metastasis cases, and we observed a strikingly low number of additional driver alterations in pancreatic metastases, despite the extended latency time (Figure 15C). In the case of SP006, the pancreatic metastasis occurred 17 years after the primary tumour was resected, and the latent clone mapped directly back to the founding MRCA clone, suggesting early divergence of a primitive ancestral clone. Similarly, in SP023, a case with pancreatic metastasis at 15 years, the latent clone derived from the primary MRCA and only acquired one additional driver mutation in MTOR (Figure 15C, Figure 18). Finally, SP058 presented with pancreatic metastasis at 8 years, with a single additional driver event ( SETD2 mutation) in metastasis, while we detected alternative subclones with a greater number of driver events in the primary tumour (Figure 15C). The seeding by the ancestral clone and the lack of 9p loss suggests that the pancreas may be a more permissive metastatic niche for ccRCC. The reasons for the characteristic latency of pancreatic metastases remain unknown, but are likely to involve interactions with the tumour microenvironment, the immune system and altered epigenetic states (Giancotti, 2013).

Spatial resolution of metastases through post-mortem sampling

To explore the clonal dynamics of multiple metastases we sampled them at post-mortem in two cases (Table 1A, Figure 16) through the PEACE study (NCT03004755). Case K548 presented with a primary ccRCC which had already disseminated to multiple sites including adrenal, loco-regional and mediastinal lymph nodes, liver, and pleura (Table 1F). All disease sites, including the primary tumour, were sampled at post-mortem (Figure 16A). Clonal mutations were detected in VHL, PBRM1 and SETD2 genes, and accordingly this case was categorised as a “multiple clonal driver” subtype. The primary tumour had low ITH and high wGII, and all 13 metastatic sites sampled were seeded by the dominant clone which was characterised, in addition, by 9p loss (Figure 16A). We note this patient progressed rapidly through two lines of systemic therapy and died six months after the diagnosis of ccRCC (Table 1F). The evolutionary features of the primary tumour are in keeping with those we observe in the TRACERx Renal cases with “rapid progression” (Figure 14A).

In case K489 the patient presented with a primary ccRCC and underwent a nephrectomy with curative intent (Figure 16B). Seven years following surgery two pancreatic metastases were detected on imaging and the patient underwent a complete metastasectomy (Table 1F, Figure 16B). Four years later they presented with lymph node and lung metastases (Figure 16B). They received multiple lines of systemic therapy, subsequently developing metastases at additional sites including liver and bone, and succumbing to their disease 17 years after the original diagnosis (Table 1F, Figure 16B). We obtained fresh samples at post-mortem from multiple lymph node sites, liver, lung, and contralateral kidney metastases; and we accessed the primary tumour and the resected pancreatic metastases from archived FFPE material. The primary tumour harboured a clonal VHL mutation and 3p loss, and a subclonal PBRM1 and multiple SETD2 mutations, indicating parallel evolution. These features were consistent with the “PBRM1->SETD2” evolutionary subtype (Companion paper, Turajlic et al., 2018). In accordance with our observations in the TRACERx Renal cohort (Figure 14), the pattern of disease spread was consistent with “attenuated progression”. The two pancreatic metastases were seeded by separate clones (indicating potentially distinct waves of metastatic spread), neither of which harboured 9p loss. By contrast, subsequent metastases to the lymph nodes, liver, lung and kidney were seeded by an advanced clone harbouring additional SCNA events, including loss of 9p.

Supplementary Table S1A

Tumour characteristics

Table S1B

Table S1C

Table S1D

^*TNM staging according to the AJCC, 7^th Edition

Table S1E

Example 3 - association of driver SCNAs and weighted genomic instability index (WGII) in an independent dataset

Aim

To assess association of driver SCNAs and weighted genomic instability index (WGII) in an independent dataset.

Methods

The AACR Project Genomics, Evidence, Neoplasia, Information, Exchange (GENIE) is a multi-phase, multi-year, international data-sharing project that aims to catalyze precision cancer medicine. In this analysis we leveraged data from the MSK-IMPACT cohort within the GENIE dataset; patients have targeted panel sequencing of 341 putative cancer genes with a SNP tiling array to supplement copy number analysis (1).

Copy number alterations were then called as losses or gains relative to the overall sample-wide estimated ploidy. Using the cytobands described in Turajlic et al (2), an arm-level gain or loss was called when >30% of the chromosome arm had copy number gain or loss. The average proportion of the genome with aberrant copy number, weighted on each of the 22 autosomal chromosomes, was estimated as the wGII (3). wGII was dichotomised around the median value into ‘High’ and ‘Low’. Contingency tables were constructed for each chromosome arm-level event and the relative risk of death and 95% CI calculated.
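The quantities described in this paragraph can be illustrated with a short sketch. The function and parameter names are hypothetical, and the confidence interval uses the standard log-scale (Katz) approximation, which is an assumption because the text does not state how the interval was derived.

```python
import math
from typing import Mapping, Tuple


def wgii(aberrant_fraction_by_chrom: Mapping[str, float]) -> float:
    """Mean over autosomes of the fraction of each chromosome with aberrant copy number."""
    fractions = list(aberrant_fraction_by_chrom.values())
    return sum(fractions) / len(fractions)


def arm_level_event(altered_fraction_of_arm: float, threshold: float = 0.30) -> bool:
    """Call an arm-level gain or loss when more than 30% of the arm is altered."""
    return altered_fraction_of_arm > threshold


def relative_risk(dead_with: int, alive_with: int,
                  dead_without: int, alive_without: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Relative risk of death for patients with an event versus without, with a 95% CI.

    All four cell counts must be non-zero for the log-scale interval (assumed method).
    """
    risk_with = dead_with / (dead_with + alive_with)
    risk_without = dead_without / (dead_without + alive_without)
    rr = risk_with / risk_without
    # Standard error of ln(RR) for a 2x2 contingency table.
    se = math.sqrt(1 / dead_with - 1 / (dead_with + alive_with)
                   + 1 / dead_without - 1 / (dead_without + alive_without))
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)
```

Averaging per-chromosome fractions, rather than taking the altered fraction of the whole genome, keeps large chromosomes from dominating the score.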

Results

We identified 202 patients with clear cell renal cell carcinoma (ccRCC) who had clinical outcome data (“Alive”, “Deceased”; no time to event analysis) as reported in supplementary information previously (4). Relative risk (RR) estimates and 95% CI are shown in Table 1 and represented as a Forest Plot in Figure 22. We saw a significant association with loss of 9p and high WGII score. There was a non-significant trend for 14q loss and an increased risk of death in this cohort.

Table 1: Relative Risk estimate and 95% CI

References

1. Cheng DT, Mitchell TN, Zehir A, Shah RH, Benayed R, Syed A, et al. Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT): A Hybridization Capture-Based Next-Generation Sequencing Clinical Assay for Solid Tumor Molecular Oncology. J Mol Diagn. 2015;17(3):251-64.

2. Turajlic S, Xu H, Litchfield K, Rowan A, Horswell S, Chambers T, et al. Deterministic Evolutionary Trajectories Influence Primary Tumor Growth: TRACERx Renal. Cell. 2018;173(3):595-610.e11.

3. Endesfelder D, Burrell R, Kanu N, McGranahan N, Howell M, Parker PJ, et al. Chromosomal instability selects gene copy-number variants encoding core regulators of proliferation in ER+ breast cancer. Cancer Res. 2014;74(17):4853-63.

4. Zehir A, Benayed R, Shah RH, Syed A, Middha S, Kim HR, et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat Med. 2017;23(6):703-13.

All documents referred to herein are hereby incorporated by reference in their entirety, with special attention to the subject matter for which they are referred. Various modifications and variations of the described methods and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention.

Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention which are obvious to those skilled in molecular biology, cellular immunology or related fields are intended to be within the scope of the following claims.

Claims

1. A method for predicting the response of a patient with clear cell renal cell carcinoma (ccRCC) to drug treatment or surgery, wherein said method comprises analysing for a modification in a gene selected from BAP1 , PBRM1 , SETD2, PTEN, VHL, mTOR, PIK3CA, TSC1 and TSC2, and/or a somatic copy number alteration (SCNA) selected from loss 9p, loss 14q, gain 8q and gain 12p in a sample from said patient.

2. The method of claim 1 wherein the presence of: i) an inactivating modification in the VHL gene and in two or more of BAP1 , PBRM1 , SETD2 and PTEN genes; (ii) an inactivating modification in the BAP1 gene, and optionally the VHL gene;

(iii) an absence of an inactivating modification in the VHL gene; and/or

3. The method of claim 1 wherein the presence of:

(i) an inactivating modification in the PBRM1 gene, and in the SETD2 gene;

4. Use of a gene selected from BAP1 , PBRM1 , SETD2, PTEN, VHL, mTOR, PIK3CA, TSC1 and TSC2, and/or a somatic copy number alteration (SCNA) selected from loss 9p, loss 14q, gain 8q and gain 12p as biomarkers for predicting the response of a patient with ccRCC to drug treatment or surgery.

5. The method or use according to any preceding claim, wherein said method or use identifies a patient suitable for treatment with drug treatment or surgery.

6. The method or use according to any preceding claim wherein said sample is a tumour biopsy.

7. The method or use of claim 6 wherein more than one biopsy is taken from said patient, preferably more than 8 biopsies.

8. The method or use according to any preceding claim wherein said patient is a human.

9. The method or use according to any preceding claim wherein said clear cell renal cell carcinoma is stage I, II, III or IV.

10. The method or use according to any preceding claim wherein said drug treatment is an anti-ccRCC drug selected from Bevacizumab, Nivolumab, Sorafenib Tosylate, Sorafenib, Lenvatinib, Tivozanib, Sutent, Axitinib, Pazopanib, Cabozantinib, Sunitinib, Temsirolimus, Everolimus, an interleukin, preferably interleukin-2 (IL-2), an interferon, Ipilimumab, Pembrolizumab, Atezolizumab, Avelumab and Durvalumab.

11. The method or use according to any preceding claim wherein said surgery is surgical resection of a ccRCC tumour or metastasis.

12. A method of treating ccRCC in a patient predicted to have an improved response to drug treatment by the method according to any preceding claim comprising administering a therapeutically effective amount of said drug treatment to said patient.

13. A method of treating ccRCC in a patient predicted to have an improved response to surgery by the method according to any preceding claim comprising surgical resection of a ccRCC tumour or metastasis.

14. An anti-ccRCC drug for use in treating a patient with ccRCC, wherein said patient is predicted to have an improved response to drug treatment by the method according to any preceding claim.

15. Use of an anti-ccRCC drug in the manufacture of a medicament for use in treating a patient with ccRCC, wherein said patient is predicted to have an improved response to drug treatment by the method according to any preceding claim.

16. A kit for use in identifying patients with ccRCC who are predicted to have an improved response to drug treatment or surgery, said kit comprising primers suitable for identifying mutations in a gene selected from BAP1 , PBRM1 , SETD2, PTEN, VHL, mTOR, PIK3CA, TSC1 and TSC2, and/or a somatic copy number alteration (SCNA) selected from loss 9p, loss 14q, gain 8q and gain 12p, and wherein the kit optionally comprises a set of instructions.

17. A method of predicting the prognosis of a patient with ccRCC, wherein said method comprises analysing the weighted genome instability index (wGII) and the intratumour heterogeneity (ITH) index in a sample from said patient.

18. The method of claim 17 wherein low ITH and low wGII are indicative of a good or improved prognosis.

19. The method of claim 17 wherein high ITH and low wGII, or high ITH and high wGII are indicative of an intermediate prognosis.

20. The method of claim 17 wherein low ITH and high wGII are indicative of a worse prognosis.

21. A method, use, composition for use or kit substantially as described herein and with reference to the accompanying Examples.
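By way of illustration only, the prognostic grouping recited in claims 17 to 20 can be expressed as a small decision function. The function name and the boolean inputs are assumptions made for this sketch; the claims do not specify the thresholds that separate high from low ITH or wGII, so the caller is expected to apply them beforehand.

```python
def prognosis_group(ith_high: bool, wgii_high: bool) -> str:
    """Map ITH and wGII status to the prognosis group recited in claims 18 to 20."""
    if not ith_high and not wgii_high:
        return "good"          # claim 18: low ITH and low wGII
    if not ith_high and wgii_high:
        return "worse"         # claim 20: low ITH and high wGII
    return "intermediate"      # claim 19: high ITH with either low or high wGII
```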