Abstract
While triple-negative breast cancer (TNBC) is known to be heterogeneous at the genomic and transcriptomic levels, spatial information on tumor organization and cell composition is still lacking. Here, we investigate TNBC tumor architecture including its microenvironment using spatial transcriptomics on a series of 92 patients. We perform an in-depth characterization of tumor and stroma organization and composition using an integrative approach combining histomorphological and spatial transcriptomics. Furthermore, a detailed molecular characterization of tertiary lymphoid structures leads to identify a gene signature strongly associated to disease outcome and response to immunotherapy in several tumor types beyond TNBC. A stepwise clustering analysis identifies nine TNBC spatial archetypes, further validated in external datasets. Several spatial archetypes are associated with disease outcome and characterized by potentially actionable features. In this work, we provide a comprehensive insight into the complexity of TNBC ecosystem with potential clinical relevance, opening avenues for treatment tailoring including immunotherapy.
Similar content being viewed by others
Introduction
Triple-negative breast cancer (TNBC) accounts for 15–20% of all breast cancers (BC) and is associated with poor prognosis1,2,3. TNBC is known to be a heterogeneous disease, which poses important therapeutic challenges. As such, there remains an unmet need for more effective treatment options for patients affected by TNBC4,5. Multiomics analyses performed on bulk tumors previously identified several TNBC molecular subtypes, namely basal-like (BL), immunomodulatory (IM), luminal androgen receptor (LAR), mesenchymal (M) and mesenchymal stem-like (MSL)6,7,8,9,10. These molecular subtypes display distinct transcriptomic, genomic and tumor microenvironment profiles and are associated with different clinical outcomes and potential differences in treatment response10,11,12 (Fig. 1). Among them, the IM subtype is characterized by a higher expression of immune gene signatures and potentially targetable immune checkpoints and is associated with a better prognosis. The BL subtype has an intermediate prognosis and is characterized by high genomic instability, with DNA repair gene deficiency, and a high rate of TP53 mutations. In contrast, M and MSL tumors are mainly associated with angiogenesis and stroma signatures. Finally, the LAR subtype, which is characterized by androgen receptor (AR) expression and enriched for PIK3CA, AKT1, and CDH1 mutations, is usually associated with a worse prognosis6,9,10. Furthermore, the tumor microenvironment has emerged as a crucial factor in the prognosis and treatment response of TNBC. Quantitative levels of tumor-infiltrating lymphocytes (TILs), their spatial organization, and the formation of lymphoid aggregates have all shown promise as prognostic and predictive markers for chemotherapy and immunotherapy responses in TNBC13,14,15,16,17,18. Nevertheless, RNA sequencing (RNA seq) performed on bulk tumors has major drawbacks, as it does not capture geographic intratumoral heterogeneity, epistatic interactions between different tumor clones, or the tumor microenvironment19,20,21. This approach is, therefore, inadequate to comprehensively dissect the complexity of the tumour ecosystem.
Recently, Spatial Transcriptomics (ST) has emerged as a tool for spatially resolved transcriptome-wide expression analyses, allowing tissue exploration in an unsupervised manner22,23. Several publications highlight the power of ST, either alone or integrated into single-cell RNA sequencing, for the investigation of tumor heterogeneity including its tumor microenvironment24,25,26,27,28. The accumulation of ST data in the BC field is still in its early stages and therefore fuels the interest to expand this technology to a broader cohort.
Here, we aim to investigate the spatial architecture that characterizes TNBC heterogeneity and assess its impact on response to therapy and clinical outcome. To this end, we perform ST on a cohort of 92 TNBC patients. We first examine intratumoral heterogeneity at the pure histomorphological level by performing a detailed morphological annotation of the tumors, enabling us to capture the histomorphological intratumoral heterogeneity across the different TNBC molecular subtypes. In a second step, we dissect tumor and stroma compartments by interrogating the associations between specific morphological, transcriptomics and cellular features. Using spatial deconvolution, we show that tumor and stroma compartments have different contributions to TNBC molecular classification. We further focus our analysis on tertiary lymphoid structures (TLS), a key immune aggregate, and develop a specific 30-gene TLS signature derived from ST data. Of interest, the prognostic and predictive values to immune checkpoint blockade (ICB) are shown and validated in TNBC and non-TNBC BC cohorts as well as in other tumor types. Finally, spatial expression data allow us to fine-tune our understanding of the TNBC ecosystem with the identification of nine spatial archetypes (SA) with potential clinical relevance.
Results
Leveraging TNBC expression patterns using ST
In order to comprehensively investigate the morphological landscape and transcriptomic program of intratumoral heterogeneity that characterizes TNBC, we conducted ST analyses on a series of 96 primary TNBC samples from 94 patients (Fig. 2). Characteristics of the cohort are summarized in Supplementary Table 1 and Supplementary data 1. In particular, we used the glass slide ST array with 1934 spatially barcoded spots of 100 μm diameter29,30. The transcriptional profile of each spot is composed of a mixture of various cell types, with up to 200 cells per spot. In this retrospective study, we used fresh-frozen surgical specimens collected from the tissue bank of the Jules Bordet Institute (Brussels, Belgium). Three of the four consecutive cryosections were used for ST analysis, with the fourth section being used for double immunochemistry (IHC) staining to assist histomorphological annotation of TILs) CD3+ T cells and CD20+ B cells. Additional consecutive sections were used to perform bulk RNA sequencing for correlative analyses with the ST findings.
Of note, high-quality sequencing data were obtained for 281 out of the 288 ST subarrays performed, corresponding to 92 out of 94 patients. In total, 270,310 ST spots were analyzed, with a median of 960 spots per subarray and a median of 2554 genes at the spot level (Supplementary data 2).
Molecular TNBC subtypes display distinct morphological patterns
In order to explore spatial intratumoral heterogeneity at the morphological level, two specialized breast pathologists conducted a thorough manual annotation of the hematoxylin and eosin (H/E) stained slides associated with the ST slides. A total of 15 histomorphological categories were depicted and further grouped into three broad classes: tumor (invasive and in situ), stroma (fat tissue, lactiferous ducts, vessels, stroma cells and acellular stroma) and immune features (lymphocytes and TLS) (Supplementary Table 2). Furthermore, we investigated TILs organization and localization by assigning each sample to one of the four classes according to the Tumor Immune Micro-Environment (TIME) classification17: fully inflamed (FI), stroma restricted (SR), margin restricted (MR) and immune desert (ID) (Supplementary data 1). As illustrated in Fig. 3a, b, TNBC appears as a highly heterogeneous entity made up of a mosaic of variable morphological features.
We first interrogated the histomorphological composition of the tumor samples according to the TNBC molecular subtypes and observed that IM, BL and M subtypes were enriched in tumor content while LAR and MSL subtypes were enriched in stroma components (Fig. 3c and Supplementary Fig. 1a). As expected, lymphocytes were mostly present in the immune-related IM subtype whereas LAR and MSL subtypes exhibited a higher prevalence of normal structures including fat tissue and vessels (full results in Supplementary data 3).
We next aimed to examine the spatial organization of tumor and stroma cells within each of the TNBC molecular subtypes. For this purpose, neighboring individual tumor cells were grouped and classified as tumor patches, while adjacent stroma cells/structures were grouped and classified as stroma patches. The patches were further characterized by three different metrics: their number, mean size, and ‘evenness’, a patch size diversity index (Fig. 3d). As represented in Fig. 3e, BL and IM subtypes were characterized by a smaller number of larger tumor patches associated with a higher diversity, whereas LAR, M and MSL subtypes displayed a higher number of small, dispersed tumor patches (Supplementary data 4). Stroma patches exhibited a reverse pattern across the TNBC molecular subtypes (Supplementary Fig. 1b, Supplementary data 5).
These data show that the TNBC molecular subtypes display distinct patterns of tumor organization and cell composition that could potentially be captured from routine H/E slides using deep learning algorithms.
Spatial deconvolution of the different TNBC molecular subtypes shows different contributions of tumor and stroma compartments
We next sought to use the spatial expression data to investigate the contribution of the tumor and non-tumor (defined as ‘stroma’) compartments to the TNBC molecular classification as determined by bulk transcriptomic analysis.
To do so, we first developed regressors to estimate the composition of each ST spot at the gene expression level considering the 15 previously described histomorphological categories grouped into a total of nine categories: tumor, stroma, necrosis, fat tissue, vessels, lactiferous ducts, in situ, TLS and lymphocytes (Supplementary Table 3). These regressors accurately predicted most histomorphological categories (Supplementary Fig. 2a, Supplementary data 6). We then designed a deconvolution method to estimate the presence of each of the nine morphological categories at the spot level from the ST gene expression data, bridging morphological information to ST data (details in METHODS). This allowed us to compute tumor and stroma pseudobulks (PB), which are a numerical approximation of the gene expression profiles deriving specifically from tumor and non-tumor cells, respectively (Fig. 4a and Supplementary Fig. 2b). Of note, stroma compartment was defined as the sum of all categories, except tumor, in situ and necrosis categories.
We next computed the TNBC molecular subtypes on the tumor and stroma PB separately in order to dissect which molecular and cellular features of the tumor or stroma compartment drive the TNBC classification. Of note, similar analyses were performed on the global pseudobulk of the whole ST array, mimicking the most standardly used bulk RNA sequencing (Fig. 4a, b; Supplementary data 7). As shown in Fig. 4b, the TNBC classification based on the tumor compartment only identified three subtypes, namely LAR, M and BL suggesting that IM and MSL subtypes as defined by bulk RNA analysis rely on stroma features. IM subtype is defined by BL tumors with high levels of infiltrating lymphocytes in contact with tumor cells or in the stroma, corresponding to the FI and SR TIME classes respectively. MSL subtype is mostly defined by M tumors associated with MSL stroma whereas M subtype was composed of M tumors associated with either M or MSL stroma. Finally, the BL subtype was mainly characterized by basal tumors with SR TILs, whereas LAR subtype is composed by LAR tumors with the absence or low levels of MR TILs. Of note, the few samples classified as IM in the tumor compartment (N = 6) and BL in the stroma compartment (N = 5) were probably artefactual and due to the limitation of the deconvolution method to discriminate the contamination of immune and tumor cells in each compartment respectively (Fig. 4b). These results demonstrate that both tumor and stroma compartments are essential for the TNBC classification and highlight the extent of the TNBC subtype heterogeneity that cannot be captured by bulk RNA seq.
We then investigated the biological processes that characterize each tumor and stroma compartment underlying the TNBC classification (Supplementary Tables 4–7). When focusing on the tumor compartment, M tumors were associated with high epithelial-mesenchymal transition (EMT) signaling and expression of PDGFRA and NTRK2 genes. LAR tumors were enriched with metabolism and PI3K/AKT/mTOR signaling, while BL tumors showed higher proliferation and DNA repair hallmarks (Fig. 4c, Supplementary Fig. 3a, Supplementary Tables 8 and 9). Similar analyses performed on the stroma compartment revealed that MSL stroma showed higher angiogenesis and cancer-associated fibroblast (CAF) signals while M stroma was associated with higher neutrophil infiltration and lower immune signals, consistent with M stroma exhibiting ID TIME phenotype. LAR stroma showed higher adipogenesis, whereas IM stroma was characterized by higher levels of several immune signatures, including TLS, tissue-resident memory cell (Trm) and interferon gamma signaling pathway inflammatory CAF (IFNγ-iCAF) signatures (Fig. 4c, Supplementary Fig. 3b, Supplementary Tables 10 and 11).
We next assessed whether the combinations of tumor and stroma components from different molecular subtypes within a given sample had an impact on clinical outcome (Supplementary Table 12). As previously highlighted, the M subtype can derive from the association with M stroma, or MSL stroma (Fig. 4b, d). Interestingly, patients assigned as M subtype had a better distant relapse-free survival (DRFS) when the stroma was MSL and enriched in immune activation signaling including two inflammatory CAFs namely IFNγ-iCAF and interleukin pathway inflammatory CAF (IL-iCAF), than when the stroma was M (Fig. 4e, f). Our findings were further corroborated using the TIME classification where the MSL and M stroma were predominantly associated with the MR and ID immunophenotypes respectively (Fig. 4b). These results indicate substantial intra-patient heterogeneity within each TNBC molecular subtype, with different contributions of tumor and stroma features associated with distinct molecular characteristics and clinical outcome.
Taking advantage of having ST data, we further explored the biological processes underlying the previously described tumor patches organization including number and size by performing a gene set variation analysis (GSVA) on the MSigDB hallmarks31 and other relevant gene signatures on the tumor and stroma PBs. This analysis revealed that tumor with larger tumor patches were enriched in proliferation (‘MYC targets v2’, ‘CIN70’, ‘GGI’) and immune signaling (‘Immune2’, ‘TLS_Lundeberg’, ‘Immune1’) whereas tumor organized in smaller patches had higher metabolism (‘xenobiotic metabolism’, ‘fatty acid metabolism’, ‘oxidative phosphorylation’) and hormone-related signaling (‘estrogen response late/early’, ‘androgen response’) (Supplementary Fig. 4a, b).
Characterization of TLS and its association with response to immunotherapy
TLS are ectopic lymphoid organs that have been shown to be associated with favorable prognosis and response to therapy including immunotherapy32,33,34,35,36. The underlying mechanism is thought to be a better tumor antigen presentation to T cells, enhancing a specific anti-tumor response37.
Currently, there is no consensus for TLS detection from H/E or IHC-stained tissue sections38. Moreover, the gene expression of TLS is difficult to obtain, as they are small aggregates composed of different cells which could be in common with TILs or non-TLS immune aggregates. For this reason, both bulk and single-cell sequencing fail to interrogate their gene expression profiles. ST analysis gave us the unique opportunity to specifically characterize these structures. This allowed us to derive gene expression profiles specific to TLS compartments by deconvoluting the ST gene expression data (Supplementary Fig. 2a, Supplementary data 6 and 8). As shown in Fig. 5a, the projection of TLS estimated by the previously developed regressor co-localizes with histo- and morphologically defined TLS demonstrating the reliability of the regressor to predict TLS from gene expression data.
We then interrogated how TLS composition differed from non-organized lymphocyte compartment. As expected, TLS were enriched in all B cell subsets, CD4+ (central) memory and naïve T cells, as well as mast cells (Fig. 5b and Supplementary Fig. 5a). A functional enrichment analysis showed an activation of ‘mitotic G2/M transition checkpoint’, ‘vascular endothelial cell proliferation’, ‘V/D/J recombination’ and ‘regulation of cell chemotaxis to fibroblast growth factor’ in TLS compared to single lymphocytes, suggesting that TLS formation may be associated with immune cell proliferation, angiogenesis, adaptative humoral response and tissue remodeling (Fig. 5c, Supplementary Fig. 5b, c, Supplementary data 9–12).
We then aimed to develop a specific TLS signature (details in METHODS) by comparing gene expression data from TLS versus lymphocyte compartment as well as TLS versus other non-immune-related compartments (Fig. 5d, Supplementary data 13 and 14). This led to the development of the 30-gene TLS ST signature which includes eleven B cell-specific genes (CD79A, CD79B, TNFRSF13C, BLK, CD22, CD37, MS4A1, NIBAN3, CD19, IKZF3, LINC00926), six TLS priming genes (CXCR5, LTB, SELL, CCL19, POU2AF1, CXCL13), four immunoglobulin genes (FCRLA, VPREB3, FCMR, AL928768.3), three T cell-specific genes (RASGRP2, TCF7, RIPOR2), four immune response-related genes (RAC2, IL16, CCR7, CD52) as well as TCL1A and ATP2A3 genes (Supplementary Table 13). As expected, there was an overlap between the genes of our signature and previously reported TLS signatures16,24,39, strengthening the reliability of our ST approach (Supplementary Fig. 6a). We further compared the performance of our TLS ST 30-gene signature with the other TLS signatures in discriminating TLS from the nine histo-pathological categories, in particular TILs (represented by the lymphocytes compartment), and showed that our TLS signature demonstrated the highest specificity to the TLS compartment (Supplementary Fig. 6b, Supplementary data 15).
Furthermore, the projection of our TLS ST signature on the ST slide accurately overlapped with the regions annotated as TLS by the pathologist (Fig. 5a, e). The high accuracy of TLS prediction by our signature was quantitatively assessed by the area under the curve (AUC) (Supplementary Fig. 6c). Even when compared with other TLS signature, it demonstrated its high specificity for TLS detection (Supplementary Fig. 6d, Supplementary data 16 and 17). As expected, the highest levels of the TLS signature were observed in the IM subtype and FI tumors, whereas the lowest levels were observed in the M subtype and ID tumors (Fig. 5f, g, Supplementary data 18 and 19).
We next evaluated the predictive value of our TLS ST signature for clinical outcome and response to immunotherapy using our ST cohort, publicly available BC datasets (Molecular Taxonomy of Breast Cancer International Consortium (METABRIC)40, SCAN-B41 and I-SPY242), as well as a dataset of metastatic non-breast tumors treated with immunotherapy43. As shown in Fig. 6a, patients with high expression levels of the TLS ST signature were associated with a good prognosis in our ST cohort and in external METABRIC and SCAN-B TNBC cohorts. Of interest, higher levels of the TLS ST signature were associated with higher pathological complete response (pCR) rates mainly in early-stage TNBC patients treated with immunotherapy in the I-SPY2 study (Fig. 6b and Supplementary Fig. 7a, b). Similar results were found in non-BC cohorts treated with immunotherapy including metastatic melanoma, pancreatic and bladder cancers, where the TLS ST signature outperformed other gene signatures in predicting clinical outcomes (Fig. 6c, d and Supplementary Fig. 7c, d). This suggests a key role of TLS in obtaining a sustainable, adaptive antitumor immune response in the vicinity of the tumor area.
Finally, we assessed whether our TLS ST signature predicted response to immunotherapy independently from previously reported immune signatures. As illustrated in Fig. 6e, our TLS signature retained its predictive ability of response to immunotherapy after adjusting for various immune signatures, including other TLS signatures. In contrast, the predictive efficacy of several immune signatures diminished after controlling for our TLS ST signature (Fig. 6f, Supplementary Figs. 8 and 9). Overall, these results highlight the importance of an organized immune response in predicting response to immunotherapy that can be efficiently and specifically captured by our ST signature.
Identification of fourteen spatial molecular patterns shared across TNBC patients
To further assess intra-tumor spatial heterogeneity, we performed an unsupervised K-means clustering of the spots of each sample based on ST expression data allowing the identification of spatial molecular patterns (Fig. 7a, details in METHODS). This analysis revealed 418 clusters (range 2–8 clusters per sample) among the 94 TNBC samples, each cluster corresponding to a set of spots sharing a similar gene expression pattern (Fig. 7b, Supplementary Fig. 10).
In order to investigate the intra-tumor heterogeneity according to the TNBC classification, we first assigned each individual cluster to a specific molecular subtype. As shown in Fig. 7c, the majority of the TNBC samples harbored clusters from two to three different molecular subtypes with only 16 among the 94 tumor samples being only composed of clusters from the same subtype as the global PB. Half of them belonged to the IM subtype with a FI immunophenotype (N = 9/16). The majority of the BL subtype encompassed one or two IM clusters corresponding to a SR immunophenotype (N = 11/13) whereas 75% of the M subtype included both M and MSL clusters associated with either SR or ID immunophenotype, respectively (N = 8/21 for each class). Similarly, many LAR tumors contained MSL clusters with ID immunophenotype (N = 5/9) (Supplementary data 20). These data revealed the extent of intra-patient heterogeneity beyond TNBC classification and its association with TILs localization.
We next interrogated whether individual clusters were shared across patients by performing an inter-patient megaclustering of the 418 individual intra-patient clusters with the K-means algorithm (details in METHODS). This analysis revealed 14 megaclusters (MC) that recapitulate the spatial gene expression heterogeneity that characterizes TNBC (Fig. 7a, d, Supplementary Fig. 11a, Supplementary data 21–23). The projection of the MCs is illustrated in Fig. 7b and Supplementary Fig. 10.
Characterization of each MC including morphological annotations, TNBC classification and gene expression features is summarized in Fig. 7d and Supplementary Table 14. MC 1, 2, 3, 5 and 6 were characterized by high tumor content, high proliferation and low immune signal, in contrast to MC 7, 9, 10, 14 associated with high levels of several immune signatures (Supplementary Fig. 11b, c). MC 9 showed the highest level of TLS signature (‘TLS ST’) indicating an organized immune response in contrast to MC 11, 12 and 13, displaying features in line with immune inhibition including stroma activation. A high expression of the CD73 gene in MC 11 and 14 suggests an activation of the adenosine pathway44 (Supplementary Fig. 12). Furthermore, MC 14 exhibited an infiltration of innate immune cells with high levels of dendritic cells (DC), monocytes and macrophages. Together, MC 13 and 14 were broadly enriched in adipocytes (Supplementary Fig. 13).
Interestingly, MC 1 to 6 were characterized by high expression of DNA repair signatures, with MC 1 displaying the highest expression of homologous recombination repair and mismatch repair genes as well as high expression of several demethylase enzymes (Fig. 7d, Supplementary Fig. 11b and 12). MC 1 was most likely infiltrated by polarized Th2 cells45, while MC 2 was more enriched with pro-tumorigenic neutrophils46 (Supplementary Fig. 13). Furthermore, MC 2 expressed higher level of the CD47 gene, which plays a role in the maintenance of neutrophils associated with a delay in neutrophil apoptosis47 (Supplementary Fig. 12). MC 5–6, mainly found in IM and BL subtypes, presented high mTORC1 and PI3K/AKT/mTOR signaling, whereas MC 11 and MC 14, identified in M and MSL subtypes, exhibited high levels of angiogenesis and EMT. MC 8, which was specific to the LAR subtype, displayed the highest expression of the AR gene as well as high expression of several signatures capturing metabolism and adipogenesis (Fig. 7d, Supplementary Fig. 11b, c).
We finally assessed whether individual MC could predict clinical outcome. To do so, we re-estimated the MC contributions by deconvolution at the spot level, and then performed univariate and multivariate (adjusted for clinic-pathological characteristics) survival analyses (Supplementary Fig. 14). As illustrated in Fig. 7e, MC 5 was associated with better outcome, in contrast to MC 2 and MC 12 showing a trend toward poor survival. To validate our findings in other datasets, we developed a deconvolution method (details in METHODS) to assess the presence of each MC from bulk gene expression data (Supplementary Figs. 15–17). Interestingly, the association between all MCs and clinical outcomes showed a consistent trend in a large cohort combining the METABRIC and SCAN-B datasets (Fig. 7f, Supplementary Fig. 18). Additionally, in this extensive dataset, survival analyses identified an additional MC, specifically MC 9, which showed a trend toward being associated with better outcome and is characterized by an organized immune response (Fig. 7f).
Altogether, these findings demonstrated that tumors share similar morphological, molecular and cellular features captured by 14 distinct MCs associated with potential clinical implications. Of note, despite being derived from ST data, the MCs could be effectively captured using bulk RNA seq analysis alone.
Nine SAs recapitulate TNBC ecosystem with potential clinical implications
Here, we investigated TNBC ecosystem by assessing whether specific MCs coexist within samples and in which proportions. Hierarchical clustering on the MC proportions (details in METHODS) revealed 9 TNBC SAs, each one being determined by a combination of the 14 MCs (Fig. 8a, b, Supplementary Fig. 19a, Supplementary data 24). Each TNBC sample was assigned to one of the 9 SAs. While some SAs were more heterogeneous according to the TNBC molecular classification, others were more specific to a particular subtype. As illustrated in Fig. 8c, IM tumors were enriched in SA 4, whereas LAR tumors were exclusively represented by SA 5 (Supplementary Fig. 19b).
We then investigated the molecular and cellular processes that characterize each SA. SA 1 to SA 4 mostly exhibited FI and SR immunophenotypes, while SA 5 to SA 9 displayed MR and ID immunophenotypes (Fig. 8a). SA 1, SA 3, and SA 4 exhibited enrichment in immune signals (Supplementary Figs. 19c–21, Supplementary data 25). Notably, SA 4 demonstrated an organized immune response, characterized by high levels of tertiary lymphoid structure (‘TLS ST’ signature) and infiltration by IFNγ-iCAF, coupled with low stroma activation and low EMT (Fig. 8d, e, Supplementary Fig. 20b, d, Supplementary data 26). These observations collectively suggest a highly immunogenic SA. Furthermore, cell-type enrichment analysis revealed that SA 4 was enriched in B cells, plasma cells, CD8+ central memory and naive T cells, macrophages and DCs (Supplementary Fig. 20a, c; Supplementary Tables 15 and 16).
As an exploratory analysis, we also interrogated if SAs exhibited therapeutic vulnerabilities through the expression of specific genes or pathways deregulation. Of interest, SA 1 and SA 4 showed higher expression of the “VCpredTN” signature, previously associated to response to PARP inhibitors42,48,49,50 (Fig. 8e, Supplementary Fig. 20b, d). SA 2 was associated with higher expression of TACSTD2 and ERBB3 genes coding for Trop-2 and HER3 proteins, respectively, with Trop-2 targetable with anti-Trop-2 antibody-drug conjugate (ADC) and HER3 known as a member of human epidermal growth factor (EGFR/HER) family of receptor tyrosine kinases which could be targeted by anti-HER3 therapies51,52 (Fig. 8e, f). SA 3 and SA 4 exhibited enrichment in PD-L1 expression, which was globally associated with immune infiltration53 (Fig. 8f). SA 7 was characterized by a high immunosuppressive ‘adenosine’ pathway (CD73 gene), thus being potentially sensitive to anti-CD7354,55, whereas SA 8 showed high NECTIN4 expression, coding for Nectin-4 which is a target for a specific ADC56,57 (Fig. 8e, f, Supplementary Fig. 21). In addition, the LAR-specific SA 5 showed relatively higher expression of ERBB2 gene and activation of several metabolism-related pathways suggesting sensitivity to anti-HER2 and antimetabolite drugs58 (Fig. 8e, f).
In order to validate our findings, we designed a method to recover SAs from bulk gene expression data (details in METHODS) and assigned one SA per sample in several TNBC cohorts, including our ST series, METABRIC and SCAN-B. For that, bulk RNA seq analysis was performed on our ST cohort. Most SAs could be recapitulated from bulk gene expression data, SA 6 proving to be the most challenging (Supplementary Figs. 22–28, Supplementary data 27).
Survival analyses revealed that patients with SA 4 tumors exhibited the highest survival rates, while patients with SA 8 tumors experienced the poorest outcomes (Fig. 8g, h, Supplementary Figs. 29–33). Interestingly, within the SA 4 group, patients with IM tumors showed a tendency towards a more favorable prognosis compared to those with IM tumors from other SAs (Fig. 8h). This observation underscores the significant heterogeneity intrinsic to the IM subtype.
These results demonstrate that ST analyses revealed the complexity of TNBC ecosystem beyond TNBC classification, with the identification of clinically relevant SAs, some of them being associated with survival and which could be considered as potential therapeutic targets (Fig. 9, Supplementary Table 15).
Discussion
In the past decade, accumulating evidence derived from bulk tumor analyses showed that TNBC is a heterogeneous disease with the presence of at least five molecular subtypes associated with distinct clinical outcomes and response to neoadjuvant therapy6,7,8,10,11,12,59. One of the major limitations of bulk multiomics analyses is that they ignored tumor geographic heterogeneity, including cell-cell interactions between different tumor clones and their microenvironment. Despite the recent development of novel therapeutic options such as immunotherapy and PARP inhibitors, there is still a substantial proportion of TNBC patients who experience disease recurrence, highlighting the need to better characterize tumor heterogeneity to improve patients’ care60. Only a few reports evaluated the heterogeneity of BC using spatially resolved transcriptomics, either in combination with single-cell analysis or separately. Furthermore, they only assessed a limited sample size24,25,26,27,28.
Here, we applied ST technology on the TNBC cohort to gain deeper insights into tumor heterogeneity. We first performed a meticulous manual histomorphological annotation using standard H/E-stained slides, demonstrating several differences in morphological composition and tumor organization that portray each TNBC molecular subtype. We notably showed that LAR and MSL tumors were organized in small tumor patches associated with high metabolism and EMT signaling, as compared to BL and IM, which display highly proliferative large tumor patches. These findings show the potential of applying deep learning algorithms in digital pathology for the identification of TNBC molecular subtypes using standard-of-care H/E slides, thereby avoiding the use of costly and less accessible RNA seq technology in the clinic61,62.
In order to study the underlying cellular and biological processes shaping the architecture that characterizes each TNBC subtype, we developed a deconvolution method allowing to depict gene expression signals from nine different histomorphological categories capturing tumor and stroma compartments, with high accuracy for most of the categories. This spatial deconvolution allowed us to show that tumor and stroma compartments have different contributions to TNBC molecular classification, with potential clinical implications. For instance, the IM subtype was predominantly composed of BL tumor cells with extensive lymphocytic infiltration, whereas the MSL subtype was mostly defined by M tumors associated with specific stroma features. Interestingly, M subtype population associated with M in the stroma had a poorer outcome as compared to M subtype associated with MSL in the stroma characterized by high immune signatures including TLS and B cell enrichment. This highlights the power of ST technology to dissect intratumoral heterogeneity at an unprecedented level, complementing TNBC subtypes.
We next investigated how spatial expression data can inform us on the cellular and molecular features that drive tumorigenesis beyond TNBC classification. An unsupervised clustering of the spots from all samples revealed the presence of 14 spatial molecular patterns that were shared among patients, further referred as megaclusters, and which recapitulate TNBC gene expression architecture. Each of these megaclusters showed specific molecular and cellular characteristics with some of them being associated with prognosis. This analysis revealed the extent of intra-patient heterogeneity since each sample harbors one to six different megaclusters. Of note, we identified a megacluster characterized by high metabolism and adipogenesis signaling that was specific to LAR tumors, highlighting the distinctive biology of these tumors. Finally, we derived nine SAs depicting the coexistence of several megaclusters within samples recapitulating TNBC ecosystem. Importantly, all SAs could be robustly captured using bulk gene expression data from external datasets, demonstrating the transferability of the results derived from such a complex technology into a tool that can easily be used in the clinic. Interestingly, we identified two distinct immune-enriched SAs, SA 3 and SA 4, exhibiting different clinical outcomes that cannot be captured by the IM TNBC molecular subtype or routine biomarkers, such as TILs. SA 4, in particular, demonstrated an organized immune response associated with better survival. These findings suggest that variability in these archetypes may explain why some patients do not respond to immunotherapy despite having highly immune-infiltrated tumors or experience relapse despite achieving a complete pathological response. This variability may also provide a rationale for de-escalating pembrolizumab-chemotherapy combinations in the neoadjuvant setting of TNBC treatment53.
The identification of these SA is of particular relevance for the clinic since some of them were characterized by targetable deregulated pathways including homologous repair deficiency (HRD), immunosuppressive adenosine pathway or high expression of ERBB3 and NECTIN4 that can potentially benefit from PARP inhibitors, oleclumab, an anti-CD73 inhibitor or specific ADCs such as patritumab deruxtecan and enfortumab vedotin targeting ERBB3 and Nectin-4 respectively52,55,56,57. The latter may be of particular importance given the worse clinical outcome observed in NECTIN4-high SA 8 patients (Fig. 9).
Moreover, considering the distinctive biology of LAR tumors both in bulk and at the ST level, SA 5 primarily consists of LAR tumors, suggesting that those tumors should be considered as a distinct BC entity, alongside luminal, HER2+, and triple-negative BC. Given the limited benefit to anti-androgen treatment in AR-positive TNBC tumors, one may hypothesize that these patients may benefit from a combination of anti-androgens with antimetabolite drugs or PIK3CA inhibitors11,63,64,65,66.
Recently, immunotherapy became the standard of care for early-stage TNBC patients. It is important to note that not all patients benefit from this treatment, highlighting the need for the identification of predictive biomarkers to select those who are most likely to benefit, avoiding unnecessary costs and potential lifelong toxicities. Although TILs and TLS have demonstrated an association with better outcome and favorable response to immunotherapy, biomarkers predicting response to immunotherapy in early-stage BC are still lacking in the clinic67,68. At present, there is no agreement regarding the optimal method for detecting TLS from H/E- or IHC-stained tissue sections37,69. Taking advantage of having spatial gene expression data, we developed a specific TLS signature and demonstrated its prognostic and predictive value to immunotherapies in TNBC, and importantly also in other tumor types. Of interest, our TLS ST signature outperformed the predictive value of other immune-related signatures, and therefore appears as a promising biomarker able to identify patients who would benefit from immunotherapy.
Our study faced several challenges and limitations. The resolution of STs has dramatically increased, from our ST technology (2 K spots of 100 μm diameter) to the most recent Visium (4992 spots of 55 µm diameter) and Visium HD technology (2 × 2 µm barcoded squares), which approaches single-cell scale70. Our spots were analyzed as individual mini-RNA bulks containing up to 200 cells, which obscured the cellular and molecular heterogeneity within each spot. This limitation led to issues such as the presence of the BL subtype in the stroma compartment and the IM subtype in the tumor compartment, due to the discrimination limits of the regression tool (Fig. 4b). Additionally, deeper analysis of small structures like TLS is constrained, as only a few spots (1–3) cover one TLS. The analysis of TLS composition and the spatial distribution of different cell types within them would benefit from recent technologies. Another limitation is the validation of the clinical relevance of our results in external cohorts. We are eagerly awaiting access to data from clinical trials to assess the clinical utility of the SAs and the TLS ST signature. Finally, the high costs associated with ST technology limit its routine clinical application. However, we were able to derive the SA from lower-resolution data, such as bulk RNA seq, which is more aligned with current clinical practices, given the existing use of prognostic signatures in luminal BC71,72. Moreover, future research could explore other low-resolution biomarkers, such as those identified through IHC, as potential surrogates for these SAs in clinical practice.
In conclusion, our study offers deeper insights into the extent of intra- and inter-patient heterogeneity in TNBC. Our results exemplify the benefit of employing ST in unraveling the complexity of the TNBC ecosystems, emphasizing the importance of considering TNBC heterogeneity in patients’ care and for the development of future clinical trials.
Methods
Study population and sample collection
In this retrospective study, 94 early-stage TNBC patients treated at the Institut Jules Bordet (Brussels, Belgium) with standard-of-care therapies between 2000 and 2016 were included. TNBC patients were selected based on their negative status for estrogen (ER) and progesterone (PR) receptors and the absence of HER2 amplification. Information regarding germline mutations in BRCA1 and BRCA2 genes was extracted from the patients’ medical records. All patients underwent initial surgery, followed by adjuvant chemotherapy and/or radiotherapy. For each patient, a clinical follow-up of approximately five years was required, unless the patient experienced a specific event. Clinical outcomes were defined following STEEP Version 2.0 criteria73. Five standardized endpoints were reported as follows: 1. Recurrence-free survival (RFS) as the time interval from the date of diagnosis to death related or not to BC, or any recurrence (distant or locoregional), excluding contralateral BC; 2. Invasive breast cancer-free survival (iBCFS) as the time interval from the date of diagnosis to death related or not to BC, or any recurrence (distant or locoregional), including contralateral BC; 3. Invasive disease-free survival (iDFS) as the time interval from the date of diagnosis to death related or not to BC, or any recurrence (distant or locoregional), including contralateral BC and second primary non-BC related invasive cancer; 4. Distant relapse-free survival (DRFS) as the time interval from the date of diagnosis to death related or not to BC or any distant recurrence; 5. Overall survival (OS) as the time interval from the date of diagnosis to death related or not to BC (Supplementary Table 12). For patients with a longer follow-up, clinical outcomes were censored at 10 years.
For each patient, a frozen surgical tumor tissue sample stored at −80 °C was collected from the institutional tissue bank with their prior consent. For two patients, surgical samples of locoregional relapse were also collected. The selection of the samples was based on the pre-ST screening H/E slide of the frozen tissue satisfying the criteria of tumor cellularity >15% and sample size compatible with the dimensions of the ST array. Therefore, a total of 96 TNBC samples were profiled using ST. The samples were collected in concordance with the Declaration of Helsinki and the study was approved by the Local Ethical Review Board before initiation of the work (research ethics reference: CE2862). Clinical and pathological data were collected in Supplementary data 1.
Each TNBC sample was profiled for ST as triplicates consisting of three consecutive sections (16 μm-thick). As such, a 3D-stack of spatial gene expression could be obtained for each sample after image alignment. The following five consecutive sections (8μm-thick) were collected for further analysis, one of them being used for lymphocytes B and T detection using double immunohistochemistry staining with anti-CD20 antibody (ready-to-use mouse monoclonal, IR604, Dako, USA) and anti-CD3 antibody (rabbit monoclonal, IR503, Dako, USA). CD3 was applied first to the tissues section and detected with the ultraView Universal DAB Detection Kit (760-500, Roche Ventana Medical Systems, Inc.). CD20 was applied next to the same tissue and detected with an UltraView Universal Alkaline Phosphatase Red Detection Kit (760-501, Roche Ventana Medical Systems, Inc.)74. The IHC could not be performed for eight samples due to a lack of material (from TNBC 21 to TNBC 28). Finally, ten additional consecutive sections (8μm-thick) were used to extract RNA for bulk sequencing.
External validation cohorts
BC datasets
Data from BC cases were collected from three external BC datasets: METABRIC (335 TNBCs), SCAN-B (672 TNBCs) and I-SPY2 (987 BCs including 363 TNBCs). The latter dataset was specifically used to assess response to immunotherapy in early-stage BC patients.
The METABRIC dataset is a product of the METABRIC40. Data were downloaded from https://www.cbioportal.org/study/summary?id=brca_metabric. It contains normalized RNA microarray profiling of 2509 fresh-frozen BC samples performed on the Illumina HT-12 v3 arrays. TNBCs were identified as having both the “ER_STATUS” and the “HER2_STATUS” fields as negative. Clinical outcomes were updated with a more recent publication of the same group75. OS and DRFS were assessed following the same criteria as the ones used in our ST cohort (described in Methods: “Study population and samples collection”). EFS was defined as the time interval from the date of diagnosis to death related or not to BC, or any recurrence (distant or locoregional) without specific precision about the contralateral BC.
The SCAN-B dataset comes from the Swedish Cancerome Analysis Network–Breast study (SCAN-B, ClinicalTrials.gov ID NCT02306096)41. Expression data, obtained by whole transcriptome RNA sequencing, were downloaded from https://data.mendeley.com/datasets/yzxtxn4nmd. The library protocol of the adjusted version was used. TNBC patients were identified based on negative status for both HER2 and ER. Clinical outcomes such as OS followed the same criteria as the ones used in our ST cohort. By adding death as an event (including if the death occurred less than 1 year after the end of the last follow-up), we refined distant recurrence-free interval (DRFi) and breast cancer-free interval as DRFS and iBCFS, respectively, according to the STEEP version 2.0 criteria along with the times to event being adapted if needed.
The I-SPY2 dataset comes from the adaptive phase II I-SPY2 trial, in which multiple experimental groups were compared to evaluate new agents in BC in the neoadjuvant setting42. Expression data were downloaded from the GEO database (GSE194040). Clinical data, including the pathological complete response (pCR), were downloaded from the article “Supplementary information”. TNBCs were selected using the “Receptor.Subtype” field from the clinical data.
Non-BC immunotherapy datasets
Fourteen datasets with available gene expression data were downloaded from https://www.orcestra.ca/clinical_icb, for a total of 1073 patients with diverse metastatic cancers treated with immune checkpoint blockades, including 370 melanoma, 296 renal, 195 bladder, 51 urothelial, 31 lung and 128 other cancers43. Available clinical outcomes were progression-free survival (PFS) (N = 572 patients), OS (N = 856 patients) and RECIST radiological response (N = 842 patients).
ST experiments
ST Arrays
The ST microarrays were generated as a grid of 1934 printed spatially barcoded spots of 100 μm diameter containing approximately 200 million uniquely barcoded (spatial barcodes) oligonucleotides with poly-T20 VN capture regions per spot and a center-to-center distance of 150 μm. The dimensions of the ST microarray are 6.2 mm × 6.4 mm and each glass slide contains six ST microarrays (=subarrays) allowing up to six different tissue sections.
Tissue handling, staining, and imaging
The protocols used in our study have been described previously in Andersson et al.24, Ståhl et al.29, Salmén et al.30, and Ji et al.76. In short, fresh-frozen material was sectioned at 16 μm. After placing the tissue on top of the barcoded microarray, the glass slide was warmed at 37 °C for 1 min for tissue attachment and fixed in ~4% formaldehyde (Sigma-Aldrich, F8775) diluted in PBS (phosphate-buffered saline, Medicago, 09-9400) for 10 min at room temperature (RT). The slide was then washed briefly with 1× PBS. The tissue was dried with isopropanol (Fisher Scientific, A461-1) before staining. The tissue was stained with Mayer’s hematoxylin (Agilent, S3309) for 4 min, washed in Milli-Q water, incubated in bluing buffer (Agilent, CS702) for 2 min, washed in Milli-Q water, and further incubated for 1 min in 1:20 eosin solution (Sigma-Aldrich, HT110216) in Tris-buffer (pH 6). The tissue sections were dried for 5 min at 37 °C and then mounted with 85% glycerol (Merck, 104094) and a coverslip. Imaging was performed using the Zeiss AxioImager 2Z microscope and the Metafer Slide Scanning System (Metasystems) at ×20 magnification. The images were processed with the VSlide software (v1.0.0). After the imaging was complete, the coverslip and remaining glycerol were removed by dipping the whole slide in Milli-Q water followed by a brief wash in 80% ethanol and warming for 1 min at 37 °C.
Permeabilization and cDNA synthesis
Using the pre-permeabilization mix for most tissue types from the protocol, the pre-permeabilization was carried out using 0.4% Collagenase 1 (Thermo Scientific, 17018-029) in BSA (Bionordika, B9000S) and HBSS buffer (Thermo Fisher Scientific, 14025-050) and was incubated for 20 min in 37 °C. The pre-permeabilization was followed by incubation in 0.1% pepsin-HCl (Sigma-Aldrich, P7000-25G, pH 1) for 10 min at 37 °C to permeabilize the tissue. A cDNA-mix containing Superscript III (Thermo Fisher, 18080085), RNaseOUT (Thermo Fisher, 10777019), 0.1 M DTT (Thermo Fisher, included with Superscript III), dNTPs (Thermo Fisher, R0191), BSA (Bionordika, B9000S), Actinomycin D (Sigma-Aldrich, A1410-2MG) was added after which the slide was incubated at 42 °C overnight (~18 h). The tissue was washed with 0.1× SSC (Sigma-Aldrich, S6639) between each incubation step.
Tissue removal and cDNA release from the surface
For tissue removal, a two-step protocol was used30. Beta-Mercaptoethanol (Calbiochem, 444203) was diluted in RNeasy lysis buffer (Qiagen, 79216) and the slide was incubated for 1 h at 56 °C. The wells were washed with 0.1× SSC followed by incubation with proteinase K (Qiagen, 19131), diluted in proteinase K digestion buffer (Qiagen, 1034963), for 1 h at 56 °C. The wells were subsequently washed in 2× SSC + 0.1% SDS followed by 0.2× SSC and finally with 0.1× SSC after, which they were dried. The released mix consisting of second-strand buffer (Thermo Fisher, 10812014), dNTPs (Thermo Fisher, R0191), BSA (Bionordika, B9000S), and USER enzyme (Bionordika, M5505L) was added to each well and incubated for 2 h at 37 °C. After probe release, the spatial spots containing non-released DNA oligonucleotide fragments were detected by hybridization of fluorescently labeled probes and imaging, in order to obtain Cy3-images for image alignment and spot detection, as described in the protocol29,30.
Library preparation and sequencing
Parts of ST library preparations were carried out using an automated pipetting system (MBS Magnatrix Workstation), also previously reported30,77. Second-strand synthesis and blunting were carried out by adding DNA polymerase I (Thermo Fisher, 18010025), RNase H (Thermo Fisher, 18021071), and T4 DNA polymerase (Bionordika, M0203L). The libraries were purified and amplified RNA was generated by a 14 h in vitro transcription reaction using T7 RNA polymerase (Sigma-Aldrich, AM1334), supplemented with NTPs (Sigma-Aldrich, AM1334) and SUPERaseIN (Sigma-Aldrich, AM2694). The material was purified, and an adapter was ligated to the 3′-end using truncated T4 RNA ligase 2 (Bionordika, M0242L). Generation of cDNA was carried out at 50 °C for 1 h by Superscript III (Thermo Fisher, 18080085) in first-strand buffer (Thermo Fisher, included with Superscript III), RNaseOUT (Thermo Fisher, 10777019), DTT (Thermo Fisher, included with Superscript III) and dNTPs (Thermo Fisher, R0191). Double-stranded cDNA was purified, and full Illumina sequencing adapters and indexes were added by PCR using 2xKAPA HotStart ready-mix (Roche, KK2602). The number of amplification cycles needed for each section was determined by qPCR with the addition of EVA Green (Biotium, 31000). Final libraries were then purified and validated using an Agilent Bioanalyzer and Qubit before sequencing on the NextSeq500 (v2) at a depth of ~100 million paired-end reads per tissue section. The forward read contained 31 nucleotides with 46 nucleotides in the reverse read.
Bulk RNA sequencing
RNA was extracted from 10 consecutive sections of the ST sections from bulk TNBC samples using the AllPrep DNA/RNA/miRNA Universal kit (Qiagen, Germany). Quantification of the RNA was performed using the Qubit fluorometer technology according to the manufacturer’s protocol. Starting from 200 ng of total RNA, rRNA was depleted using the Ribo-Zero Magnetic Kit (Epicentre; Madison, Wisconsin) according to the manufacturer’s instructions. Library quality control and quantification were performed using the Fragment Analyzer (Advanced Analytical) and the QUBIT® (Invitrogen). Total RNA Library Prep was used for the preparation of the librairies. Sequencing was performed with a target read depth for rRNA-depleted total RNA (RiboZero Gold) of 30 × 106 reads on the Illumina NovaSeq 6000 platform in 2×100 bp paired-end mode at the BRIGHTcore (BRussels Interuniversity Genomics High Throughput core) sequencing facility of the ULB (http://www.brightcore.be). Reads were trimmed using Trimmomatic 0.3878. Genes were quantified using Salmon79 on the reference human genome hg38 and GENCODE release 38.
TILs and TIME classification
The percentage of TILs was evaluated by two dedicated breast pathologists on H/E-stained slides for each tumour sample following the international TILs working group guidelines80. Percentages of CD20+ B- and CD3+ T cells infiltration were determined by double IHC CD20/CD3 staining74. In addition, each TNBC sample was further classified according to the TIME classification into one of four immunophenotypes17: FI (characterized by intratumoral localization of TILs (iTILs)], SR (absence of iTILs, but presence of TILs in the stroma), MR (presence of TILs at tumor margins), and ID (low quantity of TILs).
Morphological analyses
Morphological annotations
Fifteen histomorphological categories were annotated by two dedicated breast pathologists from the same H/E slides, which correspond to one of the three consecutive sections designated for ST and also used for TILs quantification (Supplementary Table 2). The QuPath software (version 0.2.3) was used to the morphological annotation. Three categories were further annotated at the single-cell level using a machine learning approach available in the QuPath software for the identification of tumor cells, stroma (=fibroblasts) cells and lymphocytes. The other categories were manually delineated: high TILs stroma (estimated as ≥30% lymphocytes on the area), low TILs stroma (estimated as <30% lymphocytes on the area), acellular stroma (stroma formed by collagen fibers underlying different cell types of tumor microenvironment), fat tissue, lactiferous ducts, TLS, necrosis, vessels, nerve, heterologous element, carcinoma in situ, and tumor region (defined as tumor cells enriched area). Artifacts and whitespace were also annotated.
TLS were identified among overall TILs taking into consideration the double IHC CD20/CD3 staining, when available. In detail, TLS was defined as lymphoid follicles including a dense cellular aggregate (recapitulating the germinal center–GC) or lymphoid aggregates and lymphoid follicles without GC and consisted of CD20-positive B zones with CD3-positive T zone aggregates.
The morphological annotations were saved as.png files, superimposable to the original H/E slides. This allowed to extract the morphological composition of each spot from the annotated H/E.
Contribution of histomorphological categories
We quantified the contribution of each morphological annotation in each slide as the fraction of pixels set to that annotation out of the total number of annotated pixels. Artifact, whitespace and non-annotated pixels were removed for this analysis.
To resolve the overlap between certain histomorphological annotations, we merged some of them together (Supplementary Table 2). To do so, we defined the category “Tumor” as the sum of “tumor cells” and “tumor regions” and the category “Lymphocytes” as the sum of “lymphocytes” and half of the “High TILs stroma”. The category of “Stroma cells” was defined as the sum of the “stroma cells”, half of the “Low TILs stroma” and half of the “High TILs stroma” categories. We added to the “Acellular stroma” category half of the “Low TILs stroma” category. Of note, the contributions of acellular stroma, stroma cells and lymphocytes were estimated among the manual annotations of “Low and High TILs stroma” respectively. “Total stroma” or “Stroma” was defined as the sum of the acellular stroma and the stroma cells. In summary, the sum of “Tumor %” + “In situ %” + “Total stroma %” + “Lymphocytes %” + “Tertiary lymphoid structures %” + “Necrosis %” + “Fat tissue %” + “Vessels %” + “Lactiferous ducts %” + “Nerve %” + “Heterologous element %” = 100% (=total of the pixels from the morphological annotation).
Organization of tumor and stroma into patches
To assess the spatial organization of tumor and stroma cells, neighboring cells were grouped into “patches”. To this end, cells were dilated (artificially enlarged) using a circular kernel with a radius of 15 pixels, which corresponds ~1.5× the usual size of cancer cells. ‘Patches’ were then found by joining adjacent pixels. Several metrics were then calculated on those patches: size in pixels (Np), number of patches (N), the number of patches required to reach half of the total surface for that annotation, and “evenness” which is the Shannon entropy normalized by the number of patches (Supplementary Data 4–5). Evenness describes the uniformity of the patch size, measuring distribution equality (an evenness of 1 being a case where all patches have the same size, while an evenness close to 0 corresponds to a case with one dominant patch).
Data processing
Data preprocessing and normalization
Sequencing data were preprocessed as previously described29. Briefly, the demultiplexed reads were then filtered for amplification duplicates using the UMI with a minimal hamming distance of 2. The UMI-filtered counts were used in the analysis. ST-pipeline (v.1.6.0) used for the analysis is available at https://github.com/SpatialTranscriptomicsResearch/st_pipeline. High quality sequencing data for ST experiments were obtained for 281 out of the 288 arrays performed, corresponding to 94 out of 96 samples (TNBC 17 and 18 failed to be sequenced) (Supplementary data 2). For 93 samples, we obtained data for triplicate and for one sample (TNBC70), we only had duplicate data. A total number of 3,561,450,620 UMIs were generated.
Spot selections were performed following different steps. First, we filtered out spots extending outside of the tissue region or had more than 1% of their pixels annotated as “artifacts” (usually tissue folds). From a total of 3,561,450,620 UMIs and 537,069 spots, there remained 2,864,571,670 (82.6%) UMIs and 281,852 spots (52.5%) after this filtering. Most of the spots were removed because they were out of tissue: this concerned 46.4% of the spots and 17.4% of the UMIs. Furthermore, 5822 spots (1.1%) containing 78,864,709 UMIs (2.2%) were removed as artifacts. Finally, we removed 11,542 spots (2.1%) corresponding to 2,404,110 UMIs (0.07%), because they had less than 500 UMIs. We did not remove ribosomal protein genes (RPL and RPS), mitochondrial genes (MT−), nor MTRNR genes. After all filtering, there remained 270,310 spots and 2,862,167,560 UMIs for further ST analyses (Supplementary data 2).
Genes among the ST spots were normalized using:
with the sum being by spot.
Slides superposition
Two or three consecutive ST tissue sections, named subarrays (duplicate or triplicate) were available for each TNBC sample. These tissue sections were distant from each other only by 16 microns, and we found that there were relatively little biological differences between them. For the analysis, sections were considered as technical replicates. In order to compare them, we first needed to make them as stackable as possible using rotation, translation, and possibly reversion (if the tissue section was flipped upside down before being placed on the glass slide). These transformations allowed the analysis for each TNBC sample across the 3D axis and contributed to the identification of the ‘spatial genes’ across the two or three consecutive ST tissue sections (described in Methods: “Intra-patient clustering”). The transformations were performed compared to an unmodified reference slide, that was always the one that had been annotated by the pathologist. A custom tool in R was developed to this end using the Shiny package. Quality of registration was judged by eye and case-by-case.
Batch correction
Each tumor was assayed in duplicate or triplicate, on different “subarrays”. In some cases, there were significant differences in gene expression levels between those replicates, for which we corrected. Batch effects by gene were estimated with a negative binomial model using the glmGamPoi R package, size factors being estimated using the deconvolution method (i.e., size_factors = ”deconvolution” in glm_gp). Cases where no gene-specific effect was larger than 0.2 were considered as having no batch effect and were not corrected. For the others, a reference (“base”) subarray had to be defined. We decided to pick either the subarray that was closest to the mean of the full experiment, or the mean of all subarrays from the sample. Specifically, we calculated the correlations between the average of ST global pseudobulk (PB) defined across all samples and the global PBs of the subarrays, as well as with the mean of the sample’s PBs. The highest of those correlations was considered. If it was from one subarray, then that subarray was taken as the reference. If it was from the mean of the 2 or 3 subarrays, then the mean was taken as the reference. However, unnecessary batch correction could occur due to some technical issues such as tissue fold on one of the subarrays. We performed visual inspection and comparison of the projections on ST subarrays and for some TNBC samples, a manual correction for the reference was performed (Supplementary data 28). Each gene was then regressed out using the gene-specific coefficient.
Data analyses
Estimation of the fraction of each morphological category per spot
We estimated from the gene expression of ST spots, the fraction of each histomorphological category, as described in Methods: “Contribution of histomorphological categories”. A total of nine categories were retained: tumor, stroma, necrosis, fat tissue, vessels, lactiferous ducts, in situ, TLS and lymphocytes (Supplementary Table 3). This was done by firstly computing the fraction of pixels in each spot that belonged to each category. Then a regressor was designed to estimate those fractions from the spot gene expression data. The accuracy of the regressor was estimated using a leave-a-patient out approach. A final regressor was finally designed using all data, and applied to each subarray, annotated or not, to get the fraction of each histomorphological category in each spot.
Specifically, we used a linear booster regressor, implemented in the R package xgboost (version 1.6.0.1) (https://CRAN.R-project.org/package=xgboost), using the “logistic” objective function. Using leave-a-patient-out cross validation, we found that using 25 rounds and 250 components led to a good accuracy overall. We assessed the quality of the obtained regressions using AUC for each histomorphological category, binarized as present in a spot if proportion >25%, calculated with the R package pRoc81 (version 1.18.0) (Supplementary Fig. 2a). We used AUCs to judge the fit quality as it is independent of the prevalence of the histomorphological categories, which varies dramatically between them.
The regressed fractions were projected to all ST subarrays and visually inspected, showing that they fitted well with the morphological annotations (Supplementary Fig. 2b). The use of those regressed fractions allowed to generalize the annotations across all subarrays, including those lacking morphological annotations.
Gene expression signatures and gene set variation analysis
We estimated the contribution of biological processes using gene set analysis. First, we applied GSVA (R package version 1.44.5) to several gene sets. Those sets included 46 of the hallmark gene sets31 (version 7.0) downloaded from MSigDB (https://www.gsea-msigdb.org/gsea/msigdb/). We excluded the following 4 hallmark signatures due to a lack of association with tumor processes or microenvironment in BC: pancreas beta cells, spermatogenesis, UV response up, UV response down. We also applied GSVA to the following gene sets: Cancer-associated fibroblast S182 (CAF) and various CAFs subtypes83, Tissue-resident memory T cell84 (Trm), Tumor associated macrophages (TAM)85 and our in-house developed 30-gene TLS ST signature (described in Methods: “TLS signature”). GSVA was also applied on the “biological process” domain of the Gene Ontology, as downloaded from MSigDB (dataset c5, 9996 signatures in total).
Other signatures involving coefficients for each gene were not computed as GSVA but as a weighted mean of gene expression values. This concerned two external TLS signature24,39; two HRD-like signatures42, previously associated to the response to PARP inhibitors: VCpredTN and Parpi7; three proliferation signatures: Genomic Grade Index86 (GGI), GENE7087, CIN7088; two stroma signatures: Stroma189 and Stroma290; and two immune infiltration signatures Immune191 and Immune290. ESR1 and AR were simply taken as the normalized gene expression (Supplementary Table 4).
Single genes of interest
We established a list of interesting individual genes with a relevant biological and/or therapeutic role in TNBC using the recent literature (Supplementary Table 5). This list included 45 immune genes, 8 mismatched repair genes, 6 target genes for ADC therapy, 13 genes involved in signaling pathways, 5 genes involved in (de-)methylation, 5 angiogenesis-related genes, 11 cell-cycling-related genes, 3 genes involved in the hormonal status and 19 genes involved in HRD.
xCells deconvolution
The xCell package (https://link.springer.com/protocol/10.1007/978-1-0716-0327-7_19) was used for cell type enrichment analyses. The enrichment scores obtained by xCell are similar to fraction of a given cell type, for each cell type from a specific PB. We selected the 48 most relevant cell types: 6 B cell, 8 CD4 + - and 4 CD8 + - T-cell subpopulations, 3 other lymphocytes (NK, NKT and Tgd cells), 11 myeloid derived subpopulations, 5 stroma related cell types, 5 stem/progenitor cell types, as well as adipocytes, preadipocytes, epithelial cells and neurons (Supplementary Table 6).
TNBC molecular subtyping
Lehmann’s classification
Lehmann’s molecular classification is composed of six stable subtypes: two basal-like subtypes (BL1 & BL2), an IM subtype, a LAR subtype, a M subtype, and a MSL. Lehmann’s molecular subtypes were assigned using a re-implementation of the published method, based on the published list of genes positively and negatively associated with each subtype6. Briefly, each gene was first standardized to a mean of 0 and a standard deviation of 1. Positive and negative signatures were calculated for each subtype as the mean of the genes positively or negatively associated with that subtype. A subtype score was obtained using the difference between those positive and negative signatures. Each TNBC sample was assigned to the TNBC molecular subtype with the highest score. The main difference with the published method is the lack of the unstable subtype, as each sample was associated with a subtype.
Bareche’s classification
In line with Bareche et al.10, samples classified from our re-implementation of Lehmann’s classification as BL2 were reclassified using the second highest score. In this setting, the Basal-like 1 (BL1) subtype was referred as the BL subtype. We used only Bareche’s molecular classification system for our analyses.
Molecular subtypes on a subset of ST spots or at the single-spot level
The calculation of subtypes starts with a gene normalization and standardization. Because of this, if the subtyping is done on PBs based on a given histomorphological category (i.e., tumor spots), the standardization would be very different than what would be obtained on a real bulk sample.
For this reason, for subtypes calculated on a subset of spots or even a single spot taken from the ST, we standardized the genes based on the means and standard deviations obtained across the samples global PBs, i.e., we used the same standardization as the one used to determine the sample PBs subtypes.
Deconvolution methods
Morphological deconvolution
We defined the full slide pseudobulk (=global PB) as the sum of the reads from all the selected spots of a sample’s ST subarray. We endeavored to estimate the PB that would be obtained on the different histomorphological compartments, each potentially being a mixture of cells (e.g., ‘necrosis’ is a mixture, while ‘tumor’ is a single type of cells). To find those PB expressions, a deconvolution-based method was designed. For this method, it was necessary to have an estimate of the proportion of each morphological category in each ST spot. Directly exploiting the morphological annotations proved to be unreliable and would confine the analysis to the annotated ST slide only. Hence, we used the estimates of the proportion of each annotation as obtained by the regressor (described in Methods: “Estimation of the fraction of each morphological category per spot”).
We modeled the observed data as coming from a weighted means of the different PBs corresponding to the different histomorphological category. Those PBs were considered as different in each sample. The sampling error was modeled using a negative binomial distribution. Specifically, we postulated that the mean expression of each gene \(g\) in each ST spot \(s\) is a function of sample-specific cell type (or histomorphological category) prototypes \(p\) and the amount of the given cell type (or histomorphological category) \(e\), as given by the regressor. We added a spot-specific scaling factor as some spots had more reads than others. The observed UMI count number was drawn from a negative binomial around this mean:
where \({N}_{t}\) is the number of annotations, \({e}_{{is}}\) is the fraction of annotation \(i\) in sample \(s\), \({p}_{{gi}}\) is the expression of gene \(g\) in PB \(i\), \({r}_{g}\) is the gene-specific extra variance from the negative binomial, and \({s}_{s}\) is the spot-specific scaling factor.
The likelihood was optimized on the genes (\({p}_{{gi}}\) and \({r}_{g}\)) and the scaling factors (\({s}_{s}\)) iteratively, for 5 iterations. Histomorphological category \(i\) was only considered for a sample if it represented at least a total of 2 spots across all replicates, that is \({\sum}_{s}{e}_{{is}} > 2\).
Tumor and stroma compartments
The tumor compartment was defined as the weighted means of the expression of the “tumor” category exclusively. The stroma compartment was defined as the weighted means of the expression of all histomorphological categories except for “tumor”, “necrosis” and “in situ”:
A third compartment designated as “other” recapitulated “necrosis” and “in situ” categories. Of note, we excluded one sample from the analysis of the tumor/stroma compartments (TNBC 10) as it was infiltrated only by in situ carcinoma.
TLS signature
To increase the specificity, we considered only the TLS compartments of patients for whom the pathologist and/or the regressor detected the presence of TLS (N = 52 patients), while still using all patients for the other compartments (Supplementary data 8). We performed a differential gene expression analysis between PBs derived from the TLS compartment and each of the other 8 compartments (lymphocytes, tumor, stroma, lactiferous ducts, fat tissue, necrosis, vessels and in situ) using Wilcoxon rank sum tests. This way, we obtained a P value and a fold change (FC) for each gene and all eight compartments. For each gene, we retained the highest P value and the smallest FC across all eight compartments to increase specificity and we obtained a list of differential expressed genes associated to one P value and one FC. From it, we selected genes having a P value < 10−9 and a FC > 2, leading to a 30-genes TLS signature (Supplementary Table 13, Supplementary data 13 and 14). The biological function of each gene was characterized manually using https://www.genecards.org.
The TLS signature expression level was estimated on spots across our ST cohort as the mean expression of TLS genes. The projections showed concordance with the manual annotation and the morphological deconvolution of TLS. To quantify the level of the signature accuracy, we assessed the AUCs reached by each individual gene of the signature to predict the presence or absence of TLS from morphological annotation and from morphological regression on ST global PB and bulk RNA seq data (Supplementary data 17). Comparing ST bulk RNA seq and global PB allowed us to assess the predictive accuracy from the bulk analyses. Moreover, we assessed the AUC of the global TLS signature to predict the presence or absence of TLS from morphological annotation and from morphological regression on ST global PB and bulk RNA seq data and we compared it with the AUC of an external ST-derived TLS signature24 (as described in Methods: “Gene expression signatures and gene set variation analysis”). Overall, the AUC of our TLS signature is higher if we calculated it from ST global PB in the regression setting of TLS detection (AUC = 0.79, CI = 0.7–0.89) compared to the annotation setting (AUC = 0.67, CI = 0.55–0.78). We also compared it to the AUC of the external ST-derived TLS signature (Lundeberg sig) and the results were similar (Supplementary data 16).
Clustering-based methods
Intra-patient clustering
To identify ‘spatial genes’, for each sample, we stacked all replicate ST subarrays into a single 2D space before identifying the 10 closest neighbors for each spot. Spatial nearest neighbors for each spot were obtained using the Euclidean distance on the x and y coordinates, potentially across subarrays. Expression of each gene was standardized to a mean of 0 and a standard deviation of 1 across all spots from all subarrays. We calculated for each gene the variance between standardized expression of a given spot and the one of the 10 nearest neighboring spots. The variance was averaged across all spots. Genes with a lower variance were considered as ‘spatial genes’.
We performed several clustering’s testing different sets of parameters and selected the result that was as well spatially defined as possible. Specifically, clustering was performed on the top 100, 200, 300, 400 or 500 spatial genes, using k-means with the number of clusters \({N}_{c}\) between 2 and 10. Moreover, clustering was performed using either the original expression values of those top spatial genes, or their projections via UMAP embeddings. This way, a total of 5 × 9 × 2 = 90 clustering’s were obtained for each sample.
We reasoned that the optimal clustering should be the most spatially conserved. To assess this, we compared for each spot its cluster to the one of its neighbors, and computed the fraction of cases where the clusters were identical, \(F\):
where \({N}_{s}\) is the number of spots, \(N\left(s\right)\) is the set of 10 neighbors of spot \(s\), and \({c}_{s}\) is the cluster of spot \(s\). If clusters were randomly spread, on average \(F=1/{N}_{c}\). Hence, we selected the clustering with the highest \(F-1/{N}_{c}\).
Identification of cluster prototypes
Using the clusters as given by the k-means led to prototypes that were too similar, due to many spots being located at the margin between multiple clusters. Ideally, prototypes should correspond to spots whose expression profile is derived solely from one single cluster. However, it is not guaranteed that such spots exist, and since we wanted to use as much data as possible, a deconvolution method was designed. The basic idea of the method is that each spot could belong to more than one cluster (similarly to fuzzy clustering), while also ensuring that the obtained solution remained close to the original clustering.
This is formalized by stating that the expression of a gene in a spot is drawn from a negative binomial distribution whose mean is a mixture of some pure clusters:
where \({c}_{{gs}}\) is the count of gene \(g\) in spot \(s\), \({N}_{c}\) is the number of clusters, \({m}_{{is}}\) is the contribution of cluster \(i\) in spot \(s\), \({p}_{{gi}}\) is the mean expression of gene \(g\) in cluster \(i\), \({{{\rm{NB}}}}\) is the negative binomial distribution, and \({r}_{g}\) is the parameter controlling the extra variance for gene \(g\). Optimizing the log likelihood would lead to a seeded non-negative matrix factorization. To ensure that the end results remained relatively close to the original clustering, a penalty term was added to the log likelihood:
where \(r\) is the Pearson correlation coefficient, and \({m}_{.s}\) is the vector of the mixture for spot \(s\) across clusters. \({{m}_{.s}}^{0}\) is the same mixture vector but taken from the k-means (so an indicator function, 1 if the spot is in the cluster, and 0 otherwise), and \(\propto\) is a weight that sets the trade-off between the factorization and the anchoring, that we set at 1000.
The likelihood was optimized alternatively on the prototypes (\({p}_{{gi}}\), \({r}_{g}\)) and the mixtures (\({m}_{{is}}\)) for 40 iterations.
Pseudobulks of deconvoluted prototypes were used to perform inter-patient clustering to avoid sample-driven clustering effect. Example of the projections of both versions (k-means and deconvolution) of intra-patient clusters were reported in Supplementary Fig. 10. The method is also further detailed in the vignette of the R package STstuff.
Intra-patient clustering (megaclustering)
The 424 prototypes \({p}_{{gi}}\) were clustered using k-means, leading to clusters of clusters, which we named MC. Some data transformations were performed on the prototypes before clustering. First, we removed prototypes that we deemed to represent too few spots. For this, the 6 prototypes which represented less than 50,000 reads, as estimated from the deconvolution, were merged with another prototype from the same patient. That prototype was chosen as the one having the highest Spearman correlation with the prototype to remove using the 5000 most expressed genes. Second, the data were normalized and log transformed from the count-like \({p}_{{gi}}\):
Genes for which >95% of the clusters had less than 1 read per 10,000 were discarded, ending with 3888 genes to perform the megaclustering. K-means was done at least 1000 times, with random starts, or as many times as needed to ensure that the best solution was found at least five times.
In the resulting clustering, if any MC consisted of clusters from one sample only, it was removed. The clusters from that MC were then reassigned to the closest remaining MC.
K-means were done for 5–20 clusters. We used a measure of robustness of the resulting clustering to decide the optimal number of clusters, as described in Methods: “Selection of the number of MCs” (Supplementary data 22).
Recovery of MCs in ST bulk RNA seq and other bulk gene expression datasets
We wanted to be able to estimate the proportion of each MC from bulk RNA seq. For this, we designed a deconvolution method. This method was tested on the bulk RNA seq data obtained from the ST samples.
We first defined the prototype \({P}_{{gi}}\) of MC \(i\) as the mean of all its cluster prototypes:
where \({{MC}}_{i}\) is the set of clusters belonging the MC \(i\), \({N}_{i}\) being the size of this set. We considered that each sample was a mixture of MCs, hence of those prototypes, using negative binomial distributions. To account for differences between studies, we added a study-specific scaling factor for each gene, leading to the following model:
where \({x}_{{gi}}\) is the count of gene \(g\) in sample \(i\), \({m}_{{ij}}\) is the contribution of megacluster \(j\) in sample \(i\), and \({r}_{g}\) is the extra variance for gene \(g\). An initial estimate for \({s}_{g}\) was obtained by comparing the mean expression of each gene in the other dataset with the mean expression in the spatial dataset. Then \({m}_{{ij}}\), and \({s}_{g}\), \({r}_{g}\) are optimized in turn. Practically only one iteration was done (so \({m}_{{ij}}\) was optimized, then \({s}_{g}\), \({r}_{g}\), then \({m}_{{ij}}\) again).
Selection of the number of MCs
We reasoned that a proper megaclustering should be robust, hence recoverable from the ST bulk RNA seq data. We verified this assumption using a 10-fold cross-validation method, on the data from the bulk RNA seq samples. Practically, 9/10th of the patients were used as a training set, the last 1/10th being used as the test set. The prototypes for the MCs \({P}_{{gi}}\) were obtained on the training set. We estimated the proportion of those MCs in the test set using the method as described in Methods: “Recovery of MCs in ST bulk RNA seq and other bulk gene expression datasets”.
This was done for all splits and repeated five times. We binarized those proportions. The result estimates of the presence of each MC were then compared to the actual presence of the MC in the corresponding samples, using the AUC. We picked the number of MCs leading to the highest mean AUC (Supplementary data 22 and 23).
Per-spot projection of MCs by deconvolution
To find the proportion of each MC in each spot, we used a method similar to the one used to recover MCs in other datasets. The only difference was that no rescaling of the genes was performed as the prototypes were calculated on the same dataset (\({i.e.,s}_{g}=1\)). Deconvoluting MCs at the spot level led to a higher number of MCs found by sample than obtained by the original k-means, as MCs appearing only on a subset of spots, or only present mixed with another MC, could be missed by the k-means (Supplementary Fig. 10).
By summing those proportions by spots, we obtained a number of cumulated spots for each MC and then, an estimate of the proportion of each MC by sample. Those proportions were used for most of the downstream analyses on MCs (i.e., survival analysis, SA).
Spatial archetypes
Definition of the SA
“Spatial archetypes” are a further clustering of the MCs proportions, to stratify the TNBC patients. They are based on a hierarchical clustering (Ward method, more specifically Ward D2) on the MC compositions obtained by spot deconvolution, using the Euclidean distance. To determine the optimal number of SAs, we used the R package NbClust which compile a large number of indices used for this purpose. We used the most common number of clusters obtained from those indices (Supplementary data 24).
Recovery of SAs from bulk expression data
To recover the SAs in bulk RNA seq datasets (ST cohort, METABRIC and SCAN-B), we used a nearest-neighbor classifier. In these datasets, we performed the deconvolution of MCs as previously described in Methods: “Recovery of MCs in ST bulk RNA seq and other datasets”. We used a quantile transformation to rescale the MC proportions from the bulk RNA seq dataset to make them more similar to the distribution of the ST TNBC dataset used for the clustering. Values below 0.01 were put to 0.01. Each sample in the bulk RNA seq dataset was assigned to the same SA as the closest sample in the ST dataset, based on the Euclidean distance between logarithms of those quantile matrices.
The accuracy of the recovery was assessed by comparing SAs obtained in the bulk RNA seq of our ST TNBC cohort with the original SAs, as a confusion matrix (Supplementary data 27).
Quantification and statistical analysis
Univariate and multivariate Cox proportional hazard models were used for survival analyses. In the univariate analyses for survival, P values were obtained with the likelihood ratio test of the Cox model. In multivariate analyses, the model included as co-variates tumor size (> vs ≤20 mm for METABRIC and SCAN-B, >vs ≤T1 for ST), lymph node status (N0 vs. N +/x), age at diagnosis as a continuous variable. P values were then derived by using a likelihood ratio test between nested models, with and without the variable of interest. Confidence intervals and hazard ratios (HR) were obtained in both cases from the Cox model. Univariate and multivariate analyses were performed including all patients with complete data.
Firth’s penalized likelihood method was applied in case of infinite hazard/odd ratios for logistic regression (logistf package) and cox model (coxphf package). It was only obtained for the HR/OR, the P values from the original analysis were conserved.
Survival curves were estimated using Kaplan-Meier method by considering aforementioned clinical end points (iBCFS, RFS, iDFS, DRFS, OS). P values of different prognostic groups were obtained with a permutation version of the log-rank test when no controls for studies or co-variates are required92.
Wilcoxon rank sum (for comparisons between two groups) and Kruskal-Wallis tests (for comparisons between three or more groups) were used to compare continuous variables to categorical variables. Odds ratios (OR) were obtained using logistic regressions.
All P values were two-sided except for differential analysis of GO on TLS compartment vs. other (Fig. 5c, Supplementary Fig. 5b, c) and for the expression of five targetable genes across SAs in combined TNBC cohorts (Fig. 8f) where the P value was one-sided. False-discovery rates (FDRs) were obtained by adjusting p values for multiple comparisons using the Benjamini & Hochberg procedure. Some P values that were mostly descriptive were not corrected. In that case, no FDR was reported. P values and FDRs were considered significant when <0.05.
The dot plots (i.e., Fig. 4c and Supplementary Figs. 20–21) present a color code (blue or red) for the direction of the association. Wilcoxon tests are used for FDRs. Dots are bordered and dark-colored when FDRs <0.05, compared to lighter-colored dots when FDRs ≥ 0.05. Effect sizes are ORs, calculated with logistic regression and capped at 4. When none of the classes shows an FDR < 0.05, the results are not included in the plot. Exceptionally, the dot plots from Figs. 4c and 8e were reconstituted from individual dot plots representing the associations between the whole list of the molecular features separately for expression-based signatures, cell types or single genes, and TNBC molecular subtypes (Fig. 4c) or SA (Fig. 8e). In this case, the effect sizes and the FDRs referred to the source individual dot plots.
All correlations were assessed calculating the Spearman’s rank correlation coefficient (rho) on pairwise complete observations and considered significant for P value < 0.05 or FDR < 0.05 if corrected for multiple testing.
For the boxplots, the box bounds are represented by the interquartile range divided by the median, with the whiskers extending to a maximum of 1.5 times the interquartile range beyond the box. Comparisons across subgroups (pairwise or 1 vs all) were done using Wilcoxon rank sum and P values adjustment with the Benjamini & Hochberg and displayed using the “star” notation (****FDR < 0.0001; ***FDR < 0.001; **FDR < 0.01; *FDR < 0.05).
All statistical analyses were performed using the R software (v4.2.1).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The raw sequencing files for the ST data and bulk RNA seq of the ST TNBC cohort generated in this study have been deposited at the European Genome-Phenome Archive (EGA) under accession code EGAS50000000475. The raw sequencing data are available under restricted access due to data privacy laws. The data can be obtained upon signature of a data access agreement (DAA) between the investigator requesting the access and Institut Jules Bordet (IJB), subject to applicable laws. Access requests can be initiated by email to the corresponding author (christos.sotiriou@hubruxelles.be) with an approximate timeframe to reply of 4 weeks. The conditions related to the access to the data are specified in the DAA. In detail, the raw data are accessible for reproducibility purposes and for academic and non-academic investigators aiming to perform original research. The data can be used for a maximum of 3 years after its reception. The processed count matrices derived from the raw ST data and the associated brightfield images (H/E-images) are available without any restrictions via the repository Zenodo [https://doi.org/10.5281/zenodo.8135721]. QC filtering, IHC images, morphological annotations and all the projections on ST slides (i.e., morphological regression, clusters, megaclusters, signatures) are accessible via the repository. The public datasets can all be accessed via https://www.cbioportal.org/study/summary?id=brca_metabric for METABRIC, https://data.mendeley.com/datasets/yzxtxn4nmd for SCAN-B, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194040 for I-SPY2 and https://www.orcestra.ca/clinical_icb for other ICB-treated metastatic cancers. The same public datasets are also accessible via the Zenodo repository. The remaining data are available in the Article, its Supplementary information and Source data file. Source data are provided with this paper.
Code availability
All code, data, and results that relate to the content of this manuscript are accessible via the GitHub repository: https://github.com/BCTL-Bordet/ST [https://doi.org/10.5281/zenodo.13867936]93. The repository also includes the results produced in the analysis and the code used for this purpose.
References
Bauer, K., Parise, C. & Caggiano, V. Use of ER/PR/HER2 subtypes in conjunction with the 2007 St Gallen Consensus Statement for early breast cancer. BMC Cancer 10, 228 (2010).
Dent, R. et al. Triple-negative breast cancer: clinical features and patterns of recurrence. Clin. Cancer Res. 13, 4429–4434 (2007).
Foulkes, W. D., Smith, I. E. & Reis-Filho, J. S. Triple-negative breast cancer. N. Engl. J. Med. 363, 1938–1948 (2010).
Symmans, W. F. et al. Long-term prognostic risk after neoadjuvant chemotherapy associated with residual cancer burden and breast cancer subtype. J. Clin. Oncol. 35, 1049–1060 (2017).
Bianchini, G., Balko, J. M., Mayer, I. A., Sanders, M. E. & Gianni, L. Triple-negative breast cancer: challenges and opportunities of a heterogeneous disease. Nat. Rev. Clin. Oncol. 13, 674–690 (2016).
Lehmann, B. D. et al. Identification of human triple-negative breast cancer subtypes and preclinical models for selection of targeted therapies. J. Clin. Invest. 121, 2750–2767 (2011).
Burstein, M. D. et al. Comprehensive genomic analysis identifies novel subtypes and targets of triple-negative breast cancer. Clin. Cancer Res. 21, 1688–1698 (2015).
Jiang, Y.-Z. et al. Genomic and transcriptomic landscape of triple-negative breast cancers: subtypes and treatment strategies. Cancer Cell 35, 428–440.e5 (2019).
Lehmann, B. D. et al. Refinement of triple-negative breast cancer molecular subtypes: implications for neoadjuvant chemotherapy selection. Plos One 11, e0157368 (2016).
Bareche, Y. et al. Unravelling triple-negative breast cancer molecular heterogeneity using an integrative multiomic analysis. Ann. Oncol. 29, 895–902 (2018).
Bareche, Y. et al. Unraveling triple-negative breast cancer tumor microenvironment heterogeneity: towards an optimized treatment approach. J. Natl Cancer Inst. 112, 708–719 (2020).
Lehmann, B. D. et al. Multi-omics analysis identifies therapeutic vulnerabilities in triple-negative breast cancer subtypes. Nat. Commun. 12, 6276 (2021).
Wang, X. Q. et al. Spatial predictors of immunotherapy response in triple-negative breast cancer. Nature 621, 868–876 (2023).
Yam, C. et al. Immune phenotype and response to neoadjuvant therapy in triple-negative breast cancer. Clin. Cancer Res. 27, 5365–5375 (2021).
Garaud, S., Dieu-Nosjean, M.-C. & Willard-Gallo, K. T follicular helper and B cell crosstalk in tertiary lymphoid structures and cancer immunotherapy. Nat. Commun. 13, 2259 (2022).
Cabrita, R. et al. Tertiary lymphoid structures improve immunotherapy and survival in melanoma. Nature 577, 561–565 (2020).
Gruosso, T. et al. Spatially distinct tumor immune microenvironments stratify triple-negative breast cancers. J. Clin. Invest. 129, 1785–1800 (2019).
Hammerl, D. et al. Spatial immunophenotypes predict response to anti-PD1 treatment and capture distinct paths of T cell evasion in triple negative breast cancer. Nat. Commun. 12, 5668 (2021).
Ignatiadis, M. et al. Gene modules and response to neoadjuvant chemotherapy in breast cancer subtypes: a pooled analysis. J. Clin. Oncol. 30, 1996–2004 (2012).
Haibe-Kains, B. et al. A three-gene model to robustly identify breast cancer molecular subtypes. J. Natl Cancer Inst. 104, 311–325 (2012).
Yates, L. R. et al. Subclonal diversification of primary breast cancer revealed by multiregion sequencing. Nat. Med. 21, 751–759 (2015).
Larsson, L., Frisén, J. & Lundeberg, J. Spatially resolved transcriptomics adds a new dimension to genomics. Nat. Methods 18, 15–18 (2021).
Marx, V. Method of the year: spatially resolved transcriptomics. Nat. Methods 18, 9–14 (2021).
Andersson, A. et al. Spatial deconvolution of HER2-positive breast cancer delineates tumor-associated cell type interactions. Nat. Commun. 12, 6012 (2021).
Wu, S. Z. et al. A single-cell and spatially resolved atlas of human breast cancers. Nat. Genet. 53, 1334–1347 (2021).
Bassiouni, R. et al. Spatial transcriptomic analysis of a diverse patient cohort reveals a conserved architecture in triple-negative breast cancer. Cancer Res. 83, 34–48 (2023).
Liu, Y. et al. Combined single‐cell and spatial transcriptomics reveal the metabolic evolvement of breast cancer during early dissemination. Adv. Sci. 10, 2205395 (2023).
Hu, J. et al. Deciphering tumor ecosystems at super resolution from spatial transcriptomics with TESLA. Cell Syst. 14, 404–417.e4 (2023).
Ståhl, P. L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78–82 (2016).
Salmén, F. et al. Barcoded solid-phase RNA capture for spatial transcriptomics profiling in mammalian tissue sections. Nat. Protoc. 13, 2501–2534 (2018).
Liberzon, A. et al. The molecular signatures database hallmark gene set collection. Cell Syst. 1, 417–425 (2015).
Stanton, S. E., Adams, S. & Disis, M. L. Variation in the incidence and magnitude of tumor-infiltrating lymphocytes in breast cancer subtypes: a systematic review. JAMA Oncol. 2, 1354 (2016).
Gu-Trantien, C. et al. CD4+ follicular helper T cell infiltration predicts breast cancer survival. J. Clin. Invest. 123, 2873–2892 (2013).
Zhang, Y. et al. Single-cell analyses reveal key immune cell subsets associated with response to PD-L1 blockade in triple-negative breast cancer. Cancer Cell 39, 1578–1593.e8 (2021).
Helmink, B. A. et al. B cells and tertiary lymphoid structures promote immunotherapy response. Nature 577, 549–555 (2020).
Fridman, W. H. et al. B cells and tertiary lymphoid structures as determinants of tumour immune contexture and clinical outcome. Nat. Rev. Clin. Oncol. 19, 441–457 (2022).
Sautès-Fridman, C., Petitprez, F., Calderaro, J. & Fridman, W. H. Tertiary lymphoid structures in the era of cancer immunotherapy. Nat. Rev. Cancer 19, 307–325 (2019).
Buisseret, L. et al. Reliability of tumor-infiltrating lymphocyte and tertiary lymphoid structure assessment in human breast cancer. Mod. Pathol. 30, 1204–1212 (2017).
Meylan, M. et al. Tertiary lymphoid structures generate and propagate anti-tumor antibody-producing plasma cells in renal cell cancer. Immunity 55, 527–541.e5 (2022).
Group, M. E. T. A. B. R. I. C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).
Staaf, J. et al. RNA sequencing-based single sample predictors of molecular subtype and risk of recurrence for clinical assessment of early-stage breast cancer. Npj Breast Cancer 8, 94 (2022).
Wolf, D. M. et al. Redefining breast cancer subtypes to guide treatment prioritization and maximize response: predictive biomarkers across 10 cancer therapies. Cancer Cell 40, 609–623.e6 (2022).
Bareche, Y. et al. Leveraging big data of immune checkpoint blockade response identifies novel potential targets. Ann. Oncol. 33, 1304–1317 (2022).
Allard, B., Allard, D., Buisseret, L. & Stagg, J. The adenosine pathway in immuno-oncology. Nat. Rev. Clin. Oncol. 17, 611–629 (2020).
Chen, Y. et al. Pharmaceutical targeting Th2-mediated immunity enhances immunotherapy response in breast cancer. J. Transl. Med. 20, 615 (2022).
Wu, L. et al. Breast cancer cell–neutrophil interactions enhance neutrophil survival and pro-tumorigenic activities. Cancers 12, 2884 (2020).
Barrera, L. et al. CD47 overexpression is associated with decreased neutrophil apoptosis/phagocytosis and poor prognosis in non-small-cell lung cancer patients. Br. J. Cancer 117, 385–397 (2017).
Tutt, A. N. J. et al. Adjuvant olaparib for patients with BRCA1 - or BRCA2 -mutated breast cancer. N. Engl. J. Med. 384, 2394–2405 (2021).
Robson, M. et al. Olaparib for metastatic breast cancer in patients with a germline BRCA mutation. N. Engl. J. Med. 377, 523–533 (2017).
Litton, J. K. et al. Talazoparib in patients with advanced breast cancer and a germline BRCA mutation. N. Engl. J. Med. 379, 753–763 (2018).
Bardia, A. et al. Sacituzumab Govitecan In Metastatic Triple-negative Breast Cancer. N. Engl. J. Med. 384, 1529–1541 (2021).
Oliveira, M. et al. Patritumab deruxtecan in untreated hormone receptor-positive/HER2-negative early breast cancer: final results from part A of the window-of-opportunity SOLTI TOT-HER3 pre-operative study. Ann. Oncol. 34, 670–680 (2023).
Schmid, P. et al. Pembrolizumab for early triple-negative breast cancer. N. Engl. J. Med. 382, 810–821 (2020).
Buisseret, L. et al. Clinical significance of CD73 in triple-negative breast cancer: multiplex analysis of a phase III clinical trial. Ann. Oncol. 29, 1056–1062 (2018).
Herbst, R. S. et al. COAST: an open-label, phase II, multidrug platform study of durvalumab alone or in combination with oleclumab or monalizumab in patients with unresectable, stage III non–small-cell lung cancer. J. Clin. Oncol. 40, 3383–3393 (2022).
M-Rabet, M. et al. Nectin-4: a new prognostic biomarker for efficient therapeutic targeting of primary and metastatic triple-negative breast cancer. Ann. Oncol. 28, 769–776 (2017).
Powles, T. et al. Enfortumab vedotin in previously treated advanced urothelial carcinoma. N. Engl. J. Med. 384, 1125–1135 (2021).
Modi, S. et al. Trastuzumab deruxtecan in previously treated HER2-low advanced breast cancer. N. Engl. J. Med. 387, 9–20 (2022).
Staaf, J. et al. Whole-genome sequencing of triple-negative breast cancers in a population-based clinical study. Nat. Med. 25, 1526–1533 (2019).
Bianchini, G., De Angelis, C., Licata, L. & Gianni, L. Treatment landscape of triple-negative breast cancer—expanded options, evolving needs. Nat. Rev. Clin. Oncol. 19, 91–113 (2022).
Huang, Z. et al. Artificial intelligence reveals features associated with breast cancer neoadjuvant chemotherapy responses from multi-stain histopathologic images. Npj Precis. Oncol. 7, 14 (2023).
Sun, P. et al. A computational tumor-infiltrating lymphocyte assessment method comparable with visual reporting guidelines for triple-negative breast cancer. EBioMedicine 70, 103492 (2021).
Arnedos, M. et al. 213MO Primary endpoint analysis of a randomized phase II of darolutamide or capecitabine in patients with triple-negative androgen receptor-positive advanced breast cancer (UCBG3-06 START trial). Ann. Oncol. 33, S635 (2022).
Lehmann, B. D. et al. TBCRC 032 IB/II multicenter study: molecular insights to AR antagonist and PI3K inhibitor efficacy in patients with AR+ metastatic triple-negative breast cancer. Clin. Cancer Res. 26, 2111–2123 (2020).
Bonnefoi, H. et al. A phase II trial of abiraterone acetate plus prednisone in patients with triple-negative androgen receptor positive locally advanced or metastatic breast cancer (UCBG 12-1). Ann. Oncol. 27, 812–818 (2016).
Ascione, L. et al. PIK3CA mutations in breast cancer subtypes other than HR-positive/HER2-negative. J. Pers. Med. 12, 1793 (2022).
Debien, V. et al. Immunotherapy in breast cancer: an overview of current strategies and perspectives. Npj Breast Cancer 9, 7 (2023).
Wang, X. et al. Predictive biomarkers for response to immunotherapy in triple negative breast cancer: promises and challenges. J. Clin. Med. 12, 953 (2023).
Munoz-Erazo, L., Rhodes, J. L., Marion, V. C. & Kemp, R. A. Tertiary lymphoid structures in cancer – considerations for patient prognosis. Cell. Mol. Immunol. 17, 570–575 (2020).
Rao, A., Barkley, D., França, G. S. & Yanai, I. Exploring tissue architecture using spatial transcriptomics. Nature 596, 211–220 (2021).
Sparano, J. A. et al. Prospective validation of a 21-gene expression assay in breast cancer. N. Engl. J. Med. 373, 2005–2014 (2015).
Cardoso, F. et al. 70-gene signature as an aid to treatment decisions in early-stage breast cancer. N. Engl. J. Med. 375, 717–729 (2016).
Tolaney, S. M. et al. Updated standardized definitions for efficacy end points (STEEP) in adjuvant breast cancer clinical trials: STEEP version 2.0. J. Clin. Oncol. 39, 2720–2731 (2021).
Buisseret, L. et al. Tumor-infiltrating lymphocyte composition, organization and PD-1/ PD-L1 expression are linked in breast cancer. OncoImmunology 6, e1257452 (2017).
Rueda, O. M. et al. Dynamics of breast-cancer relapse reveal late-recurring ER-positive genomic subgroups. Nature 567, 399–404 (2019).
Ji, A. L. et al. Multimodal analysis of composition and spatial architecture in human squamous cell carcinoma. Cell 182, 497–514.e22 (2020).
Jemt, A. et al. An automated approach to prepare tissue-derived spatially barcoded RNA-sequencing libraries. Sci. Rep. 6, 37137 (2016).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
Salgado, R. et al. The evaluation of tumor-infiltrating lymphocytes (TILs) in breast cancer: recommendations by an International TILs Working Group 2014. Ann. Oncol. 26, 259–271 (2015).
Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12, 77 (2011).
Costa, A. et al. Fibroblast heterogeneity and immunosuppressive environment in human breast cancer. Cancer Cell 33, 463–479.e10 (2018).
Kieffer, Y. et al. Single-cell analysis reveals fibroblast clusters linked to immunotherapy resistance in cancer. Cancer Discov. 10, 1330–1351 (2020).
Lee, Y. J. et al. CD39+ tissue-resident memory CD8+ T cells with a clonal overlap across compartments mediate antitumor immunity in breast cancer. Sci. Immunol. 7, eabn8390 (2022).
Cassetta, L. et al. Human tumor-associated macrophage and monocyte transcriptional landscapes reveal cancer-specific reprogramming, biomarkers, and therapeutic targets. Cancer Cell 35, 588–602.e10 (2019).
Sotiriou, C. et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J. Natl Cancer Inst. 98, 262–272 (2006).
van’t Veer, L. J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536 (2002).
Carter, S. L., Eklund, A. C., Kohane, I. S., Harris, L. N. & Szallasi, Z. A signature of chromosomal instability inferred from gene expression profiles predicts clinical outcome in multiple human cancers. Nat. Genet. 38, 1043–1048 (2006).
Farmer, P. et al. A stroma-related gene signature predicts resistance to neoadjuvant chemotherapy in breast cancer. Nat. Med. 15, 68–74 (2009).
Desmedt, C. et al. Biological processes associated with breast cancer clinical outcome depend on the molecular subtypes. Clin. Cancer Res. 14, 5158–5165 (2008).
Teschendorff, A. E., Miremadi, A., Pinder, S. E., Ellis, I. O. & Caldas, C. An immune response gene expression module identifies a good prognosis subtype in estrogen receptor negative breast cancer. Genome Biol. 8, R157 (2007).
Heller, G. & Venkatraman, E. S. ResampLing Procedures To Compare Two Survival Distributions In The Presence Of Right-censored Data. Biometrics 52, 1204 (1996).
Venet, D. Spatial transcriptomics reveals substantial heterogeneity in triple negative breast cancer with potential clinical implications. Zenodo https://doi.org/10.5281/ZENODO.13867935 (2024).
Acknowledgements
This research was supported by the Fondation Julie-Françoise Drion, the Fondation contre le cancer, the Breast Cancer Research Foundation, the Association Jules Bordet, and the Belgian Fonds National de la Recherche Scientifique (F.R.S.-FNRS). X.W. was supported by the Belgian Fonds National de la Recherche Scientifique (F.R.S.-FNRS). D.V. was supported by the Foundation Julie-Françoise Drion. M.R. was supported by Télévie and the Belgian Fonds National de la Recherche Scientifique (F.R.S.-FNRS). M.R. was supported by Télévie and the Belgian Fonds National de la Recherche Scientifique (F.R.S.-FNRS), and by Fondation Rose et Jean Hoguet. J.L. was supported by the Swedish Cancer Society. Computational resources have been provided by the Consortium des Équipements de Calcul Intensif (CÉCI), funded by the Fonds de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under Grant No. 2.5020.11 and by the Walloon Region. We acknowledge Maja Marklund, Konstantin Carlberg, Ludvig Larsson and Annelie Mollbrink for their involvement in the experimental STs workflow, David Gacquer for his contribution to the conceptualization of the figures and Samira Majjaj for the bulk RNA extraction and library preparation.
Author information
Authors and Affiliations
Contributions
X.W., D.V., F.R., and C.S. contributed to the interpretation of the results and wrote the manuscript with input from the remaining authors. X.W., and D.V. performed data analysis. D.V. conducted the bioinformatics pipeline including data integration and software design. F.L., D.L. (referred to as the pathologists) and X.W. performed morphological annotation and pathological classification of the samples. M.R. and A.J.G. contributed to the bioinformatics analyses. M.R., A.J.G., L.B., and M.I. contributed to the interpretation of the results. M.R., A.J.G. and M.C. contributed to the review of the manuscript. F.D., G.R., and L.S. carried out the laboratory experiments. L.C. and D.L. provided samples from the tissue bank. K.T., E.G.V., L.F., S.S., L.K., and L.S. initiated the experimental spatial transcriptomics workflow. N.B. performed the tissue and ST matching of the entire dataset. Y.M. contributed to the bioinformatics analyses. J.L. provided technical facilities to ST technology and sequencing platform. C.S., F.R., and J.L. planned the ST study and supervised the project. All authors read and approved the manuscript.
Corresponding author
Ethics declarations
Competing interests
J.L., S.S., K.T., E.G.V. and N.B. are scientific advisors for 10xGenomics Inc. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks David Craig and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, X., Venet, D., Lifrange, F. et al. Spatial transcriptomics reveals substantial heterogeneity in triple-negative breast cancer with potential clinical implications. Nat Commun 15, 10232 (2024). https://doi.org/10.1038/s41467-024-54145-w
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41467-024-54145-w