Impact of Loss Functions on Label-free Virtual H&E Staining

Published: 18 November 2024

Abstract

Label-free virtual Haematoxylin and Eosin (H&E) staining has the potential to generate realistic histological images for rapid clinical diagnosis, eliminating the need for the conventional, costly, and time-consuming tissue staining procedure. Although various deep learning techniques have proven effective for this purpose, limited attention has been given to how different loss functions influence the fidelity of synthesis. In this study, we focus on assessing the qualitative and quantitative impact of five widely applied loss functions designed for high-fidelity image synthesis in the context of virtual H&E staining using single-channel label-free autofluorescence images. Qualitative analysis involves a visual comparison between true and synthetic images, while quantitative analysis utilises several well-known image similarity metrics to measure the distance between real and virtual images. Our experimental results demonstrate the feasibility of adding extra regularisation terms with different weights for synthesising H&E images. Visual inspection and quantitative outcomes align well with each other, but both should be employed together to reach a conclusive decision on virtual staining of optimal quality.

1 Introduction

Haematoxylin and Eosin (H&E) staining serves as the primary examination method for human tissue samples in histology, standing as the gold standard for evaluating nuclei and cytoplasmic features to detect abnormalities associated with human diseases, including cancer [24]. Despite its technical simplicity, the routine staining procedures for achieving high-quality results demand specific resources, such as effective chemical and material management. Furthermore, inherent challenges are often associated with H&E staining, including staining variability, potential artifact formations stemming from various steps in the process, and the need for careful tissue handling [23].
Recent advancements in deep learning (DL) technology have made it possible to achieve label-free virtual H&E staining directly from autofluorescence images obtained on unstained tissue. This innovative approach allows for the rapid generation of clinical-grade H&E-stained images in almost real time, eliminating the need for conventional staining procedures [2, 14]. Among DL techniques, U-Nets [20] and Generative Adversarial Networks (GANs, [7]) are widely employed models, along with their variations. Notably, the pix2pix network [9] and CycleGAN [32] are particularly prominent for supervised and unsupervised virtual histological staining, respectively.
Despite the success of GANs in virtual histological staining, one noticeable challenge known as hallucination may persist, where the generated synthetic data are irrelevant to the input data distribution, leading to mis-diagnosis of medical conditions [3]. To overcome this problem, a recommended strategy is to incorporate extra regularisation terms into the GAN's loss functions, guiding the training process towards more faithful synthesis through more effective constraints. Apart from the common loss functions utilised in GANs, diverse extra constraints have been applied, including L1 loss [17, 27], total variation (TV) [12, 15, 19, 30], physics-guided loss [22] to minimise the impact of background noise, and structural similarity index (SSIM) [1, 11]. However, the influence of introducing additional regularisation terms into the training process on the outcomes of virtual histological staining remains largely unexplored.
In the field of computer vision, numerous endeavours have been undertaken to capitalise on additional constraints for achieving high-quality image synthesis. One category of such efforts involves leveraging high-level features extracted from pretrained models. For instance, Mahendran et al. introduced the concept of deep features obtained from various layers, representing a diverse range of details [16]. Another regularisation approach based on deep features is texture loss, wherein multi-level features from pretrained VGG19 are imposed on generated images, and feature correlations at a given level are determined by the Gram matrix [5]. Subsequently, methods such as content and style transfer [6] and perceptual losses [10] were proposed to underscore the varied details embedded in both source and target images, contributing to more realistic image synthesis.
In this study, we aim to systematically assess the influence of various loss functions on virtual H&E staining. Specifically, we will utilise the pix2pix GAN as the foundational model and integrate five additional regularisation functions, namely Structural Similarity Index (SSIM, [25]), Total Variation (TV, [21]), Deep Image Structure and Texture Similarity (DISTS) [4], content, and texture losses. The evaluation of outcomes will involve both qualitative examination through visual inspection and quantitative analysis using four commonly employed image similarity metrics. These metrics include mean squared error (MSE), Peak Signal-to-Noise Ratio (PSNR) [31], Normalised Mutual Information (NMI) [31], and Feature-based Similarity Index (FSIM) [28].

2 Methodology

2.1 Experimental setup

Data collection: a tumour microarray (TMA) was constructed containing 79 cores from 79 patients. Autofluorescence images were collected from the tissue using a confocal microscope (Leica STELLARIS 8 FALCON, with a 20x/0.75NA objective). Afterwards, the tissue was sent for H&E staining and scanned on a bright-field slide scanner (ZEISS Axio Scan.Z1), with a 20x/0.75NA objective.
Data post-processing: autofluorescence images were first stitched using an ImageJ stitching plugin [18]. The stitched images were co-registered with the corresponding histology images using affine transformation, following the co-registration process described in [19], except that we did not apply elastic registration after the global alignment. Finally, the co-registered images were resampled into 256 × 256 patches to be fed into the DL model, and patches with over 75% background were discarded.
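For illustration, a minimal Python sketch of the background-filtering step is given below; the intensity threshold that defines background pixels is a hypothetical value rather than the one used in our pipeline.

```python
import numpy as np

def keep_patch(patch: np.ndarray,
               bg_threshold: float = 0.05,
               max_bg_fraction: float = 0.75) -> bool:
    """Return True if a 256x256 autofluorescence patch contains enough tissue.

    Pixels with normalised intensity below `bg_threshold` (an assumed value)
    are treated as background; patches with more than `max_bg_fraction`
    background are discarded, mirroring the 75% criterion described above.
    """
    background_fraction = float(np.mean(patch < bg_threshold))
    return background_fraction <= max_bg_fraction

# Example usage on a list of co-registered patches:
# patches = [p for p in patches if keep_patch(p)]
```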
The DL model: the pix2pix GAN, introduced by Isola et al. [9], is exploited as the foundational model; it is a conditional GAN designed for paired image-to-image translation. Notably, pix2pix is not confined to a specific domain and has found applications in a wide range of tasks, spanning both general computer vision and medical imaging. The architecture comprises a U-Net-like generator responsible for mapping images from the source domain to the target domain and a discriminator that evaluates a pair consisting of the generated image and the corresponding input image (serving as the condition) to discern whether the image is authentic or generated. In this study, pix2pix is adopted as the baseline model since it stands as one of the most commonly utilised models for supervised virtual H&E staining [14]. In addition, our advantage is the pixel-level co-registration of the autofluorescence and H&E-stained images, allowing us to employ supervised DL models for optimal digital staining and thus introducing minimal uncertainty associated with unsupervised methods.
The original loss function of pix2pix is composed of two parts: adversarial loss and pixel-wise loss. It can be defined as:
\begin{equation} \mathcal {L}(G, D) = \mathcal {L}_{\text{GAN}}(G, D) + \lambda \cdot \mathcal {L}_{\text{L1}}(G) \end{equation}
(1)
where G and D are the generator and discriminator, respectively, \(\mathcal {L}_{\text{GAN}}(G, D)\) is the adversarial loss, \(\mathcal {L}_{\text{L1}}(G)\) is the pixel-wise loss, and λ is a constant. To evaluate the impact of extra regularisation terms, we extend Equation 1 with one extra regularisation term described in Section 2.2, denoted as \(\mathcal {L}_{\text{extra}}(G)\), as shown in Equation 2:
\begin{equation} \mathcal {L}(G, D) = \mathcal {L}_{\text{GAN}}(G, D) + \lambda \cdot \mathcal {L}_{\text{L1}}(G) + \beta \cdot \mathcal {L}_{\text{extra}}(G) \end{equation}
(2)
where β is another constant value to weight the contribution of the extra term.
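A minimal PyTorch sketch of the generator objective in Equation 2 is given below; the assumption that the discriminator returns logits for concatenated (input, output) pairs and the helper name extra_loss_fn are illustrative, not the exact implementation used in this study.

```python
import torch
import torch.nn as nn

adv_criterion = nn.BCEWithLogitsLoss()  # adversarial loss on discriminator logits
l1_criterion = nn.L1Loss()              # pixel-wise L1 loss

def generator_loss(discriminator, autofluo, fake_he, real_he,
                   extra_loss_fn, lam=1.0, beta=1.0):
    """Generator objective of Equation 2: adversarial + lambda*L1 + beta*extra.

    `extra_loss_fn(fake_he, real_he)` stands for any regularisation term from
    Section 2.2 (SSIM, TV, content, texture, or DISTS).
    """
    # Adversarial term: the pix2pix discriminator judges (input, output) pairs.
    pred_fake = discriminator(torch.cat([autofluo, fake_he], dim=1))
    adv = adv_criterion(pred_fake, torch.ones_like(pred_fake))

    pixel = l1_criterion(fake_he, real_he)
    extra = extra_loss_fn(fake_he, real_he)
    return adv + lam * pixel + beta * extra
```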
Training details: in total, 69 TMA cores were used for training, while 7 cores were left out as the independent testing set. To ensure a fair and consistent comparison, all training sessions using the specified loss functions employed identical hyperparameters over 300 epochs. For λ and β in Equation 2, λ is fixed at 1 and β takes one value from [1, 2, 5, 10, 20]. The training process employed a batch size of 16 on 4 NVIDIA V100 GPUs, provided through the EPSRC Tier-2 National HPC Services Cirrus hosted by EPCC at The University of Edinburgh. This approach aimed to maintain objectivity and reliability in the evaluation of the different loss functions.

2.2 Extra regularisation terms

SSIM is a prevalent metric to measure the similarity between images. The original formulation [25] of pixel-wise SSIM is composed of three components, namely luminance, contrast, and structure, computed from the mean, variance, and covariance of a patch around a given pixel within an image. A simplified version is given by Equation 3 [25]:
\begin{equation} \text{SSIM}(x, y) = \frac{{(2\mu _x\mu _y + C_1) \cdot (2\sigma _{xy} + C_2)}}{{(\mu _x^2 + \mu _y^2 + C_1) \cdot (\sigma _x^2 + \sigma _y^2 + C_2)}} \end{equation}
(3)
where x and y are two image patches, μ_x and μ_y are their means, σ_x^2 and σ_y^2 are their variances, σ_xy is their covariance, and C1 and C2 are two constants to stabilise the division.
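A compact PyTorch sketch of an SSIM-based regularisation term following Equation 3 is shown below; it assumes images scaled to [0, 1] and, for brevity, replaces the Gaussian window of the original formulation with a uniform window.

```python
import torch
import torch.nn.functional as F

def ssim_loss(x, y, window=11, C1=0.01 ** 2, C2=0.03 ** 2):
    """Return 1 - mean local SSIM (Equation 3) for (N, C, H, W) tensors in [0, 1]."""
    pad = window // 2
    # Local means, variances, and covariance over a uniform window.
    mu_x = F.avg_pool2d(x, window, stride=1, padding=pad)
    mu_y = F.avg_pool2d(y, window, stride=1, padding=pad)
    var_x = F.avg_pool2d(x * x, window, stride=1, padding=pad) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, window, stride=1, padding=pad) - mu_y ** 2
    cov_xy = F.avg_pool2d(x * y, window, stride=1, padding=pad) - mu_x * mu_y

    ssim_map = ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / (
        (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))
    return 1.0 - ssim_map.mean()
```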
TV was initially introduced for tasks such as image denoising and reconstruction, as outlined in the work by Rudin et al. [21]. It quantifies the variation or complexity within an image. Rooted in convex analysis and functional analysis, TV is frequently employed as a regularisation term in optimisation problems, particularly in applications such as image denoising, inpainting, and compressed sensing. TV is given by:
\begin{equation} \text{TV}(I) = \sum _{i,j} \sqrt {(\Delta _x I_{i,j})^2 + (\Delta _y I_{i,j})^2} \end{equation}
(4)
where Δx and Δy represent discrete differences along the horizontal and vertical directions, respectively.
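A short PyTorch sketch of the TV term in Equation 4 follows; the small epsilon inside the square root is an assumed addition to keep the gradient defined at zero.

```python
import torch

def tv_loss(img, eps=1e-8):
    """Isotropic total variation (Equation 4) of a (N, C, H, W) image batch."""
    dx = img[:, :, :, 1:] - img[:, :, :, :-1]  # horizontal finite differences
    dy = img[:, :, 1:, :] - img[:, :, :-1, :]  # vertical finite differences
    # Crop both difference maps to a common (H-1, W-1) grid before combining.
    dx = dx[:, :, :-1, :]
    dy = dy[:, :, :, :-1]
    return torch.sqrt(dx ** 2 + dy ** 2 + eps).sum()
```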
Content loss pertains to the evaluation of deep image representations, quantifying the divergence between the deep features of source and target images [6, 16]. These features are extracted through a pretrained VGG network across multiple layers, where higher layers capture intricate high-level details, while lower layers emphasise the preservation of pixel-wise consistency with the original image. The content loss is then computed based on the feature maps from one or more layers, emphasising the preservation of the essential content from the reference image. The incorporation of content loss ensures that the generated image retains meaningful structures and objects present in the reference content [6, 10].
Texture loss, also known as style loss, assesses the dissimilarity in textural details between images [5]. It evaluates the variance in textures between the generated and the reference image, derived from one or more layers of a pretrained DL model, such as VGG. Unlike content loss, texture properties at a given layer are calculated using the Gram matrix, whose input is the vectorised feature maps retrieved from that layer [6]. Texture loss thus compares the texture properties of the original and generated images.
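A possible PyTorch sketch of content and texture (Gram-matrix) losses built on frozen VGG19 features is given below; the layer indices, the ImageNet normalisation, and the assumption of three-channel inputs in [0, 1] are illustrative choices rather than the exact configuration used in our experiments (torchvision ≥ 0.13 is assumed).

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Frozen VGG19 feature extractor (ImageNet weights).
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

CONTENT_LAYERS = {22}                 # e.g. relu4_2
TEXTURE_LAYERS = {1, 6, 11, 20, 29}   # e.g. relu1_1 ... relu5_1

_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def _features(x, layers):
    """Collect VGG19 feature maps at the requested layer indices."""
    extractor = vgg.to(x.device)
    out = (x - _MEAN.to(x.device)) / _STD.to(x.device)
    feats = {}
    for i, layer in enumerate(extractor):
        out = layer(out)
        if i in layers:
            feats[i] = out
    return feats

def gram(f):
    """Gram matrix of vectorised feature maps, normalised by their size."""
    n, c, h, w = f.shape
    f = f.reshape(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def content_loss(fake, real):
    ff, fr = _features(fake, CONTENT_LAYERS), _features(real, CONTENT_LAYERS)
    return sum(F.mse_loss(ff[i], fr[i]) for i in CONTENT_LAYERS)

def texture_loss(fake, real):
    ff, fr = _features(fake, TEXTURE_LAYERS), _features(real, TEXTURE_LAYERS)
    return sum(F.mse_loss(gram(ff[i]), gram(fr[i])) for i in TEXTURE_LAYERS)
```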
DISTS [4] was specifically crafted to accommodate texture resampling while also serving as a versatile loss function. In contrast to content and texture losses, DISTS integrates both texture and structure information at the level of deep features generated by pretrained deep learning models. The texture component is obtained using global means, as described in Equation 5 [4], while the structure component is derived from global correlations, as outlined in Equation 6 [4].
\begin{equation} l(\tilde{x}^{(i)}_j,\tilde{y}^{(i)}_j)=\frac{2 \mu _{\tilde{x}_j}^{(i)}\mu _{\tilde{y}_j}^{(i)}+c_{1}}{\left(\mu _{\tilde{x}_j}^{(i)}\right)^{2}+\left(\mu _{\tilde{y}_j}^{(i)}\right)^{2}+c_{1}} \end{equation}
(5)
\begin{equation} s(\tilde{x}^{(i)}_j,\tilde{y}^{(i)}_j)=\frac{2 \sigma _{\tilde{x}_j\tilde{y}_j}^{(i)}+c_{2}}{\left(\sigma _{\tilde{x}_j}^{(i)}\right)^{2}+\left(\sigma _{\tilde{y}_j}^{(i)}\right)^{2}+c_{2}} \end{equation}
(6)
where \(\tilde{x}^{(i)}_j, \tilde{y}^{(i)}_j\) are the jth deep feature maps of the source and generated images, respectively, at the ith convolutional layer, \(\mu _{\tilde{x}_j}^{(i)}, \mu _{\tilde{y}_j}^{(i)}\) and \(\sigma _{\tilde{x}_j}^{(i)}, \sigma _{\tilde{y}_j}^{(i)}\) are their global means and variances, \(\sigma _{\tilde{x}_j\tilde{y}_j}^{(i)}\) is the global covariance between \(\tilde{x}^{(i)}_j\) and \(\tilde{y}^{(i)}_j\), and c1 and c2 are small positive values for numerical stability.
Taking Equation 5 and Equation 6 together, the DISTS loss can be presented using Equation 7 [4]:
\begin{equation} D(x, y;\alpha ,\beta)=1-\sum _{i=0}^{m}\sum _{j=1}^{n_i}\left(\alpha _{ij}l(\tilde{x}^{(i)}_j,\tilde{y}^{(i)}_j)+ \beta _{ij}s(\tilde{x}^{(i)}_j,\tilde{y}^{(i)}_j)\right) \end{equation}
(7)
where α and β are positive learnable weights, satisfying \(\sum _{i=0}^{m}\sum _{j=1}^{n_{i}} (\alpha _{ij} + \beta _{ij}) = 1\).
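The per-channel texture and structure terms of Equations 5 and 6 can be sketched in PyTorch as follows; the learnable-weight aggregation of Equation 7 is omitted, and the stability constants are assumed values.

```python
import torch

def dists_terms(fx, fy, c1=1e-6, c2=1e-6):
    """Texture (Eq. 5) and structure (Eq. 6) terms for a pair of feature maps.

    fx, fy have shape (N, C, H, W); statistics are taken globally over H x W,
    yielding one value per image and feature channel.
    """
    mu_x = fx.mean(dim=(2, 3))
    mu_y = fy.mean(dim=(2, 3))
    var_x = fx.var(dim=(2, 3), unbiased=False)
    var_y = fy.var(dim=(2, 3), unbiased=False)
    cov_xy = (fx * fy).mean(dim=(2, 3)) - mu_x * mu_y

    texture = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return texture, structure
```

The full DISTS loss then combines these per-layer, per-channel terms with the learnable weights of Equation 7.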

2.3 Image similarity metrics

MSE is a common metric for measuring image similarity by quantifying the average squared difference between corresponding pixel values in two images. It is a straightforward and widely used method for comparing the pixel-wise intensity values of images. Given two images I and J of the same size M × N, the MSE is computed as:
\begin{equation} \text{MSE}(I, J) = \frac{1}{MN} \sum _{i=1}^{M} \sum _{j=1}^{N} \left(I(i, j) - J(i, j)\right)^2 \end{equation}
(8)
PSNR [31] serves as a commonly employed metric in image processing to assess the fidelity and clarity of a reconstructed signal in comparison to the original signal. This metric offers a numerical indication of the extent to which the quality of the processed signal diverges from the original. Its significance is notably pronounced in applications where achieving high signal quality is paramount. The PSNR is formally defined as:
\begin{equation} \text{PSNR} = 10 \cdot \log _{10}\left(\frac{{\text{MAX}^2}}{{\text{MSE}}}\right) \end{equation}
(9)
where MAX is the maximum possible pixel value in the image, and MSE is the mean squared error between the original and processed images.
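For reference, MSE (Equation 8) and PSNR (Equation 9) can be computed with NumPy as follows; 8-bit images with MAX = 255 are assumed.

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error of Equation 8 between two same-sized images."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio of Equation 9, in decibels."""
    err = mse(a, b)
    return float("inf") if err == 0 else 10.0 * np.log10(max_val ** 2 / err)
```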
NMI is a statistical metric to quantify the similarity between two images or image segmentations [31]. It is particularly useful when assessing the agreement or correspondence between different partitionings of image data, such as comparing ground truth annotations with algorithmically generated results. The calculation is as:
\begin{equation} \text{NMI}(X, Y) = \frac{I(X;Y)}{\sqrt {H(X) \cdot H(Y)}} \end{equation}
(10)
where I(X;Y) is the mutual information between image X and Y, and H is the entropy.
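A histogram-based estimate of NMI following Equation 10 might look as follows; the bin count of 256 is an assumed choice for 8-bit images.

```python
import numpy as np

def nmi(x: np.ndarray, y: np.ndarray, bins: int = 256) -> float:
    """Normalised mutual information (Equation 10) from a joint intensity histogram."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)

    def entropy(p):
        p = p[p > 0]
        return float(-np.sum(p * np.log(p)))

    hx, hy, hxy = entropy(px), entropy(py), entropy(pxy.ravel())
    mutual_info = hx + hy - hxy  # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return mutual_info / np.sqrt(hx * hy)
```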
FSIM maps the features and measures the similarities between two images [28]. It leverages two criteria, namely phase congruency (PC, perceived features) and gradient magnitude (GM, image gradient computed with conventional gradient operators, such as the Sobel operator), for the measurement, which is governed by Equation 11 [28]:
\begin{equation} S_{L}(\mathbf {x}) = \left[\frac{2PC_{1}(\mathbf {x}) \cdot PC_{2}(\mathbf {x}) + T_{1}}{PC_{1}^{2}(\mathbf {x}) + PC_{2}^{2}(\mathbf {x}) + T_{1}}\right]^{\alpha } \cdot \left[\frac{2G_{1}(\mathbf {x}) \cdot G_{2}(\mathbf {x}) + T_{2}}{G_{1}^{2}(\mathbf {x}) + G_{2}^{2}(\mathbf {x}) + T_{2}}\right]^{\beta } \end{equation}
(11)
where α and β are two adjustable parameters to tune the relative significance of PC and GM, PC is the PC criterion, G is the GM criterion, and T1 and T2 are two positive numbers for numerical stability. Detailed implementations of PC and G can be found in [28].

3 Results

3.1 Visual inspection

Figure 1: Virtual H&E staining with various loss functions of diverse weights using a single-channel autofluorescence intensity image. (a) The input label-free intensity image. (b) The true H&E-stained image as the reference. Rows (c) to (g) are the virtually stained images using SSIM, texture, DISTS, TV, and content as the loss function with weights of 1, 2, 5, 10, and 20, respectively. Red arrows point to a cluster of immune cells, which are missing in all synthesised images. Orange arrows indicate a few macrophages, which are not reconstructed in all virtual images. Blue arrows refer to a cluster of red blood cells, which are misinterpreted as other types of cells in the synthetic images. Green arrows indicate some cancer cells surrounded by immune cells, some of which are mis-reconstructed as immune cells.
We initially selected one core to illustrate the synthesised H&E images. The outcomes are depicted in Figure 1, along with the corresponding autofluorescence image (Figure 1.a) and the true H&E image (Figure 1.b) serving as the reference. Generally, incorporating SSIM, DISTS, or texture as an additional regularisation term can yield plausible virtual H&E images. In contrast, all virtual images with TV loss are too blurry for effective diagnostic decision-making. In terms of content loss, a noticeable checkerboard effect is observed, and the higher the weight, the more visually apparent the artifact becomes.
Generally, across the given weights, SSIM loss can guide the GAN model towards realistic synthesis of H&E images. An intriguing observation is that all virtual images with SSIM regularisation exhibit nearly identical quality, making them almost visually indistinguishable. As for texture loss, as the weight increases, it tends to misinterpret more cellular components, such as macrophages (orange arrows) and red blood cells (RBCs). However, it is better at differentiating smaller cells than SSIM, for example, those indicated by the green arrows. The virtual images associated with DISTS present a similar pattern, and their quality is comparable to that of those with texture loss.
Despite the overall agreement on staining quality between the true H&E image (Figure 1.b) and those associated with SSIM, texture, and DISTS loss, mis-reconstruction is evident for some cells. The red arrows point to a swarm of immune cells in the true staining, which none of the virtual stainings is able to reconstruct accurately. This discrepancy may arise because the autofluorescence signals of these cells are suppressed by other components with high autofluorescence (the brightest area in the lower right part of Figure 1.a). It is worth mentioning that although the absence of these immune cells has little impact on cancer detection and diagnosis, it is significant for quantification and tumour microenvironment characterisation. For example, it is not possible to analyse how these immune cells may fight against tumour cells in this particular area. Green arrows highlight a few cancer cells surrounded by immune cells, which are not entirely recognised in SSIM-, texture-, and DISTS-based virtual staining. However, texture and DISTS losses can recover more cells than SSIM loss in this area. Additionally, a few macrophages, indicated by orange arrows, are identifiable in SSIM with all weights, as well as in texture and DISTS with weights 1 and 2. Blue arrows refer to some RBCs. Most of the RBCs are surprisingly present in the SSIM-based reconstruction, whereas some of them are mistakenly interpreted as other cells with black nuclei in the texture- and DISTS-based reconstructions, particularly with higher weights (10 and 20).

3.2 Quantitative analysis

To quantify the results, we calculated the similarity between true and virtual H&E-stained images using four image similarity metrics defined in Section 2.3, namely MSE, NMI, PSNR, and FSIM. The metrics were computed on seven TMA cores and averaged across the cores. The results are presented in the figures below.
Figure 2: Average MSE (↓) computed on seven TMA cores for each of the five different regularisation terms, using weights 1, 2, 5, 10, and 20 in combination, grouped by the weights (upper plot) and regularisations (lower plot).
Figure 2 illustrates the average MSE, grouped by weights (top plot) and regularisations (bottom plot). The mean MSE values align well with the visual quality of the generated images shown in Figure 1. Virtual images associated with TV regularisation appear blurred, resulting in the highest MSE values for all weights. Regarding content loss, as the weight increases, the quality of the reconstructed images deteriorates, leading to an increase in MSE values. Since all virtual images associated with SSIM loss are visually similar, they exhibit almost identical MSE values. For texture and DISTS, images resulting from higher weights tend to have more incorrectly reconstructed cells and consequently higher MSE values.
Figure 3: Average NMI (↑) computed on seven TMA cores for each of the five different regularisation terms, using weights 1, 2, 5, 10, and 20 in combination, grouped by the weights (upper plot) and regularisations (lower plot).
Figure 3 illustrates the average NMI, grouped by weights (top plot) and regularisations (bottom plot). In contrast to MSE, the differences in NMI values across the various regularisations and weights are less apparent. However, patterns similar to those found in Figure 2 can still be identified. For example, higher weights result in lower NMI for the TV and content losses. Additionally, different weights do not significantly affect the quality of the SSIM-regularised images, which therefore yield virtually stained images of comparable quality.
Figure 4: Average PSNR (↑) computed on seven TMA cores for each of the five different regularisation terms, using weights 1, 2, 5, 10, and 20 in combination, grouped by the weights (upper plot) and regularisations (lower plot).
PSNR approximates human perception of reconstructed images compared to the original images, with higher values indicating better quality. Figure 4 depicts the PSNR of the loss functions across the various weights. Firstly, the mean PSNR values agree well with the synthetic images, as well as with the MSE and NMI results. SSIM-based outcomes have the highest PSNR for all weights, indicating that images generated with SSIM have the best overall quality. In contrast, TV-based outcomes have the lowest PSNR and thus the least similarity with the true H&E images. Texture and DISTS reach scores comparable to SSIM. Additionally, PSNR presents slightly better contrast among the different parameters than NMI, allowing for better differentiation, although the contrast is not as pronounced as that seen with MSE.
Figure 5: Average FSIM (↑) computed on seven TMA cores for each of the five different regularisation terms, using weights 1, 2, 5, 10, and 20 in combination, grouped by the weights (upper plot) and regularisations (lower plot).
As discussed in Section 2.3, FSIM integrates phase congruency (which measures features in the frequency domain and is invariant to contrast) and gradient magnitude (image gradient), thereby conveying more information than SSIM, NMI, and PSNR [28]. Intuitively, the FSIM metric appears to capture differences more distinctly, which is especially evident for large quality variations among virtual images that also produce notable differences in PSNR values, for example, the SSIM- and TV-constrained images. In addition, the FSIM results align with those of the other three metrics. However, it is also observed that the mean scores for the texture and DISTS losses consistently decrease with increasing weights, whereas the others fluctuate as the weights increase. For instance, considering DISTS, weight 2 is associated with a higher MSE and a lower PSNR than weight 1, but FSIM presents a contradictory result.

4 Discussion and Conclusion

In this study, we have thoroughly evaluated the impact of loss functions on virtual H&E staining with label-free autofluorescence images. Specifically, we examined five additional regularisation terms widely applied in virtual histological staining, in combination with five different weights. Given the implementation outlined in Section 2.1, it is evident that SSIM, texture, and DISTS, when assigned various weights, can produce virtual H&E staining that meets acceptable standards. Importantly, these variations do not hinder diagnostic decision-making. Additionally, content loss with smaller weights also yields satisfactory results. In contrast, content loss with substantial weights and TV are incapable of achieving acceptable outcomes in virtual H&E staining. The quantitative results obtained from the similarity metrics, including MSE, NMI, PSNR, and FSIM, align consistently with the quality of the generated images, making them effective for comparison purposes. However, it remains exceptionally difficult to rely solely on these metrics to identify satisfactory virtual staining.
The results illustrate that although SSIM loss is mathematically and conceptually the simplest loss, it is capable of generating reliable images and surpasses all other losses across all quantitative scores. In addition, SSIM loss is much less susceptible to variations induced by different weights than the other loss functions, maintaining a relatively consistent performance across the range of weights considered. However, SSIM loss struggles to reconstruct the small cells indicated by the green arrows in Figure 1, although this may not affect diagnostic decision-making.
Compared to SSIM loss, the content, texture, and DISTS losses are significantly more complicated, as they all utilise deep features to determine similarity. For instance, with the GPUs employed for training, SSIM can process 64 images per run, while the other three can only handle 16 images. Consequently, the training time per epoch increases dramatically from about 25 seconds for SSIM to over 110 seconds for the other three losses, even with the same batch size of 16.
We also showed that TV loss does not generate acceptable images, which contradicts the outcomes in [19]. This may be due to several reasons. First, our autofluorescence images are single-channel, whereas their images were collected using different filters, resulting in multi-channel images at different excitation and emission wavelengths. In addition, our images were collected with a confocal microscope, whereas their images were acquired using a wide-field microscope, allowing their images to more closely match the true H&E images. Meanwhile, our GAN model differs from their custom model. Last but not least, our study aims to evaluate the impact of these loss functions; to ensure a fair comparison, we fixed the other parameters instead of fine-tuning them for optimal images.
As far as the metrics are concerned, all align well with the visual outcomes. In particular, FSIM presents the most significant contrast, whereas NMI produces scores with the least contrast. Therefore, FSIM may be more suitable for identifying noticeable differences. However, it becomes less sensitive to dissimilarity at the micro level. For example, the DISTS-based scores across the weights are too close to reflect the cellular differences in the virtual images. In comparison, MSE and PSNR are more sensitive than FSIM to these subtle changes.
Another important aspect of the evaluation is the hyperparameter β used in Equation 2. We acknowledge that there is an infinite number of eligible values for this purpose. Empirically, this value has been set to approximately 10 times the value of λ [6, 8, 19, 32]. Accordingly, we selected various β values no greater than 10, namely 1, 2, 5, and 10. We also evaluated 20 as an additional value to gain an initial impression of whether a large hyperparameter would be beneficial for the reconstruction. Our findings align with the existing literature, indicating that within the range of [1, 10], the reconstruction generally improves as the hyperparameter increases. However, larger values exceeding 10 appear to negatively impact the quantitative outcomes.
Future work will extend to other loss functions, such as LPIPS (Learned Perceptual Image Patch Similarity, [29]), and to combinations of these loss functions. In addition, although pix2pix dominates supervised virtual histological staining, novel DL technologies, such as vision transformers and diffusion models [13], have yielded outstanding performance. Therefore, another potential improvement is to adopt these models as the foundation.

Acknowledgments

This study was partially funded by Cancer Research UK’s Early Detection and Diagnosis Primer Award (EDDPMA-May22/100054), NVIDIA Academic Hardware Grant Program, and the UK Engineering and Physical Sciences Research Council (EP/R005257/1 and EP/S025987/).

References

[1]
Bijie Bai, Hongda Wang, Yuzhu Li, Kevin de Haan, Francesco Colonnese, Yujie Wan, Jingyi Zuo, Ngan B. Doan, Xiaoran Zhang, Yijie Zhang, Jingxi Li, Xilin Yang, Wenjie Dong, Morgan Angus Darrow, Elham Kamangar, Han Sung Lee, Yair Rivenson, and Aydogan Ozcan. 2022. Label-Free Virtual HER2 Immunohistochemical Staining of Breast Tissue using Deep Learning. BME Frontiers 2022 (2022). https://doi.org/10.34133/2022/9786242
[2]
Bijie Bai, Xilin Yang, Yuzhu Li, Yijie Zhang, Nir Pillar, and Aydogan Ozcan. 2023. Deep learning-enabled virtual histological staining of biological samples. Light: Science & Applications 12, 1 (2023), 57.
[3]
Joseph Paul Cohen, Margaux Luck, and Sina Honari. 2018. Distribution matching losses can hallucinate features in medical image translation. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2018: 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part I. Springer, 529–536.
[4]
K. Y. Ding, K. D. Ma, S. Q. Wang, and E. P. Simoncelli. 2022. Image Quality Assessment: Unifying Structure and Texture Similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 5 (2022), 2567–2581.
[5]
Leon Gatys, Alexander S Ecker, and Matthias Bethge. 2015. Texture synthesis using convolutional neural networks. Advances in neural information processing systems 28 (2015).
[6]
Leon A Gatys, Alexander S Ecker, and Matthias Bethge. 2016. Image style transfer using convolutional neural networks. Proceedings of the IEEE conference on computer vision and pattern recognition (June 2016), 2414–2423.
[7]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. In Advances in Neural Information Processing Systems, Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger (Eds.), Vol. 27. Curran Associates, Inc.
[8]
Yiyu Hong, You Jeong Heo, Binnari Kim, Donghwan Lee, Soomin Ahn, Sang Yun Ha, Insuk Sohn, and Kyoung-Mee Kim. 2021. Deep learning-based virtual cytokeratin staining of gastric carcinomas to measure tumor–stroma ratio. Scientific Reports 11, 1 (2021), 19255.
[9]
P. Isola, J. Y. Zhu, T. H. Zhou, and A. A. Efros. 2017. Image-to-Image Translation with Conditional Adversarial Networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5967–5976.
[10]
Justin Johnson, Alexandre Alahi, and Li Fei-Fei. 2016. Perceptual losses for real-time style transfer and super-resolution. Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14 (October 2016), 694–711.
[11]
Lei Kang, Xiufeng Li, Yan Zhang, and Terence T.W. Wong. 2022. Deep learning enables ultraviolet photoacoustic microscopy based histological imaging with near real-time virtual staining. Photoacoustics 25 (2022), 100308.
[12]
Nischita Kaza, Ashkan Ojaghi, and Francisco E. Robles. 2022. Virtual Staining, Segmentation, and Classification of Blood Smears for Label-Free Hematology Analysis. BME Frontiers 2022 (2022), 9853606.
[13]
Amirhossein Kazerouni, Ehsan Khodapanah Aghdam, Moein Heidari, Reza Azad, Mohsen Fayyaz, Ilker Hacihaliloglu, and Dorit Merhof. 2023. Diffusion models in medical imaging: A comprehensive survey. Medical Image Analysis 88 (2023), 102846.
[14]
Lucas Kreiss, Shaowei Jiang, Xiang Li, Shiqi Xu, Kevin C. Zhou, Kyung Chul Lee, Alexander Mühlberg, Kanghyun Kim, Amey Chaware, Michael Ando, Laura Barisoni, Seung Ah Lee, Guoan Zheng, Kyle J. Lafata, Oliver Friedrich, and Roarke Horstmeyer. 2023. Digital staining in optical microscopy using deep learning - a review. PhotoniX 4, 1 (2023), 34.
[15]
J. X. Li, J. Garfinkel, X. R. Zhang, D. Wu, Y. J. Zhang, K. de Haan, H. D. Wang, T. R. Liu, B. J. Bai, Y. Rivenson, G. Rubinstein, P. O. Scumpia, and A. Ozcan. 2021. Biopsy-free in vivo virtual histology of skin using deep learning. Light: Science & Applications 10, 1 (2021).
[16]
Aravindh Mahendran and Andrea Vedaldi. 2015. Understanding deep image representations by inverting them. In Proceedings of the IEEE conference on computer vision and pattern recognition. 5188–5196.
[17]
Pranita Pradhan, Tobias Meyer, Michael Vieth, Andreas Stallmach, Maximilian Waldner, Michael Schmitt, Juergen Popp, and Thomas Bocklitz. 2021. Computational tissue staining of non-linear multimodal imaging using supervised and unsupervised deep learning. Biomed. Opt. Express 12, 4 (Apr 2021), 2280–2298.
[18]
Stephan Preibisch, Stephan Saalfeld, and Pavel Tomancak. 2009. Globally optimal stitching of tiled 3D microscopic image acquisitions. Bioinformatics 25, 11 (2009), 1463–1465.
[19]
Y. Rivenson, H. D. Wang, Z. S. Wei, K. de Haan, Y. B. Zhang, Y. C. Wu, H. Gunaydin, J. E. Zuckerman, T. Chong, A. E. Sisk, L. M. Westbrook, W. D. Wallace, and A. Ozcan. 2019. Virtual histological staining of unlabelled tissue-autofluorescence images via deep learning. Nature Biomedical Engineering 3, 6 (2019), 466–477.
[20]
O. Ronneberger, P. Fischer, and T. Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention, Part III 9351 (2015), 234–241.
[21]
Leonid I Rudin, Stanley Osher, and Emad Fatemi. 1992. Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena 60, 1-4 (1992), 259–268.
[22]
Ayush Somani, Arif Ahmed Sekh, Ida S. Opstad, Åsa Birna Birgisdottir, Truls Myrmel, Balpreet Singh Ahluwalia, Alexander Horsch, Krishna Agarwal, and Dilip K. Prasad. 2022. Virtual labeling of mitochondria in living cells using correlative imaging and physics-guided deep learning. Biomed. Opt. Express 13, 10 (Oct 2022), 5495–5516.
[23]
Syed Ahmed Taqi, Syed Abdus Sami, Lateef Begum Sami, and Syed Ahmed Zaki. 2018. A review of artifacts in histopathology. Journal of oral and maxillofacial pathology: JOMFP 22, 2 (2018), 279.
[24]
M Titford. 2005. The long history of hematoxylin. Biotechnic & Histochemistry 80, 2 (Jan. 2005), 73–78.
[25]
Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. 2004. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13, 4 (2004), 600–612.
[26]
Roberta H Yuhas, Alexander FH Goetz, and Joe W Boardman. 1992. Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm. In JPL, Summaries of the Third Annual JPL Airborne Geoscience Workshop. Volume 1: AVIRIS Workshop.
[27]
Guanghao Zhang, Bin Ning, Hui Hui, Tengfei Yu, Xin Yang, Hongxia Zhang, Jie Tian, and Wen He. 2022. Image-to-images translation for multiple virtual histological staining of unlabeled human carotid atherosclerotic tissue. Molecular Imaging and Biology (2022), 1–11.
[28]
Lin Zhang, Lei Zhang, Xuanqin Mou, and David Zhang. 2011. FSIM: A feature similarity index for image quality assessment. IEEE transactions on Image Processing 20, 8 (2011), 2378–2386.
[29]
Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. 2018. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In CVPR.
[30]
Y. J. Zhang, K. de Haan, Y. Rivenson, J. X. Li, A. Delis, and A. Ozcan. 2020. Digital synthesis of histological stains using micro-structured and multiplexed virtual staining of label-free tissue. Light: Science & Applications 9, 1 (2020).
[31]
S Kevin Zhou, Daniel Rueckert, and Gabor Fichtinger. 2019. Handbook of medical image computing and computer assisted intervention. Academic Press.
[32]
Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. 2017. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. In 2017 IEEE International Conference on Computer Vision (ICCV).
