1 Introduction

Diffuse optical tomography (DOT) has witnessed increased interest as an imaging modality and has recently demonstrated its clinical potential in probing tumors [1, 2] as an affordable, non-ionising alternative to X-ray mammography, the primary screening technique for breast cancer detection. Using near-infrared light in the spectral range of 600 to 950 nm, DOT measures the distribution of the tissue’s optical absorption and scattering parameters, which can be used to quantitatively assess tissue malignancy. The main challenge is to recover these parameters accurately given the ill-posedness of the DOT image reconstruction problem and the absence of an exact analytic inverse.

Most image reconstruction methods are analytic or iterative approaches that often suffer from high computational complexity and are complicated by factors such as imaging geometry, source calibrations and sensor non-idealities [4]. While many deep learning based image reconstruction approaches were proposed recently [5] and showed increased reconstruction speed, resolution enhancement and artifact removal for a variety of imaging modalities, most methods focus on CT and MR image reconstruction [3, 5, 6, 7] and only a few tackle ultrasound, photo-acoustic, multiple scattering, and DOT inverse problems [8, 9, 10, 11].

Most DOT inverse problems consider a circular scanner with 16 or more point sources uniformly distributed along the field of view boundary to maximize the number of measurements, thereby improving spatial resolution, especially in strongly scattering media. Most recently, a multi-layer perceptron (MLP) network was used to solve the DOT image reconstruction problem using a high source count [11]. However, increasing the number of sources and detectors adds complexity to the DOT scanner hardware and increases manufacturing cost and computational resources.

One common limitation of existing reconstruction methods is that they perform poorly on data with a very low number of point sources (limited projection data), limited-angle acquisition (e.g., acquisition from one view), or both [1].

Sun et al. [12, 13] address the multiple scattering problem of microwaves in biological samples. They study the effect of decreasing the number of sources, to a limited extent (to a minimum of 20 sources), on deep learning based reconstruction methods for weak and strong scattering scenarios. While their proposed reconstruction model leverages the rich data representation collected from 20 up to 40 point sources, it relies on a computationally expensive analytic reconstruction step to provide a first estimate of the reconstructed image, prohibiting real time inference. Furthermore, their deep model is not optimized in an end-to-end manner.

Limited-angle and limited-source DOT image reconstruction in a strong scattering medium is a challenging task that has been considered in an end-to-end fashion for a functional hand-held probe in a clinical trial by Ben Yedder et al. [14]. Yet, their results still suffer from noisy reconstruction and deviation of the reconstructed lesion from the ground truth location.

To address the aforementioned limitations, in this paper, we propose a deep learning DOT reconstruction method based on a novel loss function and transfer learning to solve the limited-angle and limited sources DOT image reconstruction problem in a strong scattering medium.

By adaptively focusing on important features and filtering irrelevant and noisy ones using the Fuzzy Jaccard loss, our network is able to reduce false positive reconstructed pixels and as a result reconstruct more accurate images.

Training machine learning based methods requires a high number of training samples, a challenging requirement in a medical setting, especially with relatively new imaging devices like DOT probes. Synthetic data simulators can provide an alternative source of training data. However, creating a realistic synthetic dataset is a challenging task as it requires careful modelling of the complex interplay of factors influencing the real world acquisition environment. A potential remedy is to attempt to bridge the gap between real world acquisition and synthetic data simulation via transfer learning. To the best of our knowledge, this is the first work to employ a Jaccard based loss and transfer learning for the DOT reconstruction problem.

2 Methodology

2.1 Background

Given a set of raw acquired DOT measurements \(y \in {\mathbb {R}}^{S\times D}\) from S sources with D sensors, our objective is to reconstruct an image \({x} \in {\mathbb {R}}^{W \times H}\), which represents the tissue’s optical coefficients. This problem is commonly formulated as finding a reconstructed image \({\hat{x}}^*\) that minimizes the reconstruction error between the sensor-domain sampled data y and the forward projection \({\mathcal {F}}(\cdot )\) from a possible reconstructed image \({\hat{x}}\):

$$\begin{aligned} {\hat{x}}^* = \mathop {\mathrm{argmin}}\limits _{{\hat{x}}} \Vert {\mathcal {F}}({\hat{x}}) -y\Vert + \lambda {\mathcal {R}}({\hat{x}}) \end{aligned}$$
(1)

where \({\mathcal {F}}(\cdot )\) is a known predefined forward projection that converts \({\hat{x}}\) to the sensor domain, \({\mathcal {R}}(\cdot )\) is a regularization term encoding the prior information about the data, and \(\lambda \) is a hyper-parameter that controls the contribution of the regularization term. This objective function is traditionally minimized in an iterative manner until convergence. Alternatively we can learn the task of reconstructing the image from sensor-domain data by way of a deep neural network with a significant one-time, off-line training cost that is offset by a fast inference time.
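The iterative minimization of (1) can be illustrated with a minimal sketch. It assumes, purely for illustration, a linear forward operator given by a small matrix `A`, a squared Euclidean data term and a Tikhonov regularizer \({\mathcal {R}}({\hat{x}}) = \Vert {\hat{x}}\Vert ^2\); none of these choices is specified in the text, and the names `forward`, `objective` and `reconstruct` are hypothetical. The actual DOT forward model is nonlinear and far larger.

```python
def forward(A, x):
    """Hypothetical linear forward projection F(x) = A x (an assumption)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def objective(A, x, y, lam):
    """Squared data misfit plus Tikhonov penalty, a common instance of Eq. (1)."""
    residual = [f - yi for f, yi in zip(forward(A, x), y)]
    return sum(r * r for r in residual) + lam * sum(v * v for v in x)


def reconstruct(A, y, lam=1e-3, step=0.05, iters=2000):
    """Plain gradient descent on the objective, starting from a zero image."""
    n = len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        residual = [f - yi for f, yi in zip(forward(A, x), y)]
        # Gradient of the objective: 2 A^T (A x - y) + 2 lam x
        grad = [
            2.0 * sum(A[i][j] * residual[i] for i in range(len(A))) + 2.0 * lam * x[j]
            for j in range(n)
        ]
        x = [xj - step * g for xj, g in zip(x, grad)]
    return x


# Toy example: three measurements of a two-pixel "image".
A = [[1.0, 0.5], [0.2, 1.0], [0.3, 0.3]]
x_true = [1.0, 2.0]
y = forward(A, x_true)
x_hat = reconstruct(A, y, lam=0.0)
```

Every iteration requires a full forward projection, which is the per-image cost that a trained network replaces with a single forward pass at inference time.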

2.2 Deep Learning Reconstruction

Given pairs of measurement vectors y and their corresponding ground truth image x, our goal is to optimize the parameters \(\theta \) of a fully convolutional neural network in an end-to-end manner to learn the mapping between the measurement vector y and its reconstructed tomographic image x, which recovers the optical parameters of underlying imaged tissue. Therefore, we seek the inverse function \({\mathcal {F}}^{-1}(\cdot )\) that solves:

$$\begin{aligned} \theta {^*} = \mathop {\mathrm{argmin}}\limits _{\theta } {\mathcal {L}}\left( {\mathcal {F}}^{-1}(y,\theta ), x \right) + \lambda {\mathcal {R}}({\mathcal {F}}^{-1}(y,\theta )) \end{aligned}$$
(2)

where \({\mathcal {L}}\) is the loss function of the network that, broadly, penalizes the dissimilarity between the estimated reconstruction and the ground truth. We use an L2 regularization term (\({\mathcal {R}}\)).

Fig. 1. The overall architecture of the proposed model. (lower left) The probe (in black) is positioned to image a phantom (white) with an embedded synthetic lesion (red arrow). The transfer learning (multilayer perceptron) network maps phantom measurements \(y^p\) to the domain of in silico training measurements \(y^s\). The mapped measurements are passed through the reconstruction network to produce the reconstructed image \({{\hat{x}}^*}\) (rightmost image) (Color figure online)

Deep Network. The proposed architecture, which extends Yedder et al.’s [14] FCNN architecture with transfer learning and novel loss components, is a decoder-like network that consists of a fully connected layer followed by a set of residual layers. The fully connected layer maps the measurement vector to a two-dimensional array and provides a coarse image estimate, while the subsequent residual blocks refine the image estimate by passing it through a set of nonlinear transformations to produce the final reconstructed image. The architecture of our proposed model is shown in Fig. 1, where each residual block uses convolutions with batch normalization and ReLU.

Novel Loss Function. To address DOT image reconstruction from a limited information representation (one view with few sources), we propose a novel loss function, \({\mathcal {L}}\), that dynamically combines two loss terms:

$$\begin{aligned} {\mathcal {L}} = {\mathcal {L}}_{\mathrm {MSE}} +\beta (epoch) {\mathcal {L}}_{\mathrm {FJ}} \end{aligned}$$
(3)

where \({\mathcal {L}}_{\mathrm {MSE}}\) is the mean squared error (MSE) loss, which focuses on pixel-wise similarity, and \({\mathcal {L}}_{\mathrm {FJ}}\) is a fuzzy Jaccard term, based on a similarity coefficient, designed to promote lesion location and appearance similarity while penalizing artifacts. \(\beta \) is a hyper-parameter that balances the two terms and varies with the training epochs to capture the dynamics of this interaction. In particular, \(\beta (epoch+{\varDelta })=\beta _0+\gamma \beta (epoch)\) with \(\gamma >0\), which allows the network to first learn to reconstruct, via \({\mathcal {L}}_{\mathrm {MSE}}\), an image estimate that is relatively close to the pixel-wise distribution of the ground truth image and then, via \({\mathcal {L}}_{\mathrm {FJ}}\), to gradually refine that candidate image. In DOT image reconstruction of breast tissue with zero or more isolated lesions, the majority of the pixels are background; \({\mathcal {L}}_{\mathrm {FJ}}\) is chosen to address this imbalance. Further, \({\mathcal {L}}_{\mathrm {FJ}}\) does not require binary values and accounts for the similarity between the foreground as well as the background pixel values. Finally, a log transform of \({\mathcal {L}}_{\mathrm {FJ}}\) ensures a steep convex gradient,

$$\begin{aligned} {{\mathcal {L}}_{\mathrm {FJ}}} = - {\log { \left( \frac{\sum _{i=1}^{n}\min (a_{i}, a'_{i})}{\sum _{i=1}^{n} \max (a_{i}, a'_{i})} +\epsilon \right) }} \end{aligned}$$
(4)

where the \(\min (\cdot , \cdot )\) and \(\max (\cdot , \cdot )\) functions compute a probabilistic intersection and union, respectively, while setting \(\epsilon \,=\,10^{-5}\) avoids \(\log {(0)}\) domain errors.
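Equations (3) and (4) can be sketched in a few lines of plain Python. The sketch assumes flattened images with non-negative pixel values (for example, normalized to [0, 1]) and, as an extra assumption not covered by (4), treats two all-zero images as a perfect match to avoid dividing by zero. The function names are illustrative and are not taken from the authors' implementation.

```python
import math


def fuzzy_jaccard_loss(pred, target, eps=1e-5):
    """Negative log of the fuzzy Jaccard index between two flattened images (Eq. 4)."""
    intersection = sum(min(a, b) for a, b in zip(pred, target))
    union = sum(max(a, b) for a, b in zip(pred, target))
    # Assumption: two empty images count as identical (ratio = 1).
    ratio = intersection / union if union > 0 else 1.0
    return -math.log(ratio + eps)


def mse_loss(pred, target):
    """Mean squared error over all pixels."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


def combined_loss(pred, target, beta):
    """Eq. (3): MSE plus the beta-weighted fuzzy Jaccard term."""
    return mse_loss(pred, target) + beta * fuzzy_jaccard_loss(pred, target)


identical = fuzzy_jaccard_loss([0.0, 0.8, 0.2], [0.0, 0.8, 0.2])
disjoint = fuzzy_jaccard_loss([1.0, 0.0], [0.0, 1.0])
partial = fuzzy_jaccard_loss([0.5, 0.5], [1.0, 0.0])
```

A reconstruction with no overlap with the ground truth yields a loss of about \(-\log (10^{-5}) \approx 11.5\), while a perfect reconstruction yields a value close to zero, so spurious foreground pixels that enlarge the union without enlarging the intersection are penalized.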

Transfer Learning Network. As training on real world data is limited by the availability of samples, we resort to generating artificial training data via a simulator. A transfer learning network, implemented as a multilayer perceptron, tackles the domain shift between the real data measurement \(y^p\), as collected from the probe and used during inference, and the in silico data measurement \(y^s\) used during training (Fig. 1-upper left). By minimizing a loss \({\mathcal {L}}_{TL}\), the transfer learning network learns to translate the real world data distribution onto the in silico data distribution while avoiding overfitting on the in silico model. Finally, by retraining or fine-tuning only this transfer learning network, our proposed approach can be generalized to new DOT sensors and/or source configurations.

Given the i-th phantom \(x_i^p\), we simulate its tissue equivalent \(x_i^s\) and derive the corresponding sensor measurements \(y_i^s\), while we collect \(y^p_i\) using a physical probe. By minimizing \({\mathcal {L}}_{TL}\) over \(N_p\) phantom experiments, the transfer learning module learns the mapping \(\phi (y^p_i) \approx y^s_i \), so that the mapped measurement lies in the same domain as \(y^s\),

$$\begin{aligned} \theta ^*=\mathop {\mathrm{argmin}}\limits _{\theta } {\mathcal {L}}_{\mathrm {TL}}(\theta ) ~~~\text {where}~~~{\mathcal {L}}_{\mathrm {TL}}(\theta ) =\sum _{i=1}^{N_p} ||\phi (y_i^p; \theta ) - y_i^s|| \end{aligned}$$
(5)

where a final test reconstructed image is computed as \({\hat{x}}_i^* = {\mathcal {F}}^{-1}(\phi (y_i^p))\).
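The objective in (5) can be illustrated with a deliberately simplified stand-in for \(\phi \). In the sketch below the multilayer perceptron is replaced by a single gain and offset applied to every measurement channel, fitted in closed form by least squares rather than by gradient training, and the unspecified norm in (5) is taken to be Euclidean. These are all assumptions made for illustration, and the toy measurement values are invented.

```python
import math


def fit_affine(y_p, y_s):
    """Least-squares gain w and offset b such that w * y_p + b approximates y_s."""
    xs = [v for row in y_p for v in row]
    ts = [v for row in y_s for v in row]
    n = len(xs)
    mean_x, mean_t = sum(xs) / n, sum(ts) / n
    var = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    w = cov / var
    return w, mean_t - w * mean_x


def phi(y, w, b):
    """Hypothetical stand-in for the transfer learning network."""
    return [w * v + b for v in y]


def tl_loss(y_p, y_s, w, b):
    """Eq. (5): sum over phantom experiments of ||phi(y_p) - y_s||."""
    total = 0.0
    for yp, ys in zip(y_p, y_s):
        total += math.sqrt(sum((m - s) ** 2 for m, s in zip(phi(yp, w, b), ys)))
    return total


# Toy data: the "probe" readings are a scaled and shifted copy of the simulation.
y_sim = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
y_probe = [[3.0, 5.0, 7.0], [5.0, 9.0, 13.0]]
w, b = fit_affine(y_probe, y_sim)
loss_before = tl_loss(y_probe, y_sim, 1.0, 0.0)  # identity mapping
loss_after = tl_loss(y_probe, y_sim, w, b)
```

Only the parameters of `phi` change here; the reconstruction network is untouched, which mirrors the idea that adapting to a new probe only requires refitting the mapping.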

3 Experiments and Results

We compare our proposed approach to the state of the art FCNN architecture for limited angle data [14] and the aforementioned MLP approach [11], as well as the analytic reconstruction approach described by Shokoufi et al. [15]. In addition, we evaluate the individual contributions of the terms of our loss function and the transfer learning.

3.1 Dataset

To train our network \({\mathcal {F}}^{-1}(\cdot )\) we use in silico training data pairs \((x^s, y^s)\). The dataset includes images \(x^s\) of optical tissue properties, discretized into finite-element nodes, and their corresponding forward projection measurements \(y^s\), generated with the Toast++ software suite [17] using realistic optical parameter values for human breast tissue and lesions [16]. Simulated lesions have varying sizes, locations, and optical coefficients. The forward model mimics the source and detector geometry of the functional hand-held probe [1]. It comprises 2 LED light sources that illuminate the tissue symmetrically and surround 128 detectors, with both LEDs and all detectors co-linear. The output of the forward model is a 1 \(\times \) 256 vector \(y^s\). A total of 21,590 sample data pairs are used.

The test set is based on a tissue-equivalent solution in which an intralipid solution mimics background breast tissue because of its similar optical properties. A tube with a 4 mm cross-sectional diameter was filled with an Indian-ink tumor-like liquid and placed at different locations inside the solution container/solid phantom to mimic cancerous lesions. All phantom measurements \(y^p\) are collected with the DOB-probe.

3.2 Implementation

The model was implemented in the Keras framework and trained for a total of 1,000 epochs on an Nvidia Titan X GPU using the Adam optimizer. Hyper-parameters were selected by optimizing the model’s performance on the validation set: batch size 64; learning rate 0.001; the AMSGrad option of the optimizer set to true; and \({\varDelta }=10\), \(\beta _0=0.2\), \(\gamma =0.002\), which define the update equation of the hyper-parameter \(\beta \) in (3). We use an 80/10/10 training/validation/test split of the in silico data.
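The update rule for \(\beta \) can be traced numerically with the reported values. The sketch below applies the recurrence \(\beta (epoch+{\varDelta })=\beta _0+\gamma \beta (epoch)\) literally, once every \({\varDelta }\) epochs. The starting value is not stated in the text, so `beta_init = 0.0` (training begins with the MSE term only) is an assumption, as is the function name.

```python
def beta_schedule(epoch, beta0=0.2, gamma=0.002, delta=10, beta_init=0.0):
    """Weight of the fuzzy Jaccard term at a given epoch.

    Applies beta <- beta0 + gamma * beta once per completed block of `delta`
    epochs. `beta_init` is an assumed starting value, not given in the text.
    """
    beta = beta_init
    for _ in range(epoch // delta):
        beta = beta0 + gamma * beta
    return beta


# Weight at a few selected epochs of the 1,000-epoch run.
trace = {epoch: beta_schedule(epoch) for epoch in (0, 9, 10, 20, 1000)}
```

For \(0< \gamma < 1\) this affine recurrence converges to \(\beta _0/(1-\gamma )\), which is approximately 0.2004 for the values above, so under this literal reading the weight settles within the first few updates.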

3.3 Qualitative Results

Our model is trained on the in silico data and tested on the phantom dataset. In Fig. 2, we visually compare our proposed reconstruction method to the competing methods’ results on sample phantom cases. As mentioned earlier, the transfer learning module maps the real world distribution onto the learned in silico distribution. Without such mapping, unsurprisingly, we notice artifacts in the reconstructed image; note the extensive scattering of false positives with different scales and locations (Fig. 2 - FCNN (MSE)). Adopting transfer learning clearly reduces these artifacts (Fig. 2 - FCNN (MSE+TL)). Further, observe how incorporating both the new loss term \({\mathcal {L}}_{FJ}\) and transfer learning module significantly reduces the artifacts and improves lesion localization, which otherwise could compromise diagnosis (Fig. 2 - FCNN (MSE+TL+FJ)).

Fig. 2. Qualitative reconstruction performance of our model compared to state of the art techniques on phantom samples with known lesion ground truth locations. The parabolic shape of the reconstruction produced by the analytical approach is due to the algorithm used.

Table 1. Quantitative results on 32 phantom experiments.

While MLP showed good performance in the complete information case, namely a circular shape scanner with 16+ uniformly distributed point sources [11], it underperforms on the limited angle experiments (Fig. 2 - MLP). We hypothesize that this difference in performance is due to the convolution operators’ ability to extract comprehensive contextual information and synthesize more complex robust features.

3.4 Quantitative Results

We measure the reconstruction quality via: (i) Lesion localization error, i.e. the distance between the centre of the lesions in the ground truth image versus the reconstructed image; (ii) peak signal to noise ratio (PSNR); (iii) structural similarity index (SSIM); and (iv) the Fuzzy Jaccard [18]. All reconstructed images were first normalized prior to calculating the performance metrics. Table 1 presents the results on the phantom dataset.
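Two of these metrics are simple enough to sketch directly. The text does not say how the lesion centre is computed, so the sketch assumes an intensity-weighted centroid; it also assumes images normalized to a peak value of 1 for PSNR and reports distances in pixels unless a `pixel_size` is supplied. The helper names are illustrative.

```python
import math


def centroid(img):
    """Intensity-weighted centre (row, col) of a 2-D image given as nested lists."""
    total = sum(sum(row) for row in img)
    if total == 0:
        raise ValueError("image has no foreground intensity")
    cy = sum(r * sum(row) for r, row in enumerate(img)) / total
    cx = sum(c * v for row in img for c, v in enumerate(row)) / total
    return cy, cx


def localization_error(gt, rec, pixel_size=1.0):
    """Distance between the lesion centres of ground truth and reconstruction."""
    (y0, x0), (y1, x1) = centroid(gt), centroid(rec)
    return pixel_size * math.hypot(y1 - y0, x1 - x0)


def psnr(gt, rec, peak=1.0):
    """Peak signal to noise ratio in dB for images with maximum value `peak`."""
    n = sum(len(row) for row in gt)
    mse = sum((a - b) ** 2 for rg, rr in zip(gt, rec) for a, b in zip(rg, rr)) / n
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


# Toy 4 x 5 images with a single-pixel lesion at different positions.
gt = [[0.0] * 5 for _ in range(4)]
rec = [[0.0] * 5 for _ in range(4)]
gt[0][0] = 1.0
rec[3][4] = 1.0
err = localization_error(gt, rec)
quality = psnr(gt, rec)
```

In this toy case the lesion is displaced by 3 rows and 4 columns, giving a localization error of 5 pixels, and the two mismatched pixels out of 20 give an MSE of 0.1 and a PSNR of 10 dB.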

Using the transfer learning module \(\phi (\cdot )\), we observe \({\sim }\)10% improvement in Fuzzy Jaccard and \({\sim }\)16% in SSIM compared to the state of the art FCNN with MSE only. Adding the new \({\mathcal {L}}_{FJ}\) loss term boosts the improvement in these two metrics further to \({\sim }\)34% and \({\sim }\)33%, respectively. The lesion localization error is also considerably reduced when using transfer learning and \({\mathcal {L}}_{FJ}\).

4 Conclusion

We proposed novel extensions to deep learning based diffuse optical tomography image reconstruction. We have shown empirically that our model, trained with the novel hybrid loss function, attains superior quantitative results on multiple evaluation metrics and, qualitatively, improves the reconstructed images, showing fewer artifacts that could compromise clinical diagnosis. The transfer learning module renders an in silico trained network applicable to real world data. More importantly, our approach is decoupled from changes in real world measurements and can be generalized to new source configurations. The next phase of this research is to further improve lesion localization and to validate our approach on real patient data to assess its diagnostic accuracy.