
1 Introduction

Retinal fundus imaging is the only type of medical imaging that directly observes the blood vessels in clear, high-resolution visualizations. It is simple, noninvasive, relatively cheap, and requires neither radiation nor pharmaceuticals. Fundus images are used to diagnose various retinal diseases, including diabetic retinopathy, age-related macular degeneration, epiretinal membrane, and glaucoma, and can also support early diagnosis and prevention of chronic diseases such as diabetes and hypertension.

Chronic diseases can damage vessels and also cause new vessels to form [15]. For better diagnoses, clinicians require highly accurate detection and measurement of vessels, including fine filamentary vessels with thin, complex shapes and low contrast. This problem has thus been extensively researched [1].

Fig. 1. Qualitative comparison between (a) manual expert annotations of DRIVE [14] and (b) the results of the proposed method with minimal manual editing of minor errors. Our goal is to construct a foundational dataset for achieving superhuman accuracy in deep learning based retinal vessel segmentation.

Public retinal image datasets, including DRIVE [14], STARE [9], CHASE_DB1 [6], and HRF [3], have been vital to this research. All of these datasets include vessel region masks obtained by manual expert annotation. While these masks are assumed to be the ground truth, it is actually very difficult to measure their accuracy. Comparisons with a second expert reveal the limitations of human annotation: the accuracy of recent automatic vessel segmentation methods [11, 13] is higher than that of the second expert's annotations. This is because inter-observer differences inevitably occur in ambiguous regions. Filamentary vessels are often barely visible in the retinal fundus image, even with zooming and contrast enhancement. If we can provide consistent and detailed annotations in these regions, ground truth expert annotations can be improved, which in turn can improve machine learning based methods.

In this paper, we present a new framework for retinal vessel extraction from fundus images through registration and segmentation of corresponding fluorescein angiography (FA) images. In FA, a fluorescent dye is injected into the bloodstream to highlight the vessels by increasing contrast. However, the highlights are temporally dispersed among the multiple FA frames as the dye flows through the vessels from arterioles to venules. Thus, we must first align the FA frames and aggregate the per-frame segmentations to construct a detailed vessel mask. Here, alignment is done by keypoint based registration, and vessel segmentation is done using a convolutional neural network (CNN) [11]. The constructed FA vessel mask is then registered to the fundus image based on an initial vessel segmentation of the fundus image, again obtained using a CNN. Finally, postprocessing refines the fundus image vessel mask based on the FA vessel mask. We believe the proposed method is the first to successfully elevate the level of detail and accuracy in automatic segmentation of filamentary vessels in fundus images by incorporating FA. Moreover, it can be used to generate more detailed and consistent ground truth vessel masks, as shown in Fig. 1.

Fig. 2. The registration framework for a pair of FA frames.

2 Methods

The proposed method comprises three subprocesses: (1) registration of FA frames and extraction of their vessels, (2) multi-modal registration of the aggregated FA vessels to the fundus image, and (3) postprocessing for fine refinement of the vessel mask. We describe each subprocess in the following subsections.

2.1 Registration and Vessel Extraction of FA Frames

Here, the objective is to extract a mask of all vessels, including the filamentary ones. We thus extract vessels from all FA frames and aggregate them in a combined registered frame. In contrast to methods based on registration of extracted vessels [12], vessels are extracted after registration, since the highlighted vessels change considerably due to blood flow. Moreover, registration actually helps the vessel extraction, since false positives can be avoided through the aggregation of vessel regions. We thus propose a three step hierarchical process, combining coarse rigid registration in the pixel domain and fine non-rigid registration in the vessel mask domain, to ensure robustness against appearance changes of the frames and their vessels. A visual summary of the framework is given in Fig. 2. We note that this process is performed iteratively for all adjacent frame pairs, with the initial frame as the reference frame.

In the first step, feature point matching is performed in the pixel domain. We use the SIFT descriptor on local maxima in the difference of Gaussians [10]. Keypoint matching is performed using RANSAC (random sample consensus) with the perspective transform matrix [8]. The source image is then rigidly registered to the target image using the transform matrix determined by the keypoint matches.
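For concreteness, the following Python sketch shows how this step could be realized with OpenCV, which we use for feature point matching. The ratio test and the RANSAC reprojection threshold are illustrative choices, not values reported in this paper.

```python
import cv2
import numpy as np

def coarse_register(source, target):
    """Coarse registration of an 8-bit grayscale FA frame pair
    via SIFT keypoints and a RANSAC-estimated perspective matrix."""
    sift = cv2.SIFT_create()
    kp_s, des_s = sift.detectAndCompute(source, None)
    kp_t, des_t = sift.detectAndCompute(target, None)

    # Match descriptors; keep unambiguous matches (Lowe's ratio test,
    # an assumption here rather than a detail stated in the paper).
    matches = cv2.BFMatcher().knnMatch(des_s, des_t, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    src_pts = np.float32([kp_s[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp_t[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC rejects outlier matches while estimating the transform.
    H, _ = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)

    # Warp the source frame onto the target frame.
    h, w = target.shape[:2]
    return cv2.warpPerspective(source, H, (w, h))
```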

We next refine the rigid registration by non-rigid registration of vessel probability maps. Here, we leverage recent developments in deep learning by using a recently proposed CNN, the retinal SSANet [11]. Since no training data is available for supervised learning on FA frames, we utilize the public datasets DRIVE [14] and HRF [3], which comprise fundus images and expert annotated ground truth vessel maps. To account for the differences in image characteristics, we convert the fundus images to greyscale and then invert their intensities before training. We also resize the images to match the FA resolution.
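A minimal sketch of this preprocessing, assuming OpenCV and BGR input images; the target resolution here follows the normalized resolution of our image set (Sect. 3.1) and stands in for the actual FA frame size.

```python
import cv2

def fundus_to_fa_like(fundus_bgr, fa_size=(1536, 1024)):
    """Make a fundus training image resemble an FA frame."""
    gray = cv2.cvtColor(fundus_bgr, cv2.COLOR_BGR2GRAY)  # drop color
    inverted = 255 - gray  # FA vessels are bright; fundus vessels are dark
    return cv2.resize(inverted, fa_size)  # match the FA resolution
```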

Given the vessel probability maps, we then perform pixel-wise non-rigid registration. We assume a B-spline transform model, with similarity measured by normalized cross-correlation and optimization by the gradient based L-BFGS-B algorithm [4].
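This step could be sketched with SimpleITK, which we use for non-rigid registration, as follows. The control point mesh size and optimizer settings are illustrative, as the exact values are not specified here.

```python
import SimpleITK as sitk

def nonrigid_register(fixed_map, moving_map, mesh_size=(8, 8)):
    """Non-rigidly align two 2D vessel probability maps."""
    fixed = sitk.Cast(fixed_map, sitk.sitkFloat32)
    moving = sitk.Cast(moving_map, sitk.sitkFloat32)

    # B-spline transform over a coarse control point grid.
    tx = sitk.BSplineTransformInitializer(fixed, list(mesh_size))

    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsCorrelation()  # normalized cross-correlation similarity
    reg.SetOptimizerAsLBFGSB(gradientConvergenceTolerance=1e-5,
                             numberOfIterations=100)
    reg.SetInitialTransform(tx, inPlace=True)
    reg.SetInterpolator(sitk.sitkLinear)

    out_tx = reg.Execute(fixed, moving)
    return sitk.Resample(moving, fixed, out_tx, sitk.sitkLinear, 0.0)
```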

2.2 Registration of FA and Fundus Image

To register the aggregated probabilistic vessel map of the FA to the fundus image, we generate a similar map for the fundus image. Again, we train a retinal SSANet [11], this time without any preprocessing, on the images of DRIVE and HRF. The vessel maps are generated by inference of this network. Based on the vessel maps, we again perform coarse rigid registration, this time using chamfer matching, followed by fine non-rigid registration. For chamfer matching, we first assign the binarized fundus image and FA vessel masks as the source and target shapes, respectively. We then find the global displacement vector and rotation angle (within a \(\pm 5^\circ\) range) that minimize the sum of distances between each point on the source shape and the target shape, by brute force search on the distance transform (DT) of the target shape. We use the inverse of the obtained transform to align the FA map to the fundus image. For non-rigid registration, we use the same specifics as in Subsect. 2.1. A visual summary of the framework is given in Fig. 3.
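The chamfer matching step can be sketched as follows, assuming binary vessel masks given as boolean NumPy arrays. The translation search range and step size are illustrative; a practical implementation would use a coarse-to-fine search.

```python
import numpy as np
from scipy import ndimage

def chamfer_match(source_mask, target_mask, angle_range=5.0, step=8):
    """Brute force search for the rotation (within +/- angle_range degrees)
    and displacement minimizing the distance between source and target."""
    # DT of the target: distance from each pixel to the nearest vessel pixel.
    dt = ndimage.distance_transform_edt(~target_mask)
    best_score, best_params = np.inf, (0.0, (0, 0))
    for angle in np.linspace(-angle_range, angle_range, 11):
        rotated = ndimage.rotate(source_mask.astype(np.uint8), angle,
                                 reshape=False, order=0).astype(bool)
        ys, xs = np.nonzero(rotated)
        for dy in range(-64, 65, step):       # illustrative search range
            for dx in range(-64, 65, step):
                yy, xx = ys + dy, xs + dx
                inside = ((yy >= 0) & (yy < dt.shape[0]) &
                          (xx >= 0) & (xx < dt.shape[1]))
                if not inside.any():
                    continue
                # Mean DT value at the displaced source points
                # (normalizes for points shifted outside the image).
                score = dt[yy[inside], xx[inside]].mean()
                if score < best_score:
                    best_score, best_params = score, (angle, (dy, dx))
    return best_params  # rotation and displacement; invert to align FA map
```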

Fig. 3. The registration framework for the aggregated FA vessel mask and fundus image.

2.3 Postprocessing

Here, we aim to generate an accurate binary vessel mask of the fundus image from the aligned probabilistic vessel map of the FA. The postprocessing comprises binarization and refinement. A visual summary is given in Fig. 4.

Fig. 4. The postprocessing framework comprising binarization and refinement.

To avoid the discontinuities that simple thresholding may produce at filamentary vessels, we apply hysteresis thresholding for binarization. Pixels above a higher threshold \(\tau _{h}\) serve as seeds for region growing over pixels with probability above a lower threshold \(\tau _{l}\). Here, we empirically set \(\tau _{h}=0.75\) and \(\tau _{l}=0.1\).
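Hysteresis thresholding is available off the shelf, e.g. in scikit-image; a minimal sketch with the thresholds above, where the random map is merely a stand-in for the aligned FA vessel probability map:

```python
import numpy as np
from skimage.filters import apply_hysteresis_threshold

tau_h, tau_l = 0.75, 0.1
prob_map = np.random.rand(1024, 1536)  # stand-in for the aligned FA map

# Pixels above tau_h seed region growing over pixels above tau_l, so faint
# filamentary segments stay connected to confidently detected vessels.
binary_mask = apply_hysteresis_threshold(prob_map, low=tau_l, high=tau_h)
```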

Furthermore, we refine the vessel mask by aligning the vessel boundaries to the image gradients in the fundus image. Specifically, we utilize the Frangi filter [5] in an inverted manner, with sigma values 1 to 3, to detect the valleys between vessels and at the outer boundaries, and then erode these regions from the mask.
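Continuing the sketch above, the refinement could look as follows with scikit-image's Frangi filter; the valley response threshold (0.05) is a hypothetical value chosen for illustration.

```python
from skimage.filters import frangi

# Stand-in for the color fundus image aligned with binary_mask.
fundus_rgb = np.random.randint(0, 256, (1024, 1536, 3), dtype=np.uint8)
green = fundus_rgb[..., 1].astype(float) / 255.0  # strongest vessel contrast

# Vessels are dark in fundus images, so running Frangi "inverted" (bright
# ridges) responds to the thin bright valleys between and around vessels.
valley = frangi(green, sigmas=(1, 2, 3), black_ridges=False)

# Erode the mask where a strong valley response indicates background.
refined_mask = binary_mask & ~(valley > 0.05)
```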

3 Experimental Results

3.1 Dataset and Experimental Environment

The dataset comprises 200 cases of FA and fundus image pairs from 153 patients, acquired using Canon CF60Uvi, Kowa VX-10, and Kowa VX-10a cameras. The number of FA frames per case averaged 7.14, ranging from 2 to 24. Image resolutions, originally varying from \(1604 \times 1216\) to \(2144 \times 1424\), were all normalized to \(1536 \times 1024\).

Computation times for FA registration per frame pair, FA-fundus registration, and postprocessing averaged 57, 28, and 3 s, respectively, running on a 2.2 GHz Intel Xeon CPU and an NVIDIA Titan V GPU. Most of the computation was due to feature point matching (over 40 s of the 57 s), while non-rigid registration took 15–16 s on average. OpenCV [2] was used for feature point matching, and SimpleITK [16] was used for non-rigid registration.

3.2 Qualitative Evaluation

Figure 5 shows qualitative results for six sample cases. Here, we provide the vessel segmentation results generated by a CNN, namely the SSANet [11] trained on the HRF [3] dataset, as a reference point for comparison. Although we are aware that this comparison may be unfair, we were unable to establish an alternative comparative reference. We can see that many filamentary vessels are indeed visible in the fundus images, but only under close visual inspection. Figure 6 shows a particular example of this.

Fig. 5. Qualitative results. Six sample cases are shown in a \(3 \times 2\) formation, with (top) the original image, (middle) the results of the proposed method, and (bottom) vessel segmentation results of the SSANet [11] trained on the public HRF [3] dataset. The left and right columns show the images at full and zoomed resolution, respectively.

Fig. 6. Example illustrating the visibility of filamentary vessels in the fundus images.

3.3 Quantitative Evaluation

Ground truth (GT) segmentation masks are required for quantitative evaluation, but we cannot rely on expert annotation for filamentary vessels. We thus generate GT masks by manually editing the results of the proposed method. Editing mostly comprised removal of false positives near the optic disk by direct annotation, and took 53 s per image on average.

In Table 1, we compare the results of the proposed method with those of the aforementioned SSANet [11] trained on different public datasets: DRIVE [14], STARE [9], CHASE_DB1 [6], and HRF [3]. Each network was trained at the resolution of the images in its dataset, and the fundus images of our FA-fundus image set were resized accordingly before being given as input. Measures were computed based on the aforementioned GT. We present these results as a reference for understanding the performance of the proposed method.

Table 1. Quantitative results of the proposed method compared to results obtained by a CNN (SSANet [11]) trained on public fundus image datasets with expert annotated ground truth. Sensitivity (Se), specificity (Sp), accuracy (Acc), and area under the receiver operating characteristic curve (AUC-ROC) are presented.

4 Discussion

We presented a new method to generate fine-scale vessel segmentation masks for fundus images by registration with FA. We have shown that the obtained results contain a considerable number of filamentary vessels that are virtually indiscernible to the naked eye in the color fundus image alone. We believe these results conversely show the limitations of expert annotations as ground truth, which is the standard for all previously released public datasets. Nonetheless, since the results of the proposed method may still contain errors, expert annotation remains necessary to designate data as ground truth.

For future work, we plan to establish better means of quantitative evaluation for the proposed method. We are aware of the bias of the GT toward the proposed method in our current quantitative evaluations. Unfortunately, methods such as that of Galdran et al. [7], which learn to estimate accuracy from existing GT, are inapplicable, since they rely on existing expert annotations.

We also plan to construct a new dataset of filamentary vessels that can be used to improve deep learning based methods for retinal vessel segmentation. Our ultimate aim is to construct a dataset that can be the foundation for achieving superhuman accuracy. In particular, although we utilize FA to construct the ground truth for this dataset, our intention is to use the generated ground truth for supervised learning of a vessel segmentation method that takes only fundus images as input.