
1 Introduction

A typical human red blood cell can be morphologically described as a biconcave discoid, called a discocyte [11]. Changes in cell volume change its appearance: as the volume decreases, the cell shrivels into a star-like shape called an echinocyte, with distinguishable convex rounded protrusions. As the volume increases, the cell expands into a shape with single or multiple concave invaginations, called a stomatocyte. Under physiological conditions, hematologists distinguish seven morphological subtypes (see Fig. 1), which appear at particular frequencies that change upon environmental challenges or in the course of a number of diseases [8].

Detection and classification of red blood cell subtypes is a crucial step in blood sample analysis and the diagnosis of blood diseases [11]. Traditionally, morphological analysis was performed on blood smears. Increasingly, however, 2D and 3D images of living, often moving, red blood cells are produced for research and clinical needs, with different modalities, illumination conditions and zoom levels. The classification of red blood cells nowadays still relies on manual annotation by an expert.

Deep learning approaches are known to be versatile and adaptive to new environments and have excelled in several recent biomedical challenges, such as the classification of skin cancer [4] or the prediction of mutations from histopathological slides [2]. A first approach to the classification of red blood cells has also been proposed recently [14]. In general, however, the application of powerful deep learning algorithms in clinical settings is heavily limited by the need for large amounts of well annotated data, since expert time is typically scarce and expensive. We thus aim to significantly reduce redundancy in manual annotation by developing uncertainty based scores that allow us to involve expensive expert knowledge only where necessary.

One promising approach to break the bottleneck of data annotation is active learning, which uses a learning algorithm to interactively query experts for new annotations. This expert-in-the-loop process has been demonstrated to achieve similar or even greater performance than training on a fully labelled dataset, at a fraction of the cost and time it takes to label all the data [9]. Here, we combine active learning with object detection and develop a novel active learning annotation tool to guide expert annotation. Although different active learning methods have been proposed to accelerate the annotation process for classification problems, e.g. [5], few approaches exist for object detection, and none for a multiclass detection problem with clinical relevance.

Our active learning annotation approach interactively selects a candidate annotation set by measuring the uncertainty of classification and detection for single cells, and by considering rare classes in the data set. Our approach is the first to calculate relevance for active learning in multiclass object detection, and the first to provide intelligent data selection for expert annotation of biomedical images.

Fig. 1.

Red blood cells change their morphology due to microenvironmental changes or in the course of a disease. They can be classified into seven subtypes, from left to right: dehydrated stomatocyte (S.D.), normal stomatocyte (S.N.), discocyte (D), primary, secondary, tertiary and final echinocyte (E.1, E.2, E.3, E.F.). We show three exemplary cells for each class from our brightfield dataset containing 208 images and nearly 8000 cells.

2 Method

Our proposed active learning annotation approach starts from a Faster R-CNN model trained on an annotated training set (see Fig. 2). We apply the trained model to unannotated images and select the most relevant images based on a novel uncertainty analysis, for which we then request expert annotations. With these additional annotations, we update our model and select new images for further annotation. We iterate this annotation process until all cells above a particular uncertainty are annotated or a desired classification performance is achieved.

Fig. 2.

Overview of the proposed active learning annotation tool. First, a Faster R-CNN model is trained on an annotated dataset. Unannotated images are then analyzed with the trained model, the uncertainty of detection and classification is determined, and the most relevant images are passed to the expert for annotation. With the new annotations, a new cycle starts.
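The cycle of Fig. 2 can be summarized in a few lines of Python. The following is a minimal sketch, not our actual implementation: the callables train (model training, Sect. 2.1), relevance (image ranking, Sect. 2.3) and annotate (the expert-in-the-loop step) are hypothetical placeholders supplied by the caller.

```python
def annotation_loop(annotated, unannotated, train, relevance, annotate,
                    batch_size=50, max_rounds=10):
    """Sketch of the expert-in-the-loop cycle of Fig. 2.

    train(annotated) -> model, relevance(model, image) -> float (Sect. 2.3),
    and annotate(images) -> annotations are supplied by the caller.
    """
    model = train(annotated)
    for _ in range(max_rounds):
        if not unannotated:
            break
        # Rank unannotated images by their relevance score, highest first.
        unannotated.sort(key=lambda img: relevance(model, img), reverse=True)
        batch, unannotated = unannotated[:batch_size], unannotated[batch_size:]
        annotated = annotated + annotate(batch)  # expert labels the top images
        model = train(annotated)                 # retrain with the new labels
    return model
```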

2.1 Object Detection with Faster R-CNN

Faster R-CNN is an advanced version of Fast R-CNN [6] and R-CNN [7] and was first proposed in [12]. In this approach, a Fast R-CNN is coupled with a Region Proposal Network (RPN) and both networks are trained together: convolutional layers extract features from the input image, the RPN generates object proposals based on the feature map, and each proposal is classified into one of the defined classes. We used a VGG-16 network [13] pretrained on ImageNet [3] as the backbone. More formally, considering a Faster R-CNN model \(F_\theta \) with weights \(\theta \) and an input image I, we have

$$\begin{aligned} p,t^k = F_\theta (I) \end{aligned}$$
(1)

where p is a discrete probability distribution over all classes (computed, as usual, with a soft-max over the last fully connected layer) and \(t^k\) is the bounding box regression for every class k. The multi-task loss L of the Fast R-CNN can be defined as:

$$\begin{aligned} L = L_{cls}(p,u) + \lambda [u>0] L_{loc}(t^u,v) \end{aligned}$$
(2)

where u and v are the ground truth class and bounding box annotations from the dataset, respectively, and \(t^u\) is the bounding box regression corresponding to the ground truth class u. The Iverson bracket \([u>0]\) yields 0 for the background class (\(u=0\)) and 1 for all other classes. \(\lambda \) is a balancing parameter between the classification loss \(L_{cls}\) and the localization loss \(L_{loc}\) [6]. The localization loss is defined as

$$\begin{aligned} L_{loc}(t^u,v)= \sum _{m\in \{x,y,w,h\}} \mathrm {smooth}_{L1} (t^u_m - v_m). \end{aligned}$$
(3)

The classification loss \(L_{cls}\) is calculated with softmax cross entropy:

$$\begin{aligned} L_{cls} = -\log p_u. \end{aligned}$$
(4)

For training we used an approximate joint training method [12].
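As an illustration, the multi-task loss of Eqs. (2)–(4) for a single proposal could be sketched as follows in PyTorch. This is our own sketch, not the reference implementation; the tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def multitask_loss(class_logits, box_regression, u, v, lam=1.0):
    """Sketch of the Fast R-CNN multi-task loss (Eqs. 2-4) for one proposal.

    class_logits: raw class scores, shape (num_classes,)
    box_regression: per-class box offsets t^k, shape (num_classes, 4)
    u: ground truth class index (0 = background)
    v: ground truth regression target, shape (4,)
    """
    # L_cls = -log p_u, computed as softmax cross entropy (Eq. 4)
    l_cls = F.cross_entropy(class_logits.unsqueeze(0), torch.tensor([u]))
    # Iverson bracket [u > 0]: localization loss only for foreground (Eq. 2)
    if u > 0:
        t_u = box_regression[u]                            # t^u of the true class
        l_loc = F.smooth_l1_loss(t_u, v, reduction="sum")  # smooth L1 over x,y,w,h (Eq. 3)
    else:
        l_loc = torch.zeros(())
    return l_cls + lam * l_loc
```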

2.2 Uncertainty Score per Cell

For each cell, we measure the uncertainty of our model prediction with three scores: (i) the detection uncertainty, (ii) the classification uncertainty and (iii) a binary score for the possibility of the object belonging to a rare class. We explain each score in the following subsections.

Detection Uncertainty. For each unannotated image, we apply our model N times and quantify uncertainty using dropout variational inference [10]. To evaluate how certain the model is when detecting a cell, we compare the bounding boxes of each cell across the N inferences. For every inference, we only keep the bounding box of the class that has the highest probability in p; we call this bounding box d. Our uncertainty score of detection \(U^d\) is thus defined as

$$\begin{aligned} U^d = \frac{1}{N-1}\sum _{i=2}^{i=N}{ \mathrm {IoU} (d_1,d_i)} \end{aligned}$$
(5)

where \(d_i\) is the bounding box d in the \(i^{th}\) inference and \(\mathrm {IoU}\) measures the intersection over union between two given bounding boxes. Clearly, \(U^d\in [0,1]\).
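A possible implementation of \(U^d\) from Eq. (5), assuming boxes are given in (x1, y1, x2, y2) format and collected over the N dropout inferences:

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def detection_uncertainty(boxes):
    """U^d of Eq. 5: mean IoU of the first box d_1 with the remaining N-1 boxes.

    boxes: the N bounding boxes predicted for the same cell, one per inference.
    """
    d1, rest = boxes[0], boxes[1:]
    return sum(iou(d1, d) for d in rest) / len(rest)
```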

Classification Uncertainty. For each detected cell, the most probable class from p is picked. With N inferences, we obtain the set \(c = \{c_1,c_2,...,c_N\}\). Hence, we measure the uncertainty of classification using

$$\begin{aligned} U^c = \frac{1}{N} \sum _{i = 1}^{N} [c_i = c_m] \end{aligned}$$
(6)

where \(c_m\) is the mode, i.e. the most frequent item, in the set c, and \(U^c\) is its relative frequency. Like \(U^d\), \(U^c\) also lies in [0, 1].
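Correspondingly, \(U^c\) of Eq. (6) is the relative frequency of the modal class over the N inferences; a minimal sketch:

```python
from collections import Counter

def classification_uncertainty(classes):
    """U^c of Eq. 6: relative frequency of the mode c_m among N predicted classes."""
    mode_count = Counter(classes).most_common(1)[0][1]  # count of the modal class
    return mode_count / len(classes)
```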

Rare Class Prediction. The red blood cell dataset has a strong class imbalance: cells belonging to the discocyte (D) or primary echinocyte (E.1, see Fig. 1) classes are much more frequent than dehydrated stomatocytes (S.D.) or final echinocytes (E.F.). Blood cells of rare classes are clinically interesting, yet detecting them is extremely challenging due to the small number of samples and the large variations in appearance. In order to boost the precision of detection for rare classes, we introduce a metric \(U^r\) that prioritizes the annotation of cells which are likely to belong to a rare class:

$$\begin{aligned} \forall j \in R : U^r = {\left\{ \begin{array}{ll} 0 &{} p^j \le 0.2 \\ 1 &{} p^j > 0.2 \end{array}\right. } \end{aligned}$$
(7)

where R is the set of rare classes and \(p^j\) is the probability of class j in the discrete class probability distribution p of the Faster R-CNN.
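In code, one plausible reading of Eq. (7) flags a cell as soon as the predicted probability of any rare class exceeds the 0.2 threshold; a sketch under this assumption:

```python
def rare_class_score(p, rare_classes, threshold=0.2):
    """U^r of Eq. 7: 1 if any rare class j in R has probability p^j above
    the threshold, 0 otherwise. p maps class index to predicted probability."""
    return 1 if any(p[j] > threshold for j in rare_classes) else 0
```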

2.3 Relevance Score per Image

We rank every non-annotated image with a relevance score defined as

$$\begin{aligned} R_{\text {img}} = \sum _{j=1}^{M}(U_j^c\le \alpha ) + \sum _{j=1}^{M}(U_j^d\le \beta ) + \gamma \times \sum _{j=1}^{M} {U^r_j} \end{aligned}$$
(8)

where M is the number of detected cells in the image, and \(\alpha \) and \(\beta \) are thresholds on the classification and detection uncertainty scores, respectively. \(\gamma \) weights the contribution of cells that are suspected to belong to rare classes. Images with a higher relevance score are selected for expert annotation. In our experiments, we chose \(\alpha = 0.80\), \(\beta = 0.90\) and \(\gamma = 10\).
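Combining the three per-cell scores, the relevance score of Eq. (8) can be sketched as follows; the per-cell score triples are assumed to be computed as in Sect. 2.2:

```python
def image_relevance(cells, alpha=0.80, beta=0.90, gamma=10):
    """R_img of Eq. 8 for one image; `cells` holds one
    (u_cls, u_det, u_rare) triple per detected cell."""
    uncertain_cls = sum(1 for u_cls, _, _ in cells if u_cls <= alpha)
    uncertain_det = sum(1 for _, u_det, _ in cells if u_det <= beta)
    rare = sum(u_rare for _, _, u_rare in cells)
    return uncertain_cls + uncertain_det + gamma * rare
```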

3 Experiment and Results

3.1 Data

Our dataset consists of 208 brightfield images of 572 \(\times \) 572 pixels obtained from human blood samples with an Axiocam camera mounted on a Zeiss Axiovert 200 m microscope with a 100x objective. Cells are not stained and no preprocessing is performed. Each image contains 30–40 cells, with a total of 7669 cells in the dataset. Each red blood cell belongs to one of seven classes according to the morphological classification shown in Fig. 1. The class frequency is highly unbalanced: while we find 3803 discocytes (\(\approx \)50%), there are only 36 dehydrated stomatocytes (\(\approx \)0.5%). From the 208 images, we use 30 expert annotated images (\(\approx \)1100 cells) to train our initial Faster R-CNN model and hold out 20 annotated images (\(\approx \)720 cells) to evaluate model accuracy. The remaining images are used for the experiments with either active learning guided expert annotation or randomly selected annotation.

3.2 Cell Uncertainty

Our approach is able to determine the uncertainty of each single cell, calculate the relevance of each image and rank images for annotation accordingly. Figure 3 shows three exemplary images with different types of cells selected by our strategy: cells associated with high detection uncertainty (Fig. 3a), cells associated with high classification uncertainty (Fig. 3b), and cells that are predicted to belong to a rare class (Fig. 3c). We highlight these cells with red boxes. In contrast, cells in green boxes are considered less informative for the model and do not require expert review. Images containing many or highly uncertain cells are ranked as highly relevant and presented to the expert for annotation.

Fig. 3.

Exemplary cells (marked with a red box) that are considered to need expert annotation by our uncertainty assessment due to uncertain classification (a), uncertain detection (b) and association to a rare class (c). In (a) and (b), numbers above boxes indicate the measured certainty per instance and red boxes are below the acceptable threshold. (Color figure online)

Fig. 4.

Active learning based annotation boosts the precision of cell detection and classification. (a) The weighted detection precision over all classes increases more rapidly with active learning (solid line) than with a random selection of cells (dashed line). (b) The average precision for dehydrated stomatocytes (S.D.), a rare subtype with high clinical relevance, increases sharply when active learning is used. We show the mean and standard deviation over 10 experiments, in which we order the images to be newly annotated either randomly, or by sorting 50 randomly selected images according to their relevance score.

3.3 Evaluation

We evaluate our active learning annotation approach systematically by comparing its performance with a baseline method where the expert is asked to annotate randomly selected images. In Fig. 4a we show the object detection precision for all seven classes, weighted by the number of cells in each class:

$$\begin{aligned} \mathrm {Precision_{all}} = \frac{\sum _{k=1}^{K} N_k \times AP_k}{\sum _{k=1}^{K} N_k} \end{aligned}$$
(9)

where \(N_k\) is the number of detected cells in class k and \(AP_k\) is the average precision of class k. This value increases by \(5\%\) as we add 1000 newly annotated cells using active learning. In contrast, the performance boost with the same number of randomly annotated cells is slower, at only around \(2\%\) for 1000 additionally annotated cells. The difference between the two methods is even more pronounced for the detection precision of blood cells of a rare class. The peculiar morphology of dehydrated stomatocytes and their potential over-representation has been linked to disease mutations [1]. Hence, an accurate detection of this subtype is clinically highly important, but impeded by the rareness of these cells. While dehydrated stomatocytes are hardly captured by random annotation, our active learning annotation approach highlights cells that are predicted to belong to this rare class and prioritizes them for expert annotation. This leads to a fast increase of the average detection precision for this class from around \(15\%\) to around \(50\%\) for 1000 newly annotated cells (see Fig. 4b), while the average precision is unchanged in the random approach, where few if any dehydrated stomatocytes are annotated among the 1000 randomly selected new annotations.
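For reference, the weighted precision of Eq. (9) is simply a cell-count weighted mean of the per-class average precisions; a minimal sketch:

```python
def weighted_precision(num_cells, avg_precisions):
    """Precision_all of Eq. 9: AP_k of each class k weighted by the
    number of detected cells N_k in that class."""
    total = sum(num_cells)
    return sum(n * ap for n, ap in zip(num_cells, avg_precisions)) / total
```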

4 Conclusion

Our novel active learning annotation approach speeds up annotation and improves the classification of red blood cell subtypes, an important task in the diagnosis and prognosis of many blood diseases. Efficient annotation is also urgently required for other biomedical data sets, in particular for digital pathology applications. An extension of our framework into a software prototype will accelerate the growth of annotated data sets and open new avenues for computational pathology solutions.