
1 Introduction

A typical human red blood cell can be morphologically described as a biconcave discoid, called a discocyte [11]. Changes in cell volume change its appearance: as the volume decreases, the cell shrivels into a star-like shape called an echinocyte, with distinguishable convex rounded protrusions. As the volume increases, the cell expands into a shape with single or multiple concave invaginations, called a stomatocyte. Under physiological conditions, hematologists distinguish seven morphological subtypes (see Fig. 1), which appear at particular frequencies that change upon environmental challenges or in the course of a number of diseases [8].

Detection and classification of red blood cell subtypes is a crucial step in blood sample analysis and the diagnosis of blood diseases [11]. Traditionally, morphological analysis was performed on blood smears. Increasingly, however, 2D and 3D images of living, often moving, red blood cells are produced for research and clinical needs, with different modalities, illumination conditions and zoom levels. The classification of red blood cells nowadays still relies on manual annotation by an expert.

Deep learning approaches are known to be versatile and adaptive to new environments and have excelled in several recent biomedical challenges, such as the classification of skin cancer [4] or the prediction of mutations from histopathological slides [2]. A first approach to the classification of red blood cells has also been proposed recently [14]. In general, however, the application of powerful deep learning algorithms in clinical settings is heavily limited by the need for large amounts of well annotated data, since expert time is typically scarce and expensive. We thus aim to significantly reduce redundancy in manual annotation by developing uncertainty based scores that allow us to involve expensive expert knowledge only where necessary.

One promising approach to break the bottleneck of data annotation is active learning, which uses a learning algorithm to interactively query experts for new annotations. This expert-in-the-loop process has been demonstrated to achieve similar or even greater performance than training on a fully labelled dataset, at a fraction of the cost and time it takes to label all the data [9]. Here, we combine active learning with object detection and develop a novel active learning annotation tool to guide expert annotation. Although different active learning methods have been proposed to accelerate the annotation process for classification problems, e.g. [5], few approaches exist for object detection, and none for a multiclass detection problem with clinical relevance.

Our active learning annotation approach interactively selects a candidate annotation set by measuring the uncertainty of classification and detection for single cells, and by considering rare classes in the data set. Our approach is the first to calculate relevance for active learning in multiclass object detection, and the first to provide intelligent data selection for expert annotation of biomedical images.

Fig. 1.

Red blood cells change their morphology due to microenvironmental changes or in the course of a disease. They can be classified into seven subtypes, from left to right: dehydrated stomatocyte (S.D.), normal stomatocyte (S.N.), discocyte (D), primary, secondary, tertiary and final echinocyte (E.1, E.2, E.3, E.F.). We show three exemplary cells for each class from our brightfield dataset containing 208 images and nearly 8000 cells.

2 Method

Our proposed active learning annotation approach starts from a Faster R-CNN model trained on an annotated training set (see Fig. 2). We apply the trained model to unannotated images and select the most relevant images based on a novel uncertainty analysis, for which we then request expert annotations. With these additional annotations, we update our model and select new images for further annotation. We iterate this annotation process until all cells above a particular uncertainty are annotated or a desired classification performance is achieved.

Fig. 2.

Overview of the proposed active learning annotation tool. First, a Faster R-CNN model is trained on an annotated dataset. Unannotated images are then analyzed with the trained model, the uncertainty of detection and classification is determined, and the most relevant images are passed to the expert for annotation. With the new annotations, a new cycle starts.
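The cycle of Fig. 2 can be summarized in a few lines of Python. The following is a minimal sketch, not our actual implementation: the callables train (model training, Sect. 2.1), relevance (image ranking, Sect. 2.3) and annotate (the expert-in-the-loop step) are hypothetical placeholders supplied by the caller.

```python
def annotation_loop(annotated, unannotated, train, relevance, annotate,
                    batch_size=50, max_rounds=10):
    """Sketch of the expert-in-the-loop cycle of Fig. 2.

    train(annotated) -> model, relevance(model, image) -> float (Sect. 2.3),
    and annotate(images) -> annotations are supplied by the caller.
    """
    model = train(annotated)
    for _ in range(max_rounds):
        if not unannotated:
            break
        # Rank unannotated images by their relevance score, highest first.
        unannotated.sort(key=lambda img: relevance(model, img), reverse=True)
        batch, unannotated = unannotated[:batch_size], unannotated[batch_size:]
        annotated = annotated + annotate(batch)  # expert labels the top images
        model = train(annotated)                 # retrain with the new labels
    return model
```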

2.1 Object Detection with Faster R-CNN

Faster R-CNN is an advanced version of Fast R-CNN [6] and R-CNN [7] and was first proposed in [12]. In this approach, a Fast R-CNN is coupled with a Region Proposal Network (RPN) and both networks are trained together: convolutional layers extract features from the input image, the RPN generates object proposals based on the feature map, and each proposal is classified into one of the defined classes. We used a VGG-16 network [13] pretrained on ImageNet [3] as the backbone. More formally, considering a Faster R-CNN model \(F_\theta \) with weights \(\theta \) and an input image I, we have

$$\begin{aligned} p,t^k = F_\theta (I) \end{aligned}$$
(1)

where p is a discrete probability distribution over all classes (computed, as usual, with a soft-max over the last fully connected layer) and \(t^k\) is the bounding box regression for every class k. The multi-task loss L of the Fast R-CNN can be defined as:

$$\begin{aligned} L = L_{cls}(p,u) + \lambda [u>0] L_{loc}(t^u,v) \end{aligned}$$
(2)

where u and v are the ground truth class and bounding box annotations from the dataset, respectively, and \(t^u\) is the bounding box regression corresponding to the ground truth class u. The Iverson bracket \([u>0]\) yields 0 for the background class (\(u=0\)) and 1 for all other classes. \(\lambda \) is a balancing parameter between the classification loss \(L_{cls}\) and the localization loss \(L_{loc}\) [6]. The localization loss is defined as

$$\begin{aligned} L_{loc}(t^u,v)= \sum _{m\in \{x,y,w,h\}} \mathrm {smooth}_{L1} (t^u_m - v_m). \end{aligned}$$
(3)

The classification loss \(L_{cls}\) is calculated with softmax cross entropy:

$$\begin{aligned} L_{cls} = -\log p_u. \end{aligned}$$
(4)

For training we used an approximate joint training method [12].
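As an illustration, the multi-task loss of Eqs. (2)–(4) for a single proposal could be sketched as follows in PyTorch. This is our own sketch, not the reference implementation; the tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def multitask_loss(class_logits, box_regression, u, v, lam=1.0):
    """Sketch of the Fast R-CNN multi-task loss (Eqs. 2-4) for one proposal.

    class_logits: raw class scores, shape (num_classes,)
    box_regression: per-class box offsets t^k, shape (num_classes, 4)
    u: ground truth class index (0 = background)
    v: ground truth regression target, shape (4,)
    """
    # L_cls = -log p_u, computed as softmax cross entropy (Eq. 4)
    l_cls = F.cross_entropy(class_logits.unsqueeze(0), torch.tensor([u]))
    # Iverson bracket [u > 0]: localization loss only for foreground (Eq. 2)
    if u > 0:
        t_u = box_regression[u]                            # t^u of the true class
        l_loc = F.smooth_l1_loss(t_u, v, reduction="sum")  # smooth L1 over x,y,w,h (Eq. 3)
    else:
        l_loc = torch.zeros(())
    return l_cls + lam * l_loc
```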

2.2 Uncertainty Score per Cell

For each cell, we measure the uncertainty of our model prediction with three scores: (i) the detection uncertainty, (ii) the classification uncertainty and (iii) a binary score for the possibility of the object belonging to a rare class. We explain each score in the following subsections.

Detection Uncertainty. For each unannotated image, we apply our model N times and quantify uncertainty using dropout variational inference [10]. To evaluate how certain the model is when detecting a cell, we compare the bounding boxes of each cell across the N inferences. For every inference, we only keep the bounding box of the class that has the highest probability in p; we call this bounding box d. Our uncertainty score of detection \(U^d\) is thus defined as

$$\begin{aligned} U^d = \frac{1}{N-1}\sum _{i=2}^{i=N}{ \mathrm {IoU} (d_1,d_i)} \end{aligned}$$
(5)

where \(d_i\) is the bounding box d in the \(i^{th}\) inference and \(\mathrm {IoU}\) measures the intersection over union between two given bounding boxes. Clearly, \(U^d\in [0,1]\).
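A possible implementation of \(U^d\) from Eq. (5), assuming boxes are given in (x1, y1, x2, y2) format and collected over the N dropout inferences:

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def detection_uncertainty(boxes):
    """U^d of Eq. 5: mean IoU of the first box d_1 with the remaining N-1 boxes.

    boxes: the N bounding boxes predicted for the same cell, one per inference.
    """
    d1, rest = boxes[0], boxes[1:]
    return sum(iou(d1, d) for d in rest) / len(rest)
```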

Classification Uncertainty. For each detected cell, the most probable class from p is picked. With N inferences, we obtain the set \(c = \{c_1,c_2,...,c_N\}\). Hence, we measure the uncertainty of classification using

$$\begin{aligned} U^c = \frac{1}{N} \sum _{i = 1}^{N} [c_i = c_m] \end{aligned}$$
(6)

where \(c_m\) is the mode, i.e. the most frequent item, in the set c, and \(U^c\) is its relative frequency. Like \(U^d\), \(U^c\) also lies in [0, 1].
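Correspondingly, \(U^c\) of Eq. (6) is the relative frequency of the modal class over the N inferences; a minimal sketch:

```python
from collections import Counter

def classification_uncertainty(classes):
    """U^c of Eq. 6: relative frequency of the mode c_m among N predicted classes."""
    mode_count = Counter(classes).most_common(1)[0][1]  # count of the modal class
    return mode_count / len(classes)
```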

Rare Class Prediction. The red blood cell dataset has a strong class imbalance: cells belonging to the discocyte (D) or primary echinocyte (E.1, see Fig. 1) classes are much more frequent than dehydrated stomatocytes (S.D.) or final echinocytes (E.F.). Blood cells of rare classes are clinically interesting, yet detecting them is extremely challenging due to the small number of samples and the large variations in appearance. In order to boost the precision of detection for rare classes, we introduce a metric \(U^r\) that prioritizes the annotation of cells which are likely to belong to a rare class:

$$\begin{aligned} \forall j \in R : U^r = {\left\{ \begin{array}{ll} 0 &{} p^j \le 0.2 \\ 1 &{} p^j > 0.2 \end{array}\right. } \end{aligned}$$
(7)

where R is the set of rare classes and \(p^j\) is the probability of class j in the discrete class probability distribution p of the Faster R-CNN.
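In code, one plausible reading of Eq. (7) flags a cell as soon as the predicted probability of any rare class exceeds the 0.2 threshold; a sketch under this assumption:

```python
def rare_class_score(p, rare_classes, threshold=0.2):
    """U^r of Eq. 7: 1 if any rare class j in R has probability p^j above
    the threshold, 0 otherwise. p maps class index to predicted probability."""
    return 1 if any(p[j] > threshold for j in rare_classes) else 0
```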

2.3 Relevance Score per Image

We rank every non-annotated image with a relevance score defined as

$$\begin{aligned} R_{\text {img}} = \sum _{j=1}^{M}(U_j^c\le \alpha ) + \sum _{j=1}^{M}(U_j^d\le \beta ) + \gamma \times \sum _{j=1}^{M} {U^r_j} \end{aligned}$$
(8)

where M is the number of detected cells in the image, and \(\alpha \) and \(\beta \) are thresholds on the classification and detection uncertainty scores, respectively. \(\gamma \) weights the contribution of cells that are suspected to belong to rare classes. Images with a higher relevance score are selected for expert annotation. In our experiments, we chose \(\alpha = 0.80\), \(\beta = 0.90\) and \(\gamma = 10\).
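Combining the three per-cell scores, the relevance score of Eq. (8) can be sketched as follows; the per-cell score triples are assumed to be computed as in Sect. 2.2:

```python
def image_relevance(cells, alpha=0.80, beta=0.90, gamma=10):
    """R_img of Eq. 8 for one image; `cells` holds one
    (u_cls, u_det, u_rare) triple per detected cell."""
    uncertain_cls = sum(1 for u_cls, _, _ in cells if u_cls <= alpha)
    uncertain_det = sum(1 for _, u_det, _ in cells if u_det <= beta)
    rare = sum(u_rare for _, _, u_rare in cells)
    return uncertain_cls + uncertain_det + gamma * rare
```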

3 Experiment and Results

3.1 Data

Our dataset consists of 208 brightfield images of 572 \(\times \) 572 pixels obtained from human blood samples with an Axiocam camera mounted on a Zeiss Axiovert 200 m microscope with a 100x objective. Cells are not stained and no preprocessing is performed. Each image contains 30–40 cells, with a total of 7669 cells in the dataset. Each red blood cell belongs to one of seven classes according to the morphological classification shown in Fig. 1. The class frequency is highly unbalanced: while we find 3803 discocytes (\(\approx \)50%), there are only 36 dehydrated stomatocytes (\(\approx \)0.5%). From the 208 images, we use 30 expert annotated images (\(\approx \)1100 cells) to train our initial Faster R-CNN model and hold out 20 annotated images (\(\approx \)720 cells) to evaluate model accuracy. The remaining images are used for the experiments with either active learning guided expert annotation or randomly selected annotation.

3.2 Cell Uncertainty

Our approach is able to determine the uncertainty of each single cell, calculate the relevance of each image and rank images for annotation accordingly. Figure 3 shows three exemplary images with different types of cells selected by our strategy: cells associated with high detection uncertainty (Fig. 3a), cells associated with high classification uncertainty (Fig. 3b), and cells that are predicted to belong to a rare class (Fig. 3c). We highlight these cells with red boxes. In contrast, cells in green boxes are considered less informative for the model and do not require expert review. Images containing many or highly uncertain cells are ranked as highly relevant and presented to the expert for annotation.

Fig. 3.

Exemplary cells (marked with a red box) that are considered to need expert annotation by our uncertainty assessment due to uncertain classification (a), uncertain detection (b) and association to a rare class (c). In (a) and (b), numbers above boxes indicate the measured certainty per instance and red boxes are below the acceptable threshold. (Color figure online)

Fig. 4.

Active learning based annotation boosts the precision of cell detection and classification. (a) The weighted detection precision over all classes increases more rapidly with active learning (solid line) than with a random selection of cells (dashed line). (b) The average precision for dehydrated stomatocytes (S.D.), a rare subtype with high clinical relevance, increases sharply when active learning is used. We show the mean and standard deviation over 10 experiments, in which we order the images to be newly annotated either randomly, or by sorting 50 randomly selected images according to their relevance score.

3.3 Evaluation

We evaluate our active learning annotation approach systematically by comparing its performance with a baseline method where the expert is asked to annotate randomly selected images. In Fig. 4a we show the object detection precision for all seven classes, weighted by the number of cells in each class:

$$\begin{aligned} \mathrm {Precision_{all}} = \frac{\sum _{k=1}^{K} N_k \times AP_k}{\sum _{k=1}^{K} N_k} \end{aligned}$$
(9)

where \(N_k\) is the number of detected cells in class k and \(AP_k\) is the average precision of class k. This value increases by \(5\%\) as we add 1000 newly annotated cells using active learning. In contrast, the performance boost with the same number of randomly annotated cells is slower, at only around \(2\%\) for 1000 additionally annotated cells. The difference between the two methods is even more pronounced for the detection precision of blood cells of a rare class. The peculiar morphology of dehydrated stomatocytes and their potential over-representation has been linked to disease mutations [1]. Hence, an accurate detection of this subtype is clinically highly important, but impeded by the rareness of these cells. While dehydrated stomatocytes are hardly captured by random annotation, our active learning annotation approach highlights cells that are predicted to belong to this rare class and prioritizes them for expert annotation. This leads to a fast increase of the average detection precision for this class from around \(15\%\) to around \(50\%\) for 1000 newly annotated cells (see Fig. 4b), while the average precision is unchanged in the random approach, where few if any dehydrated stomatocytes are annotated among the 1000 randomly selected new annotations.
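For reference, the weighted precision of Eq. (9) is simply a cell-count weighted mean of the per-class average precisions; a minimal sketch:

```python
def weighted_precision(num_cells, avg_precisions):
    """Precision_all of Eq. 9: AP_k of each class k weighted by the
    number of detected cells N_k in that class."""
    total = sum(num_cells)
    return sum(n * ap for n, ap in zip(num_cells, avg_precisions)) / total
```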

4 Conclusion

Our novel active learning annotation approach speeds up annotation and improves the classification of red blood cell subtypes, an important task in the diagnosis and prognosis of many blood diseases. Efficient annotation is also urgently required for other biomedical data sets, in particular for digital pathology applications. An extension of our framework into a software prototype will accelerate the growth of annotated data sets and open new avenues for computational pathology solutions.