A Divide-and-Conquer Approach Towards Understanding Deep Networks

Weilin Fu, Katharina Breininger, Roman Schaffert, Nishant Ravikumar & Andreas Maier

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11764)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

Abstract

Deep neural networks have achieved tremendous success in various fields including medical image segmentation. However, they have long been criticized as black boxes, in that interpreting, understanding and correcting architectures is difficult as there is no general theory for deep neural network design. Previously, precision learning was proposed to fuse deep architectures and traditional approaches. Deep networks constructed in this way benefit from the original known operator, have fewer parameters, and offer improved interpretability. However, they do not yield state-of-the-art performance in all applications. In this paper, we propose to analyze deep networks using known operators, by adopting a divide-and-conquer strategy to replace network components whilst retaining network performance. The task of retinal vessel segmentation is investigated for this purpose. We start with a high-performance U-Net and show by step-by-step conversion that we are able to divide the network into modules of known operators. The results indicate that a combination of a trainable guided filter and a trainable version of the Frangi filter yields a performance at the level of U-Net (AUC 0.972 vs. 0.974) with a tremendous reduction in parameters (9,575 vs. 111,536). In addition, the trained layers can be mapped back into their original algorithmic interpretation and analyzed using standard tools of signal processing.

1 Introduction

Deep learning (DL) technology [6] has been successfully applied in various fields including medical image segmentation, which provides substantial support for diagnosis, therapy planning and treatment procedures. Despite their outstanding achievements, DL-based algorithms have long been criticized as black boxes, and many design choices in Convolutional Neural Network (CNN) topologies are driven by experimental improvements rather than theoretical foundations. Accordingly, understanding the actual working principle of the architectures is difficult. One option to gain interpretability is to constrain the network with known operators. Precision learning [8, 9], which integrates known operators [3, 13] into DL models, can provide a suitable mechanism to design CNN architectures. This strategy integrates prior knowledge into the deep learning pipeline, thereby improving interpretability and providing guarantees and quality control in certain settings. However, the quantitative performance of these approaches often falls short of that of completely data-driven approaches.

In this work, we propose an approach to debug a known-operator workflow and identify its bottleneck. Frangi-Net [3], which is the deep learning counterpart of the Frangi filter [2], is utilized as an exemplary network. The performance of different methods is evaluated on the retinal vessel segmentation task, using data from the Digital Retinal Images for Vessel Extraction (DRIVE) database [11]. Experiments are designed under the assumption that if the replacement of one step leads to a performance boost, then this step is the probable bottleneck of the overall workflow. In our case, we debug the Frangi-Net by replacing the preprocessing step with the powerful U-Net [10]. With the output from the U-Net as input, Frangi-Net approaches state-of-the-art performance. We therefore conclude that the preprocessing method is the weakness of the Frangi-Net segmentation pipeline. In other words, given a proper preprocessing algorithm, Frangi-Net may be capable of accomplishing the retinal vessel segmentation task. To verify this hypothesis, we further utilize the guided filter layer [12], which is a deep learning module designed for image quality enhancement. Experimental results confirm our hypothesis: the additional guided filter layer indeed brings about a substantial improvement in performance. Owing to the modular design, the trained filter block can be analyzed, which reveals somewhat unexpected behaviour. Our work has two main contributions. Firstly, we propose a feasible way to identify the bottleneck of a precision learning-based workflow. Secondly, the debugging procedure yields a network pipeline with well-defined explainable steps for retinal vessel segmentation, i.e., a guided filter layer for preprocessing and Frangi-Net for vesselness computation.

2 Methods

2.1 Frangi-Net

In this work, Frangi-Net, which is the deep learning counterpart of the Frangi filter [2], is utilized as the segmentation network in different pipelines. The Frangi filter is a widely used multi-scale tube segmentation method, which calculates vesselness response $V_0$ of dark tubes at scale $\sigma $ with Hessian eigenvalues ($|\lambda _1| \le |\lambda _2|$) using:

$$\begin{aligned} V_0(\sigma ) = \left\{ \begin{array}{ll} 0, & \text {if } \lambda _2 < 0,\\ \exp \left( -\frac{R_B^{2}}{2\beta ^{2}}\right) \left( 1-\exp \left( -\frac{S^2}{2c^2}\right) \right) , & \text {otherwise,} \end{array} \right. \end{aligned}\tag{1}$$

where $S = \sqrt{\lambda _1^2 + \lambda _2^2}$ is the second-order structureness, $R_B = \frac{|\lambda _1|}{|\lambda _2|}$ is the blobness measure, and $\beta , c$ are image-dependent parameters for the blobness and structureness terms. Frangi-Net is constructed by representing each step in the multi-scale Frangi filter as a layer. Here, we employ a Frangi-Net with 8 different Gaussian scales ranging from 0.5 to 4.0. The convolution kernels are initialized as the second-order partial derivatives of the Gaussian kernel at the corresponding scales. We employ two additional $1\times 1$ convolution layers before the final softmax output layer to regulate the data range. The hyper-parameters $\beta , c$ in Eq. 1 are initialized to 0.5 and 1.0, respectively, at all scales. The network has 6,525 weights, and the overall architecture is shown in Fig. 1.
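To make Eq. 1 concrete, the following is a minimal NumPy sketch of the single-scale vesselness response, assuming the Hessian components have already been computed by convolving the image with second-order Gaussian derivative kernels. The function names, the `eps` stabilizer, and the closed-form eigenvalue computation are illustrative assumptions and are not taken from the Frangi-Net implementation; the defaults for `beta` and `c` are the initial values stated above.

```python
import numpy as np

def hessian_eigenvalues(hxx, hxy, hyy):
    """Per-pixel eigenvalues of the symmetric 2x2 Hessian, ordered so |l1| <= |l2|."""
    half_trace = 0.5 * (hxx + hyy)
    root = np.sqrt((0.5 * (hxx - hyy)) ** 2 + hxy ** 2)
    ea, eb = half_trace + root, half_trace - root
    swap = np.abs(ea) > np.abs(eb)
    l1 = np.where(swap, eb, ea)
    l2 = np.where(swap, ea, eb)
    return l1, l2

def vesselness(l1, l2, beta=0.5, c=1.0, eps=1e-10):
    """Eq. 1 for dark tubes: the response is zero wherever l2 < 0."""
    rb = np.abs(l1) / (np.abs(l2) + eps)   # blobness R_B (eps is an added stabilizer)
    s = np.sqrt(l1 ** 2 + l2 ** 2)         # structureness S
    v = np.exp(-rb ** 2 / (2 * beta ** 2)) * (1 - np.exp(-s ** 2 / (2 * c ** 2)))
    return np.where(l2 < 0, 0.0, v)

# A dark line (one large positive eigenvalue) scores higher than a dark blob
# (two equal positive eigenvalues); a bright line is suppressed entirely.
line = vesselness(*hessian_eigenvalues(np.array([0.0]), np.array([0.0]), np.array([2.0])))
blob = vesselness(*hessian_eigenvalues(np.array([2.0]), np.array([0.0]), np.array([2.0])))
bright = vesselness(*hessian_eigenvalues(np.array([0.0]), np.array([0.0]), np.array([-2.0])))
```

In a trainable setting such as Frangi-Net, `beta`, `c`, and the derivative kernels would be learnable variables rather than fixed arguments.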

2.2 U-Net

In this work, a U-Net [10] is directly applied to retinal vessel segmentation, and forms the baseline method for all comparisons. U-Net is a successful encoder-decoder CNN architecture, popularized in the field of medical image segmentation. It combines location information in the contracting encoder path with contextual information in the expanding decoder path via skip connections. Here, we adopt a three-level U-Net with 16 initial features and make two main modifications. Firstly, batch normalization layers are added after convolution layers to stabilize the training process. Secondly, deconvolution layers are replaced with upsampling layers followed by a $1\times 1$ convolution layer. The overall architecture contains 111,536 trainable weights.

2.3 U-Net + Frangi-Net

In order to analyze the reason for the performance differences between Frangi-Net and the U-Net, we propose to employ the latter as a “wildcard preprocessing network”. To this end, we concatenate the two networks such that the output of the U-Net serves as input for the Frangi-Net, and train the segmentation pipeline end-to-end. The intuition here is that, if the combined network is able to achieve a performance on par with the completely data-driven approach, the bottleneck of the known-operator network lies in the preprocessing. Otherwise, the known operator is inadequate to solve the task at hand, even with optimized images. Since Frangi-Net only takes single-channel input, two additional modifications are made to the final layers of U-Net: the final convolution layer yields a single-channel output, and a sigmoid layer replaces the softmax layer for feature map activation. The modified U-Net architecture is shown in Fig. 2.

2.4 Guided Filter Layer + Frangi-Net

Preliminary experiments conducted using U-Net and U-Net + Frangi-Net indicated that the preprocessing step was indeed the bottleneck in the vessel segmentation pipeline. Consequently, we propose to replace the “wildcard” U-Net with a guided filter layer. The guided filter layer was proposed as a differentiable neural network counterpart of the guided filter [4], which can be utilized as an edge-preserving denoising approach. The guided filter takes one image p and one guidance image I as input to produce one output image q. This translation-variant filtering process can be described in simplified form by Eq. 2:

$$\begin{aligned} q_i = \sum _j W_{ij}(I)\,p_j, \end{aligned}\tag{2}$$

where i, j are pixel indices, and $W_{ij}$ is the kernel which is a function of the guidance image I and is independent of p.
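As an illustration of Eq. 2, the sketch below implements the classic, non-trainable guided filter of He et al. [4] in its local linear form using NumPy only. It is not the trainable guided filter layer used in this work; the window radius `r`, the regularizer `eps`, and the edge-replicating box filter are illustrative choices. The output is linear in p for a fixed guidance image I, which is exactly the property that Eq. 2 expresses through the kernel $W_{ij}(I)$.

```python
import numpy as np

def box_mean(x, r):
    """Mean over a (2r+1)x(2r+1) window, replicating edge values at the border."""
    k = 2 * r + 1
    padded = np.pad(x, r, mode="edge")
    # Integral image with a leading row and column of zeros.
    ii = np.pad(padded, ((1, 0), (1, 0)), mode="constant").cumsum(0).cumsum(1)
    h, w = x.shape
    total = ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    return total / (k * k)

def guided_filter(I, p, r=4, eps=1e-2):
    """Classic guided filter: q = mean(a) * I + mean(b) with local linear coefficients."""
    mean_I, mean_p = box_mean(I, r), box_mean(p, r)
    var_I = box_mean(I * I, r) - mean_I ** 2
    cov_Ip = box_mean(I * p, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)      # depends on p only linearly
    b = mean_p - a * mean_I
    return box_mean(a, r) * I + box_mean(b, r)

rng = np.random.default_rng(0)
I = rng.random((12, 10))   # guidance image (random data for illustration)
p = rng.random((12, 10))   # image to be filtered
q = guided_filter(I, p, r=2, eps=1e-3)
```

Because the coefficients `a` and `b` are derived from the guidance image, a learned guidance map can steer the effective kernels away from pure smoothing, which is relevant to the behaviour discussed in Sect. 4.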

A guided filter layer with two trainable components is used as the preprocessing block. First, the guidance map I is generated with a CNN, using image p as input. Here, the CNN is configured as a five-layer Context Aggregation Network (CAN) [1]. Subsequently, a small feature extractor is applied to image p before it is passed to the guided filter layer. This feature extractor is composed of two $3\times 3$ convolution layers with five intermediate channels and one final output feature map. The guided filter block contains 3,050 parameters. The architecture is shown in Fig. 3.
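As a quick consistency check on the parameter counts quoted in the text, the guided filter block and Frangi-Net can be summed and compared with the U-Net baseline. The symbols $N_{\text{GF}}$, $N_{\text{FN}}$ and $N_{\text{UN}}$ are shorthand introduced here for illustration only:

$$\begin{aligned} N_{\text{GF}} + N_{\text{FN}} &= 3{,}050 + 6{,}525 = 9{,}575,\\ \frac{N_{\text{GF}} + N_{\text{FN}}}{N_{\text{UN}}} &= \frac{9{,}575}{111{,}536} \approx 0.086. \end{aligned}$$

The combined known-operator pipeline therefore uses roughly 8.6% of the U-Net's trainable weights, i.e., about 11.6 times fewer.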

3 Experiments and Results

3.1 Data

The DRIVE database is employed to evaluate different pipelines in this study. The database contains 40 RGB fundus photographs of size $565\times 584$ pixels, which are evenly divided into a training and a testing set. A validation set of four images is further separated from the training set to monitor the training process and avoid overfitting. The green channels, which have the best contrast between vessels and the background, are extracted and processed using Contrast Limited Adaptive Histogram Equalization (CLAHE) [14] to balance inhomogeneous illuminations. Manual labels and Field Of View (FOV) masks are provided for all images. For each image of the training set, a weighting map w which emphasizes thin vessels is generated on the basis of the manual label using the equation $w=\frac{1}{\alpha \times d}$, where d denotes the vessel diameter in the ground truth, and $\alpha $ is a factor manually set to 0.18. In order to have a meaningful and fair comparison between different methods, all FOV masks are eroded inward by four pixels to remove potential border effects. Performance evaluation is conducted inside the FOV masks.
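The weighting map can be sketched in a few lines of NumPy, assuming a per-pixel vessel diameter map d is already available (how d is estimated from the manual label is not specified here). The function name and the choice of a constant weight for background pixels, where d is undefined, are assumptions made for illustration.

```python
import numpy as np

def thin_vessel_weights(diameter, alpha=0.18, background_weight=1.0):
    """w = 1 / (alpha * d) on vessel pixels; background pixels get a fixed weight (assumed)."""
    diameter = np.asarray(diameter, dtype=float)
    w = np.full(diameter.shape, background_weight)
    vessel = diameter > 0
    w[vessel] = 1.0 / (alpha * diameter[vessel])
    return w

# Diameters in pixels: background, a one-pixel capillary, a five-pixel vessel.
w = thin_vessel_weights(np.array([0.0, 1.0, 5.0]))
```

With $\alpha = 0.18$, a one-pixel-wide vessel receives a weight of about 5.6, whereas a five-pixel-wide vessel receives about 1.1, so thin vessels contribute considerably more to the loss.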

3.2 Network Training

The objective functions for all learning-based methods in this work consist of three parts: $L_{total} = w\cdot L_{focal} + \lambda _w\cdot R_{w} + \lambda _s\cdot R_s$, where w is the weighting map which emphasizes small vessels; $L_{focal}$ is the class-balanced focal loss [7], with a focusing factor of 2.0; $R_w$ denotes an $\ell _2$-norm regularizer on the network weights to prevent overfitting; and $R_s$ represents a similarity regularizer, which is the mean squared error between the input and output of the preprocessing net. $\lambda _w, \lambda _s$ are the scaling factors of the corresponding regularizers, and are set to 0.2 and 0.1, respectively. The Adam optimizer [5] with learning rate decay is utilized to minimize the objective function. The initial learning rate is $5\times 10^{-4}$ for U-Net, and $5\times 10^{-5}$ for all other pipelines. All networks are trained with a batch size of 50 on $168\times 168$ image patches. Data augmentation in the form of rotation, shearing, additive Gaussian noise, and intensity shifting is employed. All methods are implemented in Python 3.5.2 using TensorFlow 1.10.0.
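The structure of the objective can be sketched in NumPy as follows. This is only an illustration of how the three terms combine: the averaging over pixels, the use of the squared $\ell _2$ norm, the equal class weights in `class_weight`, and all function and argument names are assumptions, not details of the TensorFlow implementation.

```python
import numpy as np

def focal_loss_map(prob_vessel, label, gamma=2.0, class_weight=(0.5, 0.5), eps=1e-7):
    """Per-pixel binary focal loss: -a_t * (1 - p_t)^gamma * log(p_t)."""
    p = np.clip(prob_vessel, eps, 1 - eps)
    p_t = np.where(label == 1, p, 1 - p)               # probability of the true class
    a_t = np.where(label == 1, class_weight[1], class_weight[0])
    return -a_t * (1 - p_t) ** gamma * np.log(p_t)

def total_loss(prob_vessel, label, w, weights, pre_in, pre_out,
               lambda_w=0.2, lambda_s=0.1):
    """L_total = w * L_focal + lambda_w * R_w + lambda_s * R_s (pixel mean assumed)."""
    l_focal = np.mean(w * focal_loss_map(prob_vessel, label))
    r_w = sum(np.sum(k ** 2) for k in weights)          # squared l2 norm (assumed form)
    r_s = np.mean((pre_in - pre_out) ** 2)              # MSE between prep. input and output
    return l_focal + lambda_w * r_w + lambda_s * r_s

label = np.array([[1, 0], [0, 1]])
w_map = np.ones((2, 2))
img = np.zeros((2, 2))
loss_good = total_loss(label.astype(float), label, w_map, [np.zeros(3)], img, img)
loss_bad = total_loss(1.0 - label, label, w_map, [np.zeros(3)], img, img)
```

For pipelines trained without the similarity regularizer, `lambda_s` would simply be set to zero.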

3.3 Evaluation and Results

Six different segmentation workflows are evaluated on the DRIVE testing set, and the results are summarized in Table 1. Binarization of the output probability maps from the network pipelines is performed with a single threshold which maximizes the F1 score on the validation set. The input, the intermediate outputs of the preprocessing nets, and the corresponding probability maps from the Frangi-Net for a representative region of interest (ROI) of an image from the testing set are presented in Fig. 4.
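The threshold selection step can be sketched as a simple grid search. The candidate grid of 101 thresholds, the restriction to pixels inside the FOV mask during selection, and the function names are assumptions made for illustration.

```python
import numpy as np

def f1_score(pred, label):
    """F1 score for boolean prediction and label arrays."""
    tp = np.sum(pred & label)
    fp = np.sum(pred & ~label)
    fn = np.sum(~pred & label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def best_threshold(prob, label, fov, candidates=None):
    """Return the threshold (and its F1) that maximizes F1 inside the FOV mask."""
    if candidates is None:
        candidates = np.linspace(0.0, 1.0, 101)
    prob = prob[fov]
    label = label[fov].astype(bool)
    scores = [f1_score(prob >= t, label) for t in candidates]
    best = int(np.argmax(scores))
    return float(candidates[best]), float(scores[best])

# Toy validation data: two background pixels and two vessel pixels.
prob = np.array([[0.1, 0.2], [0.8, 0.9]])
label = np.array([[0, 0], [1, 1]])
fov = np.ones((2, 2), dtype=bool)
t, f1 = best_threshold(prob, label, fov)
```

The threshold found on the validation images would then be applied unchanged to the probability maps of the testing set.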

Table 1. Performance evaluation on DRIVE testing set. prep., reg., seg. denote preprocessing net, regularizer, and segmentation method, respectively.

From Table 1, we observe that the Frangi-Net without additional preprocessing (FN) performs better than the original Frangi filter (FF), but worse than the completely data-driven U-Net (UN). Using the U-Net as a preprocessing network (UP + FN), we observe a performance boost, achieving results on par with UN with respect to all evaluation metrics and reaching an AUC score of 0.975. With an additional regularizer $R_s$ that enforces the similarity between the input and output of the preprocessing network, the performance is only modestly impaired. When looking at the intermediate outputs of the preprocessing nets (see Fig. 4(b) and (c)), we observe that the UP substantially enhances the contrast for small vessels and reduces noise compared to the input image (a). Low-frequency information, e.g., the illumination in the bright optic disc and the dark macula region, is removed when no additional $R_s$ is applied. This provides further confirmation of the hypothesis that the main bottleneck of the proposed known-operator pipeline lies in the preprocessing, and can be combated by an appropriate adaptation of this step. This is supported by the results achieved using the guided filter layer for preprocessing (GF + FN).

The guided filter layer, however, does not simply learn an edge-preserving denoising filtering as the intermediate output reveals (see Fig. 4(d)). It performs a substantial enhancement of small vessels and removal of the low-frequency background comparable to UP (see Fig. 4(b)). In this case, the performance of the pipeline is only marginally inferior to that of the U-Net, approaching an AUC score of 0.972.

4 Discussion and Conclusion

We proposed a method to analyze and interpret a DL-based algorithm via step-by-step conversion of a fully data-driven approach into a pipeline of well-defined known operators. The approach helps to identify and combat bottlenecks in a known-operator pipeline by localizing the components responsible for drops in performance. Additionally, it provides a mechanism to interpret deep network architectures in a divide-and-conquer pattern, by replacing each step in the network pipeline with a well-defined operator.

The potential of the proposed framework to improve our understanding of deep neural networks and enable intelligent network design was demonstrated for the exemplary task of retinal vessel segmentation. The previously proposed known-operator network Frangi-Net enables easy interpretation, but performs worse than a fully data-driven approach such as the U-Net. Conversely, an interpretation of the fully data-driven approach remains vague despite satisfactory performance. By using the U-Net as a debugging tool, we confirm that with appropriate preprocessing, the Frangi-Net is capable of achieving on-par performance. This performance boost also indicates that the preprocessing is the bottleneck of the Frangi-Net workflow. Subsequently, we identify the guided filter layer as a suitable known operator that can serve as a replacement for the U-Net in terms of preprocessing, while retaining performance.

The quantitative results support our hypothesis that the task of vessel segmentation can be separated into two steps: a preprocessing step that enhances image quality, and a segmentation step which yields the actual vesselness probability map. By replacing these elements step-by-step, we are able to preserve high segmentation performance while incorporating interpretability into the network pipeline with well-defined, understandable steps.

While the results of the U-Net preprocessing with similarity regularization demonstrate that there exists an edge-preserving filtering approach that results in an equally effective segmentation based on the vesselness filter, the guided filter layer does not exhibit the expected filtering behavior. Instead of edge-preserving filtering, the guided filter layer learns a domain transfer to a vessel-enhancing representation that simultaneously removes low-frequency information. In view of Eq. 2 this seems surprising, as the guided filter uses the guidance image only to design the filter kernel in a shift-variant filtering process. Yet, this design does not guarantee edge-preserving filtering per se, as the guidance image may also result in band-pass kernels. As a result, the filter learns to create kernels that are optimal with respect to the purpose of the net, which in our case is a vessel-enhanced image.

Still, our divide-and-conquer approach allows us to identify the important parts of a network. This is achieved by showing that a known-operator network, which is restricted in what it can learn (9,575 vs. 111,536 parameters), performs comparably to a completely data-driven network (AUC score of 0.972 vs. 0.974). The use of a powerful network, i.e., U-Net in this case, supplements the performance and addresses the shortcomings of the known operators, and thus helps to improve understanding of the network for a specific task. Future work will look into exploiting the divide-and-conquer approach to aid network interpretation and performance improvement for other tasks, based on known-operator modules. The approach provides a systematic framework to design interpretable network pipelines with minimal loss in performance relative to completely data-driven approaches, which is compelling for the intelligent design of networks in the future.

References

1. Chen, Q., Xu, J., Koltun, V.: Fast image processing with fully-convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2497–2506 (2017)
2. Frangi, A.F., Niessen, W.J., Vincken, K.L., Viergever, M.A.: Multiscale vessel enhancement filtering. In: Wells, W.M., Colchester, A., Delp, S. (eds.) MICCAI 1998. LNCS, vol. 1496, pp. 130–137. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0056195
3. Fu, W., et al.: Frangi-Net. In: Maier, A., Deserno, T., Handels, H., Maier-Hein, K., Palm, C., Tolxdorff, T. (eds.) Bildverarbeitung für die Medizin 2018. INFORMAT, pp. 341–346. Springer, Heidelberg (2018). https://doi.org/10.1007/978-3-662-56537-7_87
4. He, K., Sun, J., Tang, X.: Guided image filtering. IEEE Trans. Pattern Anal. Mach. Intell. 35(6), 1397–1409 (2013)
5. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
6. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
7. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
8. Maier, A., et al.: Precision learning: towards use of known operators in neural networks. In: ICPR, pp. 183–188. IEEE (2018)
9. Maier, A.K., et al.: Learning with known operators reduces maximum training error bounds. arXiv preprint arXiv:1907.01992 (2019)
10. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
11. Staal, J., Abràmoff, M.D., Niemeijer, M., Viergever, M.A., Van Ginneken, B.: Ridge-based vessel segmentation in color images of the retina. IEEE Trans. Med. Imaging 23(4), 501–509 (2004)
12. Wu, H., Zheng, S., Zhang, J., Huang, K.: Fast end-to-end trainable guided filter. In: CVPR, pp. 1838–1847 (2018)
13. Würfl, T., Ghesu, F.C., Christlein, V., Maier, A.: Deep learning computed tomography. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. LNCS, vol. 9902, pp. 432–440. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46726-9_50
14. Zuiderveld, K.: Contrast limited adaptive histogram equalization. In: Graphics Gems IV, pp. 474–485. Academic Press Professional, Inc. (1994)

Author information

Authors and Affiliations

Pattern Recognition Lab, Friedrich-Alexander University Erlangen-Nürnberg, 91058, Erlangen, Germany
Weilin Fu, Katharina Breininger, Roman Schaffert, Nishant Ravikumar & Andreas Maier
Erlangen Graduate School in Advanced Optical Technologies (SAOT), 91058, Erlangen, Germany
Andreas Maier

Corresponding author

Correspondence to Weilin Fu.

Editor information

Editors and Affiliations

University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Dinggang Shen
University of Georgia, Athens, GA, USA
Tianming Liu
Western University, London, ON, Canada
Terry M. Peters
Yale University, New Haven, CT, USA
Lawrence H. Staib
University of Strasbourg, Illkirch, France
Caroline Essert
United Imaging Intelligence, Shanghai, China
Sean Zhou
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Pew-Thian Yap
Western University, London, ON, Canada
Ali Khan

About this paper

Cite this paper

Fu, W., Breininger, K., Schaffert, R., Ravikumar, N., Maier, A. (2019). A Divide-and-Conquer Approach Towards Understanding Deep Networks. In: Shen, D., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. MICCAI 2019. Lecture Notes in Computer Science, vol 11764. Springer, Cham. https://doi.org/10.1007/978-3-030-32239-7_21

DOI: https://doi.org/10.1007/978-3-030-32239-7_21
Published: 10 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32238-0
Online ISBN: 978-3-030-32239-7
eBook Packages: Computer Science, Computer Science (R0)
