Abstract
Developing a convolutional neural network (CNN) for medical image segmentation is a complex task, especially given the limited number of labelled medical images and the limited computational resources available. The task becomes even harder when the aim is to develop a deep network that uses complicated structures such as attention blocks. Yet, because of the various types of noise, artefacts and diversity in medical images, using complicated structures such as attention mechanisms to improve segmentation accuracy is unavoidable, so techniques are needed to address these difficulties. Neuroevolution combines evolutionary computation and neural networks to construct a network automatically; however, it is computationally expensive, especially for creating 3D networks. In this paper, an automatic, efficient, accurate, and robust technique is introduced that uses Neuroevolution to develop deep attention convolutional neural networks for both 2D and 3D medical image segmentation. The proposed evolutionary technique finds a strong combination of six attention modules to recover spatial information from the downsampling path and transfer it to the upsampling path of a U-Net-based network. Six different CT and MRI datasets are employed to evaluate the proposed model for both 2D and 3D image segmentation. The obtained results are compared to state-of-the-art manual and automatic models, all of which our proposed model outperformed.
Data Availability
Datasets are publicly available
Code Availability
Not applicable
References
Abbas Q, Ibrahim ME, Jaffar MA: A comprehensive review of recent advances on deep vision systems. Artif Intell Rev 52(1):39–76, 2019
Back T: Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms. Oxford University Press, 1996
Bahdanau D, Cho K, Bengio Y: Neural machine translation by jointly learning to align and translate. arXiv preprint, 2014. arXiv:1409.0473
Baldeon-Calisto M, Lai-Yuen SK: Adaresu-net: Multiobjective adaptive convolutional neural network for medical image segmentation. Neurocomputing 392:325–340, 2020
Calisto MB, Lai-Yuen SK: Adaen-net: An ensemble of adaptive 2d-3d fully convolutional networks for medical image segmentation. Neural Netw 2020
Chen P, Sun Z, Bing L, Yang W: Recurrent attention network on memory for aspect sentiment analysis. In: Proceedings of the 2017 conference on empirical methods in natural language processing, 2017, pp 452–461
Cheung B, Sable C: Hybrid evolution of convolutional networks. In 2011 10th International Conference on Machine Learning and Applications and Workshops, vol. 1, IEEE, 2011, pp. 293–297
Chollet F, et al: Keras. https://keras.io, 2015
Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O: 3d u-net: learning dense volumetric segmentation from sparse annotation. In International conference on medical image computing and computer-assisted intervention, Springer, 2016, pp. 424–432
Cireşan D, Meier U, Masci J, Schmidhuber J: Multi-column deep neural network for traffic sign classification. Neural Netw 32:333–338, 2012
Darwish A, Hassanien AE, Das S: A survey of swarm and evolutionary computing approaches for deep learning. Artif Intell Rev 53(3):1767–1812, 2020
Dice LR: Measures of the amount of ecologic association between species. Ecology 26(3):297–302, 1945
Dong N, Xu M, Liang X, Jiang Y, Dai W, Xing E: Neural architecture search for adversarial medical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2019, pp. 828–836
Drozdzal M, Vorontsov E, Chartrand G, Kadoury S, Pal C: The importance of skip connections in biomedical image segmentation. In Deep Learning and Data Labeling for Medical Applications. Springer, 2016, pp. 179–187
Fogel DB: Phenotypes, genotypes, and operators in evolutionary computation. In Proc. 1995 IEEE Int. Conf. Evolutionary Computation (ICEC 95), 1995, pp. 193–198
Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H: Dual attention network for scene segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 3146–3154
Fujino S, Mori N, Matsumoto K: Deep convolutional networks for human sketches by means of the evolutionary deep learning. In 2017 Joint 17th World Congress of International Fuzzy Systems Association and 9th International Conference on Soft Computing and Intelligent Systems (IFSA-SCIS), IEEE, 2017, pp. 1–5
Goldberg DE, Deb K: A comparative analysis of selection schemes used in genetic algorithms. In Foundations of genetic algorithms, vol. 1. Elsevier, 1991, pp. 69–93
Guo Y, Liu Y, Oerlemans A, Lao S, Wu S, Lew MS: Deep learning for visual understanding: A review. Neurocomputing 187:27–48, 2016
Hassanzadeh T, Essam D, Sarker R: 2d to 3d evolutionary deep convolutional neural networks for medical image segmentation. IEEE Trans Med Imaging, 2020
Hassanzadeh T, Essam D, Sarker R: Evolutionary attention network for medical image segmentation. In 2020 The International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2020, pp. 1–8
Hassanzadeh T, Essam D, Sarker R: An evolutionary denseres deep convolutional neural network for medical image segmentation. IEEE Access, 2020
Hassanzadeh T, Essam D, Sarker R: Evou-net: an evolutionary deep fully convolutional neural network for medical image segmentation. In Proceedings of the 35th Annual ACM Symposium on Applied Computing, 2020, pp. 181–189
He K, Zhang X, Ren S, Sun J: Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition 2016, pp. 770–778
Heimann T, Van Ginneken B, Styner MA, Arzhaeva Y, Aurich V, Bauer C, Beck A, Becker C, Beichel R, Bekes G, et al: Comparison and evaluation of methods for liver segmentation from ct datasets. IEEE Trans Med Imaging 28(8):1251–1265, 2009
Hochreiter S, Schmidhuber J: Long short-term memory. Neural Comput 9(8):1735–1780, 1997
Holland JH, et al: Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT Press, 1992
Hu J, Shen L, Sun G: Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 7132–7141
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ: Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition 2017, pp. 4700–4708
Khan A, Sohail A, Zahoora U, Qureshi AS: A survey of the recent architectures of deep convolutional neural networks. Artif Intell Rev 53(8):5455–5516, 2020
Kolařík M, Burget R, Uher V, Říha K, Dutta MK: Optimized high resolution 3d dense-u-net network for brain and spine segmentation. Appl Sci 9(3):404, 2019
Krizhevsky A, Sutskever I, Hinton GE: Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90, 2017
LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD: Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4):541–551, 1989
Li H, Xiong P, An J, Wang L: Pyramid attention network for semantic segmentation. arXiv preprint, 2018. arXiv:1805.10180
Li Y, Hao Z, Lei H: Survey of convolutional neural network. J Comput Appl 36(9):2508–2515, 2016
Li Y, Zhu Z, Kong D, Han H, Zhao Y: Ea-lstm: Evolutionary attention-based lstm for time series prediction. Knowledge-Based Systems 181:104785, 2019
Liu X, Deng Z, Yang Y: Recent progress in semantic image segmentation. Artif Intell Rev 52(2):1089–1106, 2019
Mane D, Kulkarni UV: A survey on supervised convolutional neural network and its major applications. In Deep Learning and Neural Networks: Concepts, Methodologies, Tools, and Applications. IGI Global, 2020, pp. 1058–1071
Mortazi A, Bagci U: Automatically designing cnn architectures for medical image segmentation. In International Workshop on Machine Learning in Medical Imaging, Springer, 2018, pp. 98–106
Qin Z, Yu F, Liu C, Chen X: How convolutional neural network see the world-a survey of convolutional neural network visualization methods. arXiv preprint, 2018. arXiv:1804.11191
Real E, Aggarwal A, Huang Y, Le QV: Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, 2019, vol. 33, pp. 4780–4789
Real E, Moore S, Selle A, Saxena S, Suematsu YL, Tan J, Le Q, Kurakin A: Large-scale evolution of image classifiers. arXiv preprint, 2017. arXiv:1703.01041
Ronneberger O, Fischer P, Brox T: U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, Springer, 2015, pp. 234–241
Schlemper J, Oktay O, Schaap M, Heinrich M, Kainz B, Glocker B, Rueckert D: Attention gated networks: Learning to leverage salient regions in medical images. Med Image Anal 53:197–207, 2019
Shen T, Zhou T, Long G, Jiang J, Pan S, Zhang C: Disan: Directional self-attention network for rnn/cnn-free language understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018, vol. 32
Simpson AL, Antonelli M, Bakas S, Bilello M, Farahani K, Van Ginneken B, Kopp-Schneider A, Landman BA, Litjens G, Menze B, et al: A large annotated medical image dataset for the development and evaluation of segmentation algorithms. arXiv preprint, 2019. arXiv:1902.09063
Stanley KO, Miikkulainen R: Evolving neural networks through augmenting topologies. Evol Comput 10(2):99–127, 2002
Tian Y, Zhang Y, Zhou D, Cheng G, Chen WG, Wang R: Triple attention network for video segmentation. Neurocomputing 417:202–211, 2020
Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X: Residual attention network for image classification. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 3156–3164
Weng Y, Zhou T, Li Y, Qiu X: Nas-unet: Neural architecture search for medical image segmentation. IEEE Access 7:44247–44257, 2019
Yu F, Koltun V: Multi-scale context aggregation by dilated convolutions. arXiv preprint, 2015. arXiv:1511.07122
Yu L, Yang X, Chen H, Qin J, Heng PA: Volumetric convnets with mixed residual connections for automated prostate segmentation from 3d mr images. In Thirty-first AAAI conference on artificial intelligence, 2017
Zhang H, Jin Y, Cheng R, Hao K: Efficient evolutionary search of attention convolutional networks via sampled training and node inheritance. IEEE Trans Evol Comput, 2020
Zoph B, Le QV: Neural architecture search with reinforcement learning. arXiv preprint, 2016. arXiv:1611.01578
Zoph B, Vasudevan V, Shlens J, Le QV: Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8697–8710
Funding
Not applicable
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Ethics Approval
Not applicable
Consent to Participate
Not applicable
Consent for Publication
Not applicable
Conflict of Interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix: Extra Evaluation
2D Versus 3D
In this section, the proposed evolutionary 2D attention model is compared with the proposed evolutionary 3D attention model.
Time Comparison
In this work, one NVIDIA GPU was used to train the 2D model and two NVIDIA GPUs were used to train the proposed 3D model. Figure 5 illustrates the training time required for the 2D and 3D models. For example, our proposed model needs about 24 days of training to find a set of 3D networks for 3D heart segmentation, whereas the 2D model needs less than four days. Figure 5 thus shows a considerable difference between the 2D and 3D models in terms of required computation; nevertheless, our 3D evolutionary model remains feasible to run on a small number of GPUs.
Parameter Comparison
In this section, the best obtained 2D attention network, its corresponding converted 3D network, and the evolutionary 3D attention network are compared in terms of the number of trainable parameters. As shown in Fig. 6, the obtained 2D networks use fewer than a million parameters; however, converting the 2D operations to 3D increases the number of parameters by roughly five to six times. The final evolutionary 3D attention networks used between two and nine million parameters, which is relatively small for 3D image segmentation.
Structure Comparison
Table 9 provides the genotypes of the best-found 2D and 3D networks for each dataset. As shown in Table 9, each network has its own structural and training parameters, which illustrates both the diversity of the found networks and the effect of the input data on the final network. Following the paper's approach, each of these genotypes can be decoded into its corresponding network, or phenotype. Note that each network was evolved with its own training parameters. For example, the best-found 2D network for the Sliver dataset is trained using the RMSprop optimiser with a learning rate of 0.001, a batch size of 8, an augmentation size of 32,000, and "he_uniform" weight initialisation.
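As an illustration, that training configuration maps onto Keras (cited in the references above) roughly as follows. This is a minimal sketch: the stand-in one-layer model, the slice size, the loss function, and the dummy data are all assumptions, since only the optimiser, learning rate, batch size, augmentation size, and weight initialiser are stated in the text.

```python
import numpy as np
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.optimizers import RMSprop

# Stand-in model: a single conv layer in place of the evolved U-Net
# phenotype, which is decoded from the genotype in Table 9.
inputs = Input(shape=(256, 256, 1))  # assumed slice size
outputs = Conv2D(1, 3, padding="same", activation="sigmoid",
                 kernel_initializer="he_uniform")(inputs)  # evolved initialiser
model = Model(inputs, outputs)

# Evolved training parameters for the Sliver network:
# RMSprop optimiser, learning rate 0.001, batch size 8.
model.compile(optimizer=RMSprop(learning_rate=0.001),
              loss="binary_crossentropy")  # assumed loss; the paper reports DSC

x = np.random.rand(8, 256, 256, 1).astype("float32")  # dummy batch standing in
y = (x > 0.5).astype("float32")                       # for the 32,000 augmented samples
model.fit(x, y, batch_size=8, epochs=1)
```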
Attention Modules
This section compares the attention modules used by the five best 2D and 3D networks.
Figure 7 shows the distribution of attention modules used in the five best 2D networks for each dataset. As shown in Fig. 7, for some datasets, such as Sliver, the five best networks used all of the modules in their structures, whereas for others, such as the Prostate dataset, only five different types of attention modules appear in the five best networks. A similar pattern can be seen in the 3D networks (see Fig. 8); for example, the best 3D attention networks for the Liver dataset used only the residual activation unit and squeeze-and-excitation modules.
Based on the input data and the DSCs obtained during evolution, the best combination of attention modules was selected for each network; making this selection manually would be very difficult.
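For concreteness, one of the module types discussed here, squeeze-and-excitation (Hu et al. in the references), can be sketched in Keras as below. This is the generic published SE block; the reduction ratio and its exact placement inside the evolved networks are assumptions.

```python
from tensorflow.keras import layers

def squeeze_excitation(x, ratio=16):
    """Generic squeeze-and-excitation block (Hu et al.); the reduction
    ratio and its placement in the evolved networks are assumptions."""
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)               # squeeze: per-channel average
    s = layers.Dense(channels // ratio, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)  # per-channel weights in (0, 1)
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([x, s])                     # recalibrate the feature maps
```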
Extra Evaluation
Cross-Validation
To show the robustness of our proposed 3D attention model and to remove randomness from our experiments, fourfold cross-validation was applied to the Sliver dataset. Sliver is one of the smallest datasets used in this work, and is therefore a good candidate for cross-validation. Table 10 shows the number of volumes in the train, test, and validation sets for each fold. The data size is shown as \(N\times M\times X\times Y\), where N is the number of volumes, M is the number of slices per volume, X is the slice width, and Y is the slice height.
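A volume-level fourfold split of this kind can be sketched with scikit-learn as follows; the number of volumes and the size of the held-out validation portion are illustrative assumptions rather than the exact counts in Table 10.

```python
import numpy as np
from sklearn.model_selection import KFold

volume_ids = np.arange(20)  # assumed volume count; see Table 10 for real split sizes

kfold = KFold(n_splits=4, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kfold.split(volume_ids)):
    # Hold out part of the training volumes as the validation set
    # used to evaluate networks during evolution (split size assumed).
    val_idx, train_idx = train_idx[:4], train_idx[4:]
    print(f"fold {fold}: train={train_idx}, val={val_idx}, test={test_idx}")
```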
For each fold, evolution started with a population of 30 and continued for up to nine generations with a population of 15. During evolution, the validation set was used to evaluate the networks. At the end, the five best networks were selected, and their DSCs on the test set are reported in Table 11.
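In outline, each fold's run follows the schematic loop below, reconstructed only from the numbers quoted in the text; the four callables are hypothetical stand-ins for the paper's genotype generation, network training, parent selection, and breeding routines.

```python
def evolve(random_genotype, decode_and_train, select_parents, breed):
    """Schematic reconstruction of one evolutionary run; all four callables
    are hypothetical stand-ins for the paper's genotype generation,
    network training, parent selection, and breeding routines."""
    population = [random_genotype() for _ in range(30)]   # initial population of 30
    for _ in range(9):                                    # up to nine generations
        # Fitness of a genotype = DSC of its trained network on the validation set.
        ranked = sorted(population, key=decode_and_train, reverse=True)
        parents = select_parents(ranked)
        population = [breed(parents) for _ in range(15)]  # later populations of 15
    # Finally, keep the five best networks found.
    return sorted(population, key=decode_and_train, reverse=True)[:5]
```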
As can be seen from Table 11, the proposed evolutionary attention model obtained high-accuracy 3D networks for 3D medical image segmentation in every fold.
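The DSC used throughout is the standard Dice measure (Dice, 1945 in the references); for binary masks it can be computed as in this short sketch:

```python
import numpy as np

def dice_coefficient(pred, truth, eps=1e-7):
    """Dice similarity coefficient for binary masks:
    DSC = 2|P ∩ T| / (|P| + |T|)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum() + eps)
```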
Effect of Attention Modules
To show the effect of the attention modules in recovering and transferring extracted feature maps, examples of features before and after the residual activation unit and the attention residual module are presented in Fig. 9. The first row of Fig. 9 shows one Heart and one Hippocampus image along with their corresponding ground truths. Figure 9b shows a number of the feature maps input to the residual activation unit, and Fig. 9c shows the module's output feature maps. As can be seen, after the attention modules are applied, part of the information about the region of interest (RoI) is recovered; a similar pattern can be seen for the Hippocampus image.
Subjective Comparison
In this section, another example of the segmented images for each dataset is provided as a subjective comparison. The results are compared against six previous works: Converted 2D to 3D [20], 3D U-Net [9], ConvNet [52], 3D Dense U-Net [31], attention U-Net [44], and NAS U-Net [50]. NAS U-Net [50] is an automatic reinforcement-learning-based technique for developing a network, and Converted 2D to 3D [20] is an automatic evolutionary model; the rest are manually designed networks. As shown in Fig. 10, our proposed model predicted the RoI with high accuracy, whereas over- or under-segmentation can be seen in some of the previous works' results; for example, ConvNet could not even detect the RoI in the Hippocampus image. Because these networks were developed for a specific application or dataset, their structures or training parameters need to be retuned when the application or dataset changes. Note that all of the previous works were implemented and trained as described in their source papers.
Example of Crossover and Mutation
To clarify the crossover and mutation in our proposed model, an example is provided in Fig. 11. As shown in Fig. 11, two chromosomes are selected as parents, and applying one-point crossover to them generates two new offspring. In addition, three random mutations are applied to one child and one mutation to the other. This procedure is repeated to create each new generation. After the proposed model has run for the stated number of generations, a number of the best networks are selected as the final networks.
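For illustration, one-point crossover and gene-wise random mutation as depicted in Fig. 11 can be written as below; the list-of-genes genotype encoding, the gene pool, and the mutation rate shown here are assumptions.

```python
import random

def one_point_crossover(parent_a, parent_b):
    """Cut both parents at the same random point and swap the tails,
    producing two offspring (as in Fig. 11)."""
    point = random.randrange(1, len(parent_a))
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

def mutate(genotype, gene_pool, rate=0.1):
    """Replace each gene with a random value from the pool with
    probability `rate`; the pool and rate are illustrative."""
    return [random.choice(gene_pool) if random.random() < rate else gene
            for gene in genotype]
```

In each generation, these two operators are applied to selected parents until the new population is filled, matching the procedure described above.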
Cite this article
Hassanzadeh, T., Essam, D. & Sarker, R. Evolutionary Deep Attention Convolutional Neural Networks for 2D and 3D Medical Image Segmentation. J Digit Imaging 34, 1387–1404 (2021). https://doi.org/10.1007/s10278-021-00526-2