Evolutionary Deep Attention Convolutional Neural Networks for 2D and 3D Medical Image Segmentation

  • Original Paper
  • Published:
Journal of Digital Imaging

Abstract

Developing a convolutional neural network (CNN) for medical image segmentation is a complex task, especially given the limited number of available labelled medical images and limited computational resources. The task becomes even more difficult when the aim is to develop a deep network that uses complicated structures such as attention blocks. Because of the various types of noise, artefacts, and diversity in medical images, using complicated network structures such as attention mechanisms to improve segmentation accuracy is unavoidable; it is therefore necessary to develop techniques that address the above difficulties. Neuroevolution combines evolutionary computation and neural networks to construct a network automatically. However, neuroevolution is computationally expensive, particularly when creating 3D networks. In this paper, an automatic, efficient, accurate, and robust technique is introduced that uses neuroevolution to develop deep attention convolutional neural networks for both 2D and 3D medical image segmentation. The proposed evolutionary technique can find a very good combination of six attention modules to recover spatial information lost in the downsampling path and transfer it to the upsampling path of a U-Net-based network. Six different CT and MRI datasets are employed to evaluate the proposed model for both 2D and 3D image segmentation. The obtained results are compared to state-of-the-art manually and automatically designed models, and our proposed model outperformed all of them.


Data Availability

Datasets are publicly available

Code Availability

Not applicable

References

  1. Abbas Q, Ibrahim ME, Jaffar MA: A comprehensive review of recent advances on deep vision systems. Artif Intell Rev 52(1):39–76, 2019

  2. Back T: Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms. Oxford University Press, 1996

  3. Bahdanau D, Cho K, Bengio Y: Neural machine translation by jointly learning to align and translate. arXiv preprint, 2014. arXiv:1409.0473

  4. Baldeon-Calisto M, Lai-Yuen SK: AdaResU-Net: Multiobjective adaptive convolutional neural network for medical image segmentation. Neurocomputing 392:325–340, 2020

  5. Calisto MB, Lai-Yuen SK: AdaEn-Net: An ensemble of adaptive 2D-3D fully convolutional networks for medical image segmentation. Neural Netw, 2020

  6. Chen P, Sun Z, Bing L, Yang W: Recurrent attention network on memory for aspect sentiment analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 452–461

  7. Cheung B, Sable C: Hybrid evolution of convolutional networks. In 2011 10th International Conference on Machine Learning and Applications and Workshops, vol. 1, IEEE, 2011, pp. 293–297

  8. Chollet F, et al: Keras. https://keras.io, 2015

  9. Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O: 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2016, pp. 424–432

  10. Cireşan D, Meier U, Masci J, Schmidhuber J: Multi-column deep neural network for traffic sign classification. Neural Netw 32:333–338, 2012

  11. Darwish A, Hassanien AE, Das S: A survey of swarm and evolutionary computing approaches for deep learning. Artif Intell Rev 53(3):1767–1812, 2020

  12. Dice LR: Measures of the amount of ecologic association between species. Ecology 26(3):297–302, 1945

  13. Dong N, Xu M, Liang X, Jiang Y, Dai W, Xing E: Neural architecture search for adversarial medical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2019, pp. 828–836

  14. Drozdzal M, Vorontsov E, Chartrand G, Kadoury S, Pal C: The importance of skip connections in biomedical image segmentation. In Deep Learning and Data Labeling for Medical Applications, Springer, 2016, pp. 179–187

  15. Fogel DB: Phenotypes, genotypes, and operators in evolutionary computation. In Proc. 1995 IEEE Int. Conf. Evolutionary Computation (ICEC 95), 1995, pp. 193–198

  16. Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H: Dual attention network for scene segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 3146–3154

  17. Fujino S, Mori N, Matsumoto K: Deep convolutional networks for human sketches by means of the evolutionary deep learning. In 2017 Joint 17th World Congress of International Fuzzy Systems Association and 9th International Conference on Soft Computing and Intelligent Systems (IFSA-SCIS), IEEE, 2017, pp. 1–5

  18. Goldberg DE, Deb K: A comparative analysis of selection schemes used in genetic algorithms. In Foundations of Genetic Algorithms, vol. 1, Elsevier, 1991, pp. 69–93

  19. Guo Y, Liu Y, Oerlemans A, Lao S, Wu S, Lew MS: Deep learning for visual understanding: A review. Neurocomputing 187:27–48, 2016

  20. Hassanzadeh T, Essam D, Sarker R: 2D to 3D evolutionary deep convolutional neural networks for medical image segmentation. IEEE Trans Med Imaging, 2020

  21. Hassanzadeh T, Essam D, Sarker R: Evolutionary attention network for medical image segmentation. In 2020 International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2020, pp. 1–8

  22. Hassanzadeh T, Essam D, Sarker R: An evolutionary DenseRes deep convolutional neural network for medical image segmentation. IEEE Access, 2020

  23. Hassanzadeh T, Essam D, Sarker R: EvoU-Net: An evolutionary deep fully convolutional neural network for medical image segmentation. In Proceedings of the 35th Annual ACM Symposium on Applied Computing, 2020, pp. 181–189

  24. He K, Zhang X, Ren S, Sun J: Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778

  25. Heimann T, Van Ginneken B, Styner MA, Arzhaeva Y, Aurich V, Bauer C, Beck A, Becker C, Beichel R, Bekes G, et al: Comparison and evaluation of methods for liver segmentation from CT datasets. IEEE Trans Med Imaging 28(8):1251–1265, 2009

  26. Hochreiter S, Schmidhuber J: Long short-term memory. Neural Comput 9(8):1735–1780, 1997

  27. Holland JH, et al: Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT Press, 1992

  28. Hu J, Shen L, Sun G: Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141

  29. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ: Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708

  30. Khan A, Sohail A, Zahoora U, Qureshi AS: A survey of the recent architectures of deep convolutional neural networks. Artif Intell Rev 53(8):5455–5516, 2020

  31. Kolařík M, Burget R, Uher V, Říha K, Dutta MK: Optimized high resolution 3D Dense-U-Net network for brain and spine segmentation. Appl Sci 9(3):404, 2019

  32. Krizhevsky A, Sutskever I, Hinton GE: ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84–90, 2017

  33. LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD: Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4):541–551, 1989

  34. Li H, Xiong P, An J, Wang L: Pyramid attention network for semantic segmentation. arXiv preprint, 2018. arXiv:1805.10180

  35. Li Y, Hao Z, Lei H: Survey of convolutional neural network. J Comput Appl 36(9):2508–2515, 2016

  36. Li Y, Zhu Z, Kong D, Han H, Zhao Y: EA-LSTM: Evolutionary attention-based LSTM for time series prediction. Knowledge-Based Systems 181:104785, 2019

  37. Liu X, Deng Z, Yang Y: Recent progress in semantic image segmentation. Artif Intell Rev 52(2):1089–1106, 2019

  38. Mane D, Kulkarni UV: A survey on supervised convolutional neural network and its major applications. In Deep Learning and Neural Networks: Concepts, Methodologies, Tools, and Applications, IGI Global, 2020, pp. 1058–1071

  39. Mortazi A, Bagci U: Automatically designing CNN architectures for medical image segmentation. In International Workshop on Machine Learning in Medical Imaging, Springer, 2018, pp. 98–106

  40. Qin Z, Yu F, Liu C, Chen X: How convolutional neural networks see the world: A survey of convolutional neural network visualization methods. arXiv preprint, 2018. arXiv:1804.11191

  41. Real E, Aggarwal A, Huang Y, Le QV: Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, 2019, vol. 33, pp. 4780–4789

  42. Real E, Moore S, Selle A, Saxena S, Suematsu YL, Tan J, Le Q, Kurakin A: Large-scale evolution of image classifiers. arXiv preprint, 2017. arXiv:1703.01041

  43. Ronneberger O, Fischer P, Brox T: U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241

  44. Schlemper J, Oktay O, Schaap M, Heinrich M, Kainz B, Glocker B, Rueckert D: Attention gated networks: Learning to leverage salient regions in medical images. Med Image Anal 53:197–207, 2019

  45. Shen T, Zhou T, Long G, Jiang J, Pan S, Zhang C: DiSAN: Directional self-attention network for RNN/CNN-free language understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018, vol. 32

  46. Simpson AL, Antonelli M, Bakas S, Bilello M, Farahani K, Van Ginneken B, Kopp-Schneider A, Landman BA, Litjens G, Menze B, et al: A large annotated medical image dataset for the development and evaluation of segmentation algorithms. arXiv preprint, 2019. arXiv:1902.09063

  47. Stanley KO, Miikkulainen R: Evolving neural networks through augmenting topologies. Evol Comput 10(2):99–127, 2002

  48. Tian Y, Zhang Y, Zhou D, Cheng G, Chen WG, Wang R: Triple attention network for video segmentation. Neurocomputing 417:202–211, 2020

  49. Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X: Residual attention network for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3156–3164

  50. Weng Y, Zhou T, Li Y, Qiu X: NAS-Unet: Neural architecture search for medical image segmentation. IEEE Access 7:44247–44257, 2019

  51. Yu F, Koltun V: Multi-scale context aggregation by dilated convolutions. arXiv preprint, 2015. arXiv:1511.07122

  52. Yu L, Yang X, Chen H, Qin J, Heng PA: Volumetric ConvNets with mixed residual connections for automated prostate segmentation from 3D MR images. In Thirty-First AAAI Conference on Artificial Intelligence, 2017

  53. Zhang H, Jin Y, Cheng R, Hao K: Efficient evolutionary search of attention convolutional networks via sampled training and node inheritance. IEEE Trans Evol Comput, 2020

  54. Zoph B, Le QV: Neural architecture search with reinforcement learning. arXiv preprint, 2016. arXiv:1611.01578

  55. Zoph B, Vasudevan V, Shlens J, Le QV: Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8697–8710


Funding

Not applicable

Author information


Corresponding author

Correspondence to Tahereh Hassanzadeh.

Ethics declarations

Ethics Approval

Not applicable

Consent to Participate

Not applicable

Consent for Publication

Not applicable

Conflict of Interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix: Extra Evaluation

2D Versus 3D

In this section, the proposed evolutionary 2D attention model is compared with the proposed evolutionary 3D attention model.

Time Comparison

In this work, one NVIDIA GPU was used for training the 2D model and two NVIDIA GPUs were used for training the proposed 3D model. Figure 5 illustrates the required training time for the 2D and 3D models. For example, our proposed model needs about 24 days of training to find a set of 3D networks for 3D heart segmentation, whereas the 2D model needs less than four days. Figure 5 shows a considerable difference between the 2D and 3D models in the required computation. Nevertheless, our 3D evolutionary model remains feasible to run with a limited number of GPUs.

Fig. 5
Comparison of the 2D and 3D models regarding the required training time

Parameter Comparison

In this section, the best obtained 2D attention network, its corresponding converted 3D network, and the evolutionary 3D attention network are compared regarding the number of trainable parameters. As shown in Fig. 6, the obtained 2D networks use fewer than a million parameters; however, converting the 2D operations to 3D increases the number of parameters by roughly five to six times. The final evolutionary 3D attention networks used between two and nine million parameters, which is relatively small for 3D image segmentation. A sketch of why this inflation occurs is given after Fig. 6.

Fig. 6
Comparison of the 2D and 3D models regarding the number of trainable parameters
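To see where part of this growth comes from, consider a single convolution: a \(k\times k\) kernel becomes \(k\times k\times k\) in 3D, multiplying its weights by \(k\) (here 3); other structural differences in the evolved 3D networks account for the remaining growth. A minimal Keras sketch (not the authors' code; the channel counts and kernel size are illustrative assumptions) compares the two:

```python
# Minimal sketch: parameter count of one Conv2D layer versus its Conv3D
# counterpart. Channel counts and kernel size are illustrative assumptions.
from tensorflow.keras import layers, models

def conv2d_params(c_in=32, c_out=64, k=3):
    m = models.Sequential([layers.Conv2D(c_out, k, input_shape=(64, 64, c_in))])
    return m.count_params()      # k*k*c_in*c_out + c_out

def conv3d_params(c_in=32, c_out=64, k=3):
    m = models.Sequential([layers.Conv3D(c_out, k, input_shape=(64, 64, 64, c_in))])
    return m.count_params()      # k*k*k*c_in*c_out + c_out

print(conv2d_params())  # 18496
print(conv3d_params())  # 55360, i.e. roughly 3x per convolution
```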

Structure Comparison

Table 9 provides the genotypes of the best-found networks for each dataset in 2D and 3D. As shown in Table 9, each network has its own structural and training parameters, which shows the diversity of the found networks and the effect of the input data on the final network. Following the paper's approach, each of these genotypes can be converted to its corresponding network, or phenotype. Note that each network was evolved with its own training parameters. For example, the best-found 2D network for the Sliver dataset is trained using the RMSprop optimiser with a learning rate of 0.001; the batch size is 8, the augmentation size is 32,000, and "he_uniform" is used for weight initialisation. A sketch of such a genotype follows Table 9.

Table 9 The chromosomes of the best-found 2D and 3D networks for the six datasets
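As a hypothetical illustration, the Sliver training parameters quoted above could be encoded as follows; the field names and the structural genes are illustrative assumptions, not the paper's exact encoding:

```python
# Hypothetical genotype encoding; field names and the example module choices
# are illustrative, only the Sliver training parameters come from the text.
from dataclasses import dataclass
from typing import List

@dataclass
class Genotype:
    attention_modules: List[int]   # module type (1-6) chosen per skip connection
    optimiser: str = "rmsprop"
    learning_rate: float = 0.001
    batch_size: int = 8
    augmentation_size: int = 32000
    weight_init: str = "he_uniform"

# The best-found 2D Sliver network, with assumed module choices:
sliver_2d = Genotype(attention_modules=[6, 4, 5, 3])
```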

Attention Modules

This section compares the attention modules utilised by the five best 2D and 3D networks.

Figure 7 represents the distribution of the attention modules used in the five best 2D networks for each dataset. As shown in Fig. 7, for some datasets, such as Sliver, the five best networks used all six modules in their structures, while for others, such as the Prostate dataset, five different types of attention modules appear across the five best networks. A similar pattern can be seen in the 3D networks (see Fig. 8). As shown in Fig. 8, the best 3D attention networks for the Liver dataset used only the residual activation unit and the squeeze-and-excitation module.

Based on the input data and the DSCs obtained during evolution, the best combination of attention modules was selected for each network; making this selection would be very difficult if the network were designed manually.

Fig. 7
Distribution of the attention modules used in the five best 2D networks. 1: concatenation, 2: dense, 3: residual, 4: residual activation unit, 5: attention residual, 6: squeeze and excitation

Fig. 8
Distribution of the attention modules used in the five best 3D networks. 1: concatenation, 2: dense, 3: residual, 4: residual activation unit, 5: attention residual, 6: squeeze and excitation
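To make the module types concrete, below is a minimal Keras sketch of module 6, a squeeze-and-excitation block [28], as it might be applied to a feature map on a skip connection; the reduction ratio and the exact wiring inside the evolved networks are assumptions here, not taken from the paper:

```python
# Minimal 2D squeeze-and-excitation block [28]; ratio and wiring are assumed.
from tensorflow.keras import layers

def squeeze_excitation(x, ratio=8):
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)               # squeeze: global context per channel
    s = layers.Dense(channels // ratio, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)  # per-channel attention weights
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([x, s])                     # excite: rescale input channels

# Usage sketch:
# inp = layers.Input((128, 128, 32)); out = squeeze_excitation(inp)
```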

Extra Evaluation

Cross-Validation

To show our proposed 3D attention model's robustness and to remove randomness from our experiments, fourfold cross-validation was applied to the Sliver dataset. The Sliver dataset is one of the smallest datasets used in this work; therefore, it is a good candidate for cross-validation. Table 10 shows the number of volumes in the train, test, and validation sets for fourfold cross-validation. The number of volumes is shown as \(N\times M\times X\times Y\), where N is the number of volumes, M is the number of slices per volume, and X and Y are the width and height of each slice.

Table 10 The number of volumes in the train, test, and validation sets for fourfold cross-validation using the Sliver dataset
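A minimal sketch of how the fourfold split over whole volumes could be generated; the use of scikit-learn, the volume count of 20, and the size of the validation subset are assumptions here:

```python
# Fourfold split over volume indices; a held-out validation subset is then
# carved from each fold's training volumes, as described in the text.
import numpy as np
from sklearn.model_selection import KFold

volume_ids = np.arange(20)  # assumed number of labelled Sliver volumes
kf = KFold(n_splits=4, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(volume_ids)):
    train_idx, val_idx = train_idx[:-2], train_idx[-2:]  # illustrative validation split
    print(f"fold {fold}: train={train_idx}, val={val_idx}, test={test_idx}")
```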

For each fold, evolution started with a population of 30 and continued for up to nine generations with a population of 15. During evolution, the validation set is used to evaluate the networks. At the end, the five best networks are selected, and the DSCs obtained on the test sets are reported in Table 11.

Table 11 The obtained DSCs of the five best networks in each fold

As can be seen from Table 11, the proposed evolutionary attention model obtained high-accuracy 3D networks for 3D medical image segmentation in each fold.
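A hypothetical sketch of the per-fold evolutionary search just described; the genotype encoding, the operators, and the fitness function are stand-ins (in the paper, fitness is the validation DSC of the trained network that a genotype encodes):

```python
# Sketch of one fold's search: population of 30, then up to nine generations
# of 15; returns the five best genotypes. Operators are simplified stand-ins.
import random

def fitness(genotype):
    # Stand-in for: decode the genotype, train the network, return validation DSC.
    return random.random()

def evolve(gene_length=8, init_size=30, pop_size=15, generations=9):
    population = [[random.randint(1, 6) for _ in range(gene_length)]
                  for _ in range(init_size)]
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[:pop_size]
        population = []
        for _ in range(pop_size):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, gene_length)    # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(gene_length)         # one random mutation
            child[i] = random.randint(1, 6)
            population.append(child)
    return sorted(population, key=fitness, reverse=True)[:5]  # five best networks
```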

Effect of Attention Modules

To show the effect of the attention modules on recovering and transferring the extracted feature maps, examples of features before and after the residual activation unit and the attention residual module are presented in Fig. 9. The first row of Fig. 9 shows a Heart and a Hippocampus image along with their corresponding ground truth. Figure 9b shows several of the input feature maps to the residual activation unit, and Fig. 9c shows the module's output feature maps. As can be seen from Fig. 9, after applying the attention modules, part of the information about the region of interest (RoI) is recovered; a similar pattern can be seen for the Hippocampus image.

Fig. 9
Examples of input and output feature maps of the residual activation unit and attention residual block for Heart and Hippocampus images

Subjective Comparison

In this section, another example of the segmented images from each dataset is provided as a subjective comparison. The results are compared with six previous works: Converted 2D to 3D [20], 3D U-Net [9], ConvNet [52], 3D Dense U-Net [31], attention U-Net [44], and NAS U-Net [50]. NAS U-Net [50] is an automatic, reinforcement-learning-based technique for developing a network, and Converted 2D to 3D [20] is an automatic evolutionary model; the rest are manually designed networks. As shown in Fig. 10, our proposed model predicted the RoI with high accuracy, whereas over- or under-segmentation can be seen in some of the previous works' results; for example, ConvNet could not even detect the RoI in the Hippocampus image. Because these networks were developed for a specific application or dataset, their structure or training parameters need to be re-tuned when the application or dataset changes. Note that all the previous works were implemented and trained as described in their source papers.

Fig. 10
One sample 3D segmented volume from each dataset. The red contour is the ground truth, cyan is the proposed 3D attention model, green is Converted 2D to 3D [20], blue is 3D U-Net [9], grey is attention U-Net [44], yellow is ConvNet [52], orange is 3D Dense U-Net [31], and magenta is NAS U-Net [50]

Example of Crossover and Mutation

To clarify the crossover and mutation in our proposed model, an example is provided in Fig. 11. As shown in Fig. 11, two chromosomes are selected as parents and, by applying one-point crossover, two new offspring are generated. In addition, three random mutations occur in one of the children and one in the other. This procedure is repeated to create each new generation. After running the proposed model for the stated number of generations, a number of the best networks are selected as the final networks. A code sketch of these operators follows Fig. 11.

Fig. 11
An example of crossover and mutation
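A minimal sketch of the one-point crossover and random mutation illustrated in Fig. 11; treating gene values 1-6 as attention-module indices is an assumption here:

```python
# One-point crossover producing two offspring, then random point mutations;
# gene values 1-6 (assumed attention-module indices) are illustrative.
import random

def one_point_crossover(parent_a, parent_b):
    cut = random.randrange(1, len(parent_a))      # crossover point
    return (parent_a[:cut] + parent_b[cut:],      # first offspring
            parent_b[:cut] + parent_a[cut:])      # second offspring

def mutate(child, n_mutations):
    child = child[:]
    for i in random.sample(range(len(child)), n_mutations):
        child[i] = random.randint(1, 6)           # replace gene with a random value
    return child

a, b = [1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]
c1, c2 = one_point_crossover(a, b)
c1, c2 = mutate(c1, 3), mutate(c2, 1)  # three mutations in one child, one in the other
```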


About this article


Cite this article

Hassanzadeh, T., Essam, D. & Sarker, R. Evolutionary Deep Attention Convolutional Neural Networks for 2D and 3D Medical Image Segmentation. J Digit Imaging 34, 1387–1404 (2021). https://doi.org/10.1007/s10278-021-00526-2
