Abstract
A reliable and trustworthy image segmentation pipeline is central to deploying AI-based medical image analysis systems in clinical practice. Given a segmentation map produced by a segmentation model, the pipeline needs an automatic, accurate, and reliable method for segmentation quality assessment (SQA) when the ground truth is absent. In this paper, we present a novel holistic consistency-based method for assessing, at the subject level, the quality of segmentations produced by state-of-the-art segmentation models. Our method does not train a dedicated model on labeled samples to assess segmentation quality; instead, it systematically explores segmentation consistency in an unsupervised manner. Our approach examines the consistency of segmentation results across three major aspects: (1) consistency across sub-models; (2) consistency across models; and (3) consistency across different runs with random dropout. For a given test image, we combine the consistency scores from these three aspects into an overall consistency score that is highly correlated with the true segmentation quality score (e.g., Dice score) in both linear correlation and rank correlation. Empirical results on two public datasets demonstrate that our proposed method outperforms previous unsupervised methods for subject-level SQA.
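The idea of scoring consistency and then checking its correlation with the true Dice score can be illustrated with a minimal sketch. The abstract does not specify how agreement is measured or how the three aspects are combined, so everything below is an assumption made for illustration: masks are flat 0/1 lists, agreement within one aspect is the mean pairwise Dice, the three aspect scores are combined by a plain average, and the function names (`dice`, `mean_pairwise_dice`, `overall_consistency`, `pearson`, `spearman`) are hypothetical. It is not the authors' implementation.

```python
from itertools import combinations
from math import sqrt


def dice(a, b):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    inter = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    # Assumption: two empty masks are treated as perfectly consistent.
    return 1.0 if total == 0 else 2.0 * inter / total


def mean_pairwise_dice(masks):
    """Consistency within one aspect: average Dice over all mask pairs."""
    pairs = list(combinations(masks, 2))
    return sum(dice(a, b) for a, b in pairs) / len(pairs)


def overall_consistency(sub_model_masks, model_masks, dropout_masks):
    """Combine the three aspects; the plain mean is an illustrative choice."""
    aspects = (sub_model_masks, model_masks, dropout_masks)
    scores = [mean_pairwise_dice(m) for m in aspects]
    return sum(scores) / len(scores)


def pearson(x, y):
    """Linear (Pearson) correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Rank (Spearman) correlation; this simple ranking assumes no ties."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r
    return pearson(ranks(x), ranks(y))
```

In use, one would compute `overall_consistency` for each test subject and then pass the list of consistency scores together with the list of true Dice scores to `pearson` and `spearman`; values close to 1 would indicate that the consistency score tracks segmentation quality.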
Acknowledgments
We sincerely thank the reviewers for their time and effort in reviewing our manuscript and for providing constructive feedback to improve our work. This research was supported in part by the Natural Science Foundation of Jiangsu Province (Grant BK20220949) and the National Natural Science Foundation of China (Grant 62201263).
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Y., Zhou, T., Chen, Q., Dou, Q., Wang, S. (2025). Holistic Consistency for Subject-Level Segmentation Quality Assessment in Medical Image Segmentation. In: Sudre, C.H., Mehta, R., Ouyang, C., Qin, C., Rakic, M., Wells, W.M. (eds) Uncertainty for Safe Utilization of Machine Learning in Medical Imaging. UNSURE 2024. Lecture Notes in Computer Science, vol 15167. Springer, Cham. https://doi.org/10.1007/978-3-031-73158-7_9
DOI: https://doi.org/10.1007/978-3-031-73158-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73157-0
Online ISBN: 978-3-031-73158-7
eBook Packages: Computer Science (R0)