Abstract
Brain computed tomography (CT) report generation, which aims at generating accurate and descriptive reports for Brain CT imaging, has gained growing attention from researchers. Existing works mainly train a language-generation model with complex image-text pairs for supervision, which still struggled with the following challenges: 1) the serious long-tail distribution of textual supervise signals led by imbalanced text length distribution, and 2) the insufficient medical data caused by expensive expert intervention. In this paper, we propose a novel Gaussian heuristic curriculum learning (GHCL) model to effectively tackle the long-tail data distribution and optimally utilize the limited training data. Specifically, our training process mimics the learning process of physicians in a step-wise paradigm. Firstly, we evaluate the scores of training difficulty for each sample through two elaborately designed Gaussian heuristic metrics. Then, during the training of the language-generation model, we iteratively select the most suitable batch of training samples, which is comprehensively considered by the calculated scores of training difficulty. In this way, GHCL can effectively guide the progressive learning of the report generation model and boost the quality of generated Brain CT reports. We comprehensively compare the method with previous state-of-the-art models on the Brain CT report generation dataset BCT-CHR. Experimental results demonstrate that our method surpasses previous state-of-the-art approaches and GHCL is flexible to combine with existing approaches to further improve the performance.
Similar content being viewed by others
Data availability
The datasets are not publicly available due to that we do not have permission to make them public, but are available from the corresponding author on reasonable request.
References
Jing, B., Xie, P., Xing, E.P.: On the Automatic Generation of Medical Imaging Reports. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers (2018)
Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., Zhang, L.: Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018 (2018)
Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: A neural image caption generator. IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015 (2015)
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A.C., Salakhutdinov, R., Zemel, R.S., Bengio, Y.: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015 (2015)
Wang, X., Peng, Y., Lu, L., Lu, Z., Summers, R.M.: TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-Rays. 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018 (2018)
Ni, J., Hsu, C., Gentili, A., McAuley, J.J.: Learning visual-semantic embeddings for reporting abnormal findings on chest x-rays. In: Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020, pp. 1954–1960 (2020)
Yang, S., Ji, J., Zhang, X., Liu, Y., Wang, Z.: Weakly Guided Hierarchical Encoder-Decoder Network for Brain CT Report Generation. IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021, Houston, TX, USA, December 9-12, 2021 (2021)
Yang, S., Wu, X., Ge, S., Zheng, Z., Zhou, S.K., Xiao, L.: Radiology report generation with a learned knowledge base and multi-modal alignment. Medical Image Anal. 86, 102798 (2023)
Yan, A., He, Z., Lu, X., Du, J., Chang, E.Y., Gentili, A., McAuley, J.J., Hsu, C.: Weakly Supervised Contrastive Learning for Chest X-Ray Report Generation. Findings of the Association for Computational Linguistics: EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 16-20 November, 2021 (2021)
Chen, Z., Shen, Y., Song, Y., Wan, X.: Cross-modal Memory Networks for Radiology Report Generation. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021 (2021)
Qin, H., Song, Y.: Reinforced Cross-modal Alignment for Radiology Report Generation. Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, May 22-27, 2022 (2022)
Wang, J., Bhalerao, A., He, Y.: Cross-Modal Prototype Driven Network for Radiology Report Generation. Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XXXV (2022)
Xue, Y., Xu, T., Long, L.R., Xue, Z., Antani, S.K., Thoma, G.R., Huang, X.: Multimodal Recurrent Model with Attention for Automated Radiology Report Generation. Medical Image Computing and Computer Assisted Intervention - MICCAI 2018 - 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part I (2018)
Zhang, Y., Wang, X., Xu, Z., Yu, Q., Yuille, A.L., Xu, D.: When Radiology Report Generation Meets Knowledge Graph. The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020 (2020)
Liu, F., Wu, X., Ge, S., Fan, W., Zou, Y.: Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation. IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021 (2021)
Chen, Z., Song, Y., Chang, T., Wan, X.: Generating Radiology Reports via Memory-driven Transformer. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020 (2020)
Chen, X., Fang, H., Lin, T., Vedantam, R., Gupta, S., Dollár, P., Zitnick, C.L.: Microsoft COCO captions: Data collection and evaluation server. CoRR (2015)
Rennie, S.J., Marcheret, E., Mroueh, Y., Ross, J., Goel, V.: Self-Critical Sequence Training for Image Captioning. 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017 (2017)
Liu, F., Liu, Y., Ren, X., He, X., Sun, X.: Aligning visual regions and textual concepts for semantic-grounded image representations. In: Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 6847–6857 (2019)
Liu, F., Ren, X., Wu, X., Ge, S., Fan, W., Zou, Y., Sun, X.: Prophet attention: Predicting attention with future attention. In: Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, Virtual (2020)
Zheng, A., Zheng, S., Bai, C., Chen, D.: Triple-level relationship enhanced transformer for image captioning. Multim. Syst. 29(4), 1955–1966 (2023)
Carmo Nogueira, T., Vinhal, C.D.N., Cruz Júnior, G., Ullmann, M.R.D., Marques, T.C.: A reference-based model using deep learning for image captioning. Multim. Syst. 29(3), 1665–1681 (2023)
Lu, J., Xiong, C., Parikh, D., Socher, R.: Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning. 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017 (2017)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Commun. ACM 60, 84–90 (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016 (2016)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pp. 5998–6008 (2017)
Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14-18, 2009, vol. 382, pp. 41–48 (2009)
Platanios, E.A., Stretcu, O., Neubig, G., Póczos, B., Mitchell, T.M.: Competence-based curriculum learning for neural machine translation. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 1162–1172 (2019)
Kumar, G., Foster, G.F., Cherry, C., Krikun, M.: Reinforcement learning based curriculum optimization for neural machine translation. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pp. 2054–2061 (2019)
Zhang, X., Kumar, G., Khayrallah, H., Murray, K., Gwinnup, J., Martindale, M.J., McNamee, P., Duh, K., Carpuat, M.: An empirical exploration of curriculum learning for neural machine translation. CoRR abs/1811.00739 (2018)
Liu, X., Lai, H., Wong, D.F., Chao, L.S.: Norm-based curriculum learning for neural machine translation. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, pp. 427–436 (2020)
Weinshall, D., Cohen, G., Amir, D.: Curriculum learning by transfer learning: Theory and experiments with deep networks. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, vol. 80, pp. 5235–5243 (2018)
Wang, Y., Gan, W., Yang, J., Wu, W., Yan, J.: Dynamic curriculum learning for imbalanced data classification. In: 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019, pp. 5016–5025 (2019)
Li, Q., Huang, S., Hong, Y., Zhu, S.: A competence-aware curriculum for visual concepts learning via question answering. In: Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part II, vol. 12347, pp. 141–157 (2020)
Elman, J.L.: Learning and development in neural networks: The importance of starting small. Cognition 48, 71–99 (1993)
Liu, F., Ge, S., Wu, X.: Competence-based multimodal curriculum learning for medical report generation. CoRR abs/2206.14579 (2022)
Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015)
Papineni, K., Roukos, S., Ward, T., Zhu, W.: Bleu: a Method for Automatic Evaluation of Machine Translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, July 6-12, 2002, Philadelphia, PA, USA (2002)
Lavie, A., Agarwal, A.: METEOR: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments. Proceedings of the Second Workshop on Statistical Machine Translation, WMT@ACL 2007, Prague, Czech Republic, June 23, 2007 (2007)
Lin, C.-Y.: Rouge: A package for automatic evaluation of summaries. In: Text Summarization Branches Out, pp. 74–81 (2004)
Vedantam, R., Zitnick, C.L., Parikh, D.: CIDEr: Consensus-based image description evaluation. IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015 (2015)
Krause, J., Johnson, J., Krishna, R., Fei-Fei, L.: A Hierarchical Approach for Generating Descriptive Image Paragraphs. 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017 (2017)
Cornia, M., Stefanini, M., Baraldi, L., Cucchiara, R.: Meshed-memory transformer for image captioning. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pp. 10575–10584 (2020)
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant 61906007 and 62276010, in part by the R &D Program of Beijing Municipal Education Commission under Grant KM202110005022 and KZ202210005009.
Author information
Authors and Affiliations
Contributions
QS, XZ and YL: contributed to the study conception and design. The first draft of the manuscript was written by QS and YS, and reviewing and editing were mainly performed by XZ and JJ. Data collection and clinical analysis were performed by YL and HX. YL: contributed crucial medical knowledge for the work. All authors read and approved the final manuscript.
Corresponding authors
Ethics declarations
Conflict of interest
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Additional information
Communicated by B. Bao.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shen, Q., Shi, Y., Zhang, X. et al. GHCL: Gaussian heuristic curriculum learning for Brain CT report generation. Multimedia Systems 30, 69 (2024). https://doi.org/10.1007/s00530-024-01266-3
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s00530-024-01266-3