DOI: 10.1145/3474085.3475418
research-article

Multi-Modal Multi-Instance Learning for Retinal Disease Recognition

Published: 17 October 2021

Abstract

This paper addresses the emerging challenge of multi-modal retinal disease recognition. Given a multi-modal case consisting of a color fundus photo (CFP) and an array of OCT B-scan images acquired during an eye examination, we aim to build a deep neural network that recognizes multiple vision-threatening diseases for the given case. Because the diagnostic efficacy of CFP and OCT is disease-dependent, it is important that the network be both selective and interpretable. Moreover, because both data acquisition and manual labeling are extremely expensive in the medical domain, the network has to be relatively lightweight so that it can learn from a limited set of labeled multi-modal samples. Prior art on retinal disease recognition focuses either on a single disease or on a single modality, leaving multi-modal fusion largely underexplored. We propose Multi-Modal Multi-Instance Learning (MM-MIL) for selectively fusing the CFP and OCT modalities. Its lightweight architecture (compared with current multi-head attention modules) makes it well suited to learning from relatively small datasets. To use MM-MIL effectively, we propose generating a pseudo sequence of CFPs by oversampling a given CFP. The benefits of this tactic include balancing the number of instances across modalities, increasing the resolution of the CFP input, and identifying the regions of the CFP most relevant to the final diagnosis. Extensive experiments on a real-world dataset consisting of 1,206 multi-modal cases from 1,193 eyes of 836 subjects demonstrate the viability of the proposed model.
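The abstract does not give the details of the MM-MIL architecture, so the following is only a minimal sketch of the two ideas it names, under stated assumptions: a pseudo sequence of CFPs is produced by taking random crops of one photo, and instances from both modalities are fused by generic attention-based multi-instance pooling (in the style of Ilse et al., 2018). The function names, the random-projection "encoder", the crop strategy, and all sizes are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np


def oversample_cfp(cfp, num_crops, crop_size, rng):
    """Turn one CFP into a pseudo sequence of random square crops (assumed strategy)."""
    h, w = cfp.shape[:2]
    tops = rng.integers(0, h - crop_size + 1, size=num_crops)
    lefts = rng.integers(0, w - crop_size + 1, size=num_crops)
    return np.stack([cfp[t:t + crop_size, l:l + crop_size]
                     for t, l in zip(tops, lefts)])


def encode(images, feat_dim, rng):
    """Placeholder encoder: a random linear projection of flattened pixels.
    A real system would use a trained CNN backbone per modality."""
    flat = images.reshape(len(images), -1)
    proj = rng.standard_normal((flat.shape[1], feat_dim)) / np.sqrt(flat.shape[1])
    return flat @ proj


def attention_mil_pool(features, V, w):
    """Attention-based MIL pooling over a bag of instance features.
    features: (n, d), V: (d, k), w: (k,). Returns the pooled (d,) vector
    and the (n,) attention weights, which sum to 1."""
    scores = np.tanh(features @ V) @ w
    scores = scores - scores.max()            # numerical stability for softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ features, weights


rng = np.random.default_rng(0)
cfp = rng.random((128, 128, 3))               # stand-in for one fundus photo
oct_scans = rng.random((6, 32, 32))           # stand-in for six OCT B-scans

# Oversample the single CFP so both modalities contribute six instances.
cfp_crops = oversample_cfp(cfp, num_crops=6, crop_size=64, rng=rng)

feat_dim, attn_dim = 16, 8
bag = np.concatenate([encode(cfp_crops, feat_dim, rng),
                      encode(oct_scans, feat_dim, rng)])   # (12, 16)

V = rng.standard_normal((feat_dim, attn_dim))  # untrained attention parameters
w = rng.standard_normal(attn_dim)
fused, weights = attention_mil_pool(bag, V, w)

# Attention mass per modality hints at which modality the model relies on.
print("CFP attention:", weights[:6].sum(), "OCT attention:", weights[6:].sum())
```

In this sketch the per-instance attention weights serve two purposes: they decide how much each crop or B-scan contributes to the fused case-level feature, and they can be read back to see which CFP regions or OCT slices received the most weight.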

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep multi-instance learning
  2. multi-modal feature fusion
  3. multi-modal retinal imaging
  4. retinal disease recognition

Funding Sources

  • BJNSF
  • the Pharmaceutical Collaborative Innovation Research Project of Beijing Science and Technology Commission
  • BJNSFC Haidian Original Innovation Joint Fund

Conference

MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Cited By

  • (2024) IMF-MF: Interactive moment localization with adaptive multimodal fusion and self-attention. Journal of Intelligent & Fuzzy Systems, 1-12. DOI: 10.3233/JIFS-233071. Online publication date: 4-Apr-2024
  • (2024) Advances and prospects of multi-modal ophthalmic artificial intelligence based on deep learning: a review. Eye and Vision, 11:1. DOI: 10.1186/s40662-024-00405-1. Online publication date: 1-Oct-2024
  • (2024) Multimodality Data Augmentation Network for Arrhythmia Classification. International Journal of Intelligent Systems, 2024:1. DOI: 10.1155/2024/9954821. Online publication date: 14-Jul-2024
  • (2024) Geometric Correspondence-Based Multimodal Learning for Ophthalmic Image Analysis. IEEE Transactions on Medical Imaging, 43:5, 1945-1957. DOI: 10.1109/TMI.2024.3352602. Online publication date: May-2024
  • (2024) Semantic Neural Network for Micro-Vessels Extraction. 2024 IEEE 12th International Conference on Healthcare Informatics (ICHI), 51-56. DOI: 10.1109/ICHI61247.2024.00015. Online publication date: 3-Jun-2024
  • (2024) Cascaded Network for Multiscale Feature Extraction. 2024 IEEE 12th International Conference on Healthcare Informatics (ICHI), 46-50. DOI: 10.1109/ICHI61247.2024.00014. Online publication date: 3-Jun-2024
  • (2024) A comprehensive investigation of multimodal deep learning fusion strategies for breast cancer classification. Artificial Intelligence Review, 57:12. DOI: 10.1007/s10462-024-10984-z. Online publication date: 12-Oct-2024
  • (2024) MM-Retinal: Knowledge-Enhanced Foundational Pretraining with Fundus Image-Text Expertise. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, 722-732. DOI: 10.1007/978-3-031-72378-0_67. Online publication date: 3-Oct-2024
  • (2024) An Organism Starts with a Single Pix-Cell: A Neural Cellular Diffusion for High-Resolution Image Synthesis. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, 656-666. DOI: 10.1007/978-3-031-72378-0_61. Online publication date: 3-Oct-2024
  • (2023) A Multimodal Deep Neural Network for ECG and PCG Classification With Multimodal Fusion. 2023 13th International Conference on Information Science and Technology (ICIST), 124-128. DOI: 10.1109/ICIST59754.2023.10367180. Online publication date: 8-Dec-2023
