research-article

Multi-Modal Multi-Instance Learning for Retinal Disease Recognition

Authors:

Youxin Chen

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 2474 - 2482

https://doi.org/10.1145/3474085.3475418

Published: 17 October 2021

Abstract

This paper tackles an emerging challenge: multi-modal retinal disease recognition. Given a multi-modal case consisting of a color fundus photo (CFP) and an array of OCT B-scan images acquired during an eye examination, we aim to build a deep neural network that recognizes multiple vision-threatening diseases for the given case. As the diagnostic efficacy of CFP and OCT is disease-dependent, the network's ability to be both selective and interpretable is important. Moreover, as both data acquisition and manual labeling are extremely expensive in the medical domain, the network has to be relatively lightweight so that it can learn from a limited set of labeled multi-modal samples. Prior art on retinal disease recognition focuses either on a single disease or on a single modality, leaving multi-modal fusion largely underexplored. In this paper we propose Multi-Modal Multi-Instance Learning (MM-MIL) for selectively fusing the CFP and OCT modalities. Its lightweight architecture (compared to current multi-head attention modules) makes it suitable for learning from relatively small datasets. To use MM-MIL effectively, we propose to generate a pseudo sequence of CFPs by oversampling a given CFP. The benefits of this tactic include balancing instances across modalities, increasing the resolution of the CFP input, and identifying the regions of the CFP most relevant to the final diagnosis. Extensive experiments on a real-world dataset consisting of 1,206 multi-modal cases from 1,193 eyes of 836 subjects demonstrate the viability of the proposed model.
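The two ideas named in the abstract, oversampling one CFP into a pseudo sequence and selectively fusing instances from both modalities, can be illustrated with a minimal NumPy sketch. The abstract does not specify the MM-MIL architecture, so everything here is an assumption made for illustration: random square crops stand in for the oversampling step, random linear projections stand in for the image encoders, and a generic tanh-scored softmax attention pooling stands in for the fusion module. The names `oversample_cfp` and `attention_pool` are hypothetical.

```python
import numpy as np

def oversample_cfp(cfp, num_crops, crop_size, rng):
    """Turn one CFP of shape (H, W, C) into a pseudo sequence of random square crops."""
    height, width = cfp.shape[:2]
    tops = rng.integers(0, height - crop_size + 1, size=num_crops)
    lefts = rng.integers(0, width - crop_size + 1, size=num_crops)
    return np.stack([cfp[t:t + crop_size, l:l + crop_size]
                     for t, l in zip(tops, lefts)])

def attention_pool(features, w_hidden, w_score):
    """Attention-weighted average of instance features.

    features: (N, D); w_hidden: (D, K); w_score: (K,).
    Returns the (N,) attention weights and the pooled (D,) vector.
    """
    scores = np.tanh(features @ w_hidden) @ w_score
    scores = scores - scores.max()  # stabilise the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights, weights @ features

rng = np.random.default_rng(0)
cfp = rng.random((64, 64, 3))        # toy stand-in for a color fundus photo
oct_scans = rng.random((8, 32, 32))  # toy stand-in for 8 OCT B-scans

# Assumed oversampling: 8 crops, so both modalities contribute 8 instances.
cfp_crops = oversample_cfp(cfp, num_crops=8, crop_size=32, rng=rng)

# Placeholder "encoders": random linear projections into a shared 16-d space.
dim = 16
cfp_feats = cfp_crops.reshape(8, -1) @ rng.normal(size=(32 * 32 * 3, dim))
oct_feats = oct_scans.reshape(8, -1) @ rng.normal(size=(32 * 32, dim))

# Pool all instances from both modalities with one set of attention weights.
instances = np.concatenate([cfp_feats, oct_feats])  # (16, dim)
weights, case_feature = attention_pool(
    instances, rng.normal(size=(dim, 8)), rng.normal(size=8))

print("attention mass on CFP crops:", weights[:8].sum())
print("attention mass on OCT scans:", weights[8:].sum())
```

In a sketch like this, the per-instance weights are what would make the fusion inspectable: the share of attention falling on each modality hints at which one drives the prediction, and the weight on each crop hints at which CFP region matters. With untrained random parameters, as here, the weights carry no diagnostic meaning.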

Index Terms

Computing methodologies
  Artificial intelligence
    Computer vision
      Computer vision tasks
        Visual content-based indexing and retrieval

Information & Contributors

Information

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN:9781450386517

DOI:10.1145/3474085

General Chairs:
Heng Tao Shen, University of Electronic Science and Technology of China, China
Yueting Zhuang, Zhejiang University, China
John R. Smith, IBM, USA

Program Chairs:
Yang Yang, University of Electronic Science and Technology of China, China
Pablo Cesar, CWI & TU Delft, The Netherlands
Florian Metze, FACEBOOK, Inc., USA
Balakrishnan Prabhakaran, University of Texas at Dallas, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

BJNSF
the Pharmaceutical Collaborative Innovation Research Project of Beijing Science and Technology Commission
BJNSFC Haidian Original Innovation Joint Fund

Conference

MM '21: ACM Multimedia Conference

October 20 - 24, 2021

Virtual Event, China

Sponsor: SIGMM

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

Total Citations: 19
Total Downloads: 407
Downloads (Last 12 months): 126
Downloads (Last 6 weeks): 21

Reflects downloads up to 21 Nov 2024

Cited By

Singh P, Kushwaha A, Varshney N. 2024. IMF-MF: Interactive moment localization with adaptive multimodal fusion and self-attention. Journal of Intelligent & Fuzzy Systems, 1-12. Online publication date: 4-Apr-2024.
https://doi.org/10.3233/JIFS-233071

Wang S, He X, Jian Z, Li J, Xu C, Chen Y, Liu Y, Chen H, Huang C, Hu J, Liu Z. 2024. Advances and prospects of multi-modal ophthalmic artificial intelligence based on deep learning: a review. Eye and Vision 11:1. Online publication date: 1-Oct-2024.
https://doi.org/10.1186/s40662-024-00405-1

Xu Z, Zang M, Liu T, Wang Z, Zhou S, Liu C, Wang Q. 2024. Multimodality Data Augmentation Network for Arrhythmia Classification. International Journal of Intelligent Systems 2024:1. Online publication date: 14-Jul-2024.
https://doi.org/10.1155/2024/9954821

Wang Y, Zhen L, Tan T, Fu H, Feng Y, Wang Z, Xu X, Goh R, Ng Y, Calhoun C, Tan G, Sun J, Liu Y, Ting D. 2024. Geometric Correspondence-Based Multimodal Learning for Ophthalmic Image Analysis. IEEE Transactions on Medical Imaging 43:5, 1945-1957. Online publication date: May-2024.
https://doi.org/10.1109/TMI.2024.3352602

Khan M, Khan M, Copus B. 2024. Semantic Neural Network for Micro-Vessels Extraction. 2024 IEEE 12th International Conference on Healthcare Informatics (ICHI), 51-56. Online publication date: 3-Jun-2024.
https://doi.org/10.1109/ICHI61247.2024.00015

Khan M, Khan M, Copus B. 2024. Cascaded Network for Multiscale Feature Extraction. 2024 IEEE 12th International Conference on Healthcare Informatics (ICHI), 46-50. Online publication date: 3-Jun-2024.
https://doi.org/10.1109/ICHI61247.2024.00014

Nakach F, Idri A, Goceri E. 2024. A comprehensive investigation of multimodal deep learning fusion strategies for breast cancer classification. Artificial Intelligence Review 57:12. Online publication date: 12-Oct-2024.
https://doi.org/10.1007/s10462-024-10984-z

Wu R, Zhang C, Zhang J, Zhou Y, Zhou T, Fu H. 2024. MM-Retinal: Knowledge-Enhanced Foundational Pretraining with Fundus Image-Text Expertise. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, 722-732. Online publication date: 3-Oct-2024.
https://doi.org/10.1007/978-3-031-72378-0_67

Elbatel M, Kamnitsas K, Li X. 2024. An Organism Starts with a Single Pix-Cell: A Neural Cellular Diffusion for High-Resolution Image Synthesis. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, 656-666. Online publication date: 3-Oct-2024.
https://doi.org/10.1007/978-3-031-72378-0_61

Han H, Xiang M, Lian C, Liu D, Zeng Z. 2023. A Multimodal Deep Neural Network for ECG and PCG Classification With Multimodal Fusion. 2023 13th International Conference on Information Science and Technology (ICIST), 124-128. Online publication date: 8-Dec-2023.
https://doi.org/10.1109/ICIST59754.2023.10367180
