DOI: 10.1145/3637528.3672069
research-article

Masked LoGoNet: Fast and Accurate 3D Image Analysis for Medical Domain

Published: 24 August 2024

Abstract

Standard modern machine-learning-based imaging methods have faced challenges in medical applications because datasets are expensive to construct and labeled training data is therefore limited. Additionally, once deployed, these methods usually process a large volume of data every day, imposing a high maintenance cost on medical facilities. In this paper, we introduce a new neural network architecture, termed LoGoNet, with a tailored self-supervised learning (SSL) method to mitigate these challenges. LoGoNet integrates a novel feature extractor within a U-shaped architecture, leveraging Large Kernel Attention (LKA) and a dual encoding strategy to capture both long-range and short-range feature dependencies. This is in contrast to existing methods that rely on increasing network capacity to enhance feature extraction. This combination of techniques is especially beneficial in medical image segmentation, given the difficulty of learning intricate and often irregular body organ shapes, such as the spleen. In addition, we propose a novel SSL method tailored for 3D images to compensate for the lack of large labeled datasets. Our method combines masking and contrastive learning techniques within a multi-task learning framework and is compatible with both Vision Transformer (ViT) and CNN-based models. We demonstrate the efficacy of our methods on numerous tasks across two standard datasets (i.e., BTCV and MSD). Benchmark comparisons with eight state-of-the-art models highlight LoGoNet's superior performance in both inference time and accuracy. Code is available at: https://github.com/aminK8/Masked-LoGoNet.
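
The abstract describes the SSL objective only at a high level. As a rough illustration, and not the authors' implementation (which lives in the repository linked above), the following NumPy sketch shows one generic way that masking and contrastive learning can be combined into a single multi-task loss for a 3D volume. The function names, the cubic patch masking, the InfoNCE-style contrastive term, and the loss weights are all assumptions made for this example.

```python
import numpy as np


def mask_patches(volume, patch=4, ratio=0.5, rng=None):
    """Zero out a random subset of non-overlapping cubic patches.

    Hypothetical helper: the paper's actual masking strategy may differ.
    Returns the masked volume and a boolean voxel mask (True = masked).
    """
    rng = np.random.default_rng() if rng is None else rng
    d, h, w = volume.shape
    assert d % patch == 0 and h % patch == 0 and w % patch == 0
    grid = (d // patch, h // patch, w // patch)
    n_patches = grid[0] * grid[1] * grid[2]
    n_masked = int(round(ratio * n_patches))

    # Choose which patches to hide, then expand to voxel resolution.
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, size=n_masked, replace=False)] = True
    patch_mask = flat.reshape(grid)
    voxel_mask = (
        patch_mask.repeat(patch, axis=0)
        .repeat(patch, axis=1)
        .repeat(patch, axis=2)
    )
    return np.where(voxel_mask, 0.0, volume), voxel_mask


def reconstruction_loss(pred, target, voxel_mask):
    """Mean squared error computed on the masked voxels only."""
    diff = (pred - target)[voxel_mask]
    return float(np.mean(diff ** 2))


def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss: row i of z_a should match row i of z_b."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def multitask_loss(rec, con, w_rec=1.0, w_con=1.0):
    """Weighted sum of the two task losses (weights are assumptions)."""
    return w_rec * rec + w_con * con
```

In a training loop, a network would receive the masked volume, predict the hidden voxels for the reconstruction term, and produce embeddings of two views of the same volume for the contrastive term. Any encoder that yields voxel-wise predictions and a pooled embedding could be plugged in, which is in the spirit of the abstract's claim that the method works with both ViT and CNN-based models.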

    Published In

    KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    August 2024
    6901 pages
    ISBN:9798400704901
    DOI:10.1145/3637528

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. dual-encoder
    2. image segmentation
    3. medical imaging
    4. multi-task learning
    5. self-supervised learning

    Conference

    KDD '24
    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
