Simultaneous Monocular Endoscopic Dense Depth and Odometry Estimation Using Local-Global Integration Networks

Wenkang Fan ORCID: orcid.org/0000-0002-8364-5159¹⁴,
Wenjing Jiang¹⁴,
Hao Fang^14,15,
Hong Shi¹⁶,
Jianhua Chen¹⁶ &
Xiongbiao Luo ORCID: orcid.org/0000-0001-7906-8857^14,15

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15006)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention


Abstract

Accurate dense depth prediction from monocular endoscopic images is essential for expanding the surgical field of view and augmenting surgeons' depth perception. It remains challenging, however, because endoscopic videos generally suffer from a limited field of view, illumination variations, and weak texture. This work proposes LGIN (local-global integration network), a new architecture trained with unsupervised learning for accurate dense depth recovery from monocular endoscopic images. Specifically, LGIN builds a hybrid encoder that uses dense convolution and a pyramid vision transformer to extract local textural features and global spatial-temporal features in parallel. A decoder then integrates the local and global features and uses two heads to estimate dense depth and odometry simultaneously. Additionally, we extract structure-valid regions to assist both odometry prediction and unsupervised training, which improves the accuracy of depth prediction. We evaluated our model on both clinical and synthetic unannotated colonoscopic video images. The experimental results demonstrate that our model recovers a more accurate depth distribution and preserves more texture detail, and that it outperforms current monocular dense depth estimation models in both qualitative and quantitative assessments.
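The abstract does not spell out how the depth and odometry outputs are coupled during unsupervised training. In many self-supervised monocular depth methods, the two are tied together by view synthesis: a pixel is lifted to 3D with its predicted depth, moved by the predicted camera motion, and projected into the adjacent frame, where image intensities can be compared. The sketch below illustrates only that geometric step, assuming a pinhole camera; whether LGIN uses exactly this formulation is an assumption, and all function names and parameter values are hypothetical.

```python
# Illustrative sketch only: pinhole reprojection of one pixel between two frames.
# Function names and parameters are hypothetical, not taken from the paper.

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with the given depth to a 3D point in the camera frame."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def transform(point, rotation, translation):
    """Apply a rigid motion R @ p + t, with R given as a 3x3 nested list."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def project(point, fx, fy, cx, cy):
    """Project a 3D camera-frame point back to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point lies behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def reproject_pixel(u, v, depth, rotation, translation, intrinsics):
    """Map a pixel in the current frame to its location in the adjacent frame.

    depth stands in for the depth head's output at (u, v); rotation and
    translation stand in for the odometry head's relative pose estimate.
    """
    fx, fy, cx, cy = intrinsics
    point = backproject(u, v, depth, fx, fy, cx, cy)
    moved = transform(point, rotation, translation)
    return project(moved, fx, fy, cx, cy)


if __name__ == "__main__":
    intrinsics = (100.0, 100.0, 64.0, 64.0)  # assumed fx, fy, cx, cy
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Sideways camera motion shifts the principal-point pixel horizontally.
    print(reproject_pixel(64.0, 64.0, 0.05, identity, (0.01, 0.0, 0.0), intrinsics))
```

In a training loop, the intensity difference between the source pixel and the image sampled at the reprojected location would typically serve as the supervisory signal, so that errors in either depth or pose increase the loss. A structure-valid mask, as mentioned in the abstract, could restrict that comparison to reliable regions.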



Acknowledgement

This work was supported in part by the National Natural Science Foundation of China under Grant 82272133 and the Fujian Provincial Technology Innovation Joint Funds under Grant 2019Y9091.

Author information

Authors and Affiliations

Department of Computer Science and Technology, Xiamen University, Xiamen, 361102, China
Wenkang Fan, Wenjing Jiang, Hao Fang & Xiongbiao Luo
National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361102, China
Hao Fang & Xiongbiao Luo
Fujian Medical University Cancer Hospital, Fuzhou, 350014, China
Hong Shi & Jianhua Chen


Corresponding author

Correspondence to Xiongbiao Luo.

Editor information

Editors and Affiliations

Children’s National Hospital/George Washington University, Washington, DC, USA
Marius George Linguraru
The Chinese University of Hong Kong, Hong Kong, China
Qi Dou
Technical University of Denmark, Kgs Lyngby, Denmark
Aasa Feragen
Imperial College London, London, UK
Stamatia Giannarou
Imperial College London, London, UK
Ben Glocker
Universitat de Barcelona, Barcelona, Spain
Karim Lekadir
Helmholtz Munich, Technical University of Munich and King’s College London, Munich, Germany
Julia A. Schnabel

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.


About this paper

Cite this paper

Fan, W., Jiang, W., Fang, H., Shi, H., Chen, J., Luo, X. (2024). Simultaneous Monocular Endoscopic Dense Depth and Odometry Estimation Using Local-Global Integration Networks. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15006. Springer, Cham. https://doi.org/10.1007/978-3-031-72089-5_53


DOI: https://doi.org/10.1007/978-3-031-72089-5_53
Published: 03 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72088-8
Online ISBN: 978-3-031-72089-5
eBook Packages: Computer Science, Computer Science (R0)
