Abstract
De-redundancy of wireless capsule endoscopy (WCE) videos, i.e., removing redundant frames, is a challenging task. This paper proposes a scheme called SS-VCF-Der. It applies flow field estimation between two successive WCE frames to analyze WCE imaging motion, and then addresses the WCE de-redundancy problem using the results of this motion analysis. To this end, we exploit a self-supervised technique that learns interframe visual correspondence representations from large amounts of raw WCE videos without manual human supervision, and we use these representations to predict the flow field. Our key idea is to use the natural spatial-temporal coherence in color and the cycle consistency in time of WCE videos as a free supervisory signal for learning WCE visual correspondence relations from scratch. We call this procedure self-supervised visual correspondence flow learning (SS-VCF). At training time, we train and optimize the model with three losses: a forward-backward cycle-consistency loss, a visual similarity loss, and a color loss. At test time, we use the learned representation to generate a flow field that describes pixel movement between two successive WCE frames. From the resulting flow field, we compute the motion intensity of the motion field between two successive frames. Our proposed de-redundancy method, SS-VCF-MI, then selects as key frames those that show distinct scene changes within their local neighborhood, thereby removing redundant frames. Extensive experiments on our collected WCE-2019-Video dataset show that the scheme achieves promising results, verifying its effectiveness for visual correspondence representation and redundancy removal in WCE videos.
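The abstract does not give the exact definition of motion intensity or the key-frame selection rule used by SS-VCF-MI. The sketch below is only a minimal illustration under stated assumptions: motion intensity is taken to be the mean per-pixel magnitude of a dense flow field of shape (H, W, 2), and a frame is kept when the intensity of the frame pair leading to it is the strict maximum within a local window of neighbouring pairs and also exceeds a threshold. The names `motion_intensity` and `select_key_frames` and the parameters `window` and `threshold` are hypothetical and do not come from the paper. The flow fields themselves would come from the learned SS-VCF model, which is not reproduced here.

```python
import numpy as np


def motion_intensity(flow: np.ndarray) -> float:
    """Mean per-pixel displacement magnitude of a dense flow field.

    flow: array of shape (H, W, 2) holding (dx, dy) for every pixel.
    Assumption: intensity is the average Euclidean norm of the flow vectors;
    the paper's exact definition may differ.
    """
    if flow.ndim != 3 or flow.shape[-1] != 2:
        raise ValueError("flow must have shape (H, W, 2)")
    magnitude = np.linalg.norm(flow, axis=-1)  # (H, W) displacement lengths
    return float(magnitude.mean())


def select_key_frames(intensities, window=2, threshold=0.0):
    """Keep frames that follow a locally maximal motion intensity.

    intensities[i] is the motion intensity between frame i and frame i + 1.
    window: neighbouring pairs compared on each side (hypothetical parameter).
    threshold: minimum intensity treated as a scene change (hypothetical parameter).
    Returns the indices i + 1 of the frames that follow each selected pair.
    """
    values = np.asarray(intensities, dtype=float)
    keys = []
    for i, value in enumerate(values):
        lo, hi = max(0, i - window), min(len(values), i + window + 1)
        # Intensities of the neighbouring pairs, excluding pair i itself.
        neighbourhood = np.delete(values[lo:hi], i - lo)
        is_local_max = neighbourhood.size == 0 or value > neighbourhood.max()
        if value > threshold and is_local_max:
            keys.append(i + 1)
    return keys


if __name__ == "__main__":
    # Hypothetical intensities for 8 successive frame pairs; pairs 2 and 6 move a lot.
    demo = [0.1, 0.2, 5.0, 0.2, 0.1, 0.1, 4.0, 0.3]
    print(select_key_frames(demo, window=2, threshold=1.0))  # → [3, 7]
```

Comparing each pair only with its local neighbourhood, rather than applying a single global cut-off, lets the selection react to relative scene changes, which matches the abstract's description of picking frames "with distinct scene changes in local neighborhood"; the threshold merely suppresses peaks caused by near-static segments.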
Notes
We do not distinguish between the terms unsupervised and self-supervised learning, as both refer to learning without human supervision. In this paper, however, we use the term self-supervised learning when referring to WCE video representation learning.
Acknowledgements
This work is supported in part by the Scientific Research Foundation of Chongqing University of Technology (0103210650), in part by the National Key Research and Development Program of China (Grant No. 2017YFB0802400), in part by the National Natural Science Foundation of China (61672115), in part by the Chongqing Social Undertakings and Livelihood Security Science and Technology Innovation Project Special Program (cstc2017shmsA30003), and in part by the Humanity and Social Science Youth Foundation, Ministry of Education (Grant No. 17YJCZH043). In addition, we thank Juan Zhou and her colleagues from the Second Affiliated Hospital, Third Military Medical University, for helpful discussions and suggestions. We also thank Chongqing Jinshan Science & Technology (Group) Co., Ltd., for providing the raw WCE videos. Finally, we thank the anonymous reviewers for their helpful comments, which have led to many improvements in this paper.
Ethics declarations
Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lan, L., Ye, C., Liao, C. et al. De-redundancy in wireless capsule endoscopy video sequences using correspondence matching and motion analysis. Multimed Tools Appl 83, 21171–21195 (2024). https://doi.org/10.1007/s11042-023-15530-7
DOI: https://doi.org/10.1007/s11042-023-15530-7