Electrical Engineering and Systems Science > Image and Video Processing

arXiv:2208.08315 (eess)

[Submitted on 17 Aug 2022 (v1), last revised 22 Aug 2022 (this version, v3)]

Title: Video-TransUNet: Temporally Blended Vision Transformer for CT VFSS Instance Segmentation

Authors: Chengxi Zeng, Xinyu Yang, Majid Mirmehdi, Alberto M Gambaruto, Tilo Burghardt

Abstract: We propose Video-TransUNet, a deep architecture for instance segmentation in medical CT videos constructed by integrating temporal feature blending into the TransUNet deep learning framework. In particular, our approach amalgamates strong frame representation via a ResNet CNN backbone, multi-frame feature blending via a Temporal Context Module (TCM), non-local attention via a Vision Transformer, and reconstructive capabilities for multiple targets via a UNet-based convolutional-deconvolutional architecture with multiple heads. We show that this new network design can significantly outperform other state-of-the-art systems when tested on the segmentation of bolus and pharynx/larynx in Videofluoroscopic Swallowing Study (VFSS) CT sequences. On our VFSS2022 dataset it achieves a Dice coefficient of 0.8796 and an average surface distance of 1.0379 pixels. Note that tracking the pharyngeal bolus accurately is a particularly important application in clinical practice since it constitutes the primary method for diagnostics of swallowing impairment. Our findings suggest that the proposed model can indeed enhance the TransUNet architecture by exploiting temporal information, improving segmentation performance by a significant margin. We publish key source code, network weights, and ground truth annotations for simplified performance reproduction.
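For readers unfamiliar with the Dice coefficient reported above, the following is a minimal sketch of how the overlap score is commonly computed for a pair of binary segmentation masks. It is not the authors' evaluation code: the function name `dice_coefficient`, the nested-list mask format, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal size.

    `pred` and `target` are hypothetical nested lists (rows of 0/1 pixels).
    """
    flat_pred = [bool(v) for row in pred for v in row]
    flat_target = [bool(v) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked as foreground in both masks.
    intersection = sum(p and t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)

    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: each mask has 3 foreground pixels, 2 of which overlap.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
score = dice_coefficient(pred, target)
```

A score of 1.0 means the predicted and ground-truth masks coincide exactly, and 0.0 means they share no foreground pixels. In a multi-target setting such as bolus and pharynx/larynx, a score like this would typically be computed per class and then averaged, though the exact averaging used in the paper is not stated here.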

Comments:	Accepted by International Conference on Machine Vision 2022
Subjects:	Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2208.08315 [eess.IV]
	(or arXiv:2208.08315v3 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2208.08315

Submission history

From: Chengxi Zeng
[v1] Wed, 17 Aug 2022 14:28:58 UTC (7,815 KB)
[v2] Thu, 18 Aug 2022 10:09:31 UTC (7,815 KB)
[v3] Mon, 22 Aug 2022 13:51:04 UTC (7,815 KB)
