MulTNet: A Multi-Scale Transformer Network for Marine Image Segmentation toward Fishing
Figure 1: The detailed structure diagram of MulTNet.
Figure 2: Examples of the marine animal segmentation obtained by FCN-8s, U-Net, Attention U-Net, RefineNet, ResUNet, Deeplabv3+, PSPNet, TrSeg, CKDNet, SegFormer, FAT-Net, and the proposed MulTNet.
Figure 3: Examples of different levels of blur obtained by the proposed MulTNet.
Figure 4: Examples of the marine animal segmentation obtained by FCN-8s, U-Net, Attention U-Net, RefineNet, ResUNet, Deeplabv3+, PSPNet, TrSeg, SegFormer, CKDNet, FAT-Net, and the proposed MulTNet.
Figure 5: The training losses of various segmentation networks for two datasets: (a) dataset of marine animals; (b) dataset of ISIC 2018.
Abstract
1. Introduction
2. Related Work
3. Proposed Network
3.1. Dimensionality Reduction CNN Module
3.2. Multi-Scale Transformer Module
3.3. Loss of MulTNet
4. Experiments and Discussion
4.1. Dataset Description
4.2. Model Training
4.3. Evaluation Metrics
4.4. Comparison with State-of-the-Art Methods
4.5. Ablation Experiment
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Han, F.L.; Yao, J.; Zhu, H.; Wang, C. Marine organism detection and classification from underwater vision based on the deep CNN method. Math. Probl. Eng. 2020, 2020, 3937. [Google Scholar] [CrossRef]
- Zhuang, P.; Xing, L.; Liu, Y.; Guo, S.; Qiao, Y. Marine Animal Detection and Recognition with Advanced Deep Learning Models. In CLEF; Working Note; Shenzhen Institutes of Advanced Technology Chinese Academy of Sciences: Shenzhen, China, 2017. [Google Scholar]
- Cao, Z.; Principe, J.C.; Ouyang, B.; Dalgleish, F.; Vuorenkoski, A. Marine animal classification using combined CNN and hand-designed image features. In Proceedings of the OCEANS 2015—MTS/IEEE Washington, Washington, DC, USA, 19–22 October 2015; pp. 1–6. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Xi, D.J.; Qin, Y.; Luo, J.; Pu, H.Y.; Wang, Z.W. Multipath fusion Mask R-CNN with double attention and its application into gear pitting detection. IEEE Trans. Instrum. Meas. 2021, 70, 5006011. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the MICCAI 2015: Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
- Qin, Y.; Wang, Z.; Xi, D.J. Tree CycleGAN with maximum diversity loss for image augmentation and its application into gear pitting detection. Appl. Soft Comput. 2022, 114, 108130. [Google Scholar] [CrossRef]
- Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention u-net: Learning where to look for the pancreas. arXiv 2018, preprint. arXiv:1804.03999. [Google Scholar]
- Lin, G.; Milan, A.; Shen, C.; Reid, I. Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1925–1934. [Google Scholar]
- Zhang, Z.; Liu, Q.; Wang, Y. Road extraction by deep residual u-net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753. [Google Scholar] [CrossRef]
- Jin, H.; Cao, L.; Kan, X.; Sun, W.; Yao, W.; Wang, X. Coal petrography extraction approach based on multiscale mixed-attention-based residual U-net. Meas. Sci. Technol. 2022, 33, 075402. [Google Scholar] [CrossRef]
- Wang, Z.; Guo, J.; Huang, W.; Zhang, S. High-resolution remote sensing image semantic segmentation based on a deep feature aggregation network. Meas. Sci. Technol. 2021, 32, 095003. [Google Scholar] [CrossRef]
- Sang, H.W.; Zhou, Q.H.; Zhao, Y. PCANet: Pyramid convolutional attention network for semantic segmentation. Image Vis. Comput. 2020, 103, 103997. [Google Scholar] [CrossRef]
- Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, preprint. arXiv:1511.07122. [Google Scholar]
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv 2014, preprint. arXiv:1412.7062. [Google Scholar]
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
- Jin, Q.; Cui, H.; Sun, C.; Meng, Z.; Su, R. Cascade knowledge diffusion network for skin lesion diagnosis and segmentation. Appl. Soft. Comput. 2021, 99, 106881. [Google Scholar] [CrossRef]
- Wei, Z.; Zhai, G.; Wang, Z.; Wang, W.; Ji, S. An artificial intelligence segmentation method for recognizing the free surface in a sloshing tank. Ocean Eng. 2021, 220, 108488. [Google Scholar] [CrossRef]
- Yao, H.; Duan, Q.; Li, D.; Wang, J. An improved K-means clustering algorithm for fish image segmentation. Math. Comp. Modell. 2013, 58, 790–798. [Google Scholar] [CrossRef]
- Martin-Abadal, M.; Riutort-Ozcariz, I.; Oliver-Codina, G.; Gonzalez-Cid, Y. A deep learning solution for Posidonia oceanica seafloor habitat multiclass recognition. In Proceedings of the OCEANS 2019-Marseille, Marseille, France, 17–20 June 2019; pp. 1–7. [Google Scholar]
- Martin-Abadal, M.; Guerrero-Font, E.; Bonin-Font, F.; Gonzalez-Cid, Y. Deep semantic segmentation in an AUV for online posidonia oceanica meadows identification. IEEE Access 2018, 6, 60956–60967. [Google Scholar] [CrossRef]
- Sengupta, S.; Ersbøll, B.K.; Stockmarr, A. SeaGrassDetect: A novel method for the detection of seagrass from unlabelled underwater videos. Ecol. Inform. 2020, 57, 101083. [Google Scholar] [CrossRef]
- Wang, L.; Shang, F.; Kong, D. An image processing method for an explosion field fireball based on edge recursion. Meas. Sci. Technol. 2022, 33, 095021. [Google Scholar] [CrossRef]
- Ancuti, C.O.; Ancuti, C.; De Vleeschouwer, C.; Bekaert, P. Color balance and fusion for underwater image enhancement. IEEE Trans. Image Process. 2017, 27, 379–393. [Google Scholar] [CrossRef]
- Iqbal, K.; Salam, R.A.; Osman, A.; Talib, A.Z. Underwater image enhancement using an integrated colour model. Int. J. Comput. Sci. 2007, 34, 239–244. [Google Scholar]
- Zhao, X.; Jin, T.; Qu, S. Deriving inherent optical properties from background color and underwater image enhancement. Ocean Eng. 2015, 94, 163–172. [Google Scholar] [CrossRef]
- Wang, Y.; Zhang, J.; Cao, Y.; Wang, Z. A deep CNN method for underwater image enhancement. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 1382–1386. [Google Scholar]
- Mahmmod, B.M.; Abdulhussain, S.H.; Suk, T.; Hussain, A. Fast Computation of Hahn Polynomials for High Order Moments. IEEE Access 2022, 10, 48719–48732. [Google Scholar] [CrossRef]
- Al-Utaibi, K.A.; Abdulhussain, S.H.; Mahmmod, B.M.; Naser, M.A.; Alsabah, M.; Sait, S.M. Reliable recurrence algorithm for high-order Krawtchouk polynomials. Entropy 2021, 23, 1162. [Google Scholar] [CrossRef] [PubMed]
- Skinner, K.A.; Johnson-Roberson, M. Underwater image dehazing with a light field camera. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 62–69. [Google Scholar]
- Bonin, F.; Burguera, A.; Oliver, G. Imaging systems for advanced underwater vehicles. J. Marit. Res. 2011, 8, 65–86. [Google Scholar]
- Eleftherakis, D.; Vicen-Bueno, R. Sensors to increase the security of underwater communication cables: A review of underwater monitoring sensors. Sensors 2020, 20, 737. [Google Scholar] [CrossRef] [PubMed]
- Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An underwater image enhancement benchmark dataset and beyond. IEEE Trans. Image Process. 2019, 29, 4376–4389. [Google Scholar] [CrossRef] [PubMed]
- Duarte, A.; Codevilla, F.; Gaya, J.D.O.; Botelho, S.S. A dataset to evaluate underwater image restoration methods. In Proceedings of the OCEANS 2016-Shanghai, Shanghai, China, 10–13 April 2016; pp. 1–6. [Google Scholar]
- Radolko, M.; Farhadifard, F.; Von Lukas, U.F. Dataset on underwater change detection. In Proceedings of the OCEANS 2016 MTS/IEEE Monterey, Monterey, CA, USA, 19–23 September 2016; pp. 1–8. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Polosukhin, I. Attention is all you need. Adv. Neural. Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, preprint. arXiv:1810.04805. [Google Scholar]
- Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Zettlemoyer, L. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv 2019, preprint. arXiv:1910.13461. [Google Scholar]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Amodei, D. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
- Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv 2019, preprint. arXiv:1910.10683. [Google Scholar]
- Wang, Y.; Xu, Z.; Wang, X.; Shen, C.; Cheng, B.; Shen, H.; Xia, H. End-to-end video instance segmentation with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 8741–8750. [Google Scholar]
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Zhou, Y. Transunet: Transformers make strong encoders for medical image segmentation. arXiv 2021, preprint. arXiv:2102.04306. [Google Scholar]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229. [Google Scholar]
- Beal, J.; Kim, E.; Tzeng, E.; Park, D.H.; Zhai, A.; Kislyuk, D. Toward transformer-based object detection. arXiv 2020, preprint. arXiv:2012.09958. [Google Scholar]
- Zhang, Q.; Yang, Y. ResT: An efficient transformer for visual recognition. arXiv 2021, preprint. arXiv:2105.13677. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Houlsby, N. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, preprint. arXiv:2010.11929. [Google Scholar]
- Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 10347–10357. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. arXiv 2021, preprint. arXiv:2103.14030. [Google Scholar]
- Xie, Y.; Zhang, J.; Shen, C.; Xia, Y. CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2021; pp. 171–180. [Google Scholar]
- Jin, Y.; Han, D.; Ko, H. TrSeg: Transformer for semantic segmentation. Pattern Recogn. Lett. 2021, 148, 29–35. [Google Scholar] [CrossRef]
- Wu, H.; Chen, S.; Chen, G.; Wang, W.; Lei, B.; Wen, Z. FAT-Net: Feature adaptive transformers for automated skin lesion segmentation. Med. Image Anal. 2022, 76, 102327. [Google Scholar] [CrossRef]
- Qian, Q.; Qin, Y.; Wang, Y.; Liu, F. A new deep transfer learning network based on convolutional auto-encoder for mechanical fault diagnosis. Measurement 2021, 178, 109352. [Google Scholar] [CrossRef]
- Xiang, S.; Qin, Y.; Luo, J.; Pu, H.; Tang, B. Multicellular LSTM-based deep learning model for aero-engine remaining useful life prediction. Reliab. Eng. Syst. Saf. 2021, 216, 107927. [Google Scholar] [CrossRef]
- Liu, R.; Fan, S.; Zhu, M.; Hou, M.; Luo, Z. Real-world underwater enhancement: Challenges, benchmarks, and solutions under natural light. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 4861–4875. [Google Scholar] [CrossRef]
- Xi, D.; Qin, Y.; Wang, S. YDRSNet: An integrated Yolov5-Deeplabv3+ real-time segmentation network for gear pitting measurement. J. Intell. Manuf. 2021, 1–15. [Google Scholar] [CrossRef]
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
Methodology | Year | Highlights | Limitations |
---|---|---|---|
FCN [4]: Fully convolutional network | 2015 | This was the first work to apply a fully convolutional CNN to feature extraction for semantic segmentation. | The model handles image details poorly and does not consider relationships between pixels. |
U-Net [6]: U-shaped end-to-end network with skip connections. | 2015 | This model exploits skip connections between encoder and decoder to decrease the loss of context information. | This model generally performs poorly at capturing details and explicitly modeling long-range dependencies. |
Attention U-Net [8]: Extension of the standard U-Net model with an attention mechanism. | 2018 | Suppresses irrelevant regions in the input image and highlights the salient features of specific local regions. | Attention gates (AGs) focus mainly on extracting the spatial information of the region of interest and extract less relevant local regions poorly. |
RefineNet [9]: A generic multi-path refinement network. | 2016 | It effectively exploits features from the downsampling process to achieve high-resolution prediction through long-range residual connections. | This model demands large computing resources, resulting in low training speed, and additionally requires pre-trained weights for its backbone. |
ResUNet [10]: Extension of the standard U-Net model with residual units. | 2017 | Residual units simplify the training of deep networks, and skip connections facilitate information propagation. | This model cannot establish dependencies between pixels and therefore segments blurred images poorly. |
Deeplabv3+ [16]: A segmentation network composed of atrous spatial pyramid pooling and decoder modules. | 2018 | This model can capture sharper object boundaries and extract multi-scale contextual information. Moreover, the resolution of the encoder's feature maps can be controlled arbitrarily via atrous convolution. | Its atrous convolutions can lose spatially continuous information (see the dilated-convolution sketch after this table). |
PSPNet [17]: A pyramid scene parsing network embedding context features with PPM. | 2017 | It aggregates the context information of different regions to improve global information extraction. | It takes a long time to train and handles details relatively poorly. |
CKDNet [18]: A cascade knowledge diffusion network composed of coarse-level segmentation, classification, and fine-level segmentation. | 2020 | This model adopts knowledge transfer and diffusion strategies to aggregate semantic information from different tasks and boost segmentation performance. | This model consumes substantial computing resources, resulting in low training speed. |
U-Net-FS [19]: An optimal U-Net for free surface segmentation. | 2020 | Experiments show that U-Net-FS can capture numerous types of free surfaces with a high Dice accuracy of over 0.94. | This model attends only to local information and is trained at a single scale, so it cannot handle changes in image size well. |
K-means with morphology [20]: A fish image segmentation method combining the K-means clustering algorithm with mathematical morphology. | 2013 | The traditional K-means algorithm is improved by determining the optimal number of clusters from the number of peaks in the image gray histogram. | It is sensitive to the choice of the initial value K and to outliers, which leads to poor segmentation performance. |
Transunet [43]: Extension of the standard U-Net model with a transformer. | 2021 | This model extracts abundant global context by converting image features into sequences, and exploits low-level CNN spatial information via a U-shaped architectural design. | This model uses transposed convolutional layers to restore feature maps, which often produces a checkerboard effect, i.e., discontinuous predictions among adjacent pixels. |
Swin transformer [49]: A hierarchical vision transformer with shifted windows. | 2021 | The shifted-window scheme improves efficiency by limiting self-attention computation to non-overlapping local windows. Experimental results show SOTA performance in various tasks, including classification, detection, and segmentation. | Constrained by the shifted-window operation, the model must be modified and retrained for different input sizes, which is time-consuming. |
CoTr [50]: A hybrid network combining a CNN and a transformer for 3D medical image segmentation. | 2021 | It exploits a deformable self-attention mechanism to decrease the spatial and computational complexity of building long-range relationships on multi-scale feature maps. | Since both the transformer and 3D volumetric data require a large amount of GPU memory, this method divides the data into small patches and processes them one at a time, losing features from the other patches. |
TrSeg [51]: A transformer-based segmentation architecture for multi-scale feature extraction. | 2021 | Unlike existing networks for multi-scale feature extraction, this model incorporates a transformer to generate dependencies on the original context information, which adaptively extracts multi-scale information well. | This model still has limitations in extracting low-level semantic information. |
FAT-Net [52]: Feature adaptive transformer network for skin lesion segmentation. | 2021 | This model exploits both CNN and transformer encoder branches to extract rich local features and capture important global context information. | This segmentation network still struggles when the color variation of the image is too complex and the image contrast is too low. |
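As referenced in the Deeplabv3+ row above, atrous (dilated) convolution is the mechanism several of these baselines use to gather multi-scale context without reducing spatial resolution. The block below is a minimal, generic sketch assuming PyTorch; `DilatedContextBlock` and its dilation rates (1, 6, 12, 18, the rates popularized by ASPP) are illustrative and do not reproduce any specific module from the papers above.

```python
# A generic ASPP-like block, assuming PyTorch; illustrative only.
import torch
import torch.nn as nn

class DilatedContextBlock(nn.Module):
    """Parallel 3x3 convolutions with growing dilation rates; larger rates
    see wider context while the spatial resolution stays unchanged."""
    def __init__(self, channels: int, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        # 1x1 convolution fuses the concatenated multi-scale responses.
        self.fuse = nn.Conv2d(channels * len(rates), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 64, 32, 32)
print(DilatedContextBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```

Because `padding` equals the dilation rate for a 3x3 kernel, every branch preserves the input resolution, which is exactly why atrous convolution can trade locality for context without downsampling.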
Parameter | Configuration |
---|---|
Optimizer | SGD |
Learning rate | 0.03 |
Weight decay | 0.01 |
Momentum | 0.9 |
Batch size | 6 |
Image size | 256 × 256
Activation function | GELU |
Dropout | 0.3 |
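The optimizer settings in the table translate directly into code. Below is a minimal sketch assuming PyTorch; the one-layer `model` is a stand-in for MulTNet (defined in Section 3), the 4-class output (three animal classes plus background) is an assumption, and plain cross-entropy is used here in place of the loss of Section 3.3. The GELU activation and 0.3 dropout from the table apply inside the network itself.

```python
# One training step with the configuration from the table; a sketch, not the
# authors' training script.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 4, kernel_size=1)  # placeholder for MulTNet (4 classes is an assumption)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.03,            # learning rate (table)
    momentum=0.9,       # momentum (table)
    weight_decay=0.01,  # weight decay (table)
)
criterion = nn.CrossEntropyLoss()  # assumption; see Section 3.3 for the paper's loss

# Dummy batch: batch size 6, 256 x 256 RGB inputs (table).
images = torch.randn(6, 3, 256, 256)
labels = torch.randint(0, 4, (6, 256, 256))

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```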
Model | MPA | IOU (Sea Urchin) | IOU (Sea Cucumber) | IOU (Starfish) | MIOU | Acc |
---|---|---|---|---|---|---|
FCN-8s | 36.87% | 27.41% | 1.68% | 15.99% | 35.48% | 97.05% |
U-Net | 40.91% | 29.14% | 3.45% | 26.47% | 39.01% | 96.97% |
Attention U-Net | 41.54% | 30.87% | 3.87% | 27.28% | 39.76% | 96.95% |
RefineNet | 43.14% | 32.63% | 4.02% | 31.32% | 41.21% | 96.89% |
ResUNet | 43.79% | 33.02% | 4.56% | 33.29% | 41.96% | 96.93% |
DeepLabv3+ | 44.24% | 34.21% | 5.22% | 33.23% | 42.40% | 96.96% |
CKDNet | 46.05% | 39.26% | 4.31% | 33.58% | 43.54% | 97.09% |
PSPNet | 45.38% | 37.67% | 5.62% | 34.13% | 43.62% | 97.07% |
TrSeg | 46.32% | 40.67% | 4.71% | 33.47% | 43.98% | 97.11% |
SegFormer | 46.67% | 39.73% | 5.91% | 33.51% | 44.06% | 97.10% |
FAT-Net | 47.17% | 42.97% | 5.06% | 34.35% | 44.87% | 97.16% |
MulTNet | 47.69% | 44.14% | 5.21% | 35.93% | 45.63% | 97.27% |
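For reference, the metrics reported in these tables (Acc, MPA, per-class IOU, MIOU) follow standard definitions over a pixel-level confusion matrix. The sketch below shows the conventional computation; whether the paper includes the background class in the MPA/MIOU averages is an assumption left to the reader to verify.

```python
# Standard segmentation metrics from a confusion matrix; a sketch of the
# conventional definitions, not necessarily the paper's exact implementation.
import numpy as np

def segmentation_metrics(conf: np.ndarray) -> dict:
    """conf[i, j] = number of pixels with true class i predicted as class j."""
    tp = np.diag(conf).astype(float)
    per_class_acc = tp / conf.sum(axis=1)                   # per-class pixel accuracy
    iou = tp / (conf.sum(axis=1) + conf.sum(axis=0) - tp)   # intersection / union
    return {
        "Acc":  tp.sum() / conf.sum(),  # overall pixel accuracy
        "MPA":  per_class_acc.mean(),   # mean pixel accuracy
        "IoU":  iou,                    # per-class IoU
        "MIoU": iou.mean(),             # mean IoU over classes
    }

# Toy 2-class example: 90 background pixels, 10 object pixels.
conf = np.array([[85, 5],
                 [2, 8]])
print(segmentation_metrics(conf))
```

Note how a dominant background class can push Acc high even when MIOU is low, which matches the pattern in the marine animal table above (Acc near 97% alongside MIOU below 50%).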
Model | MPA | IOU | MIOU | Acc |
---|---|---|---|---|
FCN-8s | 89.98% | 67.47% | 79.03% | 92.12% |
U-Net | 92.19% | 76.05% | 84.59% | 94.36% |
Attention U-Net | 93.05% | 76.35% | 84.76% | 94.41% |
RefineNet | 92.85% | 76.64% | 84.83% | 94.32% |
ResUNet | 93.17% | 77.27% | 85.37% | 94.64% |
DeepLabv3+ | 93.31% | 78.21% | 85.99% | 94.92% |
PSPNet | 93.08% | 79.52% | 86.77% | 95.15% |
CKDNet | 93.13% | 79.90% | 86.99% | 95.21% |
SegFormer | 93.40% | 80.19% | 87.21% | 95.32% |
TrSeg | 93.22% | 80.37% | 87.31% | 95.33% |
FAT-Net | 93.65% | 81.09% | 87.80% | 95.56% |
MulTNet | 94.04% | 81.48% | 88.09% | 95.71% |
Model | Total Params | Training Speed (s/Iteration) | Testing Speed (s/Image) |
---|---|---|---|
FCN-8s | 134 M | 0.725 | 0.291 |
RefineNet | 85 M | 0.381 | 0.148 |
CKDNet | 52 M | 0.342 | 0.135 |
PSPNet | 71 M | 0.317 | 0.124 |
SegFormer | 64 M | 0.312 | 0.127 |
TrSeg | 74 M | 0.293 | 0.113 |
ResUNet | 67 M | 0.285 | 0.110 |
DeepLabv3+ | 55 M | 0.238 | 0.096 |
Attention U-Net | 42 M | 0.203 | 0.092 |
FAT-Net | 30 M | 0.194 | 0.089 |
U-Net | 32 M | 0.167 | 0.075 |
MulTNet | 59 M | 0.183 | 0.087 |
Model | MIOU (ISIC) | MPA (ISIC) | Acc (ISIC) | MIOU (Marine Animal) | MPA (Marine Animal) | Acc (Marine Animal) |
---|---|---|---|---|---|---|
Transformer | 73.89% | 89.80% | 90.34% | 23.99% | 25.00% | 95.97% |
DRCM-Transformer | 85.36% | 92.29% | 94.53% | 40.31% | 42.33% | 96.97% |
MulTNet | 88.09% | 94.04% | 95.71% | 45.63% | 47.69% | 97.27% |
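To make the ablation rows concrete: "Transformer" denotes a plain transformer encoder on image tokens, "DRCM-Transformer" adds the dimensionality reduction CNN module of Section 3.1 in front of it, and MulTNet further adds the multi-scale transformer module of Section 3.2. The sketch below illustrates only the generic CNN-stem-to-transformer pattern, assuming PyTorch; `HybridSegSketch`, its layer counts, and its token sizes are illustrative assumptions, not the authors' architecture.

```python
# Generic CNN-stem + transformer-encoder segmentation pattern; a sketch under
# stated assumptions, not the DRCM or multi-scale transformer module itself.
import torch
import torch.nn as nn

class HybridSegSketch(nn.Module):
    def __init__(self, num_classes=4, dim=64):
        super().__init__()
        # CNN stem: 256x256x3 -> 16x16xdim, i.e. 256 tokens of width `dim`.
        self.stem = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=4, stride=4), nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=4, stride=4), nn.GELU(),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           activation="gelu", dropout=0.3,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Conv2d(dim, num_classes, kernel_size=1)

    def forward(self, x):
        f = self.stem(x)                       # (B, dim, 16, 16)
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)  # (B, 256, dim)
        tokens = self.encoder(tokens)          # global self-attention over tokens
        f = tokens.transpose(1, 2).reshape(b, c, h, w)
        logits = self.head(f)                  # coarse per-class map
        return nn.functional.interpolate(logits, scale_factor=16,
                                         mode="bilinear", align_corners=False)

out = HybridSegSketch()(torch.randn(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 4, 256, 256])
```

The ablation's message maps onto this pattern: a CNN stem in front of the transformer (the DRCM row) supplies local spatial features the plain transformer lacks, which is consistent with the large MIOU jump from the first to the second row.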
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).