Attention-Based Background/Foreground Monocular Depth Prediction Model Using Image Segmentation
Figure 1. In scenes with significant variations in depth, the multi-scale depth estimation technique SACRF [35] cannot adequately predict separate background and foreground depth maps, whereas our model improves individual foreground and background depth map prediction using an attention mechanism and image segmentation.
Figure 2. The architecture of our proposed model.
Figure 3. The foreground depth prediction model architecture.
Figure 4. Distribution map of the areas of image segments.
Figure 5. A visualization of the results from our background depth prediction model using the different attention mechanisms: AM, SE, and CBAM.
Figure 6. A visualization of the results from our foreground depth prediction model using the different attention mechanisms: AM, SE, and CBAM.
Figure 7. A visualization of the results from the different stitching strategies: direct stitching and CNN stitching.
Figure 8. A visualization of the comparison results.
Abstract
1. Introduction
- The addition of attention mechanisms to extract meaningful features to improve depth estimation;
- A segmentation technique that can segment foreground regions via a cluster method and use the segmented regions to improve foreground depth estimation;
- A new architecture that can individually predict foreground and background maps while avoiding effects from significant differences in field depth, especially in outdoor scenes.
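To make the first contribution concrete: the ablation tables below favor SE (squeeze-and-excitation) channel attention for the background branch. The recalibration step of an SE block can be sketched as follows. This is a minimal NumPy sketch, not the paper's implementation; the weight matrices `w1` and `w2` stand in for learned fully connected layers and are hypothetical placeholders.

```python
import numpy as np

def se_attention(feature_map, w1, w2):
    """SE-style channel attention over a (C, H, W) feature map.

    w1: (C/r, C) reduction weights; w2: (C, C/r) expansion weights.
    Both would be learned in a real network; here they are placeholders.
    """
    # Squeeze: global average pooling collapses each channel to a scalar.
    z = feature_map.mean(axis=(1, 2))                 # shape (C,)
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid.
    s = np.maximum(w1 @ z, 0.0)                       # shape (C/r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))               # shape (C,), in (0, 1)
    # Recalibrate: scale each channel by its attention weight.
    return feature_map * s[:, None, None]
```

The sigmoid gate keeps every channel weight in (0, 1), so the block can only attenuate channels relative to one another, never amplify them unboundedly.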
2. Related Works
2.1. Supervised Learning
2.2. Unsupervised Learning
2.3. Joint Learning
2.4. Depth Estimation Based on Attention Mechanisms
3. Method
3.1. Attention-Based Feature Extraction
3.2. Background Depth Prediction
3.3. Foreground Segmentation
3.4. Foreground Depth Prediction
3.5. Depth Map Stitching
4. Experiments
4.1. Dataset
4.2. Implementation Details
4.2.1. Data Augmentation
4.2.2. Training Settings
4.2.3. Foreground and Background Region Classification
4.2.4. Configuration of Attention Mechanisms
4.3. Evaluation Metrics
4.4. Ablation Experiment
4.4.1. Attention Mechanism Selection
4.4.2. Depth Map Stitching
4.5. Performance Comparison
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Liu, Y.; Jiang, J.; Sun, J.; Bai, L.; Wang, Q. A Survey of Depth Estimation Based on Computer Vision. In Proceedings of the IEEE 5th International Conference on Data Science and Cyberspace, Hong Kong, China, 27–29 July 2020; pp. 135–141.
2. Eigen, D.; Puhrsch, C.; Fergus, R. Depth Map Prediction from a Single Image Using a Multi-Scale Deep Network. In Proceedings of the Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2366–2374.
3. Eigen, D.; Fergus, R. Predicting Depth, Surface Normals and Semantic Labels With a Common Multi-Scale Convolutional Architecture. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2650–2658.
4. Hua, Y.; Tian, H. Depth estimation with convolutional conditional random field network. Neurocomputing 2016, 214, 546–554.
5. Laina, I.; Rupprecht, C.; Belagiannis, V.; Tombari, F.; Navab, N. Deeper Depth Prediction with Fully Convolutional Residual Networks. In Proceedings of the 4th International Conference on 3D Vision, Stanford, CA, USA, 25–28 October 2016; pp. 239–248.
6. Hu, J.; Ozay, M.; Zhang, Y.; Okatani, T. Revisiting Single Image Depth Estimation: Toward Higher Resolution Maps With Accurate Object Boundaries. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Waikoloa Village, HI, USA, 7–11 January 2019; pp. 1043–1051.
7. Fu, H.; Gong, M.; Wang, C.; Batmanghelich, K.; Tao, D. Deep Ordinal Regression Network for Monocular Depth Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2002–2011.
8. Li, B.; Shen, C.; Dai, Y.; van den Hengel, A.; He, M. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1119–1127.
9. Mousavian, A.; Pirsiavash, H.; Košecká, J. Joint Semantic Segmentation and Depth Estimation with Deep Convolutional Networks. In Proceedings of the 4th International Conference on 3D Vision, Stanford, CA, USA, 25–28 October 2016; pp. 611–619.
10. Lee, J.H.; Heo, M.; Kim, K.R.; Kim, C.S. Single-Image Depth Estimation Based on Fourier Domain Analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 330–339.
11. Cheng, X.; Wang, P.; Yang, R. Depth Estimation via Affinity Learned with Convolutional Spatial Propagation Network. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 108–125.
12. Lee, J.H.; Han, M.K.; Ko, D.W.; Suh, I.H. From big to small: Multi-scale local planar guidance for monocular depth estimation. arXiv 2019, arXiv:1907.10326.
13. Zuo, Y.; Fang, Y.; Yang, Y.; Shang, X.; Wu, Q. Depth Map Enhancement by Revisiting Multi-Scale Intensity Guidance Within Coarse-to-Fine Stages. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 4676–4687.
14. Song, M.; Lim, S.; Kim, W. Monocular Depth Estimation Using Laplacian Pyramid-Based Depth Residuals. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 4381–4393.
15. Bian, J.; Li, Z.; Wang, N.; Zhan, H.; Shen, C.; Cheng, M.M.; Reid, I. Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video. In Proceedings of the Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 35–45.
16. Godard, C.; Mac Aodha, O.; Brostow, G.J. Unsupervised Monocular Depth Estimation With Left-Right Consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6602–6611.
17. Garg, R.; Bg, V.K.; Carneiro, G.; Reid, I. Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 740–756.
18. Luo, Y.; Ren, J.; Lin, M.; Pang, J.; Sun, W.; Li, H.; Lin, L. Single View Stereo Matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 155–163.
19. Watson, J.; Firman, M.; Brostow, G.J.; Turmukhambetov, D. Self-Supervised Monocular Depth Hints. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 2162–2171.
20. Casser, V.; Pirk, S.; Mahjourian, R.; Angelova, A. Depth Prediction without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos. AAAI Conf. Artif. Intell. 2019, 33, 8001–8008.
21. Poggi, M.; Tosi, F.; Mattoccia, S. Learning Monocular Depth Estimation with Unsupervised Trinocular Assumptions. In Proceedings of the 6th International Conference on 3D Vision, Verona, Italy, 5–8 September 2018; pp. 324–333.
22. Zhou, T.; Brown, M.; Snavely, N.; Lowe, D.G. Unsupervised Learning of Depth and Ego-Motion from Video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6612–6619.
23. Godard, C.; Mac Aodha, O.; Firman, M.; Brostow, G.J. Digging Into Self-Supervised Monocular Depth Estimation. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 3827–3837.
24. Guizilini, V.; Ambrus, R.; Pillai, S.; Raventos, A.; Gaidon, A. 3D Packing for Self-Supervised Monocular Depth Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 2482–2491.
25. Chen, L.; Yang, Z.; Ma, J.; Luo, Z. Driving Scene Perception Network: Real-Time Joint Detection, Depth Estimation and Semantic Segmentation. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, NV, USA, 12–15 March 2018; pp. 1283–1291.
26. Nekrasov, V.; Dharmasiri, T.; Spek, A.; Drummond, T.; Shen, C.; Reid, I. Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations. In Proceedings of the International Conference on Robotics and Automation, Montreal, QC, Canada, 20–24 May 2019; pp. 7101–7107.
27. Chen, P.Y.; Liu, A.H.; Liu, Y.C.; Wang, Y.C.F. Towards Scene Understanding: Unsupervised Monocular Depth Estimation With Semantic-Aware Representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019.
28. Wang, P.; Shen, X.; Lin, Z.; Cohen, S.; Price, B.; Yuille, A. Towards unified depth and semantic prediction from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2800–2809.
29. Zhang, Z.; Cui, Z.; Xu, C.; Yan, Y.; Sebe, N.; Yang, J. Pattern-Affinitive Propagation Across Depth, Surface Normal and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4101–4110.
30. He, L.; Lu, J.; Wang, G.; Song, S.; Zhou, J. SOSD-Net: Joint semantic object segmentation and depth estimation from monocular images. Neurocomputing 2021, 440, 251–263.
31. Zhu, S.; Brazil, G.; Liu, X. The Edge of Depth: Explicit Constraints Between Segmentation and Depth. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
32. Liu, J.; Wang, Y.; Li, Y.; Fu, J.; Li, J.; Lu, H. Collaborative Deconvolutional Neural Networks for Joint Depth Estimation and Semantic Segmentation. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 5655–5666.
33. Zhang, Z.; Cui, Z.; Xu, C.; Jie, Z.; Li, X.; Yang, J. Joint Task-Recursive Learning for Semantic Segmentation and Depth Estimation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018.
34. Xu, D.; Ricci, E.; Ouyang, W.; Wang, X.; Sebe, N. Multi-Scale Continuous CRFs as Sequential Deep Networks for Monocular Depth Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 161–169.
35. Xu, D.; Wang, W.; Tang, H.; Liu, H.; Sebe, N.; Ricci, E. Structured Attention Guided Convolutional Neural Fields for Monocular Depth Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3917–3925.
36. Lee, S.; Lee, J.; Kim, B.; Yi, E.; Kim, J. Patch-wise attention network for monocular depth estimation. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; Volume 35, pp. 1873–1881.
37. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.
38. Saxena, A.; Sun, M.; Ng, A.Y. Make3D: Learning 3D Scene Structure from a Single Still Image. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 824–840.
39. Zhao, C.; Sun, Q.; Zhang, C.; Tang, Y.; Qian, F. Monocular depth estimation based on deep learning: An overview. Sci. China Technol. Sci. 2020, 63, 1612–1627.
40. Yin, W.; Liu, Y.; Shen, C.; Yan, Y. Enforcing Geometric Constraints of Virtual Normal for Depth Prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5683–5692.
41. Kuznietsov, Y.; Stückler, J.; Leibe, B. Semi-Supervised Deep Learning for Monocular Depth Map Prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2215–2223.
42. Guo, M.; Xu, T.; Liu, J.; Liu, Z.; Jiang, P.; Mu, T.; Zhang, S.; Martin, R.R.; Cheng, M.; Hu, S. Attention mechanisms in computer vision: A survey. arXiv 2021, arXiv:2111.07624.
43. Liu, S.; Johns, E.; Davison, A.J. End-To-End Multi-Task Learning With Attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1871–1880.
44. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023.
45. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 3–19.
46. Yan, J.; Zhao, H.; Bu, P.; Jin, Y. Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation. In Proceedings of the International Conference on 3D Vision, Prague, Czech Republic, 12–15 September 2021; pp. 464–473.
47. Yang, G.; Rota, P.; Alameda-Pineda, X.; Xu, D.; Ding, M.; Ricci, E. Variational Structured Attention Networks for Deep Visual Representation Learning. IEEE Trans. Image Process. 2022. Early Access.
48. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
49. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
50. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 11531–11539.
51. Zhang, H.; Dana, K.; Shi, J.; Zhang, Z.; Wang, X.; Tyagi, A.; Agrawal, A. Context Encoding for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7151–7160.
52. Li, X.; Wang, W.; Hu, X.; Yang, J. Selective Kernel Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 510–519.
53. Gao, Z.; Xie, J.; Wang, Q.; Li, P. Global Second-Order Pooling Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3019–3028.
54. Lee, H.; Kim, H.E.; Nam, H. SRM: A Style-Based Recalibration Module for Convolutional Neural Networks. In Proceedings of the International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 1854–1862.
55. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229.
56. Yuan, Y.; Huang, L.; Guo, J.; Zhang, C.; Chen, X.; Wang, J. OCNet: Object context network for scene parsing. arXiv 2018, arXiv:1809.00916.
57. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-Local Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803.
58. Xie, S.; Liu, S.; Chen, Z.; Tu, Z. Attentional ShapeContextNet for Point Cloud Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4606–4615.
59. Yan, X.; Zheng, C.; Li, Z.; Wang, S.; Cui, S. PointASNL: Robust Point Clouds Processing Using Nonlocal Neural Networks With Adaptive Sampling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 5588–5597.
60. Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 568–578.
61. Chen, L.; Zhang, H.; Xiao, J.; Nie, L.; Shao, J.; Liu, W.; Chua, T.S. SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6298–6306.
62. Park, J.; Woo, S.; Lee, J.Y.; Kweon, I.S. BAM: Bottleneck attention module. arXiv 2018, arXiv:1807.06514.
63. Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Li, C.; Zhang, H.; Wang, X.; Tang, X. Residual Attention Network for Image Classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6450–6458.
64. Liu, J.J.; Hou, Q.; Cheng, M.M.; Wang, C.; Feng, J. Improving Convolutional Networks With Self-Calibrated Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10093–10102.
65. Misra, D.; Nalamada, T.; Arasanipalai, A.U.; Hou, Q. Rotate to Attend: Convolutional Triplet Attention Module. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 3139–3148.
66. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3141–3149.
67. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings of the Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 91–99.
68. Wang, X.; Yin, W.; Kong, T.; Jiang, Y.; Li, L.; Shen, C. Task-Aware Monocular Depth Estimation for 3D Object Detection. AAAI Conf. Artif. Intell. 2020, 34, 12257–12264.
69. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
Attention Mechanism | δ < 1.25 | δ < 1.25² | δ < 1.25³ | Abs Rel | Sq Rel | RMSE | RMSE log |
---|---|---|---|---|---|---|---|
No Attention Mechanism | 0.949 | 0.991 | 0.998 | 0.063 | 0.289 | 2.986 | 0.102 |
AM | 0.948 | 0.991 | 0.998 | 0.062 | 0.291 | 3.102 | 0.102 |
SE | 0.951 | 0.992 | 0.998 | 0.062 | 0.278 | 2.977 | 0.101 |
CBAM | 0.922 | 0.984 | 0.996 | 0.073 | 0.368 | 3.434 | 0.122 |
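The columns in these tables are the standard monocular depth metrics: threshold accuracies (the fraction of pixels whose prediction/ground-truth ratio is within 1.25, 1.25², and 1.25³), absolute and squared relative error, RMSE, and RMSE of log depth. A minimal sketch of how they are computed, assuming NumPy arrays of positive depth values (no claim to match the paper's exact evaluation code):

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard monocular depth evaluation metrics over valid pixels."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    # Threshold accuracy: max(pred/gt, gt/pred) under 1.25^k.
    ratio = np.maximum(pred / gt, gt / pred)
    d1 = float(np.mean(ratio < 1.25))
    d2 = float(np.mean(ratio < 1.25 ** 2))
    d3 = float(np.mean(ratio < 1.25 ** 3))
    # Relative and root-mean-square errors.
    abs_rel = float(np.mean(np.abs(pred - gt) / gt))
    sq_rel = float(np.mean((pred - gt) ** 2 / gt))
    rmse = float(np.sqrt(np.mean((pred - gt) ** 2)))
    rmse_log = float(np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2)))
    return {"d1": d1, "d2": d2, "d3": d3, "abs_rel": abs_rel,
            "sq_rel": sq_rel, "rmse": rmse, "rmse_log": rmse_log}
```

Higher is better for the three threshold accuracies; lower is better for the four error columns.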
Attention Mechanism | δ < 1.25 | δ < 1.25² | δ < 1.25³ | Abs Rel | Sq Rel | RMSE | RMSE log |
---|---|---|---|---|---|---|---|
No Attention Mechanism | 0.947 | 0.991 | 0.996 | 0.069 | 0.105 | 0.860 | 0.097 |
AM | 0.969 | 0.993 | 0.996 | 0.057 | 0.080 | 0.797 | 0.086 |
SE | 0.967 | 0.993 | 0.996 | 0.059 | 0.088 | 0.810 | 0.086 |
CBAM | 0.965 | 0.993 | 0.996 | 0.058 | 0.087 | 0.808 | 0.086 |
Stitching Strategy | δ < 1.25 | δ < 1.25² | δ < 1.25³ | Abs Rel | Sq Rel | RMSE | RMSE log |
---|---|---|---|---|---|---|---|
Direct Stitching | 0.963 | 0.995 | 0.999 | 0.058 | 0.198 | 2.395 | 0.089 |
CNN Stitching | 0.963 | 0.995 | 0.999 | 0.058 | 0.198 | 2.400 | 0.090 |
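Direct stitching, which this ablation shows performs on par with CNN stitching, amounts to compositing the two predicted depth maps with the foreground segmentation mask. A minimal sketch (function and variable names are illustrative, not taken from the paper's code):

```python
import numpy as np

def direct_stitch(bg_depth, fg_depth, fg_mask):
    """Composite two depth maps: foreground depth where the segmentation
    mask is True, background depth everywhere else.

    bg_depth, fg_depth: (H, W) float arrays; fg_mask: (H, W) bool array.
    """
    return np.where(fg_mask, fg_depth, bg_depth)
```

Because the composite is a pure per-pixel selection, direct stitching adds no learned parameters, which makes its near-identical score against CNN stitching an argument for the simpler strategy.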
Method | δ < 1.25 | δ < 1.25² | δ < 1.25³ | Abs Rel | Sq Rel | RMSE | RMSE log |
---|---|---|---|---|---|---|---|
LAPD [14] | 0.968 | 0.995 | 0.999 | 0.056 | 0.182 | 2.281 | 0.087 |
Eigen [2] | 0.652 | 0.880 | 0.952 | 0.221 | 1.624 | 6.248 | 0.293 |
DORN [7] | 0.935 | 0.983 | 0.993 | 0.085 | 0.354 | 2.852 | 0.129 |
VNL [40] | 0.938 | 0.989 | 0.997 | 0.068 | 0.301 | 3.296 | 0.116 |
SACRF [35] | 0.812 | 0.940 | 0.979 | 0.146 | 1.013 | 4.774 | 0.213 |
BTS [12] | 0.961 | 0.994 | 0.999 | 0.055 | 0.218 | 2.623 | 0.092 |
VISTA [47] | 0.964 | 0.995 | 0.999 | 0.055 | 0.190 | 2.357 | 0.089 |
Ours | 0.969 | 0.996 | 0.999 | 0.054 | 0.173 | 2.256 | 0.085 |
Method | δ < 1.25 | δ < 1.25² | δ < 1.25³ | Abs Rel | Sq Rel | RMSE | RMSE log |
---|---|---|---|---|---|---|---|
LAPD [14] | 0.962 | 0.994 | 0.999 | 0.059 | 0.212 | 2.446 | 0.091 |
Eigen [2] | 0.688 | 0.920 | 0.964 | 0.193 | 1.371 | 5.976 | 0.265 |
DORN [7] | 0.938 | 0.986 | 0.995 | 0.080 | 0.334 | 2.920 | 0.120 |
VNL [40] | 0.937 | 0.989 | 0.997 | 0.072 | 0.319 | 3.384 | 0.118 |
SACRF [35] | 0.827 | 0.951 | 0.984 | 0.132 | 0.896 | 4.717 | 0.195 |
BTS [12] | 0.956 | 0.993 | 0.998 | 0.059 | 0.241 | 2.756 | 0.096 |
VISTA [47] | 0.959 | 0.993 | 0.999 | 0.059 | 0.212 | 2.462 | 0.092 |
Ours | 0.963 | 0.995 | 0.999 | 0.058 | 0.198 | 2.395 | 0.089 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Chiang, T.-H.; Chiang, M.-H.; Tsai, M.-H.; Chang, C.-C. Attention-Based Background/Foreground Monocular Depth Prediction Model Using Image Segmentation. Appl. Sci. 2022, 12, 11186. https://doi.org/10.3390/app122111186