Abstract
Recently, leveraging the development of end-to-end convolutional neural networks, deep stereo matching networks have achieved remarkable performance, far exceeding traditional approaches. However, state-of-the-art stereo frameworks still have difficulty finding correct correspondences in texture-less regions, on detailed structures and small objects, and near object boundaries, which could be alleviated by geometric cues such as edge contours and their corresponding constraints. To improve the quality of disparity estimates in these challenging areas, we propose EdgeStereo, an effective multi-task learning network composed of a disparity estimation branch and an edge detection branch, which enables end-to-end prediction of both the disparity map and the edge map. To incorporate edge cues effectively, we propose an edge-aware smoothness loss and edge feature embedding for inter-task interactions. We demonstrate that, within our unified model, the edge detection task and the stereo matching task promote each other. In addition, we design a compact module called the residual pyramid, which replaces the multi-stage cascaded structures or 3-D convolution-based regularization modules commonly used in current stereo matching networks. At the time of submission, EdgeStereo achieves state-of-the-art performance on the FlyingThings3D dataset and the KITTI 2012 and KITTI 2015 stereo benchmarks, outperforming other published stereo matching methods by a noteworthy margin. EdgeStereo also achieves competitive generalization performance for disparity estimation because of the incorporation of edge cues.
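For intuition, the sketch below shows one common form that an edge-aware smoothness term can take in PyTorch: disparity gradients are penalized, but the penalty is attenuated where the edge branch responds strongly, so disparity discontinuities are encouraged to align with detected edges. This is a minimal illustration under our own assumptions; the function name, the `beta` weighting, and the exact use of edge-map gradients are illustrative, not necessarily the paper's exact formulation.

```python
import torch

def edge_aware_smoothness_loss(disparity, edge_map, beta=1.0):
    """Illustrative edge-aware smoothness term (not the paper's exact loss).

    disparity: (N, 1, H, W) predicted disparity map
    edge_map:  (N, 1, H, W) edge probability from the edge branch
    beta:      assumed scalar controlling how strongly edges relax smoothness
    """
    # First-order disparity gradients along x and y (finite differences).
    d_dx = torch.abs(disparity[:, :, :, 1:] - disparity[:, :, :, :-1])
    d_dy = torch.abs(disparity[:, :, 1:, :] - disparity[:, :, :-1, :])

    # Edge-map gradients: large values mark likely object boundaries.
    e_dx = torch.abs(edge_map[:, :, :, 1:] - edge_map[:, :, :, :-1])
    e_dy = torch.abs(edge_map[:, :, 1:, :] - edge_map[:, :, :-1, :])

    # Down-weight the smoothness penalty across edges: exp(-beta * |dE|)
    # approaches zero at strong edges, permitting disparity jumps there.
    loss_x = d_dx * torch.exp(-beta * e_dx)
    loss_y = d_dy * torch.exp(-beta * e_dy)
    return loss_x.mean() + loss_y.mean()
```

Intuitively, in flat image regions the exponential weight stays near one and the loss enforces smooth disparity, while at strong edge responses the weight vanishes and the regularizer steps aside, which is the inter-task interaction the loss is designed to exploit.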
Notes
The validation image indices are 3, 15, 33, 34, 36, 45, 59, 60, 69, 71, 72, 80, 85, 88, 104, 108, 115, 146, 149, 150, 159, 161, 162, 163, 170, 172, 173, 175, 178, 179, 181, 185, 187, 188.
Baby1_06, Baby2_06, Barn1_01, Barn2_01, Bull_01, Cloth1_06, Poster_01, Sawtooth_01, Venus_01, Wood1_06
Acknowledgements
This research was supported in part by funding from NSFC programs (61673269, U1764264, 61273285) and in part by project funding from the Institute of Medical Robotics, Shanghai Jiao Tong University.
Additional information
Communicated by C.V. Jawahar, Hongdong Li, Greg Mori, Konrad Schindler.
Cite this article
Song, X., Zhao, X., Fang, L. et al. EdgeStereo: An Effective Multi-task Learning Network for Stereo Matching and Edge Detection. Int J Comput Vis 128, 910–930 (2020). https://doi.org/10.1007/s11263-019-01287-w