
EdgeStereo: An Effective Multi-task Learning Network for Stereo Matching and Edge Detection

  • Published in: International Journal of Computer Vision

Abstract

Leveraging the development of end-to-end convolutional neural networks, deep stereo matching networks have recently achieved performance far exceeding that of traditional approaches. However, state-of-the-art stereo frameworks still have difficulty finding correct correspondences in texture-less regions, around detailed structures and small objects, and near boundaries; these failures can be alleviated by geometric cues such as edge contours and their corresponding constraints. To improve the quality of disparity estimates in these challenging areas, we propose an effective multi-task learning network, EdgeStereo, composed of a disparity estimation branch and an edge detection branch, which enables end-to-end prediction of both the disparity map and the edge map. To incorporate edge cues effectively, we propose an edge-aware smoothness loss and edge feature embedding for inter-task interaction. We demonstrate that, within our unified model, the edge detection task and the stereo matching task promote each other. In addition, we design a compact module called the residual pyramid to replace the multi-stage cascaded structures or 3-D-convolution-based regularization modules commonly used in current stereo matching networks. At the time of submission, EdgeStereo achieves state-of-the-art performance on the FlyingThings3D dataset and the KITTI 2012 and KITTI 2015 stereo benchmarks, outperforming other published stereo matching methods by a noteworthy margin. EdgeStereo also achieves comparable generalization performance for disparity estimation thanks to the incorporation of edge cues.
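The abstract mentions an edge-aware smoothness loss without giving its formula on this page. The sketch below shows the generic form such losses commonly take: disparity-gradient penalties are down-weighted where the edge branch predicts a boundary, so depth discontinuities at object contours are not over-smoothed. The function name, the exponential weighting, and the `alpha` parameter are illustrative assumptions, not the authors' definitions.

```python
import numpy as np

def edge_aware_smoothness(disp, edge_prob, alpha=10.0):
    """Penalize disparity gradients, but less so across likely edges.

    disp      : (H, W) predicted disparity map
    edge_prob : (H, W) edge probability in [0, 1] from an edge branch
    alpha     : assumed weighting strength (illustrative, not from the paper)
    """
    # First-order disparity gradients along x and y.
    grad_x = np.abs(np.diff(disp, axis=1))  # shape (H, W-1)
    grad_y = np.abs(np.diff(disp, axis=0))  # shape (H-1, W)
    # Edge-aware weights: near 1 in flat regions, near 0 on strong edges,
    # so the loss tolerates disparity jumps that coincide with contours.
    w_x = np.exp(-alpha * edge_prob[:, :-1])
    w_y = np.exp(-alpha * edge_prob[:-1, :])
    return float((grad_x * w_x).mean() + (grad_y * w_y).mean())
```

A constant disparity map yields zero loss, and a disparity step that coincides with a predicted edge is penalized far less than the same step in a region the edge branch marks as flat.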


Figures 1–7 appear in the full article; their captions are not preserved on this page.


Notes

  1. The validation image indexes are 3, 15, 33, 34, 36, 45, 59, 60, 69, 71, 72, 80, 85, 88, 104, 108, 115, 146, 149, 150, 159, 161, 162, 163, 170, 172, 173, 175, 178, 179, 181, 185, 187, 188.

  2. Baby1_06, Baby2_06, Barn1_01, Barn2_01, Bull_01, Cloth1_06, Poster_01, Sawtooth_01, Venus_01, Wood1_06
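The validation indexes in Note 1 can be turned into a train/validation split. The snippet below is a hypothetical sketch that assumes a 200-image training set (e.g. KITTI 2015's 200 stereo pairs) and zero-based indexing; neither assumption is confirmed on this page, so adjust `n_total` if the indexes refer to a different dataset.

```python
# Validation indexes copied verbatim from Note 1.
VAL_IDS = {3, 15, 33, 34, 36, 45, 59, 60, 69, 71, 72, 80, 85, 88,
           104, 108, 115, 146, 149, 150, 159, 161, 162, 163, 170,
           172, 173, 175, 178, 179, 181, 185, 187, 188}

def make_split(n_total=200):
    """Partition image indexes into disjoint train/validation lists.

    n_total=200 is an assumption (the size of the KITTI 2015 training
    set); it is not stated in the note itself.
    """
    train = [i for i in range(n_total) if i not in VAL_IDS]
    val = sorted(VAL_IDS)
    return train, val
```

With the assumed 200-image set this leaves 166 images for training and 34 for validation.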


Acknowledgements

This research has been supported in part by funding from NSFC programs (61673269, U1764264, 61273285) and in part by project funding from the Institute of Medical Robotics, Shanghai Jiao Tong University.

Author information


Corresponding author

Correspondence to Xu Zhao.

Additional information

Communicated by C.V. Jawahar, Hongdong Li, Greg Mori, Konrad Schindler.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Song, X., Zhao, X., Fang, L. et al. EdgeStereo: An Effective Multi-task Learning Network for Stereo Matching and Edge Detection. Int J Comput Vis 128, 910–930 (2020). https://doi.org/10.1007/s11263-019-01287-w

