Abstract
Recently, leveraging the development of end-to-end convolutional neural networks, deep stereo matching networks have achieved remarkable performance, far exceeding traditional approaches. However, state-of-the-art stereo frameworks still have difficulty finding correct correspondences in texture-less regions, on detailed structures and small objects, and near object boundaries, which could be alleviated by geometric cues such as edge contours and their corresponding constraints. To improve the quality of disparity estimates in these challenging areas, we propose EdgeStereo, an effective multi-task learning network composed of a disparity estimation branch and an edge detection branch, which enables end-to-end prediction of both the disparity map and the edge map. To incorporate edge cues effectively, we propose an edge-aware smoothness loss and edge feature embedding for inter-task interactions. We demonstrate that, within our unified model, the edge detection task and the stereo matching task promote each other. In addition, we design a compact module called the residual pyramid, which replaces the multi-stage cascaded structures or 3-D convolution-based regularization modules commonly used in current stereo matching networks. At the time of submission, EdgeStereo achieves state-of-the-art performance on the FlyingThings3D dataset and the KITTI 2012 and KITTI 2015 stereo benchmarks, outperforming other published stereo matching methods by a noteworthy margin. EdgeStereo also achieves competitive generalization performance for disparity estimation because of the incorporation of edge cues.
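For intuition, the sketch below shows one common form that an edge-aware smoothness term can take in PyTorch: disparity gradients are penalized, but the penalty is attenuated where the edge branch responds strongly, so disparity discontinuities are encouraged to align with detected edges. This is a minimal illustration under our own assumptions; the function name, the `beta` weighting, and the exact use of edge-map gradients are illustrative, not necessarily the paper's exact formulation.

```python
import torch

def edge_aware_smoothness_loss(disparity, edge_map, beta=1.0):
    """Illustrative edge-aware smoothness term (not the paper's exact loss).

    disparity: (N, 1, H, W) predicted disparity map
    edge_map:  (N, 1, H, W) edge probability from the edge branch
    beta:      assumed scalar controlling how strongly edges relax smoothness
    """
    # First-order disparity gradients along x and y (finite differences).
    d_dx = torch.abs(disparity[:, :, :, 1:] - disparity[:, :, :, :-1])
    d_dy = torch.abs(disparity[:, :, 1:, :] - disparity[:, :, :-1, :])

    # Edge-map gradients: large values mark likely object boundaries.
    e_dx = torch.abs(edge_map[:, :, :, 1:] - edge_map[:, :, :, :-1])
    e_dy = torch.abs(edge_map[:, :, 1:, :] - edge_map[:, :, :-1, :])

    # Down-weight the smoothness penalty across edges: exp(-beta * |dE|)
    # approaches zero at strong edges, permitting disparity jumps there.
    loss_x = d_dx * torch.exp(-beta * e_dx)
    loss_y = d_dy * torch.exp(-beta * e_dy)
    return loss_x.mean() + loss_y.mean()
```

Intuitively, in flat image regions the exponential weight stays near one and the loss enforces smooth disparity, while at strong edge responses the weight vanishes and the regularizer steps aside, which is the inter-task interaction the loss is designed to exploit.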
Notes
The validation image indices are 3, 15, 33, 34, 36, 45, 59, 60, 69, 71, 72, 80, 85, 88, 104, 108, 115, 146, 149, 150, 159, 161, 162, 163, 170, 172, 173, 175, 178, 179, 181, 185, 187, 188.
Baby1_06, Baby2_06, Barn1_01, Barn2_01, Bull_01, Cloth1_06, Poster_01, Sawtooth_01, Venus_01, Wood1_06
Acknowledgements
This research was supported in part by funding from NSFC programs (61673269, U1764264, 61273285) and in part by project funding from the Institute of Medical Robotics, Shanghai Jiao Tong University.
Additional information
Communicated by C.V. Jawahar, Hongdong Li, Greg Mori, Konrad Schindler.
Cite this article
Song, X., Zhao, X., Fang, L. et al. EdgeStereo: An Effective Multi-task Learning Network for Stereo Matching and Edge Detection. Int J Comput Vis 128, 910–930 (2020). https://doi.org/10.1007/s11263-019-01287-w