Self-Supervised Stereo Matching Method Based on SRWP and PCAM for Urban Satellite Images
Figure 1. The training and testing process of the proposed self-supervised stereo matching method based on SRWP and PCAM.
Figure 2. Outline of the proposed pre-matching method based on superpixel and random walk.
Figure 3. Comparison of pixel cost and superpixel cost.
Figure 4. Results of different superpixel segmentation cases: (a) optical image; (b) accurate segmentation; (c) under-segmentation; (d) over-segmentation.
Figure 5. Architecture overview of the proposed stereo matching network based on the parallax-channel attention mechanism.
Figure 6. An illustration of parallax attention.
Figure 7. An illustration of channel attention.
Figure 8. Pre-matching results with different methods: (a) results of SIFT; (b) results of Harris; (c) results of LLT; (d) results of FD; (e) results of the proposed pre-matching method. The top and bottom rows are the right and left views, respectively.
Figure 9. Stereo matching results with different methods on Dataset A: (a) optical satellite stereo pairs; (b) ground-truth disparity; (c) results of the proposed method; (d) results of FCVFSM; (e) results of SGBM; (f) results of SGM; (g) results of CGN; (h) results of BGNet; (i) results of PSM.
Figure 10. Stereo matching results with different methods on Dataset B: (a) optical satellite stereo pairs; (b) mask of the left image; (c) ground-truth disparity; (d) results of the proposed method; (e) results of FCVFSM; (f) results of SGBM; (g) results of SGM; (h) results of CGN; (i) results of BGNet; (j) results of PSM.
Figure 11. 3D reconstruction results with different methods on Dataset C: (a) optical satellite stereo pairs; (b) results of SGBM; (c) results of SGM; (d) results of FCVFSM; (e) results of CGN; (f) results of BGNet; (g) results of PSM; (h) results of the proposed method.
Abstract
1. Introduction
- Traditional matching methods do not need a large number of training samples, which makes them faster and less demanding of computational resources. They can obtain high-confidence matching points in local image regions using simple, hand-crafted features. In urban scenes, however, satellite images present more complex situations than other scenes do, and traditional methods are less accurate than CNN-based methods: they describe the matching cost only with conventional features such as gradient, census, and the scale-invariant feature transform (SIFT), which capture limited feature dimensions and make it difficult to achieve good results on satellite images [4].
- Convolutional neural networks can extract deep features to find more accurate stereo correspondences, which is advantageous when processing large volumes of remote sensing data. However, CNN training requires a large number of samples with ground-truth labels, which are difficult to obtain for stereo matching of satellite data [5]. Directly applying a network trained on other datasets to real satellite stereo images yields poor results [6].
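To make the cost construction concrete, the following is a minimal sketch of one conventional cost mentioned above: the census transform paired with a Hamming-distance matching cost. This is an illustrative implementation, not the authors' code; the window size and function names are assumptions.

```python
import numpy as np

def census_transform(img, window=3):
    """Census transform: describe each pixel by comparing it with its
    neighbours in a small window (bit = 1 where neighbour < centre)."""
    r = window // 2
    h, w = img.shape
    padded = np.pad(img, r, mode="edge")
    bits = []
    for dy in range(window):
        for dx in range(window):
            if dy == r and dx == r:
                continue  # skip the centre pixel itself
            bits.append(padded[dy:dy + h, dx:dx + w] < img)
    return np.stack(bits, axis=-1)  # (h, w, window*window - 1) booleans

def hamming_cost(census_left, census_right_shifted):
    """Matching cost = Hamming distance between census descriptors,
    computed per pixel for one candidate disparity shift."""
    return np.sum(census_left != census_right_shifted, axis=-1)
```

Because the census descriptor depends only on intensity orderings, the resulting cost is robust to the radiometric differences common between satellite views, which is one reason it remains a standard hand-crafted baseline.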
- (1) A pre-matching method based on superpixel random walk (SRWP) is proposed. Occlusion and disparity discontinuities in stereo images are handled by constructing disparity consistency and mutation constraints, and the matching cost is updated through superpixel segmentation and random walk to ensure reliable disparities in weakly textured and disparity-discontinuous regions. The method is robust to appearance differences and occlusion between images captured from different viewing angles.
- (2) A stereo matching network based on a parallax-channel attention mechanism (PCAM) is proposed. A feature enhancement module addresses the problem of self-supervised training with sparse samples, and parallax-channel attention captures the correspondence between stereo image pairs, yielding better stereo matching results in complex urban scenes.
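The cost update in contribution (1) can be illustrated, at a high level, by random walk with restart over an affinity graph, where the graph below stands in for the superpixel adjacency the method builds. The restart weight, iteration count, and function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def random_walk_update(cost, affinity, restart=0.15, iters=50):
    """Diffuse per-node matching costs over a graph by random walk with
    restart:  c <- (1 - restart) * P @ c + restart * c0,
    where P is the row-normalised affinity (transition) matrix and c0 is
    the initial cost. Well-supported costs propagate to weakly textured
    neighbours while the restart term anchors reliable evidence."""
    p = affinity / affinity.sum(axis=1, keepdims=True)
    c = cost.copy()
    for _ in range(iters):
        c = (1.0 - restart) * (p @ c) + restart * cost
    return c
```

In the paper's setting the affinity would encode superpixel similarity, so cost diffusion stops at strong segment boundaries, which is how disparity discontinuities are preserved.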
2. Related Work
3. The Proposed Method
3.1. Superpixel Random Walk Pre-Matching
3.1.1. Point Matching Cost
3.1.2. Block Matching Cost
3.1.3. Optimization and Updating
3.2. Stereo Matching Network Based on Parallax-Channel Attention Mechanism
3.2.1. Feature Extraction and Enhancement
3.2.2. Parallax-Channel Attention
Parallax Attention
Channel Attention
Cascaded Parallax-Channel Attention Module
3.2.3. Disparity Calculation and Refinement
3.2.4. Losses
1. Photometric Loss
2. Smoothness Loss
3. PCAM Loss
4. L1 Loss
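The photometric and smoothness terms above are standard in self-supervised stereo matching. Since the paper's exact formulations are not reproduced here, the following is a minimal NumPy sketch of those two terms only; the nearest-neighbour warping, the edge-weighting scheme, and all function names are illustrative assumptions.

```python
import numpy as np

def warp_right_to_left(right, disp):
    """Reconstruct the left view by sampling the right image at x - d
    (nearest-neighbour sampling for simplicity)."""
    h, w = right.shape
    xs = np.arange(w)[None, :] - disp          # source column per pixel
    xs = np.clip(np.round(xs).astype(int), 0, w - 1)
    return np.take_along_axis(right, xs, axis=1)

def photometric_loss(left, right, disp):
    """Mean absolute difference between the left image and the
    disparity-warped right image: the core self-supervision signal."""
    return np.abs(left - warp_right_to_left(right, disp)).mean()

def smoothness_loss(disp, image, gamma=1.0):
    """Edge-aware smoothness: penalise disparity gradients, weighted
    down where the image itself has strong gradients (likely edges)."""
    dx = np.abs(np.diff(disp, axis=1))
    ix = np.abs(np.diff(image, axis=1))
    return (dx * np.exp(-gamma * ix)).mean()
```

A differentiable (bilinear) warp would replace the rounding step in an actual training pipeline, and the PCAM and L1 terms would be added with their own weights.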
4. Experimental Results and Discussion
4.1. Data Sets, Metrics, and Implementation Details
- Dataset
- Metrics
- Implementation Details
4.2. Evaluation of Pre-Matching Performance
4.3. Ablation Study of SRWP and PCAM
- Pre-matching results
- Different Loss
4.4. Flexibility of SRWP and PCAM
- Resolutions
- Maximum Parallax
4.5. Results and Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Xiao, X.; Guo, B.; Li, D. Multi-view stereo matching based on self-adaptive patch and image grouping for multiple unmanned aerial vehicle imagery. Remote Sens. 2016, 8, 89.
- Nguatem, W.; Mayer, H. Modeling urban scenes from Pointclouds. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3857–3866.
- Wohlfeil, J.; Hirschmüller, H.; Piltz, B.; Börner, A.; Suppa, M. Fully automated generation of accurate digital surface models with sub-meter resolution from satellite imagery. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2012, XXXIX-B3, 75–80.
- Zhao, L.; Liu, Y.; Men, C.; Men, Y. Double propagation stereo matching for urban 3-D reconstruction from satellite imagery. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–17.
- Zhou, C.; Zhang, H.; Shen, X. Unsupervised learning of stereo matching. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1567–1575.
- Pang, J.; Sun, W.; Yang, C. Zoom and learn: Generalizing deep stereo matching to novel domains. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2070–2079.
- Muresan, M.P.; Nedevschi, S.; Danescu, R. A multi patch warping approach for improved stereo block matching. In Proceedings of the International Conference on Computer Vision Theory and Applications, Porto, Portugal, 27 February–1 March 2017; pp. 459–466.
- Spangenberg, R.; Langner, T.; Adfeldt, S.; Rojas, R. Large scale semi-global matching on the CPU. In Proceedings of the IEEE Intelligent Vehicles Symposium (IV 2014), Dearborn, MI, USA, 8–11 June 2014; pp. 195–201.
- Liu, X.; Li, Z.H.; Li, D.M. Computing stereo correspondence based on motion detection and graph cuts. In Proceedings of the 2012 Second International Conference on Instrumentation, Measurement, Computer, Communication and Control (IMCCC), Harbin, China, 8–10 December 2012; pp. 1468–1471.
- Hirschmuller, H. Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 328–341.
- Huang, X.; Zhang, Y.; Yue, Z. Image-guided non-local dense matching with three-steps optimization. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 3, 67–74.
- Li, M.; Kwoh, L.K.; Yang, C.-J.; Liew, S.C. 3D building extraction with semi-global matching from stereo pair worldview-2 satellite imageries. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium, Milan, Italy, 26–31 July 2015; pp. 3006–3009.
- Rhemann, C.; Hosni, A.; Bleyer, M.; Rother, C.; Gelautz, M. Fast cost-volume filtering for visual correspondence and beyond. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 504–511.
- Oh, C.; Ham, B.; Sohn, K. Probabilistic correspondence matching using random walk with restart. In Proceedings of the British Machine Vision Conference (BMVC 2012), Guildford, UK, 3–7 September 2012; pp. 1–10.
- Zagoruyko, S.; Komodakis, N. Learning to compare image patches via convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 4353–4361.
- Shaked, A.; Wolf, L. Improved stereo matching with constant highway networks and reflective confidence learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6901–6910.
- Zbontar, J.; LeCun, Y. Computing the stereo matching cost with a convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1592–1599.
- Kendall, A.; Martirosyan, H.; Dasgupta, S.; Henry, P.; Kennedy, R.; Bachrach, A.; Bry, A. End-to-end learning of geometry and context for deep stereo regression. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 66–75.
- Chang, J.-R.; Chen, Y.-S. Pyramid stereo matching network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5410–5418.
- Seki, A.; Pollefeys, M. SGM-Nets: Semi-global matching with neural networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 June 2017; pp. 6640–6649.
- Suliman, A.; Zhang, Y. Double projection planes method for generating enriched disparity maps from multi-view stereo satellite images. Photogramm. Eng. Remote Sens. 2017, 83, 749–760.
- Tatar, N.; Saadatseresht, M.; Arefi, H.; Hadavand, A. Quasi-epipolar resampling of high resolution satellite stereo imagery for semi global matching. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2015, 40, 707.
- Mandanici, E.; Girelli, V.A.; Poluzzi, L. Metric accuracy of digital elevation models from worldview-3 stereo-pairs in urban areas. Remote Sens. 2019, 11, 878.
- Yang, W.; Li, X.; Yang, B.; Fu, Y. A novel stereo matching algorithm for digital surface model (DSM) generation in water areas. Remote Sens. 2020, 12, 870.
- Zhu, H.; Jiao, L.; Ma, W.; Liu, F.; Zhao, W. A novel neural network for remote sensing image matching. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2853–2865.
- Tao, R.; Xiang, Y.; You, H. Stereo matching of VHR remote sensing images via bidirectional pyramid network. In Proceedings of the IGARSS 2020–2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa Village, HI, USA, 16–26 July 2020; pp. 6742–6745.
- Froba, B.; Ernst, A. Face detection with the modified census transform. In Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition, Seoul, Korea, 19 May 2004; pp. 91–96.
- Norouzi, M.; Fleet, D.J.; Salakhutdinov, R. Hamming distance metric learning. Adv. Neural Inf. Process. Syst. 2012, 2, 1061–1069.
- Yin, J.; Wang, T.; Du, Y.; Liu, X.; Zhou, L.; Yang, J. SLIC superpixel segmentation for polarimetric SAR images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–17.
- Dong, X.; Shen, J.; Shao, L.; Van Gool, L. Sub-markov random walk for image segmentation. IEEE Trans. Image Process. 2016, 25, 516–527.
- Li, W.; Qi, F.; Tang, M.; Yu, Z. Bidirectional LSTM with self-attention mechanism and multi-channel features for sentiment classification. Neurocomputing 2020, 387, 63–77.
- Newell, A.; Yang, K.; Deng, J. Stacked hourglass networks for human pose estimation. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 483–499.
- Wang, L.; Wang, Y.; Liang, Z.; Lin, Z.; Yang, J.; An, W.; Guo, Y. Learning parallax attention for stereo image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 12250–12259.
- Li, H.; Qiu, K.; Chen, L.; Mei, X.; Hong, L.; Tao, C. SCAttNet: Semantic segmentation network with spatial and channel attention mechanism for high-resolution remote sensing images. IEEE Geosci. Remote Sens. Lett. 2020, 18, 905–909.
- Zhang, K.; Fang, Y.; Min, D.; Sun, L.; Yang, S.; Yan, S.; Tian, Q. Cross-scale cost aggregation for stereo matching. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1590–1597.
- Liu, G.; Reda, F.A.; Shih, K.J.; Wang, T.C.; Tao, A.; Catanzaro, B. Image inpainting for irregular holes using partial convolutions. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 85–100.
- Guo, C.; Chen, D.; Huang, Z. Learning efficient stereo matching network with depth discontinuity aware super-resolution. IEEE Access 2019, 7, 159712–159723.
- Fleet, D.; Weiss, Y. Optical flow estimation. In Handbook of Mathematical Models in Computer Vision; Springer: Boston, MA, USA, 2006; pp. 239–257.
- Godard, C.; Mac Aodha, O.; Brostow, G.J. Unsupervised monocular depth estimation with left-right consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 270–279.
- Li, A.; Yuan, Z. Occlusion aware stereo matching via cooperative unsupervised learning. In Proceedings of the Asian Conference on Computer Vision, Perth, Australia, 4–6 December 2018; pp. 197–213.
- Yin, Z.; Shi, J. GeoNet: Unsupervised learning of dense depth, optical flow and camera pose. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 1983–1992.
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Ram, P.; Padmavathi, S. Analysis of Harris corner detection for color images. In Proceedings of the 2016 International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES), Paralakhemundi, India, 3–5 October 2016; pp. 405–410.
- Ma, J.; Zhou, H.; Zhao, J.; Gao, Y.; Jiang, J.; Tian, J. Robust feature matching for remote sensing image registration via locally linear transforming. IEEE Trans. Geosci. Remote Sens. 2015, 53, 6469–6481.
- Du, W.-L.; Li, X.-Y.; Ye, B.; Tian, X.-L. A fast dense feature-matching model for cross-track pushbroom satellite imagery. Sensors 2018, 18, 4182.
- Pilzer, A.; Xu, D.; Puscas, M.; Ricci, E.; Sebe, N. Unsupervised adversarial depth estimation using cycled generative networks. In Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy, 5–8 September 2018; pp. 587–595.
- Xu, B.; Xu, Y.; Yang, X.; Jia, W.; Guo, Y. Bilateral grid learning for stereo matching network. arXiv 2021, arXiv:2101.01601.
Method | EPE | RMSE | D1 | D3 | Points |
---|---|---|---|---|---|
SIFT | 1.31 | 3.15 | 0.17 | 0.10 | 984 |
Harris | 1.12 | 2.44 | 0.19 | 0.06 | 319 |
LLT | 1.24 | 2.75 | 0.17 | 0.07 | 973 |
FD | 1.35 | 3.45 | 0.17 | 0.11 | 1115 |
Proposed | 1.11 | 2.45 | 0.12 | 0.06 | 29,021 |
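The EPE, RMSE, and D1/D3 figures in the tables can be computed as below. This is a hedged sketch: D1 and D3 are taken here as the fraction of points whose disparity error exceeds 1 or 3 pixels respectively, a common convention that the paper's definitions may refine (e.g. with relative-error terms).

```python
import numpy as np

def epe(pred, gt):
    """End-point error: mean absolute disparity error in pixels."""
    return np.abs(pred - gt).mean()

def rmse(pred, gt):
    """Root-mean-square disparity error."""
    return np.sqrt(((pred - gt) ** 2).mean())

def error_rate(pred, gt, thresh):
    """Fraction of points whose absolute disparity error exceeds
    `thresh` pixels (thresh=1 gives D1, thresh=3 gives D3 here)."""
    return (np.abs(pred - gt) > thresh).mean()
```

For sparse pre-matching results, these statistics would be evaluated only at the matched points, which is why the "Points" column matters alongside the accuracy figures.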
Method | Threshold | Number of Points | EPE | D1 | D3 |
---|---|---|---|---|---|
Pre-matching | 0.001 | 20,922 | 2.53 | 0.28 | 0.18 |
Pre-matching | 0.005 | 24,576 | 2.50 | 0.27 | 0.18 |
Pre-matching | 0.01 | 29,021 | 2.44 | 0.25 | 0.16 |
Pre-matching | 0.02 | 39,097 | 2.62 | 0.29 | 0.19 |
Pre-matching | 0.05 | 51,724 | 2.79 | 0.32 | 0.23 |
Lp | Ls | LPCAM | Ll | EPE | D1 | D3
---|---|---|---|---|---|---
✓ | | | | 5.79 | 0.31 | 0.24
✓ | ✓ | | | 4.12 | 0.30 | 0.22
✓ | ✓ | ✓ | | 3.24 | 0.28 | 0.19
✓ | ✓ | ✓ | ✓ | 2.44 | 0.25 | 0.16
Resolutions | EPE | D1 | D3 |
---|---|---|---|
1024 × 1024 | 2.44 | 0.25 | 0.16 |
512 × 512 | 2.35 | 0.23 | 0.15 |
256 × 256 | 2.36 | 0.22 | 0.14 |
128 × 128 | 2.34 | 0.20 | 0.14 |
Range of Disparity | EPE | D1 | D3 |
---|---|---|---|
(0,40) | 2.42 | 0.24 | 0.16 |
(0,80) | 2.56 | 0.26 | 0.18 |
(0,120) | 2.64 | 0.27 | 0.20 |
(0,160) | 2.80 | 0.28 | 0.21 |
Method | Dataset A: EPE | Dataset A: D1 | Dataset A: D3 | Dataset A: Time (s) | Dataset B: EPE | Dataset B: D1 | Dataset B: D3 | Dataset C: RMSE | Dataset C: Time (s)
---|---|---|---|---|---|---|---|---|---
FCVFSM | 4.56 | 0.49 | 0.36 | 24.67 | 4.84 | 0.47 | 0.39 | 6.21 | 3.79 |
SGBM | 4.95 | 0.47 | 0.32 | 22.96 | 5.10 | 0.45 | 0.35 | 7.19 | 3.53 |
SGM | 3.73 | 0.40 | 0.29 | 8.95 | 3.65 | 0.39 | 0.27 | 5.14 | 1.37 |
PSMNet | 3.14 | 0.33 | 0.26 | 1.36 | 3.23 | 0.34 | 0.25 | 3.75 | 0.21 |
CGN | 3.39 | 0.35 | 0.22 | 1.12 | 3.45 | 0.38 | 0.24 | 3.93 | 0.17 |
BGNet | 2.85 | 0.31 | 0.18 | 1.10 | 2.83 | 0.31 | 0.17 | 3.32 | 0.17 |
Proposed | 2.44 | 0.25 | 0.16 | 1.18 | 2.32 | 0.24 | 0.14 | 2.36 | 0.18 |
Share and Cite
Chen, W.; Chen, H.; Yang, S. Self-Supervised Stereo Matching Method Based on SRWP and PCAM for Urban Satellite Images. Remote Sens. 2022, 14, 1636. https://doi.org/10.3390/rs14071636