Abstract
We present a visual localization framework for autonomous driving, based on novel deep attention-aware features, that achieves centimeter-level localization accuracy. Conventional approaches to visual localization rely on handcrafted features or human-made objects on the road; the former are prone to unstable matching under severe appearance or lighting changes, while the latter are too scarce to deliver constant and robust localization in challenging scenarios. In this work, we exploit a deep attention mechanism, through a novel end-to-end deep neural network, to search the scene for salient, distinctive, and stable features that are suitable for long-term matching. Furthermore, our learned feature descriptors are shown to establish robust matches and thereby estimate optimal camera poses with high precision. We comprehensively validate the effectiveness of our method on a freshly collected dataset with high-quality ground-truth trajectories and hardware synchronization between sensors. The results demonstrate that our method achieves localization accuracy competitive with LiDAR-based solutions under various challenging circumstances, pointing toward a potential low-cost localization solution for autonomous driving.
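The pipeline the abstract outlines, selecting the most attentive keypoints from a learned score map and matching their descriptors across frames before pose estimation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function names, the plain top-k selection, and the mutual nearest-neighbour matching rule are illustrative assumptions standing in for the paper's learned attention and matching modules.

```python
import numpy as np

def select_keypoints(attention, k):
    """Pick the k highest-scoring pixel locations from an HxW attention map.

    Stand-in for the paper's attention-based feature selection; a real
    system would also enforce spatial spread (e.g. non-maximum suppression).
    """
    h, w = attention.shape
    flat = attention.ravel()
    idx = np.argpartition(flat, -k)[-k:]          # indices of the top-k scores
    idx = idx[np.argsort(flat[idx])[::-1]]        # sort descending by score
    return np.stack([idx // w, idx % w], axis=1)  # (row, col) coordinates

def mutual_nn_matches(desc_a, desc_b):
    """Mutual nearest-neighbour matching of L2-normalized descriptors.

    Returns index pairs (i, j) where descriptor i in A and j in B are each
    other's best match -- a common robust-matching baseline.
    """
    sim = desc_a @ desc_b.T                        # cosine similarity matrix
    nn_ab = sim.argmax(axis=1)                     # best B index for each A
    nn_ba = sim.argmax(axis=0)                     # best A index for each B
    keep = nn_ba[nn_ab] == np.arange(len(desc_a))  # keep only mutual pairs
    return np.stack([np.nonzero(keep)[0], nn_ab[keep]], axis=1)
```

In a full localization system, the matched 2D keypoints would be associated with 3D map points and passed to a PnP solver inside RANSAC to recover the 6-DoF camera pose.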
Acknowledgments
This work is supported by Baidu Autonomous Driving Technology Department (ADT) in conjunction with the Apollo Project. Shufu Xie helped with the development of the lane-based method. Shirui Li and Yuanfan Xie helped with the sensor calibration. Shuai Wang, Lingchang Li, and Shuangcheng Guo helped with the sensor synchronization.
Electronic supplementary material
Supplementary material 1 (mp4 82394 KB)
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Zhou, Y., et al. (2020). DA4AD: End-to-End Deep Attention-Based Visual Localization for Autonomous Driving. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12373. Springer, Cham. https://doi.org/10.1007/978-3-030-58604-1_17
DOI: https://doi.org/10.1007/978-3-030-58604-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58603-4
Online ISBN: 978-3-030-58604-1
eBook Packages: Computer Science, Computer Science (R0)