Abstract
Simultaneous Localization and Mapping (SLAM) aims to estimate the pose of a mobile robot and reconstruct a map of its environment. Odometry is an essential component that computes, on the fly, the translations and rotations between consecutive frames of the sensors mounted on the vehicle. Visual-LiDAR Odometry (VLO) is a prominent approach that combines the low sensor cost of cameras with the robustness of LiDAR to environmental changes. In general, one of the critical tasks in odometry is selecting the important features between frames. In this paper, we propose an end-to-end visual-LiDAR odometry method named AdVLO that selects the important regions between frames via an attention-driven mechanism. A mask of the essential regions of the input frame is generated via the attention mechanism, and we fuse this attention mask with the corresponding frame to retain those regions. Instead of concatenating visual and LiDAR features as in previous VLO works, we fuse them using a guided-attention technique. The translation and rotation of the camera are then computed sequentially by an LSTM. Experimental results on the KITTI dataset show that our proposed method achieves promising results compared to other odometry methods.
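To make the pipeline the abstract describes concrete, here is a minimal PyTorch sketch of its four stages: an attention mask applied to the camera frame, separate visual and LiDAR feature encoders, guided-attention fusion (visual features as queries over LiDAR keys/values) in place of concatenation, and an LSTM that regresses per-frame translation and rotation. All module names (AttentionMask, GuidedAttentionFusion), layer sizes, and the range-image LiDAR representation are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of an AdVLO-style pipeline; layer choices are assumptions.
import torch
import torch.nn as nn


class AttentionMask(nn.Module):
    """Predicts a per-pixel importance mask and applies it to the input frame."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),  # mask values in [0, 1]
        )

    def forward(self, frame):
        mask = self.net(frame)
        return frame * mask  # suppress unimportant regions, keep essential ones


class GuidedAttentionFusion(nn.Module):
    """Fuses visual and LiDAR features via cross-attention instead of concatenation:
    visual tokens act as queries, LiDAR tokens as keys and values."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, vis_feat, lidar_feat):
        fused, _ = self.attn(query=vis_feat, key=lidar_feat, value=lidar_feat)
        return fused + vis_feat  # residual connection keeps the visual stream


class AdVLO(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.mask = AttentionMask()
        # Toy encoders standing in for the real feature extractors.
        self.vis_enc = nn.Sequential(nn.Conv2d(3, dim, 8, stride=8), nn.ReLU())
        self.lidar_enc = nn.Sequential(nn.Conv2d(1, dim, 8, stride=8), nn.ReLU())
        self.fuse = GuidedAttentionFusion(dim)
        self.lstm = nn.LSTM(dim, 128, batch_first=True)
        self.pose = nn.Linear(128, 6)  # 3 translation + 3 rotation parameters

    def forward(self, frames, range_images):
        # frames: (B, T, 3, H, W) camera clip;
        # range_images: (B, T, 1, H, W) LiDAR scans projected to 2D.
        B, T = frames.shape[:2]
        feats = []
        for t in range(T):
            v = self.vis_enc(self.mask(frames[:, t]))      # (B, D, h, w)
            l = self.lidar_enc(range_images[:, t])         # (B, D, h, w)
            v = v.flatten(2).transpose(1, 2)               # (B, h*w, D) tokens
            l = l.flatten(2).transpose(1, 2)
            feats.append(self.fuse(v, l).mean(dim=1))      # (B, D) per frame
        seq, _ = self.lstm(torch.stack(feats, dim=1))      # (B, T, 128)
        return self.pose(seq)                              # (B, T, 6) poses


model = AdVLO()
poses = model(torch.rand(2, 4, 3, 64, 256), torch.rand(2, 4, 1, 64, 256))
print(poses.shape)  # torch.Size([2, 4, 6])
```

The residual connection in the fusion step and the mean-pooling over tokens are design choices made here only to keep the sketch short; the guided-attention idea itself (one modality attending over the other rather than channel-wise concatenation) is the point being illustrated.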
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Lam, H., Pho, K., Yoshitaka, A. (2023). AdVLO: Region Selection via Attention-Driven for Visual LiDAR Odometry. In: Nguyen, N.T., et al. (eds.) Intelligent Information and Database Systems. ACIIDS 2023. Lecture Notes in Computer Science, vol. 13995. Springer, Singapore. https://doi.org/10.1007/978-981-99-5834-4_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-5833-7
Online ISBN: 978-981-99-5834-4