Search Results (131)

Search Parameters:
Keywords = ORB SLAM

22 pages, 12893 KiB  
Article
Research on Visual–Inertial Measurement Unit Fusion Simultaneous Localization and Mapping Algorithm for Complex Terrain in Open-Pit Mines
by Yuanbin Xiao, Wubin Xu, Bing Li, Hanwen Zhang, Bo Xu and Weixin Zhou
Sensors 2024, 24(22), 7360; https://doi.org/10.3390/s24227360 - 18 Nov 2024
Viewed by 460
Abstract
As mining technology advances, intelligent robots in open-pit mining require precise localization and digital maps. Nonetheless, significant pitch variations, uneven roads, and rocky surfaces with minimal texture present substantial challenges to the precision of feature extraction and positioning in traditional visual SLAM systems, owing to the intricate terrain of open-pit mines. This study proposes an improved SLAM technique that integrates visual and Inertial Measurement Unit (IMU) data to address these challenges. The method incorporates a point–line feature fusion matching strategy to enhance the quality and stability of line feature extraction, integrating an enhanced Line Segment Detection (LSD) algorithm with short segment culling and approximate line merging techniques. IMU pre-integration and visual feature constraints are combined in a tightly coupled visual–inertial framework that uses a sliding-window approach for back-end optimization, enhancing system robustness and precision. Experimental results demonstrate that the proposed method improves RMSE accuracy by 36.62% and 26.88% on the MH and VR sequences of the EuRoC dataset, respectively, compared to ORB-SLAM3. In simulated open-pit mining tests, the improved SLAM system significantly reduces trajectory drift, improving localization accuracy by 40.62% and 61.32%. These results confirm the effectiveness of the proposed method. Full article
(This article belongs to the Section Sensors and Robotics)
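The point–line front end summarized above hinges on two post-processing steps applied to LSD output: discarding short segments and merging approximately collinear ones. The sketch below illustrates one plausible form of those two steps on an N×4 array of segments; the length threshold, angle tolerance, and gap tolerance are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def filter_and_merge_segments(segs, min_len=20.0, ang_tol=np.deg2rad(3.0), gap_tol=5.0):
    """Cull short LSD segments, then greedily merge near-collinear ones.

    segs: (N, 4) array of line segments [x1, y1, x2, y2].
    Returns an (M, 4) array with M <= N.
    """
    segs = np.asarray(segs, dtype=float)
    lengths = np.hypot(segs[:, 2] - segs[:, 0], segs[:, 3] - segs[:, 1])
    segs = segs[lengths >= min_len]                      # short-segment culling

    merged, used = [], np.zeros(len(segs), dtype=bool)
    angles = np.arctan2(segs[:, 3] - segs[:, 1], segs[:, 2] - segs[:, 0]) % np.pi
    for i in range(len(segs)):
        if used[i]:
            continue
        pts = [segs[i, :2], segs[i, 2:]]
        for j in range(i + 1, len(segs)):
            if used[j]:
                continue
            dang = abs(angles[i] - angles[j])
            dang = min(dang, np.pi - dang)               # direction wrap-around
            # smallest endpoint-to-endpoint gap between the two segments
            gap = min(np.linalg.norm(p - q)
                      for p in (segs[j, :2], segs[j, 2:])
                      for q in (segs[i, :2], segs[i, 2:]))
            if dang < ang_tol and gap < gap_tol:         # approximately the same line
                pts += [segs[j, :2], segs[j, 2:]]
                used[j] = True
        pts = np.array(pts)
        # keep the two most distant endpoints as the merged segment
        d = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
        a, b = np.unravel_index(np.argmax(d), d.shape)
        merged.append(np.concatenate([pts[a], pts[b]]))
        used[i] = True
    return np.array(merged)
```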
Figures:
Figure 1. The system framework diagram. The procedure encompasses data input, front-end visual–inertial odometry, closed-loop detection, back-end optimization, and mapping; the red box in the data input section marks the sparsely textured slope.
Figure 2. An example of the line feature extraction optimization method. Line segment detection is improved by short-line elimination and approximate line segment merging.
Figure 3. (a) Flowchart of the improved LSD line feature detection algorithm; (b) schematic diagram of similar line feature merging.
Figure 4. Visual observation model and IMU schematic diagram. Because the IMU acquisition frequency is much higher than that of the camera, the IMU data must be integrated in discrete time, and a unified data format is needed for tight coupling. Hollow circles, hollow triangles, green stars, and black squares represent image frames, keyframes, IMU data, and the pre-integration process, respectively.
Figure 5. Marginalization model: the relationship between the camera and the landmark locations during marginalization.
Figure 6. Histograms comparing the line feature extraction algorithms: (a) average time required to extract line features; (b) average number of extracted line features.
Figure 7. Line feature extraction performance comparison: (a) the original LSD algorithm; (b) the enhanced LSD method. Short segment elimination and approximate line merging markedly reduce redundant short line features while preserving the longer segments essential for localization precision. The red box highlights the compared region; green dots and lines are the extracted point and line features, respectively.
Figure 8. Histogram of absolute trajectory error. In the MH sequences, the absolute trajectory error of the enhanced algorithm is lower than that of the other algorithms, whereas in the VR sequences the enhanced algorithm performs comparably to or better than the best-performing alternative.
Figure 9. Trajectory error comparison: (a) Sequence MH_04_difficult; (b) Sequence V2_03_difficult. Black boxes and red arrows enlarge key areas and mark trajectory deviations, highlighting accuracy differences among the algorithms in these regions.
Figure 10. Absolute pose error results for each sequence: (a) MH_04_difficult; (b) MH_05_difficult; (c) V1_02_medium; (d) V1_03_difficult. The color-coded line shows the Absolute Pose Error (APE) along the trajectory, with red indicating higher error and blue lower error.
Figure 11. Three-dimensional point cloud maps: (a) MH_05_difficult; (b) V1_03_difficult. Green lines are the estimated trajectory, red points the mapped features, and black points additional environmental points.
Figure 12. Experimental mining intelligent robot platform: (a) left view; (b) front view.
Figure 13. Scene 1, circular open-pit excavation: (a) real-world scene; (b) movement trajectory diagram. In (a), red arrows show the robot's motion trajectory; in (b), points A, B, and C are key checkpoints along the closed-loop path.
Figure 14. Comparison of trajectory errors in Scenario 1.
Figure 15. Experimental outcomes in Scenario 1: (a) comparison of 2D plane trajectories; (b) absolute trajectory error of each data series.
Figure 16. Scene 2, uneven road conditions in an open-pit mine: (a) real-world scene; (b) movement trajectory diagram. In (a), red arrows show the robot's motion trajectory; in (b), points A, B, and C are key checkpoints along the path.
Figure 17. Comparison of trajectory errors in Scenario 2.
18 pages, 2990 KiB  
Article
A GGCM-E Based Semantic Filter and Its Application in VSLAM Systems
by Yuanjie Li, Chunyan Shao and Jiaming Wang
Electronics 2024, 13(22), 4487; https://doi.org/10.3390/electronics13224487 - 15 Nov 2024
Viewed by 289
Abstract
Image matching-based visual simultaneous localization and mapping (vSLAM) extracts low-level pixel features to reconstruct camera trajectories and maps through the epipolar geometry method. However, it fails to achieve correct trajectories and mapping when there are low-quality feature correspondences in several challenging environments. Although the RANSAC-based framework can enable better results, it is computationally inefficient and unstable in the presence of a large number of outliers. A Faster R-CNN learning-based semantic filter is proposed to explore the semantic information of inliers to remove low-quality correspondences, helping vSLAM localize accurately in our previous work. However, the semantic filter learning method generalizes low precision for low-level and dense texture-rich scenes, leading the semantic filter-based vSLAM to be unstable and have poor geometry estimation. In this paper, a GGCM-E-based semantic filter using YOLOv8 is proposed to address these problems. Firstly, the semantic patches of images are collected from the KITTI dataset, the TUM dataset provided by the Technical University of Munich, and real outdoor scenes. Secondly, the semantic patches are classified by our proposed GGCM-E descriptors to obtain the YOLOv8 neural network training dataset. Finally, several semantic filters for filtering low-level and dense texture-rich scenes are generated and combined into the ORB-SLAM3 system. Extensive experiments show that the semantic filter can detect and classify semantic levels of different scenes effectively, filtering low-level semantic scenes to improve the quality of correspondences, thus achieving accurate and robust trajectory reconstruction and mapping. For the challenging autonomous driving benchmark and real environments, the vSLAM system with respect to the GGCM-E-based semantic filter demonstrates its superiority regarding reducing the 3D position error, such that the absolute trajectory error is reduced by up to approximately 17.44%, showing its promise and good generalization. Full article
(This article belongs to the Special Issue Application of Artificial Intelligence in Robotics)
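As we read the abstract, the GGCM-E descriptor summarizes a patch's joint gray/gradient statistics to grade its semantic/texture level. Below is a minimal sketch of one common formulation (a gray–gradient co-occurrence matrix plus its Shannon entropy), assuming 16-level quantization and a purely illustrative texture threshold; the paper's exact descriptor may differ.

```python
import cv2
import numpy as np

def ggcm_entropy(patch, levels=16):
    """Entropy of a gray-gradient co-occurrence matrix for a grayscale patch.

    A joint histogram of quantized intensity and quantized gradient magnitude
    is normalized into a probability table; its Shannon entropy is returned.
    """
    patch = patch.astype(np.float32)
    gx = cv2.Sobel(patch, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(patch, cv2.CV_32F, 0, 1, ksize=3)
    grad = cv2.magnitude(gx, gy)

    def quantize(a):
        a = a - a.min()
        a = a / (a.max() + 1e-6)
        return np.minimum((a * levels).astype(np.int32), levels - 1)

    g, d = quantize(patch), quantize(grad)
    ggcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(ggcm, (g.ravel(), d.ravel()), 1.0)          # joint counts
    p = ggcm / ggcm.sum()
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())               # Shannon entropy in bits

# Low-entropy patches (flat or repetitive texture) could be filtered out
# before feature matching; the 4.0-bit threshold here is purely illustrative.
def is_texture_rich(patch, thresh=4.0):
    return ggcm_entropy(patch) >= thresh
```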
Figures:
Figure 1. ORB-SLAM3 framework with the proposed semantic filter module.
Figure 2. Framework of the proposed semantic filter approach.
Figure 3. Computation of GGCM-E features.
Figure 4. Semantic filtering on a KITTI frame.
Figure 5. Semantic filtering on our captured outdoor frame.
Figure 6. The trajectory of KITTI07 with respect to the ground truth using the GGCM-E semantic filter.
Figure 7. Comparison of trajectories between the proposed method and ground truth on the KITTI dataset.
Figure 8. Comparison of APEs with respect to ground truth for ORB-SLAM3 and the semantic filter.
Figure 9. Dense texture-rich sequences in the TUM dataset (DTR sequences).
Figure 10. Comparison of camera trajectories in the DTR sequences.
Figure 11. Comparison of the trajectory with respect to the ground truth for DynaSLAM and GGCM-E+DynaSLAM on the KITTI00 sequence.
Figure 12. Comparison of the APEs of semantic filter-based Structure-SLAM, LDSO, and DynaSLAM on the KITTI07 sequence.
20 pages, 1837 KiB  
Article
A Monocular Ranging Method for Ship Targets Based on Unmanned Surface Vessels in a Shaking Environment
by Zimu Wang, Xiunan Li, Peng Chen, Dan Luo, Gang Zheng and Xin Chen
Remote Sens. 2024, 16(22), 4220; https://doi.org/10.3390/rs16224220 - 12 Nov 2024
Viewed by 568
Abstract
Aiming to address errors in the estimation of the position and attitude of an unmanned vessel, especially during vibration, where the rapid loss of feature point information hinders continuous attitude estimation and global trajectory mapping, this paper improves the monocular ORB-SLAM framework based on the characteristics of the marine environment. In general, we extract the location area of the artificial sea target in the video, build a virtual feature set for it, and filter the background features. When shaking occurs, GNSS information is combined and the target feature set is used to complete the map reconstruction task. Specifically, firstly, the sea target area of interest is detected by YOLOv5, and the feature extraction and matching method is optimized in the front-end tracking stage to adapt to the sea environment. In the key frame selection and local map optimization stage, the characteristics of the feature set are improved to further improve the positioning accuracy, to provide more accurate position and attitude information about the unmanned platform. We use GNSS information to provide the scale and world coordinates for the map. Finally, the target distance is measured by the beam ranging method. In this paper, marine unmanned platform data, GNSS, and AIS position data are autonomously collected, and experiments are carried out using the proposed marine ranging system. Experimental results show that the maximum measurement error of this method is 9.2%, and the average error is 4.7%. Full article
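The ranging step above ultimately rests on triangulating the detected target from two camera poses whose scale comes from GNSS. The sketch below shows only that geometric core with OpenCV's triangulatePoints, assuming known intrinsics K and two world-referenced poses; the paper's beam-ranging formulation and YOLOv5 target selection are not reproduced.

```python
import cv2
import numpy as np

def range_to_target(K, R1, t1, R2, t2, px1, px2):
    """Triangulate one target pixel seen from two GNSS-scaled camera poses
    and return its distance from the second camera center.

    K        : (3,3) camera intrinsics
    R*, t*   : world-to-camera rotation/translation of each view
    px1, px2 : (2,) pixel coordinates of the matched target point
    """
    P1 = K @ np.hstack([R1, t1.reshape(3, 1)])
    P2 = K @ np.hstack([R2, t2.reshape(3, 1)])
    X = cv2.triangulatePoints(P1, P2,
                              np.asarray(px1, float).reshape(2, 1),
                              np.asarray(px2, float).reshape(2, 1))
    X = (X[:3] / X[3]).ravel()                 # homogeneous -> Euclidean (world frame)
    cam2_center = -R2.T @ t2                   # camera 2 center in world coordinates
    return float(np.linalg.norm(X - cam2_center))
```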
Figures:
Figure 1. Overview of our method. Maize-yellow parallelograms represent input and output, blue rectangles represent methods, and red diamonds represent special cases.
Figure 2. YOLOv5 network structure.
Figure 3. Ship target extraction by the YOLOv5 network.
Figure 4. The three processes of video, mapping, and set update, and the strategy adopted when tracking loss occurs due to shaking. Red stars represent feature points that are not on the ship.
Figure 5. Triangulation ranging.
Figure 6. Video sequence acquisition route. The gray ship is the main positioning vessel and the red line is the camera route. The red arrow represents the camera path, and the black ship is the camera's main observation ship, "Yihai 157".
Figure 7. The collected video sequence dataset contains many water scenes, and the ships are in a static anchored state. The route ensures that the video captures complete ship information, from bow to stern. The bottom-right image is the video frame at the moment camera shaking occurs due to the natural undulation of the sea.
Figure 8. Ship detection.
Figure 9. Object-based feature set establishment. The green points are the extracted feature points.
Figure 10. Overview of the compared methods: (a) ours; (b) Yolo_ORBSLAM; (c) water segmentation SLAM mapping; (d) U-net dynamic SLAM. Red lines represent the path of the ship and blue points are the target feature points.
Figure 11. Estimated distance versus actual distance for the sampling frames.
Figure 12. Top-view analysis of the USV movement path. Our method (a) effectively overcomes the shaking segment; YOLO_ORBSLAM (b) shows a large shift at the end; the water segmentation SLAM method (c) performs poorly; U-net dynamic SLAM (d) is stable at first but stops at a shaking moment.
7 pages, 1148 KiB  
Proceeding Paper
A Novel Method to Improve the Efficiency and Performance of Cloud-Based Visual Simultaneous Localization and Mapping
by Omar M. Salih, Hussam Rostum and József Vásárhelyi
Eng. Proc. 2024, 79(1), 78; https://doi.org/10.3390/engproc2024079078 - 11 Nov 2024
Viewed by 255
Abstract
Since Visual Simultaneous Localization and Mapping (VSLAM) inherently requires intensive computational operations and consumes many hardware resources, these limitations pose challenges to implementing the entire VSLAM architecture within limited processing power and battery capacity. This paper proposes a novel solution to improve the efficiency and performance of exchanging data between the unmanned aerial vehicle (UAV) and the cloud server. First, an adaptive ORB (oriented FAST and rotated BRIEF) method is proposed for precise tracking, mapping, and re-localization. Second, efficient visual data encoding and decoding methods are proposed for exchanging the data between the edge device and the UAV. The results show an improvement in the trajectory RMSE and accurate tracking using the adaptive ORB-SLAM. Furthermore, the proposed visual data encoding and decoding showed an outstanding performance compared with the most used standard JPEG-based system over high quantization ratios. Full article
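The paper's encoder/decoder for exchanging visual data between the UAV and the cloud is not specified here; the sketch below only illustrates the general idea of sending extracted ORB keypoints and descriptors instead of full frames, using numpy serialization plus zlib. The packing layout and compression choice are assumptions, not the proposed codec.

```python
import zlib
import numpy as np
import cv2

orb = cv2.ORB_create(nfeatures=1000)

def encode_frame(gray):
    """Detect ORB features and pack (keypoints, descriptors) into compressed bytes."""
    kps, desc = orb.detectAndCompute(gray, None)
    if desc is None:
        desc = np.empty((0, 32), np.uint8)
    pts = np.array([[k.pt[0], k.pt[1], k.angle, k.response] for k in kps],
                   dtype=np.float32)
    header = np.array([len(kps)], dtype=np.uint32).tobytes()
    return zlib.compress(header + pts.tobytes() + desc.tobytes())

def decode_frame(blob):
    """Recover the keypoint array (N,4) and ORB descriptors (N,32) on the server side."""
    raw = zlib.decompress(blob)
    n = int(np.frombuffer(raw[:4], np.uint32)[0])
    pts = np.frombuffer(raw[4:4 + 16 * n], np.float32).reshape(n, 4)
    desc = np.frombuffer(raw[4 + 16 * n:], np.uint8).reshape(n, 32)
    return pts, desc
```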
Figures:
Figure 1. Adaptive ORB-SLAM architecture.
Figure 2. The proposed visual data algorithms for (a) encoding and (b) decoding.
Figure 3. Comparison between the estimated trajectory and the actual camera trajectory.
17 pages, 3301 KiB  
Article
Stereo and LiDAR Loosely Coupled SLAM Constrained Ground Detection
by Tian Sun, Lei Cheng, Ting Zhang, Xiaoping Yuan, Yanzheng Zhao and Yong Liu
Sensors 2024, 24(21), 6828; https://doi.org/10.3390/s24216828 - 24 Oct 2024
Viewed by 549
Abstract
In many robotic applications, creating a map is crucial, and 3D maps provide a method for estimating the positions of other objects or obstacles. Most of the previous research processes 3D point clouds through projection-based or voxel-based models, but both approaches have certain limitations. This paper proposes a hybrid localization and mapping method using stereo vision and LiDAR. Unlike the traditional single-sensor systems, we construct a pose optimization model by matching ground information between LiDAR maps and visual images. We use stereo vision to extract ground information and fuse it with LiDAR tensor voting data to establish coplanarity constraints. Pose optimization is achieved through a graph-based optimization algorithm and a local window optimization method. The proposed method is evaluated using the KITTI dataset and compared against the ORB-SLAM3, F-LOAM, LOAM, and LeGO-LOAM methods. Additionally, we generate 3D point cloud maps for the corresponding sequences and high-definition point cloud maps of the streets in sequence 00. The experimental results demonstrate significant improvements in trajectory accuracy and robustness, enabling the construction of clear, dense 3D maps. Full article
(This article belongs to the Section Navigation and Positioning)
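Ground extraction from stereo, as used above to build the coplanarity constraint, is commonly done with a v-disparity histogram: each image row votes for its dominant disparity, and the ground appears as a line. The sketch below, with illustrative thresholds, shows only that step, not the paper's tensor-voting fusion or graph optimization.

```python
import numpy as np

def ground_from_v_disparity(disparity, max_d=128, min_votes=30):
    """Fit the ground-plane line in v-disparity space.

    disparity : (H, W) integer disparity image (invalid pixels <= 0).
    Returns (slope, intercept) of d = slope * v + intercept for the ground,
    estimated by least squares over the dominant disparity of the lower rows.
    """
    H, _ = disparity.shape
    rows, ds = [], []
    for v in range(H // 2, H):                       # the ground lies in the lower half
        d_row = disparity[v]
        d_row = d_row[(d_row > 0) & (d_row < max_d)]
        if d_row.size < min_votes:
            continue
        hist = np.bincount(d_row.astype(np.int64), minlength=max_d)
        rows.append(v)
        ds.append(np.argmax(hist))                   # dominant disparity of this row
    if len(rows) < 2:
        raise ValueError("not enough valid rows to fit the ground line")
    A = np.vstack([rows, np.ones(len(rows))]).T
    slope, intercept = np.linalg.lstsq(A, np.array(ds, float), rcond=None)[0]
    return slope, intercept

# A pixel (v, d) can then be labeled "ground" when |d - (slope*v + intercept)|
# falls below a small tolerance, giving the mask fused with the LiDAR map.
```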
Figures:
Figure 1. Pose optimization based on ground information. T, p, and q represent the transformation matrix, points on the plane, and points off the plane, respectively.
Figure 2. The stereo sensor model and the coordinate systems used [34].
Figure 3. Region of interest extraction. (a) Left image. (b) Right image. (c) Disparity image. (d) v-disparity and (e) u-disparity, both derived from (c). (f) Large obstacles removed by removing peak values from (e). (g) v-disparity based on (f); the red line is the disparity profile of the ground plane. (h) Detected ground plane and region of interest (RoI), shown in the red box. (i) City 3D reconstruction; green represents the ground.
Figure 4. Graph-structure optimization. P represents the nodes of visual points, and X represents the pose of the frame. "Ground" denotes the ground information extracted from the 3D reconstruction.
Figure 5. Trajectory estimates on the KITTI dataset: (a) 00; (b) 01; (c) 05; (d) 07; (e) 08; (f) 09.
Figure 6. 3D point cloud maps for the corresponding sequences: (a) 00; (b) 01; (c) 05; (d) 07; (e) 08; (f) 09.
Figure 7. 3D reconstruction based on road constraints, where green represents the road: (a) 00; (b) 01; (c) 05; (d) 07; (e) 08; (f) 09.
Figure 8. High-definition display of point clouds for some streets in the 00 sequence. The image in the top left corner is a 3D reconstruction of the entire city, and the other images (a–e) depict details of its streets.
19 pages, 12975 KiB  
Article
Enhancing Real-Time Visual SLAM with Distant Landmarks in Large-Scale Environments
by Hexuan Dou, Xinyang Zhao, Bo Liu, Yinghao Jia, Guoqing Wang and Changhong Wang
Drones 2024, 8(10), 586; https://doi.org/10.3390/drones8100586 - 16 Oct 2024
Viewed by 966
Abstract
The efficacy of visual Simultaneous Localization and Mapping (SLAM) diminishes in large-scale environments due to challenges in identifying distant landmarks, leading to a limited perception range and trajectory drift. This paper presents a practical method to enhance the accuracy of feature-based real-time visual SLAM for compact unmanned vehicles by constructing distant map points. By tracking consecutive image features across multiple frames, remote map points are generated with sufficient parallax angles, extending the mapping scope to the theoretical maximum range. Observations of these landmarks from preceding keyframes are supplemented accordingly, improving back-end optimization and, consequently, localization accuracy. The effectiveness of this approach is ensured by the introduction of the virtual map point, a proposed data structure that links relational features to an imaginary map point, thereby maintaining the constrained size of local optimization during triangulation. Based on the ORB-SLAM3 code, a SLAM system incorporating the proposed method is implemented and tested. Experimental results on drone and vehicle datasets demonstrate that the proposed method outperforms ORB-SLAM3 in both accuracy and perception range with negligible additional processing time, thus preserving real-time performance. Field tests using a UGV further validate the efficacy of the proposed method. Full article
(This article belongs to the Section Drone Design and Development)
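The key gate in the virtual-map-point scheme described above is the parallax angle between the first and latest back-projected rays of a tracked feature. The sketch below, with an assumed 1° threshold, shows how triangulation could be deferred until that angle is large enough; it is a geometric illustration, not the paper's implementation.

```python
import numpy as np

def parallax_deg(ray_a, ray_b):
    """Angle in degrees between two back-projection rays given in the world frame."""
    c = np.dot(ray_a, ray_b) / (np.linalg.norm(ray_a) * np.linalg.norm(ray_b))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def try_triangulate(centers, rays, theta_th=1.0):
    """Triangulate a tracked point only once the parallax exceeds theta_th.

    centers : list of camera centers (3,) for every frame observing the feature.
    rays    : corresponding viewing rays in the world frame.
    Returns the 3D point (midpoint of the closest approach of the two extreme rays)
    or None while the parallax is still too small (the point stays "virtual").
    """
    if parallax_deg(rays[0], rays[-1]) < theta_th:
        return None
    c1, d1, c2, d2 = centers[0], rays[0], centers[-1], rays[-1]
    # Solve for the closest points c1 + s*d1 and c2 + t*d2 on the two rays.
    A = np.array([[d1 @ d1, -d1 @ d2],
                  [d1 @ d2, -d2 @ d2]])
    b = np.array([(c2 - c1) @ d1, (c2 - c1) @ d2])
    s, t = np.linalg.solve(A, b)
    return 0.5 * ((c1 + s * d1) + (c2 + t * d2))
```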
Figures:
Figure 1. Triangulation of a spatial point. The uncertainty of triangulation is represented by the gray area. The case illustrated on the right has a shorter baseline b and/or a smaller parallax angle θ, resulting in greater uncertainty in the localization of P.
Figure 2. Extending the perception range to the maximum by utilizing features from keyframes beyond the local mapping range.
Figure 3. The relationship between the camera's perception range and the structure of the matrix H_CP. The case in (a) has a shorter perception range, with H_CP represented as (b), while the case in (c) has a longer perception range, with H_CP augmented by the commonly observed landmarks P_abc, represented as (d). Many of the block matrices in dark blue remain zero due to the absence of observations between the corresponding camera pose C_i and landmark P_j.
Figure 4. A diagram of a virtual map point. (i) The virtual map point V is constructed from features matched across two adjacent frames. (ii) As the camera moves, features and back-projected rays in subsequent frames are associated with V, allowing the maximum parallax angle to be calculated. (iii) The spatial coordinates of V are triangulated when the parallax angle exceeds the threshold θ_th. (iv) A map point P is then constructed from V, inheriting the observation relationships with frames F_0 to F_i.
Figure 5. An overview of the proposed SLAM system. The management of virtual map points is integrated into the "Local Mapping" thread of ORB-SLAM3.
Figure 6. Keyframe trajectory comparison between ORB-SLAM3 (ORB3) and the proposed method (MOD) alongside groundtruth (GT) from the top view in the dataset test sequences: (a) EuRoC Machine Hall 05; (b) KITTI 00; (c) KAIST 32; (d) KAIST 27; (e) KAIST 33; (f) KAIST 39; (g) UZH MAV.
Figure 7. The distribution of triangulation depth of map points in the dataset tests: (a) comparison between ORB-SLAM3 (ORB3) and the proposed method (MOD); (b) comparison between normal map points and map points converted from virtual map points within the proposed method.
Figure 8. Observations of map points in the 2776th frame of the sequence KAIST 26 by (a) ORB-SLAM3 and (b) the proposed method, and in the 2979th frame by (c) ORB-SLAM3 and (d) the proposed method, with normal map points in green and converted virtual map points in red.
Figure 9. Comparison of mean tracking time across the dataset and field tests between ORB-SLAM3 (ORB) and the proposed method (MOD).
Figure 10. (a) The experimental platform on a wheeled robot. An Intel D435i collects the monocular image sequence and a ComNav M100 provides groundtruth. (b) A sample of the collected image sequences, which contain distant landmarks. The sequences are collected in campus scenes of a (c) yard, (d) road, and (e) park. Location: Science Park, Harbin Institute of Technology.
Figure 11. Keyframe trajectory comparison between ORB-SLAM3 (ORB3) and the proposed method (MOD) alongside groundtruth (GT) from the top view in the (a) yard, (b) road, and (c) park scenes during the field test.
Figure 12. The distribution of triangulation depth of map points in the field tests: (a) comparison between ORB-SLAM3 (ORB3) and the proposed method (MOD); (b) comparison between normal map points and map points converted from virtual map points within the proposed method.
20 pages, 6262 KiB  
Article
YPR-SLAM: A SLAM System Combining Object Detection and Geometric Constraints for Dynamic Scenes
by Xukang Kan, Gefei Shi, Xuerong Yang and Xinwei Hu
Sensors 2024, 24(20), 6576; https://doi.org/10.3390/s24206576 - 12 Oct 2024
Viewed by 574
Abstract
Traditional SLAM systems assume a static environment, but moving objects break this ideal assumption. In the real world, moving objects can greatly influence the precision of image matching and camera pose estimation. In order to solve these problems, the YPR-SLAM system is proposed. First of all, the system includes a lightweight YOLOv5 detection network for detecting both dynamic and static objects, which provides pre-dynamic object information to the SLAM system. Secondly, utilizing the prior information of dynamic targets and the depth image, a method of geometric constraint for removing motion feature points from the depth image is proposed. The Depth-PROSAC algorithm is used to differentiate the dynamic and static feature points so that dynamic feature points can be removed. At last, the dense cloud map is constructed by the static feature points. The YPR-SLAM system is an efficient combination of object detection and geometry constraint in a tightly coupled way, eliminating motion feature points and minimizing their adverse effects on SLAM systems. The performance of the YPR-SLAM was assessed on the public TUM RGB-D dataset, and it was found that YPR-SLAM was suitable for dynamic situations. Full article
(This article belongs to the Section Sensing and Imaging)
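The Depth-PROSAC step itself is not reproduced here; the sketch below only shows the simpler geometric idea the pipeline builds on — separating, within a YOLO detection box, foreground (potentially dynamic) feature points from background ones using the depth image — with an illustrative 0.4 m tolerance that is our assumption.

```python
import numpy as np

def split_static_dynamic(keypoints, depth, boxes, tol=0.4):
    """Label feature points inside detection boxes as dynamic or static by depth.

    keypoints : (N, 2) array of (u, v) pixel coordinates.
    depth     : (H, W) depth image in meters (0 = invalid).
    boxes     : list of (u1, v1, u2, v2) prior-dynamic detection boxes.
    A point inside a box whose depth is within `tol` meters of the box's median
    depth is treated as lying on the (possibly moving) object; everything else
    is kept as static background.
    """
    keypoints = np.asarray(keypoints, float)
    dynamic = np.zeros(len(keypoints), dtype=bool)
    for (u1, v1, u2, v2) in boxes:
        patch = depth[int(v1):int(v2), int(u1):int(u2)]
        valid = patch[patch > 0]
        if valid.size == 0:
            continue
        med = np.median(valid)                       # depth of the detected object
        for i, (u, v) in enumerate(keypoints):
            if u1 <= u < u2 and v1 <= v < v2:
                d = depth[int(v), int(u)]
                if d > 0 and abs(d - med) < tol:
                    dynamic[i] = True                # on the object -> discard later
    static_kps = keypoints[~dynamic]
    return static_kps, dynamic
```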
Figures:
Figure 1. Framework of the YPR-SLAM system. The blue section is ORB-SLAM2, and the orange section is the contribution of this paper.
Figure 2. The YOLOv5 network architecture.
Figure 3. Dynamic target detection and filtering thread. First, ORB feature points are extracted from the RGB image by the tracking thread. Next, the dynamic target detection thread identifies potential dynamic target areas, and the Depth-PROSAC algorithm filters out dynamic feature points. Finally, the static feature points are retained for subsequent pose estimation.
Figure 4. Comparison between the target detection algorithm and the Depth-PROSAC algorithm in filtering out dynamic feature points: (a) the object detection method directly filters out dynamic feature points; (b) the Depth-PROSAC algorithm filters out dynamic feature points.
Figure 5. Dense point cloud construction workflow.
Figure 6. 3D camera motion estimated in the fr3_walking_halfsphere sequence: (a) camera trajectory estimated by ORB-SLAM2; (b) camera trajectory estimated by YPR-SLAM.
Figure 7. ATE and RPE of the ORB-SLAM2 and YPR-SLAM systems on different datasets. (a1,a2,c1,c2,e1,e2,g1,g2) show the ATE and RPE obtained by ORB-SLAM2 on fre3_sitting_static, fre3_walking_static, fre3_walking_halfsphere, and fre3_walking_xyz, respectively; (b1,b2,d1,d2,f1,f2,h1,h2) show the ATE and RPE of YPR-SLAM on the same sequences. (a1,b1,c1,d1,e1,f1,g1,h1) are ATE plots; (a2,b2,c2,d2,e2,f2,g2,h2) are RPE plots.
Figure 8. Dense 3D point cloud maps constructed for the dynamic scene sequence fre3_walking_xyz: (a) by ORB-SLAM2; (b) by YPR-SLAM.
24 pages, 4712 KiB  
Article
Balancing Efficiency and Accuracy: Enhanced Visual Simultaneous Localization and Mapping Incorporating Principal Direction Features
by Yuelin Yuan, Fei Li, Xiaohui Liu and Jialiang Chen
Appl. Sci. 2024, 14(19), 9124; https://doi.org/10.3390/app14199124 - 9 Oct 2024
Viewed by 721
Abstract
In visual Simultaneous Localization and Mapping (SLAM), operational efficiency and localization accuracy are equally crucial evaluation metrics. We propose an enhanced visual SLAM method to ensure stable localization accuracy while improving system efficiency. It can maintain localization accuracy even after reducing the number of feature pyramid levels by 50%. Firstly, we innovatively incorporate the principal direction error, which represents the global geometric features of feature points, into the error function for pose estimation, utilizing Pareto optimal solutions to improve the localization accuracy. Secondly, for loop-closure detection, we construct a feature matrix by integrating the grayscale and gradient direction of an image. This matrix is then dimensionally reduced through aggregation, and a multi-layer detection approach is employed to ensure both efficiency and accuracy. Finally, we optimize the feature extraction levels and integrate our method into the visual system to speed up the extraction process and mitigate the impact of the reduced levels. We comprehensively evaluate the proposed method on local and public datasets. Experiments show that the SLAM method maintained high localization accuracy after reducing the tracking time by 24% compared with ORB SLAM3. Additionally, the proposed loop-closure-detection method demonstrated superior computational efficiency and detection accuracy compared to the existing methods. Full article
(This article belongs to the Special Issue Mobile Robotics and Autonomous Intelligent Systems)
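For the loop-closure part, the abstract describes building a per-image feature matrix from grayscale and gradient direction, then aggregating it along rows and columns into compact vectors. The sketch below is one plausible reading of that pipeline with assumed cell size, image resolution, and a cosine score; the paper's exact construction may differ.

```python
import cv2
import numpy as np

def aggregation_descriptor(gray, cell=16):
    """Build a coarse feature matrix from intensity and gradient direction,
    then aggregate it along rows and columns into one descriptor vector."""
    gray = cv2.resize(gray, (256, 256)).astype(np.float32)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    ang = np.arctan2(gy, gx)                          # gradient direction in [-pi, pi]

    h = 256 // cell
    feat = np.zeros((h, h, 2), np.float32)
    for i in range(h):
        for j in range(h):
            blk = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            feat[i, j, 0] = gray[blk].mean()          # grayscale channel
            feat[i, j, 1] = ang[blk].mean()           # gradient-direction channel
    mat = feat.reshape(h, h * 2)
    desc = np.concatenate([mat.mean(axis=0), mat.mean(axis=1)])  # row + column aggregation
    return desc / (np.linalg.norm(desc) + 1e-9)

def loop_score(desc_a, desc_b):
    """Cosine similarity used to rank loop-closure candidates."""
    return float(np.dot(desc_a, desc_b))
```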
Figures:
Figure 1. The visual SLAM framework. The main contributions of this work are highlighted within the yellow dashed box: (a) pose estimation incorporating principal direction information; (b) descriptor extraction; (c) similarity calculation and loop-closure detection.
Figure 2. Pose estimation incorporating principal direction information. The small rectangles represent feature points, and the arrowed lines represent the principal directions of the feature points.
Figure 3. Loop-closure detection based on aggregation descriptors. The left side shows the aggregation feature descriptor extraction process; the right side shows the loop-closure-detection process.
Figure 4. Illustration of principal direction feature projection. The purple rectangles represent feature points, and the pink lines indicate their principal directions.
Figure 5. Aggregation feature descriptor generation process. Each image generates two feature vectors, aggregated from the feature matrix along the row and column directions.
Figure 6. Examples of scenes in the dataset: (a) around-view image; (b) fast motion and uneven feature distribution; (c) low luminance; (d) high luminance; (e) image rotation.
Figure 7. Experimental trajectory and loop-closure ground truth. From left to right: sequences 00, 05, 06, 07 from KITTI.
Figure 8. Experimental trajectory and loop-closure ground truth in the local dataset.
Figure 9. Precision–recall curves based on the 00 sequence (left) and the local dataset (right).
Figure 10. Maximum recall at 100% precision.
Figure 11. Schematic of image processing. The left image is the original; the right image has luminance reduced to 15%.
Figure 12. Maximum recall at 100% precision in low-luminance scenarios.
Figure 13. Comparison of trajectory results. The red box indicates a locally magnified trajectory. The left and right images correspond to sequences 05 and 06 from the KITTI dataset.
23 pages, 9746 KiB  
Article
Research on SLAM Localization Algorithm for Orchard Dynamic Vision Based on YOLOD-SLAM2
by Zhen Ma, Siyuan Yang, Jingbin Li and Jiangtao Qi
Agriculture 2024, 14(9), 1622; https://doi.org/10.3390/agriculture14091622 - 16 Sep 2024
Cited by 1 | Viewed by 749
Abstract
With the development of agriculture, the complexity and dynamism of orchard environments pose challenges to the perception and positioning of inter-row environments for agricultural vehicles. This paper proposes a method for extracting navigation lines and measuring pedestrian obstacles. The improved YOLOv5 algorithm is used to detect tree trunks between left and right rows in orchards. The experimental results show that the average angle deviation of the extracted navigation lines was less than 5 degrees, verifying its accuracy. Due to the variable posture of pedestrians and ineffective camera depth, a distance measurement algorithm based on a four-zone depth comparison is proposed for pedestrian obstacle distance measurement. Experimental results showed that within a range of 6 m, the average relative error of distance measurement did not exceed 1%, and within a range of 9 m, the maximum relative error was 2.03%. The average distance measurement time was 30 ms, which could accurately and quickly achieve pedestrian distance measurement in orchard environments. On the publicly available TUM RGB-D dynamic dataset, YOLOD-SLAM2 significantly reduced the RMSE index of absolute trajectory error compared to the ORB-SLAM2 algorithm, which was less than 0.05 m/s. In actual orchard environments, YOLOD-SLAM2 had a higher degree of agreement between the estimated trajectory and the true trajectory when the vehicle was traveling in straight and circular directions. The RMSE index of the absolute trajectory error was less than 0.03 m/s, and the average tracking time was 47 ms, indicating that the YOLOD-SLAM2 algorithm proposed in this paper could meet the accuracy and real-time requirements of agricultural vehicle positioning in orchard environments. Full article
(This article belongs to the Section Agricultural Technology)
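Geometrically, the navigation-line extraction summarized above reduces to fitting a centerline between the detected left-row and right-row trunk reference points. The sketch below shows such a least-squares midline fit, assuming trunk base points have already been assigned to the left and right rows; it is an illustration, not the paper's exact procedure.

```python
import numpy as np

def navigation_line(left_pts, right_pts):
    """Fit an inter-row navigation line from trunk reference points.

    left_pts, right_pts : (N, 2)/(M, 2) image points at the base of detected
    trunks in the left and right tree rows. Midpoints of the closest
    left/right pairs are fitted with a least-squares line u = a*v + b
    (parameterized by the image row v so near-vertical lines stay stable).
    """
    left = np.asarray(left_pts, float)
    right = np.asarray(right_pts, float)
    mids = []
    for p in left:
        q = right[np.argmin(np.linalg.norm(right - p, axis=1))]  # nearest opposite trunk
        mids.append((p + q) / 2.0)
    mids = np.array(mids)
    A = np.vstack([mids[:, 1], np.ones(len(mids))]).T            # columns [v, 1]
    a, b = np.linalg.lstsq(A, mids[:, 0], rcond=None)[0]
    return a, b     # navigation line: u = a * v + b

# The angle deviation reported in the abstract can then be checked against the
# vertical image axis, e.g. abs(np.degrees(np.arctan(a))) < 5.
```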
Figures:
Figure 1. Pinhole camera model.
Figure 2. Pixel coordinate system.
Figure 3. YOLOv5s network structure.
Figure 4. Overall principle block diagram of ORB-SLAM2.
Figure 5. Algorithm flowchart.
Figure 6. Reference point of a tree trunk.
Figure 7. Problems in extracting navigation lines in Scenario 1.
Figure 8. Problems in extracting navigation lines in Scenario 2.
Figure 9. Navigation line extraction diagram.
Figure 10. Navigation line extraction in different orchard scenes.
Figure 11. Schematic diagram of the evaluation indicators.
Figure 12. Angle deviation.
Figure 13. Schematic diagram of the algorithm flow.
Figure 14. Pedestrian distance measurement experiment.
Figure 15. YOLOD-SLAM2 algorithm framework.
Figure 16. Left image and depth map of the binocular camera.
Figure 17. Geometric constraints of epipolar lines.
Figure 18. Comparison of algorithm trajectories.
Figure 19. Actual orchard environment trajectory map.
Figure 20. Trajectory of straight-line driving in the x, y, and z directions.
Figure 21. Trajectory of circular driving in the x, y, and z directions.
18 pages, 5473 KiB  
Article
Visual-Inertial RGB-D SLAM with Encoder Integration of ORB Triangulation and Depth Measurement Uncertainties
by Zhan-Wu Ma and Wan-Sheng Cheng
Sensors 2024, 24(18), 5964; https://doi.org/10.3390/s24185964 - 14 Sep 2024
Viewed by 927
Abstract
In recent years, the accuracy of visual SLAM (Simultaneous Localization and Mapping) technology has seen significant improvements, making it a prominent area of research. However, within the current RGB-D SLAM systems, the estimation of 3D positions of feature points primarily relies on direct measurements from RGB-D depth cameras, which inherently contain measurement errors. Moreover, the potential of triangulation-based estimation for ORB (Oriented FAST and Rotated BRIEF) feature points remains underutilized. To address the singularity of measurement data, this paper proposes the integration of the ORB features, triangulation uncertainty estimation and depth measurements uncertainty estimation, for 3D positions of feature points. This integration is achieved using a CI (Covariance Intersection) filter, referred to as the CI-TEDM (Triangulation Estimates and Depth Measurements) method. Vision-based SLAM systems face significant challenges, particularly in environments, such as long straight corridors, weakly textured scenes, or during rapid motion, where tracking failures are common. To enhance the stability of visual SLAM, this paper introduces an improved CI-TEDM method by incorporating wheel encoder data. The mathematical model of the encoder is proposed, and detailed derivations of the encoder pre-integration model and error model are provided. Building on these improvements, we propose a novel tightly coupled visual-inertial RGB-D SLAM with encoder integration of ORB triangulation and depth measurement uncertainties. Validation on open-source datasets and real-world environments demonstrates that the proposed improvements significantly enhance the robustness of real-time state estimation and localization accuracy for intelligent vehicles in challenging environments. Full article
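The CI-TEDM fusion of the triangulated estimate and the depth-camera measurement relies on the standard covariance intersection rule. The sketch below implements that rule for two 3D point estimates, with the weight ω chosen by a coarse trace-minimizing grid search (an illustrative choice); the example covariances at the bottom are made-up numbers, not values from the paper.

```python
import numpy as np

def covariance_intersection(x_a, P_a, x_b, P_b, steps=100):
    """Fuse two estimates of the same 3D point with unknown cross-correlation.

    CI rule: P_c^-1 = w * P_a^-1 + (1 - w) * P_b^-1,
             x_c    = P_c (w * P_a^-1 x_a + (1 - w) * P_b^-1 x_b),
    with w picked to minimize trace(P_c).
    """
    Pa_inv, Pb_inv = np.linalg.inv(P_a), np.linalg.inv(P_b)
    best = None
    for w in np.linspace(0.0, 1.0, steps + 1):
        P_c = np.linalg.inv(w * Pa_inv + (1.0 - w) * Pb_inv)
        if best is None or np.trace(P_c) < best[0]:
            best = (np.trace(P_c), w, P_c)
    _, w, P_c = best
    x_c = P_c @ (w * Pa_inv @ x_a + (1.0 - w) * Pb_inv @ x_b)
    return x_c, P_c

# Example: fuse a triangulated point (large depth uncertainty) with a direct
# RGB-D depth measurement of the same point (hypothetical covariances).
x_tri, P_tri = np.array([1.0, 0.2, 4.1]), np.diag([0.01, 0.01, 0.25])
x_dep, P_dep = np.array([1.02, 0.21, 3.9]), np.diag([0.02, 0.02, 0.04])
x_fused, P_fused = covariance_intersection(x_tri, P_tri, x_dep, P_dep)
```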
Figures:
Figure 1. The system framework diagram, consisting of three main modules: input, function, and output.
Figure 2. An example diagram of reprojection error. Feature matching indicates that points p1 and p2 are projections of the same spatial point P, but the camera pose is initially unknown. There is initially a distance between the projected point p̂2 of P and the actual point p2; the camera pose is then adjusted to minimize this distance.
Figure 3. The motion model of the wheeled robot using wheel encoders. The figure illustrates the motion model of a mobile robot using wheel encoders in a 2D plane, describing the robot's trajectory between its position (x_k, y_k) at time t_k and its position (x_{k+1}, y_{k+1}) at time t_{k+1}.
Figure 4. The process of running the datasets in the VEOS3-TEDM algorithm: (a) corridor scene and (b) laboratory scene. Blue frames represent keyframes, red frames initial keyframes, and green frames current frames.
Figure 5. The process of tracking the datasets in the VEOS3-TEDM algorithm: (a) corridor scene and (b) laboratory scene. The green boxes represent key feature points detected by the VEOS3-TEDM algorithm.
Figure 6. Comparison between the estimated and true trajectories in the VEOS3-TEDM algorithm: (a) corridor scene and (b) laboratory scene.
Figure 7. Comparison between the true and estimated trajectories in the x, y, and z directions using the VEOS3-TEDM algorithm: (a) corridor scene and (b) laboratory scene.
Figure 8. 3D point cloud maps: (a) corridor scene and (b) laboratory scene.
Figure 9. Images of the experimental platform: (a) front view and (b) left view.
Figure 10. The location of various components on the mobile robot: (a) bottom level and (b) upper level.
Figure 11. The process of tracking real-world environments in the VEOS3-TEDM algorithm: (a1,a2) laboratory; (b1,b2) hall; (c1,c2) weak-texture scene; (d1,d2) long straight corridor. The green boxes represent key feature points detected by the VEOS3-TEDM algorithm.
Figure 12. A comparison of estimated and true trajectories in real-world environments using the VEOS3-TEDM algorithm.
19 pages, 20386 KiB  
Article
YOD-SLAM: An Indoor Dynamic VSLAM Algorithm Based on the YOLOv8 Model and Depth Information
by Yiming Li, Yize Wang, Liuwei Lu and Qi An
Electronics 2024, 13(18), 3633; https://doi.org/10.3390/electronics13183633 - 12 Sep 2024
Viewed by 747
Abstract
Aiming at the problems of low positioning accuracy and poor mapping effect of the visual SLAM system caused by the poor quality of the dynamic object mask in an indoor dynamic environment, an indoor dynamic VSLAM algorithm based on the YOLOv8 model and depth information (YOD-SLAM) is proposed based on the ORB-SLAM3 system. Firstly, the YOLOv8 model obtains the original mask of a priori dynamic objects, and the depth information is used to modify the mask. Secondly, the mask’s depth information and center point are used to a priori determine if the dynamic object has missed detection and if the mask needs to be redrawn. Then, the mask edge distance and depth information are used to judge the movement state of non-prior dynamic objects. Finally, all dynamic object information is removed, and the remaining static objects are used for posing estimation and dense point cloud mapping. The accuracy of camera positioning and the construction effect of dense point cloud maps are verified using the TUM RGB-D dataset and real environment data. The results show that YOD-SLAM has a higher positioning accuracy and dense point cloud mapping effect in dynamic scenes than other advanced SLAM systems such as DS-SLAM and DynaSLAM. Full article
(This article belongs to the Section Computer Science & Engineering)
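The mask-modification step described above uses the depth image to trim background pixels that the YOLOv8 mask over-covers. The sketch below expresses that idea with a median-depth gate; the 0.5 m tolerance is an assumption, and the paper's full procedure (mask redrawing for missed detections, motion judgment for non-prior objects) is not reproduced.

```python
import numpy as np

def refine_mask_with_depth(mask, depth, tol=0.5):
    """Trim a prior-dynamic-object mask so it only keeps pixels whose depth is
    consistent with the object itself.

    mask  : (H, W) boolean mask from the instance-segmentation model.
    depth : (H, W) depth image in meters (0 = invalid).
    Pixels inside the mask whose depth differs from the masked region's median
    by more than `tol` are treated as over-covered background and removed.
    """
    obj_depths = depth[mask & (depth > 0)]
    if obj_depths.size == 0:
        return mask
    med = np.median(obj_depths)
    refined = mask & (depth > 0) & (np.abs(depth - med) <= tol)
    return refined

# Feature points falling inside the refined mask would then be excluded from
# pose estimation and from the dense point cloud, as the abstract describes.
```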
Figures:
Figure 1. Overview of YOD-SLAM.
Figure 2. The process of modifying prior dynamic object masks using depth information. (a) shows the depth image corresponding to the current frame. Through the proposed algorithm, the background area excessively covered in (b) is removed in (c), and the expanded edges of the human body achieve better coverage in (d).
Figure 3. The process of redrawing the mask of previously missed dynamic objects. (a) shows the depth image corresponding to the current frame. In (b), people in the distance are not covered by the original mask, resulting in missed detections. The mask in (c) is obtained by filling in the mask with the depth information at that location in (a).
Figure 4. The process of excluding prior static objects in motion.
Figure 5. Comparison of estimated trajectories and real trajectories of the four systems.
Figure 6. Results of mask modification on dynamic objects. The three images in each column come from the same moment in their respective datasets: the first row is the depth image corresponding to the current frame, the second is the original mask obtained by YOLOv8, and the third is the final mask after modification.
Figure 7. Comparison of point cloud maps between ORB-SLAM3 and YOD-SLAM in two highly dynamic sequences.
Figure 8. Comparison of point cloud maps between ORB-SLAM3 and YOD-SLAM in low-dynamic and static sequences, where fr2/desk/p is a low-dynamic scene and fr2/rpy is a static scene.
Figure 9. Intel RealSense Depth Camera D455.
Figure 10. Mask processing and ORB feature point extraction in real laboratory environments. Several non-English exhibition boards lean against the wall to simulate a typical indoor environment. The facial features of the people have been treated with confidentiality.
Figure 11. Comparison of dense point cloud mapping between ORB-SLAM3 and YOD-SLAM in real laboratory environments. Map areas affected by dynamic objects are marked with red circles.
24 pages, 1413 KiB  
Article
Loop Detection Method Based on Neural Radiance Field BoW Model for Visual Inertial Navigation of UAVs
by Xiaoyue Zhang, Yue Cui, Yanchao Ren, Guodong Duan and Huanrui Zhang
Remote Sens. 2024, 16(16), 3038; https://doi.org/10.3390/rs16163038 - 19 Aug 2024
Viewed by 680
Abstract
The loop closure detection (LCD) methods in Unmanned Aerial Vehicle (UAV) Visual Inertial Navigation System (VINS) are often affected by issues such as insufficient image texture information and limited observational perspectives, resulting in constrained UAV positioning accuracy and reduced capability to perform complex tasks. This study proposes a Bag-of-Words (BoW) LCD method based on Neural Radiance Field (NeRF), which estimates camera poses from existing images and achieves rapid scene reconstruction through NeRF. A method is designed to select virtual viewpoints and render images along the flight trajectory using a specific sampling approach to expand the limited observational angles, mitigating the impact of image blur and insufficient texture information at specific viewpoints while enlarging the loop closure candidate frames to improve the accuracy and success rate of LCD. Additionally, a BoW vector construction method that incorporates the importance of similar visual words and an adapted virtual image filtering and comprehensive scoring calculation method are designed to determine loop closures. Applied to VINS-Mono and ORB-SLAM3, and compared with the advanced BoW model LCDs of the two systems, results indicate that the NeRF-based BoW LCD method can detect more than 48% additional accurate loop closures, while the system’s navigation positioning error mean is reduced by over 46%, validating the effectiveness and superiority of the proposed method and demonstrating its significant importance for improving the navigation accuracy of VINS. Full article
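At its core, the loop-closure scoring above is a BoW similarity in which NeRF-rendered virtual views enlarge the candidate database. The sketch below shows a plain TF-IDF BoW vector and an L1-style similarity over descriptors quantized against a vocabulary; `vocab` and `idf` are assumed, pre-trained inputs, and the paper's word-importance weighting and virtual-image filtering are not reproduced.

```python
import numpy as np

def bow_vector(descriptors, vocab, idf):
    """TF-IDF Bag-of-Words vector for one image (real or NeRF-rendered).

    descriptors : (N, D) float array of local descriptors.
    vocab       : (K, D) visual-word centers of a pre-trained vocabulary.
    idf         : (K,) inverse-document-frequency weights of the vocabulary.
    """
    # Assign each descriptor to its nearest visual word.
    d2 = ((descriptors[:, None, :] - vocab[None, :, :]) ** 2).sum(axis=2)
    words = np.argmin(d2, axis=1)
    tf = np.bincount(words, minlength=len(vocab)).astype(np.float64)
    tf /= max(tf.sum(), 1.0)
    v = tf * idf
    s = v.sum()
    return v / s if s > 0 else v               # L1-normalized TF-IDF vector

def bow_similarity(v1, v2):
    """Similarity score in [0, 1]; higher means a more likely loop closure."""
    return float(1.0 - 0.5 * np.abs(v1 - v2).sum())

# Candidate selection: score the query frame against both real keyframes and
# rendered virtual-view frames, and keep candidates above a chosen threshold.
```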
Show Figures

Figure 1: Framework of the BoW LCD method based on NeRF and its position in VINS.
Figure 2: Positions of the central pixel and surrounding pixels.
Figure 3: Example of Instant-NGP virtual view camera pose.
Figure 4: Quadtree uniform feature point extraction.
Figure 5: Comparison of reconstructed data of three pose estimation schemes.
Figure 6: Feature matching effect between real image (left) and synthetic image (right).
Figure 7: The loop closure frame detection results for the two approaches used in VINS-Mono.
Figure 8: The loop closure frame detection results for the two approaches used in ORB-SLAM3.
Figure 9: Example of additional loopback matching results.
Figure 10: The ground truth and trajectory of two methods in VINS-Mono.
Figure 11: Statistical data of the APE with image frame index in VINS-Mono.
Figure 12: The APE statistics of BoW LCD method.
Figure 13: The APE statistics of NeRF-based BoW model LCD method.
Figure 14: The distribution image of APE in the VINS-Mono system with the color of the trajectory.
Figure 15: The ground truth and trajectory of two methods in ORB-SLAM3.
Figure 16: Statistical data of the APE with image frame index in ORB-SLAM3.
Figure 17: The APE statistics of BoW LCD method in ORB-SLAM3.
Figure 18: The APE statistics of NeRF-based BoW model LCD method in ORB-SLAM3.
Figure 19: The distribution image of APE in the ORB-SLAM3 system with the color of the trajectory.
Figure 20: Distribution of detected loop closures as a function of threshold r.
25 pages, 23704 KiB  
Article
PE-SLAM: A Modified Simultaneous Localization and Mapping System Based on Particle Swarm Optimization and Epipolar Constraints
by Cuiming Li, Zhengyu Shang, Jinxin Wang, Wancai Niu and Ke Yang
Appl. Sci. 2024, 14(16), 7097; https://doi.org/10.3390/app14167097 - 13 Aug 2024
Viewed by 797
Abstract
Due to various typical unstructured factors in the environment of photovoltaic power stations, such as high feature similarity, weak textures, and simple structures, the motion model of the ORB-SLAM2 algorithm performs poorly, leading to a decline in tracking accuracy. To address this issue, we propose PE-SLAM, which improves the ORB-SLAM2 motion model by combining particle swarm optimization (PSO) with the epipolar constraint to eliminate mismatches. First, a new mutation strategy is proposed that perturbs the pbest (personal best value) during the late convergence stage of the PSO algorithm, preventing it from falling into local optima. Then, the improved PSO algorithm is used to solve for the fundamental matrix between two images based on the feature matching relationships obtained from the motion model. Finally, the epipolar constraint is applied using the computed fundamental matrix to eliminate incorrect matches produced by the motion model, thereby enhancing the tracking accuracy and robustness of the ORB-SLAM2 algorithm in unstructured photovoltaic power station scenes. In feature matching experiments, compared to the ORB algorithm and the ORB+HAMMING algorithm, the ORB+PE-match algorithm achieved average accuracy improvements of 19.5%, 14.0%, and 6.0% in unstructured environments, respectively, with better recall rates. In trajectory experiments on the TUM dataset, PE-SLAM reduced the average absolute trajectory error by 29.1% and the average relative pose error by 27.0% compared to ORB-SLAM2. In the photovoltaic power station mapping experiment, the constructed dense point cloud map is complete with little overlap, indicating that PE-SLAM largely overcomes the unstructured characteristics of the photovoltaic power station scene and is suitable for use in such scenes. Full article
(This article belongs to the Special Issue Autonomous Vehicles and Robotics)
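The final mismatch-rejection step described in the abstract rests on the epipolar constraint x2^T F x1 ≈ 0. Below is a minimal sketch of that check, assuming a fundamental matrix F has already been estimated (the paper obtains it with its improved PSO; any estimate can be plugged in here) and using the Sampson distance as the rejection criterion. The function names and the pixel threshold are illustrative (Python with NumPy).

import numpy as np

def sampson_distance(F, x1, x2):
    """First-order geometric error of the epipolar constraint x2^T F x1 = 0.

    F:      3x3 fundamental matrix mapping image-1 points to epipolar lines in image 2.
    x1, x2: Nx2 arrays of matched pixel coordinates in image 1 and image 2.
    """
    ones = np.ones((x1.shape[0], 1))
    p1 = np.hstack([x1, ones])             # homogeneous coordinates in image 1
    p2 = np.hstack([x2, ones])             # homogeneous coordinates in image 2
    Fx1 = (F @ p1.T).T                     # epipolar lines in image 2
    Ftx2 = (F.T @ p2.T).T                  # epipolar lines in image 1
    num = np.sum(p2 * Fx1, axis=1) ** 2    # (x2^T F x1)^2
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return num / den

def filter_matches_epipolar(F, x1, x2, thresh=1.0):
    """Keep only matches whose Sampson distance falls below the threshold (pixels squared)."""
    keep = sampson_distance(F, x1, x2) < thresh
    return x1[keep], x2[keep], keep

Filtering with the Sampson distance rather than the raw algebraic residual keeps the threshold roughly interpretable in pixels, which is why it is a common choice for this kind of outlier rejection.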
Show Figures

Figure 1: Photovoltaic power plant scene images.
Figure 2: PE-SLAM system framework diagram.
Figure 3: Schematic diagram of the motion model.
Figure 4: Schematic diagram of the epipolar constraint.
Figure 5: Schematic diagram of matching point pairs for epipolar constraint verification.
Figure 6: Improved mutation strategy for particle perturbation.
Figure 7: Particle update schematic.
Figure 8: Optical images of the OxfordVGG dataset.
Figure 9: Influence of important parameters on accuracy and time consumption.
Figure 10: Line chart of fitness change.
Figure 11: Matching accuracy based on the dataset.
Figure 12: Recall rate of matching results based on the dataset.
Figure 13: Unstructured scenarios in the datasets.
Figure 14: Trajectory error plot of the structure_notexture sequence.
Figure 15: Trajectory error plot of the nostructure_texture sequence.
Figure 16: Trajectory error plot of the structure_texture sequence.
Figure 17: The 3D map generated from the structure_notexture sequence.
Figure 18: The 3D map generated from the nostructure_texture sequence.
Figure 19: The 3D map generated from the structure_texture sequence.
Figure 20: Robot model based on Gazebo.
Figure 21: Photovoltaic power station simulation scene based on Gazebo.
Figure 22: Trajectory error plot of the Si sequence.
Figure 23: The 3D map generated from the Si sequence.
Figure 24: Partial images of indoor scenes.
Figure 25: Partial images of outdoor scenes.
Figure 26: The 3D map generated from the In sequence.
Figure 27: The 3D map generated from the Real sequence.
21 pages, 12275 KiB  
Article
Segmentation Point Simultaneous Localization and Mapping: A Stereo Vision Simultaneous Localization and Mapping Method for Unmanned Surface Vehicles in Nearshore Environments
by Xiujing Gao, Xinzhi Lin, Fanchao Lin and Hongwu Huang
Electronics 2024, 13(16), 3106; https://doi.org/10.3390/electronics13163106 - 6 Aug 2024
Cited by 1 | Viewed by 1098
Abstract
Unmanned surface vehicles (USVs) in nearshore areas are prone to environmental occlusion and electromagnetic interference, which can lead to the failure of traditional satellite-positioning methods. This paper utilizes a visual simultaneous localization and mapping (vSLAM) method to achieve USV positioning in nearshore environments. To address the issues of uneven feature distribution, erroneous depth information, and frequent viewpoint jitter in the visual data of USVs operating in nearshore environments, we propose a stereo vision SLAM system tailored for nearshore conditions: SP-SLAM (Segmentation Point-SLAM). This method is based on ORB-SLAM2 and incorporates a distance segmentation module, which filters feature points from different regions and adaptively adjusts the impact of outliers on iterative optimization, reducing the influence of erroneous depth information on motion scale estimation in open environments. Additionally, our method uses the Sum of Absolute Differences (SAD) for matching image blocks and quadric interpolation to obtain more accurate depth information, constructing a complete map. The experimental results on the USVInland dataset show that SP-SLAM solves the scaling constraint failure problem in nearshore environments and significantly improves the robustness of the stereo SLAM system in such environments. Full article
(This article belongs to the Section Electrical and Autonomous Vehicles)
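The abstract mentions SAD block matching with quadric interpolation to obtain more accurate depth. The sketch below shows the general form of that idea for a single pixel: scan a disparity range with a Sum-of-Absolute-Differences cost, then refine the best integer disparity with a quadratic (parabolic) fit over the neighbouring costs. It assumes rectified grayscale stereo images; the window size, disparity range, and function name sad_disparity are illustrative rather than the paper's actual parameters (Python with NumPy).

import numpy as np

def sad_disparity(left, right, y, x, max_disp=64, half_win=4):
    """Estimate a sub-pixel disparity at pixel (y, x) of the rectified left image.

    left, right: 2D float arrays (grayscale, rectified); a left-image point at column x
                 appears in the right image at column x - d for disparity d.
    Returns a sub-pixel disparity, or None if the matching window leaves the image.
    """
    h, w = left.shape
    if y - half_win < 0 or y + half_win >= h or x - half_win < 0 or x + half_win >= w:
        return None
    block = left[y - half_win:y + half_win + 1, x - half_win:x + half_win + 1]

    costs = []
    for d in range(0, max_disp + 1):
        xr = x - d
        if xr - half_win < 0:
            break
        cand = right[y - half_win:y + half_win + 1, xr - half_win:xr + half_win + 1]
        costs.append(np.abs(block - cand).sum())      # SAD cost for this candidate disparity
    costs = np.asarray(costs)
    d0 = int(np.argmin(costs))

    # Quadratic (parabolic) interpolation over neighbouring costs for sub-pixel accuracy.
    if 0 < d0 < len(costs) - 1:
        c_m, c_0, c_p = costs[d0 - 1], costs[d0], costs[d0 + 1]
        denom = c_m - 2 * c_0 + c_p
        if denom != 0:
            return d0 + 0.5 * (c_m - c_p) / denom
    return float(d0)

Evaluating this only at tracked feature points, rather than densely over the image, is what keeps such a refinement cheap enough for a real-time feature-based SLAM pipeline.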
Show Figures

Figure 1: Nearshore environment images: (a) a river channel, (b) a coastal area, (c) a lake.
Figure 2: Structure of the SP-SLAM system.
Figure 3: Rotation invariance and scale invariance of ORB features.
Figure 4: Morphological dilation: (a) the original image, (b) the structural element, (c) the dilated image; gray areas show the pixel distribution before dilation, and black areas mark the pixels filled during dilation.
Figure 5: Distance segmentation: (a) the original image, (b) the image after distance segmentation.
Figure 6: Quadric interpolation diagram.
Figure 7: Three-dimensional point updating.
Figure 8: USVInland acquisition platform: (a) photograph of the unmanned surface vehicle, (b) front view, (c) top view.
Figure 9: Distance segmentation results with different structural elements: the first row shows the original image, the second row the binary image obtained by iterative thresholding, and the third to fifth rows the segmentations produced with different structural elements.
Figure 10: Nearshore feature extraction results: (a)–(d) show different scenarios; green points are foreground feature points and red points are background feature points.
Figure 11: USVInland real environment presentation.
Figure 12: Comparison of motion trajectories.
Figure 13: Trajectory comparison on KITTI sequences 00 and 01.
Figure 14: N03-5 mapping results: (a) real scene images and mapping results of the riverbanks; (b) satellite top view of the river and the overall mapping results. The red line indicates the approximate path of the unmanned surface vehicle.
Figure 15: N02-4 mapping results: (a) real scene images of both ends of the river, both bridge scenes; (b) satellite top view of the river and the overall mapping results. The red line indicates the approximate path of the USV.
15 pages, 4119 KiB  
Article
Visual Navigation Algorithms for Aircraft Fusing Neural Networks in Denial Environments
by Yang Gao, Yue Wang, Lingyun Tian, Dongguang Li and Fenming Wang
Sensors 2024, 24(15), 4797; https://doi.org/10.3390/s24154797 - 24 Jul 2024
Viewed by 776
Abstract
A lightweight aircraft visual navigation algorithm that fuses neural networks is proposed to address the limited computing power available to aircraft edge computing platforms operating offline in satellite-denied environments with complex working scenarios. The algorithm uses object detection to label dynamic objects within complex scenes and eliminates the corresponding dynamic feature points, enhancing feature point extraction quality and thereby improving navigation accuracy. The algorithm was validated on an aircraft edge computing platform, and comparisons were made with existing methods through experiments on the TUM public dataset and physical flight experiments. The experimental results show that, while satisfying the system's real-time operation requirements, the proposed algorithm not only improves navigation accuracy but is also more robust than the monocular ORB-SLAM2 method. Full article
(This article belongs to the Section Sensors and Robotics)
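The core mechanism described in the abstract — labelling dynamic objects with a detector and discarding feature points that fall on them — can be sketched as follows, assuming the detector output is a list of bounding boxes for dynamic classes (e.g., from CBAM-YOLOv5). The box format and the function name filter_dynamic_keypoints are illustrative (Python with OpenCV); the paper's exact interface may differ.

import cv2

def filter_dynamic_keypoints(gray, dynamic_boxes, n_features=1000):
    """Extract ORB keypoints and drop those inside detected dynamic-object boxes.

    gray:          grayscale image (uint8 NumPy array).
    dynamic_boxes: list of (x1, y1, x2, y2) pixel boxes for dynamic objects,
                   e.g. produced by a YOLO-style detector for people or vehicles.
    """
    orb = cv2.ORB_create(nfeatures=n_features)
    keypoints = orb.detect(gray, None)

    def is_static(kp):
        # A keypoint is kept only if it lies outside every dynamic-object box.
        u, v = kp.pt
        return not any(x1 <= u <= x2 and y1 <= v <= y2 for (x1, y1, x2, y2) in dynamic_boxes)

    static_kps = [kp for kp in keypoints if is_static(kp)]
    # Compute descriptors only for the surviving (static) keypoints.
    static_kps, descriptors = orb.compute(gray, static_kps)
    return static_kps, descriptors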
Show Figures

Figure 1: Network of YOLOv5 [29].
Figure 2: The network of CBAM-YOLOv5.
Figure 3: Flow chart of the YOLO-SVO system.
Figure 4: Processing flow for marking dynamic pixels.
Figure 5: Static feature point extraction.
Figure 6: Comparison of the original and static feature extraction results: (a) result of the original feature extraction algorithm; (b) result of the static feature extraction algorithm.
Figure 7: Pose estimation error: (a) trajectories estimated by monocular ORB-SLAM2; (b) RPE of the monocular ORB-SLAM2 trajectories; (c) trajectories estimated by our algorithm; (d) RPE of our algorithm's trajectories.
Figure 8: The experimental UAV platform.
Figure 9: The trajectory of the flight in the real world.
Figure 10: The scene of the flight.
Figure 11: The result of the static feature extraction algorithm.
Figure 12: Comparison of the absolute trajectory error: (a) trajectories estimated by our algorithm; (b) RPE of our algorithm's trajectories; (c) trajectories estimated by monocular ORB-SLAM2; (d) RPE of the monocular ORB-SLAM2 trajectories.