Search Results (126)

Search Parameters:
Keywords = RGB-D SLAM

18 pages, 39910 KiB  
Article
DyGS-SLAM: Realistic Map Reconstruction in Dynamic Scenes Based on Double-Constrained Visual SLAM
by Fan Zhu, Yifan Zhao, Ziyu Chen, Chunmao Jiang, Hui Zhu and Xiaoxi Hu
Remote Sens. 2025, 17(4), 625; https://doi.org/10.3390/rs17040625 - 12 Feb 2025
Viewed by 446
Abstract
Visual SLAM is widely applied in robotics and remote sensing. The fusion of Gaussian radiance fields and Visual SLAM has demonstrated astonishing efficacy in constructing high-quality dense maps. While existing methods perform well in static scenes, they are prone to the influence of dynamic objects in real-world dynamic environments, thus making robust tracking and mapping challenging. We introduce DyGS-SLAM, a Visual SLAM system that employs dual constraints to achieve high-fidelity static map reconstruction in dynamic environments. We extract ORB features within the scene, and use open-world semantic segmentation models and multi-view geometry to construct dual constraints, forming a zero-shot dynamic information elimination module while recovering backgrounds occluded by dynamic objects. Furthermore, we select high-quality keyframes and use them for loop closure detection and global optimization, constructing a foundational Gaussian map through a set of determined point clouds and poses and integrating repaired frames for rendering new viewpoints and optimizing 3D scenes. Experimental results on the TUM RGB-D, Bonn, and Replica datasets, as well as real scenes, demonstrate that our method has excellent localization accuracy and mapping quality in dynamic scenes.
(This article belongs to the Special Issue 3D Scene Reconstruction, Modeling and Analysis Using Remote Sensing)
Figure 1: System framework of DyGS-SLAM. The tracking thread conducts dynamic object removal and background inpainting. The mapping thread reconstructs the Gaussian map and performs differentiable rendering using a set of determined poses and point clouds. Lastly, the 3D scene is optimized based on the repaired and rendered frames.
Figure 2: Open-world semantic segmentation model.
Figure 3: RGB images from the TUM RGB-D dataset. (a) Frame 690. (b) Frame 765. The red boxes indicate the chair being moved, which is semantically static but actually moving.
Figure 4: The feature point p on the keyframe projects onto the current frame as p′; O and O′ are the camera optical centers of the two frames, respectively. (a) Feature point p′ is static (d′ = d_proj). (b) Feature point p′ is dynamic (d′ ≪ d_proj).
Figure 5: Comparison of image frames repaired by Dyna-SLAM and DyGS-SLAM (ours) on the walking_halfsphere sequence of the TUM RGB-D dataset. The red boxes compare how the different methods repair the same frame.
Figure 6: Camera trajectories estimated by ORB-SLAM3 and DyGS-SLAM (ours) on the TUM dataset, and their differences from the ground truth.
Figure 7: Comparison of mapping results of NICE-SLAM, SplaTAM, and DyGS-SLAM (ours) on the walking_xyz sequence.
Figure 8: Detailed comparison of the original reconstructed scene provided by Bonn and the scene reconstructed by our method. The red boxes indicate details of the reconstructed scenes. (a) Original reconstructed scene provided by Bonn. (b–d) Details of the scene reconstructed by our method.
Figure 9: Comparison of reconstruction performance between SplaTAM and DyGS-SLAM (ours) on the Bonn dataset. Our method demonstrates better reconstruction quality. (a) SplaTAM. (b) DyGS-SLAM.
Figure 10: Comparison of mapping results of NICE-SLAM, SplaTAM, and DyGS-SLAM on the Replica dataset. The red boxes indicate details of the reconstructed scenes. Our method also achieves excellent reconstruction quality in static scenes. (a) NICE-SLAM. (b) SplaTAM. (c) DyGS-SLAM. (d) GT.
Figure 11: Experimental results in real scenes. The red boxes indicate the recovery of the static background during reconstruction. (a) Input image. (b) Segmentation. (c) Background repair. (d) Novel view synthesis.
Figure 12: Effect of background inpainting on DyGS-SLAM scene reconstruction. The red boxes indicate the reconstruction results of each variant. (a) Reconstruction without background inpainting. (b) Reconstruction with background inpainting.
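The dual-constraint idea above hinges on the multi-view geometry check illustrated in Figure 4: a keyframe feature reprojected into the current frame should keep its predicted depth if the surface is static, and a much smaller measured depth (d′ ≪ d_proj) suggests a moving occluder. Below is a minimal Python sketch of that check; the function name, the 0.8 ratio threshold, and the example intrinsics are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def is_dynamic_point(p_kf, d_kf, K, T_cur_kf, d_measured, ratio_thresh=0.8):
    """Flag a feature as dynamic when the measured depth in the current frame is
    much smaller than the depth predicted by reprojecting the keyframe point
    (d' << d_proj in the paper's notation). Names and threshold are illustrative."""
    u, v = p_kf
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # Back-project the keyframe pixel (u, v) with its depth to a homogeneous 3D point.
    X_kf = np.array([(u - cx) * d_kf / fx, (v - cy) * d_kf / fy, d_kf, 1.0])
    # Transform into the current camera frame with the relative pose T_cur_kf (4x4).
    X_cur = T_cur_kf @ X_kf
    d_proj = X_cur[2]                      # depth the point *should* have if static
    # If the sensor sees something much closer along this ray, a moving object is
    # likely occluding the original surface, so mark the feature dynamic.
    return d_measured < ratio_thresh * d_proj

# Example: a point expected at 2.4 m but measured at 1.1 m is flagged dynamic.
K = np.array([[525.0, 0.0, 319.5], [0.0, 525.0, 239.5], [0.0, 0.0, 1.0]])
T = np.eye(4)
print(is_dynamic_point((320, 240), 2.4, K, T, d_measured=1.1))  # True
```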
15 pages, 3120 KiB  
Article
Implementation of Visual Odometry on Jetson Nano
by Jakub Krško, Dušan Nemec, Vojtech Šimák and Mário Michálik
Sensors 2025, 25(4), 1025; https://doi.org/10.3390/s25041025 - 9 Feb 2025
Viewed by 461
Abstract
This paper presents the implementation of ORB-SLAM3 for visual odometry on a low-power ARM-based system, specifically the Jetson Nano, to track a robot’s movement using RGB-D cameras. Key challenges addressed include the selection of compatible software libraries, camera calibration, and system optimization. The ORB-SLAM3 algorithm was adapted for the ARM architecture and tested using both the EuRoC dataset and real-world scenarios involving a mobile robot. The testing demonstrated that ORB-SLAM3 provides accurate localization, with errors in path estimation ranging from 3 to 11 cm when using the EuRoC dataset. Real-world tests on a mobile robot revealed discrepancies primarily due to encoder drift and environmental factors such as lighting and texture. The paper discusses strategies for mitigating these errors, including enhanced calibration and the potential use of encoder data for tracking when camera performance falters. Future improvements focus on refining the calibration process, adding trajectory correction mechanisms, and integrating visual odometry data more effectively into broader systems.
(This article belongs to the Section Sensors and Robotics)
Figure 1: Chessboard pattern with highlighted corners [1].
Figure 2: EuRoC dataset sample [20].
Figure 3: Detail of difference between estimated and ground truth trajectories [1].
Figure 4: Comparison between robot and camera trajectories [1].
Figure 5: ORB-SLAM3 GUI [1].
Figure 6: State-chart for implemented algorithm [1].
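Camera calibration is listed as one of the key challenges in the Jetson Nano implementation above, and Figure 1 shows the chessboard target used. The following OpenCV sketch illustrates a generic chessboard calibration of this kind; the board size, image paths, and refinement parameters are assumptions, and the authors' actual calibration pipeline may differ.

```python
import glob
import cv2
import numpy as np

BOARD = (9, 6)                                    # inner corners per row and column (assumed)
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2)  # planar grid in "square" units

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):             # hypothetical calibration image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if found:
        # Refine corner locations to sub-pixel accuracy before storing them.
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
        obj_points.append(objp)
        img_points.append(corners)

assert obj_points, "no chessboard detections found"
# The intrinsics K and distortion coefficients would feed an ORB-SLAM3 settings file.
rms, K, dist, _, _ = cv2.calibrateCamera(obj_points, img_points, gray.shape[::-1], None, None)
print("reprojection RMS:", rms)
print("camera matrix:\n", K)
```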
17 pages, 15387 KiB  
Article
Improving 3D Reconstruction Through RGB-D Sensor Noise Modeling
by Fahira Afzal Maken, Sundaram Muthu, Chuong Nguyen, Changming Sun, Jinguang Tong, Shan Wang, Russell Tsuchida, David Howard, Simon Dunstall and Lars Petersson
Sensors 2025, 25(3), 950; https://doi.org/10.3390/s25030950 - 5 Feb 2025
Viewed by 459
Abstract
High-resolution RGB-D sensors are widely used in computer vision, manufacturing, and robotics. The depth maps from these sensors have inherently high measurement uncertainty that includes both systematic and non-systematic noise. These noisy depth estimates degrade the quality of scans, resulting in less accurate 3D reconstruction, making them unsuitable for some high-precision applications. In this paper, we focus on quantifying the uncertainty in the depth maps of high-resolution RGB-D sensors for the purpose of improving 3D reconstruction accuracy. To this end, we estimate the noise model for a recent high-precision RGB-D structured light sensor called Zivid when mounted on a robot arm. Our proposed noise model takes into account the measurement distance and angle between the sensor and the measured surface. We additionally analyze the effect of background light, exposure time, and the number of captures on the quality of the depth maps obtained. Our noise model seamlessly integrates with well-known classical and modern neural rendering-based algorithms, from KinectFusion to Point-SLAM methods using bilinear interpolation as well as 3D analytical functions. We collect a high-resolution RGB-D dataset and apply our noise model to improve tracking and produce higher-resolution 3D models.
Figure 1: Our noise model, when integrated with KinectFusion [10], improves the quality of the surface estimation, highlighted in colored boxes (middle), compared with the quality without noise filtering (left). On the right, noisy depth filtering using our noise model (top right) effectively captures high-resolution details, such as the ridges in the ear (green box) and the background texture (red box), which are absent in the unfiltered version (bottom right).
Figure 2: Illustration of the Zivid camera noise components at an arbitrary point P(x, y, z) measured by the camera. Axial noise σ_Z and lateral noise σ_L represent the uncertainty of the measured location of point P along the z-axis and the x- and y-axes, respectively.
Figure 3: Our experimental setup for modeling sensor noise includes a robot arm (b), a planar target (c), and a Zivid 2 RGB-D structured light sensor (d), as shown in (a).
Figure 4: Axial noise modeling. (b,c) Fitted surface models for axial noise with (b) a 7th-order bivariate polynomial and (c) a bivariate exponential function plus a 2nd-order bivariate polynomial, and (d) bilinearly interpolated image encoding for the axial noise values corresponding to (a). The axis labels and titles are provided for illustrative purposes.
Figure 5: Pre-processing steps for lateral noise modeling. (a) Depth map with the segmented edge of the planar target (marked in red); lines fitted to pixels corresponding to the edge at far (b), medium (c), and near (d) distances. Rows indicate the rotation angle of the target, varying from 0° (top) to 60° (bottom).
Figure 6: Lateral noise against different distances and angles.
Figure 7: Variations in axial (top) and lateral (bottom) noise due to lighting conditions (left), number of captures (middle), and exposure time (right) as a function of distance at a fixed 0°.
Figure 8: The experimental setup for dataset collection. From top to bottom: (a) complete setup with a robot arm, camera, and object placed on a rotating table for 0° and 45° rotations of the object, (b) corresponding RGB and scaled depth images captured, and (c) other object sequences captured: Dragon, Ganesh, Rock, and Dino.
Figure 9: Poisson 3D surface reconstructions with point cloud data merged based on pairwise ICP registration followed by pose graph optimization for (a) Shiva and (b) Gripper.
Figure 10: Qualitative results from the Shiva dataset, in top-front and bottom-back views, highlighting improved reconstruction quality (b), in the colored boxes, when noisy depth measurements are filtered using our noise model in the KinectFusion algorithm. The improvement is particularly noticeable compared with the reconstruction without depth filtering (a) and against the traditional reconstruction pipeline (c) discussed in Section 4.1.
Figure 11: Comparison of trajectories for different objects with and without noise filtering. The results demonstrate superior trajectories with noise filtering (green and blue) compared with those without (red). The circle fitted to the trajectory uses the axial and lateral noise models (blue). The axes are not of the same scale.
Figure 12: Qualitative results of integrating our noise model into Point-SLAM on the Shiva dataset. Highlighted boxes show improved reconstruction: Point-SLAM baseline (a), noisy depth filtering using our noise model (b), and depth loss term using our noise model (c). Please zoom in on the red box for better visualization.
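The abstract above models axial noise as a function of measurement distance and incidence angle, and Figure 4 mentions bivariate polynomial surfaces fitted to that data. The sketch below fits a low-order bivariate polynomial with ordinary least squares to made-up samples; the polynomial degree, variable ranges, and synthetic data are assumptions, not the paper's calibration measurements.

```python
import numpy as np

def fit_bivariate_poly(d, theta, sigma, deg=2):
    """Least-squares fit of sigma_z ~ sum_{i+j<=deg} c_ij * d^i * theta^j.
    A low-order stand-in for the paper's bivariate polynomial noise surface."""
    terms = [(i, j) for i in range(deg + 1) for j in range(deg + 1) if i + j <= deg]
    A = np.column_stack([d**i * theta**j for i, j in terms])   # design matrix
    coeffs, *_ = np.linalg.lstsq(A, sigma, rcond=None)
    return terms, coeffs

def predict(terms, coeffs, d, theta):
    """Evaluate the fitted noise surface at a single (distance, angle) pair."""
    return sum(c * d**i * theta**j for (i, j), c in zip(terms, coeffs))

# Synthetic calibration samples: axial noise grows with distance and incidence angle.
rng = np.random.default_rng(0)
d = rng.uniform(0.5, 2.0, 200)            # metres (assumed working range)
theta = rng.uniform(0.0, 60.0, 200)       # degrees
sigma = 0.0005 + 0.002 * d**2 + 1e-5 * theta * d + rng.normal(0, 1e-5, 200)

terms, coeffs = fit_bivariate_poly(d, theta, sigma)
print(predict(terms, coeffs, 1.0, 30.0))  # predicted axial std-dev at 1 m, 30 degrees
```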
17 pages, 1609 KiB  
Article
Related Keyframe Optimization Gaussian–Simultaneous Localization and Mapping: A 3D Gaussian Splatting-Based Simultaneous Localization and Mapping with Related Keyframe Optimization
by Xiasheng Ma, Ci Song, Yimin Ji and Shanlin Zhong
Appl. Sci. 2025, 15(3), 1320; https://doi.org/10.3390/app15031320 - 27 Jan 2025
Viewed by 690
Abstract
Simultaneous localization and mapping (SLAM) is the basis for intelligent robots to explore the world. As a promising method for 3D reconstruction, 3D Gaussian splatting (3DGS) integrated with SLAM systems has shown significant potential. However, due to environmental uncertainties, errors in the tracking process with 3D Gaussians can negatively impact SLAM systems. This paper introduces a novel dense RGB-D SLAM system based on 3DGS that refines Gaussians through sub-Gaussians in the camera coordinate system. Additionally, we propose an algorithm to select keyframes closely related to the current frame, optimizing the scene map and pose of the current keyframe. This approach effectively enhances both the tracking and mapping performance. Experiments on high-quality synthetic scenes (Replica dataset) and low-quality real-world scenes (TUM-RGBD and ScanNet datasets) demonstrate that our system achieves competitive performance in tracking and mapping.
Figure 1: Overview of RK-SLAM.
Figure 2: Sub-Gaussians.
Figure 3: Example of a related window.
Figure 4: Comparison of mapping results in the ablation study. The first row shows the full images, while the second row provides zoomed-in views of the regions marked with red boxes in the first row.
Figure 5: The extent of fine-tuning of the keyframe pose in the sliding window W_s as a function of the number of iterations.
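The abstract above selects "keyframes closely related to the current frame" for optimization but does not state the selection criterion in the listing. A common proxy in RGB-D SLAM is covisibility, i.e., the number of map points shared with the current frame; the sketch below uses that heuristic, so both the criterion and the data layout are assumptions rather than the authors' algorithm.

```python
def select_related_keyframes(current_obs, keyframes, k=5):
    """Rank keyframes by how many map-point IDs they share with the current frame
    (a simple covisibility heuristic; the paper's actual criterion may differ).

    current_obs : set of map-point IDs observed in the current frame
    keyframes   : dict mapping keyframe ID -> set of observed map-point IDs
    """
    scored = [(len(current_obs & obs), kf_id) for kf_id, obs in keyframes.items()]
    scored.sort(reverse=True)                       # most shared observations first
    return [kf_id for score, kf_id in scored[:k] if score > 0]

# Toy example with three keyframes; keyframe 1 shares the most observations.
keyframes = {0: {1, 2, 3, 4}, 1: {3, 4, 5, 6}, 2: {7, 8}}
print(select_related_keyframes({3, 4, 5}, keyframes, k=2))  # -> [1, 0]
```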
22 pages, 4507 KiB  
Article
Visual Target-Driven Robot Crowd Navigation with Limited FOV Using Self-Attention Enhanced Deep Reinforcement Learning
by Yinbei Li, Qingyang Lyu, Jiaqiang Yang, Yasir Salam and Baixiang Wang
Sensors 2025, 25(3), 639; https://doi.org/10.3390/s25030639 - 22 Jan 2025
Viewed by 623
Abstract
Navigating crowded environments poses significant challenges for mobile robots, particularly as traditional Simultaneous Localization and Mapping (SLAM)-based methods often struggle with dynamic and unpredictable settings. This paper proposes a visual target-driven navigation method using self-attention enhanced deep reinforcement learning (DRL) to overcome these limitations. The navigation policy is developed based on the Twin-Delayed Deep Deterministic Policy Gradient (TD3) algorithm, enabling efficient obstacle avoidance and target pursuit. We utilize a single RGB-D camera with a limited field of view (FOV) for target detection and surrounding sensing, where environmental features are extracted from depth data via a convolutional neural network (CNN). A self-attention network (SAN) is employed to compensate for the limited FOV, enhancing the robot’s capability of searching for the target when it is temporarily lost. Experimental results show that our method achieves a higher success rate and shorter average target-reaching time in dynamic environments, while offering hardware simplicity, cost-effectiveness, and ease of deployment in real-world applications.
(This article belongs to the Section Remote Sensors)
Figure 1: A scenario of a mobile robot navigating in a crowded environment toward a visual target.
Figure 2: System architecture of the proposed visual target-driven robot navigation.
Figure 3: Convolutional neural network (CNN) used to extract features from depth data.
Figure 4: Self-attention network used to process sequences of past target positions and actions.
Figure 5: The collision risk penalty based on the ratio of the number of “dangerous-level” depth pixels to the number of “warning-level” depth pixels.
Figure 6: Enhanced TD3 framework for visual target-driven robot crowd navigation with simultaneous CNN and SAN updates using the critic loss.
Figure 7: Simulation environment for training and testing. The arrows indicate obstacle movements.
Figure 8: Trajectories of the robot navigating a 5 m × 5 m simulated space with eight moving cylindrical obstacles. (a–e) illustrate the robot’s trajectory in an environment where the obstacles move in a Brownian pattern, while (a’–e’) illustrate the trajectory in an environment where the obstacles move in a crossing pattern.
Figure 9: Learning curves in the ablation study.
Figure 10: Real robot experiment. (a) Real robot with an RGB-D camera. (b) Experiment environment.
Figure 11: Real robot operating in three scenarios. The arrows in (b) and (c) indicate human movements.
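Figure 5 of the entry above defines a collision risk penalty from the ratio of "dangerous-level" to "warning-level" depth pixels. A minimal sketch of such a reward term follows; the distance thresholds, scaling factor, and handling of invalid pixels are assumptions, not the authors' values.

```python
import numpy as np

def collision_risk_penalty(depth, danger_dist=0.4, warn_dist=1.0, scale=1.0):
    """Penalty proportional to the ratio of 'dangerous-level' to 'warning-level'
    depth pixels, mirroring the idea in Figure 5 (thresholds are illustrative)."""
    valid = depth > 0                       # ignore invalid / zero depth readings
    dangerous = np.count_nonzero(valid & (depth < danger_dist))
    warning = np.count_nonzero(valid & (depth < warn_dist))
    if warning == 0:
        return 0.0                          # nothing close enough to matter
    return -scale * dangerous / warning     # negative reward term

# Toy depth image: half of the near pixels are already dangerously close.
depth = np.full((4, 4), 2.0)
depth[0, :2] = 0.3                          # dangerous
depth[1, :2] = 0.8                          # warning only
print(collision_risk_penalty(depth))        # -0.5
```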
21 pages, 4833 KiB  
Article
An Effective 3D Instance Map Reconstruction Method Based on RGBD Images for Indoor Scene
by Heng Wu, Yanjie Liu, Chao Wang and Yanlong Wei
Remote Sens. 2025, 17(1), 139; https://doi.org/10.3390/rs17010139 - 3 Jan 2025
Viewed by 544
Abstract
To enhance the intelligence of robots, constructing accurate object-level instance maps is essential. However, the diversity and clutter of objects in indoor scenes present significant challenges for instance map construction. To tackle this issue, we propose a method for constructing object-level instance maps based on RGBD images. First, we utilize the advanced visual odometry system ORB-SLAM3 to estimate the poses of image frames and extract keyframes. Next, we perform semantic and geometric segmentation on the color and depth images of these keyframes, respectively, using semantic segmentation to optimize the geometric segmentation results and address inaccuracies in the target segmentation caused by small depth variations. The segmented depth images are then projected into point cloud segments, which are assigned corresponding semantic information. We integrate these point cloud segments into a global voxel map, updating each voxel’s class using color, distance constraints, and Bayesian methods to create an object-level instance map. Finally, we construct an ellipsoid scene from this map to test the robot’s localization capabilities in indoor environments using semantic information. Our experiments demonstrate that this method accurately and robustly constructs the environment, facilitating precise object-level scene segmentation. Furthermore, compared to manually labeled ellipsoidal maps, generating ellipsoidal maps from extracted objects enables accurate global localization.
(This article belongs to the Special Issue 3D Scene Reconstruction, Modeling and Analysis Using Remote Sensing)
Figure 1: Overview of our pipeline. In the object-level voxel construction, different colors represent different instance objects.
Figure 2: The fusion of semantic segmentation and geometric segmentation. YOLOv11 is used for semantic segmentation of the RGB image (a), and the mask of the target is obtained (b). The depth image (c) is projected into 3D space for geometric segmentation (d). The mask extracted from semantic segmentation then compensates the geometric segmentation (e), and finally the segmented regions (f) are obtained.
Figure 3: The fusion of point cloud segments between different frames in the 061 sequence of the SceneNN dataset. The point cloud segment labels ➀, ➁, … of Frame_0 are global labels, while the labels “1”, “2”, … of Frame_191 are temporary labels. Global labels and temporary labels are not in one-to-one correspondence; they are only used to represent mutual relationships. For example, segment “2” is scored separately against the overlapping regions ➀ and ➁. When the score of “2” with ➁ is higher, label ➁ is assigned to segment “2” and integrated into the global map.
Figure 4: Erroneously reconstructed areas under different thresholds. The three methods are evaluated with error thresholds of 2 cm, 4 cm, and 8 cm on the 11_01 sequence of the ScanNet dataset. The red areas represent reconstruction errors that exceed the threshold.
Figure 5: Comparison of map accuracy on the 11_01 sequence of ScanNet. Top: ground-truth maps and RGB images from different viewpoints. Bottom: images observed from three different viewpoints on the different maps.
Figure 6: Qualitative results on a selected scene from SceneNN.
Figure 7: Object-level instance map and ellipsoid scene constructed using the Chess sequence of the 7-Scenes dataset. (a) Mesh map of the scene, (b) object-level instance map, (c) ellipsoid scene, and (d) extracted objects in the environment.
Figure 8: Absolute translation errors for different sequences of the 7-Scenes dataset. Zins et al. [35] used manually labeled ellipsoid scenes, while Voxblox++ and our method use ellipsoid scenes constructed from instance objects.
Figure 9: Absolute rotation errors for different sequences of the 7-Scenes dataset. Zins et al. [35] used manually labeled ellipsoid scenes, while Voxblox++ and our method use ellipsoid scenes constructed from instance objects.
Figure 10: Object-level instance map and ellipsoid scene constructed in the real world. (a) Mesh map of the scene, (b) object-level instance map, (c) ellipsoid scene, and (d) extracted objects in the scene.
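The abstract above updates each voxel's class using Bayesian methods. The sketch below shows a generic recursive Bayesian label fusion per voxel, kept in log-probability form for stability; the uniform prior, the 0.7 hit probability, and the per-observation independence assumption are illustrative and not taken from the paper.

```python
import numpy as np

def update_voxel_class(log_probs, observed_class, num_classes, hit_prob=0.7):
    """Recursive Bayesian update of a per-voxel class distribution.
    A generic sketch of per-voxel label fusion; the observation model and prior
    are assumptions, not the authors' exact update rule."""
    if log_probs is None:
        log_probs = np.full(num_classes, np.log(1.0 / num_classes))  # uniform prior
    likelihood = np.full(num_classes, (1.0 - hit_prob) / (num_classes - 1))
    likelihood[observed_class] = hit_prob          # simple symmetric confusion model
    log_probs = log_probs + np.log(likelihood)
    log_probs -= np.max(log_probs)                 # renormalize in log space
    probs = np.exp(log_probs)
    return np.log(probs / probs.sum())

# Three noisy observations: two votes for class 2 outweigh one vote for class 1.
state = None
for obs in (2, 2, 1):
    state = update_voxel_class(state, obs, num_classes=4)
print(int(np.argmax(state)), np.exp(state).round(3))
```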
24 pages, 31029 KiB  
Article
InCrowd-VI: A Realistic Visual–Inertial Dataset for Evaluating Simultaneous Localization and Mapping in Indoor Pedestrian-Rich Spaces for Human Navigation
by Marziyeh Bamdad, Hans-Peter Hutter and Alireza Darvishy
Sensors 2024, 24(24), 8164; https://doi.org/10.3390/s24248164 - 21 Dec 2024
Viewed by 709
Abstract
Simultaneous localization and mapping (SLAM) techniques can be used to navigate the visually impaired, but the development of robust SLAM solutions for crowded spaces is limited by the lack of realistic datasets. To address this, we introduce InCrowd-VI, a novel visual–inertial dataset specifically designed for human navigation in indoor pedestrian-rich environments. Recorded using Meta Aria Project glasses, it captures realistic scenarios without environmental control. InCrowd-VI features 58 sequences totaling a 5 km trajectory length and 1.5 h of recording time, including RGB, stereo images, and IMU measurements. The dataset captures important challenges such as pedestrian occlusions, varying crowd densities, complex layouts, and lighting changes. Ground-truth trajectories, accurate to approximately 2 cm, are provided in the dataset, originating from the Meta Aria project machine perception SLAM service. In addition, a semi-dense 3D point cloud of scenes is provided for each sequence. The evaluation of state-of-the-art visual odometry (VO) and SLAM algorithms on InCrowd-VI revealed severe performance limitations in these realistic scenarios. Under challenging conditions, systems exceeded the required localization accuracy of 0.5 m and the 1% drift threshold, with classical methods showing drift up to 5–10%. While deep learning-based approaches maintained high pose estimation coverage (>90%), they failed to achieve real-time processing speeds necessary for walking pace navigation. These results demonstrate the need and value of a new dataset to advance SLAM research for visually impaired navigation in complex indoor environments.
(This article belongs to the Section Sensors and Robotics)
Figure 1: Sample of manual measurement process for ground-truth validation. Left: real-world scene with a landmark floor tile highlighted by a pink rectangle. Middle: full 3D point cloud map of the scene with four adjacent floor tiles marked in blue. Right: zoomed view of the marked corner of the tiles in the point cloud used for measurement.
Figure 2: Correlation between real-world measurements and point-cloud-derived distances in challenging sequences, where state-of-the-art SLAM systems exhibited failure or suboptimal performance. The scatter plot demonstrates a strong linear relationship between real-world and measured distances (in centimeters), with an average error of 2.14 cm, standard deviation of 1.46 cm, and median error of 2.0 cm.
Figure 3: Refined 3D reconstruction demonstrating the removal of dynamic pedestrians that initially appeared static relative to the camera on the escalator.
Figure 4: Example of image data and corresponding 3D map from a dataset sequence: the top-left image shows the RGB frame, the top-middle and top-right images represent the left and right images of a stereo pair, and the bottom image shows the 3D map of the scene.
Figure 5: Distribution of challenges across sequences in the InCrowd-VI dataset, categorized by crowd density levels (High: >10 pedestrians per frame; Medium: 4–10 pedestrians; Low: 1–3 pedestrians; None: no pedestrians). The x-axis represents the different types of challenges, and the y-axis indicates the total number of sequences. Note that the sequences may contain multiple challenges simultaneously.
Figure 6: Histogram of trajectory length.
Figure 7: Example scenes from the InCrowd-VI dataset demonstrating various challenges: (a) high pedestrian density, (b) varying lighting conditions, (c) texture-poor surfaces, (d) reflective surfaces, (e) narrow aisles, and (f) stairs.
Figure 8: ATE comparison of evaluated SLAM systems under challenging conditions, with the x-axis depicting sequences categorized by crowd density: high, medium, low, and none.
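The evaluation above is framed around a 0.5 m localization accuracy requirement, a 1% drift threshold, and observed drift of 5–10% for classical methods. The sketch below computes an ATE RMSE and a final-drift percentage for trajectories that are already associated and aligned; the exact metric definitions used in the benchmark may differ, and the toy trajectory is synthetic.

```python
import numpy as np

def ate_rmse(est, gt):
    """Absolute trajectory error (RMSE) between aligned estimated and ground-truth
    positions; assumes both are Nx3 arrays already associated and aligned."""
    return float(np.sqrt(np.mean(np.sum((est - gt) ** 2, axis=1))))

def drift_percent(est, gt):
    """Final-position drift as a percentage of trajectory length, the kind of
    1% / 5-10% figure quoted above (this particular formula is an assumption)."""
    length = np.sum(np.linalg.norm(np.diff(gt, axis=0), axis=1))
    end_error = np.linalg.norm(est[-1] - gt[-1])
    return 100.0 * end_error / length

# Toy straight-line trajectory with a slowly growing lateral estimation error.
t = np.linspace(0, 10, 101)
gt = np.column_stack([t, np.zeros_like(t), np.zeros_like(t)])          # 10 m walk
est = gt + np.column_stack([np.zeros_like(t), 0.02 * t, np.zeros_like(t)])
print(f"ATE RMSE: {ate_rmse(est, gt):.3f} m, drift: {drift_percent(est, gt):.1f}%")
```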
19 pages, 21309 KiB  
Article
Real-Time Drivable Region Mapping Using an RGB-D Sensor with Loop Closure Refinement and 3D Semantic Map-Merging
by ChangWan Ha, DongHyun Yang, Gicheol Wang, Sung Chang Kim and HyungGi Jo
Appl. Sci. 2024, 14(24), 11613; https://doi.org/10.3390/app142411613 - 12 Dec 2024
Viewed by 767
Abstract
Drivable region maps, created using a visual sensor, are essential for autonomous navigation because off-the-shelf maps do not reflect contemporary real-world conditions. This study presents a large-scale drivable region mapping system that is capable of capturing large-scale environments in real-time, using a single RGB-D sensor. Whereas existing semantic simultaneous localization and mapping (SLAM) methods consider only accurate pose estimation and the registration of semantic information, our system generates contemporaneous large-scale spatial semantic maps by refining the 3D point clouds and semantic information whenever loop closure is detected. When loop closure occurs, our method finds the corresponding keyframe for each semantically labeled point cloud and transforms the point cloud into adjusted positions. Additionally, a map-merging algorithm for semantic maps is proposed to address large-scale environments. Experiments were conducted on the publicly available Complex Urban dataset and our custom dataset, as well as real-world datasets collected with a vehicle-mounted sensor. Our method alleviates the drift errors that frequently occur when agents navigate large areas. Compared with satellite images, the resulting semantic maps are well aligned and have proven validity in terms of timeliness and accuracy.
Figure 1: Schematic overview of the proposed method. It consists of precise drivable region mapping (left) and map-merging for large-scale mapping (right).
Figure 2: (a) RGB image. (b) The resulting image of semantic segmentation. (c) 3D point clouds created from the RGB image and depth image. (d) 3D point clouds generated from the result of semantic segmentation and the depth image.
Figure 3: A schematic example of loop closure refinement and a comparison of the drivable region map before and after applying refinement. (a) Schematic diagram before loop closure: despite being at the same location, misalignment occurs owing to drift. (b) Schematic diagram after loop closure: loop closure resolves the issues caused by drift error and realigns the frame poses and point clouds. (c) Drivable region mapping result before loop closure. (d) Drivable region mapping result after loop closure. In (c) and (d), white point clouds represent feature points extracted from images, whereas sky-blue point clouds denote the drivable region.
Figure 4: Map merging: (a) Model 1; (b) Model 2; (c) a red square denotes the overlapping region of Model 1; (d) a red square denotes the overlapping region of Model 2; (e) voxel downsampling; (f) FPFH and correspondence search; (g) point cloud registration via TEASER++.
Figure 5: Satellite image of Jeonbuk National University. Red, blue, green, yellow, pink, sky blue, light pink, and brown represent the paths of map1 to map8, respectively.
Figure 6: Urban 28 dataset [55] satellite image and divided course. The course was manually divided into three sections and displayed on a satellite image captured via Google Maps; the lines on the satellite image were drawn manually. The trajectories for maps 1, 2, and 3 are shown in blue, red, and green.
Figure 7: (a) Drivable region mapping result without loop closure. (b) Drivable region mapping result with loop closure.
Figure 8: Drivable region mapping results at Jeonbuk National University: (a) map1, (b) map2, (c) map3, (d) map4, (e) map5, (f) map6, (g) map7, (h) map8.
Figure 9: Comparison of the proposed method against the provided ground-truth trajectory on the Urban 28 dataset [55]. The trajectory of each map is manually divided as shown in Figure 6, and the comparison was conducted using EVO [61]. The dashed line represents the ground-truth trajectory of Urban 28. (a) Map 1 drivable region mapping results. (b) Map 1 trajectory comparison. (c) Map 3 drivable region mapping results. (d) Map 3 trajectory comparison. (e) Map 2 drivable region mapping results. (f) Map 2 trajectory comparison.
Figure 10: Map-merging results at Jeonbuk National University, comparing three different methods. Zooming in on a portion of the map, we compare the degree of overlap of the drivable regions: the more accurately the central circles overlap, the better the result. (a) Map merging by manually transforming the positions of the eight maps. (b) Map merging using image similarity and the poses of the eight maps. (c) Map merging via the proposed method.
Figure 11: A large-scale drivable region map created using two methods (a single run without splitting the data; the proposed method), manually overlaid on a satellite image. To ensure a fair comparison, only the drivable regions were retained. (a,b) Large-scale drivable maps created from data captured in a single run; despite using the same dataset in the same environment, the results show significant inconsistencies. (c) Large-scale drivable map created via the proposed method.
Figure 12: Urban 28 dataset [55] drivable region mapping results for two methods (a single run without splitting the data; the proposed method). (a) The provided ground-truth point cloud map of Urban 28. (b) The provided ground-truth trajectory of Urban 28; blue, red, and green represent the trajectories of maps 1, 2, and 3, respectively. (c) Result of the first run of drivable region mapping without splitting the data; the resulting map shows misalignment. (d) Result of the second run without splitting the data; the resulting map also shows misalignment. (e) A drivable region map produced by dividing the map and merging the parts with the proposed method. (f) Comparison between the trajectory generated via the proposed method and the ground-truth trajectory using EVO [61]; the dashed line represents the ground-truth trajectory of Urban 28.
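The loop-closure refinement described above re-anchors each semantically labeled point cloud to its keyframe's optimized pose. The sketch below applies the corresponding rigid correction to each segment; the 4x4 world-from-camera pose convention and the dictionary data layout are assumptions, not the authors' data structures.

```python
import numpy as np

def reanchor_segments(segments, old_poses, new_poses):
    """After loop closure, move each labeled point-cloud segment from its old
    keyframe pose to the optimized one: p_new = T_new @ inv(T_old) @ p_old."""
    corrected = {}
    for kf_id, points in segments.items():
        T_old, T_new = old_poses[kf_id], new_poses[kf_id]   # 4x4 world-from-camera
        correction = T_new @ np.linalg.inv(T_old)           # rigid correction in world frame
        homog = np.hstack([points, np.ones((points.shape[0], 1))])
        corrected[kf_id] = (correction @ homog.T).T[:, :3]
    return corrected

# One segment attached to keyframe 0, whose pose shifts 0.5 m in x after optimization.
T_old = np.eye(4)
T_new = np.eye(4); T_new[0, 3] = 0.5
segments = {0: np.array([[1.0, 0.0, 2.0], [1.2, 0.1, 2.1]])}
print(reanchor_segments(segments, {0: T_old}, {0: T_new})[0])
```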
15 pages, 6086 KiB  
Article
Improved Visual SLAM Algorithm Based on Dynamic Scenes
by Jinxing Niu, Ziqi Chen, Tao Zhang and Shiyu Zheng
Appl. Sci. 2024, 14(22), 10727; https://doi.org/10.3390/app142210727 - 20 Nov 2024
Viewed by 892
Abstract
This work presents a novel RGB-D dynamic simultaneous localization and mapping (SLAM) method that improves accuracy, stability, and efficiency of localization while relying on deep learning in a dynamic environment, in contrast to traditional static scene-based visual SLAM methods. Based on the classic framework of traditional visual SLAM, we propose a method that replaces the traditional feature extraction method with a convolutional neural network approach, aiming to enhance the accuracy of feature extraction and localization, as well as to improve the algorithm’s ability to capture and represent the characteristics of the entire scene. Subsequently, the semantic segmentation thread was utilized in a target detection network combined with geometric methods to identify potential dynamic areas in the image and generate masks for dynamic objects. Finally, the standard deviation of the depth information of potential dynamic points was calculated to identify true dynamic feature points, to guarantee that static feature points were used for position estimation. We performed experiments based on the public datasets to validate the feasibility of the proposed algorithm. The experimental results indicate that the improved SLAM algorithm reduces the absolute trajectory error (ATE) by approximately 97% compared with traditional static visual SLAM and by about 20% compared with traditional dynamic visual SLAM, while also cutting computation time by 68% relative to well-known dynamic visual SLAM systems, giving it clear advantages in both positioning accuracy and operational efficiency.
(This article belongs to the Section Computing and Artificial Intelligence)
Figure 1: Overview of the enhanced SLAM system. The framework comprises four threads: semantic segmentation, tracking, local mapping, and loop closing.
Figure 2: GCNv2 feature extraction network structure, with channel numbers listed below each convolutional layer.
Figure 3: YOLOv5 network architecture diagram.
Figure 4: (a,b) and (c,d) are semantic segmentation results of the modified SLAM. Red indicates the detection boxes from YOLOv5x for object detection, while green represents the extracted feature points.
Figure 5: Comparison of feature point distributions between ORB and GCNv2. The scenes in (a,b) are cluttered with various objects, including computer screens, which makes it difficult to obtain features. The images in (c,d) were taken from the corner of a table while the camera was moving, resulting in significant changes in viewpoint.
Figure 6: Comparison of the ATE of the improved SLAM and ORB-SLAM2 across five dynamic scene sequences from the fr3 dataset. (a–e) show the trajectory maps of ORB-SLAM2, while (f–j) show the trajectory maps of the improved SLAM.
Figure 7: Results for the fr3_walking_xyz sequence. Panels (a,b) illustrate the estimated trajectories compared with the ground truth, as well as the errors along the x, y, and z axes for ORB-SLAM2 and the improved SLAM. Panel (c) displays the time consumption of each method.
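The abstract above identifies truly dynamic feature points from the standard deviation of their depth information. One plausible reading, sketched below, is to track each candidate point's depth over a few frames and threshold the standard deviation; the window length, the 0.05 m threshold, and the data layout are assumptions rather than the authors' parameters.

```python
import numpy as np

def split_dynamic_static(candidate_depths, std_thresh=0.05):
    """Classify candidate feature points by the standard deviation of their depth
    samples: large variation suggests real motion, small variation suggests a
    static point that merely fell inside a detected dynamic region.

    candidate_depths : dict mapping feature ID -> sequence of depth samples (m)
    """
    dynamic, static = [], []
    for fid, depths in candidate_depths.items():
        depths = np.asarray(depths, dtype=float)
        depths = depths[depths > 0]                     # drop invalid readings
        (dynamic if depths.size and depths.std() > std_thresh else static).append(fid)
    return dynamic, static

# Point 7 sits on a walking person (depth changes); point 8 is on the wall behind.
samples = {7: [2.10, 1.95, 1.70, 1.52], 8: [3.01, 3.00, 3.02, 3.01]}
print(split_dynamic_static(samples))    # -> ([7], [8])
```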
32 pages, 11087 KiB  
Article
Path Planning and Motion Control of Robot Dog Through Rough Terrain Based on Vision Navigation
by Tianxiang Chen, Yipeng Huangfu, Sutthiphong Srigrarom and Boo Cheong Khoo
Sensors 2024, 24(22), 7306; https://doi.org/10.3390/s24227306 - 15 Nov 2024
Viewed by 2090
Abstract
This article delineates the enhancement of an autonomous navigation and obstacle avoidance system for a quadruped robot dog. Part one of this paper presents the integration of a sophisticated multi-level dynamic control framework, utilizing Model Predictive Control (MPC) and Whole-Body Control (WBC) from MIT Cheetah. The system employs an Intel RealSense D435i depth camera for depth vision-based navigation, which enables high-fidelity 3D environmental mapping and real-time path planning. A significant innovation is the customization of the EGO-Planner to optimize trajectory planning in dynamically changing terrains, coupled with the implementation of a multi-body dynamics model that significantly improves the robot’s stability and maneuverability across various surfaces. The experimental results show that the RGB-D system exhibits superior velocity stability and trajectory accuracy to the SLAM system, with a 20% reduction in the cumulative velocity error and a 10% improvement in path tracking precision. The experimental results also show that the RGB-D system achieves smoother navigation, requiring 15% fewer iterations for path planning, and a 30% faster success rate recovery in challenging environments. The successful application of these technologies in simulated urban disaster scenarios suggests promising future applications in emergency response and complex urban environments. Part two of this paper presents the development of a robust path planning algorithm for a robot dog on a rough terrain based on attached binocular vision navigation. We use a commercial off-the-shelf (COTS) robot dog. An optical CCD binocular vision dynamic tracking system is used to provide environment information. Likewise, the pose and posture of the robot dog are obtained from the robot’s own sensors, and a kinematics model is established. Then, a binocular vision tracking method is developed to determine the optimal path, provide a proposal (commands to actuators) of the position and posture of the bionic robot, and achieve stable motion on tough terrains. The terrain is assumed to be a gentle uneven terrain to begin with and subsequently proceeds to a more rough surface. This work consists of four steps: (1) pose and position data are acquired from the robot dog’s own inertial sensors, (2) terrain and environment information is input from onboard cameras, (3) information is fused (integrated), and (4) path planning and motion control proposals are made. Ultimately, this work provides a robust framework for future developments in the vision-based navigation and control of quadruped robots, offering potential solutions for navigating complex and dynamic terrains.
Figure 1: Simplified box model of the Lite3P quadruped robotic dog.
Figure 2: Internal sensor arrangement of the quadruped robotic dog.
Figure 3: Dynamic control flowchart.
Figure 4: MPC flowchart.
Figure 5: WBC flowchart [30].
Figure 6: Robot coordinates and joint point settings [30].
Figure 7: Intel D435i and Velodyne LiDAR.
Figure 8: ICP diagram.
Figure 9: Comparison before and after modifying the perception region.
Figure 10: Point cloud processing flowchart.
Figure 11: {p, v} generation: (a) the creation of {p, v} pairs for collision points; (b) the process of generating anchor points and repulsive vectors for dynamic obstacle avoidance [41].
Figure 12: Overall framework of the 2D EGO-Planner.
Figure 13: Robot initialization and control process in the Gazebo simulation: (a) Gazebo environment creation, (b) robot model import, (c) torque balance mode activation, and (d) robot stepping and rotation in simulation.
Figure 14: Joint rotational angles of the FL and RL legs.
Figure 15: Joint angular velocities of the FL and RL legs.
Figure 16: Torque applied to the FL and RL joints during the gait cycle.
Figure 17: The robot navigating in a simple environment using a camera.
Figure 18: The robot navigating in a complex environment using a camera.
Figure 19: A 2D trajectory showing start and goal positions, obstacles, and the rough path.
Figure 20: Initial environment setup.
Figure 21: The robot starts navigating in a simple environment with a static obstacle (brown box).
Figure 22: Dynamic Obstacle 1 introduced: the robot detects a new obstacle and recalculates its path.
Figure 23: Dynamic Obstacle 2 introduced: after avoiding the first obstacle, a second obstacle is introduced and detected by the planner.
Figure 24: Approaching the target: the robot adjusts its path to approach the target point as the distance shortens.
Figure 25: Reaching the target: the robot completes its path and reaches the designated target point.
Figure 26: Real-time B-spline trajectory updates in response to dynamic obstacles. Set 1 (orange) shows the initial path avoiding static obstacles. When the first dynamic obstacle is detected, the EGO-Planner updates the path (Set 2, blue) using local optimization. A second obstacle prompts another adjustment (Set 3, green), guiding the robot smoothly toward the target as trajectory updates become more frequent.
Figure 27: The robot navigating a simple environment using SLAM.
Figure 28: The robot navigating a complex environment using SLAM.
Figure 29: A 2D trajectory showing start and goal positions, obstacles, and the planned path in a complex environment using SLAM.
Figure 30: Navigation based on the RGB-D camera.
Figure 31: Navigation based on SLAM.
Figure 32: Velocity deviation based on the RGB-D camera.
Figure 33: Velocity deviation based on SLAM.
Figure 34: Cumulative average iterations.
Figure 35: Cumulative success rate.
26 pages, 3132 KiB  
Article
A Novel Fuzzy Image-Based UAV Landing Using RGBD Data and Visual SLAM
by Shayan Sepahvand, Niloufar Amiri, Houman Masnavi, Iraj Mantegh and Farrokh Janabi-Sharifi
Drones 2024, 8(10), 594; https://doi.org/10.3390/drones8100594 - 18 Oct 2024
Viewed by 1227
Abstract
In this work, an innovative perception-guided approach is proposed for landing zone detection and realization of Unmanned Aerial Vehicles (UAVs) operating in unstructured environments ridden with obstacles. To accommodate secure landing, two well-established tools, namely fuzzy systems and visual Simultaneous Localization and Mapping (vSLAM), are implemented into the landing pipeline. Firstly, colored images and point clouds acquired by a visual sensory device are processed to serve as characterizing maps that acquire information about flatness, steepness, inclination, and depth variation. By leveraging these images, a novel fuzzy map infers the areas for risk-free landing on which the UAV can safely land. Subsequently, the vSLAM system is employed to estimate the platform’s pose and an additional set of point clouds. The vSLAM point clouds presented in the corresponding keyframe are projected back onto the image plane on which a threshold fuzzy landing score map is applied. In other words, this binary image serves as a mask for the re-projected vSLAM world points to identify the best subset for landing. Once these image points are identified, their corresponding world points are located, and among them, the center of the cluster with the largest area is chosen as the point to land. Depending on the UAV’s size, four synthesis points are added to the vSLAM point cloud to execute the image-based visual servoing landing using image moment features. The effectiveness of the landing package is assessed through the ROS Gazebo simulation environment, where comparisons are made with a state-of-the-art landing site detection method.
Figure 1: The proposed landing system structure. The major subsystems are highlighted. The blocks highlighted in green represent the inputs. The yellow area corresponds to the vSLAM system. The red section illustrates the fuzzy map construction and postprocessing of vSLAM keypoints. The area highlighted in purple shows the IBVS landing subsystem. The black arrows show the data direction.
Figure 2: The coordinate frames and image planes. The blue coordinate frame indicates the body-fixed frame, the red represents the virtual camera frame, the green shows the real camera frame, and the black represents the inertial frame. The black dots are world points, while the green and red dots illustrate the corresponding projected points on the real and virtual camera image planes, respectively.
Figure 3: The proposed control system block diagram includes two major subsystems: IBVS Control and Landing Zone Detection, highlighted in light green and blue, respectively.
Figure 4: The created Gazebo worlds along with the Parrot Bebop 2 drone, encircled with a red circle. (a) The parking lot simulation environment; (b) the parking lot environment from a secondary viewpoint; (c) the playground environment; (d) the office simulation environment.
Figure 5: The first scenario: terrain characterizing maps.
Figure 6: The first scenario: the landing score maps and SLAM-based images. (A) The proposed method; (B) state-of-the-art method. In the heat maps, the black areas represent potential safe landing zones. The red, blue, and pink stars indicate the SLAM keypoints, the selected keypoints, and the flattest subset, respectively.
Figure 7: The first scenario: image-plane feature motion, where circles show the initial feature position and the stars indicate the desired position. The trace of each of the features is distinguished by using four different colors. (a) The fuzzy map; (b) the Bayesian method.
Figure 8: The first scenario: the feature error development.
Figure 9: The second scenario: terrain characterizing maps with the initial absolute position (−3, 12, 9).
Figure 10: The second scenario: the landing score maps and SLAM-based images. (A) The proposed method; (B) state-of-the-art method. In the heat maps, the black areas represent potential safe landing zones. The red, blue, and pink stars indicate the SLAM keypoints, the selected keypoints, and the flattest subset, respectively.
Figure 11: The second scenario: image-plane feature motion, where circles show the initial feature position and the stars indicate the desired position. The trace of each of the features is distinguished by using four different colors. (a) The fuzzy map; (b) the Bayesian method.
Figure 12: The second scenario: the feature error development.
Figure 13: The third scenario: terrain characterizing maps with the initial absolute position (0, 0, 7).
Figure 14: The third scenario: the landing score maps and SLAM-based images. (A) The proposed method; (B) state-of-the-art method. In the heat maps, the black areas represent potential safe landing zones. The red, blue, and pink stars indicate the SLAM keypoints, the selected keypoints, and the flattest subset, respectively.
Figure 15: The third scenario: image-plane feature motion, where circles show the initial feature position and the stars indicate the desired position. The trace of each of the features is distinguished by using four different colors. (a) The fuzzy map; (b) the Bayesian method.
Figure 16: The third scenario: the feature error development.
Figure 17: The fourth scenario: terrain characterizing maps with the initial absolute position (0, 0, 10).
Figure 18: The fourth scenario: the landing score maps and SLAM-based images. (A) The proposed method; (B) state-of-the-art method. In the heat maps, the black areas represent potential safe landing zones. The red, blue, and pink stars indicate the SLAM keypoints, the selected keypoints, and the flattest subset, respectively.
Figure 19: The fourth scenario: image-plane feature motion, where circles show the initial feature position and the stars indicate the desired position. The trace of each of the features is distinguished by using four different colors. (a) The fuzzy map; (b) the Bayesian method.
Figure 20: The fourth scenario: the feature error development.
Figure 21: The grassy ground. The color bar applies to sub-figures (b–f).
Figure 22: The hard ground. The color bar applies to sub-figures (b–f).
Figure 23: The sandy ground. The color bar applies to sub-figures (b–f).
20 pages, 6262 KiB  
Article
YPR-SLAM: A SLAM System Combining Object Detection and Geometric Constraints for Dynamic Scenes
by Xukang Kan, Gefei Shi, Xuerong Yang and Xinwei Hu
Sensors 2024, 24(20), 6576; https://doi.org/10.3390/s24206576 - 12 Oct 2024
Viewed by 895
Abstract
Traditional SLAM systems assume a static environment, but moving objects break this ideal assumption. In the real world, moving objects can greatly influence the precision of image matching and camera pose estimation. In order to solve these problems, the YPR-SLAM system is proposed. [...] Read more.
Traditional SLAM systems assume a static environment, but moving objects break this ideal assumption. In the real world, moving objects can greatly influence the precision of image matching and camera pose estimation. In order to solve these problems, the YPR-SLAM system is proposed. First, the system includes a lightweight YOLOv5 detection network that detects both dynamic and static objects and provides prior dynamic-object information to the SLAM system. Second, using this prior information together with the depth image, a geometric-constraint method is proposed for removing motion feature points: the Depth-PROSAC algorithm differentiates dynamic from static feature points so that the dynamic ones can be discarded. Finally, the dense point cloud map is constructed from the static feature points. YPR-SLAM tightly couples object detection with geometric constraints, eliminating motion feature points and minimizing their adverse effects on the SLAM system. The performance of YPR-SLAM was assessed on the public TUM RGB-D dataset, and the results show that it is well suited to dynamic scenes. Full article
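The abstract names a Depth-PROSAC step that separates dynamic from static feature points inside detected object regions using the depth image, but the listing does not give its formulation. The sketch below is therefore only an illustrative depth-gated filter under assumed thresholds, not the paper's algorithm: feature points inside a detection box are labelled dynamic only when their depth matches the foreground object, so background points visible inside the box are kept.

```python
import numpy as np

def split_static_dynamic(features, depth, boxes, depth_tol=0.4):
    """Illustrative depth-gated filtering of ORB keypoints inside detection boxes.

    features  : (N, 2) array of pixel coordinates (u, v)
    depth     : HxW metric depth image aligned with the RGB frame
    boxes     : list of (u_min, v_min, u_max, v_max) boxes of potentially dynamic objects
    depth_tol : tolerance (m) around the detected object's median depth

    A keypoint inside a box is labelled dynamic only if its depth is close to the
    foreground object's median depth; keypoints falling on the background seen
    inside the box are kept as static.
    """
    features = np.asarray(features, dtype=float)
    dynamic = np.zeros(len(features), dtype=bool)
    for (u0, v0, u1, v1) in boxes:
        roi = depth[v0:v1, u0:u1]
        roi = roi[np.isfinite(roi) & (roi > 0)]
        if roi.size == 0:
            continue
        obj_depth = np.median(roi)
        for i, (u, v) in enumerate(features):
            if u0 <= u < u1 and v0 <= v < v1:
                d = depth[int(v), int(u)]
                if np.isfinite(d) and d > 0 and abs(d - obj_depth) < depth_tol:
                    dynamic[i] = True
    return features[~dynamic], features[dynamic]
```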
(This article belongs to the Section Sensing and Imaging)
Show Figures

Figure 1

Figure 1
<p>Framework of the YPR-SLAM system. The blue section is ORB-SLAM2, and the orange section shows the additions introduced in this paper.</p>
Full article ">Figure 2
<p>The YOLOv5 network architecture.</p>
Full article ">Figure 3
<p>Dynamic target detection and filtering thread. First, ORB feature points are extracted from the RGB image by the tracking thread. Next, the dynamic target detection thread identifies potential dynamic target areas, and then the Depth-PROSAC algorithm is applied to filter out dynamic feature points. Finally, the static feature points are retained for subsequent pose estimation.</p>
Full article ">Figure 4
<p>Comparison between the target detection method and the Depth-PROSAC algorithm in filtering out dynamic feature points. (<b>a</b>) Dynamic feature points filtered out directly by the object detection method; (<b>b</b>) dynamic feature points filtered out by the Depth-PROSAC algorithm.</p>
Full article ">Figure 5
<p>Dense point cloud construction workflow.</p>
Full article ">Figure 6
<p>In the fr3_walking_halfsphere sequence, the YPR-SLAM and ORB-SLAM2 systems were used to estimate the 3D motion of the camera. (<b>a</b>) Camera trajectory estimated by ORB-SLAM2; (<b>b</b>) camera trajectory estimated by YPR-SLAM.</p>
Full article ">Figure 7
<p><span class="html-italic">ATE</span> and <span class="html-italic">RPE</span> of the ORB-SLAM2 system and the YPR-SLAM system under different datasets. (<b>a1</b>,<b>a2</b>,<b>c1</b>,<b>c2</b>,<b>e1</b>,<b>e2</b>,<b>g1</b>,<b>g2</b>) represent ATE and RPE obtained by the ORB-SLAM2 system by running fre3_sitting_static, fre3_walking_static, fre3_walking_halfsphere, and fre3_walking_xyz, respectively. (<b>b1</b>,<b>b2</b>,<b>d1</b>,<b>d2</b>,<b>f1</b>,<b>f2</b>,<b>h1</b>,<b>h2</b>) represent <span class="html-italic">ATE</span> and <span class="html-italic">RPE</span> plots of the YPR-SLAM system running fre3_sitting_static, fre3_walking_static, fre3_walking_halfsphere, and fre3_walking_xyz, respectively. (<b>a1</b>,<b>b1</b>,<b>c1</b>,<b>d1</b>,<b>e1</b>,<b>f1</b>,<b>g1</b>,<b>h1</b>) represent ATE plots. (<b>a2</b>,<b>b2</b>,<b>c2</b>,<b>d2</b>,<b>e2</b>,<b>f2</b>,<b>g2</b>,<b>h2</b>) represent <span class="html-italic">RPE</span> plots.</p>
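The ATE and RPE plots referenced in this caption follow the standard TUM evaluation protocol. As a minimal sketch of how such numbers are typically computed, the snippet below aligns an estimated trajectory to the ground truth with the closed-form Horn/Umeyama solution and reports translation-only ATE and RPE RMSE; the official tools additionally handle timestamp association and full SE(3) poses.

```python
import numpy as np

def align_umeyama(est, gt):
    """Rigidly align est (N,3) to gt (N,3) with the closed-form Horn/Umeyama solution."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g
    U, _, Vt = np.linalg.svd(G.T @ E / len(est))      # cross-covariance SVD
    S = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:                     # guard against a reflection
        S[2, 2] = -1.0
    R = U @ S @ Vt
    t = mu_g - R @ mu_e
    return est @ R.T + t

def ate_rmse(est, gt):
    """Absolute trajectory error: RMSE of translational residuals after alignment."""
    err = align_umeyama(est, gt) - gt
    return float(np.sqrt((err ** 2).sum(axis=1).mean()))

def rpe_rmse(est, gt, delta=1):
    """Relative pose error over a fixed frame interval (translation part only)."""
    d_est = est[delta:] - est[:-delta]
    d_gt = gt[delta:] - gt[:-delta]
    err = np.linalg.norm(d_est - d_gt, axis=1)
    return float(np.sqrt((err ** 2).mean()))
```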
Figure 7 Cont.">
Full article ">Figure 8
<p>Using ORB-SLAM2 and YPR-SLAM to construct dense 3D point cloud map in dynamic scene sequence fre3_walking_xyz. (<b>a</b>) represents a dense 3D point cloud map constructed by the ORB-SLAM2 system; (<b>b</b>) represents a dense 3D point cloud map constructed by the YPR-SLAM system.</p>
Full article ">
23 pages, 9746 KiB  
Article
Research on SLAM Localization Algorithm for Orchard Dynamic Vision Based on YOLOD-SLAM2
by Zhen Ma, Siyuan Yang, Jingbin Li and Jiangtao Qi
Agriculture 2024, 14(9), 1622; https://doi.org/10.3390/agriculture14091622 - 16 Sep 2024
Cited by 1 | Viewed by 1108
Abstract
With the development of agriculture, the complexity and dynamism of orchard environments pose challenges to the perception and positioning of inter-row environments for agricultural vehicles. This paper proposes a method for extracting navigation lines and measuring pedestrian obstacles. The improved YOLOv5 algorithm is [...] Read more.
With the development of agriculture, the complexity and dynamism of orchard environments pose challenges to the perception and positioning of inter-row environments for agricultural vehicles. This paper proposes a method for extracting navigation lines and measuring pedestrian obstacles. The improved YOLOv5 algorithm is used to detect tree trunks between left and right rows in orchards. The experimental results show that the average angle deviation of the extracted navigation lines was less than 5 degrees, verifying the accuracy of the extraction. Because pedestrian posture varies and camera depth readings can be unreliable, a distance measurement algorithm based on a four-zone depth comparison is proposed for ranging pedestrian obstacles. Experimental results showed that within a range of 6 m, the average relative error of distance measurement did not exceed 1%, and within a range of 9 m, the maximum relative error was 2.03%. The average distance measurement time was 30 ms, so pedestrian distance measurement in orchard environments can be performed accurately and quickly. On the publicly available TUM RGB-D dynamic dataset, YOLOD-SLAM2 significantly reduced the RMSE of the absolute trajectory error compared with the ORB-SLAM2 algorithm, keeping it below 0.05 m/s. In actual orchard environments, the trajectories estimated by YOLOD-SLAM2 agreed closely with the true trajectory for both straight-line and circular driving. The RMSE of the absolute trajectory error was less than 0.03 m/s, and the average tracking time was 47 ms, indicating that the YOLOD-SLAM2 algorithm proposed in this paper could meet the accuracy and real-time requirements of agricultural vehicle positioning in orchard environments. Full article
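The four-zone depth comparison used for pedestrian ranging is only named in the abstract, so the following is a rough illustrative sketch under assumed details: the detection box is split into a 2x2 grid, each zone contributes a robust median depth, and mutually consistent zones are averaged to suppress background or invalid depth. The zone layout, tolerances, and function name are assumptions, not the paper's definition.

```python
import numpy as np

def pedestrian_distance(depth, box, agree_tol=0.5):
    """Illustrative four-zone depth comparison inside a pedestrian bounding box.

    depth     : HxW metric depth image
    box       : (u_min, v_min, u_max, v_max) pedestrian detection box
    agree_tol : zones whose median depth is within this margin (m) of the nearest
                zone are considered consistent and averaged
    """
    u0, v0, u1, v1 = box
    um, vm = (u0 + u1) // 2, (v0 + v1) // 2
    zones = [depth[v0:vm, u0:um], depth[v0:vm, um:u1],
             depth[vm:v1, u0:um], depth[vm:v1, um:u1]]
    medians = []
    for zone in zones:
        valid = zone[np.isfinite(zone) & (zone > 0)]
        if valid.size:
            medians.append(float(np.median(valid)))
    if not medians:
        return None                                   # no usable depth in the box
    nearest = min(medians)
    consistent = [m for m in medians if m - nearest < agree_tol]
    return float(np.mean(consistent))
```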
(This article belongs to the Section Agricultural Technology)
Show Figures

Figure 1

Figure 1
<p>Pinhole Camera Model.</p>
Full article ">Figure 2
<p>Pixel Coordinate System.</p>
Full article ">Figure 3
<p>YOLOv5s algorithm network structure.</p>
Full article ">Figure 4
<p>Overall principle block diagram of ORB-SLAM2.</p>
Full article ">Figure 5
<p>Algorithm flowchart.</p>
Full article ">Figure 6
<p>Reference point of tree trunk.</p>
Full article ">Figure 7
<p>Problem diagram of extracting navigation lines in Scenario 1.</p>
Full article ">Figure 8
<p>Problem diagram of extracting navigation lines in Scenario 2.</p>
Full article ">Figure 9
<p>Navigation line extraction diagram.</p>
Full article ">Figure 10
<p>Navigation line extraction in different orchard scenes.</p>
Full article ">Figure 11
<p>Schematic diagram of evaluation indicators.</p>
Full article ">Figure 12
<p>Angle deviation.</p>
Full article ">Figure 13
<p>Schematic diagram of algorithm flow.</p>
Full article ">Figure 14
<p>Pedestrian Distance Measurement Experiment.</p>
Figure 14 Cont.">
Full article ">Figure 15
<p>YOLOD-SLAM2 Algorithm Framework.</p>
Full article ">Figure 16
<p>Left image and depth map of binocular camera.</p>
Full article ">Figure 17
<p>Geometric Constraints of Polar Lines.</p>
Full article ">Figure 18
<p>Comparison of Algorithm Trajectory.</p>
Full article ">Figure 19
<p>Actual Orchard Environment Trajectory Map.</p>
Full article ">Figure 20
<p>Trajectory diagram of the straight-line driving in x, y, and z directions.</p>
Full article ">Figure 21
<p>Trajectory diagram of circular driving in the x, y, and z directions.</p>
Full article ">
18 pages, 5473 KiB  
Article
Visual-Inertial RGB-D SLAM with Encoder Integration of ORB Triangulation and Depth Measurement Uncertainties
by Zhan-Wu Ma and Wan-Sheng Cheng
Sensors 2024, 24(18), 5964; https://doi.org/10.3390/s24185964 - 14 Sep 2024
Cited by 2 | Viewed by 1351
Abstract
In recent years, the accuracy of visual SLAM (Simultaneous Localization and Mapping) technology has seen significant improvements, making it a prominent area of research. However, within the current RGB-D SLAM systems, the estimation of 3D positions of feature points primarily relies on direct [...] Read more.
In recent years, the accuracy of visual SLAM (Simultaneous Localization and Mapping) technology has seen significant improvements, making it a prominent area of research. However, within the current RGB-D SLAM systems, the estimation of 3D positions of feature points primarily relies on direct measurements from RGB-D depth cameras, which inherently contain measurement errors. Moreover, the potential of triangulation-based estimation for ORB (Oriented FAST and Rotated BRIEF) feature points remains underutilized. To address this reliance on a single source of measurement data, this paper proposes integrating, for the 3D position of each ORB feature point, a triangulation-based estimate and a depth-measurement-based estimate, each with its own uncertainty. This integration is achieved using a CI (Covariance Intersection) filter and is referred to as the CI-TEDM (Triangulation Estimates and Depth Measurements) method. Vision-based SLAM systems face significant challenges, particularly in long straight corridors, weakly textured scenes, or during rapid motion, where tracking failures are common. To enhance the stability of visual SLAM, this paper introduces an improved CI-TEDM method that incorporates wheel encoder data. The mathematical model of the encoder is presented, and detailed derivations of the encoder pre-integration model and error model are provided. Building on these improvements, we propose a novel tightly coupled visual-inertial RGB-D SLAM with encoder integration of ORB triangulation and depth measurement uncertainties. Validation on open-source datasets and in real-world environments demonstrates that the proposed improvements significantly enhance the robustness of real-time state estimation and the localization accuracy of intelligent vehicles in challenging environments. Full article
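Covariance Intersection itself is a standard fusion rule, so a minimal sketch of fusing a triangulated position with a depth-measurement position is given below; the weight is chosen here by a coarse search that minimizes the trace of the fused covariance, which may differ from the weighting used in CI-TEDM. The toy covariances at the end are purely illustrative.

```python
import numpy as np

def covariance_intersection(x1, P1, x2, P2):
    """Fuse two estimates of the same 3D point with the standard CI rule.

    x1, P1 : e.g. triangulation-based position estimate and its covariance
    x2, P2 : e.g. depth-measurement-based position estimate and its covariance
    """
    I1, I2 = np.linalg.inv(P1), np.linalg.inv(P2)
    ws = np.linspace(1e-3, 1.0 - 1e-3, 999)
    traces = [np.trace(np.linalg.inv(w * I1 + (1 - w) * I2)) for w in ws]
    w = ws[int(np.argmin(traces))]                    # weight minimizing fused trace
    P = np.linalg.inv(w * I1 + (1 - w) * I2)
    x = P @ (w * I1 @ x1 + (1 - w) * I2 @ x2)
    return x, P

# toy usage: triangulation is poor along the viewing ray, depth is poor laterally
x_tri, P_tri = np.array([1.00, 0.20, 3.10]), np.diag([0.02, 0.02, 0.30])
x_dep, P_dep = np.array([1.02, 0.21, 2.95]), np.diag([0.05, 0.05, 0.04])
x_fused, P_fused = covariance_intersection(x_tri, P_tri, x_dep, P_dep)
```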
Show Figures

Figure 1

Figure 1
<p>The system framework diagram. The system framework diagram consists of three main modules: input, function, and output.</p>
Full article ">Figure 2
<p>An example diagram of reprojection error. Feature matching indicates that points <math display="inline"><semantics> <mrow> <mi>p</mi> <mn>1</mn> </mrow> </semantics></math> and <math display="inline"><semantics> <mrow> <mi>p</mi> <mn>2</mn> </mrow> </semantics></math> are projections of the same spatial point <math display="inline"><semantics> <mi>p</mi> </semantics></math>, but the camera pose is initially unknown. Initially, there is a certain distance between the projected point, <math display="inline"><semantics> <mrow> <mover> <mi>p</mi> <mo>∧</mo> </mover> <mn>2</mn> </mrow> </semantics></math>, of <math display="inline"><semantics> <mi>P</mi> </semantics></math> and the actual point, <math display="inline"><semantics> <mrow> <mi>p</mi> <mn>2</mn> </mrow> </semantics></math>. The camera pose is then adjusted to minimize this distance.</p>
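As a worked complement to this caption, the snippet below computes a single reprojection residual with a pinhole model: a 3D point is projected with a candidate pose and compared against the matched pixel, and pose optimization then adjusts the pose to shrink this distance. The intrinsics, pose, and coordinates are illustrative values, not taken from the paper.

```python
import numpy as np

def project(K, R, t, P_world):
    """Pinhole projection of a 3D point (world frame) into pixel coordinates."""
    P_cam = R @ P_world + t                 # world -> camera frame
    return (K @ (P_cam / P_cam[2]))[:2]     # perspective division, drop homogeneous 1

# illustrative intrinsics, candidate pose, point, and matched observation
K = np.array([[525.0,   0.0, 319.5],
              [  0.0, 525.0, 239.5],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.array([0.05, 0.0, 0.0])    # current guess of the camera pose
P = np.array([0.30, -0.10, 2.00])               # current estimate of the 3D point
p2_observed = np.array([400.2, 210.7])          # matched feature position in the image

p2_projected = project(K, R, t, P)
reprojection_error = np.linalg.norm(p2_observed - p2_projected)
# pose/structure optimization adjusts (R, t) and P to minimize this residual
```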
Full article ">Figure 3
<p>The motion model of the wheeled robot using wheel encoders. The figure illustrates the motion model of a mobile robot using wheel encoders in a 2D plane. The model describes the robot’s trajectory between its position at time <math display="inline"><semantics> <mrow> <msub> <mi>t</mi> <mi>k</mi> </msub> </mrow> </semantics></math>, denoted as <math display="inline"><semantics> <mrow> <mfenced> <mrow> <msub> <mi>x</mi> <mi>k</mi> </msub> <mo>,</mo> <msub> <mi>y</mi> <mi>k</mi> </msub> </mrow> </mfenced> </mrow> </semantics></math>, and its position at time <math display="inline"><semantics> <mrow> <msub> <mi>t</mi> <mrow> <mi>k</mi> <mo>+</mo> <mn>1</mn> </mrow> </msub> </mrow> </semantics></math>, denoted as <math display="inline"><semantics> <mrow> <mfenced> <mrow> <msub> <mi>x</mi> <mrow> <mi>k</mi> <mo>+</mo> <mn>1</mn> </mrow> </msub> <mo>,</mo> <msub> <mi>y</mi> <mrow> <mi>k</mi> <mo>+</mo> <mn>1</mn> </mrow> </msub> </mrow> </mfenced> </mrow> </semantics></math>.</p>
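The caption describes the planar wheel-encoder motion model. A minimal differential-drive odometry step consistent with that description is sketched below; the wheel radius, wheel base, and encoder resolution are placeholder values, not the parameters of the authors' platform, and the pre-integration and error models derived in the paper are not reproduced here.

```python
import numpy as np

def encoder_odometry_step(x, y, theta, ticks_left, ticks_right,
                          ticks_per_rev=2048, wheel_radius=0.05, wheel_base=0.35):
    """One planar odometry update of a differential-drive robot from encoder ticks.

    (x, y, theta)           : pose at time t_k in the odometry frame
    ticks_left, ticks_right : encoder increments accumulated between t_k and t_{k+1}
    """
    d_left = 2.0 * np.pi * wheel_radius * ticks_left / ticks_per_rev
    d_right = 2.0 * np.pi * wheel_radius * ticks_right / ticks_per_rev
    d_center = 0.5 * (d_left + d_right)             # translation of the robot centre
    d_theta = (d_right - d_left) / wheel_base       # change in heading

    # midpoint integration of the unicycle model
    x_next = x + d_center * np.cos(theta + 0.5 * d_theta)
    y_next = y + d_center * np.sin(theta + 0.5 * d_theta)
    theta_next = theta + d_theta
    return x_next, y_next, theta_next
```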
Full article ">Figure 4
<p>The process of running datasets in the VEOS3-TEDM algorithm: (<b>a</b>) corridor scene and (<b>b</b>) laboratory scene. The blue frames represent keyframes, red frames represent initial keyframes, and green frames represent current frames.</p>
Full article ">Figure 5
<p>The process of tracking the datasets in the VEOS3-TEDM algorithm: (<b>a</b>) corridor scene and (<b>b</b>) laboratory scene. The green boxes in the figure represent key feature points detected by the VEOS3-TEDM algorithm.</p>
Full article ">Figure 6
<p>The comparison between estimated and true trajectory in the VEOS3-TEDM algorithm: (<b>a</b>) corridor scene and (<b>b</b>) laboratory scene.</p>
Full article ">Figure 7
<p>The comparison between true and estimated trajectories in the <span class="html-italic">x</span>, <span class="html-italic">y</span>, and <span class="html-italic">z</span> directions, using the VEOS3-TEDM algorithm: (<b>a</b>) corridor scene and (<b>b</b>) laboratory scene.</p>
Full article ">Figure 8
<p>3D point cloud maps: (<b>a</b>) corridor scene and (<b>b</b>) laboratory scene.</p>
Full article ">Figure 9
<p>Images of the experimental platform: (<b>a</b>) front view and (<b>b</b>) left view.</p>
Full article ">Figure 10
<p>The location of various components on the mobile robot: (<b>a</b>) bottom level and (<b>b</b>) upper level.</p>
Full article ">Figure 11
<p>The process of tracking real-world environments in the VEOS3-TEDM algorithm: (<b>a1</b>,<b>a2</b>) laboratory, (<b>b1</b>,<b>b2</b>) hall, (<b>c1</b>,<b>c2</b>) weak texture scene, (<b>d1</b>,<b>d2</b>) long straight corridor. The green boxes in the figure represent key feature points detected by the VEOS3-TEDM algorithm.</p>
Full article ">Figure 12
<p>A comparison of estimated and true trajectories in real-world environments using the VEOS3-TEDM algorithm.</p>
Full article ">
29 pages, 9403 KiB  
Article
DIO-SLAM: A Dynamic RGB-D SLAM Method Combining Instance Segmentation and Optical Flow
by Lang He, Shiyun Li, Junting Qiu and Chenhaomin Zhang
Sensors 2024, 24(18), 5929; https://doi.org/10.3390/s24185929 - 12 Sep 2024
Viewed by 1378
Abstract
Feature points from moving objects can negatively impact the accuracy of Visual Simultaneous Localization and Mapping (VSLAM) algorithms, while detection or semantic segmentation-based VSLAM approaches often fail to accurately determine the true motion state of objects. To address this challenge, this paper introduces [...] Read more.
Feature points from moving objects can negatively impact the accuracy of Visual Simultaneous Localization and Mapping (VSLAM) algorithms, while detection or semantic segmentation-based VSLAM approaches often fail to accurately determine the true motion state of objects. To address this challenge, this paper introduces DIO-SLAM: Dynamic Instance Optical Flow SLAM, a VSLAM system specifically designed for dynamic environments. Initially, the detection thread employs YOLACT (You Only Look At CoefficienTs) to distinguish between rigid and non-rigid objects within the scene. Subsequently, the optical flow thread estimates optical flow and introduces a novel approach to capture the optical flow of moving objects by leveraging optical flow residuals. Following this, an optical flow consistency method is implemented to assess the dynamic nature of rigid object mask regions, classifying them as either moving or stationary rigid objects. To mitigate errors caused by missed detections or motion blur, a motion frame propagation method is employed. Lastly, a dense mapping thread is incorporated to filter out non-rigid objects using semantic information, track the point clouds of rigid objects, reconstruct the static background, and store the resulting map in an octree format. Experimental results demonstrate that the proposed method surpasses current mainstream dynamic VSLAM techniques in both localization accuracy and real-time performance. Full article
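The optical-flow-residual idea in the abstract amounts to removing the flow induced by camera self-motion and thresholding what remains. The sketch below approximates the self-motion flow with a single least-squares affine field rather than the paper's iterative residual scheme, so it should be read as an assumption-laden illustration of the principle; the flow itself would come from any dense flow estimator.

```python
import numpy as np

def moving_object_mask(flow, thresh=1.5):
    """Flag pixels whose flow deviates from a best-fit global motion field.

    flow   : HxWx2 dense optical flow between two frames
    thresh : residual magnitude (pixels) above which a pixel is called moving

    The camera's self-motion flow is approximated by one affine model fitted to all
    pixels by least squares; subtracting it leaves a residual field that highlights
    independently moving objects.
    """
    h, w, _ = flow.shape
    vs, us = np.mgrid[0:h, 0:w]
    A = np.stack([us.ravel(), vs.ravel(), np.ones(h * w)], axis=1)   # (N, 3) design matrix
    F = flow.reshape(-1, 2)                                          # (N, 2) flow vectors
    coeffs, *_ = np.linalg.lstsq(A, F, rcond=None)                   # affine ego-motion fit
    residual = np.linalg.norm(F - A @ coeffs, axis=1).reshape(h, w)
    return residual > thresh
```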
(This article belongs to the Special Issue Sensors and Algorithms for 3D Visual Analysis and SLAM)
Show Figures

Figure 1

Figure 1
<p>Performance of the traditional ORB-SLAM3 algorithm in highly dynamic environments. (<b>a</b>) Image of the highly dynamic scene. (<b>b</b>) Feature point extraction in the highly dynamic scene, where yellow boxes indicate moving objects and dynamic feature points are marked in red. (<b>c</b>) Comparison between the estimated camera pose and the ground truth camera pose. (<b>d</b>) Reconstruction results of dense mapping.</p>
Full article ">Figure 2
<p>The overall system framework of DIO-SLAM. Key innovations are highlighted in red font, while the original ORB-SLAM3 framework is represented by unfilled boxes. (<b>a</b>) Detection thread, represented by green boxes. (<b>b</b>) Optical flow thread, represented by blue boxes. (<b>c</b>) Dynamic feature point filtering module, which is composed of both the detection and optical flow threads. (<b>d</b>) Independent dense mapping thread.</p>
Full article ">Figure 3
<p>Instance segmentation results and non-rigid object mask extraction. (<b>a</b>) RGB frame used for segmentation. (<b>b</b>) Instance segmentation output.</p>
Full article ">Figure 4
<p>Separation of non-rigid and rigid object masks based on semantic information.</p>
Full article ">Figure 5
<p>Optical flow network inputs and output. (<b>a</b>) Frame <math display="inline"><semantics> <mrow> <mi>n</mi> <mo>−</mo> <mn>1</mn> </mrow> </semantics></math>. (<b>b</b>) Frame <math display="inline"><semantics> <mi>n</mi> </semantics></math>. (<b>c</b>) Dense optical flow.</p>
Full article ">Figure 6
<p>Optical flow changes between adjacent frames.</p>
Full article ">Figure 7
<p>Iterative removal of camera self-motion flow using optical flow residuals. (<b>a</b>) Original dense optical flow. (<b>b</b>) Number of iterations = 5. (<b>c</b>) Number of iterations = 7. (<b>d</b>) Number of iterations = 9.</p>
Full article ">Figure 8
<p>Optical flow consistency for determining the moving rigid object region.</p>
Full article ">Figure 9
<p>Motion frame propagation.</p>
Full article ">Figure 10
<p>Effect of dynamic feature point removal. The colored areas in the figure depict the optical flow of moving rigid objects, while the green areas indicate the final extracted feature points. The feature points of non-rigid objects, such as the human body, are removed in all scenes. (<b>a</b>,<b>b</b>) A chair is being dragged, with its feature points being removed. (<b>c</b>,<b>d</b>) Hitting a balloon, where the feature points on the balloon are removed. (<b>e</b>) The box is stationary, and the feature points are normally extracted. (<b>f</b>) The box is being moved, with its feature points removed. (<b>g</b>,<b>h</b>) The box is put down, and its feature points are restored.</p>
Full article ">Figure 11
<p>Absolute trajectory error and relative pose error of fr3_walking_xyz.</p>
Full article ">Figure 12
<p>Absolute trajectory error and relative pose error of fr3_walking_static.</p>
Full article ">Figure 13
<p>Absolute trajectory error and relative pose error of fr3_walking_rpy.</p>
Full article ">Figure 14
<p>Absolute trajectory error and relative pose error of fr3_walking_halfsphere.</p>
Full article ">Figure 15
<p>Absolute trajectory error and relative pose error of fr3_sitting_static.</p>
Full article ">Figure 16
<p>Dense point cloud reconstruction. (<b>a</b>) RGB frame, dense point cloud, and octree map of the fr3_walking_xyz sequence. (<b>b</b>) RGB frame, dense point cloud, and octree map of the moving_nonobstructing_box sequence.</p>
Figure 16 Cont.">
Full article ">Figure 17
<p>Point cloud error heatmaps. (<b>a</b>) kt0 sequence. (<b>b</b>) kt1 sequence. (<b>c</b>) kt2 sequence. (<b>d</b>) kt3 sequence.</p>
Full article ">Figure 18
<p>Real-world scenario test results. (<b>a</b>) Color images. (<b>b</b>) Depth images. (<b>c</b>) Optical flow of moving objects. (<b>d</b>) Moving rigid object masks. (<b>e</b>) Feature points in traditional ORB-SLAM3. (<b>f</b>) Feature points in DIO-SLAM.</p>
Full article ">