Search Results (254)

Search Parameters:
Keywords = dense image matching

25 pages, 13698 KiB  
Article
Self-Supervised Foundation Model for Template Matching
by Anton Hristov, Dimo Dimov and Maria Nisheva-Pavlova
Big Data Cogn. Comput. 2025, 9(2), 38; https://doi.org/10.3390/bdcc9020038 - 11 Feb 2025
Abstract
Finding a template's location in a query image is a fundamental problem in many computer vision applications, such as localization of known objects, image registration, image matching, and object tracking. Currently available methods fail when insufficient training data are available or when the images exhibit large texture variations, different modalities, or weak visual features, which limits their use in real-world tasks. We introduce the Self-Supervised Foundation Model for Template Matching (Self-TM), a novel end-to-end approach to self-supervised learning of template matching. The idea behind Self-TM is to learn hierarchical features incorporating localization properties from images without any annotations. As one goes deeper into the convolutional neural network (CNN) layers, the filters react to increasingly complex structures and their receptive fields grow, leading to a loss of localization information relative to the early layers. Hierarchically propagating the last layers back to the first layer results in precise template localization. Owing to its zero-shot generalization capabilities on tasks such as image retrieval, dense template matching, and sparse image matching, our pre-trained model can be classified as a foundation model.
(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)
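As a point of reference for the template-localization problem the abstract describes, the sketch below runs a classical normalized cross-correlation search with OpenCV. It is not the authors' Self-TM model, which instead learns hierarchical CNN features without annotations; the file names and the choice of matching score are illustrative assumptions.

```python
import cv2

# Classical baseline for template localization (NOT Self-TM): slide the
# template over the query image and keep the best correlation response.
query = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)       # placeholder file names
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)

response = cv2.matchTemplate(query, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(response)

h, w = template.shape
top_left = max_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
print(f"best match at {top_left}-{bottom_right}, score {max_val:.3f}")
```

Unlike a learned hierarchical approach, this baseline degrades quickly under texture variation or modality changes, which is exactly the failure mode the paper targets.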
Figure 1. Illustration of Self-TM.
Figure 2. Illustration of a receptive field, $RF_{pred\_p_N}$, in layer $N-1$ (in orange) of a detected maximum value, $pred\_p_N$, in layer $N$ (in red).
Figure 3. Visual representation of results on Hpatches (values, excluding those for Self-TM, are taken from Twin-Net [61]): (a) patch verification task; (b) image matching task; (c) patch retrieval task. The methods are grouped as follows: "handcrafted", which were manually created by their authors; "supervised", which used annotated data for their training; and "self-supervised", which did not use any annotations. A plus (+) denotes Self-TM models that are fine-tuned on the Hpatches dataset, and an asterisk (*) denotes variations of Tfear models.
Figure 4. Comparison of OmniGlue [34] (a) and OmniGlue + Self-TM Base (b) in finding keypoint matches in an image with an out-of-training-domain modality. For visualization purposes, matches with high confidence are not drawn, so that the errors remain visible. Correct matches are shown in green and incorrect matches in red.
24 pages, 8492 KiB  
Article
Conditional Generative Adversarial Networks and Deep Learning Data Augmentation: A Multi-Perspective Data-Driven Survey Across Multiple Application Fields and Classification Architectures
by Lucas C. Ribas, Wallace Casaca and Ricardo T. Fares
AI 2025, 6(2), 32; https://doi.org/10.3390/ai6020032 - 7 Feb 2025
Abstract
Effectively training deep learning models relies heavily on large datasets, as insufficient instances can hinder model generalization. A simple yet effective way to address this is to apply modern deep learning augmentation methods, as they synthesize new data matching the input distribution while preserving the semantic content. While these methods produce realistic samples, important issues persist concerning how well they generalize across different classification architectures and their overall impact on accuracy improvement. Furthermore, the relationship between dataset size and model accuracy, as well as the determination of an optimal augmentation level, remains an open question in the field. To address these challenges, in this paper we investigate the effectiveness of eight data augmentation methods—StyleGAN3, DCGAN, SAGAN, RandAugment, Random Erasing, AutoAugment, TrivialAugment and AugMix—across several classification networks of varying depth: ResNet18, ConvNeXt-Nano, DenseNet121 and InceptionResNetV2. By comparing their performance on diverse datasets from leaf textures, medical imaging and remote sensing, we assess which methods offer superior accuracy and generalization capability when training models with no pre-trained weights. Our findings indicate that deep learning data augmentation is an effective tool for dealing with small datasets, achieving accuracy gains of up to 17%.
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
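Several of the non-generative policies named in the abstract (RandAugment, TrivialAugment, AugMix, AutoAugment, Random Erasing) ship with torchvision. The snippet below is a minimal training-transform sketch, not the paper's experimental protocol; the image size, the particular policies combined, and the erasing probability are illustrative assumptions.

```python
from torchvision import transforms

# Minimal augmentation pipeline sketch using torchvision's built-in policies.
# Swap TrivialAugmentWide for RandAugment(), AugMix(), or
# AutoAugment(transforms.AutoAugmentPolicy.IMAGENET) to compare policies.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.TrivialAugmentWide(),      # parameter-free augmentation policy
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.25),     # tensor-level occlusion augmentation
])
```

A dataset would apply `train_transform` to each PIL image before batching; GAN-based augmentation (StyleGAN3, DCGAN, SAGAN) instead requires training a separate generator and is not covered by this sketch.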
Figure 1. Example of rotation geometric transformation. (a) Rotation does preserve label. (b) Rotation does not preserve label.
Figure 2. Six samples from the 1200Tex, Kather, and Brazilian Coffee Scenes (BCSs) data collections, with each column representing a sample and each row corresponding to a dataset.
Figure 3. Schematization of proposed data augmentation training and testing pipeline.
Figure 4. Samples generated by all evaluated data augmentation approaches.
Figure 5. The t-SNE projection of the original training sets for the 1200Tex, Kather and Brazilian Coffee Scenes datasets.
Figure 6. The t-SNE projections of all compared augmentation approaches for the 1200Tex dataset.
Figure 7. The t-SNE projections of all compared augmentation approaches for the Kather dataset.
Figure 8. The t-SNE projections of all compared augmentation approaches for the Brazilian Coffee Scenes dataset.
14 pages, 3344 KiB  
Article
Robot-Based Procedure for 3D Reconstruction of Abdominal Organs Using the Iterative Closest Point and Pose Graph Algorithms
by Birthe Göbel, Jonas Huurdeman, Alexander Reiterer and Knut Möller
J. Imaging 2025, 11(2), 44; https://doi.org/10.3390/jimaging11020044 - 5 Feb 2025
Abstract
Image-based 3D reconstruction enables robot-assisted interventions and image-guided navigation, which are emerging technologies in laparoscopy. When a robotic arm guides a laparoscope for image acquisition, hand–eye calibration is required to know the transformation between the camera and the robot flange. The calibration procedure is complex and must be conducted after each intervention (when the laparoscope is dismounted for cleaning). In the field, the surgeons and their assistants cannot be expected to do so. Thus, our approach is a procedure for a robot-based multi-view 3D reconstruction without hand–eye calibration, but with pose optimization algorithms instead. In this work, a robotic arm and a stereo laparoscope build the experimental setup. The procedure includes the stereo matching algorithm Semi Global Matching from OpenCV for depth measurement and the multiscale color iterative closest point algorithm from Open3D (v0.19), along with the multiway registration algorithm using a pose graph from Open3D (v0.19) for pose optimization. The procedure is evaluated quantitatively and qualitatively on ex vivo organs. The results are a low root mean squared error (1.1–3.37 mm) and dense point clouds. The proposed procedure leads to a plausible 3D model, and there is no need for complex hand–eye calibration, as this step can be compensated for by pose optimization algorithms.
(This article belongs to the Special Issue Geometry Reconstruction from Images (2nd Edition))
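The two pose-optimization building blocks the abstract names, colored ICP and pose-graph-based multiway registration, are exposed through Open3D's public registration pipeline. The sketch below strings them together for a sequence of colored point clouds; the voxel size, convergence settings, and the assumption that consecutive clouds overlap and carry RGB colors are illustrative, not the authors' exact configuration.

```python
import numpy as np
import open3d as o3d

def pairwise_colored_icp(source, target, voxel):
    # Downsample, estimate normals, then run colored ICP (clouds must have RGB).
    src, tgt = source.voxel_down_sample(voxel), target.voxel_down_sample(voxel)
    for pcd in (src, tgt):
        pcd.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 2, max_nn=30))
    result = o3d.pipelines.registration.registration_colored_icp(
        src, tgt, voxel, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationForColoredICP(),
        o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration=50))
    return result.transformation

def build_pose_graph(clouds, voxel=0.005):
    # Chain consecutive pairwise registrations, then globally optimize the poses.
    reg = o3d.pipelines.registration
    graph, odometry = reg.PoseGraph(), np.eye(4)
    graph.nodes.append(reg.PoseGraphNode(odometry))
    for i in range(len(clouds) - 1):
        T = pairwise_colored_icp(clouds[i], clouds[i + 1], voxel)
        odometry = T @ odometry
        graph.nodes.append(reg.PoseGraphNode(np.linalg.inv(odometry)))
        info = reg.get_information_matrix_from_point_clouds(
            clouds[i], clouds[i + 1], voxel, T)
        graph.edges.append(reg.PoseGraphEdge(i, i + 1, T, info, uncertain=False))
    reg.global_optimization(
        graph,
        reg.GlobalOptimizationLevenbergMarquardt(),
        reg.GlobalOptimizationConvergenceCriteria(),
        reg.GlobalOptimizationOption(max_correspondence_distance=voxel))
    return graph
```

The optimized node poses can then replace the hand–eye-calibrated camera poses when fusing the per-frame depth maps into one cloud.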
Figure 1. Schematic overview of the 3D reconstruction method stereoscopy. It shows the surface to be reconstructed, with point P in blue, the laparoscope tip with two image sensors generating the estimated depth (black arrow), and the estimated camera positions (x marks and dotted line).
Figure 2. Schematic overview of the experimental setup. The TipCam Rubina 1S 30° is held by the UR5 CB3 robotic arm (Universal Robots A/S, Odense, Denmark). The video laparoscope is equipped with a stereo camera system with chip-on-the-tip technology, which is angled at 30° and has an 80° FOV.
Figure 3. Visualization of the architecture of the robot-based 3D reconstruction procedure.
Figure 4. Photograph of pig organs (left) and screenshot of the reconstructed point cloud created by our approach (right).
Figure 5. Screenshot of a section of the reconstructed point cloud in front view (left) and side view (right).
Figure 6. Six example images from the examined dataset (always the left images) (left) and the corresponding depth maps with NaN values in white, closer objects in more yellowish tones, and objects further away in more blueish colors (right).
Figure 7. Screenshots of the pig organ point clouds (left) and an excerpt with a focus on the gallbladder (right) created by three different approaches: camera position estimation only by robot kinematics (Rob only) (top), by robot kinematics + ICP (Rob + ICP) (middle), and by robot kinematics + ICP + pose graphs (Rob + ICP + pose graph) (bottom).
Figure 8. Screenshot of the ground truth point cloud (left) and the point cloud reconstructed by our approach (middle) with markers A0–A7 and R0–R7. The markers are used for point cloud alignment to compute the reconstruction error as RMSE. Screenshot of the overlaid ground truth in black and the reconstructed point cloud in colors to compute the reconstruction error (right).
24 pages, 6629 KiB  
Article
UnDER: Unsupervised Dense Point Cloud Extraction Routine for UAV Imagery Using Deep Learning
by John Ray Bergado and Francesco Nex
Remote Sens. 2025, 17(1), 24; https://doi.org/10.3390/rs17010024 - 25 Dec 2024
Viewed by 479
Abstract
Extraction of dense 3D geographic information from ultra-high-resolution unmanned aerial vehicle (UAV) imagery unlocks a great number of mapping and monitoring applications. This is facilitated by a step called dense image matching, which tries to find pixels corresponding to the same object within overlapping images captured by the UAV from different locations. Recent developments in deep learning utilize deep convolutional networks to perform this dense pixel correspondence task. A common theme in these developments is to train the network in a supervised setting using available dense 3D reference datasets. However, in this work we propose a novel unsupervised dense point cloud extraction routine for UAV imagery, called UnDER. We propose a novel disparity-shifting procedure to enable the use of a stereo matching network pretrained on an entirely different typology of image data in the disparity-estimation step of UnDER. Unlike previously proposed disparity-shifting techniques for forming cost volumes, the goal of our procedure was to address the domain shift between the images that the network was pretrained on and the UAV images, by using prior information from the UAV image acquisition. We also developed a procedure for occlusion masking based on disparity consistency checking that uses the disparity image space rather than the object space proposed in a standard 3D reconstruction routine for UAV data. Our benchmarking results demonstrated significant improvements in quantitative performance, reducing the mean cloud-to-cloud distance by approximately 1.8 times the ground sampling distance (GSD) compared to other methods.
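A disparity-space consistency check of the kind the abstract refers to compares the disparity a pixel receives when the left image is the base with the disparity found at the corresponding location when the right image is the base; pixels where the two disagree are flagged as occluded. The sketch below is a generic left-right check in NumPy, with the threshold and sign convention as illustrative assumptions rather than the UnDER settings.

```python
import numpy as np

def occlusion_mask(disp_left: np.ndarray, disp_right: np.ndarray,
                   eps: float = 1.0) -> np.ndarray:
    """Return True where the left disparity is consistent with the right map."""
    h, w = disp_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    # Where does each left pixel land in the right image? (x_right = x_left - d)
    x_in_right = np.clip(np.round(xs - disp_left).astype(int), 0, w - 1)
    disp_right_sampled = np.take_along_axis(disp_right, x_in_right, axis=1)
    # Consistent if the two disparity estimates agree within eps pixels.
    return np.abs(disp_left - disp_right_sampled) <= eps
```

Inconsistent pixels are simply excluded from triangulation, so no depth is hallucinated behind occluders.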
Figure 1. An overview of the proposed UnDER framework consisting of three main steps: image rectification, disparity estimation, and triangulation. UnDER accepts the following as input: undistorted UAV image pairs, camera interior and exterior orientation parameters, and a disparity estimation network. UnDER produces, as a final output, a dense point cloud corresponding to the overlapping area of the image pairs.
Figure 2. An overview of the parallax attention stereo matching network used in the disparity estimation step of UnDER.
Figure 3. Comparison of self-attention and parallax attention. The similarity of the selected pixel (green) to other pixels (in different colors) is measured in the same feature map (self-attention), or in a feature map extracted from a paired right image (parallax attention).
Figure 4. Reference figure for defining disparity shifting. It shows the image planes of a stereo pair, a basis depth for deriving the disparity shift, the projection centers of the two cameras, the image points of the left principal point in both image planes, the corresponding object point lying on the basis depth, and the disparity of the left principal point.
Figure 5. Reference figure for the disparity consistency check. It shows how the occlusion mask is calculated by comparing output disparity maps obtained by switching the base image in the image pairs. Images I′ and I″ are correspondingly captured at two different locations of the camera projection center, Z′ and Z″, and M is the output mask.
Figure 6. Dataset-1 of the UseGeo dataset: full extent of the dataset, a sample undistorted image, and a corresponding subset of the reference LiDAR point cloud (left to right). The area of the sample image is located in the yellow box annotated on the extent of Dataset-1.
Figure 7. The UAV-Nunspeet dataset: full extent of the dataset, a sample undistorted image, and a corresponding subset of the point cloud derived from Pix4D. The area of the sample image is located in the yellow box annotated on the extent of the dataset.
Figure 8. Subset of the UAV-Zeche-Zollern dataset: the extent of the subset and the corresponding reference Pix4D point cloud.
Figure 9. Plot showing the effect of varying the disparity shift ratio (δ) values used in the disparity-estimation step of the point cloud extraction routine. Each solid curve corresponds to a different δ value. The horizontal axis shows the base images used in each multi-stereo pair. The left vertical axis shows the natural logarithm (log) of the mean cloud-to-cloud (C2C) distance, comparing the point cloud extracted from each multi-stereo pair with the reference LiDAR point cloud. The dashed curve shows the mean baseline length of the image pairs used in the multi-stereo. The right vertical axis provides the range of values of the mean baseline length.
Figure 10. Plot showing the effect of varying the disparity difference threshold (ε) values used in the occlusion-masking step of the point cloud extraction routine. Each curve corresponds to a different ε value. The horizontal axis shows the base images used in each multi-stereo pair. The vertical axis shows the natural logarithm (log) of the mean cloud-to-cloud (C2C) distance, comparing the point cloud extracted from each multi-stereo pair with the reference LiDAR point cloud. A zoomed-in portion of the graph is included to further highlight the differences in the setups with increasing ε.
Figure 11. Plot showing the effect of using a multi-stereo setup compared to a single-stereo setup in the triangulation step of the point cloud extraction routine. The first solid curve corresponds to the single-stereo setup, while the second solid curve corresponds to the multi-stereo setup. The horizontal axis shows the base images used in each single-stereo or multi-stereo pair. The left vertical axis shows the natural logarithm (log) of the mean cloud-to-cloud (C2C) distance, comparing the point cloud extracted from each multi-stereo pair with the reference LiDAR point cloud. The dashed curve shows the mean absolute difference in κ values of the images used in each single-stereo and multi-stereo pair. The right vertical axis displays the range of the mean differences in κ angles.
Figure 12. A subset of the UseGeo Dataset-1 showing the UseGeo DIM point cloud and the mean cloud-to-cloud (C2C) distances of UnDER-P and UnDER-FN+FPCfilter (left to right) with respect to the reference LiDAR point cloud. The bottom row shows a zoomed-in portion of the subset from the top row, indicated by the yellow box. All C2C distances greater than 0.1 m are displayed in red, all C2C distances less than 0.02 m are displayed in blue, and everything in between is displayed in a gradient of green to yellow.
Figure 13. Histogram of mean C2C distance values of UseGeo DIM, UnDER-P, and UnDER-FN+FPCfilter. Values beyond 0.5 m were truncated for better visualization.
22 pages, 6639 KiB  
Article
Reliable Disparity Estimation Using Multiocular Vision with Adjustable Baseline
by Victor H. Diaz-Ramirez, Martin Gonzalez-Ruiz, Rigoberto Juarez-Salazar and Miguel Cazorla
Sensors 2025, 25(1), 21; https://doi.org/10.3390/s25010021 - 24 Dec 2024
Viewed by 516
Abstract
Accurate estimation of three-dimensional (3D) information from captured images is essential in numerous computer vision applications. Although binocular stereo vision has been extensively investigated for this task, its reliability is conditioned by the baseline between cameras. A larger baseline improves the resolution of disparity estimation but increases the probability of matching errors. This research presents a reliable method for disparity estimation through progressive baseline increases in multiocular vision. First, a robust rectification method for multiocular images is introduced, satisfying epipolar constraints and minimizing induced distortion. This method can improve rectification error by 25% for binocular images and 80% for multiocular images compared to well-known existing methods. Next, a dense disparity map is estimated by stereo matching from the rectified images with the shortest baseline. Afterwards, the disparity map for the subsequent images with an extended baseline is estimated within a short optimized interval, minimizing the probability of matching errors and further error propagation. This process is iterated until the disparity map for the images with the longest baseline is obtained. The proposed method increases disparity estimation accuracy by 20% for multiocular images compared to a similar existing method. The proposed approach enables accurate scene characterization and spatial point computation from disparity maps with improved resolution. The effectiveness of the proposed method is verified through exhaustive evaluations using well-known multiocular image datasets and physical scenes, achieving superior performance over similar existing methods in terms of objective measures.
(This article belongs to the Collection Robotics and 3D Computer Vision)
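The core idea of progressively extending the baseline can be illustrated with off-the-shelf semi-global matching: the disparity from the shortest-baseline pair predicts, up to the baseline ratio, the disparity expected at a longer baseline, so the search for the longer pair can be confined to a narrow interval. The sketch below shows only this prediction step; the file names, baseline ratio, and search margin are assumptions, and the paper's rectification and interval optimization are not reproduced here.

```python
import cv2
import numpy as np

def sgbm(left, right, num_disp=128):
    matcher = cv2.StereoSGBM_create(minDisparity=0,
                                    numDisparities=num_disp,
                                    blockSize=5)
    # OpenCV returns fixed-point disparities scaled by 16.
    return matcher.compute(left, right).astype(np.float32) / 16.0

left, mid, _right = (cv2.imread(p, cv2.IMREAD_GRAYSCALE)
                     for p in ("cam1.png", "cam2.png", "cam3.png"))  # placeholders

d_short = sgbm(left, mid)                 # shortest baseline: safe to search widely
valid = d_short > 0                       # unmatched pixels come back negative
baseline_ratio = 2.0                      # long baseline / short baseline (assumed known)
d_pred = baseline_ratio * np.where(valid, d_short, np.nan)
margin = 8                                # pixels of slack around the prediction
# A full implementation would restrict long-baseline matching to
# [d_pred - margin, d_pred + margin] per pixel; here we only report its extent.
print("predicted long-baseline disparity range:",
      np.nanmin(d_pred) - margin, "to", np.nanmax(d_pred) + margin)
```

Bounding the search this way is what keeps the matching-error probability low even as the baseline, and hence the disparity resolution, grows.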
Figure 1. Optical setup of a multiocular vision system.
Figure 2. Block diagram of the proposed PSO-based method for multiocular image rectification.
Figure 3. Diagram of the proposed method for disparity estimation with an adjustable baseline.
Figure 4. Stereo image rectification results. (a) Unrectified test images. Rectified images obtained using: (b) Fusiello et al. [42], (c) Juarez-Salazar et al. [27], (d) DSR [41], and (e) the proposed method.
Figure 5. Constructed laboratory platform for experiments. (a) Frontal view of the multiocular camera. (b) Side view of the multiocular camera. (c) Test scene captured by the experimental multiocular platform.
Figure 6. Multiocular image rectification results from a real scene captured with the experimental platform shown in Figure 5. (a) Unrectified input images. Rectified images obtained using: (b) the method of Yang et al. [44] and (c) the proposed method.
Figure 7. Disparity estimation results for multiocular images obtained with the proposed approach and the method by Li et al. [15]. (a) Reference image of the input multiocular image set. (b) Ground truth disparity map of the reference image with the largest baseline. Estimated disparity maps obtained with the proposed method for: (c) cameras 1 and 5; (d) cameras 1, 3, and 5; (e) cameras 1, 2, 3, and 5; (f) all images. (g) Estimated disparity obtained with the method by Li et al. [15].
Figure 8. Three-dimensional reconstruction results obtained with the proposed approach in real scenes captured with the experimental platform shown in Figure 5. (a) Reference images of the captured scenes. (b) Estimated disparity map obtained with the proposed approach between cameras 1 and 4. (c–e) Different perspective views of the reconstructed three-dimensional scenes.
Figure 9. Reprojection errors obtained with the intrinsic parameters estimated using the calibration methods: (a) DLT, (b) distorted pinhole, and (c) Zhang's method.
26 pages, 13651 KiB  
Article
Dense In Situ Underwater 3D Reconstruction by Aggregation of Successive Partial Local Clouds
by Loïca Avanthey and Laurent Beaudoin
Remote Sens. 2024, 16(24), 4737; https://doi.org/10.3390/rs16244737 - 19 Dec 2024
Viewed by 557
Abstract
Assessing the completeness of an underwater 3D reconstruction on-site is crucial, as it allows acquisitions to be rescheduled to capture missing data during the mission, avoiding the additional costs of a subsequent mission. This assessment needs to rely on a dense point cloud, since a sparse cloud lacks detail and a triangulated model can hide gaps. The challenge is to generate a dense cloud with field-deployable tools. Traditional dense reconstruction methods can take several dozen hours on low-capacity systems like laptops or embedded units. To speed up this process, we propose building the dense cloud incrementally within an SfM framework while incorporating data redundancy management to eliminate recalculations and filter already-processed data. The method evaluates overlap area limits and computes depths by propagating the matching around SeaPoints—the keypoints we designed for identifying reliable areas regardless of the quality of the processed underwater images. This produces partial local dense clouds, which are aggregated into a common frame via the SfM pipeline to produce the global dense cloud. Compared to the production of complete dense local clouds, this approach reduces the computation time by about 70% while maintaining a comparable final density. The underlying prospect of this work is to enable real-time completeness estimation directly on board, allowing for the dynamic re-planning of the acquisition trajectory.
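The SeaPoint detector described in the figures below builds on a Harris response map, radius-based non-maximum suppression, and a threshold tuned so that the retained point count lands inside a target interval. The sketch that follows reproduces that outline with stock OpenCV/NumPy calls, but replaces the authors' recursive cumulative-histogram refinement with a direct sort of the responses; the radius, target interval, and Harris parameters are assumptions.

```python
import cv2
import numpy as np

def detect_reliable_points(gray, radius=5, target=(2500, 3000)):
    # Harris response map over the whole image.
    harris = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    # Non-maximum suppression: a pixel survives only if it is the maximum of
    # its (2*radius+1)^2 neighborhood.
    kernel = np.ones((2 * radius + 1, 2 * radius + 1), np.uint8)
    maxima = (harris == cv2.dilate(harris, kernel)) & (harris > 0)
    responses = np.sort(harris[maxima])[::-1]          # strongest first
    lo, hi = target
    keep = int(np.clip(len(responses), lo, hi))        # aim inside [lo, hi]
    threshold = responses[min(keep, len(responses)) - 1]
    ys, xs = np.nonzero(maxima & (harris >= threshold))
    return list(zip(xs.tolist(), ys.tolist()))

# Example usage on a single frame (file name is a placeholder).
points = detect_reliable_points(cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE))
print(len(points), "reliable points")
```

Keeping the point count inside a fixed interval is what makes the downstream propagation and overlap estimation behave consistently across images of very different quality.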
Figure 1. Overview of the global dense point clouds constructed from images of the four datasets: the Mermaid dataset (top left), the Lost Freediver Rock dataset (top right), the Flying Fortress dataset (bottom left), and the Landingship Wreck dataset (bottom right).
Figure 2. Workflow diagram of a standard incremental SfM framework, with options for incremental dense cloud generation and flexible application of loop closure detection and bundle adjustment based on criteria such as sparsity or exhaustiveness.
Figure 3. Diagram of the algorithm for generating a partial local dense point cloud from an image pair selected in the incremental flow to optimize spatial sampling and depth resolution. Its main steps include detecting reliable areas using SeaPoints, assessing the overlap rate and identifying the overlap area based on prior information, and performing dense matching by propagating matches in the vicinity of SeaPoints outside the overlap area to obtain a partial disparity map. The resulting dense points can then be reprojected into the 3D frame, as with sparse points within the SfM framework, to form a partial local dense cloud that is subsequently aligned with previous local clouds.
Figure 4. Diagram of the SeaPoint detector algorithm. To begin, we construct a map containing the Harris measurements for each pixel. Next, non-maximum suppression (NMS) is applied to the map to retain only the local maxima within a specified radius. The threshold to select the SeaPoints among these values is then determined through an analysis of the cumulative histogram of the map, aiming to achieve a given range of points. If the currently analyzed histogram bin lacks sufficient granularity (too many values in one bin), the range of values is expanded, generating a new histogram, and the analysis continues recursively until convergence is achieved. The target interval, which indicates the desired minimum and maximum number of points, must be sufficiently wide to ensure convergence. A target interval with a range of a few hundred points typically guarantees convergence across a wide range of image types. Usually we look for several thousand points on 10 MP images.
Figure 5. Example of the histogram during the SeaPoint detector process for the image of the Lost Freediver Rock dataset in Figure 6. (Left) the first histogram and (center) a zoom on this histogram. There are not enough points accumulated when arriving at bin 98 (2202 points) of the first histogram to be consistent with the minimum of the target interval (2500 points minimum), but taking bin 97 (3793 points) would exceed the maximum of the target interval (3000 points maximum). We therefore re-explode the contents of bin 97 into a new histogram by a recursive call (right). The algorithm finally converges on 2502 points at bin 186 of the second histogram. Here, the histograms were calculated on 256 bins.
Figure 6. SeaPoint detector examples with a target interval of [2500, 3000]. (Top left): 2502 SeaPoints found on an image of the Lost Freediver Rock dataset in two recursive rounds (threshold adjusted to 38.32% of the max value). (Top right): 2930 SeaPoints found on an image of the Flying Fortress dataset in one round (threshold adjusted to 90.20% of the max value). (Bottom left): 2500 SeaPoints found on an image of the Mermaid statue in two recursive rounds (threshold adjusted to 62.31% of the max value). (Bottom right): 2504 SeaPoints found on an image of the LandingShip Wreck dataset in one round (threshold adjusted to 47.45% of the max value).
Figure 7. In green and yellow: visualization of the intrapair matchings used to form the seeds for densifying the matching through propagation and generating the partial local dense clouds (4). In blue: visualization of the interpair matching used to evaluate the overlap rate for selecting the next pair (1), to estimate the relative pose for registering the new local cloud (2), and to automatically exclude the overlap area from the 3D reconstruction of the new local cloud (3).
Figure 8. Flowchart illustrating the local statistical filtering applied to SeaPoint directional vectors for distinguishing inliers from outliers: the consistency score of each vector is incremented for each neighboring vector with a similar norm and direction. Vectors with low consistency scores are classified as outliers and are removed, resulting in a refined, robust list of matched SeaPoints.
Figure 9. Directional vector flow is a representation of the matching within a single view. A local statistical filtering process, based on neighborhood coherence, is applied: neighboring vectors exhibiting similar norms and directions contribute to the assessment of the studied vector. The greater the number of votes, the more coherent the vector is deemed. The most locally coherent vectors are kept as inliers. In this image, the resulting inliers are represented in blue, while those identified as outliers are marked in red. The latter have a different direction and/or norm from their neighbors (or not enough neighbors to ensure this).
Figure 10. Identification of the overlapping area thanks to the establishment of an area of influence around the interpair SeaPoints. The blue circles indicate the influence areas around the interpair matches on a view V (left) and on its subsequent view V + 2 (right), delimiting the overlap area between the two views.
Figure 11. Mask on V + 2 given the areas of influence (see Figure 10) calculated between V and V + 2 using the interpair SeaPoints. The sum of all white pixels in the mask is used to estimate the overlap rate between V and V + 2 with regard to the total number of pixels in V + 2.
Figure 12. Diagram of the algorithm used to densify the matching by propagation around the seeds. In the first iteration, we analyze the neighborhood of a list of seeds (the initial seeds are all SeaPoints). After all the best possible matches have been selected, and if they are not too far from their initial seed (this distance can be approximated by the number of iterations, for example), they are added to a new list of seeds. This new list is studied in the next iteration, and so on, until no more seeds are added to the next list (the points did not match, are all already matched with the best score, or are too far from the initial SeaPoint).
Figure 13. Partial reconstruction: on the left, the disparity map of the first pair with the SeaPoints in blue, and on the right, the disparity map obtained for a normal propagation of the second pair (red + green) as well as the partial disparity map (green only) taking into account the exclusion of the overlap area.
Figure 14. Diagram of the algorithm that reconstructs a partial local cloud by propagating the matching around the seeds while automatically excluding the overlapping area (and areas without reliable information).
Figure 15. Illustration of two types of occlusion: on the left, intrapair occlusion areas for which local seeds (circled in red in the black areas) have not spread; on the right, an interpair occlusion area (circled in red) for which there is an absence of SeaPoints matched during the interpair matching (no blue circles).
Figure 16. Modified diagram of the algorithm that reconstructs a partial local cloud by propagating the matching around the seeds while automatically excluding the overlapping area, taking occlusion problems into account (compared to Figure 14, the changes are framed in red).
Figure 17. On the left, the disparity map obtained after a partial propagation excluding the overlap area entirely, and on the right, the disparity map obtained after a partial propagation taking into account intrapair and interpair occlusions.
Figure 18. From left to right: example results of intermatching ORB points, SIFT points, and SeaPoints, each using approximately 3000 keypoints in both images of the interpair (top row), along with their corresponding masks showing influence areas applied around the matches to segment the reliable regions of the overlap area (bottom row).
Figure 19. At the top, the two successive local clouds reconstructed classically (total reconstruction); in the center, the two successive local clouds, the second of which is partially reconstructed following our method. Below, the fusion of the two classic local clouds on the left and the fusion of the two partial local clouds on the right.
20 pages, 15268 KiB  
Article
Automatic Reading and Reporting Weather Information from Surface Fax Charts for Ships Sailing in Actual Northern Pacific and Atlantic Oceans
by Jun Jian, Yingxiang Zhang, Ke Xu and Peter J. Webster
J. Mar. Sci. Eng. 2024, 12(11), 2096; https://doi.org/10.3390/jmse12112096 - 19 Nov 2024
Viewed by 720
Abstract
This study aims to improve the intelligence, efficiency, and accuracy of ship safety and security systems by contributing to the development of marine weather forecasting. Accurate and prompt recognition of weather fax charts is very important for navigation safety. This study employed several artificial intelligence (AI) methods, including a vectorization approach and a target recognition algorithm, to automatically detect severe weather information from Japanese and US weather charts. This enabled the expansion of an existing auto-response marine forecasting system's applications toward the North Pacific and Atlantic Oceans, thus enhancing decision-making capabilities and response measures for ships sailing in actual seas. OpenCV image processing and the YOLOv5s/YOLOv8n algorithms were utilized to perform template matching and to locate warning symbols and weather reports on surface weather charts. After these improvements, the average accuracy of the model increased from 0.920 to 0.928, and the detection time for a single image was as low as 1.2 ms. Additionally, OCR technology was applied to extract text from the weather reports and to highlight the marine areas where dense fog and strong wind conditions are likely to occur. Finally, field tests confirmed that this automatic, intelligent system can assist the navigator within 2–3 min and thus greatly enhance navigation safety in specific areas along sailing routes with minor text-based communication costs.
(This article belongs to the Special Issue Ship Performance in Actual Seas)
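One step the abstract mentions, pulling the written weather report off the chart with OCR, can be sketched with OpenCV and pytesseract as below. The crop coordinates and the Otsu binarization are illustrative assumptions; the paper's symbol detection itself relies on template matching and the YOLO models rather than OCR.

```python
import cv2
import pytesseract

# Crop the region of the fax chart assumed to hold the written weather report,
# binarize it, and run OCR on the crop.
chart = cv2.imread("surface_chart.png", cv2.IMREAD_GRAYSCALE)   # placeholder file
report_box = chart[40:220, 900:1400]                             # assumed report location
_, binary = cv2.threshold(report_box, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)
text = pytesseract.image_to_string(binary)
print(text)
```

In the full system, the recognized text and detected warning symbols feed the auto-response messages sent to ships in the affected areas.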
Figure 1. (a) Weather report and warning symbols in a JMA surface weather fax chart retrieved from imocwx.com (accessed on 14 February 2022). (b) Warning symbols and wind barbs in a 48 h surface forecast chart issued by the US National Weather Service at 1758 UTC on 13 February 2022.
Figure 2. JMA (a) original surface fax weather chart, (b) averaged base chart, (c) after binarization, (d) difference between the original (a) and the base chart (c), resulting in a pure weather map.
Figure 3. Flow chart of the auto-warning system for JMA charts.
Figure 4. YOLOv5s-CBAM(SE) network structure diagram; the parts related to CBAM(SE) are marked in gray.
Figure 5. YOLOv8n model structure diagram.
Figure 6. Comparison of weather briefing text recognition results.
Figure 7. Recognition of the warning symbols "hPa", "GW", "SW", and "FOG[W]" from the chart in Figure 2d.
Figure 8. Comparison of detection results of wind barbs (interception).
Figure 9. Training process visualization: (a) training loss and (b) mAP values for the original and improved YOLOv5s.
Figure 10. (a) JMA charts with warning symbols detected; (b) JMA charts with warning areas colored, red and yellow for wind speeds greater than 50 kts and 35–49 kts, green for visibility < 0.3 nm; (c) US charts with wind levels colored.
Figure 11. Field tests of the auto-warning system: US case (upper and middle) and JMA case (bottom).
24 pages, 14942 KiB  
Article
The Ground-Penetrating Radar Image Matching Method Based on Central Dense Structure Context Features
by Jie Xu, Qifeng Lai, Dongyan Wei, Xinchun Ji, Ge Shen and Hong Yuan
Remote Sens. 2024, 16(22), 4291; https://doi.org/10.3390/rs16224291 - 18 Nov 2024
Cited by 1 | Viewed by 763
Abstract
Subsurface structural distribution can be detected using Ground-Penetrating Radar (GPR). The distribution can be considered as road fingerprints for vehicle positioning. Similar to the principle of visual image matching for localization, the position coordinates of the vehicle can be calculated by matching real-time GPR images with pre-constructed reference GPR images. However, GPR images, due to their low resolution, cannot extract well-defined geometric features such as corners and lines. Thus, traditional visual image processing algorithms perform inadequately when applied to GPR image matching. To address this issue, this paper innovatively proposes a GPR image matching and localization method based on a novel feature descriptor, termed as central dense structure context (CDSC) features. The algorithm utilizes the strip-like elements in GPR images to improve the accuracy of GPR image matching. First, a CDSC feature descriptor is designed. By applying threshold segmentation and extremum point extraction to the GPR image, stratified strip-like elements and pseudo-corner points are obtained. The pseudo-corner points are treated as the centers, and the surrounding strip-like elements are described in context to form the GPR feature descriptors. Then, based on the feature description method, feature descriptors for both the real-time image and the reference image are calculated separately. By searching for the nearest matching point pairs and removing erroneous pairs, GPR image matching and localization are achieved. The proposed algorithm was evaluated on datasets collected from urban roads and railway tracks, achieving localization errors of 0.06 m (RMSE) and 1.22 m (RMSE), respectively. Compared to the traditional Speeded Up Robust Features (SURF) visual image matching algorithm, localization errors were reduced by 86.6% and 95.7% in urban road and railway track scenarios, respectively.
(This article belongs to the Special Issue Advanced Ground-Penetrating Radar (GPR) Technologies and Applications)
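The CDSC descriptor builds a context of the strip-like elements around a pseudo-corner point, which is conceptually close to the classical shape-context idea of a log-polar histogram around a center. The sketch below implements that generic log-polar histogram; the bin counts and radius are assumptions, and it is not the paper's exact CDSC construction.

```python
import numpy as np

def log_polar_descriptor(center, points, n_r=5, n_theta=12, r_max=100.0):
    """center: (x, y); points: (N, 2) coordinates of surrounding elements."""
    d = np.asarray(points, dtype=float) - np.asarray(center, dtype=float)
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.arctan2(d[:, 1], d[:, 0])            # in [-pi, pi]
    keep = (r > 1e-6) & (r < r_max)
    # Log-radial bins and uniform angular bins, as in classic shape context.
    r_bin = np.clip((np.log(r[keep]) / np.log(r_max) * n_r).astype(int), 0, n_r - 1)
    t_bin = ((theta[keep] + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
    hist = np.zeros((n_r, n_theta))
    np.add.at(hist, (r_bin, t_bin), 1)
    return (hist / max(hist.sum(), 1)).ravel()      # normalized descriptor
```

Descriptors computed this way for the real-time and reference images can be compared with a nearest-neighbour search, after which erroneous pairs are rejected before estimating the vehicle position.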
Graphical abstract
Figure 1. Vehicle-borne GPR detects underground information.
Figure 2. Three data types of ground-penetrating radar. (a) A-scan: the single-channel data signal acquired by GPR. (b) B-scan: data acquired by the GPR antenna through continuous scanning in the direction of movement. (c) C-scan: data composed of multiple B-scans.
Figure 3. Stripe and spot features in GPR images.
Figure 4. Data collected from the same road segment at different times and under different weather conditions using GPR. Subfigures (a,b,e,f) were collected on sunny days, and subfigures (c,d) were collected on rainy days.
Figure 5. Algorithm flow.
Figure 6. Comparison of feature point extraction results. (a) The SIFT algorithm used to extract feature points in GPR images. (b) The ORB algorithm used to extract feature points in GPR images. (c) The SURF algorithm used to extract feature points in GPR images.
Figure 7. Shape context algorithm feature extraction.
Figure 8. The steps of CDSC feature extraction. Descriptors are obtained by leveraging the stripes in the binarized image surrounding the pseudo-corner points. Subfigure (a) is the filtered image, subfigure (b) is the binarized image, and subfigure (c) is the central dense structure context feature.
Figure 9. CDSC features extracted from GPR images collected twice at the same location and compared. The nine feature points on each image on the left correspond to the nine features on the right.
Figure 10. Image matching schematic diagram.
Figure 11. Comparison of GPR images before and after preprocessing: (a) the original output image, (b) the filtered image.
Figure 12. Urban test trajectory and equipment setup. (a) Urban test trajectory; the yellow line represents the test trajectory. (b) The equipment setup for the road experiment.
Figure 13. Train test trajectory and equipment setup. (a) Train test trajectory, with a total length of approximately 6.7 km; the yellow line represents the test trajectory. (b) The equipment setup for the railway experiment.
Figure 14. Railway test trajectory positioning error.
Figure 15. Comparison of matching results of different methods in the railway test trajectory. (a) Matching GPR images with strong interference; (b) matching GPR images without interference.
Figure 16. Railway test trajectory positioning error CDF.
Figure 17. Urban test trajectory positioning error.
Figure 18. Comparison of matching results of different methods in the urban test trajectory. (a) Matching GPR images with strong interference; (b) matching GPR images without interference.
Figure 19. Urban test trajectory positioning error CDF.
18 pages, 2990 KiB  
Article
A GGCM-E Based Semantic Filter and Its Application in VSLAM Systems
by Yuanjie Li, Chunyan Shao and Jiaming Wang
Electronics 2024, 13(22), 4487; https://doi.org/10.3390/electronics13224487 - 15 Nov 2024
Viewed by 481
Abstract
Image matching-based visual simultaneous localization and mapping (vSLAM) extracts low-level pixel features to reconstruct camera trajectories and maps through the epipolar geometry method. However, it fails to achieve correct trajectories and mapping when there are low-quality feature correspondences in several challenging environments. Although the RANSAC-based framework can produce better results, it is computationally inefficient and unstable in the presence of a large number of outliers. In our previous work, a Faster R-CNN learning-based semantic filter was proposed to explore the semantic information of inliers and remove low-quality correspondences, helping vSLAM localize accurately. However, that semantic filter learning method generalizes with low precision to low-level and dense texture-rich scenes, leaving the semantic filter-based vSLAM unstable and with poor geometry estimation. In this paper, a GGCM-E-based semantic filter using YOLOv8 is proposed to address these problems. Firstly, semantic patches of images are collected from the KITTI dataset, the TUM dataset provided by the Technical University of Munich, and real outdoor scenes. Secondly, the semantic patches are classified by our proposed GGCM-E descriptors to obtain the YOLOv8 neural network training dataset. Finally, several semantic filters for filtering low-level and dense texture-rich scenes are generated and combined into the ORB-SLAM3 system. Extensive experiments show that the semantic filter can detect and classify semantic levels of different scenes effectively, filtering low-level semantic scenes to improve the quality of correspondences and thus achieving accurate and robust trajectory reconstruction and mapping. For the challenging autonomous driving benchmark and real environments, the vSLAM system with the GGCM-E-based semantic filter demonstrates its superiority in reducing the 3D position error, with the absolute trajectory error reduced by up to approximately 17.44%, showing its promise and good generalization.
(This article belongs to the Special Issue Application of Artificial Intelligence in Robotics)
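In spirit, the semantic filter removes feature correspondences that fall in image regions a detector flags as unreliable. The sketch below applies that idea with stock components: ORB keypoints and an ultralytics YOLOv8 detector whose weight file, class semantics, and rejection rule are all hypothetical placeholders (the paper trains its own GGCM-E-labelled detector and integrates the filter into ORB-SLAM3).

```python
import cv2
from ultralytics import YOLO

# Hypothetical detector trained to flag low-level / repetitively textured regions.
model = YOLO("semantic_filter.pt")
orb = cv2.ORB_create(nfeatures=2000)

frame = cv2.imread("frame.png")                     # placeholder image
keypoints, descriptors = orb.detectAndCompute(frame, None)

# Detected boxes marking regions whose keypoints should be discarded.
boxes = model(frame)[0].boxes.xyxy.cpu().numpy()    # (N, 4) as x1, y1, x2, y2

def inside_any(pt, boxes):
    x, y = pt
    return any(x1 <= x <= x2 and y1 <= y <= y2 for x1, y1, x2, y2 in boxes)

kept = [(kp, desc) for kp, desc in zip(keypoints, descriptors)
        if not inside_any(kp.pt, boxes)]
print(f"kept {len(kept)} of {len(keypoints)} keypoints")
```

Only the surviving keypoints are handed to the matcher and pose estimator, which is what improves correspondence quality in texture-poor or repetitively textured scenes.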
Figure 1. ORB-SLAM3 framework with the proposed semantic filter module.
Figure 2. Framework of the proposed semantic filter approach.
Figure 3. Computation of GGCM-E features.
Figure 4. Semantic filtering on the KITTI frame.
Figure 5. Semantic filtering on our captured outdoor frame.
Figure 6. The trajectory of KITTI07 with respect to the ground truth using the GGCM-E semantic filter.
Figure 7. Comparison of trajectories between the proposed method and ground truth in the KITTI dataset.
Figure 8. Comparison of APEs with respect to ground truth of ORB-SLAM3 and the semantic filter.
Figure 9. Dense texture-rich sequences in the TUM dataset (DTR sequences).
Figure 10. Comparison of camera trajectories in DTR sequences.
Figure 11. Comparison of the trajectory with respect to the ground truth of DynaSLAM and GGCM-E+DynaSLAM on KITTI00 sequences.
Figure 12. Comparison of the APEs of semantic filter-based Structure-SLAM, LDSO and DynaSLAM on KITTI07 sequences.
21 pages, 12827 KiB  
Article
Research on the Registration of Aerial Images of Cyclobalanopsis Natural Forest Based on Optimized Fast Sample Consensus Point Matching with SIFT Features
by Peng Wu, Hailong Liu, Xiaomei Yi, Lufeng Mo, Guoying Wang and Shuai Ma
Forests 2024, 15(11), 1908; https://doi.org/10.3390/f15111908 - 29 Oct 2024
Viewed by 866
Abstract
The effective management and conservation of forest resources hinge on accurate monitoring. Nonetheless, individual remote-sensing images captured by low-altitude unmanned aerial vehicles (UAVs) fail to encapsulate the entirety of a forest's characteristics. The application of image-stitching technology to high-resolution drone imagery facilitates a prompt evaluation of forest resources, encompassing quantity, quality, and spatial distribution. This study introduces an improved SIFT algorithm designed to tackle the challenges of low matching rates and prolonged registration times encountered with forest images characterized by dense textures. By implementing the SIFT-OCT (SIFT omitting the initial scale space) approach, the algorithm bypasses the initial scale space, thereby reducing the number of ineffective feature points and augmenting processing efficiency. To bolster the SIFT algorithm's resilience against rotation and illumination variations, and to furnish supplementary information for registration even when fewer valid feature points are available, a gradient location and orientation histogram (GLOH) descriptor is integrated. For feature matching, the more computationally efficient Manhattan distance is utilized to filter feature points, which further optimizes efficiency. The fast sample consensus (FSC) algorithm is then applied to remove mismatched point pairs, thus refining registration accuracy. This research also investigates the influence of vegetation coverage and image overlap rates on the algorithm's efficacy, using five sets of Cyclobalanopsis natural forest images. Experimental outcomes reveal that the proposed method significantly reduces registration time by an average of 3.66 times compared to that of SIFT, 1.71 times compared to that of SIFT-OCT, 5.67 times compared to that of PSO-SIFT, and 3.42 times compared to that of KAZE, demonstrating its superior performance.
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)
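For orientation, the snippet below strings together off-the-shelf counterparts of the matching chain the abstract describes: SIFT keypoints, brute-force matching under the Manhattan (L1) distance, Lowe's ratio test, and RANSAC-based homography fitting standing in for the fast sample consensus (FSC) step. The thresholds are assumptions, and the paper's own refinements (skipping the first scale-space octave and using GLOH descriptors) are not reproduced.

```python
import cv2
import numpy as np

ref = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)      # placeholder files
mov = cv2.imread("to_register.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(ref, None)
kp2, des2 = sift.detectAndCompute(mov, None)

matcher = cv2.BFMatcher(cv2.NORM_L1)                # Manhattan distance
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]          # Lowe ratio test

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)   # stand-in for FSC
print(f"{int(inliers.sum())} inlier matches of {len(good)}")
```

The resulting homography `H` is what warps the image to be registered onto the reference frame before stitching.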
Figure 1. Study area map.
Figure 2. (a1) Dense forest reference image; (a2) dense forest image to be registered; (b1) non-dense forest reference image; and (b2) non-dense forest image to be registered.
Figure 3. (a1) Forest reference image with an overlap rate of 60% to 90%; (a2) forest image to be registered with an overlap rate of 60% to 90%; (b1) forest reference image with an overlap rate of 30% to 60%; (b2) forest image to be registered with an overlap rate of less than 30%; (c1) dense forest reference image; and (c2) dense forest image to be registered.
Figure 4. Structure diagram of the optimized SIFT algorithm based on FSC feature screening.
Figure 5. Scale-space extreme value detection.
Figure 6. GLOH descriptor feature vectors.
Figure 7. Sampling example.
Figure 8. (a1) Original dense forest reference image; (a2) original dense forest image to be registered; (b1) matching point pairs using the SIFT algorithm; (b2) stitched image using the SIFT algorithm; (c1) matching point pairs using the SIFT-OCT algorithm; (c2) stitched image using the SIFT-OCT algorithm; (d1) matching point pairs using the PSO-SIFT algorithm; (d2) stitched image using the PSO-SIFT algorithm; (e1) matching point pairs using the KAZE algorithm; (e2) stitched image using the KAZE algorithm; (f1) matching point pairs using the optimized algorithm; and (f2) stitched image using the optimized algorithm.
Figure 9. (a1) Original non-dense forest reference image; (a2) original non-dense forest image to be registered; (b1) matching point pairs using the SIFT algorithm; (b2) stitched image using the SIFT algorithm; (c1) matching point pairs using the SIFT-OCT algorithm; (c2) stitched image using the SIFT-OCT algorithm; (d1) matching point pairs using the PSO-SIFT algorithm; (d2) stitched image using the PSO-SIFT algorithm; (e1) matching point pairs using the KAZE algorithm; (e2) stitched image using the KAZE algorithm; (f1) matching point pairs using the optimized algorithm; and (f2) stitched image using the optimized algorithm.
Figure 10. (a1) Original forest reference image with an overlap rate of 60% to 90%; (a2) original forest image to be registered with an overlap rate of 60% to 90%; (b1) matching point pairs using the SIFT algorithm; (b2) stitched image using the SIFT algorithm; (c1) matching point pairs using the SIFT-OCT algorithm; (c2) stitched image using the SIFT-OCT algorithm; (d1) matching point pairs using the PSO-SIFT algorithm; (d2) stitched image using the PSO-SIFT algorithm; (e1) matching point pairs using the KAZE algorithm; (e2) stitched image using the KAZE algorithm; (f1) matching point pairs using the optimized algorithm; and (f2) stitched image using the optimized algorithm.
Figure 11. (a1) Original forest reference image with an overlap rate of 30% to 60%; (a2) original forest image to be registered with an overlap rate of 30% to 60%; (b1) matching point pairs using the SIFT algorithm; (b2) stitched image using the SIFT algorithm; (c1) matching point pairs using the SIFT-OCT algorithm; (c2) stitched image using the SIFT-OCT algorithm; (d1) matching point pairs using the PSO-SIFT algorithm; (d2) stitched image using the PSO-SIFT algorithm; (e1) matching point pairs using the KAZE algorithm; (e2) stitched image using the KAZE algorithm; (f1) matching point pairs using the optimized algorithm; and (f2) stitched image using the optimized algorithm.
Figure 12. (a1) Original forest reference image with an overlap rate below 30%; (a2) original forest image to be registered with an overlap rate below 30%; (b1) matching point pairs using the SIFT algorithm; (b2) stitched image using the SIFT algorithm; (c1) matching point pairs using the SIFT-OCT algorithm; (c2) stitched image using the SIFT-OCT algorithm; (d1) matching point pairs using the PSO-SIFT algorithm; (d2) stitched image using the PSO-SIFT algorithm; (e1) matching point pairs using the KAZE algorithm; (e2) stitched image using the KAZE algorithm; (f1) matching point pairs using the optimized algorithm; and (f2) stitched image using the optimized algorithm.
Figure 13. Comparison of matching accuracy of the five algorithms on the five dataset images.
Figure 14. Registration comparison of the five algorithms on the five dataset images.
Figure 15. (a) Reference image; (b) image to be registered; (c) initially screened matched point pairs; and (d) final matched point pairs.
Figure 16. (a1) Reference image feature analysis; and (a2) feature analysis of the image to be registered.
22 pages, 10007 KiB  
Article
Deep Learning-Based Emergency Rescue Positioning Technology Using Matching-Map Images
by Juil Jeon, Myungin Ji, Jungho Lee, Kyeong-Soo Han and Youngsu Cho
Remote Sens. 2024, 16(21), 4014; https://doi.org/10.3390/rs16214014 - 29 Oct 2024
Cited by 1 | Viewed by 725
Abstract
Smartphone-based location estimation technology is becoming increasingly important across various fields. Accurate location estimation plays a critical role in life-saving efforts during emergency rescue situations, where rapid response is essential. Traditional methods such as GPS often face limitations indoors or in densely built environments, where signals may be obstructed or reflected, leading to inaccuracies. Similarly, fingerprinting-based methods rely heavily on existing infrastructure and exhibit signal variability, making them less reliable in dynamic, real-world conditions. In this study, we analyzed the strengths and weaknesses of different types of wireless signal data and proposed a new deep learning-based method for location estimation that comprehensively integrates these data sources. The core of our research is the introduction of a ‘matching-map image’ conversion technique that efficiently integrates LTE, WiFi, and BLE signals. The generated matching-map images were fed into a deep learning model, enabling highly accurate and stable location estimates even in challenging emergency rescue situations. In real-world experiments, our method, utilizing multi-source data, achieved a positioning success rate of 85.27%, which meets the US FCC’s E911 standards for location accuracy and reliability across various conditions and environments. This makes the proposed approach particularly well suited for emergency applications, where both accuracy and speed are critical. Full article
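The matching-map conversion can be pictured with a small sketch: an observed LTE/WiFi/BLE scan is compared against a pre-built signal database and the per-cell similarities are rasterized into a three-channel grid image that a CNN classifies into a grid-cell label. The grid size, similarity measure, and names such as `build_matching_map` are assumptions for illustration and do not reproduce the authors' exact procedure.

```python
# Illustrative sketch (not the authors' exact procedure): rasterize the similarity
# between an observed radio scan and a pre-built signal database into a
# "matching-map" image with one channel per signal type (LTE, WiFi, BLE).
import numpy as np

def cell_similarity(observed, reference):
    """Mean absolute RSSI difference over shared transmitters, mapped to [0, 1]."""
    shared = set(observed) & set(reference)
    if not shared:
        return 0.0
    mad = np.mean([abs(observed[t] - reference[t]) for t in shared])
    return float(max(0.0, 1.0 - mad / 40.0))  # 40 dB spread assumed as normalizer

def build_matching_map(scan, base_db, grid_shape=(32, 32)):
    """scan: {'lte': {...}, 'wifi': {...}, 'ble': {...}};
    base_db: {(row, col): {'lte': {...}, 'wifi': {...}, 'ble': {...}}}."""
    img = np.zeros((*grid_shape, 3), dtype=np.float32)
    for (r, c), ref in base_db.items():
        for ch, kind in enumerate(('lte', 'wifi', 'ble')):
            img[r, c, ch] = cell_similarity(scan.get(kind, {}), ref.get(kind, {}))
    return img  # fed to a CNN classifier over grid-cell labels such as '8-4'
```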
Show Figures

Figure 1

Figure 1
<p>Collecting device for Vehicle: (<b>a</b>) 3D model; (<b>b</b>) Attached to the dashboard; (<b>c</b>) Attached to the bicycle.</p>
Full article ">Figure 2
<p>Data collection area and routes: (<b>a</b>) Seocho1-dong (urban); (<b>b</b>) Seocho2-dong (urban); (<b>c</b>) Naegok-dong (suburban); (<b>d</b>) Yeomgok-dong (suburban).</p>
Full article ">Figure 3
<p>Data collection route: Yeomgok-dong.</p>
Full article ">Figure 4
<p>Base data Grid Sample.</p>
Full article ">Figure 5
<p>LTE Matching-map Generation Process.</p>
Full article ">Figure 6
<p>Matching-map Image Generation Process.</p>
Full article ">Figure 7
<p>Matching-map image sample (Yeomgok-dong): (<b>a</b>) label ‘8-4’; (<b>b</b>) label ‘13-15’.</p>
Full article ">Figure 8
<p>Block Diagram of Deep Learning Base Positioning Process.</p>
Full article ">Figure 9
<p>Positioning Test Location: (<b>a</b>) Seocho1-dong; (<b>b</b>) Seocho2-dong; (<b>c</b>) Naegok-dong; (<b>d</b>) Yeomgok-dong.</p>
Full article ">Figure 10
<p>Positioning Test Location Photograph Samples: (<b>a</b>) Test Point 2; (<b>b</b>) Indoor Test Position for Test Point 2; (<b>c</b>) Test Point 20; (<b>d</b>) Indoor Test Position for Test Point 20.</p>
Full article ">Figure 11
<p>CDF graph of results according to positioning method.</p>
Full article ">Figure 12
<p>CDF graph of results according to regional characteristics (Fingerprint).</p>
Full article ">Figure 13
<p>CDF graph of results according to regional characteristics (Matching-map image).</p>
Full article ">Figure 14
<p>CDF graph of results according to data type.</p>
Full article ">Figure 15
<p>CDF graph of results according to the number of data.</p>
Full article ">
17 pages, 3301 KiB  
Article
Stereo and LiDAR Loosely Coupled SLAM Constrained Ground Detection
by Tian Sun, Lei Cheng, Ting Zhang, Xiaoping Yuan, Yanzheng Zhao and Yong Liu
Sensors 2024, 24(21), 6828; https://doi.org/10.3390/s24216828 - 24 Oct 2024
Viewed by 928
Abstract
In many robotic applications, creating a map is crucial, and 3D maps provide a means of estimating the positions of other objects or obstacles. Most previous research processes 3D point clouds through projection-based or voxel-based models, but both approaches have certain limitations. This paper proposes a hybrid localization and mapping method using stereo vision and LiDAR. Unlike traditional single-sensor systems, we construct a pose optimization model by matching ground information between LiDAR maps and visual images. We use stereo vision to extract ground information and fuse it with LiDAR tensor voting data to establish coplanarity constraints. Pose optimization is achieved through a graph-based optimization algorithm and a local window optimization method. The proposed method is evaluated on the KITTI dataset and compared against the ORB-SLAM3, F-LOAM, LOAM, and LeGO-LOAM methods. Additionally, we generate 3D point cloud maps for the corresponding sequences and high-definition point cloud maps of the streets in sequence 00. The experimental results demonstrate significant improvements in trajectory accuracy and robustness, enabling the construction of clear, dense 3D maps. Full article
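The ground-coplanarity idea can be sketched as a point-to-plane residual minimized over the camera pose. The paper itself uses a graph-based optimizer with local-window optimization; the SciPy least-squares formulation below, and names such as `refine_pose`, are simplified illustrative assumptions.

```python
# Minimal sketch of a ground-coplanarity constraint for pose refinement, assuming
# SciPy; the paper uses a graph-based optimizer with local-window optimization,
# so this dense least-squares form is only illustrative.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def coplanarity_residuals(pose, ground_pts, plane_n, plane_d):
    """pose = [rx, ry, rz, tx, ty, tz]; ground_pts: (N, 3) LiDAR ground points;
    (plane_n, plane_d): ground plane estimated from the stereo reconstruction."""
    R = Rotation.from_rotvec(pose[:3]).as_matrix()
    t = pose[3:]
    transformed = ground_pts @ R.T + t
    # Signed point-to-plane distances; zero when the LiDAR ground points lie
    # exactly on the visually estimated ground plane.
    return transformed @ plane_n + plane_d

def refine_pose(init_pose, ground_pts, plane_n, plane_d):
    sol = least_squares(coplanarity_residuals, init_pose,
                        args=(ground_pts, plane_n, plane_d), loss='huber')
    return sol.x
```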
(This article belongs to the Section Navigation and Positioning)
Show Figures

Figure 1

Figure 1
<p>Pose optimization based on ground information. <span class="html-italic">T</span>, <span class="html-italic">p</span>, and <span class="html-italic">q</span> represent the transformation matrix, points on the plane, and points off the plane, respectively.</p>
Full article ">Figure 2
<p>The stereo sensor model and the coordinate systems used [<a href="#B34-sensors-24-06828" class="html-bibr">34</a>].</p>
Full article ">Figure 3
<p>Region of interest extraction. (<b>a</b>) Left image. (<b>b</b>) Right image. (<b>c</b>) Disparity image. (<b>d</b>) v-disparity. (<b>e</b>) u-disparity. (<b>d</b>,<b>e</b>) are derived from (<b>c</b>). (<b>f</b>) Large obstacles eliminated by removing the peak values from (<b>e</b>). (<b>g</b>) v-disparity based on (<b>f</b>); the red line is the disparity profile of the ground plane. (<b>h</b>) Detected ground plane and region of interest (RoI); RoI in red box. (<b>i</b>) City 3D reconstruction; green represents the ground.</p>
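The v-disparity construction referenced in this figure can be sketched as follows; the vote threshold and the least-squares line fit stand in for the paper's profile-extraction step and are assumptions for illustration.

```python
# Sketch of the v-disparity idea: accumulate each image row's disparity values
# into a histogram; the ground plane then appears as a sloped line, fit here
# with a simple least-squares stand-in for the paper's profile extraction.
import numpy as np

def v_disparity(disparity, max_disp=128):
    """disparity: (H, W) integer disparity map; returns (H, max_disp) histogram."""
    h, _ = disparity.shape
    vmap = np.zeros((h, max_disp), dtype=np.int32)
    for v in range(h):
        row = disparity[v]
        valid = (row > 0) & (row < max_disp)
        np.add.at(vmap[v], row[valid].astype(int), 1)
    return vmap

def fit_ground_profile(vmap, min_votes=20):
    """Fit d = a*v + b through the strongest disparity bins of the rows."""
    rows, bins = np.nonzero(vmap >= min_votes)
    # Keep the largest disparity per row (nearest structure, typically the road).
    best = {}
    for v, d in zip(rows, bins):
        best[v] = max(best.get(v, 0), d)
    v = np.array(list(best.keys()), dtype=float)
    d = np.array(list(best.values()), dtype=float)
    a, b = np.polyfit(v, d, 1)
    return a, b
```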
Full article ">Figure 4
<p>Graph-structure optimization. <span class="html-italic">P</span> represents the nodes of visual points, and <span class="html-italic">X</span> represents the pose of the frame. “Ground” denotes the ground information extracted from the 3D reconstruction.</p>
Full article ">Figure 5
<p>Trajectory estimates in the KITTI dataset. (<b>a</b>) 00. (<b>b</b>) 01. (<b>c</b>) 05. (<b>d</b>) 07. (<b>e</b>) 08. (<b>f</b>) 09.</p>
Full article ">Figure 6
<p>High-definition display of point clouds for some streets in the 00 sequence. (<b>a</b>) 00. (<b>b</b>) 01. (<b>c</b>) 05. (<b>d</b>) 07. (<b>e</b>) 08. (<b>f</b>) 09.</p>
Full article ">Figure 7
<p>3D reconstruction based on road constraints, where green represents the road. (<b>a</b>) 00. (<b>b</b>) 01. (<b>c</b>) 05. (<b>d</b>) 07. (<b>e</b>) 08. (<b>f</b>) 09.</p>
Figure 7">
Full article ">Figure 8
<p>High-definition display of point clouds for some streets in the 00 sequence. The image in the top left corner is a 3D reconstruction of the entire city, and the other images depict details of its streets (<b>a</b>–<b>e</b>).</p>
Full article ">
24 pages, 14015 KiB  
Article
CDP-MVS: Forest Multi-View Reconstruction with Enhanced Confidence-Guided Dynamic Domain Propagation
by Zitian Liu, Zhao Chen, Xiaoli Zhang and Shihan Cheng
Remote Sens. 2024, 16(20), 3845; https://doi.org/10.3390/rs16203845 - 16 Oct 2024
Viewed by 981
Abstract
Using multi-view images of forest plots to reconstruct dense point clouds and extract individual tree parameters enables rapid, high-precision, and cost-effective forest plot surveys. However, images captured at close range pose challenges for forest reconstruction, such as unclear canopy reconstruction, prolonged reconstruction times, insufficient accuracy, and tree duplication. To address these challenges, this paper introduces a new image dataset creation process that enhances both the efficiency and quality of image acquisition. Additionally, a block-matching-based multi-view reconstruction algorithm, Forest Multi-View Reconstruction with Enhanced Confidence-Guided Dynamic Domain Propagation (CDP-MVS), is proposed. The CDP-MVS algorithm addresses the mixing of canopy and sky in reconstructed point clouds by segmenting the sky in the depth maps and setting its depth value to zero. Furthermore, the algorithm introduces a confidence calculation method that comprehensively evaluates multiple aspects. Moreover, CDP-MVS employs a decentralized dynamic domain propagation sampling strategy, guiding the propagation of the dynamic domain through newly defined confidence measures. Finally, this paper compares the reconstruction results and individual tree parameters of the CDP-MVS, ACMMP, and PatchMatchNet algorithms using self-collected data. Visualization results show that, compared to the other two algorithms, CDP-MVS produces the least sky noise in tree reconstructions, with the clearest and most detailed canopy branches and trunk sections. In terms of parameter metrics, CDP-MVS achieved 100% accuracy in reconstructing tree counts across the four plots, effectively avoiding tree duplication. The accuracy of the diameter at breast height values extracted from the point clouds reconstructed by CDP-MVS reached 96.27%, 90%, 90.64%, and 93.62% in the four sample plots, respectively. Compared with ACMMP, the positional deviation of the reconstructed trees was reduced by 0.37 m, 0.07 m, 0.18 m, and 0.33 m, with the average distance deviation across the four plots remaining within 0.25 m. In terms of reconstruction efficiency, CDP-MVS completed the reconstruction of the four plots in 1.8 to 3.1 h, reducing the average reconstruction time per plot by six minutes compared with ACMMP, and it was two to three times faster than PatchMatchNet. Finally, the differences in tree height accuracy among the point clouds reconstructed by the different algorithms were minimal. The experimental results demonstrate that CDP-MVS, as a multi-view reconstruction algorithm tailored for forest reconstruction, shows promising application potential and can provide valuable support for forestry surveys. Full article
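The sky-handling step can be illustrated with a short sketch: pixels covered by a binary sky mask have their depth set to zero so they are excluded from the dense point cloud. The brightness-threshold segmentation used here is only a placeholder for the segmentation actually employed by CDP-MVS, and the function name is hypothetical.

```python
# Illustrative sketch of the sky-handling step: zero the depth of pixels that a
# binary sky mask marks as sky so they are excluded from the dense point cloud.
# The thresholding used here is a placeholder for the paper's segmentation step.
import numpy as np

def mask_sky_depth(depth, gray, brightness_thresh=0.85):
    """depth: (H, W) depth map; gray: (H, W) image normalized to [0, 1]."""
    sky = gray > brightness_thresh          # crude stand-in for sky segmentation
    cleaned = depth.copy()
    cleaned[sky] = 0.0                      # zero depth => ignored during fusion
    return cleaned
```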
Show Figures

Figure 1

Figure 1
<p>Overview of the study area. (<b>A</b>) Dongsheng Bajia Country Park—poplar; (<b>B</b>) Jiufeng—pine; (<b>C</b>) Olympic Forest Park—elm; (<b>D</b>) Olympic Forest Park—ginkgo.</p>
Full article ">Figure 2
<p>Comparison of camera position trajectories generated by Colmap. (<b>A</b>) Filming method with two circular paths. (<b>B</b>) Filming method with a single circular path around the forest plot.</p>
Full article ">Figure 3
<p>Technical framework.</p>
Full article ">Figure 4
<p>Comparison of sparse reconstruction point clouds under two filming methods. (<b>A</b>) Single circular path around the forest plot. (<b>B</b>) Two circular paths inside and outside the forest plot.</p>
Full article ">Figure 5
<p>Adaptive checkerboard propagation scheme of ACMMP. (Each V-shaped region contains 7 sampling pixels, and each strip region contains 11 sampling pixels. In the figure, circles represent pixels. The black solid circle indicates the pixel to be estimated. The yellow circle represents the sampling point. During each propagation, the depth value of the red pixel is updated by the black pixel, and vice versa.).</p>
Full article ">Figure 6
<p>CDP-MVS dynamic domain propagation scheme (removing the central sample points and independently sampling in eight directions. Circles represent pixels. The black solid circle indicates the pixel to be estimated. During each propagation, the depth value of the red pixel is updated by the black pixel, and vice versa.).</p>
Full article ">Figure 7
<p>Reprojection flowchart. (The yellow line represents the process of projecting the pixel point p of the reference image to the point q in the adjacent image. The green line represents the process of re-projecting the point q back to the reference image.).</p>
Full article ">Figure 8
<p>Reference image (<b>A</b>) and its binarized grayscale image with sky segmentation (<b>B</b>).</p>
Full article ">Figure 9
<p>CDP-MVS algorithm flowchart.</p>
Full article ">Figure 10
<p>PatchMatchNet propagation sampling strategy. (Circles represent pixels. The black solid circle indicates the pixel to be estimated. During each propagation, the depth value of the red pixel is updated by the black pixel, and vice versa.).</p>
Full article ">Figure 11
<p>Dense point clouds reconstructed for four plots using different algorithms. (<b>a</b>) Poplar, (<b>b</b>) pine, (<b>c</b>) elm, (<b>d</b>) ginkgo. 1: CDP-MVS, 2: ACMMP, 3: PatchMatchNet.</p>
Figure 11">
Full article ">Figure 12
<p>Comparison of canopy details reconstructed by different algorithms. (<b>a</b>) Poplar, (<b>b</b>) pine, (<b>c</b>) elm, (<b>d</b>) ginkgo. 1: CDP-MVS, 2: ACMMP, 3: PatchMatchNet.</p>
Figure 12">
Full article ">Figure 13
<p>Comparison of trunk details reconstructed by different algorithms. (<b>a</b>) Poplar, (<b>b</b>) pine, (<b>c</b>) elm, (<b>d</b>) ginkgo. 1: CDP-MVS, 2: ACMMP, 3: PatchMatchNet.</p>
Figure 13">
Full article ">Figure 14
<p>Scatter plot comparing reconstructed tree positions with actual positions (unit: meters). (<b>a</b>) Poplar, (<b>b</b>) pine, (<b>c</b>) elm, (<b>d</b>) ginkgo. 1: CDP-MVS, 2: ACMMP, 3: PatchMatchNet.</p>
Figure 14">
Full article ">
20 pages, 6262 KiB  
Article
YPR-SLAM: A SLAM System Combining Object Detection and Geometric Constraints for Dynamic Scenes
by Xukang Kan, Gefei Shi, Xuerong Yang and Xinwei Hu
Sensors 2024, 24(20), 6576; https://doi.org/10.3390/s24206576 - 12 Oct 2024
Viewed by 895
Abstract
Traditional SLAM systems assume a static environment, but moving objects break this ideal assumption. In the real world, moving objects can greatly degrade the precision of image matching and camera pose estimation. To solve these problems, the YPR-SLAM system is proposed. First, the system includes a lightweight YOLOv5 detection network for detecting both dynamic and static objects, which provides prior information about dynamic objects to the SLAM system. Second, utilizing this prior information about dynamic targets together with the depth image, a geometric-constraint method for removing moving feature points from the depth image is proposed. The Depth-PROSAC algorithm is used to differentiate dynamic from static feature points so that the dynamic feature points can be removed. Finally, a dense point cloud map is constructed from the static feature points. The YPR-SLAM system tightly couples object detection with geometric constraints, eliminating moving feature points and minimizing their adverse effects on the SLAM system. The performance of YPR-SLAM was assessed on the public TUM RGB-D dataset, and it was found to be well suited to dynamic scenes. Full article
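The interplay of detection boxes and the depth image can be sketched as a simple depth-consistency test: only keypoints whose depth matches the detected foreground object inside a box are discarded, while box pixels belonging to the background are kept. This stands in for the Depth-PROSAC criterion, which is not reproduced here; the tolerance and function name are illustrative assumptions.

```python
# Sketch of filtering candidate dynamic feature points using detection boxes and
# the depth image. The paper's Depth-PROSAC criterion is not reproduced exactly;
# a simple foreground depth-consistency test stands in for it.
import numpy as np

def filter_dynamic_keypoints(keypoints, depth, boxes, depth_tol=0.3):
    """keypoints: list of (u, v) pixel coords; depth: (H, W) array in meters;
    boxes: list of (x1, y1, x2, y2) dynamic-object detections (e.g. 'person')."""
    static = []
    for (u, v) in keypoints:
        dynamic = False
        for (x1, y1, x2, y2) in boxes:
            if x1 <= u <= x2 and y1 <= v <= y2:
                patch = depth[int(y1):int(y2), int(x1):int(x2)]
                fg = np.median(patch[patch > 0]) if np.any(patch > 0) else 0.0
                # Points whose depth matches the detected foreground object are
                # treated as dynamic; box pixels on the background are kept.
                if fg > 0 and abs(depth[int(v), int(u)] - fg) < depth_tol * fg:
                    dynamic = True
                    break
        if not dynamic:
            static.append((u, v))
    return static
```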
(This article belongs to the Section Sensing and Imaging)
Show Figures

Figure 1

Figure 1
<p>Framework of the YPR-SLAM system. The blue section is ORB-SLAM2, and the orange section shows the additions made in this paper.</p>
Full article ">Figure 2
<p>The YOLOv5 network architecture.</p>
Full article ">Figure 3
<p>Dynamic target detection and filtering thread. First, ORB feature points are extracted from the RGB image by the tracking thread. Next, the dynamic target detection thread identifies potential dynamic target areas, and then the Depth-PROSAC algorithm is applied to filter out dynamic feature points. Finally, the static feature points are retained for subsequent pose estimation.</p>
Full article ">Figure 4
<p>The comparison between target detection algorithms and the Depth-PROSAC algorithm in filtering out dynamic feature points. (<b>a</b>) shows that the object detection method directly filters out dynamic feature points, and (<b>b</b>) shows that the Depth-PROSAC algorithm filters out dynamic feature points.</p>
Full article ">Figure 5
<p>Dense point cloud construction workflow.</p>
Full article ">Figure 6
<p>In the fr3_walking_halfsphere sequence, the YPR-SLAM and ORB-SLAM2 systems were used to estimate the 3D motion of the camera. (<b>a</b>) Camera path estimated by ORB-SLAM2; (<b>b</b>) YPR-SLAM estimation of camera trajectory.</p>
Full article ">Figure 7
<p><span class="html-italic">ATE</span> and <span class="html-italic">RPE</span> of the ORB-SLAM2 system and the YPR-SLAM system under different datasets. (<b>a1</b>,<b>a2</b>,<b>c1</b>,<b>c2</b>,<b>e1</b>,<b>e2</b>,<b>g1</b>,<b>g2</b>) represent ATE and RPE obtained by the ORB-SLAM2 system by running fre3_sitting_static, fre3_walking_static, fre3_walking_halfsphere, and fre3_walking_xyz, respectively. (<b>b1</b>,<b>b2</b>,<b>d1</b>,<b>d2</b>,<b>f1</b>,<b>f2</b>,<b>h1</b>,<b>h2</b>) represent <span class="html-italic">ATE</span> and <span class="html-italic">RPE</span> plots of the YPR-SLAM system running fre3_sitting_static, fre3_walking_static, fre3_walking_halfsphere, and fre3_walking_xyz, respectively. (<b>a1</b>,<b>b1</b>,<b>c1</b>,<b>d1</b>,<b>e1</b>,<b>f1</b>,<b>g1</b>,<b>h1</b>) represent ATE plots. (<b>a2</b>,<b>b2</b>,<b>c2</b>,<b>d2</b>,<b>e2</b>,<b>f2</b>,<b>g2</b>,<b>h2</b>) represent <span class="html-italic">RPE</span> plots.</p>
Figure 7">
Full article ">Figure 8
<p>Using ORB-SLAM2 and YPR-SLAM to construct dense 3D point cloud map in dynamic scene sequence fre3_walking_xyz. (<b>a</b>) represents a dense 3D point cloud map constructed by the ORB-SLAM2 system; (<b>b</b>) represents a dense 3D point cloud map constructed by the YPR-SLAM system.</p>
Full article ">
38 pages, 98377 KiB  
Article
FaSS-MVS: Fast Multi-View Stereo with Surface-Aware Semi-Global Matching from UAV-Borne Monocular Imagery
by Boitumelo Ruf, Martin Weinmann and Stefan Hinz
Sensors 2024, 24(19), 6397; https://doi.org/10.3390/s24196397 - 2 Oct 2024
Viewed by 831
Abstract
With FaSS-MVS, we present a fast, surface-aware semi-global optimization approach for multi-view stereo that allows for rapid depth and normal map estimation from monocular aerial video data captured by unmanned aerial vehicles (UAVs). The data estimated by FaSS-MVS, in turn, facilitate online 3D mapping, meaning that a 3D map of the scene is generated immediately and incrementally as the image data are acquired or received. FaSS-MVS is composed of a hierarchical processing scheme in which depth and normal data, as well as corresponding confidence scores, are estimated in a coarse-to-fine manner, allowing efficient processing of the large scene depths inherent in oblique images acquired by UAVs flying at low altitudes. The actual depth estimation uses a plane-sweep algorithm for dense multi-image matching to produce depth hypotheses, from which the final depth map is extracted by means of a surface-aware semi-global optimization that reduces the fronto-parallel bias of Semi-Global Matching (SGM). Given the estimated depth map, the pixel-wise surface normal information is then computed by reprojecting the depth map into a point cloud and computing the normal vectors within a confined local neighborhood. In a thorough quantitative and ablative study, we show that the accuracy of the 3D information computed by FaSS-MVS is close to that of state-of-the-art offline multi-view stereo approaches, with errors less than an order of magnitude higher than those of COLMAP. At the same time, the average runtime of FaSS-MVS for estimating a single depth and normal map is less than 14% of that of COLMAP, allowing online and incremental processing of full HD images at 1–2 Hz. Full article
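The normal-map step can be sketched by back-projecting the depth map with the camera intrinsics and taking cross products of local tangent vectors; the single-pixel neighborhood below is a simplification of the confined local neighborhood described in the paper.

```python
# Sketch of deriving a normal map from a depth map: back-project pixels with the
# camera intrinsics and take the cross product of local tangent vectors.
import numpy as np

def depth_to_normals(depth, K):
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project every pixel into a 3D point cloud (camera coordinates).
    X = (u - cx) / fx * depth
    Y = (v - cy) / fy * depth
    pts = np.dstack([X, Y, depth])

    # Tangent vectors from neighboring points, normal as their cross product.
    dx = pts[1:-1, 2:] - pts[1:-1, :-2]
    dy = pts[2:, 1:-1] - pts[:-2, 1:-1]
    n = np.cross(dx, dy)
    n /= np.linalg.norm(n, axis=2, keepdims=True) + 1e-12

    normals = np.zeros_like(pts)
    normals[1:-1, 1:-1] = n
    return normals
```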
(This article belongs to the Special Issue Advances on UAV-Based Sensing and Imaging)
Show Figures

Figure 1

Figure 1
<p>Overview of the processing pipeline for FaSS-MVS. Given a bundle of images and corresponding camera poses <math display="inline"><semantics> <msub> <mfenced separators="" open="(" close=")"> <mi mathvariant="script">I</mi> <mo>,</mo> <mi mathvariant="normal">P</mi> </mfenced> <mi>k</mi> </msub> </semantics></math> of an input sequence, a hierarchical MVS estimation is performed to recover a depth, normal and confidence map <math display="inline"><semantics> <mfenced separators="" open="(" close=")"> <mi mathvariant="script">D</mi> <mo>,</mo> <mi mathvariant="script">N</mi> <mo>,</mo> <mi mathvariant="script">C</mi> </mfenced> </semantics></math>. Adapted from [<a href="#B12-sensors-24-06397" class="html-bibr">12</a>,<a href="#B45-sensors-24-06397" class="html-bibr">45</a>].</p>
Full article ">Figure 2
<p>Illustration of the plane-sweep algorithm for multi-image matching. A scene is sampled by a plane <math display="inline"><semantics> <mrow> <mi mathvariant="sans-serif">Π</mi> <mo>=</mo> <mo>(</mo> <mi mathvariant="normal">n</mi> <mo>,</mo> <mi>δ</mi> <mo>)</mo> </mrow> </semantics></math>, where <math display="inline"><semantics> <mi mathvariant="normal">n</mi> </semantics></math> is the normal vector of the plane and <math display="inline"><semantics> <mi>δ</mi> </semantics></math> is the orthogonal distance of the plane from <math display="inline"><semantics> <msub> <mi mathvariant="normal">c</mi> <mi>ref</mi> </msub> </semantics></math>. The plane is swept through space along its normal vector between two bounding planes <math display="inline"><semantics> <msub> <mi mathvariant="sans-serif">Π</mi> <mi>max</mi> </msub> </semantics></math> and <math display="inline"><semantics> <msub> <mi mathvariant="sans-serif">Π</mi> <mi>min</mi> </msub> </semantics></math>. For each distance <math display="inline"><semantics> <mi>δ</mi> </semantics></math> of <math display="inline"><semantics> <mi mathvariant="sans-serif">Π</mi> </semantics></math>, the reference pixel <math display="inline"><semantics> <msup> <mi mathvariant="normal">p</mi> <mi>ref</mi> </msup> </semantics></math> is projected by the plane-induced homography <math display="inline"><semantics> <msub> <mi mathvariant="normal">H</mi> <mrow> <mi>ref</mi> <mo>→</mo> <mi>k</mi> </mrow> </msub> </semantics></math> into an arbitrary number of viewpoints where it is matched with the corresponding pixel in <math display="inline"><semantics> <msub> <mi mathvariant="script">I</mi> <mi>k</mi> </msub> </semantics></math>.</p>
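For reference, the plane-induced homography used to warp reference pixels into a neighboring view can be written down directly. Sign conventions differ between formulations, so the form below, H = K2 (R - t n^T / delta) K1^{-1}, should be read as a common textbook variant rather than the paper's exact parameterization.

```python
# Sketch of the plane-induced homography used in plane-sweep matching: for a
# sampling plane with normal n and distance delta in the reference frame, pixels
# of the reference image map to a second view with relative pose (R, t).
import numpy as np

def plane_induced_homography(K1, K2, R, t, n, delta):
    """K1, K2: 3x3 intrinsics; (R, t): pose of view 2 w.r.t. the reference;
    n: 3-vector plane normal in the reference frame; delta: plane distance."""
    H = K2 @ (R - np.outer(t, n) / delta) @ np.linalg.inv(K1)
    return H / H[2, 2]   # normalize; the homography is defined only up to scale

def warp_pixel(H, u, v):
    p = H @ np.array([u, v, 1.0])
    return p[0] / p[2], p[1] / p[2]
```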
Full article ">Figure 3
<p>Illustration of determining the orthogonal distance parameter of the sampling planes of the plane-sweep multi-image matching by using the cross-ratio and epipolar geometry. Here, <math display="inline"><semantics> <msub> <mi mathvariant="normal">c</mi> <mi>ref</mi> </msub> </semantics></math> and <math display="inline"><semantics> <msub> <mi mathvariant="normal">c</mi> <mi>k</mi> </msub> </semantics></math> represent the positions of the optical centers of the two cameras. Adapted from [<a href="#B11-sensors-24-06397" class="html-bibr">11</a>].</p>
Full article ">Figure 4
<p>Illustration of the different path aggregation strategies along one path direction <math display="inline"><semantics> <mi mathvariant="normal">r</mi> </semantics></math> within the three presented SGM<sup>x</sup> optimization schemes. Column 1: Reference image and normal map of a building. Illustrated area is marked with yellow line. Column 2: SGM<sup>Π</sup> path aggregation. The blue and pink lines represent the blue and pink surface orientations on the building facade. When aggregating the path costs for pixel <math display="inline"><semantics> <mi mathvariant="normal">p</mi> </semantics></math> at plane <math display="inline"><semantics> <mi mathvariant="sans-serif">Π</mi> </semantics></math>, SGM<sup>Π</sup> will include the previous costs at the same plane position (green) without additional penalty. The previous path costs at <math display="inline"><semantics> <mrow> <mi mathvariant="sans-serif">Π</mi> <mspace width="0.166667em"/> <mo>±</mo> <mn>1</mn> </mrow> </semantics></math> (yellow) will be penalized with <math display="inline"><semantics> <msub> <mi>φ</mi> <mn>1</mn> </msub> </semantics></math>. The previous path costs located at <math display="inline"><semantics> <mrow> <mi mathvariant="sans-serif">Π</mi> <mspace width="3.33333pt"/> <mo>+</mo> <mspace width="3.33333pt"/> <mn>2</mn> </mrow> </semantics></math> (red), which is actually located on the corresponding surface, will be penalized with the highest penalty <math display="inline"><semantics> <msub> <mi>φ</mi> <mn>2</mn> </msub> </semantics></math>. Column 3: SGM<sup>Π-sn</sup> uses the normal vector <math display="inline"><semantics> <msub> <mi mathvariant="normal">n</mi> <mi mathvariant="normal">p</mi> </msub> </semantics></math>, which encodes the surface orientation at pixel <math display="inline"><semantics> <mi mathvariant="normal">p</mi> </semantics></math>, and computes a discrete index jump <math display="inline"><semantics> <mrow> <mo>Δ</mo> <msub> <mi>i</mi> <mi>sn</mi> </msub> </mrow> </semantics></math>, which ideally adjusts the zero-cost transition so that the previous path costs at <math display="inline"><semantics> <mrow> <mi mathvariant="sans-serif">Π</mi> <mo>+</mo> <mn>2</mn> </mrow> </semantics></math> are not penalized. Column 4: Similar to SGM<sup>Π-sn</sup>, SGM<sup>Π-pg</sup> adjusts the zero-cost transition. However, the discrete index jump <math display="inline"><semantics> <mrow> <mo>Δ</mo> <msub> <mi>i</mi> <mi>pg</mi> </msub> </mrow> </semantics></math> is derived from the running gradient <math display="inline"><semantics> <mrow> <mo>∇</mo> <mi mathvariant="normal">r</mi> </mrow> </semantics></math> of the minimum-cost path. Adapted from [<a href="#B12-sensors-24-06397" class="html-bibr">12</a>].</p>
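The path aggregation contrasted in this figure builds on the classic SGM recurrence; the sketch below aggregates costs along one scanline and models the surface-aware zero-cost shift as an optional per-pixel index jump. Border handling is simplified (np.roll wraps around), so this is an illustration, not the FaSS-MVS implementation.

```python
# Simplified 1D sketch of semi-global path-cost aggregation. `shift` models the
# per-pixel zero-cost transition jump used by the surface-aware variants
# (shift = None or all zeros recovers classic SGM).
import numpy as np

def aggregate_path(cost, p1, p2, shift=None):
    """cost: (W, D) matching costs along one path direction; shift: optional (W,) ints."""
    w, d = cost.shape
    agg = np.zeros_like(cost, dtype=np.float64)
    agg[0] = cost[0]
    for x in range(1, w):
        prev = agg[x - 1]
        if shift is not None and shift[x] != 0:
            prev = np.roll(prev, shift[x])       # move the zero-cost transition
        best_prev = prev.min()
        candidates = np.stack([
            prev,                                 # same index: no penalty
            np.roll(prev, 1) + p1,                # +/- one index: small penalty
            np.roll(prev, -1) + p1,
            np.full(d, best_prev + p2),           # larger jumps: large penalty
        ])
        agg[x] = cost[x] + candidates.min(axis=0) - best_prev
    return agg
```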
Full article ">Figure 5
<p>Overview of the datasets used for performance evaluation of FaSS-MVS. Column 1: Two building models from the DTU Robot MVS dataset. Column 2: Example images in oblique and nadir view from the 3DOMcity Benchmark dataset. Column 3: Excerpt of the privately acquired TMB dataset. Column 4: Use-case-specific dataset acquired during an exercise of the local fire brigade.</p>
Full article ">Figure 6
<p>Qualitative comparison of the results achieved by the three different SGM implementations on the DTU dataset. Row 1: Reference data from the dataset, i.e., the ground truth depth and normal map, as well as the reference image for which the data are computed. Rows 2–4: Data, i.e., depth, normal and confidence maps, computed by SGM<sup>Π</sup>, SGM<sup>Π-sn</sup> and SGM<sup>Π-pg</sup>, respectively. Furthermore, difference maps are provided which hold the pixel-wise absolute difference between the estimated depth map and the ground truth. The color encoding reaches from dark blue (low error) via green to yellow (high error). The depth range within the depth maps reaches from 580 mm (blue) to 830 mm (red). The estimated maps are masked according to the ground truth.</p>
Full article ">Figure 7
<p>Qualitative comparison of the results achieved by the three different SGM implementations on the 3DOMcity dataset. Row 1: Reference data from the dataset, i.e., the ground truth depth and normal map, as well as the reference image for which the data are computed. Rows 2–4: Data, i.e., depth, normal and confidence maps, computed by SGM<sup>Π</sup>, SGM<sup>Π-sn</sup> and SGM<sup>Π-pg</sup>, respectively. Furthermore, difference maps are provided which hold the pixel-wise absolute difference between the estimated depth map and the ground truth. The depth range within the depth maps reaches from 1 m (blue) to <math display="inline"><semantics> <mrow> <mn>1.8</mn> </mrow> </semantics></math> m (red). The estimated maps are masked according to the ground truth. For visualization in this figure, the resulting images have been rotated counterclockwise by <math display="inline"><semantics> <msup> <mn>90</mn> <mo>∘</mo> </msup> </semantics></math>. Thus, the color encoding of the normal maps differs from that used in the other figures. Here, red represents an upwards orientation, while green represents an orientation to the left.</p>
Full article ">Figure 8
<p>ROC curves illustrating the error rate achieved by the three different SGM implementations as a function of increasing density of the estimated depth map.</p>
Full article ">Figure 9
<p>Accuracy–completeness curves of different post-filtering strategies, i.e., DoG filtering, GCC as well as a combination of both, executed in combination with the three different SGM extensions and a fronto-parallel sampling. In this, the threshold <math display="inline"><semantics> <mi>θ</mi> </semantics></math> is varied within the list of {<math display="inline"><semantics> <mrow> <mn>1.25</mn> <mo>,</mo> <mn>1.20</mn> <mo>,</mo> <mn>1.15</mn> <mo>,</mo> <mn>1.10</mn> <mo>,</mo> <mn>1.05</mn> <mo>,</mo> <mn>1.01</mn> </mrow> </semantics></math>}. By decreasing <math display="inline"><semantics> <mi>θ</mi> </semantics></math>, the accuracy and completeness rates drop.</p>
Full article ">Figure 10
<p>Qualitative comparison of FaSS-MVS with its three SGM extensions and GCC, the PSL with differently sized support regions for the NCC, as well as with GCC.</p>
Full article ">Figure 11
<p>Qualitative results of SGM<sup>Π-pg</sup> with 4 aggregation paths achieved on the two real-world and use-case-specific datasets, namely the TMB dataset and the FB dataset. As comparison, the corresponding depth maps estimated by COLMAP are also visualized. Rows 1 and 2: TMB Building scene captured from an altitude of 15 m and 8 m, respectively. Rows 3 and 4: TMB Container scene. Rows 5 and 6: Two excerpts from the FB dataset.</p>
Full article ">Figure A1
<p>Qualitative comparison between the use of a fronto-parallel and non-fronto-parallel sampling direction in combination with SGM<sup>Π</sup>. Columns 2 and 4: Corresponding estimated depth map. Columns 3 and 5: Difference map holding the pixel-wise absolute difference between the estimated depth map and the ground truth. The color encoding reaches from dark blue (low error) via green to yellow (high error). The estimated depth maps and the difference maps are masked according to the ground truth.</p>
Full article ">