1. Introduction
Both 3D reconstruction and mobile mapping are critical for supporting a wide range of applications in urban environments, including, but not limited to, autonomous driving, smart logistics, pedestrian navigation, and virtual reality. In the last decade, remote sensing-based techniques have emerged as an effective means of systematically surveying and evaluating urban environments. This has been driven by the rapid evolution of techniques such as SfM (Structure from Motion) and SLAM (Simultaneous Localization and Mapping), and by the deep learning revolution, which has enhanced the entire pipeline, e.g., through NeRF (Neural Radiance Fields). As a result, 3D reconstruction and mobile mapping have developed particularly rapidly in recent years.
This Special Issue comprises high-quality papers focusing on the techniques and applications of 3D reconstruction and mobile mapping in urban environments. A total of 15 papers are published in this Special Issue, covering topics such as image feature matching and dense matching, LiDAR/image-fused SLAM for image orientation and tunnel mapping, NeRF-based scene rendering and orthophoto generation, and other interesting applications, such as InSAR point cloud registration and 3D Ground-Penetrating Radar (3D GPR) for underground imaging and positioning. The details of each paper will be described in the following section.
2. Overview
Reliable feature matching is the first step of 3D reconstruction and determines the success of subsequent processing. Focusing on the feature matching of spherical images, Jiang et al. [1] present an algorithm that combines local geometric rectification with descriptors learned by a convolutional neural network (CNN). It addresses the geometric distortions inherent in spherical images and improves the performance of 3D reconstruction systems. The method consists of local geometric rectification, a CNN-based descriptor learning network for the rectified patches, and robust essential matrix estimation for outlier removal. Its effectiveness is demonstrated through experiments on real spherical images.
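For spherical cameras, the epipolar constraint behind essential matrix estimation is usually written on unit bearing vectors rather than pixel coordinates, which avoids the image distortion. The sketch below assumes the relative pose X2 = R X1 + t and builds E = [t]x R from a known pose; the helper names are hypothetical, and a real pipeline would estimate E robustly (e.g., inside RANSAC) from the matches instead.

```python
import math

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) applied to v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def rot_z(deg):
    """Rotation about the z-axis by deg degrees."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]

def essential_matrix(R, t):
    """E = [t]_x R for the assumed relative pose X2 = R X1 + t."""
    return matmul(skew(t), R)

def epipolar_residual(E, f1, f2):
    """Algebraic residual |f2^T E f1| for unit bearing vectors; ~0 for an inlier."""
    return abs(sum(a * b for a, b in zip(f2, matvec(E, f1))))

# Usage: one geometrically consistent correspondence and one wrong match.
R, t = rot_z(10.0), [1.0, 0.0, 0.0]
X1 = [2.0, 3.0, 5.0]
X2 = [a + b for a, b in zip(matvec(R, X1), t)]
E = essential_matrix(R, t)
inlier = epipolar_residual(E, unit(X1), unit(X2))
outlier = epipolar_residual(E, unit(X1), unit([0.0, 1.0, 0.0]))
```

Correspondences whose residual exceeds a chosen threshold would be treated as outliers and removed.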
Yao et al. [2] introduce a quasi-dense matching algorithm for oblique stereo images with large viewpoint changes. The core idea is to combine VGG16-UNet-based semantic segmentation with LoFTR-based local feature enhancement. The method segments multiplanar scenes, performs affine-invariant feature matching, and enhances weak-texture regions to improve matching accuracy. Experiments on low-altitude stereo images demonstrate significant advantages over classical and deep learning methods in match quantity, accuracy, and spatial distribution.
The 3D reconstruction of ancient buildings plays a critical role in digital city construction. Ge et al. [3] present a NeRF-based 3D reconstruction workflow with depth supervision for UAV (Unmanned Aerial Vehicle) images. It introduces a multi-resolution hash coding approach to reduce hash conflicts and a truncated signed distance function (TSDF) to improve geometric accuracy. Tests on collected UAV images demonstrate that the proposed solution renders images with clearer structural details and notably improves performance, with an average gain of 15% in the Peak Signal-to-Noise Ratio (PSNR) and a 10% gain in the Structural Similarity Index Measure (SSIM). The workflow produces detailed and accurate 3D models suitable for the digital preservation of cultural heritage sites.
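For reference, PSNR compares a rendered view with a reference image through the mean squared error (MSE), PSNR = 10 log10(MAX^2 / MSE), in decibels. A minimal sketch, assuming 8-bit grayscale images stored as nested lists (MAX = 255); SSIM is more involved and is not shown.

```python
import math

def psnr(reference, rendered, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized grayscale images."""
    if len(reference) != len(rendered) or any(
            len(a) != len(b) for a, b in zip(reference, rendered)):
        raise ValueError("images must have the same shape")
    squared = [(a - b) ** 2
               for row_a, row_b in zip(reference, rendered)
               for a, b in zip(row_a, row_b)]
    mse = sum(squared) / len(squared)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic, a percentage gain in dB corresponds to a much larger reduction in MSE.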
The digital orthophoto is one of the most important photogrammetric products and has been produced via the standard photogrammetric workflow for many years. Recently, Lv et al. [4] presented a comparative study of explicit and implicit methods for generating digital orthophotos. The explicit method, termed TDM (top-view-constrained dense matching), relies on a traditional geometric approach, while the implicit method, Instant NGP, employs neural rendering with Neural Radiance Fields. The comparison concludes that both methods can produce accurate, high-quality orthophotos, but that TDM is significantly more efficient owing to its use of Compute Unified Device Architecture (CUDA) acceleration. The study thus offers insights for selecting an appropriate orthophoto generation method according to efficiency and quality requirements.
Arza-García et al. [5] propose a cost-effective method for assessing the structural stability of rubble mound breakwaters (RMBs), a typical application of 3D models, by combining UAV photogrammetry with Random Sample Consensus (RANSAC). In the proposed workflow, photogrammetric point clouds of the RMB are generated via Structure from Motion and Multi-View Stereo (SfM-MVS) from pre- and post-storm flights and are fed to RANSAC for plane extraction and segmentation. Finally, the cuboids from the two epochs are registered using a spatial proximity criterion. Tests on a breakwater in Porto, Portugal, show that the method successfully identifies post-storm structural changes, demonstrating its potential for RMB monitoring.
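The RANSAC plane extraction step can be illustrated with a minimal sketch: repeatedly fit a plane to three random points and keep the plane supported by the most points within a distance threshold. The threshold, iteration count, and function names are assumptions for illustration, not parameters from Arza-García et al.

```python
import math
import random

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def plane_from_points(p1, p2, p3):
    """Unit normal n and offset d with n . p + d = 0, or None for a collinear sample."""
    n = cross(sub(p2, p1), sub(p3, p1))
    norm = math.sqrt(dot(n, n))
    if norm < 1e-12:
        return None
    n = tuple(x / norm for x in n)
    return n, -dot(n, p1)

def ransac_plane(points, threshold, iterations=200, seed=0):
    """Return the plane with the most inliers (point-to-plane distance <= threshold)."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue  # degenerate sample, draw again
        n, d = plane
        inliers = [p for p in points if abs(dot(n, p) + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers

# Usage: a 5 x 5 grid on the plane z = 1 plus three off-plane points.
points = [(x, y, 1.0) for x in range(5) for y in range(5)]
points += [(1.0, 2.0, 5.0), (3.0, 0.0, 3.0), (2.0, 4.0, -2.0)]
plane, inliers = ransac_plane(points, threshold=0.01)
```

To segment several planes, a common extension is to remove the inliers of the best plane and run RANSAC again on the remaining points.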
For urban 3D modeling, Cui et al. [6] introduce a method to extract urban building heights from Gaofen-7 stereo satellite images. The key techniques are a contour matching algorithm that accurately determines rooftop elevations and ground filtering that generates a DEM (Digital Elevation Model) from the DSM (Digital Surface Model). The solution addresses challenges such as occlusions, inaccurate ground elevation, and high-rise buildings, and it was validated using stereo images from three different provinces. The results confirm improved accuracy in building height extraction, which is especially beneficial for high-rise buildings and for sites with complex terrain or vegetation.
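The relationship between the DSM, the DEM, and building height can be sketched as follows: the height is the rooftop elevation minus the ground elevation, i.e., the normalized DSM (DSM minus DEM) over the rooftop. The ground filter used by Cui et al. is not reproduced here; as an assumed stand-in, the sketch uses a simple grey-scale morphological opening, and the contour matching step is omitted. Function names and the window radius are hypothetical.

```python
import statistics

def _window_filter(grid, radius, reduce):
    """Apply reduce (min or max) over a (2*radius+1)^2 window, clipped at the borders."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            window = [grid[r][c]
                      for r in range(max(0, i - radius), min(rows, i + radius + 1))
                      for c in range(max(0, j - radius), min(cols, j + radius + 1))]
            out_row.append(reduce(window))
        out.append(out_row)
    return out

def morphological_ground(dsm, radius):
    """Grey-scale opening (erosion, then dilation): removes objects smaller than the window."""
    return _window_filter(_window_filter(dsm, radius, min), radius, max)

def building_height(dsm, dem, roof_pixels):
    """Median of DSM - DEM over the rooftop pixels of one building."""
    return statistics.median([dsm[i][j] - dem[i][j] for i, j in roof_pixels])

# Usage: flat ground at 10 m with a 2 x 2 pixel roof at 40 m.
roof = [(2, 2), (2, 3), (3, 2), (3, 3)]
dsm = [[10.0] * 7 for _ in range(7)]
for i, j in roof:
    dsm[i][j] = 40.0
dem = morphological_ground(dsm, radius=2)
height = building_height(dsm, dem, roof)
```

The window must be wider than the largest building, which is one reason simple filters struggle with large high-rise footprints and complex terrain.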
For multi-source data fusion, Liu et al. [7] present a robust multi-sensor SLAM system, termed LVI-fusion, that integrates camera, LiDAR, and IMU (inertial measurement unit) data. The system mainly consists of a time alignment module that handles differing data frequencies, an image segmentation module for dynamic target removal, and a depth recovery model for feature points, and it uses sliding window optimization to achieve real-time pose calculation. Tests in various environments demonstrate that the method is highly accurate and robust and outperforms existing SLAM solutions, particularly in dynamic settings.
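A time alignment module has to bring sensors sampled at different rates onto common timestamps. One common approach, assumed here and not necessarily the one used in LVI-fusion, is to linearly interpolate the high-rate IMU stream at each camera or LiDAR timestamp. The function names are hypothetical.

```python
from bisect import bisect_left

def interpolate_at(timestamps, values, t):
    """Linearly interpolate vector samples (sorted timestamps) at time t."""
    if not timestamps[0] <= t <= timestamps[-1]:
        raise ValueError("t is outside the sampled interval; no extrapolation")
    k = bisect_left(timestamps, t)
    if timestamps[k] == t:
        return list(values[k])
    t0, t1 = timestamps[k - 1], timestamps[k]
    w = (t - t0) / (t1 - t0)  # weight of the later sample
    return [(1 - w) * a + w * b for a, b in zip(values[k - 1], values[k])]

def align(imu_times, imu_values, frame_times):
    """IMU readings resampled at each camera/LiDAR frame timestamp."""
    return [interpolate_at(imu_times, imu_values, t) for t in frame_times]

# Usage: 100 Hz accelerometer samples resampled at two frame times.
imu_t = [0.0, 0.01, 0.02]
imu_v = [[0.0, 0.0, 9.8], [1.0, 0.0, 9.8], [2.0, 0.0, 9.8]]
aligned = align(imu_t, imu_v, [0.0, 0.015])
```

Linear interpolation suits vector quantities such as accelerations or positions; orientations would normally be interpolated with quaternion slerp instead.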
Xu et al. [8] present an enhanced, tightly integrated Strapdown Inertial Navigation System (SINS)/LiDAR SLAM system for urban environments with sparse structural features. The method refines the edge point extraction of the LOAM (LiDAR Odometry and Mapping) algorithm and introduces a Kalman filter that uses the line distance error as its primary observation, improving the robustness and accuracy of the system. Experiments in various environments demonstrate superior performance, with a 17% improvement in positioning accuracy, especially in scenarios with limited structural features.
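In LOAM-style odometry, an edge point is associated with a line through two nearby edge points in the map, and its distance to that line serves as the residual. A minimal sketch of this point-to-line distance, |(p - a) x (p - b)| / |a - b|; how Xu et al. embed it in their Kalman filter is not reproduced, and the function name is hypothetical.

```python
import math

def point_to_line_distance(p, a, b):
    """Distance from point p to the infinite line through edge points a and b."""
    ap = [pi - ai for pi, ai in zip(p, a)]
    bp = [pi - bi for pi, bi in zip(p, b)]
    ab = [bi - ai for ai, bi in zip(a, b)]
    # Cross product (p - a) x (p - b): its norm is twice the triangle area.
    cx = ap[1] * bp[2] - ap[2] * bp[1]
    cy = ap[2] * bp[0] - ap[0] * bp[2]
    cz = ap[0] * bp[1] - ap[1] * bp[0]
    length = math.sqrt(sum(x * x for x in ab))
    if length == 0:
        raise ValueError("a and b must be distinct points")
    return math.sqrt(cx * cx + cy * cy + cz * cz) / length
```

Used as a filter observation, the expected value of this distance is zero, so the measured distance acts directly as the innovation.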
Point cloud registration, which aligns two 3D point clouds using keypoint correspondences, is essential in photogrammetry and remote sensing. Traditional methods struggle with uncertainties in keypoint detection and matching, which produce outlier correspondences that reduce efficiency and accuracy. Wang et al. [9] present a registration method based on a compatibility graph and accelerated guided sampling; it introduces a minimum subset sampling approach to limit the impact of outliers and a preference-based sampling strategy to improve computational efficiency and accuracy. In comparisons with existing methods on synthetic and real datasets, the proposed solution achieves minimum rotation and translation errors of 0.737° and 0.0201 m, respectively.
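Compatibility graphs for registration typically exploit the fact that rigid transformations preserve distances: two correct correspondences keep the same pairwise distance in both clouds, whereas outliers rarely do. A minimal sketch of this common distance-based compatibility test, with an assumed threshold tau; the minimum subset and preference-based sampling of Wang et al. are not reproduced, and the function name is hypothetical.

```python
import math
from itertools import combinations

def compatibility_graph(src, dst, tau):
    """Adjacency sets linking correspondences src[i] -> dst[i] that are mutually consistent.

    Correspondences i and j are compatible when
    | ||src[i] - src[j]|| - ||dst[i] - dst[j]|| | < tau.
    """
    adjacency = {i: set() for i in range(len(src))}
    for i, j in combinations(range(len(src)), 2):
        if abs(math.dist(src[i], src[j]) - math.dist(dst[i], dst[j])) < tau:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency

# Usage: four correspondences under a pure translation plus one outlier.
src = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
dst = [(5, 5, 5), (6, 5, 5), (5, 6, 5), (5, 5, 6), (9, 0, -3)]
graph = compatibility_graph(src, dst, tau=0.05)
```

Inliers form a densely connected cluster, so sampling minimal subsets from well-connected nodes makes an all-inlier hypothesis far more likely than uniform sampling.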
In complex scenes with closely adjacent trees and buildings, the accurate extraction of building point clouds is challenging. Su et al. [10] introduce a two-stage method for building-point-cloud extraction based on geometric information. The first stage coarsely extracts building points, which are refined using mask polygons and a region-growing algorithm in the second stage. The method integrates the Alpha Shape algorithm and neighborhood expansion to address missing boundary points and applies mask extraction to the original points to avoid errors in facade identification. The approach shows significant improvements in extraction accuracy, outperforming PointNet by 20.73% in terms of precision and achieving results comparable to the HDL-JME-GGO network on the Urban-LiDAR and Vaihingen datasets.
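The refinement stage grows building regions outward from coarsely extracted seed points. A generic Euclidean region-growing sketch follows; the neighborhood radius is an assumed parameter, and the hypothetical `accept` predicate merely stands in for a mask-polygon constraint, which is not reproduced here.

```python
import math
from collections import deque

def region_grow(points, seeds, radius, accept=lambda p: True):
    """Indices reachable from seed indices through neighbors within radius.

    accept is a hypothetical predicate restricting growth (e.g., to a mask region).
    """
    grown = set(seeds)
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for k, p in enumerate(points):
            if k not in grown and accept(p) and math.dist(points[current], p) <= radius:
                grown.add(k)
                queue.append(k)
    return grown

# Usage: five closely spaced "building" points and one distant "tree" point.
pts = [(float(x), 0.0, 0.0) for x in range(5)] + [(10.0, 0.0, 0.0)]
building = region_grow(pts, [0], radius=1.5)
masked = region_grow(pts, [0], radius=1.5, accept=lambda p: p[0] <= 2.5)
```

This brute-force neighbor search is quadratic; practical implementations use a k-d tree or voxel grid for the radius queries.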
Multi-source data fusion is a key step in the application of vehicle-borne mobile mapping systems (MMSs). Ji et al. [11] propose a method for fusing vehicle-borne laser point clouds and panoramic images based on occlusion removal. The approach removes irrelevant points, extracts relevant scenes based on trajectory points, and applies a collinearity model with spherical projection for matching. In addition, a vectorial angle selection algorithm is designed to filter out occluded projections. The experimental results show that the method achieves an average pixel error of 2.82 pixels and a positional error of 4 cm, verifying its effectiveness for data fusion in navigation, surveying, and mapping.
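Spherical projection maps a 3D point, already transformed into the panoramic camera frame, to equirectangular pixel coordinates via its azimuth and elevation. The sketch assumes a frame with x forward, y left, and z up; for occlusion handling it uses a simple z-buffer (keep the closest point per pixel), which is a common, simpler substitute and not the vectorial angle selection algorithm of Ji et al. Function names are hypothetical.

```python
import math

def project_equirectangular(p, width, height):
    """Map a 3D point in the assumed camera frame (x fwd, y left, z up) to (u, v)."""
    x, y, z = p
    azimuth = math.atan2(y, x)                   # 0 = straight ahead, positive to the left
    elevation = math.atan2(z, math.hypot(x, y))  # in [-pi/2, pi/2]
    u = (0.5 - azimuth / (2 * math.pi)) * width  # u grows to the right
    v = (0.5 - elevation / math.pi) * height     # v grows downward
    return u, v

def visible_points(points, width, height):
    """Indices of points that are nearest to the camera within their pixel (z-buffer)."""
    nearest = {}
    for idx, p in enumerate(points):
        u, v = project_equirectangular(p, width, height)
        pixel = (min(int(u), width - 1), min(int(v), height - 1))
        r = math.sqrt(sum(c * c for c in p))  # range from the panorama center
        if pixel not in nearest or r < nearest[pixel][0]:
            nearest[pixel] = (r, idx)
    return sorted(idx for _, idx in nearest.values())

# Usage: two points on the same ray (the farther one is occluded) and one to the left.
center = project_equirectangular((1.0, 0.0, 0.0), 2048, 1024)
kept = visible_points([(10.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 3.0, 0.0)], 2048, 1024)
```

A per-pixel z-buffer is sensitive to point density, which is one motivation for more geometric occlusion tests.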
Cheng et al. [12] introduce an image-aided LiDAR framework for the extraction, classification, and characterization of lane markings from mobile mapping data. The framework addresses road safety by improving lane-marking inventory through a combination of imagery and LiDAR data, enhancing the detection of markings under various conditions. The framework includes road surface identification and color/intensity enhancement, and utilizes a geographic information system for visualization. The study demonstrates the system’s effectiveness over an extended road network, showing the potential to improve road safety analyses.
To meet the need for high-precision point cloud mapping for subway trains in long tunnels, Li et al. [13] introduce a map construction method based on LiDAR and inertial measurement sensors. The approach combines a tightly coupled, Kalman-filter-based front-end odometry with back-end factor graph optimization. In the front end, inertial measurements drive the filter prediction, and the filter is updated using LiDAR points and local map planes. A global pose graph, built from inter-frame odometry and other constraints, then undergoes smoothing optimization for accurate mapping. Experiments show that the method achieves a trajectory consistency of 0.1 m and an accumulated error of less than 0.2% relative to ground truth.
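The predict/update cycle of a Kalman-filter front end can be illustrated with a deliberately simplified, one-dimensional toy: the IMU-derived velocity propagates the position along the tunnel axis and increases its variance, and a LiDAR-derived position measurement then corrects it. This is an assumption-laden illustration only; the tightly coupled filter of Li et al. estimates the full 3D pose (and typically sensor biases) from point-to-plane residuals, and all names and values here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ScalarKalman:
    x: float  # position along the tunnel axis (m)
    p: float  # variance of x (m^2)

    def predict(self, velocity, dt, process_var):
        """Propagate with an IMU-derived velocity; uncertainty grows."""
        self.x += velocity * dt
        self.p += process_var

    def update(self, measurement, meas_var):
        """Fuse a LiDAR-derived position; the gain balances the two variances."""
        k = self.p / (self.p + meas_var)
        self.x += k * (measurement - self.x)
        self.p *= 1.0 - k
        return k

# Usage: one prediction followed by one correction with equal variances.
kf = ScalarKalman(x=0.0, p=1.0)
kf.predict(velocity=10.0, dt=0.1, process_var=0.01)
gain = kf.update(measurement=1.2, meas_var=1.01)
```

With equal predicted and measurement variances the gain is 0.5, so the estimate lands halfway between the prediction and the measurement and its variance halves.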
Array interferometric synthetic aperture radar (Array InSAR) systems can address shadow issues by scanning in opposite directions; however, the point clouds from the two scans must then be registered accurately. Cui et al. [14] present a robust registration method for urban Array InSAR point clouds that represents the 3D data as images, in which pixel positions encode azimuth and ground range and pixel intensity encodes height. The KAZE algorithm and an enhanced matching approach identify corresponding points, from which the transformation relationship is estimated. Experiments show that facade registration achieves a relative angular difference of less than 0.5° and ground element registration achieves a Root Mean Square Error (RMSE) of less than 1.5 m.
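Representing a point cloud as an image amounts to rasterizing it: each (azimuth, ground range) cell becomes a pixel, and the pixel value stores a height. A minimal sketch, assuming points given as (azimuth, ground_range, height) triples, a square cell size, and a per-cell maximum as the aggregation rule (all assumptions, not details from Cui et al.); the function name is hypothetical.

```python
import math

def point_cloud_to_height_image(points, cell_size, fill=0.0):
    """Rasterize (azimuth, ground_range, height) points into a 2D height image.

    Rows index azimuth bins, columns index ground-range bins, and each pixel keeps
    the maximum height in its cell; empty cells receive fill.
    """
    az_min = min(p[0] for p in points)
    rg_min = min(p[1] for p in points)
    cells = {}
    for az, rg, h in points:
        key = (int((az - az_min) // cell_size), int((rg - rg_min) // cell_size))
        cells[key] = max(h, cells.get(key, -math.inf))
    rows = max(r for r, _ in cells) + 1
    cols = max(c for _, c in cells) + 1
    image = [[fill] * cols for _ in range(rows)]
    for (r, c), h in cells.items():
        image[r][c] = h
    return image, (az_min, rg_min)  # origin lets pixels be mapped back to 3D

# Usage: four points rasterized with a 1 m cell.
pts = [(0.0, 0.0, 5.0), (0.4, 0.2, 7.0), (1.2, 0.1, 3.0), (0.1, 1.5, 12.0)]
image, origin = point_cloud_to_height_image(pts, cell_size=1.0)
```

Once 2D features such as KAZE keypoints are matched between two such images, each matched pixel can be mapped back through the origin and cell size to 3D coordinates for estimating the transformation.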
Three-Dimensional Ground-Penetrating Radar (3D GPR) offers non-destructive, continuous subsurface detection but faces positioning accuracy challenges in complex urban environments. Zhang et al. [15] propose a multi-level robust positioning method to improve the accuracy of 3D GPR. In areas with strong GNSS signals, differential GNSS ensures rapid, precise positioning; where GNSS signals are weak, a tightly coupled GNSS/INS solution improves accuracy; and in GNSS-denied environments, SLAM integrates INS data with 3D point clouds. The approach achieves a positioning accuracy of better than 10 cm, delivers high-quality 3D images of underground urban structures, and supports urban road surveys and underground disease (defect) detection.