CN116543117B - A high-precision three-dimensional modeling method for large scenes from drone images - Google Patents
A high-precision three-dimensional modeling method for large scenes from drone images
- Publication number
- CN116543117B (application CN202310252401.5A)
- Authority
- CN
- China
- Prior art keywords
- images
- sampling
- nerf
- matching
- algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/05—Geographic models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/757—Matching configurations of points or features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Geometry (AREA)
- Multimedia (AREA)
- Computer Graphics (AREA)
- Remote Sensing (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Image Analysis (AREA)
Abstract
Description
Technical Field
The present invention relates to the technical field of three-dimensional modeling, and in particular to a high-precision large-scene three-dimensional modeling method based on unmanned aerial vehicle (UAV) images.
Background Art
Real-scene 3D can truthfully and systematically represent large-scale spatio-temporal information about human production, daily life, and ecological space, and is an important new integrated system for promoting the development of smart cities and the intelligent digital economy. Three-dimensional scene construction extends traditional 2D data to 3D data and uses it as the core data structure for realizing a real-scene environment, replacing the traditional purely geometric visualization architecture of points, lines, and surfaces. Real-scene 3D enables computers to present and perceive the current state and spatial distribution of various natural resource elements comprehensively and stereoscopically; in addition, it can accurately reflect the spatial distribution of terrain, surface texture details, and the morphological characteristics of ground objects in a high-definition, visual manner. The construction of real-scene 3D models is therefore an emerging technology supporting remote sensing and surveying theory and applications; it has important scientific research value and practical significance and provides technical support for the development of digital twins and the metaverse. Real-scene 3D modeling has also been widely applied in urban planning, CIM, urban transportation, geological surveying and mapping, autonomous driving, virtual geographic environments, and other fields.
As the means of acquiring geographic information data become increasingly rich, modeling methods for constructing 3D scenes from different data sources continue to emerge. Common 3D modeling methods include the following. Manual modeling with software such as Sketchup and 3dMax, and manual BIM construction with software such as Revit, can produce sufficiently detailed models, but they are time-consuming, labor-intensive, and inefficient, and cannot meet the needs of large-area scene modeling. Extruding 2D vector building footprints in CAD software to the building height yields untextured "white model" buildings; although this requires no manual modeling, accurate building heights are hard to obtain and the resulting models lack texture and shape. Laser point cloud modeling typically uses airborne LiDAR to build a point cloud of the target and then generate a triangular mesh; this method is robust to illumination and wind interference and achieves high accuracy, but it is costly, and data noise and data inconsistency remain challenges. There are also approaches based on street-view images captured by moving vehicles and on aerial photogrammetry, but none of them can reconstruct a relatively complete 3D scene. As for 3D reconstruction from images obtained through web crowdsourcing, the result depends heavily on how well the online images cover the scene.
Therefore, those skilled in the art have sought to develop a high-precision large-scene three-dimensional modeling method for UAV images that overcomes the above shortcomings of the prior art.
Summary of the Invention
In view of the above defects of the prior art, the technical problem to be solved by the present invention is that traditional photogrammetric three-dimensional modeling cannot satisfactorily reconstruct smooth surfaces and ground objects with small cross-sections.
To this end, the present invention provides a high-precision large-scene three-dimensional modeling method for UAV images, comprising the following steps:
Step 1: acquisition and processing of UAV images;
Step 2: construction of individual NeRFs;
Step 3: merging of the previously constructed NeRFs to obtain the three-dimensional scene from an arbitrary viewpoint.
Further, for Step 1, the acquisition and processing of UAV images comprises, in order: capturing UAV aerial images, dividing the image set, batch-importing the images, extracting image features, matching image features, geometric verification, and extracting the camera poses.
The criterion for dividing the image set is that each subset must cover a certain portion of the scene and have a high degree of overlap with adjacent subsets. The subsets follow the trajectories of the UAV's orbital flight routes: the UAV images captured along each circular route form one subset.
Image feature extraction uses the SIFT algorithm (Scale-Invariant Feature Transform) to extract features from the UAV aerial images, with SiftGPU acceleration on the graphics card to reach real-time computation speed. The SIFT algorithm detects feature points across different scale spaces, computes their orientations, and generates descriptors.
Image feature matching uses the brute-force algorithm: it traverses every pair of feature points, computes the distance between them, and decides from a threshold whether each pair is a match. For any two images in the UAV aerial image set, the feature points and descriptors extracted by SiftGPU are matched into pairs by the brute-force algorithm.
Geometric verification uses the RANSAC algorithm to randomly select matching pairs, compute a fitting matrix, and decide from the fitting error whether the matches are valid. Geometric verification effectively improves matching accuracy and avoids mismatches.
Camera pose extraction uses an incremental SfM (structure-from-motion) algorithm to compute the camera poses. Incremental SfM reconstructs the scene step by step and handles large-scale image sequences effectively. It consists of initialization and incremental reconstruction, where the initialization includes triangulation and essential-matrix decomposition.
Further, for Step 2, a single NeRF is constructed by first building a fully connected neural network (MLP) and setting the multi-resolution hash encoding and spherical-harmonic encoding rules. A ray is cast through each pixel of the images captured by cameras at different positions and orientations, and coarse sampling is performed along the ray. The encoded spatial (x, y, z) coordinates of the sampling points are fed into the fully connected network together with an appearance embedding vector for a first round of fine sampling; the probability density function obtained from the first round guides a second round of fine sampling, whose encoded coordinates are again fed into the network together with the appearance embedding vector to output the color and volume density of each sampling point. The colors of the second-round samples are accumulated by volume rendering to obtain the pixel color of each ray, which is compared with the ground-truth value to compute the loss; this process is iterated until the loss drops to a sufficiently low value.
Further, for Step 3, the procedure for merging the previously constructed NeRFs consists of selecting sub-NeRFs and rendering the target-view image. The sub-NeRF selection rule is: draw a circle of preset radius centered at the given target viewpoint; if the projection of a sub-NeRF's origin falls inside the circle, that sub-NeRF is selected. The target-view image is obtained by interpolating between the images rendered from the selected sub-NeRFs using the IDW (inverse distance weighting) algorithm.
Further, for Step 3, connecting the images of multiple target viewpoints into a trajectory achieves the effect of roaming through the three-dimensional space.
With the above scheme, the high-precision large-scene three-dimensional modeling method for UAV images disclosed by the present invention has the following advantages:
(1) The method uses UAV aerial images as the data source, making full use of their high spatial resolution, wide imaging coverage, and high overlap for image acquisition.
(2) The method avoids traditional oblique-photogrammetry three-dimensional modeling and instead proposes a new scheme for reconstructing a large-scale three-dimensional model by constructing neural radiance fields (NeRF): the UAV aerial image set is partitioned, feature extraction, matching, and geometric verification are performed, sub-NeRFs are trained, and the sub-NeRFs are finally merged to complete the implicit construction of the large-scene three-dimensional model.
In summary, the method disclosed by the present invention avoids traditional oblique-photogrammetry three-dimensional modeling: the UAV aerial image set is partitioned according to the circular orbital flight paths; after feature extraction, matching, and geometric verification of each subset, the SfM algorithm recovers the UAV camera poses; the sub-NeRFs are trained; and finally the sub-NeRFs around the target viewpoint are merged to complete the implicit construction of the large-scene three-dimensional model. Experimental tests show good results: smooth surfaces and ground objects with small cross-sections are reconstructed well.
The concept, specific technical solutions, and resulting technical effects of the present invention are further described below in conjunction with specific embodiments, so that the purpose, features, and effects of the present invention can be fully understood.
Brief Description of the Drawings
Figure 1 is a flow chart of the high-precision large-scene three-dimensional modeling method for UAV images according to the present invention;
Figure 2 is a schematic diagram of the UAV's circular orbital flight path;
Figure 3 is a schematic diagram of the scene boundary.
Detailed Description of the Embodiments
Several preferred embodiments of the present invention are described below to make its technical content clearer and easier to understand. The present invention can be embodied in many different forms; these embodiments are illustrative, and the scope of protection of the present invention is not limited to the embodiments mentioned herein.
Experimental procedures for which no specific conditions are given are generally carried out under conventional conditions, for example as described in the relevant instructions or manuals.
As shown in Figures 1 to 3, the high-precision large-scene three-dimensional modeling method for UAV images of the present invention is implemented as follows:
Step 1: The flight plan adopted here is an orbital route, in which the UAV flies around a circular or elliptical path planned in advance, as shown in Figure 2. The flight mission requirements and terrain data are entered into automatic route-planning software, which generates the optimal route plan; the UAV's automatic flight system then executes the mission along the planned route and completes the image acquisition.
After collection, the image set is divided according to the trajectories of the UAV's orbital routes. On each circular route the onboard camera points at the same point of interest, and that point of interest is the origin of the corresponding sub-NeRF; the UAV images captured along each circular route are therefore treated as one subset.
The SIFT algorithm is used to efficiently extract image features from the UAV aerial images on a GPU-equipped heterogeneous computing system. SiftGPU processes pixels in parallel to build the Gaussian pyramid and detect Difference-of-Gaussian (DoG) keypoints; building on GPU list generation, SiftGPU uses a hybrid GPU/CPU approach to construct a compact keypoint list efficiently and finally processes the keypoints in parallel to obtain their orientations and descriptors. When matching feature points, the brute-force algorithm first defines a matching threshold (a distance threshold), then traverses all feature-point pairs and computes the distances between them; if a distance is below the threshold, the pair is added to the list of matches.
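For illustration, the sketch below reproduces this extraction-and-matching stage in Python with OpenCV's CPU SIFT standing in for SiftGPU; the distance threshold is an assumed value, not one specified here.

```python
# Minimal sketch of SIFT extraction plus brute-force descriptor matching.
# OpenCV's CPU SIFT replaces SiftGPU; dist_thresh is an assumed value.
import cv2

def extract_and_match(img_path_a: str, img_path_b: str, dist_thresh: float = 200.0):
    sift = cv2.SIFT_create()
    img_a = cv2.imread(img_path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(img_path_b, cv2.IMREAD_GRAYSCALE)

    # Detect DoG keypoints and compute 128-D SIFT descriptors.
    kps_a, desc_a = sift.detectAndCompute(img_a, None)
    kps_b, desc_b = sift.detectAndCompute(img_b, None)

    # Brute-force matching: every descriptor of image A is compared with
    # every descriptor of image B by L2 distance.
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(desc_a, desc_b)

    # Keep only pairs whose descriptor distance falls below the threshold.
    good = [m for m in matches if m.distance < dist_thresh]
    pts_a = [kps_a[m.queryIdx].pt for m in good]
    pts_b = [kps_b[m.trainIdx].pt for m in good]
    return pts_a, pts_b
```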
First, the RANSAC algorithm is used to remove incorrect matches and compute the fundamental matrix. Specifically, eight matching pairs are sampled uniformly at random; based on the sampled pairs, the fundamental matrix is estimated with the normalized eight-point method; the remaining matching pairs are checked against the current fundamental matrix, and the number of consistent pairs is counted as the score of the current fundamental matrix. These steps are repeated a set number of times, and the fundamental matrix with the highest score is kept. The eight point pairs satisfy the formula p'ᵀFp = 0, where p and p' are the projections of a 3D point P in the two image planes and F is the fundamental matrix to be estimated.
Then the RANSAC algorithm is used to compute the homography matrix and remove incorrect matches. Specifically, four matching pairs are sampled uniformly at random; based on the sampled pairs, the homography is estimated with the four-point method; the remaining matching pairs are checked against the current homography, and the number of consistent pairs is counted as its score. These steps are repeated a set number of times, and the homography with the highest score is kept. The four-point formula is p' = Hp, where p and p' are the projections of a 3D point P in the two image planes and H is the homography to be estimated; substituting the coordinates of the four point pairs into this formula yields four equations, which are combined into a linear system whose solution gives the homography matrix.
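A hedged sketch of this geometric-verification step is given below, using OpenCV's built-in RANSAC estimators for the fundamental matrix and the homography instead of a hand-written eight-point/four-point loop; the reprojection threshold is an assumed value.

```python
# Geometric verification with RANSAC: fundamental matrix + homography,
# keeping only matches consistent with the epipolar geometry.
import numpy as np
import cv2

def geometric_verification(pts_a, pts_b, ransac_thresh: float = 1.0):
    pts_a = np.float32(pts_a)
    pts_b = np.float32(pts_b)

    # Fundamental matrix via RANSAC (normalized 8-point solution inside).
    F, inliers_f = cv2.findFundamentalMat(pts_a, pts_b,
                                          cv2.FM_RANSAC, ransac_thresh, 0.999)

    # Homography via RANSAC (4-point DLT solution inside).
    H, inliers_h = cv2.findHomography(pts_a, pts_b, cv2.RANSAC, ransac_thresh)

    # Keep only the epipolar-consistent matches for later reconstruction.
    mask = inliers_f.ravel().astype(bool)
    return F, H, pts_a[mask], pts_b[mask]
```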
Incremental SfM is used to compute the camera poses. First, the track length of each feature point is computed, where the track length is the number of images in which that feature point appears. A scene connectivity graph is then built with each image as a node: if the number of matched feature-point pairs between any two nodes exceeds a threshold, the two nodes are connected by an edge of the graph. An edge is then selected whose two nodes are seed images with a sufficiently large intersection angle (when all corresponding points are triangulated, the median angle between the rays is generally no more than 60 degrees and no less than 3 degrees) and with a sufficient number of evenly distributed tie points. The essential matrix corresponding to this edge (these two seed images) is estimated robustly and decomposed to obtain the poses of the cameras of the two seed images (i.e., the camera extrinsic parameters). The feature-point pairs on the two seed images whose track length is greater than 2 are selected and triangulated to obtain the initial reconstruction, and the edge is deleted from the scene connectivity graph. At this point the initialization of incremental SfM is complete.
After the initialization of incremental SfM, if edges remain in the scene connectivity graph, an edge is selected from it such that the overlap between the feature points with track length greater than 2 on its two images and the subset of already-reconstructed 3D points is maximized. The PnP method is used to estimate the camera pose (the camera extrinsic parameters); the new feature-point pairs with track length greater than 2 on the two images of this edge are selected and triangulated; the newly added edge is deleted from the scene connectivity graph; and the BA (bundle adjustment) algorithm is executed. When no edges remain in the scene connectivity graph, the incremental scene reconstruction and camera pose recovery of incremental SfM are complete.
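As an illustrative sketch of the registration step inside this incremental reconstruction (estimating a new camera's pose from already-reconstructed 3D points), the fragment below uses OpenCV's PnP-with-RANSAC solver; the intrinsic matrix K and the reprojection-error threshold are assumed inputs.

```python
# Register a new image against reconstructed 3D points with PnP + RANSAC.
import numpy as np
import cv2

def register_image(points_3d, points_2d, K):
    """points_3d: Nx3 reconstructed points visible in the new image;
    points_2d: Nx2 corresponding pixel observations; K: 3x3 intrinsics."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.float32(points_3d), np.float32(points_2d), K, None,
        reprojectionError=4.0, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        raise RuntimeError("PnP registration failed")
    R, _ = cv2.Rodrigues(rvec)   # rotation matrix of the new camera
    return R, tvec               # camera extrinsic parameters (pose)
```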
Step 2: In NeRF, a spatial scene is represented as a function of a five-dimensional input vector, expressed implicitly by an MLP network, which describes the shape of the three-dimensional model in the scene and the color observed from different directions. The input of this five-dimensional function is a 3D position vector x = (x, y, z) and a 2D viewing direction d = (θ, φ), where x = (x, y, z) are the coordinates of a 3D point in the scene and (θ, φ) are the azimuth angle (measured from the positive x-axis, rotating in the direction of the yoz plane) and the polar angle (measured from the positive z-axis, rotating in the direction of the xoy plane) of the viewing direction in spherical coordinates. The output of the function is the color c = (r, g, b) observed where a camera ray travelling along direction d reaches the 3D position x, and the volume density σ(x); the volume density σ(x) represents the differential probability that the ray terminates at an infinitesimal particle at position x. This five-dimensional function can be written as:
F_Θ(x, d) = (c, σ)
The training of this fully connected neural network continuously adjusts the network's weight parameters Θ so that, given a 5D coordinate input, it outputs the corresponding color and volume density. To guarantee multi-view consistency of the network output, the volume density σ is a function σ(x) of the spatial position x only, while the color vector c is a function c(x, d) of both x and d.
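A hedged PyTorch sketch of such an implicit scene function is shown below; the layer widths and embedding dimensions are illustrative assumptions, and the encoded position and direction would come from the hash and spherical-harmonic encoders described next.

```python
# Sketch of F_Θ(x, d) -> (c, σ): density depends on position only,
# color depends additionally on view direction and appearance embedding.
import torch
import torch.nn as nn

class NeRFMLP(nn.Module):
    def __init__(self, pos_dim=32, dir_dim=16, appear_dim=16, hidden=64):
        super().__init__()
        self.density_net = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden + 1))        # 1 output for σ, rest is a feature
        self.color_net = nn.Sequential(
            nn.Linear(hidden + dir_dim + appear_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid())   # RGB in [0, 1]

    def forward(self, pos_enc, dir_enc, appearance):
        h = self.density_net(pos_enc)
        sigma = torch.relu(h[..., 0])             # volume density σ(x) ≥ 0
        feat = h[..., 1:]
        rgb = self.color_net(torch.cat([feat, dir_enc, appearance], dim=-1))
        return rgb, sigma
```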
The multi-resolution hash encoding proceeds as follows: (1) for a given input coordinate x, find the surrounding voxels at each resolution level and assign an index to each voxel vertex by hashing its integer coordinates; (2) look up the feature vector corresponding to each vertex index in the hash table of each level; (3) according to the relative position of x inside the grid of each level, interpolate the feature vectors of that grid's vertices into a single feature vector by trilinear interpolation; (4) concatenate the feature vectors of the different resolution grids, which completes the multi-resolution hash encoding.
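The following simplified PyTorch sketch illustrates these four steps; the table size, number of levels, growth factor, and feature width are assumed values, and a full implementation would also treat the coarsest levels as dense grids.

```python
# Simplified multi-resolution hash encoding (Instant-NGP style).
import torch
import torch.nn as nn

class HashEncoder(nn.Module):
    def __init__(self, levels=8, table_size=2**16, feat_dim=2,
                 base_res=16, growth=1.5):
        super().__init__()
        self.res = [int(base_res * growth ** l) for l in range(levels)]
        self.tables = nn.ParameterList(
            [nn.Parameter(1e-4 * torch.randn(table_size, feat_dim))
             for _ in range(levels)])
        self.table_size = table_size

    def _hash(self, ijk):
        # Spatial hash of integer voxel corners with large primes.
        h = ijk[..., 0] ^ (ijk[..., 1] * 2654435761) ^ (ijk[..., 2] * 805459861)
        return h % self.table_size

    def forward(self, x):
        # x: (N, 3) positions normalized to the unit cube [0, 1]^3.
        offsets = torch.tensor([[dx, dy, dz] for dx in (0, 1)
                                for dy in (0, 1) for dz in (0, 1)])   # (8, 3)
        feats = []
        for res, table in zip(self.res, self.tables):
            xs = x * res
            i0 = xs.floor().long()                       # lower voxel corner
            w = xs - i0                                  # fractional position
            corners = i0[:, None, :] + offsets[None, :, :]   # (N, 8, 3)
            f = table[self._hash(corners)]               # (N, 8, feat_dim)
            # Trilinear interpolation weights for the 8 corners.
            wx = torch.stack([1 - w[:, 0], w[:, 0]], dim=1)
            wy = torch.stack([1 - w[:, 1], w[:, 1]], dim=1)
            wz = torch.stack([1 - w[:, 2], w[:, 2]], dim=1)
            cw = (wx[:, :, None, None] * wy[:, None, :, None]
                  * wz[:, None, None, :]).reshape(-1, 8, 1)
            feats.append((cw * f).sum(dim=1))            # (N, feat_dim) per level
        return torch.cat(feats, dim=-1)                  # concatenate across levels
```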
In graphics rendering, this means that when the light source rotates, the lighting in the rendered image will not flicker or jump as long as the transformed generalized Fourier coefficients are computed synchronously. When the Laplace equation in spherical coordinates is separated into variables, the part depending on the polar angle θ is the associated Legendre polynomial P_l^m(cos θ) and the part depending on the azimuth angle φ is e^(imφ); the spherical harmonics are defined as:

Y_l,m(θ, φ) = A_l,m · P_l^m(cos θ) · e^(imφ)

In this formula, (θ, φ) denotes a unit vector pointing to a point on the sphere in spherical coordinates; l is the degree and m the order, both integers with l ≥ 0 and −l ≤ m ≤ l; A_l,m is a normalization coefficient chosen so that the integral of |Y_l,m|² over the unit sphere equals 1.
A spherical harmonic can be viewed as mapping each point on the unit sphere (or each direction in three-dimensional space) to a complex function value. The input direction d is encoded as a combination of spherical-harmonic basis functions and spherical-harmonic coefficients and then fed into the color MLP network.
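As a hedged illustration of this direction encoding, the sketch below evaluates the real-valued spherical-harmonic basis up to degree l = 2 on unit direction vectors; the truncation at degree 2 is an assumption, and the constants are the standard real-SH normalization factors.

```python
# Real spherical-harmonic basis (degrees 0..2) as a direction encoding.
import torch

def sh_encode(d: torch.Tensor) -> torch.Tensor:
    """d: (N, 3) unit direction vectors; returns (N, 9) SH basis values."""
    x, y, z = d[:, 0], d[:, 1], d[:, 2]
    return torch.stack([
        0.28209479 * torch.ones_like(x),      # l=0
        0.48860251 * y,                       # l=1, m=-1
        0.48860251 * z,                       # l=1, m= 0
        0.48860251 * x,                       # l=1, m=+1
        1.09254843 * x * y,                   # l=2, m=-2
        1.09254843 * y * z,                   # l=2, m=-1
        0.31539157 * (3.0 * z * z - 1.0),     # l=2, m= 0
        1.09254843 * x * z,                   # l=2, m=+1
        0.54627421 * (x * x - y * y),         # l=2, m=+2
    ], dim=-1)
```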
To enable NeRF to adapt to varying illumination, the GLO (Generative Latent Optimization) technique is adopted: each image is assigned a corresponding real-valued appearance embedding vector ℓ^(a) of fixed length n^(a). The appearance embedding vector is then used as an input of the second (color) MLP network, and the network weight parameters Θ are optimized jointly with the embeddings.
Using the large cube that bounds the model, which was set up for the multi-level hash encoding, the values of near and far are determined by computing the intersections of each camera ray with this cube, as shown in Figure 3; the extent of the scene boundary is thus determined.
For volume rendering, the integral over continuous samples is approximated by a discrete sampled sum: the interval from t_n (near) to t_f (far) is divided into N evenly spaced bins, and one sample is drawn uniformly at random from each bin, so that the i-th sample can be expressed as:

t_i ~ U[ t_n + (i−1)/N · (t_f − t_n), t_n + i/N · (t_f − t_n) ]
Summation over discrete samples ensures that, throughout training and optimization, the MLP is evaluated at positions that are macroscopically continuous, which guarantees the continuity of the scene representation. Based on this idea, the integral is converted into the summation:

Ĉ(r) = Σ_{i=1..N} T_i · (1 − exp(−σ_i δ_i)) · c_i,  where  T_i = exp(−Σ_{j=1..i−1} σ_j δ_j)
In the above formula, δ_i = t_{i+1} − t_i is the distance between adjacent samples, and σ_i and c_i are the volume density and color of sample i. Computing Ĉ(r) from the set of σ_i and c_i values is differentiable, and the alpha (opacity) of sample i is 1 − exp(−σ_i δ_i).
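A hedged PyTorch sketch of this discrete volume-rendering sum is given below: from per-sample densities, colors, and depths along a batch of rays it computes the weights w_i = T_i(1 − exp(−σ_i δ_i)) and the accumulated pixel colors.

```python
# Discrete volume rendering along a batch of rays.
import torch

def volume_render(sigma, rgb, t_vals):
    """sigma: (R, N); rgb: (R, N, 3); t_vals: (R, N) sample depths."""
    delta = t_vals[:, 1:] - t_vals[:, :-1]                    # δ_i = t_{i+1} - t_i
    delta = torch.cat([delta, 1e10 * torch.ones_like(delta[:, :1])], dim=-1)

    alpha = 1.0 - torch.exp(-sigma * delta)                   # opacity of each sample
    # T_i = exp(-Σ_{j<i} σ_j δ_j), i.e. cumulative product of (1 - alpha_j).
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]

    weights = trans * alpha                                   # w_i
    color = (weights[..., None] * rgb).sum(dim=1)             # Ĉ(r) per ray
    return color, weights
```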
For hierarchical sampling, NeRF samples points along each ray in space in two stages: coarse sampling is performed first, and its result guides the subsequent fine sampling. Coarse sampling is uniform sampling: N points are sampled uniformly within a predefined range of distances from the camera, i.e., the range from near to far is divided into N equal parts and N samples are placed evenly along the ray. The positions of these N samples are passed through the multi-level hash encoding and their directions through the spherical-harmonic encoding, concatenated with the appearance embedding vector, and fed into an MLP network called the coarse network, which outputs the volume density and color of the N samples. Along this ray, the corresponding pixel color can be regarded as the weighted sum of the colors of all the samples, i.e.
w_i = T_i · (1 − exp(−σ_i δ_i))
The magnitude of a sampling weight reflects how close the sample lies to the surface of the three-dimensional model. The second round of sampling therefore concentrates on the regions that received large weights in the first round: more samples are drawn where the weight values are large and fewer where they are small.
These weights are normalized as ŵ_i = w_i / Σ_{j=1..N_c} w_j.
In the above formula, N_c is the number of coarse samples. The normalized weights ŵ_i can be regarded as a piecewise-constant probability density function of the scene content along the ray, which gives a rough picture of how objects are distributed along the ray direction. According to this probability density function, fine sampling is then performed by inverse transform sampling to obtain N_f new samples; these new samples cluster where the coarse-sampling weights are high and are sparse where the weights are low, so they lie close to the object surface. The positions and directions of all N_c + N_f samples are then encoded and fed into an MLP network called the fine network, which outputs the volume density and color of the new samples; the corresponding pixel color is then computed with the volume rendering formula:

Ĉ_f(r) = Σ_{i=1..N_c+N_f} T_i · (1 − exp(−σ_i δ_i)) · c_i
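A hedged sketch of this inverse-transform (fine) sampling step is shown below: it draws N_f new sample depths from the piecewise-constant PDF defined by the normalized coarse weights, mirroring the standard NeRF sample-PDF routine.

```python
# Inverse-transform sampling of fine sample depths from the coarse weights.
import torch

def sample_fine(bins, weights, n_fine):
    """bins: (R, N_c+1) bin edges along each ray; weights: (R, N_c)."""
    pdf = weights / torch.sum(weights, dim=-1, keepdim=True).clamp(min=1e-8)
    cdf = torch.cumsum(pdf, dim=-1)
    cdf = torch.cat([torch.zeros_like(cdf[:, :1]), cdf], dim=-1)   # (R, N_c+1)

    u = torch.rand(cdf.shape[0], n_fine, device=cdf.device)        # uniform draws
    idx = torch.searchsorted(cdf, u, right=True)                   # invert the CDF
    below = (idx - 1).clamp(min=0)
    above = idx.clamp(max=cdf.shape[-1] - 1)

    cdf_lo, cdf_hi = torch.gather(cdf, 1, below), torch.gather(cdf, 1, above)
    bin_lo, bin_hi = torch.gather(bins, 1, below), torch.gather(bins, 1, above)
    denom = (cdf_hi - cdf_lo).clamp(min=1e-8)
    t = (u - cdf_lo) / denom                                       # position inside the bin
    return bin_lo + t * (bin_hi - bin_lo)                          # (R, N_f) new depths
```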
The loss is computed as the total squared error of both the pixel colors output by the coarse network and the pixel colors output by the fine network against the true pixel colors:

L = Σ_{r∈R} [ ‖Ĉ_c(r) − C(r)‖² + ‖Ĉ_f(r) − C(r)‖² ]
In the above formula, R is the set of rays in each training batch, Ĉ_c(r) and Ĉ_f(r) are the pixel colors obtained by volume rendering after coarse and fine sampling of the camera ray, respectively, and C(r) is the ground-truth (GT) pixel color. Even though the final rendering comes from Ĉ_f(r), the loss of Ĉ_c(r) is also minimized, so that the weight distribution from the coarse network can be used to allocate samples in the fine network.
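For completeness, a minimal sketch of this two-term loss, assuming the renderings come from the functions sketched above:

```python
# Coarse + fine photometric loss; both renderings share the same GT pixels.
import torch.nn.functional as F

def nerf_loss(rgb_coarse, rgb_fine, rgb_gt):
    return (F.mse_loss(rgb_coarse, rgb_gt, reduction="sum")
            + F.mse_loss(rgb_fine, rgb_gt, reduction="sum"))
```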
When the loss reaches a sufficiently low value, training is considered finished and the iteration stops. For any viewpoint in the scene, a virtual camera can then be placed; this camera emits multiple rays, coarse sampling is performed along each ray and the samples are sent to the coarse network to obtain the weight of each sample, fine sampling is then performed, both rounds of samples are sent to the fine network, and after volume rendering the color of each pixel is obtained. Together these pixels form an image, so the image observed from any viewpoint in the scene can be obtained.
Step 3: After a NeRF model has been trained on each subset of the UAV image collection, each sub-NeRF is obtained, with the point of interest targeted by the camera on its circular route as its origin and the large cube constructed within that circular route as its boundary. To improve efficiency, only the sub-NeRFs relevant to the given target viewpoint are rendered. Color information is then rendered from the selected sub-NeRFs and interpolated between them by inverse distance weighting (IDW) between the target camera origin o and the centers x_i of the selected sub-NeRFs: each weight is computed as w_i ∝ d(o, x_i)^(−p), where p controls the blending rate between the sub-NeRF renderings and d(o, x_i) is the distance from the target camera origin o to the center x_i of a selected sub-NeRF. This yields the novel view at the target viewpoint; connecting the views at multiple target viewpoints into a trajectory achieves the effect of roaming through the three-dimensional space.
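The sketch below illustrates this selection-and-blending step; the selection radius and the exponent p are assumed hyperparameters, and the per-sub-NeRF renderings are taken as given inputs.

```python
# Select nearby sub-NeRFs and blend their renderings with IDW weights.
import numpy as np

def blend_sub_nerfs(target_origin, sub_origins, sub_renders, radius=100.0, p=2.0):
    """target_origin: (3,); sub_origins: (M, 3) sub-NeRF centers;
    sub_renders: (M, H, W, 3) images each sub-NeRF rendered for the target view."""
    # Keep sub-NeRFs whose origins project inside a circle around the target.
    d_xy = np.linalg.norm(sub_origins[:, :2] - target_origin[:2], axis=1)
    keep = d_xy < radius
    if not np.any(keep):
        raise ValueError("no sub-NeRF within the selection radius")

    # Inverse-distance weights w_i ∝ d(o, x_i)^(-p), normalized to sum to 1.
    d = np.linalg.norm(sub_origins[keep] - target_origin, axis=1)
    w = d ** (-p)
    w = w / w.sum()

    # Weighted blend of the selected sub-NeRF renderings.
    return np.tensordot(w, sub_renders[keep], axes=1)
```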
Analysis of Embodiment 1 shows that the high-precision large-scene three-dimensional modeling method for UAV images of the present invention avoids traditional oblique-photogrammetry three-dimensional modeling: the UAV aerial image set is partitioned according to the circular orbital flight paths; after feature extraction, matching, and geometric verification of each subset, the SfM algorithm recovers the UAV camera poses; the sub-NeRFs are trained; and finally the sub-NeRFs around the target viewpoint are merged to complete the implicit construction of the large-scene three-dimensional model. Experimental tests show good results: smooth surfaces and ground objects with small cross-sections are reconstructed well.
The preferred embodiments of the present invention have been described in detail above. It should be understood that a person of ordinary skill in the art could make many modifications and changes according to the concept of the present invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain on the basis of the prior art through logical analysis, reasoning, or limited experimentation in accordance with the concept of the present invention shall fall within the scope of protection defined by the claims.
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310252401.5A CN116543117B (en) | 2023-03-16 | 2023-03-16 | A high-precision three-dimensional modeling method for large scenes from drone images |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310252401.5A CN116543117B (en) | 2023-03-16 | 2023-03-16 | A high-precision three-dimensional modeling method for large scenes from drone images |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116543117A CN116543117A (en) | 2023-08-04 |
CN116543117B true CN116543117B (en) | 2024-01-09 |
Family
ID=87454947
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310252401.5A Active CN116543117B (en) | 2023-03-16 | 2023-03-16 | A high-precision three-dimensional modeling method for large scenes from drone images |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116543117B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117274472B (en) * | 2023-08-16 | 2024-05-31 | 武汉大学 | Aviation true projection image generation method and system based on implicit three-dimensional expression |
CN117422804B (en) * | 2023-10-24 | 2024-06-07 | 中国科学院空天信息创新研究院 | Large-scale city block three-dimensional scene rendering and target fine space positioning method |
CN117557636A (en) * | 2023-11-01 | 2024-02-13 | 广西壮族自治区自然资源遥感院 | Incremental SfM system with self-adaptive sensing of matching relation |
CN117710575A (en) * | 2023-11-13 | 2024-03-15 | 江苏昆仑互联科技有限公司 | A three-dimensional reconstruction method of equipment in the desulfurization and denitrification process of steel plants based on SOLOv2-NeRF |
CN117689846B (en) * | 2024-02-02 | 2024-04-12 | 武汉大学 | Unmanned aerial vehicle photographing reconstruction multi-cross viewpoint generation method and device for linear target |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112085845A (en) * | 2020-09-11 | 2020-12-15 | 中国人民解放军军事科学院国防科技创新研究院 | Outdoor scene rapid three-dimensional reconstruction device based on unmanned aerial vehicle image |
CN115731355A (en) * | 2022-11-29 | 2023-03-03 | 湖北大学 | A 3D Building Reconstruction Method Based on SuperPoint-NeRF |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111599001B (en) * | 2020-05-14 | 2023-03-14 | 星际(重庆)智能装备技术研究院有限公司 | Unmanned aerial vehicle navigation map construction system and method based on image three-dimensional reconstruction technology |
- 2023-03-16: application CN202310252401.5A filed in China; subsequently granted as patent CN116543117B (status: active)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112085845A (en) * | 2020-09-11 | 2020-12-15 | 中国人民解放军军事科学院国防科技创新研究院 | Outdoor scene rapid three-dimensional reconstruction device based on unmanned aerial vehicle image |
CN115731355A (en) * | 2022-11-29 | 2023-03-03 | 湖北大学 | A 3D Building Reconstruction Method Based on SuperPoint-NeRF |
Non-Patent Citations (1)
Title |
---|
NeRF Neural Radiance Field Study Notes (8) — Interpretation of the innovations of the Block-NeRF paper; 右边的口袋; https://blog.csdn.net/weixin_44292547/article/details/126426322; full text *
Also Published As
Publication number | Publication date |
---|---|
CN116543117A (en) | 2023-08-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116543117B (en) | A high-precision three-dimensional modeling method for large scenes from drone images | |
CN103021017B (en) | Three-dimensional scene rebuilding method based on GPU acceleration | |
CN110458939A (en) | Indoor scene modeling method based on perspective generation | |
CN108038906B (en) | An Image-Based 3D Quadrilateral Mesh Model Reconstruction Method | |
CN103530907B (en) | Complicated three-dimensional model drawing method based on images | |
CN116071278A (en) | UAV aerial image synthesis method, system, computer equipment and storage medium | |
Condorelli et al. | A comparison between 3D reconstruction using nerf neural networks and mvs algorithms on cultural heritage images | |
WO2023124676A1 (en) | 3d model construction method, apparatus, and electronic device | |
CN111028335B (en) | A deep learning-based patch reconstruction method for point cloud data | |
CN116681839B (en) | Live three-dimensional target reconstruction and singulation method based on improved NeRF | |
Guo et al. | Line-based 3d building abstraction and polygonal surface reconstruction from images | |
Zhao et al. | Completing point clouds using structural constraints for large-scale points absence in 3D building reconstruction | |
Ma et al. | Rapid reconstruction of a three-dimensional mesh model based on oblique images in the internet of things | |
CN117974899B (en) | Three-dimensional scene display method and system based on digital twinning | |
Hyeon et al. | Automatic spatial template generation for realistic 3d modeling of large-scale indoor spaces | |
Kim et al. | Omnisdf: Scene reconstruction using omnidirectional signed distance functions and adaptive binoctrees | |
Cui et al. | A Review of Indoor Automation Modeling Based on Light Detection and Ranging Point Clouds. | |
Sahebdivani et al. | Deep learning based classification of color point cloud for 3D reconstruction of interior elements of buildings | |
Wang et al. | State of the art in dense image matching cost computation for high-resolution satellite stereo | |
CN114187404A (en) | Three-dimensional reconstruction method and system for high resolution of offshore area | |
Zhang et al. | SuperNeRF: High-Precision 3D Reconstruction for Large-Scale Scenes | |
Jin et al. | ActiveGS: Active Scene Reconstruction using Gaussian Splatting | |
Liu et al. | Automatic Texture Mapping Method for 3d Models to Circumvent Occlusion Through 3d Spatial Through-View Relationships Between Images, Models, and Point Clouds | |
Zhang et al. | Aerial-NeRF: Adaptive Spatial Partitioning and Sampling for Large-Scale Aerial Rendering | |
CN116824068B (en) | Real-time reconstruction method, device and equipment for point cloud flow in complex dynamic scenes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |