
CN116071278A - Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium - Google Patents

Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium

Info

Publication number
CN116071278A
CN116071278A
Authority
CN
China
Prior art keywords
scene
sampling
radiation field
unmanned aerial vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211534828.6A
Other languages
Chinese (zh)
Inventor
刘静
王钰琳
王浩龙
蒋晓瑜
苏立玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202211534828.6A priority Critical patent/CN116071278A/en
Publication of CN116071278A publication Critical patent/CN116071278A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical fields of computer graphics and computer vision and discloses an unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium. The unmanned aerial vehicle aerial image synthesis method comprises the following steps: acquiring a plurality of two-dimensional images taken by the unmanned aerial vehicle; performing three-dimensional reconstruction of the scene from the two-dimensional images to obtain a sparse three-dimensional point cloud model of the scene, and sampling within the sparse three-dimensional point cloud model to obtain a number of sampling points; acquiring training data for all sampling points and training a preset complete scene neural radiation field network model on these data to obtain a trained complete scene neural radiation field network model; and inputting a set viewpoint direction into the trained complete scene neural radiation field network model and rendering the scene image for that viewpoint direction. The method renders quickly, can synthesize scenes over a large spatial extent, and achieves high-quality view rendering from arbitrary viewpoints.

Description

Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium
Technical Field
The invention belongs to the technical field of computer graphics and computer vision, and relates to an unmanned aerial vehicle aerial image synthesis method, an unmanned aerial vehicle aerial image synthesis system, computer equipment and a storage medium.
Background
With the rapid development of technology, 3D reconstruction has seen growing commercial demand in fields such as urban development, cultural relic protection, virtual reality and industrial geographic surveying. However, traditional 3D reconstruction requires manual measurement of the relevant data, and drawing 3D models with professional software is time-consuming and labor-intensive. In recent years, unmanned aerial vehicles have gradually become a primary tool for engineers to acquire aerial photographs or video because of their low cost, high efficiency and portability, and research on three-dimensional reconstruction from image sequences captured by unmanned aerial vehicles is maturing. Within this area, image-based viewpoint synthesis is an important problem of common concern in computer graphics and computer vision: several images with known shooting viewpoints are used as input, the photographed three-dimensional objects or scene are expressed in terms of geometry, appearance, illumination and other properties, images from other, unphotographed viewpoints are synthesized, and a highly realistic rendering is finally obtained. Compared with the traditional pipeline of three-dimensional reconstruction followed by graphics rendering, this approach can produce photo-realistic synthesis results.
Methods based on Structure from Motion (SFM) use software such as Pix4D, VisualSFM, Smart3D and COLMAP to solve for the camera poses and a sparse point cloud of the three-dimensional space from the input aerial image sequence, densify the sparse point cloud, reconstruct a triangular mesh from the dense point cloud, and finally texture-map the mesh to obtain a three-dimensional map with texture information. Although this SFM-based workflow is mature, its enormous computational cost places high demands on hardware; in particular, for high-resolution aerial image sequences the processing time required to keep the reconstruction at the original resolution is very long, and the viewpoints that can be synthesized are very limited.
Synthesizing new views of a scene from a sparse set of captured images is a long-standing problem in computer vision and a prerequisite for many AR and VR applications. Although classical techniques have addressed this problem using structure-from-motion-based or image-based rendering, explicit modeling remains difficult, three-dimensional reconstruction accuracy is low, and the quality of the rendered images is poor.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an unmanned aerial vehicle aerial image synthesis method, an unmanned aerial vehicle aerial image synthesis system, computer equipment and a storage medium.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
in one aspect of the invention, an unmanned aerial vehicle aerial image synthesis method comprises the following steps:
acquiring a plurality of two-dimensional images of the unmanned aerial vehicle;
performing three-dimensional reconstruction of a scene according to a plurality of two-dimensional images to obtain a sparse three-dimensional point cloud model of the scene, and sampling in the sparse three-dimensional point cloud model to obtain a plurality of sampling points;
acquiring training data of all sampling points, and training a preset complete scene neural radiation field network model according to the training data of all sampling points to obtain a trained complete scene neural radiation field network model; the training data comprises sampling point coordinates, sampling point viewpoint directions, sampling point implicit scene illumination visibility feature vectors and sampling point actual color values;
inputting the set viewpoint direction into a trained complete scene neural radiation field network model, and rendering to obtain a scene image in the set viewpoint direction.
Optionally, the acquiring the plurality of two-dimensional images of the unmanned aerial vehicle comprises:
Acquiring a plurality of two-dimensional images shot by the unmanned aerial vehicle at a fixed height from different positions and different viewing angles, wherein the overlap between adjacently shot two-dimensional images is more than 80%.
Optionally, the three-dimensional reconstruction of the scene from the plurality of two-dimensional images includes:
performing the three-dimensional reconstruction of the scene with the COLMAP image reconstruction method based on Structure from Motion.
Optionally, the complete scene neural radiation field network model includes a foreground neural radiation field network and a background neural radiation field network.
The foreground neural radiation field network comprises a foreground volume density synthesis network and a foreground color synthesis network. The foreground volume density synthesis network is:
σ(t), z(t) = MLP_σ^fg(γ_x(r(t)))
The foreground color synthesis network is:
ĉ_i(t) = MLP_c^fg(z(t), γ_d(d), ℓ_i^(a))
where σ(t) is the volume density function; z(t) is the feature vector associated with the position encoding; MLP_σ^fg is the volume density synthesis network; γ_x is the position encoding; ĉ_i(t) is the foreground image color estimated by the foreground color synthesis network; r(t) = o + t·d is the ray emitted from the ray origin; t ∈ (0, t') covers the scene boundary of the inner unit sphere; T(t) is the cumulative transparency of the camera ray along the viewpoint direction, computed as
T(t) = exp(-∫_0^t σ(r(s)) ds)
t is the distance along ray r from the ray origin; c_i(t) is the radiance; MLP_c^fg is the foreground color synthesis network; γ_d(d) is the view-direction encoding; ℓ_i^(a) is the implicit illumination visibility feature vector code of the i-th image; and a is the illumination visibility feature vector.
The background neural radiation field network comprises a background volume density synthesis network and a background color synthesis network. The background volume density synthesis network is:
σ(t), z(t) = MLP_σ^bg(γ_x(x', y', z', 1/r))
The background color synthesis network is:
ĉ_i(t) = MLP_c^bg(z(t), γ_d(d), ℓ_i^(a))
where MLP_σ^bg is the background volume density synthesis network, ĉ_i(t) here is the background image color estimated by the background color synthesis network, and t ∈ (t', ∞) covers the region outside the unit sphere.
The rendering function of the complete scene neural radiation field network model is:
C_i(r) = ∫_0^{t'} σ(r(t)) c_i(r(t)) T(t) dt + T(t') ∫_{t'}^{∞} σ(r(t)) c_i(r(t)) T(t) dt
where C_i(r) is the synthesized color value of the complete scene neural radiation field network model; the first integral (i) is the foreground neural radiation field color contribution, T(t') is the synthesis (blending) coefficient (ii), and the second integral (iii) is the background neural radiation field color contribution.
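The two-branch structure described above can be sketched in PyTorch as follows. This is a minimal illustration rather than the patent's exact architecture: the layer widths, encoding frequencies, illumination-visibility feature dimension and module names are all assumptions.

```python
import torch
import torch.nn as nn

def positional_encoding(x, n_freqs):
    # gamma(x): [x, sin(2^k x), cos(2^k x)] for k = 0..n_freqs-1
    out = [x]
    for k in range(n_freqs):
        out.append(torch.sin((2.0 ** k) * x))
        out.append(torch.cos((2.0 ** k) * x))
    return torch.cat(out, dim=-1)

class RadianceFieldBranch(nn.Module):
    """One branch (foreground or background): a volume-density head and a
    color head. The color head additionally receives the view-direction
    encoding and the per-image illumination-visibility feature vector."""
    def __init__(self, pos_dim, dir_dim, feat_dim=48, hidden=256):
        super().__init__()
        self.density_mlp = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden + 1),            # -> (sigma, z)
        )
        self.color_mlp = nn.Sequential(
            nn.Linear(hidden + dir_dim + feat_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),  # -> RGB in [0, 1]
        )

    def forward(self, x_enc, d_enc, l_a):
        h = self.density_mlp(x_enc)
        sigma = torch.relu(h[..., :1])                # volume density sigma(t)
        z = h[..., 1:]                                # position-coding-related feature z(t)
        rgb = self.color_mlp(torch.cat([z, d_enc, l_a], dim=-1))
        return sigma, rgb
```

One such branch would be instantiated for the foreground (points inside the unit sphere) and another for the background (inverted-sphere coordinates), with their outputs combined by the rendering function above.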
Optionally, obtaining the sample point implicit scene illumination visibility feature vector of each sampling point includes:
obtaining the opacity α and the cumulative transparency T of the camera ray at the sampling points in the viewpoint direction using:
T_i = exp(-Σ_{j=1}^{i-1} σ_j δ_j)
α_i = 1 - exp(-σ_i δ_i)
where δ_i = t_{i+1} - t_i is the distance between adjacent sampling points and σ_i is the volume density in the viewpoint direction; the sampling point implicit scene illumination visibility feature vector ℓ^(a) of a sampling point is then obtained as
ℓ^(a) = Σ_i T_i α_i a
where a is the illumination visibility feature vector.
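These quantities can be computed per ray as sketched below; the aggregation of the per-sample feature a with the volume-rendering weights T_i·α_i follows the definitions in this section, while the padding of the last sampling interval is an implementation assumption.

```python
import torch

def accumulate_visibility_feature(sigma, t_vals, a):
    """Accumulate an illumination-visibility feature along one ray.

    sigma : (N,)   volume densities at the N sample points
    t_vals: (N,)   increasing distances of the samples along the ray
    a     : (N, F) per-sample illumination-visibility feature vectors
    Returns alpha, the cumulative transparency T, and the accumulated
    feature  l_a = sum_i T_i * alpha_i * a_i .
    """
    delta = t_vals[1:] - t_vals[:-1]                  # delta_i = t_{i+1} - t_i
    delta = torch.cat([delta, delta[-1:]], dim=0)     # pad the last interval
    alpha = 1.0 - torch.exp(-sigma * delta)           # alpha_i = 1 - exp(-sigma_i * delta_i)
    # T_i = exp(-sum_{j<i} sigma_j * delta_j)
    T = torch.exp(-torch.cumsum(
        torch.cat([torch.zeros(1), sigma[:-1] * delta[:-1]]), dim=0))
    weights = T * alpha                               # volume-rendering weights
    l_a = (weights.unsqueeze(-1) * a).sum(dim=0)      # (F,)
    return alpha, T, l_a
```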
Optionally, the sampling in the sparse three-dimensional point cloud model includes:
obtaining the near bound and far bound of the scene from the sparse three-dimensional point cloud model, and sampling uniformly between them to obtain the coarse sampling points;
inputting the coarse sampling points into the preset complete scene neural radiation field network model to obtain a probability density distribution function of the color;
performing fine sampling, according to this probability density distribution function, in the regions whose probability values exceed a preset threshold to obtain the fine sampling points;
and combining the coarse and fine sampling points to obtain the final sampling points.
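A minimal sketch of this coarse-to-fine procedure for a single ray is given below. The inverse-transform sampling shown here concentrates fine samples where the coarse pass assigns high probability mass, which is one common way to realize the thresholded fine sampling described above; the tensor shapes and sample counts are assumptions.

```python
import torch

def coarse_samples(near, far, n_coarse):
    """Stratified uniform samples between the near and far bounds."""
    bins = torch.linspace(near, far, n_coarse + 1)
    lower, upper = bins[:-1], bins[1:]
    return lower + (upper - lower) * torch.rand(n_coarse)

def fine_samples(t_coarse, weights, n_fine):
    """Inverse-transform sampling: draw more points where the coarse pass
    assigned high volume-rendering weight (probability mass)."""
    bins = t_coarse                                   # (N,) increasing
    w = weights[:-1] + 1e-5                           # one weight per interval
    pdf = w / w.sum()
    cdf = torch.cat([torch.zeros(1), torch.cumsum(pdf, dim=0)])   # (N,)
    u = torch.rand(n_fine)
    idx = torch.searchsorted(cdf, u, right=True).clamp(1, len(bins) - 1)
    lo, hi = bins[idx - 1], bins[idx]
    frac = (u - cdf[idx - 1]) / (cdf[idx] - cdf[idx - 1]).clamp(min=1e-5)
    return lo + frac * (hi - lo)

# final samples: sorted union of coarse and fine points, e.g.
# t_c = coarse_samples(2.0, 6.0, 64); t_f = fine_samples(t_c, w, 128)
# t_all, _ = torch.sort(torch.cat([t_c, t_f]))
```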
Optionally, when training the preset complete scene neural radiation field network model on the training data of the sampling points, the following residual loss function L is used:
L = Σ_{r∈R} [ ||Ĉ_c(r) - C(r)||² + ||Ĉ_f(r) - C(r)||² ]
where R denotes the set of rays sampled in each training batch, Ĉ_c(r) is the color estimate obtained by feeding the coarse sampling points into the complete scene neural radiation field network model, Ĉ_f(r) is the color estimate obtained from the fine sampling points, and C(r) is the actual color value of the sampling point.
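A direct transcription of this loss for a batch of rays, assuming the coarse and fine color estimates are already available as (R, 3) tensors; summing over the batch matches the formula, and averaging instead is a common alternative.

```python
import torch

def residual_loss(c_coarse, c_fine, c_gt):
    """L = sum_r ||C_c(r) - C(r)||^2 + ||C_f(r) - C(r)||^2 over a batch of rays.

    c_coarse, c_fine, c_gt : (R, 3) coarse estimates, fine estimates and
    ground-truth pixel colors for R rays.
    """
    return ((c_coarse - c_gt) ** 2).sum() + ((c_fine - c_gt) ** 2).sum()
```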
In a second aspect of the present invention, an unmanned aerial vehicle aerial image synthesis system includes:
the acquisition module is used for acquiring a plurality of two-dimensional images of the unmanned aerial vehicle;
the sampling module is used for carrying out three-dimensional reconstruction of a scene according to a plurality of two-dimensional images to obtain a sparse three-dimensional point cloud model of the scene, and sampling the sparse three-dimensional point cloud model to obtain a plurality of sampling points;
the training module is used for acquiring training data of all sampling points, and training a preset complete scene neural radiation field network model according to the training data of all the sampling points to obtain a trained complete scene neural radiation field network model; the training data comprises sampling point coordinates, sampling point viewpoint directions, sampling point implicit scene illumination visibility feature vectors and sampling point actual color values;
and the rendering module is used for inputting the set viewpoint direction into the trained complete scene neural radiation field network model, and obtaining a scene image in the set viewpoint direction through rendering.
In a third aspect of the present invention, a computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the steps of the unmanned aerial vehicle aerial image synthesis method are implemented when the processor executes the computer program.
In a fourth aspect of the present invention, a computer readable storage medium stores a computer program which, when executed by a processor, implements the steps of the unmanned aerial vehicle aerial image synthesis method described above.
Compared with the prior art, the invention has the following beneficial effects:
According to the unmanned aerial vehicle aerial image synthesis method of the invention, a plurality of two-dimensional aerial images taken by the unmanned aerial vehicle are acquired; three-dimensional reconstruction of the scene is performed from these images to obtain a sparse three-dimensional point cloud model of the scene; sampling is carried out in the sparse point cloud model to obtain a number of sampling points; training data for the sampling points are collected and used to train a preset complete scene neural radiation field network model; and finally the scene image in a set viewpoint direction is obtained through the trained complete scene neural radiation field network model. Based on the trained model, view rendering from any viewpoint can be achieved. Compared with building a three-dimensional model of the scene in advance, training the network takes less time than reconstructing the scene's three-dimensional model, scenes covering a large spatial extent can be synthesized, high-quality views can be rendered at arbitrary viewpoints, and a photo-realistic, high-quality new view can be obtained by rendering with the successfully pre-trained complete scene neural radiation field network model. In addition, the neural network can be trained offline and can render new viewpoints in real time, and the synthesis range of new viewpoints is not limited, so the method has good application prospects. Moreover, reconstructing a large-scale scene by neural rendering can be accelerated on GPU hardware, which makes the method well suited to large-scale scene reconstruction with large data volumes.
Drawings
FIG. 1 is a flowchart of an aerial image synthesis method of an unmanned aerial vehicle according to an embodiment of the present invention;
FIG. 2 is a detailed flowchart of an aerial image synthesizing method of the unmanned aerial vehicle according to the embodiment of the invention;
FIG. 3 is a schematic diagram of an implicit scene illumination visibility feature vector network framework in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of an overall framework of a full scene neural radiation field network model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the principle of the inverse sphere parameterization of the segmentation foreground and background according to the embodiment of the present invention;
fig. 6 is a schematic diagram of a scene reconstruction result at a certain view angle generated by a method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention is described in further detail below with reference to the attached drawing figures:
In the field of large-scale, high-quality synthesis of unmanned aerial vehicle aerial maps, traditional three-dimensional reconstruction methods need to model the entire large-scale scene, which is time-consuming. The accuracy of the synthesized three-dimensional model depends on the number of input views: the more images are acquired, the more accurate the built model. However, the increased data volume makes three-dimensional reconstruction of the scene slow, and the reconstructed model cannot synthesize images over an arbitrary range of viewpoints but only virtual viewpoint images at fixed viewing angles, which is a serious limitation.
Likewise, synthesizing new views of a scene from a sparse set of captured images is a long-standing problem in computer vision and a prerequisite for many AR and VR applications. While classical techniques have addressed this problem using structure-from-motion-based or image-based rendering, significant progress has recently been made through neural rendering, which adds neural network modules that learn 3D geometry and, once trained, can reconstruct the observed images. Neural radiation field (NeRF) methods model the radiation field and density of a scene with the weights of a neural network; new views are then synthesized with volume rendering, showing unprecedented fidelity on a range of challenging scenes.
In recent years, because neural rendering is based on the principle of volume rendering, an implicit neural network function can be used to fit and reconstruct a scene, enabling faster and higher-quality image rendering; implicit neural reconstruction has therefore become a research hotspot. With the rapid development of deep learning, many deep-learning-based methods have been proposed that further improve the accuracy and realism of viewpoint synthesis in a data-driven way. Large-scale, high-quality live-action map synthesis based on neural radiation fields trains a neural radiation field network to reconstruct the scene; the trained network can then render views from any viewpoint. Compared with building a three-dimensional model of the scene in advance, training the network takes less time than reconstructing the scene model, and a photo-realistic, high-quality new view can be obtained by rendering with the successfully pre-trained network. Because the neural network can be trained offline, new viewpoints can be rendered in real time and their synthesis range is not limited, giving the approach a good application prospect.
Therefore, the invention discloses an unmanned aerial vehicle aerial image synthesis method that reconstructs dense scene views from a sparse two-dimensional image sequence obtained by unmanned aerial vehicle aerial photography. Because a large-scale scene must be reconstructed, reconstructing all scene content with a single neural radiation field blurs the details of scenes with a large depth range, so the foreground and background in the image are modeled separately. Considering that the illumination can change during unmanned aerial vehicle aerial photography, implicit illumination visibility vector codes are computed for the captured static images so that illumination-variable foreground and background neural radiation field models can be synthesized; the two models are then spliced to obtain the neural radiation field model of the reconstructed scene, and neural rendering is used to achieve large-scale, high-quality synthesis of unmanned aerial vehicle aerial map images.
The neural rendering method based on the neural radiation field converts the scene, which would otherwise be displayed through three-dimensional modeling, into an implicit function that simulates real imaging; the color and density along the viewing direction are estimated and rendered, giving a reconstruction that closely matches the original pictures. In this way, the trained model enables large-scale, illumination-variable, high-quality image synthesis from any viewpoint.
Referring to fig. 1 and 2, in an embodiment of the present invention, an unmanned aerial vehicle aerial image synthesis method is provided, which is suitable for large-scale high-quality unmanned aerial vehicle aerial image synthesis, wherein unmanned aerial vehicle aerial rendering images are generated by computer simulation. In this embodiment, the unmanned aerial vehicle aerial image synthesis method specifically includes the following steps:
s1: and acquiring a plurality of two-dimensional images of the unmanned aerial vehicle.
The acquisition of the plurality of two-dimensional images of the unmanned aerial vehicle includes: acquiring a plurality of two-dimensional images shot by the unmanned aerial vehicle at a fixed height from different positions and different viewing angles, wherein the overlap between adjacently shot images is more than 80%. The acquired images generally form a two-dimensional image sequence {I_i}, i = 1, ..., N, where N is the number of two-dimensional images. Specifically, a scene covering a large spatial extent is photographed by the unmanned aerial vehicle; the flight path is planned in advance so that the vehicle flies at a fixed height, the overlap of adjacent images is more than 80%, and the camera tilt angle is 45 degrees. The two-dimensional images taken by the unmanned aerial vehicle at N different positions and viewing angles form the two-dimensional image sequence {I_i}, i = 1, ..., N.
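For illustration only, the 80% forward-overlap requirement can be turned into a shot-spacing calculation; the camera field of view, the nadir-footprint simplification (the 45-degree tilt is ignored) and the numbers below are assumptions, not values from the patent.

```python
import math

def shot_spacing(height_m, fov_deg, overlap=0.8):
    """Along-track spacing between consecutive exposures that yields the
    requested forward overlap for a nadir-looking camera at a fixed height."""
    footprint = 2.0 * height_m * math.tan(math.radians(fov_deg) / 2.0)
    return (1.0 - overlap) * footprint

# a hypothetical 60-degree FOV camera at 100 m altitude: ~23.1 m between shots
print(round(shot_spacing(100.0, 60.0), 1))
```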
S2: and carrying out three-dimensional reconstruction of the scene according to the plurality of two-dimensional images to obtain a sparse three-dimensional point cloud model of the scene, and sampling in the sparse three-dimensional point cloud model to obtain a plurality of sampling points.
The three-dimensional reconstruction of the scene from the plurality of two-dimensional images includes: performing the three-dimensional reconstruction with the COLMAP image reconstruction method based on Structure from Motion.
Specifically, the two-dimensional image sequence obtained by unmanned aerial vehicle aerial photography is preprocessed, and the COLMAP reconstruction program based on Structure-from-Motion technology is used to perform feature extraction, feature matching and sparse reconstruction on the image sequence, yielding the intrinsic and extrinsic camera parameters, the sparse 3D points, and the near/far plane parameters, where the near/far plane parameters comprise the near and far bounds of the scene. Feature extraction extracts feature points from each two-dimensional image: the SIFT algorithm detects the positions of the feature points and computes their feature vectors, giving the feature point information of each image. Sparse 3D point cloud reconstruction is then carried out from these results: feature matching between images is performed using the extracted feature point information; the intrinsic and extrinsic camera parameters of each image are computed; matching between images is performed according to the camera parameters; geometric verification reduces the number of wrongly matched point pairs; the fundamental matrix and essential matrix are then computed using the epipolar geometric constraint between two views, and singular value decomposition of the essential matrix yields the rotation and translation between images; the three-dimensional coordinates of the feature points are recovered by triangulation; and bundle adjustment jointly optimizes all feature points and the image rotations and translations, producing the sparse three-dimensional point cloud model of the scene, from which the near and far bounds of the scene are obtained.
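The COLMAP stage described above can be driven from a script such as the following; the directory layout and the choice of exhaustive matching are assumptions (sequential matching is often used for ordered aerial sequences).

```python
import os
import subprocess

def run_sfm(image_dir="images", workspace="colmap_ws"):
    """Run COLMAP's sparse-reconstruction pipeline: feature extraction,
    matching, and mapping. Paths are placeholders."""
    db = os.path.join(workspace, "database.db")
    sparse = os.path.join(workspace, "sparse")
    os.makedirs(sparse, exist_ok=True)
    for cmd in (
        ["colmap", "feature_extractor", "--database_path", db,
         "--image_path", image_dir],                         # SIFT keypoints + descriptors
        ["colmap", "exhaustive_matcher", "--database_path", db],
        ["colmap", "mapper", "--database_path", db,
         "--image_path", image_dir, "--output_path", sparse],  # poses + sparse 3D points
    ):
        subprocess.run(cmd, check=True)
```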
Sampling is then performed in the sparse three-dimensional point cloud model and typically comprises coarse sampling and fine sampling. Specifically, the near and far bounds of the scene are obtained from the sparse three-dimensional point cloud model, and uniform sampling between these bounds gives the coarse sampling points; the coarse sampling points are fed into the preset complete scene neural radiation field network model to obtain a probability density distribution function of the color; fine sampling is then performed in the regions whose probability values exceed a preset threshold, i.e. more sampling points are placed where the probability is large, giving the fine sampling points; finally the coarse and fine sampling points are combined to give the final sampling points.
S3: acquiring training data of all sampling points, and training a preset complete scene neural radiation field network model according to the training data of all sampling points to obtain a trained complete scene neural radiation field network model; the training data comprises sampling point coordinates, sampling point viewpoint directions, sampling point implicit scene illumination visibility feature vectors and sampling point actual color values.
Specifically, the sampling point coordinates and the sampling point viewpoint directions can both be obtained from the sparse three-dimensional point cloud model, and the sampling point implicit scene illumination visibility feature vector is obtained as follows.
The opacity α and the cumulative transparency T of the camera ray at the sampling points in the viewpoint direction are obtained using:
T_i = exp(-Σ_{j=1}^{i-1} σ_j δ_j)
α_i = 1 - exp(-σ_i δ_i)
where the index i is the sampling point number.
The sampling point implicit scene illumination visibility feature vector ℓ^(a) of a sampling point is then obtained as
ℓ^(a) = Σ_i T_i α_i a
where a is the illumination visibility feature vector.
Specifically, the image dimension of a two-dimensional image is H×W×3, and the implicit scene illumination visibility feature vector of a sampling point is denoted ℓ^(a). For each two-dimensional image, the line connecting the optical center and a pixel is a ray; n rays are selected arbitrarily and each ray is sampled uniformly, and the opacity α and the cumulative transparency T of the camera ray along the direction of viewpoint p are computed using:
T_i = exp(-Σ_{j=1}^{i-1} σ_j δ_j)
α_i = 1 - exp(-σ_i δ_i)
where δ_i = t_{i+1} - t_i is the distance between adjacent sampling points and σ_i is the volume density in the direction of viewpoint p.
The visibility feature vector of the camera ray along the direction of viewpoint p is then calculated as
ℓ^(a) = Σ_i T_i α_i a
The two-dimensional image sequence {I_i}, i = 1, ..., N is input to the implicit scene illumination visibility feature vector network, which is trained with the following loss:
L' = Σ_r || l̂(r) - l(r) ||²
where L' is the prediction error, l̂(r) is the illumination visibility feature estimate computed by the network, and l(r) is the actual illumination visibility.
Referring to fig. 3, the implicit scene illumination visibility feature vector network is a fully connected neural network whose inputs are the position coordinates and the viewpoint direction. It is trained with the Adam optimization method; through the back-propagation algorithm, the network weights are updated and the set of implicit scene illumination visibility feature vectors is obtained. The trained network yields the final implicit scene illumination visibility feature vector codes ℓ_i^(a) of all two-dimensional images. When training the implicit scene illumination visibility feature vector network, the ray directions are sampled from the pixel positions and the camera positions, and the sampling is uniform. The resulting implicit scene illumination visibility feature vector code ℓ_i^(a) of each two-dimensional image is concatenated with the other inputs and fed into the preset complete scene neural radiation field network model for training. For example, for the two-dimensional image sequence {I_i}, i = 1, ..., N, the line connecting the optical center and a pixel is a ray; n rays are selected arbitrarily, 16 points are sampled uniformly on each ray, and the position encodings and viewpoint direction encodings of the sampled points are input to the implicit scene illumination visibility feature vector network, giving the N implicit scene illumination visibility feature vectors ℓ_i^(a) corresponding to the image sequence {I_i}; these are then input to the complete scene neural radiation field network model to obtain the corresponding color values.
The above steps encode, for each pixel's ray in images taken from different viewing directions, the differing illumination visibility information as feature vectors and obtain the global illumination visibility information of the scene, which is then input into the neural radiation field network to synthesize an illumination-variable neural radiation field.
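A compact sketch of such an illumination-visibility feature network and one Adam training step is shown below; the hidden widths, encoding dimensions, feature size and the exact supervision target are assumptions consistent with the description above.

```python
import torch
import torch.nn as nn

class VisibilityFeatureNet(nn.Module):
    """Fully connected network mapping an encoded sample position and view
    direction to a per-point illumination-visibility feature plus a scalar
    visibility used for supervision."""
    def __init__(self, pos_dim=63, dir_dim=27, feat_dim=48, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pos_dim + dir_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim + 1),
        )

    def forward(self, x_enc, d_enc):
        out = self.net(torch.cat([x_enc, d_enc], dim=-1))
        return out[..., :-1], torch.sigmoid(out[..., -1:])   # feature a, visibility

model = VisibilityFeatureNet()
opt = torch.optim.Adam(model.parameters(), lr=5e-4)

def train_step(x_enc, d_enc, l_target):
    """One Adam update on a batch of encoded samples with target visibility."""
    feat, l_hat = model(x_enc, d_enc)
    loss = ((l_hat - l_target) ** 2).sum()   # L' = sum ||l_hat(r) - l(r)||^2
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```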
Referring to fig. 4, the complete scene neural radiation field network model includes a foreground neural radiation field network and a background neural radiation field network. The foreground neural radiation field network comprises a foreground volume density synthesis network and a foreground color synthesis network. The foreground volume density synthesis network is:
σ(t), z(t) = MLP_σ^fg(γ_x(r(t)))
The foreground color synthesis network is:
ĉ_i(t) = MLP_c^fg(z(t), γ_d(d), ℓ_i^(a))
where σ(t) is the volume density function; z(t) is the feature vector associated with the position encoding; MLP_σ^fg is the volume density synthesis network; γ_x is the position encoding; ĉ_i(t) is the foreground image color estimated by the foreground color synthesis network; r(t) = o + t·d is the ray emitted from the ray origin; t ∈ (0, t') covers the scene boundary of the inner unit sphere; T(t) is the cumulative transparency of the camera ray along the viewpoint direction, computed as T(t) = exp(-∫_0^t σ(r(s)) ds); t is the distance along ray r from the ray origin; c_i(t) is the radiance; MLP_c^fg is the foreground color synthesis network; γ_d(d) is the view-direction encoding; ℓ_i^(a) is the implicit illumination visibility feature vector code of the i-th image; and a is the illumination visibility feature vector.
The background neural radiation field network comprises a background volume density synthesis network and a background color synthesis network. The background volume density synthesis network is:
σ(t), z(t) = MLP_σ^bg(γ_x(x', y', z', 1/r))
The background color synthesis network is:
ĉ_i(t) = MLP_c^bg(z(t), γ_d(d), ℓ_i^(a))
where MLP_σ^bg is the background volume density synthesis network, ĉ_i(t) here is the background image color estimated by the background color synthesis network, and t ∈ (t', ∞) covers the region outside the unit sphere.
The rendering function of the complete scene neural radiation field network model is:
C_i(r) = ∫_0^{t'} σ(r(t)) c_i(r(t)) T(t) dt + T(t') ∫_{t'}^{∞} σ(r(t)) c_i(r(t)) T(t) dt
where C_i(r) is the synthesized color value of the complete scene neural radiation field network model; the first integral (i) is the foreground neural radiation field color contribution, T(t') is the synthesis coefficient (ii), and the second integral (iii) is the background neural radiation field color contribution.
In this embodiment, the preset complete scene neural radiation field network model is constructed as follows. Parts of the scene with different depth values are modeled separately so that, when handling parts with larger depth values, their details are better preserved and a high-quality new view can be synthesized. The scene space is therefore divided into two parts: an inner unit sphere containing the foreground of the scene and an outer volume containing the background of the scene. The inner unit sphere is modeled by the foreground neural radiation field and needs no additional parameterization; the outer volume is modeled by the background neural radiation field and requires the inverted sphere parameterization. Referring to fig. 5, the inverted sphere parameterization proceeds as follows: the scene is represented with respect to a unit sphere S, and the inverted sphere parameterization separates the foreground and background of the scene. A position point in the scene foreground is represented as (x, y, z), and a position point in the background is represented as
(x', y', z', 1/r), with x'² + y'² + z'² = 1
where r' is the radius of the unit sphere S (r' = 1) and r > r' is the distance of the point from the sphere center.
The position encodings and direction encodings of the sampling points on rays inside the unit sphere are input directly into the foreground radiation field, and the modeling of the foreground neural radiation field can be expressed as the function MLP_1(x, d) = (c, σ), where MLP denotes a fully connected network, x is a three-dimensional spatial coordinate, d is a two-dimensional viewing direction, c is the three-channel color output at position x, and σ is the volume density at position x (a volume density of 0 indicates unoccupied space and 1 indicates an object surface). The network structure comprises two MLP networks, one a volume density synthesis network and the other a color synthesis network. The foreground neural radiation field is then used for view rendering of a new view: given the camera rays r of a known view, whose number is determined by the number of pixels of the view, a ray-casting algorithm computes the color estimate of the rays projected onto the new view using:
C(r) = ∫_0^{t'} T(t) σ(r(t)) c(r(t)) dt
The position information of the sampling points on the rays after the inverted sphere parameterization preprocessing, i.e. the sampling point coordinates and viewpoint directions, is input into the background radiation field network, a fully connected neural network used to process the background part of the scene captured by the camera. Because the background depth range is large compared with the foreground, feeding it into the foreground radiation module would give coarse details, so background parts with larger depth values are input into the background radiation field; this network likewise comprises two MLP networks, one a volume density synthesis network and the other a color synthesis network. Specifically, the background neural radiation field modeling function is similar to the foreground one, MLP_2(x', d) = (c, σ). The difference is that a position point (x, y, z) on a ray outside the unit sphere is re-parameterized as a quadruple (x', y', z', 1/r) with x'² + y'² + z'² = 1, where (x', y', z') is the unit vector in the same direction as (x, y, z) and 1/r (0 < 1/r < 1) is the inverse radius along this direction, identifying the point r·(x', y', z') outside the sphere. The re-parameterized quadruple is bounded, with (x', y', z') ∈ [-1, 1] and 1/r ∈ [0, 1]. The quadruple (x', y', z', 1/r) obtained by re-parameterization represents the points of the background part of the scene; the color and volume density at the corresponding pixel positions of the new view are then computed by the background neural radiation field, and the new-view rendering function for the background is:
C(r) = ∫_{t'}^{∞} T(t) σ(r(t)) c(r(t)) dt
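A small helper illustrating the inverted sphere re-parameterization of a background point, as a sketch:

```python
import numpy as np

def invert_sphere(p):
    """Re-parameterize a point outside the unit sphere as (x', y', z', 1/r),
    where (x', y', z') is the unit vector along p and 1/r lies in (0, 1)."""
    r = np.linalg.norm(p)
    assert r > 1.0, "inverted-sphere coordinates are only used for background points"
    unit = p / r
    return np.concatenate([unit, [1.0 / r]])

# a hypothetical background point 5 units from the scene center
print(invert_sphere(np.array([3.0, 0.0, 4.0])))   # -> [0.6, 0.0, 0.8, 0.2]
```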
The obtained implicit scene illumination visibility feature vector ℓ_i^(a) is fed into the corresponding network branches, so that illumination-variable foreground and background neural radiation fields are obtained, together with the static three-dimensional scene geometry shared by all images. The color synthesis network of the illumination-variable foreground neural radiation field is then:
ĉ_i(t) = MLP_c^fg(z(t), γ_d(d), ℓ_i^(a)),  t ∈ (0, t')
and the color synthesis network of the illumination-variable background neural radiation field is:
ĉ_i(t) = MLP_c^bg(z(t), γ_d(d), ℓ_i^(a)),  t ∈ (t', ∞)
The neural radiation field then uses volume rendering to fuse the foreground and background into the complete scene. Specifically, the constructed foreground and background neural radiation fields are spliced to obtain the neural radiation field model of the whole scene, i.e. the complete scene neural radiation field network model, whose rendering function is:
C_i(r) = ∫_0^{t'} σ(r(t)) c_i(r(t)) T(t) dt + T(t') ∫_{t'}^{∞} σ(r(t)) c_i(r(t)) T(t) dt
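Given per-ray quantities from the two branches, the splicing of foreground and background can be sketched as follows; the tensor shapes are assumptions.

```python
import torch

def composite_ray(w_fg, c_fg, T_boundary, w_bg, c_bg):
    """Blend one ray's foreground and background contributions:
        C(r) = sum_i w_fg_i * c_fg_i + T(t') * sum_j w_bg_j * c_bg_j
    w_fg, w_bg : (N,) volume-rendering weights T_i * alpha_i inside / outside
                 the unit sphere
    c_fg, c_bg : (N, 3) per-sample colors
    T_boundary : transmittance accumulated up to the sphere boundary t'."""
    fg = (w_fg.unsqueeze(-1) * c_fg).sum(dim=0)
    bg = (w_bg.unsqueeze(-1) * c_bg).sum(dim=0)
    return fg + T_boundary * bg
```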
When the preset complete scene neural radiation field network model is trained on the training data of the sampling points, the residual loss function L is adopted:
L = Σ_{r∈R} [ ||Ĉ_c(r) - C(r)||² + ||Ĉ_f(r) - C(r)||² ]
where R denotes the set of rays sampled in each training batch, Ĉ_c(r) is the color estimate obtained by feeding the coarse sampling points into the complete scene neural radiation field network model, Ĉ_f(r) is the color estimate obtained from the fine sampling points, and C(r) is the actual color value of the sampling point.
Specifically, the complete scene neural radiation field network model is trained by back-propagation, comprising the following steps: (1) normalizing the training data; (2) inputting the training data into the complete scene neural radiation field network model and computing the model's output; (3) computing the error between the actual output and the expected output of the model and adjusting the parameters of each layer in reverse according to the Adam optimization method; (4) repeating steps (2) and (3) until all training data have been input; (5) computing the accumulated total error between the actual and expected outputs over all training data and incrementing the training count by 1; training ends if the total error is smaller than the set total error or the training count exceeds the set number of iterations.
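The five training steps can be sketched as a loop like the one below; the data-loader interface, the model's forward signature, the learning rate and the stopping tolerance are assumptions.

```python
import torch

def train(model, loader, max_epochs=30, tol=1e-3, lr=5e-4, device="cuda"):
    """Back-propagation training loop: normalized batches are fed through the
    model, the residual loss is back-propagated with Adam, and training stops
    once the accumulated epoch error drops below a tolerance or the epoch
    budget is exhausted."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(max_epochs):
        total = 0.0
        for batch in loader:                       # batches of rays and pixel colors
            rays = batch["rays"].to(device)
            c_gt = batch["rgb"].to(device)
            c_coarse, c_fine = model(rays)         # coarse and fine color estimates
            loss = ((c_coarse - c_gt) ** 2).sum() + ((c_fine - c_gt) ** 2).sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
            total += loss.item()
        if total < tol:                            # early stop on accumulated error
            break
    return model
```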
S4: inputting the set viewpoint direction into a trained complete scene neural radiation field network model, and rendering to obtain a scene image in the set viewpoint direction.
Specifically, the set viewpoint direction is input into the trained complete scene neural radiation field network model to obtain the scene image in that viewpoint direction. Dense view generation from the sparse input views is thus realized by the neural network, finally giving a large-scale, high-quality unmanned aerial vehicle aerial scene image at any viewpoint.
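Rendering a set viewpoint starts from one camera ray per pixel; a common way to generate these rays from a camera-to-world pose is sketched below (the pinhole-camera convention used here is an assumption).

```python
import torch

def get_rays(H, W, focal, c2w):
    """Generate one camera ray per pixel for a pinhole camera with a 3x4
    camera-to-world pose c2w; returns ray origins and directions."""
    j, i = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    dirs = torch.stack([(i - W * 0.5) / focal,
                        -(j - H * 0.5) / focal,
                        -torch.ones_like(i)], dim=-1)            # camera-space directions
    rays_d = (dirs[..., None, :] * c2w[:3, :3]).sum(dim=-1)      # rotate into world space
    rays_o = c2w[:3, 3].expand(rays_d.shape)                     # shared camera origin
    return rays_o, rays_d
```

Each pixel's color is then obtained by sampling along its ray and evaluating the trained complete scene neural radiation field network model.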
In summary, according to the unmanned aerial vehicle aerial image synthesis method, a plurality of two-dimensional aerial images taken by the unmanned aerial vehicle are acquired; three-dimensional reconstruction of the scene is performed from these images to obtain a sparse three-dimensional point cloud model; sampling in this model yields a number of sampling points; training data for the sampling points are collected and used to train the preset complete scene neural radiation field network model; and the scene image in a set viewpoint direction is finally obtained from the trained model. Based on the trained model, views can be rendered from any viewpoint; compared with building a three-dimensional scene model in advance, training the network is faster than reconstructing the scene model, scenes covering a large spatial extent can be synthesized, and photo-realistic, high-quality new views can be rendered with the pre-trained model. In addition, the neural network can be trained offline, new viewpoints can be rendered in real time, and their synthesis range is not limited, so the method has good application prospects. Moreover, large-scale scene reconstruction by neural rendering can be accelerated on GPU hardware, making the method well suited to large-scale reconstruction with large data volumes.
Referring to fig. 6, in yet another embodiment of the present invention, aerial images of a turret captured outdoors by an unmanned aerial vehicle are input into the network provided by the invention for training, giving a complete reconstruction model of the turret scene; the trained neural network then renders the view of the turret at the acquired camera position shown in the figure.
It can be seen that, under sparse input and compared with the original neural radiation field network, the scene reconstruction obtained by training the neural network provided by the invention improves the reconstruction of foreground and background parts with larger depth values, owing to the separate modeling of the foreground and background of the large-scale scene.
The following are device embodiments of the present invention that may be used to perform method embodiments of the present invention. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the present invention.
In still another embodiment of the present invention, an unmanned aerial vehicle aerial image synthesis system is provided, which can be used to implement the above unmanned aerial vehicle aerial image synthesis method, and specifically, the unmanned aerial vehicle aerial image synthesis system includes an acquisition module, a sampling module, a training module, and a rendering module.
The acquisition module is used for acquiring a plurality of two-dimensional images of the unmanned aerial vehicle; the sampling module is used for carrying out three-dimensional reconstruction of a scene according to a plurality of two-dimensional images to obtain a sparse three-dimensional point cloud model of the scene, and sampling is carried out in the sparse three-dimensional point cloud model to obtain a plurality of sampling points; the training module is used for acquiring training data of all sampling points, and training a preset complete scene neural radiation field network model according to the training data of all the sampling points to obtain a trained complete scene neural radiation field network model; the training data comprises sampling point coordinates, sampling point viewpoint directions, sampling point implicit scene illumination visibility feature vectors and sampling point actual color values; the rendering module is used for inputting the set viewpoint direction into the trained complete scene neural radiation field network model, and obtaining a scene image in the set viewpoint direction through rendering.
In one possible implementation, the acquiring a number of two-dimensional images of the unmanned aerial vehicle comprises: and acquiring a plurality of two-dimensional images shot by the unmanned aerial vehicle at fixed heights, different positions and different visual angles, wherein the overlapping degree of the two-dimensional images shot adjacently is more than 80%.
In a possible implementation manner, the three-dimensional reconstruction of the scene from the plurality of two-dimensional images includes: performing the three-dimensional reconstruction with the COLMAP image reconstruction method based on Structure from Motion.
In one possible implementation, the complete scene neural radiation field network model includes a foreground neural radiation field network and a background neural radiation field network. The foreground neural radiation field network comprises a foreground volume density synthesis network and a foreground color synthesis network. The foreground volume density synthesis network is:
σ(t), z(t) = MLP_σ^fg(γ_x(r(t)))
The foreground color synthesis network is:
ĉ_i(t) = MLP_c^fg(z(t), γ_d(d), ℓ_i^(a))
where σ(t) is the volume density function; z(t) is the feature vector associated with the position encoding; MLP_σ^fg is the volume density synthesis network; γ_x is the position encoding; ĉ_i(t) is the foreground image color estimated by the foreground color synthesis network; r(t) = o + t·d is the ray emitted from the ray origin; t ∈ (0, t') covers the scene boundary of the inner unit sphere; T(t) = exp(-∫_0^t σ(r(s)) ds) is the cumulative transparency of the camera ray along the viewpoint direction; t is the distance along ray r from the ray origin; c_i(t) is the radiance; MLP_c^fg is the foreground color synthesis network; γ_d(d) is the view-direction encoding; ℓ_i^(a) is the implicit illumination visibility feature vector code of the i-th image; and a is the illumination visibility feature vector. The background neural radiation field network comprises a background volume density synthesis network and a background color synthesis network. The background volume density synthesis network is:
σ(t), z(t) = MLP_σ^bg(γ_x(x', y', z', 1/r))
The background color synthesis network is:
ĉ_i(t) = MLP_c^bg(z(t), γ_d(d), ℓ_i^(a))
where MLP_σ^bg is the background volume density synthesis network, ĉ_i(t) here is the background image color estimated by the background color synthesis network, and t ∈ (t', ∞) covers the region outside the unit sphere. The rendering function of the complete scene neural radiation field network model is:
C_i(r) = ∫_0^{t'} σ(r(t)) c_i(r(t)) T(t) dt + T(t') ∫_{t'}^{∞} σ(r(t)) c_i(r(t)) T(t) dt
where C_i(r) is the synthesized color value of the complete scene neural radiation field network model; the first integral (i) is the foreground neural radiation field color contribution, T(t') is the synthesis coefficient (ii), and the second integral (iii) is the background neural radiation field color contribution.
In one possible implementation, obtaining the sampling point implicit scene illumination visibility feature vector for each sampling point includes: obtaining the opacity α and the cumulative transparency T of the camera ray at the sampling point in the viewpoint direction using:

T_i = ∏_{j=1}^{i−1} (1 − α_j)

α_i = 1 − exp(−σ_i·δ_i)

wherein δ_i = t_{i+1} − t_i represents the distance between adjacent sampling points, and σ_i represents the volume density in the viewpoint direction; the sampling point implicit scene illumination visibility feature vector ℓ_i^{(a)} is then obtained from the opacity α, the cumulative transparency T and the illumination visibility feature vector a according to the formula of Figure BDA00039708864900001910, where a is the illumination visibility feature vector.
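A small NumPy sketch of the discrete opacity and cumulative-transparency computation for one ray follows; the density values and sample depths are invented purely to exercise the formulas.

```python
import numpy as np

# Sketch for one camera ray: alpha_i = 1 - exp(-sigma_i * delta_i) and
# T_i = prod_{j<i} (1 - alpha_j). The sigma values below are made up.

def opacity_and_transmittance(sigma: np.ndarray, t: np.ndarray):
    delta = t[1:] - t[:-1]                      # delta_i = t_{i+1} - t_i
    sigma = sigma[:-1]                          # densities at the kept samples
    alpha = 1.0 - np.exp(-sigma * delta)        # opacity of each interval
    # cumulative transparency before sample i (exclusive product)
    T = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    return alpha, T

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 6)                # sample depths along the ray
    sigma = np.array([0.5, 2.0, 4.0, 1.0, 0.2, 0.0])
    alpha, T = opacity_and_transmittance(sigma, t)
    weights = T * alpha                          # per-sample visibility weights
    print(alpha, T, weights)
```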
In one possible implementation manner, the sampling in the sparse three-dimensional point cloud model includes: obtaining a near boundary and a far boundary of the scene according to the sparse three-dimensional point cloud model, and uniformly sampling between the near boundary and the far boundary to obtain the sampling points of coarse sampling; inputting the sampling points of the coarse sampling into the preset complete scene neural radiation field network model to obtain a probability density distribution function of the color; according to the probability density distribution function of the color, performing fine sampling in regions where the probability value is larger than a preset threshold value to obtain the sampling points of fine sampling; and combining the sampling points of the coarse sampling and the fine sampling to obtain the final sampling points.
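One way to realize this coarse-to-fine scheme in code is sketched below. The per-interval probability values are stand-ins (the real values would come from the complete scene neural radiation field network model), and the function name and bin-wise refinement granularity are illustrative assumptions.

```python
import numpy as np

# Coarse samples are placed uniformly between the near and far boundaries;
# intervals whose probability exceeds a threshold receive extra fine samples;
# coarse and fine samples are then merged and sorted.

def coarse_to_fine_samples(near, far, n_coarse, probs, threshold, n_fine_per_bin, rng):
    t_coarse = np.linspace(near, far, n_coarse)              # uniform coarse samples
    edges = np.stack([t_coarse[:-1], t_coarse[1:]], axis=1)  # coarse intervals
    t_fine = []
    for (lo, hi), p in zip(edges, probs):
        if p > threshold:                                     # refine likely-surface bins
            t_fine.append(rng.uniform(lo, hi, n_fine_per_bin))
    t_fine = np.concatenate(t_fine) if t_fine else np.empty(0)
    return np.sort(np.concatenate([t_coarse, t_fine]))        # merged final samples

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = np.array([0.01, 0.02, 0.40, 0.35, 0.15, 0.05, 0.01, 0.01, 0.0])  # stand-in
    samples = coarse_to_fine_samples(2.0, 6.0, 10, probs, threshold=0.1,
                                     n_fine_per_bin=8, rng=rng)
    print(samples.shape, samples.min(), samples.max())
```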
In one possible implementation manner, when the preset complete scene neural radiation field network model is trained according to the training data of each sampling point, the training is performed by adopting a residual loss function L of the following formula:
L = Σ_{r∈R} [ ‖Ĉ_c(r) − C(r)‖₂² + ‖Ĉ_f(r) − C(r)‖₂² ]

wherein R represents the rays sampled in each training batch, Ĉ_c(r) is the color estimate obtained by inputting the sampling points of the coarse sampling into the complete scene neural radiation field network model, Ĉ_f(r) is the color estimate obtained by inputting the sampling points of the fine sampling into the complete scene neural radiation field network model, and C(r) represents the actual color value of the sampling point.
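As a sanity check on the loss definition, the following PyTorch sketch evaluates the coarse-plus-fine residual loss on random stand-in tensors; the tensor shapes and values are assumptions for illustration only.

```python
import torch

# Residual loss over a batch of rays: both the coarse and the fine color
# estimates are pulled toward the observed pixel color.

def residual_loss(c_coarse: torch.Tensor, c_fine: torch.Tensor, c_gt: torch.Tensor) -> torch.Tensor:
    # sum over rays of ||C_c(r) - C(r)||^2 + ||C_f(r) - C(r)||^2
    return ((c_coarse - c_gt) ** 2).sum(dim=-1).sum() + ((c_fine - c_gt) ** 2).sum(dim=-1).sum()

if __name__ == "__main__":
    n_rays = 1024
    c_gt = torch.rand(n_rays, 3)
    c_coarse = torch.rand(n_rays, 3, requires_grad=True)
    c_fine = torch.rand(n_rays, 3, requires_grad=True)
    loss = residual_loss(c_coarse, c_fine, c_gt)
    loss.backward()                               # gradients flow to both estimates
    print(float(loss))
```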
For all relevant details of each step involved in the foregoing embodiment of the unmanned aerial vehicle aerial image synthesis method, reference may be made to the functional description of the corresponding functional module of the unmanned aerial vehicle aerial image synthesis system in the embodiment of the present invention, and details are not repeated herein.
The division of the modules in the embodiments of the present invention is schematic and is only one logical function division; there may be other division manners in actual implementation. In addition, each functional module in each embodiment of the present invention may be integrated in one processor, may exist separately and physically, or two or more modules may be integrated in one module. The integrated modules may be implemented in the form of hardware or in the form of software functional modules.
In yet another embodiment of the present invention, a computer device is provided, comprising a processor and a memory, the memory being configured to store a computer program comprising program instructions, and the processor being configured to execute the program instructions stored in the computer storage medium. The processor may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor is the computational core and control core of the terminal, is adapted to implement one or more instructions, and is specifically adapted to load and execute one or more instructions in a computer storage medium to implement the corresponding method flow or corresponding functions; the processor provided by the embodiment of the invention can be used to perform the operations of the unmanned aerial vehicle aerial image synthesis method.
In yet another embodiment of the present invention, a storage medium, specifically a computer readable storage medium (Memory), is provided; the computer readable storage medium is a memory device in a computer device and is used for storing programs and data. It can be understood that the computer readable storage medium herein may include both a built-in storage medium of the computer device and an extended storage medium supported by the computer device. The computer readable storage medium provides a storage space in which the operating system of the terminal is stored. One or more instructions, which may be one or more computer programs (including program code), are also stored in the storage space and are adapted to be loaded and executed by the processor. The computer readable storage medium herein may be a high-speed RAM memory or a non-volatile memory, for example at least one magnetic disk memory. The one or more instructions stored in the computer readable storage medium may be loaded and executed by the processor to implement the corresponding steps of the unmanned aerial vehicle aerial image synthesis method in the above-described embodiments.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (10)

1. The unmanned aerial vehicle aerial image synthesis method is characterized by comprising the following steps of:
acquiring a plurality of two-dimensional images captured by the unmanned aerial vehicle;
performing three-dimensional reconstruction of a scene according to a plurality of two-dimensional images to obtain a sparse three-dimensional point cloud model of the scene, and sampling in the sparse three-dimensional point cloud model to obtain a plurality of sampling points;
acquiring training data of all sampling points, and training a preset complete scene neural radiation field network model according to the training data of all sampling points to obtain a trained complete scene neural radiation field network model; the training data comprises sampling point coordinates, sampling point viewpoint directions, sampling point implicit scene illumination visibility feature vectors and sampling point actual color values;
inputting the set viewpoint direction into a trained complete scene neural radiation field network model, and rendering to obtain a scene image in the set viewpoint direction.
2. The unmanned aerial vehicle aerial image synthesis method of claim 1, wherein the acquiring a plurality of two-dimensional images captured by the unmanned aerial vehicle comprises:
acquiring a plurality of two-dimensional images shot by the unmanned aerial vehicle at a fixed height from different positions and different viewing angles, wherein adjacently shot two-dimensional images overlap by more than 80%.
3. The unmanned aerial vehicle aerial image synthesis method of claim 1, wherein the three-dimensional reconstruction of the scene from the plurality of two-dimensional images comprises:
performing the three-dimensional reconstruction of the scene by the COLMAP image reconstruction method based on structure from motion.
4. The unmanned aerial vehicle aerial image synthesis method of claim 1, wherein the full scene neural radiation field network model comprises a foreground neural radiation field network and a background neural radiation field network;
the foreground neural radiation field network comprises a foreground volume density synthesis network and a foreground color synthesis network; the foreground volume density synthesis network is:
[σ(t), z(t)] = F_σ(γ_x(r(t)))

the foreground color synthesis network is:

C_i^{fg}(r) = ∫_0^{t'} T(t)·σ(t)·c_i(t) dt,  with  c_i(t) = F_c(z(t), γ_d(d), ℓ_i^{(a)})

wherein σ(t) represents the volume density function; z(t) represents the position-coding-related feature vector; F_σ represents the volume density synthesis network; γ_x represents the position code; C_i^{fg}(r) represents the foreground image color estimated by the foreground color synthesis network; r(t) = o + t·d represents the ray emitted from the ray origin; t ∈ (0, t') covers the scene inside the unit-sphere boundary; T(t) = exp(−∫_0^t σ(s) ds) represents the cumulative transparency along the viewpoint camera ray; t represents the distance along the ray r from the ray origin; c_i(t) represents the radiance; F_c represents the foreground color synthesis network; γ_d(d) represents the encoding of the viewing direction; ℓ_i^{(a)} represents the implicit illumination visibility feature vector code of the i-th image; and a represents the illumination visibility feature vector;

the background neural radiation field network comprises a background volume density synthesis network and a background color synthesis network; the background volume density synthesis network is:

[σ(t), z(t)] = F_σ^{bg}(γ_x(r(t)))

the background color synthesis network is:

C_i^{bg}(r) = ∫_{t'}^{∞} T(t)·σ(t)·c_i(t) dt,  with  c_i(t) = F_c^{bg}(z(t), γ_d(d), ℓ_i^{(a)})

wherein F_σ^{bg} represents the background volume density synthesis network, C_i^{bg}(r) represents the background image color estimated by the background color synthesis network, and t ∈ (t', ∞) covers the region outside the unit sphere;

the rendering function of the complete scene neural radiation field network model is:

C_i(r) = C_i^{fg}(r) + T(t')·C_i^{bg}(r)

wherein C_i(r) is the synthesized color value of the complete scene neural radiation field network model, C_i^{fg}(r) is the foreground neural radiation field color value, T(t') is the synthesis coefficient, and C_i^{bg}(r) is the background neural radiation field color value.
5. The unmanned aerial vehicle aerial image synthesis method of claim 1, wherein obtaining a sample point implicit scene illumination visibility feature vector for each sample point comprises:
obtaining the opacity α and the cumulative transparency T of the camera ray at the sampling point in the viewpoint direction using:
T_i = ∏_{j=1}^{i−1} (1 − α_j)

α_i = 1 − exp(−σ_i·δ_i)

wherein δ_i = t_{i+1} − t_i represents the distance between adjacent sampling points, and σ_i represents the volume density in the viewpoint direction;

obtaining the sampling point implicit scene illumination visibility feature vector ℓ_i^{(a)} of the sampling point from the opacity α, the cumulative transparency T and the illumination visibility feature vector a according to the formula of Figure FDA0003970886480000033, where a is the illumination visibility feature vector.
6. The unmanned aerial vehicle aerial image synthesis method of claim 1, wherein the sampling in the sparse three-dimensional point cloud model comprises:
obtaining a near boundary and a far boundary of the scene according to the sparse three-dimensional point cloud model, and uniformly sampling between the near boundary and the far boundary of the scene to obtain sampling points of coarse sampling;
inputting the sampling points of the rough sampling into a preset complete scene neural radiation field network model to obtain a probability density distribution function of the color;
according to the probability density distribution function of the color, carrying out fine sampling in a region with the probability value larger than a preset threshold value to obtain a sampling point of fine sampling;
and combining the sampling points of the coarse sampling and the fine sampling to obtain a final sampling point.
7. The unmanned aerial vehicle aerial image synthesis method according to claim 6, wherein when training a preset complete scene neural radiation field network model according to training data of each sampling point, training is performed by adopting a residual loss function L of the following formula:
L = Σ_{r∈R} [ ‖Ĉ_c(r) − C(r)‖₂² + ‖Ĉ_f(r) − C(r)‖₂² ]

wherein R represents the rays sampled in each training batch, Ĉ_c(r) is the color estimate obtained by inputting the sampling points of the coarse sampling into the complete scene neural radiation field network model, Ĉ_f(r) is the color estimate obtained by inputting the sampling points of the fine sampling into the complete scene neural radiation field network model, and C(r) represents the actual color value of the sampling point.
8. An unmanned aerial vehicle aerial image synthesis system, characterized by comprising:
the acquisition module is used for acquiring a plurality of two-dimensional images captured by the unmanned aerial vehicle;
the sampling module is used for carrying out three-dimensional reconstruction of a scene according to a plurality of two-dimensional images to obtain a sparse three-dimensional point cloud model of the scene, and sampling the sparse three-dimensional point cloud model to obtain a plurality of sampling points;
the training module is used for acquiring training data of all sampling points, and training a preset complete scene neural radiation field network model according to the training data of all the sampling points to obtain a trained complete scene neural radiation field network model; the training data comprises sampling point coordinates, sampling point viewpoint directions, sampling point implicit scene illumination visibility feature vectors and sampling point actual color values;
and the rendering module is used for inputting the set viewpoint direction into the trained complete scene neural radiation field network model, and obtaining a scene image in the set viewpoint direction through rendering.
9. Computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the unmanned aerial vehicle aerial image synthesis method according to any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the unmanned aerial vehicle aerial image synthesis method of any of claims 1 to 7.
CN202211534828.6A 2022-11-29 2022-11-29 Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium Pending CN116071278A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211534828.6A CN116071278A (en) 2022-11-29 2022-11-29 Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211534828.6A CN116071278A (en) 2022-11-29 2022-11-29 Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116071278A true CN116071278A (en) 2023-05-05

Family

ID=86181102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211534828.6A Pending CN116071278A (en) 2022-11-29 2022-11-29 Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116071278A (en)


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116580212A (en) * 2023-05-16 2023-08-11 北京百度网讯科技有限公司 Image generation method, training method, device and equipment of image generation model
CN116580212B (en) * 2023-05-16 2024-02-06 北京百度网讯科技有限公司 Image generation method, training method, device and equipment of image generation model
CN116958492A (en) * 2023-07-12 2023-10-27 数元科技(广州)有限公司 VR editing application based on NeRf reconstruction three-dimensional base scene rendering
CN116958492B (en) * 2023-07-12 2024-05-03 数元科技(广州)有限公司 VR editing method for reconstructing three-dimensional base scene rendering based on NeRf
CN116958449A (en) * 2023-09-12 2023-10-27 北京邮电大学 Urban scene three-dimensional modeling method and device and electronic equipment
CN116958449B (en) * 2023-09-12 2024-04-30 北京邮电大学 Urban scene three-dimensional modeling method and device and electronic equipment
CN117544829A (en) * 2023-10-16 2024-02-09 支付宝(杭州)信息技术有限公司 Video generation method and device
CN117422804A (en) * 2023-10-24 2024-01-19 中国科学院空天信息创新研究院 Large-scale city block three-dimensional scene rendering and target fine space positioning method
CN117422804B (en) * 2023-10-24 2024-06-07 中国科学院空天信息创新研究院 Large-scale city block three-dimensional scene rendering and target fine space positioning method
CN117876346A (en) * 2024-01-16 2024-04-12 湖南湖大华龙电气与信息技术有限公司 Insulator autonomous infrared three-dimensional visual detection method and edge intelligent device
CN118379459A (en) * 2024-06-21 2024-07-23 南京工业大学 Bridge disease visualization method based on three-dimensional reconstruction of nerve radiation field
CN118379459B (en) * 2024-06-21 2024-09-03 南京工业大学 Bridge disease visualization method based on three-dimensional reconstruction of nerve radiation field

Similar Documents

Publication Publication Date Title
CN116071278A (en) Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium
Gortler et al. The lumigraph
Riegler et al. Free view synthesis
CN109003325B (en) Three-dimensional reconstruction method, medium, device and computing equipment
CN106803267B (en) Kinect-based indoor scene three-dimensional reconstruction method
CN115100339B (en) Image generation method, device, electronic equipment and storage medium
CN108876814B (en) Method for generating attitude flow image
CN103530907B (en) Complicated three-dimensional model drawing method based on images
CN114863038B (en) Real-time dynamic free visual angle synthesis method and device based on explicit geometric deformation
CN115428027A (en) Neural opaque point cloud
US11544898B2 (en) Method, computer device and storage medium for real-time urban scene reconstruction
CN116958379A (en) Image rendering method, device, electronic equipment, storage medium and program product
JP2024510230A (en) Multi-view neural human prediction using implicitly differentiable renderer for facial expression, body pose shape and clothing performance capture
CN115115805A (en) Training method, device and equipment for three-dimensional reconstruction model and storage medium
CN117745932A (en) Neural implicit curved surface reconstruction method based on depth fusion constraint
Lu et al. Single image shape-from-silhouettes
Gu et al. Ue4-nerf: Neural radiance field for real-time rendering of large-scale scene
Wang et al. Neural opacity point cloud
Zhang et al. Resimad: Zero-shot 3d domain transfer for autonomous driving with source reconstruction and target simulation
CN116681839B (en) Live three-dimensional target reconstruction and singulation method based on improved NeRF
CN118154770A (en) Single tree image three-dimensional reconstruction method and device based on nerve radiation field
Niu et al. Overview of image-based 3D reconstruction technology
Li et al. Point-Based Neural Scene Rendering for Street Views
CN115239559A (en) Depth map super-resolution method and system for fusion view synthesis
Wei et al. LiDeNeRF: Neural radiance field reconstruction with depth prior provided by LiDAR point cloud

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination