
CN116071278A - Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium - Google Patents

Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium

Info

Publication number
CN116071278A
CN116071278A
Authority
CN
China
Prior art keywords
scene
sampling
radiation field
unmanned aerial vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211534828.6A
Other languages
Chinese (zh)
Inventor
刘静
王钰琳
王浩龙
蒋晓瑜
苏立玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202211534828.6A priority Critical patent/CN116071278A/en
Publication of CN116071278A publication Critical patent/CN116071278A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical fields of computer graphics and computer vision and discloses an unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium. The unmanned aerial vehicle aerial image synthesis method comprises the following steps: acquiring a plurality of two-dimensional images taken by the unmanned aerial vehicle; performing three-dimensional reconstruction of the scene from the two-dimensional images to obtain a sparse three-dimensional point cloud model of the scene, and sampling within the sparse three-dimensional point cloud model to obtain a number of sampling points; acquiring training data for all sampling points and training a preset complete scene neural radiation field network model on these data to obtain a trained complete scene neural radiation field network model; and inputting a set viewpoint direction into the trained complete scene neural radiation field network model and rendering the scene image for that viewpoint direction. The method renders quickly, can synthesize scenes over a large spatial extent, and achieves high-quality view rendering from arbitrary viewpoints.

Description

Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium
Technical Field
The invention belongs to the technical field of computer graphics and computer vision, and relates to an unmanned aerial vehicle aerial image synthesis method, an unmanned aerial vehicle aerial image synthesis system, computer equipment and a storage medium.
Background
With the rapid development of technology, 3D reconstruction has seen growing commercial demand in fields such as urban development, cultural relic protection, virtual reality and industrial geographic surveying. However, traditional 3D reconstruction requires manual measurement of the relevant data, and drawing 3D models with professional software is time-consuming and labor-intensive. In recent years, unmanned aerial vehicles have gradually become a primary tool for engineers to acquire aerial photographs or video because of their low cost, high efficiency and portability, and research on three-dimensional reconstruction from image sequences captured by unmanned aerial vehicles is maturing. Within this area, image-based viewpoint synthesis is an important problem of common concern in computer graphics and computer vision: several images with known shooting viewpoints are used as input, the photographed three-dimensional objects or scene are expressed in terms of geometry, appearance, illumination and other properties, images from other, unphotographed viewpoints are synthesized, and a highly realistic rendering is finally obtained. Compared with the traditional pipeline of three-dimensional reconstruction followed by graphics rendering, this approach can produce photo-realistic synthesis results.
Methods based on Structure from Motion (SFM) use software such as Pix4D, VisualSFM, Smart3D and COLMAP to solve for the camera poses and a sparse point cloud of the three-dimensional space from the input aerial image sequence, densify the sparse point cloud, reconstruct a triangular mesh from the dense point cloud, and finally texture-map the mesh to obtain a three-dimensional map with texture information. Although this SFM-based workflow is mature, its enormous computational cost places high demands on hardware; in particular, for high-resolution aerial image sequences the processing time required to keep the reconstruction at the original resolution is very long, and the viewpoints that can be synthesized are very limited.
Synthesizing new views of a scene from a sparse set of captured images is a long-standing problem in computer vision and a prerequisite for many AR and VR applications. Although classical techniques have addressed this problem using structure-from-motion-based or image-based rendering, explicit modeling remains difficult, three-dimensional reconstruction accuracy is low, and the quality of the rendered images is poor.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an unmanned aerial vehicle aerial image synthesis method, an unmanned aerial vehicle aerial image synthesis system, computer equipment and a storage medium.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
in one aspect of the invention, an unmanned aerial vehicle aerial image synthesis method comprises the following steps:
acquiring a plurality of two-dimensional images of the unmanned aerial vehicle;
performing three-dimensional reconstruction of a scene according to a plurality of two-dimensional images to obtain a sparse three-dimensional point cloud model of the scene, and sampling in the sparse three-dimensional point cloud model to obtain a plurality of sampling points;
acquiring training data of all sampling points, and training a preset complete scene neural radiation field network model according to the training data of all sampling points to obtain a trained complete scene neural radiation field network model; the training data comprises sampling point coordinates, sampling point viewpoint directions, sampling point implicit scene illumination visibility feature vectors and sampling point actual color values;
inputting the set viewpoint direction into a trained complete scene neural radiation field network model, and rendering to obtain a scene image in the set viewpoint direction.
Optionally, the acquiring the plurality of two-dimensional images of the unmanned aerial vehicle comprises:
Acquiring a plurality of two-dimensional images shot by the unmanned aerial vehicle at a fixed height from different positions and different viewing angles, wherein the overlap between adjacently shot two-dimensional images is more than 80%.
Optionally, the three-dimensional reconstruction of the scene from the plurality of two-dimensional images includes:
performing the three-dimensional reconstruction of the scene with the COLMAP image reconstruction method based on Structure from Motion.
Optionally, the complete scene neural radiation field network model includes a foreground neural radiation field network and a background neural radiation field network.
The foreground neural radiation field network comprises a foreground volume density synthesis network and a foreground color synthesis network. The foreground volume density synthesis network is:
σ(t), z(t) = MLP_σ^fg(γ_x(r(t)))
The foreground color synthesis network is:
ĉ_i(t) = MLP_c^fg(z(t), γ_d(d), ℓ_i^(a))
where σ(t) is the volume density function; z(t) is the feature vector associated with the position encoding; MLP_σ^fg is the volume density synthesis network; γ_x is the position encoding; ĉ_i(t) is the foreground image color estimated by the foreground color synthesis network; r(t) = o + t·d is the ray emitted from the ray origin; t ∈ (0, t') covers the scene boundary of the inner unit sphere; T(t) is the cumulative transparency of the camera ray along the viewpoint direction, computed as
T(t) = exp(-∫_0^t σ(r(s)) ds)
t is the distance along ray r from the ray origin; c_i(t) is the radiance; MLP_c^fg is the foreground color synthesis network; γ_d(d) is the view-direction encoding; ℓ_i^(a) is the implicit illumination visibility feature vector code of the i-th image; and a is the illumination visibility feature vector.
The background neural radiation field network comprises a background volume density synthesis network and a background color synthesis network. The background volume density synthesis network is:
σ(t), z(t) = MLP_σ^bg(γ_x(x', y', z', 1/r))
The background color synthesis network is:
ĉ_i(t) = MLP_c^bg(z(t), γ_d(d), ℓ_i^(a))
where MLP_σ^bg is the background volume density synthesis network, ĉ_i(t) here is the background image color estimated by the background color synthesis network, and t ∈ (t', ∞) covers the region outside the unit sphere.
The rendering function of the complete scene neural radiation field network model is:
C_i(r) = ∫_0^{t'} σ(r(t)) c_i(r(t)) T(t) dt + T(t') ∫_{t'}^{∞} σ(r(t)) c_i(r(t)) T(t) dt
where C_i(r) is the synthesized color value of the complete scene neural radiation field network model; the first integral (i) is the foreground neural radiation field color contribution, T(t') is the synthesis (blending) coefficient (ii), and the second integral (iii) is the background neural radiation field color contribution.
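The two-branch structure described above can be sketched in PyTorch as follows. This is a minimal illustration rather than the patent's exact architecture: the layer widths, encoding frequencies, illumination-visibility feature dimension and module names are all assumptions.

```python
import torch
import torch.nn as nn

def positional_encoding(x, n_freqs):
    # gamma(x): [x, sin(2^k x), cos(2^k x)] for k = 0..n_freqs-1
    out = [x]
    for k in range(n_freqs):
        out.append(torch.sin((2.0 ** k) * x))
        out.append(torch.cos((2.0 ** k) * x))
    return torch.cat(out, dim=-1)

class RadianceFieldBranch(nn.Module):
    """One branch (foreground or background): a volume-density head and a
    color head. The color head additionally receives the view-direction
    encoding and the per-image illumination-visibility feature vector."""
    def __init__(self, pos_dim, dir_dim, feat_dim=48, hidden=256):
        super().__init__()
        self.density_mlp = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden + 1),            # -> (sigma, z)
        )
        self.color_mlp = nn.Sequential(
            nn.Linear(hidden + dir_dim + feat_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),  # -> RGB in [0, 1]
        )

    def forward(self, x_enc, d_enc, l_a):
        h = self.density_mlp(x_enc)
        sigma = torch.relu(h[..., :1])                # volume density sigma(t)
        z = h[..., 1:]                                # position-coding-related feature z(t)
        rgb = self.color_mlp(torch.cat([z, d_enc, l_a], dim=-1))
        return sigma, rgb
```

One such branch would be instantiated for the foreground (points inside the unit sphere) and another for the background (inverted-sphere coordinates), with their outputs combined by the rendering function above.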
Optionally, obtaining the sample point implicit scene illumination visibility feature vector of each sampling point includes:
obtaining the opacity α and the cumulative transparency T of the camera ray at the sampling points in the viewpoint direction using:
T_i = exp(-Σ_{j=1}^{i-1} σ_j δ_j)
α_i = 1 - exp(-σ_i δ_i)
where δ_i = t_{i+1} - t_i is the distance between adjacent sampling points and σ_i is the volume density in the viewpoint direction; the sampling point implicit scene illumination visibility feature vector ℓ^(a) of a sampling point is then obtained as
ℓ^(a) = Σ_i T_i α_i a
where a is the illumination visibility feature vector.
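These quantities can be computed per ray as sketched below; the aggregation of the per-sample feature a with the volume-rendering weights T_i·α_i follows the definitions in this section, while the padding of the last sampling interval is an implementation assumption.

```python
import torch

def accumulate_visibility_feature(sigma, t_vals, a):
    """Accumulate an illumination-visibility feature along one ray.

    sigma : (N,)   volume densities at the N sample points
    t_vals: (N,)   increasing distances of the samples along the ray
    a     : (N, F) per-sample illumination-visibility feature vectors
    Returns alpha, the cumulative transparency T, and the accumulated
    feature  l_a = sum_i T_i * alpha_i * a_i .
    """
    delta = t_vals[1:] - t_vals[:-1]                  # delta_i = t_{i+1} - t_i
    delta = torch.cat([delta, delta[-1:]], dim=0)     # pad the last interval
    alpha = 1.0 - torch.exp(-sigma * delta)           # alpha_i = 1 - exp(-sigma_i * delta_i)
    # T_i = exp(-sum_{j<i} sigma_j * delta_j)
    T = torch.exp(-torch.cumsum(
        torch.cat([torch.zeros(1), sigma[:-1] * delta[:-1]]), dim=0))
    weights = T * alpha                               # volume-rendering weights
    l_a = (weights.unsqueeze(-1) * a).sum(dim=0)      # (F,)
    return alpha, T, l_a
```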
Optionally, the sampling in the sparse three-dimensional point cloud model includes:
obtaining the near bound and far bound of the scene from the sparse three-dimensional point cloud model, and sampling uniformly between them to obtain the coarse sampling points;
inputting the coarse sampling points into the preset complete scene neural radiation field network model to obtain a probability density distribution function of the color;
performing fine sampling, according to this probability density distribution function, in the regions whose probability values exceed a preset threshold to obtain the fine sampling points;
and combining the coarse and fine sampling points to obtain the final sampling points.
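A minimal sketch of this coarse-to-fine procedure for a single ray is given below. The inverse-transform sampling shown here concentrates fine samples where the coarse pass assigns high probability mass, which is one common way to realize the thresholded fine sampling described above; the tensor shapes and sample counts are assumptions.

```python
import torch

def coarse_samples(near, far, n_coarse):
    """Stratified uniform samples between the near and far bounds."""
    bins = torch.linspace(near, far, n_coarse + 1)
    lower, upper = bins[:-1], bins[1:]
    return lower + (upper - lower) * torch.rand(n_coarse)

def fine_samples(t_coarse, weights, n_fine):
    """Inverse-transform sampling: draw more points where the coarse pass
    assigned high volume-rendering weight (probability mass)."""
    bins = t_coarse                                   # (N,) increasing
    w = weights[:-1] + 1e-5                           # one weight per interval
    pdf = w / w.sum()
    cdf = torch.cat([torch.zeros(1), torch.cumsum(pdf, dim=0)])   # (N,)
    u = torch.rand(n_fine)
    idx = torch.searchsorted(cdf, u, right=True).clamp(1, len(bins) - 1)
    lo, hi = bins[idx - 1], bins[idx]
    frac = (u - cdf[idx - 1]) / (cdf[idx] - cdf[idx - 1]).clamp(min=1e-5)
    return lo + frac * (hi - lo)

# final samples: sorted union of coarse and fine points, e.g.
# t_c = coarse_samples(2.0, 6.0, 64); t_f = fine_samples(t_c, w, 128)
# t_all, _ = torch.sort(torch.cat([t_c, t_f]))
```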
Optionally, when training the preset complete scene neural radiation field network model on the training data of the sampling points, the following residual loss function L is used:
L = Σ_{r∈R} [ ||Ĉ_c(r) - C(r)||² + ||Ĉ_f(r) - C(r)||² ]
where R denotes the set of rays sampled in each training batch, Ĉ_c(r) is the color estimate obtained by feeding the coarse sampling points into the complete scene neural radiation field network model, Ĉ_f(r) is the color estimate obtained from the fine sampling points, and C(r) is the actual color value of the sampling point.
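A direct transcription of this loss for a batch of rays, assuming the coarse and fine color estimates are already available as (R, 3) tensors; summing over the batch matches the formula, and averaging instead is a common alternative.

```python
import torch

def residual_loss(c_coarse, c_fine, c_gt):
    """L = sum_r ||C_c(r) - C(r)||^2 + ||C_f(r) - C(r)||^2 over a batch of rays.

    c_coarse, c_fine, c_gt : (R, 3) coarse estimates, fine estimates and
    ground-truth pixel colors for R rays.
    """
    return ((c_coarse - c_gt) ** 2).sum() + ((c_fine - c_gt) ** 2).sum()
```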
In a second aspect of the present invention, an unmanned aerial vehicle aerial image synthesis system includes:
the acquisition module is used for acquiring a plurality of two-dimensional images of the unmanned aerial vehicle;
the sampling module is used for carrying out three-dimensional reconstruction of a scene according to a plurality of two-dimensional images to obtain a sparse three-dimensional point cloud model of the scene, and sampling the sparse three-dimensional point cloud model to obtain a plurality of sampling points;
the training module is used for acquiring training data of all sampling points, and training a preset complete scene neural radiation field network model according to the training data of all the sampling points to obtain a trained complete scene neural radiation field network model; the training data comprises sampling point coordinates, sampling point viewpoint directions, sampling point implicit scene illumination visibility feature vectors and sampling point actual color values;
and the rendering module is used for inputting the set viewpoint direction into the trained complete scene neural radiation field network model, and obtaining a scene image in the set viewpoint direction through rendering.
In a third aspect of the present invention, a computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the steps of the unmanned aerial vehicle aerial image synthesis method are implemented when the processor executes the computer program.
In a fourth aspect of the present invention, a computer readable storage medium stores a computer program which, when executed by a processor, implements the steps of the unmanned aerial vehicle aerial image synthesis method described above.
Compared with the prior art, the invention has the following beneficial effects:
According to the unmanned aerial vehicle aerial image synthesis method of the invention, a plurality of two-dimensional aerial images taken by the unmanned aerial vehicle are acquired; three-dimensional reconstruction of the scene is performed from these images to obtain a sparse three-dimensional point cloud model of the scene; sampling is carried out in the sparse point cloud model to obtain a number of sampling points; training data for the sampling points are collected and used to train a preset complete scene neural radiation field network model; and finally the scene image in a set viewpoint direction is obtained through the trained complete scene neural radiation field network model. Based on the trained model, view rendering from any viewpoint can be achieved. Compared with building a three-dimensional model of the scene in advance, training the network takes less time than reconstructing the scene's three-dimensional model, scenes covering a large spatial extent can be synthesized, high-quality views can be rendered at arbitrary viewpoints, and a photo-realistic, high-quality new view can be obtained by rendering with the successfully pre-trained complete scene neural radiation field network model. In addition, the neural network can be trained offline and can render new viewpoints in real time, and the synthesis range of new viewpoints is not limited, so the method has good application prospects. Moreover, reconstructing a large-scale scene by neural rendering can be accelerated on GPU hardware, which makes the method well suited to large-scale scene reconstruction with large data volumes.
Drawings
FIG. 1 is a flowchart of an aerial image synthesis method of an unmanned aerial vehicle according to an embodiment of the present invention;
FIG. 2 is a detailed flowchart of an aerial image synthesizing method of the unmanned aerial vehicle according to the embodiment of the invention;
FIG. 3 is a schematic diagram of an implicit scene illumination visibility feature vector network framework in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of an overall framework of a full scene neural radiation field network model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the principle of the inverse sphere parameterization of the segmentation foreground and background according to the embodiment of the present invention;
fig. 6 is a schematic diagram of a scene reconstruction result at a certain view angle generated by a method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention is described in further detail below with reference to the attached drawing figures:
In the field of large-scale, high-quality synthesis of unmanned aerial vehicle aerial maps, traditional three-dimensional reconstruction methods need to model the entire large-scale scene, which is time-consuming. The accuracy of the synthesized three-dimensional model depends on the number of input views: the more images are acquired, the more accurate the built model. However, the increased data volume makes three-dimensional reconstruction of the scene slow, and the reconstructed model cannot synthesize images over an arbitrary range of viewpoints but only virtual viewpoint images at fixed viewing angles, which is a serious limitation.
Likewise, synthesizing new views of a scene from a sparse set of captured images is a long-standing problem in computer vision and a prerequisite for many AR and VR applications. While classical techniques have addressed this problem using structure-from-motion-based or image-based rendering, significant progress has recently been made through neural rendering, which adds neural network modules that learn 3D geometry and, once trained, can reconstruct the observed images. Neural radiation field (NeRF) methods model the radiation field and density of a scene with the weights of a neural network; new views are then synthesized with volume rendering, showing unprecedented fidelity on a range of challenging scenes.
In recent years, because neural rendering is based on the principle of volume rendering, an implicit neural network function can be used to fit and reconstruct a scene, enabling faster and higher-quality image rendering; implicit neural reconstruction has therefore become a research hotspot. With the rapid development of deep learning, many deep-learning-based methods have been proposed that further improve the accuracy and realism of viewpoint synthesis in a data-driven way. Large-scale, high-quality live-action map synthesis based on neural radiation fields trains a neural radiation field network to reconstruct the scene; the trained network can then render views from any viewpoint. Compared with building a three-dimensional model of the scene in advance, training the network takes less time than reconstructing the scene model, and a photo-realistic, high-quality new view can be obtained by rendering with the successfully pre-trained network. Because the neural network can be trained offline, new viewpoints can be rendered in real time and their synthesis range is not limited, giving the approach a good application prospect.
Therefore, the invention discloses an unmanned aerial vehicle aerial image synthesis method that reconstructs dense scene views from a sparse two-dimensional image sequence obtained by unmanned aerial vehicle aerial photography. Because a large-scale scene must be reconstructed, reconstructing all scene content with a single neural radiation field blurs the details of scenes with a large depth range, so the foreground and background in the image are modeled separately. Considering that the illumination can change during unmanned aerial vehicle aerial photography, implicit illumination visibility vector codes are computed for the captured static images so that illumination-variable foreground and background neural radiation field models can be synthesized; the two models are then spliced to obtain the neural radiation field model of the reconstructed scene, and neural rendering is used to achieve large-scale, high-quality synthesis of unmanned aerial vehicle aerial map images.
The neural rendering method based on the neural radiation field converts the scene, which would otherwise be displayed through three-dimensional modeling, into an implicit function that simulates real imaging; the color and density along the viewing direction are estimated and rendered, giving a reconstruction that closely matches the original pictures. In this way, the trained model enables large-scale, illumination-variable, high-quality image synthesis from any viewpoint.
Referring to fig. 1 and 2, in an embodiment of the present invention, an unmanned aerial vehicle aerial image synthesis method is provided, which is suitable for large-scale high-quality unmanned aerial vehicle aerial image synthesis, wherein unmanned aerial vehicle aerial rendering images are generated by computer simulation. In this embodiment, the unmanned aerial vehicle aerial image synthesis method specifically includes the following steps:
s1: and acquiring a plurality of two-dimensional images of the unmanned aerial vehicle.
The acquisition of the plurality of two-dimensional images of the unmanned aerial vehicle includes: acquiring a plurality of two-dimensional images shot by the unmanned aerial vehicle at a fixed height from different positions and different viewing angles, wherein the overlap between adjacently shot images is more than 80%. The acquired images generally form a two-dimensional image sequence {I_i}, i = 1, ..., N, where N is the number of two-dimensional images. Specifically, a scene covering a large spatial extent is photographed by the unmanned aerial vehicle; the flight path is planned in advance so that the vehicle flies at a fixed height, the overlap of adjacent images is more than 80%, and the camera tilt angle is 45 degrees. The two-dimensional images taken by the unmanned aerial vehicle at N different positions and viewing angles form the two-dimensional image sequence {I_i}, i = 1, ..., N.
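For illustration only, the 80% forward-overlap requirement can be turned into a shot-spacing calculation; the camera field of view, the nadir-footprint simplification (the 45-degree tilt is ignored) and the numbers below are assumptions, not values from the patent.

```python
import math

def shot_spacing(height_m, fov_deg, overlap=0.8):
    """Along-track spacing between consecutive exposures that yields the
    requested forward overlap for a nadir-looking camera at a fixed height."""
    footprint = 2.0 * height_m * math.tan(math.radians(fov_deg) / 2.0)
    return (1.0 - overlap) * footprint

# a hypothetical 60-degree FOV camera at 100 m altitude: ~23.1 m between shots
print(round(shot_spacing(100.0, 60.0), 1))
```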
S2: and carrying out three-dimensional reconstruction of the scene according to the plurality of two-dimensional images to obtain a sparse three-dimensional point cloud model of the scene, and sampling in the sparse three-dimensional point cloud model to obtain a plurality of sampling points.
The three-dimensional reconstruction of the scene from the plurality of two-dimensional images includes: performing the three-dimensional reconstruction with the COLMAP image reconstruction method based on Structure from Motion.
Specifically, the two-dimensional image sequence obtained by unmanned aerial vehicle aerial photography is preprocessed, and the COLMAP reconstruction program based on Structure-from-Motion technology is used to perform feature extraction, feature matching and sparse reconstruction on the image sequence, yielding the intrinsic and extrinsic camera parameters, the sparse 3D points, and the near/far plane parameters, where the near/far plane parameters comprise the near and far bounds of the scene. Feature extraction extracts feature points from each two-dimensional image: the SIFT algorithm detects the positions of the feature points and computes their feature vectors, giving the feature point information of each image. Sparse 3D point cloud reconstruction is then carried out from these results: feature matching between images is performed using the extracted feature point information; the intrinsic and extrinsic camera parameters of each image are computed; matching between images is performed according to the camera parameters; geometric verification reduces the number of wrongly matched point pairs; the fundamental matrix and essential matrix are then computed using the epipolar geometric constraint between two views, and singular value decomposition of the essential matrix yields the rotation and translation between images; the three-dimensional coordinates of the feature points are recovered by triangulation; and bundle adjustment jointly optimizes all feature points and the image rotations and translations, producing the sparse three-dimensional point cloud model of the scene, from which the near and far bounds of the scene are obtained.
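The COLMAP stage described above can be driven from a script such as the following; the directory layout and the choice of exhaustive matching are assumptions (sequential matching is often used for ordered aerial sequences).

```python
import os
import subprocess

def run_sfm(image_dir="images", workspace="colmap_ws"):
    """Run COLMAP's sparse-reconstruction pipeline: feature extraction,
    matching, and mapping. Paths are placeholders."""
    db = os.path.join(workspace, "database.db")
    sparse = os.path.join(workspace, "sparse")
    os.makedirs(sparse, exist_ok=True)
    for cmd in (
        ["colmap", "feature_extractor", "--database_path", db,
         "--image_path", image_dir],                         # SIFT keypoints + descriptors
        ["colmap", "exhaustive_matcher", "--database_path", db],
        ["colmap", "mapper", "--database_path", db,
         "--image_path", image_dir, "--output_path", sparse],  # poses + sparse 3D points
    ):
        subprocess.run(cmd, check=True)
```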
Sampling is then performed in the sparse three-dimensional point cloud model and typically comprises coarse sampling and fine sampling. Specifically, the near and far bounds of the scene are obtained from the sparse three-dimensional point cloud model, and uniform sampling between these bounds gives the coarse sampling points; the coarse sampling points are fed into the preset complete scene neural radiation field network model to obtain a probability density distribution function of the color; fine sampling is then performed in the regions whose probability values exceed a preset threshold, i.e. more sampling points are placed where the probability is large, giving the fine sampling points; finally the coarse and fine sampling points are combined to give the final sampling points.
S3: acquiring training data of all sampling points, and training a preset complete scene neural radiation field network model according to the training data of all sampling points to obtain a trained complete scene neural radiation field network model; the training data comprises sampling point coordinates, sampling point viewpoint directions, sampling point implicit scene illumination visibility feature vectors and sampling point actual color values.
Specifically, the sampling point coordinates and the sampling point viewpoint directions can both be obtained from the sparse three-dimensional point cloud model, and the sampling point implicit scene illumination visibility feature vector is obtained as follows.
The opacity α and the cumulative transparency T of the camera ray at the sampling points in the viewpoint direction are obtained using:
T_i = exp(-Σ_{j=1}^{i-1} σ_j δ_j)
α_i = 1 - exp(-σ_i δ_i)
where the index i is the sampling point number.
The sampling point implicit scene illumination visibility feature vector ℓ^(a) of a sampling point is then obtained as
ℓ^(a) = Σ_i T_i α_i a
where a is the illumination visibility feature vector.
Specifically, the image dimension of a two-dimensional image is H×W×3, and the implicit scene illumination visibility feature vector of a sampling point is denoted ℓ^(a). For each two-dimensional image, the line connecting the optical center and a pixel is a ray; n rays are selected arbitrarily and each ray is sampled uniformly, and the opacity α and the cumulative transparency T of the camera ray along the direction of viewpoint p are computed using:
T_i = exp(-Σ_{j=1}^{i-1} σ_j δ_j)
α_i = 1 - exp(-σ_i δ_i)
where δ_i = t_{i+1} - t_i is the distance between adjacent sampling points and σ_i is the volume density in the direction of viewpoint p.
The visibility feature vector of the camera ray along the direction of viewpoint p is then calculated as
ℓ^(a) = Σ_i T_i α_i a
The two-dimensional image sequence {I_i}, i = 1, ..., N is input to the implicit scene illumination visibility feature vector network, which is trained with the following loss:
L' = Σ_r || l̂(r) - l(r) ||²
where L' is the prediction error, l̂(r) is the illumination visibility feature estimate computed by the network, and l(r) is the actual illumination visibility.
Referring to fig. 3, the implicit scene illumination visibility feature vector network is a fully connected neural network whose inputs are the position coordinates and the viewpoint direction. It is trained with the Adam optimization method; through the back-propagation algorithm, the network weights are updated and the set of implicit scene illumination visibility feature vectors is obtained. The trained network yields the final implicit scene illumination visibility feature vector codes ℓ_i^(a) of all two-dimensional images. When training the implicit scene illumination visibility feature vector network, the ray directions are sampled from the pixel positions and the camera positions, and the sampling is uniform. The resulting implicit scene illumination visibility feature vector code ℓ_i^(a) of each two-dimensional image is concatenated with the other inputs and fed into the preset complete scene neural radiation field network model for training. For example, for the two-dimensional image sequence {I_i}, i = 1, ..., N, the line connecting the optical center and a pixel is a ray; n rays are selected arbitrarily, 16 points are sampled uniformly on each ray, and the position encodings and viewpoint direction encodings of the sampled points are input to the implicit scene illumination visibility feature vector network, giving the N implicit scene illumination visibility feature vectors ℓ_i^(a) corresponding to the image sequence {I_i}; these are then input to the complete scene neural radiation field network model to obtain the corresponding color values.
The above steps encode, for each pixel's ray in images taken from different viewing directions, the differing illumination visibility information as feature vectors and obtain the global illumination visibility information of the scene, which is then input into the neural radiation field network to synthesize an illumination-variable neural radiation field.
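A compact sketch of such an illumination-visibility feature network and one Adam training step is shown below; the hidden widths, encoding dimensions, feature size and the exact supervision target are assumptions consistent with the description above.

```python
import torch
import torch.nn as nn

class VisibilityFeatureNet(nn.Module):
    """Fully connected network mapping an encoded sample position and view
    direction to a per-point illumination-visibility feature plus a scalar
    visibility used for supervision."""
    def __init__(self, pos_dim=63, dir_dim=27, feat_dim=48, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pos_dim + dir_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim + 1),
        )

    def forward(self, x_enc, d_enc):
        out = self.net(torch.cat([x_enc, d_enc], dim=-1))
        return out[..., :-1], torch.sigmoid(out[..., -1:])   # feature a, visibility

model = VisibilityFeatureNet()
opt = torch.optim.Adam(model.parameters(), lr=5e-4)

def train_step(x_enc, d_enc, l_target):
    """One Adam update on a batch of encoded samples with target visibility."""
    feat, l_hat = model(x_enc, d_enc)
    loss = ((l_hat - l_target) ** 2).sum()   # L' = sum ||l_hat(r) - l(r)||^2
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```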
Referring to fig. 4, the complete scene neural radiation field network model includes a foreground neural radiation field network and a background neural radiation field network. The foreground neural radiation field network comprises a foreground volume density synthesis network and a foreground color synthesis network. The foreground volume density synthesis network is:
σ(t), z(t) = MLP_σ^fg(γ_x(r(t)))
The foreground color synthesis network is:
ĉ_i(t) = MLP_c^fg(z(t), γ_d(d), ℓ_i^(a))
where σ(t) is the volume density function; z(t) is the feature vector associated with the position encoding; MLP_σ^fg is the volume density synthesis network; γ_x is the position encoding; ĉ_i(t) is the foreground image color estimated by the foreground color synthesis network; r(t) = o + t·d is the ray emitted from the ray origin; t ∈ (0, t') covers the scene boundary of the inner unit sphere; T(t) is the cumulative transparency of the camera ray along the viewpoint direction, computed as T(t) = exp(-∫_0^t σ(r(s)) ds); t is the distance along ray r from the ray origin; c_i(t) is the radiance; MLP_c^fg is the foreground color synthesis network; γ_d(d) is the view-direction encoding; ℓ_i^(a) is the implicit illumination visibility feature vector code of the i-th image; and a is the illumination visibility feature vector.
The background neural radiation field network comprises a background volume density synthesis network and a background color synthesis network. The background volume density synthesis network is:
σ(t), z(t) = MLP_σ^bg(γ_x(x', y', z', 1/r))
The background color synthesis network is:
ĉ_i(t) = MLP_c^bg(z(t), γ_d(d), ℓ_i^(a))
where MLP_σ^bg is the background volume density synthesis network, ĉ_i(t) here is the background image color estimated by the background color synthesis network, and t ∈ (t', ∞) covers the region outside the unit sphere.
The rendering function of the complete scene neural radiation field network model is:
C_i(r) = ∫_0^{t'} σ(r(t)) c_i(r(t)) T(t) dt + T(t') ∫_{t'}^{∞} σ(r(t)) c_i(r(t)) T(t) dt
where C_i(r) is the synthesized color value of the complete scene neural radiation field network model; the first integral (i) is the foreground neural radiation field color contribution, T(t') is the synthesis coefficient (ii), and the second integral (iii) is the background neural radiation field color contribution.
In this embodiment, the preset complete scene neural radiation field network model is constructed as follows. Parts of the scene with different depth values are modeled separately so that, when handling parts with larger depth values, their details are better preserved and a high-quality new view can be synthesized. The scene space is therefore divided into two parts: an inner unit sphere containing the foreground of the scene and an outer volume containing the background of the scene. The inner unit sphere is modeled by the foreground neural radiation field and needs no additional parameterization; the outer volume is modeled by the background neural radiation field and requires the inverted sphere parameterization. Referring to fig. 5, the inverted sphere parameterization proceeds as follows: the scene is represented with respect to a unit sphere S, and the inverted sphere parameterization separates the foreground and background of the scene. A position point in the scene foreground is represented as (x, y, z), and a position point in the background is represented as
(x', y', z', 1/r), with x'² + y'² + z'² = 1
where r' is the radius of the unit sphere S (r' = 1) and r > r' is the distance of the point from the sphere center.
The position encodings and direction encodings of the sampling points on rays inside the unit sphere are input directly into the foreground radiation field, and the modeling of the foreground neural radiation field can be expressed as the function MLP_1(x, d) = (c, σ), where MLP denotes a fully connected network, x is a three-dimensional spatial coordinate, d is a two-dimensional viewing direction, c is the three-channel color output at position x, and σ is the volume density at position x (a volume density of 0 indicates unoccupied space and 1 indicates an object surface). The network structure comprises two MLP networks, one a volume density synthesis network and the other a color synthesis network. The foreground neural radiation field is then used for view rendering of a new view: given the camera rays r of a known view, whose number is determined by the number of pixels of the view, a ray-casting algorithm computes the color estimate of the rays projected onto the new view using:
C(r) = ∫_0^{t'} T(t) σ(r(t)) c(r(t)) dt
The position information of the sampling points on the rays after the inverted sphere parameterization preprocessing, i.e. the sampling point coordinates and viewpoint directions, is input into the background radiation field network, a fully connected neural network used to process the background part of the scene captured by the camera. Because the background depth range is large compared with the foreground, feeding it into the foreground radiation module would give coarse details, so background parts with larger depth values are input into the background radiation field; this network likewise comprises two MLP networks, one a volume density synthesis network and the other a color synthesis network. Specifically, the background neural radiation field modeling function is similar to the foreground one, MLP_2(x', d) = (c, σ). The difference is that a position point (x, y, z) on a ray outside the unit sphere is re-parameterized as a quadruple (x', y', z', 1/r) with x'² + y'² + z'² = 1, where (x', y', z') is the unit vector in the same direction as (x, y, z) and 1/r (0 < 1/r < 1) is the inverse radius along this direction, identifying the point r·(x', y', z') outside the sphere. The re-parameterized quadruple is bounded, with (x', y', z') ∈ [-1, 1] and 1/r ∈ [0, 1]. The quadruple (x', y', z', 1/r) obtained by re-parameterization represents the points of the background part of the scene; the color and volume density at the corresponding pixel positions of the new view are then computed by the background neural radiation field, and the new-view rendering function for the background is:
C(r) = ∫_{t'}^{∞} T(t) σ(r(t)) c(r(t)) dt
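A small helper illustrating the inverted sphere re-parameterization of a background point, as a sketch:

```python
import numpy as np

def invert_sphere(p):
    """Re-parameterize a point outside the unit sphere as (x', y', z', 1/r),
    where (x', y', z') is the unit vector along p and 1/r lies in (0, 1)."""
    r = np.linalg.norm(p)
    assert r > 1.0, "inverted-sphere coordinates are only used for background points"
    unit = p / r
    return np.concatenate([unit, [1.0 / r]])

# a hypothetical background point 5 units from the scene center
print(invert_sphere(np.array([3.0, 0.0, 4.0])))   # -> [0.6, 0.0, 0.8, 0.2]
```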
The obtained implicit scene illumination visibility feature vector ℓ_i^(a) is fed into the corresponding network branches, so that illumination-variable foreground and background neural radiation fields are obtained, together with the static three-dimensional scene geometry shared by all images. The color synthesis network of the illumination-variable foreground neural radiation field is then:
ĉ_i(t) = MLP_c^fg(z(t), γ_d(d), ℓ_i^(a)),  t ∈ (0, t')
and the color synthesis network of the illumination-variable background neural radiation field is:
ĉ_i(t) = MLP_c^bg(z(t), γ_d(d), ℓ_i^(a)),  t ∈ (t', ∞)
The neural radiation field then uses volume rendering to fuse the foreground and background into the complete scene. Specifically, the constructed foreground and background neural radiation fields are spliced to obtain the neural radiation field model of the whole scene, i.e. the complete scene neural radiation field network model, whose rendering function is:
C_i(r) = ∫_0^{t'} σ(r(t)) c_i(r(t)) T(t) dt + T(t') ∫_{t'}^{∞} σ(r(t)) c_i(r(t)) T(t) dt
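Given per-ray quantities from the two branches, the splicing of foreground and background can be sketched as follows; the tensor shapes are assumptions.

```python
import torch

def composite_ray(w_fg, c_fg, T_boundary, w_bg, c_bg):
    """Blend one ray's foreground and background contributions:
        C(r) = sum_i w_fg_i * c_fg_i + T(t') * sum_j w_bg_j * c_bg_j
    w_fg, w_bg : (N,) volume-rendering weights T_i * alpha_i inside / outside
                 the unit sphere
    c_fg, c_bg : (N, 3) per-sample colors
    T_boundary : transmittance accumulated up to the sphere boundary t'."""
    fg = (w_fg.unsqueeze(-1) * c_fg).sum(dim=0)
    bg = (w_bg.unsqueeze(-1) * c_bg).sum(dim=0)
    return fg + T_boundary * bg
```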
When the preset complete scene neural radiation field network model is trained on the training data of the sampling points, the residual loss function L is adopted:
L = Σ_{r∈R} [ ||Ĉ_c(r) - C(r)||² + ||Ĉ_f(r) - C(r)||² ]
where R denotes the set of rays sampled in each training batch, Ĉ_c(r) is the color estimate obtained by feeding the coarse sampling points into the complete scene neural radiation field network model, Ĉ_f(r) is the color estimate obtained from the fine sampling points, and C(r) is the actual color value of the sampling point.
Specifically, the complete scene neural radiation field network model is trained by back-propagation, comprising the following steps: (1) normalizing the training data; (2) inputting the training data into the complete scene neural radiation field network model and computing the model's output; (3) computing the error between the actual output and the expected output of the model and adjusting the parameters of each layer in reverse according to the Adam optimization method; (4) repeating steps (2) and (3) until all training data have been input; (5) computing the accumulated total error between the actual and expected outputs over all training data and incrementing the training count by 1; training ends if the total error is smaller than the set total error or the training count exceeds the set number of iterations.
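The five training steps can be sketched as a loop like the one below; the data-loader interface, the model's forward signature, the learning rate and the stopping tolerance are assumptions.

```python
import torch

def train(model, loader, max_epochs=30, tol=1e-3, lr=5e-4, device="cuda"):
    """Back-propagation training loop: normalized batches are fed through the
    model, the residual loss is back-propagated with Adam, and training stops
    once the accumulated epoch error drops below a tolerance or the epoch
    budget is exhausted."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(max_epochs):
        total = 0.0
        for batch in loader:                       # batches of rays and pixel colors
            rays = batch["rays"].to(device)
            c_gt = batch["rgb"].to(device)
            c_coarse, c_fine = model(rays)         # coarse and fine color estimates
            loss = ((c_coarse - c_gt) ** 2).sum() + ((c_fine - c_gt) ** 2).sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
            total += loss.item()
        if total < tol:                            # early stop on accumulated error
            break
    return model
```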
S4: inputting the set viewpoint direction into a trained complete scene neural radiation field network model, and rendering to obtain a scene image in the set viewpoint direction.
Specifically, the set viewpoint direction is input into the trained complete scene neural radiation field network model to obtain the scene image in that viewpoint direction. Dense view generation from the sparse input views is thus realized by the neural network, finally giving a large-scale, high-quality unmanned aerial vehicle aerial scene image at any viewpoint.
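Rendering a set viewpoint starts from one camera ray per pixel; a common way to generate these rays from a camera-to-world pose is sketched below (the pinhole-camera convention used here is an assumption).

```python
import torch

def get_rays(H, W, focal, c2w):
    """Generate one camera ray per pixel for a pinhole camera with a 3x4
    camera-to-world pose c2w; returns ray origins and directions."""
    j, i = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    dirs = torch.stack([(i - W * 0.5) / focal,
                        -(j - H * 0.5) / focal,
                        -torch.ones_like(i)], dim=-1)            # camera-space directions
    rays_d = (dirs[..., None, :] * c2w[:3, :3]).sum(dim=-1)      # rotate into world space
    rays_o = c2w[:3, 3].expand(rays_d.shape)                     # shared camera origin
    return rays_o, rays_d
```

Each pixel's color is then obtained by sampling along its ray and evaluating the trained complete scene neural radiation field network model.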
In summary, according to the unmanned aerial vehicle aerial image synthesis method, a plurality of two-dimensional aerial images taken by the unmanned aerial vehicle are acquired; three-dimensional reconstruction of the scene is performed from these images to obtain a sparse three-dimensional point cloud model; sampling in this model yields a number of sampling points; training data for the sampling points are collected and used to train the preset complete scene neural radiation field network model; and the scene image in a set viewpoint direction is finally obtained from the trained model. Based on the trained model, views can be rendered from any viewpoint; compared with building a three-dimensional scene model in advance, training the network is faster than reconstructing the scene model, scenes covering a large spatial extent can be synthesized, and photo-realistic, high-quality new views can be rendered with the pre-trained model. In addition, the neural network can be trained offline, new viewpoints can be rendered in real time, and their synthesis range is not limited, so the method has good application prospects. Moreover, large-scale scene reconstruction by neural rendering can be accelerated on GPU hardware, making the method well suited to large-scale reconstruction with large data volumes.
Referring to fig. 6, in yet another embodiment of the present invention, aerial images of a turret captured outdoors by an unmanned aerial vehicle are input into the network provided by the invention for training, giving a complete reconstruction model of the turret scene; the trained neural network then renders the view of the turret at the acquired camera position shown in the figure.
It can be seen that, under sparse input and compared with the original neural radiation field network, the scene reconstruction obtained by training the neural network provided by the invention improves the reconstruction of foreground and background parts with larger depth values, owing to the separate modeling of the foreground and background of the large-scale scene.
The following are device embodiments of the present invention that may be used to perform method embodiments of the present invention. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the present invention.
In still another embodiment of the present invention, an unmanned aerial vehicle aerial image synthesis system is provided, which can be used to implement the above unmanned aerial vehicle aerial image synthesis method, and specifically, the unmanned aerial vehicle aerial image synthesis system includes an acquisition module, a sampling module, a training module, and a rendering module.
The acquisition module is used for acquiring a plurality of two-dimensional images of the unmanned aerial vehicle; the sampling module is used for carrying out three-dimensional reconstruction of a scene according to a plurality of two-dimensional images to obtain a sparse three-dimensional point cloud model of the scene, and sampling is carried out in the sparse three-dimensional point cloud model to obtain a plurality of sampling points; the training module is used for acquiring training data of all sampling points, and training a preset complete scene neural radiation field network model according to the training data of all the sampling points to obtain a trained complete scene neural radiation field network model; the training data comprises sampling point coordinates, sampling point viewpoint directions, sampling point implicit scene illumination visibility feature vectors and sampling point actual color values; the rendering module is used for inputting the set viewpoint direction into the trained complete scene neural radiation field network model, and obtaining a scene image in the set viewpoint direction through rendering.
In one possible implementation, the acquiring a number of two-dimensional images of the unmanned aerial vehicle comprises: and acquiring a plurality of two-dimensional images shot by the unmanned aerial vehicle at fixed heights, different positions and different visual angles, wherein the overlapping degree of the two-dimensional images shot adjacently is more than 80%.
In a possible implementation manner, the three-dimensional reconstruction of the scene from the plurality of two-dimensional images includes: performing the three-dimensional reconstruction with the COLMAP image reconstruction method based on Structure from Motion.
In one possible implementation, the complete scene neural radiation field network model includes a foreground neural radiation field network and a background neural radiation field network. The foreground neural radiation field network comprises a foreground volume density synthesis network and a foreground color synthesis network. The foreground volume density synthesis network is:
σ(t), z(t) = MLP_σ^fg(γ_x(r(t)))
The foreground color synthesis network is:
ĉ_i(t) = MLP_c^fg(z(t), γ_d(d), ℓ_i^(a))
where σ(t) is the volume density function; z(t) is the feature vector associated with the position encoding; MLP_σ^fg is the volume density synthesis network; γ_x is the position encoding; ĉ_i(t) is the foreground image color estimated by the foreground color synthesis network; r(t) = o + t·d is the ray emitted from the ray origin; t ∈ (0, t') covers the scene boundary of the inner unit sphere; T(t) = exp(-∫_0^t σ(r(s)) ds) is the cumulative transparency of the camera ray along the viewpoint direction; t is the distance along ray r from the ray origin; c_i(t) is the radiance; MLP_c^fg is the foreground color synthesis network; γ_d(d) is the view-direction encoding; ℓ_i^(a) is the implicit illumination visibility feature vector code of the i-th image; and a is the illumination visibility feature vector. The background neural radiation field network comprises a background volume density synthesis network and a background color synthesis network. The background volume density synthesis network is:
σ(t), z(t) = MLP_σ^bg(γ_x(x', y', z', 1/r))
The background color synthesis network is:
ĉ_i(t) = MLP_c^bg(z(t), γ_d(d), ℓ_i^(a))
where MLP_σ^bg is the background volume density synthesis network, ĉ_i(t) here is the background image color estimated by the background color synthesis network, and t ∈ (t', ∞) covers the region outside the unit sphere. The rendering function of the complete scene neural radiation field network model is:
C_i(r) = ∫_0^{t'} σ(r(t)) c_i(r(t)) T(t) dt + T(t') ∫_{t'}^{∞} σ(r(t)) c_i(r(t)) T(t) dt
where C_i(r) is the synthesized color value of the complete scene neural radiation field network model; the first integral (i) is the foreground neural radiation field color contribution, T(t') is the synthesis coefficient (ii), and the second integral (iii) is the background neural radiation field color contribution.
In one possible implementation, obtaining the sampling point implicit scene illumination visibility feature vector for each sampling point includes: obtaining the opacity α and the cumulative transparency T of the camera ray at the sampling point in the viewpoint direction using:

T_i = ∏_{j=1}^{i−1} (1 − α_j)

α_i = 1 − exp(−σ_i·δ_i)

wherein δ_i = t_{i+1} − t_i represents the distance between adjacent sampling points, and σ_i represents the volume density in the viewpoint direction; the sampling point implicit scene illumination visibility feature vector ℓ_i^{(a)} is then obtained from the opacity α, the cumulative transparency T and the illumination visibility feature vector a according to the formula of Figure BDA00039708864900001910, where a is the illumination visibility feature vector.
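A small NumPy sketch of the discrete opacity and cumulative-transparency computation for one ray follows; the density values and sample depths are invented purely to exercise the formulas.

```python
import numpy as np

# Sketch for one camera ray: alpha_i = 1 - exp(-sigma_i * delta_i) and
# T_i = prod_{j<i} (1 - alpha_j). The sigma values below are made up.

def opacity_and_transmittance(sigma: np.ndarray, t: np.ndarray):
    delta = t[1:] - t[:-1]                      # delta_i = t_{i+1} - t_i
    sigma = sigma[:-1]                          # densities at the kept samples
    alpha = 1.0 - np.exp(-sigma * delta)        # opacity of each interval
    # cumulative transparency before sample i (exclusive product)
    T = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    return alpha, T

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 6)                # sample depths along the ray
    sigma = np.array([0.5, 2.0, 4.0, 1.0, 0.2, 0.0])
    alpha, T = opacity_and_transmittance(sigma, t)
    weights = T * alpha                          # per-sample visibility weights
    print(alpha, T, weights)
```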
In one possible implementation manner, the sampling in the sparse three-dimensional point cloud model includes: obtaining a near boundary and a far boundary of the scene according to the sparse three-dimensional point cloud model, and uniformly sampling between the near boundary and the far boundary to obtain the sampling points of coarse sampling; inputting the sampling points of the coarse sampling into the preset complete scene neural radiation field network model to obtain a probability density distribution function of the color; according to the probability density distribution function of the color, performing fine sampling in regions where the probability value is larger than a preset threshold value to obtain the sampling points of fine sampling; and combining the sampling points of the coarse sampling and the fine sampling to obtain the final sampling points.
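One way to realize this coarse-to-fine scheme in code is sketched below. The per-interval probability values are stand-ins (the real values would come from the complete scene neural radiation field network model), and the function name and bin-wise refinement granularity are illustrative assumptions.

```python
import numpy as np

# Coarse samples are placed uniformly between the near and far boundaries;
# intervals whose probability exceeds a threshold receive extra fine samples;
# coarse and fine samples are then merged and sorted.

def coarse_to_fine_samples(near, far, n_coarse, probs, threshold, n_fine_per_bin, rng):
    t_coarse = np.linspace(near, far, n_coarse)              # uniform coarse samples
    edges = np.stack([t_coarse[:-1], t_coarse[1:]], axis=1)  # coarse intervals
    t_fine = []
    for (lo, hi), p in zip(edges, probs):
        if p > threshold:                                     # refine likely-surface bins
            t_fine.append(rng.uniform(lo, hi, n_fine_per_bin))
    t_fine = np.concatenate(t_fine) if t_fine else np.empty(0)
    return np.sort(np.concatenate([t_coarse, t_fine]))        # merged final samples

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = np.array([0.01, 0.02, 0.40, 0.35, 0.15, 0.05, 0.01, 0.01, 0.0])  # stand-in
    samples = coarse_to_fine_samples(2.0, 6.0, 10, probs, threshold=0.1,
                                     n_fine_per_bin=8, rng=rng)
    print(samples.shape, samples.min(), samples.max())
```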
In one possible implementation manner, when the preset complete scene neural radiation field network model is trained according to the training data of each sampling point, the training is performed by adopting a residual loss function L of the following formula:
L = Σ_{r∈R} [ ‖Ĉ_c(r) − C(r)‖₂² + ‖Ĉ_f(r) − C(r)‖₂² ]

wherein R represents the rays sampled in each training batch, Ĉ_c(r) is the color estimate obtained by inputting the sampling points of the coarse sampling into the complete scene neural radiation field network model, Ĉ_f(r) is the color estimate obtained by inputting the sampling points of the fine sampling into the complete scene neural radiation field network model, and C(r) represents the actual color value of the sampling point.
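As a sanity check on the loss definition, the following PyTorch sketch evaluates the coarse-plus-fine residual loss on random stand-in tensors; the tensor shapes and values are assumptions for illustration only.

```python
import torch

# Residual loss over a batch of rays: both the coarse and the fine color
# estimates are pulled toward the observed pixel color.

def residual_loss(c_coarse: torch.Tensor, c_fine: torch.Tensor, c_gt: torch.Tensor) -> torch.Tensor:
    # sum over rays of ||C_c(r) - C(r)||^2 + ||C_f(r) - C(r)||^2
    return ((c_coarse - c_gt) ** 2).sum(dim=-1).sum() + ((c_fine - c_gt) ** 2).sum(dim=-1).sum()

if __name__ == "__main__":
    n_rays = 1024
    c_gt = torch.rand(n_rays, 3)
    c_coarse = torch.rand(n_rays, 3, requires_grad=True)
    c_fine = torch.rand(n_rays, 3, requires_grad=True)
    loss = residual_loss(c_coarse, c_fine, c_gt)
    loss.backward()                               # gradients flow to both estimates
    print(float(loss))
```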
For all relevant details of each step involved in the foregoing embodiment of the unmanned aerial vehicle aerial image synthesis method, reference may be made to the functional description of the corresponding functional module of the unmanned aerial vehicle aerial image synthesis system in the embodiment of the present invention, and details are not repeated herein.
The division of the modules in the embodiments of the present invention is schematic and is only one logical function division; there may be other division manners in actual implementation. In addition, each functional module in each embodiment of the present invention may be integrated in one processor, may exist separately and physically, or two or more modules may be integrated in one module. The integrated modules may be implemented in the form of hardware or in the form of software functional modules.
In yet another embodiment of the present invention, a computer device is provided, comprising a processor and a memory, the memory being configured to store a computer program comprising program instructions, and the processor being configured to execute the program instructions stored in the computer storage medium. The processor may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor is the computational core and control core of the terminal, is adapted to implement one or more instructions, and is specifically adapted to load and execute one or more instructions in a computer storage medium to implement the corresponding method flow or corresponding functions; the processor provided by the embodiment of the invention can be used to perform the operations of the unmanned aerial vehicle aerial image synthesis method.
In yet another embodiment of the present invention, a storage medium, specifically a computer readable storage medium (Memory), is provided; the computer readable storage medium is a memory device in a computer device and is used for storing programs and data. It can be understood that the computer readable storage medium herein may include both a built-in storage medium of the computer device and an extended storage medium supported by the computer device. The computer readable storage medium provides a storage space in which the operating system of the terminal is stored. One or more instructions, which may be one or more computer programs (including program code), are also stored in the storage space and are adapted to be loaded and executed by the processor. The computer readable storage medium herein may be a high-speed RAM memory or a non-volatile memory, for example at least one magnetic disk memory. The one or more instructions stored in the computer readable storage medium may be loaded and executed by the processor to implement the corresponding steps of the unmanned aerial vehicle aerial image synthesis method in the above-described embodiments.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (10)

1. The unmanned aerial vehicle aerial image synthesis method is characterized by comprising the following steps of:
acquiring a plurality of two-dimensional images captured by the unmanned aerial vehicle;
performing three-dimensional reconstruction of a scene according to a plurality of two-dimensional images to obtain a sparse three-dimensional point cloud model of the scene, and sampling in the sparse three-dimensional point cloud model to obtain a plurality of sampling points;
acquiring training data of all sampling points, and training a preset complete scene neural radiation field network model according to the training data of all sampling points to obtain a trained complete scene neural radiation field network model; the training data comprises sampling point coordinates, sampling point viewpoint directions, sampling point implicit scene illumination visibility feature vectors and sampling point actual color values;
inputting the set viewpoint direction into a trained complete scene neural radiation field network model, and rendering to obtain a scene image in the set viewpoint direction.
2. The unmanned aerial vehicle aerial image synthesis method of claim 1, wherein the acquiring a plurality of two-dimensional images captured by the unmanned aerial vehicle comprises:
acquiring a plurality of two-dimensional images shot by the unmanned aerial vehicle at a fixed height from different positions and different viewing angles, wherein adjacently shot two-dimensional images overlap by more than 80%.
3. The unmanned aerial vehicle aerial image synthesis method of claim 1, wherein the three-dimensional reconstruction of the scene from the plurality of two-dimensional images comprises:
performing the three-dimensional reconstruction of the scene by the COLMAP image reconstruction method based on structure from motion.
4. The unmanned aerial vehicle aerial image synthesis method of claim 1, wherein the full scene neural radiation field network model comprises a foreground neural radiation field network and a background neural radiation field network;
the foreground neural radiation field network comprises a foreground volume density synthesis network and a foreground color synthesis network; the foreground volume density synthesis network is:
[σ(t), z(t)] = F_σ(γ_x(r(t)))

the foreground color synthesis network is:

C_i^{fg}(r) = ∫_0^{t'} T(t)·σ(t)·c_i(t) dt,  with  c_i(t) = F_c(z(t), γ_d(d), ℓ_i^{(a)})

wherein σ(t) represents the volume density function; z(t) represents the position-coding-related feature vector; F_σ represents the volume density synthesis network; γ_x represents the position code; C_i^{fg}(r) represents the foreground image color estimated by the foreground color synthesis network; r(t) = o + t·d represents the ray emitted from the ray origin; t ∈ (0, t') covers the scene inside the unit-sphere boundary; T(t) = exp(−∫_0^t σ(s) ds) represents the cumulative transparency along the viewpoint camera ray; t represents the distance along the ray r from the ray origin; c_i(t) represents the radiance; F_c represents the foreground color synthesis network; γ_d(d) represents the encoding of the viewing direction; ℓ_i^{(a)} represents the implicit illumination visibility feature vector code of the i-th image; and a represents the illumination visibility feature vector;

the background neural radiation field network comprises a background volume density synthesis network and a background color synthesis network; the background volume density synthesis network is:

[σ(t), z(t)] = F_σ^{bg}(γ_x(r(t)))

the background color synthesis network is:

C_i^{bg}(r) = ∫_{t'}^{∞} T(t)·σ(t)·c_i(t) dt,  with  c_i(t) = F_c^{bg}(z(t), γ_d(d), ℓ_i^{(a)})

wherein F_σ^{bg} represents the background volume density synthesis network, C_i^{bg}(r) represents the background image color estimated by the background color synthesis network, and t ∈ (t', ∞) covers the region outside the unit sphere;

the rendering function of the complete scene neural radiation field network model is:

C_i(r) = C_i^{fg}(r) + T(t')·C_i^{bg}(r)

wherein C_i(r) is the synthesized color value of the complete scene neural radiation field network model, C_i^{fg}(r) is the foreground neural radiation field color value, T(t') is the synthesis coefficient, and C_i^{bg}(r) is the background neural radiation field color value.
5. The unmanned aerial vehicle aerial image synthesis method of claim 1, wherein obtaining a sample point implicit scene illumination visibility feature vector for each sample point comprises:
obtaining the opacity α and the cumulative transparency T of the camera ray at the sampling point in the viewpoint direction using:
T_i = ∏_{j=1}^{i−1} (1 − α_j)

α_i = 1 − exp(−σ_i·δ_i)

wherein δ_i = t_{i+1} − t_i represents the distance between adjacent sampling points, and σ_i represents the volume density in the viewpoint direction;

obtaining the sampling point implicit scene illumination visibility feature vector ℓ_i^{(a)} of the sampling point from the opacity α, the cumulative transparency T and the illumination visibility feature vector a according to the formula of Figure FDA0003970886480000033, where a is the illumination visibility feature vector.
6. The unmanned aerial vehicle aerial image synthesis method of claim 1, wherein the sampling in the sparse three-dimensional point cloud model comprises:
obtaining a near boundary and a far boundary of the scene according to the sparse three-dimensional point cloud model, and uniformly sampling between the near boundary and the far boundary of the scene to obtain sampling points of coarse sampling;
inputting the sampling points of the rough sampling into a preset complete scene neural radiation field network model to obtain a probability density distribution function of the color;
according to the probability density distribution function of the color, carrying out fine sampling in a region with the probability value larger than a preset threshold value to obtain a sampling point of fine sampling;
and combining the sampling points of the coarse sampling and the fine sampling to obtain a final sampling point.
7. The unmanned aerial vehicle aerial image synthesis method according to claim 6, wherein when training a preset complete scene neural radiation field network model according to training data of each sampling point, training is performed by adopting a residual loss function L of the following formula:
L = Σ_{r∈R} [ ‖Ĉ_c(r) − C(r)‖₂² + ‖Ĉ_f(r) − C(r)‖₂² ]

wherein R represents the rays sampled in each training batch, Ĉ_c(r) is the color estimate obtained by inputting the sampling points of the coarse sampling into the complete scene neural radiation field network model, Ĉ_f(r) is the color estimate obtained by inputting the sampling points of the fine sampling into the complete scene neural radiation field network model, and C(r) represents the actual color value of the sampling point.
8. An unmanned aerial vehicle aerial image synthesis system, characterized by comprising:
the acquisition module is used for acquiring a plurality of two-dimensional images captured by the unmanned aerial vehicle;
the sampling module is used for carrying out three-dimensional reconstruction of a scene according to a plurality of two-dimensional images to obtain a sparse three-dimensional point cloud model of the scene, and sampling the sparse three-dimensional point cloud model to obtain a plurality of sampling points;
the training module is used for acquiring training data of all sampling points, and training a preset complete scene neural radiation field network model according to the training data of all the sampling points to obtain a trained complete scene neural radiation field network model; the training data comprises sampling point coordinates, sampling point viewpoint directions, sampling point implicit scene illumination visibility feature vectors and sampling point actual color values;
and the rendering module is used for inputting the set viewpoint direction into the trained complete scene neural radiation field network model, and obtaining a scene image in the set viewpoint direction through rendering.
9. Computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the unmanned aerial vehicle aerial image synthesis method according to any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the unmanned aerial vehicle aerial image synthesis method of any of claims 1 to 7.
CN202211534828.6A 2022-11-29 2022-11-29 Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium Pending CN116071278A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211534828.6A CN116071278A (en) 2022-11-29 2022-11-29 Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211534828.6A CN116071278A (en) 2022-11-29 2022-11-29 Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116071278A true CN116071278A (en) 2023-05-05

Family

ID=86181102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211534828.6A Pending CN116071278A (en) 2022-11-29 2022-11-29 Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116071278A (en)


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116580212A (en) * 2023-05-16 2023-08-11 北京百度网讯科技有限公司 Image generation method, training method, device and equipment of image generation model
CN116580212B (en) * 2023-05-16 2024-02-06 北京百度网讯科技有限公司 Image generation method, training method, device and equipment of image generation model
CN116958492A (en) * 2023-07-12 2023-10-27 数元科技(广州)有限公司 VR editing application based on NeRf reconstruction three-dimensional base scene rendering
CN116958492B (en) * 2023-07-12 2024-05-03 数元科技(广州)有限公司 VR editing method for reconstructing three-dimensional base scene rendering based on NeRf
CN116958449A (en) * 2023-09-12 2023-10-27 北京邮电大学 Urban scene three-dimensional modeling method and device and electronic equipment
CN116958449B (en) * 2023-09-12 2024-04-30 北京邮电大学 Urban scene three-dimensional modeling method and device and electronic equipment
CN117544829A (en) * 2023-10-16 2024-02-09 支付宝(杭州)信息技术有限公司 Video generation method and device
CN117422804A (en) * 2023-10-24 2024-01-19 中国科学院空天信息创新研究院 Large-scale city block three-dimensional scene rendering and target fine space positioning method
CN117422804B (en) * 2023-10-24 2024-06-07 中国科学院空天信息创新研究院 Large-scale city block three-dimensional scene rendering and target fine space positioning method
CN117876346A (en) * 2024-01-16 2024-04-12 湖南湖大华龙电气与信息技术有限公司 Insulator autonomous infrared three-dimensional visual detection method and edge intelligent device
CN118379459A (en) * 2024-06-21 2024-07-23 南京工业大学 Bridge disease visualization method based on three-dimensional reconstruction of nerve radiation field
CN118379459B (en) * 2024-06-21 2024-09-03 南京工业大学 Bridge disease visualization method based on three-dimensional reconstruction of nerve radiation field

Similar Documents

Publication Publication Date Title
CN116071278A (en) Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium
Gortler et al. The lumigraph
Riegler et al. Free view synthesis
CN109003325B (en) Three-dimensional reconstruction method, medium, device and computing equipment
CN106803267B (en) Kinect-based indoor scene three-dimensional reconstruction method
CN115100339B (en) Image generation method, device, electronic equipment and storage medium
CN108876814B (en) Method for generating attitude flow image
CN103530907B (en) Complicated three-dimensional model drawing method based on images
CN114863038B (en) Real-time dynamic free visual angle synthesis method and device based on explicit geometric deformation
CN115428027A (en) Neural opaque point cloud
US11544898B2 (en) Method, computer device and storage medium for real-time urban scene reconstruction
CN116958379A (en) Image rendering method, device, electronic equipment, storage medium and program product
JP2024510230A (en) Multi-view neural human prediction using implicitly differentiable renderer for facial expression, body pose shape and clothing performance capture
CN115115805A (en) Training method, device and equipment for three-dimensional reconstruction model and storage medium
CN117745932A (en) Neural implicit curved surface reconstruction method based on depth fusion constraint
Lu et al. Single image shape-from-silhouettes
Gu et al. Ue4-nerf: Neural radiance field for real-time rendering of large-scale scene
Wang et al. Neural opacity point cloud
Zhang et al. Resimad: Zero-shot 3d domain transfer for autonomous driving with source reconstruction and target simulation
CN116681839B (en) Live three-dimensional target reconstruction and singulation method based on improved NeRF
CN118154770A (en) Single tree image three-dimensional reconstruction method and device based on nerve radiation field
Niu et al. Overview of image-based 3D reconstruction technology
Li et al. Point-Based Neural Scene Rendering for Street Views
CN115239559A (en) Depth map super-resolution method and system for fusion view synthesis
Wei et al. LiDeNeRF: Neural radiance field reconstruction with depth prior provided by LiDAR point cloud

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination