CN114140510A - Incremental three-dimensional reconstruction method and device and computer equipment - Google Patents
- Publication number
- CN114140510A (Application CN202111470822.2A)
- Authority
- CN
- China
- Prior art keywords: reconstruction, rendering, incremental, image, camera
- Prior art date: 2021-12-03
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/005—General purpose rendering architectures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/08—Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20036—Morphological image processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The embodiments of the invention disclose an incremental three-dimensional reconstruction method, an incremental three-dimensional reconstruction device and computer equipment, aiming to solve problems of complex operation and insufficient precision in the prior art. The incremental three-dimensional reconstruction method mainly comprises the following steps: acquiring an image sequence of a target scene; calculating the projection ray of each pixel of each image; discretely sampling and rendering along the projection rays; implicitly reconstructing the model; and inversely solving the camera parameters and performing incremental reconstruction based on the reconstruction result of the previous image. The method implicitly reconstructs a three-dimensional model using the neural radiance field technique, and uses the volume rendering principle together with a differentiable rendering algorithm to realize the rendering process from the 3D voxel model to the 2D image, so that the three-dimensional model can be optimized nonlinearly by means of gradients, thereby realizing end-to-end three-dimensional reconstruction that is simple to operate and highly accurate.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to an image-based three-dimensional reconstruction method.
Background
With the development of computer vision, technologies such as three-dimensional reconstruction, AR, autonomous driving and SLAM have gradually entered our daily lives; among them, image-based three-dimensional reconstruction has always been one of the most important challenges in the field of computer vision. Image-based three-dimensional reconstruction can recover the geometry and texture information of a scene from images, and has wide application value in real life, for example in defect detection, digital heritage preservation, electronic maps and navigation. Different applications place different requirements on three-dimensional reconstruction.
Current three-dimensional reconstruction methods mainly comprise depth-camera-based reconstruction, structured-light-based reconstruction and multi-view reconstruction. The former two are limited by hardware, while multi-view reconstruction requires abundant feature points in the captured images and is likewise constrained. Generally speaking, most of these methods are complicated to operate, and their precision is not ideal.
Disclosure of Invention
Therefore, the embodiments of the invention provide an incremental three-dimensional reconstruction method, an incremental three-dimensional reconstruction device and computer equipment, aiming to solve problems of complex operation and insufficient precision in the prior art.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
In a first aspect, an incremental three-dimensional reconstruction method includes the following steps:
S1: acquiring an image sequence of a target scene; the image sequence comprises a plurality of images obtained by shooting the target scene from multiple angles; performing the following steps S2-S4 for any image in the image sequence;
S2: calculating the back-projection ray from each pixel of the image through the camera center position (a minimal sketch of this back-projection is given after these steps);
S3: setting a depth range for the back-projection ray, discretely sampling a series of 3D points on the ray within that depth range, evaluating them through a neural network model represented by a multilayer perceptron, and differentiably rendering the pixel using the volume rendering principle to obtain the rendering result of the pixel, gradient information being obtained during the differentiable rendering;
S4: randomly sampling pixels on the image and implicitly reconstructing the model, obtaining an initial reconstruction result after iterating the calculation and rendering of steps S2 and S3; during the implicit reconstruction, the parameters of the neural network model and the pose information of the camera are updated using the gradient information obtained from the differentiable rendering;
S5: taking the other images of the image sequence in turn and performing incremental reconstruction based on the reconstruction result of the previous image; traversing the image sequence finally yields the incremental reconstruction result;
wherein each incremental reconstruction comprises: randomly initializing the camera pose corresponding to the current image and repeatedly executing steps S2, S3 and S4, during which only the camera pose is updated and the parameters of the neural network model are kept unchanged; then repeatedly executing steps S2, S3 and S4 again, during which only the parameters of the neural network model are updated.
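As a concrete illustration of step S2, the following minimal sketch back-projects pixels into world-space rays, assuming a pinhole camera model with intrinsics K and a camera-to-world pose (R, t); the function name and example values are illustrative, not taken from the patent:

```python
import numpy as np

def back_project_rays(K, R, t, pixels):
    """Back-project pixel coordinates into world-space rays.

    K      : (3, 3) camera intrinsics
    R, t   : camera-to-world rotation (3, 3) and camera centre (3,)
    pixels : (N, 2) array of (u, v) pixel coordinates
    """
    uv1 = np.concatenate([pixels, np.ones((len(pixels), 1))], axis=1)  # (N, 3)
    dirs_cam = uv1 @ np.linalg.inv(K).T      # ray directions in the camera frame
    dirs_world = dirs_cam @ R.T              # rotate into the world frame
    dirs_world /= np.linalg.norm(dirs_world, axis=1, keepdims=True)
    origins = np.broadcast_to(t, dirs_world.shape)  # rays start at camera centre
    return origins, dirs_world

# Example: rays for two pixels of a 640x480 image with focal length 500
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
origins, dirs = back_project_rays(K, np.eye(3), np.zeros(3),
                                  np.array([[320.0, 240.0], [0.0, 0.0]]))
```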
Optionally, the pose information of the camera comprises a camera position and a viewing direction.
- Optionally, the inputs of the neural network model are the camera position, the viewing direction and the observed 3D coordinates, and the outputs are the color c and the volume density σ of the 3D point.
- Optionally, updating the parameters of the neural network model and the pose information of the camera in step S4 specifically comprises: calculating the gradient of the neural network model and the gradient of the camera parameters using the gradient information obtained during differentiable rendering, and updating the parameters of the neural network model and the pose information of the camera using the Adam algorithm.
- Optionally, between step S3 and step S4, the difference between the rendered color and the actual image color is also calculated as the reconstruction error.
- Optionally, in step S4, the random sampling of pixels on the image specifically extracts pixels from the region of interest of each image; the region of interest is determined by convolving the gradient image (the gradient information) with blob and corner kernels, applying non-maximum suppression to obtain the image interest points, and then generating the region of interest with a morphological dilation algorithm.
- Optionally, the incremental reconstruction described in step S5 further performs the following processing to account for dynamic objects or illumination effects:
firstly, using the differentiable rendering equation of step S3 as the static part of the model, then adding a transient part and rendering a transient color and density, the density being allowed to vary across the training images;
secondly, allowing the transient part to emit an uncertainty field, so that the model can adjust the reconstruction loss and ignore unreliable pixels and 3D positions;
and finally, modeling the color of each pixel as an isotropic normal distribution, performing maximum likelihood estimation, and rendering the variance of the distribution with volume rendering in the same way as the transient color.
In a second aspect, an incremental three-dimensional reconstruction apparatus comprises:
an image sequence acquisition module, used for acquiring an image sequence of a target scene; the image sequence comprises a plurality of images obtained by shooting the target scene from multiple angles;
a projection ray calculation module, used for calculating, for the currently selected image, the back-projection ray from each pixel of the image through the camera center position;
a discrete sampling and rendering module, used for setting a depth range for the back-projection ray, discretely sampling a series of 3D points on the ray within that depth range, evaluating the points through a neural network model represented by a multilayer perceptron, and then differentiably rendering the pixels using the volume rendering principle to obtain the rendering results of the pixels, gradient information being obtained during the differentiable rendering;
an implicit reconstruction module, used for randomly sampling pixels on the image and implicitly reconstructing the model, obtaining an initial reconstruction result by iteratively running the projection ray calculation module and the discrete sampling and rendering module; during the implicit reconstruction, the parameters of the neural network model and the pose information of the camera are updated using the gradient information obtained from the differentiable rendering;
an incremental reconstruction module, used for taking the other images of the image sequence in turn and performing incremental reconstruction based on the reconstruction result of the previous image; traversing the image sequence finally yields the incremental reconstruction result;
wherein each incremental reconstruction comprises: randomly initializing the camera pose corresponding to the current image, and repeatedly running the projection ray calculation module, the discrete sampling and rendering module and the implicit reconstruction module, during which only the camera pose is updated and the parameters of the neural network model are kept unchanged; then repeatedly running the three modules again, during which only the parameters of the neural network model are updated.
In a third aspect, a computer device comprising a memory and a processor, the memory storing a computer program, is characterized in that the processor implements the steps of the above method when executing the computer program.
In a fourth aspect, a computer-readable storage medium, on which a computer program is stored, is characterized in that the computer program realizes the steps of the above-mentioned method when being executed by a processor.
The invention has at least the following beneficial effects:
the method uses a nerve radiation field technology to implicitly reconstruct a three-dimensional model, input data are picture sequences, and parameters to be estimated are the posture and the three-dimensional model of a camera; the rendering process from the 3D voxel model to the 2D image is realized by using a micro-rendering algorithm by using a volume rendering principle, so that the three-dimensional model can be subjected to nonlinear optimization by using differentiation, and an end-to-end three-dimensional reconstruction method is realized; based on the method, a plurality of images which are correspondingly obtained by shooting the target scene in multiple angles are traversed, and an incremental reconstruction result is finally obtained, so that the method is simple and convenient to operate and high in precision.
Drawings
In order to more clearly illustrate the prior art and the present invention, the drawings which are needed to be used in the description of the prior art and the embodiments of the present invention will be briefly described. It should be apparent that the drawings in the following description are merely exemplary, and that other drawings may be derived from the provided drawings by those of ordinary skill in the art without inventive effort.
The structures, proportions and sizes shown in this specification are for illustration only and are not intended to limit the scope of the invention, which is defined by the claims; it is to be understood that modifications, changes in proportion or adjustments in size that do not affect the efficacy or the objectives attainable by the invention still fall within the scope covered by the technical content disclosed herein.
Fig. 1 is a schematic flow chart of an incremental three-dimensional reconstruction method according to an embodiment of the present invention;
fig. 2 is a schematic view of a virtual module architecture of an incremental three-dimensional reconstruction apparatus according to an embodiment of the present invention;
FIG. 3 is a sample of the actual processing results of one embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, there is provided an incremental three-dimensional reconstruction method, including the steps of:
S1: acquiring an image sequence of a target scene; the image sequence comprises a plurality of images obtained by shooting the target scene from multiple angles; performing the following steps S2-S4 for any image in the image sequence;
S2: calculating the back-projection ray from each pixel of the image through the camera center position;
S3: setting a depth range for the back-projection ray, discretely sampling a series of 3D points on the ray within that depth range, evaluating them through a neural network model represented by a multilayer perceptron, and differentiably rendering the pixel using the volume rendering principle to obtain the rendering result of the pixel, gradient information being obtained during the differentiable rendering;
S4: randomly sampling pixels on the image and implicitly reconstructing the model, obtaining an initial reconstruction result after iterating the calculation and rendering of steps S2 and S3; during the implicit reconstruction, the parameters of the neural network model and the pose information of the camera are updated using the gradient information obtained from the differentiable rendering;
S5: taking the other images of the image sequence in turn and performing incremental reconstruction based on the reconstruction result of the previous image; traversing the image sequence finally yields the incremental reconstruction result;
wherein each incremental reconstruction comprises: randomly initializing the camera pose corresponding to the current image and repeatedly executing steps S2, S3 and S4, during which only the camera pose is updated and the parameters of the neural network model are kept unchanged; then repeatedly executing steps S2, S3 and S4 again, during which only the parameters of the neural network model are updated.
The key steps of the method at least comprise:
(1) calculating the back-projection ray of each pixel of each view;
(2) discrete sampling and rendering along the projection rays;
(3) implicit reconstruction of the model;
(4) inverse solution of the camera parameters (the parameters of model M are updated using the Adam algorithm).
The method is a three-dimensional reconstruction algorithm that is simple to operate and highly accurate; it implicitly reconstructs a three-dimensional model using the neural radiance field technique. The input data is an image sequence, and the camera pose and the three-dimensional model are the parameters to be estimated. The method also uses the volume rendering principle and a differentiable rendering algorithm to realize the rendering process from the 3D voxel model to the 2D image, so that the three-dimensional model can be optimized nonlinearly via gradients. An end-to-end three-dimensional reconstruction method is thereby realized that finally yields the incremental reconstruction result, is simple to operate and highly accurate.
Illustratively, the specific description is as follows:
1. Firstly, the scene or object is shot from multiple angles, and the camera pose information is recorded;
2. The camera intrinsic parameters are represented by K, and the camera extrinsic parameters by a differentiable SE(3) representation.
3. Model M is characterized using a multilayer perceptron whose parameters implicitly encode the three-dimensional model. The inputs to the neural network are the camera position, the viewing direction and the observed 3D coordinates; the outputs are the color c and the volume density σ of the 3D point.
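The following is a minimal PyTorch sketch of such a model M; the positional encoding and layer sizes are assumptions in the spirit of neural radiance fields, not values specified by this patent:

```python
import torch
import torch.nn as nn

def positional_encoding(x, n_freqs=6):
    # Map coordinates to sin/cos features so the MLP can fit high frequencies.
    feats = [x]
    for i in range(n_freqs):
        feats += [torch.sin((2.0 ** i) * x), torch.cos((2.0 ** i) * x)]
    return torch.cat(feats, dim=-1)

class RadianceField(nn.Module):
    def __init__(self, n_freqs=6, hidden=256):
        super().__init__()
        in_pts = 3 * (1 + 2 * n_freqs)   # encoded 3D point
        in_dir = 3 * (1 + 2 * n_freqs)   # encoded viewing direction
        self.trunk = nn.Sequential(
            nn.Linear(in_pts, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.sigma_head = nn.Linear(hidden, 1)
        self.color_head = nn.Sequential(
            nn.Linear(hidden + in_dir, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid())

    def forward(self, pts, dirs):
        h = self.trunk(positional_encoding(pts))
        sigma = torch.relu(self.sigma_head(h)).squeeze(-1)   # density σ >= 0
        color = self.color_head(
            torch.cat([h, positional_encoding(dirs)], dim=-1))
        return color, sigma
```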
4. The back-projection ray of a pixel is calculated from the intrinsic and extrinsic parameters of the image, with the camera center as the starting point of the ray. The depth range of the ray is set according to the size of the model, and the ray is then sampled within that depth range, giving a series of 3D points. The camera position, the projection ray and the 3D point coordinates are input into model M to obtain the color and volume density of each 3D point, and the rendering result of the pixel is solved with the volume rendering formula.
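The volume rendering formula referred to here can be sketched as the standard quadrature C = Σ_i T_i (1 − exp(−σ_i δ_i)) c_i with transmittance T_i = exp(−Σ_{j<i} σ_j δ_j). A hedged implementation, reusing the RadianceField sketch above, might look as follows; the near/far bounds and sample count are illustrative:

```python
import torch

def render_rays(model, origins, dirs, near=0.5, far=5.0, n_samples=64):
    # Stratified depths along each ray within the configured depth range.
    t = torch.linspace(near, far, n_samples)                          # (S,)
    pts = origins[:, None, :] + t[None, :, None] * dirs[:, None, :]   # (N,S,3)
    view = dirs[:, None, :].expand_as(pts)

    color, sigma = model(pts, view)                           # (N,S,3), (N,S)
    delta = torch.cat([t[1:] - t[:-1], torch.tensor([1e10])])         # (S,)
    alpha = 1.0 - torch.exp(-sigma * delta)                           # (N,S)
    # Transmittance: probability that the ray reaches sample i unoccluded.
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=1),
        dim=1)[:, :-1]
    weights = alpha * trans                                           # (N,S)
    return (weights[..., None] * color).sum(dim=1)       # (N,3) pixel colors
```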
5. Calculating the rendering loss: the difference between the rendered color and the actual image color is computed as the reconstruction error.
6. The gradient of model M and the gradient of the camera parameters are computed using the gradient information obtained during differentiable rendering, and the parameters of model M and of the camera are updated with the Adam algorithm.
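A minimal sketch of this update step is shown below. It assumes the camera extrinsics are kept as a learnable 6-DoF tangent-space vector, a common parameterisation consistent with the differentiable SE(3) representation mentioned in step 2, though the patent does not fix one:

```python
import torch

model = RadianceField()                        # the MLP sketched in step 3
cam_pose = torch.zeros(6, requires_grad=True)  # assumed se(3)-style pose vector

optimizer = torch.optim.Adam([
    {"params": model.parameters(), "lr": 5e-4},  # parameters of model M
    {"params": [cam_pose],         "lr": 1e-3},  # camera parameters
])

def training_step(rendered, target):
    # `rendered` must be produced through the differentiable renderer with
    # `cam_pose` in the compute graph so both parameter groups get gradients.
    loss = ((rendered - target) ** 2).mean()   # photometric reconstruction error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```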
7. Randomly sampling pixels. During reconstruction, pixels need to be randomly sampled on the image. The gradient image is convolved with blob and corner kernels, and non-maximum suppression is then applied to obtain the image interest points. The region of interest is then generated using a morphological dilation algorithm. During random sampling, pixels are extracted from the region of interest of each image, which significantly improves training speed and accuracy.
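A hedged sketch of this interest-region computation follows. The patent does not specify the exact kernels, so a Laplacian-of-Gaussian blob response and a Harris corner response are used here as stand-ins:

```python
import cv2
import numpy as np

def interest_region(gray):
    """Compute a boolean mask of pixels eligible for random sampling."""
    blob = cv2.Laplacian(cv2.GaussianBlur(gray, (5, 5), 1.0), cv2.CV_32F)
    corner = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    response = np.abs(blob) + np.abs(corner)

    # Non-maximum suppression: keep pixels that are the maximum in a 5x5 window.
    local_max = cv2.dilate(response, np.ones((5, 5), np.uint8))
    points = (response == local_max) & (response > response.mean())

    # Morphological dilation grows the sparse interest points into a region.
    region = cv2.dilate(points.astype(np.uint8), np.ones((15, 15), np.uint8))
    return region.astype(bool)
```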
8. Steps 4-7 are repeated to obtain the initial reconstruction result.
9. Subsequently, a new picture I of the scene is shot and incremental reconstruction is performed. Because of dynamic objects or illumination effects, the image may not conform to the model, so the following is also done (a hedged sketch of the resulting loss follows item 3 below):
1) First, the rendering equation of step 4 is taken as the static part of our model; a transient part is then added, rendering a transient color and density, where the density is allowed to vary across the training images. This makes it possible to reconstruct images containing occluders without introducing a dynamic blur component into the static scene representation.
2) Second, the observed pixel colors are not all assumed to be equally reliable; allowing our transient part to emit an uncertainty field (color and density) lets the model adjust the reconstruction loss and ignore unreliable pixels and 3D locations that are likely to contain transient occluders.
3) The color of each pixel is modeled as an isotropic normal distribution and maximum likelihood estimation is performed; the variance of the distribution is then rendered with volume rendering in the same way as the transient color.
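The following is a hedged sketch of such a loss, in the spirit of NeRF-W-style transient modelling; all names, the uncertainty floor and the regularisation weight are illustrative assumptions, not values from the patent:

```python
import torch

def uncertain_photometric_loss(c_static, c_transient, beta, sigma_transient,
                               target, beta_min=0.03, lam=0.01):
    """c_static, c_transient, target: (N, 3); beta, sigma_transient: (N,)."""
    c_pred = c_static + c_transient           # composite rendered pixel color
    beta = beta + beta_min                    # floor on the rendered uncertainty
    # Negative log-likelihood of an isotropic normal N(c_pred, beta^2 I):
    # large beta down-weights the squared error for unreliable pixels.
    nll = ((target - c_pred) ** 2 / (2.0 * beta[:, None] ** 2)).mean() \
          + torch.log(beta).mean()
    # Penalise transient density so it does not absorb the static scene.
    return nll + lam * sigma_transient.mean()
```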
10. The camera pose of the picture is initialized and steps 4-7 are repeated, but only the camera pose is updated, keeping the parameters of model M unchanged.
Steps 4-7 are then executed again, this time updating the parameters of model M with the Adam algorithm, which yields the incremental reconstruction result.
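The two-phase schedule of steps 9-10 can be summarised in the following sketch; `sample_pixels` and `render_image_pixels` are placeholders for the sampling and rendering routines sketched above, not functions defined by the patent:

```python
import torch

def register_new_image(model, image, n_pose_iters=200, n_model_iters=200):
    pose = torch.zeros(6, requires_grad=True)   # randomly/zero-initialised pose

    # Phase 1: update only the camera pose; model M stays frozen.
    opt_pose = torch.optim.Adam([pose], lr=1e-2)
    for _ in range(n_pose_iters):
        pixels, target = sample_pixels(image)
        loss = ((render_image_pixels(model, pose, pixels) - target) ** 2).mean()
        opt_pose.zero_grad(); loss.backward(); opt_pose.step()

    # Phase 2: update only the parameters of model M; the pose stays fixed.
    frozen_pose = pose.detach()
    opt_model = torch.optim.Adam(model.parameters(), lr=5e-4)
    for _ in range(n_model_iters):
        pixels, target = sample_pixels(image)
        loss = ((render_image_pixels(model, frozen_pose, pixels)
                 - target) ** 2).mean()
        opt_model.zero_grad(); loss.backward(); opt_model.step()
    return pose
```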
In one embodiment, there is provided an incremental three-dimensional reconstruction apparatus, as shown in fig. 2, comprising:
an image sequence acquisition module, used for acquiring an image sequence of a target scene; the image sequence comprises a plurality of images obtained by shooting the target scene from multiple angles;
a projection ray calculation module, used for calculating, for the currently selected image, the back-projection ray from each pixel of the image through the camera center position;
a discrete sampling and rendering module, used for setting a depth range for the back-projection ray, discretely sampling a series of 3D points on the ray within that depth range, evaluating the points through a neural network model represented by a multilayer perceptron, and then differentiably rendering the pixels using the volume rendering principle to obtain the rendering results of the pixels, gradient information being obtained during the differentiable rendering;
an implicit reconstruction module, used for randomly sampling pixels on the image and implicitly reconstructing the model, obtaining an initial reconstruction result by iteratively running the projection ray calculation module and the discrete sampling and rendering module; during the implicit reconstruction, the parameters of the neural network model and the pose information of the camera are updated using the gradient information obtained from the differentiable rendering;
an incremental reconstruction module, used for taking the other images of the image sequence in turn and performing incremental reconstruction based on the reconstruction result of the previous image; traversing the image sequence finally yields the incremental reconstruction result;
wherein each incremental reconstruction comprises: randomly initializing the camera pose corresponding to the current image, and repeatedly running the projection ray calculation module, the discrete sampling and rendering module and the implicit reconstruction module, during which only the camera pose is updated and the parameters of the neural network model are kept unchanged; then repeatedly running the three modules again, during which only the parameters of the neural network model are updated.
For the specific limitations of the above apparatus, reference may be made to the above limitations of the incremental three-dimensional reconstruction method, which are not described herein again. The various modules in the above-described apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
The device implicitly reconstructs a three-dimensional model using the neural radiance field technique; the input data is an image sequence, and the parameters to be estimated are the camera pose and the three-dimensional model. The rendering process from the 3D voxel model to the 2D image is realized with a differentiable rendering algorithm based on the volume rendering principle, so that the three-dimensional model can be optimized nonlinearly by means of gradients, realizing end-to-end three-dimensional reconstruction. On this basis, the plurality of images obtained by shooting the target scene from multiple angles is traversed, finally yielding the incremental reconstruction result; the device is simple to operate and highly accurate.
Fig. 3 shows the result of applying the exemplary scheme described above for incremental reconstruction.
In one embodiment, a computer device is further provided, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, carries out all or part of the processes of the method in the above embodiments.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program carries out all or part of the processes of the method in the above embodiments.
The above specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
All the technical features of the above embodiments may be combined arbitrarily, provided there is no contradiction between them; for brevity of description, not every possible combination is described. Combinations that are not explicitly described should nevertheless be considered within the scope of this description.
The present invention has been described in considerable detail by the general description and the specific examples given above. It should be noted that it is obvious that several variations and modifications can be made to these specific embodiments without departing from the inventive concept, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (10)
1. An incremental three-dimensional reconstruction method, comprising the steps of:
S1: acquiring an image sequence of a target scene; the image sequence comprises a plurality of images obtained by shooting the target scene from multiple angles; performing the following steps S2-S4 for any image in the image sequence;
S2: calculating the back-projection ray from each pixel of the image through the camera center position;
S3: setting a depth range for the back-projection ray, discretely sampling a series of 3D points on the ray within that depth range, evaluating them through a neural network model represented by a multilayer perceptron, and differentiably rendering the pixel using the volume rendering principle to obtain the rendering result of the pixel, gradient information being obtained during the differentiable rendering;
S4: randomly sampling pixels on the image and implicitly reconstructing the model, obtaining an initial reconstruction result after iterating the calculation and rendering of steps S2 and S3; during the implicit reconstruction, the parameters of the neural network model and the pose information of the camera are updated using the gradient information obtained from the differentiable rendering;
S5: taking the other images of the image sequence in turn and performing incremental reconstruction based on the reconstruction result of the previous image; traversing the image sequence finally yields the incremental reconstruction result;
wherein each incremental reconstruction comprises: randomly initializing the camera pose corresponding to the current image and repeatedly executing steps S2, S3 and S4, during which only the camera pose is updated and the parameters of the neural network model are kept unchanged; then repeatedly executing steps S2, S3 and S4 again, during which only the parameters of the neural network model are updated.
2. The incremental three-dimensional reconstruction method of claim 1, wherein the pose information of the camera comprises a camera position and a viewing direction.
3. The incremental three-dimensional reconstruction method according to claim 2, wherein the inputs of the neural network model are the camera position, the viewing direction and the observed 3D coordinates, and the outputs are the color c and the volume density σ of the 3D point.
4. The incremental three-dimensional reconstruction method according to claim 1, wherein updating the parameters of the neural network model and the pose information of the camera in step S4 specifically comprises: calculating the gradient of the neural network model and the gradient of the camera parameters using the gradient information obtained during differentiable rendering, and updating the parameters of the neural network model and the pose information of the camera using the Adam algorithm.
5. The incremental three-dimensional reconstruction method according to claim 1, wherein, between step S3 and step S4, the difference between the rendered color and the actual image color is further calculated as the reconstruction error.
6. The incremental three-dimensional reconstruction method according to claim 1, wherein, in step S4, the random sampling of pixels on the image specifically extracts pixels from the region of interest of each image; the region of interest is determined by convolving the gradient image with blob and corner kernels, applying non-maximum suppression to obtain the image interest points, and then generating the region of interest with a morphological dilation algorithm.
7. The incremental three-dimensional reconstruction method according to claim 1, wherein the incremental reconstruction in step S5 further comprises the following processing to account for dynamic objects or illumination effects:
firstly, using the differentiable rendering equation of step S3 as the static part of the model, then adding a transient part and rendering a transient color and density, the density being allowed to vary across the training images;
secondly, allowing the transient part to emit an uncertainty field, so that the model can adjust the reconstruction loss and ignore unreliable pixels and 3D positions;
and finally, modeling the color of each pixel as an isotropic normal distribution, performing maximum likelihood estimation, and rendering the variance of the distribution with volume rendering in the same way as the transient color.
8. An incremental three-dimensional reconstruction apparatus, comprising:
an image sequence acquisition module, used for acquiring an image sequence of a target scene; the image sequence comprises a plurality of images obtained by shooting the target scene from multiple angles;
a projection ray calculation module, used for calculating, for the currently selected image, the back-projection ray from each pixel of the image through the camera center position;
a discrete sampling and rendering module, used for setting a depth range for the back-projection ray, discretely sampling a series of 3D points on the ray within that depth range, evaluating the points through a neural network model represented by a multilayer perceptron, and then differentiably rendering the pixels using the volume rendering principle to obtain the rendering results of the pixels, gradient information being obtained during the differentiable rendering;
an implicit reconstruction module, used for randomly sampling pixels on the image and implicitly reconstructing the model, obtaining an initial reconstruction result by iteratively running the projection ray calculation module and the discrete sampling and rendering module; during the implicit reconstruction, the parameters of the neural network model and the pose information of the camera are updated using the gradient information obtained from the differentiable rendering;
an incremental reconstruction module, used for taking the other images of the image sequence in turn and performing incremental reconstruction based on the reconstruction result of the previous image; traversing the image sequence finally yields the incremental reconstruction result;
wherein each incremental reconstruction comprises: randomly initializing the camera pose corresponding to the current image, and repeatedly running the projection ray calculation module, the discrete sampling and rendering module and the implicit reconstruction module, during which only the camera pose is updated and the parameters of the neural network model are kept unchanged; then repeatedly running the three modules again, during which only the parameters of the neural network model are updated.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111470822.2A CN114140510B (en) | 2021-12-03 | 2021-12-03 | Incremental three-dimensional reconstruction method and device and computer equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114140510A true CN114140510A (en) | 2022-03-04 |
CN114140510B CN114140510B (en) | 2024-09-13 |
Family
ID=80387799
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111470822.2A Active CN114140510B (en) | 2021-12-03 | 2021-12-03 | Incremental three-dimensional reconstruction method and device and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114140510B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170019653A1 (en) * | 2014-04-08 | 2017-01-19 | Sun Yat-Sen University | Non-feature extraction-based dense sfm three-dimensional reconstruction method |
US20180205941A1 (en) * | 2017-01-17 | 2018-07-19 | Facebook, Inc. | Three-dimensional scene reconstruction from set of two dimensional images for consumption in virtual reality |
CN110490917A (en) * | 2019-08-12 | 2019-11-22 | 北京影谱科技股份有限公司 | Three-dimensional rebuilding method and device |
CN113160296A (en) * | 2021-03-31 | 2021-07-23 | 清华大学 | Micro-rendering-based three-dimensional reconstruction method and device for vibration liquid drops |
CN113538682A (en) * | 2021-07-19 | 2021-10-22 | 北京的卢深视科技有限公司 | Model training method, head reconstruction method, electronic device, and storage medium |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115049783A (en) * | 2022-05-20 | 2022-09-13 | 支付宝(杭州)信息技术有限公司 | Model determination method, scene reconstruction model, medium, device and product |
CN115049783B (en) * | 2022-05-20 | 2024-04-02 | 支付宝(杭州)信息技术有限公司 | Model determining method, scene reconstruction model, medium, equipment and product |
CN115359195A (en) * | 2022-07-18 | 2022-11-18 | 北京建筑大学 | Orthoimage generation method and device, storage medium and electronic equipment |
CN115147558A (en) * | 2022-08-31 | 2022-10-04 | 北京百度网讯科技有限公司 | Training method of three-dimensional reconstruction model, three-dimensional reconstruction method and device |
CN115147558B (en) * | 2022-08-31 | 2022-12-02 | 北京百度网讯科技有限公司 | Training method of three-dimensional reconstruction model, three-dimensional reconstruction method and device |
CN116958453A (en) * | 2023-09-20 | 2023-10-27 | 成都索贝数码科技股份有限公司 | Three-dimensional model reconstruction method, device and medium based on nerve radiation field |
CN116958453B (en) * | 2023-09-20 | 2023-12-08 | 成都索贝数码科技股份有限公司 | Three-dimensional model reconstruction method, device and medium based on nerve radiation field |
CN117372602A (en) * | 2023-12-05 | 2024-01-09 | 成都索贝数码科技股份有限公司 | Heterogeneous three-dimensional multi-object fusion rendering method, equipment and system |
CN117372602B (en) * | 2023-12-05 | 2024-02-23 | 成都索贝数码科技股份有限公司 | Heterogeneous three-dimensional multi-object fusion rendering method, equipment and system |
CN117974904A (en) * | 2024-02-27 | 2024-05-03 | 北京数原数字化城市研究中心 | Three-dimensional reconstruction model generation method, three-dimensional reconstruction device and related equipment |
CN117974904B (en) * | 2024-02-27 | 2024-08-23 | 北京数原数字化城市研究中心 | Three-dimensional reconstruction model generation method, three-dimensional reconstruction device and related equipment |
Also Published As
Publication number | Publication date |
---|---|
CN114140510B (en) | 2024-09-13 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |