
CN107993260A - A kind of light field image depth estimation method based on mixed type convolutional neural networks - Google Patents

A kind of light field image depth estimation method based on mixed type convolutional neural networks Download PDF

Info

Publication number
CN107993260A
CN107993260A CN201711337965.XA
Authority
CN
China
Prior art keywords
light field
neural networks
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711337965.XA
Other languages
Chinese (zh)
Inventor
林丽莉
潘志伟
周文晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Gongshang University
Original Assignee
Zhejiang Gongshang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Gongshang University filed Critical Zhejiang Gongshang University
Priority to CN201711337965.XA priority Critical patent/CN107993260A/en
Publication of CN107993260A publication Critical patent/CN107993260A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10052 Images from lightfield camera
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a light field image depth estimation method based on a hybrid convolutional neural network. The method comprises constructing a training dataset, training the convolutional neural network model, and generating a depth estimation map of the light field image. The invention converts the light field depth computation problem into a classification problem, making effective use of the relationship between the depths of pixels in a local region. Light field data is represented by four-dimensional parameters. The invention exploits the proportional relationship between the slope of lines in the epipolar-plane images (EPIs) of a light field image and scene depth, and uses EPI images as an intermediate representation to map the four-dimensional light field into two-dimensional images. By extracting the EPI patch regions corresponding to each pixel of the central-view image and organizing them as EPI patch pairs, a new training dataset for light field depth estimation is constructed. The invention exploits the strength of deep learning in feature abstraction, and the stability and accuracy of its depth estimation are superior to those of conventional methods.

Description

A light field image depth estimation method based on a hybrid convolutional neural network
Technical field
The present invention relates to the fields of light field (Light Field) imaging and deep learning (Deep Learning), and in particular to a light field image depth estimation method based on a hybrid convolutional neural network.
Background art
In recent years, with the rapid development of optoelectronic technology, new imaging devices have continued to emerge. Light field imaging, as an emerging three-dimensional imaging technique, has attracted researchers' attention for its unique imaging process. A traditional camera can only record a two-dimensional plane and cannot obtain scene depth information, whereas light field imaging records the four-dimensional position and direction information of light rays during propagation, so richer image information can be obtained during image reconstruction. Light field imaging enables digital refocusing, synthetic aperture imaging, imaging with a large depth of field, and three-dimensional reconstruction. As light field technology matures, light field depth estimation has received increasing attention. Current research at home and abroad focuses mainly on the following aspects:
representing light field data with a 4D parameterization, making full use of the captured direction and angle information to improve the accuracy of depth computation; using a relaxed epipolar-plane-image representation of the light field data to improve the time efficiency of depth computation; exploiting the 4D structure of the light field to build models that make full use of the rich multi-view information and achieve accurate depth estimation for occluded scenes; and improving the resolution of scene depth results through interpolation and local constraints.
At present, depth estimation for light fields has made considerable progress, and mainstream methods can be roughly divided into three classes:
(1) Depth estimation based on epipolar-plane images. Based on the proportional relationship between the slope of lines in the epipolar-plane image (Epipolar Image, EPI) and scene depth, Goldluecke uses the structure tensor of EPI image gradients to estimate texture slopes, and applies a global constraint by minimizing an energy function to estimate depth information.
(2) Depth estimation based on multiple views. According to their constraints, multi-view depth estimation methods can be divided into three kinds: matching-based multi-view depth estimation, model-based multi-view depth estimation, and color-consistency multi-view depth estimation. Multi-view depth estimation methods estimate scene depth from the matching relationships between different views, and can effectively eliminate the influence of view crosstalk and light field aliasing.
(3) Depth estimation based on refocusing. Using digital refocusing, a series of focal-stack images can be obtained; by detecting the degree of focus in these images, scene depth can be accurately estimated.
Due to illumination variation and scene complexity, the depth estimation performance of existing algorithms is limited to varying degrees. Their accuracy still needs improvement; they are easily affected by image noise and prone to erroneous estimates in regions of depth discontinuity and weak texture, resulting in low accuracy.
Summary of the invention
To address the shortcomings of the above light field depth computation methods, the present invention innovatively applies deep learning to light field depth computation and proposes a light field image depth estimation method based on a hybrid convolutional neural network. The present invention converts the light field depth computation problem into a classification problem, making effective use of the relationship between the depths of pixels in a local region. Light field data is represented by four-dimensional parameters and is therefore difficult to feed directly into a two-dimensional convolutional neural network. The present invention exploits the proportional relationship between the line slopes in the EPI images of a light field and scene depth, and uses EPI images as an intermediate representation to map the four-dimensional light field into two-dimensional images. By extracting the EPI patch regions corresponding to each pixel of the central-view image and organizing them as EPI patch pairs (EPI Patch-pairs), a new training dataset for light field depth estimation is constructed.
The technical solution adopted by the present invention to solve the technical problem comprises the following steps:
Step 1. Extract epipolar-plane images:
For each light field picture in the light field image dataset, extract epipolar-plane images in the horizontal direction for every row of the central-view image and in the vertical direction for every column;
Step 2. Generate EPI patch pairs:
Extract the horizontal and vertical EPI patches corresponding to each pixel of the epipolar-plane images, and group each horizontal patch with the corresponding vertical patch to generate EPI patch pairs;
Step 3. Build the training data:
Perform edge detection on the central-view image; the EPI patch pairs corresponding to pixel regions with unclear texture are treated as invalid data and removed from the dataset. To obtain a better training effect, the data in the dataset are balanced so that the number of samples of each class is the same within an error range, yielding the final training dataset;
Step 4. Train the neural network model:
Build the hybrid convolutional neural network, train it with the constructed dataset, and generate the model file.
The extraction of epipolar-plane images in step 1 is specifically:
Let L_F(x, y, s, t) denote the 4D light field data, where (x, y) are spatial coordinates and (s, t) are sub-lens coordinates in the microlens array; the central-view image is the image formed by light passing through the main lens (s = s_0). For a pixel p(x_i, y_i) of the central-view image, the horizontal EPI can be represented by L_F(x, y, s, t) and has size N_s × N_x; the horizontal EPI patch of each pixel is the region of size (N_s × W_x) centered at (x_i, s_0). Similarly, the vertical EPI for column x_i of the light field image can be represented by L_F(x, y, s, t) and has size N_t × N_y; the vertical EPI patch is the region of size (N_t × W_y) centered at (y_i, t_0).
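As a concrete illustration, the slicing below sketches how horizontal and vertical EPIs could be cut out of a 4D light field array. The array layout L[s, t, y, x] and all function names are assumptions for illustration, not the patented implementation.

```python
import numpy as np

# A minimal sketch of EPI extraction for a light field stored as L[s, t, y, x],
# with (s, t) the sub-lens (view) index and (y, x) the spatial position.

def horizontal_epi(lf, y_i, t0):
    """Horizontal EPI of size (N_s, N_x): fix spatial row y_i and view row t0."""
    return lf[:, t0, y_i, :]

def vertical_epi(lf, x_i, s0):
    """Vertical EPI of size (N_t, N_y): fix spatial column x_i and view column s0."""
    return lf[s0, :, :, x_i]

# Toy light field: 9 x 9 views of 16 x 16 pixels
Ns, Nt, Ny, Nx = 9, 9, 16, 16
lf = np.random.rand(Ns, Nt, Ny, Nx)

epi_h = horizontal_epi(lf, y_i=8, t0=4)   # shape (N_s, N_x)
epi_v = vertical_epi(lf, x_i=8, s0=4)     # shape (N_t, N_y)
```

Each row of such an EPI comes from a different view, so a scene point traces a line whose slope encodes its depth, which is the relationship the invention exploits.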
The generation of EPI patch pairs in step 2 is specifically:
The light field camera used has a 9 × 9 lens array, and each sub-lens image has a resolution of 512 × 512, so each epipolar-plane image has size 512 × 9. The EPI patch obtained for each pixel of the central-view image is a region of size 13 × 9 centered at the current pixel position; the horizontal and vertical regions obtained for a pixel form one EPI patch pair, and each light field image can yield multiple such pairs.
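The patch-pair extraction just described can be sketched as follows. Storing a patch as 9 views × 13 spatial samples (the transpose of the 13 × 9 stated above) and clamping indices at the image border are illustrative assumptions.

```python
import numpy as np

def epi_patch(epi, center, width=13):
    """Cut `width` spatial columns centred on `center` from an EPI of shape
    (n_views, n_spatial), clamping indices at the border."""
    cols = np.clip(np.arange(center - width // 2, center + width // 2 + 1),
                   0, epi.shape[1] - 1)
    return epi[:, cols]

epi_h = np.random.rand(9, 512)   # horizontal EPI: 9 views x 512 spatial samples
epi_v = np.random.rand(9, 512)   # vertical EPI for the same pixel's column

# One EPI patch pair for the central-view pixel at spatial position 100
pair = (epi_patch(epi_h, 100), epi_patch(epi_v, 100))
```

Pairing the two orientations lets the later two-branch network see both the horizontal and the vertical parallax of the same pixel.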
The building of the training data in step 3 is specifically:
To judge whether a pixel yields invalid data, the method used is: perform edge detection on the central-view image with the Canny operator; if a sufficiently clear edge can be detected, the EPI patch pair is treated as valid data; otherwise it is treated as invalid data and discarded.
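As a rough stand-in for this Canny validity check, the sketch below accepts a patch only if its maximum gradient magnitude exceeds a threshold. The function name and threshold value are assumptions for illustration; the patent itself applies the Canny operator to the central-view image.

```python
import numpy as np

def has_clear_edge(patch, thresh=0.2):
    # Approximate edge strength by the largest gradient magnitude in the patch
    gy, gx = np.gradient(patch.astype(float))
    return float(np.hypot(gx, gy).max()) > thresh

flat = np.full((13, 13), 0.5)   # textureless region -> should be discarded
step = np.zeros((13, 13))
step[:, 6:] = 1.0               # strong vertical edge -> should be kept
```

A true Canny detector (e.g. OpenCV's `cv2.Canny`) adds Gaussian smoothing, non-maximum suppression, and hysteresis thresholding on top of this gradient test.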
Also in the building of the training data in step 3, the dataset is balanced by repeated replication, so that the training set sizes of the different classes tend toward balance.
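The repeated-replication balancing can be sketched as below: minority classes are padded with randomly re-drawn copies of their own samples until every class matches the largest one. All names are illustrative assumptions.

```python
import random
from collections import Counter

def balance_by_replication(samples, labels, seed=0):
    """Oversample each class by replication until all classes have equal size."""
    rng = random.Random(seed)
    by_label = {}
    for s, l in zip(samples, labels):
        by_label.setdefault(l, []).append(s)
    target = max(len(g) for g in by_label.values())
    out = []
    for l, group in by_label.items():
        padded = group + [rng.choice(group) for _ in range(target - len(group))]
        out.extend((s, l) for s in padded)
    return out

balanced = balance_by_replication(list(range(10)), [0] * 7 + [1] * 3)
counts = Counter(label for _, label in balanced)
```

After balancing, both classes contribute equally to each training epoch, which is what the patent means by the class counts being "essentially identical".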
The hybrid convolutional neural network in step 4 has the following specific structure:
Two relatively independent convolutional neural networks are connected in series; the first-level neural network is trained first, and its prediction results serve as the training data of the second-level convolutional neural network. The first-level convolutional neural network is specifically:
The input of the first-level convolutional neural network is the extracted EPI patch pairs. To enable the network to learn the different features of the horizontal and vertical EPI patches, two independent convolutional sub-networks process the horizontal and vertical patches respectively. Each sub-network has 7 convolutional layers with 2 × 2 kernels; the first layer has 16 kernels and produces 16 feature maps, and each subsequent layer doubles that number. The convolutional layers are followed by a fully connected layer that flattens the output of the seventh convolutional layer into a one-dimensional feature vector. The output layer is the classification layer of the network and receives the outputs of both sub-networks; its output range is the range of the classification. The depth estimation range of the light field images in the database is -4 to 4, which is divided into 229 classes with a precision of 0.035; a Softmax function receives the feature vectors from the two sub-networks. In the training stage, cross entropy is used as the loss function, specifically defined as:
$$\varepsilon(y, y^{gt}) = -\sum_{y_i} p(y_i, y^{gt}) \cdot \log\left(\frac{e^{y_i}}{\sum_j e^{y_j}}\right), \qquad p(y_i, y^{gt}) = \begin{cases} \lambda_1 & y_i = y_i^{gt} \\ \lambda_2 & |y_i - y^{gt}| \\ 0 & \text{otherwise} \end{cases}$$

where y is the output value of the hybrid convolutional neural network and y^{gt} is the label value, with λ_1 = 0.5 and λ_2 = 0.25.
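A minimal numpy sketch of this weighted cross-entropy is given below. Treating the λ_2 weight as applying to the two classes adjacent to the ground truth is an assumption, since the patent's condition on |y_i - y^gt| does not state the neighbourhood threshold explicitly.

```python
import numpy as np

LAMBDA1, LAMBDA2 = 0.5, 0.25  # weights from the text above

def weighted_cross_entropy(logits, gt_index):
    """Cross entropy where the true class and (assumed) its neighbours get weight."""
    z = logits - logits.max()                  # numerically stable log-softmax
    log_softmax = z - np.log(np.exp(z).sum())
    p = np.zeros_like(logits)
    p[gt_index] = LAMBDA1
    if gt_index > 0:
        p[gt_index - 1] = LAMBDA2
    if gt_index < len(logits) - 1:
        p[gt_index + 1] = LAMBDA2
    return float(-(p * log_softmax).sum())

logits = np.zeros(229)                         # the 229 depth classes
logits[100] = 5.0                              # network confident in class 100
loss_correct = weighted_cross_entropy(logits, 100)
loss_wrong = weighted_cross_entropy(logits, 10)
```

The neighbour weighting rewards predictions that land near the true depth class, which suits a classification problem whose classes are ordered depth bins.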
The second-level convolutional neural network is specifically:
The training data input of the second-level convolutional neural network is the prediction results of the first-level convolutional neural network: the light field images in the light field database are passed through the trained first-level network for regression prediction to generate preliminary depth estimation maps, which serve as the training data of the second-level network. The second-level convolutional neural network consists of one convolutional layer and one fully connected layer; the convolutional layer uses 7 × 7 kernels, with 32 kernels producing 32 feature maps, and is followed by a fully connected layer that outputs the final light field image depth map.
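To make the 7 × 7 convolution of this refinement stage concrete, the sketch below applies a single hand-written 7 × 7 kernel (a plain mean filter) to a coarse depth map. This stands in for one of the 32 learned kernels; the patented network learns its kernels and adds a fully connected layer, so this only illustrates the convolution itself.

```python
import numpy as np

def conv2d_same(img, kernel):
    """'Same'-size 2D convolution with edge-replicate padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

depth = np.random.rand(32, 32)           # coarse first-stage depth prediction
kernel = np.full((7, 7), 1.0 / 49.0)     # illustrative 7x7 averaging kernel
refined = conv2d_same(depth, kernel)
```

A large 7 × 7 receptive field lets the second stage smooth isolated misclassifications in the first-stage depth map while preserving the overall layout.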
The beneficial effects of the invention are:
The present invention estimates light field scene depth using deep learning, comprising two stages: dataset construction and convolutional neural network training. In the dataset construction stage, a new method of constructing a light field image dataset is proposed, which reduces the dimensionality of the light field data and makes the training dataset large enough to meet the requirements of network training. In the network training stage, a hybrid convolutional neural network structure with two relatively independent sub-networks is proposed: the first-level network is trained on the constructed training set, and the second-level network is trained on the predictions of the first-level network. Experiments confirm that this method yields a clear performance improvement. Compared with traditional scene depth computation methods for light field images, the present invention converts the light field depth computation problem into a classification problem and makes effective use of the relationship between the depths of pixels in a local region, improving the accuracy and stability of scene depth estimation from light field images.
Brief description of the drawings
Fig. 1 is the flow chart of the light field image depth estimation method based on the hybrid convolutional neural network.
Fig. 2 is the detailed structure diagram of the first-level network of the hybrid convolutional neural network of the present invention.
Fig. 3 is the detailed structure diagram of the second-level network of the hybrid convolutional neural network of the present invention.
Embodiment
The present invention is described further below with reference to the accompanying drawings:
As shown in Figs. 1-3, a light field image depth estimation method based on a hybrid convolutional neural network is specifically implemented as follows:
Step 1. Extract epipolar-plane images (Epipolar Image, EPI):
For each light field picture in the light field image dataset, extract epipolar-plane images (EPI Extraction) in the horizontal direction for every row of the central-view image and in the vertical direction for every column.
Step 2. Generate EPI patch pairs (EPI-Patch Pairs):
Extract the horizontal and vertical EPI patches (EPI-Patch) corresponding to each pixel of the epipolar-plane images, and group each horizontal patch with the corresponding vertical patch to generate EPI patch pairs (EPI-Patch Pairs); each light field image can generate a large amount of such data.
Step 3. Build the training data (Training Data):
Perform edge detection on the central-view image; the EPI patch pairs corresponding to pixel regions with unclear texture are treated as invalid data and removed from the dataset. To obtain a better training effect, the data in the dataset are balanced so that the numbers of samples of the different classes are essentially the same, yielding the final training dataset.
Step 4. Train the neural network model:
Build the hybrid convolutional neural network, train it with the constructed dataset, and generate the model file.
The extraction of epipolar-plane images in step 1 is specifically:
Let L_F(x, y, s, t) denote the 4D light field data, where (x, y) are spatial coordinates and (s, t) are sub-lens coordinates in the microlens array; the central-view image is the image formed by light passing through the main lens (s = s_0). For a pixel p(x_i, y_i) of the central-view image, the horizontal EPI can be represented by L_F(x, y, s, t) and has size N_s × N_x; the horizontal EPI patch of each pixel is the region of size (N_s × W_x) centered at (x_i, s_0). Similarly, the vertical EPI for column x_i of the light field image can be represented by L_F(x, y, s, t) and has size N_t × N_y; the vertical EPI patch is the region of size (N_t × W_y) centered at (y_i, t_0).
The generation of EPI patch pairs in step 2 is specifically:
The light field camera used has a 9 × 9 lens array, and each sub-lens image has a resolution of 512 × 512, so each epipolar-plane image (EPI) has size 512 × 9. The EPI patch obtained for each pixel of the central-view image is a region of size 13 × 9 centered at the current pixel position; the horizontal and vertical regions obtained for a pixel form one EPI patch pair, and each light field image can yield many such pairs.
The building of the training data in step 3 is specifically:
First, the EPI patch pairs corresponding to pixel regions with unclear texture are treated as invalid data and removed from the dataset. To obtain a better training effect, the data in the dataset are balanced so that the numbers of samples of the different classes are essentially the same; finally, the dataset is converted to the input format required by the convolutional neural network to build the training dataset. To judge whether a pixel is invalid, the method used is: perform edge detection on the central-view image with the Canny operator; if a sufficiently clear edge can be detected, the EPI patch pair is treated as valid data; otherwise it is treated as invalid data and discarded. The dataset is balanced by repeated replication, so that the training set sizes of the different classes tend toward balance.
The neural network model training in step 4 is specifically:
The constructed dataset is used to train the hybrid neural network and generate the model file. The present invention innovatively employs a hybrid convolutional neural network, as follows:
The network connects two relatively independent convolutional neural networks in series; the first-level neural network is trained first, and its prediction results serve as the training data of the second-level convolutional neural network.
The first-level convolutional neural network is specifically:
As shown in Fig. 2, the input of this network is the EPI patch pairs (EPI Patch-pairs) extracted in the previous step. To enable the network to learn the different features of the horizontal and vertical EPI patches, two independent convolutional sub-networks process the horizontal and vertical EPI patches respectively. Each sub-network has 7 convolutional layers with 2 × 2 kernels; the first layer has 16 kernels and produces 16 feature maps, and each subsequent layer doubles that number. The convolutional layers are followed by a fully connected layer that flattens the output of the seventh convolutional layer into a one-dimensional feature vector. The output layer is the classification layer of the network and receives the outputs of the two sub-networks; its output range is the range of the classification. The depth estimation range of the light field images in the database is -4 to 4, which is divided into 229 classes with a precision of 0.035; a Softmax function receives the feature vectors from the two sub-networks. Using two sub-networks yields a more accurate classification.
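The bookkeeping behind these figures can be checked directly: the feature-map counts start at 16 and double at each of the 7 layers, and quantizing the depth range [-4, 4] at a step of 0.035 yields the 229 classes mentioned above.

```python
import math

# Feature-map counts of one sub-network: 16 after the first 2x2 convolution,
# doubling at each of the 7 convolutional layers
channels = [16 * 2 ** k for k in range(7)]

# Depth label quantization: range [-4, 4] at step 0.035 -> 229 classes
n_classes = math.floor((4 - (-4)) / 0.035) + 1
```

This matches the stated precision: 229 classes spanning a width of 8 correspond to a bin size of roughly 8 / 229 ≈ 0.035.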
In the training stage, cross entropy is used as the loss function, specifically defined as:

$$\varepsilon(y, y^{gt}) = -\sum_{y_i} p(y_i, y^{gt}) \cdot \log\left(\frac{e^{y_i}}{\sum_j e^{y_j}}\right), \qquad p(y_i, y^{gt}) = \begin{cases} \lambda_1 & y_i = y_i^{gt} \\ \lambda_2 & |y_i - y^{gt}| \\ 0 & \text{otherwise} \end{cases}$$

where y is the output value of the network model and y^{gt} is the label value; in our experiments, λ_1 = 0.5 and λ_2 = 0.25.
The second-level convolutional neural network is specifically:
As shown in Fig. 3, the training data input of the second-level convolutional neural network is the prediction results of the first-level convolutional neural network: the light field images in the light field database are passed through the trained first-level network for regression prediction to generate preliminary depth estimation maps, which serve as the training data of the second-level network. The second-level convolutional neural network consists of one convolutional layer and one fully connected layer; the convolutional layer uses 7 × 7 kernels, with 32 kernels producing 32 feature maps, and is followed by a fully connected layer that outputs the final light field image depth map.

Claims (7)

1. A light field image depth estimation method based on a hybrid convolutional neural network, characterized by comprising the following steps:
Step 1. Extract epipolar-plane images:
For each light field picture in the light field image dataset, extract epipolar-plane images in the horizontal direction for every row of the central-view image and in the vertical direction for every column;
Step 2. Generate EPI patch pairs:
Extract the horizontal and vertical EPI patches corresponding to each pixel of the epipolar-plane images, and group each horizontal patch with the corresponding vertical patch to generate EPI patch pairs;
Step 3. Build the training data:
Perform edge detection on the central-view image; the EPI patch pairs corresponding to pixel regions with unclear texture are treated as invalid data and removed from the dataset; to obtain a better training effect, the data in the dataset are balanced so that the number of samples of each class is the same within an error range, yielding the final training dataset;
Step 4. Train the neural network model:
Build the hybrid convolutional neural network, train it with the constructed dataset, and generate the model file.
2. The light field image depth estimation method based on a hybrid convolutional neural network according to claim 1, characterized in that the extraction of epipolar-plane images in step 1 is specifically:
Let L_F(x, y, s, t) denote the 4D light field data, where (x, y) are spatial coordinates and (s, t) are sub-lens coordinates in the microlens array; the central-view image is the image formed by light passing through the main lens (s = s_0). For a pixel p(x_i, y_i) of the central-view image, the horizontal EPI can be represented by L_F(x, y, s, t) and has size N_s × N_x; the horizontal EPI patch of each pixel is the region of size (N_s × W_x) centered at (x_i, s_0). Similarly, the vertical EPI for column x_i of the light field image can be represented by L_F(x, y, s, t) and has size N_t × N_y; the vertical EPI patch is the region of size (N_t × W_y) centered at (y_i, t_0).
3. The light field image depth estimation method based on a hybrid convolutional neural network according to claim 2, characterized in that the generation of EPI patch pairs in step 2 is specifically:
The light field camera used has a 9 × 9 lens array, and each sub-lens image has a resolution of 512 × 512, so each epipolar-plane image has size 512 × 9; the EPI patch obtained for each pixel of the central-view image is a region of size 13 × 9 centered at the current pixel position; the horizontal and vertical regions obtained for a pixel form one EPI patch pair, and each light field image can yield multiple such pairs.
4. The light field image depth estimation method based on a hybrid convolutional neural network according to claim 3, characterized in that the building of the training data in step 3 is specifically:
To judge whether a pixel yields invalid data, the method used is: perform edge detection on the central-view image with the Canny operator; if a sufficiently clear edge can be detected, the EPI patch pair is treated as valid data; otherwise it is treated as invalid data and discarded.
5. The light field image depth estimation method based on a hybrid convolutional neural network according to claim 4, characterized in that the building of the training data in step 3 is further specifically:
The dataset is balanced by repeated replication, so that the training set sizes of the different classes tend toward balance.
6. The light field image depth estimation method based on a hybrid convolutional neural network according to claim 4, characterized in that the hybrid convolutional neural network in step 4 has the following specific structure:
Two relatively independent convolutional neural networks are connected in series; the first-level neural network is trained first, and its prediction results serve as the training data of the second-level convolutional neural network; the first-level convolutional neural network is specifically:
The input of the first-level convolutional neural network is the extracted EPI patch pairs; to enable the network to learn the different features of the horizontal and vertical EPI patches, two independent convolutional sub-networks process the horizontal and vertical EPI patches respectively; each sub-network has 7 convolutional layers with 2 × 2 kernels; the first layer has 16 kernels and produces 16 feature maps, and each subsequent layer doubles that number; the convolutional layers are followed by a fully connected layer that flattens the output of the seventh convolutional layer into a one-dimensional feature vector; the output layer is the classification layer of the network and receives the outputs of the two sub-networks; its output range is the range of the classification; the depth estimation range of the light field images in the database is -4 to 4, which is divided into 229 classes with a precision of 0.035; a Softmax function receives the feature vectors from the two sub-networks; in the training stage, cross entropy is used as the loss function, specifically defined as:
$$\varepsilon\left(y, y^{gt}\right) = -\sum_{y_i} p\left(y_i, y^{gt}\right) \cdot \log\left(\frac{e^{y_i}}{\sum_j e^{y_j}}\right)$$

$$p\left(y_i, y^{gt}\right) = \begin{cases} \lambda_1 & y_i = y_i^{gt} \\ \lambda_2 & \left|y_i - y^{gt}\right| \\ 0 & \text{otherwise} \end{cases}$$
wherein y is the output value of the hybrid convolutional neural network and y^{gt} is the label value, with λ_1 = 0.5 and λ_2 = 0.25.
7. a kind of light field image depth estimation method based on mixed type convolutional neural networks according to claim 6, its It is characterized in that the second level convolutional neural networks are specially:
The training data input to the second-level convolutional neural network is the prediction result of the first-level convolutional neural network: the light field images in the light field database are passed through the trained first-level convolutional neural network for regression prediction, generating preliminary depth estimation maps of the light field images, which serve as the training data of the second-level convolutional neural network. The second-level convolutional neural network consists of one convolutional layer and one fully connected layer; the convolutional layer uses 7×7 convolution kernels, the number of kernels is 32, and 32 feature maps are generated. The convolutional layer is followed by a fully connected layer, which outputs the final depth map of the light field image.
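The dimensions stated in the two claims above can be checked with a short back-of-the-envelope sketch. This is an illustrative calculation, not code from the patent; the helper names (`first_level_channels`, `n_depth_classes`) are hypothetical.

```python
import math


def first_level_channels(n_layers=7, first=16):
    """Feature-map count per convolutional layer: 16 at the first layer,
    doubling at each subsequent layer, for 7 layers in each sub-network."""
    return [first * 2 ** i for i in range(n_layers)]


def n_depth_classes(lo=-4.0, hi=4.0, precision=0.035):
    """Number of depth classes needed to cover [lo, hi] at the given step."""
    return math.floor((hi - lo) / precision) + 1


channels = first_level_channels()  # feature maps per layer in each sub-network
classes = n_depth_classes()        # 229 classes, matching the claim text
```

With a range of 8 and a step of 0.035, 228 steps plus the starting class give 229 classes, consistent with the classification layer described in claim 6.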
CN201711337965.XA 2017-12-14 2017-12-14 A kind of light field image depth estimation method based on mixed type convolutional neural networks Pending CN107993260A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711337965.XA CN107993260A (en) 2017-12-14 2017-12-14 A kind of light field image depth estimation method based on mixed type convolutional neural networks


Publications (1)

Publication Number Publication Date
CN107993260A true CN107993260A (en) 2018-05-04

Family

ID=62038479




Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105357515A (en) * 2015-12-18 2016-02-24 天津中科智能识别产业技术研究院有限公司 Color and depth imaging method and device based on structured light and light-field imaging


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KATRIN HONAUER 等: ""A Dataset and Evaluation Methodology for Depth Estimation on 4D Light Fields"", 《ACCV 2016: COMPUTER VISION – ACCV 2016 》 *
YAOXIANG LUO 等: ""EPI-Patch Based Convolutional Neural Network for Depth Estimation on 4D Light Field"", 《ICONIP 2017: NEURAL INFORMATION PROCESSING》 *
潘志伟: ""基于卷积神经网络的光场图像深度估计"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020024079A1 (en) * 2018-07-28 2020-02-06 合刃科技(深圳)有限公司 Image recognition system
CN109496316A (en) * 2018-07-28 2019-03-19 合刃科技(深圳)有限公司 Image identification system
CN109496316B (en) * 2018-07-28 2022-04-01 合刃科技(深圳)有限公司 Image recognition system
CN109299656A (en) * 2018-08-13 2019-02-01 浙江零跑科技有限公司 A kind of deeply determining method of vehicle-mounted vision system scene visual
CN109031668A (en) * 2018-09-18 2018-12-18 京东方科技集团股份有限公司 A kind of virtual reality device and its focal length intelligent adjusting method
CN109031668B (en) * 2018-09-18 2020-12-01 京东方科技集团股份有限公司 Virtual reality device and intelligent focal length adjusting method thereof
US10812703B2 (en) 2018-09-18 2020-10-20 Beijing Boe Display Technology Co., Ltd. Virtual reality device, method for adjusting focal lengths automatically, method for producing virtual reality device and computer readable medium
CN109344818A (en) * 2018-09-28 2019-02-15 合肥工业大学 A kind of light field well-marked target detection method based on depth convolutional network
CN109344818B (en) * 2018-09-28 2020-04-14 合肥工业大学 Light field significant target detection method based on deep convolutional network
CN109544621A (en) * 2018-11-21 2019-03-29 马浩鑫 Light field depth estimation method, system and medium based on convolutional neural networks
CN109584340B (en) * 2018-12-11 2023-04-18 苏州中科广视文化科技有限公司 New visual angle synthesis method based on deep convolutional neural network
CN109584340A (en) * 2018-12-11 2019-04-05 苏州中科广视文化科技有限公司 New Century Planned Textbook synthetic method based on depth convolutional neural networks
CN109949354A (en) * 2019-03-13 2019-06-28 北京信息科技大学 A kind of light field depth information estimation method based on full convolutional neural networks
CN109949354B (en) * 2019-03-13 2023-11-14 北京信息科技大学 Light field depth information estimation method based on full convolution neural network
CN109934863B (en) * 2019-03-13 2023-11-14 北京信息科技大学 Light field depth information estimation method based on dense connection type convolutional neural network
CN109934863A (en) * 2019-03-13 2019-06-25 北京信息科技大学 A kind of light field depth information estimation method based on intensive connecting-type convolutional neural networks
CN110060303A (en) * 2019-03-18 2019-07-26 英特科利(江苏)医用内窥影像技术有限公司 A kind of two step scaling methods of light-field camera
CN113661428A (en) * 2019-03-29 2021-11-16 脸谱科技有限责任公司 Trajectory estimation of MEMS reflectors
CN110163246B (en) * 2019-04-08 2021-03-30 杭州电子科技大学 Monocular light field image unsupervised depth estimation method based on convolutional neural network
CN110163246A (en) * 2019-04-08 2019-08-23 杭州电子科技大学 The unsupervised depth estimation method of monocular light field image based on convolutional neural networks
CN110823094B (en) * 2019-11-08 2021-03-30 北京理工大学 Point light source three-dimensional coordinate measuring method and device
CN110823094A (en) * 2019-11-08 2020-02-21 北京理工大学 Point light source three-dimensional coordinate measuring method and device
CN111028273B (en) * 2019-11-27 2023-04-07 山东大学 Light field depth estimation method based on multi-stream convolution neural network and implementation system thereof
CN111028273A (en) * 2019-11-27 2020-04-17 山东大学 Light field depth estimation method based on multi-stream convolution neural network and implementation system thereof
CN111445465A (en) * 2020-03-31 2020-07-24 江南大学 Light field image snowflake or rain strip detection and removal method and device based on deep learning
CN111489407A (en) * 2020-04-09 2020-08-04 中国科学技术大学先进技术研究院 Light field image editing method, device, equipment and storage medium
CN111489407B (en) * 2020-04-09 2023-06-02 中国科学技术大学先进技术研究院 Light field image editing method, device, equipment and storage medium
CN112150526A (en) * 2020-07-27 2020-12-29 浙江大学 Light field image depth estimation method based on depth learning
CN112116646A (en) * 2020-09-23 2020-12-22 南京工程学院 Light field image depth estimation method based on depth convolution neural network
CN112116646B (en) * 2020-09-23 2023-11-24 南京工程学院 Depth estimation method for light field image based on depth convolution neural network
CN112508171A (en) * 2020-11-23 2021-03-16 中国辐射防护研究院 Image depth estimation method and device based on multilayer convolutional neural network
CN112633248B (en) * 2021-01-05 2023-08-18 清华大学深圳国际研究生院 Deep learning full-in-focus microscopic image acquisition method
CN112633248A (en) * 2021-01-05 2021-04-09 清华大学深圳国际研究生院 Deep learning all-in-focus microscopic image acquisition method
CN112767466B (en) * 2021-01-20 2022-10-11 大连理工大学 Light field depth estimation method based on multi-mode information
CN112767466A (en) * 2021-01-20 2021-05-07 大连理工大学 Light field depth estimation method based on multi-mode information
CN118262901A (en) * 2024-04-07 2024-06-28 中国人民解放军总医院第六医学中心 Deep learning-based lung cancer type prediction system

Similar Documents

Publication Publication Date Title
CN107993260A (en) A kind of light field image depth estimation method based on mixed type convolutional neural networks
Zhao et al. Pyramid global context network for image dehazing
CN110188835B (en) Data-enhanced pedestrian re-identification method based on generative confrontation network model
CN104061907B (en) The most variable gait recognition method in visual angle based on the coupling synthesis of gait three-D profile
WO2018076212A1 (en) De-convolutional neural network-based scene semantic segmentation method
CN107066916B (en) Scene semantic segmentation method based on deconvolution neural network
CN108090960A (en) A kind of Object reconstruction method based on geometrical constraint
CN109558806A (en) The detection method and system of high score Remote Sensing Imagery Change
CN106372648A (en) Multi-feature-fusion-convolutional-neural-network-based plankton image classification method
CN108960404B (en) Image-based crowd counting method and device
CN104156957B (en) Stable and high-efficiency high-resolution stereo matching method
CN108648161A (en) The binocular vision obstacle detection system and method for asymmetric nuclear convolutional neural networks
CN111639587B (en) Hyperspectral image classification method based on multi-scale spectrum space convolution neural network
CN105528785A (en) Binocular visual image stereo matching method
CN105894513B (en) Take the remote sensing image variation detection method and system of imaged object change in time and space into account
CN102034267A (en) Three-dimensional reconstruction method of target based on attention
CN114092697A (en) Building facade semantic segmentation method with attention fused with global and local depth features
CN116206133A (en) RGB-D significance target detection method
CN103971354A (en) Method for reconstructing low-resolution infrared image into high-resolution infrared image
Gao et al. Joint optimization of depth and ego-motion for intelligent autonomous vehicles
Song et al. Accurate 3D reconstruction from circular light field using CNN-LSTM
Yuan et al. Structure flow-guided network for real depth super-resolution
Chu et al. DBFGAN: Dual branch feature guided aggregation network for remote sensing image
Song et al. A single image dehazing method based on end-to-end cpad-net network in deep learning environment
Li et al. Automatic Road Extraction from High-Resolution Remote Sensing Image Based on Bat Model and Mutual Information Matching.

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20180504