End-to-End Residual Network for Light Field Reconstruction on Raw Images and View Image Stacks
Figure 1. LF input–output relationship for the different reconstruction tasks: 8 × 8 views are reconstructed from 2 × 2 views, generating 60 novel views. Colored images represent ground-truth and input views, while gray images represent the output views to be reconstructed. (a) The 8 × 8 ground-truth views used to train the network for the different reconstruction tasks; (b) task 1: 2 × 2 − 8 × 8 extrapolation 0; (c) task 2: 2 × 2 − 8 × 8 extrapolation 1; (d) task 3: 2 × 2 − 8 × 8 extrapolation 2.
Figure 2. Parameterization and visualization of the 4D LF. (a) L(u, v, s, t) denotes the light ray from an arbitrary point P that crosses the two planes at the angular position (u, v) and the spatial position (s, t). (b) The 4D LF can be viewed as a 2D array of view images, with neighboring images differing slightly from one another.
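As a reminder of the notation in this caption (standard two-plane LF notation, not text quoted from the paper), the relation between the 4D function and the view images in panel (b) can be written as follows.

```latex
% A ray is indexed by its intersections with the angular plane (u, v)
% and the spatial plane (s, t):
\[
  L : (u, v, s, t) \;\longmapsto\; L(u, v, s, t).
\]
% Fixing the angular coordinates selects one sub-aperture view image,
% which is why the 4D LF can be displayed as a U x V array of views:
\[
  I_{u_0, v_0}(s, t) \;=\; L(u_0, v_0, s, t).
\]
```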
Figure 3. Reconstruction of the raw LF image and the view image stack. (a) The 2 × 2 views of an LF, where each view is shown in one color and contains nine pixels numbered 1 to 9, with size (u, v, S*, T*) = (2, 2, 3, 3). (b) The mapping into a raw LF image of size (u × S*, v × T*) = (6, 6). (c) The mapping into a view image stack of size (uv, S*, T*) = (4, 3, 3). (d) An example of a 4D LF image captured by a Lytro Illum camera with size (U, V, S, T) = (8, 8, 541, 376). (e) The raw LF image of (d) with size (US, VT) = (8 × 541, 8 × 376). (f) A close-up of a portion of the raw LF image in (e).
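As a concrete illustration of the mappings in Figure 3, the sketch below rearranges a 4D LF array into the raw (lenslet-style) LF image and into the view image stack using plain NumPy; the exact interleaving convention and the variable names are our assumptions for illustration, not code from the paper.

```python
import numpy as np

# Toy 4D LF matching Figure 3a: (u, v, S*, T*) = (2, 2, 3, 3)
U, V, S, T = 2, 2, 3, 3
lf = np.arange(U * V * S * T).reshape(U, V, S, T)

# Raw LF image (Figure 3b): every spatial position (s, t) holds a U x V
# block of samples, one per view, i.e. raw[s*U + u, t*V + v] = lf[u, v, s, t].
raw = lf.transpose(2, 0, 3, 1).reshape(S * U, T * V)   # shape (6, 6)

# View image stack (Figure 3c): the U x V views are simply stacked
# along the first axis, giving (U*V, S*, T*) = (4, 3, 3).
stack = lf.reshape(U * V, S, T)

# Inverse of the raw mapping, recovering the 4D LF from the raw image.
lf_back = raw.reshape(S, U, T, V).transpose(1, 3, 0, 2)
assert np.array_equal(lf, lf_back)

print(raw.shape, stack.shape)   # (6, 6) (4, 3, 3)
```

With (U, V, S, T) = (8, 8, 541, 376), as for the Lytro Illum capture in panel (d), the same transpose/reshape produces the 8 × 541 by 8 × 376 raw image shown in panel (e).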
Figure 4. A visual representation of the proposed network architecture. (a) The sparse set of input views is rearranged into the raw LF representation LF_LR using the periodic shuffling operator (PS), and this raw LF image is fed into the LF reconstruction block (LFR) to generate the initial LF view images LF_VI. These views are then rearranged into the raw LF representation, again using PS, and the resulting initial raw LF image LF_I is fed into the LF augmentation block (LFA) to improve its quality and produce the final high-resolution image LF_HR. (b) LFR: this block reconstructs dense LF images from a few input views. It can be regarded as a magnifier whose magnification factor depends on the reconstruction task (the number of input views and the number of views to be reconstructed). (c) The residual block (RB) is the network's central unit, consisting of cascaded convolutions and ReLUs joined by a skip connection; red blocks represent convolutional layers (Conv) and blue blocks represent rectified linear units (ReLU). (d) LFA1: this block enhances the quality of the initial reconstructed LF images. It consists of one main block that operates on the raw LF images to perform angular augmentation. (e) LFA2: this block performs the same function as LFA1, but its first main block operates on the view image stack to perform spatial augmentation instead of operating on the raw LF images to perform angular augmentation. LFA1 is used for the extrapolation tasks (task 2: 2 × 2 − 8 × 8 extrapolation 1, and task 3: 2 × 2 − 8 × 8 extrapolation 2), whereas LFA2 is used for the interpolation task (task 1: 2 × 2 − 8 × 8 extrapolation 0), as shown in Figure 1.
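To make the blocks in Figure 4 concrete, the sketch below shows one way to write the residual block of Figure 4c and the periodic shuffling (PS) operator in TensorFlow, here realized with tf.nn.depth_to_space; the filter widths, the number of residual blocks, and all names are illustrative assumptions rather than the paper's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64, kernel_size=3):
    """Conv -> ReLU -> Conv joined to the input by a skip connection (Figure 4c)."""
    y = layers.Conv2D(filters, kernel_size, padding="same")(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)
    return layers.Add()([x, y])

def lf_body(x, n_blocks=4, filters=64):
    """Illustrative LFR/LFA-style body: cascaded residual blocks framed by
    plain convolutions (depth and width are assumptions, not the paper's)."""
    y = layers.Conv2D(filters, 3, padding="same")(x)
    for _ in range(n_blocks):
        y = residual_block(y, filters)
    return layers.Conv2D(1, 3, padding="same")(y)

# Periodic shuffling (PS): depth_to_space interleaves the channel axis into
# the spatial grid, so a stack of views held as channels becomes a single
# raw-LF-style image; space_to_depth is its inverse.
views = tf.zeros([1, 96, 96, 64])                  # 8 x 8 = 64 views as channels
raw   = tf.nn.depth_to_space(views, block_size=8)  # -> [1, 768, 768, 1]
back  = tf.nn.space_to_depth(raw, block_size=8)    # -> [1, 96, 96, 64]
```

In the full pipeline of Figure 4a, one such body would play the role of the LFR "magnifier" and additional bodies the roles of LFA1/LFA2, with PS used to switch between the view-stack and raw-LF arrangements.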
Figure 5. Comparison of LF image reconstruction with other approaches and the corresponding ground-truth images on task 1: 2 × 2 − 8 × 8 extrapolation 0 [15,45]. Error maps between the reconstructed LF images and the corresponding ground-truth images are also given. The diagram on the left-hand side shows the relationship among the input views, output views, and views to be reconstructed. Red boxes indicate the extracted EPIs, whereas blue and green boxes indicate close-ups of parts of the reconstructed LF images. The error maps demonstrate the performance of our approach, for example, in the region marked by the red ellipse around the tree limb in the Cars scene. The red ellipse in the Rock scene marks the complicated region containing the car, the rock barrier, and the longitudinal leaf.
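The EPIs marked in Figure 5 are 2D slices of the 4D LF; the following is a minimal NumPy sketch of how such a slice is taken (the fixed indices and array names are arbitrary illustrations, not values from the paper).

```python
import numpy as np

# Stand-in 4D LF with (U, V, S, T) views and pixels, as in Figure 3d.
U, V, S, T = 8, 8, 541, 376
lf = np.random.rand(U, V, S, T).astype(np.float32)

# A horizontal EPI: fix the vertical view index v0 and one spatial
# coordinate t0, then vary the horizontal view index u and the coordinate s.
# The slopes of the lines in this (U, S) slice encode scene depth, which is
# why EPIs are commonly inspected to judge the angular consistency of
# reconstructed views.
v0, t0 = 4, 200
epi = lf[:, v0, :, t0]    # shape (U, S) = (8, 541)
```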
Abstract
1. Introduction
- We present a deep residual convolutional neural network (CNN) for reconstructing high-quality LF images. The network is designed so that different interpolation and extrapolation tasks for LF reconstruction can be modeled with the same architecture.
- We train our model entirely on raw LF images, which enables the network to represent the non-local characteristics of 4D LF images more effectively. Moreover, using raw LF images simplifies the task by turning LF reconstruction into an image-to-image translation problem (see the sketch after this list).
- Comprehensive experiments on challenging datasets demonstrate that our model outperforms state-of-the-art methods for LF image reconstruction across the different tasks.
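As a minimal illustration of the image-to-image framing above (our own sketch: the function and variable names are hypothetical, and the actual training loss is the one specified in Section 3.3.2):

```python
import tensorflow as tf

def reconstruction_loss(raw_pred, raw_gt):
    """Pixel-wise L1 distance between a predicted raw LF image and the
    ground-truth raw LF image. Once both sides are packed into single
    raw images, training reduces to ordinary image-to-image regression."""
    return tf.reduce_mean(tf.abs(raw_pred - raw_gt))
```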
2. Related Work
2.1. Depth-Dependent LF Reconstruction
2.2. Depth-Independent LF Reconstruction
3. Methodology
3.1. Problem Formulation
3.2. Raw LF Image and View Image Stack Reconstruction
3.3. Network Architecture
3.3.1. Overview
3.3.2. Loss Function
3.3.3. Training Details
4. Experiments and Discussion
4.1. Comparison with the State-of-the-Art
Interpolation Task (2 × 2 − 8 × 8 Extrapolation 0)
4.2. Extrapolation Tasks (2 × 2 − 8 × 8 Extrapolation 1 and 2)
4.3. Ablation Study
5. Limitations and Future Work
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Adelson, E.H.; Bergen, J.R. The plenoptic function and the elements of early vision. In Computational Models of Visual Processing; MIT Press: Cambridge, MA, USA, 1991. [Google Scholar]
- Levoy, M.; Hanrahan, P. Light field rendering. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, New Orleans, LA, USA, 4–9 August 1996; pp. 31–42. [Google Scholar]
- Zhang, W.; Zhao, S.; Zhou, W.; Chen, Z. None ghosting artifacts stitching based on depth map for light field image. In Proceedings of the Pacific Rim Conference on Multimedia, Hefei, China, 21–22 September 2018; pp. 567–578. [Google Scholar]
- Yücer, K.; Sorkine-Hornung, A.; Wang, O.; Sorkine-Hornung, O. Efficient 3D object segmentation from densely sampled light fields with applications to 3D reconstruction. ACM Trans. Graph. TOG 2016, 35, 22. [Google Scholar] [CrossRef]
- Wang, Y.; Wu, T.; Yang, J.; Wang, L.; An, W.; Guo, Y. DeOccNet: Learning to see through foreground occlusions in light fields. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 2–5 March 2020; pp. 118–127. [Google Scholar]
- Wang, Y.; Yang, J.; Guo, Y.; Xiao, C.; An, W. Selective light field refocusing for camera arrays using bokeh rendering and superresolution. IEEE Signal Process. Lett. 2018, 26, 204–208. [Google Scholar] [CrossRef]
- Wang, W.; Lin, Y.; Zhang, S. Enhanced Spinning Parallelogram Operator Combining Color Constraint and Histogram Integration for Robust Light Field Depth Estimation. IEEE Signal Process. Lett. 2021, 28, 1080–1084. [Google Scholar] [CrossRef]
- Wang, A. Three-Stream Cross-Modal Feature Aggregation Network for Light Field Salient Object Detection. IEEE Signal Process. Lett. 2020, 28, 46–50. [Google Scholar] [CrossRef]
- Wilburn, B.; Joshi, N.; Vaish, V.; Talvala, E.-V.; Antunez, E.; Barth, A.; Adams, A.; Horowitz, M.; Levoy, M. High performance imaging using large camera arrays. In Proceedings of the ACM SIGGRAPH 2005 Special Interest Group on Computer Graphics and Interactive Techniques Conference, Los Angeles, CA, USA, 31 July–4 August 2005; pp. 765–776. [Google Scholar]
- Raytrix. Available online: https://raytrix.de/ (accessed on 13 March 2022).
- Georgiev, T.G.; Lumsdaine, A. Focused plenoptic camera and rendering. J. Electron. Imaging 2010, 19, 021106. [Google Scholar]
- Liang, Z.; Wang, Y.; Wang, L.; Yang, J.; Zhou, S. Light field image super-resolution with transformers. arXiv 2021, arXiv:2108.07597. [Google Scholar]
- Wang, Y.; Yang, J.; Wang, L.; Ying, X.; Wu, T.; An, W.; Guo, Y. Light field image super-resolution using deformable convolution. IEEE Trans. Image Process. 2020, 30, 1057–1071. [Google Scholar] [CrossRef] [PubMed]
- Liu, G.; Yue, H.; Wu, J.; Yang, J. Intra-Inter View Interaction Network for Light Field Image Super-Resolution. IEEE Trans. Multimed. 2021. early access. [Google Scholar] [CrossRef]
- Kalantari, N.K.; Wang, T.-C.; Ramamoorthi, R. Learning-based view synthesis for light field cameras. ACM Trans. Graph. TOG 2016, 35, 1–10. [Google Scholar] [CrossRef] [Green Version]
- Liu, D.; Huang, Y.; Wu, Q.; Ma, R.; An, P. Multi-Angular Epipolar Geometry Based Light Field Angular Reconstruction Network. IEEE Trans. Comput. Imaging 2020, 6, 1507–1522. [Google Scholar] [CrossRef]
- Zhang, S.; Chang, S.; Shen, Z.; Lin, Y. Micro-Lens Image Stack Upsampling for Densely-Sampled Light Field Reconstruction. IEEE Trans. Comput. Imaging 2021, 7, 799–811. [Google Scholar] [CrossRef]
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. Nerf: Representing scenes as neural radiance fields for view synthesis. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 405–421. [Google Scholar]
- Salem, A.; Ibrahem, H.; Kang, H.-S. Dual Disparity-Based Novel View Reconstruction for Light Field Images Using Discrete Cosine Transform Filter. IEEE Access 2020, 8, 72287–72297. [Google Scholar] [CrossRef]
- Salem, A.; Ibrahem, H.; Kang, H.-S. Light Field Reconstruction Using Residual Networks on Raw Images. Sensors 2022, 22, 1956. [Google Scholar] [CrossRef]
- Wanner, S.; Goldluecke, B. Spatial and angular variational super-resolution of 4D light fields. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; pp. 608–621. [Google Scholar]
- Wanner, S.; Goldluecke, B. Variational light field analysis for disparity estimation and super-resolution. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 36, 606–619. [Google Scholar] [CrossRef]
- Mitra, K.; Veeraraghavan, A. Light field denoising, light field superresolution and stereo camera based refocussing using a GMM light field patch prior. In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June 2012; pp. 22–28. [Google Scholar]
- Le Pendu, M.; Guillemot, C.; Smolic, A. A fourier disparity layer representation for light fields. IEEE Trans. Image Process. 2019, 28, 5740–5753. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Jin, J.; Hou, J.; Yuan, H.; Kwong, S. Learning light field angular super-resolution via a geometry-aware network. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 11141–11148. [Google Scholar]
- Yeung, H.W.F.; Hou, J.; Chen, X.; Chen, J.; Chen, Z.; Chung, Y.Y. Light field spatial super-resolution using deep efficient spatial-angular separable convolution. IEEE Trans. Image Process. 2018, 28, 2319–2330. [Google Scholar] [CrossRef]
- Shi, L.; Hassanieh, H.; Davis, A.; Katabi, D.; Durand, F. Light field reconstruction using sparsity in the continuous fourier domain. ACM Trans. Graph. TOG 2014, 34, 1–13. [Google Scholar] [CrossRef]
- Vagharshakyan, S.; Bregovic, R.; Gotchev, A. Light field reconstruction using shearlet transform. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 133–147. [Google Scholar] [CrossRef] [Green Version]
- Wu, G.; Liu, Y.; Dai, Q.; Chai, T. Learning sheared EPI structure for light field reconstruction. IEEE Trans. Image Process. 2019, 28, 3261–3273. [Google Scholar] [CrossRef]
- Wu, G.; Liu, Y.; Fang, L.; Chai, T. Revisiting Light Field Rendering with Deep Anti-Aliasing Neural Network. IEEE Trans. Pattern Anal. Mach. Intell. 2021. early access. [Google Scholar] [CrossRef]
- Wang, Y.; Liu, F.; Wang, Z.; Hou, G.; Sun, Z.; Tan, T. End-to-end view synthesis for light field imaging with pseudo 4DCNN. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 333–348. [Google Scholar]
- Wang, Y.; Liu, F.; Zhang, K.; Wang, Z.; Sun, Z.; Tan, T. High-fidelity view synthesis for light field imaging with extended pseudo 4DCNN. IEEE Trans. Comput. Imaging 2020, 6, 830–842. [Google Scholar] [CrossRef]
- Gul, M.S.K.; Gunturk, B.K. Spatial and angular resolution enhancement of light fields using convolutional neural networks. IEEE Trans. Image Process. 2018, 27, 2146–2159. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Salem, A.; Ibrahem, H.; Kang, H.S. Fast Light Field Image Super-Resolution Using Residual Networks. In Proceedings of the Korean Information Science Society Conference, Jeju City, Korea, 5–7 July 2021; pp. 389–392. [Google Scholar]
- Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1646–1654. [Google Scholar]
- Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1874–1883. [Google Scholar]
- Wang, L.; Guo, Y.; Lin, Z.; Deng, X.; An, W. Learning for video super-resolution through HR optical flow estimation. In Proceedings of the Asian Conference on Computer Vision, Perth, Australia, 2–6 December 2018; pp. 514–529. [Google Scholar]
- Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
- Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301. [Google Scholar]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Raj, A.S.; Lowney, M.; Shah, R.; Wetzstein, G. Stanford Lytro Light Field Archive. 2016. Available online: http://lightfields.stanford.edu/ (accessed on 28 December 2021).
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M. Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA, 2–4 November 2016; pp. 265–283. [Google Scholar]
- Shi, J.; Jiang, X.; Guillemot, C. Learning fused pixel and feature-based view reconstructions for light fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 2555–2564. [Google Scholar]
- Yeung, H.W.F.; Hou, J.; Chen, J.; Chung, Y.Y.; Chen, X. Fast light field reconstruction with deep coarse-to-fine modeling of spatial-angular clues. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 137–152. [Google Scholar]
Quantitative comparison (PSNR (dB)/SSIM) with state-of-the-art methods on task 1 (2 × 2 − 8 × 8 extrapolation 0).

| Dataset | Kalantari [15] | Shi [44] | Yeung [45] | Zhang [17] | Proposed |
|---|---|---|---|---|---|
| 30 Scenes | 40.11/0.979 | 41.12/0.985 | 41.21/0.982 | 41.98/0.986 | 42.33/0.985 |
| Reflective | 37.35/0.954 | 38.10/0.958 | 38.09/0.959 | 38.71/0.962 | 38.86/0.962 |
| Occlusions | 33.21/0.911 | 34.41/0.929 | 34.50/0.921 | 34.76/0.918 | 34.69/0.922 |
| Average | 36.89/0.948 | 37.88/0.957 | 37.93/0.954 | 38.48/0.955 | 38.62/0.956 |
Quantitative comparison (PSNR (dB)/SSIM) on task 2 (2 × 2 − 8 × 8 extrapolation 1).

| Dataset | Yeung [45] | Zhang [17] | Proposed |
|---|---|---|---|
| 30 Scenes | 42.47/0.985 | 43.57/0.989 | 43.76/0.988 |
| Reflective | 41.61/0.973 | 42.33/0.975 | 42.44/0.974 |
| Occlusions | 37.28/0.934 | 37.61/0.937 | 37.93/0.948 |
| Average | 40.45/0.964 | 41.17/0.967 | 41.38/0.970 |
Quantitative comparison (PSNR (dB)/SSIM) on task 3 (2 × 2 − 8 × 8 extrapolation 2).

| Dataset | Yeung [45] | Zhang [17] | Proposed |
|---|---|---|---|
| 30 Scenes | 42.74/0.986 | 43.41/0.989 | 43.43/0.987 |
| Reflective | 41.52/0.972 | 42.09/0.975 | 42.26/0.975 |
| Occlusions | 36.96/0.937 | 37.60/0.944 | 37.91/0.945 |
| Average | 40.41/0.965 | 41.03/0.969 | 41.20/0.969 |
Ablation study on the spatial and angular components for task 1 (PSNR (dB)/SSIM); ✓ = included, X = excluded.

| Spatial | Angular | 30 Scenes | Reflective | Occlusions | Average |
|---|---|---|---|---|---|
| X | X | 36.69/0.958 | 36.02/0.944 | 31.35/0.888 | 34.69/0.930 |
| ✓ | X | 37.30/0.963 | 36.36/0.946 | 31.60/0.891 | 35.09/0.933 |
| X | ✓ | 41.53/0.981 | 38.62/0.959 | 34.46/0.926 | 38.20/0.955 |
| ✓ | ✓ | 42.33/0.985 | 38.86/0.962 | 34.69/0.922 | 38.62/0.956 |
Ablation on the augmentation stage for task 2 (PSNR (dB)/SSIM): intermediate result (Inter) versus one to three augmentation blocks (Aug1–Aug3).

| Dataset | Inter | Aug1 | Aug2 | Aug3 |
|---|---|---|---|---|
| 30 Scenes | 38.72/0.973 | 43.76/0.988 | 43.82/0.988 | 43.88/0.988 |
| Reflective | 40.13/0.967 | 42.44/0.974 | 42.45/0.975 | 42.46/0.975 |
| Occlusions | 34.50/0.927 | 37.93/0.948 | 37.91/0.947 | 37.98/0.948 |
| Average | 37.78/0.955 | 41.38/0.970 | 41.39/0.970 | 41.44/0.970 |
Ablation on the augmentation stage for task 3 (PSNR (dB)/SSIM): intermediate result (Inter) versus one to three augmentation blocks (Aug1–Aug3).

| Dataset | Inter | Aug1 | Aug2 | Aug3 |
|---|---|---|---|---|
| 30 Scenes | 40.10/0.979 | 43.43/0.987 | 43.57/0.987 | 43.61/0.987 |
| Reflective | 39.52/0.967 | 42.26/0.975 | 42.29/0.975 | 42.29/0.975 |
| Occlusions | 35.01/0.930 | 37.91/0.945 | 38.01/0.946 | 37.93/0.945 |
| Average | 38.21/0.959 | 41.20/0.969 | 41.29/0.969 | 41.28/0.969 |