research-article

Structure recovery from single omnidirectional image with distortion-aware learning

Authors:

Zhong Zhou

Volume 36, Issue 7

https://doi.org/10.1016/j.jksuci.2024.102151

Published: 01 September 2024

Abstract

Recovering structures from images with 180° or 360° FoV is pivotal in computer vision and computational photography, particularly for VR/AR/MR and autonomous robotics applications. Due to varying distortions and the complexity of indoor scenes, recovering flexible structures from a single image is challenging. We introduce OmniSRNet, a comprehensive deep learning framework that merges distortion-aware learning with bidirectional LSTM. Utilizing a curated dataset with optimized panorama and expanded fisheye images, our framework features a distortion-aware module (DAM) for extracting features and a horizontal and vertical step module (HVSM) of LSTM for contextual predictions. OmniSRNet excels in applications such as VR-based house viewing and MR-based video surveillance, achieving leading results on cuboid and non-cuboid datasets. The code and dataset can be accessed at https://github.com/mmlph/OmniSRNet/.

Published In

Journal of King Saud University - Computer and Information Sciences Volume 36, Issue 7

Sep 2024

378 pages

Publisher

Elsevier Science Inc.

United States
