research-article

Structure recovery from single omnidirectional image with distortion-aware learning

Published: 21 November 2024

Abstract

Recovering structures from images with a 180° or 360° field of view (FoV) is pivotal in computer vision and computational photography, particularly for VR/AR/MR and autonomous robotics applications. Because of varying distortions and the complexity of indoor scenes, recovering flexible structures from a single image remains challenging. We introduce OmniSRNet, a comprehensive deep learning framework that combines distortion-aware learning with a bidirectional LSTM. Using a curated dataset of optimized panorama and expanded fisheye images, the framework employs a distortion-aware module (DAM) for feature extraction and an LSTM-based horizontal and vertical step module (HVSM) for contextual prediction. OmniSRNet excels in applications such as VR-based house viewing and MR-based video surveillance, and achieves leading results on both cuboid and non-cuboid datasets. The code and dataset are available at https://github.com/mmlph/OmniSRNet/.
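The abstract does not say how DAM accounts for distortion. As a rough illustration only, the sketch below shows one common idea behind distortion-aware convolution on equirectangular panoramas, in the spirit of Tateno et al. (ECCV 2018): the projection stretches content horizontally by 1/cos(latitude), so a filter's horizontal sampling offsets are widened by that factor. This is not OmniSRNet's DAM; the function names and the clamping constant `max_stretch` are hypothetical choices made for this example.

```python
import math

def row_latitude(row: int, height: int) -> float:
    """Latitude (radians) at the centre of pixel row `row` of an equirectangular
    image with `height` rows: close to +pi/2 at the top, -pi/2 at the bottom."""
    return math.pi / 2 - (row + 0.5) * math.pi / height

def horizontal_stretch(row: int, height: int, max_stretch: float = 8.0) -> float:
    """Factor by which equirectangular projection stretches content horizontally
    at this row (1 / cos(latitude)), clamped so it stays finite near the poles.
    `max_stretch` is a hypothetical cap chosen for illustration."""
    cos_lat = math.cos(row_latitude(row, height))
    return min(1.0 / max(cos_lat, 1e-6), max_stretch)

def distortion_aware_offsets(row: int, height: int, kernel: int = 3,
                             max_stretch: float = 8.0):
    """(dy, dx) sampling offsets for a kernel x kernel filter centred on `row`.

    Vertical offsets stay on the regular pixel grid; horizontal offsets are
    widened by the row's stretch factor so the filter covers a roughly constant
    angular footprint on the sphere rather than a constant pixel footprint."""
    s = horizontal_stretch(row, height, max_stretch)
    half = kernel // 2
    return [(dy, dx * s)
            for dy in range(-half, half + 1)
            for dx in range(-half, half + 1)]

if __name__ == "__main__":
    height = 512
    # Stretch grows from about 1 at the equator to the cap near the poles.
    for row in (0, 100, 255):
        print(row, round(horizontal_stretch(row, height), 3))
```

In a trained network, offsets like these would more typically feed a deformable convolution (Dai et al., ICCV 2017) or be learned outright; whether OmniSRNet's DAM does either is not stated in the abstract.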

    Information

    Published In

    Journal of King Saud University - Computer and Information Sciences, Volume 36, Issue 7
    Sep 2024
    378 pages

    Publisher

    Elsevier Science Inc.

    United States

    Author Tags

    1. Computing methodologies: Artificial intelligence
    2. Computer graphics
    3. Computer graphics: Mixed / augmented reality

    Author Tags

    1. Structure recovery
    2. Omnidirectional dataset
    3. DAM
    4. HVSM

    Article Metrics

    • Total citations: 0
    • Total downloads: 0
    • Downloads (last 12 months): 0
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 21 Nov 2024
