Article, DOI: 10.1007/978-3-031-72973-7_19

PolyRoom: Room-Aware Transformer for Floorplan Reconstruction

Published: 01 November 2024

Abstract

Reconstructing geometric and topological structures from raw unstructured data has long been an important topic in indoor mapping research. In this paper, we aim to reconstruct a floorplan with a vectorized representation from point clouds. Despite significant advances in recent years, current methods still face several challenges, such as missing corners or edges, inaccurate corner positions or angles, self-intersecting or overlapping polygons, and potentially implausible topology. To tackle these challenges, we present PolyRoom, a room-aware Transformer that leverages a uniform sampling representation, room-aware query initialization, and room-aware self-attention for floorplan reconstruction. Specifically, we adopt a uniform sampling floorplan representation to enable dense supervision during training and effective use of angle information. Additionally, we propose a room-aware query initialization scheme to prevent non-polygonal sequences and introduce room-aware self-attention to improve memory efficiency and model performance. Experimental results on two widely used datasets demonstrate that PolyRoom surpasses current state-of-the-art methods both quantitatively and qualitatively. Our code is available at: https://github.com/3dv-casia/PolyRoom/.
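
As a rough illustration of what a uniform sampling representation of a room polygon can look like, the sketch below resamples a closed polygon into a fixed number of points spaced evenly by arc length along its boundary. This is not taken from the PolyRoom code: the function name `resample_polygon`, the choice of arc-length spacing, and the starting point at the first vertex are all assumptions made for illustration only.

```python
import math


def resample_polygon(vertices, n_points):
    """Return n_points spaced evenly by arc length along a closed polygon.

    Hypothetical helper for illustration; not the authors' implementation.
    `vertices` is a list of (x, y) corners in boundary order, without
    repeating the first corner at the end.
    """
    if len(vertices) < 3 or n_points < 1:
        raise ValueError("need at least 3 vertices and 1 sample point")

    # Close the ring so that the last edge returns to the first corner.
    ring = list(vertices) + [vertices[0]]
    seg_lengths = [math.dist(ring[i], ring[i + 1]) for i in range(len(vertices))]
    perimeter = sum(seg_lengths)
    if perimeter == 0:
        raise ValueError("degenerate polygon with zero perimeter")

    step = perimeter / n_points
    samples = []
    seg = 0          # index of the edge currently being walked
    seg_start = 0.0  # arc length at the start of that edge
    for k in range(n_points):
        target = k * step
        # Advance to the edge that contains the target arc length.
        while seg < len(seg_lengths) - 1 and target > seg_start + seg_lengths[seg]:
            seg_start += seg_lengths[seg]
            seg += 1
        length = seg_lengths[seg]
        t = 0.0 if length == 0 else (target - seg_start) / length
        (x0, y0), (x1, y1) = ring[seg], ring[seg + 1]
        # Linear interpolation along the current edge.
        samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return samples


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
points = resample_polygon(square, 8)
```

With a fixed number of samples per room, every room is described by a sequence of the same length, which is one plausible reason such a representation lends itself to dense per-point supervision. Note that samples do not generally coincide with the original corners, so corners would have to be recovered in a separate step.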


Published In

Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part L
Sep 2024
568 pages
ISBN: 978-3-031-72972-0
DOI: 10.1007/978-3-031-72973-7
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol

Publisher

Springer-Verlag, Berlin, Heidelberg
