Abstract
The 3D CAD shapes in current 3D benchmarks are mostly collected from online model repositories. As a result, they typically lack fine geometric detail and informative textures, which limits their usefulness for fine-grained research in areas such as high-quality 3D mesh and texture recovery. This paper presents 3D Furniture shape with TextURE (3D-FUTURE): a richly annotated, large-scale repository of 3D furniture shapes for household scenarios. At the time of this technical report, 3D-FUTURE contains 9,992 modern 3D furniture shapes with high-resolution textures and detailed attribute labels. To support studies of 3D modeling from images, we couple the CAD models with 20,240 scene images. The room scenes are designed by professional designers or generated by an industrial scene-creation system. Building on the organization and characteristics of 3D-FUTURE, we provide a package of baseline experiments, including joint 2D instance segmentation and 3D object pose estimation, image-based 3D shape retrieval, 3D object reconstruction from a single image, texture recovery for 3D shapes, and furniture composition, to facilitate future research on our database.
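To make the dataset's scale concrete, the sketch below shows how one might tally the furniture shapes by category after downloading the repository. This is a minimal Python sketch under assumed conventions: the file name model_info.json and the field names model_id and category are illustrative guesses, not the dataset's documented schema.

```python
import json
from collections import Counter

# NOTE: a minimal sketch, not the official loader. "model_info.json" is
# assumed to hold one metadata record per 3D furniture shape, with a
# "category" field; these names are illustrative assumptions.
with open("model_info.json") as f:
    models = json.load(f)

print(f"total shapes: {len(models)}")  # the paper reports 9,992 shapes

# Tally shapes per furniture category to inspect the label space.
counts = Counter(m.get("category", "unknown") for m in models)
for category, n in counts.most_common(10):
    print(f"{category:<30} {n}")
```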
Acknowledgements
Dacheng Tao is supported by Australian Research Council Project FL-170100117. Mingming Gong is supported by Australian Research Council Project DE-210101624. We would like to thank Alibaba Topping Homestyler for its great help with data preparation and image rendering. We also thank Alibaba Tianchi for hosting the dataset so that it can be easily requested and downloaded.
Communicated by Boxin Shi.
Cite this article
Fu, H., Jia, R., Gao, L. et al. 3D-FUTURE: 3D Furniture Shape with TextURE. Int J Comput Vis 129, 3313–3337 (2021). https://doi.org/10.1007/s11263-021-01534-z