research-article

Open access

Creating LEGO Figurines from Single Images

Authors:

Chi-Wing Fu

ACM Transactions on Graphics (TOG), Volume 43, Issue 4

Article No.: 153, Pages 1 - 16

https://doi.org/10.1145/3658167

Published: 19 July 2024

Abstract

This paper presents a computational pipeline for creating personalized, physical LEGO® figurines from user-input portrait photos. The generated figurine is an assembly of coherently connected LEGO® bricks detailed with UV-printed decals. It captures prominent features such as hairstyle, clothing style, and garment color, as well as intricate details such as logos, text, and patterns. The task is non-trivial because of the substantial domain gap between unconstrained user photos and stylistically consistent LEGO® figurine models. To ensure that the result can be assembled from LEGO® bricks while capturing both prominent features and intricate details, we design a three-stage pipeline: (i) we formulate a CLIP-guided retrieval approach that connects the domains of user photos and LEGO® figurines and outputs physically assemblable LEGO® figurines without decals; (ii) we synthesize decals on the figurines via a symmetric U-Nets architecture conditioned on appearance features extracted from the user photos; and (iii) we reproject the decals and UV-print them on the associated LEGO® bricks for physical model production. We evaluate the effectiveness of our method against eight hundred expert-designed figurines using a comprehensive set of metrics, including a novel GPT-4V-based evaluation metric, and demonstrate superior performance in visual quality and resemblance to the input photos. We also show the robustness of our method by generating LEGO® figurines from diverse inputs and by physically fabricating and assembling several of them.
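The retrieval idea in stage (i) can be illustrated with a minimal sketch, assuming that both the user photo and each candidate figurine part have already been mapped to feature vectors in a shared embedding space. The abstract does not describe the retrieval procedure in detail, so this is not the authors' implementation: the three-dimensional toy vectors stand in for real CLIP features, and the names `cosine_similarity`, `retrieve_best`, `HAIR_PARTS`, and `photo_embedding` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve_best(query_embedding, candidates):
    """Return (name, score) of the candidate most similar to the query."""
    best_name, best_score = None, -2.0  # cosine is always >= -1
    for name, embedding in candidates.items():
        score = cosine_similarity(query_embedding, embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical toy embeddings for three hair parts (assumption: a real
# system would obtain these from an image encoder such as CLIP).
HAIR_PARTS = {
    "short_hair": [0.9, 0.1, 0.0],
    "long_hair": [0.1, 0.9, 0.2],
    "ponytail": [0.2, 0.3, 0.9],
}

# Hypothetical embedding of the hair region in a user photo.
photo_embedding = [0.15, 0.8, 0.3]

best_part, best_score = retrieve_best(photo_embedding, HAIR_PARTS)
print(best_part, round(best_score, 3))
```

Because retrieval only ever selects from an existing library of parts, every output in this sketch is a part that exists and can be assembled, which is one plausible reason to prefer retrieval over free-form generation for the undecorated figurine.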

Index Terms

Creating LEGO Figurines from Single Images
1. Applied computing
  1. Operations research
    1. Computer-aided manufacturing
    2. Consumer products

Information & Contributors

Information

Published In

ACM Transactions on Graphics Volume 43, Issue 4

July 2024

1774 pages

EISSN:1557-7368

DOI:10.1145/3675116

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

The Research Grants Council of the Hong Kong Special Administrative Region
