research-article

Tops, Bottoms, and Shoes: Building Capsule Wardrobes via Cross-Attention Tensor Network

Authors:

Hao Yang

RecSys '21: Proceedings of the 15th ACM Conference on Recommender Systems

Pages 453 - 462

https://doi.org/10.1145/3460231.3474258

Published: 13 September 2021

Abstract

Fashion is more than Paris runways. It is about how people express their interests, identity, mood, and cultural influences. Given an inventory of candidate garments from different categories, how should they be assembled to most improve their fashionability? This question poses an intriguing visual recommendation challenge: automatically creating capsule wardrobes. Capsule wardrobe generation is a complex combinatorial problem that requires understanding how multiple visual items interact. The generative process often requires fashion experts to tease out the combinations manually, which makes it hard to scale.

We introduce TensorNet, an approach that captures the key ingredients of visual compatibility among tops, bottoms, and shoes. TensorNet aims to provide actionable advice for full-body outfits that mix and match well. It consists of two core modules: a Cross-Attention Message Passing module and a Wide&Deep Tensor Interaction module. As such, TensorNet is able to characterize both local region-based patterns and the global compatibility of entire outfits. Our experimental results on real-world datasets indicate that the proposed method is capable of learning visual compatibility and outperforms all the baselines. TensorNet opens up opportunities for fashion designers to narrow down the search space for multi-clothes combinations.
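
The abstract names the two modules but gives no equations, so the following is only a rough illustrative sketch of the general idea, not the authors' implementation. It assumes each garment is a small set of region feature vectors, uses plain scaled dot-product attention with a residual update as a stand-in for cross-attention message passing, and combines a three-way element-wise product (the "wide" term) with a one-hidden-layer network (the "deep" term). All names, shapes, pooling choices, and random weights here are hypothetical.

```python
import itertools
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, context):
    """Each query region attends over the regions of the other garments
    and adds the attended message to itself (assumed residual update)."""
    d = len(queries[0])
    updated = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in context])
        message = [sum(w * k[i] for w, k in zip(weights, context)) for i in range(d)]
        updated.append([qi + mi for qi, mi in zip(q, message)])
    return updated

def mean_pool(regions):
    # Average the region vectors into one vector per garment (an assumption).
    return [sum(col) / len(regions) for col in zip(*regions)]

def outfit_score(top, bottom, shoe, w1, w2):
    # Local patterns: every garment exchanges messages with the other two.
    t = mean_pool(cross_attend(top, bottom + shoe))
    b = mean_pool(cross_attend(bottom, top + shoe))
    s = mean_pool(cross_attend(shoe, top + bottom))
    # "Wide" term: three-way multiplicative interaction, summed over dimensions.
    wide = sum(ti * bi * si for ti, bi, si in zip(t, b, s))
    # "Deep" term: one ReLU hidden layer over the concatenated vectors.
    hidden = [max(0.0, dot(row, t + b + s)) for row in w1]
    deep = dot(w2, hidden)
    # Squash the combined score into [0, 1] as a compatibility estimate.
    return 1.0 / (1.0 + math.exp(-(wide + deep)))

# Hypothetical toy inventory: random region features and random weights.
rng = random.Random(0)
D, REGIONS, HIDDEN = 4, 3, 8

def random_item():
    return [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(REGIONS)]

tops = [random_item() for _ in range(3)]
bottoms = [random_item() for _ in range(3)]
shoes = [random_item() for _ in range(2)]
w1 = [[rng.uniform(-0.5, 0.5) for _ in range(3 * D)] for _ in range(HIDDEN)]
w2 = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]

# Exhaustive search over every (top, bottom, shoe) triple: 3 * 3 * 2 candidates.
candidates = list(itertools.product(range(3), range(3), range(2)))
scores = {c: outfit_score(tops[c[0]], bottoms[c[1]], shoes[c[2]], w1, w2)
          for c in candidates}
best = max(scores, key=scores.get)
print(len(candidates), best, round(scores[best], 3))
```

The exhaustive loop also shows why the problem is combinatorial: the number of candidate outfits is the product of the category sizes, so a learned scorer is useful mainly as a way to rank or prune that space rather than enumerate it by hand.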

Supplementary Material

MP4 File (RecSys2021.mp4)

Video for "Tops, Bottoms, and Shoes: Building Capsule Wardrobes via Cross-Attention Tensor Network"



Published In

RecSys '21: Proceedings of the 15th ACM Conference on Recommender Systems

September 2021

883 pages

ISBN: 9781450384582

DOI: 10.1145/3460231

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

RecSys '21

RecSys '21: Fifteenth ACM Conference on Recommender Systems

September 27 - October 1, 2021

Amsterdam, Netherlands

Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%

Bibliometrics

Total Citations: 8
Total Downloads: 593
Downloads (Last 12 months): 24
Downloads (Last 6 weeks): 2

Reflects downloads up to 03 Mar 2025

Cited By

Selwon K, Szymański J (2024). A Review of Explainable Fashion Compatibility Modeling Methods. ACM Computing Surveys 56:11 (1-29). Online publication date: 28-Jun-2024.
https://dl.acm.org/doi/10.1145/3664614
Tanaka Y, Ozaki T (2024). Interactive construction of personalized fashion capsule wardrobes with alternative item recommendations. 2024 7th International Conference on Information and Computer Technologies (ICICT) (493-498). Online publication date: 15-Mar-2024.
https://doi.org/10.1109/ICICT62343.2024.00086
Lai V, Chen H, Yeh C, Xu M, Cai Y, Yang H (2023). Enhancing Transformers without Self-supervised Learning: A Loss Landscape Perspective in Sequential Recommendation. Proceedings of the 17th ACM Conference on Recommender Systems (791-797). Online publication date: 14-Sep-2023.
https://dl.acm.org/doi/10.1145/3604915.3608831
Chen H, Zhou K, Lai K, Yeh C, Zheng Y, Hu X, Yang H (2023). Hessian-aware Quantized Node Embeddings for Recommendation. Proceedings of the 17th ACM Conference on Recommender Systems (757-762). Online publication date: 14-Sep-2023.
https://dl.acm.org/doi/10.1145/3604915.3608826
Chen H, Li X, Zhou K, Hu X, Yeh C, Zheng Y, Yang H (2022). TinyKG: Memory-Efficient Training Framework for Knowledge Graph Neural Recommender Systems. Proceedings of the 16th ACM Conference on Recommender Systems (257-267). Online publication date: 12-Sep-2022.
https://dl.acm.org/doi/10.1145/3523227.3546760
Moosaei M, Lin Y, Akhazhanov A, Chen H, Wang F, Yang H (2022). OutfitGAN: Learning Compatible Items for Generative Fashion Outfits. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2272-2276). Online publication date: Jun-2022.
https://doi.org/10.1109/CVPRW56347.2022.00251
Tanaka Y, Ozaki T (2022). An extension of optimal fashion capsule wardrobe construction by considering visual dissimilarity and number of good coordinates. 2022 Tenth International Symposium on Computing and Networking Workshops (CANDARW) (224-228). Online publication date: Nov-2022.
https://doi.org/10.1109/CANDARW57323.2022.00052
Banerjee D, Dhakad L, Maheshwari H, Chelliah M, Ganguly N, Bhattacharya A (2022). Recommendation of Compatible Outfits Conditioned on Style. Advances in Information Retrieval (35-50). Online publication date: 10-Apr-2022.
https://dl.acm.org/doi/10.1007/978-3-030-99736-6_3
