DOI:10.1145/3460231.3474258
research-article

Tops, Bottoms, and Shoes: Building Capsule Wardrobes via Cross-Attention Tensor Network

Published: 13 September 2021

Abstract

Fashion is more than Paris runways. Fashion is about how people express their interests, identity, mood, and cultural influences. Given an inventory of candidate garments from different categories, how should they be assembled to most improve their fashionability? This question presents an intriguing visual recommendation challenge: automatically creating capsule wardrobes. Capsule wardrobe generation is a complex combinatorial problem that requires understanding how multiple visual items interact. The generative process often requires fashion experts to tease out the combinations manually, which makes it hard to scale.
We introduce TensorNet, an approach that captures the key ingredients of visual compatibility among tops, bottoms, and shoes. TensorNet aims to provide actionable advice for full-body outfits that mix and match well. It consists of two core modules: a Cross-Attention Message Passing module and a Wide&Deep Tensor Interaction module. As such, TensorNet is able to characterize both local region-based patterns and the global compatibility of entire outfits. Our experimental results on real-world datasets indicate that the proposed method is capable of learning visual compatibility and outperforms all the baselines. TensorNet opens up opportunities for fashion designers to narrow down the search space for multi-clothes combinations.

Supplementary Material

MP4 File (RecSys2021.mp4)
Video for "Tops, Bottoms, and Shoes: Building Capsule Wardrobes via Cross-Attention Tensor Network"

Published In

RecSys '21: Proceedings of the 15th ACM Conference on Recommender Systems
September 2021
883 pages
ISBN:9781450384582
DOI:10.1145/3460231
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Cross-Attention
  2. Fashion Recommendation
  3. Linear Attention
  4. Neural Tensor Network

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

RecSys '21: Fifteenth ACM Conference on Recommender Systems
September 27 - October 1, 2021
Amsterdam, Netherlands

Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 43
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 20 Nov 2024

Cited By

  • (2024) A Review of Explainable Fashion Compatibility Modeling Methods. ACM Computing Surveys 56:11 (1-29). DOI:10.1145/3664614. Online publication date: 28-Jun-2024
  • (2024) Interactive construction of personalized fashion capsule wardrobes with alternative item recommendations. 2024 7th International Conference on Information and Computer Technologies (ICICT) (493-498). DOI:10.1109/ICICT62343.2024.00086. Online publication date: 15-Mar-2024
  • (2023) Enhancing Transformers without Self-supervised Learning: A Loss Landscape Perspective in Sequential Recommendation. Proceedings of the 17th ACM Conference on Recommender Systems (791-797). DOI:10.1145/3604915.3608831. Online publication date: 14-Sep-2023
  • (2023) Hessian-aware Quantized Node Embeddings for Recommendation. Proceedings of the 17th ACM Conference on Recommender Systems (757-762). DOI:10.1145/3604915.3608826. Online publication date: 14-Sep-2023
  • (2022) TinyKG: Memory-Efficient Training Framework for Knowledge Graph Neural Recommender Systems. Proceedings of the 16th ACM Conference on Recommender Systems (257-267). DOI:10.1145/3523227.3546760. Online publication date: 12-Sep-2022
  • (2022) OutfitGAN: Learning Compatible Items for Generative Fashion Outfits. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2272-2276). DOI:10.1109/CVPRW56347.2022.00251. Online publication date: Jun-2022
  • (2022) An extension of optimal fashion capsule wardrobe construction by considering visual dissimilarity and number of good coordinates. 2022 Tenth International Symposium on Computing and Networking Workshops (CANDARW) (224-228). DOI:10.1109/CANDARW57323.2022.00052. Online publication date: Nov-2022
  • (2022) Recommendation of Compatible Outfits Conditioned on Style. Advances in Information Retrieval (35-50). DOI:10.1007/978-3-030-99736-6_3. Online publication date: 10-Apr-2022
