DOI: 10.1145/3394486.3403311
research-article

GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce

Published: 20 August 2020

Abstract

In this paper, we present GrokNet, a deployed image recognition system for commerce applications. GrokNet leverages a multi-task learning approach to train a single computer vision trunk. We achieve a 2.1x improvement in exact product match accuracy when compared to the previous state-of-the-art Facebook product recognition system. We achieve this by training on 7 datasets across several commerce verticals, using 80 categorical loss functions and 3 embedding losses. We share our experience of combining diverse sources with wide-ranging label semantics and image statistics, including learning from human annotations, user-generated tags, and noisy search engine interaction data. GrokNet has demonstrated gains in production applications and operates at Facebook scale.
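The abstract describes a single shared trunk trained against many losses at once. The general multi-task pattern can be sketched as follows: each image passes through the trunk once, every task-specific head reads the same shared embedding, and the training objective is a weighted sum of per-task losses, L = sum over tasks t of w_t * L_t. This is a minimal pure-Python sketch under stated assumptions, not GrokNet's implementation. The trunk and heads are plain linear maps standing in for a deep CNN. The task names ("category", "color"), the weights, and the rule of skipping tasks for which an example has no label are all hypothetical. The 3 embedding losses are omitted.

```python
import math
from typing import Dict, List, Sequence


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable cross-entropy of one example against an integer label."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def linear(x: Sequence[float], weights: List[List[float]]) -> List[float]:
    """Hypothetical bias-free linear layer: one output per weight row."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]


def multitask_loss(
    image_features: Sequence[float],
    trunk: List[List[float]],
    heads: Dict[str, List[List[float]]],
    labels: Dict[str, int],
    task_weights: Dict[str, float],
) -> float:
    """Weighted sum of per-task categorical losses over one shared trunk output.

    Tasks with no label for this example are skipped (a hypothetical choice
    that mimics datasets annotating only some attributes). Tasks missing
    from task_weights default to weight 1.0.
    """
    embedding = linear(image_features, trunk)  # shared trunk, computed once
    total = 0.0
    for task, head in heads.items():
        if task not in labels:
            continue  # this example carries no supervision for this task
        logits = linear(embedding, head)
        total += task_weights.get(task, 1.0) * softmax_cross_entropy(logits, labels[task])
    return total


if __name__ == "__main__":
    trunk = [[1.0, 0.0], [0.0, 1.0]]  # identity "trunk" for illustration
    heads = {
        "category": [[1.0, 0.0], [0.0, 1.0]],  # 2 hypothetical classes
        "color": [[0.0, 0.0]] * 3,             # 3 hypothetical classes
    }
    total = multitask_loss(
        [1.0, 0.0], trunk, heads,
        labels={"category": 0, "color": 2},
        task_weights={"category": 1.0, "color": 0.5},
    )
    print(round(total, 4))  # prints 0.8626
```

In this sketch the trunk output is computed once and reused by every head, so adding a dataset or attribute only adds one head and one weighted loss term; how the weights w_t should be set is left open here.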


Cited By

  • (2024) Bringing Multimodality to Amazon Visual Search System. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 6390-6399. DOI: 10.1145/3637528.3671640. Online publication date: 25-Aug-2024.
  • (2024) FashionFail: Addressing Failure Cases in Fashion Object Detection and Segmentation. 2024 International Joint Conference on Neural Networks (IJCNN), pp. 1-8. DOI: 10.1109/IJCNN60899.2024.10651287. Online publication date: 30-Jun-2024.
  • (2024) De-noised Vision-language Fusion Guided by Visual Cues for E-commerce Product Search. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1986-1996. DOI: 10.1109/CVPRW63382.2024.00204. Online publication date: 17-Jun-2024.

Information & Contributors

Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN:9781450379984
DOI:10.1145/3394486
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 August 2020

Author Tags

  1. deep learning
  2. e-commerce image understanding
  3. embedding
  4. image classification
  5. multi-task learning

Qualifiers

  • Research-article

Conference

KDD '20

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 41
  • Downloads (last 6 weeks): 5
Reflects downloads up to 18 Nov 2024.

Citations

  • (2024) Challenges and Opportunities of Using Transformer-Based Multi-Task Learning in NLP Through ML Lifecycle: A Position Paper. Natural Language Processing Journal 7, 100076. DOI: 10.1016/j.nlp.2024.100076. Online publication date: Jun-2024.
  • (2023) Que2Engage: Embedding-based Retrieval for Relevant and Engaging Products at Facebook Marketplace. Companion Proceedings of the ACM Web Conference 2023, pp. 386-390. DOI: 10.1145/3543873.3584633. Online publication date: 30-Apr-2023.
  • (2023) Shop by image: characterizing visual search in e-commerce. Information Retrieval Journal 26:1. DOI: 10.1007/s10791-023-09418-1. Online publication date: 3-Mar-2023.
  • (2022) PinnerFormer. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3702-3712. DOI: 10.1145/3534678.3539156. Online publication date: 14-Aug-2022.
  • (2022) CommerceMM. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 4433-4442. DOI: 10.1145/3534678.3539151. Online publication date: 14-Aug-2022.
  • (2022) Amazon Shop the Look: A Visual Search System for Fashion and Home. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2822-2830. DOI: 10.1145/3534678.3539071. Online publication date: 14-Aug-2022.
  • (2022) e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pp. 3484-3494. DOI: 10.1145/3511808.3557067. Online publication date: 17-Oct-2022.
