Research article
DOI: 10.1145/3539597.3570392

NGAME: Negative Mining-aware Mini-batching for Extreme Classification

Published: 27 February 2023

Abstract

Extreme Classification (XC) seeks to tag data points with the most relevant subset of labels from an extremely large label set. Performing deep XC with dense, learnt representations for data points and labels has attracted much attention due to its superiority over earlier XC methods that used sparse, hand-crafted features. Negative mining techniques have emerged as a critical component of all deep XC methods, allowing them to scale to millions of labels. However, despite recent advances, training deep XC models with large encoder architectures such as transformers remains challenging. This paper notices that memory overheads of popular negative mining techniques often force mini-batch sizes to remain small and slow training down. In response, this paper introduces NGAME, a light-weight mini-batch creation technique that offers provably accurate in-batch negative samples. This allows training with larger mini-batches offering significantly faster convergence and higher accuracies than existing negative sampling techniques. NGAME was found to be up to 16% more accurate than state-of-the-art methods on a wide array of benchmark datasets for extreme classification, as well as 3% more accurate at retrieving search engine queries in response to a user webpage visit to show personalized ads. In live A/B tests on a popular search engine, NGAME yielded up to 23% gains in click-through-rates. Code for NGAME is available at https://github.com/Extreme-classification/ngame
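The idea of creating mini-batches so that in-batch negatives are informative can be illustrated with a toy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes, as one plausible reading of the abstract, that data points with similar embeddings are placed in the same mini-batch, so that the positive labels of one point serve as hard negatives for its batch-mates. The helper names `make_clustered_batches` and `in_batch_negatives`, the greedy nearest-neighbour grouping, and the toy data are all hypothetical.

```python
import numpy as np

def make_clustered_batches(embeddings, batch_size, rng):
    """Greedily group each seed point with its nearest unassigned neighbours.

    Hypothetical helper: a stand-in for any procedure that places similar
    data points in the same mini-batch.
    """
    unassigned = [int(i) for i in rng.permutation(embeddings.shape[0])]
    batches = []
    while unassigned:
        seed = unassigned[0]
        rest = np.array(unassigned[1:], dtype=int)
        # Similarity of every remaining point to the seed (dot product).
        sims = embeddings[rest] @ embeddings[seed]
        nearest = rest[np.argsort(-sims)[: batch_size - 1]]
        batch = np.concatenate(([seed], nearest)).astype(int)
        batches.append(batch)
        taken = set(batch.tolist())
        unassigned = [i for i in unassigned if i not in taken]
    return batches

def in_batch_negatives(batch, Y):
    """Return candidate labels for a batch and a per-point negative mask.

    Candidates are the labels that are positive for at least one point in
    the batch; a candidate is a negative for a point if it is not one of
    that point's own positives.
    """
    batch_labels = Y[batch]                                  # (B, L)
    candidates = np.flatnonzero(batch_labels.any(axis=0))    # label ids
    neg_mask = batch_labels[:, candidates] == 0              # (B, C)
    return candidates, neg_mask

# Toy data: two well-separated groups of points, each with its own labels.
rng = np.random.default_rng(0)
cluster = np.repeat([0, 1], 4)
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
X = centers[cluster] + 0.05 * rng.standard_normal((8, 2))
X /= np.linalg.norm(X, axis=1, keepdims=True)
Y = np.zeros((8, 4), dtype=int)
Y[np.arange(8), 2 * cluster + np.arange(8) % 2] = 1

batches = make_clustered_batches(X, batch_size=4, rng=rng)
candidates, neg_mask = in_batch_negatives(batches[0], Y)
```

In this toy setup every mini-batch contains points from a single group, so each point's negatives come from labels of its near neighbours rather than from unrelated labels. No separate negative-sampling structure has to be held in memory, which is the kind of overhead the abstract identifies as limiting mini-batch size.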

Supplementary Material

MP4 File (WSDM23-fp0171.mp4)
Presentation video for NGAME - a light-weight negative-mining aware mini-batching strategy.
MP4 File (32_wsdm2023_dahiya_extreme_classification_01.mp4-streaming.mp4)
NGAME: Negative Mining-aware Mini-batching for Extreme Classification

    Published In

    WSDM '23: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining
    February 2023
    1345 pages
    ISBN: 9781450394079
    DOI: 10.1145/3539597
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. extreme multi-label learning
    2. large-scale learning
    3. negative sampling
    4. personalized ads
    5. siamese networks
    6. sponsored search

    Qualifiers

    • Research-article

    Conference

    WSDM '23

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%

    Article Metrics

    • Downloads (last 12 months): 97
    • Downloads (last 6 weeks): 4
    Reflects downloads up to 13 Feb 2025

    Cited By

    • (2024) OAK. Proceedings of the 41st International Conference on Machine Learning, pp. 36012-36028. DOI: 10.5555/3692070.3693537. Online publication date: 21-Jul-2024
    • (2024) Contrastive representation learning for self-supervised taxonomy completion. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, pp. 6442-6450. DOI: 10.24963/ijcai.2024/712. Online publication date: 3-Aug-2024
    • (2024) Gandalf: Learning Label-label Correlations in Extreme Multi-label Classification via Label Features. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 1360-1371. DOI: 10.1145/3637528.3672063. Online publication date: 25-Aug-2024
    • (2024) MatchXML: An Efficient Text-Label Matching Framework for Extreme Multi-Label Text Classification. IEEE Transactions on Knowledge and Data Engineering 36:9, pp. 4781-4793. DOI: 10.1109/TKDE.2024.3374750. Online publication date: Sep-2024
    • (2023) Personalized Retrieval over Millions of Items. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1014-1022. DOI: 10.1145/3539618.3591749. Online publication date: 19-Jul-2023
    • (2023) Meta-classifier free negative sampling for extreme multilabel classification. Machine Learning 113:2, pp. 675-697. DOI: 10.1007/s10994-023-06468-w. Online publication date: 20-Nov-2023
