NGAME: Negative Mining-aware Mini-batching for Extreme Classification

WSDM '23: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining

Pages 258 - 266

https://doi.org/10.1145/3539597.3570392

Published: 27 February 2023

Abstract

Extreme Classification (XC) seeks to tag data points with the most relevant subset of labels from an extremely large label set. Performing deep XC with dense, learnt representations for data points and labels has attracted much attention due to its superiority over earlier XC methods that used sparse, hand-crafted features. Negative mining techniques have emerged as a critical component of all deep XC methods, allowing them to scale to millions of labels. However, despite recent advances, training deep XC models with large encoder architectures such as transformers remains challenging. This paper observes that the memory overheads of popular negative mining techniques often force mini-batch sizes to remain small, which slows training down. In response, this paper introduces NGAME, a light-weight mini-batch creation technique that offers provably accurate in-batch negative samples. This allows training with larger mini-batches, which offer significantly faster convergence and higher accuracies than existing negative sampling techniques. NGAME was found to be up to 16% more accurate than state-of-the-art methods on a wide array of benchmark datasets for extreme classification, as well as 3% more accurate at retrieving search engine queries in response to a user webpage visit to show personalized ads. In live A/B tests on a popular search engine, NGAME yielded up to 23% gains in click-through rates. Code for NGAME is available at https://github.com/Extreme-classification/ngame.
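
The abstract does not spell out how mini-batches are built, so the following is only a minimal sketch of the general idea of in-batch negatives under stated assumptions, not the NGAME procedure itself. It assumes that data points with similar embeddings are placed in the same mini-batch, and that the labels of the other points in a batch can then serve as negatives. Sorting along a random 1-D projection is used here as a crude stand-in for clustering. The helper names `make_batches` and `in_batch_negatives`, and the toy data, are hypothetical.

```python
import random


def make_batches(embeddings, batch_size, seed=0):
    """Group points with similar embeddings into mini-batches.

    Assumption: sorting along a random direction is a crude proxy for
    clustering; it is not the procedure used by the paper.
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]

    def score(i):
        # Projection of point i onto the random direction.
        return sum(x * d for x, d in zip(embeddings[i], direction))

    order = sorted(range(len(embeddings)), key=score)
    return [order[s:s + batch_size] for s in range(0, len(order), batch_size)]


def in_batch_negatives(batch, positives):
    """For each point, use labels of the other batch members as negatives.

    A label is a negative for point i if it is a positive of some point
    in the batch but not a positive of i itself.
    """
    pool = set().union(*(positives[i] for i in batch))
    return {i: sorted(pool - positives[i]) for i in batch}


# Toy data (hypothetical): four points in 2-D, each with a set of label ids.
embeddings = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
positives = [{0}, {1}, {2}, {2, 3}]

for batch in make_batches(embeddings, batch_size=2):
    print(batch, in_batch_negatives(batch, positives))
```

Because the negatives come from labels already present in the batch, no separate per-point negative list has to be held in memory, which is the kind of overhead the abstract says limits mini-batch size.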

Supplementary Material

MP4 File (WSDM23-fp0171.mp4), 134.50 MB
Presentation video for NGAME, a light-weight negative-mining aware mini-batching strategy.

MP4 File (32_wsdm2023_dahiya_extreme_classification_01.mp4-streaming.mp4), 692.05 MB
NGAME: Negative Mining-aware Mini-batching for Extreme Classification

Index Terms

Computing methodologies → Machine learning → Learning paradigms → Supervised learning → Supervised learning by classification

Information & Contributors

Information

Published In

WSDM '23: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining

February 2023

1345 pages

ISBN:9781450394079

DOI:10.1145/3539597

General Chairs: Tat-Seng Chua (National University of Singapore), Hady Lauw (Singapore Management University)

Program Chairs: Luo Si (Salesforce), Evimaria Terzi (Boston University), Panayiotis Tsaparas (University of Ioannina)

Copyright © 2023 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

WSDM '23

WSDM '23: The Sixteenth ACM International Conference on Web Search and Data Mining

February 27 - March 3, 2023

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Bibliometrics & Citations

Article Metrics

Total Citations: 6
Total Downloads: 303
Downloads (Last 12 months): 97
Downloads (Last 6 weeks): 4

Reflects downloads up to 13 Feb 2025

Citations

Cited By

Mohan S, Saini D, Mittal A, Chowdhury S, Paliwal B, Jiao J, Gupta M, Varma M, Salakhutdinov R, Kolter Z, Heller K, Weller A, Oliver N, Scarlett J, Berkenkamp F (2024) OAK. Proceedings of the 41st International Conference on Machine Learning, pp. 36012-36028. DOI: 10.5555/3692070.3693537. Online publication date: 21-Jul-2024
https://dl.acm.org/doi/10.5555/3692070.3693537
Niu Y, Xu H, Liu C, Wen Y, Yuan X, Larson K (2024) Contrastive representation learning for self-supervised taxonomy completion. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, pp. 6442-6450. DOI: 10.24963/ijcai.2024/712. Online publication date: 3-Aug-2024
https://dl.acm.org/doi/10.24963/ijcai.2024/712
Kharbanda S, Gupta D, Schultheis E, Banerjee A, Hsieh C, Babbar R, Baeza-Yates R, Bonchi F (2024) Gandalf: Learning Label-label Correlations in Extreme Multi-label Classification via Label Features. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 1360-1371. DOI: 10.1145/3637528.3672063. Online publication date: 25-Aug-2024
https://dl.acm.org/doi/10.1145/3637528.3672063
Ye H, Sunderraman R, Ji S (2024) MatchXML: An Efficient Text-Label Matching Framework for Extreme Multi-Label Text Classification. IEEE Transactions on Knowledge and Data Engineering 36:9, pp. 4781-4793. DOI: 10.1109/TKDE.2024.3374750. Online publication date: Sep-2024
https://doi.org/10.1109/TKDE.2024.3374750
Vemuri H, Agrawal S, Mittal S, Saini D, Soni A, Sambasivan A, Lu W, Wang Y, Parsana M, Kar P, Varma M, Chen H, Duh W, Huang H, Kato M, Mothe J, Poblete B (2023) Personalized Retrieval over Millions of Items. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1014-1022. DOI: 10.1145/3539618.3591749. Online publication date: 19-Jul-2023
https://dl.acm.org/doi/10.1145/3539618.3591749
Qaraei M, Babbar R (2023) Meta-classifier free negative sampling for extreme multilabel classification. Machine Learning 113:2, pp. 675-697. DOI: 10.1007/s10994-023-06468-w. Online publication date: 20-Nov-2023
https://dl.acm.org/doi/10.1007/s10994-023-06468-w
