DOI: 10.1145/3580305.3599301

Deep Encoders with Auxiliary Parameters for Extreme Classification

Published: 04 August 2023

Abstract

The task of annotating a data point with the labels most relevant to it, drawn from a large universe of labels, is referred to as Extreme Classification (XC). State-of-the-art XC methods have applications in ranking, recommendation, and tagging, and mostly employ a combination architecture comprising a deep encoder and a high-capacity classifier. The two components are often trained in a modular fashion to conserve compute. This paper shows that in XC settings, where data paucity and semantic-gap issues abound, modular training can lead to suboptimal encoders, which in turn degrades the performance of the overall architecture. The paper then proposes a lightweight alternative, DEXA, that augments encoder training with auxiliary parameters. Incorporating DEXA into existing XC architectures requires minimal modifications; the method scales to datasets with 40 million labels and offers predictions that are up to 6% and 15% more accurate than those of existing deep XC methods on benchmark and proprietary datasets, respectively. The paper also analyzes DEXA theoretically and shows that it offers provably better encoder training than existing Siamese training strategies in certain realizable settings. Code for DEXA is available at https://github.com/Extreme-classification/dexa.
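To make the core idea concrete, below is a minimal sketch, in PyTorch, of Siamese encoder training augmented with per-cluster auxiliary parameters in the spirit of DEXA. All names here (`AuxiliaryEncoder`, `siamese_loss`, the bag-of-words stand-in encoder, the label clustering) are illustrative assumptions rather than the authors' implementation; consult the linked repository for the official code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxiliaryEncoder(nn.Module):
    """A shared text encoder plus one learnable auxiliary vector per label
    cluster; the auxiliary vector is added to the encoder's label embedding
    to compensate for uninformative label text (the "semantic gap")."""

    def __init__(self, encoder: nn.Module, num_clusters: int, dim: int):
        super().__init__()
        self.encoder = encoder                      # any text encoder module
        self.aux = nn.Embedding(num_clusters, dim)  # auxiliary parameters
        nn.init.zeros_(self.aux.weight)             # start as plain Siamese training

    def embed_document(self, doc_feats):
        return F.normalize(self.encoder(doc_feats), dim=-1)

    def embed_label(self, label_feats, cluster_ids):
        # Encoder output is corrected by the label's cluster-level auxiliary vector.
        z = self.encoder(label_feats) + self.aux(cluster_ids)
        return F.normalize(z, dim=-1)

def siamese_loss(doc_emb, pos_label_emb, temperature=0.05):
    """Contrastive loss with in-batch negatives: the i-th document's positive
    label is row i; every other row in the batch acts as a negative."""
    logits = doc_emb @ pos_label_emb.t() / temperature
    targets = torch.arange(doc_emb.size(0), device=doc_emb.device)
    return F.cross_entropy(logits, targets)

# Toy usage with a bag-of-words encoder (purely illustrative).
vocab, dim, num_clusters = 1000, 64, 8
encoder = nn.EmbeddingBag(vocab, dim)            # maps token ids -> mean embedding
model = AuxiliaryEncoder(encoder, num_clusters, dim)

docs = torch.randint(0, vocab, (4, 10))          # 4 documents, 10 tokens each
labels = torch.randint(0, vocab, (4, 3))         # text of their positive labels
clusters = torch.randint(0, num_clusters, (4,))  # cluster id of each label

loss = siamese_loss(model.embed_document(docs),
                    model.embed_label(labels, clusters))
loss.backward()
```

Initializing the auxiliary vectors to zero means training starts from plain Siamese behaviour and departs from it only where label text alone proves insufficient.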

Supplementary Material

MP4 File (rtfp0211-2min-promo.mp4)
Large language models, used as encoders, are widely deployed in real-world search and recommendation applications because they provide accurate and scalable solutions for dense retrieval. Such models can suffer when the label text is insufficiently informative, a problem referred to as the semantic gap. DEXA bridges the semantic gap by introducing auxiliary parameters into existing encoders. Integrating DEXA into existing dense retrieval pipelines can improve predictions with minimal overhead at training time and zero overhead at inference time. Check out our code at: https://github.com/Extreme-classification/dexa
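Assuming the training sketch above, serving could look as follows: the auxiliary vectors are folded into a label bank precomputed offline, so query-time cost is identical to that of a plain dense retriever, consistent with the zero-inference-overhead claim. Again, every name here is illustrative.

```python
# Offline: fold auxiliary parameters into a precomputed label bank.
with torch.no_grad():
    label_bank = model.embed_label(labels, clusters)   # (num_labels, dim)

# Online: encode only the query document; retrieval reduces to a (possibly
# approximate) nearest-neighbour search over the fixed label bank.
query = torch.randint(0, vocab, (1, 10))
scores = model.embed_document(query) @ label_bank.t()
top_labels = scores.topk(k=3, dim=-1).indices
```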




    Published In

    KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    August 2023
    5996 pages
ISBN: 9798400701030
DOI: 10.1145/3580305

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. deep encoders
    2. extreme multi-label learning
    3. large-scale learning
    4. product recommendation
    5. sponsored search

    Qualifiers

    • Research-article

    Conference

    KDD '23

    Acceptance Rates

Overall acceptance rate: 1,133 of 8,635 submissions (13%)

