DOI: 10.1145/3534678.3542629

PECOS: Prediction for Enormous and Correlated Output Spaces

Published: 14 August 2022

Abstract

Unlike traditional machine learning tasks and benchmarks, real-world problems usually come with enormous output spaces, from hundreds of thousands of diseases in medical diagnosis to millions of items and billions of websites in product and web search engines. Unfortunately, conventional machine learning tools and libraries cannot handle such large-scale output spaces efficiently and accurately. PECOS (Prediction for Enormous and Correlated Output Spaces) [11] addresses this issue. It is a state-of-the-art, open-source machine learning library that provides high-level, user-friendly interfaces to both linear and deep learning models, and is flexible enough to solve diverse machine learning problems. Specifically, PECOS simplifies the otherwise complicated semantic indexing used to organize enormous output spaces, which makes training and prediction over correlated output labels orders of magnitude more efficient. PECOS has already been adopted in various real-world large-scale products, such as semantic search at Amazon [1], and has achieved state-of-the-art results on public extreme multi-label classification (XMC) benchmarks [2, 11, 12] and in various downstream applications [3, 7, 9].
In this tutorial, we will introduce several key functions and features of the PECOS library. Through real-world examples in product recommendation and natural language processing, attendees will learn how to efficiently train large-scale machine learning models for enormous output spaces and how to obtain predictions in less than 1 millisecond per input against a million labels. We will also show how the built-in utilities in PECOS handle diverse machine learning problems and data formats. By the end of the tutorial, we expect attendees to be able to apply these concepts to their own projects and to address different machine learning problems with enormous output spaces.
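The abstract does not spell out how semantic indexing speeds up prediction, so the following is only a minimal sketch of the general idea, not the PECOS API or its actual algorithm. It assumes labels have already been grouped into clusters, uses cluster centroids and dot products as stand-in scorers (PECOS trains its own rankers), and every name in it (`build_index`, `beam_predict`, the toy labels and vectors) is hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Index = List[Tuple[Vector, List[str]]]


def dot(a: Vector, b: Vector) -> float:
    """Inner product, used here as a stand-in relevance score."""
    return sum(x * y for x, y in zip(a, b))


def centroid(vectors: List[Vector]) -> Vector:
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_index(label_vecs: Dict[str, Vector], clusters: List[List[str]]) -> Index:
    """Represent each label cluster by the centroid of its members."""
    return [
        (centroid([label_vecs[name] for name in members]), members)
        for members in clusters
    ]


def beam_predict(
    query: Vector,
    label_vecs: Dict[str, Vector],
    index: Index,
    beam_size: int,
    top_k: int,
) -> Tuple[List[str], int]:
    """Score clusters first, then score only labels in the best clusters.

    Returns the top_k labels and how many labels were actually scored.
    """
    ranked = sorted(index, key=lambda node: dot(query, node[0]), reverse=True)
    candidates = [name for _, members in ranked[:beam_size] for name in members]
    scored = sorted(
        ((dot(query, label_vecs[name]), name) for name in candidates),
        reverse=True,
    )
    return [name for _, name in scored[:top_k]], len(candidates)


# Toy data (hypothetical): four labels in two semantic clusters.
LABEL_VECS: Dict[str, Vector] = {
    "running shoes": [1.0, 0.0],
    "trail shoes": [0.9, 0.1],
    "laptop": [0.0, 1.0],
    "tablet": [0.1, 0.9],
}
CLUSTERS = [["running shoes", "trail shoes"], ["laptop", "tablet"]]
INDEX = build_index(LABEL_VECS, CLUSTERS)

labels, n_scored = beam_predict([0.8, 0.2], LABEL_VECS, INDEX, beam_size=1, top_k=1)
```

With a beam of one cluster, only two of the four labels are scored. Over millions of labels and a deeper hierarchy, the same pruning keeps the number of scored labels far below the size of the output space, at the cost of possibly missing a relevant label that sits in a pruned cluster.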

References

[1] Wei-Cheng Chang, Daniel Jiang, Hsiang-Fu Yu, Choon-Hui Teo, Jiong Zhang, Kai Zhong, Kedarnath Kolluri, Qie Hu, Nikhil Shandilya, Vyacheslav Ievgrafov, Japinder Singh, and Inderjit S Dhillon. Extreme multi-label learning for semantic matching in product search. In Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2021.
[2] Wei-Cheng Chang, Hsiang-Fu Yu, Kai Zhong, Yiming Yang, and Inderjit S Dhillon. Taming pretrained transformers for extreme multi-label text classification. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 3163–3171, 2020.
[3] Eli Chien, Wei-Cheng Chang, Cho-Jui Hsieh, Hsiang-Fu Yu, Jiong Zhang, Olgica Milenkovic, and Inderjit S Dhillon. Node feature extraction by self-supervised multi-scale neighborhood prediction. In International Conference on Learning Representations (ICLR), 2022.
[4] Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. Accelerating large-scale inference with anisotropic vector quantization. In International Conference on Machine Learning, 2020.
[5] Jeff Johnson, Matthijs Douze, and Hervé Jégou. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3):535–547, 2019.
[6] Y. A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(4):824–836, 2020.
[7] Rajat Sen, Alexander Rakhlin, Lexing Ying, Rahul Kidambi, Dean Foster, Daniel N Hill, and Inderjit S Dhillon. Top-k extreme contextual bandits with arm hierarchy. In International Conference on Machine Learning, pages 9422–9433. PMLR, 2021.
[8] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, 2017.
[9] Nishant Yadav, Rajat Sen, Daniel N Hill, Arya Mazumdar, and Inderjit S Dhillon. Session-aware query auto-completion using extreme multi-label ranking. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 3835–3844, 2021.
[10] Hsiang-Fu Yu, Cho-Jui Hsieh, Qi Lei, and Inderjit S Dhillon. A greedy approach for budgeted maximum inner product search. In Advances in Neural Information Processing Systems, pages 5453–5462, 2017.
[11] Hsiang-Fu Yu, Kai Zhong, Jiong Zhang, Wei-Cheng Chang, and Inderjit S Dhillon. PECOS: Prediction for enormous and correlated output spaces. Journal of Machine Learning Research, 2022.
[12] Jiong Zhang, Wei-Cheng Chang, Hsiang-Fu Yu, and Inderjit S Dhillon. Fast multi-resolution transformer fine-tuning for extreme multi-label text classification. In Advances in Neural Information Processing Systems, 2021.


      Published In

      KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
      August 2022
      5033 pages
      ISBN: 9781450393850
      DOI: 10.1145/3534678
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. approximate nearest neighbor search
      2. extreme multi-label ranking
      3. hands-on tutorial
      4. interactive demonstration

      Qualifiers

      • Abstract

      Conference

      KDD '22

      Acceptance Rates

      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
