Multi-label learning with millions of labels: recommending advertiser bid phrases for web pages

Authors:

Yashoteja Prabhu,

Manik Varma

WWW '13: Proceedings of the 22nd international conference on World Wide Web

Pages 13 - 24

https://doi.org/10.1145/2488388.2488391

Published: 13 May 2013

Abstract

Recommending phrases from web pages for advertisers to bid on against search engine queries is an important research problem with direct commercial impact. Most approaches have found it infeasible to determine the relevance of all possible queries to a given ad landing page and have focussed on making recommendations from a small set of phrases extracted (and expanded) from the page using NLP- and ranking-based techniques. In this paper, we eschew this paradigm and demonstrate that it is possible to efficiently predict the relevant subset of queries from a large set of monetizable ones by posing the problem as a multi-label learning task, with each query represented by a separate label.
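To make the multi-label formulation concrete, the following is a minimal sketch, not the authors' pipeline, of how queries could be mapped to label ids and each page to a sparse set of relevant labels. The page URLs, queries, and the helper names `build_label_index` and `encode_label_sets` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

def build_label_index(queries: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct query a label id in first-seen order."""
    index: Dict[str, int] = {}
    for query in queries:
        if query not in index:
            index[query] = len(index)
    return index

def encode_label_sets(
    pairs: Iterable[Tuple[str, str]], index: Dict[str, int]
) -> Dict[str, Set[int]]:
    """Map each page to the set of label ids of its relevant queries."""
    labels: Dict[str, Set[int]] = {}
    for page, query in pairs:
        labels.setdefault(page, set()).add(index[query])
    return labels

# Hypothetical (page, query) pairs; real data would come from other sources.
PAIRS: List[Tuple[str, str]] = [
    ("shoes.example/running", "running shoes"),
    ("shoes.example/running", "trail shoes"),
    ("cars.example/insurance", "cheap car insurance"),
    ("shoes.example/running", "running shoes"),  # duplicate pairs collapse
]

LABEL_INDEX = build_label_index(query for _, query in PAIRS)
PAGE_LABELS = encode_label_sets(PAIRS, LABEL_INDEX)
```

Storing only the ids of relevant labels per page keeps the representation sparse, which matters when the label set is very large and each page is relevant to only a few queries.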

We develop Multi-label Random Forests to tackle problems with millions of labels. Our proposed classifier has prediction costs that are logarithmic in the number of labels and can make predictions in a few milliseconds using 10 Gb of RAM. We demonstrate that it is possible to generate training data for our classifier automatically from click logs without any human annotation or intervention. We train our classifier on tens of millions of labels, features and training points in less than two days on a thousand-node cluster. We develop a sparse semi-supervised multi-label learning formulation to deal with training set biases and noisy labels harvested automatically from the click logs. This formulation is used to infer a belief in the state of each label for each training ad, and the random forest classifier is extended to train on these beliefs rather than the given labels. Experiments reveal significant gains over ranking- and NLP-based techniques on a large test set of 5 million ads using multiple metrics.
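As a rough illustration of why a tree ensemble can predict cheaply over a large label set, here is a minimal sketch of forest prediction. It is not the paper's classifier: the `Node` structure, the threshold splits on sparse features, and the toy trees are all assumptions made for this example. The point is only that each tree is traversed from root to leaf, so the per-tree cost depends on tree depth rather than on the total number of labels, assuming the trees are roughly balanced and each leaf stores a short label distribution.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Node:
    """Internal nodes split on one feature; leaves hold a label distribution."""
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label_dist: Optional[Dict[int, float]] = None  # set on leaves only

def leaf_for(root: Node, x: Dict[int, float]) -> Node:
    """Walk from the root to a leaf; x is a sparse feature vector."""
    node = root
    while node.label_dist is None:
        value = x.get(node.feature, 0.0)  # absent features count as 0
        node = node.left if value <= node.threshold else node.right
    return node

def predict_top_k(forest: List[Node], x: Dict[int, float], k: int) -> List[int]:
    """Average leaf distributions over trees and return the k best label ids."""
    scores: Counter = Counter()
    for root in forest:
        for label, prob in leaf_for(root, x).label_dist.items():
            scores[label] += prob / len(forest)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ranked[:k]]

# Two hand-built toy trees (hypothetical, not learned from data).
TREE_A = Node(feature=0, threshold=0.5,
              left=Node(label_dist={2: 1.0}),
              right=Node(label_dist={0: 0.6, 1: 0.4}))
TREE_B = Node(feature=1, threshold=0.5,
              left=Node(label_dist={0: 0.5, 2: 0.5}),
              right=Node(label_dist={1: 0.9, 0: 0.1}))
FOREST = [TREE_A, TREE_B]

TOP_DENSE = predict_top_k(FOREST, {0: 1.0, 1: 1.0}, k=2)
TOP_EMPTY = predict_top_k(FOREST, {}, k=1)
```

Only the labels stored in the reached leaves are scored, so labels that never appear in those leaves are never touched at prediction time. How the trees are grown, and how training on inferred beliefs instead of raw labels would change the leaf distributions, is outside the scope of this sketch.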

Index Terms

Computing methodologies → Machine learning

Published In

WWW '13: Proceedings of the 22nd international conference on World Wide Web

May 2013

1628 pages

ISBN: 9781450320351

DOI: 10.1145/2488388

General Chairs:
Daniel Schwabe (PUC-Rio, Brazil)
Virgílio Almeida (UFMG, Brazil)
Hartmut Glaser (CGI.br, Brazil)

Program Chairs:
Ricardo Baeza-Yates (Yahoo! Labs, Spain & Chile)
Sue Moon (KAIST, South Korea)

Copyright © 2013 Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Sponsors

NICBR: Núcleo de Informação e Coordenação do Ponto BR
CGIBR: Comitê Gestor da Internet no Brasil

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

WWW '13: 22nd International World Wide Web Conference

May 13 - 17, 2013

Rio de Janeiro, Brazil

Acceptance Rates

WWW '13 Paper Acceptance Rate: 125 of 831 submissions, 15%

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
