Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2488388.2488391acmotherconferencesArticle/Chapter ViewAbstractPublication PagesthewebconfConference Proceedingsconference-collections
research-article

Multi-label learning with millions of labels: recommending advertiser bid phrases for web pages

Published: 13 May 2013 Publication History

Abstract

Recommending phrases from web pages for advertisers to bid on against search engine queries is an important research problem with direct commercial impact. Most approaches have found it infeasible to determine the relevance of all possible queries to a given ad landing page and have focussed on making recommendations from a small set of phrases extracted (and expanded) from the page using NLP and ranking based techniques. In this paper, we eschew this paradigm, and demonstrate that it is possible to efficiently predict the relevant subset of queries from a large set of monetizable ones by posing the problem as a multi-label learning task with each query being represented by a separate label.
We develop Multi-label Random Forests to tackle problems with millions of labels. Our proposed classifier has prediction costs that are logarithmic in the number of labels and can make predictions in a few milliseconds using 10 Gb of RAM. We demonstrate that it is possible to generate training data for our classifier automatically from click logs without any human annotation or intervention. We train our classifier on tens of millions of labels, features and training points in less than two days on a thousand node cluster. We develop a sparse semi-supervised multi-label learning formulation to deal with training set biases and noisy labels harvested automatically from the click logs. This formulation is used to infer a belief in the state of each label for each training ad and the random forest classifier is extended to train on these beliefs rather than the given labels. Experiments reveal significant gains over ranking and NLP based techniques on a large test set of 5 million ads using multiple metrics.

References

[1]
V. Abhishek and K. Hosanagar. Keyword generation for search engine advertising using semantic similarity between terms. In ICEC, pages 89--94, 2007.
[2]
I. Antonellis, H. G. Molina, and C. C. Chang. Simrank++: query rewriting through link analysis of the click graph. VLDBE, 1(1):408--421, 2008.
[3]
S. Bengio, J. Weston, and D. Grangier. Label embedding trees for large multi-class tasks. In NIPS, 2010.
[4]
A. Beygelzimer, J. Langford, Y. Lifshits, G. Sorkin, and A. Strehl. Conditional probability tree estimation analysis and algorithms. In UAI, 2009.
[5]
W. Bi and J. T. Kwok. Multilabel classification on tree- and dag-structured hierarchies. In ICML, 2011.
[6]
H. Blockeel, L. D. Raedt, and J. Ramon. Top-down induction of clustering trees. In ICML, pages 55--63, 1998.
[7]
T. Blumensath and M. Davies. Iterative thresholding for sparse approximations. J. of Fourier Analysis and Applications, 14:629--654, 2004.
[8]
M. Boutell, J. Luo, X. Shen, and C. Brown. Learning multi-label scene classification. Pattern Recognition, 37(9):1757--1771, 2004.
[9]
L. Breiman. Random forests. Machine Learning, 45(1), 2001.
[10]
A. Broder, P. Ciccolo, E. Gabrilovich, V. Josifovski, D. Metzler, L. Riedel, and J. Yuan. Online expansion of rare queries for sponsored search. In WWW, pages 511--520, 2009.
[11]
C. Carpineto and G. Romano. A survey of automatic query expansion in information retrieval. CSUR, 44(1):1:1--1:50, 2012.
[12]
N. Cesa-Bianchi, C. Gentile, and L. Zaniboni. Incremental algorithms for hierarchical classification. JMLR, 7, 2006.
[13]
G. Chen, Y. Song, F. Wang, and C. Zhang. Semi-supervised multi-label learning by solving a sylvester equation. In SDM, pages 410--419, 2008.
[14]
Y. Choi, M. Fontoura, E. Gabrilovich, V. Josifovski, M. Mediano, and B. Pang. Using landing pages for sponsored search ad selection. In WWW, pages 251--260, 2010.
[15]
A. Clare and R. D. King. Knowledge discovery in multi-label phenotype data. In PKDD, pages 42--53, 2001.
[16]
J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. In OSDI, 2004.
[17]
O. Dekel and O. Shair. Multiclass-multilabel classification with more classes than examples. In AISTATS, 2010.
[18]
J. Deng, S. Satheesh, A. C. Berg, and F. Li. Fast and balanced: Efficient label tree learning for large scale object recognition. In NIPS, 2011.
[19]
A. Elisseeff and J. Weston. A kernel method for multi-labelled classification. In NIPS, pages 681--687, 2001.
[20]
B. Hariharan, S. V. N. Vishwanathan, and M. Varma. Efficient max-margin multi-label classification with applications to zero-shot learning. ML, 2012.
[21]
D. Hsu, S. Kakade, J. Langford, and T. Zhang. Multi-label prediction via compressed sensing. In NIPS, 2009.
[22]
S. Ji, L. Sun, R. Jin, and J. Ye. Multi-label multiple kernel learning. In NIPS, pages 777--784, 2008.
[23]
A. Joshi and R. Motwani. Keyword generation for search engine advertising. In ICDMW, pages 490--496, 2006.
[24]
A. Kapoor, R. Viswanathan, and P. Jain. Multilabel classification using bayesian compressed sensing. In NIPS, 2012.
[25]
D. Kocev, C. Vens, J. Struyf, and S. Dzeroski. Ensembles of multi-objective decision trees. In ECML, pages 624 -- 631, 2007.
[26]
A. Z. Kouzani and G. Nasireding. Multilabel classification by bch code and random forests. Intl. J. of Recent Trends in Engg., 2, 2009.
[27]
Y. Liu, R. Jin, and L. Yang. Semi-supervised multi-label learning by constrained non-negative matrix factorization. In AAAI, 2006.
[28]
A. Malekian, C.-C. Chang, R. Kumar, and G. Wang. Optimizing query rewrites for keyword-based advertising. In EC, pages 10--19, 2008.
[29]
M. Munsey, J. Veilleux, S. Bikkani, A. Teredesai, and M. D. Cock. Born to trade: A genetically evolved keyword bidder for sponsored search. In CEC, pages 1--8, july 2010.
[30]
R. Pan, Y. Zhou, B. Cao, N. N. Liu, R. M. Lukose, M. Scholz, and Q. Yang. One-class collaborative filtering. In ICDM, 2008.
[31]
B. Panda, J. Herbach, S. Basu, and R. J. Bayardo. Planet: Massively parallel learning of tree ensembles with mapreduce. PVLDB, 2(2), 2009.
[32]
S. Ravi, A. Broder, E. Gabrilovich, V. Josifovski, S. Pandey, and B. Pang. Automatic generation of bid phrases for online advertising. In WSDM, 2010.
[33]
J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor. Kernel-based learning of hierarchical multilabel classification models. JMLR, 7, 2006.
[34]
M. R. Segal. Tree-structured methods for longitudinal data. J. of American Statistical Association, 87(418):407--418, 1992.
[35]
S. Shalev-Shwartz, N. Srebro, and T. Zhang. Trading accuracy for sparsity in optimization problems with sparsity constraints. SIAM J. on Optimization, 20(6), 2010.
[36]
V. Sindhwani, S. S. Bucak, J. Hu, and A. Mojsilovic. One-class matrix completion with low-density factorizations. In ICDM, 2010.
[37]
S. Sun and J. Shawe-Taylor. Sparse semi-supervised learning using conjugate functions. JMLR, 2010.
[38]
L. Tang, S. Rajan, and V. K. Narayanan. Large scale multi-label classification via metalabeler. In WWW, pages 211--220, 2009.
[39]
B. Taskar, C. Guestrin, and D. Koller. Max-margin markov networks. In NIPS, 2003.
[40]
I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. JMLR, 6, 2005.
[41]
G. Tsoumakas, I. Katakis, and I. Vlahavas. Effective and efficient multilabel classification in domains with large number of labels. In ECML/PKDD, 2008.
[42]
G. Tsoumakas and I. P. Vlahavas. Random k -labelsets: An ensemble method for multilabel classification. In ECML, 2007.
[43]
J. Weston, S. Bengio, and N. Usunier. Large scale image annotation: Learning to rank with joint word-image embeddings. Machine Learning, 81(1), 2010.
[44]
X. Wu and A. Bolivar. Keyword extraction for contextual advertisement. In WWW, pages 1195--1196, 2008.
[45]
R. Yan, J. Tesic, and J. R. Smith. Model-shared subspace boosting for multi-label classification. In KDD, 2007.
[46]
W. Yih, J. Goodman, and V. R. Carvalho. Finding advertising keywords on web pages. In WWW, 2006.
[47]
M. L. Zhang, J. M. Pena, and V. Robles. Feature selection for multi-label naive bayes classification. Inf. Sci., 179(19), 2009.
[48]
X. Zhu. Semi-supervised learning literature survey. Technical Report 1530, University of Wisconsin Madison, 2008.

Cited By

View all
  • (2024)Retrieval-style In-context Learning for Few-shot Hierarchical Text ClassificationTransactions of the Association for Computational Linguistics10.1162/tacl_a_0069712(1214-1231)Online publication date: 30-Sep-2024
  • (2024)Variational Continuous Label Distribution Learning for Multi-Label Text ClassificationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.3323401(1-15)Online publication date: 2024
  • (2024)Does the Order Matter? A Random Generative Way to Learn Label Hierarchy for Hierarchical Text ClassificationIEEE/ACM Transactions on Audio, Speech and Language Processing10.1109/TASLP.2023.332937432(276-285)Online publication date: 1-Jan-2024
  • Show More Cited By

Index Terms

  1. Multi-label learning with millions of labels: recommending advertiser bid phrases for web pages

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Other conferences
    WWW '13: Proceedings of the 22nd international conference on World Wide Web
    May 2013
    1628 pages
    ISBN:9781450320351
    DOI:10.1145/2488388

    Sponsors

    • NICBR: Nucleo de Informatcao e Coordenacao do Ponto BR
    • CGIBR: Comite Gestor da Internet no Brazil

    In-Cooperation

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 May 2013

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. bid phrase recommendation
    2. large scale learning
    3. multi-label learning
    4. random forests
    5. semi-supervised learning

    Qualifiers

    • Research-article

    Conference

    WWW '13
    Sponsor:
    • NICBR
    • CGIBR
    WWW '13: 22nd International World Wide Web Conference
    May 13 - 17, 2013
    Rio de Janeiro, Brazil

    Acceptance Rates

    WWW '13 Paper Acceptance Rate 125 of 831 submissions, 15%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)52
    • Downloads (Last 6 weeks)6
    Reflects downloads up to 21 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Retrieval-style In-context Learning for Few-shot Hierarchical Text ClassificationTransactions of the Association for Computational Linguistics10.1162/tacl_a_0069712(1214-1231)Online publication date: 30-Sep-2024
    • (2024)Variational Continuous Label Distribution Learning for Multi-Label Text ClassificationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.3323401(1-15)Online publication date: 2024
    • (2024)Does the Order Matter? A Random Generative Way to Learn Label Hierarchy for Hierarchical Text ClassificationIEEE/ACM Transactions on Audio, Speech and Language Processing10.1109/TASLP.2023.332937432(276-285)Online publication date: 1-Jan-2024
    • (2024)BiLSTM model with Attention mechanism for multi-label news text classification2024 4th International Conference on Neural Networks, Information and Communication (NNICE)10.1109/NNICE61279.2024.10498894(566-569)Online publication date: 19-Jan-2024
    • (2024)A Generative Approach for Wikipedia-Scale Visual Entity Recognition2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52733.2024.01639(17313-17322)Online publication date: 16-Jun-2024
    • (2024)HLC: hierarchically-aware label correlation for hierarchical text classificationApplied Intelligence10.1007/s10489-023-05257-154:2(1602-1618)Online publication date: 1-Jan-2024
    • (2024)Multi-label Robust Feature Selection via Subspace-Sparsity LearningArtificial Neural Networks and Machine Learning – ICANN 202410.1007/978-3-031-72332-2_1(3-17)Online publication date: 17-Sep-2024
    • (2024)Shallow Learning Versus Deep Learning in Natural Language Processing ApplicationsShallow Learning vs. Deep Learning10.1007/978-3-031-69499-8_8(179-206)Online publication date: 13-Oct-2024
    • (2023)Generalized test utilities for long-tail performance in extreme multi-label classificationProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3667100(22269-22303)Online publication date: 10-Dec-2023
    • (2023)SemSup-XCProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3618419(228-247)Online publication date: 23-Jul-2023
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media