DOI: 10.1145/2556195.2556242
research-article

Sampling dilemma: towards effective data sampling for click prediction in sponsored search

Published: 24 February 2014

Abstract

Precise prediction of the probability that users click on ads plays a key role in sponsored search. State-of-the-art sponsored search systems typically employ a machine learning approach to click prediction. While paying much attention to extracting useful features and building effective models, previous studies have overlooked seemingly less obvious but important challenges in data sampling. To fulfill the learning objective of click prediction, the sampled training data must not only follow an input distribution similar to that of the real world, but also yield a conditional output distribution, i.e., click-through rate (CTR), consistent with the real-world data. However, because clicks are sparse in sponsored search, these two requirements are difficult to satisfy simultaneously. In this paper, we first present a theoretical analysis that reveals this sampling dilemma, followed by a thorough data analysis demonstrating that straightforward random sampling may not balance the two kinds of consistency. To address this problem, we propose a new sampling algorithm that retains consistency between the sampled data and the real world in terms of both input distribution and conditional output distribution. Large-scale evaluations on click-through logs from a commercial search engine demonstrate that the new sampling algorithm effectively addresses the sampling dilemma. Further experiments show that a model trained on data obtained by the new sampling algorithm achieves much higher click-prediction accuracy.
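
The tension described in the abstract can be illustrated with a small simulation. The sketch below is not the sampling algorithm proposed in the paper, which this page does not describe. It is a minimal illustration under stated assumptions: a synthetic impression log with one global CTR, plain uniform sampling, and the widely used negative-downsampling heuristic with its standard re-calibration formula. All function names and parameter values are hypothetical.

```python
import random


def simulate_log(n, ctr, seed=0):
    """Synthetic impression log: 1 = click, 0 = no click (assumed global CTR)."""
    rng = random.Random(seed)
    return [1 if rng.random() < ctr else 0 for _ in range(n)]


def uniform_sample(log, rate, seed=1):
    """Keep every impression with the same probability `rate`."""
    rng = random.Random(seed)
    return [y for y in log if rng.random() < rate]


def negative_downsample(log, keep_neg, seed=2):
    """Keep all clicks, but keep each non-click only with probability `keep_neg`."""
    rng = random.Random(seed)
    return [y for y in log if y == 1 or rng.random() < keep_neg]


def recalibrate(p, keep_neg):
    """Map a CTR measured on negative-downsampled data back to the original scale."""
    return p / (p + (1.0 - p) / keep_neg)


def ctr(sample):
    """Empirical click-through rate of a sample."""
    return sum(sample) / len(sample) if sample else 0.0


if __name__ == "__main__":
    log = simulate_log(200_000, ctr=0.02)
    uni = uniform_sample(log, rate=0.05)
    neg = negative_downsample(log, keep_neg=0.05)
    print("full log CTR:", ctr(log), "clicks:", sum(log))
    print("uniform sample CTR:", ctr(uni), "clicks:", sum(uni))
    print("downsampled CTR:", ctr(neg), "clicks:", sum(neg))
    print("re-calibrated CTR:", recalibrate(ctr(neg), 0.05))
```

In this toy setting, uniform sampling keeps the CTR roughly intact but retains few clicks, while negative downsampling keeps every click at the cost of an inflated CTR that has to be corrected afterwards. A single global correction like this does nothing for consistency at the level of individual queries or ads.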


Cited By

  • (2023) CFF: combining interactive features and user interest features for click-through rate prediction. The Journal of Supercomputing 80:3 (3282-3309). DOI: 10.1007/s11227-023-05598-1. Online publication date: 4-Sep-2023
  • (2022) Click-through rate prediction in online advertising: A literature review. Information Processing & Management 59:2 (102853). DOI: 10.1016/j.ipm.2021.102853. Online publication date: Mar-2022
  • (2014) Evaluating quality score of new ads. 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (13-17). DOI: 10.1109/ICACCI.2014.6968335. Online publication date: Sep-2014

        Published In

        WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining
        February 2014
        712 pages
        ISBN: 9781450323512
        DOI: 10.1145/2556195
        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. click prediction
        2. data sampling
        3. online advertising
        4. sponsored search

        Qualifiers

        • Research-article

        Conference

        WSDM 2014

        Acceptance Rates

        WSDM '14 Paper Acceptance Rate 64 of 355 submissions, 18%;
        Overall Acceptance Rate 498 of 2,863 submissions, 17%
