DOI: 10.1145/2939672.2939873
research-article

How to Compete Online for News Audience: Modeling Words that Attract Clicks

Published: 13 August 2016

Abstract

Headlines are particularly important for online news outlets, where many similar news stories compete for users' attention. Traditionally, journalists have relied on rules of thumb and experience to master the art of crafting catchy headlines, but with large-scale click-through data for online news articles now available, we can apply quantitative analysis and text mining techniques to gain an in-depth understanding of headlines. In this paper, we conduct a large-scale analysis and modeling of 150K news articles published over a period of four months on the Yahoo home page. We define a simple method to measure the click-value of individual words, and analyze how temporal trends and linguistic attributes affect click-through rate (CTR). We then propose a novel generative model, the headline click-based topic model (HCTM), which extends latent Dirichlet allocation (LDA) to reveal the effect of topical context on the click-value of words in headlines. HCTM leverages aggregate clicks on previously published headlines to identify words for headlines that will generate more clicks in the future. We show that by jointly taking topics and clicks into account we can detect changes in user interests within topics. We evaluate HCTM in two experimental settings and compare its performance with ALDA (adapted LDA), LDA, and TextRank. The first task, full headline, is to retrieve the full headline used for a news article given the body of the article. The second task, good headline, is to identify specifically those words in the headline that have high click-values for the current news audience. For the full headline task, our model performs on par with ALDA, a state-of-the-art web-page summarization method that utilizes click-through information. For the good headline task, which is of more practical importance to both individual journalists and online news outlets, our model significantly outperforms all the other methods compared.
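The abstract does not spell out how the click-value of a word is computed, so the following is only a minimal sketch under a stated assumption: a word's click-value is taken to be the aggregate CTR (total clicks divided by total views) over all headlines that contain the word. The function name `word_click_values`, the `(headline, views, clicks)` input format, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def word_click_values(headlines):
    """Aggregate CTR per word (an ASSUMED definition, not the paper's formula).

    headlines: iterable of (headline_text, views, clicks) tuples.
    Returns a dict mapping each lowercased word to clicks / views, summed
    over every headline in which the word appears at least once.
    """
    views = defaultdict(int)
    clicks = defaultdict(int)
    for text, n_views, n_clicks in headlines:
        # Count each word once per headline, however often it repeats.
        for word in set(text.lower().split()):
            views[word] += n_views
            clicks[word] += n_clicks
    return {w: clicks[w] / views[w] for w in views if views[w] > 0}


# Toy data, invented purely for illustration.
sample = [
    ("Storm hits coast", 1000, 50),
    ("Storm warning issued", 1000, 30),
    ("Markets rally", 2000, 20),
]
values = word_click_values(sample)
ranked = sorted(values, key=values.get, reverse=True)
```

On this toy input, "storm" pools both of its headlines (80 clicks over 2,000 views), while "markets" appears in a single headline with a lower CTR. A topic-aware model such as HCTM would additionally condition the value of a word on its topical context, which this sketch ignores.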

Published In

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN: 9781450342322
DOI: 10.1145/2939672
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. click-through rate
  2. headline prediction
  3. large-scale analysis
  4. online news analysis

Qualifiers

  • Research-article

Conference

KDD '16
Acceptance Rates

KDD '16 Paper Acceptance Rate: 66 of 1,115 submissions (6%)
Overall Acceptance Rate: 1,133 of 8,635 submissions (13%)

Article Metrics

  • Downloads (Last 12 months): 63
  • Downloads (Last 6 weeks): 7
Reflects downloads up to 16 Nov 2024

Cited By

  • (2024) Evaluating User Engagement in Online News: A Deep Learning Approach Based on Attractiveness and Multiple Features. Systems 12:8 (274). DOI: 10.3390/systems12080274. Online publication date: 30-Jul-2024
  • (2024) The Language That Drives Engagement: A Systematic Large-scale Analysis of Headline Experiments. Marketing Science. DOI: 10.1287/mksc.2021.0018. Online publication date: 4-Nov-2024
  • (2024) What News Is Shared Where and How: A Multi-Platform Analysis of News Shared During the 2022 U.S. Midterm Elections. Social Media + Society 10:2. DOI: 10.1177/20563051241245950. Online publication date: 18-Apr-2024
  • (2021) The media image of Chinese older people: From stigmatic stereotype to diverse self-representation. Global Media and China 6:3 (281-302). DOI: 10.1177/20594364211012513. Online publication date: 15-May-2021
  • (2021) Anticipating Attention: On the Predictability of News Headline Tests. Digital Journalism 10:4 (647-668). DOI: 10.1080/21670811.2021.1984266. Online publication date: 13-Oct-2021
  • (2021) Toward Successful Social Media Viral Marketing: A Knowledge Management Approach. The Importance of New Technologies and Entrepreneurship in Business Development: In The Context of Economic Diversity in Developing Countries (377-389). DOI: 10.1007/978-3-030-69221-6_28. Online publication date: 13-Mar-2021
  • (2019) Using Butterfly-patterned Partial Sums to Draw from Discrete Distributions. ACM Transactions on Parallel Computing 6:4 (1-30). DOI: 10.1145/3365662. Online publication date: 19-Nov-2019
  • (2019) The decision‐making process in viral marketing—A review and suggestions for further research. Psychology & Marketing 36:11 (1062-1081). DOI: 10.1002/mar.21256. Online publication date: 28-Aug-2019
  • (2017) Using Butterfly-Patterned Partial Sums to Draw from Discrete Distributions. ACM SIGPLAN Notices 52:8 (341-355). DOI: 10.1145/3155284.3018757. Online publication date: 26-Jan-2017
  • (2017) Demographics of News Sharing in the U.S. Twittersphere. Proceedings of the 28th ACM Conference on Hypertext and Social Media (195-204). DOI: 10.1145/3078714.3078734. Online publication date: 4-Jul-2017