DOI: 10.1145/2806416.2806531
research-article

Automated News Suggestions for Populating Wikipedia Entity Pages

Published: 17 October 2015

Abstract

Wikipedia entity pages are a valuable source of information for direct consumption and for knowledge-base construction, update, and maintenance. Facts in these entity pages are typically supported by references. Recent studies show that as many as 20% of the references are from online news sources. However, many entity pages are incomplete even when relevant information is already available in existing news articles. Even for references that are already present, there is often a delay between the publication time of the news article and the time it is referenced. In this work, we therefore look at Wikipedia through the lens of news and propose a novel news-article suggestion task to improve news coverage in Wikipedia and reduce the lag of newsworthy references. Our work finds direct application, as a precursor, to Wikipedia page generation and knowledge-base acceleration tasks that rely on relevant and high-quality input sources.
We propose a two-stage supervised approach for suggesting news articles to entity pages for a given state of Wikipedia. First, we suggest news articles to Wikipedia entities (article-entity placement), relying on a rich set of features that take into account the salience and relative authority of entities and the novelty of news articles to entity pages. Second, we determine the exact section in the entity page for the input article (article-section placement), guided by class-based section templates. We perform an extensive evaluation of our approach based on ground-truth data extracted from external references in Wikipedia. We achieve a precision of up to 93% in the article-entity suggestion stage and up to 84% in the article-section placement stage. Finally, we compare our approach against competitive baselines and show significant improvements.
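
The two-stage shape described above can be illustrated with a minimal sketch. It is not the paper's implementation: the salience and novelty features are toy stand-ins, fixed thresholds replace the trained classifiers, the relative authority feature and the class-based section templates are omitted, and every name (EntityPage, salience, novelty, suggest, the threshold parameters) is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for real text processing."""
    return {w.strip(".,;:!?()\"'").lower() for w in text.split()} - {""}


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set overlap in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class EntityPage:
    name: str
    sections: dict[str, str]  # section title -> section text

    def text(self) -> str:
        return " ".join(self.sections.values())


def salience(article: str, entity: str) -> float:
    """Toy salience (assumption): share of sentences mentioning the entity."""
    sentences = [s for s in article.split(".") if s.strip()]
    hits = sum(entity.lower() in s.lower() for s in sentences)
    return hits / len(sentences) if sentences else 0.0


def novelty(article: str, page: EntityPage) -> float:
    """Toy novelty (assumption): 1 minus overlap with the current page text."""
    return 1.0 - jaccard(tokens(article), tokens(page.text()))


def suggest(article: str, page: EntityPage,
            min_salience: float = 0.5, min_novelty: float = 0.5) -> str | None:
    """Return a section title for the article, or None if it is not suggested."""
    # Stage 1 (article-entity placement): thresholds stand in for a classifier.
    if salience(article, page.name) < min_salience:
        return None
    if novelty(article, page) < min_novelty:
        return None
    # Stage 2 (article-section placement): pick the most similar section.
    art = tokens(article)
    return max(page.sections,
               key=lambda title: jaccard(art, tokens(page.sections[title])))


# Fictional entity and article, used only to exercise the sketch.
page = EntityPage("Acme Corp", {
    "History": "Acme Corp was founded in 1990 in Springfield.",
    "Products": "The company sells anvils and rockets.",
})
article = ("Acme Corp unveiled new rockets today. "
           "Acme Corp says the rockets ship next year.")
print(suggest(article, page))
```

In a supervised setting, the two threshold checks and the similarity-based section choice would each be replaced by a model trained on labeled article-entity and article-section pairs.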

    Published In

    CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
    October 2015
    1998 pages
    ISBN: 9781450337946
    DOI: 10.1145/2806416

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. entity salience
    2. news suggestion
    3. populating wikipedia entities
    4. relative entity authority

    Qualifiers

    • Research-article

    Funding Sources

    • ERC Advanced Grant ALEXANDRIA

    Conference

    CIKM'15

    Acceptance Rates

    CIKM '15 Paper Acceptance Rate 165 of 646 submissions, 26%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Cited By

    • (2023) AspectMMKG: A Multi-modal Knowledge Graph with Aspect-aware Entities. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 10.1145/3583780.3614782 (3361-3370). Online publication date: 21-Oct-2023
    • (2023) Listwise Explanations for Ranking Models Using Multiple Explainers. Advances in Information Retrieval, 10.1007/978-3-031-28244-7_41 (653-668). Online publication date: 2-Apr-2023
    • (2022) Predicting Guiding Entities for Entity Aspect Linking. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 10.1145/3511808.3557671 (3848-3852). Online publication date: 17-Oct-2022
    • (2022) BERT-ER. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 10.1145/3477495.3531944 (1466-1477). Online publication date: 6-Jul-2022
    • (2022) An Entity-Oriented Approach for Answering Topical Information Needs. Advances in Information Retrieval, 10.1007/978-3-030-99739-7_57 (463-472). Online publication date: 10-Apr-2022
    • (2020) A Large Test Collection for Entity Aspect Linking. Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 10.1145/3340531.3412875 (3109-3116). Online publication date: 19-Oct-2020
    • (2020) Improving News Personalization Through Search Logs. Bias and Social Aspects in Search and Recommendation, 10.1007/978-3-030-52485-2_14 (152-166). Online publication date: 12-Jul-2020
    • (2019) EncyCatalogRec: catalog recommendation for encyclopedia article completion. Frontiers of Information Technology & Electronic Engineering, 10.1631/FITEE.1800363, 21:3 (436-447). Online publication date: 5-Sep-2019
    • (2019) Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability. The World Wide Web Conference, 10.1145/3308558.3313618 (1567-1578). Online publication date: 13-May-2019
    • (2019) Swat: A system for detecting salient Wikipedia entities in texts. Computational Intelligence, 10.1111/coin.12216, 35:4 (858-890). Online publication date: 6-May-2019
