DOI: 10.5555/2491748.2491775
research-article

Multi-step classification approaches to cumulative citation recommendation

Published: 15 May 2013

Abstract

Knowledge bases have become indispensable sources of information. It is therefore critical that they rely on the latest information available and are updated whenever new facts surface. Knowledge base acceleration (KBA) systems seek to help humans expand knowledge bases like Wikipedia by automatically recommending edits based on incoming content streams. A core step in this process is identifying relevant content, i.e., filtering documents that would imply modifications to the attributes or relations of a given target entity. We propose two multi-step classification approaches for this task, consisting of two and three binary classification steps, respectively. Both methods share the same initial component, which identifies entity mentions in documents; the subsequent steps determine whether a document is relevant and/or central to the given entity. Using the evaluation platform of the TREC 2012 KBA track and a rich feature set developed for this particular task, we show that both approaches deliver state-of-the-art performance.
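The multi-step cascade described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: the surface-form matcher, the mention-count thresholds that stand in for the learned relevance and centrality classifiers, the output labels, and all names (`Entity`, `cascade`, `toy_is_relevant`, `toy_is_central`, `TWO_STEP`, `THREE_STEP`) are hypothetical. We also assume, for illustration only, that the second step of the two-step variant decides centrality directly. Each step is a binary decision, and a document stops at the first step it fails.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Entity:
    name: str
    surface_forms: Tuple[str, ...]  # hypothetical name variants used for mention detection


# A step is a binary decision over a (document, entity) pair.
Step = Callable[[str, Entity], bool]


def count_mentions(doc: str, entity: Entity) -> int:
    """Case-insensitive count of surface-form occurrences (non-overlapping per form)."""
    text = doc.lower()
    return sum(text.count(form.lower()) for form in entity.surface_forms)


def mentions_entity(doc: str, entity: Entity) -> bool:
    """Step 1, shared by both pipelines: does the document mention the entity at all?"""
    return count_mentions(doc, entity) > 0


# Hypothetical stand-ins for trained classifiers; a real system would score
# a feature vector rather than threshold raw mention counts.
def toy_is_relevant(doc: str, entity: Entity) -> bool:
    return count_mentions(doc, entity) >= 2


def toy_is_central(doc: str, entity: Entity) -> bool:
    return count_mentions(doc, entity) >= 3


def cascade(doc: str, entity: Entity, steps: Sequence[Step], labels: Sequence[str]) -> str:
    """Apply binary steps in order and stop at the first negative decision.

    labels[i] is returned when the document passes exactly i steps,
    so len(labels) must equal len(steps) + 1.
    """
    if len(labels) != len(steps) + 1:
        raise ValueError("need exactly one more label than steps")
    passed = 0
    for step in steps:
        if not step(doc, entity):
            break
        passed += 1
    return labels[passed]


# Hypothetical configurations of the two variants (steps, labels).
TWO_STEP = ([mentions_entity, toy_is_central], ["no_mention", "not_central", "central"])
THREE_STEP = (
    [mentions_entity, toy_is_relevant, toy_is_central],
    ["no_mention", "not_relevant", "relevant", "central"],
)
```

With the hypothetical entity `Entity("Jane Roe", ("Jane Roe", "J. Roe"))`, the document "Jane Roe was elected mayor. J. Roe thanked voters." contains two mentions, so `cascade(doc, roe, *THREE_STEP)` returns `"relevant"` and `cascade(doc, roe, *TWO_STEP)` returns `"not_central"`. Expressing each variant as a list of steps keeps the shared mention-detection component in one place, matching the abstract's point that both methods share the same initial step.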

    Information

    Published In

    ACM Other conferences
    OAIR '13: Proceedings of the 10th Conference on Open Research Areas in Information Retrieval
    May 2013
    236 pages
    ISBN: 9782905450098

    Sponsors

    • CID (France): Le Centre de Hautes Etudes Internationales D'Informatique Documentaire

    Publisher

    LE CENTRE DE HAUTES ETUDES INTERNATIONALES D'INFORMATIQUE DOCUMENTAIRE

    Paris, France

    Author Tags

    1. cumulative citation recommendation
    2. information filtering
    3. knowledge base acceleration

    Conference

    OAIR '13

    Bibliometrics

    • Downloads (last 12 months): 0
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 20 Dec 2024.

    Cited By

    • (2018) Citation Worthiness of Sentences in Scientific Reports. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 1061-1064. DOI: 10.1145/3209978.3210162. Online publication date: 27-Jun-2018.
    • (2017) Towards building a knowledge base of monetary transactions from a news collection. Proceedings of the 17th ACM/IEEE Joint Conference on Digital Libraries, pp. 209-218. DOI: 10.5555/3200334.3200357. Online publication date: 19-Jun-2017.
    • (2016) Finding News Citations for Wikipedia. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 337-346. DOI: 10.1145/2983323.2983808. Online publication date: 24-Oct-2016.
    • (2016) Document Filtering for Long-tail Entities. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 771-780. DOI: 10.1145/2983323.2983728. Online publication date: 24-Oct-2016.
    • (2015) When temporal expressions help to detect vital documents related to an entity. ACM SIGAPP Applied Computing Review, 15(3), pp. 49-58. DOI: 10.1145/2835260.2835263. Online publication date: 13-Oct-2015.
    • (2015) Automated News Suggestions for Populating Wikipedia Entity Pages. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 323-332. DOI: 10.1145/2806416.2806531. Online publication date: 17-Oct-2015.
    • (2015) An Entity Class-Dependent Discriminative Mixture Model for Cumulative Citation Recommendation. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 635-644. DOI: 10.1145/2766462.2767698. Online publication date: 9-Aug-2015.
    • (2015) Leveraging temporal expressions to filter vital documents related to an entity. Proceedings of the 30th Annual ACM Symposium on Applied Computing, pp. 1093-1098. DOI: 10.1145/2695664.2695910. Online publication date: 13-Apr-2015.
    • (2013) KBAAA. Proceedings of the 10th Conference on Open Research Areas in Information Retrieval, pp. 215-216. DOI: 10.5555/2491748.2491794. Online publication date: 15-May-2013.
    • (2013) Leveraging related entities for knowledge base acceleration. Proceedings of the 4th international workshop on Web-scale knowledge representation retrieval and reasoning, pp. 1-4. DOI: 10.1145/2512405.2512407. Online publication date: 1-Nov-2013.
