DOI: 10.5555/2042536.2042584

Linking archives using document enrichment and term selection

Published: 26 September 2011

Abstract

News, multimedia and cultural heritage archives are increasingly offering opportunities to create connections between their collections. We consider the task of linking archives: connecting an item in one archive to one or more items in other, often complementary archives. We focus on a specific instance of the task: linking items with a rich textual representation in a news archive to items with sparse annotations in a multimedia archive, where items should be linked if they describe the same or a related event. We find that the difference in textual richness of annotations presents a challenge and investigate two approaches: (i) to enrich sparsely annotated items with textually rich content; and (ii) to reduce rich news archive items using term selection. We demonstrate the positive impact of both approaches on linking to same events and linking to related events.
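The two approaches named in the abstract can be illustrated with a minimal sketch. The abstract does not say how terms are selected, how items are enriched, or how links are scored, so everything concrete here is an assumption made for illustration: TF-IDF weighting, cosine similarity, and the hypothetical helpers `select_terms`, `enrich`, and `link`. It is not the authors' method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def idf_table(docs):
    """Smoothed inverse document frequency over a list of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def weights(tokens, idf):
    """TF-IDF weight per term; unseen terms get an assumed IDF of 1.0."""
    tf = Counter(tokens)
    return {t: c * idf.get(t, 1.0) for t, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_terms(tokens, idf, k):
    """Approach (ii), assumed form: keep the k highest-weighted terms of a rich item."""
    w = weights(tokens, idf)
    return sorted(w, key=lambda t: (-w[t], t))[:k]


def enrich(sparse_text, background_texts, idf, n=1):
    """Approach (i), assumed form: append the n most similar background texts."""
    base = weights(tokenize(sparse_text), idf)
    ranked = sorted(
        background_texts,
        key=lambda text: -cosine(base, weights(tokenize(text), idf)),
    )
    return " ".join([sparse_text] + ranked[:n])


def link(news_text, archive_items, k=5):
    """Rank sparse archive items against a news article reduced to k terms."""
    news_tokens = tokenize(news_text)
    docs = [tokenize(text) for text in archive_items.values()]
    idf = idf_table(docs + [news_tokens])
    selected = set(select_terms(news_tokens, idf, k))
    query = weights([t for t in news_tokens if t in selected], idf)
    scores = {
        item_id: cosine(query, weights(tokenize(text), idf))
        for item_id, text in archive_items.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data, invented for this example only.
archive = {
    "clip-1": "queen visit amsterdam",
    "clip-2": "football match final",
}
news = (
    "The queen arrived in Amsterdam on Tuesday for a state visit. "
    "Crowds in Amsterdam greeted the queen."
)
ranking = link(news, archive)
```

On this toy data the first entry of `ranking` is `clip-1`, because `clip-2` shares no terms with the reduced article. With so few documents the IDF estimates are weak and function words can survive selection, so a realistic setup would estimate IDF on a larger collection.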

Published In

TPDL'11: Proceedings of the 15th international conference on Theory and practice of digital libraries: research and advanced technology for digital libraries
September 2011
534 pages
ISBN: 9783642244681
Editors: Stefan Gradmann, Francesca Borri, Carlo Meghini, Heiko Schuldt

Sponsors

• Ashgate Publishing Group
• Emerald Group Publishing Limited
• SWETS Information Services
• Ex Libris
• IOS Press

Publisher

Springer-Verlag, Berlin, Heidelberg
