DOI: 10.1145/2837689.2837691
short-paper

Spatial characteristics of a large web n-gram corpus

Published: 26 November 2015

Abstract

N-gram corpora, though prominently used to structure and index large natural language corpora, are rarely the focus of geographic information retrieval (GIR). In this study we take a step in this direction by characterizing spatial information in a large Web n-gram corpus provided by Microsoft. We explore how continent and country toponyms are represented in this corpus and whether basic topological relations can be correctly retrieved. Results suggest that toponym ambiguity has a major impact and that, although retrieved topological relations are often correct, recall is low. We conclude that further research is required if more fine-grained spatial information is to be retrieved from n-grams.
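The kind of evaluation the abstract describes can be illustrated with a minimal sketch. It is not the authors' method: the trigram counts, the "X in Y" containment pattern, the simplified gold standard, and the function names `extract_relations` and `precision_recall` are all assumptions made up for illustration, and none of the numbers come from the Microsoft Web N-gram corpus.

```python
from collections import Counter

# Hypothetical toy trigram counts; NOT taken from the Microsoft Web N-gram corpus.
TOY_TRIGRAMS = Counter({
    ("france", "in", "europe"): 120,
    ("kenya", "in", "africa"): 80,
    ("georgia", "in", "america"): 45,  # ambiguous toponym: US state vs. country
    ("japan", "in", "asia"): 60,
    ("paris", "in", "europe"): 30,     # not a country toponym, so it is filtered out
})

# Hypothetical, deliberately simplified gold standard: country -> continent.
GOLD = {
    "france": "europe",
    "kenya": "africa",
    "georgia": "asia",
    "japan": "asia",
    "brazil": "america",
    "chile": "america",
}
CONTINENTS = {"europe", "africa", "asia", "america"}


def extract_relations(trigrams, countries, continents, min_count=1):
    """Collect (country, continent) pairs from '<country> in <continent>' trigrams."""
    relations = set()
    for (left, middle, right), count in trigrams.items():
        if (middle == "in" and left in countries
                and right in continents and count >= min_count):
            relations.add((left, right))
    return relations


def precision_recall(retrieved, gold):
    """Precision and recall of retrieved pairs against the gold-standard pairs."""
    gold_pairs = set(gold.items())
    correct = retrieved & gold_pairs
    precision = len(correct) / len(retrieved) if retrieved else 0.0
    recall = len(correct) / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall


retrieved = extract_relations(TOY_TRIGRAMS, set(GOLD), CONTINENTS)
precision, recall = precision_recall(retrieved, GOLD)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy setup the ambiguous toponym produces the only wrong relation, while countries that never occur in the pattern lower recall, which mirrors the two effects the abstract reports in qualitative terms.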

Published In

ACM Other conferences
GIR '15: Proceedings of the 9th Workshop on Geographic Information Retrieval
November 2015
90 pages
ISBN: 9781450339377
DOI: 10.1145/2837689
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. GIR
  2. ambiguity
  3. spatial information
  4. web N-gram

Qualifiers

  • Short-paper

Conference

GIR '15

Acceptance Rates

Overall acceptance rate: 46 of 61 submissions (75%)
