Article

Web-a-where: geotagging web content

Authors: Einat Amitay, Nadav Har'El, Ron Sivan, Aya Soffer

SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 273-280

https://doi.org/10.1145/1008992.1009040

Published: 25 July 2004

Abstract

We describe Web-a-Where, a system for associating geography with Web pages. Web-a-Where locates mentions of places and determines the place each name refers to. In addition, it assigns to each page a geographic focus: a locality that the page discusses as a whole. The tagging process is simple and fast, intended to be applied to large collections of Web pages and to support a variety of location-based applications and data analyses.

Geotagging involves resolving two types of ambiguity: geo/non-geo and geo/geo. A geo/non-geo ambiguity occurs when a place name also has a non-geographic meaning, such as a person name (e.g., Berlin) or a common word (e.g., Turkey). A geo/geo ambiguity arises when distinct places share the same name, as in London, England vs. London, Ontario.

An implementation of the tagger within the framework of the WebFountain data mining system is described and evaluated on several corpora of real Web pages. Precision of up to 82% on individual geotags is achieved. We also evaluate the relative contribution of the various heuristics the tagger employs, and evaluate the focus-finding algorithm using a corpus pretagged with localities, showing that as many as 91% of the reported foci are correct up to the country level.
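The two kinds of ambiguity and the notion of a page focus can be illustrated with a small sketch. It is not Web-a-Where's implementation: the toy gazetteer and its populations, the confidence values (0.9, 0.8, 0.5), the decay factor 0.7, and the names `Place`, `geotag` and `focus` are all hypothetical. Geo/non-geo ambiguity is handled only by ignoring lowercase tokens; geo/geo ambiguity is resolved by preferring a sense whose ancestors include the next token (as in "London, Ontario") and otherwise the most populous sense. The focus is taken to be the hierarchy node with the highest score after each mention's confidence is propagated to its ancestors with a per-level decay.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    path: tuple      # hierarchy from continent down to the place itself
    population: int  # illustrative prior used to pick a default sense


# Hypothetical toy gazetteer with illustrative (not authoritative) populations.
GAZETTEER = {
    "london": [
        Place("London", ("Europe", "United Kingdom", "England", "London"), 7_000_000),
        Place("London", ("North America", "Canada", "Ontario", "London"), 300_000),
    ],
    "ontario": [Place("Ontario", ("North America", "Canada", "Ontario"), 12_000_000)],
    "toronto": [Place("Toronto", ("North America", "Canada", "Ontario", "Toronto"), 2_500_000)],
    "turkey": [Place("Turkey", ("Asia", "Turkey"), 70_000_000)],
}

# Hypothetical confidences and decay factor, chosen only for illustration.
CONF_QUALIFIED, CONF_UNAMBIGUOUS, CONF_DEFAULT = 0.9, 0.8, 0.5
DECAY = 0.7


def geotag(text):
    """Return (Place, confidence) pairs for place names found in text."""
    tokens = re.findall(r"[A-Za-z]+", text)
    tags = []
    for i, tok in enumerate(tokens):
        senses = GAZETTEER.get(tok.lower())
        # Geo/non-geo (simplified): skip unknown words and lowercase ones like "turkey".
        if senses is None or not tok[0].isupper():
            continue
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        # Geo/geo: prefer a sense whose ancestors include the next token ("London, Ontario").
        qualified = [p for p in senses if nxt in p.path[:-1]]
        if qualified:
            tags.append((qualified[0], CONF_QUALIFIED))
        elif len(senses) == 1:
            tags.append((senses[0], CONF_UNAMBIGUOUS))
        else:
            # Otherwise fall back to the most populous sense, with low confidence.
            tags.append((max(senses, key=lambda p: p.population), CONF_DEFAULT))
    return tags


def focus(tags):
    """Return the hierarchy node (a path prefix) with the highest score, or None."""
    scores = defaultdict(float)
    for place, conf in tags:
        n = len(place.path)
        for depth in range(1, n + 1):
            # Each ancestor gets the mention's confidence, decayed once per level up.
            scores[place.path[:depth]] += conf * DECAY ** (n - depth)
    return max(scores, key=scores.get) if scores else None
```

Under these assumptions, `focus(geotag("Flights from London, Ontario to Toronto are cheap."))` yields the Ontario node, because all three mentions reinforce that shared ancestor, while a bare "London" defaults to the English city with low confidence. A capitalization rule cannot catch sentence-initial or capitalized common words, so it leaves many geo/non-geo errors unresolved.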

Index Terms

Web-a-where: geotagging web content
1. Information systems
  1. Information retrieval
  2. Information storage systems

Recommendations

Location Extraction from Social Media: Geoparsing, Location Disambiguation, and Geotagging

Location extraction, also called “toponym extraction,” is a field covering geoparsing, extracting spatial representations from location mentions in text, and geotagging, assigning spatial coordinates to content items. This article evaluates five “best-...
Geotagging Named Entities in News and Online Documents
CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management

News sources generate constant streams of text with many references to real world entities; understanding the content from such sources often requires effectively detecting the geographic foci of the entities. We study the problem of associating ...
TEXTOMAP: determining geographical window for texts
GIR '15: Proceedings of the 9th Workshop on Geographic Information Retrieval

In newspapers or scholar manuals, numerous texts are accompanied by maps. In these map/text couples, maps give a spatial portrayal of the text issues, thus they make the spatial issues easier to understand. TEXTOMAP aims to design the geographical ...

Reviews

Reviewer: Wei Tang

Location-assisted search has been gaining momentum recently. For example, Google has introduced a new service called "Search by Location." (Other search engines offer similar services, for example, Gigablast.com and local-news.net.) However, several issues in this research area remain open: how to increase the precision of place-name resolution (and measure it automatically), how to find the focus (or foci) of a Web page, and how to extend the search scope to broader geographic regions worldwide. This paper describes Web-a-Where, a system for associating geography with Web pages. The process has two steps: geotagging, and focus-finding for each page. The algorithms are implemented in the framework of the WebFountain data mining system. The geotagging and focus-finding algorithms are evaluated on several corpora of real Web pages drawn from three categories: arbitrary pages, pages in the .gov domain, and pages from the "regional" sub-category of the Open Directory Project (ODP). The results show that geotagging achieves a precision of up to 82 percent, and that as many as 91 percent of the foci reported by the focus-finding algorithm are correct up to the country level. The authors also note that the main source of geotagging errors is geo/non-geo ambiguity, in which a place name also has a non-geographic meaning, for example, Mobile (Alabama). The work is novel in that its algorithms are not covered by prior research, and they show promising results. The evaluation platform is also unique and is effective in measuring the precision of the algorithms. The paper is well structured, with concepts and algorithms clearly explained. The presentation is also clean, with a good balance of tables and figures describing the performance evaluation results. As a proof-of-concept system, Web-a-Where does a reasonable job of identifying place names in Web pages.

However, the precision is not yet satisfactory (especially in the ODP category). More experiments should be done on the effect of the confidence assignments in the disambiguation algorithm; the authors did not explain why they chose the current assignments for the evaluation. Additional heuristics may need to be developed (for example, correlation between place names and other terms). It would also be interesting to see the runtime performance of the system on a much larger corpus of Web pages. Overall, this is a solid research paper, with good technical depth and interesting results.

Online Computing Reviews Service
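The focus result cited in the review counts a reported focus as correct "up to the country level." One way to make that criterion concrete is a prefix comparison between the reported focus and the gold-standard locality, computed only over pages for which a focus was reported. The sketch assumes paths laid out as (continent, country, region, city), so that level 2 is the country level; the path layout, the `level` parameter and the names `correct_up_to` and `focus_precision` are hypothetical and are not the paper's evaluation code.

```python
def correct_up_to(predicted, gold, level):
    """True if the two hierarchy paths agree on their first `level` components."""
    return predicted[:level] == gold[:level]


def focus_precision(pairs, level=2):
    """Fraction of reported foci that match the gold focus up to `level`.

    `pairs` holds (predicted, gold) paths. A predicted value of None means no
    focus was reported for that page, so the page is left out of the
    denominator. With paths shaped (continent, country, ...), level=2 is the
    country level.
    """
    reported = [(p, g) for p, g in pairs if p is not None]
    if not reported:
        return 0.0
    hits = sum(correct_up_to(p, g, level) for p, g in reported)
    return hits / len(reported)
```

Because only reported foci enter the denominator, a tagger that abstains on hard pages can raise this figure. A companion measure over all pages would be needed to capture how often a focus is reported at all.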

Information & Contributors

Information

Published In

SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval

July 2004

624 pages

ISBN:1581138814

DOI:10.1145/1008992

General Chair: Mark Sanderson, University of Sheffield (UK)

Program Chairs: Kalervo Järvelin, University of Tampere (Finland); James Allan, University of Massachusetts (USA); Peter Bruza, Distributed Systems Technology Centre (Australia)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 July 2004

Qualifiers

Article

Conference

SIGIR04

Sponsor:

SIGIR04: The 27th ACM/SIGIR International Symposium on Information Retrieval 2004

July 25 - 29, 2004

Sheffield, United Kingdom

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 398
Total Downloads: 2,858
Downloads (Last 12 months): 69
Downloads (Last 6 weeks): 6

Reflects downloads up to 18 Nov 2024

Citations

Cited By

Ferrari E, Striewski F, Tiefenbacher F, Bereuter P, Oesch D, Di Donato P (2024). Search Engine for Open Geospatial Consortium Web Services Improving Discoverability through Natural Language Processing-Based Processing and Ranking. ISPRS International Journal of Geo-Information 13:4 (128). Online publication date: 12-Apr-2024.
https://doi.org/10.3390/ijgi13040128

Jiang G, Wang Y, Li Y, Moosavi N, Hui P (2024). Blending Social Interaction Realms: Harmonizing Online and Offline Interactions through Augmented Reality. Proceedings of the 17th International Symposium on Visual Information Communication and Interaction, 1-8. Online publication date: 11-Dec-2024.
https://dl.acm.org/doi/10.1145/3678698.3678700

Liu Y, Singh L (2024). Utilizing External Knowledge to Enhance Location Prediction for Twitter/X Users in Low Resource Settings. ACM Transactions on Spatial Algorithms and Systems. Online publication date: 19-Jun-2024.
https://doi.org/10.1145/3673899

Leppämäki T, Toivonen T, Hiippala T (2024). Geographical and linguistic perspectives on developing geoparsers with generic resources. International Journal of Geographical Information Science 38:10 (2039-2060). Online publication date: 30-Jun-2024.
https://doi.org/10.1080/13658816.2024.2369539

Zhang M, Liu X, Zhang Z, Qiu Y, Jiang Z, Zhang P (2024). CHTopoNER model-based method for recognizing Chinese place names from social media information. Journal of Geographical Systems 26:1 (149-179). Online publication date: 11-Jan-2024.
https://doi.org/10.1007/s10109-023-00433-w

Sharma P, Samal A, Soh L, Joshi D (2023). A Spatially-Aware Data-Driven Approach to Automatically Geocoding Non-Gazetteer Place Names. ACM Transactions on Spatial Algorithms and Systems 10:1 (1-34). Online publication date: 11-Dec-2023.
https://dl.acm.org/doi/10.1145/3627987

Hu X, Zhou Z, Li H, Hu Y, Gu F, Kersten J, Fan H, Klan F (2023). Location Reference Recognition from Texts: A Survey and Comparison. ACM Computing Surveys 56:5 (1-37). Online publication date: 27-Nov-2023.
https://dl.acm.org/doi/10.1145/3625819

Lewis M, Soudjani S, Zuliani P (2023). Formal Verification of Quantum Programs: Theory, Tools and Challenges. ACM Transactions on Quantum Computing. Online publication date: 16-Oct-2023.
https://dl.acm.org/doi/10.1145/3624483

Al Rhman Sarsour A, Bouros P, Chondrogiannis T (2023). Towards Generating Realistic Geosocial Networks. Proceedings of the 7th ACM SIGSPATIAL Workshop on Location-based Recommendations, Geosocial Networks and Geoadvertising, 25-28. Online publication date: 13-Nov-2023.
https://dl.acm.org/doi/10.1145/3615896.3628340

Krause A, Cohen S (2023). Geographic Information Retrieval Using Wikipedia Articles. Proceedings of the ACM Web Conference 2023, 3331-3341. Online publication date: 30-Apr-2023.
https://dl.acm.org/doi/10.1145/3543507.3583469
