Leave no Place Behind: Improved Geolocation in Humanitarian Documents

Authors:

Enrico Belliardo,

Kyriaki Kalimeri,

Yelena Mejova

GoodIT '23: Proceedings of the 2023 ACM Conference on Information Technology for Social Good

Pages 31 - 39

https://doi.org/10.1145/3582515.3609515

Published: 06 September 2023

Abstract

Geographical location is a crucial element of humanitarian response, outlining vulnerable populations, ongoing events, and available resources. Recent developments in Natural Language Processing may help in extracting vital information from the deluge of reports and documents produced by the humanitarian sector. However, the performance and biases of existing state-of-the-art information extraction tools are unknown. In this work, we develop annotated resources to fine-tune the popular Named Entity Recognition (NER) tools spaCy and RoBERTa to perform geotagging of humanitarian texts. We then propose a geocoding method, FeatureRank, which links the candidate locations to the GeoNames database. We find that humanitarian-domain data not only improves the performance of the classifiers (up to F1 = 0.92), but also alleviates some of the bias of the existing tools, which erroneously favor locations in Western countries. Thus, we conclude that more resources from non-Western documents are necessary to ensure that off-the-shelf NER systems are suitable for deployment in the humanitarian sector.
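
The geocoding step described above, linking an extracted place name to one entry in a gazetteer, can be framed as ranking candidate records. The sketch below is a generic illustration of that framing and is not the paper's FeatureRank, whose features are not given in the abstract. The `Candidate` record, the three features, the weights, and the toy population figures are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical gazetteer record, loosely modeled on a GeoNames-style entry."""
    name: str
    country_code: str
    population: int


def score(mention: str, cand: Candidate, context_countries: set[str]) -> float:
    """Weighted sum of three illustrative features (weights are hand-picked assumptions)."""
    # Feature 1: case-insensitive exact match between mention and record name.
    exact_name = 1.0 if cand.name.lower() == mention.lower() else 0.0
    # Feature 2: log-scaled population, capped at 1.0 (10^7 inhabitants or more).
    pop = min(math.log10(cand.population + 1) / 7.0, 1.0)
    # Feature 3: the record's country is mentioned elsewhere in the document.
    in_context = 1.0 if cand.country_code in context_countries else 0.0
    return 1.0 * exact_name + 0.5 * pop + 2.0 * in_context


def rank(mention: str, candidates: list[Candidate],
         context_countries: set[str]) -> list[Candidate]:
    """Return candidates sorted from most to least plausible."""
    return sorted(candidates,
                  key=lambda c: score(mention, c, context_countries),
                  reverse=True)


# Toy candidates for an ambiguous toponym; populations are round placeholders,
# not real statistics.
CANDIDATES = [
    Candidate("Tripoli", "LY", 1_000_000),
    Candidate("Tripoli", "LB", 200_000),
    Candidate("Tripoli", "GR", 30_000),
]

# Without document context, the population prior decides.
best_default = rank("Tripoli", CANDIDATES, set())[0]
# With Lebanon mentioned in the document, the context feature dominates.
best_lebanon = rank("Tripoli", CANDIDATES, {"LB"})[0]
print(best_default.country_code, best_lebanon.country_code)  # LY LB
```

A ranker that relies only on a population prior will tend to prefer large, well-documented places, which is one way a geographic bias of the kind the abstract describes could arise. Document-level context features are one possible counterweight.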

Cited By

Liberatore D, Kalimeri K, Sever D, Mejova Y. (2024). Quantitative Information Extraction from Humanitarian Documents. Proceedings of the 2024 International Conference on Information Technology for Social Good, 240-248. DOI: 10.1145/3677525.3678667. Online publication date: 4-Sep-2024.
https://dl.acm.org/doi/10.1145/3677525.3678667

Index Terms

Leave no Place Behind: Improved Geolocation in Humanitarian Documents
1. Information systems
  1. Information systems applications
    1. Data mining
2. Social and professional topics
  1. User characteristics
    1. Geographic characteristics

Published In

GoodIT '23: Proceedings of the 2023 ACM Conference on Information Technology for Social Good

September 2023

560 pages

ISBN:9798400701160

DOI:10.1145/3582515

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGCAS: ACM Special Interest Group on Computers and Society

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

Best Paper

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Fondazione CRT

Conference

GoodIT '23

Sponsor:

SIGCAS

GoodIT '23: ACM International Conference on Information Technology for Social Good

September 6 - 8, 2023

Lisbon, Portugal
