DOI: 10.1145/3582515.3609515
research-article

Leave no Place Behind: Improved Geolocation in Humanitarian Documents

Published: 06 September 2023

Abstract

Geographical location is a crucial element of humanitarian response, outlining vulnerable populations, ongoing events, and available resources. Recent developments in Natural Language Processing may help in extracting vital information from the deluge of reports and documents produced by the humanitarian sector. However, the performance and biases of existing state-of-the-art information extraction tools are unknown. In this work, we develop annotated resources to fine-tune the popular Named Entity Recognition (NER) tools spaCy and RoBERTa to perform geotagging of humanitarian texts. We then propose a geocoding method, FeatureRank, which links the candidate locations to the GeoNames database. We find that humanitarian-domain data not only improves the performance of the classifiers (up to F1 = 0.92), but also alleviates some of the bias of the existing tools, which erroneously favor locations in Western countries. Thus, we conclude that more resources from non-Western documents are necessary to ensure that off-the-shelf NER systems are suitable for deployment in the humanitarian sector.

Cited By

  • (2024) Quantitative Information Extraction from Humanitarian Documents. Proceedings of the 2024 International Conference on Information Technology for Social Good, 240-248. DOI: 10.1145/3677525.3678667. Online publication date: 4-Sep-2024

Published In

GoodIT '23: Proceedings of the 2023 ACM Conference on Information Technology for Social Good
September 2023
560 pages
ISBN: 9798400701160
DOI: 10.1145/3582515
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

  • Best Paper

Author Tags

  1. Geolocation
  2. geocoding
  3. geographic bias
  4. geotagging
  5. humanitarian
  6. location bias
  7. named entity recognition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

GoodIT '23

Article Metrics

  • Downloads (last 12 months): 78
  • Downloads (last 6 weeks): 7
Reflects downloads up to 25 Nov 2024
