DOI: 10.1145/3403896.3403970
short-paper

Boosting toponym interlinking by paying attention to both machine and deep learning

Published: 26 June 2020

Abstract

Toponym interlinking is the problem of identifying the same spatio-textual entities across two or more data sources, based exclusively on their names. It is a significant task in geospatial data management and integration, with applications in fields such as geomarketing, cadastre, and navigation. Previous works have assessed the effectiveness of unsupervised string similarity functions, while more recent ones have deployed similarity-based Machine Learning techniques and language model-based Deep Learning techniques, achieving significantly higher interlinking accuracy. In this paper, we demonstrate the suitability of Attention-based neural networks for the problem and show that each of these approaches contributes to solving it. We propose a hybrid scheme that achieves the highest accuracy reported for toponym interlinking on the widely used Geonames dataset.
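
To make the idea of name-based similarity concrete, the following is a minimal Python sketch of how two string similarity scores could be computed for a pair of toponyms and combined into a match decision. It is an illustration only, not the method proposed in the paper: the function names, the choice of character-bigram Jaccard and difflib's ratio as similarity measures, the plain averaging, and the 0.7 threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def char_ngrams(name: str, n: int = 2) -> set:
    """Return the set of character n-grams of a lowercased, space-padded name."""
    padded = f" {name.lower().strip()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap between the character n-gram sets of two names."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 1.0
    return len(grams_a & grams_b) / len(union)


def ratio_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio from difflib's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_features(a: str, b: str) -> list:
    """Feature vector for a toponym pair (hypothetical feature choice)."""
    return [jaccard_similarity(a, b), ratio_similarity(a, b)]


def is_match(a: str, b: str, threshold: float = 0.7) -> bool:
    """Unsupervised baseline: average the features and apply a threshold.

    The threshold value is an assumption for illustration only.
    """
    features = similarity_features(a, b)
    return sum(features) / len(features) >= threshold


if __name__ == "__main__":
    for pair in [("Athens", "athens"), ("Thessaloniki", "Thessalonica"), ("Athens", "Patras")]:
        print(pair, similarity_features(*pair), is_match(*pair))
```

In a supervised, similarity-based setting, a feature vector such as the one returned by `similarity_features` would typically be passed to a trained classifier instead of being averaged against a fixed threshold.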


Information & Contributors

Information

Published In

GeoRich '20: Proceedings of the Sixth International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data
June 2020
27 pages
ISBN: 9781450380355
DOI: 10.1145/3403896
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. classification
  2. deep learning
  3. ensembles
  4. interlinking
  5. machine learning
  6. toponym

Qualifiers

  • Short-paper

Conference

SIGMOD/PODS '20
Acceptance Rates

GeoRich '20 Paper Acceptance Rate 4 of 9 submissions, 44%;
Overall Acceptance Rate 25 of 50 submissions, 50%

Bibliometrics & Citations

Article Metrics

  • Total Citations: 0
  • Total Downloads: 66
  • Downloads (Last 12 months): 6
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 19 Sep 2024
