DOI: 10.1145/3474717.3483973
research-article

GEM: An Efficient Entity Matching Framework for Geospatial Data

Published: 04 November 2021

Abstract

Spatial entity matching (EM) is the task of identifying different mentions of the same real-world location. GEM is an end-to-end geospatial EM framework that matches entities with polygon geometries in addition to point geometries. Its core steps are blocking, feature vector creation, and classification. GEM includes an efficient and lightweight blocking technique, GeoPrune, that uses the geohash encoding mechanism. We repurpose the spatial proximity operators from Apache Sedona to create semantically rich spatial feature vectors. The classification step in GEM is a pluggable component that consumes a unique feature vector and determines whether the geolocations match. We conduct experiments with three classifiers on multiple large-scale geospatial datasets containing both spatial and relational attributes. GEM achieves an F-measure of 1.0 on a point × point dataset with 176k total pairs, which is 42% higher than a state-of-the-art spatial EM baseline. It achieves F-measures of 0.966 and 0.993 on the point × polygon dataset with 302M total pairs and the polygon × polygon dataset with 16M total pairs, respectively.
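To make the idea of geohash-based blocking concrete, the following is a minimal sketch in Python. It is not GeoPrune itself, whose details the abstract does not give. It only illustrates the general technique of encoding points as geohash strings and comparing only those records that fall into the same geohash cell. The function names `geohash_encode` and `candidate_pairs`, the `(id, lat, lon)` record layout, and the default precision are all assumptions made for illustration.

```python
from collections import defaultdict

# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash string.

    Bits alternate between longitude and latitude, starting with
    longitude; each bit halves the current interval. Every 5 bits
    become one base-32 character.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def candidate_pairs(left, right, precision=6):
    """Return (left_id, right_id) pairs whose points share a geohash cell.

    `left` and `right` are iterables of hypothetical (id, lat, lon)
    records. Pairs in different cells are pruned without comparison.
    """
    buckets = defaultdict(list)
    for rid, lat, lon in right:
        buckets[geohash_encode(lat, lon, precision)].append(rid)
    pairs = []
    for lid, lat, lon in left:
        for rid in buckets.get(geohash_encode(lat, lon, precision), []):
            pairs.append((lid, rid))
    return pairs
```

A shorter geohash prefix gives larger cells and therefore more candidate pairs; a longer prefix prunes more aggressively. Two nearby points can still land in adjacent cells when they straddle a cell boundary, so a practical blocker would typically also check neighboring cells; this sketch omits that step.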

Cited By

  • (2022) Hall of Mirrors: A Novel Strategy to Address Locality in Geocoded-Based PoI Private Queries. IEEE Access 10, 61769-61783. DOI: 10.1109/ACCESS.2022.3180046. Online publication date: 2022
Published In

SIGSPATIAL '21: Proceedings of the 29th International Conference on Advances in Geographic Information Systems
November 2021
700 pages
ISBN: 9781450386647
DOI: 10.1145/3474717
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Apache Sedona
  2. geohash
  3. spatial blocking
  4. spatial entity matching

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • National Science Foundation

Conference

SIGSPATIAL '21
Acceptance Rates

Overall Acceptance Rate 257 of 1,238 submissions, 21%
