DOI: 10.1145/3557915.3561016
research-article
Open access

Learning geospatially aware place embeddings via weak-supervision

Published: 22 November 2022

Abstract

Understanding and representing real-world places (physical locations where drivers can deliver packages) is key to delivering packages to customers' doorsteps successfully and efficiently. A prerequisite is capturing similarity and relatedness between places. Intuitively, places that belong to the same building should have similar characteristics in both geospatial and textual space. However, this assumption fails in practice because existing methods use customer address text as a proxy for places. When providing address text, customers tend to omit key tokens, use vernacular content or place synonyms, and do not follow a standard structure, which makes addresses inherently ambiguous. Thus, modelling the problem from a linguistic perspective alone is not sufficient. To overcome these shortcomings, we adapt various state-of-the-art embedding learning techniques to the geospatial domain and propose Places-FastText, Places-Bert, and Places-GraphSage. We train these models using weak supervision, leveraging different geospatial signals already available from historical delivery data. Our experiments and intrinsic evaluation demonstrate the significance of utilizing these signals and neighborhood information in learning geospatially aware place embeddings. These conclusions are further validated by significant improvements in two domain-specific tasks, Pair-wise Matching (recall at 95% precision improves by 29%) and Candidate Generation (average recall@k improves by 10%), as evaluated on UAE addresses.
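
The two task metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the conventional definitions: recall at a fixed precision is the highest recall reachable at any score threshold whose precision meets the target, and recall@k is the fraction of relevant items found in the top k candidates. The function names and input layout are hypothetical, and this is not the authors' evaluation code.

```python
from typing import List, Sequence, Set, Tuple


def recall_at_precision(scores: Sequence[float], labels: Sequence[int],
                        min_precision: float = 0.95) -> float:
    """Highest recall over score thresholds whose precision >= min_precision.

    Assumed inputs: one similarity score and one 0/1 match label per pair.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    # Sort pairs by descending score; each prefix corresponds to one threshold.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    best, tp = 0.0, 0
    for i, (score, label) in enumerate(ranked, start=1):
        tp += label
        # Pairs with tied scores fall on the same side of any threshold,
        # so only evaluate once the whole tie group has been consumed.
        if i < len(ranked) and ranked[i][0] == score:
            continue
        if tp / i >= min_precision:
            best = max(best, tp / total_pos)
    return best


def recall_at_k(ranked_candidates: Sequence[str], relevant: Set[str],
                k: int) -> float:
    """Fraction of relevant items that appear among the top-k candidates."""
    if not relevant:
        return 0.0
    hits = len(set(ranked_candidates[:k]) & relevant)
    return hits / len(relevant)


def average_recall_at_k(queries: List[Tuple[Sequence[str], Set[str]]],
                        k: int) -> float:
    """Mean recall@k over (ranked_candidates, relevant_set) queries."""
    return sum(recall_at_k(c, r, k) for c, r in queries) / len(queries)
```

For pair-wise matching, `scores` would be embedding similarities for address pairs and `labels` the ground-truth match flags; for candidate generation, each query would contribute its ranked nearest neighbours and the set of true matches.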



Published In

SIGSPATIAL '22: Proceedings of the 30th International Conference on Advances in Geographic Information Systems
November 2022
806 pages
ISBN: 9781450395298
DOI: 10.1145/3557915
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. address embedding
  2. address matching
  3. geoproximity
  4. geospatial BERT
  5. graph neural networks
  6. link prediction
  7. place embedding
  8. weak supervision

Qualifiers

  • Research-article

Conference

SIGSPATIAL '22
Acceptance Rates

Overall Acceptance Rate 257 of 1,238 submissions, 21%
