research-article

Leveraging EfficientNet and Contrastive Learning for Accurate Global-scale Location Estimation

Authors: Giorgos Kordopatis-Zilos, Panagiotis Galopoulos, Symeon Papadopoulos, Ioannis Kompatsiaris

ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval

Pages 155-163

https://doi.org/10.1145/3460426.3463644

Published: 01 September 2021

Abstract

In this paper, we address the problem of global-scale image geolocation by proposing a mixed classification-retrieval scheme. Unlike methods that tackle the problem strictly as either a classification or a retrieval task, we combine the two in a unified solution that exploits the advantages of each through two dedicated modules. The first uses the EfficientNet architecture to robustly assign images to a specific geographic cell. The second introduces a new residual architecture, trained with contrastive learning, that maps input images to an embedding space in which images from the same location, as determined by their pairwise geodesic distance, lie close together. For the final location estimation, the two modules are combined in a search-within-cell scheme: the locations of the most similar images within the predicted geographic cell are aggregated using a spatial clustering scheme. Our approach achieves very competitive performance on four public datasets and sets a new state of the art at fine-grained scales, e.g., 15.0% accuracy within a 1 km range on Im2GPS3k.
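The search-within-cell step described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes each reference image is stored as a `(cell_id, (lat, lon), embedding)` tuple, ranks the images of the predicted cell by dot-product similarity, clusters the top-k locations with a small DBSCAN over haversine distances (the abstract only says "a spatial clustering scheme"; DBSCAN is an assumed choice here), and returns the medoid of the largest cluster. The names `estimate_location` and `accuracy_at_km` and the parameters `k`, `eps_km` and `min_samples` are illustrative, and their default values are not taken from the paper. `accuracy_at_km` computes the kind of fine-scale figure quoted above: the share of predictions that fall within a given great-circle radius of the ground truth.

```python
import math
from typing import List, Sequence, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0  # mean Earth radius
UNVISITED, NOISE = -2, -1


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def dbscan(points: List[LatLon], eps_km: float, min_samples: int) -> List[int]:
    """Minimal DBSCAN on geographic points; returns one label per point (-1 = noise)."""
    n = len(points)
    neighbours = [[j for j in range(n) if haversine_km(points[i], points[j]) <= eps_km]
                  for i in range(n)]
    labels = [UNVISITED] * n
    cluster = -1
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        if len(neighbours[i]) < min_samples:  # not a core point
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:  # core point: keep expanding
                queue.extend(neighbours[j])
    return labels


# Assumption: database entries are (cell_id, (lat, lon), embedding); all
# names and default values below are hypothetical, not taken from the paper.
def estimate_location(query: Sequence[float], predicted_cell: str,
                      database: List[Tuple[str, LatLon, Sequence[float]]],
                      k: int = 10, eps_km: float = 25.0, min_samples: int = 2) -> LatLon:
    # 1) Restrict the search to reference images inside the predicted cell.
    candidates = [(coords, emb) for cell, coords, emb in database if cell == predicted_cell]
    if not candidates:
        raise ValueError("no reference images in the predicted cell")
    # 2) Rank by dot product (cosine similarity if embeddings are L2-normalised).
    candidates.sort(key=lambda c: -sum(q * e for q, e in zip(query, c[1])))
    top = [coords for coords, _ in candidates[:k]]
    # 3) Cluster the neighbours' locations and keep the most populated cluster.
    labels = dbscan(top, eps_km, min_samples)
    clustered = [label for label in labels if label != NOISE]
    if clustered:
        best = max(set(clustered), key=clustered.count)
        members = [p for p, label in zip(top, labels) if label == best]
    else:
        members = top[:1]  # no dense cluster: fall back to the nearest neighbour
    # 4) Return the medoid: the member with the smallest total distance to the others.
    return min(members, key=lambda p: sum(haversine_km(p, o) for o in members))


def accuracy_at_km(preds: List[LatLon], truths: List[LatLon], radius_km: float) -> float:
    """Fraction of predictions within radius_km (great-circle) of the ground truth."""
    hits = sum(haversine_km(p, t) <= radius_km for p, t in zip(preds, truths))
    return hits / len(truths)
```

Returning the medoid rather than the coordinate-wise mean keeps the estimate on an actual reference location and avoids averaging longitudes across the antimeridian; an isolated far-away neighbour inside the same cell ends up as noise instead of pulling the estimate toward it.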
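The contrastive objective of the second module can be illustrated with a supervised-contrastive (SupCon-style) loss in which positives are defined geographically. This is an assumed stand-in, not the loss reported in the paper: `geo_supcon_loss`, the 1 km `pos_radius_km` threshold and the temperature of 0.1 are hypothetical choices, used only to show how pairwise geodesic distance can decide which embeddings are pulled together and which are pushed apart.

```python
import math
from typing import List, Sequence, Tuple

LatLon = Tuple[float, float]


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def l2_normalise(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


# Hypothetical SupCon-style loss: positives are images within pos_radius_km of the
# anchor. The radius and temperature are illustrative assumptions.
def geo_supcon_loss(embeddings: List[Sequence[float]], coords: List[LatLon],
                    pos_radius_km: float = 1.0, temperature: float = 0.1) -> float:
    z = [l2_normalise(e) for e in embeddings]
    n = len(z)
    sim = [[sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
           for i in range(n)]
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and haversine_km(coords[i], coords[j]) <= pos_radius_km]
        if not positives:
            continue  # an anchor without positives contributes no signal
        others = [sim[i][j] for j in range(n) if j != i]
        m = max(others)
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))  # stable log-sum-exp
        # Average negative log-probability of each positive against all other samples.
        total += -sum(sim[i][p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

Anchors with no positive in the batch are skipped, so under this formulation each batch needs several images per location for the loss to carry any signal.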

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
2. Information systems
  1. Information systems applications
    1. Spatial-temporal systems
      1. Geographic information systems

Information & Contributors

Information

Published In

ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval

August 2021

715 pages

ISBN: 9781450384636

DOI: 10.1145/3460426

General Chairs:
Wen-Huang Cheng, National Yang Ming Chiao Tung University, Taiwan
Mohan Kankanhalli, National University of Singapore, Singapore
Meng Wang, Hefei University of Technology, China

Program Chairs:
Wei-Ta Chu, National Cheng Kung University, Taiwan
Jiaying Liu, Peking University, China
Marcel Worring, University of Amsterdam, Netherlands

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

European Commission

Conference

ICMR '21

ICMR '21: International Conference on Multimedia Retrieval

August 21-24, 2021

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%

Bibliometrics & Citations

Article Metrics

Total Citations: 7
Total Downloads: 192
Downloads (last 12 months): 43
Downloads (last 6 weeks): 3

Reflects downloads up to 14 Nov 2024

Cited By

Astruc G, Dufour N, Siglidis I, Aronssohn C, Bouia N, Fu S, Loiseau R, Nguyen V, Raude C, Vincent E, Xu L, Zhou H, Landrieu L (2024). OpenStreetView-5M: The Many Roads to Global Visual Geolocation. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 21967-21977. Online publication date: 16-Jun-2024.
https://doi.org/10.1109/CVPR52733.2024.02074
Haas L, Skreta M, Alberti S, Finn C (2024). PIGEON: Predicting Image Geolocations. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 12893-12902. Online publication date: 16-Jun-2024.
https://doi.org/10.1109/CVPR52733.2024.01225
Huang G, Zhou Y, Hu X, Zhang C, Zhao L, Gan W (2024). DINO-Mix enhancing visual place recognition with foundational vision model and feature mixing. Scientific Reports 14:1. Online publication date: 27-Sep-2024.
https://doi.org/10.1038/s41598-024-73853-3
Xie Y, Wang B, Wang H, Liang F, Zhang W, Dong Z, Yang B (2024). Look at the whole scene: General point cloud place recognition by classification proxy. ISPRS Journal of Photogrammetry and Remote Sensing 215, 15-30. Online publication date: Sep-2024.
https://doi.org/10.1016/j.isprsjprs.2024.06.017
Trivigno G, Berton G, Aragon J, Caputo B, Masone C (2023). Divide&Classify: Fine-Grained Classification for City-Wide Visual Place Recognition. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 11108-11118. Online publication date: 1-Oct-2023.
https://doi.org/10.1109/ICCV51070.2023.01023
Theiner J, Muller-Budack E, Ewerth R (2022). Interpretable Semantic Photo Geolocation. 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 1474-1484. Online publication date: Jan-2022.
https://doi.org/10.1109/WACV51458.2022.00154
Berton G, Mereu R, Trivigno G, Masone C, Csurka G, Sattler T, Caputo B (2022). Deep Visual Geo-localization Benchmark. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 5386-5397. Online publication date: Jun-2022.
https://doi.org/10.1109/CVPR52688.2022.00532
Berton G, Masone C, Caputo B (2022). Rethinking Visual Geo-localization for Large-Scale Applications. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 4868-4878. Online publication date: Jun-2022.
https://doi.org/10.1109/CVPR52688.2022.00483
