DOI: 10.1145/3460426.3463644
research-article

Leveraging EfficientNet and Contrastive Learning for Accurate Global-scale Location Estimation

Published: 01 September 2021

Abstract

In this paper, we address the problem of global-scale image geolocation and propose a mixed classification-retrieval scheme. Unlike other methods, which treat the problem strictly as either a classification or a retrieval task, we combine the two in a unified solution that leverages the advantages of each approach through two separate modules. The first uses the EfficientNet architecture to robustly assign images to a specific geographic cell. The second introduces a new residual architecture, trained with contrastive learning, that maps input images to an embedding space that minimizes the pairwise geodesic distance of same-location images. For the final location estimate, the two modules are combined in a search-within-cell scheme: the locations of the most similar images from the predicted geographic cell are aggregated using a spatial clustering scheme. Our approach demonstrates very competitive performance on four public datasets and achieves new state-of-the-art results at fine-granularity scales, e.g., 15.0% at the 1 km range on Im2GPS3k.
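The search-within-cell step described above can be illustrated with a small sketch. This is a hypothetical, minimal illustration in plain Python, not the authors' implementation: the `GalleryItem` record, the string cell ids, cosine similarity over embeddings, the top-`k` cutoff, and the density heuristic (pick the retrieved location with the most neighbours within `eps_km`, then average that group) are all assumptions standing in for the paper's actual classifier, embedding network, and spatial clustering scheme. Geodesic distance is approximated with the haversine formula on a spherical Earth, which the sketch also uses to score accuracy at a distance threshold such as 1 km.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


@dataclass
class GalleryItem:  # hypothetical record for a geotagged reference image
    embedding: List[float]  # output of the (assumed) retrieval network
    lat: float
    lon: float
    cell: str  # geographic cell id (assumed to be a string label)


def search_within_cell(query_emb, predicted_cell, gallery, k=5, eps_km=1.0):
    """Hypothetical search-within-cell: retrieve inside the predicted cell, then cluster."""
    # 1) Keep only gallery images that fall in the cell predicted by the classifier.
    candidates = [g for g in gallery if g.cell == predicted_cell]
    if not candidates:
        return None
    # 2) Take the k most similar candidates in embedding space.
    top = sorted(candidates, key=lambda g: cosine(query_emb, g.embedding), reverse=True)[:k]

    # 3) Density heuristic (assumption): the retrieved location with the most
    #    neighbours within eps_km defines the cluster; ties go to the more similar image.
    def neighbours(p):
        return [q for q in top if haversine_km(p.lat, p.lon, q.lat, q.lon) <= eps_km]

    cluster = max((neighbours(p) for p in top), key=len)
    # 4) Average the cluster's coordinates (adequate for small clusters away from the antimeridian).
    lat = sum(q.lat for q in cluster) / len(cluster)
    lon = sum(q.lon for q in cluster) / len(cluster)
    return lat, lon


def accuracy_at(preds, truths, threshold_km):
    """Fraction of predicted (lat, lon) pairs within threshold_km of the ground truth."""
    hits = sum(haversine_km(*p, *t) <= threshold_km for p, t in zip(preds, truths))
    return hits / len(truths)


# Toy usage with made-up data: three nearby points, one distant outlier, one other-cell image.
gallery = [
    GalleryItem([1.0, 0.0], 0.000, 0.0, "A"),
    GalleryItem([0.9, 0.1], 0.001, 0.0, "A"),  # about 0.11 km from the first point
    GalleryItem([0.8, 0.2], 0.002, 0.0, "A"),  # about 0.22 km from the first point
    GalleryItem([0.95, 0.05], 1.0, 1.0, "A"),  # about 157 km away: an outlier
    GalleryItem([1.0, 0.0], 40.0, -3.0, "B"),  # identical embedding, but wrong cell
]
estimate = search_within_cell([1.0, 0.0], "A", gallery, k=4, eps_km=1.0)
print(estimate)
print(accuracy_at([estimate], [(0.0, 0.0)], threshold_km=1.0))
```

In this toy gallery, the image in cell `B` is ignored even though its embedding matches the query exactly, and the distant outlier in cell `A` is discarded by the clustering step, so the estimate lands at the centre of the three nearby points. Restricting retrieval to the predicted cell is what lets the classifier constrain the coarse location while retrieval and clustering refine it.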

Information

Published In

ICMR '21: Proceedings of the 2021 International Conference on Multimedia Retrieval
August 2021
715 pages
ISBN:9781450384636
DOI:10.1145/3460426
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. contrastive learning
  2. geolocation
  3. global-scale location estimation
  4. location estimation
  5. spatial clustering

Qualifiers

  • Research-article

Funding Sources

  • European Commission

Conference

ICMR '21

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%

Bibliometrics

  • Downloads (last 12 months): 43
  • Downloads (last 6 weeks): 3

Reflects downloads up to 14 Nov 2024.

Cited By

  • (2024) OpenStreetView-5M: The Many Roads to Global Visual Geolocation. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 21967-21977. DOI: 10.1109/CVPR52733.2024.02074. Online publication date: 16-Jun-2024.
  • (2024) PIGEON: Predicting Image Geolocations. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 12893-12902. DOI: 10.1109/CVPR52733.2024.01225. Online publication date: 16-Jun-2024.
  • (2024) DINO-Mix enhancing visual place recognition with foundational vision model and feature mixing. Scientific Reports 14:1. DOI: 10.1038/s41598-024-73853-3. Online publication date: 27-Sep-2024.
  • (2024) Look at the whole scene: General point cloud place recognition by classification proxy. ISPRS Journal of Photogrammetry and Remote Sensing 215, 15-30. DOI: 10.1016/j.isprsjprs.2024.06.017. Online publication date: Sep-2024.
  • (2023) Divide&Classify: Fine-Grained Classification for City-Wide Visual Place Recognition. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 11108-11118. DOI: 10.1109/ICCV51070.2023.01023. Online publication date: 1-Oct-2023.
  • (2022) Interpretable Semantic Photo Geolocation. 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 1474-1484. DOI: 10.1109/WACV51458.2022.00154. Online publication date: Jan-2022.
  • (2022) Deep Visual Geo-localization Benchmark. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 5386-5397. DOI: 10.1109/CVPR52688.2022.00532. Online publication date: Jun-2022.
  • (2022) Rethinking Visual Geo-localization for Large-Scale Applications. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 4868-4878. DOI: 10.1109/CVPR52688.2022.00483. Online publication date: Jun-2022.
