ConvNet and LSH-Based Visual Localization Using Localized Sequence Matching
Figure 1. A detailed block diagram of the proposed visual localization method: feature extraction uses a pretrained network, feature comparison uses the cosine distance, and localized sequence searching is conducted over the potential paths.

Figure 2. An example scene and the features extracted from different layers of the caffe-alex network. Features obtained from different ConvNet layers can serve as holistic image descriptors for place recognition.

Figure 3. The search algorithm finds the lowest-cost straight line within the searching matrix M^T. These lines form the set of potential paths through the matrix. The red line is the lowest-cost path, which aligns the testing sequence with the training sequence. Each element represents the cosine distance between two images.

Figure 4. The experimental vehicle equipped with sensors (camera and RTK-GPS).

Figure 5. The trajectory of the UTBM-1 data set and representative images: (a) the trajectory crosses forest, city, and parking areas; (b) three representative examples of appearance and shadow variations. The images in each row were taken at the same place at different times (one week apart).

Figure 6. The trajectory of the UTBM-2 data set and representative images: (a) the trajectory crosses forest, city, and parking areas; (b) two representative examples of illumination variations. The images in each row were taken at the same place at different times (morning vs. afternoon).

Figure 7. Four representative examples from the Nordland data set (each image corresponds to a different season).

Figure 8. City Center data set [38]: two traversals. The left column shows the training images, and the right column shows the testing images.

Figure 9. Two examples of the performance of our proposal as a function of the image sequence length (d_s) on the challenging UTBM-1 and Nordland (fall vs. winter) data sets. The feature used here is the conv4 layer.

Figure 10. Frame-match examples from the Nordland (fall vs. winter) data set. The top row shows a query sequence; the second and third rows show the frames recalled with d_s = 1 (single image) and d_s = 6, respectively. Visual recognition based on sequence matching achieves better performance than single-image matching.

Figure 11. Precision-recall curves for the UTBM-1 data set (the trajectory crosses forest, city, and parking areas) (d_s = 8).

Figure 12. Precision-recall curves for the City Center data set (the trajectory crosses city and parking areas) (d_s = 3).

Figure 13. Place recognition across seasons on the Nordland data set. conv4 and conv5 perform better than the other layers, while fc6 and fc7 perform worst (d_s = 6).

Figure 14. Precision-recall curves for the UTBM-2 data set considering different ConvNet layers (d_s = 8).
Figure 15. Precision-recall curves for different hash bit lengths. The cosine distance over the full 64,896-dimensional feature vector (red) can be closely approximated by the Hamming distance over bit vectors of length 4096 (dark) without losing much performance. This corresponds to compressing the descriptor to 6.31% (4096/64,896) of its original dimensionality.
Figure 16. Visual localization results on the four data sets, using 4096 hash bits of the conv4 layer. In the left column, two images matched to the same location (on the basis of appearance alone) are marked with red points and joined by a blue line; the right column shows the corresponding normalized feature cosine distances.
Abstract
1. Introduction
2. Related Work
2.1. Different Representations for Place Recognition
2.2. Convolutional Networks
3. Proposed Approach
- ConvNet feature extraction (detailed in Section 3.1): ConvNet features are extracted from all the training database images by off-line processing, and from the current testing image by online processing, using the pretrained caffe-alex network. These learned features are robust to both appearance and illumination changes and provide a rich representation of each location (place). The extracted ConvNet features are compared in the next step.
- Feature comparison (detailed in Section 3.2): The cosine distances between the feature of the current testing image and the features of all the training database images are computed. These distances form a distance vector, on which localized sequence matching is conducted in the next step.
- Localized sequence matching (detailed in Section 3.3): To achieve efficient place recognition, localized sequence matching is used instead of single-image matching. Considering the testing sequence composed of the last d_s testing images (indexed from T − d_s + 1 to T), localized sequence matching is conducted within the local searching matrix M^T. According to the speed ratio between the testing and training sequences, a set of possible training sequence candidates is first determined in the training database. A score S is calculated for each candidate by summing the testing-image-to-training-image cosine distances along the corresponding path. The candidate with the minimum score is considered the most similar to the testing sequence. The two best sequence matching scores are kept for the final matching validation.
- Final matching validation (detailed in Section 3.4): The ratio between the two best sequence matching scores is used to verify the best sequence candidate. If the ratio is below or equal to a threshold, the first candidate (the one with the lower matching score) is confirmed and regarded as a positive matching; otherwise, it is considered a negative one and no matching is kept. A minimal code sketch of the comparison and validation steps follows this list.
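The feature-comparison and validation steps above reduce to a few lines of linear algebra. The following is a minimal sketch in Python/NumPy; the function names and the example threshold value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance between two flattened ConvNet feature vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def distance_vector(test_feat, train_feats):
    """Distances between the current testing image and every training image
    (the column vector used for localized sequence matching)."""
    return np.array([cosine_distance(test_feat, f) for f in train_feats])

def validate_match(best_score, second_score, ratio_threshold=0.9):
    """Final matching validation: accept the best sequence candidate only if
    the ratio of the two lowest sequence scores is at or below the threshold.
    The threshold value here is an assumed placeholder."""
    return best_score / second_score <= ratio_threshold
```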
3.1. ConvNet Feature Extraction
3.2. Feature Comparison
3.3. Localized Sequence Matching
3.4. Final Matching Validation
3.5. Visual Localization
3.6. Algorithm of the Proposed ConvNet-Based Visual Localization
Algorithm 1 ConvNet feature extraction and comparison.

```
Inputs:  training image database; testing image database;
         N_train, N_test: numbers of training and testing images
Outputs: D: cosine distances
Algorithm:
for i ← 1 to N_test do
    f_i ← feature extraction for testing image i
    for j ← 1 to N_train do
        g_j ← feature extraction for training image j    // computed off-line
        D(j, i) ← cos⟨g_j, f_i⟩                          // cosine distance (Section 3.2)
    end for
    D_i ← D(:, i)    // column vector containing the cosine distances between
                     // testing image i and all the training images (Section 3.2)
end for
```
Algorithm 2 Localized sequence matching and visual localization.

```
Inputs:  D: cosine distances; N_train, N_test: numbers of training and testing images;
         V_max, V_min: maximum and minimum speed ratios; V_step: vehicle speed step-size;
         d_s: sequence length
Outputs: S: path-line (sequence candidate) scores
for T ← d_s to N_test do
    M^T ← D(:, T−d_s+1 : T)                 // local searching matrix
    j ← 1                                    // path number (sequence candidate number)
    for s ← 1 to N_train do                  // training image where the path originates
        for V ← V_min : V_step : V_max do
            S(j) ← 0
            for t ← T−d_s+1 to T do
                k ← s + V·(t − (T−d_s+1))    // k: row index in the column vector D_t
                S(j) ← S(j) + M^T(k, t)      // accumulate the score of path j
            end for
            j ← j + 1                        // sequence candidate number update
        end for
    end for
    ĵ ← arg min_j S(j)                       // index of the minimum score
    if matching validation is positive then
        vehicle position ← position of the matched training image
    else
        vehicle position ← NaN               // no position result
    end if
end for
```
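To make the search loop concrete, here is a minimal NumPy sketch of Algorithm 2's inner search. It assumes D[i, t] stores the cosine distance between training image i and testing image t, that each candidate path is a straight line of slope V (the testing-to-training speed ratio) starting at training image s, and that T ≥ d_s − 1 (0-indexed); the parameter defaults are illustrative.

```python
import numpy as np

def localized_sequence_match(D, T, d_s=8, v_min=0.8, v_max=1.2, v_step=0.1):
    """Return the two lowest path scores and the best path's starting training index."""
    n_train = D.shape[0]
    t0 = T - d_s + 1                                   # first frame of the testing sequence
    cols = np.arange(t0, T + 1)                        # columns of the local searching matrix
    scores, starts = [], []
    for s in range(n_train):                           # training image where the path originates
        for v in np.arange(v_min, v_max + 1e-9, v_step):
            rows = s + np.round(v * np.arange(d_s)).astype(int)
            if rows[-1] >= n_train:                    # path leaves the training database
                continue
            scores.append(D[rows, cols].sum())         # sum of distances along the path
            starts.append(s)
    order = np.argsort(scores)
    best, second = order[0], order[1]
    return scores[best], scores[second], starts[best]
```

The two returned scores feed directly into the ratio test of Section 3.4 (see validate_match in the earlier sketch).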
4. Experimental Setup
4.1. Experimental Platform
4.2. Data Sets and Ground Truth
4.3. Performance Evaluation
5. Experimental Results
5.1. Performance Comparison between Single-Image- and Sequence-Based Approaches
5.2. Comparison of ConvNet Features Layer-By-Layer
5.2.1. Appearance Change Robustness
5.2.2. Illumination Change Robustness
5.3. Locality-Sensitive Hashing for Real-Time Place Recognition
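The results in this section rely on binary codes whose Hamming distance approximates the cosine distance of the full conv4 features (Figure 15). A common way to obtain such codes is random-hyperplane LSH; the sketch below assumes that scheme, and the seed and bit length are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_hasher(dim, n_bits=4096):
    """Draw n_bits random hyperplanes; each feature is hashed to the sign
    pattern of its projections onto them."""
    planes = rng.standard_normal((n_bits, dim))
    return lambda feat: planes @ feat > 0              # one bit per hyperplane

def hamming_distance(bits_a, bits_b):
    """Proxy for the cosine distance between the original features."""
    return int(np.count_nonzero(bits_a != bits_b))

# Usage: compress a 64,896-D conv4 descriptor to 4096 bits, then compare
# candidates in Hamming space instead of with the full cosine distance.
hash_fn = make_hasher(dim=64896, n_bits=4096)
```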
5.4. Visual Localization Results
6. Conclusions and Future Work
Author Contributions
Funding
Conflicts of Interest
Abbreviations
ConvNet | Convolutional Network |
LSH | Locality-Sensitive Hashing |
FAB-MAP | Fast Appearance Based Mapping |
SeqSLAM | Sequence Simultaneous Localisation and Mapping |
References
- Rivera-Rubio, J.; Alexiou, I.; Bharath, A.A. Appearance-based indoor localization: A comparison of patch descriptor performance. Pattern Recognit. Lett. 2015, 66, 109–117. [Google Scholar] [CrossRef] [Green Version]
- Lin, S.; Cheng, R.; Wang, K.; Yang, K. Visual localizer: Outdoor localization based on convnet descriptor and global optimization for visually impaired pedestrians. Sensors 2018, 18, 2476. [Google Scholar] [CrossRef] [PubMed]
- Qiao, Y.; Cappelle, C.; Ruichek, Y. Visual localization across seasons using sequence matching based on multi-feature combination. Sensors 2017, 17, 2442. [Google Scholar] [CrossRef] [PubMed]
- Herranz, L.; Jiang, S.; Li, X. Scene recognition with CNNs: Objects, scales and data set bias. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 571–579. [Google Scholar]
- Yuan, B.; Tu, J.; Zhao, R.W.; Zheng, Y.; Jiang, Y.G. Learning part-based mid-level representation for visual recognition. Neurocomputing 2018, 275, 2126–2136. [Google Scholar] [CrossRef]
- Li, Q.; Li, K.; You, X.; Bu, S.; Liu, Z. Place recognition based on deep feature and adaptive weighting of similarity matrix. Neurocomputing 2016, 199, 114–127. [Google Scholar] [CrossRef]
- Garcia-Fidalgo, E.; Ortiz, A. Vision-based topological mapping and localization methods: A survey. Robot. Auton. Syst. 2015, 64, 1–20. [Google Scholar] [CrossRef]
- Ouerghi, S.; Boutteau, R.; Savatier, X.; Tlili, F. Visual odometry and place recognition fusion for vehicle position tracking in urban environments. Sensors 2018, 18, 939. [Google Scholar] [CrossRef] [PubMed]
- Chen, Y.; Shen, Y.; Liu, X.; Zhong, B. 3D object tracking via image sets and depth-based occlusion detection. Signal Process. 2015, 112, 146–153. [Google Scholar] [CrossRef]
- Sharif Razavian, A.; Azizpour, H.; Sullivan, J.; Carlsson, S. CNN features off-the-shelf: An astounding baseline for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, 24–27 June 2014; pp. 806–813. [Google Scholar]
- Oquab, M.; Bottou, L.; Laptev, I.; Sivic, J. Is object localization for free? Weakly-supervised learning with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 685–694. [Google Scholar]
- Zhu, J.; Ai, Y.; Tian, B.; Cao, D.; Scherer, S. Visual Place Recognition in Long-term and Large-scale Environment based on CNN Feature. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Suzhou, China, 26–30 June 2018; pp. 1679–1685. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems; Pereira, F., Burges, C., Bottou, L., Weinberger, K., Eds.; The MIT Press: Cambridge, MA, USA, 2012; pp. 1097–1105. [Google Scholar]
- Li, L.; Goh, W.; Lim, J.H.; Pan, S.J. Extended Spectral Regression for efficient scene recognition. Pattern Recognit. 2014, 47, 2940–2951. [Google Scholar] [CrossRef]
- Valiente, D.; Gil, A.; Payá, L.; Sebastián, J.; Reinoso, Ó. Robust visual localization with dynamic uncertainty management in omnidirectional SLAM. Appl. Sci. 2017, 7, 1294. [Google Scholar] [CrossRef]
- Valiente, D.; Gil, A.; Fernández, L.; Reinoso, Ó. A comparison of EKF and SGD applied to a view-based SLAM approach with omnidirectional images. Robot. Auton. Syst. 2014, 62, 108–119. [Google Scholar] [CrossRef]
- Song, X.; Jiang, S.; Herranz, L.; Kong, Y.; Zheng, K. Category co-occurrence modeling for large scale scene recognition. Pattern Recognit. 2016, 59, 98–111. [Google Scholar] [CrossRef]
- Duan, Q.; Akram, T.; Duan, P.; Wang, X. Visual saliency detection using information contents weighting. Optik 2016, 127, 7418–7430. [Google Scholar] [CrossRef]
- Cummins, M.; Newman, P. Appearance-only SLAM at large scale with FAB-MAP 2.0. Int. J. Robot. Res. 2011, 30, 1100–1123. [Google Scholar] [CrossRef]
- Milford, M.; Wyeth, G. SeqSLAM: Visual route-based navigation for sunny summer days and stormy winter nights. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), St. Paul, MN, USA, 14–18 May 2012; pp. 1643–1649. [Google Scholar]
- Neubert, P.; Sunderhauf, N.; Protzel, P. Appearance change prediction for long-term navigation across seasons. In Proceedings of the European Conference on Mobile Robots (ECMR), Barcelona, Spain, 25–27 September 2013; pp. 198–203. [Google Scholar]
- Badino, H.; Huber, D.; Kanade, T. Real-time topometric localization. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation (ICRA), St. Paul, MN, USA, 14–18 May 2012; pp. 1635–1642. [Google Scholar]
- Qiao, Y.; Cappelle, C.; Ruichek, Y. Place Recognition Based Visual Localization Using LBP Feature and SVM. In Proceedings of the Mexican International Conference on Artificial Intelligence, Morelos, Mexico, 25–31 October 2015; Springer: New York, NY, USA, 2015; pp. 393–404. [Google Scholar]
- Arroyo, R.; Alcantarilla, P.; Bergasa, L.; Yebes, J.; Bronte, S. Fast and effective visual place recognition using binary codes and disparity information. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014), Chicago, IL, USA, 14–18 September 2014; pp. 3089–3094. [Google Scholar]
- Calonder, M.; Lepetit, V.; Strecha, C.; Fua, P. BRIEF: Binary robust independent elementary features. In Proceedings of the European Conference on Computer Vision, Crete, Greece, 5–11 September 2010; Springer: New York, NY, USA, 2010; pp. 778–792. [Google Scholar]
- Liu, Y.; Zhang, H. Visual loop closure detection with a compact image descriptor. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura, Portugal, 7–12 October 2012; pp. 1051–1056. [Google Scholar]
- Kosecka, J.; Zhou, L.; Barber, P.; Duric, Z. Qualitative image based localization in indoors environments. In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Madison, WI, USA, 16–22 June 2003; Volume 2, pp. II-3–II-8. [Google Scholar]
- Chen, Z.; Lam, O.; Jacobson, A.; Milford, M. Convolutional Neural Network-based Place Recognition. arXiv 2014, arXiv:1411.1509v1. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 24–27 June 2014; pp. 580–587. [Google Scholar]
- LeCun, Y.; Boser, B.; Denker, J.; Henderson, D.; Howard, R.; Hubbard, W.; Jackel, L. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Comput. 1989, 1, 541–551. [Google Scholar] [CrossRef]
- Arandjelovic, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5297–5307. [Google Scholar]
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef] [Green Version]
- Sünderhauf, N.; Shirazi, S.; Dayoub, F.; Upcroft, B.; Milford, M. On the performance of ConvNet features for place recognition. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Hamburg, Germany, 28 September–3 October 2015; pp. 4297–4304. [Google Scholar]
- Sermanet, P.; Eigen, D.; Zhang, X.; Mathieu, M.; Fergus, R.; LeCun, Y. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks. In Proceedings of the 2014 CBLS International Conference on Learning Representations (ICLR 2014), Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
- Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.B.; Guadarrama, S.; Darrell, T. Caffe: Convolutional Architecture for Fast Feature Embedding. In Proceedings of the ACM International Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014; pp. 675–678. [Google Scholar]
- Vedaldi, A.; Lenc, K. MatConvNet: Convolutional Neural Networks for MATLAB. In Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, Brisbane, Australia, 26–30 October 2015; pp. 689–692. [Google Scholar]
- Sünderhauf, N.; Neubert, P.; Protzel, P. Are we there yet? Challenging SeqSLAM on a 3000 km journey across all four seasons. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Workshop on Long-Term Autonomy, Karlsruhe, Germany, 6–10 May 2013. [Google Scholar]
- Cummins, M.; Newman, P. FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance. Int. J. Robot. Res. 2008, 27, 647–665. [Google Scholar] [CrossRef]
- Glover, A.; Maddern, W.; Warren, M.; Reid, S.; Milford, M.; Wyeth, G. OpenFABMAP: An open source toolbox for appearance-based loop closure detection. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation (ICRA), St. Paul, MN, USA, 14–18 May 2012; pp. 4730–4735. [Google Scholar]
- Jacobson, A.; Chen, Z.; Milford, M. Autonomous Multisensor Calibration and Closed-loop Fusion for SLAM. J. Field Robot. 2015, 32, 85–122. [Google Scholar] [CrossRef]
- Datar, M.; Immorlica, N.; Indyk, P.; Mirrokni, V.S. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the Twentieth Annual Symposium on Computational Geometry, Brooklyn, NY, USA, 9–11 June 2004; ACM: New York, NY, USA, 2004; pp. 253–262. [Google Scholar]
| Layer | Dimensions | Layer | Dimensions |
|---|---|---|---|
| conv4 | 13 × 13 × 384 | fc6 | 1 × 1 × 4096 |
| conv5 | 13 × 13 × 256 | fc7 | 1 × 1 × 4096 |
| relu5 | 13 × 13 × 256 | | |
| pool5 | 6 × 6 × 256 | | |
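For reference, flattening the conv4 output in the table above (13 × 13 × 384) yields exactly the 64,896-dimensional descriptor mentioned in Figure 15. A minimal extraction sketch using Caffe's Python interface might look as follows; the file names are placeholders, and preprocessing details (resizing, mean subtraction) are omitted.

```python
import caffe
import numpy as np

# Load the pretrained caffe-alex network (file names are placeholders).
net = caffe.Net('deploy.prototxt', 'bvlc_alexnet.caffemodel', caffe.TEST)

def conv4_descriptor(image):
    """Forward one preprocessed image and flatten the conv4 activations."""
    net.blobs['data'].data[0] = image        # expects a (3, 227, 227) array
    net.forward()
    feat = net.blobs['conv4'].data[0]        # shape (384, 13, 13)
    return feat.reshape(-1)                  # 13 * 13 * 384 = 64,896 dimensions
```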
| Data Set | Length | No. Images | Description |
|---|---|---|---|
| UTBM-1 | 2 × 4.0 km | training: 848; testing: 819 | minor variations in appearance and illumination |
| UTBM-2 | 2 × 2.3 km | training: 540; testing: 520 | medium variations in appearance and illumination |
| Nordland | 4 × 728 km | 4 × 3568 | severe variations in appearance |
| City Center | 2 × 2.0 km | 2 × 1237 | medium variations in viewpoint |
| Data Set | conv4 | conv5 | relu5 | pool5 | fc6 | fc7 | SeqSLAM | FAB-MAP |
|---|---|---|---|---|---|---|---|---|
| Nordland (spring vs. summer) | 0.8967 | 0.8427 | 0.8734 | 0.8354 | 0.6722 | 0.5455 | 0.7222 | ‡ |
| Nordland (spring vs. fall) | 0.8984 | 0.8572 | 0.8821 | 0.8579 | 0.7098 | 0.5859 | 0.7015 | ‡ |
| Nordland (spring vs. winter) | 0.9255 | 0.8987 | 0.8983 | 0.8750 | 0.4795 | 0.2387 | 0.6685 | ‡ |
| Nordland (summer vs. fall) | 0.9396 | 0.9381 | 0.9388 | 0.9375 | 0.9286 | 0.9047 | 0.6960 | ‡ |
| Nordland (summer vs. winter) | 0.9245 | 0.8935 | 0.8581 | 0.8497 | 0.4142 | 0.1817 | 0.5117 | ‡ |
| Nordland (fall vs. winter) | 0.9288 | 0.8922 | 0.8598 | 0.8599 | 0.5119 | 0.2337 | 0.5293 | ‡ |
| UTBM-1 | 0.9607 | 0.9576 | 0.9576 | 0.9583 | 0.9607 | 0.7762 | 0.7222 | 0.2356 |
| UTBM-2 | 0.9622 | 0.9564 | 0.9544 | 0.9574 | 0.9593 | 0.9516 | 0.7180 | 0.4813 |
| City Center | 0.9288 | 0.9246 | 0.9264 | 0.9317 | 0.9299 | 0.9166 | † | 0.5326 |

conv4 through fc7 are caffe-alex layers.
| Method | UTBM-1 | UTBM-2 | City Center | Nordland (Spring vs. Winter) | Average Time per Matching (All Data Sets) |
|---|---|---|---|---|---|
| 256 bits | 0.9411 | 0.9574 | 0.9094 | 0.8817 | 0.0135 s |
| 512 bits | 0.9478 | 0.9554 | 0.9084 | 0.8944 | 0.0147 s |
| 1024 bits | 0.9460 | 0.9612 | 0.9162 | 0.9046 | 0.0170 s |
| 2048 bits | 0.9521 | 0.9632 | 0.9246 | 0.9064 | 0.0209 s |
| 4096 bits | 0.9521 | 0.9641 | 0.9166 | 0.9099 | 0.0291 s |
| Full feature (conv4) | 0.9607 | 0.9622 | 0.9228 | 0.9255 | 0.3259 s |
Recall (%) at 100% precision:

| Data Set | Full Feature (conv4) | 4096 Hash Bits | FAB-MAP | SeqSLAM |
|---|---|---|---|---|
| Nordland (spring vs. summer) | 57.09 | 69.02 | † | 45.71 |
| Nordland (spring vs. fall) | 64.66 | 67.26 | † | 33.91 |
| Nordland (spring vs. winter) | 76.77 | 72.88 | † | 35.53 |
| Nordland (summer vs. fall) | 86.88 | 87.67 | † | 47.89 |
| Nordland (summer vs. winter) | 60.47 | 28.26 | † | 22.84 |
| Nordland (fall vs. winter) | 82.16 | 79.65 | † | 15.82 |
| UTBM-1 | 37.97 | 32.88 | † | 20.16 |
| UTBM-2 | 16.35 | 11.54 | † | 8.53 |
| City Center | 75.04 | 76.29 | 31.78 | 52.63 |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).