On-Device Mobile Landmark Recognition Using Binarized Descriptor with Multifeature Fusion

Published: 07 October 2015

Abstract

Along with the exponential growth of high-performance mobile devices, on-device Mobile Landmark Recognition (MLR) has recently attracted increasing research attention. However, the latency and accuracy of automatic recognition remain bottlenecks to its real-world use. In this article, we introduce a novel framework that combines interactive image segmentation with multifeature fusion to improve MLR accuracy. First, we propose an effective vector binarization method that reduces the memory usage of image descriptors extracted on-device while maintaining recognition accuracy comparable to that of the original descriptors. Second, we design a location-aware fusion algorithm that fuses multiple visual features into a compact yet discriminative image descriptor to improve on-device efficiency. Third, we develop a user-friendly interaction scheme that enables interactive foreground/background segmentation, which substantially improves recognition accuracy. Experimental results demonstrate the effectiveness of the proposed algorithms for on-device MLR applications.
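
The abstract does not spell out the binarization method, so the following is only a minimal sketch of the general idea of descriptor binarization, not the authors' algorithm. It assumes a simple scheme in which each dimension is thresholded at its mean over the database, so that a descriptor costs one bit per dimension instead of one floating-point value, and matching uses Hamming distance. The function names and the toy 4-dimensional descriptors are hypothetical.

```python
from typing import List, Sequence


def per_dimension_mean(descriptors: Sequence[Sequence[float]]) -> List[float]:
    """Mean of each dimension over the database (assumed thresholding rule)."""
    count = len(descriptors)
    return [sum(column) / count for column in zip(*descriptors)]


def binarize(descriptor: Sequence[float], thresholds: Sequence[float]) -> int:
    """Pack one bit per dimension into an int: bit i is 1 when descriptor[i] > thresholds[i]."""
    if len(descriptor) != len(thresholds):
        raise ValueError("descriptor and thresholds must have the same length")
    code = 0
    for i, (value, threshold) in enumerate(zip(descriptor, thresholds)):
        if value > threshold:
            code |= 1 << i
    return code


def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of bit positions in which the two binary codes differ."""
    return bin(code_a ^ code_b).count("1")


# Toy database of real-valued descriptors (hypothetical values).
database = [
    [0.9, 0.1, 0.8, 0.2],
    [0.1, 0.9, 0.2, 0.8],
    [0.8, 0.2, 0.9, 0.1],
]
thresholds = per_dimension_mean(database)
codes = [binarize(d, thresholds) for d in database]

# Binarize a query with the same thresholds and find the closest stored code.
query_code = binarize([0.85, 0.15, 0.7, 0.3], thresholds)
nearest = min(range(len(codes)), key=lambda i: hamming_distance(query_code, codes[i]))
```

In this sketch the comparison reduces to an XOR and a bit count, which is the usual reason binary codes are attractive on memory- and compute-constrained devices; how well accuracy is preserved depends on the actual binarization method, which this example does not attempt to reproduce.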

Information

Published In

ACM Transactions on Intelligent Systems and Technology  Volume 7, Issue 1
October 2015
293 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/2830012
  • Editor: Yu Zheng

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 October 2015
Accepted: 01 June 2015
Revised: 01 May 2015
Received: 01 July 2014
Published in TIST Volume 7, Issue 1

Author Tags

  1. Mobile landmark recognition
  2. binarization
  3. feature fusion
  4. on-device
  5. user interaction

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Special Fund for Earthquake Research in the Public Interest
  • National Natural Science Foundation of China (NSFC)
