On-Device Mobile Landmark Recognition Using Binarized Descriptor with Multifeature Fusion

Authors:

Rongrong Ji

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 7, Issue 1

Article No.: 12, Pages 1 - 29

https://doi.org/10.1145/2795234

Published: 07 October 2015

Abstract

Along with the rapid growth of high-performance mobile devices, on-device Mobile Landmark Recognition (MLR) has attracted increasing research attention. However, the latency and accuracy of automatic recognition remain bottlenecks to its real-world use. In this article, we introduce a novel framework that combines interactive image segmentation with multifeature fusion to improve MLR accuracy. First, we propose a vector binarization method that reduces the memory usage of image descriptors extracted on-device while maintaining recognition accuracy comparable to that of the original descriptors. Second, we design a location-aware fusion algorithm that fuses multiple visual features into a compact yet discriminative image descriptor to improve on-device efficiency. Third, we develop a user-friendly interaction scheme that enables interactive foreground/background segmentation, which substantially improves recognition accuracy. Experimental results demonstrate the effectiveness of the proposed algorithms for on-device MLR applications.
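To make the idea of descriptor binarization concrete, the following is a minimal sketch and not the method proposed in the article, whose details are not given in the abstract. It assumes a generic sign-of-random-projection scheme (in the spirit of locality-sensitive hashing); the names `make_projection`, `binarize`, and `hamming`, the 8-dimensional toy descriptor, and the 64-bit code length are all hypothetical choices for illustration.

```python
import random


def make_projection(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions of length dim (assumed scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def binarize(vec, projection):
    """Keep only the sign of each projection and pack the bits into one int."""
    code = 0
    for row in projection:
        dot = sum(w * x for w, x in zip(row, vec))
        code = (code << 1) | (1 if dot > 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")


# Toy real-valued descriptor (hypothetical values, not from the article).
descriptor = [0.5, -1.0, 2.0, 0.1, -0.3, 1.2, 0.0, -0.7]
projection = make_projection(dim=len(descriptor), n_bits=64)

code = binarize(descriptor, projection)
flipped = binarize([-x for x in descriptor], projection)

print(hamming(code, code))     # identical descriptors give distance 0
print(hamming(code, flipped))  # a negated descriptor flips every sign bit
```

In this sketch each descriptor is stored as a single 64-bit code (8 bytes), and comparison reduces to an XOR followed by a bit count, which is the kind of memory and latency saving that motivates binarized descriptors on a phone. How closely the binary codes preserve the accuracy of the original descriptors depends on the actual binarization method, which this example does not attempt to reproduce.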


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Scene understanding
2. Information systems
  1. Information retrieval
  2. Information systems applications

Published In

ACM Transactions on Intelligent Systems and Technology Volume 7, Issue 1

October 2015

293 pages

ISSN:2157-6904

EISSN:2157-6912

DOI:10.1145/2830012

Editor:
Yu Zheng
Microsoft Research, China

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 October 2015

Accepted: 01 June 2015

Revised: 01 May 2015

Received: 01 July 2014

Published in TIST Volume 7, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

Special Fund for Earthquake Research in the Public Interest
National Natural Science Foundation of China (NSFC)
