EMOD: an efficient on-device mobile visual search system

Authors:

Mooi Choo Chuah

MMSys '15: Proceedings of the 6th ACM Multimedia Systems Conference

Pages 25 - 36

https://doi.org/10.1145/2713168.2713172

Published: 18 March 2015

Abstract

Recently, researchers have proposed solutions for building on-device mobile visual search (ODMVS) systems. Unlike traditional client-server mobile visual search systems, an ODMVS supports image search directly on a mobile device. An ODMVS must be designed with constrained hardware in mind, e.g., limited memory and a less powerful CPU. In this paper, we present EMOD, an efficient on-device mobile visual search system that is based on the Bag-of-Visual-Words (BOVW) framework but uses a small visual dictionary. We propose an Object Word Ranking (OWR) algorithm that efficiently identifies the most useful visual words of an image, so that a compact image signature can be constructed for fast retrieval and greatly improved retrieval performance. Because the visual dictionary is small, we propose the Top Inverted Index Ranking scheme to reduce the number of candidate images for similarity calculation. In addition, EMOD adopts a more efficient version of the recently proposed Ranking Consistency re-ranking algorithm to further enhance performance. Through extensive experimental evaluations, we demonstrate that our prototype EMOD system yields good retrieval accuracy and query response times for a database of over 10K images.
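To make the general idea of candidate reduction concrete, the following is a minimal sketch of a generic BOVW retrieval loop with an inverted index. It is not the EMOD implementation: the abstract does not specify how OWR or Top Inverted Index Ranking work, so the vote-counting shortlist, the cosine similarity, the function names, and the toy data below are all illustrative assumptions.

```python
from collections import Counter, defaultdict
import math


def build_inverted_index(db_words):
    """Map each visual-word id to the set of image ids that contain it."""
    index = defaultdict(set)
    for image_id, words in db_words.items():
        for w in set(words):
            index[w].add(image_id)
    return index


def shortlist_candidates(query_words, index, top_k):
    """Assumed shortlist rule: keep the top_k images that share the most
    distinct visual words with the query (one vote per shared word)."""
    votes = Counter()
    for w in set(query_words):
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    # Sort by votes (descending), then by image id so ties are deterministic.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [image_id for image_id, _ in ranked[:top_k]]


def cosine_similarity(a_words, b_words):
    """Cosine similarity between two visual-word histograms."""
    a, b = Counter(a_words), Counter(b_words)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_words, db_words, index, top_k=2):
    """Score only the shortlisted candidates instead of the whole database."""
    candidates = shortlist_candidates(query_words, index, top_k)
    scored = [(cosine_similarity(query_words, db_words[c]), c) for c in candidates]
    return [c for _, c in sorted(scored, key=lambda sc: (-sc[0], sc[1]))]


# Hypothetical toy database: image id -> quantized visual-word ids.
db = {
    "tower": [1, 2, 2, 5],
    "bridge": [2, 3, 4],
    "statue": [6, 7, 8],
}
index = build_inverted_index(db)
result = search([1, 2, 5], db, index, top_k=2)
print(result)
```

In this toy run the "statue" image shares no visual word with the query, so it is never scored. The smaller the dictionary, the more images each word's posting list contains, which is why some form of shortlist before the similarity calculation becomes important.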

Information & Contributors

Information

Published In

MMSys '15: Proceedings of the 6th ACM Multimedia Systems Conference

March 2015

277 pages

ISBN:9781450333511

DOI:10.1145/2713168

General Chair: Wei Tsang Ooi (National University of Singapore, Singapore)

Program Chairs: Wu-chi Feng (Portland State University), Feng Liu (Portland State University)

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 March 2015

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

MMSys '15

MMSys '15: Multimedia Systems Conference 2015

March 18 - 20, 2015

Portland, Oregon

Acceptance Rates

MMSys '15 Paper Acceptance Rate 12 of 41 submissions, 29%

Overall Acceptance Rate 176 of 530 submissions, 33%

Bibliometrics

Total Citations: 6
Total Downloads: 256
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 0

Reflects downloads up to 03 Mar 2025

Cited By

Li D, Chuah M (2018). Accurate Image Retrieval With Unsupervised 2-Stage k-NN Re-Ranking. Computer Vision, pages 1726-1745. Online publication date: 2018.
https://doi.org/10.4018/978-1-5225-5204-8.ch072

Li W, Wen L, Chang M, Lim S, Lyu S (2017). Adaptive RNN Tree for Large-Scale Human Action Recognition. 2017 IEEE International Conference on Computer Vision (ICCV), pages 1453-1461. Online publication date: Oct-2017.
https://doi.org/10.1109/ICCV.2017.161

Çalışır F, Baştan M, Ulusoy Ö, Güdükbay U (2017). Mobile multi-view object image search. Multimedia Tools and Applications, 76(10):12433-12456. Online publication date: 1-May-2017.
https://dl.acm.org/doi/10.1007/s11042-016-3659-9

Li D, Chuah M (2016). Accurate Image Retrieval with Unsupervised 2-Stage k-NN Re-Ranking. International Journal of Multimedia Data Engineering & Management, 7(1):41-59. Online publication date: 1-Jan-2016.
https://dl.acm.org/doi/10.4018/IJMDEM.2016010103

Li D, Salonidis T, Desai N, Chuah M (2016). DeepCham: Collaborative Edge-Mediated Adaptive Deep Learning for Mobile Object Recognition. 2016 IEEE/ACM Symposium on Edge Computing (SEC), pages 64-76. Online publication date: Oct-2016.
https://doi.org/10.1109/SEC.2016.38

Kusamura Y, Kozawa Y, Amagasa T, Kitagawa H (2016). GPU Acceleration of Content-Based Image Retrieval Based on SIFT Descriptors. 2016 19th International Conference on Network-Based Information Systems (NBiS), pages 342-347. Online publication date: Sep-2016.
https://doi.org/10.1109/NBiS.2016.55
