Evolving weighting schemes for the Bag of Visual Words

Hugo Jair Escalante¹,
Víctor Ponce-López²,³,⁴,
Sergio Escalera³,⁴,
Xavier Baró²,³,⁴,
Alicia Morales-Reyes¹ &
José Martínez-Carranza¹

Abstract

The Bag of Visual Words (BoVW) is an established representation in computer vision. Inspired by text mining, this representation has proved very effective in many domains. However, in most cases standard term-weighting schemes are adopted (e.g., term frequency or TF-IDF). Whether alternative weighting schemes could boost the performance of BoVW-based methods remains an open question. More importantly, it is unknown whether effective weighting schemes can be learned automatically from scratch. This paper sheds some light on both questions. On the one hand, we report an evaluation of the most common weighting schemes used in text mining, which are rarely used in computer vision tasks. On the other hand, we propose an evolutionary algorithm capable of automatically learning weighting schemes for computer vision problems. We report empirical results of an extensive study on several computer vision problems, and the results show the usefulness of the proposed method.
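
The abstract contrasts learned schemes with the standard TF and TF-IDF weightings. The following is a minimal sketch of those two baselines applied to BoVW histograms, assuming each image has already been quantized into a list of visual-word indices against a codebook of size `vocab_size`. The function names are hypothetical, and the unsmoothed IDF variant log(N/df) is only one of several variants in use.

```python
import math
from collections import Counter


def tf_vector(visual_words, vocab_size):
    """Term frequency: how often each codebook entry occurs in one image."""
    counts = Counter(visual_words)
    return [counts.get(j, 0) for j in range(vocab_size)]


def idf_weights(tf_matrix):
    """Unsmoothed IDF: log(N / df_j), with df_j the number of images containing word j.

    Visual words that never occur get weight 0.0 to avoid dividing by zero.
    """
    n_images = len(tf_matrix)
    vocab_size = len(tf_matrix[0])
    idf = []
    for j in range(vocab_size):
        df = sum(1 for row in tf_matrix if row[j] > 0)
        idf.append(math.log(n_images / df) if df > 0 else 0.0)
    return idf


def tfidf_matrix(tf_matrix):
    """Scale every TF entry by the IDF weight of its visual word."""
    idf = idf_weights(tf_matrix)
    return [[count * w for count, w in zip(row, idf)] for row in tf_matrix]


# Each image is the list of visual-word indices assigned to its local descriptors.
images = [[0, 0, 1, 2], [0, 1, 1], [2, 2, 3]]
tf = [tf_vector(img, vocab_size=4) for img in images]
weighted = tfidf_matrix(tf)
print(tf)  # [[2, 1, 1, 0], [1, 2, 0, 0], [0, 0, 2, 1]]
```

Visual word 3 occurs in only one of the three images, so it receives the largest IDF weight, log 3. The evolutionary approach described in the abstract aims to replace such a fixed formula with one learned from data.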

Notes

Note that the text mining community has proposed variants that aim to soften such assumptions, e.g., using n-grams [2]; still, the BoW remains very competitive with such formulations.
Note that traditional weighting schemes were proposed by researchers on the basis of their own experience and biases; they make strong assumptions and rely on intuition.
Note that in genetic programming (GP), each time an individual is varied, either mutation or crossover is applied, but not both. This differs from other evolutionary variants such as genetic algorithms, where both operators may be applied to the same offspring.
MATLAB files with the predefined partitions are available upon request.
PHOW (pyramid histogram of visual words) is an extension of the raw BoVW formulation that incorporates spatial information by means of a pyramidal structure; see [3] for details.
Note that evaluating the fitness function is efficient, as it relies on a fast approximation to a linear SVM, so the method is applicable to most computer vision applications. We also emphasize that the fitness function is only evaluated during the learning process, which has to be performed a single time and, in most cases, offline.
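
The footnote on GP variation can be illustrated with a minimal sketch, assuming weighting schemes are encoded as expression trees over per-word statistics. The terminal set (`tf`, `idf`, `df`), the operator set, the probability `p_crossover`, and all function names are hypothetical choices for illustration, not the paper's primitive set. Each call to `vary` applies exactly one operator, either subtree crossover or subtree mutation. Fitness evaluation, which the paper bases on a fast linear SVM approximation, is omitted.

```python
import math
import random

# Hypothetical primitive set: terminals are per-word statistics, functions are arithmetic.
TERMINALS = ["tf", "idf", "df"]
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-12 else 1.0,  # protected division
}


def random_tree(rng, depth):
    """Grow a random expression tree; leaves are terminal names."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(FUNCTIONS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, stats):
    """Evaluate a tree on a dict of terminal values, e.g. {'tf': 2.0, 'idf': 0.4, 'df': 3.0}."""
    if isinstance(tree, str):
        return stats[tree]
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, stats), evaluate(right, stats))


def subtree_paths(tree, path=()):
    """List the paths (tuples of child indices 1 or 2) to every node in the tree."""
    paths = [path]
    if not isinstance(tree, str):
        paths += subtree_paths(tree[1], path + (1,))
        paths += subtree_paths(tree[2], path + (2,))
    return paths


def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(node)


def vary(parent, mate, rng, p_crossover=0.8, max_depth=3):
    """Produce one offspring by applying EITHER crossover OR mutation, never both."""
    target = rng.choice(subtree_paths(parent))
    if rng.random() < p_crossover:
        donor = get_subtree(mate, rng.choice(subtree_paths(mate)))  # subtree crossover
    else:
        donor = random_tree(rng, max_depth)  # subtree mutation
    return replace_subtree(parent, target, donor)


rng = random.Random(0)
tfidf = ("*", "tf", "idf")  # TF-IDF expressed as a tree
mate = ("+", "tf", ("/", "idf", "df"))
child = vary(tfidf, mate, rng)
print(child, evaluate(child, {"tf": 2.0, "idf": math.log(1.5), "df": 2.0}))
```

Because offspring can grow over generations, practical GP setups usually add a depth limit or bloat control such as lexicographic parsimony pressure; this sketch only caps the depth of newly grown mutation subtrees.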

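The PHOW footnote describes adding spatial layout through a pyramid. The following is a minimal sketch of a spatial-pyramid BoVW histogram, assuming keypoint coordinates and their visual-word indices are already available. The function name and parameters are hypothetical. The sketch omits details that full implementations (see [3] and the VLFeat library) may add, such as per-level weights and the dense descriptor extraction step.

```python
def pyramid_histogram(points, words, vocab_size, levels=2, width=1.0, height=1.0):
    """Concatenate BoVW histograms over a spatial pyramid.

    Level l splits the image into a 2**l x 2**l grid, and each cell gets its own
    vocab_size-bin histogram. The result is a flat list of length
    vocab_size * sum(4**l for l in range(levels + 1)).
    """
    feature = []
    for level in range(levels + 1):
        cells = 2 ** level
        hist = [[0] * vocab_size for _ in range(cells * cells)]
        for (x, y), w in zip(points, words):
            # Clamp so points on the right or bottom border fall in the last cell.
            cx = min(int(x / width * cells), cells - 1)
            cy = min(int(y / height * cells), cells - 1)
            hist[cy * cells + cx][w] += 1
        for h in hist:
            feature.extend(h)
    return feature


points = [(0.1, 0.1), (0.9, 0.2), (0.6, 0.8)]
words = [0, 1, 1]
f = pyramid_histogram(points, words, vocab_size=2, levels=1)
print(f)  # [1, 2, 1, 0, 0, 1, 0, 0, 0, 1]
```

Level 0 reproduces the plain BoVW histogram, while the finer levels record where in the image each visual word occurred.
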
References

Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. Addison-Wesley, Boston
Bekkerman R, Allan J (2004) Using bigrams in text categorization. Technical report, Department of Computer Science, University of Massachusetts, Amherst, vol 1003, pp 1–2
Bosch A, Zisserman A, Munoz X (2007) Image classification using random forests and ferns. In: Proceedings of the ICCV
Chang KW, Roth D (2011) Selective block minimization for faster convergence of limited memory large-scale linear models. In: SIGKDD conference on knowledge discovery and data mining. ACM
Csurka G, Dance CR, Fan L, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In: International workshop on statistical learning in computer vision
Cummins R, O’Riordan C (2006) Evolving local and global weighting schemes in information retrieval. Inf Retr 9:311–330
Debole F, Sebastiani F (2003) Supervised term-weighting for automated text categorization. In: Proceedings of the 2003 ACM symposium on applied computing, SAC ’03. ACM, New York, pp 784–788
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Deselaers T, Pimenidis L, Ney H (2008) Bag of visual words for adult image classification and filtering. In: Proceedings of the international conference on pattern recognition. IEEE
Djuric N, Lan L, Vucetic S, Wang Z (2013) BudgetedSVM: a toolbox for scalable SVM approximations. J Mach Learn Res 14:3813–3817
Escalante HJ, Garcia M, Morales A, Graff M, Montes M, Morales EF, Martinez J (2015) Term-weighting learning via genetic programming for text classification. Knowl Based Syst 83:176–189
Escalante HJ, Martinez-Carranza J, Escalera S, Ponce-López V, Baró X (2015) Improving bag of visual words representations with genetic programming. In: Proceedings of the 2015 international joint conference on neural networks. IEEE, pp 3674–3681
Escalante HJ, Montes M, Sucar E (2012) Semantic cohesion for image annotation and retrieval. Comput Sist 10(1):121–126
Escalante HJ, Sucar E, Morales E (2016) A naive Bayes baseline for early gesture recognition. Pattern Recognit Lett 73:91–99
Escalera S, Baró X, Gonzalez J, Bautista MA, Madadi M, Reyes M, Ponce V, Escalante HJ, Shotton J, Guyon I (2014) ChaLearn looking at people challenge 2014: dataset and results. In: Proceedings of the ECCV ChaLearn workshop
Fei-Fei L, Fergus R, Perona P (2004) Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In: Proceedings of the IEEE CVPR workshops
Forman G (2003) An extensive empirical study of feature selection metrics for text classification. J Mach Learn Res 3:1289–1305
García-Limón M, Escalante HJ, Montes y Gómez M, Morales A, Morales E (2014) Towards the automated generation of term-weighting schemes for text categorization. In: Proceedings of GECCO Comp’14 (late-breaking abstract), pp 1459–1460
Gonzalez-Gurrola LC, Moreno R, Escalante HJ, Martinez F, Carlos R (2015) Learning roadway surface disruption patterns using the bag of words representation. IEEE Trans Intell Transp Syst (under review)
Grauman K, Leibe B (2010) Visual object recognition. Morgan and Claypool, San Rafael
Guyon I, Athitsos V, Jangyodsuk P, Escalante HJ (2014) The ChaLearn gesture dataset (CGD 2011). Mach Vis Appl 25(8):1929–1951
Hernández-Vela A, Bautista MA, Perez-Sala X, Ponce-López V, Escalera S, Baró X, Pujol O, Angulo C (2014) Probability-based dynamic time warping and bag-of-visual-and-depth-words for human gesture recognition in RGB-D. Pattern Recognit Lett 50(1):112–121
Hoai M, De la Torre F (2012) Max-margin early event detectors. In: IEEE conference on computer vision and pattern recognition. IEEE, Providence, RI, pp 2863–2870
Hoai M, Lan Z, De la Torre F (2011) Joint segmentation and classification of human actions in video. In: IEEE conference on computer vision and pattern recognition. IEEE, Colorado Springs, CO, pp 3265–3272
Huang D, Yao S, Wang Y, De La Torre F (2014) Sequential max-margin event detectors. In: European conference on computer vision
Lan M, Tan CL, Su J, Lu Y (2009) Supervised and traditional term-weighting methods for automatic text categorization. IEEE Trans Pattern Anal Mach Intell 31(4):721–735
Langdon WB, Poli R (2001) Foundations of genetic programming. Springer, Berlin
Laptev I (2005) On space-time interest points. Int J Comput Vis 64(2–3):107–123
Lazebnik S, Schmid C, Ponce J (2004) Semi-local affine parts for object recognition. In: British machine vision conference, pp 779–788
Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: IEEE conference on computer vision and pattern recognition. IEEE, pp 2169–2178
Lazebnik S, Schmid C, Ponce J (2005) A maximum entropy framework for part-based texture and object recognition. In: IEEE international conference on computer vision, pp 832–838
Lopez-Monroy AP, Montes y Gomez M, Escalante HJ, Cruz-Roa A, Gonzalez FA (2015) Improving the BoVW with discriminative n-grams and MKL. Neurocomputing 175:768–781
Luke S, Panait L (2002) Lexicographic parsimony pressure. In: Proceedings of the 2002 genetic and evolutionary computation conference, pp 829–836
Manchala S, Prasad VK, Janaki V (2014) GMM based language identification system using robust features. Int J Speech Technol 17:99–105
Mirza-Mohammadi M, Escalera S, Radeva P (2009) Contextual-guided bag-of-visual-words model for multi-class object categorization. In: Proceedings of the CAIP. Springer, pp 748–756
Neverova N, Wolf C, Taylor GW, Nebout F (2014) Multi-scale deep learning for gesture detection and localization. In: Proceedings of the ECCV ChaLearn workshop on looking at people
Saffari A, Guyon I (2006) Quick start guide for CLOP. Technical report, TU Graz and CLOPINET
Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Inf Process Manag 24:513–523
Sebastiani F (2002) Machine learning in automated text categorization. ACM Comput Surv 34(1):1–47
Sidorov G, Gelbukh A, Gomez-Adorno H, Pinto D (2014) Soft similarity and soft cosine measure: similarity of features in vector space model. Comput Sist 18(3):491–504
Silva S, Almeida J (2003) GPLAB: a genetic programming toolbox for MATLAB. In: Proceedings of the Nordic MATLAB conference, pp 273–278
Sivic J, Zisserman A (2003) Video Google: a text retrieval approach to object matching in videos. In: Proceedings of the IEEE international conference on computer vision, vol 2, pp 1470–1477
Tirilly P, Claveau V, Gros P (2009) A review of weighting schemes for bag of visual words image retrieval. Technical report, IRISA
Turney P, Pantel P (2010) From frequency to meaning: vector space models of semantics. J Artif Intell Res 37:141–188
Vedaldi A, Fulkerson B (2010) VLFeat: an open and portable library of computer vision algorithms. In: Proceedings of the 18th ACM international conference on multimedia. ACM, pp 1469–1472
Wang J, Liu P, She FH, Nahavandi M, Kouzani A (2013) Bag-of-words representation for biomedical time series classification. Biomed Signal Process Control 8(6):634–644
Wang J, Liu Z, Wu Y, Yuan J (2012) Mining actionlet ensemble for action recognition with depth cameras. In: IEEE conference on computer vision and pattern recognition. IEEE, Providence, RI, pp 1290–1297
Xia L, Aggarwal JK (2013) Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera. In: IEEE conference on computer vision and pattern recognition. IEEE, Portland, OR, pp 2834–2841
Yoo SJ (2004) Intelligent multimedia information retrieval for identifying and rating adult images. In: Proceedings of the international conference KES, vol 3213 of LNAI. Springer, pp 164–170
Zhang J, Marszałek M, Lazebnik S, Schmid C (2007) Local features and kernels for classification of texture and object categories: a comprehensive study. Int J Comput Vis 73(2):213–238
Zhang K, Lan L, Wang Z, Moerchen F (2012) Scaling up kernel SVM on limited resources: a low-rank linearization approach. In: Proceedings of AISTATS 2012

Acknowledgments

This work was supported by CONACyT under Project Grant No. CB-2014-241306 (Clasificación y recuperación de imágenes mediante técnicas de minería de textos, i.e., image classification and retrieval using text mining techniques) and by the Spanish Ministry of Economy and Competitiveness under project TIN2013-43478-P. Víctor Ponce-López is supported by Fellowship No. 2013FI-B01037 and Project TIN2012-38187-C03-02.

Author information

Authors and Affiliations

Instituto Nacional de Astrofísica, Óptica y Electrónica, 72840, Puebla, Mexico
Hugo Jair Escalante, Alicia Morales-Reyes & José Martínez-Carranza
Universitat Oberta de Catalunya, Barcelona, Spain
Víctor Ponce-López & Xavier Baró
University of Barcelona, Barcelona, Spain
Víctor Ponce-López, Sergio Escalera & Xavier Baró
Computer Vision Center, Barcelona, Spain
Víctor Ponce-López, Sergio Escalera & Xavier Baró

Corresponding author

Correspondence to Hugo Jair Escalante.

Additional information

This paper is an extended and improved version of [12]. It was submitted to the Special Issue on Computational Intelligence for Vision and Robotics of the journal Neural Computing and Applications.

About this article

Cite this article

Escalante, H.J., Ponce-López, V., Escalera, S. et al. Evolving weighting schemes for the Bag of Visual Words. Neural Comput & Applic 28, 925–939 (2017). https://doi.org/10.1007/s00521-016-2223-x

Received: 13 December 2015
Accepted: 11 February 2016
Published: 01 March 2016
Issue Date: May 2017
DOI: https://doi.org/10.1007/s00521-016-2223-x
