Simple ConvNet Based on Bag of MLP-Based Local Descriptors

Takumi Kobayashi,
Hidenori Ide &
Kenji Watanabe

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1142)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Deep convolutional neural networks (ConvNets) have been applied to a wide range of image recognition tasks with great success, but at a high computational cost. Toward efficient computation, we propose a simple ConvNet architecture based on local descriptors in the bag-of-features framework. The local descriptors are formulated as a simple multilayer perceptron (MLP) and can thus be computed efficiently and flexibly on various regions of interest (ROIs). The proposed method is trained end-to-end by reformulating the MLP descriptor as a deep ConvNet that stacks convolution layers linearly. Through projection-based visual word encoding, the local descriptors are aggregated and fed into a classifier for image recognition, which enables us to compute the forward pass of the network by matrix-vector multiplication. In experiments on image classification, we analyze the proposed method thoroughly and show that it exhibits favorable generalization performance on various tasks.
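The relationship between an MLP descriptor and stacked convolution layers can be illustrated with a minimal NumPy sketch. It is not the authors' architecture: the two-layer MLP, the patch size `p`, the `stride`, and all weight names (`W1`, `b1`, `W2`, `b2`, `V`) are assumptions made for illustration. In particular, `bag_of_descriptors` uses a rectified linear projection onto hypothetical word vectors followed by average pooling as one plausible reading of "projection-based visual word encoding"; the abstract does not give the actual formula.

```python
import numpy as np

def mlp_descriptor(patch, W1, b1, W2, b2):
    """Two-layer MLP with ReLU applied to a single flattened p x p patch."""
    hidden = np.maximum(W1 @ patch.ravel() + b1, 0.0)
    return W2 @ hidden + b2

def dense_descriptors(image, p, stride, W1, b1, W2, b2):
    """Evaluate the same MLP on every p x p patch of a 2-D image.

    The first layer acts as a p x p convolution and the second as a
    1 x 1 convolution, so the whole map is a stack of matrix products.
    """
    # All p x p windows, subsampled by the stride: shape (nh, nw, p, p).
    windows = np.lib.stride_tricks.sliding_window_view(image, (p, p))
    windows = windows[::stride, ::stride]
    nh, nw = windows.shape[:2]
    X = windows.reshape(nh * nw, p * p)       # one flattened patch per row
    hidden = np.maximum(X @ W1.T + b1, 0.0)   # "p x p convolution" + ReLU
    D = hidden @ W2.T + b2                    # "1 x 1 convolution"
    return D.reshape(nh, nw, -1)

def bag_of_descriptors(D, V):
    """ASSUMED encoding: project onto the rows of V, rectify, average-pool."""
    flat = D.reshape(-1, D.shape[-1])
    codes = np.maximum(flat @ V.T, 0.0)
    return codes.mean(axis=0)
```

Because every step is a matrix product followed by an element-wise operation, the descriptor at any single location can also be obtained by calling `mlp_descriptor` on that patch alone, which is the sense in which such descriptors can be evaluated on arbitrary regions.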



Author information

Authors and Affiliations

National Institute of Advanced Industrial Science and Technology, Umezono 1-1-1, Tsukuba, Ibaraki, Japan
Takumi Kobayashi, Hidenori Ide & Kenji Watanabe


Corresponding author

Correspondence to Takumi Kobayashi.

Editor information

Editors and Affiliations

Australian National University, Canberra, ACT, Australia
Tom Gedeon
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee


About this paper

Cite this paper

Kobayashi, T., Ide, H., Watanabe, K. (2019). Simple ConvNet Based on Bag of MLP-Based Local Descriptors. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Communications in Computer and Information Science, vol 1142. Springer, Cham. https://doi.org/10.1007/978-3-030-36808-1_23


DOI: https://doi.org/10.1007/978-3-030-36808-1_23
Published: 05 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36807-4
Online ISBN: 978-3-030-36808-1
eBook Packages: Computer Science, Computer Science (R0)
