Abstract
Classification with a large number of classes is a key problem in machine learning and corresponds to many real-world applications, such as tagging images or textual documents in social networks. While one-vs-all methods usually reach top performance in this context, they suffer from high inference complexity, linear in the number of categories. Different models based on the notion of binary codes have been proposed to overcome this limitation, achieving sublinear inference complexity. However, these models must decide a priori, before learning, which binary code to associate with each category, using more or less complex heuristics. We propose a new end-to-end model that simultaneously learns to associate binary codes with categories and to map inputs to binary codes. This approach, called Deep Stochastic Neural Codes (DSNC), retains sublinear inference complexity but does not require any a priori tuning. Experimental results on different datasets show the effectiveness of the approach w.r.t. baseline methods.
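As a rough illustration of the inference scheme the abstract describes (a sketch, not the authors' implementation), the snippet below maps an input to a binary code with a linear encoder and predicts the class whose assigned code is nearest in Hamming distance. The encoder weights, the code/class assignment, and all dimensions are hypothetical placeholders; in DSNC both would be learned jointly.

```python
import numpy as np

rng = np.random.default_rng(0)

K, D, BITS = 100, 32, 16                          # classes, input dim, code length
class_codes = rng.integers(0, 2, size=(K, BITS))  # hypothetical code-to-class assignment
W = rng.normal(size=(D, BITS))                    # stands in for a trained encoder

def encode(x):
    """Deterministic binarization of the encoder output (test-time behavior)."""
    return (x @ W > 0).astype(int)

def predict(x):
    """Return the index of the class whose code is nearest in Hamming distance."""
    code = encode(x)
    hamming = np.count_nonzero(class_codes != code, axis=1)
    return int(np.argmin(hamming))

print(predict(rng.normal(size=D)))  # some class index in [0, K)
```

Note that the nearest-code lookup shown here is a naive linear scan; the sublinear inference the paper targets relies on fast Hamming-space search structures (e.g. multi-index hashing) over the class codes.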
Notes
- 1.
In practice, a code of size \(k \log K\) is needed with k ranging between 10 and 20.
- 2.
The code size denotes in this case the number of hidden units.
- 3.
The one-versus-all algorithm was also tested; its results were similar to the best MLP score.
- 4.
All the models share the same encoding complexity (forwarding the input to the hidden layer, whether discrete or not).
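The code-size rule of thumb in Note 1 can be checked numerically. The values of K below are illustrative, and a base-2 logarithm is assumed, since the note does not specify the base:

```python
import math

K = 10_000                                # number of classes (illustrative)
for k in (10, 20):                        # the range of k cited in Note 1
    bits = math.ceil(k * math.log2(K))    # code size k log K, base-2 assumed
    print(f"k={k}: {bits}-bit codes vs. {K} one-vs-all scores")
```

With these assumptions, 10,000 classes need only a few hundred code bits, which is where the sublinear advantage over one-vs-all inference comes from.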
Acknowledgments
This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. OSR-2015-CRG4-2639.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Gerald, T., Baskiotis, N., Denoyer, L. (2017). Binary Stochastic Representations for Large Multi-class Classification. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol. 10634. Springer, Cham. https://doi.org/10.1007/978-3-319-70087-8_17
DOI: https://doi.org/10.1007/978-3-319-70087-8_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70086-1
Online ISBN: 978-3-319-70087-8
eBook Packages: Computer Science, Computer Science (R0)