Abstract
Multi-label classification has aroused extensive attention in various fields. With the emergence of high-dimensional label space, academia has devoted to performing label embedding in recent years. Whereas current embedding approaches do not take feature space correlation sufficiently into consideration or require an encoding function while learning embedded space. Besides, few of them can be spread to track the missing labels. In this paper, we propose a Label Embedding method via Dependence Maximization (LEDM), which obtains the latent space on which the label and feature information can be embedded simultaneously. To end this, the low-rank factorization model on the label matrix is applied to exploit label correlations instead of the encoding process. The dependence between feature space and label space is increased by the Hilbert–Schmidt independence criterion to facilitate the predictability. The proposed LEDM can be easily extended the missing labels in learning embedded space at the same time. Comprehensive experimental results on data sets validate the effectiveness of our approach over the state-of-art methods on both complete-label and missing-label cases.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Katakis I, Tsoumakas G, Vlahavas I (2008) Multilabel text classification for automated tag suggestion. In: Proceedings of the ECML/PKDD’18
Elisseeff A, Weston J (2001) A kernel method for multi-labelled classification. Adv Neural Inf Proces Syst 14:681–687
Kong D, Ding CHQ, Huang H, Zhao H (2012) Multi-label relieff and f-statistic feature selections for image annotation. In: 2012 IEEE conference on computer vision and pattern recognition, pp 2352–2359
Tsoumakas G, Katakis I (2007) Multi-label classification: an overview. IJDWM 3(3):1–13
Boutell MR, Luo J, Shen X, Brown CM (2004) Learning multi-label scene classification. Pattern Recognit 37(9):1757–1771
Tsoumakas G, Vlahavas IP (2007) Random \(k\)-labelsets: an ensemble method for multilabel classification. In: European conference on machine learning, pp 406–417
Read J, Pfahringer B, Holmes G, Frank E (2011) Classifier chains for multi-label classification. Mach Learn 85(3):333–359
Zhang M, Zhou Z (2007) ML-KNN: a lazy learning approach to multi-label learning. Pattern Recognit 40(7):2038–2048
Yoav F, Schapire R, Abe N (1999) A short introduction to boosting. J Jpn Soc Artif Intell 14(1612):771–780
Hsu DJ, Kakade S, Langford J, Zhang T (2009) Multi-label prediction via compressed sensing. In: Advances in neural information processing systems, pp 772–780
Tai F, Lin H (2012) Multilabel classification with principal label space transformation. Neural Comput 24(9):2508–2542
Chen Y, Lin H (2012) Feature-aware label space dimension reduction for multi-label classification. In: Advances in neural information processing systems, pp 1538–1546
Huang K, Lin H (2017) Cost-sensitive label embedding for multi-label classification. Mach Learn 106(9–10):1725–1746
Lin Z, Ding G, Han J, Shao L (2018) End-to-end feature-aware label space encoding for multilabel classification with many classes. IEEE Trans Neural Netw Learn Syst 29(6):2472–2487
Sun Y, Zhang Y, Zhou Z (2010) Multi-label learning with weak label. In: Proceedings of the twenty-fourth AAAI conference on artificial intelligence
Gao N, Huang S, Chen S (2016) Multi-label active learning by model guided distribution matching. Front Comput Sci 10(5):845–855
Wu B, Jia F, Liu W, Ghanem B, Lyu S (2018) Multi-label learning with missing labels using mixed dependency graphs. Int J Comput Vis 126(8):875–896
Bucak SS, Jin R, Jain AK (2011) Multi-label learning with incomplete class assignments. In: The 24th IEEE conference on computer vision and pattern recognition, pp 2801–2808
Chen G, Song Y, Wang F, Zhang C (2008) Semi-supervised multi-label learning by solving a Sylvester equation. In: Proceedings of the SIAM international conference on data mining, pp 410–419
Liu B, Li Y, Xu Z (2018) Manifold regularized matrix completion for multi-label learning with ADMM. Neural Netw 101:57–67
Wu B, Liu Z, Wang S, Hu B, Ji Q (2014) Multi-label learning with missing labels. In: 22nd international conference on pattern recognition, pp 1964–1968
Yu H, Jain P, Kar P, Dhillon IS (2014) Large-scale multi-label learning with missing labels. In: Proceedings of the 31th international conference on machine learning, pp 593–601
Xu C, Tao D, Xu C (2016) Robust extreme multi-label learning. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 1275–1284
Ji S, Ye J (2009) An accelerated gradient method for trace norm minimization. In: Proceedings of the 26th annual international conference on machine learning, pp 457–464
Cai J, Candès EJ, Shen Z (2010) A singular value thresholding algorithm for matrix completion. SIAM J Optim 20(4):1956–1982
Zhu Y, Kwok JT, Zhou Z (2018) Multi-label learning with global and local label correlation. IEEE Trans Knowl Data Eng 30(6):1081–1094
Guo B, Hou C, Shan J, Yi D (2018) Low rank multi-label classification with missing labels. In: 24th international conference on pattern recognition, pp 417–422
Xu M, Jin R, Zhou Z (2013) Speedup matrix completion with side information: application to multi-label learning. In: Advances in neural information processing systems, pp 2301–2309
Xu L, Wang Z, Shen Z, Wang Y, Chen E (2014) Learning low-rank label correlations for multi-label classification with missing labels. In: 2014 IEEE international conference on data mining, pp 1067–1072
Zhao F, Guo Y (2015) Semi-supervised multi-label learning with incomplete labels. In: Proceedings of the twenty-fourth international joint conference on artificial intelligence, pp 4062–4068
Yang H, Zhou JT, Cai J (2016) Improving multi-label learning with missing labels by structured semantic correlations. In: 14th European conference on computer vision—ECCV 2016, pp 835–851
Ren W, Zhang L, Jiang B, Wang Z, Guo G, Liu G (2017) Robust mapping learning for multi-view multi-label classification with missing labels. In: 10th international conference on knowledge science, engineering and management, pp 543–551
Koren Y, Bell RM, Volinsky C (2009) Matrix factorization techniques for recommender systems. IEEE Comput 42(8):30–37. https://doi.org/10.1109/MC.2009.263
Wen Z, Yin W, Zhang Y (2012) Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Math Program Comput 4(4):333–361
Song L, Smola AJ, Gretton A, Borgwardt KM, Bedo J (2007) Supervised feature selection via dependence estimation. In: Proceedings of the twenty-fourth international conference on machine learning, pp 823–830
Fukumizu K, Bach FR, Jordan MI (2004) Dimensionality reduction for supervised learning with reproducing kernel hilbert spaces. J Mach Learn Res 5:73–99
Yamanishi Y, Vert JP, Kanehisa M (2004) Heterogeneous data comparison and gene selection with kernel canonical correlation analysis. In: Kernel methods in computational biology, pp 209–229
Bach FR, Jordan MI (2002) Kernel independent component analysis. J Mach Learn Res 3:1–48
Gretton A, Herbrich R, Smola AJ (2003) The kernel mutual information. In: 2003 IEEE international conference on acoustics, pp 880–884
Gretton A, Bousquet O, Smola AJ, Schölkopf B (2005) Measuring statistical dependence with Hilbert–Schmidt norms. In: 16th international conference on algorithmic learning theory, pp 63–77
Gretton A, Fukumizu K, Teo CH, Song L, Schölkopf B, Smola AJ (2007) A kernel statistical test of independence. Adv Neural Inf Process Syst 20:585–592
Zhang X, Song L, Gretton A, Smola AJ (2008) Kernel measures of independence for non-iid data. In: Proceedings of the twenty-second annual conference on advances in neural information processing systems, Vancouver, British Columbia, Canada, 8–11 December 2008, vol 21, pp 1937–1944
Devroye L, Györfi L, Lugosi G (2013) A probabilistic theory of pattern recognition, vol 31. Springer, Berlin
Wicker J, Pfahringer B, Kramer S (2012) Multi-label classification using boolean matrix decomposition. In: Proceedings of the ACM symposium on applied computing, pp 179–186
Han S, Cao Q, Han M (2012) Parameter selection in SVM with RBF kernel function. World Autom Congr 2012:1–4
Lu Z, Ip HH, Peng Y (2011) Exhaustive and efficient constraint propagation: a semi-supervised learning perspective and its applications. CoRR arXiv:1109.4684
Pacharawongsakda E, Theeramunkong T (2012) Towards more efficient multi-label classification using dependent and independent dual space reduction. In: 16th Pacific-Asia conference on advances in knowledge discovery and data mining, pp 383–394
Han Y, Wu F, Jia J, Zhuang Y, Yu B (2010) Multi-task sparse discriminant analysis (NtSDA) with overlapping categories. In: Proceedings of the twenty-fourth AAAI conference on artificial intelligence
Lehoucq RB, Sorensen DC (1996) Deflation techniques for an implicitly restarted Arnoldi iteration. SIAM J Matrix Anal Appl 17(4):789–821. https://doi.org/10.1137/S0895479895281484
Zhang M, Zhou Z (2014) A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng 26(8):1819–1837
Zhou Z, Zhang M (2017) Multi-label learning. Springer US, New York, pp 875–881
Cao L, Xu J (2015) A label compression coding approach through maximizing dependence between features and labels for multi-label classification. In: 2015 International joint conference on neural networks, pp 1–8
Acknowledgements
This work was supported by the National Natural Science Foundation of China (61573266).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Li, Y., Yang, Y. Label Embedding for Multi-label Classification Via Dependence Maximization. Neural Process Lett 52, 1651–1674 (2020). https://doi.org/10.1007/s11063-020-10331-7
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11063-020-10331-7