Abstract
In this paper, we propose a novel large margin proximal non-parallel twin support vector machine for binary classification. Its significant advantages over the twin support vector machine are that the structural risk minimization principle is implemented and that, by adopting an uncommon constraint formulation in the primal problems, the proposed method avoids the computation of large inverse matrices before training, which is inevitable in the formulation of the twin support vector machine. In addition, the dual coordinate descent algorithm is used to solve the optimization problems, which accelerates training. Experimental results demonstrate the effectiveness and classification accuracy of the proposed method.
Supported by the Hainan Provincial Natural Science Foundation of China (No. 118QN181), the Scientific Research Foundation of Hainan University, the National Natural Science Foundation of China (Nos. 11501310, 61703370, and 61603338).
Keywords
- Support vector machine
- Non-parallel SVM
- Structural risk minimization principle
- Dual coordinate descent algorithm
1 Introduction
As a powerful tool in machine learning, support vector machines (SVMs) have gained a great deal of attention in a wide variety of fields [1,2,3,4,5]. The classical support vector classifier (SVC) maximizes the margin between two parallel hyperplanes, which results in solving a convex quadratic programming problem (QPP). Furthermore, non-parallel hyperplane classifiers such as the generalized eigenvalue proximal support vector machine (GEPSVM) and the twin support vector machine (TWSVM) have been proposed in [4, 5]. TWSVM seeks two non-parallel proximal hyperplanes such that each hyperplane is closer to one class and as far as possible from the other. As an efficient generalization of the classical SVC, TWSVMs have been studied extensively [6,7,8,9,10,11,12,13,14]; they solve two small QPPs in contrast with the single large QPP of the classical SVC. This paper proposes a novel large margin proximal non-parallel support vector machine for binary classification (PNSVM). The main contributions of PNSVM are:
- PNSVM minimizes the structural risk by imposing a regularization term.
- PNSVM is proximal to both classes and maximizes the corresponding margin, whereas TWSVM is proximal to each class and far away from the other class.
- PNSVM can be solved efficiently with the dual coordinate descent method [15], as sketched below.
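For reference, the method of [15] applies coordinate-wise updates to duals of the form \(\min_{\varvec{\alpha }} \frac{1}{2}\varvec{\alpha }^T Q\varvec{\alpha } - \varvec{e}^T\varvec{\alpha }\) subject to box constraints \(0\le \varvec{\alpha }\le \varvec{u}\). The following is a minimal generic sketch of that update, not the implementation used in the paper; \(Q\) and \(\varvec{u}\) are placeholders to be assembled from the dual problems derived in Sect. 3.

```python
import numpy as np

def dcd_box_qp(Q, u, n_epochs=50):
    """Dual coordinate descent for min_a 0.5*a@Q@a - a.sum(), s.t. 0 <= a <= u.
    Generic sketch in the spirit of [15]; Q (m x m, PSD) and u (m,) are placeholders."""
    m = Q.shape[0]
    a = np.zeros(m)
    grad = -np.ones(m)                      # gradient Q@a - 1 at the start a = 0
    for _ in range(n_epochs):
        for i in np.random.permutation(m):  # visit coordinates in random order
            if Q[i, i] <= 0:
                continue                    # skip degenerate coordinates
            a_new = min(max(a[i] - grad[i] / Q[i, i], 0.0), u[i])
            delta = a_new - a[i]
            if delta != 0.0:
                grad += delta * Q[:, i]     # keep grad = Q@a - 1 consistent
                a[i] = a_new
    return a
```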
2 Related Work
Benefiting from the excellent generalization performance of twin support vector machines, approaches for constructing non-parallel hyperplanes have received extensive attention [6, 8, 9, 11,12,13,14, 16]. Shao et al. [6] present a variant of GEPSVM based on a difference measure, which appears superior in classification accuracy and computation time. For imbalanced data classification, Shao et al. [8] suggest an efficient weighted Lagrangian twin support vector machine (WLTSVM), which is robust to outliers and overcomes the bias phenomenon in the original TWSVM. Liu et al. [9] present a support vector machine for large scale regression based on the minimization of the deviation distribution. Tian et al. [11] propose a novel nonparallel SVM for binary classification, which implements structural risk minimization and is suitable for large scale problems. Shao et al. [16] present a sparse \(L_q\)-norm least squares support vector machine, where feature selection and prediction are performed simultaneously.
3 PNSVM
3.1 Linear PNSVM
Linear PNSVM finds two non-parallel hyperplanes \(f_1(\varvec{x})=\varvec{w}_1^T\varvec{x}+b_1=0\) and \(f_2(\varvec{x})=\varvec{w}_2^T\varvec{x}+b_2=0\) by solving two quadratic programming problems, referred to below as (1) and (2), where \(c_i, i=1,2,\cdots ,6\) are parameters.
The formulation given in (1) can be understood as follows. The first term in (1) is the sum of squared distances from \(f_1(\varvec{x})=\varvec{w}_1^T\varvec{x}+b_1=0\) to the points of the positive class, and its minimization keeps \(f_1\) close to the positive class. The minimization of the second term in (1), \(\frac{c_2}{2}(\Vert \varvec{w}_1\Vert ^2+b_1^2)\), implements the structural risk minimization principle. The second set of constraints in (1) requires the hyperplane \(f_1(\varvec{x})=\varvec{w}_1^T\varvec{x}+b_1\) to be at a distance of at least 1 and at most \(\rho _1\) from the points of the negative class. The third term in (1) penalizes misclassification. The last term in (1) keeps the points of the negative class from lying too far from the hyperplane. A similar interpretation applies to the formulation given in (2).
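To make this reading concrete, one possible form of problem (1), written only from the term-by-term description above (the placement of \(c_1\), the unit weight on \(\rho _1\), and the slack vector \(\varvec{q}_1\) are assumptions rather than the exact published formulation), is
\[
\begin{aligned}
\min_{\varvec{w}_1,\,b_1,\,\varvec{q}_1,\,\rho _1}\quad & \frac{c_1}{2}\Vert \varvec{A}\varvec{w}_1+\varvec{e}_1 b_1\Vert ^2 + \frac{c_2}{2}\left(\Vert \varvec{w}_1\Vert ^2+b_1^2\right) + c_3\,\varvec{e}_2^T\varvec{q}_1 + \rho _1\\
\text{s.t.}\quad & -(\varvec{B}\varvec{w}_1+\varvec{e}_2 b_1) \ge \varvec{e}_2-\varvec{q}_1,\quad -(\varvec{B}\varvec{w}_1+\varvec{e}_2 b_1) \le \rho _1\varvec{e}_2,\quad \varvec{q}_1\ge \varvec{0},
\end{aligned}
\]
where \(\varvec{A}\in \mathbf {R}^{m_1\times n}\) and \(\varvec{B}\in \mathbf {R}^{m_2\times n}\) stack the positive and negative training points and \(\varvec{e}_1, \varvec{e}_2\) are all-ones vectors; problem (2) exchanges the roles of the two classes.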
The Lagrangian of problem (1) is given by
where \(\varvec{\alpha }_1\in \mathbf {R}^{m_1\times 1}, \varvec{\alpha }_2\in \mathbf {R}^{m_2\times 1},\varvec{\alpha }_3\in \mathbf {R}^{m_2\times 1},\varvec{\alpha }_4\in \mathbf {R}^{m_1\times 1},\beta \in \mathbf {R}\) are the Lagrange multipliers. The optimality conditions for \(\varvec{w}_1,b_1,\varvec{p},\varvec{q}_1,\rho _1\) are given by
Then, substituting (3) and (7) into the Lagrangian, we obtain the dual of problem (1):
where
The dual of problem (2) is
where
Once \((\varvec{w}_1,b_1)\) and \((\varvec{w}_2,b_2)\) are obtained from (9) and (10) by means of the dual coordinate descent method [15], a new input \(\varvec{x}\in \mathbf {R}^n\) is assigned to class \(i~(i=+1,-1)\) by \(\text {Class }i=\arg \min \limits _{k=1,2} \frac{|\varvec{w}_k^T\varvec{x}+b_k|}{\Vert \varvec{w}_k\Vert }.\)
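A minimal sketch of this decision rule (assuming the two classes are labeled \(+1\) and \(-1\), and that \(\varvec{w}_1, b_1, \varvec{w}_2, b_2\) come from any solver applied to (9) and (10)):

```python
import numpy as np

def pnsvm_predict(x, w1, b1, w2, b2):
    """Assign x to the class whose hyperplane w_k^T x + b_k = 0 is closer,
    i.e. argmin_k |w_k^T x + b_k| / ||w_k||, as in the rule above."""
    d1 = abs(np.dot(w1, x) + b1) / np.linalg.norm(w1)
    d2 = abs(np.dot(w2, x) + b2) / np.linalg.norm(w2)
    return 1 if d1 <= d2 else -1
```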
3.2 Kernel PNSVM
Kernel PNSVM seeks the two kernel-generated surfaces \(K(\varvec{x}^T, \varvec{C}^T)\varvec{w}_1+b_1 = 0\) and \(K(\varvec{x}^T, \varvec{C}^T)\varvec{w}_2+b_2 = 0\) instead of hyperplanes, where \(\varvec{C}^T=[\varvec{A}~\varvec{B}]^T\in \mathbf {R}^{n\times \ell }\). The corresponding optimization problems take the same form as (1) and (2), with the kernel-generated surfaces in place of the hyperplanes and with parameters \(c_i, i=1,\cdots ,6\).
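For illustration, the following sketch evaluates the two kernel-generated surfaces with the Gaussian kernel used in Sect. 4. Comparing the unnormalized values \(|f_1(\varvec{x})|\) and \(|f_2(\varvec{x})|\) is an assumption here; the paper's kernel decision rule, analogous to the linear one, may include a normalization factor.

```python
import numpy as np

def gaussian_kernel(X, C, mu):
    """K(x, c) = exp(-mu * ||x - c||^2), computed row-wise between X and C."""
    sq = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * sq)

def kernel_surface_predict(x, C, w1, b1, w2, b2, mu):
    """Evaluate f_k(x) = K(x^T, C^T) w_k + b_k for both surfaces and pick the
    one with the smaller absolute value (normalization omitted, see above)."""
    k = gaussian_kernel(x[None, :], C, mu)[0]   # row vector K(x^T, C^T), shape (ell,)
    f1 = abs(k @ w1 + b1)
    f2 = abs(k @ w2 + b2)
    return 1 if f1 <= f2 else -1
```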
4 Experimental Results
In this section, UCI data sets are chosen to demonstrate the performance of our PNSVM compared with SVC, TWSVM, and WLTSVM [13]. The methods are implemented in Matlab 9.0 on a PC with an Intel(R) Core i7 (2.70 GHz) CPU and 32 GB of RAM. Our PNSVM is solved by the dual coordinate descent algorithm, while SVC and TWSVM are solved by the QP solver in the Matlab Optimization Toolbox. Classification accuracy is measured by standard 10-fold cross-validation.
The classification accuracy, computation time, and optimal values of \(c_1(=c_2)\), \(c_4(=c_5)\), \(c_3\), and \(c_6\) for our PNSVM are listed in Table 1. The parameters of PNSVM, TWSVM, SVC, and WLTSVM are searched in the range \(\{2^{2i}|i=-8,\cdots ,8\}\), and the parameters \(c_3\) and \(c_6\) of PNSVM are selected from the set \(\{0.01,0.1,1,10,100\}\). Table 1 shows the comparison results of all four methods. It can be seen from Table 1 that the accuracy of the proposed linear PNSVM is significantly better than that of the linear TWSVM on most data sets, and our PNSVM is the fastest on most data sets. Figure 1 exhibits the influence of the parameters \(c_3\) and \(c_6\) on the accuracy of linear PNSVM. It can be observed that the choice of \(c_3\) and \(c_6\) affects the results dramatically, which implies that careful tuning of \(c_3\) and \(c_6\) is necessary in real applications. Table 2 is concerned with the kernel versions of our PNSVM, TWSVM, SVC, and WLTSVM. The Gaussian kernel \(K(\varvec{x},\varvec{x}')=e^{-\mu \Vert \varvec{x}-\varvec{x}'\Vert ^2}\) is used, and the kernel parameter \(\mu \) is searched from the set \(\{2^i|i=-8,\cdots ,8\}\). The classification accuracy and computation times for all four methods are also listed in Table 2.
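A sketch of the parameter-selection protocol described above (10-fold cross-validation over the stated grids); here train_fn and predict_fn stand in for any PNSVM implementation and are placeholders, not part of the paper.

```python
import numpy as np
from itertools import product
from sklearn.model_selection import KFold

def cv_accuracy(train_fn, predict_fn, X, y, params, n_splits=10):
    """Mean 10-fold cross-validation accuracy for one parameter setting."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    accs = []
    for tr, te in kf.split(X):
        model = train_fn(X[tr], y[tr], **params)
        accs.append(np.mean(predict_fn(model, X[te]) == y[te]))
    return float(np.mean(accs))

# Grids as stated above: c1 = c2 and c4 = c5 from {2^(2i) | i = -8,...,8},
# c3 and c6 from {0.01, 0.1, 1, 10, 100}.
c12_grid = [2.0 ** (2 * i) for i in range(-8, 9)]
c36_grid = [0.01, 0.1, 1, 10, 100]

def select_parameters(train_fn, predict_fn, X, y):
    """Exhaustive grid search; returns the best parameter setting and its CV accuracy."""
    best_params, best_acc = None, -1.0
    for c12, c45, c3, c6 in product(c12_grid, c12_grid, c36_grid, c36_grid):
        params = dict(c1=c12, c2=c12, c3=c3, c4=c45, c5=c45, c6=c6)
        acc = cv_accuracy(train_fn, predict_fn, X, y, params)
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc
```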
5 Conclusions
For the binary classification problem, a novel large margin proximal non-parallel support vector machine (PNSVM) was proposed in this paper. The main contributions are that the structural risk minimization principle is implemented by introducing a regularization term in the primal problems of our PNSVM, and that the dual formulation of PNSVM avoids the computation of inverse matrices and speeds up training. Experiments comparing our PNSVM with SVC, TWSVM, and WLTSVM on several data sets imply that the proposed method is not only faster but also exhibits better generalization. It should be pointed out that there are six parameters in our PNSVM, so parameter selection and more efficient algorithms [17] are practical problems that should be addressed in the future.
References
Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
Deng, N.Y., Tian, Y.J., Zhang, C.H.: Support Vector Machines: Optimization Based Theory, Algorithms, and Extensions. CRC Press, New York (2012)
Noble, W.S.: Support vector machine applications in computational biology. In: Kernel Methods in Computational Biology, pp. 71–92 (2004)
Mangasarian, O.L., Wild, E.W.: Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Trans. Pattern Anal. Mach. Intell. 28(1), 69–74 (2006)
Jayadeva, Khemchandani, R., Chandra, S.: Twin support vector machines for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell. 29(5), 905–910 (2007)
Shao, Y.H., Zhang, C.H., Wang, X.B., Deng, N.Y.: Improvements on twin support vector machines. IEEE Trans. Neural Netw. 22(6), 962–968 (2011)
Shao, Y.H., Wang, Z., Chen, W.J., Deng, N.Y.: A regularization for the projection twin support vector machine. Knowl. Based Syst. 37, 203–210 (2013)
Shao, Y.H., Chen, W.J., Zhang, J.J., Wang, Z., Deng, N.Y.: An efficient weighted Lagrangian twin support vector machine for imbalanced data classification. Pattern Recogn. 47(9), 3158–3167 (2014)
Liu, M.Z., Shao, Y.H., Wang, Z., Li, C.N., Chen, W.J.: Minimum deviation distribution machine for large scale regression. Knowl. Based Syst. 146, 167–180 (2018)
Qi, Z.Q., Tian, Y.J., Shi, Y.: Robust twin support vector machine for pattern classification. Pattern Recogn. 46(1), 305–316 (2013)
Tian, Y.J., Qi, Z.Q., Ju, X.C., Shi, Y., Liu, X.H.: Nonparallel support vector machines for pattern classification. IEEE Trans. Cybern. 44(7), 1067–1079 (2014)
Tian, Y.J., Ping, Y.: Large-scale linear nonparallel support vector machine solver. Neural Netw. 50, 166–174 (2014)
Shao, Y.H., Chen, W.J., Wang, Z., Li, C.N., Deng, N.Y.: Weighted linear loss twin support vector machine for large-scale classification. Knowl. Based Syst. 73, 276–288 (2015)
Shao, Y.H., Chen, W.J., Deng, N.Y.: Nonparallel hyperplane support vector machine for binary classification problems. Inf. Sci. 263, 22–35 (2014)
Hsieh, C.J., Chang, K.W., Lin, C.J., Keerthi, S.S., Sundararajan, S.: A dual coordinate descent method for large-scale linear SVM. In: Proceedings of the 25th International Conference on Machine Learning, pp. 408–415. ACM (2008)
Shao, Y.H., Li, C.N., Liu, M.Z., Wang, Z., Deng, N.Y.: Sparse \(L_q\)-norm least squares support vector machine with feature selection. Pattern Recogn. 78, 167–181 (2018)
Wang, Z., Shao, Y.H., Bai, L., Liu, L.M., Deng, N.Y.: Stochastic gradient twin support vector machine for large scale problems. arXiv preprint arXiv:1704.05596 (2017)