Support vector machine in big data: smoothing strategy and adaptive distributed inference

  • Original Paper
  • Published in: Statistics and Computing

Abstract

The support vector machine (SVM) is a powerful binary classification tool, but the growing size of modern data poses challenges for it. First, the non-smoothness of the hinge loss causes difficulties in large-scale computation. Second, existing large-scale distributed algorithms rely heavily on uniformity and randomness conditions, which are frequently violated in practice. To address these issues, we first construct a convolution-smoothed SVM, which enjoys a smooth and convex objective function. We then develop a distributed SVM, whose estimator can be computed conveniently by minimizing a pilot-sample-based distributed surrogate loss. In particular, the method adapts when the uniformity or randomness condition is violated. The established theoretical results and numerical experiments on both synthetic and real data confirm the effectiveness of the proposed methods.
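To illustrate the smoothing idea in the abstract, the sketch below convolves the hinge loss with a Gaussian kernel of bandwidth h, which yields a smooth, convex surrogate with a closed form. This is a minimal single-machine sketch under assumed choices (Gaussian kernel, ridge penalty `lam`, L-BFGS solver, helper names `smoothed_hinge` and `fit_smoothed_svm` are hypothetical), not the authors' exact procedure or their distributed algorithm.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def smoothed_hinge(margins, h=0.5):
    """Gaussian-kernel convolution smoothing of the hinge loss max(0, 1 - margin).

    Writing m = 1 - margin and smoothing with a N(0, h^2) kernel gives the
    closed form E[max(0, m + h*Z)] = m * Phi(m/h) + h * phi(m/h), Z ~ N(0, 1),
    which is smooth and convex in m and recovers the hinge loss as h -> 0.
    """
    m = 1.0 - margins
    return m * norm.cdf(m / h) + h * norm.pdf(m / h)

def fit_smoothed_svm(X, y, h=0.5, lam=1e-2):
    """Fit a linear classifier by minimizing the smoothed hinge risk + ridge penalty."""
    n, p = X.shape

    def obj(beta):
        margins = y * (X @ beta)  # y in {-1, +1}
        return smoothed_hinge(margins, h).mean() + lam * beta @ beta

    # The smoothed objective is differentiable, so a quasi-Newton solver applies.
    res = minimize(obj, np.zeros(p), method="L-BFGS-B")
    return res.x
```

Because the smoothed objective is convex and differentiable, standard gradient-based solvers can replace the subgradient or QP machinery that the raw hinge loss would require; this is the computational benefit the abstract refers to.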


Data availability

No datasets were generated or analysed during the current study.


Author information

Contributions

Kangning Wang and Xiaofei Sun developed the methods and wrote the main manuscript text. Jin Liu prepared numerical studies. All authors reviewed the manuscript.

Corresponding author

Correspondence to Xiaofei Sun.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The research was supported by the NNSF project of China (12401355) and the Humanity and Social Science Foundation of the Ministry of Education of China (24YJC910009).

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 2412 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Wang, K., Liu, J. & Sun, X. Support vector machine in big data: smoothing strategy and adaptive distributed inference. Stat Comput 34, 188 (2024). https://doi.org/10.1007/s11222-024-10506-5

