Abstract
Imbalanced classification techniques are widely used in machine learning applications where instances of the majority (or negative) class far outnumber those of the minority (or positive) class. Meanwhile, feature selection (FS) is a key technique for high-dimensional classification, as it can greatly improve both classification performance and computational efficiency. However, most studies of feature selection and imbalanced classification are restricted to offline batch learning, which is not well suited to many practical scenarios. In this paper, we aim to solve the high-dimensional imbalanced classification problem accurately and efficiently in an online fashion using only a small number of active features, and we propose two novel online learning algorithms for this purpose. In our approach, a classifier involving only a small, fixed number of features is constructed to classify a sequence of imbalanced data received in an online manner. We formulate the construction of such an online learner as an optimization problem and solve it iteratively based on the passive-aggressive (PA) algorithm together with a truncated gradient (TG) method. We evaluate the proposed algorithms on several real-world datasets, and the experimental results demonstrate their effectiveness in comparison with the baselines.
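The per-round update described in the abstract — a passive-aggressive margin step followed by truncated-gradient sparsification under a fixed feature budget — can be sketched as follows. This is a minimal illustration of the general PA + TG recipe, not the paper's exact algorithm; all names and parameters (`C`, `lam`, `B`) are illustrative assumptions.

```python
import numpy as np

def pa_truncated_update(w, x, y, C=1.0, lam=0.01, B=10):
    """One online round: PA-I margin update followed by truncated
    gradient sparsification, keeping at most B active features.

    w : current weight vector; x : feature vector; y : label in {-1, +1}
    C : PA-I aggressiveness cap; lam : truncation threshold; B : feature budget
    """
    # Passive-aggressive step: update only when the hinge loss is positive.
    loss = max(0.0, 1.0 - y * np.dot(w, x))
    if loss > 0.0:
        tau = min(C, loss / (np.dot(x, x) + 1e-12))  # PA-I step size
        w = w + tau * y * x
    # Truncated gradient: shrink every weight toward zero by lam,
    # zeroing out any weight whose magnitude falls below lam.
    w = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)
    # Enforce the fixed feature budget: keep only the B largest-magnitude weights.
    if np.count_nonzero(w) > B:
        keep = np.argsort(np.abs(w))[-B:]
        mask = np.zeros_like(w)
        mask[keep] = 1.0
        w = w * mask
    return w
```

The soft-thresholding step provides sparsity in the style of Langford et al.'s truncated gradient, while the final truncation guarantees the classifier never uses more than the fixed budget of active features.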
References
Longadge R, Dongre S S, Malik L. Class imbalance problem in data mining: Review. International Journal of Computer Science and Network, 2013, 2(1): 1305-1707.
Dash M, Liu H. Feature selection for classification. Intelligent Data Analysis, 1997, 1(1/2/3/4): 131-156.
Mladenic D, Grobelnik M. Feature selection for unbalanced class distribution and Naive Bayes. In Proc. the 16th Int. Conf. Machine Learning, June 1999, pp.258-267.
Guyon I, Elisseeff A. An introduction to variable and feature selection. Journal of Machine Learning Research, 2003, 3: 1157-1182.
Wasikowski M, Chen X. Combating the small sample class imbalance problem using feature selection. IEEE Trans. Knowl. Data Eng., 2010, 22(10): 1388-1400.
Hoi S C H, Wang J, Zhao P, Jin R. Online feature selection for mining big data. In Proc. the 1st Int. Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, August 2012, pp.93-100.
Wang J, Zhao P, Hoi S C H, Jin R. Online feature selection and its application. IEEE Trans. Knowl. Data Eng., 2014, 26(3): 698-710.
Forman G. An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 2003, 3: 1289-1305.
Zheng Z, Wu X, Srihari R K. Feature selection for text categorization on imbalanced data. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 80-89.
Chawla N V, Japkowicz N, Kotcz A. Editorial: Special issue on learning from imbalanced data sets. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 1-6.
Langford J, Li L, Zhang T. Sparse online learning via truncated gradient. Journal of Machine Learning Research, 2009, 10: 777-801.
Chawla N V, Bowyer K W, Hall L O, Kegelmeyer W P. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
Maldonado S, Weber R, Famili F. Feature selection for high-dimensional class-imbalanced data sets using Support Vector Machines. Information Sciences: an International Journal, 2014, 286: 228-246.
Tax D M J, Duin R P W. Support vector data description. Machine Learning, 2004, 54(1): 45-66.
Zhou Z, Liu X. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans. Knowl. Data Eng., 2006, 18(1): 63-77.
Zhao P, Hoi S C H. Cost-sensitive online active learning with application to malicious URL detection. In Proc. the 19th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Aug. 2013, pp.919-927.
Wang J, Zhao P, Hoi S C H. Cost-sensitive online classification. IEEE Trans. Knowl. Data Eng., 2014, 26(10): 2425-2438.
Yu L, Liu H. Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 2004, 5: 1205-1224.
Chen X, Jeong J C. Minimum reference set based feature selection for small sample classification. In Proc. the 24th Int. Conf. Machine Learning, June 2007, pp.153-160.
Wu Q, Ye Y, Zhang H, Ng M K, Ho S S. ForesTexter: An efficient random forest algorithm for imbalanced text categorization. Knowledge-Based Systems, 2014, 67: 105-116.
Wu Q, Ye Y, Liu Y, Ng M K. SNP selection and classification of genome-wide SNP data using stratified sampling random forests. IEEE Trans. Nanobioscience, 2012, 11(3): 216-227.
Rosenblatt F. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 1958, 65(6): 386-408.
Freund Y, Schapire R E. Large margin classification using the perceptron algorithm. Machine Learning, 1999, 37(3): 277-296.
Crammer K, Dekel O, Keshet J, Shalev-Shwartz S, Singer Y. Online passive-aggressive algorithms. Journal of Machine Learning Research, 2006, 7: 551-585.
Kubat M, Matwin S. Addressing the curse of imbalanced training sets: One-sided selection. In Proc. the 14th Int. Conf. Machine Learning, July 1997, pp.179-186.
Cite this article
Han, C., Tan, YK., Zhu, JH. et al. Online Feature Selection of Class Imbalance via PA Algorithm. J. Comput. Sci. Technol. 31, 673–682 (2016). https://doi.org/10.1007/s11390-016-1656-0