Abstract
The hierarchical classification with an imbalance class problem is a challenge for in machine learning, and is caused by data with an uneven distribution. Learning from an imbalanced dataset can lead to performance degradation of the classifier. Cost-sensitive learning is a useful solution for handling the gap probability of majority and minority classes. This paper proposes a cost-sensitive hierarchical classification for imbalance classes (CSHCIC), constructing a cost-sensitive factor to balance the relationship between majority and minority classes. First, we divide a large hierarchical classification task into several small subclassification tasks by class hierarchy. Second, we establish a cost-sensitive factor by more precisely using the number of different samples of subclassifications. Then, we calculate the probability of every node using logistic regression. Lastly, we update the cost-sensitive factor using the flexibility factor and the number of samples. The experimental results show that the cost-sensitive hierarchical classification method achieves excellent performance on handling imbalance class datasets. The running time cost of the proposed method is smaller than most state-of-the-art methods.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Notes
Datasets and Matlab code in this research have been uploaded to GitHub. They are accessible by the following link: https://github.com/fhqxa//APIN-D-19-01226.
References
Batista G, Prati R, Monard M (2004) A study of the behavior of several methods for balancing machine learning training data. Acm Sigkdd Explor Newslett 6(1):20–29
Braytee A, Wei L, Kennedy P (2016) A cost-sensitive learning strategy for feature extraction from imbalanced data. In: International conference on neural information processing
Cao P, Zhao D, Zaiane O (2013) An optimized cost-sensitive SVM for imbalanced data learning. In: Pacific-Asia conference on knowledge discovery and data mining
Chung Y, Lin H, Yang S (2015) Cost-aware pre-training for multiclass cost-sensitive deep learning. Computer Science
Ding C, Dubchak I (2001) Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17(4):349–358
Duda R, Hart P, Stork D (2001) Pattern classification
Fan J, Zhang J, Mei K, Peng J, Gao L (2015) Cost-sensitive learning of hierarchical tree classifiers for large-scale image classification and novel category detection. Pattern Recogn 48(5):1673–1687
Fawcett T, Provost F (1997) Adaptive fraud detection. Data Min Knowl Disc 1(3):291–316
Grimaudo L, Mellia M, Baralis E (2012) Hierarchical learning for fine grained internet traffic classification. In: International wireless communications and mobile computing conference
He H, Garcia E (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284
Japkowicz N, Stephen S (2002) The class imbalance problem: a systematic study
Kai M (2002) An instance-weighting method to induce cost-sensitive trees. IEEE Trans Knowl Data Eng 14 (3):659–665
Khan S, Hayat M, Bennamoun M, Sohel F, Togneri R (2018) Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Trans Neural Netw Learn Syst 29(8):3573– 3587
Kira K, Rendell L (1992) A practical approach to feature selection. In: International workshop on machine learning
Krawczyk B, Woźniak M, Schaefer G (2014) Cost-sensitive decision tree ensembles for effective imbalanced classification. Appl Soft Comput 14(1):554–562
Li D, Ju Y, Zou Q (2016) Protein folds prediction with hierarchical structured SVM. Curr Proteomics 13(2):79–85
Liu J, Hu Q, Yu D (2008) A weighted rough set based method developed for class imbalance learning. Inform Sci 178(4):1235–1256
Liu X, Zhao H (2019) Hierarchical feature extraction based on discriminant analysis. Appl Intell 49 (7):2780–2792
Lu H, Xu Y, Ye M, Ke Y, Jin Q, Gao Z (2018) Learning misclassification costs for imbalanced datasets application in gene expression data classification
Liu X, Wu J, Zhou Z (2009) Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern B 39(2):539–550
Min F, Liu F, Wen L, Zhang Z (2018) Tri-partition cost-sensitive active learning through KNN. Soft Comput 10:1–16
Mullick S, Datta S, Das S (2018) Adaptive learning-based k-nearest neighbor classifiers with resilience to class imbalance. IEEE Trans Neural Netw Learn Syst 99:1–13
Murzin A, Brenner S, Hubbard T, Chothia C (1995) Scop: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247(4):536–540
Nakano F, Pinto W, Pappa G, Cerri R (2017) Top-down strategies for hierarchical classification of transposable elements with neural networks. In: International joint conference on neural networks
Nie F, Huang H, Xiao C, Ding C (2010) Efficient and robust feature selection via joint l2,1-norms minimization. In: International conference on neural information processing systems
Peng H, Long F, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell, 1226–1238
Prati R, Batista G, Monard M (2004) Class imbalances versus class overlapping: An analysis of a learning system behavior. Lect Notes Comput Sci 2972:312–321
Tao Q, Wu G, Wang F, Wang J (2005) Posterior probability support vector machines for unbalanced data. IEEE Trans Neural Netw 16(6):1561–1573
Qu Y, Lin L, Shen F, Lu C, Wu Y, Xie Y, Tao D (2017) Joint hierarchical category structure learning and large-scale image classification. IEEE Trans Image Process, 4331–4346
Sandrine D, Jane F (2002) A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biol 3(7):1–21
Sun A, Lim E (2001) Hierarchical text classification and evaluation. In: IEEE international conference on data mining
Sun Y, Kamel M, Wong A, Wang Y (2007) Cost-sensitive boosting for classification of imbalanced data. Pattern Recogn 40(12):3358–3378
Tuo Q, Zhao H, Hu Q (2019) Hierarchical feature selection with subtree based graph regularization. Knowl-Based Syst 163:996–1008
Wei L, Liao M, Gao X, Zou Q (2015) An improved protein structural prediction method by incorporating both sequence and structure information. IEEE Trans Nanobioscience 14(4):339–349
Xiao J, Hays J, Ehinger K, Oliva A, Torralba A (2010) Sun database: large-scale scene recognition from abbey to zoo. Proc IEEE Conf Comput Vis Pattern Recogn 23(3):3485–3492
Yu W, Hu Q, Zhou Y, Hong Z, Qian Y, Liang J (2017) Local bayes risk minimization based stopping strategy for hierarchical classification. In: IEEE International conference on data mining
Yuan X, Xie L, Abouelenien M (2017) A regularized ensemble framework of deep learning for cancer detection from multi-class, imbalanced training data. Pattern Recogn 77:160–172
Zadrozny B, Langford J, Abe N (2003) Cost-sensitive learning by cost-proportionate example weighting. In: IEEE International conference on data mining
Zhang C, Tan K, Li H, Hong G (2018) A cost-sensitive deep belief network for imbalanced classification. IEEE Trans Neural Netw Learn Syst 99:1–14
Zhou Y, Hu Q, Yu W (2018) Deep super-class learning for long-tail distributed image classification. Pattern Recogn, 118–128
Ashburner M, Ball C, Blake J, Botstein D, Cherry J (2000) Gene ontology: tool for the unification of biology. Nat Gen, 25–29
Gopal S, Yang Y (2015) Hierarchical Bayesian inference and recursive regularization for large-scale classification. Acm Trans Knowl Discov Data, 1–23
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant No. 61703196, the Natural Science Foundation of Fujian Province under Grant No. 2018J01549, and the President’s Fund of Minnan Normal University under Grant No. KJ19021.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Zheng, W., Zhao, H. Cost-sensitive hierarchical classification for imbalance classes. Appl Intell 50, 2328–2338 (2020). https://doi.org/10.1007/s10489-019-01624-z
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-019-01624-z