Training Cost-Sensitive Neural Networks with Methods Addressing the Class Imbalance Problem

Published: 01 January 2006

Abstract

This paper empirically studies the effect of sampling and threshold-moving in training cost-sensitive neural networks. Both oversampling and undersampling are considered; these techniques modify the distribution of the training data so that the costs of the examples are conveyed explicitly by how often the examples appear. Threshold-moving tries to move the output threshold toward inexpensive classes so that examples with higher costs become harder to misclassify. Moreover, hard-ensemble and soft-ensemble, i.e., combinations of the above techniques via hard or soft voting schemes, are also tested. Twenty-one UCI data sets with three types of cost matrices and a real-world cost-sensitive data set are used in the empirical study. The results suggest that cost-sensitive learning with multiclass tasks is more difficult than with two-class tasks, and that a higher degree of class imbalance may increase the difficulty. They also reveal that almost all the techniques are effective on two-class tasks, while most are ineffective, and may even have a negative effect, on multiclass tasks. Overall, threshold-moving and soft-ensemble are relatively good choices for training cost-sensitive neural networks. The empirical study also suggests that some methods believed to be effective in addressing the class imbalance problem may, in fact, only be effective on learning with imbalanced two-class data sets.
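The abstract only names the decision rules; as a minimal sketch of the two techniques it recommends (the cost vector, the normalization, and the averaging scheme below are illustrative assumptions, not the paper's exact formulation), threshold-moving can be read as rescaling a network's class probabilities by per-class misclassification costs before taking the argmax, and soft-ensemble as averaging such cost-rescaled outputs across several learners:

```python
import numpy as np

def threshold_moving(probs, cost):
    """Pick classes after rescaling outputs by misclassification cost.

    probs : (n_samples, n_classes) predicted class probabilities
    cost  : (n_classes,) cost of misclassifying an example of class i;
            expensive classes get a larger share of the output.
    """
    scaled = probs * cost                        # favor costly classes
    scaled = scaled / scaled.sum(axis=1, keepdims=True)
    return scaled.argmax(axis=1)

def soft_ensemble(prob_list, cost):
    """Average the cost-rescaled outputs of several learners (soft voting)."""
    avg = np.mean([p * cost for p in prob_list], axis=0)
    return avg.argmax(axis=1)

# A cheap class 0 vs. an expensive class 1: raw argmax would pick class 0,
# but with costs [1, 2] threshold-moving flips the decision to class 1
# (0.6 * 1 = 0.6 < 0.4 * 2 = 0.8).
probs = np.array([[0.6, 0.4]])
print(threshold_moving(probs, np.array([1.0, 2.0])))  # -> [1]
```

This captures why threshold-moving needs no retraining: the network is trained as usual, and the cost information enters only at decision time.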




        Published In

        IEEE Transactions on Knowledge and Data Engineering, Volume 18, Issue 1, January 2006, 142 pages

        Publisher

        IEEE Educational Activities Department, United States


        Author Tags

        1. Machine learning
        2. class imbalance learning
        3. cost-sensitive learning
        4. data mining
        5. ensemble learning
        6. neural networks
        7. sampling
        8. threshold-moving

        Qualifiers

        • Research-article


        Cited By

        • (2024) Predictive Modeling of Pulmonary Arterial Hypertension Based on Phonocardiogram Signals. Proceedings of the 2024 16th International Conference on Computer Modeling and Simulation, 10.1145/3686812.3686816 (1-0). Online publication date: 21-Jun-2024.
        • (2024) Mastering Long-Tail Complexity on Graphs: Characterization, Learning, and Generalization. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 10.1145/3637528.3671880 (3045-3056). Online publication date: 25-Aug-2024.
        • (2024) Cost-Sensitive Trees for Interpretable Reinforcement Learning. Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), 10.1145/3632410.3632443 (91-99). Online publication date: 4-Jan-2024.
        • (2024) ChatPRCS: A Personalized Support System for English Reading Comprehension Based on ChatGPT. IEEE Transactions on Learning Technologies, 10.1109/TLT.2024.3405747, 17 (1762-1776). Online publication date: 27-May-2024.
        • (2024) Revisiting the Effective Number Theory for Imbalanced Learning. IEEE Transactions on Knowledge and Data Engineering, 10.1109/TKDE.2024.3367949, 36:8 (4192-4206). Online publication date: 1-Aug-2024.
        • (2024) LT-SEI: Long-Tailed Specific Emitter Identification Based on Decoupled Representation Learning in Low-Resource Scenarios. IEEE Transactions on Intelligent Transportation Systems, 10.1109/TITS.2023.3308716, 25:1 (929-943). Online publication date: 1-Jan-2024.
        • (2024) Relabeling & raking algorithm for imbalanced classification. Expert Systems with Applications: An International Journal, 10.1016/j.eswa.2024.123274, 247:C. Online publication date: 1-Aug-2024.
        • (2024) A membership-based resampling and cleaning algorithm for multi-class imbalanced overlapping data. Expert Systems with Applications: An International Journal, 10.1016/j.eswa.2023.122565, 240:C. Online publication date: 15-Apr-2024.
        • (2024) ECC++. Expert Systems with Applications: An International Journal, 10.1016/j.eswa.2023.121366, 236:C. Online publication date: 1-Feb-2024.
        • (2024) Hybrid resampling and weighted majority voting for multi-class anomaly detection on imbalanced malware and network traffic data. Engineering Applications of Artificial Intelligence, 10.1016/j.engappai.2023.107568, 128:C. Online publication date: 14-Mar-2024.
