DOI: 10.1145/3055635.3056643
research-article

Combining Over-Sampling and Under-Sampling Techniques for Imbalance Dataset

Published: 24 February 2017

Abstract

Class imbalance is an important problem in medical data analysis because it can cause diagnostic errors, and diagnostic results affect patients' lives. If a doctor fails to diagnose a patient who has a disease, the patient cannot be treated in time. The problem can be readily addressed by adding or removing data so that the classes are closer to balanced, which improves diagnostic performance. This paper proposes adjusting an imbalanced dataset by combining the Neighborhood Cleaning Rule (NCL) with the Synthetic Minority Over-Sampling Technique (SMOTE). NCL removes outlying samples from the majority class, and SMOTE then adds synthetic samples to the minority class so that the dataset is closer to balanced. The balanced medical dataset is then classified with the Naive Bayes, SMO, and KNN algorithms. The experimental results show that models built from the balanced dataset can achieve a higher recall rate.
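The two resampling steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, binary-class version of NCL (3-nearest-neighbour voting, without the class-size condition of the full rule) followed by a SMOTE variant that picks base samples at random. The toy 2-D data, the labels "neg" and "pos", and the helper names `k_nearest`, `ncl_clean`, and `smote` are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def k_nearest(idx, X, k):
    """Indices of the k points in X closest to X[idx] (Euclidean), excluding idx."""
    others = [j for j in range(len(X)) if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:k]


def ncl_clean(X, y, minority, k=3):
    """Simplified NCL: remove majority samples that disagree with their neighbourhood."""
    drop = set()
    for i in range(len(X)):
        nbrs = k_nearest(i, X, k)
        vote = Counter(y[j] for j in nbrs).most_common(1)[0][0]
        if vote == y[i]:
            continue
        if y[i] != minority:
            # A majority sample misclassified by its neighbours is treated as noise.
            drop.add(i)
        else:
            # A misclassified minority sample is kept; its majority neighbours go.
            drop.update(j for j in nbrs if y[j] != minority)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote(X, y, minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling: interpolate between a minority sample and one
    of its k nearest minority neighbours. Assumes at least two minority samples."""
    rng = random.Random(seed)
    pts = [x for x, label in zip(X, y) if label == minority]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(pts))
        j = rng.choice(k_nearest(i, pts, k))
        gap = rng.random()  # position along the segment between the two points
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(pts[i], pts[j])))
    return X + synthetic, y + [minority] * n_new


# Hypothetical toy data: a majority cluster, a minority cluster, and one
# majority outlier, (3.1, 3.0), sitting inside the minority cluster.
majority = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6), (1.0, 0.4), (0.8, 0.9),
            (0.3, 0.3), (0.9, 0.1), (0.1, 0.9), (3.1, 3.0)]
minority = [(3.0, 3.0), (3.2, 3.4), (2.8, 3.3), (3.4, 2.9)]
X = majority + minority
y = ["neg"] * len(majority) + ["pos"] * len(minority)

# Step 1: under-sample the majority class with the NCL-style cleaning rule.
X_clean, y_clean = ncl_clean(X, y, minority="pos")

# Step 2: over-sample the minority class until both classes have equal size.
counts = Counter(y_clean)
X_bal, y_bal = smote(X_clean, y_clean, "pos", n_new=counts["neg"] - counts["pos"])

print(Counter(y), Counter(y_clean), Counter(y_bal))
```

In this sketch the cleaning step runs first so that SMOTE interpolates only between minority samples in an already cleaned neighbourhood, matching the order described in the abstract. The balanced output would then be passed to a classifier and evaluated on recall; that step is omitted here.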


Published In

ICMLC '17: Proceedings of the 9th International Conference on Machine Learning and Computing
February 2017
545 pages
ISBN: 9781450348171
DOI: 10.1145/3055635

In-Cooperation

  • Southwest Jiaotong University

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Classification
  2. Data Mining
  3. Imbalance Dataset
  4. Medical Data
  5. NCL
  6. SMOTE

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICMLC 2017
