
DOI: 10.1145/3511808.3557474
Research article · Public Access

Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning

Published: 17 October 2022

Abstract

Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class. Over-sampling is an effective technique for tackling imbalanced learning by generating synthetic samples for the minority class. While numerous over-sampling algorithms have been proposed, they rely heavily on heuristics, which can be sub-optimal: different datasets and base classifiers may require different sampling strategies, and heuristics cannot directly optimize the performance metric. Motivated by this, we investigate a learning-based over-sampling algorithm that optimizes classification performance, which is challenging because of the huge and hierarchical decision space. At the high level, we need to decide how many synthetic samples to generate. At the low level, we need to determine where the synthetic samples should be located, which depends on the high-level decision, since the optimal locations may differ for different numbers of samples. To address these challenges, we propose AutoSMOTE, an automated over-sampling algorithm that jointly optimizes the different levels of decisions. Motivated by the success of SMOTE and its extensions, we formulate the generation process as a Markov decision process (MDP) with three levels of policies that generate synthetic samples within the SMOTE search space. We then leverage deep hierarchical reinforcement learning to optimize the performance metric on the validation data. Extensive experiments on six real-world datasets demonstrate that AutoSMOTE significantly outperforms state-of-the-art resampling algorithms. The code is available at https://github.com/daochenzha/autosmote
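To make the hierarchical decision space concrete, here is a minimal sketch of SMOTE-style over-sampling that separates the high-level decision (how many synthetic samples to generate, `n_new`) from the low-level decisions (which anchor, which neighbor, and what interpolation weight, i.e. where each sample is located). This is an illustrative sketch with random decisions, not the paper's AutoSMOTE implementation; in AutoSMOTE these draws would be produced by learned RL policies, and the function name and signature here are assumptions.

```python
import numpy as np

def smote_style_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    High-level decision:  n_new -- how many samples to generate.
    Low-level decisions:  anchor sample, nearest neighbor, and interpolation
                          weight -- where each synthetic sample is located.
    """
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # Pairwise distances among minority samples (brute force for clarity).
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude self from the neighbor set
    # Indices of the k nearest minority neighbors of each sample.
    nn = np.argsort(d, axis=1)[:, :k]
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)                      # low level: pick an anchor
        j = nn[i, rng.integers(min(k, n - 1))]   # low level: pick a neighbor
        lam = rng.random()                       # low level: interpolation weight
        # Synthetic sample lies on the segment between anchor and neighbor.
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Because each synthetic point is a convex combination of two minority samples, the generated data stays inside the convex hull of the minority class; the learning problem AutoSMOTE addresses is choosing these decisions to directly optimize the validation metric rather than drawing them at random.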

Supplementary Material

MP4 File (CIKM22-fp0772.mp4)
Presentation video


Cited By

  • (2024) How Automated Machine Learning Can Improve Business. Applied Sciences, 14(19), 8749. DOI: 10.3390/app14198749. Online publication date: 27-Sep-2024.
  • (2024) Energy-efficient access point clustering and power allocation in cell-free massive MIMO networks: a hierarchical deep reinforcement learning approach. EURASIP Journal on Advances in Signal Processing, 2024(1). DOI: 10.1186/s13634-024-01111-9. Online publication date: 26-Jan-2024.
  • (2023) Bias in Reinforcement Learning: A Review in Healthcare Applications. ACM Computing Surveys, 56(2), 1-17. DOI: 10.1145/3609502. Online publication date: 15-Sep-2023.
  • (2023) Tackling Diverse Minorities in Imbalanced Classification. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM '23), 1178-1187. DOI: 10.1145/3583780.3615071. Online publication date: 21-Oct-2023.
  • (2022) DreamShard. In Proceedings of the 36th International Conference on Neural Information Processing Systems (NeurIPS '22), 15190-15203. DOI: 10.5555/3600270.3601375. Online publication date: 28-Nov-2022.


Published In

CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
October 2022, 5274 pages
ISBN: 9781450392365
DOI: 10.1145/3511808
General Chairs: Mohammad Al Hasan, Li Xiong

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. automated machine learning
      2. classification
      3. imbalanced learning
      4. reinforcement learning


Acceptance Rates

CIKM '22 paper acceptance rate: 621 of 2,257 submissions (28%)
Overall acceptance rate: 1,861 of 8,427 submissions (22%)



