Adaptation Proposed Methods for Handling Imbalanced Datasets based on Over-Sampling Technique

Liqaa M. Shoohi; Jamila H. Saud

doi:10.23851/mjs.v31i2.740

Authors

Liqaa M. Shoohi, Dept. Computer Science, College of Science, Mustansiriyah University.
Jamila H. Saud, Dept. Computer Science, College of Science, Mustansiriyah University. http://orcid.org/0000-0003-1872-160X

DOI:

https://doi.org/10.23851/mjs.v31i2.740

Keywords:

Imbalanced Datasets, O.S., SMOTE, Borderline-SMOTE, ADASYN.

Abstract

Classification of imbalanced data is an important problem. Many classification algorithms, such as Back Propagation (BP) neural networks, decision trees, and Bayesian networks, have been developed and applied repeatedly in many fields. These algorithms suffer from the problem of imbalanced data, in which some classes contain many more instances than others. Imbalanced data lead to poor performance and a bias toward one class at the expense of the others. In this paper, we propose three techniques based on the Over-Sampling (O.S.) technique for processing an imbalanced dataset and redistributing it into a balanced dataset: Improved Synthetic Minority Over-Sampling Technique (Improved SMOTE), Borderline-SMOTE + Imbalanced Ratio (IR), and Adaptive Synthetic Sampling (ADASYN) + IR. These techniques generate synthetic samples for the minority class to achieve balance between the minority and majority classes, and then calculate the IR between the two classes. Experimental results show that the Improved SMOTE algorithm outperforms the Borderline-SMOTE + IR and ADASYN + IR algorithms because it achieves a higher degree of balance between the minority and majority classes.
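The general idea of over-sampling followed by an IR check can be illustrated with a minimal sketch. It is not the Improved SMOTE, Borderline-SMOTE + IR, or ADASYN + IR algorithm proposed in the paper; it shows only the basic SMOTE-style interpolation between a minority sample and one of its nearest minority neighbours. The function names, the toy data, and the definition of IR as majority count divided by minority count are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: majority count / minority count (1.0 = balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (basic SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical toy data: 8 majority points (label 0), 3 minority points (label 1).
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]
labels = [0] * len(majority) + [1] * len(minority)

ir_before = imbalance_ratio(labels)
# Generate just enough synthetic minority points to match the majority count.
new_points = smote_like_oversample(minority, len(majority) - len(minority))
labels_after = labels + [1] * len(new_points)
ir_after = imbalance_ratio(labels_after)
print(ir_before, ir_after)
```

On this toy data the IR falls from 8/3 to 1.0 after five synthetic minority points are added. Each synthetic point lies on a line segment between two real minority points, so it stays inside the region the minority class already occupies.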
