Adaptation Proposed Methods for Handling Imbalanced Datasets based on Over-Sampling Technique

Liqaa M. Shoohi, Jamila H. Saud

Abstract


Classification of imbalanced data is an important problem. Many classification algorithms, such as Back Propagation (BP) neural networks, decision trees, and Bayesian networks, have been developed and used widely in many fields. These algorithms suffer from the problem of imbalanced data, in which some classes contain many more instances than others. Imbalanced data lead to poor performance and bias the classifier toward one class at the expense of the others. In this paper, we propose three techniques based on Over-Sampling (O.S.) that redistribute an imbalanced dataset and convert it into a balanced one. The techniques are Improved Synthetic Minority Over-Sampling Technique (Improved SMOTE), Borderline-SMOTE + Imbalanced Ratio (IR), and Adaptive Synthetic Sampling (ADASYN) + IR. Each technique generates synthetic samples for the minority class to balance the minority and majority classes, and then calculates the IR between the two. Experimental results show that the Improved SMOTE algorithm outperforms the Borderline-SMOTE + IR and ADASYN + IR algorithms because it achieves a higher degree of balance between the minority and majority classes.
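As a rough illustration of the general idea described in the abstract (generate synthetic minority samples, then measure the balance between classes), the following minimal sketch implements plain SMOTE-style interpolation and an imbalance ratio. It is not the authors' Improved SMOTE, Borderline-SMOTE + IR, or ADASYN + IR. The function names, the toy data, and the definition of IR as the largest class size divided by the smallest class size are assumptions made for this example.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: size of the largest class / size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (basic SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 8 majority points and 3 minority points in 2-D.
majority = [(float(i), float(i % 3)) for i in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

labels_before = [0] * len(majority) + [1] * len(minority)
new_points = smote_like_oversample(
    minority, len(majority) - len(minority), k=2
)
labels_after = labels_before + [1] * len(new_points)

ir_before = imbalance_ratio(labels_before)  # 8 / 3
ir_after = imbalance_ratio(labels_after)    # 8 / 8
print(ir_before, ir_after)
```

Under this assumed definition, an IR of 1.0 means the classes are the same size, so a lower IR after over-sampling indicates a more balanced dataset.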


Keywords


Imbalanced Datasets, O.S., SMOTE, Borderline-SMOTE, ADASYN.



DOI: http://dx.doi.org/10.23851/mjs.v31i2.740



Copyright (c) 2020 Al-Mustansiriyah Journal of Science

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


Copyright (c) 2018 by Al-Mustansiriyah Journal of Science
ISSN: 1814-635X (Print), ISSN: 2521-3520 (online)