Adaptation Proposed Methods for Handling Imbalanced Datasets based on Over-Sampling Technique
Keywords: Imbalanced Datasets, O.S., SMOTE, Borderline-SMOTE, ADASYN.
Classification of imbalanced data is an important issue. Many classification algorithms, such as Back Propagation (BP) neural networks, decision trees, and Bayesian networks, have been developed and applied repeatedly in many fields. These algorithms suffer from the problem of imbalanced data, in which some classes contain far more instances than others. Imbalanced data lead to poor performance and to a bias toward one class at the expense of the others. In this paper, we propose three techniques based on the Over-Sampling (O.S.) technique for processing an imbalanced dataset, redistributing it, and converting it into a balanced dataset. These techniques are the Improved Synthetic Minority Over-Sampling Technique (Improved SMOTE), Borderline-SMOTE + Imbalanced Ratio (IR), and Adaptive Synthetic Sampling (ADASYN) + IR. Each technique generates synthetic samples for the minority class to achieve balance between the minority and majority classes and then calculates the IR between the minority and majority classes. Experimental results show that the Improved SMOTE algorithm outperforms the Borderline-SMOTE + IR and ADASYN + IR algorithms because it achieves a closer balance between the minority and majority classes.
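The over-sampling and IR steps described in the abstract can be illustrated with a minimal Python sketch. It implements classic SMOTE interpolation (each synthetic sample is x + gap·(x_nn − x), with gap drawn uniformly from [0, 1) and x_nn one of the k nearest minority neighbours of x) and assumes IR is defined as the number of majority samples divided by the number of minority samples. It is not the paper's Improved SMOTE, Borderline-SMOTE + IR, or ADASYN + IR; the function names imbalance_ratio, k_nearest, and smote and the toy data are hypothetical.

```python
import math
import random


def imbalance_ratio(labels, minority, majority):
    """Assumed definition: IR = (# majority samples) / (# minority samples); 1.0 means balanced."""
    n_min = sum(1 for y in labels if y == minority)
    n_maj = sum(1 for y in labels if y == majority)
    return n_maj / n_min


def k_nearest(idx, points, k):
    """Indices of the k points closest (Euclidean) to points[idx], excluding idx itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(minority_points, n_new, k=5, seed=0):
    """Classic SMOTE (hypothetical helper): create n_new synthetic samples, each on
    the segment between a random minority sample and one of its k nearest
    minority neighbours."""
    if len(minority_points) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority_points))             # seed sample x
        j = rng.choice(k_nearest(i, minority_points, k))    # neighbour x_nn
        x, x_nn = minority_points[i], minority_points[j]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, x_nn)))
    return synthetic


# Toy example: 4 minority vs 12 majority samples (IR = 3.0 before balancing).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(float(i), float(i % 3) + 5.0) for i in range(12)]
labels = [1] * len(minority) + [0] * len(majority)
print(imbalance_ratio(labels, minority=1, majority=0))  # 3.0

# Generate exactly enough synthetic minority samples to reach IR = 1.0.
new_points = smote(minority, n_new=len(majority) - len(minority), k=3)
labels += [1] * len(new_points)
print(imbalance_ratio(labels, minority=1, majority=0))  # 1.0
```

Borderline-SMOTE and ADASYN keep this interpolation step but change which minority samples act as seeds: Borderline-SMOTE oversamples only minority samples for which at least half, but not all, of their nearest neighbours belong to the majority class, while ADASYN gives each minority sample a number of synthetic samples proportional to the fraction of majority samples among its neighbours.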
Articles accepted for publication in Al-Mustansiriyah Journal of Science (MJS) are protected under the Creative Commons Attribution 4.0 International License (CC BY-NC). Authors of accepted articles are requested to sign a copyright release form before their article is published. All authors must agree to the submission, sign copyright release forms, and agree to be included in any correspondence between MJS and the authors before submitting a work to MJS. For personal or educational use, permission is given without charge to print or create digital copies of all or portions of an MJS article. However, copies must not be produced or distributed for monetary gain. The copyright of any parts of this work that are not owned by MJS must be respected.