Adaptation Proposed Methods for Handling Imbalanced Datasets based on Over-Sampling Technique
Keywords: Imbalanced Datasets, O.S., SMOTE, Borderline-SMOTE, ADASYN.
Classification of imbalanced data is an important issue. Many classification algorithms have been developed, such as Back Propagation (BP) neural networks, decision trees, and Bayesian networks, and they have been used widely in many fields. These algorithms suffer from the problem of imbalanced data, in which some classes contain far more instances than others. Imbalanced data result in poor performance and a bias toward one class at the expense of the others. In this paper, we propose three techniques based on the Over-Sampling (O.S.) technique for processing an imbalanced dataset, redistributing it, and converting it into a balanced dataset. These techniques are the Improved Synthetic Minority Over-Sampling Technique (Improved SMOTE), Borderline-SMOTE + Imbalanced Ratio (IR), and Adaptive Synthetic Sampling (ADASYN) + IR. Each technique generates synthetic samples for the minority class to achieve balance between the minority and majority classes, and then calculates the IR between the minority and majority classes. Experimental results show that the Improved SMOTE algorithm outperforms the Borderline-SMOTE + IR and ADASYN + IR algorithms because it achieves a high degree of balance between the minority and majority classes.
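The general idea of over-sampling followed by an IR calculation can be illustrated with a minimal sketch. It shows only the basic SMOTE interpolation step, not the Improved SMOTE, Borderline-SMOTE + IR, or ADASYN + IR algorithms proposed in the paper. The function names `imbalance_ratio` and `smote_like`, the toy data, and the definition of IR as the majority count divided by the minority count are assumptions made for illustration.

```python
import random
from math import dist


def imbalance_ratio(n_majority, n_minority):
    """IR, assumed here to be majority count / minority count."""
    return n_majority / n_minority


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each placed on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 20 majority points and 4 minority points in 2-D.
majority = [(float(i), float(i % 5)) for i in range(20)]
minority = [(0.0, 10.0), (1.0, 11.0), (2.0, 10.5), (1.5, 12.0)]

ir_before = imbalance_ratio(len(majority), len(minority))

# Generate enough synthetic minority points to match the majority size.
new_points = smote_like(minority, len(majority) - len(minority))
balanced_minority = minority + new_points

ir_after = imbalance_ratio(len(majority), len(balanced_minority))
print(ir_before, ir_after)
```

Under this assumed definition, an IR of 1 means the two classes are the same size, so comparing the IR before and after over-sampling gives a simple measure of how much balance a technique achieves.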