Covid-19 Prediction Model Using Data Mining Algorithms
Keywords: COVID-19, Diagnosis, Predicting, KCDC dataset, Correlation-based Feature Selection (CFS), ROC curve, SVM, NB
Since 2019, the world has been gripped by the pandemic of the coronavirus disease known as COVID-19, which made it necessary to search for ways to predict the disease before it occurs. Disease forecasting requires large databases, human effort, and high-speed technologies. Because of the rapid spread of the pandemic, scientists have turned to data mining methods to predict the disease. Prediction and early diagnosis through data mining algorithms reduce human error, save money, and help scientists make more accurate decisions. They also shorten the time needed to detect COVID-19 by using available algorithms and tools that rely on data such as lung images or symptoms such as body temperature. In this research, two data mining algorithms, support vector machine (SVM) and naïve Bayes (NB), were used to predict COVID-19 using a dataset from the Korea Centers for Disease Control (KCDC). After the data were preprocessed, feature selection was performed using Correlation-based Feature Selection (CFS). The performance measures showed that SVM is a better classifier than NB, with SVM achieving an accuracy, sensitivity, and specificity of 96.72%, 94.08%, and 97.96%, respectively. The receiver operating characteristic (ROC) curve also showed better performance for SVM than for NB in predicting COVID-19.
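The three performance measures named in the abstract can be computed from the counts of a binary confusion matrix. The following is a minimal sketch under stated assumptions: the function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper or from the KCDC dataset.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp: positive cases correctly predicted positive
    fn: positive cases wrongly predicted negative
    tn: negative cases correctly predicted negative
    fp: negative cases wrongly predicted positive
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    accuracy = (tp + tn) / (positives + negatives)
    sensitivity = tp / positives   # true positive rate
    specificity = tn / negatives   # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration only (not the paper's results).
acc, sens, spec = classification_metrics(tp=90, fn=10, tn=180, fp=20)
print(f"accuracy={acc:.4f} sensitivity={sens:.4f} specificity={spec:.4f}")
```

Applying the same function to the confusion matrix of each classifier would give directly comparable figures for SVM and NB.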
Copyright (c) 2022 Al-Mustansiriyah Journal of Science
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
The journal has no restrictions for the author to hold the copyrights of his articles. The journal does not allow authors to republish the same article in other journals or conferences that is published in one of its volumes.