Ensemble Machine Learning Approach for Anemia Classification Using Complete Blood Count Data

Rasha Jamal Hindi

doi:10.23851/mjs.v36i3.1709

Authors

Rasha Jamal Hindi Computer Science, College of Education, Mustansiriyah University, Baghdad, Iraq https://orcid.org/0009-0006-4700-2240

Keywords:

Anemia classification, Machine learning, Complete blood count, Ensemble methods, Decision tree

Abstract

Background: Anemia is a widespread global health issue affecting millions of individuals worldwide. Early and accurate diagnosis is essential for effective treatment. Traditional diagnostic approaches rely on complete blood count (CBC) parameters, which provide valuable clinical insights but may require advanced tools to enhance diagnostic accuracy.

Objective: This study aims to develop and evaluate machine learning models for classifying anemia subtypes from CBC data, and to assess how individual models and ensemble methods compare in diagnostic accuracy.

Methods: Five machine learning algorithms were implemented for the classification task: decision tree, random forest, XGBoost, gradient boosting, and neural networks. Beyond the individual models, three ensemble techniques (hard voting, soft voting, and stacking) were applied to improve performance.

Results: Experimental results demonstrated that ensemble methods significantly outperformed individual models in classification accuracy. The stacking ensemble achieved the highest accuracy, 98.44%, indicating superior performance in distinguishing anemia subtypes.

Conclusions: Ensemble learning methods, particularly stacking, can substantially improve the performance of machine learning models in anemia classification based on CBC data. These findings suggest that such ensemble techniques could be integrated into clinical decision-support systems to help healthcare providers make efficient and timely diagnoses.
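The abstract names hard and soft voting but does not describe their implementation. The following is a minimal sketch of how the two voting rules differ, not the study's code: the class labels, the three base models, and their probability outputs are all made up for illustration. Stacking, which instead trains a meta-model on the base models' outputs, is not shown.

```python
from collections import Counter

# Hypothetical class labels; the abstract does not list the anemia subtypes.
CLASSES = ["normal", "iron_deficiency", "thalassemia"]


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def hard_vote(probas):
    """Majority vote over each model's most probable class."""
    votes = [argmax(p) for p in probas]
    counts = Counter(votes)
    best = max(counts.values())
    # Assumption: ties go to the lowest class index (an arbitrary choice).
    return min(c for c, n in counts.items() if n == best)


def soft_vote(probas):
    """Average class probabilities across models, then take the argmax."""
    n_models = len(probas)
    n_classes = len(probas[0])
    mean = [sum(p[k] for p in probas) / n_models for k in range(n_classes)]
    return argmax(mean)


# Made-up probability outputs of three base models for a single sample.
sample = [
    [0.45, 0.55, 0.00],  # model A narrowly prefers class 1
    [0.40, 0.60, 0.00],  # model B narrowly prefers class 1
    [0.95, 0.05, 0.00],  # model C strongly prefers class 0
]

hard = hard_vote(sample)  # two of three models vote for class 1
soft = soft_vote(sample)  # averaged probabilities favour class 0
print("hard voting:", CLASSES[hard])
print("soft voting:", CLASSES[soft])
```

In this toy case the two rules disagree: hard voting counts only each model's top choice, whereas soft voting lets one confident model outweigh two uncertain ones.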
