Diabetes Multiclass Prediction Using Ensemble Learning Techniques

https://doi.org/10.61710/kjcs.v2i4.87

Authors

  • Akbal Salman, Space Technology Engineering Department, Electrical Engineering Technical College, Middle Technical University, Iraq

Keywords:

Diabetes, PIMA dataset, Naïve Bayes, Random Forest, SVM, Multiclass diabetes dataset

Abstract

Diabetes is one of the most prevalent diseases of the modern era and, according to the World Health Organization, causes a significant number of deaths each year. Early prediction of diabetes can substantially improve patient outcomes and save lives. This study introduces a new model for predicting diabetes using the Random Forest algorithm, known for its ability to keep splitting the data until it reaches an optimal state. Two datasets are used: the Multiclass Diabetes dataset and the PIMA Indian Diabetes dataset. The data are preprocessed by removing outliers, handling missing values, and balancing the classes. The preprocessed data are then classified with the Random Forest algorithm, which splits the data repeatedly until the stopping criteria are met, in order to identify diabetic individuals. The proposed model performed best on the Multiclass Diabetes dataset, where it achieved a validation accuracy of 100%, a precision of 98.20%, a recall of 98.11%, and an F1 score of 98.12%. On the PIMA dataset, it achieved a validation accuracy of 85.30%, with precision, recall, and F1 score of 88.07%, 87.50%, and 87.50%, respectively. In addition to the proposed model, we built several other machine learning models on the first dataset: SVM, logistic regression, logistic regression with L1/L2 regularization, K-NN, and Naïve Bayes. Our results indicate that the Random Forest algorithm significantly outperforms these other machine learning techniques in predicting diabetes, offering a highly accurate and reliable tool for early diagnosis. This research underscores the potential of ensemble learning in healthcare, particularly in managing chronic diseases such as diabetes.
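
The three preprocessing steps named in the abstract (handling missing values, removing outliers, balancing the classes) can be illustrated with a minimal sketch. The abstract does not say which techniques were used, so median imputation, the 1.5 × IQR rule, and random oversampling are assumptions chosen for illustration, and the column name `glucose`, the class labels, and the toy values are hypothetical rather than taken from either dataset.

```python
import random
import statistics


def impute_median(rows, col):
    """Replace missing (None) values in `col` with the median of observed values."""
    observed = [r[col] for r in rows if r[col] is not None]
    med = statistics.median(observed)
    return [{**r, col: med if r[col] is None else r[col]} for r in rows]


def drop_iqr_outliers(rows, col, k=1.5):
    """Keep rows whose `col` value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles([r[col] for r in rows], n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in rows if low <= r[col] <= high]


def oversample(rows, label, seed=0):
    """Randomly duplicate minority-class rows until every class has equal size."""
    rng = random.Random(seed)
    by_class = {}
    for r in rows:
        by_class.setdefault(r[label], []).append(r)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap to the largest class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Toy records (hypothetical values, not taken from the paper's datasets).
records = [
    {"glucose": 90, "label": "N"},
    {"glucose": 95, "label": "N"},
    {"glucose": None, "label": "N"},   # missing value to be imputed
    {"glucose": 100, "label": "N"},
    {"glucose": 105, "label": "N"},
    {"glucose": 110, "label": "P"},
    {"glucose": 150, "label": "Y"},
    {"glucose": 160, "label": "Y"},
    {"glucose": 900, "label": "Y"},    # implausible reading to be dropped
]

clean = impute_median(records, "glucose")
clean = drop_iqr_outliers(clean, "glucose")
balanced = oversample(clean, "label")

labels = [r["label"] for r in balanced]
print({c: labels.count(c) for c in sorted(set(labels))})
```

In practice, balancing is usually applied only to the training split, so that duplicated rows cannot leak into the validation data. The balanced rows would then be passed to a Random Forest implementation for classification.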

Published

2024-12-25

How to Cite

Salman, A. (2024). Diabetes Multiclass Prediction Using Ensemble Learning Techniques. AlKadhim Journal for Computer Science, 2(4), 10–22. https://doi.org/10.61710/kjcs.v2i4.87

Section

Computer Science