Explainable Artificial Intelligence Integrated Ensemble Learning Framework for Diabetes Prediction

Mohammed Mohammed Abdallazez

Abstract

Accurate prediction of diabetes from clinical and demographic indicators is essential because early detection of this chronic metabolic disorder plays a critical role in preventing long-term organ complications. However, existing research continues to face significant challenges, including pronounced class imbalance, a scarcity of large and diverse datasets, and limited integration of explainable artificial intelligence. This research compares several ensemble learning methods (Decision Tree, Random Forest, AdaBoost, Gradient Boosting, Histogram-based Gradient Boosting, Extremely Randomized Trees (Extra Trees), and XGBoost) on a large imbalanced dataset (87,664 negative vs. 8,482 positive samples). To mitigate the imbalance, we evaluate six resampling approaches: Random Over Sampling, Random Under Sampling, the Synthetic Minority Over-sampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), Tomek Links removal, and SMOTE with Edited Nearest Neighbors (SMOTE-ENN). We assess models using metrics robust to class imbalance (precision, recall, F1, AUC-ROC, and AUC-PR) together with calibration measures. The Extra Trees classifier, combined with Random Over Sampling to balance the dataset, achieved the highest measured accuracy (0.994). These results were also compared with several previous works and a number of other machine learning algorithms, and the proposed approach performed better. Explainability is provided at both global and local levels: permutation importance and SHAP for global feature importance, and Local Interpretable Model-Agnostic Explanations (LIME) force plots for instance-level reasoning. We further analyze the best result using sensitivity, specificity, PR-AUC, and calibration, and we report detailed experiments showing how the resampling method, hyperparameter tuning, and stratified validation influence performance. Finally, we provide clinically relevant insights from the SHAP analyses and discuss limitations and future directions for deploying interpretable models in screening workflows.
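
As a rough illustration of why the abstract pairs accuracy with imbalance-aware metrics, the sketch below works through the reported class counts (87,664 negative, 8,482 positive; 96,146 in total). A classifier that always predicts "negative" would already reach roughly 0.912 accuracy on such data while detecting no positive case. The second function is a minimal, standard-library version of random oversampling; it is not the authors' implementation, and the function names, the toy data, and the fixed seed are assumptions made for illustration only.

```python
import random
from collections import Counter

# Class counts as reported in the abstract.
N_NEG, N_POS = 87_664, 8_482


def baseline_stats(n_neg, n_pos):
    """Summarise imbalance and the accuracy of an always-negative classifier."""
    total = n_neg + n_pos
    return {
        "total": total,
        "prevalence": n_pos / total,          # share of positive samples
        "imbalance_ratio": n_neg / n_pos,     # negatives per positive
        "majority_accuracy": n_neg / total,   # accuracy with zero recall
    }


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class (with replacement)
    until every class has as many rows as the largest class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        indices = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            j = rng.choice(indices)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


stats = baseline_stats(N_NEG, N_POS)
print(stats)

# Toy data (hypothetical): ten negatives and one positive.
X_toy = [[i] for i in range(11)]
y_toy = [0] * 10 + [1]
X_bal, y_bal = random_oversample(X_toy, y_toy)
print(Counter(y_bal))
```

In practice, resampling of this kind is usually applied only to the training portion of each stratified split, so that duplicated minority rows do not leak into the data used for evaluation. Whether the study follows that protocol is not stated in the abstract.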

Article Details

How to Cite
Mohammed Abdallazez, M. (2025). Explainable Artificial Intelligence Integrated Ensemble Learning Framework for Diabetes Prediction. AlKadhim Journal for Computer Science, 3(4), 109–126. https://doi.org/10.61710/kjcs.v3i4.141
Section
Computer Science
