Document Type: Original Article
Authors
1
Department of Statistics, Vali-e-Asr University of Rafsanjan, Rafsanjan, Iran.
2
Department of Community Medicine, School of Medicine, Rafsanjan University of Medical Sciences, Rafsanjan, Iran
3
Clinical Research Development Unit, Ali-Ibn Abi-Talib Hospital, Rafsanjan University of Medical Sciences, Rafsanjan, Iran; Department of Internal Medicine, Ali-Ibn Abi-Talib Hospital, School of Medicine, Rafsanjan University of Medical
4
Department of Physiology, School of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran; Department of Pharmacology and Toxicology, School of Pharmacy, Hamadan University of Medical Sciences, Hamadan, Iran.
5
Physiology-Pharmacology Research Center, Research Institute of Basic Medical Sciences, Rafsanjan University of Medical Sciences, Rafsanjan, Iran; Department of Physiology and Pharmacology, School of Medicine, Rafsanjan University of
Abstract
Accurate diagnosis of infectious diseases such as COVID-19 requires statistically reliable classification methods that can handle complex, heterogeneous, and imbalanced data. In this study, five statistical and machine learning algorithms (logistic regression (LR), linear discriminant analysis, k-nearest neighbors, decision tree, and random forest (RF)) were compared using clinical and laboratory data from 506 hospitalized patients in Rafsanjan, Iran. The dataset included 27 categorical and 11 quantitative variables. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was employed. Model performance was assessed with several criteria: accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and the area under the ROC curve. The comparison showed that RF and LR achieved the best overall performance, while SMOTE improved sensitivity and NPV at the expense of specificity. The findings emphasize the importance of appropriate imbalance correction and multi-metric evaluation in developing statistically robust diagnostic models for medical data.
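The evaluation criteria named in the abstract all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those definitions, not the authors' code: the function name `diagnostic_metrics` and the counts passed to it are hypothetical and chosen only for illustration.

```python
import math


def _ratio(numerator, denominator):
    """Divide safely; return NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic metrics from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives.
    """
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "npv": _ratio(tn, tn + fn),          # negative predictive value
    }


# Hypothetical counts for illustration only; these are not study results.
metrics = diagnostic_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and NPV both have the false-negative count in their denominators, whereas specificity depends on the false-positive count. A correction that reduces missed positive cases while adding false alarms would therefore raise the first two and lower the third, which matches the trade-off reported for SMOTE.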