@phdthesis{192,
  author   = {Moeketsi Ndaba and Anban Pillay and Absalom Ezugwu},
  title    = {A Comparative Study of Machine Learning Techniques for Classifying Type II Diabetes Mellitus},
  abstract = {Diabetes is a metabolic disorder that develops when the body does not make enough insulin or is not able to use insulin effectively. Accurate and early detection of diabetes can aid in effective management of the disease. Several machine learning techniques have shown promise as cost-effective ways for early diagnosis of the disease to reduce the occurrence of health complications arising from delayed diagnosis. This study compares the efficacy of three broad machine learning approaches, viz. Artificial Neural Networks (ANNs), instance-based classification, and statistical regression, for diagnosing type II diabetes. For each approach, the study proposes novel techniques that extend the state of the art. The new techniques include Artificial Neural Networks hybridized with an improved K-Means clustering and a boosting technique, as well as improved variants of Logistic Regression (LR), the K-Nearest Neighbours algorithm (KNN), and K-Means clustering. The techniques were evaluated on the Pima Indian diabetes dataset and the results were compared to recent results reported in the literature. The highest classification accuracy of 100% with 100% sensitivity and 100% specificity was achieved using an ensemble of the boosting technique, the enhanced K-Means clustering algorithm (CVE-K-Means), and the Generalized Regression Neural Network (GRNN): B-KGRNN. A hybrid of the CVE-K-Means algorithm and GRNN (KGRNN) achieved the best accuracy of 86% with 83% sensitivity. The improved LR model (LR-n) achieved the highest classification accuracy of 84% with 72% sensitivity. The new multi-layer perceptron (MLP-BPX) achieved the best accuracy of 82% and 72% sensitivity. A hybrid of KNN and CVE-K-Means (CKNN) achieved the best accuracy of 81% and 89% sensitivity. The CVE-K-Means technique achieved the best accuracy of 80% and 61% sensitivity. The B-KGRNN, KGRNN, LR-n, and CVE-K-Means techniques outperformed similar techniques in the literature in terms of classification accuracy by 15%, 1%, 2%, and 3%, respectively. The CKNN and KGRNN techniques proved to have lower computational complexity than the standard KNN and GRNN algorithms. Employing data pre-processing techniques such as feature extraction and missing value removal improved the classification accuracy of machine learning techniques by more than 11% in most instances.},
  year     = {2018},
  type     = {MSc},
}