Problem motivation: Multiple sclerosis (MS) is one of the most common autoimmune diseases of the central nervous system in young adults, in which the immune response is directed against the myelin sheath of nerve fibers. The pathology of MS is confined to the central nervous system and presents as inflammation around capillaries and myelin sheaths, visible as lesions on magnetic resonance imaging (MRI) scans. The disease typically follows one of three courses: (i) relapsing-remitting, (ii) secondary-progressive, and (iii) primary-progressive. Three-dimensional (3D) MRI brain scans are commonly used to diagnose MS and monitor its progression. The recommended standardized protocol for acquiring 3D MRI brain images includes the T1-weighted (T1w) and FLAIR modalities. Measurements, or biomarkers, obtained from MRI scans are used to predict the future course of MS, particularly the risk of increased disability according to the Expanded Disability Status Scale (EDSS). These biomarkers include total brain volume, atrophy of specific brain regions, and changes over time in ventricular volume, gray matter volume, and lesion volume and count. Predicting the future progression of MS, namely the course of EDSS, is an active research field. Recent research articles report that predictive models can classify patients with disease progression with up to 80% accuracy, which indicates the potential usefulness of such models as a supporting tool in the treatment of MS.
Data: The database used in this master's thesis was created as part of the Artificial Intelligence in predicting Progression in Multiple Sclerosis (AI ProMiS) study and contains 3D MRI scans of patients with MS, along with demographic and clinical patient data [16]. The dataset consists of 1284 T1w and FLAIR MRI scans obtained for 486 patients (71.3% female, 28.7% male) with an average age of 39.7 ± 10.3 years. The final dataset consists of brain volume measurements and lesion counts, patient information such as age, gender, and EDSS scores, volumes of healthy brain structures, the same volumes normalized relative to the intracranial volume, and the asymmetry between corresponding left and right brain regions across the hemispheres. The ratio of patients with future disease progression to those without is 1:4.
Methods: The design of training and evaluation of the predictive models included three main steps: (i) selection of relevant features, (ii) mapping of the feature space into a lower-dimensional space, and (iii) classification. The methods used for feature selection were Correlation-based Feature Selection (CFS), Recursive Feature Elimination (RFE), the Least Absolute Shrinkage and Selection Operator (LASSO), and Genetic Algorithms (GA). Principal Component Analysis (PCA) was used to reduce the dimensionality of the input features. The models used to classify patients into those with and those without future disease progression were k-nearest neighbors (KNN), Random Forest (RF), and Support Vector Machines (SVM). Combinations of feature selection methods and classification methods, with or without PCA, were trained and tested using four-fold cross-validation. Overall performance metrics were then computed: the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity.
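The four performance metrics named above can be illustrated with a minimal sketch. It is not the evaluation code used in the thesis: the function names, the label encoding (1 = future progression, 0 = no progression), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. AUC is computed here as the probability that a randomly chosen progressing patient receives a higher score than a randomly chosen non-progressing patient, with ties counted as one half.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn); label 1 = future progression (assumed encoding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity at a fixed threshold, plus threshold-free AUC."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of progressing patients detected
        "specificity": tn / (tn + fp),  # share of non-progressing patients detected
        "auc": roc_auc(y_true, scores),
    }


# Toy data with a 1:4 class ratio (hypothetical values, not thesis data).
y_true = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.6, 0.1, 0.3, 0.4, 0.7, 0.2, 0.1, 0.3]
metrics = classification_metrics(y_true, scores)
print(metrics)
```

With a 1:4 class ratio, accuracy alone can look favorable for a model that rarely predicts progression, which is one reason to report sensitivity, specificity, and AUC alongside it.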
Results: Among the compared feature selection methods, correlation-based feature selection performed best, providing the best results in combination with all three classification models. The best overall results were obtained using the correlation filter in combination with PCA and the SVM classifier, which yielded an AUC of 0.77, accuracy of 0.69, sensitivity of 0.72, and specificity of 0.68. Furthermore, the results indicate a positive impact of using PCA to reduce the dimensionality of the input data, which generally improved the performance of the models.
Conclusion: Despite the use of different validation datasets, with ours being the most heterogeneous because it is composed of MRI scans from five different scanners and four different institutions, the results of this master's thesis demonstrate rather good reproducibility of the best findings of a previous study [45]. They thereby reaffirm the hypothesis that the progression of MS can be predicted from measurements of healthy and pathological brain regions obtained from T1w and FLAIR MRI scans.