Autism is a developmental disorder whose first signs appear in early childhood (at 14 months) and which is typically diagnosed at the age of 4. It is primarily expressed as altered behavior in social interaction, communication and imagination. The need for early diagnosis and for a less time-consuming diagnostic process are the main reasons for the increasingly frequent use of machine learning methods in the first stage of the diagnostic process, the identification of children at risk. This study aims to identify children with autism using machine learning classification models and to determine which variables are most important for classification. For this purpose, 6 different supervised machine learning methods were applied to 4 different age groups. The sample consisted of 48050 children over 13 months old, of whom 60 % were boys and 40 % girls; 20 % had received a diagnosis. The prevalence of autism in the sample was 2 %.
The study was successful in identifying autism in the group of children aged 37-48 months, where the best model (Random Forest) achieved 72 % accuracy, 59 % sensitivity for autism and 90 % specificity for autism. In other words, the model misclassified 10 % of non-autistic cases, correctly identified 59 % of autistic cases, and classified the remaining 41 % as having another diagnosis. Classification performance metrics were estimated with nested cross-validation. In the group of three-year-olds, gender and variables measuring children's auditory perception were identified as the most important variables for classification.
Models were generally less accurate in the younger age groups, with the Naive Bayes classifier being the least accurate. The two youngest age groups in particular showed very low sensitivity for autism (9 % in the youngest group and 18 % in the two-year-old group). The evaluation of models across age groups showed varying success and no clear trend as to which method performs best in all groups, which could be a consequence of the different screeners used and the different expression of autism at different ages. We conclude that, despite the growing use of machine learning methods in autism diagnosis, we did not find a definitive solution that works across age groups. The main challenge is distinguishing autism from other diagnoses: the models successfully identify diagnosed cases but misclassify about 40 % of autistic cases as having another diagnosis.
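To make the reported metrics concrete, the following minimal sketch shows how sensitivity and specificity for autism can be derived from a multi-class confusion matrix by treating autism as the positive class and everything else as negative. The three class labels, the function name `one_vs_rest_metrics` and all counts are assumptions chosen for illustration; they are toy numbers, not data from the study.

```python
from typing import Dict, List

# Hypothetical class labels (assumed for illustration).
LABELS = ["autism", "other_diagnosis", "no_diagnosis"]

# Toy confusion matrix, NOT study data.
# Rows are true classes, columns are predicted classes, in LABELS order.
CONFUSION = [
    [6, 4, 0],  # true autism: 6 found, 4 assigned to another diagnosis
    [2, 6, 2],  # true other diagnosis
    [0, 1, 9],  # true no diagnosis
]


def one_vs_rest_metrics(
    matrix: List[List[int]], labels: List[str], positive: str
) -> Dict[str, float]:
    """Compute sensitivity and specificity for one class against all others,
    plus overall multi-class accuracy."""
    k = labels.index(positive)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]                         # positives predicted positive
    fn = sum(matrix[k]) - tp                  # positives predicted as another class
    fp = sum(row[k] for row in matrix) - tp   # negatives predicted positive
    tn = total - tp - fn - fp                 # negatives predicted negative
    correct = sum(matrix[i][i] for i in range(len(labels)))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": correct / total,
    }


metrics = one_vs_rest_metrics(CONFUSION, LABELS, "autism")
print(metrics)
```

In this toy matrix every missed autistic case falls into the "other diagnosis" column, which is the same error pattern described above: specificity can stay high while sensitivity is limited by confusion between autism and other diagnoses.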