1 Introduction

Recent years have seen the emergence of artificial intelligence (AI) in all kinds of educational settings [4, 5]. Processes such as teacher-student communication, interaction with peers, and learning of multiple educational subjects are central in education. For children with neurodevelopmental disorders (NDDs), such as autism spectrum disorder (ASD) and attention deficit hyperactivity disorder (ADHD), difficulties around these processes are greatly amplified. This is because such students have impairments in several cognitive, emotional, and motivational factors [8]; for example, many autistic people have impairments in social skills, emotion regulation, and attention. Moreover, children often express themselves in different ways using creativity, nonverbal communication, etc. This makes modeling such a vulnerable population with AI a great challenge.

AI technology, in particular machine learning (ML), has been notably helpful for children with NDDs in educational settings. This includes, e.g., automatic emotion recognition for personalized educational plans [10], recommendation of teacher-student communication strategy [8], and assessment and improvement of skills via gamified tasks [7]. To that end, ML models have been built from different kinds of data, such as subjective, behavioral and physiological data. On the one hand, dealing with data of such different nature is a challenging multimodal problem, from data collection to model building. On the other hand, this allows one to understand the student’s cognitive, metacognitive, emotional, and motivational state [11].

This paper analyzes the opportunities offered by recent ML-based approaches for NDD children. We focus on two aspects: (i) the datasets used in the studies, and (ii) the capabilities of the ML models trained from such data. Typically, each study relies on different data, so we consider multiple aspects: what data is collected, who collects the data, how data annotation is used, the size of the data, and so on. We do this analysis based on descriptions of the datasets in the selected papers since actual access to these private datasets was not possible.

While there exist reviews on AI for NDDs in education [1, 6] that focus mostly on what the AI models cover (e.g. social skills) and the effects on NDD children, our focus is instead on how such AI models are constructed. Analyses such as ours exist in the context of neurotypical students in specific tasks, such as prediction of student performance and failure aiming to identify the most predictive features and the types of ML models used [3, 14]. To our knowledge, however, there has been no review of ML datasets and models in the context of NDDs.

The analysis of datasets is crucial to understand common trade-offs, such as the human effort employed in annotating the data and which processes can be automated. Moreover, our analysis will also indicate how practical different ML approaches can be, which gives insight into transferring such ideas to a new setting. In the end, we discuss future challenges to help advance the area. Since AI for NDD children in schools is still a somewhat emerging area, we extend our scope to studies that involve educational activities possibly outside a school setting (e.g. at home).

The rest of this paper is organized as follows. In Sect. 2 we discuss the data characteristics of the selected studies, while the functionalities of ML models are discussed in Sect. 3. We discuss future challenges in Sect. 4, and conclude the paper in Sect. 5.

2 Analysis of Datasets

In this section, we analyze several characteristics of datasets used in recent studies on children with NDDs. We selected these papers since they are recent and cover different aspects of NDDs. We discuss the data collection and how humans were involved in this, e.g. by annotating variables. Note that our analysis is based on the data descriptions available in the papers, not on the actual datasets since they are private.

We make a distinction in the type of ML paradigm used, which in this case is either a supervised learning or reinforcement learning approach. An overview of the datasets is shown in Table 1, while the data annotation and automation aspects are summarized in Table 2.

2.1 Supervised Learning Studies

The goal of supervised learning is to predict a response variable given a set of predictors. It is a common ML approach owing to its conceptual simplicity and well-established evaluation metrics. Normally, supervised learning starts with data collection followed by model training. After this, the model can be deployed, e.g. in a classroom.
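To make the collect-train-deploy workflow concrete, the following is a minimal sketch in plain Python. It assumes a k-nearest-neighbors classifier over categorical predictors; the records, the predictor names, and the helper functions (`hamming`, `train`, `predict`) are hypothetical and only loosely inspired by the kind of variables described in this section. It is not the model of any reviewed study.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions at which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def train(records):
    """'Training' a nearest-neighbor model simply stores the labelled examples."""
    return list(records)

def predict(model, predictors, k=3):
    """Return the majority response among the k stored examples closest to `predictors`."""
    nearest = sorted(model, key=lambda rec: hamming(rec[0], predictors))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: (communication style, activity type, emotion) -> response
DATA = [
    (("verbal", "individual", "calm"), "full"),
    (("verbal", "individual", "happy"), "full"),
    (("verbal", "group", "calm"), "partial"),
    (("gesture", "group", "anxious"), "none"),
    (("gesture", "group", "calm"), "none"),
    (("gesture", "individual", "anxious"), "none"),
]

model = train(DATA)                                          # model training
prediction = predict(model, ("verbal", "individual", "calm"))  # deployment-time use
```

In a real study the stored examples would come from the data collection phase, and the model would be evaluated on held-out data before deployment.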

Table 1. Overview of studies and datasets. Type refers to the variable type.
Table 2. Data collection, including tasks by humans, involved in building the datasets. N: dataset size; n: number of subjects. The indicated human effort was explicitly described in the studies and we avoided making inferences when that was not the case (the same holds for automation aspects).

Communication Strategy. In [8] the goal is to model teacher-student communication strategies for autistic children in schools. Educational activities such as academic, social and pedagogic ones are considered, which can be done in small groups, individually, etc., and take place inside or outside the classroom. The main ML goal is to predict whether the student will write a sentence when asked to (which can be a full, partial, or no response).

Several predictors are collected such as the teacher’s communication style, type of teaching, and the student’s emotional state during the interaction. As shown in Table 2, the student’s emotions and actions were annotated by humans in every interaction.

Emotion Recognition. The work in [10] targets children with disabilities in schools, and aims to identify the student’s emotional state (sad, scared, happy, or calm). The approach also suggests an interface for instructors to create a management plan, if necessary. Electroencephalogram (EEG) signals of the student’s brain activity are recorded as predictors of the student’s emotional state.

An ML model is trained to link EEGs and emotions. To this end, a dataset that associates images with emotions is used, as shown in Table 2. In this study, a pre-existing dataset was used, the International Affective Picture System (IAPS), which reduces human effort through data re-use.

Task Switching. In [7] task switching for computational thinking is considered for autistic adolescents. The goal is to identify the student’s performance (positive/negative) in four gamified tasks for assessing task switching. As predictors, they recorded the student’s speech (answers and utterances) in the tasks.

As Table 2 indicates, the dataset instances were annotated as either positive or negative manually, by identifying examples of positive/negative performance in each gamified task [7]. However, the transcription of speech-to-text was automated.

2.2 Reinforcement Learning Studies

In reinforcement learning (RL) we model agents acting on an environment. The environment includes relevant information about the current state, e.g., student characteristics. As opposed to supervised learning, RL has the notion of action. The feedback that the agent receives about the effect of an action on the environment is called a reward. Over time the agent learns how to act better, which is challenging since the actual effect of an action is uncertain.

Motivator Selection. The study [13] takes place in the context of individual education programs for autistic children. The goal is to select the appropriate motivator when a disruptive behavior occurs. An RL approach is used, which models the dynamics of state–action–reward. In this case, the state represents the child’s behavior, an action represents a motivator (edible, sensory, activity, token, social, or choice), and a reward indicates the result of using that motivator.

The RL system does not act autonomously here; instead, it supports the caregiver (e.g. a teacher or a therapist) by recommending a motivator when a behavior is identified by the caregiver. This ultimately puts the caregivers in control of which action to take.

As Table 2 indicates, the caregiver annotates multiple pieces of information: the agent’s state (such as the student’s behavior and its cause) and the reward obtained by using the selected motivator.
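The state-action-reward loop with a caregiver in control can be illustrated with a small sketch. We stress that the RL algorithm of [13] is not described here, so the code below swaps in a simple epsilon-greedy value estimate per (state, motivator) pair as a stand-in. The class name `MotivatorRecommender`, the state label, and the reward values are hypothetical; only the six motivator types come from the study description.

```python
import random
from collections import defaultdict

# Motivator types as listed in the study description.
MOTIVATORS = ["edible", "sensory", "activity", "token", "social", "choice"]

class MotivatorRecommender:
    """Epsilon-greedy recommender (an assumed stand-in, not the method of [13])."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (state, motivator) -> times used
        self.values = defaultdict(float)  # (state, motivator) -> mean reward so far

    def recommend(self, state):
        """Suggest a motivator; the caregiver remains free to ignore it."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(MOTIVATORS)  # occasional exploration
        # Greedy choice; ties are resolved by the order of MOTIVATORS.
        return max(MOTIVATORS, key=lambda m: self.values[(state, m)])

    def update(self, state, motivator, reward):
        """Record the caregiver-annotated reward for the motivator actually used."""
        key = (state, motivator)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]

agent = MotivatorRecommender(epsilon=0.0)   # no exploration, for a deterministic demo
agent.update("leaves seat", "token", 1.0)   # hypothetical annotated outcomes
agent.update("leaves seat", "edible", 0.0)
suggestion = agent.recommend("leaves seat")

# The caregiver may reject the suggestion; learning uses what was actually done.
agent.update("leaves seat", "social", 1.0)
```

Updating with the motivator that was actually applied, rather than the one suggested, keeps the estimates consistent with the caregiver's final decision.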

3 Machine Learning Models

In this section, we discuss aspects of ML models of the previous studies, such as model training and the functionalities they provide. Table 3 provides a summary.

The study on communication strategy [8] suggests that the main ML models are trained by using the educational interactions of all students together. Later on, they try to learn a model by adding previous interactions (i.e. auto-regressions), indicating that for some types of ML models this can be beneficial. While the main models predict the student’s response, the paper also shows an alternative way of using the models to instead provide recommendations on the communication style (originally a predictor variable).
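Adding previous interactions as predictors can be sketched as follows. This is an illustrative assumption about how such lagged features might be built, not the procedure of [8]; the function `add_lag_features`, the record layout, and the placeholder value `"none_yet"` are hypothetical.

```python
def add_lag_features(interactions, n_lags=1, missing="none_yet"):
    """Append the previous `n_lags` responses of the same student as extra predictors.

    `interactions` is a chronologically ordered list of
    (student id, predictors tuple, response) records.
    """
    history = {}  # student id -> list of that student's earlier responses
    rows = []
    for student, predictors, response in interactions:
        past = history.setdefault(student, [])
        # Most recent response first; pad with `missing` when history is too short.
        lags = [past[-i] if len(past) >= i else missing
                for i in range(1, n_lags + 1)]
        rows.append((student, tuple(predictors) + tuple(lags), response))
        past.append(response)
    return rows

# Hypothetical interactions of two students.
interactions = [
    ("s1", ("verbal",), "full"),
    ("s2", ("gesture",), "none"),
    ("s1", ("verbal",), "partial"),
    ("s1", ("gesture",), "none"),
]
rows = add_lag_features(interactions, n_lags=2)
```

Each row only sees responses that occurred earlier for the same student, so no information from the future leaks into the predictors.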

To train an ML model for emotion recognition [10], an image-emotion dataset is used, where each image is shown to the student. In this dataset, it is already known which emotion corresponds to the image, which allows the ML model to capture the EEG patterns for different emotions. The study suggests that a specific model is created for each student. After the model is trained, it can be used for predicting the emotional state of the student based on the observed EEG patterns. However, the performance of the models is unclear.

In the task switching paper [7], speech-to-text is used on each gamified task, and datasets specific to each gamified task are created. After transformation to text, feature extraction is done to identify relevant textual features, such as n-grams, line length, etc., which allows for training ML models. There are two phases in the ML implementation: baseline sessions, in which models are trained; and intervention sessions, where the models are used for making predictions in real-time while the student does the tasks at home.
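As an illustration of the kind of textual features mentioned above, the sketch below computes word n-gram counts and a mean line length from a transcript. The exact features and preprocessing of [7] are not specified here, so the function names, the feature naming scheme, and the example transcript are assumptions.

```python
from collections import Counter

def word_ngrams(text, n):
    """Lower-cased word n-grams of `text` (n-grams may span line breaks)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_features(transcript, n_values=(1, 2)):
    """Map a transcript to n-gram counts plus the mean line length in words."""
    features = Counter()
    for n in n_values:
        for gram in word_ngrams(transcript, n):
            features[f"{n}gram={gram}"] += 1
    lines = [line for line in transcript.splitlines() if line.strip()]
    features["mean_line_length"] = (
        sum(len(line.split()) for line in lines) / len(lines) if lines else 0.0
    )
    return dict(features)

# Hypothetical two-line transcript of a student's answers.
features = extract_features("I switch now\nI switch")
```

The resulting dictionary can be turned into a fixed-length vector by fixing a vocabulary of feature names on the training data.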

Table 3. Functionalities and details of machine learning models. Model types are RF: random forest; LR: logistic regression; GP: Gaussian process; SVM: support vector machine; kNN: k-nearest neighbors.

As opposed to the other studies, the ML model for motivator selection [13] is built while the therapy intervention takes place. One major concern in this setting is ethical constraints. This means that the exploration by the RL agent is more limited (e.g. compared to game playing), which restricts, among others, the exploration of all possible states and the repetition of experiments. In practice, this means that the data available for the agent to learn from will be limited. Moreover, this RL system only provides suggestions as to which motivator to use since the final decision is made by the caregiver, who can also reject the suggestion. The performance reported in Table 3 indicates that when a motivator was selected, this led to a positive reward in 89.6% of the cases. The authors also considered when motivators were selected by the caregivers without the RL agent, which resulted in a performance of 45.5%.

4 Challenges

In this section we point out existing limitations and future challenges that can help advance the area, inspired by the studies analyzed. We provide considerations about the data and ML models.

Datasets. One fundamental challenge for building useful ML models revolves around building datasets. For children with NDDs in educational contexts, we saw that datasets are built in different ways, where humans play a key role. This varies significantly, from manual labelling of data records (e.g. school activities [8]) to the usage of pre-existing datasets of image-emotions [10]. At the same time, this indicates that processes such as data re-use and automation (e.g. the speech-to-text in [7]) are already in place.

Processes such as data re-use and automation come with their own issues. Using data created in different situations raises several issues: privacy and ethical concerns on the data level (to a lesser extent if the data is public), and whether meaningful and accurate models can be created. On the other hand, proper data re-use and automation can not only reduce the (human) burden of creating an ML dataset, but also potentially enable creating even larger datasets.

Another dataset issue is the limited size. Many studies rely on reasonably sized datasets that are nevertheless based on a small or very small number of subjects. For example, a child can participate in many different activities over time [8]. This attempts to make up for the small number of participants, but can still raise concerns about the representativeness of the sample, since records from the same subject are likely not independent of each other. Addressing this issue is very important to improve the effectiveness of ML models.
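One common way to account for the dependence between records of the same subject during evaluation is to split the data by subject rather than by record, so that no child contributes to both the training and the test set. The sketch below shows a leave-one-subject-out split; it is a general technique and not something we attribute to the reviewed studies. The function name and the record layout are hypothetical.

```python
def leave_one_subject_out(records, subject_of):
    """Yield (held-out subject, training records, test records) for each subject.

    `subject_of` maps a record to its subject identifier.
    """
    subjects = sorted({subject_of(r) for r in records})
    for held_out in subjects:
        train = [r for r in records if subject_of(r) != held_out]
        test = [r for r in records if subject_of(r) == held_out]
        yield held_out, train, test

# Hypothetical records: (subject id, record id).
records = [("s1", 1), ("s1", 2), ("s2", 3), ("s3", 4)]
folds = list(leave_one_subject_out(records, subject_of=lambda r: r[0]))
```

With very few subjects, the performance estimate from such a split will have a high variance, which is itself a useful signal about how far the results can be generalized.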

ML Models. The deployment of ML models for vulnerable populations in educational settings is complex. In cases of data paucity, the population on which the ML model was trained can differ from the population on which it is deployed (i.e. where the model is used). One example is when a model is trained on pilot data and deployed on a different population (e.g. children and adolescents [7]). Such heterogeneity might lead to a deterioration in the performance of ML models. This problem is complex to handle as it involves the study design. A related issue arises when real-time data such as audio, video, and brain signals are collected and a shift in the data characteristics occurs at some point. While this can be problematic as well, this problem is known in the ML community as concept drift, and several algorithms exist to handle it [9].
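To give an idea of how a shift in a data stream can be flagged, the following sketch implements the Page-Hinkley test, a classic change detector for an increase in the mean of a signal. It is shown only as an example and is not necessarily among the algorithms covered in [9]. The parameter values and the synthetic stream (e.g. a model's per-prediction error, 0 or 1) are assumptions.

```python
class PageHinkley:
    """Page-Hinkley test for detecting an upward shift in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm level (often denoted lambda)
        self.n = 0
        self.mean = 0.0             # running mean of the observed values
        self.cumulative = 0.0       # cumulative deviation from the running mean
        self.minimum = 0.0          # smallest cumulative deviation seen so far

    def update(self, value):
        """Consume one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (value - self.mean) / self.n
        self.cumulative += value - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        return self.cumulative - self.minimum > self.threshold

# Synthetic stream: 50 correct predictions (error 0), then 50 errors (error 1).
stream = [0.0] * 50 + [1.0] * 50
detector = PageHinkley()
alarms = [i for i, x in enumerate(stream) if detector.update(x)]
first_alarm = alarms[0] if alarms else None
```

In a deployed system, an alarm would typically trigger a review of the model or a retraining step rather than an automatic change in behavior.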

In terms of performance, the ML models based on supervised learning all achieved performance below 0.8 on different metrics, such as accuracy and F1 measure. While this is reasonable, it is still far from perfect. In that sense, it is unclear whether the current performance is sufficient for the safe deployment of such approaches in classrooms involving students with NDDs.

We also point out the feasibility of certain approaches in the context of children with NDDs. For example, deploying ML models that rely on technologies such as EEGs [10] might be intrusive for autistic people in classrooms, due to the sensory difficulties that many of them have. Although a positive usability evaluation was obtained when using EEGs, this was based on only a few subjects.

5 Conclusion

In this paper, we inspected the opportunities offered by ML models for children with NDDs in educational settings. In particular, we examined the datasets used in a selection of studies in terms of what data was collected, by whom and where, with a special focus on whether humans were involved (and to what extent). We saw that datasets are built in very different ways, but they all have the same issue of limited size.

We see multiple possibilities for future work. We plan to include more studies, as well as extend the scope of our study by including, e.g., (affective) intelligent tutoring systems and systems based on unsupervised learning. Another direction is to perform evaluations based on the datasets themselves, such as data quality [12] and readiness levels [2]. This would normally require access to the data itself, which seems more feasible for public datasets.