1 Introduction
The increasing presence of artificial intelligence in students’ everyday lives requires a sound knowledge of the technology they use on a daily basis [32]. In a phenomena-driven computer science education approach, understanding students’ conceptions is essential for designing lessons [13]. Known by many terms – mental models [33, 50], beliefs [9, 33, 44], misconceptions [9, 46], alternative frameworks/conceptions [10, 15, 17, 51] or myths [2, 22] – this supposedly incorrect prior knowledge [46] of varying complexity determines students’ learning to a significant extent: “Research on students’ (alternative) conceptions in science has revealed that students prior conceptions severely influence, even determine learning of the science conceptions presented in class, in textbooks or the like. It is one of the ‘sad’ messages from this research that science instruction appears in general not to be too successful in guiding students from their preinstructional conceptions to the science conceptions” [17, p. 47]. Under this assumption, we use the term conceptions in this paper – analogous to terminology such as beliefs, ideas, or mental models – in the sense of linguistically expressed, conflated, naïve explanations of terms and contexts by students.
Taking into account findings on students’ conceptions is also crucial for eliciting conceptual change [17]. To date, research on students’ conceptions is a recent field in computer science education [5, 7, 14, 26, 29, 40]. However, in computer science education the research process typically stops once students’ conceptions have been identified. This paper aims to propose and discuss a methodological approach to fostering a comprehensive conceptual understanding that aligns with scientifically correct conceptions (hereafter denoted as conceptual change), based on students’ conceptions research in the field of artificial intelligence (AI) and the tentative results of a small experimental study with secondary school students.
3 Methods
To address this research question, a quantitative approach similar to that presented in the related literature [3, 8] was chosen. The following subsections present the general study design, the intervention lesson, and the constructed conceptual change text (CCT).
3.1 Study Design
The study was carried out in grades 10 and 11 at a German secondary school. A total of 76 students attending four different computer science courses took part. Of these, the questionnaires of 69 students, 22 female and 47 male, were filled out completely and processed. Two of the four courses were from grade 10 and the other two from grade 11. All participating courses were elective, so the students had voluntarily chosen to attend computer science classes. All participants had received computer science lessons since grade 8, which take place once a week in all age groups. The teaching staff explained that the subject of AI had not yet been taught, but that ChatGPT was known to the students.
The methodology follows a pre- and post-test design. In order to analyze the enhancement of conceptual understanding based on the agreement with a certain conception, data was collected by means of a questionnaire. This pre-test was piloted and revised prior to the main intervention. The questionnaire was intended both to record the students’ prior knowledge of AI and to identify a conception that met with particularly strong approval. The CCT was then created based on this agreement. The designed pre- and post-test uses a four-point Likert scale (0 = strongly disagree, 1 = disagree, 2 = agree, 3 = strongly agree). The multiple-choice items were created from existing research findings on students’ conceptions of AI. To analyze the effectiveness of the CCT, the participating students were assigned to either an experimental or a control group.
3.2 Lesson Design
All students completed the pre-test one week before the intervention; the post-test took place one week after it. In total, the study lasted three weeks. The experimental group (N = 34 students) received a teaching intervention focusing on the CCT, while the control group (N = 35 students) was taught without the CCT. The lessons lasted 90 minutes in both groups. Table 1 shows the different, yet comparable, lessons.
3.3 Test Instrument: Conceptual Change Text
With respect to the research question, a CCT with the following structure [23] was developed:
Task 1: First, the students were asked to comment in writing on the statement “Every output of artificial intelligence is pre-programmed”.
Task 2: Afterwards they read the following text: You’ve probably tried out the new chatbot ChatGPT recently and noticed the following: No matter what question you asked the chatbot, it had a suitable answer to almost every question. Whether your input is a simple question, such as “What is a byte?”, or something more complex, such as “Write me a book summary of the book ’Faust’ by Goethe”. You are probably also familiar with some voice assistants, such as Siri from Apple or Alexa from Amazon. If you ask these voice assistants a question, they will also answer appropriately. Even if you make a search query on Google, the most suitable websites will be suggested to you based on your query. This means that no matter which artificial intelligence (AI for short) system you and your classmates would use, the developers of the AI system would have to have pre-programmed their own suitable output for every potential question or query. But this is not correct from a scientific perspective! For the developers of an AI system, it is even impossible! An AI system, such as ChatGPT, generates its answers based on its training data set and the algorithms on which the system is based. [...].
The text continues with a clarification of the content and examples in order to create a cognitive conflict and deliver the scientifically correct concept.
Task 3: After reading the CCT, the students were asked to reread their statement from Task 1 and to comment on the same statement again, as they had before reading the text. In this way they revised their statement and broadened their conception.
3.4 Statistical Analysis
The results from the pre- and post-test were compared using the statistical software SPSS. First, the mean values were compared and a significance test was carried out to assess the development of students’ conceptions with regard to the research question. The significance level was set at 5 %: if the p-value falls below 5 %, the test result is considered statistically significant. For the significance test, a two-tailed unpaired t-test was performed, which measures the difference in mean values between two independent groups (assuming normal distribution), represented here by the experimental and control group. In this study the calculated mean values represent the students’ agreement with a common naïve conception. The greater the difference between the mean values of the two groups and the smaller the standard error, the less likely it is that the difference between the two groups is due to chance.
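The between-group comparison described above can be sketched in a few lines; the following is a minimal illustration using Python and scipy rather than SPSS, and the Likert responses are synthetic placeholders, not the actual study data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic 4-point Likert agreement scores (0-3); placeholders only,
# not the data collected in the study.
experimental = rng.integers(0, 4, size=34)  # N = 34, as in the study
control = rng.integers(0, 4, size=35)       # N = 35, as in the study

# Two-tailed unpaired t-test comparing the mean agreement of the two
# independent groups.
t_stat, p_value = stats.ttest_ind(experimental, control)

alpha = 0.05  # 5 % significance level
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")
print("statistically significant" if p_value < alpha else "not significant")
```

With the real questionnaire data, the same call yields the t- and p-values reported in the results section.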
4 Results
In this section the results of the students’ questionnaires are presented. Within the questionnaire the participants were asked to express their agreement with the statement “Every output of artificial intelligence is pre-programmed” on a four-point Likert scale (0 = strongly disagree, 1 = disagree, 2 = agree, 3 = strongly agree).
Table 2 presents the results of the pre- and post-test. In the pre-test, the mean value of the control group (CG) is M = 1.97 and the mean value of the experimental group (EG) is M = 1.88; the difference between the two groups is d = 0.09. After the intervention, the post-test (see Table 2) was conducted: the mean value of the control group is M = 1.49, whereas the agreement of the experimental group is M = 0.59. In the post-test the difference between the two groups reaches d = 0.90.
The results of the unpaired t-test for the pre-test showed no statistically significant difference between the mean values of the experimental and control groups (t = 0.4179, df = 67, p = 0.6774). When comparing the mean values of the post-test, a statistically significant difference was found between the experimental group and the control group (t = 5.3125, df = 67, p < 0.0001). Furthermore, we performed a paired t-test to analyze the development of the agreement within each group. The results of the control group are statistically significant (t = 3.2400, df = 34, p = 0.0027), and the results of the experimental group are statistically significant as well (t = 7.5392, df = 33, p < 0.0001).
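The within-group comparison can be sketched analogously with a paired t-test; again this is a hypothetical illustration with synthetic pre/post scores, not the study data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Synthetic paired pre/post Likert scores (0-3) for one group of 34
# students; placeholders only, not the actual study data.
pre = rng.integers(0, 4, size=34)
# Simulate a drop in agreement after the intervention.
post = np.clip(pre - rng.integers(0, 3, size=34), 0, 3)

# Paired t-test: the same students are measured before and after,
# so the test operates on the per-student differences.
t_stat, p_value = stats.ttest_rel(pre, post)
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")
```

The paired variant is the appropriate choice here because each student contributes both a pre- and a post-test score, making the two samples dependent.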
To test for normal distribution, we used a Shapiro-Wilk test, which did not confirm normality. Since N > 30 applies to both groups, the t-test is considered robust to the violation of the normality assumption. Accordingly, the result of the Shapiro-Wilk test is not decisive in this case [20, 25].
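The normality check can be illustrated as follows; a hypothetical sketch with synthetic Likert data (discrete 4-point responses will typically fail a Shapiro-Wilk test, which is what motivates the robustness argument above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Synthetic discrete 4-point Likert responses (0-3) for 35 students;
# placeholders, not the study data.
scores = rng.integers(0, 4, size=35)

# Shapiro-Wilk test of the null hypothesis that the sample comes from
# a normal distribution; a small p-value rejects normality.
w_stat, p_value = stats.shapiro(scores)
print(f"W = {w_stat:.4f}, p = {p_value:.4f}")
# Even when normality is rejected, with N > 30 per group the t-test is
# commonly considered robust to this violation.
```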
The results in Table 2 show that in both the control and the experimental group, the mean value of the students’ agreement with the conception has fallen. However, the drop in the experimental group is considerably larger, amounting to 1.29. According to the t-test, the difference between the pre- and post-test in the experimental group is statistically significant, since p < 0.001 was reached. The data is published [39].
5 Discussion
According to conceptual change theory, knowledge of students’ conceptions is necessary to successfully achieve conceptual change towards a scientifically correct understanding of a concept. Here a logical break in recent computer science education research becomes visible and the research gap is uncovered.
First of all, it must be critically noted that – as mentioned at the beginning – many similar constructs such as mental models [33, 50], beliefs [9, 33, 44], misconceptions [9, 46], alternative frameworks/conceptions [10, 15, 17, 51] or myths [2, 22] can be subsumed under the term “conception(s)”. All of these refer to students’ supposedly incorrect prior knowledge of various kinds [46], but they are not always suitable as measurable or observable artifacts in the sense of conceptual change. It is therefore essential to clarify which of the conceptions already highlighted in empirical research are really suitable in terms of conceptual change and which are not. For this, the pre-conceptions must be related to the scientific concepts to be taught or be composed of individual measurable/observable conceptual components [49]. In the groundwork on students’ conceptions of AI, however, it can be observed that these are often much more a matter of attitudes [30], which cannot be measured in the sense of conceptual change and play a subordinate role here, but which are relevant for the overall understanding of the subject matter to be taught [32] and can also be used in computer science lessons in the sense of educational reconstruction [13].
The aim of this paper is to test and discuss the influence of a conceptual change text (CCT) on students’ conceptions (represented by the students’ agreement) regarding the statement “Every output of artificial intelligence is pre-programmed”, since this reflects a conception that occurs frequently in the literature and met with strong approval in a pilot test with secondary school students, as described. The study shows promising results regarding the effect of the CCT on conceptual change in the cohort surveyed. Prior research presents a wide variety of students’ conceptions of AI [4, 19, 26, 29, 31, 34]. It remains to be tested whether these positive results can be reproduced for other conceptions found there. In this way, the study follows on from the successful results in science education [3, 8, 52, 53]. However, in order to measure a long-term effect on conceptual change, a follow-up test a few weeks after the post-test must also be carried out in future studies, as suggested by Grospietsch and Mayer [22] in line with several other studies they reference.
In the presented study, however, no textbook texts were used in the comparison group, as related work did [3, 8]. This is mainly due to the fact that textbooks in computer science rarely cover current topics such as AI, or that appropriate textbook texts are not available in the target group’s language. It would be interesting to conduct similar studies with textbook texts from computer science in order to better compare the results of both fields.
Furthermore, results from science education indicate that conceptual change texts have the same effect on students regardless of their prior knowledge [8]. Since one hurdle in computer science education is students’ quite heterogeneous prior experiences with and conceptions of computer science [6], the CCT could be a well-suited instrument to equalize those differences.
It is also necessary to carefully consider so-called boomerang effects, which, according to Grospietsch and Mayer [23], lead to conceptions (referred to by them as “myths”) being favored and reinforced. These include long-term retention through the mere mention of a memorable scientific myth (boomerang effect of familiarity), the attractiveness of a simply formulated scientific myth in the face of too many scientific counterarguments (boomerang effect of information overload), and the distorted processing and reinforcement of a scientific myth in people with strong beliefs through confrontation with counterarguments (worldview boomerang effect) [23]. For this reason, we cannot exclude the possibility that the pre-conceptions to be changed were reinforced among some students.
Based on the successful methodological transfer from the field of science education, other methods such as concept cartoons [11] or support through videos [1] should also be considered with regard to creating conceptual change in computer science education. It could also be interesting to explore which other conceptual change activities might be more suitable than texts with respect to affective factors of learning, since these are highly important in this research field [18, 37].
In quasi-experimental studies, it is a challenge to design both learning environments in such a way that a fair comparison of the groups is possible [45]. In this intervention the experimental group received an explanation of how ChatGPT forms its answers, while the control group had to work out by themselves how different AI systems work. Given this design, it seems plausible to compare the groups’ answers. However, according to Taber [45], it should be carefully considered whether the treatment conditions are comparable. Thus, we would critically ask whether “the ’standard’ provision” [45] is really the only available comparison or whether it could “be more informative, to test the innovation that is the focus of the study against some other approach already shown to be effective” [45, p. 93].
It should not go unmentioned that although artificial intelligence has always been a field of research in computer science [41] and is reflected in numerous fundamental ideas of computer science [43], it is still hardly found in educational standards or curricula at secondary school level. Studies do show slowly growing progress, for example in Germany [48], the USA [47], or China, India and South Korea [27]. However, common standards against which educational research in the field of artificial intelligence and conceptual change can be oriented and classified are still lacking. In particular, the national standards of the AI4K12 initiative in the USA, organized along the “Five Big Ideas of AI” [47], seem to provide a promising classification of concepts in the addressed educational field – even if they have yet to be validated – as Druga, Otero, and Ko [16] have already used them for teaching materials. It is therefore interesting to see to what extent students’ conceptions of AI and corresponding conceptual change activities can be classified along such standards and implemented in the curriculum. This area of research appears promising for the future.