1. Introduction
The emergence of generative artificial intelligence (Gen-AI) systems, or artificial intelligence (AI) in short, such as ChatGPT from OpenAI [1], currently the most popular of these tools, is becoming a significant turning point in the academic world. Its consequences are starting to be explored [2,3,4], although the repercussions may be broader than anticipated. Gen-AI systems are designed to produce a wide range of outputs, such as text, images, video, or code, by drawing on the data on which they were trained. Other Gen-AI systems exist, such as Rytr [5], Jasper [6], CopyAI [7], Writesonic [8], Kafkai [9], Copysmith [10], or Article Forge [11], but the rapid success of ChatGPT’s GPT-3.5 and GPT-4 models represents a significant advance in AI technology and has raised concerns about their potential impact on academic integrity [2,12,13,14,15,16,17,18,19]. As stated in the literature, it is important to consider the ethical implications of these systems and to plan appropriate measures to ensure their responsible use in academic environments.
ChatGPT is a tool based on large language models, in which generative pre-trained transformer (GPT) models generate content in response to a prompted question or command. ChatGPT was first released in November 2022 and has spread at dizzying speed to become one of the most widely used tools in academia. GPT-4, released in March 2023, improves on the capabilities of its predecessors. These models are designed to generate responses in dialogues/conversations across a wide variety of language tasks and have shown, so far, more use cases than other Gen-AI systems.
GPT-3 and GPT-4 have proven in a very short time to be extremely handy tools in the academic field. The information provided by ChatGPT is being thoroughly studied and tested by OpenAI and by numerous users who employ the tool to gauge its effectiveness and actual capabilities [13,14,18,19,20,21,22,23]. Its demonstrated ability to write scientific essays and academic texts, as well as to solve complex problems, has called into question the efficiency of different learning methodologies and the appropriate use of these tools [3,13,17,23]. Because ChatGPT facilitates or directly solves the tasks set by teachers, it may enable students to bypass the learning process and obtain answers without developing the necessary knowledge, skills, or competencies [3,13,23].
For more than a decade, following the Joint Declaration of the European Ministers of Education convened in Bologna on 19 June 1999 [24], universities have been engaged in a process in which the teacher-centered model gave way to a new model, where learning is organized more around students’ needs and pace. This evolution was accompanied by a more widespread use of digital tools, which were increasingly implemented in the classroom and in the learning process. These two developments favored the rise of new educational methodologies, such as blended learning (BL or b-learning), which combines traditional face-to-face instruction with various digital technologies and resources. These strategies allow greater flexibility in how, when, and where students learn, promoting a customized learning experience.
These strategies, which were progressively implemented in the academic world, helped to soften the impact of COVID-19 on education. The lockdown and the shift from mixed face-to-face and online teaching (in the best cases) to a fully online environment caused a sudden change and an immediate need for technological adaptation. The literature discusses different approaches and evaluates the consequences of this adaptation; see, for example, [25,26,27,28,29]. After the pandemic, many digital elements remained, while others slowly faded away. The remaining tools, such as online exams, quizzes, knowledge reinforcement exercises, and games, are mainly implemented, as in the case we address, within blended learning methodologies. In this article, we examine the possible consequences of introducing AI into these methodologies, as they appear to be more vulnerable to such tools.
Blended learning can be defined as a student-centered approach that combines the benefits of online learning (flexibility, abundant resources, and timely updates) with the interactivity of traditional teaching. Researchers have assessed the feasibility and effectiveness of these models along multiple dimensions, such as knowledge acquisition, competency performance, technology availability, and satisfaction. Although the concept of BL is not new, its use and implementation have been expanding in the academic world [30,31,32]. This growth has been driven and reinforced by the technological advances mentioned above and by the extensive range of resources that have been introduced.
BL integrates the best aspects of traditional face-to-face instruction with online learning components to create an optimal, flexible, and engaging learning experience for students [29,33,34]. This educational methodology has demonstrated several positive outcomes in education, such as improved learning outcomes, increased student commitment and satisfaction, enhanced self-regulated learning and time management, increased access and adaptability, and cost-effectiveness [35,36,37,38,39,40,41,42,43].
In a BL environment, students can benefit from various learning modalities: face-to-face instruction (involving in-person interaction between teachers and students, direct communication, and immediate feedback [44]), online learning (involving self-paced learning through online resources), and collaborative learning (encouraging collaboration and knowledge sharing among students). However, some studies have reported potential challenges or negative outcomes, such as technology barriers that can hinder the effectiveness of BL for some students [45], an increased workload for educators [46], social isolation [47], or difficulty in adapting to the learning format [48]. These challenges can be addressed through careful planning, adequate support for both students and educators, and continuous evaluation and refinement of the BL approach.
Many other methodologies can be included within blended learning, such as flip-teaching (FT) or game-based learning (GBL). However, since these methodologies have their own distinctive elements, we treat them separately. Both have shown considerable success in academia and may also be affected by the emergence of AI technologies.
FT methodology aims to foster active participation in the learning process by incorporating out-of-classroom activities. These activities are designed to help students learn, practice, and master the required concepts. Recent studies have shown the effectiveness of FT in promoting student engagement, improving student performance and satisfaction, and enhancing critical thinking skills [49,50,51,52].
Another methodology used in the BL setting of this research is GBL. This methodology introduces games or game elements into the classroom to promote motivation, active participation, and a positive learning environment [53,54,55,56,57,58,59,60,61]. Among the games that can be introduced in the classroom, Escape Rooms (ER) have recently gained significant popularity in the academic world, owing to their adaptability to different environments, their promotion of collaborative work, and the varied nature of the challenges to be solved. When this type of game is applied with the aim of promoting learning and competency development, it is called an Educational Escape Room (EER) [62,63,64]. EERs are usually run in the classroom or in a controlled environment, with students participating physically in groups. There is also a digital version of these games (digital Educational Escape Rooms, dEERs), which may or may not be collaborative and which students can complete outside the classroom. This modality was motivated by the simplification of resources and by the COVID-19 pandemic [29,65,66]. The application of this type of game seems to promote students’ learning process and to enhance the development of transversal competencies, such as teamwork, lateral and critical thinking, communication, and working under pressure, among others [62,64,65,66,67,68,69]. Escape rooms are built around a theme and a narrative that serves as the guiding thread of the activity, and their great thematic variety allows dEERs to be applied in many contexts.
AI, specifically ChatGPT, has demonstrated a high level of proficiency in composing texts and essays, translating between languages, and generating original ideas. In the Science, Technology, Engineering and Mathematics (STEM) area, in addition to supporting these activities, the tool faces the added challenge of performing calculations and solving scientific problems, such as engineering or mathematical problems. It is in this area that reliable answers are hardest to obtain. Mathematics, the subject on which this study focuses, requires an advanced understanding of certain concepts and, most importantly, the prior development of specific competencies. Mastery of scientific language, which helps students understand and solve the problems that arise in STEM, is also a cornerstone of the learning process. Many students show weaknesses in some competencies, especially in the early years of their studies, and an appropriate course design can help address these shortcomings. The emergence of tools such as ChatGPT, which could potentially solve these problems for students, might weaken the learning process by hindering the deep assimilation of techniques and results.
All three methodologies (BL, FT, GBL) aim to increase student engagement and improve learning outcomes by making learning more interactive, flexible, and enjoyable. While each has its own approach, they can be combined or used interchangeably, depending on the needs of the students and the goals of the educators. Nevertheless, each can be threatened or reinforced by the emergence of ChatGPT. When faced with a challenge or problem, students typically seek help from teachers or consult online resources, such as web pages, texts, videos, or tutorials, to fill gaps in their knowledge. ChatGPT presents a new challenge to this process, potentially diminishing the efficiency of existing teaching methodologies and of the activities designed for them [3,14].
This article seeks to evaluate the problem-solving capabilities of the ChatGPT models GPT-3.5 and GPT-4 in a mathematical context within the STEM field. The study focuses on Mathematics I, a course offered at the Higher Technical School of Design Engineering, Technical University of Valencia (UPV), Spain. The course employs a BL methodology in which laboratory work and weekly tasks follow an FT approach, while other methodologies, such as GBL techniques, are used in the classroom [44]. By testing these models in a real academic setting and studying students’ use of and opinions on them, this study aims to provide insight into the possible consequences of ChatGPT and other generative AI tools for the BL methodology applied in this case study and for STEM education in general.
3. Results
3.1. Mathematical Tests with ChatGPT
Before the lab session, students engage in self-directed learning using the FT methodology. This involves accessing exercises and explanatory texts with examples similar to those they will face in the lab session. At the start of the class, the teacher reviews the content, and students have the opportunity to ask questions. The lab sessions are based on the use of Wolfram Mathematica software [74], version 12 or 13, a highly reliable and powerful mathematical calculation tool capable of solving a wide range of physical and mathematical problems.
However, using Mathematica requires prior knowledge of the correct syntax and the appropriate commands to use for each calculation. To help with this, Mathematica provides an assistant that suggests and corrects possible errors in syntax. Once the command is entered correctly, Mathematica returns a solution with an impressive level of reliability and precision.
One possible disadvantage of Mathematica is that it requires some mathematical background to identify the problem to be solved, choose the proper command or function, and correctly interpret the output. Mathematica provides a vast digital library with thousands of examples that students can access through the Wolfram Documentation Assistant to learn the fundamental mathematical knowledge and the syntax and command structure needed to solve problems. Additionally, the UPV offers free-access computer rooms in all schools and faculties and Wi-Fi across its three campuses, both in buildings and in gardens and outdoor areas. Its Mathematica license covers current faculty, staff, and students for teaching, learning, and academic research, and even allows installation on personal laptops with a provisional, renewable one-year license. For those who do not have access to Mathematica, Wolfram Alpha [75] is a free tool that can solve a wide range of mathematical problems using a more general syntax. Wolfram Alpha is accessed via the Internet, and calculations are performed on the Wolfram server rather than on the user’s own hardware.
In the laboratory sessions of Mathematics I, Wolfram Mathematica is a fundamental tool for solving exercises. Students need to review the theoretical knowledge and learn the software’s specific syntax to solve the problems successfully. During the session, after the teacher’s explanation and the clarification of doubts, students take a test that includes exercises similar to those reviewed at home. The tests are held weekly, and each session addresses different mathematical topics related to the concepts covered in the theory sessions. This paper reports the results of 18 tests, from Test01 to Test18, covering complex numbers, hyperbolic functions, root finding, integral calculus, applications of integral calculus, numerical integration, improper integrals, systems of equations, matrices, determinants, curve fitting, vector spaces, Euclidean spaces, linear applications, and matrix diagonalization. The tests are designed to be completed within a limited time and in a controlled environment. Although students have Internet access during the session, they are expected to behave honestly during the test. The aim of this weekly learning process is to reinforce students’ critical thinking skills and to build a deep understanding of the mathematical concepts required in a Bachelor’s degree in engineering. The advent of ChatGPT has raised questions about its impact on this learning process: if misused, it could impoverish the acquisition of competencies, while, if used correctly, it could reinforce mathematical knowledge. The authors attempted to solve these tests with ChatGPT in order to evaluate the capabilities of GPT-3.5 (Legacy) and GPT-4 in solving the problems presented in the laboratory sessions.
Unlike Mathematica, the ChatGPT interface offers a much more flexible syntax for requesting calculations. Students can simply copy and paste the problem into the system, and ChatGPT will interpret it and determine how to solve it. This approach has shown remarkable reliability: in interpreting the meaning of a collection of 100 mathematical exercises covering various problems in Differential Calculus, Algebra, Integral Calculus, and Series, and offering an appropriate theoretical answer, the success rate was 96% for GPT-3.5 and 98% for GPT-4. To measure this reliability, we assigned a score of 1 to correct interpretations and 0 to incorrect ones. Because the way a question is worded can affect the system’s understanding of the problem, this indicator should be taken only as a rough estimate of reliability. In the few cases where ChatGPT failed to interpret the problem correctly, the appropriate answer was obtained after no more than two interactions with the system. In addition, ChatGPT provides a detailed response with the steps needed to solve the problem, which is a significant advantage.
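As a minimal sketch of the reliability indicator described above, the snippet below assumes that each exercise receives a binary score (1 = correct interpretation, 0 = incorrect) and reports the percentage of 1s. The function name and the scores are hypothetical placeholders, not the study’s data or tooling; they only show how a figure such as 96% arises from 100 scored exercises.

```python
def success_rate(scores: list[int]) -> float:
    """Percentage of exercises scored 1 (correctly interpreted)."""
    if not scores:
        raise ValueError("no exercises scored")
    if any(s not in (0, 1) for s in scores):
        raise ValueError("scores must be 0 (incorrect) or 1 (correct)")
    return 100.0 * sum(scores) / len(scores)


# Hypothetical example: 100 exercises, 4 of them misinterpreted.
hypothetical_scores = [1] * 96 + [0] * 4
print(success_rate(hypothetical_scores))  # 96.0
```

Because the score is binary, the rate is simply the mean of the scores expressed as a percentage; any partial-credit scheme would need a different indicator.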
Next, we evaluate the accuracy of ChatGPT’s numerical mathematical solutions (NMS). For example, when asked to diagonalize a matrix, part of the problem involves correctly identifying the task at hand, while the other part involves correctly solving the problem numerically and providing the appropriate matrices. GPT-3.5 showed lower accuracy in this second part of the calculation (see Table 2).
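Because errors concentrate in the numerical part, a proposed diagonalization can be checked independently, much as the authors checked results with Mathematica. The sketch below is an illustration under stated assumptions, not the authors’ procedure: it takes a hypothetical answer consisting of a matrix P and a diagonal matrix D and verifies A = P D P^(-1) by checking that P is invertible and that AP = PD, using exact rational arithmetic (Python’s fractions module) so that rounding cannot hide an error. The function names and the example matrix are illustrative.

```python
from fractions import Fraction

Matrix = list[list[Fraction]]


def to_matrix(rows) -> Matrix:
    """Convert nested lists of ints (or strings such as '1/2') to exact Fractions."""
    return [[Fraction(x) for x in row] for row in rows]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a * b."""
    return [
        [sum((a[i][k] * b[k][j] for k in range(len(b))), Fraction(0)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def determinant(m: Matrix) -> Fraction:
    """Determinant by Gaussian elimination on a copy of m (exact arithmetic)."""
    a = [row[:] for row in m]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot in this column: matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def is_valid_diagonalization(a_rows, p_rows, d_rows) -> bool:
    """True if D is diagonal, P is invertible and A P == P D, i.e. A = P D P^(-1)."""
    a, p, d = to_matrix(a_rows), to_matrix(p_rows), to_matrix(d_rows)
    n = len(a)
    d_is_diagonal = all(d[i][j] == 0 for i in range(n) for j in range(n) if i != j)
    return d_is_diagonal and determinant(p) != 0 and matmul(a, p) == matmul(p, d)


A = [[2, 1], [1, 2]]
P = [[1, 1], [1, -1]]  # columns are the eigenvectors (1, 1) and (1, -1)
print(is_valid_diagonalization(A, P, [[3, 0], [0, 1]]))  # True
print(is_valid_diagonalization(A, P, [[3, 0], [0, 2]]))  # False: wrong eigenvalue
```

Checking AP = PD avoids computing an inverse; the determinant test rules out a singular P, for which AP = PD could hold without P diagonalizing A.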
Initially, only 36% of the problems were solved correctly. When an error was detected and pointed out through interaction with the AI, the success rate increased slightly, to 44%; for instance, the dot product of two vectors repeatedly yielded errors and was only computed correctly after the third or fourth interaction. ChatGPT also offers the option to regenerate an answer without giving any reason, using an equivalent technique if possible. This showcases the AI’s versatility, but it also means that a regenerated answer may be wrong even when the initial solution was correct. One question that arises from these findings is whether ChatGPT is capable of passing tests without the intervention of the user.
The results suggest that, while ChatGPT still struggles to pass exams based primarily on mathematical calculations, it shows remarkable proficiency in setting out the theoretical approach to the posed problem. It can therefore be a valuable resource for students during their learning activities. Furthermore, even when ChatGPT does not provide a correct theoretical answer, students with prior knowledge of the subject can use their critical thinking skills to rephrase the question or break it down into smaller parts so that ChatGPT can provide satisfactory answers. This indicates that ChatGPT can function as a complement to traditional learning methods rather than a substitute.
As can be seen in Table 2, the AI is highly reliable with respect to the Theoretical Mathematical Solution (TMS): in 90% of the tests, GPT-3.5 provided the correct theoretical solution to the problem, even though it failed in the calculations performed. GPT-4 raises the theoretical results to 95%, although it also fails in the calculations. However, an improvement in the final results can be observed during the process, since GPT-4 increases the score obtained in 70% of cases. Although it cannot yet be claimed that these models are capable of passing a purely numerical exam, they clearly understood the problems and provided the steps needed to solve them. In fact, the authors solved the problems by following the steps indicated by GPT-3.5 and GPT-4 while performing the calculations with Mathematica, and the results were very good: all the tests and the exam scored above 8.5 (Mean = 9.5, Median = 9, SD = 1.5).
3.2. Digital Escape Rooms and ChatGPT
This subsection examines ChatGPT’s problem-solving abilities on the dEERs designed for the course. ChatGPT was used to solve five dEERs: three on algebra (dEER1, dEER2, and dEER3) and two on integral calculus (dEER4 and dEER5). Their content was related to the concepts covered in the corresponding parts of the theory. Each dEER contained two types of challenges: numerical problem-solving and multiple-choice questions with several response options. ChatGPT’s performance differs between these two types because of the nature of the questions.
The responses to the numerical problem-based questions were similar to those obtained in the laboratory session tests, since both are based on numerical results requiring considerable precision. However, because Mathematica was not required during the game, the questions were designed not to need a powerful calculation engine, and the number of correct responses increased. Both GPT-3.5 and GPT-4 performed better on the multiple-choice questions (see Table 3). This could have important implications for the use of these methodologies, as it may weaken the competencies they are meant to reinforce.
In this case, we did not evaluate the responses one by one. Each problem was presented directly to ChatGPT, and when the response was incorrect, the problem was presented again, since the game allows multiple attempts (up to a limit of around seven tries imposed by penalties). For this reason, performance is evaluated only by whether ChatGPT completed the dEER (success) or not (failure), that is, whether it succeeded on some attempt or failed on all of them (Table 3).
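The success criterion above can be written down as a small sketch. It assumes, hypothetically, that each challenge of a dEER is retried until it is solved and that the limit of about seven tries applies per challenge; the text does not specify how the limit is counted, so MAX_ATTEMPTS and the data layout (one list of True/False attempt outcomes per challenge) are illustrative assumptions.

```python
MAX_ATTEMPTS = 7  # assumption: approximate limit imposed by the game's penalties


def solved_within_limit(attempts: list[bool], max_attempts: int = MAX_ATTEMPTS) -> bool:
    """True if a correct answer appears among the first max_attempts tries."""
    return any(attempts[:max_attempts])


def deer_outcome(challenges: list[list[bool]], max_attempts: int = MAX_ATTEMPTS) -> str:
    """'success' only if every challenge of the dEER is solved within the limit."""
    ok = all(solved_within_limit(a, max_attempts) for a in challenges)
    return "success" if ok else "failure"


# Hypothetical runs: challenges solved on attempts 1, 3 and 2; then one never solved.
print(deer_outcome([[True], [False, False, True], [False, True]]))  # success
print(deer_outcome([[True], [False] * 7]))                          # failure
```

Recording only the final outcome, as the study does, discards how many attempts were needed; keeping the per-attempt lists would allow that to be reported as well.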
3.3. Students’ Opinion and Use
This subsection analyzes the data collected from surveys regarding the usage and opinions of students in the Mathematics I course. The survey was conducted among 110 out of 128 enrolled students, which represents a high participation rate of around 86%. Among the respondents, 74.5% identified themselves as male, while 25.5% identified as female.
This study begins by analyzing how quickly this tool has spread among students (question Q4 of the questionnaire, see Table 1). The results show that ChatGPT has been widely adopted since its release in November 2022. All surveyed students reported being aware of the ChatGPT tool, and approximately 70% started using it for academic purposes in January, as shown in Table 4. This highlights the significant impact of ChatGPT on the academic community and its widespread adoption among students.
Next, the study examines how frequently students use ChatGPT in the general academic context (Q5) and in the mathematics subject in particular (Q6). The aim is to assess whether students use this tool more or less in the subject under study than in other academic activities. Responses on the use of ChatGPT for academic purposes ranged over “(1) I do not use it at all”, “(2) I use it very rarely”, “(3) Occasionally”, “(4) Quite often”, and “(5) I use it a lot”. On average, students reported using the tool occasionally (Mean = 3.06, Median = 3, SD = 1.30). Results were similar for male and female respondents, as shown in Table 5: although women tended to use the tool more often than men, the difference was not statistically significant (p-value = 0.09).
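For readers reproducing the kind of summary reported in Table 5, the following sketch computes the mean, median, and standard deviation of 1 to 5 Likert responses, overall and per group, with Python’s standard statistics module. It assumes the sample standard deviation (n - 1 denominator), since the text does not state which definition was used; the responses and group labels in the example are invented for illustration.

```python
import statistics


def describe(responses: list[int]) -> dict[str, float]:
    """Mean, median and sample standard deviation of 1-5 Likert responses."""
    if any(r not in range(1, 6) for r in responses):
        raise ValueError("Likert responses must be integers from 1 to 5")
    return {
        "mean": statistics.mean(responses),
        "median": statistics.median(responses),
        "sd": statistics.stdev(responses),  # sample SD (n - 1 denominator)
    }


def describe_by_group(responses: list[int], groups: list[str]) -> dict[str, dict[str, float]]:
    """Apply describe() separately to each group label (e.g. 'male', 'female')."""
    if len(responses) != len(groups):
        raise ValueError("each response needs exactly one group label")
    by_group: dict[str, list[int]] = {}
    for r, g in zip(responses, groups):
        by_group.setdefault(g, []).append(r)
    return {g: describe(rs) for g, rs in by_group.items()}


# Hypothetical answers to a Q5-style item (invented, not survey data).
answers = [3, 4, 2, 5, 3, 1]
labels = ["male", "female", "male", "female", "male", "female"]
print(describe(answers))
print(describe_by_group(answers, labels))
```

The median is reported alongside the mean because Likert items are ordinal, which is why the text gives both.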
When evaluating use in the Mathematics I subject (see Table 5, fifth and sixth columns), it can be observed that the average decreases for both men and women compared with general use in the academic context. However, the difference between the means of general use and use in Mathematics I is not significant (p-value = 0.1, paired-samples t-test).
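The paired-samples t-test mentioned above compares each student’s general-use answer (Q5) with the same student’s Mathematics I answer (Q6). A minimal sketch of the statistic, t = mean(d) / (sd(d) / sqrt(n)) with d the per-student differences and n - 1 degrees of freedom, is shown below using only the standard library. The p-value then comes from a Student t distribution with n - 1 degrees of freedom (for instance via scipy.stats.ttest_rel, which returns both); the data here are hypothetical.

```python
import math
import statistics


def paired_t(general_use: list[float], maths_use: list[float]) -> tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom.

    Element i of both lists must belong to the same student.
    """
    if len(general_use) != len(maths_use) or len(general_use) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [g - m for g, m in zip(general_use, maths_use)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    if sd == 0:
        raise ValueError("all differences are equal, so t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical Q5/Q6 answers for four students (invented, not survey data).
t, df = paired_t([3, 4, 5, 2], [2, 4, 4, 1])
print(t, df)  # 3.0 3
```

Pairing matters here: because both answers come from the same student, the test works on within-student differences rather than treating Q5 and Q6 as independent samples.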
After studying the frequency of use, and seeing that the tool is quite widespread and therefore seems to have become another tool in the students’ learning process, we evaluated how much credibility students give to ChatGPT in two separate areas: the theoretical mathematical response, in which it explains the concepts involved, and the computational aspect, in which it provides numerical answers to the problems posed. The responses to the question “How reliable do you think the answers of ChatGPT are with respect to the theoretical mathematical background?” were recorded on a 5-point Likert scale ranging from 1 (not at all reliable) to 5 (very reliable) and were analyzed both overall and stratified by gender.
Overall, confidence in the mathematical background of ChatGPT’s responses was very high (Mean = 4.21, Median = 4, SD = 0.73), with a fairly low standard deviation (see Table 6). As far as the authors have been able to verify, ChatGPT’s responses regarding the problems at hand have been very accurate, setting the calculations aside, and provide a fairly reliable step-by-step guide.
However, when comparing the confidence means of men and women, a slight but significant difference was observed (p-value = 0.024). Specifically, women expressed slightly lower confidence levels than men (see the third and fourth columns of Table 6).
From Table 6, we infer that confidence in the computational aspect of ChatGPT’s answers is not as high as in the theoretical one. Indeed, a significant difference was found between the mean confidence in the theoretical and computational aspects of ChatGPT (p-value = 0.001, independent-samples t-test). However, no significant difference was found between men and women in their assessment of the reliability of the calculations (p-value = 0.617).
After examining general use and reliability, we now analyze the usefulness of ChatGPT in fostering the learning of mathematical concepts. The survey question Q9 asked: “Do you think that the use of ChatGPT has helped you to learn/reinforce some mathematical concepts used in the subject of Mathematics I?” Responses ranged from 1 (no, it has not helped me) to 5 (yes, a lot of times).
Responses (see Table 7) show a positive appreciation of the usefulness of ChatGPT in learning or reinforcing mathematical concepts (Mean = 3.50, Median = 4.00, SD = 1.03). When considering gender, the means for men (Mean = 3.46) and women (Mean = 3.61) did not differ significantly (p-value = 0.506).
Table 7 also shows the results related to the question Q10: “Do you think that the use of ChatGPT has helped you in solving problems/exercises in the subject of Mathematics I?” Responses ranged from 1 (no, it has not helped me) to 5 (many times).
Responses also showed a positive appreciation of the usefulness of ChatGPT in solving mathematics problems/exercises (Mean = 3.37, Median = 3.00, SD = 1.19). When considering gender, the mean for men (Mean = 3.35) and women (Mean = 3.43) did not differ significantly (p-value = 0.813).
As observed in Table 7, the means of the two questions did not differ much, indicating that students found ChatGPT’s responses quite useful both in the learning process and in solving problems. This suggests that, despite the short time the tool has been available, students have already integrated it into their digital learning environment.
Once the usefulness of ChatGPT in students’ learning process has been established, it is logical to ask to what extent they use it, not only to improve this learning process but also to address doubts in tasks and exercises that are part of a BL methodology. This is the most delicate part, as activities, especially those planned to reinforce students’ critical thinking, can be undermined by a tool that provides answers and reasoning in an uncontrolled environment without the student properly assimilating them. To assess students’ use of ChatGPT for completing tasks and assignments, they were asked whether they had used ChatGPT to help them complete scheduled tasks outside the classroom (Q11). Responses ranged from 1 (no, never) to 5 (yes, many times) and are summarized in Table 8. The mean response was lower than in the other categories (Mean = 2.33, Median = 2, SD = 0.97). However, these responses should be interpreted with caution: despite the neutral wording of the question, students may have inferred that it was probing for possible misuse of ChatGPT. Descriptors by gender can also be found in Table 8.
Students were also surveyed about the importance of AI in academia (Q12), with responses ranging from 1 (not at all important) to 5 (very important). Results indicated that students generally considered these tools important in the academic world (Mean = 3.78, Median = 4, SD = 0.95). Table 9 presents the results stratified by gender. A significant difference was found between the responses of men and women (p-value = 0.002), with men attaching greater importance to these tools than women. Taken together, these opinions underscore how rapidly the new tool has been integrated into academia and its potential in this area.
One concern regarding the use of ChatGPT is whether it will hinder students’ acquisition of essential skills during their coursework. In this section, we evaluate students’ opinions on three competencies critical to their academic development: critical thinking (CT), problem-solving (PS), and group work (GW). The response options for these competencies were: (1) no, it will not affect it at all, (2) yes, it will affect it very little, (3) yes, it will affect it somewhat, (4) yes, it will affect it quite a lot, and (5) it will affect it a lot. The answers varied depending on the competency being evaluated:
Critical Thinking: Mean = 2.38, Median = 2, SD = 1.10.
Problem-solving: Mean = 2.39, Median = 2, SD = 1.28.
Group work: Mean = 2.97, Median = 3, SD = 0.83.
The responses indicate that students perceive ChatGPT as having a small to moderate effect on the acquisition of the aforementioned competencies. Group work appears to be the most affected competency, according to the opinions of the students.
Table 10 shows these opinions by gender, giving a sense of how students believe that using AI affects the acquisition of competencies. Significant differences between men and women were found in their perceptions of how ChatGPT affects problem-solving skills (p-value = 0.008) and critical thinking (p-value = 0.017). In contrast, there is greater consensus on how it will affect group work (p-value = 0.687).
3.4. Initial Results on Performance
In this section, the students’ results are compared with those from previous years to examine whether there are significant differences in competency acquisition.
Table 11 displays the results of the tests conducted to date in the academic years 2021/2022 and 2022/2023. The Levene column shows the significance (p-value) of Levene’s test for equality of variances, and the t-Test column gives the p-value of the comparison of means, computed according to the result of Levene’s test. Four theoretical exams (C1, Algebra, Test1, and Test2) were conducted in a controlled environment without access to computers or any other electronic devices. The C1 exam covers complex numbers and integral calculus with applications, as does Test1; the difference is that the former focuses on problem-solving, while the latter emphasizes theoretical concepts, with answer options that are penalized in case of error. The same applies to the Algebra exam and Test2: both cover algebra concepts (matrices, determinants, systems of linear equations, vector spaces, Euclidean spaces, linear applications, and diagonalization), but the former centers on problem-solving, while the latter focuses on more theoretical concepts. There is no significant difference between the two academic years in the C1 and Algebra scores, but there is in the scores of the tests. This may be due to various reasons, including normal score variability; however, such a difference is not evident when comparing scores from previous courses.
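The two-step procedure described for Table 11 (Levene’s test first, then a comparison of means chosen according to its result) can be sketched as follows. This is an illustration under stated assumptions, not the authors’ script: it computes the classic mean-centred Levene statistic W and either the pooled (Student) or Welch t statistic, using the common convention, assumed here, that equal variances are assumed when the Levene p-value is at least 0.05. P-values require the F and t distributions, which Python’s standard library lacks; scipy.stats.levene (whose default center is 'median', the Brown-Forsythe variant, so center='mean' reproduces the classic test) and scipy.stats.ttest_ind with its equal_var argument can supply them. The scores in the example are hypothetical.

```python
import math
import statistics


def levene_w(*groups: list[float]) -> float:
    """Classic (mean-centred) Levene statistic W; compare with F(k - 1, N - k)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    # Absolute deviation of each observation from its own group mean.
    z = [[abs(x - statistics.mean(g)) for x in g] for g in groups]
    z_means = [statistics.mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((zij - zm) ** 2 for zi, zm in zip(z, z_means) for zij in zi)
    if within == 0:
        raise ValueError("W is undefined when within-group deviations do not vary")
    return (n_total - k) / (k - 1) * between / within


def two_sample_t(x: list[float], y: list[float], equal_var: bool) -> tuple[float, float]:
    """Pooled (Student) or Welch t statistic with its degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    diff = statistics.mean(x) - statistics.mean(y)
    if equal_var:
        sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)  # pooled variance
        return diff / math.sqrt(sp2 * (1 / nx + 1 / ny)), nx + ny - 2
    se2x, se2y = vx / nx, vy / ny
    if se2x + se2y == 0:
        raise ValueError("both samples are constant; t is undefined")
    # Welch-Satterthwaite degrees of freedom.
    df = (se2x + se2y) ** 2 / (se2x ** 2 / (nx - 1) + se2y ** 2 / (ny - 1))
    return diff / math.sqrt(se2x + se2y), df


def compare_means(x: list[float], y: list[float], levene_p: float, alpha: float = 0.05):
    """Use the pooled test unless Levene's test rejects equal variances."""
    return two_sample_t(x, y, equal_var=levene_p >= alpha)


# Hypothetical lab-session scores for two academic years (invented, not Table 11 data).
scores_2022 = [7.5, 8.0, 9.0, 6.5, 8.5]
scores_2021 = [6.0, 7.5, 7.0, 5.5, 8.0]
print(levene_w(scores_2022, scores_2021))
```

In practice levene_p would come from the F(k - 1, N - k) distribution evaluated at W, for example through scipy.stats.levene, before calling compare_means.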
Next, the scores from the laboratory sessions are compared. In these sessions, students solve problems that they have prepared beforehand outside the classroom, and before each session they can consult the professor about any doubts. During these assessments, students work with Mathematica, which means they use computers. Although the environment is partly controlled, students could access the internet, since the computers’ network connections are not restricted. In approximately 50% of the laboratory sessions, there is a significant difference between the scores obtained in 2022 and those obtained in 2021, with higher scores in 2022.
4. Discussion
B-learning methodologies are meant to promote the active participation of students, who complete tasks designed to improve and strengthen their knowledge, competencies, and skills [30, 33, 34, 43]. Even if a correct solution is not reached, attempting to solve problems strengthens critical thinking and improves learning and deductive abilities. Simple, repetitive tasks also aim to ensure the correct assimilation of knowledge across a broad range of questions. The subject matter addressed in this article is mathematics, which requires specific skills based on the correct assimilation of content, practice, and application, as well as reinforcement through activities. Unlike other subjects, mathematics uses a formal language that differs from everyday conversational language, which creates a need for additional learning support. However, ChatGPT could weaken this support if it becomes capable of solving the problems posed and explaining the calculations made. On the other hand, ChatGPT can be helpful if used properly, as it provides a detailed description of the mathematical knowledge required to solve problems.
Regarding student performance, a slight increase compared with the previous academic year can be observed in the scores of the practical sessions, where the environment is not entirely controlled because students have access to the internet. Assuming that students answered the survey truthfully and use ChatGPT to prepare activities, this could indicate better performance. However, this increase is not noticeable in exams held in more controlled environments, where no significant differences are observed, either for better or for worse.
ChatGPT could also significantly affect the use of GBL and dEERs to promote student motivation and competency development, as well as the information these activities collect. The data collected from the games are used to improve the students’ learning experience through a feedforward strategy [44]. However, if these data are distorted, the assessment of competencies and content is seriously compromised, and deficiencies revealed during the game may not be reinforced later. This could have very negative repercussions on the design of future activities.
As with other technologies, students’ attitude will determine whether using ChatGPT in active methodologies brings benefits or drawbacks [18, 20]. Since the COVID-19 pandemic, students have had a wider range of computer and digital tools at their disposal to facilitate learning [29], but ChatGPT, with its ability to tailor its answer directly to the problem posed, can greatly simplify the search process. Consequently, the focus shifts from actively searching for information to mainly checking whether the answer is correct. If students rely solely on ChatGPT to find answers instead of training their skills, the effectiveness of FT-based methodologies could be significantly diminished. In STEM areas, ChatGPT’s response capabilities pose a risk to the integrity of the learning process if they prevent the acquisition of skills. However, the doubts raised so far about the validity and correctness of its responses can also promote critical thinking.
Nevertheless, ChatGPT also has positive features for blended learning environments, such as easy access to vast amounts of information that supplement learning resources, quick assistance with homework and assignments or with clarifying doubts, and a strong ability to adapt to individual users’ needs. In the authors’ opinion, this new tool can make it easier both to obtain answers and to acquire knowledge. However, its potential effects on educational development and on the design of activities require evaluation, as ChatGPT’s response capacity can alter the learning process [12, 13, 14]. A word cloud generated from recent literature about ChatGPT in education is shown in Figure 2.
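The paper does not say which tool or preprocessing produced Figure 2. As a hedged illustration only, a word cloud is typically built from a term-frequency table: the text is tokenized, common stop words are dropped, the remaining terms are counted, and the most frequent terms are drawn with a font size that grows with their count. The sketch below assumes a tiny hand-written stop-word list and a linear count-to-size mapping; the names `STOP_WORDS`, `term_frequencies`, and `font_sizes` are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real tools ship much larger ones).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "its", "of", "on", "or", "that", "the", "this", "to", "with",
}


def term_frequencies(documents, min_length=3):
    """Count how often each non-stop-word term appears across the documents."""
    counts = Counter()
    for text in documents:
        # Lower-case, then take runs of letters, digits and hyphens as tokens.
        for token in re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower()):
            if len(token) >= min_length and token not in STOP_WORDS:
                counts[token] += 1
    return counts


def font_sizes(counts, top_n=50, min_size=10, max_size=60):
    """Map the top_n most frequent terms linearly onto a font-size range."""
    top = counts.most_common(top_n)
    if not top:
        return {}
    hi, lo = top[0][1], top[-1][1]
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {term: min_size + (max_size - min_size) * (n - lo) / span
            for term, n in top}
```

Real word-cloud tools add layout, stemming, and larger stop-word lists; those steps are omitted here to keep the frequency logic visible.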
The results obtained in this study agree with the findings of other studies. Table 12 summarizes findings on the performance of ChatGPT, the potential issues associated with it, and its use for enhancing learning. For a more exhaustive comparison of results in the literature, see [77].
While ChatGPT can be a valuable tool in BL methodologies, there are some concerns that should be addressed:
Reliability issues: ChatGPT can provide incorrect, inaccurate, or outdated information, which can lead to misunderstandings or misconceptions in an educational environment [80, 81]. On average, students’ confidence in the tool is rather lukewarm, especially regarding the calculations it provides. Because this is a convenience sample drawn from an engineering discipline with a fairly high academic profile, these results cannot be extended to the entire university student community.
Cheating: AI-generated content may be used to complete assignments and exercises done outside the classroom, weakening the learning process and undermining the acquisition of key competences. The results show extensive use of this tool in the academic field, suggesting that it is used to complete tasks and assignments. Although students do not believe that this usage affects the assimilation of key skills, the reality may differ significantly, and it may take some time to measure the consequences for the learning process accurately.
Over-reliance on AI: Results from the questionnaire indicate that ChatGPT is widely used. Its ease of use and high accessibility across different platforms have allowed ChatGPT to revolutionize the use of AI in the academic environment. However, students may become too dependent on ChatGPT for problem-solving and knowledge acquisition, hindering the development of critical thinking skills and self-reliance.
Accessibility: Despite the tool’s low technical requirements and ease of use, not all students may have equal access to it due to technological or financial constraints, which can lead to inequalities in learning opportunities.
Teacher–student interaction: The regular use of this tool when encountering difficulties in the learning process can substantially reduce the amount of interaction between teachers and students. This reduces the teacher’s opportunities to supervise and guide the students in the assimilation of knowledge and competencies.
Uncontrolled environments: In a blended learning environment, many activities do not take place in a controlled setting, and students already have access to a wealth of information on the internet, in books, videos, etc. Teachers rely on students using technology to obtain certain answers. However, ChatGPT’s adaptability and its ability to personalize the problems posed may oversimplify the information-seeking process, weaken students’ ability to critically analyze responses, and undermine the learning process.
Assessment challenges: The emergence of ChatGPT calls into question the usual ways of assessing content acquisition. The results presented by students and the content they generate (essays, articles, and projects) must be carefully monitored. It will take time to establish ChatGPT’s potential and to determine which assessments will remain representative of the knowledge students generate and acquire.
To address these concerns, educators should strive to use ChatGPT as a supplementary tool rather than a replacement for traditional teaching methods, carefully monitor its usage, and promote critical thinking and evaluation of AI-generated content.
5. Conclusions
As AI continues to gain prominence in education, new challenges will arise in developing effective teaching methodologies that leverage the potential of these tools while addressing their limitations.
The consequences of the emergence of AI in the academic world will need to be assessed as the outcomes of implementing these tools become more measurable. In controlled environments, such as a classroom with a classic face-to-face methodology, the use of AI can be minimized simply by restricting access to the network and mobile devices. In these same controlled settings, since students must demonstrate the acquisition of knowledge and competencies at different assessment points throughout the course, AI use is expected to be merely anecdotal, and its use during tests or activities would be completely inappropriate and reprehensible.
It is crucial to proactively address these challenges to ensure that students continue to receive a high-quality education that prepares them for the demands of the future. In our opinion, the digital elements of blended learning methodologies (online exams, quizzes, knowledge reinforcement exercises, games) are the ones that carry the greatest risk of being oversimplified by AI.
Despite ChatGPT’s recent arrival and the risk of wrong answers, its ability to learn and adapt is a significant advantage over other sources of information, which may also contain incorrect or outdated information. In addition, students highlighted the personalization of problems and the detailed guidance provided by ChatGPT as key strengths.
The results of this study show that students have a high level of confidence in the accuracy of ChatGPT’s answers, with a high percentage of correct responses when compared to the numerical solutions provided in the activities. Furthermore, ChatGPT not only provides solutions to the mathematical problems posed but also offers a step-by-step guide to the process required to solve them, which enhances students’ understanding of the problem-solving process. Nevertheless, it is important to note that the use of ChatGPT may have implications for the development of critical thinking and problem-solving skills in students. Therefore, it is crucial to strike a balance between leveraging the benefits of AI and ensuring that students develop the competencies they need to succeed in their academic and professional lives.
In conclusion, ChatGPT has both advantages and disadvantages in blended learning environments. While it offers easy access to a huge amount of information and educational assistance, it also raises concerns about the ability to assess students’ learning progress correctly, about ethical use, and about the oversimplification of the learning process. Successful integration of ChatGPT requires a balanced approach in which it complements human interaction and guidance. Teachers and educational institutions must carefully monitor its use to ensure that it supports the learning process rather than hindering it.