@Article{info:doi/10.2196/56850,
  author   = "Wang, Ying-Mei and Shen, Hung-Wei and Chen, Tzeng-Ji and Chiang, Shu-Chiung and Lin, Ting-Guan",
  title    = "Performance of ChatGPT-3.5 and ChatGPT-4 in the Taiwan National Pharmacist Licensing Examination: Comparative Evaluation Study",
  journal  = "JMIR Med Educ",
  year     = "2025",
  month    = "Jan",
  day      = "17",
  volume   = "11",
  pages    = "e56850",
  keywords = "artificial intelligence; ChatGPT; chat generative pre-trained transformer; GPT-4; medical education; educational measurement; pharmacy licensure; Taiwan; Taiwan national pharmacist licensing examination; learning model; AI; chatbot; pharmacist; evaluation and comparison study; pharmacy; statistical analyses; medical databases; medical decision-making; generative AI; machine learning",
  abstract = "Background: OpenAI released ChatGPT-3.5 and GPT-4 between 2022 and 2023. GPT-3.5 has demonstrated proficiency in various examinations, particularly the United States Medical Licensing Examination; GPT-4, however, has more advanced capabilities. Objective: This study aims to examine the efficacy of GPT-3.5 and GPT-4 on the Taiwan National Pharmacist Licensing Examination and to ascertain their utility and potential application in clinical pharmacy and education. Methods: The pharmacist examination in Taiwan consists of 2 stages: basic subjects and clinical subjects. In this study, exam questions were manually fed into the GPT-3.5 and GPT-4 models, and their responses were recorded; graphic-based questions were excluded. The study encompassed three steps: (1) determining the answering accuracy of GPT-3.5 and GPT-4, (2) categorizing question types and observing differences in model performance across these categories, and (3) comparing model performance on calculation and situational questions. Microsoft Excel and R software were used for statistical analyses. Results: GPT-4 achieved an accuracy rate of 72.9\%, surpassing GPT-3.5, which achieved 59.1\% (P<.001). In the basic subjects category, GPT-4 significantly outperformed GPT-3.5 (73.4\% vs 53.2\%; P<.001). In clinical subjects, however, only minor differences in accuracy were observed. GPT-4 also outperformed GPT-3.5 on calculation and situational questions. Conclusions: This study demonstrates that GPT-4 outperforms GPT-3.5 on the Taiwan National Pharmacist Licensing Examination, particularly in basic subjects. While GPT-4 shows potential for use in clinical practice and pharmacy education, its limitations warrant caution. Future research should focus on refining prompts, improving model stability, integrating medical databases, and designing questions that better assess student competence and minimize guessing.",
  issn     = "2369-3762",
  doi      = "10.2196/56850",
  url      = "https://mededu.jmir.org/2025/1/e56850"
}