ChatGPT's Potential in Navigating the Complexity of the Polish Anaesthesiology Specialist Examination

Document Type: Original Article

Authors

1 Student Scientific Association of Computer Analysis and Artificial Intelligence at the Department of Radiology and Nuclear Medicine of the Medical University of Silesia in Katowice

2 Department of Radiodiagnostics, Interventional Radiology and Nuclear Medicine

3 Dr B. Hager Memorial Multi-specialty District Hospital, Pyskowicka 47-51, 42-600 Tarnowskie Góry, Poland

4 Faculty of Medical Sciences in Katowice, Medical University of Silesia, 40-752 Katowice, Poland

Abstract

Purpose: This study aims to assess the capability of an artificial intelligence (AI) model, specifically ChatGPT-3.5, in answering questions from the test section of the Polish National Specialist Examination (PES) in anaesthesiology and intensive care.
Materials and Methods: A pool of 118 questions from the spring 2023 PES exam was used. Bloom's taxonomy was employed to categorize questions according to whether they test comprehension, critical thinking, or memory. The questions were then presented to ChatGPT-3.5 in five independent sessions to evaluate its performance. Statistical analyses were conducted to assess correlations between the model's confidence, question difficulty, and correctness of answers.
Results: ChatGPT-3.5 achieved an overall accuracy of 47.5%, with variations observed across different question types and subtypes. Significant correlations were found between the model's confidence and answer correctness. However, no correlation was observed between the certainty index and question difficulty or answer correctness based on category or subcategory.
Conclusions: While ChatGPT-3.5 exhibited moderate performance, it fell short of the 60% threshold required to pass the PES exam. Comparison with similar studies conducted in Japan suggests that ChatGPT-3.5 performed better on the Polish examination, although still below specialist-level competence. Human candidates consistently outperformed the AI model, indicating the current superiority of human expertise in this domain. Despite these limitations, continued research and collaboration offer promising prospects for AI integration in medical practice, supporting diagnostics, therapeutics, and patient care.
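
To put the reported figures in concrete terms, the arithmetic behind the pass/fail comparison can be sketched as below. Only the three constants come from the abstract (118 questions, 47.5% accuracy, 60% threshold). The function names are hypothetical, the correct-answer count is inferred by rounding rather than reported, and the sketch assumes the threshold means "at least 60% of questions answered correctly".

```python
import math

# Values stated in the abstract.
TOTAL_QUESTIONS = 118
REPORTED_ACCURACY = 0.475
PASS_THRESHOLD = 0.60


def inferred_correct_count(accuracy: float, total: int) -> int:
    """Whole number of correct answers closest to the reported accuracy.

    This is an inference from a rounded percentage, not a reported figure.
    """
    return round(accuracy * total)


def min_correct_to_pass(threshold: float, total: int) -> int:
    """Smallest whole number of correct answers reaching the threshold.

    Assumes "pass" means a score of at least `threshold` (an assumption).
    """
    return math.ceil(threshold * total)


correct = inferred_correct_count(REPORTED_ACCURACY, TOTAL_QUESTIONS)
needed = min_correct_to_pass(PASS_THRESHOLD, TOTAL_QUESTIONS)
shortfall = needed - correct

print(f"Inferred correct answers: {correct} of {TOTAL_QUESTIONS}")
print(f"Correct answers needed to pass: {needed}")
print(f"Shortfall: {shortfall} questions")
```

Under these assumptions, the reported accuracy corresponds to roughly 56 correct answers, against 71 needed to reach the threshold.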

Keywords