Chat Generative Pre-trained Transformer (ChatGPT) is a natural language processing model that generates human-like text. The tool is a large language model (LLM) trained to anticipate word sequences based on the context it is given. ChatGPT has undergone testing and has even passed the US Medical Licensing Examination.
The goal of this new study by the Feinstein Institutes’ researchers was to test whether ChatGPT (versions 3 and 4) could pass the ACG assessment, which is intended to gauge how a trainee would perform on the ABIM Gastroenterology board examination.
ChatGPT-3 and ChatGPT-4 were used to answer the 2021 and 2022 American College of Gastroenterology (ACG) Self-Assessment Tests. The questions were entered verbatim into both versions of ChatGPT. A score of 70% or higher was required to pass the assessment.
Each ACG test consists of 300 multiple-choice questions with immediate feedback. Each question and its answer choices were copied and pasted into ChatGPT versions 3 and 4. ChatGPT responded to 455 questions (145 were omitted because they required an image). Across the two exams, ChatGPT-3 answered 296 of 455 questions correctly (65.1%), and ChatGPT-4 answered 284 correctly (62.4%).
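As a quick sanity check, the reported percentages can be reproduced from the figures in the article (total answerable questions, correct answers per model, and the 70% pass mark); the variable names below are illustrative:

```python
# Reproduce the score percentages reported for the ACG self-assessment tests.
# All figures (correct counts, question total, pass mark) come from the article.
TOTAL_QUESTIONS = 455   # 600 questions across two exams minus 145 image-based ones
PASS_MARK = 70.0        # percent required to pass the assessment

correct_answers = {"ChatGPT-3": 296, "ChatGPT-4": 284}

for model, correct in correct_answers.items():
    pct = 100 * correct / TOTAL_QUESTIONS
    verdict = "pass" if pct >= PASS_MARK else "fail"
    print(f"{model}: {pct:.1f}% ({verdict})")
# ChatGPT-3: 65.1% (fail)
# ChatGPT-4: 62.4% (fail)
```

Both versions fall roughly five to eight percentage points short of the passing threshold.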
Andrew C. Yacht, MD, senior vice president of academic affairs and chief academic officer at Northwell Health, said, “ChatGPT has sparked enthusiasm, but with that enthusiasm comes skepticism around the accuracy and validity of AI’s current role in health care and education.”
Even though ChatGPT is seen as a potential educational tool, the study suggests it will not be earning its medical specialty certification anytime soon.
Arvind Trindade, MD, associate professor at the Feinstein Institutes’ Institute of Health System Science and senior author on the paper, said, “Recently, there has been a lot of attention on ChatGPT and the use of AI across various industries. Regarding medical education, there is a lack of research around this potential ground-breaking tool. Based on our research, ChatGPT should not be used for medical education in gastroenterology now and has a ways to go before it should be implemented into the health care field.”
ChatGPT lacks any inherent understanding of a subject or problem. Potential explanations for its failing grade include a lack of access to subscription-only medical journals and its reliance on questionable, outdated, or non-medical sources; more research is needed before it can be used reliably.
- Suchman K, Garg S, Trindade AJ. ChatGPT Fails the Multiple-Choice American College of Gastroenterology Self-Assessment Test. The American Journal of Gastroenterology. doi: 10.14309/ajg.0000000000002320