Using questions from the United States Medical Licensing Examination (USMLE), a research team evaluated the clinical reasoning capabilities of ChatGPT, an artificial intelligence (AI) chatbot. The team chose USMLE questions to test the generative language AI because the USMLE is a high-stakes, comprehensive, three-step standardized testing program that covers all topics in physicians’ fund of knowledge, including basic science, clinical reasoning, medical management, and bioethics.
In the experiment, OpenAI’s ChatGPT was able to pass all three parts of the United States Medical Licensing Examination (USMLE) in a single sitting, whereas it typically takes a medical student close to 4 years, plus more than 2 years of clinical rotations, to pass the USMLE.
How did ChatGPT perform on the USMLE?
Without any specialized instruction or reinforcement, ChatGPT “performed at or near the passing threshold for all three exams,” according to the researchers. ChatGPT’s explanations also displayed a high degree of concordance and insight. On that basis, the study suggests that large language models (LLMs) could contribute to clinical decision-making and medical education.
The USMLE is a rigorous, three-part, high-stakes testing program for physicians that covers all areas of medical knowledge, including basic science, clinical reasoning, medical management, and bioethics. Because the difficulty and complexity of its questions are highly standardized and regulated, it is a well-suited input substrate for AI testing.
ChatGPT was not trained on the version of the test the researchers used, nor was it given any additional medical training before the study. Instead, the language model was trained on vast amounts of text from the internet. During the study, it responded to a series of open-ended and multiple-choice questions.
The team stated in their study:
“ChatGPT performed at >50% accuracy across all examinations, exceeding 60% in most analyses.”
USMLE pass thresholds vary from year to year, but they are generally around 60%. In light of this, the researchers place ChatGPT within the passing range. They describe this as a surprising and impressive result, especially considering that this is the first experiment to hit this benchmark.
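To make the comparison above concrete, here is a minimal Python sketch of how per-exam accuracy could be checked against an approximate pass threshold. Only the roughly 60% threshold comes from the text. The function names, the exam labels, and the answer counts are hypothetical placeholders for illustration, not figures or methods from the study.

```python
# Approximate USMLE pass threshold cited in the article (it varies by year).
PASS_THRESHOLD = 0.60


def accuracy(correct: int, total: int) -> float:
    """Return the fraction of questions answered correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def classify(acc: float, threshold: float = PASS_THRESHOLD) -> str:
    """Label an accuracy as at/above or below the given threshold."""
    return "at or above threshold" if acc >= threshold else "below threshold"


# Hypothetical (correct, total) counts, NOT results from the study.
example_counts = {
    "Exam A": (55, 100),
    "Exam B": (60, 100),
    "Exam C": (62, 100),
}

for exam, (correct, total) in example_counts.items():
    acc = accuracy(correct, total)
    print(f"{exam}: {acc:.1%} -> {classify(acc)}")
```

Because the real threshold shifts from year to year, it is passed as a parameter here, so a score near 60% can be re-evaluated against a different cutoff.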
According to the team, more prompting and interaction with the model could improve the AI’s performance. They think that where the AI performed poorly and gave less consistent answers, the cause may have been information it had not encountered.
ChatGPT alternatives and how they compare (PubMedGPT)
The researchers think the OpenAI bot had an advantage over models trained only on medical text because it had a broader understanding of the clinical context. In their discussion, the team stated that ChatGPT outperformed PubMedGPT (accuracy 50.8%, unpublished data), a counterpart large language model with a similar neural structure that was trained only on literature in the biomedical domain.
The researchers hypothesize that domain-specific training may have increased ambivalence in the PubMedGPT model, since it incorporates real-world text from ongoing academic discourse, which frequently uses ambiguous, contradictory, or highly conservative or noncommittal language.
Applications of AI in medicine and healthcare
Given the industry’s rapid development, the team predicts that AI may soon be used in healthcare settings regularly, perhaps by enhancing risk assessment or offering help and support when making clinical decisions.
In the tech world, ChatGPT and its many applications continue to be fascinating. Education is one application where ChatGPT is viewed as a game-changer, and it appears that the chatbot can perform admirably when it comes to medical education. One study claims that ChatGPT was successful in passing the US Medical Licensing Examination (USMLE), which is ordinarily taken by medical students who want to become licensed physicians.
How well did ChatGPT explain its answers?
In fact, as researchers demonstrated in one paper, ChatGPT not only performed at or near the passing threshold on the exam, which consists of three steps taken at different stages of medical training, but also provided explanations and insights into how it arrived at its conclusions. The initial study demonstrated that ChatGPT was able to achieve greater than 50% accuracy across all tests. The paper hasn’t been peer-reviewed yet, though.
There was a high level of concordance and insight in the explanations provided in this study. This suggests that large language models could be of use in medical education.
The researchers tested ChatGPT on 376 questions from a June 2022 sample exam that had been made publicly available. A random spot check was performed to ensure “no answers, explanations, or related content have been indexed on Google before January 1, 2022.” The company also points out that ChatGPT’s training dataset currently extends only to 2021.
How could ChatGPT affect education?
In another intriguing instance, ChatGPT was able to pass an MBA exam created by a professor at the University of Pennsylvania’s Wharton School. The GPT-3-powered chatbot passed the MBA course’s final exam with a grade between a B- and a B.
The ChatGPT bot does an amazing job at basic operations management and process analysis questions, including those that are based on case studies, according to the report, and it provides clear explanations. Additionally, it was claimed that the bot was skilled at “modifying its answers in response to human hints.”
In the opinion of educators, ChatGPT’s response pattern makes its answers hard to distinguish from human responses. While some people are not at all concerned, artificial intelligence experts and educators warn that products like ChatGPT could have a negative impact on the educational system in the future.
With the advent of new technology and the evolution of smart AI systems, which definitely have some advantages over humans, will AI be able to replace human intelligence? Only time will tell.