“A total of 220 diagnoses were submitted to the competition, with 138 in the patient track, 74 in the medical professional track and eight in the out-of-the-box track,” said Bonam Mingole, a doctoral student in the College of IST and CSRAI student affiliate, who organized the event. “The variety in prompt styles and level of detail demonstrates the diverse ways users engage with LLMs for medical assistance.”
More than 85% of responses were judged by participants and the review panel to provide accurate diagnoses. However, several entries identified cases where LLMs failed to make connections between symptoms and potential causes — such as how leg pain can be associated with an untreated strep throat infection — or overlooked basic details when providing guidance — such as omitting the step for a surgeon to confirm the site of the operation before proceeding.
Other entries noted that while many tools responded with disclaimers that they were not qualified health care professionals, some did not explicitly urge the user to seek professional medical attention for pressing or potentially serious conditions.
“The Diagnose-a-thon was a unique opportunity to understand both the incredible potential and the associated risks of using generative AI to answer health queries by general internet users,” said Amulya Yadav, associate professor in the College of IST and CSRAI associate director of programs. “This competition highlights the need to raise awareness for responsible development and integration of AI tools, particularly in fields like health care where errors can have serious consequences.”
For example, one entry prompted the LLM to respond to a patient’s mental health crisis and thoughts of self-harm. According to the user, multiple generative AI tools attempted “to engage in therapeutic-style responses rather than immediate redirection to professional help, a dangerous development that could lead users to mistake AI interaction for actual mental health support and delay seeking crucial human intervention.”
The competition organizers had anticipated that medical students would make up most of the participants interested in, and equipped to identify, the problems of using generative AI to diagnose health conditions, according to S. Shyam Sundar, CSRAI director and James P. Jimirro Professor of Media Effects in the Donald P. Bellisario College of Communications. The response, however, was much broader.
“We had robust participation from all corners of the University, with winners coming from a wide spectrum of colleges, from science to liberal arts, from education to engineering,” he said. “The use of generative AI for health issues is clearly universal, creating an urgent need to promote more literacy about its strengths and pitfalls.”