ChatGPT Might Miss Your Serious Medical Emergency, New Study Suggests
This story is about suicide. If you or someone you know is having suicidal thoughts, please contact the Suicide & Crisis Lifeline at 988 or 1-800-273-TALK (8255).
Artificial intelligence has been touted as a boon to healthcare, but a new study has revealed its potential shortcomings in providing medical advice.
In January, OpenAI launched ChatGPT Health, the medical version of the popular chatbot tool.
The company introduced the tool as “a dedicated experience that securely brings together your health information and ChatGPT intelligence, to help you feel more informed, prepared and confident when navigating your health.”
But researchers at the Icahn School of Medicine at Mount Sinai found that the tool did not recommend emergency care for a “significant number” of serious medical cases.
The study, published in the journal Nature Medicine on February 23, aimed to explore how ChatGPT Health, which reportedly has around 40 million users daily, handles situations in which people ask whether they should seek emergency care.

Artificial intelligence has been touted as a boon to healthcare, but a new study has revealed its potential shortcomings in providing medical advice. (iStock)
“At this time, no independent body evaluates these products before they reach the public,” lead author Ashwin Ramaswamy, MD, an instructor of urology at the Icahn School of Medicine at Mount Sinai in New York City, told News Digital.
“We wouldn’t accept that for a drug or a medical device, and we shouldn’t accept it for a product that tens of millions of people use to make health decisions.”
Emergency scenarios
The team created 60 clinical scenarios in 21 medical specialties, from minor conditions to true medical emergencies.
Three independent physicians then assigned an appropriate level of urgency for each case, based on clinical practice guidelines published by 56 medical societies.
Researchers conducted 960 interactions with ChatGPT Health to see how the tool responded, taking into account gender, race, barriers to care, and “social dynamics.”
While “clear emergencies” (such as a stroke or a severe allergic reaction) were generally handled well, the researchers found that the tool failed to classify many urgent medical problems as requiring emergency care.

The team created 60 clinical scenarios in 21 medical specialties, from minor conditions to true medical emergencies. (iStock)
For example, in an asthma scenario, the system recognized that the patient was showing early signs of respiratory failure, but still recommended waiting rather than seeking emergency care.
“ChatGPT Health works well in medium-severity cases, but fails at both ends of the spectrum: the cases where getting it right matters most,” Ramaswamy told News Digital. “More than half of genuine emergencies were undertriaged, and approximately two-thirds of mild cases that, according to clinical guidelines, should be treated at home were overtriaged.”
Undertriage can be life-threatening, the doctor noted, while overtriage can overwhelm emergency departments and delay care for those who truly need it.
The researchers also identified inconsistencies in the tool’s suicide risk alerts. In some lower-risk scenarios, it directed users to the 988 Suicide & Crisis Lifeline, yet in others it did not offer that recommendation even when a person described suicidal ideation.
“ChatGPT Health works well in medium severity cases, but fails at both ends of the spectrum.”
“The failure of the suicide guardrail was the most alarming,” study co-author Girish N. Nadkarni, MD, director of artificial intelligence at the Mount Sinai Health System, told News Digital.
ChatGPT Health is designed to display a crisis intervention banner when someone describes thoughts of self-harm, the researcher noted.

OpenAI launched ChatGPT Health, the medical version of the popular chatbot tool, in January 2026. (Gabby Jones/Bloomberg via Getty Images)
“We tried it with a 27-year-old patient who said he had been thinking about taking a lot of pills,” Nadkarni said. “When he described only his symptoms, the banner appeared 100% of the time. Then we added normal lab results (same patient, same words, same severity) and the banner disappeared.”
“A safety feature that works perfectly in one context and completely fails in a nearly identical context… is a fundamental safety problem.”
The researchers were also surprised by how strongly social dynamics influenced the tool’s recommendations.
“When a family member in the scenario said ‘it’s nothing serious,’ which happens all the time in real life, the system became almost 12 times more likely to downplay the patient’s symptoms,” Nadkarni said. “Everyone has a spouse or parent who tells them they are overreacting. AI should not agree with them during a potential emergency.”
News Digital contacted OpenAI, the creator of ChatGPT, for comment.
Doctors react
Dr. Marc Siegel, senior medical analyst for News, called the new study “important.”
“It underscores the principle that while large language models can classify clear emergencies, they have much more trouble with nuanced situations,” Siegel, who was not involved in the study, told News Digital.

ChatGPT and other LLMs can be useful tools, one doctor said, but “they should not be used to give medical instructions.” (iStock)
“This is where doctors and clinical judgment come in: knowing the nuances of a patient’s history and how they inform symptoms and their approach to health.”
ChatGPT and other LLMs can be useful tools, Siegel said, but “they should not be used to give medical instructions.”
“Machine learning and continuous data entry can help, but they will never offset the essential problem: human judgment is needed to decide whether something is a true emergency or not.”
Dr. Harvey Castro, an emergency physician and AI expert in Texas, echoed the importance of the study, calling it “exactly the type of independent safety evaluation we need.”
“Innovation moves fast. Supervision has to move just as fast,” Castro, who also did not work on the study, told News Digital. “In healthcare, the most dangerous errors occur at the extremes, when something seems minor but is actually catastrophic. That’s where clinical judgment matters most and where AI needs to be stress-tested.”
Limitations of the study
The researchers acknowledged some potential limitations in the study design.
“We used clinical scenarios written by doctors rather than actual conversations with patients, and we tested at a single point in time – these systems are updated frequently, so performance can change,” Ramaswamy told News Digital.
Furthermore, most of the missed emergencies occurred in situations where the danger depended on how the condition changed over time. It is not clear whether the same problem would occur with acute medical emergencies.
Because the system had to choose only one fixed urgency category, the test may not reflect the more nuanced advice it might give in a back-and-forth conversation, the researchers noted.

ChatGPT Health is designed to display a crisis intervention banner when someone describes thoughts of self-harm. (iStock)
Additionally, the study was not large enough to confidently detect small differences in how recommendations might vary by race or gender.
“We need continuous audits, not one-off studies,” Castro said. “These systems are updated frequently, so evaluation must be continuous.”
‘Don’t wait’
The researchers emphasized the importance of seeking immediate care for serious problems.
“If you feel like something is very wrong (chest pain, difficulty breathing, a serious allergic reaction, thoughts of self-harm), go to the emergency department or call 988,” Ramaswamy advised. “Don’t wait for an AI to tell you it’s okay.”
The researchers noted that they support the use of AI to improve access to health care and that they did not conduct the study to “tear down the technology.”
“These tools can be really helpful for the right things: understanding a diagnosis you’ve already received, looking up what your medications do and their side effects, or getting answers to questions that weren’t fully addressed in a short visit to the doctor,” Ramaswamy said.
“That’s a very different use case than deciding whether you need emergency care. Treat them as a complement to your doctor, not a replacement.”
“This study does not mean that we abandon AI in healthcare.”
Castro agreed that the benefits of AI health tools must be weighed against the risks.
“AI health tools can increase access, reduce unnecessary visits and provide information to patients,” he said. “They are not inherently unsafe, but they are still no substitute for clinical judgment.”
“This study does not mean that we abandon AI in healthcare,” he continued. “It means we mature it. Independent testing and stronger guardrails will determine whether AI becomes a safety net or a liability.”
Melissa Rudy is a senior health editor and member of the lifestyle team at News Digital. Story tips can be sent to melissa.rudy@News.com.


