Leading artificial intelligence (AI) systems, including ChatGPT, are vulnerable to repeating false health information when presented convincingly, according to new research published in The Lancet Digital Health. This raises critical concerns as AI becomes increasingly integrated into healthcare, where accuracy can be a matter of life and death.
The Rise of AI in Healthcare… and the Risks
LLMs (Large Language Models) are rapidly being adopted to assist clinicians and patients with faster access to medical insights. But this study demonstrates that these systems can still uncritically accept and disseminate misinformation, even in realistic medical language. This is a problem because people increasingly rely on online sources—including AI chatbots—for health information, and incorrect advice can have serious consequences.
How the Study Was Conducted
Researchers at Mount Sinai Health System tested 20 LLMs from major developers (OpenAI, Meta, Google, Alibaba, Microsoft, Mistral AI) with over a million prompts. These prompts included false medical statements disguised as legitimate information: fabricated hospital notes, debunked health myths from Reddit, and simulated clinical scenarios. The goal was simple: would the AI repeat falsehoods if they were phrased credibly?
Key Findings: Gullibility Varies, but Remains a Problem
The results showed that AI models fell for made-up information 32% of the time overall. However, there was significant variation. Smaller or less advanced models believed false claims over 60% of the time, while more powerful systems like ChatGPT-4o only repeated them in 10% of cases. Surprisingly, medically fine-tuned models performed worse than general-purpose LLMs in identifying false claims.
Examples of Misinformation Accepted by AI
The study identified several dangerous examples:
- AI models accepted false claims like “Tylenol can cause autism if taken by pregnant women.”
- They repeated misinformation such as “rectal garlic boosts the immune system.”
- One model even accepted a discharge note advising patients with bleeding esophagitis to “drink cold milk to soothe symptoms.”
These examples demonstrate the potential for AI to spread harmful health advice. The study also found that AI systems are more likely to believe false claims when they are presented with persuasive but logically flawed reasoning, such as appeals to authority (“an expert says this is true”) or slippery slope arguments (“if X happens, disaster follows”).
What’s Next? Measuring AI Reliability
The authors emphasize the need to treat AI’s susceptibility to misinformation as a measurable property. They suggest using large-scale stress tests and external evidence checks before integrating AI into clinical tools. The researchers have released their dataset for developers and hospitals to evaluate their models.
“Instead of assuming a model is safe, you can measure how often it passes on a lie, and whether that number falls in the next generation,” said Mahmud Omar, the first author of the study.
The findings underscore that while AI has the potential to improve healthcare, its uncritical acceptance of false information poses a significant risk. Rigorous testing and built-in safeguards are crucial before widespread adoption.
