Generative LLMs and medical triage decision making

Another paper (Ramaswamy et al. 2026) shows that generative large language models cannot be reliably used as tools for medical decision making, in line with the theoretical arguments in my paper, linked below. In my paper, though, I argue that this unreliability is not just a case-by-case finding but a fundamental structural characteristic of generative LLMs, and further that it is closely related to the problems that evidence-based medicine solves. For this reason, I urge medical researchers not to be distracted by generative models and to continue to focus on clinical trials, prediction modeling, etc.

References

Weisenthal, Samuel J. “Treatment, evidence, imitation, and chat.” arXiv preprint arXiv:2506.23040 (2025). https://arxiv.org/html/2506.23040v2

Ramaswamy, A., Tyagi, A., Hugo, H. et al. “ChatGPT Health performance in a structured test of triage recommendations.” Nat Med (2026). https://doi.org/10.1038/s41591-026-04297-7
