Tech and health

AI enters the ward: comparing ChatGPT, Claude and Copilot in healthcare

From summarising medical records to transcribing visits, artificial intelligence promises to ease the workload of healthcare professionals

by Francesco Branda*

For centuries, medicine was a slow discipline, built on the unhurried rhythm of clinical observation. The physician listened, noted, reflected. The ward notebooks, often full of abbreviations and notes in the margins, told not only the patient's story, but also the clinician's diagnostic reasoning. A diagnosis was the result of a sequence of intuitions, hypotheses and verifications, built up from experience and memory.

With the arrival of computers in hospitals in the 1980s and 1990s, this process began to change. Electronic medical records gradually replaced paper records, making it easier to store data, retrieve information and share documents between different departments. However, the essence of clinical work remained the same: the doctor continued to interpret data, while the technology mainly performed an administrative and organisational support function.

Over the past two decades, the digitisation of healthcare has further accelerated this process. The adoption of hospital information systems, clinical databases and data analysis platforms has made it possible to aggregate huge amounts of health information. Laboratory data, diagnostic images, vital parameters monitored in real time have started to flow into increasingly complex digital ecosystems. Medicine has become progressively more quantitative, more data-driven.

Today we are entering an even more radical phase. Generative artificial intelligence no longer merely stores or analyses information: it directly enters the clinical decision-making flow. Advanced systems are capable of synthesising medical records, suggesting diagnostic hypotheses, supporting communication with patients and assisting healthcare personnel in their daily activities. They are no longer just management software, but real cognitive interlocutors.

In this scenario, three platforms are emerging that represent different approaches to AI-assisted healthcare: ChatGPT Health, Claude for Healthcare and Microsoft Copilot Health. Three tools born in different technological ecosystems, but sharing a common goal: to reduce the cognitive and administrative burden of healthcare professionals by transforming AI into a widespread clinical assistant.

ChatGPT Health: the generalist entering the clinic

ChatGPT Health is the healthcare adaptation of the large language model developed by OpenAI. It allows users to link electronic health records, wearable data and wellness apps in order to receive targeted answers to health questions, summaries of personal data and suggestions contextualised to their condition. The stated aim is to support physicians and patients in understanding health information, not to replace diagnosis or professional clinical advice.

In terms of language processing power, ChatGPT Health helps summarise extensive medical records, clean up narrative notes, organise lab reports and present data in a form that is readable or suitable for sharing with patients. In areas such as radiology or pathology, it can also assist in the initial interpretation of images or in generating differential diagnoses and treatment plans, helping to improve operational efficiency.
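
To make the pattern concrete, here is a minimal sketch of this kind of record summarisation using the general-purpose OpenAI Python SDK. ChatGPT Health itself is a product rather than a public API, so the model name, the prompt and the de-identified note below are illustrative assumptions, not the platform's actual pipeline.

# Minimal sketch: summarising a clinical note in plain language.
# Assumptions: OpenAI Python SDK, an assumed model name, and a
# hypothetical, fully de-identified note. Not ChatGPT Health's real API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical, de-identified narrative note full of abbreviations.
note = (
    "Pt presents w/ 3d hx of productive cough, T 38.4C, RR 22, SpO2 94% RA. "
    "CXR: RLL consolidation. Started on amoxicillin-clavulanate."
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model; any capable chat model would do
    messages=[
        {
            "role": "system",
            "content": (
                "You are a clinical documentation assistant. Summarise the "
                "note in plain language for the patient, expanding all "
                "abbreviations. Do not add diagnoses or advice."
            ),
        },
        {"role": "user", "content": note},
    ],
)

print(response.choices[0].message.content)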

However, the recent scientific literature points to significant accuracy and safety issues. Systematic and umbrella reviews show that these models do not always achieve reliable diagnostic performance: the average accuracy of ChatGPT's medical responses in clinical studies is around 50-60%, with wide variability across studies and methodologies.

More alarming is the evidence that these AIs can accept misinformation presented in a convincing manner, responding with confidence even to dangerous health advice if it is couched in technical clinical language. This behaviour has been documented in studies showing that the models fail to discern between plausible and dangerous statements when the latter are 'grammatically correct'.

A study published in Nature Medicine also showed that ChatGPT Health underestimates or ignores signs of clinical severity in more than half of simulated emergency cases, for example failing to recognise respiratory failure or suicidal ideation and failing to issue appropriate warnings.

This evidence aligns with critical reviews on the potential of language models to generate unsafe, biased or even harmful responses. The literature warns against risks associated with hallucination, i.e. the generation of completely fabricated information, as well as bias in training data that may reproduce gender, ethnic or cultural disparities in the healthcare context.

On the ethical and regulatory front, despite the introduction of additional safeguards, such as separate conversation spaces and advanced encryption, ChatGPT Health is still not regulated as a medical device in many jurisdictions and, in the US, is not HIPAA-compliant when offered purely as a consumer product.

In summary: ChatGPT Health offers real potential for informational and organisational support in clinical practice, but the possibility of erroneous, unpredictable or inadequately cited output makes it unsuitable as a stand-alone clinical decision-making tool without medical oversight, robust governance and systematic external controls.

Claude for Healthcare: algorithmic prudence and transparency

Claude for Healthcare, built on Anthropic's models, adopts a different paradigm from generalist implementations such as ChatGPT Health. The design philosophy emphasises safety, controllability and transparency of results: the model is designed to express uncertainty when appropriate, avoid unsupported assertions, and refer to academic sources or clinical guidelines where possible.

Although there are no published studies specifically on Claude for Healthcare as a commercial platform (most analyses concern comparable general models), academic research compares the performance of various LLMs on structured clinical medical questions. These studies tend to confirm that models with algorithmic caution settings produce fewer clearly unsafe answers and perform better on qualitative metrics related to clarity and contextual correctness.

A comparative study on autoimmune diseases showed that Claude (in its Sonnet/3.5 versions) can generate more structured and accurate responses on clinical datasets, even outperforming junior and senior doctors in specific domains of clinical knowledge.

This increased focus on safety comes at an operational cost: Claude's responses tend to be more cautious, slower and less immediate, favouring the exposition of limits and uncertainties. In high-pressure contexts, such as emergency rooms or intensive care, this caution may be perceived as a lack of speed compared with more assertive systems. However, it may be advantageous where it is crucial to keep critical decisions in the hands of the human clinician.

From an academic point of view, Claude's approach is often cited as an example of how AI can make uncertainty explicit and contextualise answers, reducing the risk of harm due to overconfidence in algorithms. This is consistent with the general recommendations of the scientific literature, which emphasises the importance of explainability methods and ethical frameworks for health AI models, such as the AIMES checklist proposed by the WHO to increase transparency and reliability.

Moreover, the algorithmic prudence implemented by Claude reflects a key principle that has emerged in critical reviews on the ethics of LLMs in medicine: AIs must make their limits, sources and confidence levels explicit in order to be reliable tools rather than mere generators of plausible text.

In summary: Claude for Healthcare represents a more conservative and clinically readable model, with more transparent and less risky responses than more assertive systems. However, as with all LLMs, human clinical supervision and independent validation remain necessary before direct decision-making use.

Copilot Health: AI integrated in the hospital system

Microsoft Copilot Health takes a different route from pure conversational models: here, artificial intelligence is deeply integrated into existing digital clinical systems, electronic medical records, collaboration tools and organisational flows, working 'behind the scenes'.

Rather than being queried via prompts like a chatbot, Copilot Health observes, transcribes and synthesises: it is used to generate structured clinical notes in real time, transcribe doctor-patient conversations, organise appointments and provide interpretative information within the healthcare workflow without the need for ad hoc consultation.
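
This "ambient documentation" pattern can be sketched in a few lines: transcribe a consented visit recording, then ask a model to draft a structured note for the clinician to review. The sketch below uses public OpenAI endpoints purely as an illustration; Copilot Health's actual pipeline is not documented at this level, and the file name, model names and SOAP format are assumptions.

# Rough sketch of ambient clinical documentation, not Copilot Health's
# actual pipeline. Assumptions: a consented, de-identified recording on
# disk, OpenAI's public transcription and chat endpoints, assumed models.
from openai import OpenAI

client = OpenAI()

# Step 1: speech-to-text on the recorded consultation.
with open("visit_recording.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio
    )

# Step 2: turn the raw transcript into a draft structured note.
draft = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": (
                "Draft a SOAP note (Subjective, Objective, Assessment, "
                "Plan) from this doctor-patient conversation. Mark anything "
                "uncertain with [VERIFY] so the clinician can check it."
            ),
        },
        {"role": "user", "content": transcript.text},
    ],
)

print(draft.choices[0].message.content)  # always reviewed before signing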

This model focuses on documentation automation and reducing bureaucratic effort - a crucial aspect, since studies on AI in healthcare show that many clinical resources are absorbed by administrative tasks rather than direct patient care. The literature clearly indicates that AI tools that optimise data management and documentation can improve efficiency and free up significant clinical time.

A documented example in this respect is the use of retrieval-augmented generation (RAG) to link language models to clinical knowledge bases such as UpToDate or coded diagnostic systems, an approach that has been shown to reduce hallucination errors by up to 40% compared with unaugmented models.
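
A toy version of this retrieval-augmented loop illustrates the idea: retrieve the most relevant snippet from a vetted knowledge base, then constrain the model to answer only from that context. The guideline snippets below are invented placeholders (real deployments index licensed sources such as UpToDate), and the model names are assumptions.

# Toy retrieval-augmented generation (RAG) loop. The "knowledge base" is
# three invented placeholder snippets; model names are assumptions.
import numpy as np
from openai import OpenAI

client = OpenAI()

# Hypothetical knowledge base of vetted guideline snippets.
snippets = [
    "Community-acquired pneumonia: first-line empirical therapy is ...",
    "Asthma exacerbation: assess severity with peak flow and SpO2 ...",
    "Type 2 diabetes: first-line pharmacotherapy is metformin unless ...",
]

def embed(texts):
    """Return one embedding vector per input text."""
    result = client.embeddings.create(
        model="text-embedding-3-small", input=texts
    )
    return np.array([item.embedding for item in result.data])

snippet_vectors = embed(snippets)

question = "What is the first-line empirical therapy for CAP?"
q_vec = embed([question])[0]

# Cosine similarity, then pick the best-matching snippet as context.
scores = snippet_vectors @ q_vec / (
    np.linalg.norm(snippet_vectors, axis=1) * np.linalg.norm(q_vec)
)
context = snippets[int(np.argmax(scores))]

# Constrain the answer to the retrieved context to limit hallucination.
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "Answer ONLY from the provided context; if the "
                       "context is insufficient, say so.",
        },
        {"role": "user", "content": f"Context: {context}\n\nQ: {question}"},
    ],
)
print(answer.choices[0].message.content)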

However, recent scientific literature warns of critical issues related to the privacy and security of health data: integrated tools that access medical records face stringent (and in many cases not yet globally harmonised) regulations and require very robust governance controls to avoid leaks and breaches of sensitive health data.

Moreover, despite deep integration, Copilot Health does not eliminate the risks of clinical errors: the same problems of accuracy in interpreting data and generating reliable outputs remain open, and the literature suggests that human supervision must remain the mainstay of clinical decision-making.

In summary: Copilot Health promises powerful document automation and integration into healthcare processes, but its actual clinical effectiveness and compliance with safety standards require independent, in-depth evaluation and very robust data-governance structures.

Between promise and responsibility

Watching artificial intelligence enter hospital corridors and clinical laboratories is fascinating and, at the same time, unsettling. Systems such as ChatGPT Health, Claude for Healthcare and Microsoft Copilot Health offer extraordinary prospects: they accelerate data synthesis, lighten documentation and help filter the vastness of the scientific literature. They can make accessible a quality of analysis and clinical support that only decades ago would have been reserved for highly specialised teams.

Yet, as the scientific literature shows, health AI is not neutral. Every algorithm brings with it epistemic limits, implicit biases and risks of cognitive delegation. The fluidity with which they generate text and syntheses risks hiding errors, transforming intuitions into automatisms and attenuating the critical reasoning of the physician or researcher. In other words, AI may make the process faster and more linear, but not more correct per se: the ultimate responsibility always remains human.

The real challenge of the coming years will be more cultural than technical. It is not just a matter of implementing advanced software, but of redefining the relationship between experience, judgement and automation. AI can become a powerful ally, but only if the practitioner maintains an awareness of limits and retains space for doubt, for controlled error, for critical exploration.

In the ward, as in the laboratory, the promise of AI is not the replacement of thought, but its amplification: freeing cognitive energies from the burden of bureaucracy and the mass of information, to focus on the questions that really matter. But this freedom is fragile. Without attention, it can turn into an illusion of security, a flattening of scientific and clinical curiosity.

The warning is clear: do not fear AI, but do not be seduced by its apparent perfection. Science, and medicine, thrive in the spaces of uncertainty, in the deviations, in the errors that force reasoning. It is there, in those moments of friction between intuition and form, that the insights capable of truly changing clinical practice and research are born.

Healthcare AI is an extraordinary tool, but the future of medicine will be decided by the awareness and responsibility of the professionals who lead it, not by the brilliance of the algorithms that flank them.

*Unit of Medical Statistics and Molecular Epidemiology, University Campus Bio-Medico of Rome

Copyright reserved ©