Health | February 26, 2025

Responsible AI in healthcare: 4 pillars of trust in clinical GenAI solutions

Before choosing a clinical AI solution, make sure it is built on a framework of rigorous review and continuous advancement.

Artificial intelligence (AI) has the potential to revolutionize the healthcare industry through solutions that transform how providers and patients interact with information. Yet when it comes to clinical information and interpreting medical data that may have a direct impact on patient care, the industry must weigh AI’s potential against its inherent risks.

To advance large language model (LLM)-powered clinical generative AI (GenAI) and deliver on its promise of interpreting vast amounts of medical data and surfacing insights quickly, both solution developers and their healthcare provider partners must emphasize accountability in AI-driven healthcare solutions. Trust can only be built on a strong foundation of rigor and responsibility.

The stakes are high for responsible AI in healthcare

Clinical AI has been known to misread queries or make up responses, and often reminds users in disclaimers that it’s still learning. “It’s going to take a while for these models to become more reliable, but I’m a firm believer in the art of innovation and technology scaling to overcome such hurdles,” says Manish Vazirani, Vice President of Clinical Effectiveness Product Software Engineering for Wolters Kluwer Health.

When investigating clinical LLM tools for their organizations, Vazirani advises healthcare leaders to look for AI-driven clinical decision support solutions rooted in the same rigorous standards as traditional expert-curated sources. Through robust clinical review and evidence-based content, traditional clinical decision support resources like UpToDate® develop reliable medical information. Clinical LLM-powered GenAI must strive to meet an equal standard, Vazirani explains, not by replacing traditional expertise but by complementing it with the speed and scalability AI offers, using a model trained on that same carefully curated expert content to hone relevant responses.

When clinical LLM-powered GenAI operates without proper oversight, biases and incomplete data can influence its responses, blurring the distinction between trustworthy and erroneous information. For this reason, Vazirani emphasizes the importance of developers implementing ongoing internal reviews, aided or run by the right subject matter experts, to confirm that the expert content GenAI draws on remains unaltered in its responses.

A responsible AI solution presents its own “unique dilemma” to engineers and clinical experts, Vazirani explains. “We’re trying to focus on responsible development over speed to get the balance of what ‘good’ looks like. And we have to consider additional prompts that take into account ethics and fairness.”

The four pillars of trust for clinical GenAI

Responsible AI solutions must be built on a framework of rigor and continuous improvement. Vazirani recommends reviewing clinical GenAI tools against a checklist of four pillars of trust:

1. Rigorous clinical review

Clinical GenAI’s foundation must be a robust system of clinical review that includes subject matter experts who help the AI generate contextually appropriate responses:

  • Rigorous clinical review of curated training questions: For example, at UpToDate, Vazirani’s team used 4,000 curated “golden questions” spanning 25 specialties to form the backbone of training models. This, he says, helps keep outputs tailored to specific patient demographics, clinical relevance, medical conditions, and care needs.
  • Identify unreliable responses: Just as important as identifying reliable results is identifying, tracking, and improving areas where the model retrieves low-relevance data. When a clinical LLM consistently learns from its mistakes, Vazirani says, it builds trust through refinement (a minimal sketch of such an evaluation pass follows this list).
  • Review content for “explainability” and interpretability.
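The article does not describe the mechanics of this review, but as a rough illustration, a harness like the following could score generated answers against a curated golden set and surface weak specialties for expert follow-up. Everything here is hypothetical: the GoldenQuestion fields, the threshold, and the deliberately crude token-overlap stand-in for real clinician grading.

```python
from dataclasses import dataclass

@dataclass
class GoldenQuestion:
    """One curated evaluation item; the article cites ~4,000 such
    questions spanning 25 specialties. These fields are assumptions."""
    specialty: str
    question: str
    reference_answer: str  # expert-written answer used as ground truth

def token_overlap(answer: str, reference: str) -> float:
    """Crude relevance proxy: fraction of reference tokens found in the
    answer. A production pipeline would use clinician grading instead."""
    ref = set(reference.lower().split())
    ans = set(answer.lower().split())
    return len(ref & ans) / len(ref) if ref else 0.0

def find_weak_areas(answer_fn, questions, threshold=0.6):
    """Run the model over the golden set and group low-relevance answers
    by specialty so they can be routed back to subject matter experts."""
    weak = {}
    for q in questions:
        if token_overlap(answer_fn(q.question), q.reference_answer) < threshold:
            weak.setdefault(q.specialty, []).append(q.question)
    return weak
```

In a setup like this, the specialties that accumulate low-scoring answers become exactly the tracked-and-improved areas the second bullet describes.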

2. Real-world use applications

Responsible clinical GenAI platforms should include additional layers of review for reliability and appropriateness:

  • Rigorous clinical review of AI-generated answers to evaluate reliability.
  • Prompts and algorithms behind the scenes should be designed to anticipate ethical concerns, guiding AI answers responsibly. For example, some systems ensure that queries involving privacy violations or harmful questions, such as methods of overdose, are automatically blocked.
  • To back this process, tools should undergo constant testing, such as calculating F1 scores to evaluate how well they balance precision and recall in information delivery (a sketch of both mechanisms follows this list).
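Neither mechanism is specified in detail here, so the following is only a minimal sketch under assumed inputs: a keyword guardrail that refuses harmful queries before they reach the model (the BLOCKED_TOPICS list is purely illustrative), and the standard F1 computation over a labeled test set.

```python
# Illustrative blocklist only; a real system would use far more
# sophisticated classifiers and clinical safety policies.
BLOCKED_TOPICS = ("overdose", "lethal dose", "self-harm")

def guardrail(query: str):
    """Return a refusal message for blocked topics; None means safe to answer."""
    lowered = query.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return "This query cannot be answered. Please consult a clinician."
    return None

def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision (tp / (tp + fp)) and
    recall (tp / (tp + fn)) over a labeled set of answers."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Example: 90 answers correctly judged relevant, 10 false positives,
# 15 relevant items missed -> F1 of roughly 0.878
print(f1_score(tp=90, fp=10, fn=15))
```

The point of tracking F1 rather than accuracy alone is the trade-off the bullet alludes to: a system tuned only for precision withholds useful answers, while one tuned only for recall surfaces irrelevant ones.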

3. Curated, evidence-based source content

While some advancements in clinical GenAI may lead to ground-breaking applications, ungoverned content is likely to result in inconsistencies.

The reliability of an LLM’s outputs begins with the quality of its source materials. Unlike open-source AI systems that pull in uncontrolled, unverified data, responsible platforms use evidence-based source content validated by thousands of clinical experts. Content should originate from trusted medical literature and align with established guidelines.

Vazirani recommends choosing solution vendors who adhere to the following responsible standards:

  • Collecting user feedback from both internal and external stakeholders on generated answers.
  • Producing defined metrics on reliability for generated answers.
  • Applying rigorous clinical review.

4. Continuous model improvements

No AI system is static, Vazirani says. Clinical LLM-powered GenAI platforms should leverage both internal clinical reviews and feedback loops from early adopters to continue developing and innovating. Gathering data from real-world use helps fine-tune prompts and algorithms, supporting responsible clinical AI tools in adapting to changing needs and expectations (a minimal sketch of one such feedback loop follows the list below):

  • Continuous fine-tuning of the LLM and prompts to help improve reliability.
  • A learning-model setup with early adopters.
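The article does not say how such a feedback loop is implemented. As a minimal sketch under assumed data (the FeedbackRecord shape and both thresholds are hypothetical), aggregated clinician ratings could flag which prompt templates need another round of review and fine-tuning:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class FeedbackRecord:
    """Hypothetical record of one early-adopter rating of a generated answer."""
    prompt_id: str   # which prompt template produced the answer
    helpful: bool    # clinician's thumbs-up / thumbs-down
    comment: str = ""

def prompts_needing_revision(records, min_votes=20, max_unhelpful_rate=0.15):
    """Surface prompt templates whose unhelpful rate exceeds tolerance,
    so they can be re-reviewed and refined before the next release."""
    totals, unhelpful = Counter(), Counter()
    for r in records:
        totals[r.prompt_id] += 1
        if not r.helpful:
            unhelpful[r.prompt_id] += 1
    return [pid for pid, n in totals.items()
            if n >= min_votes and unhelpful[pid] / n > max_unhelpful_rate]
```

Requiring a minimum number of votes before flagging a template keeps a single outlier rating from triggering unnecessary rework.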

How can clinical GenAI address challenges and amplify value?

“I think there’s a lot of potential for AI to simplify the work for medical professionals and to unlock new value in a responsible way,” says Vazirani.

With a responsible clinical LLM GenAI platform, he sees opportunities to address several universal challenges facing healthcare providers:

  • Healthcare burnout: AI tools have the potential to help alleviate healthcare worker burnout by creating operational and clinical efficiencies and accelerating access to trusted clinical care intelligence.
  • Reducing complexity: By surfacing information quickly and helping simplify complex decision-making, clinical AI tools have the potential to optimize quality outcomes and patient care and to proactively manage community health needs.
  • Integration: By removing workflow barriers, AI tools integrated into clinical workflow solutions can provide real-time insights at the point of care and streamline clinical decision-making.
  • Cost concerns: By connecting care teams through trusted, standardized content and establishing consistency, AI tools have the potential to reduce costly variation in care and enhance patient outcomes.

Long-term commitment to responsible innovation and AI partnership

Responsible AI in healthcare requires an ongoing commitment to quality and review from both developers and users of solutions. Clinical GenAI solutions are only as strong as the safeguards underpinning them. By focusing on the four pillars — rigorous clinical review, real-world application, curated evidence-based sources, and continuous improvements — AI solutions can build trust among healthcare providers and patients.

Learn more about responsible clinical AI solutions by UpToDate