A physician at Mayo Clinic’s neurology department makes a phone call to an insurance company to verify a patient’s benefits. Or rather, she used to.
In 2026, a voice-based AI agent makes that call autonomously. It navigates the phone tree. It waits on hold. It extracts the benefits information, cross-references it against the patient’s care plan, and updates the EHR. The agent completes 87–90% of these calls without human intervention, across Mayo’s neurology and pediatrics departments.
This is not a demo or a research project. It is production infrastructure at one of the most clinically rigorous health systems in the United States.
The era of healthcare AI pilots is ending, not by proclamation but by budget allocation. Deloitte’s 2026 US Health Care Outlook Survey found more than 80% of healthcare executives expecting agentic AI to deliver moderate-to-significant value across clinical, business, and back-office functions this year.
Rock Health reports that AI-enabled companies captured 62% of all US digital health funding in H1 2025, deal sizes running 83% larger than non-AI peers. Mayo Clinic is mapping out more than $1 billion in AI investments across 200+ projects.
The question has shifted from “should we deploy healthcare AI” to “how do we scale what’s already in production.” That is a different engineering challenge than building the first pilot. This guide is about the second phase.
I’m Mayank Pratap, co-founder of EngineerBabu, a CMMI Level 5, Google AI Accelerator team that has shipped AI clinical documentation, prior authorization automation, and clinical decision support systems for healthcare clients including Apollo Hospitals and ResMed.
We build agentic healthcare systems in production. This is what that actually requires.
What Is Agentic AI in Healthcare?
Agentic AI in healthcare refers to autonomous AI systems that execute multi-step clinical and administrative workflows end-to-end, perceiving data from multiple systems, applying clinical or policy rules, making decisions, taking actions, and escalating to humans only when genuinely required.
Unlike traditional AI that performs a single task (generating a clinical note, predicting a risk score), healthcare AI agents complete entire processes: a prior authorization agent that reads the clinical record, identifies the payer criteria, fills the submission form, submits to the payer, tracks the status, and generates the appeal if denied, without human keystrokes at each step.

Why Healthcare Is One of the Best Environments for Agents
The counterintuitive insight: healthcare administration is one of the highest-value environments for agentic AI precisely because it is heavily regulated and policy-driven.
Regulatory environments produce documented, traceable processes. When an agent navigates a prior authorization workflow, every decision it makes traces back to a specific payer policy clause, a specific clinical documentation element, a specific CPT code rule.
This is not the messy ambiguity of an open-ended chatbot. It is a complex but ultimately rule-followable set of deterministic decisions.
As Sairohith Thummarakoti presented at HIMSS 2026: “Agentic AI is especially ready for the administrative side of healthcare, claims, prior authorization, utilization management, because these workflows are policy-driven, heavily documented, and already tracked in systems.”
The contrast to ambiguous clinical judgment cases is important. The most successful healthcare agent deployments in 2026 share a profile:
- Structured inputs: Patient demographics, insurance data, CPT codes, clinical notes, ICD-10 diagnoses, all already in structured systems
- Verifiable rules: Payer coverage policies, clinical criteria, billing regulations, all documented and in principle codifiable
- Clear success criteria: Authorization approved/denied/pended, claim submitted/rejected/paid, appointment scheduled/confirmed/rescheduled
- Measurable volume: Prior auth teams processing 40+ requests per physician per week; eligibility verification teams running thousands of checks per day
These are exactly the conditions under which agents outperform both pure rule-based automation (too rigid for policy variation) and human teams (too slow and expensive for volume).
The Deployment Reality: What’s Actually in Production
Mayo Clinic: Prior Authorization Voice Agents
VoiceCare AI, piloting at Mayo’s neurology and pediatrics departments, uses multi-modal agentic architecture for benefit verification, prior authorization, and prescription support phone calls.
The platform achieves 87–90% autonomous call completion: the agent handles the entire interaction, from navigating IVR menus through extracting authorization decisions, without a human representative initiating or monitoring the call.
This is the prior authorization phone call that the AMA’s 2026 survey found consuming 13 hours of physician and staff time per week. On the payer side, one request is denied every 1.2 seconds. On the provider side, agents are now completing 87–90% of the calls needed to work those denials.
Mayo Clinic: Eligibility and Claims Orchestration
Beyond voice, Mayo deploys agentic platforms that autonomously handle eligibility and benefit verification, claims processing, and prescription support across administrative and clinical support operations.
Deloitte 2026 Survey: The Enterprise Scaling Threshold
Deloitte’s 2026 US Health Care Outlook Survey identified the moment the market is crossing: organizations that spent 2024–2025 in pilots are now facing a binary decision, continue treating agentic AI as experimental, or invest in enterprise scaling that reshapes workflows across care and operations.
More than 80% of healthcare executives surveyed expect agentic AI to deliver moderate-to-significant value. The obstacle is shifting from technology uncertainty to organizational change management.
Rock Health Funding Signal
AI-enabled digital health companies captured 62% of all US digital health funding in H1 2025, up from 37% in 2024. Deal sizes for AI-focused startups ran 83% larger than non-AI peers.
The biggest funding rounds clustered around documentation platforms, clinical workflow orchestration, and data infrastructure. This is institutional capital allocating to where agentic AI infrastructure lives.
The Three-Phase Deployment Framework

The most important strategic insight from studying successful healthcare agent deployments is that they follow a consistent phased architecture. Organizations that tried to start with clinical workflow orchestration (phase three) failed. Organizations that built up from administrative agents succeeded.
Phase 1: Single Administrative Agent, Bounded Scope
What it is: One agent, one clearly scoped administrative task, clearly defined inputs and outputs, explicit human escalation protocol, measurable success criteria established before deployment.
Examples in production: Prior authorization submission agent (reads EHR, maps to payer criteria, submits, tracks status). Eligibility verification agent (checks coverage against payer database, updates patient record). Appointment reminder and scheduling confirmation agent (confirms upcoming appointments, reschedules cancellations, updates EHR).
Success criteria: Automation rate (% of tasks completed without human intervention), accuracy rate (% of outcomes matching human-reviewed ground truth), escalation rate (% of tasks appropriately escalated).
What this phase builds: The data infrastructure, governance frameworks, and organizational trust that phases two and three depend on. More important than the automation itself is learning the failure modes: which task types the agent handles poorly, which payer configurations create edge cases, and whether the escalation triggers are calibrated correctly.
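The three Phase 1 success criteria can be computed from a log of human-reviewed task outcomes. The sketch below is one possible reading, not a standard definition: the `TaskOutcome` schema and its field names are hypothetical, accuracy is measured only over tasks the agent completed on its own, and the escalation rate counts escalations a reviewer judged appropriate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskOutcome:
    """One agent task after human review (hypothetical schema)."""
    escalated: bool                       # agent handed the task to a human
    matches_ground_truth: bool            # agent outcome agreed with the reviewed result
    escalation_appropriate: bool = False  # reviewer agreed escalation was warranted


def phase1_metrics(outcomes: list[TaskOutcome]) -> dict[str, float]:
    """Compute automation, accuracy, and escalation rates as fractions of 1."""
    total = len(outcomes)
    if total == 0:
        raise ValueError("at least one task outcome is required")
    autonomous = [o for o in outcomes if not o.escalated]
    correct = sum(1 for o in autonomous if o.matches_ground_truth)
    appropriate = sum(
        1 for o in outcomes if o.escalated and o.escalation_appropriate
    )
    return {
        # share of tasks completed without human intervention
        "automation_rate": len(autonomous) / total,
        # share of autonomous outcomes matching reviewed ground truth
        "accuracy_rate": correct / len(autonomous) if autonomous else 0.0,
        # share of all tasks that were appropriately escalated
        "escalation_rate": appropriate / total,
    }


# Illustrative data only: 8 autonomous tasks (7 correct), 2 escalations (1 appropriate)
sample = (
    [TaskOutcome(False, True)] * 7
    + [TaskOutcome(False, False)]
    + [TaskOutcome(True, False, True), TaskOutcome(True, False, False)]
)
metrics = phase1_metrics(sample)
print(metrics)
```

Tracking these per payer and per task type, rather than only in aggregate, is what surfaces the failure modes this phase is meant to expose.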
Phase 2: Adjacent Administrative Workflows, Integration Expansion
What it is: The phase 1 agent’s success creates organizational confidence. Add adjacent workflows: the prior auth agent gains appeal generation capability. The eligibility agent gains explanation capability. The scheduling agent gains referral coordination capability.
Each expansion adds integration touchpoints and builds the organization’s change management muscle: the clinical staff’s experience of working alongside agents, reporting issues, and trusting escalation protocols.
The integration depth requirement: Phase 2 agents need access to more systems than phase 1. A prior auth agent that can also handle the appeal needs access to the EHR (clinical documentation), the payer portal (submission and status), and potentially a clinical policy database (payer coverage criteria for appeal argument construction). In practice that means FHIR R4 integration for clinical data, payer API integration for status, and an LLM for appeal drafting, all secured under HIPAA with appropriate BAAs.
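As a rough illustration of the FHIR R4 side of that integration, the sketch below builds the search URLs an agent might request for one patient. The base URL, patient ID, and `_count` value are placeholders. The code only constructs URLs: a real integration would also need OAuth scopes, paging, and error handling, none of which are shown.

```python
from urllib.parse import urlencode

# Hypothetical FHIR R4 endpoint; a real deployment would use the EHR's base URL
FHIR_BASE_URL = "https://fhir.example.org/r4"


def build_fhir_searches(
    base_url: str, patient_id: str, resource_types: list[str]
) -> dict[str, str]:
    """Return one FHIR search URL per resource type, filtered by patient."""
    searches = {}
    for resource_type in resource_types:
        # "patient" is a standard search parameter on these resource types;
        # "_count" caps the page size returned by the server
        query = urlencode({"patient": patient_id, "_count": 50})
        searches[resource_type] = f"{base_url.rstrip('/')}/{resource_type}?{query}"
    return searches


searches = build_fhir_searches(
    FHIR_BASE_URL,
    "pat-123",
    ["Condition", "MedicationRequest", "Observation", "Procedure"],
)
for url in searches.values():
    print(url)
```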
Phase 3: Clinical Workflow Orchestration
What it is: Multi-agent coordination across care settings. Agents handling referrals end-to-end (intake request → insurance verification → prior auth → specialist scheduling → EHR notification). Post-discharge follow-up coordination (discharge summary generation → patient contact → medication adherence check → readmission risk alert if threshold crossed). Chronic disease management orchestration (RPM data review → alert triage → care coordinator notification → care plan update → billing code generation).
Why most organizations aren’t here yet: Phase 3 requires the trust infrastructure built in phases 1 and 2. Clinical workflow agents have patient safety implications. Phase 3 agents must have demonstrated performance stability across phases 1 and 2, clear human oversight protocols, and organizational confidence in escalation behavior before being deployed in clinical pathways.
The Technical Architecture: How to Build a Healthcare AI Agent
This is the architecture the EngineerBabu team deploys for production healthcare agentic systems.
The Agent Orchestration Layer: LangGraph
LangGraph is the production-grade orchestration framework for stateful, multi-step healthcare workflows. It uses a graph-based architecture where:
- Nodes are individual agent actions (read FHIR data, apply payer policy rule, call payer API, generate appeal letter, update EHR)
- Edges are the transitions between actions, including conditional branching (if authorization approved → update EHR; if denied → trigger appeal agent)
- State is the shared context maintained across the full workflow: patient ID, clinical documentation, payer responses, current workflow status
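The critical property for healthcare: state persistence. A prior authorization workflow may span hours or days. The agent must maintain context across asynchronous payer response times, human review queues, and system availability windows. LangGraph supports this through checkpointers: a graph compiled with a durable checkpointer saves its state after each step, so a run can be paused and resumed later. Without a checkpointer, state lives only for the duration of a single invocation.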
import operator
from datetime import datetime, timezone
from typing import Annotated, Optional, TypedDict

from langchain_openai import AzureChatOpenAI  # HIPAA BAA covered
from langgraph.graph import StateGraph, END

# The following helpers are assumed to be defined elsewhere in the codebase:
# fetch_fhir_patient_data, payer_policy_db, payer_api, parse_json_response,
# update_ehr_with_approval, create_human_review_task, poll_payer_status


def now() -> str:
    """Return a UTC ISO 8601 timestamp for audit entries."""
    return datetime.now(timezone.utc).isoformat()


# Define the workflow state
class PriorAuthState(TypedDict):
    patient_id: str
    fhir_clinical_data: dict
    payer_id: str
    service_codes: list[str]
    coverage_requirements: Optional[dict]
    documentation_package: Optional[dict]
    submission_response: Optional[dict]
    authorization_status: Optional[str]
    appeal_letter: Optional[str]
    escalation_required: bool
    audit_trail: Annotated[list, operator.add]  # append-only audit


# Initialize LLM (Azure OpenAI, HIPAA BAA covered)
llm = AzureChatOpenAI(
    azure_deployment="gpt-4o",
    azure_endpoint="https://your-endpoint.openai.azure.com/",
    api_version="2024-12-01-preview",
)


def fetch_clinical_documentation(state: PriorAuthState) -> PriorAuthState:
    """Node 1: Pull FHIR data from EHR"""
    fhir_data = fetch_fhir_patient_data(
        patient_id=state["patient_id"],
        resources=["Condition", "MedicationRequest", "Observation", "Procedure"],
    )
    return {
        **state,
        "fhir_clinical_data": fhir_data,
        # Only the new entry is returned; the operator.add reducer appends it
        "audit_trail": [{"step": "fetch_clinical_data", "timestamp": now(), "resources": list(fhir_data.keys())}],
    }


def check_coverage_requirements(state: PriorAuthState) -> PriorAuthState:
    """Node 2: Query payer coverage criteria database"""
    criteria = payer_policy_db.get_criteria(
        payer_id=state["payer_id"],
        service_codes=state["service_codes"],
    )
    return {
        **state,
        "coverage_requirements": criteria,
        "audit_trail": [{"step": "coverage_check", "payer": state["payer_id"], "criteria_version": criteria.get("version")}],
    }


def generate_documentation_package(state: PriorAuthState) -> PriorAuthState:
    """Node 3: LLM maps clinical evidence to payer criteria"""
    prompt = f"""
    Map the following clinical documentation to the payer's coverage criteria.
    Clinical data: {state['fhir_clinical_data']}
    Coverage requirements: {state['coverage_requirements']}
    For each required criterion, identify the supporting documentation element.
    Flag any gaps as MISSING.
    Output as structured JSON only.
    """
    response = llm.invoke(prompt)
    doc_package = parse_json_response(response.content)
    # Check for missing criteria → escalate if any *required* criterion is missing
    criteria_map = doc_package.get("criteria_map", {})
    importance = doc_package.get("criteria_importance", {})
    has_critical_gaps = any(
        value == "MISSING" and importance.get(criterion) == "required"
        for criterion, value in criteria_map.items()
    )
    return {
        **state,
        "documentation_package": doc_package,
        "escalation_required": has_critical_gaps,
        "audit_trail": [{"step": "documentation_mapping", "gaps": doc_package.get("missing_criteria", [])}],
    }


def route_after_documentation(state: PriorAuthState) -> str:
    """Conditional routing: escalate or proceed"""
    if state["escalation_required"]:
        return "escalate_to_human"
    return "submit_authorization"


def submit_to_payer(state: PriorAuthState) -> PriorAuthState:
    """Node 4: Submit via Da Vinci PAS or X12 278"""
    response = payer_api.submit_prior_auth(
        payer_id=state["payer_id"],
        patient_id=state["patient_id"],
        service_codes=state["service_codes"],
        documentation=state["documentation_package"],
    )
    return {
        **state,
        "submission_response": response,
        "authorization_status": response.get("status"),
        "audit_trail": [{"step": "submission", "response_code": response.get("status"), "auth_number": response.get("auth_number")}],
    }


def route_after_submission(state: PriorAuthState) -> str:
    """Conditional routing based on payer response"""
    status = state.get("authorization_status")
    if status == "approved":
        return "update_ehr_approved"
    elif status == "denied":
        return "generate_appeal"
    elif status == "pended":
        return "track_pending"
    else:
        # Any unexpected status goes to a human
        return "escalate_to_human"


def generate_appeal(state: PriorAuthState) -> PriorAuthState:
    """Node: LLM generates appeal letter citing specific policy evidence"""
    denial_reason = state["submission_response"].get("denial_reason")
    policy_section = payer_policy_db.get_applicable_policy_section(
        payer_id=state["payer_id"],
        denial_reason=denial_reason,
    )
    prompt = f"""
    Generate a HIPAA-compliant medical necessity appeal letter.
    Denial reason: {denial_reason}
    Applicable policy section: {policy_section}
    Clinical documentation: {state['fhir_clinical_data']}
    Cite specific policy language. Map to specific clinical evidence.
    Professional medical correspondence format.
    """
    appeal = llm.invoke(prompt)
    return {
        **state,
        "appeal_letter": appeal.content,
        "audit_trail": [{"step": "appeal_generated", "denial_reason": denial_reason}],
    }


# Build the workflow graph
workflow = StateGraph(PriorAuthState)

# Add nodes
workflow.add_node("fetch_clinical_data", fetch_clinical_documentation)
workflow.add_node("check_coverage", check_coverage_requirements)
workflow.add_node("build_documentation", generate_documentation_package)
workflow.add_node("submit_authorization", submit_to_payer)
workflow.add_node("generate_appeal", generate_appeal)
workflow.add_node("update_ehr_approved", update_ehr_with_approval)
workflow.add_node("escalate_to_human", create_human_review_task)
workflow.add_node("track_pending", poll_payer_status)

# Define edges
workflow.set_entry_point("fetch_clinical_data")
workflow.add_edge("fetch_clinical_data", "check_coverage")
workflow.add_edge("check_coverage", "build_documentation")
workflow.add_conditional_edges("build_documentation", route_after_documentation)
workflow.add_conditional_edges("submit_authorization", route_after_submission)
workflow.add_edge("generate_appeal", "escalate_to_human")  # Appeals require human sign-off
workflow.add_edge("update_ehr_approved", END)
workflow.add_edge("escalate_to_human", END)
# Re-check after delay. Routing back to submit_authorization would resubmit the
# request, so re-route on the polled status instead (assumes poll_payer_status
# waits, then updates authorization_status).
workflow.add_conditional_edges("track_pending", route_after_submission)

# Compile (pass checkpointer=... here to persist state across long-running steps)
prior_auth_agent = workflow.compile()
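The `Annotated[list, operator.add]` annotation on `audit_trail` is what makes each node's entries accumulate instead of replacing the previous ones. The sketch below is a rough standard-library emulation of that merge behavior, not LangGraph's actual implementation. `DemoState` and `merge_update` are hypothetical names used only for illustration.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class DemoState(TypedDict):
    authorization_status: str
    audit_trail: Annotated[list, operator.add]  # reducer: combine old and new


def merge_update(state_type: type, current: dict, update: dict) -> dict:
    """Merge a node's partial update into the current state.

    Keys annotated with a callable reducer are combined with it;
    all other keys are simply overwritten.
    """
    hints = get_type_hints(state_type, include_extras=True)
    merged = dict(current)
    for key, value in update.items():
        metadata = getattr(hints.get(key), "__metadata__", ())
        if metadata and callable(metadata[0]) and key in current:
            merged[key] = metadata[0](current[key], value)
        else:
            merged[key] = value
    return merged


state = {"authorization_status": "pended", "audit_trail": [{"step": "submission"}]}
update = {"authorization_status": "approved", "audit_trail": [{"step": "status_poll"}]}
merged = merge_update(DemoState, state, update)
print(merged["authorization_status"], [e["step"] for e in merged["audit_trail"]])
```

This is why each node returns only its own new audit entry: returning the full existing list would cause the reducer to append it a second time.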
The Provenance Graph: Every Agent Action Must Be Auditable
This is the healthcare-specific requirement that distinguishes compliant agentic AI from general-purpose agent frameworks.
Every action an agent takes (every FHIR resource it reads, every policy clause it applies, every decision it makes) must be logged in a provenance graph: what data it saw, which rules and policies it applied, what it decided, and why. This is not just operational logging. It is the audit infrastructure that:
- Enables regulatory compliance: HIPAA audit controls require logging of all PHI access (45 CFR §164.312(b)). Agents that access EHR data trigger audit log requirements for every resource read.
- Supports clinical accountability: When an agent’s authorization submission is denied or triggers an adverse outcome, the provenance trail determines whether the agent behaved within its defined parameters or produced an unexpected result.
- Enables model improvement: Provenance graphs connect agent decisions to outcomes, enabling supervised learning on the cases where human override improved the result.
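The audit_trail field in the LangGraph state above, declared as Annotated[list, operator.add] so that each node’s entries are appended rather than overwritten, is the foundation of this. Every node appends its action record, so the final state contains a complete record of the agent’s full execution. The list itself is not tamper-evident; that property comes from persisting it to immutable storage, such as the S3 Object Lock configuration listed under the HIPAA compliance layer.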
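One common way to make tampering detectable in such a trail is hash chaining, where each entry stores a hash of its record plus the previous entry's hash. The article does not describe this technique; the sketch below is an assumption about how it could be layered on top, and `append_entry` and `verify_chain` are hypothetical helper names.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list[dict], record: dict) -> list[dict]:
    """Return a new chain with the record appended and linked to its predecessor."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    entry = {
        "record": record,
        "prev_hash": prev_hash,
        "hash": _entry_hash(prev_hash, record),
    }
    return chain + [entry]


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry fails."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


chain: list[dict] = []
chain = append_entry(chain, {"step": "fetch_clinical_data", "resources": ["Condition"]})
chain = append_entry(chain, {"step": "submission", "response_code": "pended"})
print(verify_chain(chain))
```

A chain like this detects edits after the fact but does not prevent them, so it complements immutable storage rather than replacing it.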
The HIPAA Compliance Layer
Every component in a healthcare agent that touches PHI requires:
- LLM calls: Azure OpenAI or AWS Bedrock (HIPAA BAAs). Never standard OpenAI API without enterprise BAA.
- EHR data retrieval: FHIR R4 via Epic SMART on FHIR or equivalent (full guide: Blog 4)
- Payer communication: Da Vinci PAS for FHIR-native payers, X12 278 adapter for legacy payers
- State storage: PHI in agent state must be stored in HIPAA-eligible storage (AWS DynamoDB with BAA, or in-memory for synchronous flows)
- Audit log storage: AWS CloudTrail + S3 with Object Lock (immutable, 6-year retention)
The Guardrails Problem: Why Healthcare Agents Need More Than Alignment
Standard LLM alignment (training models to refuse harmful outputs) is insufficient for healthcare agents. The failure mode is not an agent saying something harmful. The failure mode is an agent taking an action with patient safety implications that wasn’t explicitly blocked.
Research published on arXiv in March 2026 documented the core problem: without deterministic pre-action authorization, social engineering succeeded against model alignment 74.6% of the time in adversarial testing. With policy-based pre-action authorization, the attack success rate dropped to 0% across 879 attempts.
For healthcare agents, the practical governance framework:
1. Scope restriction at the tool level.
Each agent has an explicitly defined tool set. A prior authorization agent has read access to FHIR clinical data, write access to the EHR authorization field, and submission access to payer APIs. It does not have access to billing modification, prescription creation, or clinical order entry. Tool access is granted at the infrastructure level, not the prompt level.
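The deny-by-default shape of that restriction can be sketched in a few lines. This is an application-level illustration only: in production the same boundary would be enforced by credentials and infrastructure permissions, as described above. The `ScopedToolbox` class and the tool names are hypothetical.

```python
from typing import Any, Callable


class ToolNotPermitted(PermissionError):
    """Raised when an agent requests a tool outside its granted set."""


class ScopedToolbox:
    """Holds only the tools explicitly granted to one agent (deny by default)."""

    def __init__(self, agent_name: str, granted_tools: dict[str, Callable[..., Any]]):
        self._agent_name = agent_name
        self._tools = dict(granted_tools)  # copy so later mutation cannot widen scope

    def call(self, tool_name: str, **kwargs: Any) -> Any:
        if tool_name not in self._tools:
            raise ToolNotPermitted(
                f"{self._agent_name} has no grant for tool '{tool_name}'"
            )
        return self._tools[tool_name](**kwargs)


def read_fhir_clinical_data(patient_id: str) -> dict:
    """Stand-in for a real FHIR read; returns a stub payload."""
    return {"patient_id": patient_id, "resources": []}


toolbox = ScopedToolbox(
    "prior_auth_agent",
    {"read_fhir_clinical_data": read_fhir_clinical_data},
)

print(toolbox.call("read_fhir_clinical_data", patient_id="pat-123"))

try:
    toolbox.call("create_prescription", patient_id="pat-123")
except ToolNotPermitted as exc:
    print(f"blocked: {exc}")
```

Because the prescription tool is never registered, no prompt content can cause the agent to reach it.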
2. Consequence classification.
Classify every tool call by consequence level:
- Reversible low-stakes: Insurance eligibility check, status inquiry, appointment reminder. The agent can execute autonomously.
- Reversible high-stakes: Prior auth submission, appointment scheduling, EHR status update. The agent executes with post-action human notification.
- Irreversible: Prescription issuance, order creation, clinical documentation finalization. Requires human pre-approval regardless of agent confidence.
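The three consequence levels above map naturally onto a deterministic pre-action check. The sketch below shows one way to express it. The tool names and the `authorize` function are hypothetical, and treating unclassified tools as irreversible is a design assumption rather than something the framework prescribes.

```python
from enum import Enum


class Consequence(Enum):
    REVERSIBLE_LOW = "reversible_low_stakes"
    REVERSIBLE_HIGH = "reversible_high_stakes"
    IRREVERSIBLE = "irreversible"


# Hypothetical tool names, classified per the three levels described above
TOOL_CONSEQUENCE = {
    "check_eligibility": Consequence.REVERSIBLE_LOW,
    "send_appointment_reminder": Consequence.REVERSIBLE_LOW,
    "submit_prior_auth": Consequence.REVERSIBLE_HIGH,
    "update_ehr_status": Consequence.REVERSIBLE_HIGH,
    "issue_prescription": Consequence.IRREVERSIBLE,
}


def authorize(tool_name: str, human_approved: bool = False) -> dict:
    """Decide whether a tool call may run and whether a human must be notified."""
    # Unclassified tools fall back to the strictest level (assumption)
    level = TOOL_CONSEQUENCE.get(tool_name, Consequence.IRREVERSIBLE)
    if level is Consequence.IRREVERSIBLE:
        # Requires human pre-approval regardless of agent confidence
        return {"level": level, "allowed": human_approved, "notify_human": True}
    if level is Consequence.REVERSIBLE_HIGH:
        # Runs autonomously, followed by post-action human notification
        return {"level": level, "allowed": True, "notify_human": True}
    return {"level": level, "allowed": True, "notify_human": False}


for name in ("check_eligibility", "submit_prior_auth", "issue_prescription"):
    decision = authorize(name)
    print(name, decision["allowed"], decision["notify_human"])
```

The check runs before the tool call and never consults the model, which is what makes it deterministic.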
3. Human-in-the-loop at defined thresholds.
LangGraph’s human interrupt capability allows defining specific workflow nodes where execution pauses and a human review task is created. The agent prepares the context; the human makes the final call. For healthcare, this should be the default for any action that directly affects patient treatment.
4. Confidence thresholds with explicit escalation.
Agents should escalate when their confidence falls below a threshold, when edge cases arise outside training distribution, or when payer responses are unexpected. Escalation is not a failure; it is the correct behavior. The escalation rate is a key performance metric for healthy agent deployment.
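Those three escalation triggers can be written as a single explicit check. In the sketch below, the 0.85 threshold, the `in_distribution` flag (how it is computed is out of scope), and the reason strings are all illustrative assumptions. The known payer statuses mirror the ones used in the routing function earlier.

```python
from typing import Optional

# Statuses the workflow knows how to route; anything else is unexpected
KNOWN_PAYER_STATUSES = {"approved", "denied", "pended"}


def escalation_reason(
    confidence: float,
    payer_status: Optional[str] = None,
    in_distribution: bool = True,
    threshold: float = 0.85,  # illustrative value, to be calibrated per workflow
) -> Optional[str]:
    """Return why a task should go to a human, or None if the agent may proceed."""
    if not in_distribution:
        return "outside_known_case_types"
    if payer_status is not None and payer_status not in KNOWN_PAYER_STATUSES:
        return "unexpected_payer_response"
    if confidence < threshold:
        return "low_confidence"
    return None


print(escalation_reason(0.95, "approved"))
print(escalation_reason(0.60, "approved"))
print(escalation_reason(0.95, "partially_approved"))
```

Returning a reason rather than a bare boolean lets the escalation rate be broken down by trigger when calibrating thresholds.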
The Highest-Value Healthcare Agent Use Cases in 2026

Based on production deployments and investment patterns:
Tier 1: Proven ROI, deployable now:
| Use Case | Automation Rate | Reported Impact |
| Prior authorization submission | 70–90% autonomous | MUSC: 5,000 staff hours/month recovered |
| Insurance eligibility verification | 85–95% autonomous | Eliminates ~30% of eligibility-caused denials |
| Appointment scheduling & confirmation | 80–90% autonomous | Reduces no-show rates 15–25% |
| Denial appeal generation | 60–80% autonomous | Recovers 65% of previously unappealed revenue |
| RPM alert triage & routing | 70–85% autonomous | Enables 4–5× care coordinator capacity multiplier |
Tier 2: High value, more complex governance:
| Use Case | Status | Key Challenge |
| Post-discharge follow-up coordination | Piloting | Multi-system coordination, patient consent |
| Referral management end-to-end | Piloting | Specialist scheduling complexity |
| Clinical documentation improvement (CDI) | Production at scale | Physician trust building, accuracy validation |
| Chronic disease care plan updates | Early | Clinical safety governance |
What Separates Successful Healthcare Agent Deployments

The Bessemer 2026 State of Health AI report is explicit about where healthcare AI companies fail: “Healthcare IT’s graveyard is full of startups that tried to boil the ocean and drowned in complexity before finding product-market fit.”
The companies succeeding with agentic AI in 2026 share four characteristics:
They started with administrative, not clinical.
Administrative workflows have clearer success criteria, fewer patient safety implications, faster feedback loops, and lower regulatory barriers. Clinical workflow agents require the organizational trust infrastructure that administrative agent success builds.
They built for measurable outcomes from day one.
“Reduce prior authorization staff hours by 40%” or “Improve first-pass PA approval rate from 65% to 90%.” Not “improve administrative efficiency.” The specific metric drives product design, proves value, and creates the reference case for expansion.
They treated provenance and audit as product features.
The health systems deploying agents in production need to explain every agent action to their compliance team, legal team, and clinical leadership. Agents that don’t produce auditable provenance don’t get deployed. Agents that do are deployed faster and expand faster.
They designed escalation as a feature, not a failure.
An agent that escalates 15% of cases to human review while automating 85% does not have a 15% failure rate; it has an 85% automation rate with intelligent exception routing. Health system leaders trust agents more when they can see clear, appropriate escalation behavior. Low escalation rates that come from agents forcing decisions they shouldn’t be making are the actual failure mode.
The Bottom Line
The question in 2026 is not whether to deploy healthcare AI agents. It’s which administrative workflows to start with, how to build the governance infrastructure that enables clinical expansion, and which agentic architecture produces the provenance guarantees that enterprise health systems require.
The organizations succeeding (Mayo Clinic, the health systems receiving the $3.95B in AI-focused digital health funding from H1 2025, and the healthcare AI startups raising at 83% larger deal sizes than non-AI peers) are the ones that moved from pilots to production by solving the governance problem, not the technology problem.
The technology is proven. LangGraph orchestrates the workflow. Azure OpenAI generates the language. FHIR APIs connect to the clinical record. The differentiation is in the domain-specific policy databases, the provenance audit infrastructure, the escalation calibration, and the organizational change management that converts tool deployment into workflow transformation.
The EngineerBabu team builds agentic healthcare systems (prior authorization agents, clinical documentation agents, care coordination orchestration) for health systems and digital health companies.
If you’re evaluating where agentic AI fits in your clinical or administrative workflows, that’s the conversation worth having before you start building. Reach me at mayank@engineerbabu.com.
Author: Mayank Pratap, Co-Founder, EngineerBabu · Google AI Accelerator 2024 · CMMI Level 5 · 500+ Products · 20+ Countries
FAQ
What is agentic AI in healthcare?
Autonomous AI systems that execute multi-step clinical and administrative workflows end-to-end, perceiving data from multiple systems, applying clinical or policy rules, taking actions (submitting prior authorizations, scheduling appointments, updating EHRs), and escalating to humans only when genuinely required. Distinguished from single-task AI (a model that generates a note) by its ability to complete entire processes with minimal human initiation.
What is Mayo Clinic deploying in 2026?
Mayo Clinic is deploying VoiceCare AI’s voice-based agentic platform for prior authorization, benefit verification, and prescription support phone calls across its neurology and pediatrics departments, achieving 87–90% autonomous call completion. Mayo is also deploying broader agentic platforms for eligibility verification, claims processing, and prescription support across administrative operations, with $1B+ in total AI investments spanning 200+ projects.
What framework should I use to build healthcare AI agents?
LangGraph (built on LangChain) is the production-grade framework for stateful, multi-step healthcare agentic workflows. It provides graph-based workflow management, built-in state persistence across asynchronous steps, conditional routing, and human-in-the-loop interrupt capability. For production healthcare agents, pair LangGraph with Azure OpenAI (HIPAA BAA) for LLM calls, FHIR R4 for EHR data, and AWS HIPAA-eligible services for infrastructure.
Do healthcare AI agents require FDA clearance?
Administrative agents (prior auth submission, eligibility verification, scheduling) do not require FDA clearance. Agents that make specific clinical recommendations or treatment decisions may qualify as SaMD and require FDA oversight. The key distinction: does the agent make a clinical decision, or execute an administrative workflow based on clinical data? (Full FDA SaMD guide: Blog 16)
What is the biggest risk in deploying healthcare AI agents?
Inadequate escalation design. Agents that force decisions outside their training distribution rather than escalating appropriately create patient safety and compliance risk. The provenance gap is a second risk: agents that cannot produce an auditable trail of every action do not get cleared for enterprise deployment. The third is scope creep: agents that have access to tools they shouldn’t (clinical order creation, prescription issuance) create liability regardless of how they actually behave.
What’s the ROI of agentic AI in healthcare administration?
Prior auth: MUSC recovered 5,000+ staff hours per month. Eligibility verification: eliminates ~30% of registration-caused denials, saving $25/denial in rework × denial volume. Mayo VoiceCare: 87–90% call completion rate on tasks consuming 13 staff hours/physician/week across hundreds of physicians. Industry-wide: AI and automation in the revenue cycle could generate $360B in annual savings according to McKinsey.
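The eligibility figure is a simple product, and it helps to run it with your own numbers. The sketch below uses the ~30% avoidance rate and $25 rework cost quoted above. The monthly denial volume of 2,000 is a made-up input for illustration, not a reported figure.

```python
def eligibility_rework_savings(
    monthly_denials: int,
    avoidable_share: float = 0.30,       # ~30% of denials eliminated (quoted above)
    rework_cost_per_denial: float = 25.0,  # $25 rework per denial (quoted above)
) -> float:
    """Estimate monthly rework savings from avoided eligibility-related denials."""
    avoided_denials = monthly_denials * avoidable_share
    return avoided_denials * rework_cost_per_denial


# Hypothetical volume: 2,000 eligibility-related denials per month
monthly_savings = eligibility_rework_savings(2_000)
print(f"${monthly_savings:,.0f} per month")
```

At that hypothetical volume, 600 denials are avoided each month, for roughly $15,000 in monthly rework savings before counting any recovered revenue.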
How do I ensure healthcare agents are HIPAA compliant?
Every LLM call must use an LLM provider with a signed HIPAA BAA (Azure OpenAI or AWS Bedrock, not the standard OpenAI API). Every FHIR data access must go through HIPAA-eligible EHR integration with appropriate authorization scopes. All PHI in agent state must be stored in HIPAA-eligible storage. Every PHI access must generate an audit log entry. All vendors in the agent’s tool chain must sign BAAs.