DeepYard

PhysAssistBench

Benchmark for evaluating LLM agents in clinical assistance workflows that involve a physician, a patient, and an EHR system

Open Source, Free

About

PhysAssistBench is a research benchmark for evaluating LLM agents in realistic physician assistance scenarios. It tests whether an agent can coordinate three capabilities at once: clinical knowledge, interaction with electronic health record (EHR) systems, and communication with patients. The benchmark provides a standardized evaluation framework for multi-agent medical AI systems that must navigate complex healthcare workflows combining structured data with natural conversation.

Tags

evaluation, multi-agent, open-source, research, healthcare