DeepYard

FinToolBench

Specialized benchmark for evaluating LLM agents on real-world financial tool use and compliance

Open Source, Free

About

FinToolBench is an academic benchmark that tests LLM agents in finance-specific scenarios requiring tool use, compliance awareness, and handling of volatile data. Unlike general-purpose evaluations, it targets high-stakes decision-making under real-world financial constraints. It evaluates dynamic, agentic interactions rather than static text analysis, which makes it relevant to organizations deploying AI in regulated financial environments.

Tags

evaluation, tool-use, autonomous, open-source