DeepYard

MalSkillBench

Security benchmark for detecting malicious AI agent skills with runtime verification

Open Source, Free

About

Academic benchmark for evaluating security tools that detect malicious AI agent skills. It provides test cases with verified ground truth, covering both code-based threats and threats embedded in natural-language instructions. By testing how well tools detect known malicious patterns in hybrid skill formats, which combine code with natural-language instructions, it addresses the supply-chain security risks posed by third-party agent components.
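To make the evaluation setup concrete, here is a minimal sketch of how a detector's verdicts could be scored against ground-truth labels, reported separately for code-based and instruction-based cases. The `SkillCase` record, the `"code"` / `"natural_language"` format labels, and the `score` function are hypothetical illustrations; they are not MalSkillBench's actual data schema or scoring code.

```python
from dataclasses import dataclass


@dataclass
class SkillCase:
    # Hypothetical record for one benchmark test case (not the real schema)
    skill_id: str
    skill_format: str   # assumed labels: "code" or "natural_language"
    is_malicious: bool  # verified ground-truth label


def score(cases, predictions):
    """Return {skill_format: (precision, recall)} for a detector.

    `predictions` maps skill_id -> True if the detector flagged the skill;
    skills missing from the map are treated as not flagged.
    """
    results = {}
    for fmt in sorted({c.skill_format for c in cases}):
        tp = fp = fn = 0
        for c in cases:
            if c.skill_format != fmt:
                continue
            flagged = predictions.get(c.skill_id, False)
            if flagged and c.is_malicious:
                tp += 1  # malicious skill correctly flagged
            elif flagged and not c.is_malicious:
                fp += 1  # benign skill wrongly flagged
            elif not flagged and c.is_malicious:
                fn += 1  # malicious skill missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results[fmt] = (precision, recall)
    return results


# Toy usage with made-up cases and detector output
cases = [
    SkillCase("a", "code", True),
    SkillCase("b", "code", False),
    SkillCase("c", "natural_language", True),
    SkillCase("d", "natural_language", True),
]
predictions = {"a": True, "b": True, "c": True}
print(score(cases, predictions))
# {'code': (0.5, 1.0), 'natural_language': (1.0, 0.5)}
```

Splitting the metrics by format shows whether a tool that handles code-based payloads well also catches threats hidden in natural-language instructions, which a single aggregate score would obscure.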

Details

Type
Integrations
Language

Tags

evaluation, open-source, autonomous, tool-use, framework