AgentLens

Interpretable safety framework for steering coding agent behavior via LLM internal representations

Open Source · Free

About

AgentLens is a research framework for fine-grained safety control of multi-turn coding agents. It analyzes and steers an LLM's internal representations through mechanistic subspaces. Unlike traditional external guardrails, it works inside the model, at the level of individual layers, to provide interpretable behavioral control during agent execution. It is designed for researchers and developers working on AI safety and agent alignment in coding contexts.

Also in Dev Tools

Crawl4AI

Firecrawl

Headroom Context Optimization