Rapidly iterate on your agent.

Find, fix, and analyze its problematic behaviors.

Request Demo

Trace a7e2c841-3f19-4d6b-8a05-e91c7b2d0f38

Code Review - agent-v2.4, gpt-4o

Metadata: correlationId=run-20250210-review-4821, max_tokens=16000, model=gpt-4o, temperature=0

+ Add behavior

1 issue detected

Timeline, Metrics, Raw, Analysis

Contents (108)

System: "You are Claude Code, Anth..."

User: "Please review the following..."

A: "I'll review this pull request ..."

User: "Tool result"

A: "Let me start by examining t..."

User: "Tool result"

A: "Tool: Read"

User: "Tool result"

System

You are Claude Code, Anthropic's official CLI for Claude. You are an interactive CLI tool that helps users with software engineering tasks.

Use the instructions below and the tools available to you to assist the user. IMPORTANT: You must NEVER generate or guess URLs...

If the user asks for help or wants to give feedback inform them of the following: /help: Get help with using Claude Code...

Trace Inspection

Traces are compressed and analyzed by our internal agent before you read a single message. Get a summary of the full agent journey, key decisions, and issues up front.

User → LLM → Tool: Read → LLM → Tool: Edit → Tool: Bash (err) → LLM

Experiments

Compare variants side by side across prompts, harnesses, or config changes.

Metrics: Success, Latency, Coverage, Efficiency

Variants: v2.3, v2.4

Surface Problematic Traces

Automatically surface runs that deviate from normal patterns, so you don't have to review every trace.

Variant Metrics

We're built to support custom metrics for coding agents. Track tool errors, retries, subagent spins, files read, coverage, behavior patterns, and more across every variant you ship.

Tokens, Latency, Tool calls, Errors
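As a rough illustration of per-variant metrics, the sketch below averages the four quantities listed above for each variant. The record layout (a dict with `variant`, `tokens`, `latency_s`, `tool_calls`, and `errors` keys) and the function name are assumptions made for this example, not the product's actual schema or API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical metric keys; the real trace schema is not shown on this page.
METRICS = ("tokens", "latency_s", "tool_calls", "errors")


def summarize_by_variant(runs):
    """Return {variant: {metric: average}} for a list of run records."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run["variant"]].append(run)
    return {
        variant: {metric: mean(r[metric] for r in records) for metric in METRICS}
        for variant, records in grouped.items()
    }
```

Calling `summarize_by_variant(runs)` on records tagged `v2.3` and `v2.4` would give one row of averages per variant, which is the shape a side-by-side comparison needs.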

Behavior Detection

Define behaviors in plain language and let our agent autonomously find the traces that exhibit them.

retry_loop, permission_denied, successful_edit, hallucination, good_tool_use

↳"flag traces where the agent retries the same tool more than 3 times"
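To make the example behavior above concrete, here is a minimal sketch of what a hand-written `retry_loop` check might look like. It assumes a trace is a list of steps, that a "retry" is a repeat call to the same tool with no other tool called in between, and that LLM turns between calls do not reset the count. The `Step` type and both function names are hypothetical; the product itself takes the plain-language description rather than code like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    kind: str                   # "user", "llm", or "tool" (assumed labels)
    tool: Optional[str] = None  # tool name when kind == "tool"


def max_consecutive_retries(steps):
    """Largest number of back-to-back repeat calls to a single tool."""
    best = run = 0
    last_tool = None
    for step in steps:
        if step.kind != "tool":
            continue  # user/LLM turns do not break a retry streak
        # A repeat of the previous tool extends the streak; a new tool resets it.
        run = run + 1 if step.tool == last_tool else 0
        last_tool = step.tool
        best = max(best, run)
    return best


def has_retry_loop(steps, threshold=3):
    """True when the same tool is retried more than `threshold` times."""
    return max_consecutive_retries(steps) > threshold
```

Under these assumptions, five consecutive `Bash` calls count as four retries and are flagged, while four consecutive calls (three retries) are not.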

Production Monitoring

Ingest production traces in real time and catch regressions before your users do.

3 active traces; throughput: 42/min; p99: 3.2s; error rate: 1.2%
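For readers unfamiliar with the monitoring figures above, the sketch below shows one common way a p99 latency and an error rate can be computed from a window of runs. It uses the nearest-rank percentile definition, which is an assumption for this example; the page does not say how the product computes these values, and the function names are illustrative.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def error_rate(outcomes):
    """Fraction of runs that failed; `outcomes` holds True for each errored run."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0
```

With per-run latencies in seconds, `percentile(latencies, 99)` gives the p99 figure and `error_rate(flags)` gives the share of failed runs in the same window.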

Understand your agents.

Set up in minutes.
