Core Concepts
AgentTrace is built around three ideas: everything that happens inside an agent run is recorded, that recording can be replayed deterministically, and replays can be forked at any point to explore alternatives.
Trace
A Trace is one complete execution of your agent — from the moment @agenttrace.run is entered to when it returns or throws.
Each trace has:
| Field | Type | Description |
|---|---|---|
| trace_id | UUID | Globally unique identifier |
| metadata.title | string | "Run: {agent_name}" by default |
| metadata.status | "in_progress" \| "completed" \| "failed" | Final status |
| metadata.duration_s | float | Wall-clock execution time, in seconds |
| events | Event[] | Ordered list of everything that happened |
| source_code | string | The full source file captured at decoration time |
| requirements | string | pip freeze output at record time |
Traces are uploaded to the backend at the end of the function via POST /api/trace/register.
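As a rough illustration of the fields listed above, a trace could be modeled like this. The class names `Trace` and `TraceMetadata` and the helper `new_trace` are hypothetical and not part of the AgentTrace SDK; only the field names and the default title format come from the table.

```python
from dataclasses import dataclass, field
from typing import Any, Literal
from uuid import UUID, uuid4


@dataclass
class TraceMetadata:
    title: str
    status: Literal["in_progress", "completed", "failed"] = "in_progress"
    duration_s: float = 0.0  # wall-clock seconds


@dataclass
class Trace:
    trace_id: UUID
    metadata: TraceMetadata
    events: list[dict[str, Any]] = field(default_factory=list)
    source_code: str = ""   # source file captured at decoration time
    requirements: str = ""  # pip freeze output at record time


def new_trace(agent_name: str) -> Trace:
    """Create an empty in-progress trace with the default title format."""
    return Trace(trace_id=uuid4(), metadata=TraceMetadata(title=f"Run: {agent_name}"))


trace = new_trace("refund-processor")
print(trace.metadata.title)
```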
Event
An Event is the atomic unit of a trace — one thing that happened at one moment in time.
Every event has this shape:
{
  "seq": 3,
  "type": "network_call",
  "timestamp": "2024-03-14T10:21:09.300Z",
  "timestamp_epoch": 1710411669.3,
  "step": "issue-refund",
  "payload": {
    "method": "POST",
    "url": "https://api.payments.example.com/refunds",
    "status": 200,
    "response": "{\"success\":true,\"txn_id\":\"TXN-441\"}"
  }
}
| Field | Description |
|---|---|
| seq | Zero-indexed sequential position within the trace |
| type | See event types below |
| timestamp | ISO 8601 UTC string |
| timestamp_epoch | Unix epoch float (used for time replay) |
| step | The agenttrace.step() name active when this event fired, or "root" |
| payload | Event-specific data |
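Because every event carries `seq` and `step`, a downloaded event list can be sliced with plain Python. The sketch below groups events by step; the sample events are hand-written for illustration and the helper name `events_by_step` is hypothetical.

```python
from collections import defaultdict
from typing import Any

# Hand-written sample events using the field names from the table above.
events: list[dict[str, Any]] = [
    {"seq": 0, "type": "agent_start", "step": "root", "payload": {}},
    {"seq": 1, "type": "network_call", "step": "issue-refund",
     "payload": {"method": "POST", "status": 200}},
    {"seq": 2, "type": "agent_complete", "step": "root", "payload": {}},
]


def events_by_step(events: list[dict[str, Any]]) -> dict[str, list[int]]:
    """Map each step name to the seq numbers of the events recorded under it."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for event in sorted(events, key=lambda e: e["seq"]):
        grouped[event["step"]].append(event["seq"])
    return dict(grouped)


print(events_by_step(events))  # {'root': [0, 2], 'issue-refund': [1]}
```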
Event Types
| Type | Triggered By |
|---|---|
| agent_start | Entry into @agenttrace.run decorated function |
| network_call | Any requests, httpx, or urllib3 HTTP request |
| socket_call | Raw TCP connect/send/recv (gRPC, websockets, etc.) |
| observation | agenttrace.step(..., type="observation") |
| tool_call | agenttrace.step(..., type="tool_call") |
| error | Unhandled exception inside the agent |
| agent_complete | Return from @agenttrace.run decorated function |
Step
A Step is a named scope you define using agenttrace.step(). Steps don't create new events — they label the events that occur inside them.
with agenttrace.step("check-eligibility", type="observation", user_id=user_id):
    eligible = amount <= balance
    agenttrace.set_result({"eligible": eligible, "balance": balance})
When any network call or tool call fires inside this context manager, it gets "step": "check-eligibility" in its event record. This is how AgentTrace tracks which part of your code caused which I/O.
Steps:
- Can be nested
- Work in both sync and async code (async with agenttrace.step(...))
- Capture exceptions (the step's event type becomes "error" if the block raises)
Record Mode
In record mode (the default), AgentTrace monkey-patches the Python runtime boundaries:
| Boundary | What's captured |
|---|---|
| urllib3 HTTP | Method, URL, response body, status code |
| httpx (sync + async) | Same as urllib3 |
| socket.connect/send/recv | Raw bytes, remote address |
| time.time and datetime.now | Captured via the agent_start event timestamp |
| random, uuid.uuid4 | The seed used is captured in agent_start.payload.seed |
These patches are installed when the @agenttrace.run decorated function is entered and removed when it returns or raises. They do not affect code outside the decorated scope.
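The install-on-entry, remove-on-exit pattern can be sketched with a plain decorator. This example patches only `time.time` and logs every value it returns, which is a simplification for illustration; `record_time` is a hypothetical name, and the real interceptor covers the boundaries in the table above.

```python
import functools
import time


def record_time(func):
    """Patch time.time only while `func` runs, logging each value it returns."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        original = time.time

        def patched():
            value = original()
            wrapper.captured.append(value)
            return value

        time.time = patched  # install on entry
        try:
            return func(*args, **kwargs)
        finally:
            time.time = original  # always restore, even if func raises

    wrapper.captured = []
    return wrapper


@record_time
def agent():
    return time.time() > 0


result = agent()
print(result, len(agent.captured))  # True 1
```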
Replay Mode
In replay mode, the same patches run in reverse — instead of capturing I/O, the interceptor serves recorded responses back from the event pool.
When you replay, the interceptor guarantees:
- time.time() returns the same epoch as the original run (frozen from agent_start.timestamp_epoch)
- random.random() and uuid.uuid4() produce the same sequence (seed restored from agent_start.payload.seed)
- Every HTTP call to the same URL returns the exact same response body from the recorded trace
- Every raw socket connection to the same address returns the same bytes
This means the agent makes the same decisions, in the same order, with the same data — on any machine, without network access.
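The guarantees above can be approximated in a few lines of standard-library Python. This is a sketch of the idea only: the `replay` helper, the `fetch` callback, and the sample events are assumptions made for illustration, and the real interceptor patches the HTTP and socket layers instead of taking a callback.

```python
import random
import time
from unittest import mock

# Hypothetical recorded events; field names follow the Event section.
events = [
    {"seq": 0, "type": "agent_start", "timestamp_epoch": 1710411669.3,
     "payload": {"seed": 42}},
    {"seq": 1, "type": "network_call",
     "payload": {"url": "https://api.payments.example.com/refunds",
                 "response": "{\"success\":true}"}},
]


def replay(agent, events):
    """Run `agent` with time frozen, RNG reseeded, and HTTP served from `events`."""
    start = events[0]
    responses = {
        e["payload"]["url"]: e["payload"]["response"]
        for e in events
        if e["type"] == "network_call"
    }
    random.seed(start["payload"]["seed"])
    with mock.patch("time.time", return_value=start["timestamp_epoch"]):
        return agent(fetch=responses.__getitem__)


def agent(fetch):
    body = fetch("https://api.payments.example.com/refunds")
    return time.time(), random.random(), body


first = replay(agent, events)
second = replay(agent, events)
print(first == second)  # True
```

Two replays of the same events yield identical results because every source of variation (clock, randomness, network) is served from the recording.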
Enabling replay mode
import agenttrace

# Load events from a previously recorded trace
events = [...]  # fetch from your dashboard or API

agenttrace.init(
    api_key="at_live_xxx",
    mode="replay",
    replay_events=events,
)

@agenttrace.run("refund-processor")
def process_refund(order_id: str, amount: float):
    # This runs in the sandbox: all I/O is served from `events`
    ...

process_refund("ORD-9021", 340.00)
Or from a file:
AGENTTRACE_MODE=replay \
AGENTTRACE_REPLAY_EVENTS_FILE=./trace-8821.json \
python agent.py
Governance Levels
AgentTrace enforces a governance level that controls what happens when the agent tries to do something the sandbox hasn't recorded.
| Level | Behavior on unexpected I/O |
|---|---|
| "relaxed" (default) | Prints a warning, allows the call through |
| "governance" | Raises DeterminismLeakError or ReplayMismatchError, halts execution |
Use "governance" in CI pipelines where you need hard guarantees. Use "relaxed" when iterating locally.
agenttrace.init(
    api_key="at_live_xxx",
    mode="replay",
    replay_events=events,
    governance_level="governance",  # strict mode
)
Environment variable:
AGENTTRACE_GOVERNANCE_LEVEL=governance python agent.py
What governance catches
- Network calls to URLs not present in the recorded trace (ReplayMismatchError)
- DNS lookups (DeterminismLeakError: DNS leak)
- os.urandom() calls (DeterminismLeakError: Raw entropy leak)
- subprocess.Popen() calls
- open() and os.listdir() filesystem access
- threading.Thread().start() (prints a warning in both levels)
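The difference between the two levels comes down to what happens on a lookup miss. The sketch below shows that policy for the unrecorded-URL case only. `serve_recorded` is a hypothetical helper, and the `ReplayMismatchError` class defined here is a local stand-in for the SDK error of the same name.

```python
import warnings


class ReplayMismatchError(RuntimeError):
    """Local stand-in for the SDK error of the same name."""


def serve_recorded(url, recorded, governance_level="relaxed"):
    """Return the recorded response for `url`, or apply the governance policy."""
    if url in recorded:
        return recorded[url]
    if governance_level == "governance":
        # Strict: unexpected I/O halts execution.
        raise ReplayMismatchError(f"No recorded response for {url}")
    # Relaxed: warn and let the call through.
    warnings.warn(f"Unrecorded call to {url}; allowing it through")
    return None  # a real interceptor would perform the live call here


recorded = {"https://api.example.com/a": "ok"}
print(serve_recorded("https://api.example.com/a", recorded))  # ok
```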
Causal Fingerprint
After a replay completes, AgentTrace computes a causal fingerprint: a SHA-256 hash over all events, with non-deterministic fields (timestamp, timestamp_epoch, env, argv) stripped first.
[AgentTrace] Replay consumed 14 events from trace. Fingerprint: a3f92c1d8e74b50f
If the fingerprint matches the original trace's fingerprint, the replay is causally identical — the agent made the same decisions in the same order. A mismatch means something changed (a prompt, a dependency, a code path).
This fingerprint is how AgentTrace detects regressions in CI without running in production.
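A fingerprint of this kind can be sketched with `hashlib` and `json`. The serialization (canonical JSON with sorted keys) and the recursive stripping are assumptions made for this example; the SDK's exact byte layout and how it shortens the displayed value are not specified here.

```python
import hashlib
import json

NON_DETERMINISTIC = {"timestamp", "timestamp_epoch", "env", "argv"}


def strip(value):
    """Recursively drop non-deterministic keys from dicts and lists."""
    if isinstance(value, dict):
        return {k: strip(v) for k, v in value.items() if k not in NON_DETERMINISTIC}
    if isinstance(value, list):
        return [strip(v) for v in value]
    return value


def causal_fingerprint(events):
    """Hash the stripped events; returns the full 64-character hex digest."""
    # Assumption: canonical JSON (sorted keys, no whitespace) as the hash input.
    canonical = json.dumps(strip(events), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = [{"seq": 0, "type": "agent_start", "timestamp_epoch": 1.0, "payload": {"seed": 7}}]
replayed = [{"seq": 0, "type": "agent_start", "timestamp_epoch": 2.0, "payload": {"seed": 7}}]
changed = [{"seq": 0, "type": "agent_start", "timestamp_epoch": 2.0, "payload": {"seed": 8}}]

print(causal_fingerprint(original) == causal_fingerprint(replayed))  # True
print(causal_fingerprint(original) == causal_fingerprint(changed))   # False
```

Differences in timestamps alone leave the hash unchanged, while any change to a payload produces a different fingerprint.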
Branching
A Branch is a fork of a trace at a specific step, with optional payload overrides. Branches let you answer questions like:
- "What would have happened if the LLM had returned a different tool call at step 5?"
- "What if the balance check returned $0 instead of $1240?"
- "Does my retry logic work correctly when the API returns a 429 at step 3?"
Branches are created via the API or the Dashboard's Multiverse View. See Branching for the full guide.
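Conceptually, a branch keeps the recorded history up to the fork point and changes what happened there. The sketch below models that on a plain event list. `fork_trace` is a hypothetical helper that forks by `seq`, and it does not reflect the actual branching API.

```python
import copy


def fork_trace(events, at_seq, payload_overrides):
    """Keep events up to `at_seq` and patch that event's payload in the copy."""
    prefix = sorted(
        (e for e in events if e["seq"] <= at_seq), key=lambda e: e["seq"]
    )
    if not prefix or prefix[-1]["seq"] != at_seq:
        raise ValueError(f"no event with seq={at_seq}")
    branch = copy.deepcopy(prefix)  # never mutate the original trace
    branch[-1]["payload"].update(payload_overrides)
    return branch


events = [
    {"seq": 0, "type": "agent_start", "payload": {}},
    {"seq": 1, "type": "network_call", "payload": {"status": 200}},
    {"seq": 2, "type": "agent_complete", "payload": {}},
]

# "What if the API had returned a 429 here?"
branch = fork_trace(events, at_seq=1, payload_overrides={"status": 429})
print(branch[-1]["payload"]["status"], events[1]["payload"]["status"])  # 429 200
```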
Thread Safety
AgentTrace uses Python contextvars.ContextVar for all internal state. This means:
- Multiple async tasks running concurrently inside a single @agenttrace.run are fully isolated
- Each asyncio.Task gets its own trace context
- Events are written with a threading.Lock to prevent concurrent append races
In replay mode, multi-threading is detected and warned about because un-intercepted threads can make network calls that bypass the sandbox. Even at the "governance" level, threading.Thread().start() calls print a warning instead of raising an error.