Core Concepts

AgentTrace is built around three ideas: everything that happens inside an agent run is recorded, that recording can be replayed deterministically, and replays can be forked at any point to explore alternatives.


Trace

A Trace is one complete execution of your agent, from the moment the function decorated with @agenttrace.run is entered to the moment it returns or raises.

Each trace has:

| Field | Type | Description |
|---|---|---|
| trace_id | UUID | Globally unique identifier |
| metadata.title | string | "Run: {agent_name}" by default |
| metadata.status | one of "in_progress", "completed", "failed" | Final status |
| metadata.duration_s | float | Wall-clock execution time |
| events | Event[] | Ordered list of everything that happened |
| source_code | string | The full source file captured at decoration time |
| requirements | string | pip freeze output at record time |

Traces are uploaded to the backend at the end of the function via POST /api/trace/register.


Event

An Event is the atomic unit of a trace — one thing that happened at one moment in time.

Every event has this shape:

{
  "seq": 3,
  "type": "network_call",
  "timestamp": "2024-03-14T10:21:09.300Z",
  "timestamp_epoch": 1710411669.3,
  "step": "issue-refund",
  "payload": {
    "method": "POST",
    "url": "https://api.payments.example.com/refunds",
    "status": 200,
    "response": "{\"success\":true,\"txn_id\":\"TXN-441\"}"
  }
}
| Field | Description |
|---|---|
| seq | Zero-indexed sequential position within the trace |
| type | See event types below |
| timestamp | ISO 8601 UTC string |
| timestamp_epoch | Unix epoch float (used for time replay) |
| step | The agenttrace.step() name active when this event fired, or "root" |
| payload | Event-specific data |
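
As an illustration of working with this shape, the sketch below types an event with a TypedDict and groups a list of events by their step label. The `Event` class, the `events_by_step` helper, and the sample data are hypothetical and are not part of the AgentTrace SDK; only the field names come from the table above.

```python
from collections import defaultdict
from typing import Any, TypedDict


class Event(TypedDict):
    """Field names follow the documented event shape."""
    seq: int
    type: str
    timestamp: str
    timestamp_epoch: float
    step: str
    payload: dict[str, Any]


def events_by_step(events: list[Event]) -> dict[str, list[Event]]:
    """Group events by step label, keeping seq order inside each group."""
    grouped: dict[str, list[Event]] = defaultdict(list)
    for event in sorted(events, key=lambda e: e["seq"]):
        grouped[event["step"]].append(event)
    return dict(grouped)


# Hypothetical sample data, deliberately out of order; not from a real trace.
sample: list[Event] = [
    {"seq": 2, "type": "agent_complete", "timestamp": "2024-03-14T10:21:10.000Z",
     "timestamp_epoch": 1710411670.0, "step": "root", "payload": {}},
    {"seq": 0, "type": "agent_start", "timestamp": "2024-03-14T10:21:08.000Z",
     "timestamp_epoch": 1710411668.0, "step": "root", "payload": {}},
    {"seq": 1, "type": "network_call", "timestamp": "2024-03-14T10:21:09.300Z",
     "timestamp_epoch": 1710411669.3, "step": "issue-refund",
     "payload": {"method": "POST", "status": 200}},
]

grouped = events_by_step(sample)
```

Grouping by `step` is one way to see which part of the agent produced which I/O when inspecting a downloaded trace.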

Event Types

| Type | Triggered By |
|---|---|
| agent_start | Entry into @agenttrace.run decorated function |
| network_call | Any requests, httpx, or urllib3 HTTP request |
| socket_call | Raw TCP connect/send/recv (gRPC, websockets, etc.) |
| observation | agenttrace.step(..., type="observation") |
| tool_call | agenttrace.step(..., type="tool_call") |
| error | Unhandled exception inside the agent |
| agent_complete | Return from @agenttrace.run decorated function |

Step

A Step is a named scope you define using agenttrace.step(). Its main job is to label the events that occur inside it. Per the event types table, the step itself is recorded as an event of the type you pass (observation or tool_call).

with agenttrace.step("check-eligibility", type="observation", user_id=user_id):
    eligible = amount <= balance
    agenttrace.set_result({"eligible": eligible, "balance": balance})

When any network call or tool call fires inside this context manager, it gets "step": "check-eligibility" in its event record. This is how AgentTrace tracks which part of your code caused which I/O.

Steps:

  • Can be nested
  • Work in both sync and async code (async with agenttrace.step(...))
  • Capture exceptions (the step's event type becomes "error" if the block raises)
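
To make the labeling behavior concrete, here is a toy model of step scoping built on contextvars. It is not AgentTrace's implementation: the `step` and `record` functions and the `recorded` list are hypothetical names, and the choice that a nested step's label wins over its parent's is an assumption.

```python
import contextvars
from contextlib import contextmanager

# Holds the label of the innermost active step; "root" outside any step.
_current_step = contextvars.ContextVar("current_step", default="root")
recorded = []


@contextmanager
def step(name):
    """Label everything recorded inside the block with `name`."""
    token = _current_step.set(name)
    try:
        yield
    finally:
        # Restore the enclosing step's label, even if the block raised.
        _current_step.reset(token)


def record(event_type, **payload):
    """Append an event stamped with the step active at this moment."""
    recorded.append({
        "seq": len(recorded),
        "type": event_type,
        "step": _current_step.get(),
        "payload": payload,
    })


record("agent_start")
with step("check-eligibility"):
    record("network_call", url="https://api.example.com/balance")
    with step("issue-refund"):  # nested step
        record("network_call", url="https://api.example.com/refunds")
    record("tool_call", name="notify")
record("agent_complete")

labels = [event["step"] for event in recorded]
```

Resetting the context variable with the saved token is what lets nesting unwind cleanly: after the inner block exits, events are labeled with the outer step again.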

Record Mode

In record mode (the default), AgentTrace monkey-patches the Python runtime boundaries:

| Boundary | What's captured |
|---|---|
| urllib3 HTTP | Method, URL, response body, status code |
| httpx (sync + async) | Same as urllib3 |
| socket.connect/send/recv | Raw bytes, remote address |
| time.time and datetime.now | Captured via the agent_start event timestamp |
| random, uuid.uuid4 | The seed used is captured in agent_start.payload.seed |

These patches are installed when @agenttrace.run is entered and removed when the function returns. They do not affect code outside the decorated scope.
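
The install-on-entry, remove-on-exit pattern can be sketched with a decorator that swaps `time.time` for a recording wrapper and restores it in a `finally` block. This is a simplified illustration, not AgentTrace's actual patching code; `record_time_calls` and the tuple return value are hypothetical.

```python
import functools
import time


def record_time_calls(func):
    """Patch time.time while `func` runs, then restore the original."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        original = time.time
        captured = []

        def patched():
            value = original()
            captured.append(value)  # record what the agent observed
            return value

        time.time = patched  # install the patch on entry
        try:
            return func(*args, **kwargs), captured
        finally:
            time.time = original  # always remove it, even on an exception

    return wrapper


@record_time_calls
def agent():
    time.time()
    time.time()
    return "done"


time_before = time.time
result, captured = agent()
```

Restoring in `finally` is what keeps the patch from leaking into code that runs after the decorated function exits.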


Replay Mode

In replay mode, the same patches run in reverse — instead of capturing I/O, the interceptor serves recorded responses back from the event pool.

When you replay, the interceptor guarantees:

  • time.time() returns the same epoch as the original run (frozen from agent_start.timestamp_epoch)
  • random.random(), uuid.uuid4() produce the same sequence (seed restored from agent_start.payload.seed)
  • Every HTTP call to the same URL returns the exact same response body from the recorded trace
  • Every raw socket connection to the same address returns the same bytes

This means the agent makes the same decisions, in the same order, with the same data — on any machine, without network access.
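
The guarantees above can be modeled in a few lines. The sketch below is a toy replay sandbox, assuming the seed and frozen time come from the agent_start event and that HTTP responses are looked up by URL. `ReplaySandbox` and its methods are hypothetical names, not the AgentTrace interceptor, and the sample events are made up.

```python
import random

# Hypothetical recorded events; field names follow the event shape in this document.
recorded_events = [
    {"seq": 0, "type": "agent_start", "timestamp_epoch": 1710411668.0,
     "payload": {"seed": 1234}},
    {"seq": 1, "type": "network_call", "timestamp_epoch": 1710411669.3,
     "payload": {"method": "POST",
                 "url": "https://api.payments.example.com/refunds",
                 "status": 200,
                 "response": '{"success":true,"txn_id":"TXN-441"}'}},
]


class ReplaySandbox:
    """Serve time, randomness, and HTTP responses from recorded events."""

    def __init__(self, events):
        start = next(e for e in events if e["type"] == "agent_start")
        self._frozen_time = start["timestamp_epoch"]
        self._rng = random.Random(start["payload"]["seed"])
        self._responses = {
            e["payload"]["url"]: e["payload"]
            for e in events if e["type"] == "network_call"
        }

    def time(self):
        return self._frozen_time  # same epoch on every call

    def random(self):
        return self._rng.random()  # sequence fixed by the recorded seed

    def http(self, url):
        return self._responses[url]  # KeyError if the URL was never recorded


first = ReplaySandbox(recorded_events)
second = ReplaySandbox(recorded_events)
draws_first = [first.random() for _ in range(3)]
draws_second = [second.random() for _ in range(3)]
refund = first.http("https://api.payments.example.com/refunds")
```

Two sandboxes built from the same events produce identical random draws and identical responses, which is the property that makes a replay reproducible on any machine.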

Enabling replay mode

import agenttrace

# Load events from a previously recorded trace
events = [...]  # fetch from your dashboard or API

agenttrace.init(
    api_key="at_live_xxx",
    mode="replay",
    replay_events=events
)

@agenttrace.run("refund-processor")
def process_refund(order_id: str, amount: float):
    # This runs in the sandbox — all I/O served from `events`
    ...

process_refund("ORD-9021", 340.00)

Or from a file:

AGENTTRACE_MODE=replay \
AGENTTRACE_REPLAY_EVENTS_FILE=./trace-8821.json \
python agent.py

Governance Levels

AgentTrace enforces a governance level that controls what happens when the agent tries to do something the sandbox hasn't recorded.

| Level | Behavior on unexpected I/O |
|---|---|
| "relaxed" (default) | Prints a warning, allows the call through |
| "governance" | Raises DeterminismLeakError or ReplayMismatchError, halts execution |

Use "governance" in CI pipelines where you need hard guarantees. Use "relaxed" when iterating locally.

agenttrace.init(
    api_key="at_live_xxx",
    mode="replay",
    replay_events=events,
    governance_level="governance"   # strict mode
)

Environment variable:

AGENTTRACE_GOVERNANCE_LEVEL=governance python agent.py

What governance catches

  • Network calls to URLs not present in the recorded trace (ReplayMismatchError)
  • DNS lookups (DeterminismLeakError: DNS leak)
  • os.urandom() calls (DeterminismLeakError: Raw entropy leak)
  • subprocess.Popen() calls
  • open() and os.listdir() filesystem access
  • threading.Thread().start() (prints a warning in both levels)
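
The difference between the two levels can be sketched as a single lookup function. This is an illustration only: the `ReplayMismatchError` class below is a local stand-in defined for the example rather than the SDK's own exception, and `resolve` is a hypothetical helper.

```python
import warnings


class ReplayMismatchError(Exception):
    """Local stand-in for the error named in this section."""


def resolve(url, recorded, governance_level="relaxed"):
    """Return the recorded response for `url`, or handle the miss per level."""
    if url in recorded:
        return recorded[url]
    message = f"unrecorded URL during replay: {url}"
    if governance_level == "governance":
        raise ReplayMismatchError(message)  # strict: halt execution
    warnings.warn(message)  # relaxed: warn only
    return None  # a real interceptor would let the live call through here


recorded = {"https://api.payments.example.com/refunds": {"status": 200}}

hit = resolve("https://api.payments.example.com/refunds", recorded)

try:
    resolve("https://api.example.com/unknown", recorded, "governance")
    strict_outcome = "allowed"
except ReplayMismatchError:
    strict_outcome = "blocked"
```

In a CI job, the strict branch turns an unrecorded call into a failed build instead of a silent network request.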

Causal Fingerprint

After a replay completes, AgentTrace computes a causal fingerprint: a SHA-256 hash of all events, computed after stripping the non-deterministic fields (timestamp, timestamp_epoch, env, argv).

[AgentTrace] Replay consumed 14 events from trace. Fingerprint: a3f92c1d8e74b50f

If the fingerprint matches the original trace's fingerprint, the replay is causally identical — the agent made the same decisions in the same order. A mismatch means something changed (a prompt, a dependency, a code path).

This fingerprint is how AgentTrace detects regressions in CI without running in production.
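
A fingerprint of this kind could be computed roughly as follows. This is a sketch under assumptions: the canonical form (sorted-key compact JSON) and the recursive stripping of the listed fields are guesses for illustration, so the digest will not match values AgentTrace prints. `causal_fingerprint` and `strip_fields` are hypothetical names.

```python
import hashlib
import json

# Fields this document lists as non-deterministic.
NON_DETERMINISTIC_FIELDS = {"timestamp", "timestamp_epoch", "env", "argv"}


def strip_fields(value):
    """Recursively drop non-deterministic keys from dicts and lists."""
    if isinstance(value, dict):
        return {k: strip_fields(v) for k, v in value.items()
                if k not in NON_DETERMINISTIC_FIELDS}
    if isinstance(value, list):
        return [strip_fields(v) for v in value]
    return value


def causal_fingerprint(events):
    """SHA-256 over a canonical JSON encoding of the stripped events."""
    canonical = json.dumps(strip_fields(events), sort_keys=True,
                           separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = [{"seq": 0, "type": "network_call", "timestamp_epoch": 1710411669.3,
             "step": "issue-refund", "payload": {"status": 200}}]
replayed = [{"seq": 0, "type": "network_call", "timestamp_epoch": 1999999999.0,
             "step": "issue-refund", "payload": {"status": 200}}]
changed = [{"seq": 0, "type": "network_call", "timestamp_epoch": 1710411669.3,
            "step": "issue-refund", "payload": {"status": 429}}]

same = causal_fingerprint(original) == causal_fingerprint(replayed)
differs = causal_fingerprint(original) != causal_fingerprint(changed)
```

Sorting keys before hashing keeps the digest independent of dictionary ordering, so only a change in event content alters the fingerprint.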


Branching

A Branch is a fork of a trace at a specific step, with optional payload overrides. Branches let you answer questions like:

  • "What would have happened if the LLM had returned a different tool call at step 5?"
  • "What if the balance check returned $0 instead of $1240?"
  • "Does my retry logic work correctly when the API returns a 429 at step 3?"

Branches are created via the API or the Dashboard's Multiverse View. See Branching for the full guide.
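
Conceptually, a branch can be pictured as a copy of the trace up to the fork point with one payload changed. The sketch below is only a mental model: it forks on an event's `seq` value and drops everything after it, both of which are assumptions, and `fork_trace` is a hypothetical helper rather than the branching API.

```python
import copy


def fork_trace(events, at_seq, payload_overrides):
    """Copy events up to `at_seq` and override the fork event's payload."""
    ordered = sorted(events, key=lambda e: e["seq"])
    prefix = [copy.deepcopy(e) for e in ordered if e["seq"] <= at_seq]
    if not prefix or prefix[-1]["seq"] != at_seq:
        raise ValueError(f"no event with seq={at_seq}")
    prefix[-1]["payload"].update(payload_overrides)
    return prefix


# Hypothetical trace: what if the API had returned a 429 at seq 1?
trace = [
    {"seq": 0, "type": "agent_start", "step": "root", "payload": {}},
    {"seq": 1, "type": "network_call", "step": "issue-refund",
     "payload": {"status": 200}},
    {"seq": 2, "type": "agent_complete", "step": "root", "payload": {}},
]

branch = fork_trace(trace, at_seq=1, payload_overrides={"status": 429})
```

The deep copy keeps the original trace untouched, so several branches can be forked from the same recording.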


Thread Safety

AgentTrace stores all internal state in Python's contextvars.ContextVar. This means:

  • Multiple async tasks running concurrently inside a single @agenttrace.run are fully isolated
  • Each asyncio.Task gets its own trace context
  • Events are written with a threading.Lock to prevent concurrent append races

In replay mode, multi-threading is detected and a warning is printed, because un-intercepted threads can make network calls that bypass the sandbox. This applies at both governance levels: threading.Thread().start() prints a warning rather than raising an error.
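
The isolation that contextvars provides can be demonstrated with the standard library alone. The sketch below does not use AgentTrace: `current_step`, `record`, and `worker` are hypothetical names showing how each asyncio task sees its own value while a lock guards the shared event list.

```python
import asyncio
import contextvars
import threading

current_step = contextvars.ContextVar("current_step", default="root")
events = []
events_lock = threading.Lock()


def record(event_type):
    """Append an event under a lock so concurrent writers cannot interleave."""
    with events_lock:
        events.append({"seq": len(events), "type": event_type,
                       "step": current_step.get()})


async def worker(name):
    current_step.set(name)  # visible only in this task's copy of the context
    await asyncio.sleep(0)  # yield so the tasks interleave
    record("tool_call")
    return current_step.get()


async def main():
    # gather wraps each coroutine in a Task, and each Task copies the context.
    results = await asyncio.gather(worker("task-a"), worker("task-b"))
    return results, current_step.get()


results, outer_step = asyncio.run(main())
```

Each task reads back its own label even though both ran interleaved, and the value set inside the tasks never leaks into the caller's context.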