How to Build a Production-Ready ReAct Agent in TypeScript: From Loop Architecture to Drift Correction
The ReAct pattern — Reasoning + Acting in an interleaved loop — is the control flow hiding inside every serious agent framework shipping today. Claude's Agent SDK uses it. LangChain's AgentExecutor uses it. AutoGPT uses it. Yet most tutorials stop at "LLM calls tools in a loop," which is roughly as useful as describing a database as "disk stores bytes."
This tutorial builds a production-grade ReAct agent from scratch in TypeScript. Not a toy. An agent with structured reasoning traces, typed tool execution, observation parsing, graceful failure recovery, and the monitoring hooks you need to stop it from silently drifting in production. By the end, you will have a working agent and — more importantly — an understanding of why each architectural decision matters when agents run unsupervised.
Why ReAct, and Why It Matters Now
Before ReAct, LLM-based systems fell into two camps that each failed in predictable ways:
Chain-of-Thought (CoT) reasoning asked models to think step-by-step. Excellent for math and logic, but fundamentally closed-world — the model cannot check a database, call an API, or verify its own assumptions. It hallucinates facts with high confidence.
Action-only agents gave models tools and let them fire. Without explicit reasoning, the model makes poor decisions about which tool to use and when. It thrashes — calling wrong tools, retrying pointlessly, losing track of goals.
ReAct interleaves both modes: Think → Act → Observe → Think → Act → Observe until the task is complete. Reasoning guides tool selection; observations ground reasoning in reality. Each mode compensates for the other's weakness.
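Stripped to its essentials, the loop is a dozen lines. Here is a toy sketch with stubbed `decide` and `act` functions (both hypothetical; the production version comes in Step 3):

```typescript
// Toy control flow only: `decide` stands in for the LLM, `act` for a tool.
type Decision = { thought: string; action?: string; finalAnswer?: string };

function runToyLoop(
  decide: (observations: string[]) => Decision, // REASON
  act: (action: string) => string,              // ACT
  maxIterations = 5
): string | null {
  const observations: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const d = decide(observations);
    if (d.finalAnswer) return d.finalAnswer;        // exit: answer found
    if (d.action) observations.push(act(d.action)); // OBSERVE
  }
  return null; // exit: max iterations
}
```

Everything that follows is this skeleton plus the machinery that production demands: typed tools, parse tolerance, and guardrails.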
The 2026 market validates this pattern's importance. According to recent job market analysis, multi-agent orchestration is the single highest-demand skill for AI engineering roles paying $180K–$350K. Every orchestration system is, at its core, coordinating multiple ReAct loops. If you cannot build one from scratch, you cannot debug one in production — and agents that cannot be debugged in production are agents that get shut down.
The Architecture: What We Are Building
Here is the complete control flow:
┌─────────────────────────────────────────────────────────┐
│ AGENT LOOP │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ REASON │────▶│ ACT │────▶│ OBSERVE │ │
│ │ │ │ │ │ │ │
│ │ "I need │ │ call │ │ parse │ │
│ │ to..." │ │ tool() │ │ result │ │
│ └──────────┘ └──────────┘ └────┬─────┘ │
│ ▲ │ │
│ └──────────────────────────────────┘ │
│ │
│ Exit conditions: answer found | max iterations | │
│ unrecoverable error | timeout │
└─────────────────────────────────────────────────────────┘
Our agent will: accept a natural language task, reason about what tools to call, execute those tools with typed inputs, parse observations, and loop until it has an answer or hits a guardrail. Alongside the core loop, we will wire in structured logging, drift detection, and human escalation hooks.
Step 1: Define the Type System
Type safety is not optional for production agents. When an LLM produces malformed tool calls at 2 AM, you need your runtime to catch it before downstream systems consume garbage.
// types.ts
interface Tool {
name: string;
description: string;
parameters: Record<string, ParameterDef>;
execute: (params: Record<string, unknown>) => Promise<ToolResult>;
}
interface ParameterDef {
type: "string" | "number" | "boolean" | "object";
description: string;
required: boolean;
}
interface ToolResult {
success: boolean;
data: unknown;
error?: string;
durationMs: number;
}
interface ReasoningStep {
thought: string;
action?: { tool: string; params: Record<string, unknown> };
observation?: ToolResult;
timestamp: number;
}
interface AgentTrace {
taskId: string;
task: string;
steps: ReasoningStep[];
finalAnswer: string | null;
exitReason: "answer" | "max_iterations" | "error" | "timeout" | "escalation";
totalDurationMs: number;
}
Key design decision: AgentTrace captures the complete reasoning history. This is not optional logging — it is your audit trail, your debugging surface, and your drift detection input. In compliance-sensitive domains like healthcare (where HIPAA violations average $1.5M per incident), this trace is the artifact that proves your agent behaved correctly.
Step 2: Build the Tool Registry
Tools are the agent's hands. A well-designed registry validates inputs before execution and captures timing data automatically.
// tool-registry.ts
class ToolRegistry {
private tools: Map<string, Tool> = new Map();
register(tool: Tool): void {
if (this.tools.has(tool.name)) {
throw new Error(`Duplicate tool registration: ${tool.name}`);
}
this.tools.set(tool.name, tool);
}
async execute(
toolName: string,
params: Record<string, unknown>
): Promise<ToolResult> {
const tool = this.tools.get(toolName);
if (!tool) {
return {
success: false,
data: null,
error: `Unknown tool: ${toolName}. Available: ${[...this.tools.keys()].join(", ")}`,
durationMs: 0,
};
}
// Validate required parameters
for (const [key, def] of Object.entries(tool.parameters)) {
if (def.required && !(key in params)) {
return {
success: false,
data: null,
error: `Missing required parameter: ${key}`,
durationMs: 0,
};
}
}
const start = performance.now();
try {
const result = await tool.execute(params);
return { ...result, durationMs: performance.now() - start };
} catch (err) {
return {
success: false,
data: null,
error: err instanceof Error ? err.message : String(err),
durationMs: performance.now() - start,
};
}
}
describeTools(): string {
return [...this.tools.values()]
.map(
(t) =>
`- ${t.name}: ${t.description}\n Parameters: ${JSON.stringify(t.parameters)}`
)
.join("\n");
}
}
Practical takeaway: The describeTools() method generates the tool description block that gets injected into the LLM's system prompt. This is the contract between your agent and its reasoning engine. If this description is vague, the agent will pick wrong tools. Be precise.
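As a hypothetical before/after, compare a vague description with a precise one. The vague version forces the model to guess scope, input style, and output shape; the precise version removes the guesswork that leads to wrong tool picks. The lint helper is an assumed extra, not part of the registry above:

```typescript
// Hypothetical contrast between a vague and a precise tool description.
const vagueDescription = "Searches stuff";

const preciseDescription =
  "Full-text search over internal engineering docs (not the public site). " +
  "Accepts specific keywords, not full questions. Returns up to 10 matches " +
  "as {title, url, snippet}, ordered by relevance.";

// A cheap lint you can run at registration time: flag descriptions too
// short to plausibly state scope, inputs, and output shape.
function isDescriptionSuspect(description: string, minLength = 40): boolean {
  return description.trim().length < minLength;
}
```

Running such a check inside `register()` catches vague tools at startup rather than at 2 AM.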
Step 3: Implement the ReAct Loop
This is the core. Every architectural decision in the previous steps exists to serve this loop.
// react-agent.ts
import Anthropic from "@anthropic-ai/sdk";
interface AgentConfig {
maxIterations: number;
timeoutMs: number;
model: string;
temperature: number;
onStep?: (step: ReasoningStep) => void; // monitoring hook
onEscalation?: (trace: AgentTrace) => void; // human-in-the-loop
}
const DEFAULT_CONFIG: AgentConfig = {
maxIterations: 15,
timeoutMs: 120_000,
model: "claude-sonnet-4-20250514",
temperature: 0,
};
class ReActAgent {
private client: Anthropic;
private registry: ToolRegistry;
private config: AgentConfig;
constructor(
client: Anthropic,
registry: ToolRegistry,
config: Partial<AgentConfig> = {}
) {
this.client = client;
this.registry = registry;
this.config = { ...DEFAULT_CONFIG, ...config };
}
  async run(task: string): Promise<AgentTrace> {
    const trace: AgentTrace = {
      taskId: crypto.randomUUID(),
      task,
      steps: [],
      finalAnswer: null,
      exitReason: "max_iterations",
      totalDurationMs: 0,
    };
    const startTime = performance.now();
    // The task is the first user turn. Keeping it in `messages` matters:
    // the Messages API requires the conversation to open with a user message,
    // and later iterations need the original task in context.
    const messages: Anthropic.MessageParam[] = [
      { role: "user", content: task },
    ];
    const systemPrompt = this.buildSystemPrompt(task);
    for (let i = 0; i < this.config.maxIterations; i++) {
      // Check timeout
      if (performance.now() - startTime > this.config.timeoutMs) {
        trace.exitReason = "timeout";
        break;
      }
      // REASON: Ask the model what to do next
      const response = await this.client.messages.create({
        model: this.config.model,
        max_tokens: 2048,
        temperature: this.config.temperature,
        system: systemPrompt,
        messages,
      });
      const text = response.content
        .filter((b): b is Anthropic.TextBlock => b.type === "text")
        .map((b) => b.text)
        .join("");
      const parsed = this.parseResponse(text);
      const step: ReasoningStep = {
        thought: parsed.thought,
        timestamp: Date.now(),
      };
      // Check if the agent has a final answer
      if (parsed.finalAnswer) {
        trace.finalAnswer = parsed.finalAnswer;
        trace.exitReason = "answer";
        trace.steps.push(step);
        this.config.onStep?.(step);
        break;
      }
      if (parsed.action) {
        // ACT: Execute the tool
        step.action = parsed.action;
        const result = await this.registry.execute(
          parsed.action.tool,
          parsed.action.params
        );
        step.observation = result;
        // Append to the conversation for the next iteration
        messages.push(
          { role: "assistant", content: text },
          {
            role: "user",
            content: `Observation: ${JSON.stringify(
              result.success ? result.data : result.error
            )}`,
          }
        );
      } else {
        // Parse failure: echo the malformed output back with a corrective
        // nudge. Without this, the next call would see an identical context
        // and produce the identical malformed response.
        messages.push(
          { role: "assistant", content: text },
          {
            role: "user",
            content:
              "Your last response did not match the required format. " +
              "Reply with Thought/Action/Action Input, or Thought/Final Answer.",
          }
        );
      }
      trace.steps.push(step);
      this.config.onStep?.(step);
      // Escalation signal: check for repeated failures after pushing the
      // current step, so it is included in the signature window.
      if (this.detectThrashing(trace.steps)) {
        trace.exitReason = "escalation";
        this.config.onEscalation?.(trace);
        break;
      }
    }
    trace.totalDurationMs = performance.now() - startTime;
    return trace;
  }
private buildSystemPrompt(task: string): string {
return `You are a ReAct agent. For each step, output EXACTLY this format:
Thought: <your reasoning about what to do next>
Action: <tool_name>
Action Input: <JSON parameters>
When you have the final answer, output:
Thought: <your reasoning>
Final Answer: <your answer>
Available tools:
${this.registry.describeTools()}
Rules:
- Always reason before acting
- If a tool returns an error, reason about alternatives
- Never repeat the same tool call with the same parameters
- If stuck after 3 attempts, state what you know and give a partial answer`;
}
// ... parseResponse and detectThrashing shown next
}
Critical detail: Temperature is set to 0. For agent loops, you want deterministic tool selection. Creativity belongs in the final answer synthesis, not in deciding which API to call. If you need varied approaches to problem-solving, increase temperature selectively on retry iterations — not globally.
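One way to raise temperature selectively is a small retry schedule. This is a sketch; the step and cap values are assumptions to tune, not recommendations:

```typescript
// Keep temperature at 0 until the agent has failed repeatedly, then raise
// it in small steps to encourage a different approach, capped well below 1.
function temperatureForRetry(
  recentFailures: number,
  base = 0,
  step = 0.2,
  cap = 0.7
): number {
  if (recentFailures < 2) return base; // stay deterministic until failures repeat
  return Math.min(base + (recentFailures - 1) * step, cap);
}
```

Feed it the count of consecutive failed tool calls from the trace and pass the result as `temperature` on the next `messages.create` call.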
Step 4: Parse LLM Output and Detect Thrashing
The parser converts free-form LLM text into structured actions. The thrashing detector prevents your agent from burning tokens in infinite retry loops.
// Inside ReActAgent class
private parseResponse(text: string): {
thought: string;
action?: { tool: string; params: Record<string, unknown> };
finalAnswer?: string;
} {
const thoughtMatch = text.match(/Thought:\s*(.*?)(?=\n(?:Action|Final Answer))/s);
const actionMatch = text.match(/Action:\s*(\S+)/);
const inputMatch = text.match(/Action Input:\s*({.*})/s); // greedy, so nested JSON braces are captured whole
const answerMatch = text.match(/Final Answer:\s*(.*)/s);
const thought = thoughtMatch?.[1]?.trim() ?? text;
if (answerMatch) {
return { thought, finalAnswer: answerMatch[1].trim() };
}
if (actionMatch && inputMatch) {
try {
return {
thought,
action: {
tool: actionMatch[1].trim(),
params: JSON.parse(inputMatch[1]),
},
};
} catch {
return { thought }; // Malformed JSON — let the loop retry
}
}
return { thought };
}
private detectThrashing(steps: ReasoningStep[]): boolean {
if (steps.length < 3) return false;
const recent = steps.slice(-3);
const signatures = recent.map((s) =>
s.action ? `${s.action.tool}:${JSON.stringify(s.action.params)}` : "none"
);
// Same tool call three times in a row = thrashing
return signatures[0] === signatures[1] && signatures[1] === signatures[2];
}
Why thrashing detection matters: Without it, an agent encountering a flaky API will retry identical calls until it exhausts your token budget. At $3–$15 per million tokens for capable models, an unmonitored thrashing agent can burn $50+ in minutes. The detectThrashing method is a simple heuristic — production systems should track more sophisticated patterns like oscillation between two tools or monotonically increasing error rates.
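As one example of a more sophisticated pattern, here is an assumed sketch that flags A-B-A-B oscillation between two tool-call signatures, which the plain repetition check above misses:

```typescript
// Signatures are "tool:params-JSON" strings, built the same way as in
// detectThrashing. Flags the last four steps when they alternate between
// exactly two distinct signatures (A, B, A, B).
function detectOscillation(signatures: string[]): boolean {
  if (signatures.length < 4) return false;
  const [a, b, c, d] = signatures.slice(-4);
  return a === c && b === d && a !== b;
}
```

Run both detectors on the same signature list; either one firing is an escalation signal.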
Step 5: Add Production Monitoring Hooks
Here is where most tutorials end and where production reality begins. An agent without monitoring is a liability. The emerging discipline of Agent Reliability-as-a-Service exists precisely because the dominant failure mode is not agents crashing — it is agents silently drifting from correct behavior.
// monitoring.ts
interface AgentMetrics {
taskId: string;
iterationCount: number;
toolCallCounts: Record<string, number>;
errorRate: number;
avgStepDurationMs: number;
exitReason: string;
driftScore: number;
}
function computeMetrics(trace: AgentTrace): AgentMetrics {
const toolCalls = trace.steps.filter((s) => s.action);
const errors = trace.steps.filter(
(s) => s.observation && !s.observation.success
);
const toolCallCounts: Record<string, number> = {};
for (const step of toolCalls) {
const name = step.action!.tool;
toolCallCounts[name] = (toolCallCounts[name] ?? 0) + 1;
}
const durations = trace.steps
.filter((s) => s.observation)
.map((s) => s.observation!.durationMs);
return {
taskId: trace.taskId,
iterationCount: trace.steps.length,
toolCallCounts,
errorRate: toolCalls.length > 0 ? errors.length / toolCalls.length : 0,
avgStepDurationMs:
durations.length > 0
? durations.reduce((a, b) => a + b, 0) / durations.length
: 0,
exitReason: trace.exitReason,
driftScore: computeDriftScore(trace),
};
}
function computeDriftScore(trace: AgentTrace): number {
// Drift = deviation from expected behavior patterns
// Higher score = more concerning
let score = 0;
// Penalty: high iteration count relative to task complexity
if (trace.steps.length > 10) score += (trace.steps.length - 10) * 0.1;
// Penalty: high error rate
const errorSteps = trace.steps.filter(
(s) => s.observation && !s.observation.success
);
score += (errorSteps.length / Math.max(trace.steps.length, 1)) * 2;
// Penalty: escalation or timeout exits
if (trace.exitReason === "escalation") score += 3;
if (trace.exitReason === "timeout") score += 2;
if (trace.exitReason === "max_iterations") score += 1;
return Math.min(score, 10); // Clamp to the 0-10 range
}
Practical takeaway: The driftScore is a composite heuristic that should trigger alerts. In production, set thresholds: a score above 3 gets logged as a warning; above 5 pages an engineer; above 7 halts the agent and escalates to a human. These thresholds are domain-specific — a compliance agent in healthcare should have tighter thresholds than a content summarization agent.
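Those thresholds can be wired up as a small dispatcher. This is a sketch; the threshold values mirror the examples above and should be tuned per domain:

```typescript
type DriftAction = "ok" | "warn" | "page" | "halt";

// Maps a 0-10 drift score to an operational response. Thresholds are
// the example values from the text, not recommendations for your domain.
function driftAction(
  score: number,
  thresholds = { warn: 3, page: 5, halt: 7 }
): DriftAction {
  if (score > thresholds.halt) return "halt"; // stop the agent, escalate to a human
  if (score > thresholds.page) return "page"; // page an engineer
  if (score > thresholds.warn) return "warn"; // log a warning
  return "ok";
}
```

Call it on every completed trace and route the result to your logging and paging integrations.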
Step 6: Wire It All Together
// main.ts (assumes the modules above export their classes and functions)
import Anthropic from "@anthropic-ai/sdk";
import { ToolRegistry } from "./tool-registry";
import { ReActAgent } from "./react-agent";
import { computeMetrics } from "./monitoring";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const registry = new ToolRegistry();
// Register a sample tool
registry.register({
name: "search_docs",
description: "Search internal documentation by keyword query",
parameters: {
query: { type: "string", description: "Search query", required: true },
},
execute: async (params) => {
// Your actual search implementation here
const results = await searchIndex.query(params.query as string);
return { success: true, data: results, durationMs: 0 }; // durationMs is overwritten by the registry
},
});
const agent = new ReActAgent(client, registry, {
maxIterations: 12,
timeoutMs: 60_000,
onStep: (step) => {
console.log(`[${new Date(step.timestamp).toISOString()}] ${step.thought}`);
if (step.observation && !step.observation.success) {
console.warn(` ⚠ Tool error: ${step.observation.error}`);
}
},
onEscalation: (trace) => {
console.error(`🚨 Agent thrashing detected on task: ${trace.task}`);
// Send to PagerDuty, Slack, etc.
},
});
const trace = await agent.run(
"Find our API rate limiting policy and summarize the limits for the Pro tier"
);
const metrics = computeMetrics(trace);
console.log(`Exit: ${metrics.exitReason}, Drift: ${metrics.driftScore}`);
Common Failure Modes and How to Handle Them
After deploying ReAct agents across production systems, these are the failure patterns that recur:
1. Parse Failures. The LLM deviates from the expected output format. Solution: make your parser tolerant. Fall back to treating the entire response as a "thought" and let the next iteration self-correct. Never crash on malformed output.
2. Tool Hallucination. The model invents tool names that do not exist. The ToolRegistry.execute method handles this by returning an error that lists the available tool names; in practice, the model almost always self-corrects on the next iteration.
3. Context Window Exhaustion. Long-running agents accumulate observations that overflow the context window. Solution: implement a sliding window that summarizes older steps. Keep the last 3 full steps and compress earlier ones to single-line summaries.
4. Drift Without Errors. The agent technically succeeds but its answers degrade over time as upstream data sources or APIs change. This is where continuous monitoring earns its keep — track answer quality metrics, not just error rates.
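The sliding window from point 3 can be sketched like this. The shapes are assumed for a self-contained example, and a production version would summarize older steps with the model rather than truncate them:

```typescript
interface HistoryStep {
  thought: string;
  observationText?: string; // serialized observation, if any
}

// Keep the last `keep` steps verbatim; collapse older ones to a single
// truncated line so context grows slowly on long-running tasks.
function compressHistory(
  steps: HistoryStep[],
  keep = 3,
  maxLineLength = 80
): HistoryStep[] {
  const cutoff = Math.max(steps.length - keep, 0);
  return steps.map((step, i) => {
    if (i >= cutoff) return step; // recent steps stay intact
    const line = `${step.thought} -> ${step.observationText ?? "no observation"}`;
    return {
      thought:
        line.length > maxLineLength
          ? line.slice(0, maxLineLength - 1) + "…"
          : line,
    };
  });
}
```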
Scaling to Multi-Agent Orchestration
A single ReAct agent handles a single task. Real systems require fleets of specialized agents coordinated by an orchestrator. The architecture extends naturally:
Orchestrator (ReAct Agent)
├── Research Agent (ReAct + search tools)
├── Analysis Agent (ReAct + computation tools)
└── Writing Agent (ReAct + document tools)
Each sub-agent runs its own ReAct loop and returns a structured AgentTrace to the orchestrator. The orchestrator's "tools" are the sub-agents themselves. This is the pattern behind granular labor decomposition — every complex task decomposes into micro-tasks handled by specialized agents, and the orchestrator's job is routing, not reasoning about domain specifics.
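Wrapping a sub-agent so the orchestrator can call it like any other tool can look like this. The local interfaces mirror the Step 1 types so the sketch stands alone, and the durationMs field is left for the registry's timing wrapper to fill in:

```typescript
// Minimal local shapes mirroring the Step 1 types (for self-containment).
interface ToolResult {
  success: boolean;
  data: unknown;
  error?: string;
  durationMs: number;
}
interface SubAgent {
  run(task: string): Promise<{ finalAnswer: string | null; exitReason: string }>;
}

// The orchestrator's "tool" delegates the task to a sub-agent's own
// ReAct loop and translates its trace into an ordinary ToolResult.
function agentAsTool(name: string, description: string, agent: SubAgent) {
  return {
    name,
    description,
    parameters: {
      task: { type: "string" as const, description: "Task for the sub-agent", required: true },
    },
    execute: async (params: Record<string, unknown>): Promise<ToolResult> => {
      const trace = await agent.run(String(params.task));
      const ok = trace.exitReason === "answer";
      return {
        success: ok,
        data: trace.finalAnswer,
        error: ok ? undefined : `Sub-agent exited: ${trace.exitReason}`,
        durationMs: 0, // overwritten by the registry's timing wrapper
      };
    },
  };
}
```

Registering the result with the orchestrator's ToolRegistry makes sub-agent failures look like ordinary tool errors, which the orchestrator's own loop already knows how to reason about.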
The monitoring layer compounds in value at this scale. A single agent's drift score is informative; aggregate drift scores across a fleet reveal systemic issues — a degraded API affecting multiple agents, a prompt regression after a model update, or a data source going stale.
Conclusion
The ReAct pattern is deceptively simple: think, act, observe, repeat. The production reality is that each component — typed tool execution, structured reasoning traces, parse tolerance, thrashing detection, drift monitoring, and escalation hooks — exists because agents fail in specific, predictable ways that this architecture systematically addresses.
The code in this tutorial gives you a working foundation. The real work begins when you deploy it: tuning thresholds, building domain-specific tools, implementing context window management, and connecting monitoring to alerting systems. The agents that survive in production are not the cleverest ones — they are the ones with the best observability and the fastest human escalation paths.
Every line of monitoring code you write today is cheaper than the silent failure you will debug at 3 AM without it.
Need help building AI agent systems or designing multi-agent architectures? Ledd Consulting specializes in autonomous workflow design and agent orchestration for enterprise teams.