ADR: Why Our Cloud Agent Runs Behind a Privilege Boundary Instead of on the Public Web Server
When you run 25+ services on a single VPS — including an AI agent that can read your CRM, trigger outbound emails, and execute code — the question of where that agent lives relative to your public website isn't academic. It's the single most consequential infrastructure decision you'll make.
At Ledd Consulting, we confronted this decision in early 2026 when we needed to expose our Cloud Agent to external requests (inbound MCP sessions, scheduled triggers, authenticated API calls) while keeping it firmly separated from our public marketing site. Here's what we evaluated, what we chose, and why.
Context — What Decision Needed to Be Made and Why
Our Cloud Agent is a privileged runtime. It has access to internal services, can invoke tools across our platform, and operates with credentials that would be catastrophic if leaked. It listens on localhost:3120 and handles everything from authenticated MCP sessions to scheduled automation triggers.
Meanwhile, our public website (leddconsulting.com) serves static pages, a contact form, a free prompt-audit tool, and several product landing pages. The contact form service on localhost:8080 handles Stripe webhooks, audit requests, and scoring endpoints.
Both runtimes live on the same VPS. Both need to be reachable from the internet. The question: how do we expose the agent without merging it into the web-facing attack surface?
This isn't a theoretical concern. We'd already seen what happens when privilege boundaries blur — a single misconfigured auth header on one service cascaded into 205 consecutive failures across dependent services. We needed an architecture that made accidental privilege escalation structurally impossible, not just policy-forbidden.
Options Considered
Option 1: Merge Into the Website Process
The simplest approach — add agent endpoints as routes in the existing web server.
Pros:
- Zero additional infrastructure
- Single process to monitor, single port to manage
- Shared middleware for logging, auth, rate limiting
Cons:
- A vulnerability in the public contact form would share process memory with the privileged agent
- Every CVE in a website dependency becomes an agent-runtime CVE
- Restart cycles for website deploys would take down the agent mid-operation
- No way to apply different auth policies per route without brittle middleware chains
We rejected this immediately. When your agent can execute tools that modify production state, sharing a process with a public form handler is not a trade-off — it's a liability.
Option 2: Same Server, Path-Based Routing
Keep separate processes but route everything through a single domain: leddconsulting.com/api/* for the website, leddconsulting.com/agent/* for the Cloud Agent.
# Path-based routing — what we considered and rejected
server {
    server_name leddconsulting.com;

    location /api/ {
        proxy_pass http://127.0.0.1:8080;
    }

    location /agent/ {
        proxy_pass http://127.0.0.1:3120;
    }

    location / {
        root /var/www/leddconsulting;
    }
}
Pros:
- Single TLS certificate, single domain
- Familiar pattern, easy to reason about
- Process isolation maintained
Cons:
- Any nginx misconfiguration that falls through to the wrong location block bridges the boundary
- CORS policies, CSP headers, and cookie scopes are shared at the domain level
- Credential leakage via browser-stored cookies — a session token for the public site could be sent to agent endpoints by default
- Monitoring and access logs are interleaved, making incident investigation harder
- Rate limiting the public site and the agent requires per-location configuration that's easy to get wrong
This was tempting because it's simple. But path-based routing makes the privilege boundary a configuration detail rather than an architectural guarantee. One misplaced location / block and the agent is reachable without auth.
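To make that failure mode concrete, here is a hedged sketch — not our config, and the typo is deliberately introduced for illustration — of how nginx prefix matching turns one mistyped location into a bridged boundary:

```nginx
# Illustrative footgun, NOT our configuration. With /agent/ mistyped as
# /agents/, a request to /agent/run no longer matches this block and
# falls through to the catch-all below, bypassing the agent's auth checks.
location /agents/ {               # typo: was /agent/
    proxy_pass http://127.0.0.1:3120;
}

location / {
    proxy_pass http://127.0.0.1:8080;   # /agent/run now lands here
}
```

Nothing in nginx warns you about this; the boundary holds only as long as every prefix is spelled correctly.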
Option 3: Separate Subdomain + Cloudflare Tunnel (Chosen)
Expose the agent on a dedicated subdomain (agent.leddconsulting.com), routed through a Cloudflare Tunnel to the VPS, with nginx performing the final proxy to localhost:3120.
Pros:
- Domain-level isolation — completely separate TLS contexts, cookie scopes, CORS policies
- Cloudflare provides DDoS protection, bot filtering, and TLS termination before traffic ever hits the VPS
- The tunnel means the agent port is never directly exposed to the internet — no open ports in the firewall
- Auth policies are enforced at both the Cloudflare layer and the application layer
- Separate access logs, separate rate limits, separate monitoring
Cons:
- Additional infrastructure dependency (Cloudflare)
- Slightly more complex setup and DNS management
- Tunnel daemon is another process to keep alive
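For reference, the tunnel side of this topology is a small piece of configuration. Here is a hedged sketch of a cloudflared `config.yml` for the setup described above — the hostnames come from this article, while the tunnel ID, credentials path, and the choice to terminate at the local nginx over HTTPS are illustrative assumptions:

```yaml
# Sketch of /etc/cloudflared/config.yml — tunnel ID and paths are placeholders
tunnel: <TUNNEL_ID>
credentials-file: /etc/cloudflared/<TUNNEL_ID>.json

ingress:
  # Agent subdomain → local nginx, which proxies to 127.0.0.1:3120
  - hostname: agent.leddconsulting.com
    service: https://localhost:443
    originRequest:
      originServerName: agent.leddconsulting.com
  # MCP subdomain → local nginx → 127.0.0.1:8080
  - hostname: mcp.leddconsulting.com
    service: https://localhost:443
    originRequest:
      originServerName: mcp.leddconsulting.com
  # Anything else is refused at the tunnel itself
  - service: http_status:404
```

The final catch-all rule matters: without it, unmatched hostnames would error rather than fail closed.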
Option 4: Separate Server Entirely
Run the agent on a different machine.
Pros:
- Maximum isolation — network-level separation
- Independent scaling, independent failure domains
Cons:
- We run a 25-service platform on a single VPS. Adding a second server doubles our infrastructure cost and ops burden for a problem that doesn't require hardware-level isolation
- Inter-service communication between the agent and the 25 services it orchestrates would need to cross the network, adding latency and complexity
- We'd need to solve service discovery, credential distribution, and state synchronization across machines
This is the right answer at a different scale. For a team our size, it's over-engineering.
Decision Criteria — What Mattered Most and Why
We ranked our criteria in this order:
1. Privilege isolation must be structural, not configurational. A single nginx typo should not be able to bridge the public site and the agent runtime. Domain-level separation gives us this — cookies, CORS, and TLS contexts are isolated by the browser's own security model.
2. No open ports for the agent. The agent should not be directly reachable by IP scanning. Cloudflare Tunnels achieve this — traffic ingresses through Cloudflare's network and exits through the tunnel daemon on our VPS. The agent port is bound to 127.0.0.1 only.
3. Independent lifecycle management. We should be able to restart, update, or take down the public website without affecting the agent, and vice versa. Separate nginx server blocks and separate systemd units give us this.
4. Minimal operational overhead. We're a small team. The solution needs to be simple enough that any engineer can understand the full topology in under a minute.
Our Decision — What We Chose and How We Implemented It
We chose Option 3: subdomain isolation with Cloudflare Tunnel ingress.
The architecture looks like this:
Internet
│
├── leddconsulting.com ──→ Cloudflare CDN ──→ VPS nginx ──→ static files + localhost:8080
│
├── agent.leddconsulting.com ──→ Cloudflare Tunnel ──→ VPS nginx ──→ localhost:3120
│
└── mcp.leddconsulting.com ──→ Cloudflare Tunnel ──→ VPS nginx ──→ localhost:8080
The nginx configuration for the agent subdomain enforces auth at the proxy layer:
server {
    listen 443 ssl;
    server_name agent.leddconsulting.com;

    # Agent endpoints — require API key on every request
    location / {
        # Reject requests without an auth header (missing headers are
        # empty strings in nginx, so this covers both cases)
        if ($http_x_api_key = "") {
            return 401 '{"error": "authentication required"}';
        }

        proxy_pass http://127.0.0.1:3120;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support for MCP sessions
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
The agent service itself validates the key at the application layer too — defense in depth:
// Auth middleware — runs on every inbound request to the agent
function validateAuth(req, res, next) {
  const key = req.headers['x-api-key'] || req.query.key;
  if (!key || key !== process.env.AGENT_AUTH_KEY) {
    return res.status(401).json({ error: 'authentication required' });
  }
  next();
}
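The middleware's behavior is easy to sanity-check without a framework. This is a minimal, self-contained sketch — a header-only variant with Express-style mock objects; the key value and mock names are illustrative, not production code:

```javascript
// Sketch: exercise the auth check with mock req/res objects.
// 'example-key' is an illustrative value; in production the key
// comes only from the agent's systemd environment.
process.env.AGENT_AUTH_KEY = 'example-key';

function validateAuth(req, res, next) {
  const key = req.headers['x-api-key'];
  if (!key || key !== process.env.AGENT_AUTH_KEY) {
    return res.status(401).json({ error: 'authentication required' });
  }
  next();
}

// Tiny Express-style response mock so the sketch runs standalone
function mockRes() {
  const res = { code: null, body: null };
  res.status = (c) => { res.code = c; return res; };
  res.json = (b) => { res.body = b; return res; };
  return res;
}

let handled = false;
const ok = mockRes();
validateAuth({ headers: { 'x-api-key': 'example-key' } }, ok, () => { handled = true; });

const bad = mockRes();
validateAuth({ headers: {} }, bad, () => {});

console.log(handled, bad.code); // → true 401
```

A matching key reaches the handler; a missing or wrong key gets a 401 before any agent code runs.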
For the MCP endpoint, we took the same approach — a dedicated subdomain (mcp.leddconsulting.com) proxying to a separate service on localhost:8080. Unauthenticated browser hits return a public descriptor so crawlers and curious visitors get a useful response rather than an error:
// Public endpoint descriptor for unauthenticated MCP hits
app.get('/', (req, res) => {
  if (!req.headers['x-api-key']) {
    return res.json({
      name: 'Ledd Consulting MCP Server',
      description: 'Authenticated MCP endpoint for tool access',
      auth: 'X-API-Key header required',
      docs: 'https://leddconsulting.com/for-agents'
    });
  }

  // Authenticated requests proceed to the MCP session handler
  handleMCPSession(req, res);
});
We kept a compatibility path on the legacy domain (consulting.example.com/mcp) for existing integrations, but new clients always get the canonical subdomain.
Consequences — What Worked, What We'd Do Differently
What Worked
The boundary held under real pressure. When we had 205 consecutive auth failures cascade through our internal services, the public website was completely unaffected. The failure was contained within the agent's subdomain and its internal service mesh. No public-facing endpoint returned an error during the entire incident.
Monitoring became trivially simple. Separate subdomains mean separate Cloudflare analytics dashboards, separate nginx access logs, separate uptime checks. When our uptime monitor (which covers 26 services) flags the agent endpoint, we know it's the agent — not a website issue that happens to share a domain.
The "no open ports" property proved its value. We've seen port scans hit our VPS regularly. The agent port doesn't appear in any scan results because it's bound to 127.0.0.1. The only way to reach it is through the Cloudflare Tunnel, which means every request has already passed through Cloudflare's bot protection and rate limiting before it touches our nginx.
Credential scoping became natural. The AGENT_AUTH_KEY environment variable is only set in the agent service's systemd unit. The contact form service has its own credentials. There's no shared auth context that could leak across boundaries.
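That scoping is visible directly in the unit files. A hedged sketch — the unit name, paths, and user are illustrative placeholders, not our actual layout:

```ini
# agent.service (sketch) — the auth key exists only in this unit's environment
[Service]
User=agent
ExecStart=/usr/bin/node /opt/agent/server.js
# agent.env contains AGENT_AUTH_KEY=<key>; file is mode 0600, owned by root
EnvironmentFile=/etc/agent/agent.env
```

The contact form's unit points at a different environment file; neither process can read the other's key.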
What We'd Do Differently
We should have set up the tunnel from day one. We initially ran the agent on a direct port with firewall rules, then migrated to the tunnel architecture. The migration was straightforward but required updating every client integration that was hitting the old endpoint. If we'd started with the tunnel, we'd have saved a weekend of coordination.
Systemd timeout configuration needed more thought. We discovered that services behind tunnels have different timeout characteristics than directly-exposed services. Our outreach CRM service, which chains multiple Claude API calls, needed its TimeoutStartSec bumped from 300 to 1200 seconds after systemd started killing it mid-operation during heavy processing runs with 23+ queued contacts. The tunnel adds a layer of indirection that makes timeout math less intuitive.
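The fix itself is a one-line drop-in override. A sketch, using a hypothetical unit name for the CRM service and the 1200-second value from the incident above:

```ini
# /etc/systemd/system/outreach-crm.service.d/override.conf (illustrative name)
[Service]
TimeoutStartSec=1200
```

Applied with `systemctl daemon-reload` followed by a restart of the unit.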
When to Reconsider
This decision should be revisited if any of these conditions become true:
- We move to multiple servers. If we split services across machines, the Cloudflare Tunnel architecture stays the same, but we'd need to evaluate whether the agent should run on a dedicated host with its own tunnel.
- We need sub-millisecond latency on agent calls. The tunnel adds ~5-15ms of latency. For our use case (agent operations that take seconds to minutes), this is negligible. If we needed real-time streaming with strict latency budgets, we'd need to evaluate direct exposure with a WAF.
- Cloudflare changes their tunnel pricing or reliability guarantees. We're currently on a plan that makes tunnels economical. If that changes, we'd evaluate alternatives like WireGuard or Tailscale for the ingress layer.
- The agent needs to serve high-traffic public endpoints. Right now, all agent traffic is authenticated and low-volume. If we added public-facing features (a free tool, a demo endpoint), we might need to split those onto the main domain and keep only privileged operations on the agent subdomain.
Conclusion
The core insight is simple: privilege boundaries should be enforced by architecture, not by configuration. When your AI agent can modify production state, read sensitive data, and trigger external actions, the question isn't whether to isolate it — it's how many layers of isolation you can afford. Subdomain separation with tunnel ingress gave us domain-level cookie isolation, zero open ports, independent lifecycle management, and clean monitoring boundaries — all without adding a second server.
The total implementation cost was an afternoon of nginx configuration and DNS setup. The ongoing cost is one tunnel daemon process. For a system running 25 services, 7 agents, and 60+ scheduled timers on a single VPS, that's the cheapest insurance policy we carry.
Need help building AI agent systems or designing multi-agent architectures? Ledd Consulting specializes in autonomous workflow design and agent orchestration for enterprise teams.