ADR: Choosing IPC Over HTTP for Inter-Service Communication on a Single Host
When you run 25 services on a single VPS, the default instinct is to wire them together with HTTP. Every service gets a port, every call gets serialized to JSON, every request traverses the full TCP stack — and nobody questions it because "that's how microservices work." We questioned it. This is the decision record for why Ledd Consulting chose a localhost IPC pattern with signed envelopes over conventional HTTP, and what that decision cost and saved us over six months of production operation.
Context — What Decision Needed to Be Made and Why
Our platform runs 25 services on a single VPS. These services handle everything from agent orchestration and event routing to notification delivery and trade monitoring. Early in development, every service talked to every other service over HTTP on unique ports — each bound to 127.0.0.1 with its own port number.
The port allocation table grew unwieldy fast. We tracked services across ports 3004 through 3120, each requiring its own health check, its own timeout configuration, and its own retry logic. More critically, every inter-service call paid the full cost of TCP connection setup, HTTP header parsing, and JSON serialization — even though the bytes traveled exactly zero network hops.
The breaking point arrived when our notification router needed to fan out to five downstream services per event. At peak load during a monitoring storm (documented separately), we observed request queuing caused entirely by localhost TCP handshakes stacking up. The p99 latency on internal calls hit 45ms — absurd for bytes that stay on the same kernel.
We needed a communication pattern optimized for the reality of our deployment: everything runs on one machine, every service shares the same filesystem, and network partitions between services are impossible.
Options Considered
Option 1: Keep HTTP on Localhost (Status Quo)
Every service binds a unique port on 127.0.0.1. Callers use http.request() with hardcoded host/port pairs.
Strengths: Universal tooling support. Every monitoring tool, every debug proxy, every developer already knows HTTP. curl works for ad-hoc testing. Load balancers slot in trivially if we ever distribute services across hosts.
Weaknesses: Port management becomes a coordination problem at scale. Each service owns a port, and collisions require manual conflict resolution. TCP overhead is measurable when fan-out multiplies per-call costs. Every internal call carries HTTP headers that serve zero purpose — Content-Length, Connection, Host — just ceremony for a call that stays within the same kernel.
Option 2: Unix Domain Sockets with a Custom Protocol
Replace TCP ports with filesystem-based sockets. Each service creates a .sock file, and callers connect by path instead of port number.
Strengths: Eliminates TCP overhead entirely. The kernel skips the full network stack for UDS connections — routing, packet framing, congestion control all vanish. File permissions provide OS-level access control. Socket paths are self-documenting (/var/run/platform/notification.sock tells you exactly what lives there).
Weaknesses: Tooling support drops sharply. Standard HTTP clients need adapters. Debug tools like curl require --unix-socket flags. If we ever split services across hosts, every UDS call needs a complete transport rewrite. Stale socket files after crashes require cleanup logic.
Option 3: Shared-Memory or Named Pipes
Use OS primitives — shared memory segments, named pipes, or memory-mapped files — for zero-copy communication between services.
Strengths: Absolute minimum latency. Shared memory eliminates serialization entirely for structured data. Named pipes provide simple unidirectional streams.
Weaknesses: Concurrency management becomes our problem — mutexes, semaphores, race conditions. Node.js has limited ergonomic support for POSIX shared memory. Debugging becomes opaque; there is nothing to tcpdump or log at the transport layer. The complexity budget is enormous for marginal latency gains over UDS.
Option 4: Localhost HTTP with Signed Envelopes and SDK Contracts
Keep HTTP as the transport (preserving tooling compatibility) but optimize the pattern: bind exclusively to 127.0.0.1, replace ad-hoc authentication with HMAC-signed envelopes, and centralize all inter-service contracts in a shared SDK.
Strengths: Retains full HTTP tooling ecosystem. curl still works. Monitoring still works. Migration to multi-host requires only a config change (swap 127.0.0.1 for a real hostname). Signed envelopes add replay protection and source verification — genuine security even for localhost traffic. The SDK eliminates per-service boilerplate and enforces consistent contracts.
Weaknesses: Still pays TCP overhead on the loopback interface (though the kernel optimizes this heavily). Still requires port allocation. Adds cryptographic overhead per call (HMAC computation).
Decision Criteria — What Mattered Most and Why
We ranked these criteria by operational impact:
- Debuggability under pressure. When a service chain fails at 2 AM, we need curl and standard HTTP tooling to isolate the fault in minutes. Any pattern that requires custom tooling to debug is a liability.
- Migration path to multi-host. We run on a single VPS today. That constraint could change. The IPC pattern must survive a topology change with configuration updates — code changes are unacceptable.
- Security between services. Even on localhost, we verify message authenticity. A compromised service must be unable to forge messages from another service. This matters because our agents handle financial operations and external API calls with real consequences.
- Developer velocity. New services join the platform regularly. Onboarding a service to the communication layer should take minutes, require minimal boilerplate, and produce consistent behavior.
- Latency at fan-out. A single event can trigger five to eight downstream service calls, so per-call overhead is multiplied by the fan-out factor and the baseline cost of each individual call matters.
Our Decision — What We Chose and How We Implemented It
We chose Option 4: Localhost HTTP with signed envelopes and SDK-based contracts. The implementation has two components: a signing protocol that every service speaks, and a shared SDK that encapsulates the protocol.
The Signed Envelope Protocol
Every inter-service message carries four headers that form a cryptographic envelope:
function signEnvelope(rawBody, source = 'ipc-service') {
  if (!IPC_AUTH_SECRET) {
    throw new Error('MESSAGE_AUTH_SECRET is required');
  }
  const timestamp = String(Date.now());
  const nonce = crypto.randomUUID();
  const signature = crypto
    .createHmac('sha256', IPC_AUTH_SECRET)
    .update(`${timestamp}.${nonce}.${rawBody}`)
    .digest('hex');
  return {
    'X-Agent-Timestamp': timestamp,
    'X-Agent-Nonce': nonce,
    'X-Agent-Signature': signature,
    'X-Agent-Source': source,
  };
}
The receiving service verifies the envelope before processing:
function verifySignedEnvelope(headers, rawBody) {
  if (!IPC_AUTH_SECRET) {
    return { ok: false, status: 503, error: 'MESSAGE_AUTH_SECRET is required' };
  }
  const signature = readHeader(headers, 'x-agent-signature');
  const timestampRaw = readHeader(headers, 'x-agent-timestamp');
  const nonce = readHeader(headers, 'x-agent-nonce');
  if (!signature || !timestampRaw || !nonce) {
    return { ok: false, status: 401, error: 'Missing signed envelope headers' };
  }
  const timestamp = parseTimestamp(timestampRaw);
  if (!Number.isFinite(timestamp)) {
    return { ok: false, status: 401, error: 'Invalid timestamp' };
  }
  const ageMs = Math.abs(Date.now() - timestamp);
  if (ageMs > MESSAGE_AUTH_WINDOW_MS) {
    return { ok: false, status: 401, error: `Stale timestamp (${ageMs}ms)` };
  }
  cleanupNonces();
  const seen = seenNonces.get(nonce);
  if (seen && seen > Date.now()) {
    return { ok: false, status: 409, error: 'Replay detected (nonce reused)' };
  }
  const expected = crypto
    .createHmac('sha256', IPC_AUTH_SECRET)
    .update(`${timestampRaw}.${nonce}.${rawBody}`)
    .digest('hex');
  const provided = Buffer.from(signature, 'hex');
  const expectedBuf = Buffer.from(expected, 'hex');
  // timingSafeEqual throws on length mismatch, so reject malformed signatures first.
  if (provided.length !== expectedBuf.length) {
    return { ok: false, status: 401, error: 'Invalid signature' };
  }
  const valid = crypto.timingSafeEqual(provided, expectedBuf);
  if (!valid) return { ok: false, status: 401, error: 'Invalid signature' };
  // ... record nonce, return success
}
Three properties make this envelope robust in production:
Timestamp windowing. Messages older than 5 minutes are rejected. This prevents captured requests from being replayed hours later, while providing enough slack for clock skew between service restarts.
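The window constant and timestamp parser that verifySignedEnvelope references are small helpers; a sketch of how they could look (the names match the verification code above, but the strict digits-only parsing rule is our assumption about a reasonable implementation):

```javascript
// 5-minute acceptance window — generous enough for clock drift across service restarts.
const MESSAGE_AUTH_WINDOW_MS = 5 * 60 * 1000;

// Accept only plain millisecond-epoch digit strings. Anything else yields NaN,
// which the Number.isFinite check in verifySignedEnvelope rejects with a 401.
function parseTimestamp(raw) {
  if (typeof raw !== 'string') return NaN;
  const trimmed = raw.trim();
  if (!/^\d+$/.test(trimmed)) return NaN;
  return Number(trimmed);
}
```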
Nonce tracking with bounded memory. Every nonce is stored with an expiration timestamp. A background cleanup pass runs every 30 seconds and evicts expired entries. If memory pressure forces it, we cap at 20,000 stored nonces and evict oldest-first:
function cleanupNonces(now = Date.now()) {
  const shouldCleanup = (now - lastNonceCleanupAt > 30_000)
    || seenNonces.size > MAX_NONCES;
  if (!shouldCleanup) return;
  lastNonceCleanupAt = now;
  for (const [nonce, expiresAt] of seenNonces) {
    if (expiresAt <= now) seenNonces.delete(nonce);
  }
  if (seenNonces.size <= MAX_NONCES) return;
  // Map iteration order is insertion order, so deleting from the front evicts oldest-first.
  const overflow = seenNonces.size - MAX_NONCES;
  let removed = 0;
  for (const nonce of seenNonces.keys()) {
    seenNonces.delete(nonce);
    removed++;
    if (removed >= overflow) break;
  }
}
Timing-safe comparison. We use crypto.timingSafeEqual for signature verification — a constant-time comparison that prevents timing side-channel attacks. Even on localhost, defense in depth is the standard.
The SDK Contract Layer
The second component is a shared SDK that every service imports. Instead of each service hand-rolling HTTP calls with bespoke header handling, the SDK encapsulates registration, authentication, and response formatting:
class PlatformSDK {
  constructor(config = {}) {
    this.apiKey = config.apiKey || process.env.PLATFORM_API_KEY;
    this.webhookSecret = config.webhookSecret || process.env.WEBHOOK_SECRET;
    this.apiEndpoint = config.apiEndpoint || 'https://api.example.com';
    this.webhookPort = config.webhookPort || 8080;
    this.queryHandlers = [];
    this.errorHandlers = [];
    this.stats = {
      queries_received: 0,
      queries_processed: 0,
      queries_failed: 0,
      avg_response_time_ms: 0,
    };
  }

  startWebhookServer() {
    const server = http.createServer((req, res) => {
      let body = '';
      req.on('data', chunk => { body += chunk.toString(); });
      req.on('end', async () => {
        let data;
        try {
          data = JSON.parse(body);
        } catch (_) {
          res.writeHead(400, { 'Content-Type': 'application/json' });
          res.end(JSON.stringify({ error: 'Invalid JSON body' }));
          return;
        }
        if (!this.verifyWebhookSignature(data,
          req.headers['x-webhook-signature'] || req.headers['x-signature']
        )) {
          res.writeHead(401, { 'Content-Type': 'application/json' });
          res.end(JSON.stringify({ error: 'Signature verification failed' }));
          return;
        }
        this.stats.queries_received++;
        const startTime = Date.now();
        try {
          // First handler to return a truthy result claims the query.
          for (const handler of this.queryHandlers) {
            const result = await handler(data.query_id, data.query_text, data);
            if (result) {
              await this.submitResponse(data.query_id, result);
              break;
            }
          }
          this.updateStats(Date.now() - startTime, true);
        } catch (err) {
          this.updateStats(Date.now() - startTime, false);
          for (const onError of this.errorHandlers) onError(err, data);
        }
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({ status: 'processing', query_id: data.query_id }));
      });
    });
    server.listen(this.webhookPort, () => {
      console.log(`Webhook server listening on port ${this.webhookPort}`);
    });
    return server;
  }

  onQuery(handler) {
    this.queryHandlers.push(handler);
  }
}
A new service joins the platform by importing the SDK, registering a query handler, and calling startWebhookServer(). The signing protocol, signature verification, stats tracking, and error handling are inherited automatically. Onboarding a service to the IPC layer takes under 10 minutes.
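The dispatch contract is the part worth internalizing: handlers run in registration order, the first truthy result claims the query, and returning null passes it along. A self-contained stand-in for the SDK's registry illustrates the contract (onQuery and dispatch mirror the class above; the handler bodies are illustrative):

```javascript
// Minimal stand-in for the SDK's handler registry, showing the dispatch contract.
const queryHandlers = [];
function onQuery(handler) { queryHandlers.push(handler); }

// Handlers run in registration order; the first truthy result wins,
// and null/undefined passes the query to the next handler.
async function dispatch(queryId, queryText, data) {
  for (const handler of queryHandlers) {
    const result = await handler(queryId, queryText, data);
    if (result) return result;
  }
  return null; // no handler claimed the query
}

// A service registers handlers exactly like this against the real SDK.
onQuery(async (id, text) => (text.startsWith('status:') ? { id, status: 'ok' } : null));
onQuery(async (id) => ({ id, status: 'fallback' }));
```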
Status Proxy Pattern
For services that expose read-only status endpoints, we use a proxy pattern through the IPC server instead of exposing each service's port directly:
function fetchInternalService(endpoint) {
  return new Promise((resolve, reject) => {
    const opts = {
      hostname: '127.0.0.1',
      port: 5000,
      path: endpoint,
      method: 'GET',
      headers: { Authorization: 'Bearer ' + process.env.SERVICE_READ_KEY },
      timeout: 5000,
    };
    const req = http.request(opts, (r) => {
      let body = '';
      r.on('data', c => { body += c; });
      r.on('end', () => {
        try { resolve(JSON.parse(body)); }
        catch (_) { resolve({ raw: body }); }
      });
    });
    req.on('error', (e) => reject(e));
    req.on('timeout', () => { req.destroy(); reject(new Error('timeout')); });
    req.end();
  });
}
This consolidates external-facing access through a single gateway while keeping individual services bound exclusively to localhost.
Consequences — What Worked, What We'd Do Differently
What worked well:
The signed envelope protocol caught a real issue within the first month. During a monitoring storm that generated 1,400 duplicate events in 20 minutes, the nonce deduplication layer rejected replayed messages automatically. The bounded nonce map held steady at ~4,000 entries during peak load — well within the 20,000 cap.
Debuggability remained excellent. When a notification chain failed, we diagnosed the issue with a single curl call including the signed headers. Every HTTP-compatible tool in our arsenal continued working unchanged.
The SDK cut new service integration time from roughly two hours of boilerplate to under ten minutes of handler registration. Across 25 services, that saved weeks of cumulative development time.
What we'd do differently:
Port management remains a manual coordination problem. We maintain a service reference document mapping every port, and collisions still happen occasionally during rapid development. An automatic port registry or a shift to Unix domain sockets for purely internal services would eliminate this class of error.
The HMAC computation adds ~0.3ms per call. Across a fan-out of eight services, that is 2.4ms of pure cryptographic overhead. For our workload, this is negligible. For latency-critical pipelines processing thousands of events per second, this overhead would justify a session-key approach with less frequent full HMAC verification.
When to Reconsider — Conditions That Would Change This Decision
Multi-host deployment. The moment services span more than one machine, the localhost binding assumption breaks. The migration path is straightforward — swap 127.0.0.1 for service hostnames and add TLS — but the cost profile changes dramatically. Network latency dwarfs any IPC optimization, and a service mesh or proper API gateway becomes worth the operational complexity.
Service count above 50. Our port allocation system works at 25 services. At 50+, the coordination overhead becomes a real drag. At that scale, Unix domain sockets with a service discovery mechanism (even a simple JSON registry file on a shared filesystem) would deliver better ergonomics.
Latency requirements below 1ms p99. If we ever run a service chain where sub-millisecond end-to-end latency matters, shared memory or a purpose-built IPC framework like nanomsg becomes the right tool. Our current ~3ms p99 for a signed localhost call is excellent for orchestration workloads — it would be unacceptable for a high-frequency trading pipeline.
Conclusion
The decision to stay on HTTP while adding signed envelopes and SDK contracts gave us the best combination of debuggability, security, and migration flexibility for a 25-service single-host deployment. We kept every tool in the HTTP ecosystem working while adding cryptographic guarantees that caught real production issues. The SDK layer compressed weeks of per-service integration work into a reusable contract that new services adopt in minutes.
The key insight: optimize for your actual deployment topology, and resist the pull toward distributed-system patterns when your services share a kernel. The simplest architecture that handles your real constraints beats the most sophisticated architecture designed for constraints you may face someday.
Need help building AI agent systems or designing multi-agent architectures? Ledd Consulting specializes in autonomous workflow design and agent orchestration for enterprise teams.