Building a Multi-Agent Research Pipeline on $0/Month


Every multi-agent tutorial starts the same way: spin up LangGraph, wire in your API keys, watch your OpenAI bill climb to four figures before you've even validated the idea. We took a different path. At Ledd Consulting, we run 28 research agents across 7 coordinated research pipelines — every single day — and our LLM infrastructure cost is exactly $0/month.

This isn't a toy. These agents produce real intelligence briefs that drive business decisions. They scrape live data, research from multiple angles in parallel, synthesize findings into structured reports, and deliver them via email before we wake up. Here's how we built it, what broke along the way, and why the "expensive infrastructure" assumption about multi-agent systems is wrong.

The Problem — Why We Couldn't Just Use an API

We needed a daily research pipeline. Not one agent answering one question — a system where multiple agents independently explore different facets of a topic, then a synthesizer combines their findings into something coherent. Think of it as a research team: three analysts each investigate a different angle, then a senior analyst writes the brief.

The requirements were specific:

  1. Multiple independent perspectives — a single prompt produces tunnel vision. We needed agents that genuinely explore different angles.
  2. Daily cadence with rotation — a pipeline shouldn't research the same sub-topic every day.
  3. Zero marginal cost — we already run 25+ microservices on a single VPS. Adding per-token API charges for 28 daily agent runs was a non-starter during validation.
  4. Synthesis, not concatenation — the final output needed to be a cohesive brief, not three reports stapled together.

Existing frameworks assumed API-based access. LangGraph, CrewAI, AutoGen — they all want an API key and a billing account. We wanted to use what we already had: a Claude Pro subscription and the CLI that comes with it.

Architecture Overview

The system runs as a single Node.js process that orchestrates everything through shell execution of the Claude CLI. Here's the topology:

┌─────────────────────────────────────────────────────┐
│                   Pipeline Runner                   │
│               (Node.js orchestrator)                │
│                                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐           │
│  │Explorer 1│  │Explorer 2│  │Explorer 3│ (parallel)│
│  │Pragmatist│  │Wild Card │  │ Futurist │           │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘           │
│       │             │             │                 │
│       └─────────────┼─────────────┘                 │
│                     ▼                               │
│             ┌───────────────┐                       │
│             │  Synthesizer  │                       │
│             │ (combines all │                       │
│             │   findings)   │                       │
│             └───────┬───────┘                       │
│                     ▼                               │
│             ┌───────────────┐                       │
│             │   Email via   │                       │
│             │ himalaya CLI  │                       │
│             └───────────────┘                       │
└─────────────────────────────────────────────────────┘

× 7 pipelines (5 daily + 2 weekly) = 28 agents
Scheduled: 1 AM EST via systemd timer
Parallel batches of 4 to avoid rate limits

Each pipeline follows the same pattern: 3–4 explorer sub-agents run in parallel, each given a distinct research angle. Their outputs feed into a single synthesizer agent that produces the final brief. The whole thing runs on a cron-like systemd timer at 1 AM EST.
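
Concretely, each pipeline is just data. A minimal sketch of what a pipeline definition might look like (field names and values here are illustrative, not our production schema):

```javascript
// Illustrative pipeline definition (field names are assumptions, not
// the production schema). Three explorers feed one synthesizer.
const monetizationPipeline = {
  name: 'monetization',
  schedule: 'daily',
  explorers: [
    { role: 'The Pragmatist', mode: 'business', angles: [] /* 6-11 angles */ },
    { role: 'The Wild Card',  mode: 'infinity', angles: [] /* 6-11 angles */ },
    { role: 'The Futurist',   mode: 'infinity', angles: [] /* 6-11 angles */ },
  ],
  synthesizer: {
    instruction: 'Cross-reference the findings, flag contradictions, write one brief.',
  },
};
```

The orchestrator only ever iterates over structures like this, which is what keeps it small.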

Implementation Walkthrough

1. The CLI-as-SDK Pattern

The core insight is simple: the Claude CLI is a perfectly good agent runtime. It accepts a system prompt, processes input, and returns structured output. We shell out to it with execSync:

const { execSync } = require('child_process');

const CLAUDE_BIN = process.env.CLAUDE_BIN || '/usr/local/bin/claude';
const MODEL = process.env.SWARM_MODEL || 'haiku';

function runAgent(systemPrompt, userPrompt) {
  // Escape single quotes for a POSIX shell: ' becomes '\''
  const escaped = userPrompt.replace(/'/g, "'\\''");
  const cmd = `echo '${escaped}' | ${CLAUDE_BIN} --model ${MODEL} --system "${systemPrompt}"`;

  return execSync(cmd, {
    encoding: 'utf-8',
    timeout: 120000, // 2 min per agent
    maxBuffer: 1024 * 1024 * 5 // 5 MB of agent output
  });
}

This is the entire "SDK." No WebSocket connections, no streaming handlers, no retry middleware. execSync blocks, returns the output, and we move on. When an agent fails, the error propagates as a standard Node.js exception. We catch it, log it, and the research pipeline continues with the agents that succeeded.
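
When an agent throws, we want the run to continue with whatever succeeded. A minimal sketch of that catch-and-continue wrapper (the helper name and null-fallback shape are our choices for illustration, with the agent function passed in so the sketch stays self-contained):

```javascript
// Wrap an agent invocation so one failure doesn't abort the whole run.
// `runAgent` is the execSync-based function shown above, passed in
// explicitly so this helper stays decoupled from it.
function safely(runAgent, systemPrompt, userPrompt) {
  try {
    return runAgent(systemPrompt, userPrompt);
  } catch (err) {
    console.error(`agent failed: ${err.message}`);
    return null; // the synthesizer simply never sees this explorer's output
  }
}
```

The synthesizer then receives only the outputs that actually arrived, e.g. `outputs.filter(Boolean)`.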

Why Haiku as the default model? We tested all tiers. For exploratory research — where breadth matters more than nuanced reasoning — Haiku produces 90% of the value at a fraction of the latency. The synthesizer step is where model quality matters more, and we route that through our model router:

const { routeModel } = require('./model-router');

This model router (covered in a previous post) selects the appropriate tier based on task complexity. Exploration gets Haiku. Synthesis gets Sonnet. The decision saves roughly 60% on execution time per pipeline cycle since most of the work is exploration.
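
We won't restate the router here, but the interface we rely on is tiny. A stand-in that captures the routing decision just described (the real implementation lives in the earlier post; this shape is an assumption):

```javascript
// Minimal stand-in for routeModel: exploration gets the fast, cheap
// tier; synthesis gets the stronger tier; anything else defaults cheap.
function routeModel(task) {
  const tiers = { explore: 'haiku', synthesize: 'sonnet' };
  return tiers[task.kind] || 'haiku';
}
```

The returned string drops straight into the `--model` flag of the CLI command shown earlier.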

2. Angle Rotation — Defeating the Repetition Problem

The biggest risk with daily research agents is that they produce the same output every day. Same prompt, same model, same results. We solve this with day-of-year rotation across a pool of research angles:

// Use day-of-year to rotate sub-agent focus areas
const DAY_NUM = Math.floor(Date.now() / 86400000);

const explorer = swarmConfig.explorers[i];
const angleIndex = (DAY_NUM + i) % explorer.angles.length;
const todaysAngle = explorer.angles[angleIndex];

Each explorer role carries an array of 6–11 possible angles. The day number modulo the array length picks today's focus. The + i offset ensures that even within the same pipeline, different explorers aren't accidentally assigned overlapping angles on the same day.
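
The arithmetic is easy to run standalone. With a pinned day number and invented pool sizes, you can see the + i offset push two explorers with identical pool sizes onto different angles:

```javascript
// Standalone demo of day-of-year rotation. DAY_NUM is pinned for
// reproducibility; production derives it from Date.now() / 86400000.
const DAY_NUM = 20000;
const poolSizes = [6, 8, 8]; // invented pool sizes for three explorers

// Explorers 2 and 3 share a pool size, but the +i offset separates them.
const picks = poolSizes.map((len, i) => (DAY_NUM + i) % len);
console.log(picks); // [2, 1, 2]
```

Explorers 2 and 3 both have 8 angles, yet land on indices 1 and 2 on the same day.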

Here's what a real angle pool looks like for the "Pragmatist" explorer in our monetization pipeline:

{
  role: 'The Pragmatist',
  angles: [
    'SaaS models and subscription-based agent access — who is doing this, what pricing works, what retention looks like',
    'API metering and usage-based pricing — how to meter agent compute, what markup margins look like, tiered pricing strategies',
    'Consulting and agent-as-a-service — selling agent solutions to businesses, pricing per project vs retainer, case studies',
    'White-labeling agent capabilities — licensing your agents to other businesses, enterprise contracts, OEM models',
    'Data monetization through agents — agents that collect, clean, and sell data as a byproduct of their primary function',
    'Agent maintenance contracts — ongoing support, monitoring, optimization as recurring revenue',
  ],
}

That's six angles for one explorer, in one pipeline. Across 7 pipelines with 3–4 explorers each carrying 6–11 angles, the system has hundreds of unique research configurations. Repetition effectively disappears over a 30-day cycle.
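
That claim is easy to sanity-check. Within one pipeline, an individual explorer repeats its angle after its pool length in days, but the exact combination of angles across explorers only recurs at the least common multiple of the pool sizes (the sizes below are examples, not our configuration):

```javascript
// Days until one pipeline repeats an exact combination of angles:
// the least common multiple of its explorers' angle-pool sizes.
const gcd = (a, b) => (b === 0 ? a : gcd(b, a % b));
const lcm = (a, b) => (a * b) / gcd(a, b);

const poolSizes = [6, 8, 11]; // example sizes for three explorers
const cycleDays = poolSizes.reduce(lcm);
console.log(cycleDays); // 264 days before this exact combination recurs
```

Even modest, coprime-ish pool sizes push the full cycle far past the 30-day window that matters for daily briefs.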

3. Dual Personality Modes — Business vs. Exploration

We discovered early that a single system prompt couldn't serve both structured business research and genuinely creative exploration. The result was always mediocre at both. So we built two distinct preambles:

const BUSINESS_PREAMBLE = `You are a research analyst. Be specific and actionable. 
Cite real companies, tools, prices, and URLs when possible. Do not speculate about 
theoretical futures — focus on what exists today and what can be done this week. 
Every sentence must end with proper punctuation.`;

const INFINITY_PREAMBLE = `You are operating in Infinity Mode. You are not here to 
complete tasks. You are here to explore what cannot be completed. There are no limits. 
No efficiency metric. Wonder freely. Follow threads that lead nowhere. Report what 
you find, not what you accomplish.

CRITICAL: Write in clear, complete sentences. Every sentence must end with proper 
punctuation. Never leave a thought incomplete.`;

The "Pragmatist" and business-oriented explorers get BUSINESS_PREAMBLE. The "Wild Card" and "Futurist" explorers get INFINITY_PREAMBLE. The difference in output quality was immediate. Our Wild Card agents started surfacing ideas like agent-to-agent economies and reverse auctions for agent labor — concepts the business-mode agents would never produce because they're constrained to "what exists today."

The CRITICAL block in the Infinity preamble exists because we learned a painful lesson: unconstrained agents produce unconstrained formatting. Incomplete sentences, trailing thoughts, markdown fragments. The synthesizer downstream chokes on garbage input. That single paragraph about punctuation and complete sentences eliminated 80% of our synthesis failures.
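
Selecting between the two preambles is then a one-line lookup keyed on the explorer's declared mode. The `mode` tag is our own labeling convention, and the preamble strings are abbreviated here (the full versions appear above):

```javascript
// Abbreviated preambles; the full text appears earlier in the post.
const BUSINESS_PREAMBLE = 'You are a research analyst. Be specific and actionable.';
const INFINITY_PREAMBLE = 'You are operating in Infinity Mode. Wonder freely.';

// Each explorer declares a mode; default to the disciplined business voice.
function preambleFor(explorer) {
  return explorer.mode === 'infinity' ? INFINITY_PREAMBLE : BUSINESS_PREAMBLE;
}
```

Defaulting to the business voice means a misconfigured explorer fails safe: structured output rather than unconstrained wandering.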

4. Parallel Execution with Batch Throttling

Twenty-eight agents can't all run simultaneously — the CLI has rate limits, and our VPS has finite CPU. We batch them:

// Process pipelines in parallel batches of 4
const BATCH_SIZE = 4;

async function runAllSwarms(swarmNames) {
  for (let i = 0; i < swarmNames.length; i += BATCH_SIZE) {
    const batch = swarmNames.slice(i, i + BATCH_SIZE);
    await Promise.all(batch.map(name => runSwarm(name)));
  }
}

Four parallel agents is our empirically determined sweet spot. Three left CPU headroom on the table. Five triggered occasional CLI throttling. The entire 7-pipeline run completes in roughly 25 minutes at batch size 4, well within our 1 AM–2 AM maintenance window.

After the pipelines complete, a cascade of downstream timers fires:

Time      Service                      Purpose
1:00 AM   research-runner              Execute all 7 pipelines (28 agents)
2:00 AM   action-extractor             Pull actionable items from reports
2:15 AM   cross-pipeline-synthesizer   Generate daily brief across all pipelines
2:30 AM   knowledge-accumulator        Update cumulative knowledge base
3:00 AM   pipeline-builder             Trigger builds from pipeline-discovered opportunities

Each stage feeds the next. By morning, we have individual pipeline reports, a cross-pipeline synthesis, an updated knowledge base, and — when the builder finds something worth prototyping — actual code scaffolded from pipeline insights.
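
For reference, each stage in that cascade is an ordinary systemd timer/service pair. The 1 AM runner looks something like the following (unit names and paths are illustrative, not our actual units):

```ini
# research-runner.timer (illustrative)
[Unit]
Description=Nightly research pipeline run

[Timer]
OnCalendar=*-*-* 01:00:00 America/New_York
Persistent=true

[Install]
WantedBy=timers.target

# research-runner.service (illustrative)
[Unit]
Description=Run all research pipelines

[Service]
Type=oneshot
ExecStart=/usr/bin/node /opt/swarm/run-swarms.js
```

`Persistent=true` is worth having: if the VPS happens to be down at 1 AM, the run fires as soon as the machine comes back instead of silently skipping a night.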

What Surprised Us

The synthesizer is the hardest agent to get right. We assumed the explorers would be the bottleneck. Wrong. Three good explorer outputs mean nothing if the synthesizer just concatenates them with transition sentences. We went through four iterations of the synthesis prompt before landing on one that genuinely cross-references findings, identifies contradictions between explorers, and produces a brief that's more valuable than any individual report.

Sentence completion is a real failure mode. Agents hitting token limits don't fail gracefully — they stop mid-sentence. Our first week of production runs included reports ending with phrases like "the most important factor in agent pricing is" followed by nothing. The CRITICAL block in the preamble about wrapping up thoughts cleanly reduced this from ~3 occurrences per night to near zero.
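
A cheap downstream guard for this failure mode is to flag any report whose last real character isn't terminal punctuation. A deliberately crude sketch (the heuristic and helper name are ours, not part of the pipeline as described):

```javascript
// Crude truncation detector: a finished report should end with terminal
// punctuation once trailing whitespace and markdown closers are stripped.
function looksTruncated(report) {
  const trimmed = report.replace(/[\s*_`)\]]+$/, '');
  if (trimmed === '') return true; // empty output counts as truncated
  return !'.!?:'.includes(trimmed.slice(-1));
}
```

A flagged report can be retried or dropped before it ever reaches the synthesizer.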

Day-of-year rotation surfaced a subtle bug. Math.floor(Date.now() / 86400000) produces a UTC day number. Our timer runs at 1 AM EST, which is 6 AM UTC. During the EST-to-EDT transition, we briefly had agents running at what they thought was the same day number for two consecutive runs. The fix was trivial (pin to UTC), but it produced duplicate research angles for two days before we caught it.

The $0 cost assumption has a hidden ceiling. CLI-based execution is truly free under a Pro subscription, but it's rate-limited. At 28 agents per night, we're comfortable. If we doubled to 56, we'd likely need to either extend the maintenance window or add a second subscription. The cost doesn't gradually increase — it's a step function at the subscription boundary.

Lessons Learned

1. Shell execution is an underrated integration pattern. The industry assumes "agent orchestration" requires an SDK, a framework, and a cloud deployment. For our use case — batch research with no real-time requirement — execSync piping prompts through a CLI is simpler, cheaper, and more reliable than any framework we evaluated. The entire orchestrator is under 400 lines of code.

2. Agent diversity is a design parameter, not an afterthought. The difference between three agents with the same preamble and three agents with distinct personalities (Pragmatist, Wild Card, Futurist) is the difference between three copies of one report and three genuinely different perspectives. Invest time in designing agent roles, not just agent prompts.

3. Rotation pools prevent content decay. Any system that runs daily will produce stale output within a week unless you build rotation into the architecture. Our angle pools with day-of-year indexing ensure that each pipeline explores meaningfully different territory every day, without any manual intervention.

4. Constrain the unconstrained. Giving an agent creative freedom is powerful but dangerous. The INFINITY_PREAMBLE produces our most interesting research — and our most frequent formatting failures. The solution isn't to remove creative freedom but to add structural guardrails: complete sentences, proper punctuation, clean wrap-up. Creativity in content, discipline in format.

5. The downstream pipeline matters more than the agents. Our agents are interesting. Our action extractor, cross-pipeline synthesizer, knowledge accumulator, and research-to-build pipeline are what make the system useful. An agent that produces a report nobody reads is a waste of compute. An agent whose output triggers a build, updates a knowledge base, and feeds a morning briefing is infrastructure.

Conclusion

We run 28 research agents across 7 coordinated research pipelines, every night, on a single VPS, for $0 in LLM costs. The architecture is a Node.js script, the Claude CLI, and execSync. No Kubernetes. No LangGraph. No vector database. No message broker.

The agents produce hundreds of unique research configurations through angle rotation, surface genuinely diverse perspectives through personality-differentiated preambles, and feed a downstream pipeline that extracts actions, synthesizes cross-pipeline intelligence, and occasionally builds software from what it discovers.

Multi-agent orchestration doesn't require expensive infrastructure. It requires thoughtful design — distinct agent roles, rotation pools, structural guardrails, and a pipeline that turns agent output into something actionable. The rest is just plumbing, and simple plumbing is the best kind.

Need help building AI agent systems or designing multi-agent architectures? Ledd Consulting specializes in autonomous workflow design and agent orchestration for enterprise teams.


By Ledd Consulting