The Demand-Aware Controller Pattern: Replacing Fixed Timers With State-Reading Triggers
We run 60+ scheduled timers on a single VPS. For months, many of them fired on fixed intervals — every 30 minutes, every hour, every 6 hours — regardless of whether there was actually work to do. The AI lead generator ran every 4 hours even when the pipeline already had 12 outreach-ready leads sitting untouched. The audit scout ran twice daily even when every audit slot was filled.
The waste wasn't just compute. Every unnecessary run burned API tokens, wrote noise into logs, and — worst of all — occasionally created problems by generating duplicate leads or re-scoring already-scored prospects. We'd already written about keeping 60 timers reliable. This post is about making them smart.
The Pattern — Demand-Aware Controller
A demand-aware controller is a thin orchestration layer that reads current state before deciding whether to trigger a downstream service. Instead of timer → run operation, the flow becomes timer → controller reads state → controller decides → conditional trigger → re-read state → write receipt.
The controller itself does no business logic. It's a decision gate — a 100-line script that answers one question: "Does the system actually need more of what this operation produces?"
The Naive Approach (and Why It Fails)
Most teams schedule automated work the same way:
# The fixed-interval approach
[Timer]
OnCalendar=*-*-* 00,04,08,12,16,20:00:00
Persistent=true
This fires every 4 hours, period. The service it triggers runs to completion every time. If the downstream pipeline is full, you've wasted a run. If the pipeline emptied 30 minutes after the last run, you wait 3.5 hours with nothing to process.
We tried three fixes before landing on the right one:
Fix 1: Add a guard clause inside the service itself. We added a "check if work exists" step at the top of the lead generator. Problem: the service still started, loaded its dependencies, authenticated with three APIs, read config — then quit. Startup cost alone was 8-12 seconds and a handful of API calls.
Fix 2: Shorter intervals. We dropped from 4 hours to 1 hour. Now we caught demand faster but ran 4x as many no-op cycles. Log noise quadrupled. Our morning briefing — which synthesizes overnight activity — started reporting "lead generator ran 6 times, produced 0 leads" as a recurring pattern.
Fix 3: Cron with conditional logic. We put bash conditionals in the ExecStart line. This works for trivial checks but falls apart when "should we run?" requires reading JSON state from another service's output directory, parsing timestamps, and applying policy thresholds.
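For the record, a reconstruction of roughly what Fix 3 looked like; the paths, threshold, and service entry point here are illustrative, not our actual unit file:

```shell
# Sketch of the Fix 3 approach: a guard condition jammed into ExecStart.
# Workable for a single-number check; unmaintainable once "should we run?"
# involves multiple state files, timestamps, and policy thresholds.
ExecStart=/bin/sh -c 'count=$(jq "length" /var/lib/controller/state/leads/outreach-ready.json 2>/dev/null || echo 0); [ "$count" -lt 3 ] && exec /usr/bin/node /opt/lead-generator/run.js || exit 0'
```

Every new condition means more quoting gymnastics inside one shell string, with no logging, no receipts, and no way to unit-test the decision.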
The core problem: the thing that knows whether work is needed and the thing that does the work are at different layers. Mixing them creates either bloated services or fragile shell scripts.
Pattern Implementation
Here's the actual controller we deployed. The policy file defines thresholds; the controller reads live state, applies the policy, and conditionally triggers downstream services.
The Policy File
{
"controller": "lead-demand",
"version": 2,
"check_interval_minutes": 60,
"supply_sources": [
{
"name": "outreach-ready",
"state_path": "state/leads/outreach-ready.json",
"minimum_supply": 3,
"freshness_hours": 48
}
],
"triggers": [
{
"condition": "supply_below_minimum",
"service": "ai-lead-generator.service",
"cooldown_minutes": 120,
"max_triggers_per_day": 4
},
{
"condition": "supply_stale",
"service": "audit-lead-scout.service",
"cooldown_minutes": 240,
"max_triggers_per_day": 2
}
],
"receipt_dir": "state/lead-demand-controller"
}
The policy is static JSON — no code, no logic. An operator can change thresholds without touching the controller. The cooldown_minutes prevents rapid-fire triggering if the downstream service fails to produce supply (we learned that one the hard way — more on it below). The max_triggers_per_day is a hard ceiling that exists purely because we don't trust ourselves not to introduce a feedback loop.
The Controller
const fs = require('fs');
const path = require('path');
const { execSync } = require('child_process');
const WORKSPACE = process.env.WORKSPACE_DIR || '/var/lib/controller';
const POLICY_PATH = path.join(WORKSPACE, 'control-plane', 'lead-demand-policy.json');
function loadPolicy() {
return JSON.parse(fs.readFileSync(POLICY_PATH, 'utf8'));
}
function readSupplyState(source) {
const fullPath = path.join(WORKSPACE, source.state_path);
if (!fs.existsSync(fullPath)) return { count: 0, oldest_hours: Infinity };
const data = JSON.parse(fs.readFileSync(fullPath, 'utf8'));
const leads = Array.isArray(data) ? data : (data.leads || []);
if (leads.length === 0) return { count: 0, oldest_hours: Infinity };
const now = Date.now();
const oldest = Math.min(...leads.map(l => new Date(l.scored_at || l.created_at).getTime()));
const oldest_hours = (now - oldest) / (1000 * 60 * 60);
return { count: leads.length, oldest_hours };
}
function getCooldownState(trigger, receiptDir) {
const receiptPath = path.join(receiptDir, `${trigger.service}.json`);
if (!fs.existsSync(receiptPath)) return { can_trigger: true, reason: 'no-prior-receipt' };
const receipt = JSON.parse(fs.readFileSync(receiptPath, 'utf8'));
const elapsed = (Date.now() - new Date(receipt.last_triggered).getTime()) / (1000 * 60);
if (elapsed < trigger.cooldown_minutes) {
return { can_trigger: false, reason: `cooldown: ${Math.round(trigger.cooldown_minutes - elapsed)}m remaining` };
}
const today = new Date().toISOString().slice(0, 10);
const todayCount = (receipt.daily_counts || {})[today] || 0;
if (todayCount >= trigger.max_triggers_per_day) {
return { can_trigger: false, reason: `daily cap: ${todayCount}/${trigger.max_triggers_per_day}` };
}
return { can_trigger: true, reason: 'cooldown-elapsed' };
}
function writeReceipt(trigger, receiptDir, decision) {
const receiptPath = path.join(receiptDir, `${trigger.service}.json`);
const today = new Date().toISOString().slice(0, 10);
let receipt = fs.existsSync(receiptPath)
? JSON.parse(fs.readFileSync(receiptPath, 'utf8'))
: { daily_counts: {} };
receipt.last_triggered = new Date().toISOString();
receipt.last_decision = decision;
receipt.daily_counts = receipt.daily_counts || {}; // older receipts may predate this field
receipt.daily_counts[today] = (receipt.daily_counts[today] || 0) + 1;
// Prune old daily counts
for (const key of Object.keys(receipt.daily_counts)) {
if (key < new Date(Date.now() - 7 * 86400000).toISOString().slice(0, 10)) {
delete receipt.daily_counts[key];
}
}
fs.mkdirSync(path.dirname(receiptPath), { recursive: true });
fs.writeFileSync(receiptPath, JSON.stringify(receipt, null, 2));
}
function triggerService(serviceName) {
try {
execSync(`sudo /usr/bin/systemctl start ${serviceName}`, { timeout: 120000 });
return { success: true };
} catch (err) {
return { success: false, error: err.message };
}
}
async function run() {
const policy = loadPolicy();
const receiptDir = path.join(WORKSPACE, policy.receipt_dir);
const decisions = [];
for (const source of policy.supply_sources) {
const supply = readSupplyState(source);
console.log(`[supply] ${source.name}: count=${supply.count} oldest=${supply.oldest_hours.toFixed(1)}h`);
for (const trigger of policy.triggers) {
const cooldown = getCooldownState(trigger, receiptDir);
let shouldTrigger = false;
if (trigger.condition === 'supply_below_minimum' && supply.count < source.minimum_supply) {
shouldTrigger = true;
}
if (trigger.condition === 'supply_stale' && supply.oldest_hours > source.freshness_hours) {
shouldTrigger = true;
}
if (shouldTrigger && cooldown.can_trigger) {
console.log(`[trigger] ${trigger.service} — condition=${trigger.condition}`);
const result = triggerService(trigger.service);
writeReceipt(trigger, receiptDir, { triggered: true, result });
decisions.push({ service: trigger.service, action: 'triggered', result });
// Re-read supply after trigger so later trigger checks in this cycle use fresh state
const refreshed = readSupplyState(source);
console.log(`[refresh] ${source.name}: count=${refreshed.count} (was ${supply.count})`);
Object.assign(supply, refreshed);
} else if (shouldTrigger && !cooldown.can_trigger) {
console.log(`[skip] ${trigger.service} — ${cooldown.reason}`);
decisions.push({ service: trigger.service, action: 'skipped', reason: cooldown.reason });
} else {
console.log(`[ok] ${trigger.service} — supply sufficient`);
decisions.push({ service: trigger.service, action: 'no-op', supply });
}
}
}
// Write run summary
const summaryPath = path.join(receiptDir, 'latest-run.json');
fs.mkdirSync(path.dirname(summaryPath), { recursive: true });
fs.writeFileSync(summaryPath, JSON.stringify({
ran_at: new Date().toISOString(),
decisions
}, null, 2));
}
run().catch(err => {
console.error(`[fatal] ${err.message}`);
process.exit(1);
});
A few things worth calling out:
The re-read after trigger. After triggerService() fires the downstream service synchronously (via systemctl start, which blocks until the service completes), the controller re-reads the supply state. This is critical. Without it, a controller that triggers the lead generator but doesn't see the new leads will try to trigger the audit scout too — double-triggering on a single cycle.
Receipt pruning. We keep 7 days of daily counts and prune older ones on every write. Without this, the receipt file grows indefinitely. We found this out after 3 months when receipt files for high-frequency controllers hit 400+ lines of daily_counts entries.
The sudo systemctl start call. The controller runs as a non-root service user but needs to start other systemd units. We use a targeted sudoers rule — the controller can only run systemctl start on the specific services listed in its policy. No wildcards, no other verbs.
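A sketch of what that sudoers rule looks like, assuming the two service names from the policy file above; put it in its own drop-in file so it's auditable on its own:

```shell
# Hypothetical /etc/sudoers.d/svc-controller
# Exactly the services in the policy, full paths, no wildcards, only the
# "start" verb. The controller cannot stop, restart, or enable anything.
svc-controller ALL=(root) NOPASSWD: /usr/bin/systemctl start ai-lead-generator.service
svc-controller ALL=(root) NOPASSWD: /usr/bin/systemctl start audit-lead-scout.service
```

Validate with `visudo -cf /etc/sudoers.d/svc-controller` before deploying; a syntax error in a sudoers drop-in can lock out sudo entirely.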
The Systemd Wiring
# lead-demand-controller.timer
[Unit]
Description=Lead demand controller — check supply and trigger if needed
[Timer]
OnCalendar=*-*-* *:00:00
Persistent=true
RandomizedDelaySec=120
[Install]
WantedBy=timers.target
# lead-demand-controller.service
[Unit]
Description=Lead demand controller run
[Service]
Type=oneshot
User=svc-controller
ExecStart=/usr/bin/node /var/lib/controller/lead-demand-controller.js
WorkingDirectory=/var/lib/controller
TimeoutStartSec=300
The timer fires hourly. The controller decides in under 2 seconds whether anything needs to happen. If nothing does, the total cost is a few file reads and JSON parses. If a trigger fires, the downstream service runs inline (systemctl start on a oneshot unit blocks until it completes), then the controller re-reads and continues.
In Production
We rolled this pattern out on April 10, replacing the fixed ai-lead-generator.timer that had been running every 4 hours since January.
Before vs. After
| Metric | Fixed Timer | Demand Controller |
|---|---|---|
| Daily runs of lead generator | 6 | 1.4 avg |
| No-op runs (supply already sufficient) | ~4/day | 0 |
| Time-to-trigger when pipeline empty | 0–4 hours (random) | < 62 minutes |
| API tokens wasted on no-op runs | ~2,400/day | 0 |
| Receipt/audit trail | None | Full JSON receipts |
The reduction in wasted runs was immediate. But the more valuable outcome was the audit trail. Every decision the controller makes — trigger, skip, no-op — gets written to a receipt file with timestamps. When our morning briefing asks "what happened with lead generation overnight?", the answer is a single JSON file, not a journalctl trawl.
Edge Cases We Hit
Feedback loop on empty supply. In the first week, the AI lead generator service had a bug where it would complete successfully but write zero leads to the state file (it was writing to a legacy path). The controller saw empty supply, triggered the generator, re-read supply (still empty), and would have triggered the audit scout too. The cooldown saved us — but we added an explicit check: if a trigger fires and the re-read shows supply didn't change, log a warning and do not trigger the next service. Something upstream is broken.
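That guard is small. A sketch, assuming the supply shape returned by readSupplyState above; the helper name is ours:

```javascript
// Guard against cascading triggers when a downstream run produced nothing.
// If the re-read shows identical supply, something upstream is broken and
// firing the next service would just compound the problem.
function supplyUnchanged(before, after) {
  return after.count === before.count && after.oldest_hours === before.oldest_hours;
}

// Inside the trigger loop, after the re-read:
//   if (supplyUnchanged(supply, refreshed)) {
//     console.warn(`[halt] ${source.name}: no supply change after trigger`);
//     break; // skip the remaining triggers for this source this cycle
//   }
```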
Stale state files. The controller trusts whatever's in the state JSON. If the lead generator crashes mid-write and leaves a corrupt file, the controller either fails to parse (caught, logged, exits non-zero so systemd reports failure) or reads partial data. We added a simple integrity check: if the state file's mtime is older than the last receipt's last_triggered timestamp, the supply data is stale — the downstream service likely failed.
Clock drift in receipt comparison. All our services run on one VPS, so clock drift isn't an issue today. But the cooldown math uses Date.now() vs. stored ISO timestamps. If we ever distribute this, we'll need to switch to monotonic counters or accept that cooldowns might be off by the drift delta.
Variations
We've since applied this pattern to two other controllers beyond lead demand:
App Update Controller. Reads the current deployed version, checks whether a newer build artifact exists, and only triggers the deploy pipeline when there's actually something new. Before this, deploys ran on a 2-hour timer and no-op'd 90% of the time.
Agent OS Controller. Reads health state from all registered agent services, and only triggers the restart/recovery pipeline when an agent is actually degraded. Previously, the health-check timer ran every 15 minutes and the recovery service ran every 30 — even when every agent was healthy.
The pattern generalizes cleanly to any situation where:
- An operation is expensive (API calls, compute, side effects)
- The operation produces supply that accumulates
- You can cheaply read the current supply level
- You have a policy for "enough" vs. "need more"
If any of those four conditions fails to hold, a fixed timer is simpler and probably fine. Don't over-engineer a cron job that sends a Slack reminder.
Adapting the Policy File
The policy JSON is deliberately flat. You could extend it with:
- Priority ordering on triggers, so the controller tries the cheapest supply source first
- Backoff multipliers that increase cooldown after repeated triggers with no supply change (exponential backoff for broken downstreams)
- Time-of-day windows — we haven't needed these yet, but some teams only want lead generation during business hours
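The backoff idea can stay declarative. A sketch, with hypothetical backoff_multiplier and backoff_cap_minutes fields on the trigger object and the arithmetic living in the controller:

```javascript
// Hypothetical policy extension on a trigger entry:
//   "backoff_multiplier": 2, "backoff_cap_minutes": 960
// The effective cooldown doubles for each consecutive trigger that
// produced no supply change, capped so a broken downstream still gets
// retried eventually instead of being locked out forever.
function effectiveCooldown(trigger, consecutiveNoChange) {
  const mult = Math.pow(trigger.backoff_multiplier || 1, consecutiveNoChange);
  const minutes = trigger.cooldown_minutes * mult;
  return Math.min(minutes, trigger.backoff_cap_minutes || minutes);
}
```

The policy file stays flat JSON an operator can edit; only the controller knows what the fields mean.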
The key constraint: keep the policy declarative. The moment you put if/else logic in the JSON (or start embedding JavaScript expressions), you've moved business logic out of version-controlled code and into a config file that no linter will catch.
Conclusion
Fixed-interval timers are the while(true) { sleep(interval) } of operations. They're easy to set up and hard to debug when they cause problems three layers down. The demand-aware controller adds ~100 lines of code and a JSON policy file, but gives you: fewer wasted runs, faster response to actual demand, a complete audit trail, and policy-driven thresholds you can tune without redeploying.
The pattern isn't complicated. Read state, apply policy, trigger conditionally, write a receipt. The hard part is resisting the urge to make the controller itself smart. It's a gate, not a brain.
Need help building AI agent systems or designing multi-agent architectures? Ledd Consulting specializes in autonomous workflow design and agent orchestration for enterprise teams.