ADR: Why We Built a Notification Router Instead of Letting Each Service Send Its Own Alerts
At Ledd Consulting, we run 25 services on a single VPS — scrapers, agent pipelines, CRM integrations, content publishers, billing webhooks, and more. Each one has something to say. A lead comes in. A bid gets approved. A blog post is ready for review. A health check fails. A cost threshold trips.
By February 2026, we had 40+ notification sources, and every single one was sending alerts directly. The result was exactly what you'd expect: a Telegram channel so noisy we muted it, an inbox full of one-line emails, and — critically — a missed revenue alert buried under twelve system health pings.
This is the architectural decision record for how we fixed it.
Context — What Decision Needed to Be Made and Why
Our platform grew organically. Each service was responsible for its own alerting. The scraper sent Telegram messages when it found high-scoring jobs. The CRM sent emails on new leads. The cost tracker sent Telegram alerts on spend thresholds. The content pipeline sent emails when blog drafts were ready.
Three problems compounded:
Duplicate alerts. When a lead arrived, the contact form service sent a notification, the CRM pipeline sent a notification, and the lead-scoring agent sent a notification. Three alerts for one event.
Priority blindness. A critical revenue alert (new client message) looked identical to a low-priority system log (daily scraper summary). Everything arrived with the same urgency, so everything got the same response: ignored until we had time.
Channel sprawl. Some services sent Telegram messages. Others sent emails. A few wrote to log files that required SSH to read. Figuring out "what happened today" meant checking four different surfaces.
We needed a single system that every service could push into, and that would handle when, where, and whether to deliver each alert.
Options Considered
Option 1: Managed Alerting Platform (PagerDuty, Opsgenie)
Pros: Battle-tested escalation policies, on-call rotation, mobile apps, integrations with everything.
Cons: Priced per-user per-month, designed for ops teams of 5+, and fundamentally oriented around infrastructure alerting. Our alerts are business events — "a $2,500 proposal just landed" is a different problem than "CPU at 95%." The mental model mismatch was significant. We'd be paying $30+/month to wedge business intelligence into an incident management tool.
Option 2: Event Bus With Per-Service Subscribers
We already run a lightweight event bus (covered in a previous post). We considered having each delivery channel — Telegram bot, email sender, Slack webhook — subscribe to events and decide independently what to surface.
Pros: Decoupled, each channel owns its own filtering logic, easy to add channels.
Cons: Deduplication becomes every subscriber's problem. Priority logic gets duplicated across channels. Quiet hours require coordination. We'd be distributing complexity instead of centralizing it — trading one problem (noisy services) for another (noisy subscribers).
Option 3: Centralized Notification Router
A single service that accepts notifications from all sources, deduplicates them, assigns priority, respects quiet hours, and dispatches to the right channel at the right time.
Pros: One place to tune signal-to-noise. Deduplication happens once. Priority routing lives in one config file. Every service gets a simple contract: POST your notification, walk away.
Cons: Single point of failure for all alerting. Requires discipline — every service must route through it instead of sending directly.
Decision Criteria — What Mattered Most and Why
We ranked our requirements:
1. Signal-to-noise ratio. The entire point was receiving fewer, better alerts. Any option that preserved the volume problem was disqualified.
2. Deduplication at the source. When three services report the same event, we should see it once.
3. Priority-aware delivery timing. Critical alerts arrive immediately. Daily summaries arrive at 7 AM. Weekly digests arrive on Sunday. The system should match delivery cadence to urgency.
4. Quiet hours. Between 10 PM and 7 AM, only true emergencies should buzz a phone.
5. Actionable alerts. Especially on mobile: tap a button to approve, dismiss, or escalate. Reading a wall of text on Telegram and then SSHing into a server to act on it defeats the purpose.
6. Operational simplicity. We're a small team. The solution should be a single file we can read top to bottom, hosted alongside everything else on our VPS.
Criteria 1–3 eliminated Option 2 (distributed subscribers). Criteria 4–6 eliminated Option 1 (managed platform). We went with Option 3.
Our Decision — What We Chose and How We Implemented It
We built notification-router, a single Node.js service that accepts notifications via HTTP POST, queues them, and dispatches digests on a priority-based schedule.
The Contract: One Endpoint, Six Fields
Every service in our mesh sends notifications the same way:
// Any service can send a notification with a single POST
await fetch('http://127.0.0.1:5000/notify', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
source: 'lead-scorer',
subject: 'New high-value lead: Acme Corp',
body: 'Score: 8.7/10. Budget: $15,000. Responded to MCP audit page.',
priority: 'high', // critical | high | normal | low
category: 'revenue', // revenue | lead | content | system | intelligence
dedup_key: 'lead-acme-corp-2026-03-15'
})
});
That's the entire integration. The sending service has zero knowledge of Telegram, email, quiet hours, or batching. It fires and forgets.
Priority-Based Routing
The core of the system is a routing table that maps priority levels to delivery cadences:
const defaultConfig = {
routing: {
critical: { action: 'immediate', description: 'Send immediately via Telegram and email' },
high: { action: 'batch_3x', times_utc: [13, 18, 23],
description: '3x/day digest (8 AM, 1 PM, 6 PM EST)' },
normal: { action: 'batch_daily', time_utc: 12,
description: 'Daily digest (7 AM EST)' },
low: { action: 'batch_weekly', day: 0, time_utc: 14,
description: 'Weekly digest (Sunday 9 AM EST)' }
},
quiet_hours: {
enabled: true,
start_utc: 3, // 10 PM EST
end_utc: 12, // 7 AM EST
description: '10 PM - 7 AM EST: queue as high instead of immediate'
},
dedup_window_ms: 60 * 60 * 1000 // 1 hour
};
This means a critical revenue alert at 2 PM hits Telegram and email within seconds. A normal-priority system health summary waits for the morning digest. A low-priority content performance report shows up in the Sunday weekly roundup. One config block controls the entire organization's alert cadence.
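The scheduler loop itself isn't shown in the post, but the decision it has to make each hour can be sketched as a pure function over the routing table above. The function name and shape are my assumption, not the real service code.

```javascript
// Given the routing table and a clock reading, return the
// priorities whose batch is due this hour.
function prioritiesToFlush(routing, now = new Date()) {
  const hour = now.getUTCHours();
  const day = now.getUTCDay(); // 0 = Sunday
  const due = [];
  for (const [priority, rule] of Object.entries(routing)) {
    if (rule.action === 'batch_3x' && rule.times_utc.includes(hour)) {
      due.push(priority);
    } else if (rule.action === 'batch_daily' && hour === rule.time_utc) {
      due.push(priority);
    } else if (rule.action === 'batch_weekly' &&
               day === rule.day && hour === rule.time_utc) {
      due.push(priority);
    }
    // 'immediate' is handled on arrival, not by the scheduler
  }
  return due;
}
```

Running this once per hour (a setInterval or cron tick) is enough; each batch time appears at most once per day per priority.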
Deduplication Windows
When a new lead triggers notifications from the contact form, the CRM pipeline, and the lead scorer, the router sees all three within its one-hour deduplication window and delivers only the first:
function isDuplicate(notification, queue, config) {
const window = config.dedup_window_ms || DEDUP_WINDOW_MS;
const cutoff = Date.now() - window;
return queue.some(n => {
const nTime = new Date(n.timestamp || n.received_at).getTime();
if (nTime <= cutoff) return false;
// Explicit dedup keys collapse the same event across services
if (n.dedup_key && notification.dedup_key) {
return n.dedup_key === notification.dedup_key;
}
// Fallback: same source + subject catches rapid-fire repeats
return n.source === notification.source &&
n.subject === notification.subject;
});
}
The dedup_key field gives services explicit control. When two different services report the same event, they can share a dedup key (lead-acme-corp-2026-03-15) and the router collapses them into one notification. When a service omits the key, the router falls back to matching on source + subject — still effective for catching rapid-fire duplicates from a single service.
Quiet Hours
The quiet hours check is deliberately simple:
function isQuietHours(config) {
if (!config.quiet_hours || !config.quiet_hours.enabled) return false;
const hour = new Date().getUTCHours();
const start = config.quiet_hours.start_utc;
const end = config.quiet_hours.end_utc;
if (start < end) {
return hour >= start && hour < end;
} else {
return hour >= start || hour < end;
}
}
During quiet hours, critical notifications get downgraded to high and queued for the first morning batch. We sleep through system noise and still see everything important by 8 AM.
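Under those rules, the arrival-time decision reduces to another small pure function. This is a sketch (names assumed); `quiet` would be the result of the isQuietHours() check shown above, passed in so the decision stays testable.

```javascript
// Decide the effective priority and dispatch mode for a notification
// at the moment it arrives. Only critical is affected by quiet hours,
// and it is demoted to high rather than dropped.
function routeOnArrival(notification, quiet) {
  const demoted = notification.priority === 'critical' && quiet;
  const priority = demoted ? 'high' : notification.priority;
  return {
    priority,                           // effective priority after quiet hours
    immediate: priority === 'critical'  // only undemoted criticals send now
  };
}
```

Demoting instead of dropping is the key property: nothing is lost overnight, it just waits for the first high-priority batch.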
Actionable Telegram Buttons
The router pairs with a companion service that adds inline keyboard buttons to Telegram notifications. Instead of reading "New proposal draft ready" and then opening a laptop, we get one-tap decisions:
const BUTTON_LAYOUTS = {
proposal_review: (id, title) => ({
text: `New proposal draft: "${title}"\n\nApprove to submit, or skip.`,
buttons: [
[
{ text: 'Approve & Submit', callback_data: `approve_proposal:${id}` },
{ text: 'Skip', callback_data: `skip_proposal:${id}` }
],
[
{ text: 'Edit First', callback_data: `edit_proposal:${id}` }
]
]
}),
job_opportunity: (id, title, score) => ({
text: `Hot opportunity (${score}/10): "${title}"\n\nBid on this?`,
buttons: [
[
{ text: 'Bid Now', callback_data: `bid_job:${id}` },
{ text: 'Pass', callback_data: `pass_job:${id}` }
]
]
}),
nurture_followup: (id, clientName) => ({
text: `Follow-up drafted for ${clientName}.\n\nSend or skip?`,
buttons: [
[
{ text: 'Send', callback_data: `approve_nurture:${id}` },
{ text: 'Skip', callback_data: `skip_nurture:${id}` }
],
[
{ text: 'Edit First', callback_data: `edit_nurture:${id}` }
]
]
})
};
Five button layouts cover our most common decision points: proposal approval, job bidding, blog publishing, client follow-ups, and message triage. Each callback triggers a downstream action — approving a proposal writes to the bid queue and kicks off submission automatically.
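The callback-handling side isn't shown in the post, but the `action:id` convention visible in BUTTON_LAYOUTS implies a dispatcher along these lines. The handler names and return values here are illustrative placeholders, not the real service's API.

```javascript
// Split 'action:id' on the first colon only, so ids may themselves
// contain colons. Returns null for malformed callback data.
function parseCallback(data) {
  const sep = data.indexOf(':');
  if (sep === -1) return null;
  return { action: data.slice(0, sep), id: data.slice(sep + 1) };
}

// Dispatch table mapping button actions to downstream effects
// (hypothetical handlers for illustration).
const HANDLERS = {
  approve_proposal: id => `queued bid submission for ${id}`,
  skip_proposal: id => `skipped ${id}`
};

function handleCallback(data) {
  const parsed = parseCallback(data);
  if (!parsed || !HANDLERS[parsed.action]) return 'unknown action';
  return HANDLERS[parsed.action](parsed.id);
}
```

A flat dispatch table keeps adding a new button layout to a two-line change: one entry in BUTTON_LAYOUTS, one in HANDLERS.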
Category-Ordered Digests
When batched notifications flush into an HTML digest email, they're grouped by business category with revenue and leads first:
const CATEGORY_ORDER = ['revenue', 'lead', 'intelligence', 'system', 'content', 'education'];
The morning digest reads like a briefing: revenue events at the top, new leads second, intelligence insights third, system health buried at the bottom where it belongs. The hierarchy ensures we read the most important items even when we skim.
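A grouping function consistent with that ordering might look like the following; the name and shape are assumed, not the actual implementation. Unknown categories sort after every known one rather than breaking the digest.

```javascript
const CATEGORY_ORDER = ['revenue', 'lead', 'intelligence', 'system', 'content', 'education'];

// Group notifications by category and emit groups in CATEGORY_ORDER,
// with unrecognized categories last.
function groupForDigest(notifications) {
  const rank = c => {
    const i = CATEGORY_ORDER.indexOf(c);
    return i === -1 ? CATEGORY_ORDER.length : i;
  };
  const groups = new Map();
  for (const n of notifications) {
    const cat = n.category || 'system';
    if (!groups.has(cat)) groups.set(cat, []);
    groups.get(cat).push(n);
  }
  return [...groups.entries()]
    .sort((a, b) => rank(a[0]) - rank(b[0]))
    .map(([category, items]) => ({ category, items }));
}
```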
Consequences — What Worked and What We'd Do Differently
What worked immediately:
- Alert volume dropped roughly 70%. Forty-plus raw notifications per day collapsed into 3–4 digest emails and a handful of critical Telegram pings.
- We caught a $900 overnight API spend because the cost alert came through as critical priority and hit Telegram instantly; it stood out because the channel was quiet.
- Proposal response time improved. Inline buttons on Telegram mean we approve or skip bids from our phone in under five seconds.
- The deduplication window eliminated the "three alerts for one lead" problem on day one.
What we'd refine:
- The file-based queue (queue.json) works at our scale but creates a subtle race condition under burst load. Two near-simultaneous writes can clobber each other. At higher volume, we'd move to SQLite or an append-only log.
- Our deduplication falls back to source + subject matching, so two different services reporting the same event still create two entries unless they share a dedup_key. We've trained ourselves to set explicit dedup keys, but it requires discipline from every service author.
- The single-service architecture means a restart drops in-flight notifications. We mitigate this with the persistent queue file, but a true at-least-once guarantee would require acknowledgment tracking.
When to Reconsider
This decision holds as long as several conditions remain true:
- Team size stays small. With 1–3 people, a single routing config is manageable. At 10+ engineers with different on-call rotations, a managed platform like PagerDuty starts earning its cost.
- Alert volume stays under ~200/day. Our file-based queue handles this comfortably. Past 1,000/day, we'd want a proper message store.
- Channels stay limited. We currently multiplex across Telegram and email. Adding Slack, SMS, PagerDuty, and webhooks would warrant a plugin architecture instead of hardcoded dispatch.
- Single-VPS topology persists. Everything runs on one box, so 127.0.0.1:5000 is reachable from every service. In a multi-region deployment, the router would need service discovery and authenticated ingress.
If any of these flip, we'd likely migrate to a hybrid: keep the centralized routing logic but back it with a durable queue (Redis Streams or NATS JetStream) and add a proper plugin system for output channels.
Conclusion
Centralized notification routing turned out to be one of the highest-leverage architectural decisions we made for our 25-service platform. The implementation is a single file — pure Node.js stdlib plus fs for persistence. It took a day to build and eliminated an entire class of operational pain: duplicate alerts, missed escalations, and context-switching across channels.
The pattern generalizes to any team running more than a handful of microservices. If your engineers are muting Slack channels or ignoring PagerDuty because the signal-to-noise ratio is broken, the fix is usually architectural: centralize the routing, deduplicate at the gate, and batch by priority.
Need help building AI agent systems or designing multi-agent architectures? Ledd Consulting specializes in autonomous workflow design and agent orchestration for enterprise teams.