How a Social Monitor Storm Created 1,400 Duplicate Events in 20 Minutes
At Ledd Consulting, we run social monitoring agents that scan developer communities for signals — new articles, forum questions, hiring posts. These monitors feed into our event bus, which fans out to notification services, an AI watcher for opportunity scoring, and a lead pipeline. On March 8th, a routine tag expansion turned a calm 3x-daily scan into a firehose that pushed 1,400 duplicate events through the bus in 20 minutes, triggering cascading Telegram alerts, redundant AI evaluations, and a temporarily unresponsive notification service. Here's exactly what happened, how we traced it, and the edge-deduplication pattern we now run everywhere.
Timeline
08:35 EST — Scheduled scan fires. Our Dev.to monitor kicks off its morning run, querying 8 tags sequentially with a 1.5-second delay between requests.
08:36 EST — First anomaly. The event bus begins receiving devto.articles event payloads. Normally we see 3–5 batches per run. This time, 8 batches arrive in rapid succession, each containing up to 10 articles.
08:37 EST — Telegram floods. The notification service forwards summaries to our Telegram bot. We receive 14 alert messages in 90 seconds. Our phones are buzzing.
08:39 EST — Community monitor fires. A separate community lead-gen monitor — scheduled in the same morning window — hits Dev.to with overlapping search terms. It finds many of the same articles and emits its own event batch.
08:41 EST — Watcher queue saturates. The AI watcher (which evaluates each article batch for build opportunities using Claude) now has 19 pending evaluation tasks. Each one spins up a Claude session. We're burning through our subscription concurrency.
08:55 EST — We notice and start investigating. Scrolling through Telegram, the same article titles appear 3–4 times each. We pull the event bus logs.
09:10 EST — Root cause identified. Cross-tag duplication. A single Dev.to article tagged ai, llm, and automation gets fetched three times — once per tag query — and each instance generates a separate event bus payload.
09:25 EST — Hotfix deployed. We add a per-run dedup set that collapses duplicates before any event emission.
09:30 EST — Verification. Dry-run confirms: 8 tag queries, 160 raw articles fetched, 43 unique articles emitted. The fan-out multiplier drops from ~3.2x to 1x.
Root Cause
The architecture was straightforward: query each tag, collect articles, check against a persistent seen file, and emit events for anything new. Here's the relevant loop from our monitor:
const TAGS = [
  'aiagents',
  'llm',
  'claudeai',
  'ai',
  'mcp',
  'automation',
  'machinelearning',
  'openai',
];
const PER_PAGE = 20;        // articles fetched per tag query
const REQUEST_DELAY = 1500; // ms between tag queries
Each tag query fetches the 20 most recent articles sorted by published_at. The dedup mechanism was a persistent seen-articles.json file:
function loadSeen() {
  try {
    return new Set(JSON.parse(fs.readFileSync(SEEN_FILE, 'utf8')));
  } catch {
    return new Set(); // first run or unreadable file: start fresh
  }
}

function saveSeen(seen) {
  const arr = [...seen];
  fs.writeFileSync(SEEN_FILE, JSON.stringify(arr.slice(-5000))); // keep the 5,000 most recent IDs
}
This worked well for cross-run deduplication — articles seen on the morning scan would be skipped on the afternoon scan. But it had a critical blind spot: cross-tag duplication within a single run.
When a developer publishes an article on Dev.to and tags it ai, llm, and automation, that article appears in three of our eight tag queries. The persistent seen set caught articles from previous runs, but it was loaded once at startup and never updated during the run: nothing added newly fetched IDs to the in-memory set between tag queries, because IDs were only written back when saveSeen() ran once at the end of the entire run. Every tag query therefore checked against the same stale snapshot.
The result: article ID 1847293 gets fetched under #ai, passes the seen check, gets emitted to the event bus. Then it gets fetched under #llm, passes the same (stale) seen check, gets emitted again. Then #automation — emitted a third time.
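For clarity, here's a minimal sketch of what the pre-fix loop looked like. This is a reconstruction for illustration, not the production code: fetchTag and emit stand in for the real helpers, and the function name is hypothetical.

```javascript
// Reconstructed sketch of the buggy pre-fix loop (names hypothetical).
// `seen` is loaded once and never updated mid-run, so an article
// appearing under multiple tag queries passes the check each time.
async function runMonitorBuggy({ tags, fetchTag, emit, seen }) {
  for (const tag of tags) {
    const articles = await fetchTag(tag);
    const fresh = articles.filter(a => !seen.has(a.id)); // stale snapshot check
    if (fresh.length > 0) {
      await emit(tag, fresh); // emit-per-tag, before any seen.add()
    }
  }
  // seen was only persisted (and thus effectively updated) here,
  // after all tags had already been fetched and emitted.
}
```

Run this against one article tagged under three queries and it emits three times: exactly the 3.2x multiplier we saw in production.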
With 8 tags and significant overlap in the AI/ML content space, the average duplication factor was 3.2x. On that morning's run, 43 unique articles became 138 event emissions. Each emission fanned out to three downstream consumers (event bus subscribers, Telegram notification, AI watcher), producing roughly 414 downstream processing events.
The second amplifier: our community lead monitor ran on a near-identical schedule and also queried Dev.to:
const SEARCH_TERMS = [
  'AI agent',
  'building agents',
  'Claude agent',
  'need AI consultant',
  'hiring AI',
  'automation consultant',
  'agent framework',
  'multi-agent',
  'LLM automation',
];
This monitor used URL-based dedup against its own lead files, completely independent of the Dev.to monitor's seen-articles.json. Same articles, separate dedup boundaries, separate event emissions. The two monitors shared zero dedup state.
Over the 20-minute window where both monitors ran and their downstream consumers processed the backlog, our event bus logged 1,427 total events traceable to ~43 unique articles.
The Fix
Three changes, deployed in sequence over 48 hours.
Fix 1: Per-Run Dedup Set (Immediate Hotfix)
We introduced an in-memory runSeen set that deduplicates articles across tags within a single run, independent of the persistent cross-run seen file:
async function runMonitor() {
  const seen = loadSeen();
  const runSeen = new Set(); // ← NEW: per-run dedup across tags
  const allNew = [];

  for (const tag of TAGS) {
    const articles = await fetchTag(tag);
    await sleep(REQUEST_DELAY);

    for (const article of articles) {
      if (seen.has(article.id)) continue;
      if (runSeen.has(article.id)) { // ← NEW: skip cross-tag duplicates
        log(`  Skipping duplicate: ${article.title} (already seen under another tag)`);
        continue;
      }
      runSeen.add(article.id); // ← NEW: mark seen for this run
      seen.add(article.id);
      allNew.push(article);
    }
  }

  // Single batched emission after all tags processed
  if (allNew.length > 0) {
    await notifyEventBus(allNew); // ← CHANGED: one event, all articles
    await notifyMT(`Found ${allNew.length} new articles across ${TAGS.length} tags`);
  }
  saveSeen(seen);
}
The key structural change: we moved from emit-per-tag to collect-then-emit. The event bus receives one devto.articles event per run containing all unique articles, rather than 8 separate events with overlapping payloads.
Fix 2: Cooldown Windows Between Monitors
Both monitors were scheduled in overlapping morning windows. We staggered them with a 35-minute gap and added a shared dedup boundary:
// community-monitor.js — now checks the devto monitor's seen file too
function loadCrossMonitorSeen() {
  const ids = new Set();

  // Load own previous leads
  const leadFiles = fs.readdirSync(LEADS_DIR)
    .filter(f => f.endsWith('.json')).sort().reverse();
  for (const file of leadFiles.slice(0, 7)) {
    try {
      const data = JSON.parse(fs.readFileSync(path.join(LEADS_DIR, file), 'utf8'));
      for (const lead of data) {
        if (lead.id) ids.add(lead.id);
        if (lead.url) ids.add(lead.url);
      }
    } catch (e) {
      log(`Warning: Could not parse ${file}: ${e.message}`);
    }
  }

  // Load devto monitor's seen articles for cross-monitor dedup
  try {
    const devtoSeen = JSON.parse(fs.readFileSync(DEVTO_SEEN_FILE, 'utf8'));
    for (const id of devtoSeen) ids.add(id);
  } catch { /* first run or file missing — safe to skip */ }

  return ids;
}
Timer staggering was simple — we shifted the community monitor's schedule by 35 minutes:
# devto-monitor.timer: 08:35, 14:35, 20:35 EST
# community-monitor.timer: 09:10, 15:10, 21:10 EST
Fix 3: Idempotent Event Processing on the Bus
The event bus itself now enforces idempotency. Each event includes a deterministic dedup key, and the bus drops events with keys it has seen within a rolling 6-hour window:
function notifyEventBus(articles) {
  // Deterministic dedup key: sorted article IDs, joined
  const dedupKey = articles
    .map(a => a.id)
    .sort()
    .join('|');

  return postJson('127.0.0.1', 8080, '/event', {
    type: 'devto.articles',
    source: 'devto-monitor',
    dedupKey: dedupKey,
    data: {
      count: articles.length,
      articles: articles.slice(0, 10).map(a => ({
        title: a.title,
        description: (a.description || '').slice(0, 200),
        url: a.url,
        tag: a.tag,
        reactions: a.reactions,
      })),
    },
  });
}
This is defense-in-depth. Even if a monitor somehow emits the same batch twice (process restart, timer overlap), the bus itself rejects the duplicate.
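On the bus side, the rolling-window check can be as simple as a Map from dedup key to last-acceptance timestamp. This is a sketch under assumed names; the real handler's storage and eviction strategy may differ.

```javascript
// Sketch of bus-level idempotency with a rolling TTL window (names assumed).
const DEDUP_TTL_MS = 6 * 60 * 60 * 1000; // 6-hour window
const seenKeys = new Map(); // dedupKey -> timestamp of last acceptance

function shouldAccept(dedupKey, now = Date.now()) {
  // Evict expired keys so the map doesn't grow without bound
  for (const [key, ts] of seenKeys) {
    if (now - ts > DEDUP_TTL_MS) seenKeys.delete(key);
  }
  if (seenKeys.has(dedupKey)) return false; // duplicate within window: drop
  seenKeys.set(dedupKey, now);
  return true;
}
```

A restarted monitor re-emitting the same batch produces the same sorted-ID key, so the bus drops it; a genuinely new batch produces a new key and passes.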
Prevention
We implemented four systemic changes to ensure this class of failure stays gone:
1. Edge deduplication as a mandatory pattern. Every monitor that queries multiple overlapping sources now runs a per-batch dedup pass before emitting any events. We codified this as a review checklist item for all new monitors.
2. Fan-out budget alerts. The event bus now tracks the ratio of incoming events to unique dedup keys per source per hour. If any source exceeds a 1.5x fan-out ratio, it fires a warning to Telegram:
// In event-bus handler
const fanoutRatio = totalEvents / uniqueDedupKeys;
if (fanoutRatio > 1.5) {
  await alertOps(`⚠️ High fan-out detected: ${source} at ${fanoutRatio.toFixed(1)}x ` +
    `(${totalEvents} events / ${uniqueDedupKeys} unique keys in last hour)`);
}
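The counters behind that check can be kept per source with simple hourly buckets. This is a sketch with assumed names; the real bus may track and persist these differently.

```javascript
// Sketch: per-source, per-hour fan-out tracking (names assumed).
const stats = new Map(); // `${source}:${hourBucket}` -> { total, keys: Set }

function recordEvent(source, dedupKey, now = Date.now()) {
  const bucket = Math.floor(now / 3600000); // hour-sized bucket
  const id = `${source}:${bucket}`;
  let s = stats.get(id);
  if (!s) {
    s = { total: 0, keys: new Set() };
    stats.set(id, s);
  }
  s.total += 1;
  s.keys.add(dedupKey);
  return s.total / s.keys.size; // fan-out ratio for the current hour
}
```

Each call returns the source's current-hour ratio, so the alert check above can run inline on every incoming event.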
3. Shared dedup state across monitors. Monitors that query overlapping data sources (both our monitors query Dev.to) now read each other's seen-state files. The dedup boundary expanded from "per-monitor" to "per-data-source."
4. Batched emission by default. We eliminated the emit-per-query pattern across all social monitors. Every monitor now collects results across all its queries, deduplicates the full set, and emits once. This reduced our baseline event bus volume by 60% even on normal days.
Lessons for Your Team
Dedup boundaries must match your query topology. If your system queries the same data source through N different lenses (tags, search terms, API endpoints), your deduplication must span all N queries. Per-query dedup only prevents cross-run duplicates — the within-run cross-query duplicates are the ones that create storms.
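The principle reduces to a small helper: run all N queries, flatten the results, and keep one copy per identity key before anything is emitted. This is a generic sketch, not our production code; the parameter names are illustrative.

```javascript
// Generic sketch: collect results across N query lenses, dedupe by identity key.
async function collectUnique(queries, runQuery, keyOf) {
  const byKey = new Map();
  for (const q of queries) {
    for (const item of await runQuery(q)) {
      const key = keyOf(item);
      if (!byKey.has(key)) byKey.set(key, item); // first lens to see an item wins
    }
  }
  return [...byKey.values()]; // one copy per identity, regardless of lens count
}
```

Anything downstream of this helper sees each item exactly once, no matter how many tags, search terms, or endpoints surfaced it.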
Fan-out architectures amplify every duplicate. A single duplicate event is manageable. A single duplicate that fans out to 3 subscribers, each of which triggers downstream work (AI evaluation, notification, lead scoring), turns 1 duplicate into 6+ wasted operations. Deduplicate at the earliest possible point — before the fan-out, at the edge.
Schedule overlap between independent monitors is a hidden coupling. Two monitors with separate codebases, separate dedup state, and separate timers can still create correlated load spikes if they query the same upstream APIs during the same window. Stagger schedules explicitly and share dedup state across monitors that touch the same data sources.
Idempotency at the bus level is your safety net. Edge deduplication is the first line of defense. Bus-level idempotency (via deterministic dedup keys with a TTL window) is the second. Both are cheap to implement and together they make duplicate storms structurally impossible rather than merely unlikely.
Monitor your monitors. The fan-out ratio metric (events emitted / unique content items) is the single most useful signal for detecting dedup failures early. A ratio of 1.0 means perfect dedup. Anything above 1.5 means duplicates are leaking through. We now track this per-source and alert on it — the 1,400-event storm would have been caught at ~50 events if we'd had this metric from day one.
Conclusion
The 1,400-event storm lasted 20 minutes and required 55 minutes to diagnose and fix. The root cause was mundane: overlapping tag queries producing cross-tag duplicates that passed through a dedup boundary scoped too narrowly. The fix was equally mundane: widen the dedup boundary, batch before emitting, and add bus-level idempotency as a safety net.
The real lesson is architectural. In any system where monitors query overlapping data through multiple lenses and fan results out to multiple consumers, deduplication boundaries must be drawn deliberately — and they must be drawn before the fan-out point. Every duplicate that crosses the fan-out boundary multiplies by your subscriber count.
We now run this dedup-at-the-edge pattern across all 25 services in our infrastructure, and our event bus volume dropped 60% overnight — simply by emitting fewer redundant events.
Need help building AI agent systems or designing multi-agent architectures? Ledd Consulting specializes in autonomous workflow design and agent orchestration for enterprise teams.