fix: reduce log noise for node disconnect/late invoke errors (#1607)

* fix: reduce log noise for node disconnect/late invoke errors

- Handle both 'node not connected' and 'node disconnected' errors at info level
- Return success with late:true for unknown invoke IDs instead of error
- Add a 30-second debounce to the skills change listener to prevent rapid-fire probes
- Add tests for isNodeUnavailableError and late invoke handling
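
A rough sketch of the first two bullets, for illustration only. The name `isNodeUnavailableError`, the two error phrases, and the `late: true` flag come from this message; the substring matching, the `pendingInvokes` map, `handleInvokeResult`, and `logLevelFor` are assumptions and may not match the gateway code.

```typescript
// Hypothetical sketch: treats both error phrasings as "node unavailable".
function isNodeUnavailableError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  const lower = message.toLowerCase();
  return lower.includes("node not connected") || lower.includes("node disconnected");
}

// Unavailable nodes are expected during normal operation, so log them quietly.
function logLevelFor(err: unknown): "info" | "error" {
  return isNodeUnavailableError(err) ? "info" : "error";
}

type InvokeResultAck = { ok: true; late?: true };

// Assumed shape: pending invokes keyed by invoke ID.
const pendingInvokes = new Map<string, (payload: unknown) => void>();

function handleInvokeResult(invokeId: string, payload: unknown): InvokeResultAck {
  const resolve = pendingInvokes.get(invokeId);
  if (!resolve) {
    // Unknown ID: the invoke probably timed out already, so acknowledge
    // the result as late instead of reporting an error.
    return { ok: true, late: true };
  }
  pendingInvokes.delete(invokeId);
  resolve(payload);
  return { ok: true };
}
```

Under these assumptions a node that answers after its invoke has expired gets a successful acknowledgement, so it has nothing to log as a failure.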

* fix: clean up skills refresh timer and listener on shutdown

Store the return value from registerSkillsChangeListener() and call it
on gateway shutdown. Also clear any pending refresh timer. This follows
the same pattern used for agentUnsub and heartbeatUnsub.
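
The unsubscribe pattern described here can be sketched as below. This is a minimal illustration, assuming a listener registry whose `register` call returns a cleanup function; `createListenerRegistry` and its members are hypothetical names, not the implementation behind `registerSkillsChangeListener()`.

```typescript
type Listener<T> = (event: T) => void;

// Hypothetical registry: register() returns a function that removes the listener.
function createListenerRegistry<T>() {
  const listeners = new Set<Listener<T>>();
  return {
    register(listener: Listener<T>): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
    emit(event: T): void {
      // Copy first so listeners may unsubscribe while being notified.
      for (const listener of [...listeners]) listener(event);
    },
    size(): number {
      return listeners.size;
    },
  };
}

// Usage: keep the returned function and call it once on shutdown.
const skillsRegistry = createListenerRegistry<{ reason: string }>();
const seenReasons: string[] = [];
const unsub = skillsRegistry.register((event) => {
  seenReasons.push(event.reason);
});
skillsRegistry.emit({ reason: "watch" });
unsub();
skillsRegistry.emit({ reason: "after-shutdown" }); // not delivered
```

Discarding the return value, as the code did before this change, leaves the listener registered for the lifetime of the process.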

* refactor: simplify (KISS/YAGNI): inline checks, remove unit tests for internal utilities

* fix: reduce gateway log noise (#1607) (thanks @petter-b)

* test: align agent id casing expectations (#1607)

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
Petter Blomberg
2026-01-24 21:05:41 +01:00
committed by GitHub
parent 40ef3b5d30
commit 39d8c441eb
5 changed files with 165 additions and 9 deletions

@@ -344,9 +344,19 @@ export async function startGatewayServer(
   setSkillsRemoteRegistry(nodeRegistry);
   void primeRemoteSkillsCache();
-  registerSkillsChangeListener(() => {
-    const latest = loadConfig();
-    void refreshRemoteBinsForConnectedNodes(latest);
+  // Debounce skills-triggered node probes to avoid feedback loops and rapid-fire invokes.
+  // Skills changes can happen in bursts (e.g., file watcher events), and each probe
+  // takes time to complete. A 30-second delay ensures we batch changes together.
+  let skillsRefreshTimer: ReturnType<typeof setTimeout> | null = null;
+  const skillsRefreshDelayMs = 30_000;
+  const skillsChangeUnsub = registerSkillsChangeListener((event) => {
+    if (event.reason === "remote-node") return;
+    if (skillsRefreshTimer) clearTimeout(skillsRefreshTimer);
+    skillsRefreshTimer = setTimeout(() => {
+      skillsRefreshTimer = null;
+      const latest = loadConfig();
+      void refreshRemoteBinsForConnectedNodes(latest);
+    }, skillsRefreshDelayMs);
   });
   const { tickInterval, healthInterval, dedupeCleanup } = startGatewayMaintenanceTimers({
@@ -544,6 +554,11 @@ export async function startGatewayServer(
       if (diagnosticsEnabled) {
         stopDiagnosticHeartbeat();
       }
+      if (skillsRefreshTimer) {
+        clearTimeout(skillsRefreshTimer);
+        skillsRefreshTimer = null;
+      }
+      skillsChangeUnsub();
       await close(opts);
     },
   };
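
The debounce-plus-cleanup logic in this diff can be isolated into a small helper, which makes the behavior easier to reason about. The sketch below is not part of the commit: `createDebounced`, `TimerApi`, and `realTimers` are hypothetical names, and the injectable timer API is an assumption added so the helper can be exercised without waiting 30 seconds.

```typescript
// Hypothetical timer abstraction so callers can substitute fake timers.
type TimerApi = {
  setTimeout: (fn: () => void, ms: number) => unknown;
  clearTimeout: (handle: unknown) => void;
};

const realTimers: TimerApi = {
  setTimeout: (fn, ms) => setTimeout(fn, ms),
  clearTimeout: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Trailing-edge debounce: run() fires once, delayMs after the last trigger().
function createDebounced(run: () => void, delayMs: number, timers: TimerApi = realTimers) {
  let handle: unknown = null;
  return {
    trigger(): void {
      // Each new trigger replaces the pending timer, restarting the delay.
      if (handle !== null) timers.clearTimeout(handle);
      handle = timers.setTimeout(() => {
        handle = null;
        run();
      }, delayMs);
    },
    cancel(): void {
      // Mirrors the shutdown path: drop any pending run.
      if (handle !== null) {
        timers.clearTimeout(handle);
        handle = null;
      }
    },
    pending(): boolean {
      return handle !== null;
    },
  };
}
```

With a helper like this, the listener body would reduce to a `trigger()` call and the shutdown hook to `cancel()` followed by the unsubscribe call. Note that a burst of events that never pauses for the full delay postpones the refresh indefinitely, which is the usual trade-off of a trailing-edge debounce.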