homelab-companion

Two modes — flag the trap up front, or run a phase-by-phase post-mortem after the break.

A Claude Code skill-pack that catches confidently-wrong homelab advice before it ships. Two modes share one catalog: preventive (flag the AI default trap when reviewing a Docker compose, ufw rule, systemd timer, or sync workflow) and retrospective (phase-by-phase post-mortem when something already broke). Each entry has a "Why LLMs miss this" section that names the plausible-but-wrong default a model reaches for without context — and how to redirect. In testing — 4 evals, 22 runs — the skill scored 22/22 vs the baseline's 4/22. The +83 percentage point gap is where AI defaults are confident but wrong.

eval · 4 evals · 22 runs · judged by Sonnet 4.6

where the gap lives — one cell per run

ufw-docker bridge

baseline 0 / 5with skill 5 / 5

gluetun service recovery

baseline 0 / 5with skill 5 / 5

git script intermittent

baseline 3 / 6with skill 6 / 6

netns child orphan · the teachable case

baseline 1 / 6with skill 6 / 6

22 / 22 with skill · 4 / 22 baseline

eval-3 · netns child orphan · the teachable case

Both arms wrote structured post-mortems. Only one named the mechanism that actually broke.

without skillmonitors: tunnel state

“tunnel stalls / interface goes bad”

→ watches the wrong layer

with skillmonitors: parent restart

“gluetun received a new network namespace on restart — docker did not re-wire the dependent container”

→ catches the parent-restart

same post-mortem shape · opposite mechanism+83 pp · iteration-1 self-eval

git clone https://github.com/ikkeseb/homelab-companion ~/.claude/skills/homelab-companion

specifications

HOST: cc skill-pack
MODES: preventive + retro
DELTA: +83 pp eval