# Mr Fixit 🦊
Last updated: 2026-04-13 · Reading time: ~30 min · Difficulty: hard
TL;DR
- Mr Fixit is the infrastructure fox. He watches every other agent's health, runs brain validation and Dropbox conflict scans, archives stale facts monthly, runs a weekly security audit, gates every `git push` to the remote, and escalates to you on Telegram when something is off. He is silent on all-clear, terse on alert, and never chatty.
- Deploy him first. He is the training-wheels agent: the one where you discover every silent failure in the deploy pipeline on an agent whose blast radius is your monitoring story rather than your credit card.
- Mr Fixit's first deploy is a minefield. I hit eight silent failures on mine, none of which produced a grep-able error. The "What makes him hard" section below is the war story; read it before you run `deploy.py fix-it`, not after.
- His current cron surface is ten agent-side crons in `agents/fix-it/manifest.json` plus three host-side crons (`fleet-health`, `morning-status`, and `morning-fleet-deliver`) that the manifest does not register. A lot of Mr-Fixit-adjacent work that used to live as in-container LLM crons has been migrated off the OpenClaw dispatch queue in the last few days, because none of it needed LLM judgment. Ch 06 has the full two-kinds-of-crons distinction.
- Infra agents with broad privileges will eventually confabulate a diagnosis. My Mr Fixit is currently on a 14-day probation for doing exactly that. See the pitfalls section for the mechanism and the mitigation.
## Meet the agent
In Richard Scarry's Busytown, Mr Fixit was a fox in overalls who claimed he could fix anything. His shelves fell. His pipes leaked. His electrical work was, charitably, experimental. The townspeople called him anyway because he was cheap and he brought cookies. This Mr Fixit, the one that lives in my Hetzner box and runs ten scheduled crons against a file-based shared brain, is named in the spirit of what his Busytown forebear aspired to and never quite achieved. He keeps the lights on. He watches the health of every other agent in the fleet and escalates when something looks wrong. He runs the monthly archival and the weekly security audit and the daily cron-self-check. He does not bring cookies, and his repairs work on the first try most of the time. Checking… fixed.
## Why you'd want one, and why you might not
Mr Fixit is the fleet's canary. He is the agent that wakes up before anyone else, reads `fleet-health.json`, and tells you whether the rest of your agents made it through the night. He archives stale facts on the first of every month. He scans for Dropbox conflict files every two hours. He runs an `openclaw security audit` every Sunday. He enforces the "only Mr Fixit pushes to GitHub" convention by running `scripts/pre-push-check.sh` before every `git push origin master`. When he is behaving, he is the most reliable and least noisy agent in the fleet, and in a multi-agent fleet he pays for himself in the first night of failures he catches.
Why you might not deploy him. If you are running exactly one agent (say, Lowly Worm for a daily news digest and nothing else), Mr Fixit's monitoring surface is larger than the thing he's monitoring. You don't need a canary for the canary. A single-agent fleet can skip him and rely on you noticing when the Telegram messages stop arriving. If you are running more than one agent, though, skip him at your peril. The first morning you would have caught an overnight failure by opening `fleet-health.json` and instead discover it by wondering why your news agent hasn't sent a digest in 36 hours is the morning you'll regret it.
## What makes Mr Fixit hard
It is not the code. Mr Fixit's scripts are the simplest in the fleet: a heartbeat probe, a brain-validation runner, a Dropbox conflict scanner, a tar-everything archival job, a security-audit wrapper. None of them are hard. What makes Mr Fixit hard is that he is the first agent you deploy, which means every silent failure in the deploy pipeline hits his workspace first, and most of those failures produce symptoms that do not point at their causes.
I hit the following on my first Mr Fixit deploy. None of them appeared in any CLI documentation I had access to. None of them produced a grep-able error. Each one cost me somewhere between twenty minutes and two hours to diagnose. Read them before you deploy, not after.
**War story: the first-deploy minefield.**

1. **OAuth wizard `invalid_scope` on first registration.** The first time I ran `oci agents add fix-it`, the OAuth flow inside the gateway rejected the scope list with a generic `invalid_scope` error. No clue which scope was at fault. The fix, which I had to discover by diffing the new agent's workspace against the `main` agent's, was to copy `auth-profiles.json` from `main` into the new agent's workspace before the first registration attempt. `main` had a working profile; `fix-it` had whatever the wizard was trying to build fresh. Every subsequent agent in the fleet has inherited that profile via the same copy.
2. **`policy: null` default in the manifest blocks everything.** An empty exec-approvals stanza in `manifest.json` gets serialized to OpenClaw as `policy: null`, which the exec layer interprets as "ask on every command." Every cron hit an approval prompt, and since the crons were `--no-deliver`, the prompts never reached me; they just piled up in the approval queue while the crons silently returned `exec blocked by approval policy`. Fix: set `policy: full` in the manifest's `approvals` stanza explicitly. Ch 06 has the full exec-approvals model; the specific symptom here is that "no approvals configured" is not the same thing as "permissive approvals configured," and the default is the worst of both.
3. **`BOOTSTRAP.md` shadowing `IDENTITY.md`.** Covered in detail in Ch 06 and the OpenClaw-onboarding subsection of Ch 07-0. First symptom I saw: the fox introduced himself to me on Telegram with "I just came online. What should I call you?" while his correct `SOUL.md` and `IDENTITY.md` sat perfectly deployed one directory over. Fix: `deploy.py` now auto-deletes `BOOTSTRAP.md` on every deploy, but on my first deploy it didn't yet, and I spent two hours convinced I'd installed the wrong files.
4. **`exec bash wrapper.sh` reset `BASH_SOURCE` and broke the deploy wrapper.** Every agent used to have its own `deploy.sh` that `exec`-ed a shared wrapper script. The wrapper used `${BASH_SOURCE[1]}` to detect which agent had called it, and it refused to run if the caller wasn't recognized. `exec` replaces the shell process, which resets `BASH_SOURCE`, so `[1]` was always empty and the wrapper always refused. Fix: each agent now sets `OPENCLAW_AGENT_ID` as an environment variable before `exec`-ing the wrapper, and the wrapper reads the env var instead of inspecting the bash call stack. (This failure mode got replaced entirely by `deploy.py`, but it lives on as a cautionary tale about inspecting call stacks in `exec`-wrapped scripts.)
5. **`/setcommands` via @BotFather is a separate step, and OpenClaw clobbers it.** You can't set a bot's slash-command menu via the OpenClaw CLI. You can only set it via the Telegram Bot API directly (`setMyCommands`) or via @BotFather's `/setcommands` command in an interactive chat. I did neither on my first deploy, and Mr Fixit's Telegram surface was the default "Welcome to your new bot" menu for several days. Worse: once I did set it, OpenClaw's next channel-sync clobbered my custom commands with its built-in defaults. The full fix is in Ch 05's "bot commands and descriptions" section: set `commands.native: false` in `openclaw.json` and run `scripts/set-bot-commands.sh` as an entrypoint hook, both, because they fail in different scenarios.
6. **Parser format drift is silent.** Mr Fixit reads several config files at runtime: `exec-approvals.json`, the agent roster, the cron manifest, his own `CRONS.md`. If any of those change format in a way the parser doesn't expect, the parse fails silently, the parser falls back to defaults, and Mr Fixit happily proceeds with stale or empty configuration. I've been bitten twice by this: once when an OpenClaw upgrade added fields to `exec-approvals.json` and my parser dropped anything it didn't recognize, and once when I hand-edited `CRONS.md` and left a trailing comma that took the parser to an empty dictionary. Fix: the validation scripts now explicitly `assert` on the schema and fail loudly rather than silently.
7. **Windows line endings.** On my first Terraform + Docker rebuild, the entrypoint script died at container boot with `/usr/bin/env: 'bash\r': No such file or directory`. I had edited the entrypoint on Windows and `scp`-d it to the VPS with CRLF line endings intact. `sed -i 's/\r$//'` fixes it once; configuring your editor to use LF for Linux-destined files fixes it permanently. `ops/scripts/crlf-scan.py` is the preventative version: point it at a file or a directory before SCPing and it'll detect (`--fix` normalizes). This one is covered in Ch 03's pitfalls, but it always bites the first deploy, and Mr Fixit is the first deploy.
8. **Status-file drift and the three-layer root cause.** On 2026-04-14 at 12:03 UTC, Mr Fixit's `brain-validation` cron fired and escalated one FAIL to my Telegram: `Bad header: agents/family-calendar.status.md — expected '# {Name} — Status', got: '# family-calendar status'`. The obvious fix was a one-line edit to the file header. I made it, brain-validation went green, and I reported "done." I was wrong by about 22 hours: that's how long it would've taken the next `morning-briefing` cron tick to overwrite my fix, because Mistress Mouse's cron prompt said "5) Update your status file." and the agent LLM was writing the whole file from scratch every tick, picking a header of its own making. The obvious fix was a symptom-layer fix. The real fix lived two layers down. Layer 1 (the surface): edit the file header to match the canonical form. Holds for one cron tick. Layer 2 (the reading side): retire `validate.py`'s `check_agent_files()` entirely, because post-R6 the per-agent `*.status.md` files are legacy; `fleet-health.json` is the authoritative health surface, nothing reads the status files anymore, and the validator checking their headers was a leftover from the R1 era that nobody went back and pruned. Layer 3 (the writing side): strip "Update your status file" from every cron prompt in every agent (6 agents, 17 cron prompts) and replace it with "DO NOT touch `~/Dropbox/openclaw-backup/agents/<agent>.status.md`; `fleet-health.json` (R6 orchestrator) is the authoritative health source." Then add the exact string "Update your status file" to Safeguard 9's forbidden-patterns list as a regression guard, so any future deploy that re-introduces the legacy clause is refused at the gate (exit 8). The lesson I took from it: the symptom layer, the reading layer, and the writing layer are usually three different places, and a surface fix almost always leaves two of them intact. Also: infra retirements leave prose fossils. R6 migrated the health surface from `status.md` to `fleet-health.json` weeks earlier, but nobody went back and rewrote the cron prompts that told LLMs to keep writing the retired surface. Every major infra retirement should end with a grep across all cron prompts for the thing you just retired, and a regression guard in Safeguard 9 for the phrase that used to be load-bearing.
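That grep-plus-guard step is cheap to mechanize. Here is a minimal sketch of a forbidden-phrase gate; the function name, file layout, and scan scope are my illustration, not Safeguard 9's actual code, but the exit-8-on-hit behavior mirrors the gate described above:

```python
import pathlib

# Phrases that were once load-bearing but now signal a regression to a
# retired surface. "Update your status file" is the one from the war story;
# extend the list with each retirement.
FORBIDDEN_PHRASES = [
    "Update your status file",
]

def scan_prompts(root):
    """Return (path, phrase) pairs for every forbidden phrase found in
    any cron-prompt file under root. A deploy gate exits 8 on any hit."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in {".json", ".md"}:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for phrase in FORBIDDEN_PHRASES:
            if phrase in text:
                hits.append((path, phrase))
    return hits

# Usage in a deploy gate (sketch):
#   hits = scan_prompts(pathlib.Path("agents/"))
#   if hits:
#       sys.exit(8)   # refuse the deploy
```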
Eight silent failures across one agent. None of them produced an error message that pointed at the fix. Each of them is a one-line change once you know what the fix is. Every subsequent agent in this guide benefits from the fact that these were all debugged against Mr Fixit first β which is why "deploy Mr Fixit first" is the first rule of Ch 07-0's deployment order.
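The CRLF bite above is also the easiest to guard against programmatically. A minimal sketch of the idea, not the real `ops/scripts/crlf-scan.py` (whose internals I'm guessing at):

```python
import pathlib

def has_crlf(path):
    """True if the file contains Windows CRLF endings -- the
    "/usr/bin/env: 'bash\\r'" failure mode at container boot."""
    return b"\r\n" in pathlib.Path(path).read_bytes()

def normalize_lf(path):
    """Rewrite the file in place with LF-only endings (a --fix behavior)."""
    p = pathlib.Path(path)
    p.write_bytes(p.read_bytes().replace(b"\r\n", b"\n"))
```

Run the check over anything destined for the VPS before you `scp` it, and the container never sees a carriage return.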
## Deployment walkthrough
The general arc in Ch 07-0 applies in full. What follows is the Mr-Fixit-specific material: workspace paths, manifest details, smoke-test, and the host-side crons you need to install separately.
### The 10-cron manifest
`agents/fix-it/manifest.json.example` declares ten OpenClaw-side crons:
| Cron | Schedule | Delivers? | What it does |
|---|---|---|---|
| `brain-validation` | `0 */6 * * *` | silent on pass | `python3 ~/Dropbox/openclaw-backup/scripts/validate.py`; alerts on any failure |
| `conflict-scan` | `0 */2 * * *` | silent on clean | `find … -name '*conflicted copy*'`; alerts on any match |
| `file-size-monitor` | `0 12 * * *` | silent on clean | `find … -size +500k`; alerts on oversized files |
| `monthly-archival` | `0 3 1 * *` | always | moves stale facts and completed tasks older than 90 days to `archive/YYYY-MM/` |
| `security-audit` | `0 4 * * 0` | always | runs the wrapped `openclaw security audit --deep` and forwards the formatted report |
| `update-check` | `0 4 * * 3` | always | `openclaw update` to check for a new version; does not apply it |
| `cron-self-check` | `0 0 * * *` | silent on pass | reads `expected-crons.json` and re-registers anything missing from `openclaw cron list` |
| `obsidian-briefing` | `10 12 * * *` | silent on success | runs the Obsidian briefing generator from Ch 05 |
| `probation-end-reminder` | date-gated | always | pings me on a specific date to decide whether an agent stays on probation |
| `workspace-snapshot` | `30 3 * * *` | silent on success | tars each agent's workspace to Dropbox for regression recovery |
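As an illustration of what a job like `monthly-archival` boils down to, here is a sketch of a move-stale-files-by-month pass. The `facts/*.md` layout and the function name are my assumptions, not the real script:

```python
import pathlib
import shutil
import time

def archive_stale(brain, days=90, now=None):
    """Move fact files untouched for `days` days into archive/YYYY-MM/,
    bucketed by the month the archival ran. Returns the new paths."""
    now = now if now is not None else time.time()
    cutoff = now - days * 86400
    dest = brain / "archive" / time.strftime("%Y-%m", time.localtime(now))
    moved = []
    for f in sorted(brain.glob("facts/*.md")):   # assumed brain layout
        if f.stat().st_mtime < cutoff:
            dest.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(dest / f.name))
            moved.append(dest / f.name)
    return moved
```

The deliberate choice is to bucket by the month the archival ran rather than the month the fact was written, so one cron tick produces exactly one new archive directory.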
Every cron message opens with the EXEC DISCIPLINE preamble from Ch 06's script contract — "run any `python3 <path>` command exactly as written, no shell wrapping, no chaining, no exit-code capture, scripts print one JSON line with a status field" — and every silent cron explicitly returns the empty string on success ("zero bytes, no narration, not even 'NO output'") to avoid burning LLM tokens on no-op acknowledgements.
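The script half of that contract is simple to honor. A minimal sketch of a contract-compliant check (the `report` helper and `run_check` shape are mine, not OpenClaw's):

```python
import json

def report(status, **detail):
    """Build the one JSON status line the script contract requires."""
    return json.dumps({"status": status, **detail})

def run_check(problems):
    """Print exactly one JSON line and return a shell-style exit code.
    The LLM side of the contract then replies with zero bytes on ok."""
    if problems:
        print(report("fail", problems=problems))
        return 1
    print(report("ok"))
    return 0
```

Keeping the output to one machine-parseable line is what lets the cron prompt say "no narration": the LLM never has to summarize, only to stay silent or forward the line.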
### Host-side crons (not in the manifest)
Three Mr-Fixit-adjacent crons live entirely on the VPS host crontab rather than in `manifest.json`. They are installed by `ops/scripts/install-host-cron.sh`:
| Host cron | Wrapper | Schedule | What it does |
|---|---|---|---|
| `fleet-health` | `ops/scripts/fleet-health-host.sh` | `*/15 * * * *` | Invokes `ops/scripts/fleet-health.py`, which calls each agent's `probe()` and writes `fleet-health.json`. If any agent reports non-ok, the wrapper pushes one aggregated Telegram alert to Mr Fixit's bot. Replaced Mr Fixit's old per-agent in-manifest `heartbeat-check` cron during the R3 fleet-health consolidation. |
| `morning-status` | `ops/scripts/morning-status-host.sh` | `30 10 * * *` | Runs the Mr-Fixit-owned `scripts/morning-status.py`, which reads `fleet-health.json` + `KNOWN_ISSUES.md` and writes the morning brief into `cache/morning-brief-ready.txt`. Replaced the old in-manifest LLM `morning-status` cron; the work is deterministic Python, not LLM judgment, and the 600-second cron budget was wildly overkill for it. |
| `morning-fleet-deliver` | `ops/scripts/morning-fleet-deliver-host.sh` | `0 12 * * *` | At noon UTC / 5 am PT, reads the cached morning briefs written by Mr Fixit and every other agent that produced one, chunks them into Telegram-sized messages, and delivers them. Moved out of the in-container cron loop when the compose-vs-deliver split made the deliver step trivially deterministic. |
The pattern across all three: work that never needed LLM judgment and never needed to live inside the OpenClaw dispatch queue in the first place. An earlier version of Clawford had per-agent `heartbeat-check` crons and an LLM-composed `morning-status` cron inside each agent's manifest; the R3 fleet-health consolidation, the R4 morning-status refactor, and the subsequent `morning-fleet-deliver` split moved all of them off the LLM dispatch queue, because recursive `docker exec` from inside the container hit race conditions and the 600-second LLM budget was overkill for work that finishes in under a second. Mr Fixit now has exactly one in-container probe, `agents/fix-it/scripts/heartbeat.py::probe`, and it only checks `fleet-health.json` freshness: if `generated_at` is more than 30 minutes old, it alerts that the orchestrator itself is broken, which is how Mr Fixit escalates a dead fleet-health to the human.
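That freshness check reduces to one timestamp comparison. A sketch of roughly what `heartbeat.py::probe` must do; the ISO-format `generated_at` field and the return shape are my assumptions:

```python
import datetime
import json
import pathlib

STALE_AFTER = datetime.timedelta(minutes=30)

def probe(path, now=None):
    """Alert if fleet-health.json is missing or stale -- i.e. the
    orchestrator that writes it is itself down."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    path = pathlib.Path(path)
    if not path.exists():
        return {"status": "alert", "reason": "fleet-health.json missing"}
    generated = datetime.datetime.fromisoformat(
        json.loads(path.read_text())["generated_at"])
    if now - generated > STALE_AFTER:
        return {"status": "alert", "reason": "orchestrator stale"}
    return {"status": "ok"}
```

The design point is the inversion: the host cron watches the agents, and this one tiny in-container probe watches the host cron.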
`install-host-cron.sh` is also defensively written against the same yo-yo class of bug that hit `cron-self-check` (below): it carries a list of stale crontab markers from previous refactors and removes them on every run, so a re-run of the install script can't accidentally reintroduce a cron that got retired. There are several other host crons registered on the same box for other agents (Costco JWT refresh for Hilda Hippo, LinkedIn keep-alive and engagement polling for Lowly Worm, reminder-check for Mistress Mouse); see each agent's subchapter for their host-cron surface.
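The stale-marker defense is a pure text transform over the crontab, which makes it easy to test. A sketch (the marker string is invented; the real list lives in `install-host-cron.sh`):

```python
# Markers left behind by previous refactors; any crontab line carrying
# one is removed on every install run, so re-running can't resurrect it.
STALE_MARKERS = [
    "# openclaw:heartbeat-check",   # invented example marker
]

def merge_crontab(current, desired):
    """Return a new crontab text: current minus stale-marked lines,
    plus any desired line not already present. Idempotent by design."""
    kept = [ln for ln in current.splitlines()
            if not any(m in ln for m in STALE_MARKERS)]
    kept += [ln for ln in desired if ln not in kept]
    return "\n".join(kept) + "\n"
```

In real use the wrapper would feed this `crontab -l` output and pipe the result back through `crontab -`; the invariant worth testing is that running it twice changes nothing.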
### Smoke test
After `deploy.py fix-it`, two things to verify:

- Agent-side: `oc cron list` shows all ten crons; fire `conflict-scan` or `brain-validation` as a sanity check and confirm no Telegram output (they're silent-on-success).
- Host-side: fire the host cron directly and read the result:
```bash
bash ~/repo/ops/scripts/fleet-health-host.sh
python3 -c "import json; d=json.load(open('/home/openclaw/Dropbox/openclaw-backup/fleet-health.json')); print(d['generated_at']); [print(f' {k}: {v[\"status\"]}') for k,v in d['agents'].items()]"
```
`fleet-health.json` should have a fresh `generated_at` and show `fix-it: ok`. If it shows `error` with a `fleet-health.json missing` message, the orchestrator hasn't run yet; re-fire the wrapper and check `~/.openclaw/logs/fleet-health-host.log` for the last run's exit code and any traceback.
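The one-liner above, unpacked into a readable helper with the same fields and output shape:

```python
import json
import pathlib

def summarize(path):
    """Return the generated_at stamp plus one 'name: status' line per
    agent -- the same thing the inline python3 -c command prints."""
    d = json.loads(pathlib.Path(path).read_text())
    lines = [d["generated_at"]]
    lines += [f"  {name}: {agent['status']}"
              for name, agent in d["agents"].items()]
    return lines
```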
### The post-deploy bot-surface dance
Mr Fixit's bot is bound to the Telegram `default` account, meaning the `TELEGRAM_BOT_TOKEN` env var points to Mr Fixit's token, not to some other bot. This is non-obvious, and there is a whole act of the Ballad of Mr Fixit dedicated to why it has to be this way (see `docs/ballad-of-mr-fixit.md` Act V); the short version is that `default` is the chair at the gateway table that can receive inbound messages, and a named account cannot. Named accounts can send but they can't receive. So if your Mr Fixit binding is `telegram:fixit` with a named account, you'll be able to receive his status reports but you won't be able to message him back and get a reply. Fix: bind his bot as `default` and put its token in `TELEGRAM_BOT_TOKEN`.
After deploy, run the bot-surface scripts from Ch 05 to set his slash commands and his short/long descriptions:
```bash
bash ~/repo/scripts/set-bot-commands.sh
bash ~/repo/scripts/set-bot-descriptions.sh
```
Both are idempotent. Both also run automatically via the entrypoint hooks at +25s and +30s after container start, so a `docker compose restart` re-applies both; on first deploy, though, you want to run them once immediately rather than wait.
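Under the hood, both scripts end in Telegram Bot API calls (`setMyCommands` and the description setters). A stdlib-only sketch of the commands half; the command names here are illustrative, not Mr Fixit's real menu:

```python
import json
import urllib.request

def build_set_my_commands(token, commands):
    """Build the Bot API setMyCommands request; send it with urlopen()."""
    payload = json.dumps({
        "commands": [{"command": c, "description": d}
                     for c, d in commands.items()]
    }).encode()
    return urllib.request.Request(
        f"https://api.telegram.org/bot{token}/setMyCommands",
        data=payload,
        headers={"Content-Type": "application/json"})

# Example (illustrative command set):
#   req = build_set_my_commands(os.environ["TELEGRAM_BOT_TOKEN"],
#                               {"status": "Fleet health summary"})
#   urllib.request.urlopen(req)   # idempotent: re-setting is harmless
```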
## Pitfalls you'll hit
🧨 **Pitfall.** Binding Mr Fixit's bot as a named `telegram:fixit` account instead of `telegram:default`. Why: named accounts can send but cannot receive. Your Mr Fixit will cheerfully push status reports at you and completely ignore any message you send back, because inbound messages route to the `default` account and `default` isn't bound to him. I once removed the `default` account entirely while "cleaning up unused bots," which also broke inbound routing for every other agent in the fleet; that debugging saga is Ballad Act IV. How to avoid: Mr Fixit's bot is the `default` account. Put his token in `TELEGRAM_BOT_TOKEN`, bind him to `telegram:default`, and don't delete the `default` entry even if it looks like a leftover from a tutorial.

🧨 **Pitfall.** Mr Fixit's `cron-self-check` yo-yo'ing retired crons back into the LLM scheduler. Why: `cron-self-check` reads `expected-crons.json` and re-registers anything "missing" from `openclaw cron list`. If the file still lists crons that were intentionally retired or moved to host cron (for example the old per-agent `heartbeat-check` crons that became the host-side `fleet-health` probe), Mr Fixit will re-create them every midnight UTC, complete with their original messages. The symptom is a Telegram alert at 00:01 UTC listing a handful of crons as "re-registered" after you just spent a weekend retiring them. The worst case: a re-registered `morning-status` LLM cron firing a duplicate morning brief eighty minutes after the host-side one already ran. How to avoid: prune `expected-crons.json` every time you retire or move a cron. Also, rewrite the `cron-self-check` message to explicitly enumerate the crons it should re-register and to ignore everything else ("do not re-register anything not in this list even if it looks missing"). The full post-mortem is in `DEPLOY.md` gotcha 15.

🧨 **Pitfall.** Confabulating a diagnosis when asked about an expired approval or an ambiguous cron firing. Why: Mr Fixit has the broadest tool access in the fleet, and when you ask him "what fired approval X?" he will occasionally invent a plausible-sounding answer from symptoms rather than running the diagnostic tool first. On 2026-04-11 mine did exactly this: guessed `heartbeat-check` for an approval that actually came from `security-audit`, then proposed reverting a same-day commit without checking `git log`. The fix to the process, not just the incident: I wrote an explicit diagnostic-discipline section into his `SOUL.md` with five rules (cite evidence before root-causing, check git log before reverting, match cron schedules to timestamps, one report one diagnosis one fix, never ask the human to approve a write that hides a permissions bug). Then I put him on a 14-day probation with a failure ledger at `probation.md` and a `retire.sh` script; one P1/P2/P3 failure during the probation window auto-retires him via `bash agents/fix-it/retire.sh`. This is still in progress as I write; his probation ends 2026-04-25. How to avoid: if your infra agent has broad privileges, assume it will eventually confabulate under load and write the diagnostic-discipline rules into the `SOUL.md` before that happens. A failure ledger is cheap; retiring and rebuilding an infra agent who lost your trust is not.

🧨 **Pitfall.** Invoking Claude Code via ACP instead of shell. Why: Mr Fixit can invoke Claude Code as a repair tool for complex diagnostics, and there are two ways to do that: a one-shot shell command (`claude -p`) and an ACP-dispatched session. ACP creates a persistent session bound to the Telegram thread, and once that binding is active, every subsequent message to the fox's chat routes to Claude Code instead of to Mr Fixit. Your Mr Fixit channel becomes a Claude Code session you cannot exit, because `/stop`, `/new`, and `/reset` all reset the session but leave the thread binding intact. The full story is Ballad Act III. How to avoid: use `claude -p` as a one-shot shell command, always with `--add-dir ~/Dropbox/openclaw-backup/`, and never enable ACP dispatch (`oc config set acp.dispatch.enabled false`). Mr Fixit's `SOUL.md` says this explicitly; leave it there.

🧨 **Pitfall.** `scp`-ing a local `openclaw.json` onto the VPS to "fix" a config drift. Why: `openclaw.json` stores every agent's channel bindings, registrations, and account mappings. Overwriting it from a local copy wipes all of them, in the fleet-bricking sense. Ch 07-0 flags this in its "two rules that didn't fit earlier," and it is the one place where a five-second move destroys every agent's ability to receive Telegram messages. How to avoid: never `scp` `openclaw.json`. Use `openclaw config set` inside the container (via the `oc()` wrapper) to change specific fields, or pull the live file to your laptop with `ssh openclaw@<vps> "cat ~/.openclaw/openclaw.json" > local.json` as a read-only backup.

🧨 **Pitfall.** Letting Mr Fixit push to GitHub without a pre-push scan. Why: Mr Fixit is the only agent in the fleet with `git push` privileges; the others commit freely to `~/repo/` but only he is trusted to push to the remote. He is also the agent most likely to have a cron scheduled by an LLM that silently writes something it shouldn't (a token, a chat ID, a local path). If that commit goes out without a safety scan, you're rewriting history with `git-filter-repo` before breakfast. How to avoid: Mr Fixit's push workflow runs `scripts/pre-push-check.sh` before `git push origin master`, and the script hard-fails on tracked secrets, `.env` files, unsuffixed per-agent configs, oversized binaries, and empty commit messages. If the script finds something, he alerts the human on Telegram and refuses to push. I've had that refusal fire twice. Both times I was grateful.
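For a feel of what a pre-push secret scan looks like, here is a sketch that checks the added lines of a `git diff` against a few secret shapes. The patterns are illustrative only; the real `pre-push-check.sh` carries more checks than this (tracked `.env` files, oversized binaries, empty messages):

```python
import re

# Illustrative secret shapes: a Telegram-bot-token-like string,
# a private key header, and a hard-coded chat ID.
SECRET_PATTERNS = [
    re.compile(r"\d{6,}:[A-Za-z0-9_-]{30,}"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"(?i)chat_id\s*[=:]\s*\d{5,}"),
]

def scan_diff(diff_text):
    """Return the added lines of a diff that look like secrets;
    a non-empty result means refuse the push and alert the human."""
    hits = []
    for line in diff_text.splitlines():
        # '+' lines are additions; '+++' is the file header, not content.
        if line.startswith("+") and not line.startswith("+++"):
            if any(p.search(line) for p in SECRET_PATTERNS):
                hits.append(line)
    return hits
```

Scanning only added lines keeps the gate fast and avoids re-flagging history; anything already pushed is a `git-filter-repo` problem, not a gate problem.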
## See also
- Ch 05, Infra setup: the deploy tool, the bot-commands clobber, the shared brain that Mr Fixit validates.
- Ch 06, Intro to agents: the script contract, the exec-approvals model, the two-kinds-of-crons distinction, the split-brain `BOOTSTRAP.md` story.
- Ch 07-0, Your first agent: the general 8-step deploy arc Mr Fixit inherits from.
- Ch 08, Security and hardening: the `chattr +i` immutable-identity story, the exec-approvals drift detection, and the defense-in-depth posture Mr Fixit enforces.
- `agents/fix-it/SOUL.md.example`: the full values and boundaries document, including the diagnostic-discipline section that exists because of the probation incident.
- `agents/fix-it/manifest.json.example`: the 10-cron manifest in full, with all cron messages.
- `ops/scripts/fleet-health-host.sh` and `ops/scripts/morning-status-host.sh`: the host-side crons that observe the container from outside.
- `DEPLOY.md`: the ten `deploy.py` safeguards and the running gotcha list (gotcha 15 is the cron-self-check yo-yo covered above).
- `docs/ballad-of-mr-fixit.md`: the five-act tragedy. Required reading at least once, ideally before you deploy the fox rather than after.