# Mr Fixit 🦊
Last updated: 2026-04-13 · Reading time: ~30 min · Difficulty: hard
TL;DR
- Mr Fixit is the infrastructure fox. He watches every other agent's health, runs brain validation and Dropbox conflict scans, archives stale facts monthly, runs a weekly security audit, gates every `git push` to the remote, and escalates to you on Telegram when something is off. He is silent on all-clear, terse on alert, and never chatty.
- Deploy him first. He is the training-wheels agent: the one where you discover every silent failure in the deploy pipeline on an agent whose blast radius is your monitoring story rather than your credit card.
- Mr Fixit's first deploy is a minefield. I hit eight silent failures on mine, none of which produced a grep-able error. The "What makes him hard" section below is the war story; read it before you run `deploy.py fix-it`, not after.
- His current cron surface is ten agent-side crons in `agents/fix-it/manifest.json` plus three host-side crons (`fleet-health`, `morning-status`, and `morning-fleet-deliver`) that the manifest does not register. A lot of Mr-Fixit-adjacent work that used to live as in-container LLM crons has been migrated off the OpenClaw dispatch queue in the last few days, because none of it needed LLM judgment. Ch 06 has the full two-kinds-of-crons distinction.
- Infra agents with broad privileges will eventually confabulate a diagnosis. My Mr Fixit is currently on a 14-day probation for doing exactly that. See the pitfalls section for the mechanism and the mitigation.
## Meet the agent
In Richard Scarry's Busytown, Mr Fixit was a fox in overalls who claimed he could fix anything. His shelves fell. His pipes leaked. His electrical work was, charitably, experimental. The townspeople called him anyway because he was cheap and he brought cookies. This Mr Fixit, the one that lives in my Hetzner box and runs ten scheduled crons against a file-based shared brain, is named in the spirit of what his Busytown forebear aspired to and never quite achieved. He keeps the lights on. He watches the health of every other agent in the fleet and escalates when something looks wrong. He runs the monthly archival and the weekly security audit and the daily cron-self-check. He does not bring cookies, and his repairs work on the first try most of the time. Checking… fixed.
## Why you'd want one, and why you might not
Mr Fixit is the fleet's canary. He is the agent that wakes up before anyone else, reads `fleet-health.json`, and tells you whether the rest of your agents made it through the night. He archives stale facts on the first of every month. He scans for Dropbox conflict files every two hours. He runs an `openclaw security audit` every Sunday. He enforces the "only Mr Fixit pushes to GitHub" convention by running `scripts/pre-push-check.sh` before every `git push origin master`. When he is behaving, he is the most reliable and least noisy agent in the fleet, and in a multi-agent fleet he pays for himself in the first night of failures he catches.
Why you might not deploy him. If you are running exactly one agent (say, Lowly Worm for a daily news digest and nothing else), Mr Fixit's monitoring surface is larger than the thing he's monitoring. You don't need a canary for the canary. A single-agent fleet can skip him and rely on you noticing when the Telegram messages stop arriving. If you are running more than one agent, though, skip him at your peril. The first morning you would have caught an overnight failure by opening `fleet-health.json` and instead discover it by wondering why your news agent hasn't sent a digest in 36 hours is the morning you'll regret it.
## What makes Mr Fixit hard
It is not the code. Mr Fixit's scripts are the simplest in the fleet: a heartbeat probe, a brain-validation runner, a Dropbox conflict scanner, a tar-everything archival job, a security-audit wrapper. None of them are hard. What makes Mr Fixit hard is that he is the first agent you deploy, which means every silent failure in the deploy pipeline hits his workspace first, and most of those failures produce symptoms that do not point at their causes.
I hit the following on my first Mr Fixit deploy. None of them appeared in any CLI documentation I had access to. None of them produced a grep-able error. Each one cost me somewhere between twenty minutes and two hours to diagnose. Read them before you deploy, not after.
**War story: the first-deploy minefield.**

1. **OAuth wizard `invalid_scope` on first registration.** The first time I ran `oci agents add fix-it`, the OAuth flow inside the gateway rejected the scope list with a generic `invalid_scope` error. No clue which scope was at fault. The fix, which I had to discover by diffing the new agent's workspace against the `main` agent's, was to copy `auth-profiles.json` from `main` into the new agent's workspace before the first registration attempt. `main` had a working profile; `fix-it` had whatever the wizard was trying to build fresh. Every subsequent agent in the fleet has inherited that profile via the same copy.
2. **`policy: null` default in the manifest blocks everything.** An empty exec-approvals stanza in `manifest.json` gets serialized to OpenClaw as `policy: null`, which the exec layer interprets as "ask on every command." Every cron hit an approval prompt, and since the crons were `--no-deliver`, the prompts never reached me; they just piled up in the approval queue while the crons silently returned `exec blocked by approval policy`. Fix: set `policy: full` in the manifest's `approvals` stanza explicitly. Ch 06 has the full exec-approvals model; the specific symptom here is that "no approvals configured" is not the same thing as "permissive approvals configured," and the default is the worst of both.
3. **`BOOTSTRAP.md` shadowing `IDENTITY.md`.** Covered in detail in Ch 06 and the OpenClaw-onboarding subsection of Ch 07-0. First symptom I saw: the fox introduced himself to me on Telegram with "I just came online. What should I call you?" while his correct `SOUL.md` and `IDENTITY.md` sat perfectly deployed one directory over. Fix: `deploy.py` now auto-deletes `BOOTSTRAP.md` on every deploy, but on my first deploy it didn't yet, and I spent two hours convinced I'd installed the wrong files.
4. **`exec bash wrapper.sh` reset `BASH_SOURCE` and broke the deploy wrapper.** Every agent used to have its own `deploy.sh` that `exec`-ed a shared wrapper script. The wrapper used `${BASH_SOURCE[1]}` to detect which agent had called it, and it refused to run if the caller wasn't recognized. `exec` replaces the shell process, which resets `BASH_SOURCE`, so `[1]` was always empty and the wrapper always refused. Fix: each agent now sets `OPENCLAW_AGENT_ID` as an environment variable before `exec`-ing the wrapper, and the wrapper reads the env var instead of inspecting the bash call stack. (This failure mode got replaced entirely by `deploy.py`, but it lives on as a cautionary tale about inspecting call stacks in `exec`-wrapped scripts.)
5. **`/setcommands` via @BotFather is a separate step, and OpenClaw clobbers it.** You can't set a bot's slash-command menu via the OpenClaw CLI. You can only set it via the Telegram Bot API directly (`setMyCommands`) or via @BotFather's `/setcommands` command in an interactive chat. I did neither on my first deploy, and Mr Fixit's Telegram surface was the default "Welcome to your new bot" menu for several days. Worse: once I did set it, OpenClaw's next channel-sync clobbered my custom commands with its built-in defaults. The full fix is in Ch 05's "bot commands and descriptions" section: set `commands.native: false` in `openclaw.json` and run `scripts/set-bot-commands.sh` as an entrypoint hook, both, because they fail in different scenarios.
6. **Parser format drift is silent.** Mr Fixit reads several config files at runtime: `exec-approvals.json`, the agent roster, the cron manifest, his own `CRONS.md`. If any of those change format in a way the parser doesn't expect, the parse fails silently, the parser falls back to defaults, and Mr Fixit happily proceeds with stale or empty configuration. I've been bitten twice by this: once when an OpenClaw upgrade added fields to `exec-approvals.json` and my parser dropped anything it didn't recognize, and once when I hand-edited `CRONS.md` and left a trailing comma that took the parser to an empty dictionary. Fix: the validation scripts now explicitly `assert` on the schema and fail loudly rather than silently.
7. **Windows line endings.** On my first Terraform + Docker rebuild, the entrypoint script died at container boot with `/usr/bin/env: 'bash\r': No such file or directory`. I had edited the entrypoint on Windows and `scp`-d it to the VPS with CRLF line endings intact. `sed -i 's/\r$//'` fixes it once; configuring your editor to use LF for Linux-destined files fixes it permanently. `ops/scripts/crlf-scan.py` is the preventative version: point it at a file or a directory before SCPing and it'll detect (`--fix` normalizes). This one is covered in Ch 03's pitfalls, but it always bites the first deploy, and Mr Fixit is the first deploy.
8. **Status-file drift and the three-layer root cause.** On 2026-04-14 at 12:03 UTC, Mr Fixit's `brain-validation` cron fired and escalated one FAIL to my Telegram: `Bad header: agents/family-calendar.status.md — expected '# {Name} — Status', got: '# family-calendar status'`. The obvious fix was a one-line edit to the file header. I made it, brain-validation went green, and I reported "done." I was wrong by about 22 hours: that's how long it would've taken the next `morning-briefing` cron tick to overwrite my fix, because Mistress Mouse's cron prompt said "5) Update your status file." and the agent LLM was writing the whole file from scratch every tick, picking a header of its own making. The obvious fix was a symptom-layer fix. The real fix lived two layers down. Layer 1 (the surface): edit the file header to match the canonical form. Holds for one cron tick. Layer 2 (the reading side): retire `validate.py`'s `check_agent_files()` entirely, because post-R6 the per-agent `*.status.md` files are legacy; `fleet-health.json` is the authoritative health surface, nothing reads the status files anymore, and the validator checking their headers was a leftover from the R1 era that nobody went back and pruned. Layer 3 (the writing side): strip "Update your status file" from every cron prompt in every agent (6 agents, 17 cron prompts) and replace it with "DO NOT touch `~/Dropbox/openclaw-backup/agents/<agent>.status.md`; `fleet-health.json` (R6 orchestrator) is the authoritative health source." Then add the exact string "Update your status file" to Safeguard 9's forbidden-patterns list as a regression guard, so any future deploy that re-introduces the legacy clause is refused at the gate (exit 8). The lesson I took from it: the symptom layer, the reading layer, and the writing layer are usually three different places, and a surface fix almost always leaves two of them intact. Also: infra retirements leave prose fossils. R6 migrated the health surface from `status.md` to `fleet-health.json` weeks earlier, but nobody went back and rewrote the cron prompts that told LLMs to keep writing the retired surface. Every major infra retirement should end with a grep across all cron prompts for the thing you just retired, and a regression guard in Safeguard 9 for the phrase that used to be load-bearing.
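That grep-plus-guard step is cheap to mechanize. Here is a minimal sketch of a forbidden-phrase gate; the function name, file layout, and scan scope are my illustration, not Safeguard 9's actual code, but the exit-8-on-hit behavior mirrors the gate described above:

```python
import pathlib

# Phrases that were once load-bearing but now signal a regression to a
# retired surface. "Update your status file" is the one from the war story;
# extend the list with each retirement.
FORBIDDEN_PHRASES = [
    "Update your status file",
]

def scan_prompts(root):
    """Return (path, phrase) pairs for every forbidden phrase found in
    any cron-prompt file under root. A deploy gate exits 8 on any hit."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in {".json", ".md"}:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for phrase in FORBIDDEN_PHRASES:
            if phrase in text:
                hits.append((path, phrase))
    return hits

# Usage in a deploy gate (sketch):
#   hits = scan_prompts(pathlib.Path("agents/"))
#   if hits:
#       sys.exit(8)   # refuse the deploy
```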
Eight silent failures across one agent. None of them produced an error message that pointed at the fix. Each of them is a one-line change once you know what the fix is. Every subsequent agent in this guide benefits from the fact that these were all debugged against Mr Fixit first β which is why "deploy Mr Fixit first" is the first rule of Ch 07-0's deployment order.
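The CRLF bite above is also the easiest to guard against programmatically. A minimal sketch of the idea, not the real `ops/scripts/crlf-scan.py` (whose internals I'm guessing at):

```python
import pathlib

def has_crlf(path):
    """True if the file contains Windows CRLF endings -- the
    "/usr/bin/env: 'bash\\r'" failure mode at container boot."""
    return b"\r\n" in pathlib.Path(path).read_bytes()

def normalize_lf(path):
    """Rewrite the file in place with LF-only endings (a --fix behavior)."""
    p = pathlib.Path(path)
    p.write_bytes(p.read_bytes().replace(b"\r\n", b"\n"))
```

Run the check over anything destined for the VPS before you `scp` it, and the container never sees a carriage return.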
## Deployment walkthrough
The general arc in Ch 07-0 applies in full. What follows is the Mr-Fixit-specific material: workspace paths, manifest details, smoke-test, and the host-side crons you need to install separately.
### The 10-cron manifest
`agents/fix-it/manifest.json.example` declares ten OpenClaw-side crons:
| Cron | Schedule | Delivers? | What it does |
|---|---|---|---|
| `brain-validation` | `0 */6 * * *` | silent on pass | `python3 ~/Dropbox/openclaw-backup/scripts/validate.py`; alerts on any failure |
| `conflict-scan` | `0 */2 * * *` | silent on clean | `find … -name '*conflicted copy*'`; alerts on any match |
| `file-size-monitor` | `0 12 * * *` | silent on clean | `find … -size +500k`; alerts on oversized files |
| `monthly-archival` | `0 3 1 * *` | always | moves stale facts and completed tasks older than 90 days to `archive/YYYY-MM/` |
| `security-audit` | `0 4 * * 0` | always | runs the wrapped `openclaw security audit --deep` and forwards the formatted report |
| `update-check` | `0 4 * * 3` | always | `openclaw update` to check for a new version; does not apply it |
| `cron-self-check` | `0 0 * * *` | silent on pass | reads `expected-crons.json` and re-registers anything missing from `openclaw cron list` |
| `obsidian-briefing` | `10 12 * * *` | silent on success | runs the Obsidian briefing generator from Ch 05 |
| `probation-end-reminder` | date-gated | always | pings me on a specific date to decide whether an agent stays on probation |
| `workspace-snapshot` | `30 3 * * *` | silent on success | tars each agent's workspace to Dropbox for regression recovery |
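As an illustration of what a job like `monthly-archival` boils down to, here is a sketch of a move-stale-files-by-month pass. The `facts/*.md` layout and the function name are my assumptions, not the real script:

```python
import pathlib
import shutil
import time

def archive_stale(brain, days=90, now=None):
    """Move fact files untouched for `days` days into archive/YYYY-MM/,
    bucketed by the month the archival ran. Returns the new paths."""
    now = now if now is not None else time.time()
    cutoff = now - days * 86400
    dest = brain / "archive" / time.strftime("%Y-%m", time.localtime(now))
    moved = []
    for f in sorted(brain.glob("facts/*.md")):   # assumed brain layout
        if f.stat().st_mtime < cutoff:
            dest.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(dest / f.name))
            moved.append(dest / f.name)
    return moved
```

The deliberate choice is to bucket by the month the archival ran rather than the month the fact was written, so one cron tick produces exactly one new archive directory.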
Every cron message opens with the EXEC DISCIPLINE preamble from Ch 06's script contract — "run any `python3 <path>` command exactly as written, no shell wrapping, no chaining, no exit-code capture, scripts print one JSON line with a status field" — and every silent cron explicitly returns the empty string on success ("zero bytes, no narration, not even 'NO output'") to avoid burning LLM tokens on no-op acknowledgements.
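The script half of that contract is simple to honor. A minimal sketch of a contract-compliant check (the `report` helper and `run_check` shape are mine, not OpenClaw's):

```python
import json

def report(status, **detail):
    """Build the one JSON status line the script contract requires."""
    return json.dumps({"status": status, **detail})

def run_check(problems):
    """Print exactly one JSON line and return a shell-style exit code.
    The LLM side of the contract then replies with zero bytes on ok."""
    if problems:
        print(report("fail", problems=problems))
        return 1
    print(report("ok"))
    return 0
```

Keeping the output to one machine-parseable line is what lets the cron prompt say "no narration": the LLM never has to summarize, only to stay silent or forward the line.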
### Host-side crons (not in the manifest)
Three Mr-Fixit-adjacent crons live entirely on the VPS host crontab rather than in `manifest.json`. They are installed by `ops/scripts/install-host-cron.sh`:
| Host cron | Wrapper | Schedule | What it does |
|---|---|---|---|
| `fleet-health` | `ops/scripts/fleet-health-host.sh` | `*/15 * * * *` | Invokes `ops/scripts/fleet-health.py`, which calls each agent's `probe()` and writes `fleet-health.json`. If any agent reports non-ok, the wrapper pushes one aggregated Telegram alert to Mr Fixit's bot. Replaced Mr Fixit's old per-agent in-manifest `heartbeat-check` cron during the R3 fleet-health consolidation. |
| `morning-status` | `ops/scripts/morning-status-host.sh` | `30 10 * * *` | Runs the Mr-Fixit-owned `scripts/morning-status.py`, which reads `fleet-health.json` + `KNOWN_ISSUES.md` and writes the morning brief into `cache/morning-brief-ready.txt`. Replaced the old in-manifest LLM `morning-status` cron; the work is deterministic Python, not LLM judgment, and the 600-second cron budget was wildly overkill for it. |
| `morning-fleet-deliver` | `ops/scripts/morning-fleet-deliver-host.sh` | `0 12 * * *` | At noon UTC / 5 am PT, reads the cached morning briefs written by Mr Fixit and every other agent that produced one, chunks them into Telegram-sized messages, and delivers them. Moved out of the in-container cron loop when the compose-vs-deliver split made the deliver step trivially deterministic. |
The pattern across all three: work that never needed LLM judgment and never needed to live inside the OpenClaw dispatch queue in the first place. An earlier version of Clawford had per-agent `heartbeat-check` crons and an LLM-composed `morning-status` cron inside each agent's manifest; the R3 fleet-health consolidation, the R4 morning-status refactor, and the subsequent `morning-fleet-deliver` split moved all of them off the LLM dispatch queue, because recursive `docker exec` from inside the container hit race conditions and the 600-second LLM budget was overkill for work that finishes in under a second. Mr Fixit now has exactly one in-container probe, `agents/fix-it/scripts/heartbeat.py::probe`, and it only checks `fleet-health.json` freshness: if `generated_at` is more than 30 minutes old, it alerts that the orchestrator itself is broken, which is how Mr Fixit escalates a dead fleet-health to the human.
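That freshness check reduces to one timestamp comparison. A sketch of roughly what `heartbeat.py::probe` must do; the ISO-format `generated_at` field and the return shape are my assumptions:

```python
import datetime
import json
import pathlib

STALE_AFTER = datetime.timedelta(minutes=30)

def probe(path, now=None):
    """Alert if fleet-health.json is missing or stale -- i.e. the
    orchestrator that writes it is itself down."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    path = pathlib.Path(path)
    if not path.exists():
        return {"status": "alert", "reason": "fleet-health.json missing"}
    generated = datetime.datetime.fromisoformat(
        json.loads(path.read_text())["generated_at"])
    if now - generated > STALE_AFTER:
        return {"status": "alert", "reason": "orchestrator stale"}
    return {"status": "ok"}
```

The design point is the inversion: the host cron watches the agents, and this one tiny in-container probe watches the host cron.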
`install-host-cron.sh` is also defensively written against the same yo-yo class of bug that hit `cron-self-check` (below): it carries a list of stale crontab markers from previous refactors and removes them on every run, so a re-run of the install script can't accidentally reintroduce a cron that got retired. There are several other host crons registered on the same box for other agents (Costco JWT refresh for Hilda Hippo, LinkedIn keep-alive and engagement polling for Lowly Worm, reminder-check for Mistress Mouse); see each agent's subchapter for their host-cron surface.
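The stale-marker defense is a pure text transform over the crontab, which makes it easy to test. A sketch (the marker string is invented; the real list lives in `install-host-cron.sh`):

```python
# Markers left behind by previous refactors; any crontab line carrying
# one is removed on every install run, so re-running can't resurrect it.
STALE_MARKERS = [
    "# openclaw:heartbeat-check",   # invented example marker
]

def merge_crontab(current, desired):
    """Return a new crontab text: current minus stale-marked lines,
    plus any desired line not already present. Idempotent by design."""
    kept = [ln for ln in current.splitlines()
            if not any(m in ln for m in STALE_MARKERS)]
    kept += [ln for ln in desired if ln not in kept]
    return "\n".join(kept) + "\n"
```

In real use the wrapper would feed this `crontab -l` output and pipe the result back through `crontab -`; the invariant worth testing is that running it twice changes nothing.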
### Smoke test
After `deploy.py fix-it`, two things to verify:

- Agent-side: `oc cron list` shows all ten crons; fire `conflict-scan` or `brain-validation` as a sanity check and confirm no Telegram output (they're silent-on-success).
- Host-side: fire the host cron directly and read the result:
```bash
bash ~/repo/ops/scripts/fleet-health-host.sh
python3 -c "import json; d=json.load(open('/home/openclaw/Dropbox/openclaw-backup/fleet-health.json')); print(d['generated_at']); [print(f' {k}: {v[\"status\"]}') for k,v in d['agents'].items()]"
```
`fleet-health.json` should have a fresh `generated_at` and show `fix-it: ok`. If it shows `error` with a `fleet-health.json missing` message, the orchestrator hasn't run yet; re-fire the wrapper and check `~/.openclaw/logs/fleet-health-host.log` for the last run's exit code and any traceback.
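The one-liner above, unpacked into a readable helper with the same fields and output shape:

```python
import json
import pathlib

def summarize(path):
    """Return the generated_at stamp plus one 'name: status' line per
    agent -- the same thing the inline python3 -c command prints."""
    d = json.loads(pathlib.Path(path).read_text())
    lines = [d["generated_at"]]
    lines += [f"  {name}: {agent['status']}"
              for name, agent in d["agents"].items()]
    return lines
```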
### The post-deploy bot-surface dance
Mr Fixit's bot is bound to the Telegram `default` account, meaning the `TELEGRAM_BOT_TOKEN` env var points to Mr Fixit's token, not to some other bot. This is non-obvious, and there is a whole act of the Ballad of Mr Fixit dedicated to why it has to be this way (see `docs/ballad-of-mr-fixit.md` Act V); the short version is that `default` is the chair at the gateway table that can receive inbound messages, and a named account cannot. Named accounts can send but they can't receive. So if your Mr Fixit binding is `telegram:fixit` with a named account, you'll be able to receive his status reports but you won't be able to message him back and get a reply. Fix: bind his bot as `default` and put its token in `TELEGRAM_BOT_TOKEN`.
After deploy, run the bot-surface scripts from Ch 05 to set his slash commands and his short/long descriptions:
```bash
bash ~/repo/scripts/set-bot-commands.sh
bash ~/repo/scripts/set-bot-descriptions.sh
```
Both are idempotent. Both also run automatically via the entrypoint hooks at +25s and +30s after container start, so a `docker compose restart` re-applies both; on first deploy, though, you want to run them once immediately rather than wait.
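Under the hood, both scripts end in Telegram Bot API calls (`setMyCommands` and the description setters). A stdlib-only sketch of the commands half; the command names here are illustrative, not Mr Fixit's real menu:

```python
import json
import urllib.request

def build_set_my_commands(token, commands):
    """Build the Bot API setMyCommands request; send it with urlopen()."""
    payload = json.dumps({
        "commands": [{"command": c, "description": d}
                     for c, d in commands.items()]
    }).encode()
    return urllib.request.Request(
        f"https://api.telegram.org/bot{token}/setMyCommands",
        data=payload,
        headers={"Content-Type": "application/json"})

# Example (illustrative command set):
#   req = build_set_my_commands(os.environ["TELEGRAM_BOT_TOKEN"],
#                               {"status": "Fleet health summary"})
#   urllib.request.urlopen(req)   # idempotent: re-setting is harmless
```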
## Pitfalls you'll hit
🧨 **Pitfall.** Binding Mr Fixit's bot as a named `telegram:fixit` account instead of `telegram:default`. Why: named accounts can send but cannot receive. Your Mr Fixit will cheerfully push status reports at you and completely ignore any message you send back, because inbound messages route to the `default` account and `default` isn't bound to him. I once removed the `default` account entirely while "cleaning up unused bots," which also broke inbound routing for every other agent in the fleet; that debugging saga is Ballad Act IV. How to avoid: Mr Fixit's bot is the `default` account. Put his token in `TELEGRAM_BOT_TOKEN`, bind him to `telegram:default`, and don't delete the `default` entry even if it looks like a leftover from a tutorial.

🧨 **Pitfall.** Mr Fixit's `cron-self-check` yo-yo'ing retired crons back into the LLM scheduler. Why: `cron-self-check` reads `expected-crons.json` and re-registers anything "missing" from `openclaw cron list`. If the file still lists crons that were intentionally retired or moved to host cron (for example the old per-agent `heartbeat-check` crons that became the host-side `fleet-health` probe), Mr Fixit will re-create them every midnight UTC, complete with their original messages. The symptom is a Telegram alert at 00:01 UTC listing a handful of crons as "re-registered" after you just spent a weekend retiring them. The worst case: a re-registered `morning-status` LLM cron firing a duplicate morning brief eighty minutes after the host-side one already ran. How to avoid: prune `expected-crons.json` every time you retire or move a cron. Also, rewrite the `cron-self-check` message to explicitly enumerate the crons it should re-register and to ignore everything else ("do not re-register anything not in this list even if it looks missing"). The full post-mortem is in `DEPLOY.md` gotcha 15.

🧨 **Pitfall.** Confabulating a diagnosis when asked about an expired approval or an ambiguous cron firing. Why: Mr Fixit has the broadest tool access in the fleet, and when you ask him "what fired approval X?" he will occasionally invent a plausible-sounding answer from symptoms rather than running the diagnostic tool first. On 2026-04-11 mine did exactly this: guessed `heartbeat-check` for an approval that actually came from `security-audit`, then proposed reverting a same-day commit without checking `git log`. The fix to the process, not just the incident: I wrote an explicit diagnostic-discipline section into his `SOUL.md` with five rules (cite evidence before root-causing, check git log before reverting, match cron schedules to timestamps, one report one diagnosis one fix, never ask the human to approve a write that hides a permissions bug). Then I put him on a 14-day probation with a failure ledger at `probation.md` and a `retire.sh` script; one P1/P2/P3 failure during the probation window auto-retires him via `bash agents/fix-it/retire.sh`. This is still in progress as I write; his probation ends 2026-04-25. How to avoid: if your infra agent has broad privileges, assume it will eventually confabulate under load and write the diagnostic-discipline rules into the `SOUL.md` before that happens. A failure ledger is cheap; retiring and rebuilding an infra agent who lost your trust is not.

🧨 **Pitfall.** Invoking Claude Code via ACP instead of shell. Why: Mr Fixit can invoke Claude Code as a repair tool for complex diagnostics, and there are two ways to do that: a one-shot shell command (`claude -p`) and an ACP-dispatched session. ACP creates a persistent session bound to the Telegram thread, and once that binding is active, every subsequent message to the fox's chat routes to Claude Code instead of to Mr Fixit. Your Mr Fixit channel becomes a Claude Code session you cannot exit, because `/stop`, `/new`, and `/reset` all reset the session but leave the thread binding intact. The full story is Ballad Act III. How to avoid: use `claude -p` as a one-shot shell command, always with `--add-dir ~/Dropbox/openclaw-backup/`, and never enable ACP dispatch (`oc config set acp.dispatch.enabled false`). Mr Fixit's `SOUL.md` says this explicitly; leave it there.

🧨 **Pitfall.** `scp`-ing a local `openclaw.json` onto the VPS to "fix" a config drift. Why: `openclaw.json` stores every agent's channel bindings, registrations, and account mappings. Overwriting it from a local copy wipes all of them, in the fleet-bricking sense. Ch 07-0 flags this in its "two rules that didn't fit earlier," and it is the one place where a five-second move destroys every agent's ability to receive Telegram messages. How to avoid: never `scp` `openclaw.json`. Use `openclaw config set` inside the container (via the `oc()` wrapper) to change specific fields, or pull the live file to your laptop with `ssh openclaw@<vps> "cat ~/.openclaw/openclaw.json" > local.json` as a read-only backup.

🧨 **Pitfall.** Letting Mr Fixit push to GitHub without a pre-push scan. Why: Mr Fixit is the only agent in the fleet with `git push` privileges; the others commit freely to `~/repo/` but only he is trusted to push to the remote. He is also the agent most likely to have a cron scheduled by an LLM that silently writes something it shouldn't (a token, a chat ID, a local path). If that commit goes out without a safety scan, you're rewriting history with `git-filter-repo` before breakfast. How to avoid: Mr Fixit's push workflow runs `scripts/pre-push-check.sh` before `git push origin master`, and the script hard-fails on tracked secrets, `.env` files, unsuffixed per-agent configs, oversized binaries, and empty commit messages. If the script finds something, he alerts the human on Telegram and refuses to push. I've had that refusal fire twice. Both times I was grateful.
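For a feel of what a pre-push secret scan looks like, here is a sketch that checks the added lines of a `git diff` against a few secret shapes. The patterns are illustrative only; the real `pre-push-check.sh` carries more checks than this (tracked `.env` files, oversized binaries, empty messages):

```python
import re

# Illustrative secret shapes: a Telegram-bot-token-like string,
# a private key header, and a hard-coded chat ID.
SECRET_PATTERNS = [
    re.compile(r"\d{6,}:[A-Za-z0-9_-]{30,}"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"(?i)chat_id\s*[=:]\s*\d{5,}"),
]

def scan_diff(diff_text):
    """Return the added lines of a diff that look like secrets;
    a non-empty result means refuse the push and alert the human."""
    hits = []
    for line in diff_text.splitlines():
        # '+' lines are additions; '+++' is the file header, not content.
        if line.startswith("+") and not line.startswith("+++"):
            if any(p.search(line) for p in SECRET_PATTERNS):
                hits.append(line)
    return hits
```

Scanning only added lines keeps the gate fast and avoids re-flagging history; anything already pushed is a `git-filter-repo` problem, not a gate problem.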
## See also
- Ch 05, Infra setup: the deploy tool, the bot-commands clobber, the shared brain that Mr Fixit validates.
- Ch 06, Intro to agents: the script contract, the exec-approvals model, the two-kinds-of-crons distinction, the split-brain `BOOTSTRAP.md` story.
- Ch 07-0, Your first agent: the general 8-step deploy arc Mr Fixit inherits from.
- Ch 08, Security and hardening: the `chattr +i` immutable-identity story, the exec-approvals drift detection, and the defense-in-depth posture Mr Fixit enforces.
- `agents/fix-it/SOUL.md.example`: the full values and boundaries document, including the diagnostic-discipline section that exists because of the probation incident.
- `agents/fix-it/manifest.json.example`: the 10-cron manifest in full, with all cron messages.
- `ops/scripts/fleet-health-host.sh` and `ops/scripts/morning-status-host.sh`: the host-side crons that observe the container from outside.
- `DEPLOY.md`: the ten `deploy.py` safeguards and the running gotcha list (gotcha 15 is the cron-self-check yo-yo covered above).
- `docs/ballad-of-mr-fixit.md`: the five-act tragedy. Required reading at least once, ideally before you deploy the fox rather than after.