
VPS setup

Last updated: 2026-04-13 · Reading time: ~25 min · Difficulty: moderate

TL;DR

  • Use Terraform to provision the VPS, not the Hetzner console. You want the whole thing reproducible from a git repo because you will rebuild it — probably more than once.
  • I run Hetzner cpx31 in their Hillsboro, Oregon datacenter. Other providers work, but the community Terraform module this guide uses targets Hetzner specifically.
  • Lock down SSH (key-only, non-root, restricted CIDRs), put Tailscale on top, and bind the OpenClaw gateway to localhost only. You should not be able to reach the gateway port from the public internet, ever.
  • Residential proxies are a first-class dependency for any agent that needs to reach Amazon or Costco. Buy a sticky residential egress, not a rotating pool, and test it before you deploy Hilda Hippo.
  • Never develop on the VPS. Code lives in local git; the shared brain lives in Dropbox. The VPS just mounts both.

What Ch 03 gets you

You start this chapter with a Hetzner account and an SSH key. You leave it with:

  • A provisioned, SSH-hardened, firewalled VPS reachable via Tailscale.
  • The OpenClaw gateway container running on it, listening on localhost:18789, with a strong gateway token.
  • A residential proxy wired in and tested end-to-end.
  • A first-boot smoke test you can re-run any time you're nervous.

Ch 04 (dev setup) and Ch 05 (infra setup) pick up from there.

Choosing a provider

I run Hetzner cpx31 out of their Hillsboro, Oregon datacenter. Four vCPUs, 8 GB RAM, 160 GB SSD, currently ~$30/month. It's comfortable for all six agents with headroom, and Hetzner's combination of cheap SSD, a usable console, and a US-West location that doesn't add transatlantic latency to every Telegram round-trip has kept me on it.

Other providers work — DigitalOcean, Linode, Vultr, OVH, AWS Lightsail. The tradeoff is that the community Terraform module this guide leans on is Hetzner-specific. If you port it to another provider, most of the work is in the provider-specific Terraform resources; the rest of Clawford doesn't care what's underneath.

A few things to check if you're provider-shopping:

  • IP reputation. Some datacenter ranges are more aggressively blocked by Amazon, Cloudflare, and Costco than others. Hetzner's Hillsboro block has been fine for me for gateway traffic; for scraping I go through a residential proxy anyway (see below).
  • Snapshot / backup pricing. Hetzner charges 20% of the VPS price for automated backups (daily, seven retained; one-off snapshots are billed separately, per GB). Cheap, and I recommend having them on.
  • Instance availability by region. The tier you want may not be in the region you want. When I was provisioning I picked cpx31 partly because the slightly larger cpx32 wasn't available in my region of choice. If you're shopping among regional SKUs, pick one step above what you think you need — the price delta is small and headroom is cheap.

🔦 Tip. Ch 02's links to the OpenClaw Hetzner install guide and the openclaw-terraform-hetzner module are the canonical sources for exact command sequences. This chapter gives you the shape of the deploy and the choices you'll make — those upstream docs are where to go when you want a command to paste.

What you'll need on your local machine

Before you open a terraform directory, install locally:

  • terraform (HashiCorp installer, Homebrew, or winget)
  • hcloud (the Hetzner CLI)
  • An SSH key pair — ssh-keygen -t ed25519 if you don't have one. This key is the only credential that grants root-level access to the VPS until Tailscale is up; protect it accordingly.
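
If you need to mint a key, the bullet above sketched end-to-end (demo paths only — your real key belongs in ~/.ssh/ with a passphrase):

```shell
# Demo key in a throwaway directory; -N "" (empty passphrase) is for the demo
# only — protect the real key with a passphrase and keep it in ~/.ssh/.
keydir="$(mktemp -d)"
ssh-keygen -t ed25519 -N "" -C "openclaw-vps" -f "$keydir/openclaw_ed25519" -q
cat "$keydir/openclaw_ed25519.pub"   # the public half is what Hetzner gets
```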

You'll also need:

  • A Hetzner Cloud API token (console → Security → API Tokens). Generate with read/write scope for the project you'll provision into.
  • A GitHub Personal Access Token with read:packages scope, for pulling the OpenClaw Docker image from GHCR during the first build.

All of these credentials go into environment variables read by Terraform. None of them should ever hit git.
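
A git-ignored env file you `source` before running Terraform is the usual shape. The variable names below are illustrative — HCLOUD_TOKEN is what the hcloud provider reads, but confirm the TF_VAR_ names against the module's variables.tf:

```shell
# secrets/terraform.env — git-ignored; `source` it before terraform plan/apply.
# TF_VAR_ names here are assumptions: check the module's variables.tf.
export HCLOUD_TOKEN="<hetzner-api-token>"          # read by the hcloud provider
export TF_VAR_ssh_key_fingerprint="<fingerprint>"  # from hcloud ssh-key create
export TF_VAR_ghcr_username="<github-username>"    # for the GHCR image pull
export TF_VAR_ghcr_token="<pat-with-read:packages>"
```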

The Terraform flow

The upstream openclaw-terraform-hetzner module provisions the server, installs Docker, configures cloud-init, sets up the firewall, and optionally installs Tailscale. The shape:

  1. Clone the module and the Docker config repo. One gives you the infrastructure-as-code; the other gives you the Dockerfile + OpenClaw gateway config that will live on the VPS.
  2. Create a Hetzner context and upload your SSH key. hcloud context create with your API token, then hcloud ssh-key create to register the public key with the project. Save the fingerprint — you'll pass it to Terraform.
  3. Fill in config/inputs.sh. API token, SSH key fingerprint, CIDR allowlist, path to the Docker config directory, GHCR username and token. The example file ships as inputs.example.sh; copy it and edit locally.
  4. Fill in secrets/openclaw.env. The gateway token (openssl rand -hex 32), the gateway port (18789 by convention), and placeholders for bot tokens that get filled in later when the bots exist. The bind address does not go in the env file — it's handled by the Docker ports: mapping in docker-compose.yml, which pins the host-facing port to loopback. See the firewall section below.
  5. terraform plan and review. Look for: new server, firewall resource, volume, optional Tailscale resource. If you see anything else, stop and read the plan. Terraform state is the system's memory of what it thinks it's doing, and a surprising plan is the first sign that its memory is out of date.
  6. terraform apply. Type yes. Wait ~2 minutes for the VPS to come up. Note the output IP — save it somewhere you can grep for later.
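
Step 4's gateway token is just 32 random bytes, hex-encoded:

```shell
# 32 random bytes -> 64 hex characters. Paste the value into secrets/openclaw.env.
GATEWAY_TOKEN="$(openssl rand -hex 32)"
echo "${#GATEWAY_TOKEN}"   # 64
```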

⚠️ Warning. The example inputs.sh and secrets/openclaw.env are git-ignored for a reason. Double-check that they're in .gitignore before you fill anything in, and stage by filename rather than git add -A on anything under infra/ or secrets/.

At this point you have a running VPS with Docker installed, a non-root openclaw user, and an SSH key attached to root. You do not have OpenClaw running yet — that's the next section — and you do not have the box locked down to where I'd want it before I trust it with credentials. That's the section after that.

SSH hardening and the Tailscale overlay

Key-only SSH is a good start, not the end state. The VPS after terraform apply is reachable from 0.0.0.0/0 on port 22 by default. That's fine for ~15 minutes of setup work; it is not fine as a steady state. Two improvements, cheap and high-leverage:

Narrow the CIDR, or close SSH entirely

The fastest improvement is to set TF_VAR_ssh_allowed_cidrs to just your current IP (["203.0.113.42/32"] — not real), re-run terraform apply, and the Hetzner firewall will refuse SSH from anywhere else. That's fine if you always connect from the same place. It falls apart the minute you travel or your ISP rotates your IP.
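
The shape of that change as a sketch (same placeholder IP as above — substitute your own before applying):

```shell
# Placeholder IP from the text -- replace with your actual public IP.
MY_IP="203.0.113.42"
export TF_VAR_ssh_allowed_cidrs="[\"${MY_IP}/32\"]"
echo "$TF_VAR_ssh_allowed_cidrs"   # ["203.0.113.42/32"]
# then: terraform apply
```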

The durable answer is Tailscale.

Install Tailscale

Tailscale is a WireGuard mesh overlay. Install it on the VPS (curl -fsSL https://tailscale.com/install.sh | sudo sh && sudo tailscale up), install it on your laptop and phone, and the three of them join the same tailnet with private 100.x.x.x addresses. The VPS becomes reachable at openclaw-prod (or whatever you named the machine) from any device you've logged into Tailscale, and invisible to the public internet at the SSH port.

Concretely, what Tailscale buys you:

  • SSH over the tailnet from any device, regardless of where you're sitting (hotel Wi-Fi, phone hotspot, a cafe in Lisbon — all fine).
  • The ability to close port 22 at the Hetzner firewall entirely (TF_VAR_ssh_allowed_cidrs='[]'), which means every SSH brute-force bot on the internet sees a closed port. This is a surprisingly effective upgrade; the noise in /var/log/auth.log drops to zero overnight.
  • Tailscale Serve lets you expose the gateway web UI to yourself without exposing it to the public internet. No SSH tunnel, no port forward, no TLS cert to manage. Just a URL that only works on your tailnet.

What Tailscale does not buy you:

  • Protection against a compromised device that is already on the tailnet. If someone steals your laptop and it's enrolled, they can reach the VPS. MFA on your Tailscale account is the primary mitigation.
  • Protection against stolen or leaked auth keys. Rotate them every ~90 days and set them to expire.
  • Any protection against OpenClaw-level vulnerabilities. Tailscale gets you into the perimeter; the perimeter is still only as strong as what's inside it.

I set TF_VAR_enable_tailscale=true in every Clawford deploy. If you skip it, you'll need to keep the SSH CIDR allowlist maintained by hand, and you'll need to SSH-tunnel the gateway port whenever you want the UI from your laptop. It works; it's just more friction on every action you take.

Firewall basics

The default firewall posture from the Terraform module is conservative and I haven't needed to change it:

  • Inbound SSH (22/tcp): open only to the CIDRs in ssh_allowed_cidrs, or closed entirely when Tailscale is up.
  • Inbound Tailscale (UDP 41641): open when Tailscale is enabled. This is the WireGuard port.
  • Everything else: denied. No HTTP, no HTTPS, no exposed gateway port. The gateway port is pinned to the host's loopback interface at the Docker level — docker-compose.yml declares ports: ["127.0.0.1:18789:18789"], which means the VPS's public IP does not listen on 18789 at all. Anything trying to reach port 18789 from outside sees a closed port regardless of whether Hetzner's firewall is blocking it.

That loopback port mapping is the primary security boundary, so do not touch it. (Inside the container the gateway binds to 0.0.0.0:18789 — that's fine, it's the container's private network. The 127.0.0.1 on the host side of the port mapping is the load-bearing part.) When you need to reach the gateway UI, use an SSH tunnel (ssh -N -L 18789:127.0.0.1:18789 openclaw@<vps-hostname>) or Tailscale Serve — not a change to the ports: line.
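
A quick way to verify the loopback-only posture from the VPS itself — a hedged sketch assuming `ss` (iproute2) is installed; the helper names are mine:

```shell
# List every TCP listener on the gateway port (ss's fourth column is the
# local address:port).
gateway_binds() { ss -Htln 2>/dev/null | awk '{print $4}' | grep ':18789$' || true; }

# Fail loudly if anything other than 127.0.0.1 is listening on 18789.
assert_loopback_only() {
  bad="$(gateway_binds | grep -v '^127\.0\.0\.1:' || true)"
  if [ -n "$bad" ]; then
    echo "PUBLIC BIND FOUND: $bad"
    return 1
  fi
  echo "loopback-only: OK"
}
```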

⚠️ Warning. If you ever need to debug a "why can't I reach the gateway?" problem, do not "just widen the host-side ports: mapping to 0.0.0.0:18789:18789 for a minute." The gateway token is the only thing between an open port and full control over the fleet, and rate-limiting on the token is minimal. SSH-tunnel or fix Tailscale Serve. Don't open the port.

Starting the OpenClaw gateway

Once the VPS is up and reachable, the OpenClaw container is a separate install step. The canonical command sequence lives in the OpenClaw Hetzner install guide and the Docker config repo's README — I defer to them for exact commands because they move faster than this guide. The high-level flow:

  1. SCP the Dockerfile, docker-compose.yml, entrypoint.sh, and config/ tree onto the VPS under ~/openclaw/ (as the openclaw user). Not docker cp, not scp into the container, not a bind-mount from your laptop. Files on the host survive container recreates; files copied into a container do not.
  2. Drop the secrets/openclaw.env into ~/openclaw/.env. This is the same secrets file you filled in earlier — it contains the gateway token and the bot token placeholders.
  3. docker compose build --no-cache on first install. The Dockerfile multi-stages a Go build (for the Costco CLI wrapper) and a Node base image with Chromium, Python 3, and every other runtime dependency the skills need. The build takes ~5-10 minutes and is the single longest step.
  4. docker compose up -d, then docker compose logs --tail 50 after a minute. You're looking for listening on ws://0.0.0.0:18789 — that's the container-internal bind; Docker's host-side port mapping pins the host-facing port to 127.0.0.1:18789.
  5. Add the oc() and oci() shell helpers to ~/.bashrc on the VPS. These wrap docker compose exec ... openclaw ... so you can run CLI commands inside the gateway container without remembering the full invocation. Ch 10 has the canonical wrapper-function source.
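
The wrapper's shape, for orientation only — Ch 10 has the canonical source, and the service name here is an assumption:

```shell
# Sketch of the oc() helper: run an openclaw CLI command inside the gateway
# container. "openclaw-gateway" is assumed; match your docker-compose.yml.
oc() {
  (cd ~/openclaw && docker compose exec openclaw-gateway openclaw "$@")
}
# oci() is the same idea with an interactive TTY for shells and REPLs.
```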

🧨 Pitfall. Line endings. If you edit entrypoint.sh on Windows and scp it to the VPS, CRLF line endings will make the container fail at boot with bash\r: No such file or directory. How to avoid: configure your editor to use LF for Linux-destined files. If something already slipped past you, ops/scripts/crlf-scan.py will detect (and with --fix, normalize) any CRLF-polluted file or directory on either side of an SCP; sed -i 's/\r$//' ~/openclaw/docker/entrypoint.sh is the one-file version for when you know exactly which file is wrong.
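
The failure and the one-file fix, reproduced locally:

```shell
# Simulate a Windows-edited script: CRLF line endings.
f="$(mktemp)"
printf '#!/usr/bin/env bash\r\necho ok\r\n' > "$f"
sed -i 's/\r$//' "$f"   # strip the trailing CR from every line (GNU sed)
bash "$f"               # runs cleanly now that the \r is gone
```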

Once the gateway is listening, Tailscale Serve (or an SSH tunnel on ssh -N -L 18789:127.0.0.1:18789 openclaw@<vps-hostname>) gets you to the web UI. You'll need the gateway token from .env to actually sign in.

Residential proxies, a first-class dependency

If you are ever going to run Hilda Hippo (the shopping agent), you need a residential proxy. Amazon and Costco both detect Hetzner IPs — and every other major cloud provider's IP ranges — and serve either outright blocks or a persistent series of CAPTCHAs that your headless browser cannot solve. A datacenter IP does not get you to a shopping cart, and I have burned enough evenings confirming that to recommend skipping the experiment.

Sticky vs rotating

The two shapes of residential proxy you'll see:

  • Rotating pool. Each request (or each short window) comes out of a different residential IP. Great for bulk scraping, terrible for anything session-based because your auth cookies were handed out to IP A and the next request comes from IP B — and the site decides you just got phished and invalidates the session.
  • Sticky / session-persistent. One IP stays with you for minutes to hours. Each connection gets its own sticky session via a session identifier encoded in the credentials your provider expects.

You want sticky for shopping workflows. Every Amazon / Costco / LinkedIn arc in this fleet depends on session persistence across scrape → auth → cart → checkout, and a rotating pool breaks every one of those arcs. The good news is this is almost never a pricing decision — most providers (including the one I use) treat rotating vs sticky as a config toggle on the same plan, not as separate tiers. It's a correctness choice, not a cost tradeoff.

DataImpulse

I use DataImpulse's Residential Proxy Premium plan against the US residential pool. Other residential-proxy providers should work on the same general pattern, but I have only tested DataImpulse on the premium pool, so the specifics below are for that provider — and if you pick a different one, treat everything below as the shape of the setup, not the exact strings.

DataImpulse's gateway is gw.dataimpulse.com:823 — a single host and port for both rotating and sticky modes. The session behavior is encoded in the username, not in the port. A working URL has the shape:

http://<login>__<session-spec>:<password>@gw.dataimpulse.com:823

Where <session-spec> is a suffix that encodes country code, rotation behavior, and (for sticky) a session identifier. DataImpulse's dashboard has a copy-pastable "Basic URL example" for whichever mode you've selected — use that rather than hand-constructing the URL. Get the sticky-suffix format from the DataImpulse docs or the dashboard's config panel; don't guess at it, and don't trust a pattern you remember from a different provider. The string has to match what the account expects exactly.

Whichever URL you end up with is a secret: it contains the account password in cleartext. Put it in .env as PROXY_URL and never commit it.

PROXY_URL=http://<login>__<session-spec>:<password>@gw.dataimpulse.com:823

Scripts that need the proxy read PROXY_URL and pass it to Playwright's browser.launch(proxy={...}). Scripts that don't need the proxy just ignore it. The seam is at the Python-subprocess level, not inside the OpenClaw container — which means the scraper processes the agent kicks off route through the proxy, and the gateway itself does not.

Test it before you need it

Before you deploy any agent that will depend on the proxy, run a one-off script from the VPS that:

  1. Launches a Playwright browser (or even just a plain curl) through the proxy.
  2. Loads https://api.ipify.org or https://ifconfig.me.
  3. Prints the resulting IP.

You should see a residential IP in the expected country, not the Hetzner IP. If you see the Hetzner IP, your PROXY_URL isn't being read, the variable isn't reaching the subprocess, or the credentials are wrong. If you see a residential IP in the wrong country, check the country code in the session suffix. Diagnose either case before the first real scrape — it is much easier to debug a one-shot test than to chase a mystery 403 in the middle of a cron run.

For sticky verification, make two back-to-back requests against api.ipify.org using the same sticky session string. Both should return the same IP. If they differ, the session suffix is not doing what you think it is, and any shopping flow that depends on it will silently break at checkout time.
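
Both checks as one sketch — hedged: it assumes PROXY_URL is exported and curl is on the VPS, and the function names are mine:

```shell
# Egress IP as seen through the proxy.
egress_ip() { curl -fsS --max-time 15 -x "$PROXY_URL" https://api.ipify.org; }

# Two back-to-back requests in the same sticky session must agree.
check_sticky() {
  a="$(egress_ip)" || return 1
  b="$(egress_ip)" || return 1
  if [ "$a" = "$b" ]; then
    echo "sticky OK: $a"
  else
    echo "NOT sticky: $a vs $b"
    return 1
  fi
}
```

Run it from the VPS with your real PROXY_URL: `egress_ip` alone covers the three-step test above, and `check_sticky` covers the stickiness check.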

🧨 Pitfall. Hitting the provider's rate limits or account lockout because the first real use burns through your traffic quota on failed-auth retries. Why: a misconfigured sticky session can look like "auth failed → retry → auth failed → retry" and burn gigabytes in minutes. How to avoid: test the proxy in isolation first. Confirm the residential IP. Confirm session stickiness (two requests in the same session should come from the same IP). Only then point Hilda Hippo at it.

Disk, backups, snapshots

A practical note on how the VPS's 160 GB of disk gets used:

  • The shared brain. Tiny today. The whole brain directory lives comfortably under 100 MB right now — a few months of daily notes, people files, commitments, tasks, and a modest fact set. I expect that to grow as I enrich the brain with more facts, more history, and deeper per-person context over time, but cpx31 has headroom even for a much larger brain than this one. Don't size the VPS around today's footprint.
  • Docker images and layers. Not tiny. The OpenClaw gateway image is ~2 GB after the Chromium + Python 3 + Node base + Costco Go build multi-stage. Add image history, dangling layers from rebuilds, and docker system prune -a from time to time.
  • Deploy-backup tarballs. deploy.py writes a pre-deploy tarball to ~/.openclaw/deploy-backups/<agent>-<timestamp>.tar.gz on every deploy. These pile up unless you prune them. Cheap to keep; even cheaper to forget about until you're trying to debug a disk-full event at midnight.
  • Playwright browser cache and Camoufox state. Browser automation eats a surprising amount of disk in cookies, local storage, and cached page assets. Watch ~/.cache/camoufox/ and ~/.cache/ms-playwright/ if you start seeing unexpected disk growth.

I have Hetzner's automatic backups turned on — 20% of the VPS monthly price, one backup per day, seven-day retention. They are the difference between "last night's update broke everything and I can roll back" and "last night's update broke everything and I have to recreate the brain by hand." Turn them on. You will use them.

First-boot smoke test

Before you consider the VPS "done" and move on to Ch 04, walk this checklist. Each item is a one-liner that should either pass or give you an obvious clue about what's wrong:

  • ssh openclaw@<tailnet-hostname> "echo connected" → prints connected
  • ssh openclaw@<tailnet-hostname> "docker --version" → prints a Docker version
  • ssh openclaw@<tailnet-hostname> "cd ~/openclaw && docker compose ps" → shows openclaw-gateway with state Up
  • ssh openclaw@<tailnet-hostname> "cd ~/openclaw && docker compose logs --tail 5" → last lines contain listening on ws://0.0.0.0:18789 (container-internal bind)
  • ssh openclaw@<tailnet-hostname> "oc health" (after the oc() helper is in .bashrc) → prints Telegram: ok and Agents: main (default)
  • ssh openclaw@<tailnet-hostname> "git clone https://github.com/your-handle/your-repo.git /tmp/test-clone && rm -rf /tmp/test-clone" → clones and cleans up without a credential prompt
  • Residential-proxy IP check (from the section above) → prints a residential IP, not the Hetzner IP
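
If you'd rather run the checklist in one pass, a wrapper like this works — a hedged sketch (substitute your own <tailnet-hostname>; it checks exit status only, so eyeball the output for the expected strings, and the proxy check stays separate):

```shell
HOST="openclaw@<tailnet-hostname>"   # substitute your tailnet machine name

# Run each checklist command over SSH; stop at the first failure.
run_smoke() {
  set -- \
    'echo connected' \
    'docker --version' \
    'cd ~/openclaw && docker compose ps' \
    'cd ~/openclaw && docker compose logs --tail 5' \
    'oc health' \
    'git clone https://github.com/your-handle/your-repo.git /tmp/test-clone && rm -rf /tmp/test-clone'
  for c in "$@"; do
    if ssh "$HOST" "$c"; then
      echo "PASS: $c"
    else
      echo "FAIL: $c"
      return 1
    fi
  done
}
```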

If all seven pass, the VPS is ready for Ch 04.

The three invariants to carry into everything else

Three rules that everything else in the guide assumes. Two are already enforced by code gates (Ch 04 covers the enforcement), and the third belongs in your muscle memory:

Never develop on the VPS

This is the first rule I broke and the first one I regret breaking. The code that runs this fleet — scripts, cron manifests, deploy.py, configs — lives in local git. The shared brain — facts, people, commitments, tasks, notes — lives in Dropbox and gets bind-mounted into the container at runtime. Neither one lives on the VPS as its source of truth. The VPS is a mount point where both are assembled at runtime, and nothing else.

What this rule forbids is on-VPS code edits: hand-editing a script in ~/openclaw/scripts/, tweaking a cron message inside ~/.openclaw/<agent>-workspace/CRONS.md, or patching a deploy tool in place. On-VPS code edits are invisible to git, invisible to deploy.py's drift detection (unless it's explicitly checking), and they're the first thing you forget when you're trying to reproduce a working state weeks later. deploy.py Safeguard 1 refuses to deploy if there are uncommitted local changes, and Safeguard 5 refuses to deploy if the VPS workspace has drifted from the last manifest — both exist because I did not respect this rule and had to recover from it.

Brain writes are the explicit exception. Agents are expected to write facts, commitments, daily notes, and the like into the Dropbox-synced brain paths — that is the intended runtime flow, not a violation of the rule. The rule is about code, not about runtime state.

Bake all dependencies into the Dockerfile

The top of ops/Dockerfile has a literal comment: "Runtime installs are lost on container restart. All binaries needed by skills MUST be installed here at build time." The reason that comment is there is that I once added a Playwright browser install step to a runtime script, it worked for a week, the container recreated itself for an unrelated reason, and every Playwright-using cron fell over at once because the browser binary was no longer present. If a cron needs a binary, the binary goes in the Dockerfile. If a cron needs a Python package, the package goes in the requirements file that the Dockerfile installs at build time. Never apt-get install or pip install from inside a running agent.

SCP to the host volume, not to the container

When you need to update a script or config file on the VPS (as opposed to a full git-driven deploy), scp the file to the host path the container sees via a bind mount — ~/openclaw/... on the host, /home/node/.openclaw/... inside the container. Do not docker cp into the container directly. Container filesystems are ephemeral; a container recreate (for an OpenClaw upgrade, a crashed process, a rebuild) will wipe anything you docker cp'd. Host volumes survive.
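
The rule as a tiny helper — illustrative only; the hostname placeholder is the same one used in the smoke-test section:

```shell
# Copy a file to the HOST side of the bind mount; the container sees it under
# /home/node/.openclaw/. Never docker cp -- that write dies with the container.
push_file() {
  # $1 = local path, $2 = destination path relative to ~/openclaw on the VPS
  scp "$1" "openclaw@<tailnet-hostname>:openclaw/$2"
}
```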

Pitfalls you'll hit

🧨 Pitfall. Provisioning with TF_VAR_ssh_allowed_cidrs='["0.0.0.0/0"]' and then forgetting to narrow it. Why: the default opens SSH to the world, which is fine for the first hour and irresponsible after that. How to avoid: either narrow the CIDR to your current IP immediately after terraform apply succeeds, or enable Tailscale first and set the CIDR to '[]' so SSH is closed to the public internet entirely.

🧨 Pitfall. Exposing the OpenClaw gateway on 0.0.0.0 "just to debug." Why: the gateway token is the only thing between a public port and full fleet control, and the rate limiting on it is minimal. Opening the port for "a minute" leaves it open until the next restart, and a scanner will find it. How to avoid: keep the host side of the docker-compose ports: mapping pinned to 127.0.0.1:18789 always. Reach the UI via SSH tunnel or Tailscale Serve, never via a public bind.

🧨 Pitfall. Using a rotating residential proxy (or no proxy, or a datacenter proxy) for shopping workflows. Why: a new IP per request invalidates every session Amazon and Costco build, so auth flows that need a consistent egress IP across requests silently fail. Datacenter IPs get blocked outright. How to avoid: buy a sticky residential proxy, test it with a one-off script that prints its egress IP, and confirm session persistence before you let an agent near it.

See also