Part 1. Setting Up OpenClaw on a Cloud Server — Why Costs Hit $10 a Day
![]()
Series: Can AI Agents Actually Run Business Operations? — Part 1
OpenClaw has been getting a lot of attention as an AI agent platform since late 2025. But how usable is it in real business work? And how does it hold up from a security standpoint?
In this series, I share what I learned from using OpenClaw on an actual client engagement — rough edges, failures, and course corrections included. I am not an OpenClaw expert; this is one engineer working through the real problems. I hope it helps people who are evaluating OpenClaw or similar AI agent platforms for adoption, as a kind of “here is what to watch out for” reference.
In this post (Part 1), I walk through why my API bill came in 5-10x higher than expected on day one. The traps include: auto-compaction defaults that are tuned for avoiding crashes rather than controlling costs, Cron and HEARTBEAT running the same task twice, and a 3-tier model strategy (including a local LLM) that brings the numbers back under control.
For the overall project summary, please see the project page.
1. Server Selection and Base Setup
We chose Hetzner Cloud for a 24/7 agent server.
| Item | Value |
|---|---|
| Location | Helsinki (hel1) |
| OS | Ubuntu 24.04 |
| Spec | 4vCPU / 16GB RAM / 150GB SSD |
| Monthly | about €12 |
Latency is not a big deal for long-running agents, so the European region was fine. Cloud was the right fit because procuring dedicated hardware was not realistic for a client project, we needed a sandbox separated from the internal network, and 24/7 uptime was a requirement.
2. OpenClaw Installation — Things to Watch
What --accept-risk actually means
OpenClaw onboarding runs like this:
npm install -g openclaw@latest
openclaw onboard --install-daemon --non-interactive --accept-risk
If you leave out --accept-risk, you get:
Non-interactive setup requires explicit risk acknowledgement.
OpenClaw runs with full host access, so it asks for explicit risk acknowledgement. The agent can read and write files and run arbitrary commands. You should understand this before deciding to install it.
Browser settings on a server without GUI
On a GUI-less server running as root, two settings are needed:
openclaw config set browser.noSandbox true
openclaw config set browser.headless true
You also need to install Google Chrome Stable in advance:
wget -q https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb -O /tmp/chrome.deb
apt-get install -y /tmp/chrome.deb
Exposing the web dashboard
OpenClaw has a built-in web management UI. To access it from outside, I used sslip.io to turn the server IP into a domain, then set up Nginx as a reverse proxy with Let’s Encrypt.
One setting to be careful about: OpenClaw requires device pairing as a brute-force protection. For external access, this has to be disabled:
openclaw config set gateway.controlUi.dangerouslyDisableDeviceAuth true
The name has dangerously in it for a reason. I compensated with three layers of protection: Nginx Basic Auth, a gateway token, and HTTPS.
3. Security Settings — Do Them First
If you run OpenClaw on a cloud server, please do these on day one.
- Full host access:
--accept-riskis literal. The agent can read/write files and run commands freely. .envfile permissions: The file is created with644(world-readable) by default. It contains API keys, sochmod 600is required.- Template files: Be careful not to accidentally write real API keys into git-tracked template files like
.env.example.
chmod 600 /root/.openclaw/.env
Or let OpenClaw do it for you — openclaw security audit --fix tightens permissions on state, config, and include files in one pass.
It is easy to skip these while you are focused on getting things to work. I would recommend doing them before anything else.
4. Cost Surprise — 5-10x Over Budget
The day after setup, I checked OpenClaw’s /usage dashboard and found costs were much higher than expected.
- Expected: $1-2/day ($30-60/month)
- Actual: $8-10/day ($240-300/month)
I broke down the causes and found four issues stacking up.
Cause 1: Context token bloat (the biggest factor)
The main session had grown to about 310,000 tokens.
agent:main:main │ 310k/200k (155%)
Auto-compaction is enabled by default, but its defaults are tuned for crash avoidance, not cost control. According to the official documentation, OpenClaw compacts a session only when contextTokens > contextWindow - reserveTokens (default reserveTokens = 16,384) or when the model returns an overflow error. In other words, it fires close to the context ceiling — not when costs are climbing.
In my setup, the main agent was on a large-context model, so the session kept growing without ever reaching the compaction trigger. Every message was re-sending the accumulated setup logs and error history.
| State | Per message | Per 100 messages |
|---|---|---|
| 310K tokens | $0.023 | $2.30 |
| 40K tokens (after reset) | $0.003 | $0.30 |
Even a cheap model cannot save you from a large context. The practical fix is to start a fresh session with /reset after setup, or to tune agents.defaults.compaction.reserveTokens to a more aggressive value so compaction runs well before the ceiling.
Cause 2: Cron jobs running all the time
A monitoring cron running every 30 minutes was hitting the API 48 times a day.
Cause 3: Cron and HEARTBEAT both running
This one is the hardest to notice. OpenClaw has two scheduling mechanisms:
| Mechanism | Session | How to stop |
|---|---|---|
| Cron | Isolated session | Manage with openclaw cron |
| HEARTBEAT | Inside main session | Edit HEARTBEAT.md |
If you stop cron but leave the same task in HEARTBEAT.md, it keeps running. In practice, after I had disabled all crons, web_search calls continued.
Cause 4: Heavy use of web_search
web_search: 186 calls
web_fetch: 33 calls
The HEARTBEAT auto-processing was triggering web searches more than I realized.
5. Countermeasures — Compress Sessions, Review Scheduled Tasks
Immediate steps and their effect:
| Action | Effect |
|---|---|
Start a new session (/reset) | 310K → 21K tokens for the active session |
| Disable all cron jobs | Background API calls went to zero |
| Remove monitoring tasks from HEARTBEAT | Stopped the 30-minute auto-search |
A quick note on the session commands: /reset is an alias for /new [model] — it begins a fresh session rather than compacting the current one. If you need to reduce session size while preserving history, use /compact [instructions] instead. /context list and /usage tokens help you see what is actually consuming tokens before you decide which to use.
Before vs. after:
| Metric | Before | After |
|---|---|---|
| Main session tokens | 310,000 | 21,000 |
| Background API calls | 48/day | 0/day |
| Estimated monthly cost | $240-300 | $10-20 |
6. Local LLM — Ollama + Gemma3 4B
To fundamentally solve the 24/7 cron cost problem (which would have been $50-100/month on its own), I added a local LLM.
Model selection
| Model | Result | Note |
|---|---|---|
| Qwen3 4B | Not adopted | Thinking mode always on. A 3-line answer took 2,500 tokens and 3 minutes |
| Gemma3 4B | Adopted | The same task took 60 tokens and 3 seconds |
Qwen3’s chat template force-injects a <think> tag. On CPU inference, this slows things down significantly. I tried /no_think and a custom Modelfile, but the effect was limited.
In my view, Thinking-forced models are not a good fit for CPU-only inference environments.
Server resource usage
Hetzner: 4vCPU (AMD EPYC) / 16GB RAM / no GPU
Gemma3 4B loaded:
Disk: 3.3 GB
RAM: 4.2 GB (CPU inference)
Free RAM: 10 GB (plenty of room for OpenClaw + OS)
Speed: 14-15 tok/s
A 16GB-RAM server is enough. Even without a GPU, response speed is workable for routine tasks.
7. The 3-Tier Model Strategy
Based on this experience, we settled on a 3-tier strategy that matches model tier to task characteristics.
| Tier | Model | Use | Cost band |
|---|---|---|---|
| Tier 1 | Claude Sonnet / Opus | Code generation, deep analysis, inconsistency detection | medium-high |
| Tier 2 | Gemini 3 Flash | Orchestration, web summarization, HTML extraction | low |
| Tier 3 | Gemma3 4B (local) | Heartbeat response, routine processing | $0 |
Execution strategy
- Code-First: Convert rule checks into Python code and run them without an LLM (zero execution cost)
- Smart Routing: Low-load tasks go to Gemini Flash, heavy tasks go to Claude Sonnet
- Local Preference: Keep routine work on Ollama + Gemma3
- Code generation is Sonnet only: Gemini Flash tends to fall back to hardcoding (details in Part 2), so it is not suitable for code generation
Let OpenClaw configure itself
Rather than editing config files directly from Claude Code, it worked better to prompt OpenClaw itself and let it update its own settings.
openclaw agent --agent main --message 'Please change the default model to google/gemini-3-flash-preview.
Use models based on task difficulty:
- Normal: google/gemini-3-flash-preview
- Medium: anthropic/claude-sonnet-4-6
- Hard: anthropic/claude-opus-4-6'
OpenClaw updated its own default model and recorded the routing strategy in SOUL.md.
8. Takeaways
- Auto-compaction protects you from crashes, not from costs. It is on by default, but the default
reserveTokens = 16,384only triggers compaction near the context ceiling. With a large-window model, you can accumulate hundreds of thousands of tokens before it fires. Tuneagents.defaults.compaction.reserveTokensto be more aggressive, or/resetmanually after setup. - Cron and HEARTBEAT are different mechanisms. If you stop one, always check the other.
- Cheap models do not save you from large contexts. Gemini Flash is low-cost per token, but if you send 310K tokens every message, it is not cheap anymore. Cost = price × tokens × calls — you have to evaluate all three.
- Run
/resetright after setup to start a fresh session. Setup sessions accumulate a lot of trial-and-error context that you do not need going forward. Note that/resetbegins a new session rather than compacting the current one — use/compactif you need to keep history while reducing size. - Avoid Thinking-forced models on CPU inference. They are fine on GPU, but unusable on CPU-only servers.
- Check
/usagedaily. A one-day delay in noticing an anomaly can mean a $10 difference.
9. Post-Setup Checklist
- Check cost on
/usage(establish a baseline) - Check token count per session with
openclaw status - Inspect context composition with
/context listand/usage tokens - Review scheduled jobs with
openclaw cron --helporopenclaw tasks list - Review
HEARTBEAT.mdfor unnecessary tasks - Run
openclaw security audit --fixto tighten file permissions - Run
/resetafter setup is complete (or/compactif history matters) - Set a monthly spend cap in your API provider dashboard
- Browser settings:
headless: true+noSandbox: true
In Part 2, I will share an experiment where we gave AI models only the rule specification (in Japanese text) and asked them to generate Python check code on their own. Comparing three models, we found that Gemini Flash had been hardcoding the expected answers.
Join the conversation on LinkedIn — share your thoughts and comments.
Discuss on LinkedIn