docs(fly): comprehensive deployment guide with real-world learnings

Based on actual Flawd deployment experience:
- Proper fly.toml configuration with all required settings
- Step-by-step guide following exe.dev doc format
- Troubleshooting section with common issues and fixes
- Config file creation via SSH
- Cost estimates
Peter Steinberger
2026-01-24 08:05:57 +00:00
parent a8f2ac5411
commit 90685ef814


@@ -5,75 +5,49 @@ description: Deploy Clawdbot on Fly.io
# Fly.io Deployment
Deploy Clawdbot on [Fly.io](https://fly.io) with persistent storage and automatic HTTPS.
**Goal:** Clawdbot Gateway running on a [Fly.io](https://fly.io) machine with persistent storage, automatic HTTPS, and Discord/channel access.
## Prerequisites
## What you need
- [flyctl CLI](https://fly.io/docs/hands-on/install-flyctl/) installed
- Fly.io account
- Fly.io account (free tier works)
- Model auth: Anthropic API key (or other provider keys)
- Channel credentials: Discord bot token, Telegram token, etc.
## Quick Start
## Beginner quick path
1. Clone repo → customize `fly.toml`
2. Create app + volume → set secrets
3. Deploy with `fly deploy`
4. SSH in to create config or use Control UI
## 1) Create the Fly app
```bash
# Clone and enter the repo
# Clone the repo
git clone https://github.com/clawdbot/clawdbot.git
cd clawdbot
# Create the app (first time only)
fly apps create clawdbot
# Create a new Fly app (pick your own name)
fly apps create my-clawdbot
# Create persistent volume for data
# Create a persistent volume (1GB is usually enough)
fly volumes create clawdbot_data --size 1 --region lhr
# Set your secrets
fly secrets set ANTHROPIC_API_KEY=your-key-here
# Add other provider keys as needed
# Deploy
fly deploy
```
## Configuration
**Tip:** Choose a region close to you. Common options: `lhr` (London), `iad` (Virginia), `sjc` (San Jose).
The included `fly.toml` is a starting template. Key settings to customize:
## 2) Configure fly.toml
### VM Size
The default `shared-cpu-1x` with 512MB may be too small for production. Recommended:
Edit `fly.toml` to match your app name and requirements:
```toml
[[vm]]
size = "shared-cpu-2x"
memory = "2048mb"
```
app = "my-clawdbot" # Your app name
primary_region = "lhr"
### Bind Address
[build]
dockerfile = "Dockerfile"
**Important**: The gateway must bind to `0.0.0.0` for Fly's proxy to reach it:
```toml
[processes]
app = "node dist/index.js gateway --allow-unconfigured --port 3000 --bind lan"
```
When using `--bind lan`, you must also set a gateway token for security:
```bash
fly secrets set CLAWDBOT_GATEWAY_TOKEN=$(openssl rand -hex 32)
```
### State Directory
Store persistent data on the volume:
```toml
[env]
CLAWDBOT_STATE_DIR = "/data"
```
### Full Example
```toml
[env]
NODE_ENV = "production"
CLAWDBOT_PREFER_PNPM = "1"
@@ -83,33 +57,142 @@ Store persistent data on the volume:
[processes]
app = "node dist/index.js gateway --allow-unconfigured --port 3000 --bind lan"
[http_service]
internal_port = 3000
force_https = true
auto_stop_machines = false
auto_start_machines = true
min_machines_running = 1
processes = ["app"]
[[vm]]
size = "shared-cpu-2x"
memory = "2048mb"
[mounts]
source = "clawdbot_data"
destination = "/data"
```
## Secrets
**Key settings:**
Set your API keys as secrets (never commit these):
| Setting | Why |
|---------|-----|
| `--bind lan` | Binds to `0.0.0.0` so Fly's proxy can reach the gateway |
| `--allow-unconfigured` | Starts without a config file (you'll create one after) |
| `memory = "2048mb"` | 512MB is too small; 2GB recommended |
| `CLAWDBOT_STATE_DIR = "/data"` | Persists state on the volume |
## 3) Set secrets
```bash
fly secrets set ANTHROPIC_API_KEY=sk-...
# Required: Gateway token (for non-loopback binding)
fly secrets set CLAWDBOT_GATEWAY_TOKEN=$(openssl rand -hex 32)
# Model provider API keys
fly secrets set ANTHROPIC_API_KEY=sk-ant-...
# Optional: Other providers
fly secrets set OPENAI_API_KEY=sk-...
fly secrets set GOOGLE_API_KEY=...
# Channel tokens
fly secrets set DISCORD_BOT_TOKEN=MTQ...
```
## Accessing the Gateway
**Notes:**
- Non-loopback binds (`--bind lan`) require `CLAWDBOT_GATEWAY_TOKEN` for security.
- Treat these tokens like passwords.
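
Setting the secret with `$(openssl rand -hex 32)` inline never prints the value, so no copy is left to paste into the Control UI afterwards. One possible approach is to generate the token into a shell variable first, check its shape, and only then set it. This is a minimal sketch, assuming `openssl` and `flyctl` are installed; `is_hex_token` is a hypothetical helper and not part of flyctl or Clawdbot.

```bash
# Hypothetical helper (not part of flyctl or Clawdbot):
# succeeds only if the argument is exactly 64 hex characters,
# which is the shape `openssl rand -hex 32` produces.
is_hex_token() {
  case "$1" in
    ''|*[!0-9a-f]*) return 1 ;;   # empty, or contains a non-hex character
  esac
  [ "${#1}" -eq 64 ]
}

# Usage (assumes openssl and flyctl are installed and you are logged in):
#   token=$(openssl rand -hex 32)
#   is_hex_token "$token" && fly secrets set CLAWDBOT_GATEWAY_TOKEN="$token"
#   echo "$token"   # keep a copy somewhere safe, e.g. a password manager
```

The check is only a guard against an empty or truncated value if `openssl` fails; it does not make the token any stronger.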
After deployment:
## 4) Deploy
```bash
# Open the web UI
fly open
fly deploy
```
# Check logs
First deploy builds the Docker image (~2-3 minutes). Subsequent deploys are faster.
After deployment, verify:
```bash
fly status
fly logs
```
# SSH into the machine
You should see:
```
[gateway] listening on ws://0.0.0.0:3000 (PID xxx)
[discord] logged in to discord as xxx
```
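
The log check above can also be scripted. The following is a sketch only: `gateway_is_listening` is a hypothetical helper that reads log text on stdin and looks for the gateway line shown above, assuming the gateway keeps port 3000 and the `0.0.0.0` bind from `fly.toml`.

```bash
# Hypothetical helper: reads log lines on stdin and succeeds if the
# gateway "listening" line from this guide is present.
# Assumes port 3000 and the 0.0.0.0 bind configured in fly.toml.
gateway_is_listening() {
  grep -q '\[gateway\] listening on ws://0\.0\.0\.0:3000'
}

# Usage (assumes flyctl is installed and you are logged in):
#   fly logs --no-tail | gateway_is_listening && echo "gateway is up"
```

If the check fails right after a deploy, the listening line may simply not have been logged yet, so retrying after a few seconds is reasonable.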
## 5) Create config file
SSH into the machine to create a proper config:
```bash
fly ssh console
```
Create the config directory and file:
```bash
mkdir -p /data/.clawdbot
cat > /data/.clawdbot/clawdbot.json << 'EOF'
{
"agents": {
"defaults": {
"model": {
"primary": "anthropic/claude-opus-4-5"
},
"models": {
"anthropic/claude-opus-4-5": {},
"anthropic/claude-sonnet-4-5": {}
},
"maxConcurrent": 4
},
"list": [
{
"id": "main",
"default": true
}
]
},
"channels": {
"discord": {
"enabled": true
}
}
}
EOF
```
Restart to apply:
```bash
exit
fly machine restart <machine-id>
```
## 6) Access the Gateway
### Control UI
Open in browser:
```bash
fly open
```
Or visit `https://my-clawdbot.fly.dev/`
Paste your gateway token (the one from `CLAWDBOT_GATEWAY_TOKEN`) to authenticate.
### Logs
```bash
fly logs # Live logs
fly logs --no-tail # Recent logs
```
### SSH Console
```bash
fly ssh console
```
@@ -117,12 +200,15 @@ fly ssh console
### "App is not listening on expected address"
If you see this warning, the gateway is binding to `127.0.0.1` instead of `0.0.0.0`. Add `--bind lan` to your process command (see Configuration above).
The gateway is binding to `127.0.0.1` instead of `0.0.0.0`.
**Fix:** Add `--bind lan` to your process command in `fly.toml`.
### OOM / Memory Issues
If the container gets killed or restarts frequently, increase memory:
Container keeps restarting or getting killed.
**Fix:** Increase memory in `fly.toml`:
```toml
[[vm]]
memory = "2048mb"
@@ -130,18 +216,52 @@ If the container gets killed or restarts frequently, increase memory:
### Gateway Lock Issues
If the gateway refuses to start with "already running" errors after a container restart, this is a stale PID lock. The lock file persists on the volume but the process doesn't survive restarts.
Gateway refuses to start with "already running" errors.
**Fix**: Delete the lock file via SSH:
This happens when the container restarts but the PID lock file persists on the volume.
**Fix:** Delete the lock file:
```bash
fly ssh console
rm /data/.clawdbot/run/gateway.*.lock
exit
fly machine restart <machine-id>
```
Then restart the machine.
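
A plain `rm` with a glob prints an error when no lock file matches. A slightly more forgiving variant, run inside the SSH session, might look like the sketch below. `clear_gateway_locks` is a hypothetical helper; the default directory is the one used in this guide.

```bash
# Hypothetical helper: removes stale gateway lock files and reports each one.
# Does nothing (and still succeeds) when no lock file exists.
# Only run this when the gateway is not running, then restart the machine.
clear_gateway_locks() {
  dir=${1:-/data/.clawdbot/run}
  for f in "$dir"/gateway.*.lock; do
    [ -e "$f" ] || continue      # glob did not match anything
    rm -f -- "$f"
    echo "removed $f"
  done
}

# Usage inside `fly ssh console`:
#   clear_gateway_locks
```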
### Config Not Being Read
If using `--allow-unconfigured`, the gateway creates a minimal config. Your custom config at `/data/.clawdbot/clawdbot.json` should be read on restart.
Verify the config exists:
```bash
fly ssh console --command "cat /data/.clawdbot/clawdbot.json"
```
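
`cat` only shows that the file exists; a stray comma from the heredoc would still break it. As a rough sketch, the file can be checked for strict JSON syntax from inside the SSH session. `json_is_valid` is a hypothetical helper. It assumes `node` is on the PATH, which should hold in the container since the gateway itself is started with `node dist/index.js`. If Clawdbot accepts a looser format (comments, trailing commas), this strict check would reject files that the gateway can still read.

```bash
# Hypothetical helper: succeeds if the file exists and parses as strict JSON.
# Assumes `node` is on PATH (the gateway is started with `node dist/index.js`).
json_is_valid() {
  node -e 'JSON.parse(require("fs").readFileSync(process.argv[1], "utf8"))' "$1" 2>/dev/null
}

# Usage inside `fly ssh console`:
#   json_is_valid /data/.clawdbot/clawdbot.json && echo "config parses"
```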
## Updates
```bash
# Pull latest changes
git pull
# Redeploy
fly deploy
# Check health
fly status
fly logs
```
## Notes
- Fly.io uses **x86** architecture (not ARM)
- Fly.io uses **x86 architecture** (not ARM)
- The Dockerfile is compatible with both architectures
- For WhatsApp/Telegram, you'll need to run onboarding via `fly ssh console`
- For WhatsApp/Telegram onboarding, use `fly ssh console`
- Persistent data lives on the volume at `/data`
## Cost
With the recommended config (`shared-cpu-2x`, 2GB RAM):
- ~$10-15/month depending on usage
- Free tier includes some allowance
See [Fly.io pricing](https://fly.io/docs/about/pricing/) for details.