Merge remote-tracking branch 'origin/main' into upstream-preview-nix-2025-12-20

This commit is contained in:
Peter Steinberger
2026-01-01 09:15:28 +01:00
163 changed files with 10867 additions and 1712 deletions

View File

@@ -193,6 +193,7 @@ Notes:
- The arm default timeout is **2 minutes** (clamped to max 2 minutes); pass `timeoutMs` if you need shorter.
- `snapshot` defaults to `ai`; `aria` returns an accessibility tree for debugging.
- `click`/`type` require `ref` from `snapshot --format ai`; use `evaluate` for rare CSS selector one-offs.
- Avoid `wait` by default; use it only in exceptional cases when there is no reliable UI state to wait on.
## Security & privacy notes

View File

@@ -35,6 +35,7 @@ All camera access is gated behind **user-controlled settings**.
- `format: "jpg"`
- `base64: "<...>"`
- `width`, `height`
- Payload guard: photos are recompressed to keep the base64 payload under 5 MB.
- `camera.clip`
- Params:
@@ -90,6 +91,10 @@ If permissions are missing, the app will prompt when possible; if denied, `camer
Like `canvas.*`, the Android node only allows `camera.*` commands in the **foreground**. Background invocations return `NODE_BACKGROUND_UNAVAILABLE`.
### Payload guard
Photos are recompressed to keep the base64 payload under 5 MB.
## macOS app
### User setting (default off)
@@ -116,6 +121,7 @@ clawdis nodes camera clip --node <id> --no-audio
Notes:
- `clawdis nodes camera snap` defaults to `maxWidth=1600` unless overridden.
- Photo payloads are recompressed to keep base64 under 5 MB.
## Safety + practical limits

View File

@@ -195,6 +195,28 @@ Controls inbound/outbound prefixes and timestamps.
}
```
### `talk`
Defaults for Talk mode (macOS/iOS/Android). Voice IDs fall back to `ELEVENLABS_VOICE_ID` or `SAG_VOICE_ID` when unset.
`apiKey` falls back to `ELEVENLABS_API_KEY` (or the gateways shell profile) when unset.
`voiceAliases` lets Talk directives use friendly names (e.g. `"voice":"Clawd"`).
```json5
{
talk: {
voiceId: "elevenlabs_voice_id",
voiceAliases: {
Clawd: "EXAVITQu4vr4xnSDxMaL",
Roger: "CwhRBWXzGAHq8TQ4Fs17"
},
modelId: "eleven_v3",
outputFormat: "mp3_44100_128",
apiKey: "elevenlabs_api_key",
interruptOnSpeech: true
}
}
```
### `agent`
Controls the embedded agent runtime (model/thinking/verbose/timeouts).
@@ -237,6 +259,8 @@ Controls the embedded agent runtime (model/thinking/verbose/timeouts).
If `modelAliases` is configured, you may also use the alias key (e.g. `Opus`).
If you omit the provider, CLAWDIS currently assumes `anthropic` as a temporary
deprecation fallback.
Z.AI models are available as `zai/<model>` (e.g. `zai/glm-4.7`) and require
`ZAI_API_KEY` (or legacy `Z_AI_API_KEY`) in the environment.
`agent.heartbeat` configures periodic heartbeat runs:
- `every`: duration string (`ms`, `s`, `m`, `h`); default unit minutes. Omit or set
@@ -445,6 +469,20 @@ Defaults:
}
```
### `ui` (Appearance)
Optional accent color used by the native apps for UI chrome (e.g. Talk Mode bubble tint).
If unset, clients fall back to a muted light-blue.
```json5
{
ui: {
seamColor: "#FF4500" // hex (RRGGBB or #RRGGBB)
}
}
```
### `gateway` (Gateway server mode + bind)
Use `gateway.mode` to explicitly declare whether this machine should run the Gateway.

View File

@@ -7,11 +7,12 @@ read_when:
# Logging (macOS)
## Rolling diagnostics file log (Debug pane)
Clawdis can write a local, rotating diagnostics log to disk (useful when macOS unified logging is impractical during iterative repros).
Clawdis routes macOS app logs through swift-log (unified logging by default) and can write a local, rotating file log to disk when you need a durable capture.
- Enable: **Debug pane → Diagnostics log → “Write rolling diagnostics log (JSONL)”**
- Verbosity: **Debug pane → LogsApp logging → Verbosity**
- Enable: **Debug pane → Logs → App logging → “Write rolling diagnostics log (JSONL)”**
- Location: `~/Library/Logs/Clawdis/diagnostics.jsonl` (rotates automatically; old files are suffixed with `.1`, `.2`, …)
- Clear: **Debug pane → Diagnostics log → “Clear”**
- Clear: **Debug pane → Logs → App logging → “Clear”**
Notes:
- This is **off by default**. Enable only while actively debugging.

View File

@@ -8,6 +8,7 @@ read_when:
## What is shown
- We surface the current agent work state in the menu bar icon and in the first status row of the menu.
- Health status is hidden while work is active; it returns when all sessions are idle.
- The “Nodes” block in the menu lists **devices** only (gateway bridge nodes via `node.list`), not client/presence entries.
## State model
- Sessions: events arrive with `runId` (session key). The “main” session is the key `main`; if absent, we fall back to the most recently updated session.

79
docs/talk.md Normal file
View File

@@ -0,0 +1,79 @@
---
summary: "Talk mode: continuous speech conversations with ElevenLabs TTS"
read_when:
- Implementing Talk mode on macOS/iOS/Android
- Changing voice/TTS/interrupt behavior
---
# Talk Mode
Talk mode is a continuous voice conversation loop:
1) Listen for speech
2) Send transcript to the model (main session, chat.send)
3) Wait for the response
4) Speak it via ElevenLabs (streaming playback)
## Behavior (macOS)
- **Always-on overlay** while Talk mode is enabled.
- **Listening → Thinking → Speaking** phase transitions.
- On a **short pause** (silence window), the current transcript is sent.
- Replies are **written to WebChat** (same as typing).
- **Interrupt on speech** (default on): if the user starts talking while the assistant is speaking, we stop playback and note the interruption timestamp for the next prompt.
## Voice directives in replies
The assistant may prefix its reply with a **single JSON line** to control voice:
```json
{"voice":"<voice-id>","once":true}
```
Rules:
- First non-empty line only.
- Unknown keys are ignored.
- `once: true` applies to the current reply only.
- Without `once`, the voice becomes the new default for Talk mode.
- The JSON line is stripped before TTS playback.
Supported keys:
- `voice` / `voice_id` / `voiceId`
- `model` / `model_id` / `modelId`
- `speed`, `rate` (WPM), `stability`, `similarity`, `style`, `speakerBoost`
- `seed`, `normalize`, `lang`, `output_format`, `latency_tier`
- `once`
## Config (clawdis.json)
```json5
{
"talk": {
"voiceId": "elevenlabs_voice_id",
"modelId": "eleven_v3",
"outputFormat": "mp3_44100_128",
"apiKey": "elevenlabs_api_key",
"interruptOnSpeech": true
}
}
```
Defaults:
- `interruptOnSpeech`: true
- `voiceId`: falls back to `ELEVENLABS_VOICE_ID` / `SAG_VOICE_ID` (or first ElevenLabs voice when API key is available)
- `modelId`: defaults to `eleven_v3` when unset
- `apiKey`: falls back to `ELEVENLABS_API_KEY` (or gateway shell profile if available)
- `outputFormat`: defaults to `pcm_44100` on macOS/iOS and `pcm_24000` on Android (set `mp3_*` to force MP3 streaming)
## macOS UI
- Menu bar toggle: **Talk**
- Config tab: **Talk Mode** group (voice id + interrupt toggle)
- Overlay:
- **Listening**: cloud pulses with mic level
- **Thinking**: sinking animation
- **Speaking**: radiating rings
- Click cloud: stop speaking
- Click X: exit Talk mode
## Notes
- Requires Speech + Microphone permissions.
- Uses `chat.send` against session key `main`.
- TTS uses ElevenLabs streaming API with `ELEVENLABS_API_KEY` and incremental playback on macOS/iOS/Android for lower latency.
- `stability` for `eleven_v3` is validated to `0.0`, `0.5`, or `1.0`; other models accept `0..1`.
- `latency_tier` is validated to `0..4` when set.
- Android supports `pcm_16000`, `pcm_22050`, `pcm_24000`, and `pcm_44100` output formats for low-latency AudioTrack streaming.

View File

@@ -7,3 +7,16 @@ read_when:
- `pnpm test:force`: Kills any lingering gateway process holding the default control port, then runs the full Vitest suite with an isolated gateway port so server tests dont collide with a running instance. Use this when a prior gateway run left port 18789 occupied.
- `pnpm test:coverage`: Runs Vitest with V8 coverage. Global thresholds are 70% lines/branches/functions/statements. Coverage excludes integration-heavy entrypoints (CLI wiring, gateway/telegram bridges, webchat static server) to keep the target focused on unit-testable logic.
## Model latency bench (local keys)
Script: `scripts/bench-model.ts`
Usage:
- `source ~/.profile && pnpm tsx scripts/bench-model.ts --runs 10`
- Optional env: `MINIMAX_API_KEY`, `MINIMAX_BASE_URL`, `MINIMAX_MODEL`, `ANTHROPIC_API_KEY`
- Default prompt: “Reply with a single word: ok. No punctuation or extra text.”
Last run (2025-12-31, 20 runs):
- minimax median 1279ms (min 1114, max 2431)
- opus median 2454ms (min 1224, max 3170)

View File

@@ -51,6 +51,7 @@ Notes:
- Uses `browser.controlUrl` unless `controlUrl` is passed explicitly.
- `snapshot` defaults to `ai`; use `aria` for the accessibility tree.
- `act` requires `ref` from `snapshot --format ai`; use `evaluate` for rare CSS selector needs.
- Avoid `act``wait` by default; use it only in exceptional cases (no reliable UI state to wait on).
### `clawdis_canvas`
Drive the node Canvas (present, eval, snapshot, A2UI).