Merge remote-tracking branch 'origin/main' into upstream-preview-nix-2025-12-20
@@ -193,6 +193,7 @@ Notes:
- The arm timeout defaults to **2 minutes**, which is also the maximum (longer values are clamped); pass `timeoutMs` if you need a shorter one.
- `snapshot` defaults to `ai`; `aria` returns an accessibility tree for debugging.
- `click`/`type` require `ref` from `snapshot --format ai`; use `evaluate` for rare CSS selector one-offs.
- Avoid `wait` by default; use it only in exceptional cases when there is no reliable UI state to wait on.
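
The timeout rule in the first note behaves like a simple clamp. The sketch below is illustrative only: `resolveArmTimeoutMs` and `ARM_TIMEOUT_MAX_MS` are hypothetical names, and treating a missing or non-positive value as "use the default" is an assumption.

```typescript
// Hypothetical helper: the documented 2-minute default doubles as the ceiling.
const ARM_TIMEOUT_MAX_MS = 2 * 60 * 1000;

function resolveArmTimeoutMs(timeoutMs?: number): number {
  // Assumption: a missing or invalid override falls back to the default.
  if (timeoutMs === undefined || !Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    return ARM_TIMEOUT_MAX_MS;
  }
  // Shorter values pass through; longer ones are clamped to the maximum.
  return Math.min(timeoutMs, ARM_TIMEOUT_MAX_MS);
}
```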

## Security & privacy notes

@@ -35,6 +35,7 @@ All camera access is gated behind **user-controlled settings**.
- `format: "jpg"`
- `base64: "<...>"`
- `width`, `height`
- Payload guard: photos are recompressed to keep the base64 payload under 5 MB.

- `camera.clip`
- Params:
@@ -90,6 +91,10 @@ If permissions are missing, the app will prompt when possible; if denied, `camer

Like `canvas.*`, the Android node only allows `camera.*` commands in the **foreground**. Background invocations return `NODE_BACKGROUND_UNAVAILABLE`.

### Payload guard

Photos are recompressed to keep the base64 payload under 5 MB.
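
A guard like this has to reason about the *encoded* size, since padded base64 emits 4 characters per started 3-byte group. The sketch below is illustrative, not the app's code. It assumes "5 MB" means 5 × 1024 × 1024 base64 characters and that the encoder is a caller-supplied function mapping a JPEG quality to bytes. `pickQuality` and the quality ladder are hypothetical.

```typescript
// Assumption: the 5 MB budget is measured in base64 characters (5 MiB).
const MAX_BASE64_CHARS = 5 * 1024 * 1024;

// Padded base64 emits 4 characters for every started 3-byte group.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Re-encode at decreasing quality until the base64 form fits the budget.
// Returns the first fitting result, or null if even the lowest quality is too big.
function pickQuality(
  encodeAtQuality: (quality: number) => Uint8Array,
  qualities: number[] = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
  limit: number = MAX_BASE64_CHARS,
): { quality: number; bytes: Uint8Array } | null {
  for (const quality of qualities) {
    const bytes = encodeAtQuality(quality);
    if (base64Length(bytes.length) <= limit) {
      return { quality, bytes };
    }
  }
  return null;
}
```

Checking the encoded length instead of the raw byte count matters because base64 inflates the payload by roughly a third.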

## macOS app

### User setting (default off)
@@ -116,6 +121,7 @@ clawdis nodes camera clip --node <id> --no-audio

Notes:
- `clawdis nodes camera snap` defaults to `maxWidth=1600` unless overridden.
- Photo payloads are recompressed to keep base64 under 5 MB.

## Safety + practical limits

@@ -195,6 +195,28 @@ Controls inbound/outbound prefixes and timestamps.
}
```

### `talk`

Defaults for Talk mode (macOS/iOS/Android). Voice IDs fall back to `ELEVENLABS_VOICE_ID` or `SAG_VOICE_ID` when unset.
`apiKey` falls back to `ELEVENLABS_API_KEY` (or the gateway’s shell profile) when unset.
`voiceAliases` lets Talk directives use friendly names (e.g. `"voice":"Clawd"`).

```json5
{
  talk: {
    voiceId: "elevenlabs_voice_id",
    voiceAliases: {
      Clawd: "EXAVITQu4vr4xnSDxMaL",
      Roger: "CwhRBWXzGAHq8TQ4Fs17"
    },
    modelId: "eleven_v3",
    outputFormat: "mp3_44100_128",
    apiKey: "elevenlabs_api_key",
    interruptOnSpeech: true
  }
}
```
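
Resolving a voice from these settings is a lookup followed by an alias expansion. This sketch is one reading of the rules above, not CLAWDIS code. The helper name, the precedence order (directive, then `voiceId`, then `ELEVENLABS_VOICE_ID`, then `SAG_VOICE_ID`), and applying aliases to the configured `voiceId` are all assumptions.

```typescript
// Hypothetical shapes; only voiceId and voiceAliases come from the config above.
type TalkConfig = {
  voiceId?: string;
  voiceAliases?: Record<string, string>;
};

type Env = Record<string, string | undefined>;

// A directive value may be a friendly alias ("Clawd") or a raw voice ID.
function resolveVoice(requested: string | undefined, cfg: TalkConfig, env: Env): string | undefined {
  const candidate = requested ?? cfg.voiceId ?? env.ELEVENLABS_VOICE_ID ?? env.SAG_VOICE_ID;
  if (candidate === undefined) return undefined;
  const aliases = cfg.voiceAliases ?? {};
  // Own-property check avoids matching inherited keys such as "toString".
  return Object.prototype.hasOwnProperty.call(aliases, candidate) ? aliases[candidate] : candidate;
}
```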

### `agent`

Controls the embedded agent runtime (model/thinking/verbose/timeouts).
@@ -237,6 +259,8 @@ Controls the embedded agent runtime (model/thinking/verbose/timeouts).
If `modelAliases` is configured, you may also use the alias key (e.g. `Opus`).
If you omit the provider, CLAWDIS currently assumes `anthropic` as a temporary
deprecation fallback.
Z.AI models are available as `zai/<model>` (e.g. `zai/glm-4.7`) and require
`ZAI_API_KEY` (or legacy `Z_AI_API_KEY`) in the environment.
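
The model-reference rules above can be sketched as a small parser. `resolveModelRef` and `zaiApiKey` are hypothetical names. Expanding aliases before parsing, splitting on the first `/`, and preferring `ZAI_API_KEY` over the legacy variable are assumptions, not confirmed behavior.

```typescript
type ModelRef = { provider: string; model: string };

function resolveModelRef(input: string, modelAliases: Record<string, string> = {}): ModelRef {
  // Expand an alias key (e.g. "Opus") to its configured target first.
  const expanded = Object.prototype.hasOwnProperty.call(modelAliases, input) ? modelAliases[input] : input;
  const slash = expanded.indexOf("/");
  if (slash === -1) {
    // Temporary deprecation fallback: no provider prefix means Anthropic.
    return { provider: "anthropic", model: expanded };
  }
  return { provider: expanded.slice(0, slash), model: expanded.slice(slash + 1) };
}

// Z.AI key lookup: the current variable wins, the legacy name is a fallback.
function zaiApiKey(env: Record<string, string | undefined>): string | undefined {
  return env.ZAI_API_KEY ?? env.Z_AI_API_KEY;
}
```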

`agent.heartbeat` configures periodic heartbeat runs:
- `every`: duration string (`ms`, `s`, `m`, `h`); default unit minutes. Omit or set
@@ -445,6 +469,20 @@ Defaults:
}
```

### `ui` (Appearance)

Optional accent color used by the native apps for UI chrome (e.g. Talk Mode bubble tint).

If unset, clients fall back to a muted light blue.

```json5
{
  ui: {
    seamColor: "#FF4500" // hex (RRGGBB or #RRGGBB)
  }
}
```
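
A client might read `seamColor` as follows. `parseSeamColor` is a hypothetical name. Trimming whitespace and returning `null` for malformed values, so that the client uses its own default, are assumptions.

```typescript
type Rgb = { r: number; g: number; b: number };

// Accepts "RRGGBB" or "#RRGGBB"; anything else yields null.
function parseSeamColor(value: string | undefined): Rgb | null {
  if (value === undefined) return null;
  const hex = value.trim().replace(/^#/, "");
  if (!/^[0-9a-fA-F]{6}$/.test(hex)) return null;
  const n = parseInt(hex, 16);
  // Split the 24-bit value into its red, green, and blue bytes.
  return { r: (n >> 16) & 0xff, g: (n >> 8) & 0xff, b: n & 0xff };
}
```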

### `gateway` (Gateway server mode + bind)

Use `gateway.mode` to explicitly declare whether this machine should run the Gateway.

@@ -7,11 +7,12 @@ read_when:
# Logging (macOS)

## Rolling diagnostics file log (Debug pane)
Clawdis routes macOS app logs through swift-log (unified logging by default) and can write a local, rotating file log to disk when you need a durable capture.

- Verbosity: **Debug pane → Logs → App logging → Verbosity**
- Enable: **Debug pane → Logs → App logging → “Write rolling diagnostics log (JSONL)”**
- Location: `~/Library/Logs/Clawdis/diagnostics.jsonl` (rotates automatically; old files are suffixed with `.1`, `.2`, …)
- Clear: **Debug pane → Logs → App logging → “Clear”**

Notes:
- This is **off by default**. Enable only while actively debugging.
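
Numbered-suffix rotation typically shifts each backup up by one before the live file becomes `.1`. The sketch below only plans the renames and touches no files. The backup count and the rotation trigger are not documented, so `maxBackups` and `planRotation` are hypothetical. The real app is Swift; this is an illustration of the scheme, not its implementation.

```typescript
type RotationStep =
  | { op: "delete"; path: string }
  | { op: "rename"; from: string; to: string };

function planRotation(basePath: string, maxBackups: number): RotationStep[] {
  if (!Number.isInteger(maxBackups) || maxBackups < 1) {
    throw new RangeError("maxBackups must be a positive integer");
  }
  // The oldest backup falls off the end (the executor should tolerate it missing).
  const steps: RotationStep[] = [{ op: "delete", path: `${basePath}.${maxBackups}` }];
  // Shift the remaining backups up by one, oldest first, so nothing is overwritten.
  for (let i = maxBackups - 1; i >= 1; i--) {
    steps.push({ op: "rename", from: `${basePath}.${i}`, to: `${basePath}.${i + 1}` });
  }
  // The live file becomes ".1"; a fresh file is opened afterwards.
  steps.push({ op: "rename", from: basePath, to: `${basePath}.1` });
  return steps;
}
```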

@@ -8,6 +8,7 @@ read_when:
## What is shown
- We surface the current agent work state in the menu bar icon and in the first status row of the menu.
- Health status is hidden while work is active; it returns when all sessions are idle.
- The “Nodes” block in the menu lists **devices** only (gateway bridge nodes via `node.list`), not client/presence entries.

## State model
- Sessions: events arrive with `runId` (session key). The “main” session is the key `main`; if absent, we fall back to the most recently updated session.
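
The session-selection rule can be sketched as below. The `SessionInfo` shape (a key plus an `updatedAt` timestamp in milliseconds) and `pickPrimarySession` are hypothetical.

```typescript
type SessionInfo = { key: string; updatedAt: number };

function pickPrimarySession(sessions: SessionInfo[]): SessionInfo | undefined {
  // Prefer the session whose key is literally "main".
  const main = sessions.find((s) => s.key === "main");
  if (main) return main;
  // Otherwise fall back to the most recently updated session, if any.
  let latest: SessionInfo | undefined;
  for (const s of sessions) {
    if (latest === undefined || s.updatedAt > latest.updatedAt) latest = s;
  }
  return latest;
}
```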

79
docs/talk.md
Normal file
@@ -0,0 +1,79 @@
---
summary: "Talk mode: continuous speech conversations with ElevenLabs TTS"
read_when:
- Implementing Talk mode on macOS/iOS/Android
- Changing voice/TTS/interrupt behavior
---
# Talk Mode

Talk mode is a continuous voice conversation loop:
1) Listen for speech
2) Send the transcript to the model (main session, `chat.send`)
3) Wait for the response
4) Speak the reply via ElevenLabs (streaming playback)

## Behavior (macOS)
- **Always-on overlay** while Talk mode is enabled.
- **Listening → Thinking → Speaking** phase transitions.
- On a **short pause** (silence window), the current transcript is sent.
- Replies are **written to WebChat** (same as typing).
- **Interrupt on speech** (default on): if the user starts talking while the assistant is speaking, we stop playback and note the interruption timestamp for the next prompt.
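
The phase behavior can be modeled as a pure reducer. This is a sketch under assumptions: the event names, the `interruptedAt` field, and ignoring events that arrive in the "wrong" phase are illustrative choices, not the app's implementation.

```typescript
type Phase = "listening" | "thinking" | "speaking";

type TalkState = { phase: Phase; interruptedAt?: number };

type TalkEvent =
  | { kind: "silence" }                 // short pause: transcript is sent
  | { kind: "reply" }                   // model reply arrived, TTS starts
  | { kind: "playbackDone" }            // TTS finished normally
  | { kind: "userSpeech"; at: number }; // user started talking

function step(state: TalkState, ev: TalkEvent, interruptOnSpeech = true): TalkState {
  switch (ev.kind) {
    case "silence":
      return state.phase === "listening" ? { ...state, phase: "thinking" } : state;
    case "reply":
      return state.phase === "thinking" ? { ...state, phase: "speaking" } : state;
    case "playbackDone":
      return state.phase === "speaking" ? { ...state, phase: "listening" } : state;
    case "userSpeech":
      // Barge-in: stop playback and remember when it happened for the next prompt.
      if (state.phase === "speaking" && interruptOnSpeech) {
        return { phase: "listening", interruptedAt: ev.at };
      }
      return state;
  }
}
```

Keeping the transitions pure makes the loop easy to test without audio hardware; the audio and network side effects would hang off phase changes.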

## Voice directives in replies
The assistant may prefix its reply with a **single JSON line** to control voice:

```json
{"voice":"<voice-id>","once":true}
```

Rules:
- Only the first non-empty line is checked for a directive.
- Unknown keys are ignored.
- `once: true` applies to the current reply only.
- Without `once`, the voice becomes the new default for Talk mode.
- The JSON line is stripped before TTS playback.

Supported keys:
- `voice` / `voice_id` / `voiceId`
- `model` / `model_id` / `modelId`
- `speed`, `rate` (WPM), `stability`, `similarity`, `style`, `speakerBoost`
- `seed`, `normalize`, `lang`, `output_format`, `latency_tier`
- `once`
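
The rules above translate into a small extractor. This sketch handles only the voice and model keys and their aliases. `extractDirective` is a hypothetical name, and falling back to speaking the whole reply when the first line is not a JSON object is an assumption.

```typescript
type VoiceDirective = { voiceId?: string; modelId?: string; once: boolean };

function extractDirective(reply: string): { directive?: VoiceDirective; speakText: string } {
  const lines = reply.split("\n");
  // Only the first non-empty line is considered.
  const idx = lines.findIndex((l) => l.trim() !== "");
  if (idx === -1) return { speakText: reply };
  let parsed: unknown;
  try {
    parsed = JSON.parse((lines[idx] ?? "").trim());
  } catch {
    return { speakText: reply }; // not JSON: speak the reply unchanged
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { speakText: reply };
  }
  const o = parsed as Record<string, unknown>;
  // Accept the documented aliases for each key; other keys are ignored.
  const pick = (...keys: string[]): string | undefined => {
    for (const k of keys) if (typeof o[k] === "string") return o[k] as string;
    return undefined;
  };
  const directive: VoiceDirective = {
    voiceId: pick("voice", "voice_id", "voiceId"),
    modelId: pick("model", "model_id", "modelId"),
    once: o.once === true,
  };
  // Strip the directive line so it is never spoken.
  const speakText = lines.slice(idx + 1).join("\n").trimStart();
  return { directive, speakText };
}
```

A caller would then persist `voiceId` as the new default only when `once` is false.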

## Config (clawdis.json)
```json5
{
  "talk": {
    "voiceId": "elevenlabs_voice_id",
    "modelId": "eleven_v3",
    "outputFormat": "mp3_44100_128",
    "apiKey": "elevenlabs_api_key",
    "interruptOnSpeech": true
  }
}
```

Defaults:
- `interruptOnSpeech`: true
- `voiceId`: falls back to `ELEVENLABS_VOICE_ID` / `SAG_VOICE_ID` (or the first ElevenLabs voice when an API key is available)
- `modelId`: defaults to `eleven_v3` when unset
- `apiKey`: falls back to `ELEVENLABS_API_KEY` (or the gateway shell profile, if available)
- `outputFormat`: defaults to `pcm_44100` on macOS/iOS and `pcm_24000` on Android (set `mp3_*` to force MP3 streaming)
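
The defaults that need no network or shell access can be applied in one pass. `applyTalkDefaults` and the `Platform` union are hypothetical. The "first ElevenLabs voice" and shell-profile fallbacks are deliberately left out because they depend on external lookups.

```typescript
type Platform = "macos" | "ios" | "android";

type TalkSettings = {
  modelId?: string;
  outputFormat?: string;
  apiKey?: string;
  interruptOnSpeech?: boolean;
};

function applyTalkDefaults(
  cfg: TalkSettings,
  platform: Platform,
  env: Record<string, string | undefined>,
) {
  return {
    modelId: cfg.modelId ?? "eleven_v3",
    // PCM by default; an explicit value such as mp3_44100_128 wins.
    outputFormat: cfg.outputFormat ?? (platform === "android" ? "pcm_24000" : "pcm_44100"),
    apiKey: cfg.apiKey ?? env.ELEVENLABS_API_KEY,
    interruptOnSpeech: cfg.interruptOnSpeech ?? true,
  };
}
```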

## macOS UI
- Menu bar toggle: **Talk**
- Config tab: **Talk Mode** group (voice id + interrupt toggle)
- Overlay:
  - **Listening**: cloud pulses with mic level
  - **Thinking**: sinking animation
  - **Speaking**: radiating rings
  - Click cloud: stop speaking
  - Click X: exit Talk mode

## Notes
- Requires Speech + Microphone permissions.
- Uses `chat.send` against session key `main`.
- TTS uses the ElevenLabs streaming API with `ELEVENLABS_API_KEY`, with incremental playback on macOS/iOS/Android for lower latency.
- `stability` for `eleven_v3` must be `0.0`, `0.5`, or `1.0`; other models accept any value in `0..1`.
- `latency_tier`, when set, must be within `0..4`.
- Android supports `pcm_16000`, `pcm_22050`, `pcm_24000`, and `pcm_44100` output formats for low-latency AudioTrack streaming.
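
The two validation notes can be sketched as checks that return an error message or `null`. Returning a message, rather than throwing or clamping, is an assumption, and so is requiring `latency_tier` to be an integer. The function names are hypothetical.

```typescript
function validateStability(modelId: string, stability: number): string | null {
  if (modelId === "eleven_v3") {
    // eleven_v3 only accepts three discrete settings.
    return [0, 0.5, 1].includes(stability) ? null : "stability must be 0.0, 0.5, or 1.0 for eleven_v3";
  }
  // Other models take a continuous value; NaN fails both comparisons.
  return stability >= 0 && stability <= 1 ? null : "stability must be within 0..1";
}

function validateLatencyTier(tier: number | undefined): string | null {
  if (tier === undefined) return null; // only checked when set
  // Assumption: tiers are whole numbers.
  return Number.isInteger(tier) && tier >= 0 && tier <= 4 ? null : "latency_tier must be within 0..4";
}
```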

13
docs/test.md
@@ -7,3 +7,16 @@ read_when:

- `pnpm test:force`: Kills any lingering gateway process holding the default control port, then runs the full Vitest suite with an isolated gateway port so server tests don’t collide with a running instance. Use this when a prior gateway run left port 18789 occupied.
- `pnpm test:coverage`: Runs Vitest with V8 coverage. Global thresholds are 70% for lines, branches, functions, and statements. Coverage excludes integration-heavy entrypoints (CLI wiring, gateway/telegram bridges, webchat static server) to keep the target focused on unit-testable logic.

## Model latency bench (local keys)

Script: `scripts/bench-model.ts`

Usage:
- `source ~/.profile && pnpm tsx scripts/bench-model.ts --runs 10`
- Optional env: `MINIMAX_API_KEY`, `MINIMAX_BASE_URL`, `MINIMAX_MODEL`, `ANTHROPIC_API_KEY`
- Default prompt: “Reply with a single word: ok. No punctuation or extra text.”

Last run (2025-12-31, 20 runs):
- minimax median 1279ms (min 1114, max 2431)
- opus median 2454ms (min 1224, max 3170)
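
Each result line reports a median, minimum, and maximum over the runs. The sketch below shows one common way to derive them. It is not taken from `scripts/bench-model.ts`, and whether that script averages the two middle samples for even run counts is unknown.

```typescript
function summarize(latenciesMs: number[]): { median: number; min: number; max: number } {
  if (latenciesMs.length === 0) throw new RangeError("need at least one run");
  // Sort a copy numerically; the default sort would compare as strings.
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Even counts average the two middle samples; odd counts take the middle one.
  const median = sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
  return { median, min: sorted[0], max: sorted[sorted.length - 1] };
}
```

The median is the headline number here because a single slow outlier (a cold connection, for example) barely moves it, while min and max show the spread.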

@@ -51,6 +51,7 @@ Notes:
- Uses `browser.controlUrl` unless `controlUrl` is passed explicitly.
- `snapshot` defaults to `ai`; use `aria` for the accessibility tree.
- `act` requires `ref` from `snapshot --format ai`; use `evaluate` for rare CSS selector needs.
- Avoid `act` → `wait` by default; use it only in exceptional cases (no reliable UI state to wait on).

### `clawdis_canvas`
Drive the node Canvas (present, eval, snapshot, A2UI).