docs: document push-to-talk hotkey
@@ -68,6 +68,7 @@ clawdis relay --verbose
## macOS Companion App (Clawdis.app)
- **On-device Voice Wake:** listens for wake words (e.g. “Claude”) using Apple’s on-device speech recognizer (macOS 26+). macOS still shows the standard Speech/Mic permissions prompt, but audio stays on device.
- **Push-to-talk (Cmd+Fn):** hold the hotkey to speak; the voice overlay shows live partials and sends when you release.
- **Config tab:** pick the model from your local Pi model catalog (`pi-mono/packages/ai/src/models.generated.ts`), or enter a custom model ID; edit session store path and context tokens.
- **Voice settings:** language + additional languages, mic picker, live level meter, trigger-word table, and a built-in test harness.
- **Menu bar toggle:** enable/disable Voice Wake from the menu bar; respects Dock-icon preference.
@@ -1,28 +1,34 @@
# Voice Wake Pipeline
# Voice Wake & Push-to-Talk
Updated: 2025-12-08 · Owners: mac app
## Runtime behavior
- Always-on listener (Speech framework) waits for any trigger word.
- On first trigger hit: start capture, raise ears immediately via `AppState.triggerVoiceEars(ttl: nil)`, reset capture buffer.
- While capturing: keep buffer in sync with partial transcripts; update `lastHeard` whenever audio arrives.
- End capture when 1.0s of silence is observed (or 8s hard stop), then call `stopVoiceEars()`, prepend the voice-prefix string, send once to Claude, and restart the recognizer for a clean next trigger. A short 350ms debounce prevents double-fires.
## Modes
- **Wake-word mode** (default): always-on Speech recognizer waits for trigger tokens (`swabbleTriggerWords`). On match it starts capture, shows the overlay with partial text, and auto-sends after silence.
- **Push-to-talk (Cmd+Fn)**: hold Cmd+Fn to capture immediately; no trigger word is needed. The overlay appears while the keys are held; releasing finalizes the transcript and forwards it after a short delay so you can tweak the text.
## Visual states
- **Listening for trigger:** idle icon.
- **Wake word detected / capturing:** ears enlarged with holes; they stay up until the silence window ends, not for a fixed time.
- **After send:** ears drop immediately when silence window elapses; icon returns to idle.
## Runtime behavior (wake-word)
- Speech recognizer lives in `VoiceWakeRuntime`.
- Silence windows: 2.0s when speech is flowing, 5.0s if only the trigger was heard.
- Hard stop: 120s to prevent runaway sessions.
- Debounce between sessions: 350ms.
- Overlay is driven via `VoiceWakeOverlayController` with committed/volatile coloring.
- After send, recognizer restarts cleanly to listen for the next trigger.
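The wake-word timing rules above (2.0s / 5.0s silence windows, 120s hard stop, 350ms debounce) can be sketched as a pure decision function. This is a minimal illustration, not the code in `VoiceWakeRuntime`: the type and function names (`CaptureTiming`, `captureDecision`, `isDebounced`) are hypothetical, and only the numeric values come from the list above.

```swift
import Foundation

/// Timing values from the list above. The type and property names are hypothetical.
struct CaptureTiming {
    var silenceWhileSpeaking: TimeInterval = 2.0   // speech is flowing
    var silenceTriggerOnly: TimeInterval = 5.0     // only the trigger was heard
    var hardStop: TimeInterval = 120.0             // guard against runaway sessions
    var debounce: TimeInterval = 0.35              // minimum gap between sessions
}

enum EndReason { case silence, hardStop }

enum CaptureDecision: Equatable {
    case keepListening
    case finish(EndReason)
}

/// Decides whether a capture should end at time `now` (all times in seconds).
/// - start: when the trigger fired
/// - lastHeard: when audio or a partial transcript last arrived
/// - heardSpeechAfterTrigger: false while only the trigger word has been heard
func captureDecision(now: TimeInterval,
                     start: TimeInterval,
                     lastHeard: TimeInterval,
                     heardSpeechAfterTrigger: Bool,
                     timing: CaptureTiming = CaptureTiming()) -> CaptureDecision {
    // The hard stop wins over everything else.
    if now - start >= timing.hardStop { return .finish(.hardStop) }
    // Pick the silence window that matches what has been heard so far.
    let window = heardSpeechAfterTrigger ? timing.silenceWhileSpeaking
                                         : timing.silenceTriggerOnly
    if now - lastHeard >= window { return .finish(.silence) }
    return .keepListening
}

/// True if a trigger at `now` falls inside the debounce gap after the previous session.
func isDebounced(now: TimeInterval,
                 lastSessionEnd: TimeInterval?,
                 timing: CaptureTiming = CaptureTiming()) -> Bool {
    guard let end = lastSessionEnd else { return false }
    return now - end < timing.debounce
}
```

Keeping the decision free of timers and audio state makes it easy to unit-test with synthetic timestamps; a real runtime would call something like this from its audio or timer callback.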
## Push-to-talk specifics
- Hotkey detection uses a global `.flagsChanged` monitor: Fn is `keyCode 63` and flagged via `.function`; Command is `keyCode 55/54`. We only **observe** events (no swallowing).
- Capture pipeline lives in `VoicePushToTalk`: starts Speech immediately, streams partials to the overlay, and calls `VoiceWakeForwarder` on release.
- When push-to-talk starts we pause the wake-word runtime to avoid dueling audio taps; it restarts automatically after release.
- Permissions: requires Microphone + Speech; macOS prompts on first use. Receiving the global key events also requires Accessibility approval.
- Fn caveat: some external keyboards don’t expose Fn; fall back to a standard shortcut if needed.
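As a rough sketch of the hotkey detection described above, the chord logic can be kept separate from the event monitor so it is testable without AppKit. `PushToTalkChord`, `ChordTransition`, and `HotkeyMonitor` are hypothetical names, not the types in `VoicePushToTalk`; the AppKit part uses `NSEvent.addGlobalMonitorForEvents(matching:handler:)`, which only observes events and cannot swallow them.

```swift
import Foundation

/// Result of feeding a new modifier state into the chord tracker.
enum ChordTransition: Equatable { case began, ended, unchanged }

/// Tracks whether Cmd+Fn is currently held. Hypothetical helper.
struct PushToTalkChord {
    private(set) var isActive = false

    /// Returns `.began` when both keys become held and `.ended` when
    /// either key is released while the chord was active.
    mutating func update(commandDown: Bool, fnDown: Bool) -> ChordTransition {
        let held = commandDown && fnDown
        defer { isActive = held }
        switch (isActive, held) {
        case (false, true): return .began
        case (true, false): return .ended
        default: return .unchanged
        }
    }
}

#if canImport(AppKit)
import AppKit

/// Observes global modifier changes and reports chord transitions.
final class HotkeyMonitor {
    private var chord = PushToTalkChord()
    private var token: Any?

    func start(onBegin: @escaping () -> Void, onEnd: @escaping () -> Void) {
        token = NSEvent.addGlobalMonitorForEvents(matching: .flagsChanged) { [weak self] event in
            guard let self = self else { return }
            let flags = event.modifierFlags
            switch self.chord.update(commandDown: flags.contains(.command),
                                     fnDown: flags.contains(.function)) {
            case .began: onBegin()      // e.g. pause wake-word, start capture
            case .ended: onEnd()        // e.g. finalize, forward, resume wake-word
            case .unchanged: break
            }
        }
    }

    func stop() {
        if let token = token { NSEvent.removeMonitor(token) }
        token = nil
    }
}
#endif
```

Checking `modifierFlags` rather than individual key codes treats left and right Command the same way; the key codes listed above would only matter if the two needed to be distinguished.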
## User-facing settings
- **Voice Wake** toggle: enables wake-word runtime.
- **Hold Cmd+Fn to talk**: enables the push-to-talk monitor. Disabled on macOS < 26.
- Language & mic pickers, live level meter, trigger-word table, tester, forward target/command all remain unchanged.
## Forwarding payload
- Uses `VoiceWakeForwarder.prefixedTranscript(_:)` to prepend the model hint:
`User talked via voice recognition on <machine> - repeat prompt first + remember some words might be incorrectly transcribed.`
- Machine name resolves to `Host.localizedName` or `hostName`; caller can override for tests.
- `VoiceWakeForwarder.prefixedTranscript(_:)` prepends the machine hint before sending. Shared between wake-word and push-to-talk paths.
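A minimal sketch of the prefixing step, assuming the hint and the transcript are joined with a blank line (the separator is not stated here) and that the machine-name override is an optional second parameter. `ForwarderSketch` is a hypothetical stand-in; the real `VoiceWakeForwarder.prefixedTranscript(_:)` may differ in signature and formatting.

```swift
import Foundation

enum ForwarderSketch {
    /// Best-effort machine name: the localized host name on macOS,
    /// otherwise the plain host name.
    static func defaultMachineName() -> String {
        #if os(macOS)
        if let name = Host.current().localizedName, !name.isEmpty {
            return name
        }
        #endif
        return ProcessInfo.processInfo.hostName
    }

    /// Prepends the model hint to `transcript`.
    /// `machineName` is an assumed override hook for tests.
    static func prefixedTranscript(_ transcript: String,
                                   machineName: String? = nil) -> String {
        let machine = machineName ?? defaultMachineName()
        let prefix = "User talked via voice recognition on \(machine) - "
            + "repeat prompt first + remember some words might be incorrectly transcribed."
        // Assumption: hint and transcript are separated by a blank line.
        return "\(prefix)\n\n\(transcript)"
    }
}
```

Because both the wake-word and push-to-talk paths share the same prefix, injecting the machine name keeps a test deterministic regardless of the host it runs on.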
## Testing hooks
- Settings tester mirrors runtime: same capture/silence flow, same prefix, same ear behavior.
- Unit test: `VoiceWakeForwarderTests.prefixedTranscriptUsesMachineName` covers the prefix format.
## Tuning knobs (Swift constants)
- Silence window: 1.0s (`silenceWindow` in `VoiceWakeRuntime`).
- Hard stop after trigger: 8s (`captureHardStop`).
- Post-send debounce: 0.35s (`debounceAfterSend`).
## Quick verification
- Toggle push-to-talk on, hold Cmd+Fn, speak, release: overlay should show partials then send.
- While holding, menu-bar ears should stay enlarged (uses `triggerVoiceEars(ttl: nil)`); they drop after release.