clawdbot/docs/mac/voicewake.md

summary: Voice wake and push-to-talk modes plus routing details in the mac app
read_when: Working on voice wake or PTT pathways

Voice Wake & Push-to-Talk

Updated: 2025-12-12 · Owners: mac app

Modes

  • Wake-word mode (default): always-on Speech recognizer waits for trigger tokens (swabbleTriggerWords). On match it starts capture, shows the overlay with partial text, and auto-sends after silence.
  • Push-to-talk (Right Option hold): hold the right Option key to capture immediately, with no trigger word needed. The overlay appears while the key is held; releasing it finalizes the text and forwards it after a short delay so you can tweak it first.

Runtime behavior (wake-word)

  • Speech recognizer lives in VoiceWakeRuntime.
  • Silence windows: 2.0s when speech is flowing, 5.0s if only the trigger was heard.
  • Hard stop: 120s to prevent runaway sessions.
  • Debounce between sessions: 350ms.
  • Overlay is driven via VoiceWakeOverlayController with committed/volatile coloring.
  • After send, recognizer restarts cleanly to listen for the next trigger.
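The timing rules above can be sketched as a small value type. This is a minimal illustration, not the actual VoiceWakeRuntime code: `WakeTiming`, `CaptureSession`, and `canStartSession` are hypothetical names, and only the four durations come from the list above.

```swift
import Foundation

/// Durations quoted from the runtime-behavior list.
struct WakeTiming {
    var silenceAfterSpeech: TimeInterval = 2.0       // speech is flowing
    var silenceAfterTriggerOnly: TimeInterval = 5.0  // only the trigger was heard
    var hardStop: TimeInterval = 120.0               // runaway-session guard
    var debounce: TimeInterval = 0.350               // gap between sessions
}

/// Hypothetical session tracker (assumption: not the real runtime API).
/// Times are plain seconds on any monotonic clock.
struct CaptureSession {
    let timing: WakeTiming
    let startedAt: TimeInterval
    private(set) var lastTextAt: TimeInterval
    private(set) var heardSpeechAfterTrigger = false

    init(timing: WakeTiming = WakeTiming(), startedAt: TimeInterval) {
        self.timing = timing
        self.startedAt = startedAt
        self.lastTextAt = startedAt
    }

    /// Call whenever the recognizer reports new text after the trigger.
    mutating func noteSpeech(at now: TimeInterval) {
        heardSpeechAfterTrigger = true
        lastTextAt = now
    }

    /// True once the hard stop or the applicable silence window has elapsed.
    func shouldFinalize(at now: TimeInterval) -> Bool {
        if now - startedAt >= timing.hardStop { return true }
        let window = heardSpeechAfterTrigger
            ? timing.silenceAfterSpeech
            : timing.silenceAfterTriggerOnly
        return now - lastTextAt >= window
    }
}

/// Debounce check: a new session may start only after the gap has passed.
func canStartSession(lastEndedAt: TimeInterval?,
                     now: TimeInterval,
                     timing: WakeTiming = WakeTiming()) -> Bool {
    guard let last = lastEndedAt else { return true }
    return now - last >= timing.debounce
}
```

Keeping the clock as a parameter, rather than reading the current time inside the type, makes the windows easy to exercise in unit tests.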

Push-to-talk specifics

  • Hotkey detection uses a global .flagsChanged monitor for right Option (keyCode 61 + .option). We only observe events (no swallowing).
  • Capture pipeline lives in VoicePushToTalk: starts Speech immediately, streams partials to the overlay, and calls VoiceWakeForwarder on release.
  • When push-to-talk starts we pause the wake-word runtime to avoid dueling audio taps; it restarts automatically after release.
  • Permissions: capture requires Microphone + Speech Recognition; observing global key events requires Accessibility/Input Monitoring approval.
  • External keyboards: some may not expose right Option as expected; offer a fallback shortcut if users report misses.
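The hotkey detection described above can be sketched as a pure edge detector plus thin AppKit wiring. This is a sketch under assumptions, not the VoicePushToTalk implementation: `RightOptionTracker`, `PushToTalkEdge`, and `PushToTalkMonitor` are hypothetical names. The key code 61 and the `.flagsChanged` global monitor come from the list above; `NSEvent.addGlobalMonitorForEvents` only observes events and cannot swallow them.

```swift
import Foundation
#if canImport(AppKit)
import AppKit
#endif

/// Virtual key code for the right Option key (kVK_RightOption).
let rightOptionKeyCode: UInt16 = 61

enum PushToTalkEdge: Equatable { case began, ended }

/// Hypothetical tracker that turns flagsChanged events into press/release edges.
/// Limitation of this sketch: if left Option is still held when right Option
/// is released, the option flag stays set and the release is not detected.
struct RightOptionTracker {
    private(set) var isHeld = false

    mutating func handle(keyCode: UInt16, optionFlagSet: Bool) -> PushToTalkEdge? {
        guard keyCode == rightOptionKeyCode else { return nil }
        if optionFlagSet && !isHeld {
            isHeld = true
            return .began
        }
        if !optionFlagSet && isHeld {
            isHeld = false
            return .ended
        }
        return nil
    }
}

#if canImport(AppKit)
/// Hypothetical wiring around the global monitor (observe-only).
final class PushToTalkMonitor {
    private var monitor: Any?
    private var tracker = RightOptionTracker()

    func start(onEdge: @escaping (PushToTalkEdge) -> Void) {
        monitor = NSEvent.addGlobalMonitorForEvents(matching: .flagsChanged) { [weak self] event in
            guard let self = self else { return }
            let optionHeld = event.modifierFlags.contains(.option)
            if let edge = self.tracker.handle(keyCode: event.keyCode,
                                              optionFlagSet: optionHeld) {
                onEdge(edge)
            }
        }
    }

    func stop() {
        if let monitor = monitor { NSEvent.removeMonitor(monitor) }
        monitor = nil
    }
}
#endif
```

In this shape, a caller would pause the wake-word runtime on `.began` and resume it on `.ended`, which matches the "no dueling audio taps" rule above.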

User-facing settings

  • Voice Wake toggle: enables wake-word runtime.
  • Hold Cmd+Fn to talk: enables the push-to-talk monitor. Disabled on macOS < 26.
  • Language & mic pickers, live level meter, trigger-word table, tester.
  • Sounds: chimes on trigger detect and on send; defaults to the macOS “Glass” system sound. You can pick any NSSound-loadable file (e.g. MP3/WAV/AIFF) for each event or choose No Sound.
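The chime options above map naturally onto a three-case enum. The sketch below is illustrative only: `ChimeChoice` and `makeSound(for:)` are hypothetical names, not the app's settings model. `NSSound(named:)` and `NSSound(contentsOf:byReference:)` are the standard AppKit initializers, and both return nil when the sound cannot be loaded.

```swift
import Foundation
#if canImport(AppKit)
import AppKit
#endif

/// Hypothetical model of the per-event sound setting.
enum ChimeChoice: Equatable {
    case none                   // "No Sound"
    case system(name: String)   // a named macOS system sound
    case file(URL)              // any NSSound-loadable file (e.g. MP3/WAV/AIFF)

    /// Default quoted from the settings list above.
    static let defaultChoice = ChimeChoice.system(name: "Glass")
}

#if canImport(AppKit)
/// Resolves a choice to a playable sound, or nil for "No Sound" or a failed load.
func makeSound(for choice: ChimeChoice) -> NSSound? {
    switch choice {
    case .none:
        return nil
    case .system(let name):
        return NSSound(named: name)
    case .file(let url):
        return NSSound(contentsOf: url, byReference: true)
    }
}
#endif
```

A caller would hold one `ChimeChoice` for trigger detect and another for send, and call `makeSound(for:)?.play()` at each event.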

Forwarding behavior

  • When Voice Wake is enabled, transcripts are forwarded to the active gateway/agent (the same local vs remote mode used by the rest of the mac app).
  • Replies are delivered to the last-used main surface (WhatsApp/Telegram/WebChat). If delivery fails, the error is logged and the run is still visible via WebChat/session logs.

Forwarding payload

  • VoiceWakeForwarder.prefixedTranscript(_:) prepends the machine hint before sending. Shared between wake-word and push-to-talk paths.
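As a rough illustration of the prefixing step, the free function below is a hypothetical stand-in for `VoiceWakeForwarder.prefixedTranscript(_:)`. The real hint text is not given in this document, so the bracketed wording and the `machineName` parameter are placeholders, not the actual payload format.

```swift
import Foundation

/// Hypothetical stand-in: prepends a machine hint to a trimmed transcript.
/// The hint wording is an assumption made for illustration only.
func prefixedTranscript(_ transcript: String, machineName: String) -> String {
    let trimmed = transcript.trimmingCharacters(in: .whitespacesAndNewlines)
    return "[voice from \(machineName)] \(trimmed)"
}
```

Because both the wake-word and push-to-talk paths go through one function, the agent sees the same hint regardless of how capture started.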

Quick verification

  • Toggle push-to-talk on, hold the push-to-talk hotkey, speak, then release: the overlay should show partials and then send.
  • While holding, menu-bar ears should stay enlarged (uses triggerVoiceEars(ttl:nil)); they drop after release.