10 KiB
summary, read_when
| summary | read_when | |||
|---|---|---|---|---|
| Plan for an iOS voice + canvas node that connects via a secure Bonjour-discovered macOS bridge |
|
iOS Node (internal) — Voice Trigger + Canvas
Status: prototype implemented (internal) · Date: 2025-12-13
Runbook (how to connect/pair + drive Canvas): docs/ios/connect.md
Goals
- Build an iOS app that acts as a remote node for Clawdis:
- Voice trigger (wake-word / always-listening intent) that forwards transcripts to the Gateway
agentmethod. - Canvas surface that the agent can control: navigate, draw/render, evaluate JS, snapshot.
- Voice trigger (wake-word / always-listening intent) that forwards transcripts to the Gateway
- Dead-simple setup:
- Auto-discover the host on the local network via Bonjour.
- One-tap pairing with an approval prompt on the Mac.
- iOS is never a local gateway; it is always a remote node.
- Operational clarity:
- When iOS is backgrounded, voice may still run; canvas commands must fail fast with a structured error.
- Provide settings: node display name, enable/disable voice wake, pairing status.
Non-goals (v1):
- Exposing the Node Gateway directly on the LAN.
- Supporting arbitrary third-party “plugins” on iOS.
- Perfect App Store compliance; this is internal-only initially.
Current repo reality (constraints we respect)
- The Gateway WebSocket server binds to
127.0.0.1:18789(src/gateway/server.ts) with an optionalCLAWDIS_GATEWAY_TOKEN. - macOS “Canvas” exists today, but is mac-only and controlled via mac app IPC (
clawdis-mac canvas ...) rather than the Gateway protocol (docs/mac/canvas.md). - Voice wake forwards via
GatewayChannelto Gatewayagent(mac app:VoiceWakeForwarder→GatewayConnection.sendAgent).
Recommended topology (B): Gateway-owned Bridge + loopback Gateway
Keep the Node gateway loopback-only; expose a dedicated gateway-owned bridge to the LAN/tailnet.
iOS App ⇄ (TLS + pairing) ⇄ Bridge (in gateway) ⇄ (loopback) ⇄ Gateway WS (ws://127.0.0.1:18789)
Why:
- Preserves current threat model: Gateway remains local-only.
- Centralizes auth, rate limiting, and allowlisting in the bridge.
- Lets us unify “canvas node” semantics across mac + iOS without exposing raw gateway methods.
Security plan (internal, but still robust)
Transport
- Current (v0): bridge is a LAN-facing TCP listener with token-based auth after pairing.
- Next: wrap the bridge in TLS and prefer key-pinned or mTLS-like auth after pairing.
Pairing
- Bonjour discovery shows a candidate “Clawdis Bridge” on the LAN.
- First connection:
- iOS generates a keypair (Secure Enclave if available).
- iOS connects to the bridge and requests pairing.
- The bridge forwards the pairing request to the Gateway as a pending request.
- Approval can happen via:
- macOS UI (Clawdis shows an alert with Approve/Reject/Later, including the node IP), or
- Terminal/CLI (headless flows).
- Once approved, the bridge returns a token to iOS; iOS stores it in Keychain.
- Subsequent connections:
- The bridge requires the paired identity. Unpaired clients get a structured “not paired” error and no access.
Gateway-owned pairing (Option B details)
Pairing decisions must be owned by the Gateway (clawd / Node) so nodes can be approved without the macOS app running.
Key idea:
- The Swift app may still show an alert, but it is only a frontend for pending requests stored in the Gateway.
Desired behavior:
- If the Swift UI is present: show alert with Approve/Reject/Later.
- If the Swift UI is not present:
clawdisCLI can list pending requests and approve/reject.
See docs/gateway/pairing.md for the API/events and storage.
CLI (headless approvals):
clawdis nodes pendingclawdis nodes approve <requestId>clawdis nodes reject <requestId>
Authorization / scope control (bridge-side ACL)
The bridge must not be a raw proxy to every gateway method.
- Allow by default:
agent(with guardrails; idempotency required)- minimal
system-eventbeacons (presence updates for the node) - node/canvas methods defined below (new protocol surface)
- Deny by default:
- anything that widens control without explicit intent (future “shell”, “files”, etc.)
- Rate limit:
- handshake attempts
- voice forwards per minute
- snapshot frequency / payload size
Protocol unification: add “node/canvas” to Gateway protocol
Principle
Unify mac Canvas + iOS Canvas under a single conceptual surface:
- The agent talks to the Gateway using a stable method set (typed protocol).
- The Gateway routes node-targeted requests to:
- local mac Canvas implementation, or
- remote iOS node via the bridge
Minimal protocol additions (v1)
Add to src/gateway/protocol/schema.ts (and regenerate Swift models):
Identity
- Node identity comes from
connect.params.client.instanceId(stable), andconnect.params.client.mode = "node"(or"ios-node").
Methods
node.list→ list paired/connected nodes + capabilitiesnode.describe→ describe a node (capabilities + supportednode.invokecommands)node.invoke→ send a command to a specific node- Params:
{ nodeId, command, params?, timeoutMs? }
- Params:
Events
node.event→ async node status/errors- e.g. background/foreground transitions, voice availability, canvas availability
Node command set (canvas)
These are values for node.invoke.command:
canvas.show/canvas.hidecanvas.navigatewith{ url }(Canvas URL or https URL; switches mode to"web")canvas.evalwith{ javaScript }canvas.snapshotwith{ maxWidth?, quality?, format? }canvas.setModewith{ mode: "canvas" | "web" }(use"canvas"to return to the scaffold)- A2UI (mobile + macOS canvas):
canvas.a2ui.pushwith{ messages: [...] }(A2UI v0.8 server→client messages)canvas.a2ui.pushJSONLwith{ jsonl: "..." }(legacy alias)canvas.a2ui.reset
Result pattern:
- Request is a standard
req/reswithok/error. - Long operations (loads, streaming drawing, etc.) may also emit
node.eventprogress.
Current (implemented)
As of 2025-12-13, the Gateway supports node.invoke for bridge-connected nodes.
Example: draw a diagonal line on the iOS Canvas:
clawdis nodes invoke --node ios-node --command canvas.eval --params '{"javaScript":"(() => { const {ctx} = window.__clawdis; ctx.clearRect(0,0,innerWidth,innerHeight); ctx.lineWidth=6; ctx.strokeStyle=\"#ff2d55\"; ctx.beginPath(); ctx.moveTo(40,40); ctx.lineTo(innerWidth-40, innerHeight-40); ctx.stroke(); return \"ok\"; })()"}'
Background behavior requirement
When iOS is backgrounded:
- Voice may still be active (subject to iOS suspension).
- All
canvas.*commands must fail with a stable error code, e.g.:NODE_BACKGROUND_UNAVAILABLE- Include
retryable: trueandretryAfterMsif we want the agent to wait.
iOS app architecture (SwiftUI)
App structure
- Single fullscreen Canvas surface (WKWebView).
- One settings entry point: a gear button that opens a settings sheet.
- All navigation/mode selection is agent-driven (no local URL bar).
Components
BridgeDiscovery: Bonjour browse + resolve (Network.frameworkNWBrowser)BridgeConnection: TCP session + pairing handshake + reconnect (TLS planned)NodeRuntime:- Voice pipeline (wake-word + capture + forward)
- Canvas pipeline (WKWebView controller + snapshot + eval)
- Background state tracking; enforces “canvas unavailable in background”
Voice in background (internal)
- Enable background audio mode (and required session configuration) so the mic pipeline can keep running when the user switches apps.
- If iOS suspends the app anyway, surface a clear node status (
node.event) so operators can see voice is unavailable.
Code sharing (macOS + iOS)
Create/expand SwiftPM targets so both apps share:
ClawdisProtocol(generated models; platform-neutral)ClawdisGatewayClient(shared WS framing + connect/req/res + seq-gap handling)ClawdisKit(node/canvas command types + deep links + shared utilities)
macOS continues to own:
- local Canvas implementation details (custom scheme handler serving on-disk HTML, window/panel presentation)
iOS owns:
- iOS-specific audio/speech + WKWebView presentation and lifecycle
Repo layout
- iOS app:
apps/ios/(XcodeGenproject.yml) - Shared Swift packages:
apps/shared/ - Lint/format: iOS target runs
swiftformat --lint+swiftlint lintusing repo configs (.swiftformat,.swiftlint.yml).
Generate the Xcode project:
cd apps/ios
xcodegen generate
open Clawdis.xcodeproj
Storage plan (private by default)
iOS
- Canvas/workspace files (persistent, private):
Application Support/Clawdis/canvas/<sessionKey>/...
- Snapshots / temp exports (evictable):
Library/Caches/Clawdis/canvas-snapshots/<sessionKey>/...
- Credentials:
- Keychain (paired identity + bridge trust anchor)
macOS
- Keep current Canvas root (already implemented):
~/Library/Application Support/Clawdis/canvas/<session>/...
- Bridge state:
- No local pairing store (pairing is gateway-owned).
- Any local bridge-only state should remain private under Application Support.
Gateway (node)
- Pairing (source of truth):
~/.clawdis/nodes/paired.json~/.clawdis/nodes/pending.json(orpending/*.jsonfor auditability)
Rollout plan (phased)
- Bridge discovery + pairing (mac + iOS)
- Bonjour browse + resolve
- Approve prompt on mac
- Persist pairing in Keychain/App Support
- Voice-only node
- iOS voice wake toggle
- Forward transcript to Gateway
agentvia bridge - Presence beacons via
system-event(or node.event)
- Protocol additions for nodes
- Add
node.list/node.invoke/node.eventto Gateway - Implement bridge routing + ACLs
- Add
- iOS canvas
- WKWebView canvas surface
canvas.navigate/eval/snapshot- Background fast-fail for
canvas.*
- Unify mac Canvas under the same node.invoke
- Keep existing implementation, but expose it through the unified protocol path so the agent uses one API.
Open questions
- Should
connect.params.client.modebe"node"withplatform="ios ..."or a distinct mode"ios-node"? (Presence filtering currently excludes"cli"only.) - Do we want a “permissions” model per node (voice only vs voice+canvas) at pairing time?
- Should “website mode” allow arbitrary https, or enforce an allowlist to reduce risk?