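diff --git a/README.md b/README.md
index 209f401..a3db7c6 100644
--- a/README.md
+++ b/README.md
@@ -573,3 +573,201 @@ adb devices
 year={2025}
 }
 ```
+
+---
+
+## Automated Deployment Guide (for AI)
+
+> **This section is written for AI assistants (such as Claude Code) to automate the deployment of Open-AutoGLM.**
+>
+> If you are a human reader, you can skip this section and follow the documentation above.
+
+---
+
+### Project Overview
+
+Open-AutoGLM is a phone agent framework:
+- **Input**: the user's natural-language instruction (e.g., "Open WeChat and send a message to Zhang San")
+- **Output**: the user's Android phone is operated automatically to complete the task
+- **How it works**: screenshot → vision model interprets the UI → outputs tap coordinates → ADB executes the action → loop
+
+The architecture has two parts:
+1. **Agent code** (this repository): runs on the user's computer; calls the model, parses actions, and controls the phone
+2. **Vision model service**: either a remote API or a local deployment
+
+---
+
+### Pre-Deployment Checklist
+
+Before starting the deployment, confirm each of the following with the user:
+
+#### Hardware
+- [ ] The user has an Android phone (Android 7.0+)
+- [ ] The user has a USB cable that supports data transfer (not a charge-only cable)
+- [ ] The phone and the computer can be connected with the cable
+
+#### Phone Configuration
+- [ ] "Developer Mode" is enabled on the phone (Settings → About Phone → tap the version number 7 times)
+- [ ] "USB debugging" is enabled on the phone (Settings → Developer Options → USB debugging)
+- [ ] Some models also require "USB debugging (Security settings)" to be enabled
+- [ ] The ADB Keyboard app is installed on the phone (download: https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk)
+- [ ] ADB Keyboard is enabled in system settings (Settings → Language & Input → enable ADB Keyboard)
+
+#### Model Service Confirmation (choose one)
+
+**Ask the user explicitly: do you already have a working AutoGLM model service?**
+
+- **Option A: use an already-deployed model service (recommended)**
+  - The user provides the model service URL (e.g., `http://xxx.xxx.xxx.xxx:8000/v1`)
+  - No local GPU and no model download are needed
+  - Use that URL directly as the `--base-url` argument
+
+- **Option B: deploy the model locally (high hardware requirements)**
+  - Requires an NVIDIA GPU (24GB+ VRAM recommended)
+  - Requires vLLM or SGLang to be installed
+  - Requires downloading about 20GB of model files
+  - **If the user is a beginner or unsure, strongly recommend Option A**
+
+---
+
+### Deployment Process
+
+#### Phase 1: Environment Setup
+
+```bash
+# 1. Install the ADB tools
+# macOS:
+brew install android-platform-tools
+# Or download manually: https://developer.android.com/tools/releases/platform-tools
+
+# Windows: download, extract, and add to the PATH environment variable
+
+# 2. Verify the ADB installation
+adb version
+# Should print version information
+
+# 3. Connect the phone and verify
+# Connect the phone with the data cable and tap "Allow USB debugging" on the phone
+adb devices
+# Should print the device list, e.g.:
+# List of devices attached
+# XXXXXXXX device
+```
+
+**If `adb devices` shows an empty list or `unauthorized`:**
+1. Check whether an authorization prompt appeared on the phone and tap "Allow"
+2. Check that USB debugging is enabled
+3. Try a different cable or USB port
+4. Run `adb kill-server && adb start-server` and retry
+
+#### Phase 2: Install the Agent
+
+```bash
+# 1. Clone the repository (if not already cloned)
+git clone https://github.com/zai-org/Open-AutoGLM.git
+cd Open-AutoGLM
+
+# 2. Create a virtual environment (recommended)
+python -m venv venv
+source venv/bin/activate # Windows: venv\Scripts\activate
+
+# 3. Install dependencies
+pip install -r requirements.txt
+pip install -e .
+```
+
+**Note: there is no need to clone the model repository; the model is called through an API.**
+
+#### Phase 3: Configure the Model Service
+
+**If the user chose Option A (use an already-deployed model):**
+- Use the URL provided by the user directly
+- Skip the local model deployment steps
+
+**If the user chose Option B (deploy the model locally):**
+
+```bash
+# 1. Install vLLM
+pip install vllm
+
+# 2. Start the model service (downloads the model automatically, about 20GB)
+python3 -m vllm.entrypoints.openai.api_server \
+  --served-model-name autoglm-phone-9b \
+  --allowed-local-media-path / \
+  --mm-encoder-tp-mode data \
+  --mm_processor_cache_type shm \
+  --mm_processor_kwargs "{\"max_pixels\":5000000}" \
+  --max-model-len 25480 \
+  --chat-template-content-format string \
+  --limit-mm-per-prompt "{\"image\":10}" \
+  --model zai-org/AutoGLM-Phone-9B \
+  --port 8000
+
+# The model service URL is: http://localhost:8000/v1
+```
+
+#### Phase 4: Verify the Deployment
+
+```bash
+# Run from the Open-AutoGLM directory
+# Replace {MODEL_URL} with the actual model service address
+
+python main.py --base-url {MODEL_URL} --model "autoglm-phone-9b" "打开微信,对文件传输助手发送消息:部署成功"
+```
+
+**Expected result:**
+- The phone opens WeChat automatically
+- It searches for "File Transfer Assistant" (文件传输助手) automatically
+- It sends the message "部署成功" ("Deployment successful") automatically
+
+---
+
+### Troubleshooting
+
+| Symptom | Possible cause | Solution |
+|---------|---------|---------|
+| `adb devices` prints nothing | USB debugging is off, or a cable problem | Check Developer Options; replace the cable |
+| `adb devices` shows `unauthorized` | The phone has not authorized the computer | Tap "Allow USB debugging" on the phone |
+| Apps open but taps do not work | Missing secure debugging permission | Enable "USB debugging (Security settings)" |
+| Chinese input is garbled or missing | ADB Keyboard is not enabled | Enable ADB Keyboard in system settings |
+| Screenshot comes back black | Sensitive page (payment/banking) | Expected behavior; the system handles it automatically |
+| Connection to the model service fails | Wrong URL or service not started | Check the URL and confirm the service is running |
+| `ModuleNotFoundError` | Dependencies not installed | Run `pip install -r requirements.txt` |
+
+---
+
+### Deployment Key Points
+
+1. **Confirm the phone connection first**: before installing any code, make sure `adb devices` can see the device
+2. **Do not skip ADB Keyboard**: without it, Chinese input will fail
+3. **The model service is an external dependency**: the Agent code does not include the model; a separate model service is required
+4. **For permission problems, check the phone settings first**: most problems come from incomplete phone-side configuration
+5. **Test with a simple task after deployment**: "Open WeChat and send a message to File Transfer Assistant" is a suggested acceptance test
+
+---
+
+### Command Quick Reference
+
+```bash
+# Check the ADB connection
+adb devices
+
+# Restart the ADB service
+adb kill-server && adb start-server
+
+# Install dependencies
+pip install -r requirements.txt && pip install -e .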
+
+# Run the Agent (interactive mode)
+python main.py --base-url {MODEL_URL} --model "autoglm-phone-9b"
+
+# Run the Agent (single task)
+python main.py --base-url {MODEL_URL} --model "autoglm-phone-9b" "your task description"
+
+# List the supported apps
+python main.py --list-apps
+```
+
+---
+
+**Deployment is complete when the phone can automatically carry out the user's natural-language instructions.**
diff --git a/README_coding_agent.md b/README_coding_agent.md
index 6f3c964..a8cf84b 100644
--- a/README_coding_agent.md
+++ b/README_coding_agent.md
@@ -1,9 +1,13 @@
-# Open-AutoGLM Quick Start / 快速开始
+# Open-AutoGLM Quick Start for Coding Agent
+> **This document is written for AI assistants (such as Claude Code) to automate the deployment of Open-AutoGLM.**
+>
+> If you are a human reader, you can skip this document and simply follow README.md.
+>
 > Quick deployment guide for Phone Agent desktop SDK
 > Phone Agent 桌面端 SDK 快速部署指南
diff --git a/README_en.md b/README_en.md
index 1abdd41..89b9d80 100644
--- a/README_en.md
+++ b/README_en.md
@@ -488,3 +488,201 @@ If you find our work helpful, please cite the following papers:
 year={2025}
 }
 ```
+
+---
+
+## Automated Deployment Guide (For AI Assistants)
+
+> **This section is specifically designed for AI assistants (such as Claude Code) to automate the deployment of Open-AutoGLM.**
+>
+> If you are a human reader, you can skip this section and follow the documentation above.
+
+---
+
+### Project Overview
+
+Open-AutoGLM is a phone agent framework:
+- **Input**: User's natural language instructions (e.g., "Open WhatsApp and send a message to John")
+- **Output**: Automatically operates the user's Android phone to complete tasks
+- **Mechanism**: Screenshot → Vision model understands interface → Outputs tap coordinates → ADB executes actions → Loop
+
+The architecture consists of two parts:
+1. **Agent Code** (this repository): Runs on the user's computer, responsible for calling models, parsing actions, and controlling the phone
+2. **Vision Model Service**: Can be a remote API or deployed locally
+
+---
+
+### Pre-Deployment Checklist
+
+Before starting deployment, confirm the following items with the user:
+
+#### Hardware Requirements
+- [ ] User has an Android phone (Android 7.0+)
+- [ ] User has a USB cable that supports data transfer (not just charging)
+- [ ] Phone and computer can be connected via USB cable
+
+#### Phone Configuration
+- [ ] Phone has Developer Mode enabled (Settings → About Phone → Tap Build Number 7 times)
+- [ ] Phone has USB Debugging enabled (Settings → Developer Options → USB Debugging)
+- [ ] Some models require enabling "USB Debugging (Security Settings)" as well
+- [ ] ADB Keyboard app is installed (Download: https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk)
+- [ ] ADB Keyboard is enabled in system settings (Settings → Language & Input → Enable ADB Keyboard)
+
+#### Model Service Confirmation (Choose One)
+
+**Ask the user explicitly: Do you already have access to an AutoGLM model service?**
+
+- **Option A: Use an already-deployed model service (Recommended)**
+  - User provides the model service URL (e.g., `http://xxx.xxx.xxx.xxx:8000/v1`)
+  - No local GPU required, no model download needed
+  - Use this URL directly as the `--base-url` parameter
+
+- **Option B: Deploy model locally (High system requirements)**
+  - Requires NVIDIA GPU (24GB+ VRAM recommended)
+  - Requires installation of vLLM or SGLang
+  - Requires downloading approximately 20GB of model files
+  - **If the user is a beginner or unsure, strongly recommend Option A**
+
+---
+
+### Deployment Process
+
+#### Phase 1: Environment Setup
+
+```bash
+# 1. Install ADB tools
+# macOS:
+brew install android-platform-tools
+# Or download manually: https://developer.android.com/tools/releases/platform-tools
+
+# Windows: Download, extract, and add to PATH environment variable
+
+# 2. Verify ADB installation
+adb version
+# Should output version information
+
+# 3. Connect phone and verify
+# Connect phone via USB cable, tap "Allow USB debugging" on phone
+adb devices
+# Should output device list, e.g.:
+# List of devices attached
+# XXXXXXXX device
+```
+
+**If `adb devices` shows an empty list or `unauthorized`:**
+1. Check whether an authorization popup appeared on the phone and tap "Allow"
+2. Check that USB debugging is enabled
+3. Try a different cable or USB port
+4. Run `adb kill-server && adb start-server` and retry
+
+#### Phase 2: Install Agent
+
+```bash
+# 1. Clone repository (if not already cloned)
+git clone https://github.com/zai-org/Open-AutoGLM.git
+cd Open-AutoGLM
+
+# 2. Create virtual environment (recommended)
+python -m venv venv
+source venv/bin/activate # Windows: venv\Scripts\activate
+
+# 3. Install dependencies
+pip install -r requirements.txt
+pip install -e .
+```
+
+**Note: No need to clone the model repository; models are called via API.**
+
+#### Phase 3: Configure Model Service
+
+**If user chooses Option A (using already-deployed model):**
+- Use the URL provided by the user directly
+- Skip local model deployment steps
+
+**If user chooses Option B (deploy model locally):**
+
+```bash
+# 1. Install vLLM
+pip install vllm
+
+# 2. Start model service (will auto-download model, ~20GB)
+python3 -m vllm.entrypoints.openai.api_server \
+  --served-model-name autoglm-phone-9b-multilingual \
+  --allowed-local-media-path / \
+  --mm-encoder-tp-mode data \
+  --mm_processor_cache_type shm \
+  --mm_processor_kwargs "{\"max_pixels\":5000000}" \
+  --max-model-len 25480 \
+  --chat-template-content-format string \
+  --limit-mm-per-prompt "{\"image\":10}" \
+  --model zai-org/AutoGLM-Phone-9B-Multilingual \
+  --port 8000
+
+# Model service URL: http://localhost:8000/v1
+```
+
+#### Phase 4: Verify Deployment
+
+```bash
+# Execute in the Open-AutoGLM directory
+# Replace {MODEL_URL} with the actual model service address
+
+python main.py --base-url {MODEL_URL} --model "autoglm-phone-9b-multilingual" "Open Gmail and send an email to File Transfer Assistant: Deployment successful"
+```
+
+**Expected Result:**
+- Phone automatically opens Gmail
+- Automatically searches for recipient
+- Automatically sends the message "Deployment successful"
+
+---
+
+### Troubleshooting
+
+| Error Symptom | Possible Cause | Solution |
+|---------------|----------------|----------|
+| `adb devices` shows nothing | USB debugging not enabled or cable issue | Check developer options, replace cable |
+| `adb devices` shows unauthorized | Phone not authorized | Tap "Allow USB debugging" on phone |
+| Can open apps but cannot tap | Missing security debugging permission | Enable "USB Debugging (Security Settings)" |
+| Chinese/text input corrupted or missing | ADB Keyboard not enabled | Enable ADB Keyboard in system settings |
+| Screenshot returns black screen | Sensitive page (payment/banking) | Normal behavior, system will handle automatically |
+| Cannot connect to model service | Wrong URL or service not running | Check URL, confirm service is running |
+| `ModuleNotFoundError` | Dependencies not installed | Run `pip install -r requirements.txt` |
+
+---
+
+### Deployment Key Points
+
+1. **Prioritize confirming phone connection**: Before installing any code, ensure `adb devices` can see the device
+2. **Don't skip ADB Keyboard**: Without it, text input will fail
+3. **Model service is an external dependency**: Agent code doesn't include the model; a separate model service is required
+4. **Check phone settings first for permission issues**: Most problems are due to incomplete phone-side configuration
+5. **Test with simple tasks after deployment**: Recommend using "Open Gmail and send message to File Transfer Assistant" as the acceptance criterion
+
+---
+
+### Command Quick Reference
+
+```bash
+# Check ADB connection
+adb devices
+
+# Restart ADB service
+adb kill-server && adb start-server
+
+# Install dependencies
+pip install -r requirements.txt && pip install -e .
+
+# Run Agent (interactive mode)
+python main.py --base-url {MODEL_URL} --model "autoglm-phone-9b-multilingual"
+
+# Run Agent (single task)
+python main.py --base-url {MODEL_URL} --model "autoglm-phone-9b-multilingual" "your task description"
+
+# View supported apps list
+python main.py --list-apps
+```
+
+---
+
+**Deployment success indicator: The phone can automatically execute the user's natural language instructions.**
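
The guide asks the deploying assistant to confirm that `adb devices` sees the phone before installing anything, and its troubleshooting table distinguishes an empty list from an `unauthorized` entry. That check could be scripted roughly as follows. This is only a sketch: it assumes the usual `adb devices` text layout (a `List of devices attached` header followed by one `serial<TAB>state` line per device), and `parse_adb_devices` and `diagnose` are hypothetical helper names that are not part of Open-AutoGLM.

```python
from typing import List, Tuple


def parse_adb_devices(output: str) -> List[Tuple[str, str]]:
    """Parse the text printed by `adb devices` into (serial, state) pairs."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and "* daemon ..." startup notices.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def diagnose(devices: List[Tuple[str, str]]) -> str:
    """Map the parsed device list to the advice given in the troubleshooting table."""
    if not devices:
        return "no device: check USB debugging and try another cable or port"
    if any(state == "device" for _, state in devices):
        return "ok"
    if any(state == "unauthorized" for _, state in devices):
        return "unauthorized: tap 'Allow USB debugging' on the phone"
    return "unexpected state: run adb kill-server && adb start-server and retry"


# Hypothetical captured output; in practice this would come from running `adb devices`.
sample = "List of devices attached\nXXXXXXXX\tdevice\n"
print(diagnose(parse_adb_devices(sample)))
```

Keeping the parsing separate from the `adb` call means the logic can be exercised on captured output without a phone attached; feeding it the real output (for example via `subprocess.run(["adb", "devices"], capture_output=True, text=True)`) is left to the caller.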