Merge pull request #52 from zai-org/update-readme-en-1210

add guide for coding agent
yongbin-buaa
2025-12-11 01:20:31 +08:00
committed by GitHub
3 changed files with 401 additions and 1 deletions

198
README.md

@@ -573,3 +573,201 @@ adb devices
year={2025}
}
```
---
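## Automated Deployment Guide (for AI)
> **This section is written for AI assistants (such as Claude Code) and is used to automate the deployment of Open-AutoGLM.**
>
> If you are a human reader, you can skip this section and follow the documentation above.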
---
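### Project Overview
Open-AutoGLM is a phone agent framework:
- **Input**: the user's natural-language instruction (e.g. "Open WeChat and send a message to Zhang San")
- **Output**: the user's Android phone is operated automatically to complete the task
- **How it works**: screenshot → the vision model interprets the screen → outputs tap coordinates → ADB executes the action → repeat
The architecture has two parts:
1. **Agent code** (this repository): runs on the user's computer and is responsible for calling the model, parsing actions, and controlling the phone
2. **Vision model service**: either a remote API or a local deployment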
---
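### Pre-Deployment Checklist
Before starting deployment, confirm each of the following items with the user:
#### Hardware
- [ ] The user has an Android phone (Android 7.0+)
- [ ] The user has a USB cable that supports data transfer (not a charge-only cable)
- [ ] The phone and the computer can be connected with the cable
#### Phone Configuration
- [ ] Developer Mode is enabled on the phone (Settings → About Phone → tap the version number 7 times)
- [ ] USB Debugging is enabled on the phone (Settings → Developer Options → USB Debugging)
- [ ] Some models also require "USB Debugging (Security Settings)" to be enabled
- [ ] The ADB Keyboard app is installed on the phone (download: https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk)
- [ ] ADB Keyboard is enabled in system settings (Settings → Language & Input → enable ADB Keyboard)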
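The last two checklist items can also be checked from the computer. The sketch below is one possible approach, not part of this repository: it assumes the IME id `com.android.adbkeyboard/.AdbIME` used by the ADBKeyBoard project and assumes that `adb shell ime list -s` prints the ids of the enabled input methods, one per line. The function names are hypothetical.

```python
import subprocess

# Assumption: this is the IME id registered by the ADBKeyBoard app.
ADB_KEYBOARD_IME = "com.android.adbkeyboard/.AdbIME"


def adb_keyboard_enabled(ime_list_output: str) -> bool:
    """Return True if the ADB Keyboard id appears in `ime list -s` output."""
    enabled = {line.strip() for line in ime_list_output.splitlines() if line.strip()}
    return ADB_KEYBOARD_IME in enabled


def read_enabled_imes() -> str:
    """Ask the connected phone for its enabled input methods (requires adb on PATH)."""
    result = subprocess.run(
        ["adb", "shell", "ime", "list", "-s"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Offline demonstration with sample output instead of a real device.
sample = "com.android.inputmethod.latin/.LatinIME\ncom.android.adbkeyboard/.AdbIME\n"
print(adb_keyboard_enabled(sample))
```

On a real device, `adb_keyboard_enabled(read_enabled_imes())` would replace the sample string. A `False` result suggests the keyboard still needs to be enabled in the phone's settings.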
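#### Model Service Confirmation (choose one)
**Ask the user explicitly: do you already have an AutoGLM model service available?**
- **Option A: use an already-deployed model service (recommended)**
  - The user provides the model service URL (e.g. `http://xxx.xxx.xxx.xxx:8000/v1`)
  - No local GPU and no model download needed
  - Use the URL directly as the `--base-url` argument
- **Option B: deploy the model locally (high hardware requirements)**
  - Requires an NVIDIA GPU (24GB+ VRAM recommended)
  - Requires vLLM or SGLang to be installed
  - Requires downloading about 20GB of model files

**If the user is a beginner or unsure, strongly recommend Option A.**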
---
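### Deployment Process
#### Phase 1: Environment Setup
```bash
# 1. Install the ADB tools
# macOS:
brew install android-platform-tools
# Or download manually: https://developer.android.com/tools/releases/platform-tools
# Windows: download, extract, and add to the PATH environment variable
# 2. Verify the ADB installation
adb version
# Should print version information
# 3. Connect the phone and verify
# Connect the phone with the data cable and tap "Allow USB debugging" on the phone
adb devices
# Should print the device list, e.g.:
# List of devices attached
# XXXXXXXX device
```
**If `adb devices` shows an empty list or `unauthorized`:**
1. Check whether an authorization prompt appeared on the phone and tap "Allow"
2. Check whether USB Debugging is enabled
3. Try a different cable or USB port
4. Run `adb kill-server && adb start-server` and retry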
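An assistant automating these steps needs to read the `adb devices` output rather than eyeball it. The following is a minimal sketch of that parsing, assuming the usual two-column `serial<TAB>state` layout shown above. The function names and the suggested next steps are illustrative and simply mirror the list above.

```python
def parse_adb_devices(output: str) -> dict:
    """Map each serial number to its state from `adb devices` output."""
    devices = {}
    for raw in output.splitlines():
        line = raw.strip()
        # Skip blank lines, the header, and daemon start-up notices ("* daemon ...").
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def next_step(devices: dict) -> str:
    """Suggest what to do next, following the troubleshooting list above."""
    states = set(devices.values())
    if "device" in states:
        return "ready"
    if "unauthorized" in states:
        return "tap Allow on the phone's USB debugging prompt"
    if not devices:
        return "check USB debugging, then try another cable or USB port"
    return "run adb kill-server && adb start-server, then retry"


sample = "List of devices attached\nXXXXXXXX\tdevice\n"
print(next_step(parse_adb_devices(sample)))
```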
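#### Phase 2: Install the Agent
```bash
# 1. Clone the repository (if it is not cloned yet)
git clone https://github.com/zai-org/Open-AutoGLM.git
cd Open-AutoGLM
# 2. Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
pip install -e .
```
**Note: there is no need to clone the model repository; the model is called through an API.**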
#### Phase 3: Configure the Model Service
**If the user chooses Option A (use an already-deployed model):**
- Use the URL provided by the user directly
- Skip the local model deployment steps

**If the user chooses Option B (deploy the model locally):**
```bash
# 1. Install vLLM
pip install vllm
# 2. Start the model service (downloads the model automatically, about 20GB)
python3 -m vllm.entrypoints.openai.api_server \
    --served-model-name autoglm-phone-9b \
    --allowed-local-media-path / \
    --mm-encoder-tp-mode data \
    --mm_processor_cache_type shm \
    --mm_processor_kwargs "{\"max_pixels\":5000000}" \
    --max-model-len 25480 \
    --chat-template-content-format string \
    --limit-mm-per-prompt "{\"image\":10}" \
    --model zai-org/AutoGLM-Phone-9B \
    --port 8000
# The model service URL is: http://localhost:8000/v1
```
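#### Phase 4: Verify the Deployment
```bash
# Run inside the Open-AutoGLM directory
# Replace {MODEL_URL} with the actual model service address
# The task string means: "Open WeChat and send File Transfer Assistant the message: Deployment successful"
python main.py --base-url {MODEL_URL} --model "autoglm-phone-9b" "打开微信,对文件传输助手发送消息:部署成功"
```
**Expected result:**
- The phone opens WeChat automatically
- It searches for "File Transfer Assistant" automatically
- It sends the message "Deployment successful" automatically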
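When the verification command is launched from a script, passing the arguments as a list avoids shell-quoting problems with the task string. The sketch below only assembles the argument list. It uses the `--base-url` and `--model` flags shown above, while the helper name and the example URL are assumptions for illustration.

```python
import shlex
import sys
from typing import List, Optional


def build_agent_command(base_url: str, model: str, task: Optional[str] = None) -> List[str]:
    """Build the argv for main.py; omitting the task leaves the agent in interactive mode."""
    # sys.executable keeps the call inside the currently active virtual environment.
    cmd = [sys.executable, "main.py", "--base-url", base_url, "--model", model]
    if task:
        cmd.append(task)
    return cmd


cmd = build_agent_command(
    "http://localhost:8000/v1",  # placeholder URL; use the real model service address
    "autoglm-phone-9b",
    "打开微信,对文件传输助手发送消息:部署成功",
)
print(shlex.join(cmd))
```

The list can then be handed to `subprocess.run(cmd, cwd=<path to Open-AutoGLM>, check=True)`, so a non-zero exit status is raised as an error instead of being silently ignored.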
---
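### Troubleshooting
| Symptom | Possible cause | Solution |
|---------|---------|---------|
| `adb devices` shows nothing | USB Debugging is not enabled, or a cable problem | Check Developer Options, replace the cable |
| `adb devices` shows `unauthorized` | The phone has not authorized the computer | Tap "Allow USB debugging" on the phone |
| Apps open but taps do not work | Missing security debugging permission | Enable "USB Debugging (Security Settings)" |
| Chinese input is garbled or missing | ADB Keyboard is not enabled | Enable ADB Keyboard in system settings |
| Screenshot returns a black screen | Sensitive page (payment/banking) | Expected behavior; the system handles it automatically |
| Cannot connect to the model service | Wrong URL, or the service is not running | Check the URL and confirm the service is running |
| `ModuleNotFoundError` | Dependencies are not installed | Run `pip install -r requirements.txt` |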
---
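### Key Deployment Points
1. **Confirm the phone connection first**: before installing any code, make sure `adb devices` can see the device
2. **Do not skip ADB Keyboard**: without it, Chinese input will fail
3. **The model service is an external dependency**: the agent code does not include the model, so a separate model service is required
4. **For permission problems, check the phone settings first**: most problems come from incomplete phone-side configuration
5. **Test with a simple task after deployment**: "Open WeChat and send a message to File Transfer Assistant" is the suggested acceptance test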
---
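### Command Quick Reference
```bash
# Check the ADB connection
adb devices
# Restart the ADB service
adb kill-server && adb start-server
# Install dependencies
pip install -r requirements.txt && pip install -e .
# Run the agent (interactive mode)
python main.py --base-url {MODEL_URL} --model "autoglm-phone-9b"
# Run the agent (single task)
python main.py --base-url {MODEL_URL} --model "autoglm-phone-9b" "your task description"
# List the supported apps
python main.py --list-apps
```
---
**Deployment is complete when the phone can automatically execute the user's natural-language instructions.**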


@@ -1,9 +1,13 @@
# Open-AutoGLM Quick Start
# Open-AutoGLM Quick Start for Coding Agent
<div align="center">
<img src=resources/logo.svg width="20%"/>
</div>
> **This document is written for AI assistants (such as Claude Code) and is used to automate the deployment of Open-AutoGLM.**
>
> If you are a human reader, you can skip this document and follow README.md instead.
>
> Quick deployment guide for the Phone Agent desktop SDK


@@ -488,3 +488,201 @@ If you find our work helpful, please cite the following papers:
year={2025}
}
```
---
## Automated Deployment Guide (For AI Assistants)
> **This section is specifically designed for AI assistants (such as Claude Code) to automate the deployment of Open-AutoGLM.**
>
> If you are a human reader, you can skip this section and follow the documentation above.
---
### Project Overview
Open-AutoGLM is a phone agent framework:
- **Input**: User's natural language instructions (e.g., "Open WhatsApp and send a message to John")
- **Output**: Automatically operates the user's Android phone to complete tasks
- **Mechanism**: Screenshot → Vision model understands interface → Outputs tap coordinates → ADB executes actions → Loop
The architecture consists of two parts:
1. **Agent Code** (this repository): Runs on the user's computer, responsible for calling models, parsing actions, and controlling the phone
2. **Vision Model Service**: Can be a remote API or deployed locally
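The mechanism above is a perceive, decide, act loop. The sketch below shows that control flow with injected callables so it runs without a phone or a model. The `Action` type, the `"tap"` and `"finish"` action names, and `run_task` are hypothetical and are not this repository's API. On a real setup, `capture` could wrap `adb exec-out screencap -p` and `execute` could wrap `adb shell input tap X Y`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    kind: str                 # hypothetical action names: "tap" or "finish"
    x: Optional[int] = None   # tap coordinates in screen pixels
    y: Optional[int] = None


def run_task(
    instruction: str,
    capture: Callable[[], bytes],
    decide: Callable[[str, bytes], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> int:
    """Loop screenshot -> model -> action until the model signals completion."""
    for step in range(1, max_steps + 1):
        screenshot = capture()                    # 1. take a screenshot
        action = decide(instruction, screenshot)  # 2. model picks the next action
        if action.kind == "finish":
            return step                           # number of model calls used
        execute(action)                           # 3. perform the action on the phone
    raise RuntimeError(f"task not finished after {max_steps} steps")


# Offline demonstration: a scripted "model" taps twice and then finishes.
script = iter([Action("tap", 100, 200), Action("tap", 300, 400), Action("finish")])
performed = []
steps = run_task(
    "demo task",
    capture=lambda: b"",
    decide=lambda instruction, shot: next(script),
    execute=performed.append,
)
print(steps, len(performed))
```

The `max_steps` cap is a design choice in this sketch: it keeps a confused model from driving the phone indefinitely.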
---
### Pre-Deployment Checklist
Before starting deployment, confirm the following items with the user:
#### Hardware Requirements
- [ ] User has an Android phone (Android 7.0+)
- [ ] User has a USB cable that supports data transfer (not just charging)
- [ ] Phone and computer can be connected via USB cable
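Once the phone is connected, the Android 7.0+ requirement can be checked programmatically instead of asking the user. This is a minimal sketch that assumes the API level is read with `adb shell getprop ro.build.version.sdk` (Android 7.0 corresponds to API level 24). The function name is hypothetical.

```python
MIN_SDK = 24  # Android 7.0 (Nougat) is API level 24


def meets_min_android(sdk_output: str, min_sdk: int = MIN_SDK) -> bool:
    """Interpret the output of `adb shell getprop ro.build.version.sdk`."""
    try:
        return int(sdk_output.strip()) >= min_sdk
    except ValueError:
        # Empty or non-numeric output, e.g. when no device is attached.
        return False


print(meets_min_android("33\n"))
```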
#### Phone Configuration
- [ ] Phone has Developer Mode enabled (Settings → About Phone → Tap Build Number 7 times)
- [ ] Phone has USB Debugging enabled (Settings → Developer Options → USB Debugging)
- [ ] Some models require enabling "USB Debugging (Security Settings)" as well
- [ ] ADB Keyboard app is installed (Download: https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk)
- [ ] ADB Keyboard is enabled in system settings (Settings → Language & Input → Enable ADB Keyboard)
#### Model Service Confirmation (Choose One)
**Ask the user explicitly: Do you already have access to an AutoGLM model service?**
- **Option A: Use an already-deployed model service (Recommended)**
  - User provides the model service URL (e.g., `http://xxx.xxx.xxx.xxx:8000/v1`)
  - No local GPU required, no model download needed
  - Use this URL directly as the `--base-url` parameter
- **Option B: Deploy model locally (High system requirements)**
  - Requires NVIDIA GPU (24GB+ VRAM recommended)
  - Requires installation of vLLM or SGLang
  - Requires downloading approximately 20GB of model files
- **If the user is a beginner or unsure, strongly recommend Option A**
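The choice between the two options can be written down as a small rule. The sketch below is only one reading of the guidance above: it treats an existing service URL as decisive and falls back to Option A whenever the 24GB VRAM recommendation is not met. The function name and return values are illustrative.

```python
from typing import Optional

MIN_VRAM_GB = 24  # from "24GB+ VRAM recommended" above


def recommend_option(service_url: Optional[str], vram_gb: Optional[float]) -> str:
    """Return "A" (use a deployed service) or "B" (deploy the model locally)."""
    if service_url:
        return "A"  # a working service already exists, so no GPU is needed
    if vram_gb is not None and vram_gb >= MIN_VRAM_GB:
        return "B"  # local deployment is feasible on this machine
    # No URL and no suitable GPU: Option A is still the advice,
    # but the user first has to obtain a model service URL.
    return "A"


print(recommend_option(None, 48))
```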
---
### Deployment Process
#### Phase 1: Environment Setup
```bash
# 1. Install ADB tools
# macOS:
brew install android-platform-tools
# Or download manually: https://developer.android.com/tools/releases/platform-tools
# Windows: Download, extract, and add to PATH environment variable
# 2. Verify ADB installation
adb version
# Should output version information
# 3. Connect phone and verify
# Connect phone via USB cable, tap "Allow USB debugging" on phone
adb devices
# Should output device list, e.g.:
# List of devices attached
# XXXXXXXX device
```
**If `adb devices` shows an empty list or `unauthorized`:**
1. Check if authorization popup appeared on phone, tap "Allow"
2. Check if USB debugging is enabled
3. Try a different cable or USB port
4. Run `adb kill-server && adb start-server` and retry
#### Phase 2: Install Agent
```bash
# 1. Clone repository (if not already cloned)
git clone https://github.com/zai-org/Open-AutoGLM.git
cd Open-AutoGLM
# 2. Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
pip install -e .
```
**Note: No need to clone model repository; models are called via API.**
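If `pip install` ran outside the virtual environment, a `ModuleNotFoundError` can appear later even though the install seemed to succeed. One way to confirm that the environment from step 2 is active is the standard `sys.prefix` comparison, sketched here. The helper name is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```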
#### Phase 3: Configure Model Service
**If user chooses Option A (using already-deployed model):**
- Use the URL provided by the user directly
- Skip local model deployment steps
**If user chooses Option B (deploy model locally):**
```bash
# 1. Install vLLM
pip install vllm
# 2. Start model service (will auto-download model, ~20GB)
python3 -m vllm.entrypoints.openai.api_server \
    --served-model-name autoglm-phone-9b-multilingual \
    --allowed-local-media-path / \
    --mm-encoder-tp-mode data \
    --mm_processor_cache_type shm \
    --mm_processor_kwargs "{\"max_pixels\":5000000}" \
    --max-model-len 25480 \
    --chat-template-content-format string \
    --limit-mm-per-prompt "{\"image\":10}" \
    --model zai-org/AutoGLM-Phone-9B-Multilingual \
    --port 8000
# Model service URL: http://localhost:8000/v1
```
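Before moving on to verification, it can help to confirm that the service answers and that the served model name matches the `--model` value. The sketch below assumes the service is OpenAI-compatible and therefore answers `GET {base_url}/models` with a JSON object whose `data` list holds the model ids, as vLLM's server does. The helper names are hypothetical, and the network call is defined but not executed here.

```python
import json
import urllib.request
from typing import List


def models_endpoint(base_url: str) -> str:
    """Build the model-listing URL from a base URL such as http://localhost:8000/v1."""
    return base_url.rstrip("/") + "/models"


def extract_model_ids(payload: dict) -> List[str]:
    """Pull the model ids out of an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(base_url: str, timeout: float = 5.0) -> List[str]:
    """Query the running service; raises URLError if it cannot be reached."""
    with urllib.request.urlopen(models_endpoint(base_url), timeout=timeout) as resp:
        return extract_model_ids(json.load(resp))


# Offline demonstration with a response shaped like the one described above.
sample = {"object": "list", "data": [{"id": "autoglm-phone-9b-multilingual"}]}
print(models_endpoint("http://localhost:8000/v1"), extract_model_ids(sample))
```

If `list_served_models(...)` raises or does not include the expected name, the "Cannot connect to model service" row of the troubleshooting table is the place to start.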
#### Phase 4: Verify Deployment
```bash
# Execute in the Open-AutoGLM directory
# Replace {MODEL_URL} with the actual model service address
python main.py --base-url {MODEL_URL} --model "autoglm-phone-9b-multilingual" "Open Gmail and send an email to File Transfer Assistant: Deployment successful"
```
**Expected Result:**
- Phone automatically opens Gmail
- Automatically searches for recipient
- Automatically sends the message "Deployment successful"
---
### Troubleshooting
| Error Symptom | Possible Cause | Solution |
|---------------|----------------|----------|
| `adb devices` shows nothing | USB debugging not enabled or cable issue | Check developer options, replace cable |
| `adb devices` shows unauthorized | Phone not authorized | Tap "Allow USB debugging" on phone |
| Can open apps but cannot tap | Missing security debugging permission | Enable "USB Debugging (Security Settings)" |
| Chinese/text input corrupted or missing | ADB Keyboard not enabled | Enable ADB Keyboard in system settings |
| Screenshot returns black screen | Sensitive page (payment/banking) | Normal behavior, system will handle automatically |
| Cannot connect to model service | Wrong URL or service not running | Check URL, confirm service is running |
| `ModuleNotFoundError` | Dependencies not installed | Run `pip install -r requirements.txt` |
---
### Key Deployment Points
1. **Confirm the phone connection first**: Before installing any code, make sure `adb devices` can see the device
2. **Don't skip ADB Keyboard**: Without it, text input will fail
3. **The model service is an external dependency**: The agent code doesn't include the model; a separate model service is required
4. **For permission issues, check the phone settings first**: Most problems come from incomplete phone-side configuration
5. **Test with a simple task after deployment**: "Open Gmail and send message to File Transfer Assistant" is the recommended acceptance test
---
### Command Quick Reference
```bash
# Check ADB connection
adb devices
# Restart ADB service
adb kill-server && adb start-server
# Install dependencies
pip install -r requirements.txt && pip install -e .
# Run Agent (interactive mode)
python main.py --base-url {MODEL_URL} --model "autoglm-phone-9b-multilingual"
# Run Agent (single task)
python main.py --base-url {MODEL_URL} --model "autoglm-phone-9b-multilingual" "your task description"
# View supported apps list
python main.py --list-apps
```
---
**Deployment success indicator: the phone can automatically execute the user's natural language instructions.**