update
2
LICENSE
@@ -186,7 +186,7 @@
 same "printed page" as the copyright notice for easier
 identification within third-party archives.
 
-Copyright [yyyy] [name of copyright owner]
+Copyright 2025 Zhipu AI
 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
55
README.md
@@ -1,6 +1,6 @@
 # Open-AutoGLM
 
-[Read this in English.](./README_en.md)
+[Readme in English](README_en.md)
 
 <div align="center">
 <img src=resources/logo.svg width="20%"/>
@@ -20,9 +20,12 @@ ADB 调试能力,可通过 WiFi 或网络连接设备,实现灵活的远程
|
||||
|
||||
## 模型下载地址
|
||||
|
||||
| Model | Download Links |
|
||||
|------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| AutoGLM-Phone-9B | [🤗 Hugging Face](https://huggingface.co/zai-org/AutoGLM-Phone-9B)<br>[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/AutoGLM-Phone-9B) |
|
||||
| Model | Download Links |
|
||||
|-------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| AutoGLM-Phone-9B | [🤗 Hugging Face](https://huggingface.co/zai-org/AutoGLM-Phone-9B)<br>[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/AutoGLM-Phone-9B) |
|
||||
| AutoGLM-Phone-9B-Multilingual | [🤗 Hugging Face](https://huggingface.co/zai-org/AutoGLM-Phone-9B-Multilingual)<br>[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/AutoGLM-Phone-9B-Multilingual) |
|
||||
|
||||
其中,`AutoGLM-Phone-9B` 是针对中文手机应用优化的模型,而 `AutoGLM-Phone-9B-Multilingual` 支持英语场景,适用于包含英文等其他语言内容的应用。
|
||||
|
||||
## 环境准备
|
||||
|
||||
@@ -52,6 +55,10 @@ ADB 调试能力,可通过 WiFi 或网络连接设备,实现灵活的远程
|
||||
3. 部分机型在设置开发者选项以后, 可能需要重启设备才能生效. 可以测试一下: 将手机用USB数据线连接到电脑后, `adb devices`
|
||||
查看是否有设备信息, 如果没有说明连接失败.
|
||||
|
||||
**请务必仔细检查相关权限**
|
||||
|
||||

|
||||
|
||||
### 4. 安装 ADB Keyboard(用于文本输入)
|
||||
|
||||
下载 [安装包](https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk) 并在对应的安卓设备中进行安装。
|
||||
@@ -102,7 +109,7 @@ python3 -m vllm.entrypoints.openai.api_server \
|
||||
--port 8000
|
||||
```
|
||||
|
||||
- 该模型结构与 `GLM-4.1V-9B-Thinking` 相同, 关于模型部署的详细内容,你也可以查看 [GLM-V](https://github.com/zai-org/GLM-V)
|
||||
- 该模型结构与 `GLM-4.1V-9B-Thinking` 相同, 关于模型部署的详细内容,你也可以查看 [GLM-V](https://github.com/zai-org/GLM-V)
|
||||
获取模型部署和使用指南。
|
||||
|
||||
- 运行成功后,将可以通过 `http://localhost:8000/v1` 访问模型服务。 如果您在远程服务器部署模型, 使用该服务器的IP访问模型.
|
||||
@@ -120,6 +127,9 @@ python main.py --base-url http://localhost:8000/v1 --model "autoglm-phone-9b"
|
||||
# 指定模型端点
|
||||
python main.py --base-url http://localhost:8000/v1 "打开美团搜索附近的火锅店"
|
||||
|
||||
# 使用英文 system prompt
|
||||
python main.py --lang en --base-url http://localhost:8000/v1 "Open Chrome browser"
|
||||
|
||||
# 列出支持的应用
|
||||
python main.py --list-apps
|
||||
```
|
||||
@@ -232,19 +242,22 @@ conn.disconnect("192.168.1.100:5555")
|
||||
|
||||
### 自定义SYSTEM PROMPT
|
||||
|
||||
直接修改配置文件 `phone_agent/config/prompts.py`
|
||||
系统提供中英文两套 prompt,通过 `--lang` 参数切换:
|
||||
|
||||
1. 可以通过注入system prompt来增强模型在特定领域的能力
|
||||
2. 可以通过注入app名称禁用某些app
|
||||
- `--lang cn` - 中文 prompt(默认),配置文件:`phone_agent/config/prompts_zh.py`
|
||||
- `--lang en` - 英文 prompt,配置文件:`phone_agent/config/prompts_en.py`
|
||||
|
||||
可以直接修改对应的配置文件来增强模型在特定领域的能力,或通过注入 app 名称禁用某些 app。
|
||||
|
||||
### 环境变量
|
||||
|
||||
| 变量 | 描述 | 默认值 |
|
||||
|-------------------------|-----------|----------------------------|
|
||||
| `PHONE_AGENT_BASE_URL` | 模型 API 地址 | `http://localhost:8000/v1` |
|
||||
| `PHONE_AGENT_MODEL` | 模型名称 | `autoglm-phone-9b` |
|
||||
| `PHONE_AGENT_MAX_STEPS` | 每个任务最大步数 | `100` |
|
||||
| `PHONE_AGENT_DEVICE_ID` | ADB 设备 ID | (自动检测) |
|
||||
| 变量 | 描述 | 默认值 |
|
||||
|-------------------------|------------------|----------------------------|
|
||||
| `PHONE_AGENT_BASE_URL` | 模型 API 地址 | `http://localhost:8000/v1` |
|
||||
| `PHONE_AGENT_MODEL` | 模型名称 | `autoglm-phone-9b` |
|
||||
| `PHONE_AGENT_MAX_STEPS` | 每个任务最大步数 | `100` |
|
||||
| `PHONE_AGENT_DEVICE_ID` | ADB 设备 ID | (自动检测) |
|
||||
| `PHONE_AGENT_LANG` | 语言 (`cn` 或 `en`) | `cn` |
|
||||
|
||||
### 模型配置
|
||||
|
||||
@@ -269,6 +282,7 @@ from phone_agent.agent import AgentConfig
|
||||
config = AgentConfig(
|
||||
max_steps=100, # 每个任务最大步数
|
||||
device_id=None, # ADB 设备 ID(None 为自动检测)
|
||||
lang="cn", # 语言选择:cn(中文)或 en(英文)
|
||||
verbose=True, # 打印调试信息(包括思考过程和执行动作)
|
||||
)
|
||||
```
|
||||
@@ -409,7 +423,8 @@ phone_agent/
|
||||
│ └── handler.py # 操作执行器
|
||||
├── config/ # 配置
|
||||
│ ├── apps.py # 支持的应用映射
|
||||
│ └── prompts.py # 系统提示词
|
||||
│ ├── prompts_zh.py # 中文系统提示词
|
||||
│ └── prompts_en.py # 英文系统提示词
|
||||
└── model/ # AI 模型客户端
|
||||
└── client.py # OpenAI 兼容客户端
|
||||
```
|
||||
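@@ -438,6 +453,16 @@ adb devices
 
 This usually means the app is displaying a sensitive page (payment, password, or banking apps). The Agent detects this automatically and requests manual takeover.
 
+### Windows Encoding Errors
+The error message looks like `UnicodeEncodeError gbk code`.
+
+Solution: set the environment variable `PYTHONIOENCODING=utf-8` in front of the command that runs the code.
+
+### Interactive Mode Unavailable in Non-TTY Environments
+The error message looks like `EOF when reading a line`.
+
+Solution: use non-interactive mode and pass the task directly, or switch to a terminal application that provides a TTY.
+
 ### Citation
 
 If you find our work helpful, please cite the following paper: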
208
README_en.md
@@ -11,67 +11,53 @@
|
||||
|
||||
## Project Introduction
|
||||
|
||||
Phone Agent is a mobile intelligent assistant framework built on AutoGLM. It can understand phone screen content in a
|
||||
multimodal way and help users complete tasks through automated operations. The system controls devices through ADB (
|
||||
Android Debug Bridge), uses vision-language models for screen perception, and combines intelligent planning capabilities
|
||||
to generate and execute operation workflows. Users only need to describe their requirements in natural language, such
|
||||
as "Open Xiaohongshu and search for food", and Phone Agent will automatically parse the intent, understand the current
|
||||
interface, plan the next action, and complete the entire workflow. The system also has a built-in sensitive operation
|
||||
confirmation mechanism and supports manual takeover in login or verification code scenarios. Additionally, it provides
|
||||
remote ADB debugging capabilities, allowing device connection via WiFi or network for flexible remote control and
|
||||
development.
|
||||
Phone Agent is a mobile intelligent assistant framework built on AutoGLM. It understands phone screen content in a multimodal manner and helps users complete tasks through automated operations. The system controls devices via ADB (Android Debug Bridge), perceives screens using vision-language models, and generates and executes operation workflows through intelligent planning. Users simply describe their needs in natural language, such as "Open Xiaohongshu and search for food," and Phone Agent will automatically parse the intent, understand the current interface, plan the next action, and complete the entire workflow. The system also includes a sensitive operation confirmation mechanism and supports manual takeover during login or verification code scenarios. Additionally, it provides remote ADB debugging capabilities, allowing device connection via WiFi or network for flexible remote control and development.
|
||||
|
||||
> ⚠️ This project is for research and learning purposes only. It is strictly prohibited to use it for illegal
|
||||
> information gathering, system interference, or any illegal activities. Please carefully review
|
||||
> the [Terms of Use](resources/privacy_policy.txt).
|
||||
> ⚠️ This project is for research and learning purposes only. Using it for illegal information acquisition, system interference, or any other illegal activity is strictly prohibited. Please carefully review the [Terms of Use](resources/privacy_policy_en.txt).
|
||||
|
||||
## Model Download Links
|
||||
|
||||
| Model | Download Links |
|
||||
|------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| AutoGLM-Phone-9B | [🤗 Hugging Face](https://huggingface.co/zai-org/AutoGLM-Phone-9B)<br>[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/AutoGLM-Phone-9B) |
|
||||
| Model | Download Links |
|
||||
|-------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| AutoGLM-Phone-9B | [🤗 Hugging Face](https://huggingface.co/zai-org/AutoGLM-Phone-9B)<br>[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/AutoGLM-Phone-9B) |
|
||||
| AutoGLM-Phone-9B-Multilingual | [🤗 Hugging Face](https://huggingface.co/zai-org/AutoGLM-Phone-9B-Multilingual)<br>[🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/AutoGLM-Phone-9B-Multilingual) |
|
||||
|
||||
Currently, this model only supports Chinese. A multilingual version of the model is coming soon.
|
||||
`AutoGLM-Phone-9B` is optimized for Chinese mobile applications, while `AutoGLM-Phone-9B-Multilingual` supports English scenarios and is suitable for applications containing English or other language content.
|
||||
|
||||
## Environment Setup
|
||||
|
||||
### 1. Python Environment
|
||||
|
||||
Python 3.10 or above is recommended.
|
||||
Python 3.10 or higher is recommended.
|
||||
|
||||
### 2. ADB (Android Debug Bridge)
|
||||
|
||||
1. Download the official ADB [installation package](https://developer.android.com/tools/releases/platform-tools?hl=en),
|
||||
and extract it to a custom path
|
||||
1. Download the official ADB [installation package](https://developer.android.com/tools/releases/platform-tools) and extract it to a custom path
|
||||
2. Configure environment variables
|
||||
|
||||
- MacOS configuration method: In `Terminal` or any command-line tool
|
||||
- macOS configuration: in `Terminal` or any command-line tool
|
||||
|
||||
```bash
|
||||
# Assuming the extracted directory is ~/Downloads/platform-tools. If not, please adjust the command accordingly.
|
||||
# Assuming the extracted directory is ~/Downloads/platform-tools. Adjust the command if different.
|
||||
export PATH=${PATH}:~/Downloads/platform-tools
|
||||
```
|
||||
|
||||
- Windows configuration method: You can refer to
|
||||
this [third-party tutorial](https://blog.csdn.net/x2584179909/article/details/108319973) for configuration.
|
||||
- Windows configuration: Refer to [third-party tutorials](https://blog.csdn.net/x2584179909/article/details/108319973) for configuration.
|
||||
|
||||
### 3. Android 7.0+ Device or Emulator with `Developer Mode` and `USB Debugging` Enabled
|
||||
|
||||
1. Enable Developer Mode: The typical method is to find `Settings > About Phone > Build Number` and tap it quickly about
|
||||
10 times until a popup shows "Developer mode enabled". Different phones may vary slightly; if you can't find it,
|
||||
search online for a tutorial.
|
||||
2. Enable USB Debugging: After enabling Developer Mode, go to `Settings > Developer Options > USB Debugging` and enable
|
||||
it
|
||||
3. Some devices may require a restart after enabling developer options for changes to take effect. You can test it:
|
||||
connect your phone to the computer via USB cable, then run `adb devices` to check if device information appears. If
|
||||
not, the connection has failed.
|
||||
1. Enable Developer Mode: The typical method is to find `Settings > About Phone > Build Number` and tap it rapidly about 10 times until a popup shows "Developer mode has been enabled." This may vary slightly between phones; search online for tutorials if you can't find it.
|
||||
2. Enable USB Debugging: After enabling Developer Mode, go to `Settings > Developer Options > USB Debugging` and enable it
|
||||
3. Some devices may require a restart after setting developer options for them to take effect. You can test by connecting your phone to your computer via USB cable and running `adb devices` to see if device information appears. If not, the connection has failed.
|
||||
|
||||
### 4. Install ADB Keyboard (for text input)
|
||||
**Please carefully check the relevant permissions**
|
||||
|
||||
Download the [installation package](https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk) and install it on
|
||||
the corresponding Android device.
|
||||
Note: After installation, you need to enable `ADB Keyboard` in `Settings > Input Method` or `Settings > Keyboard List`
|
||||
for it to work.
|
||||

|
||||
|
||||
### 4. Install ADB Keyboard (for Text Input)
|
||||
|
||||
Download the [installation package](https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk) and install it on the corresponding Android device.
|
||||
Note: After installation, you need to enable `ADB Keyboard` in `Settings > Input Method` or `Settings > Keyboard List` for it to work.
|
||||
|
||||
## Deployment Preparation
|
||||
|
||||
@@ -84,9 +70,9 @@ pip install -e .
|
||||
|
||||
### 2. Configure ADB
|
||||
|
||||
Make sure **your USB cable supports data transfer**, not just charging.
|
||||
Make sure your **USB cable supports data transfer**, not just charging.
|
||||
|
||||
Ensure ADB is installed and connect the device with a **USB cable**:
|
||||
Ensure ADB is installed and connect the device via **USB cable**:
|
||||
|
||||
```bash
|
||||
# Check connected devices
|
||||
@@ -97,12 +83,10 @@ adb devices
|
||||
# emulator-5554 device
|
||||
```
|
||||
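As a hedged illustration of the check above, the sketch below turns the text printed by `adb devices` into a mapping from device serial to state. The function name `parse_adb_devices` is an assumption made for this example and is not part of Phone Agent; it only relies on the usual `serial<whitespace>state` line layout shown in the sample output.

```python
def parse_adb_devices(output: str) -> dict[str, str]:
    """Map each device serial to its reported state (hypothetical helper)."""
    devices: dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the "List of devices attached" banner,
        # and "* daemon ..." status messages printed by the adb client.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices
```

A device whose state is anything other than `device` (for example `unauthorized` or `offline`) is visible to ADB but not yet ready for use.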
|
||||
### 3. Start the Model Service
|
||||
### 3. Start Model Service
|
||||
|
||||
1. Download the model and install the inference engine framework according to the `For Model Deployment` section in
|
||||
`requirements.txt`.
|
||||
2. Start via SGlang / vLLM to get an OpenAI-compatible service. Here's a vLLM deployment solution, please strictly
|
||||
follow our provided startup parameters:
|
||||
1. Download the model and install the inference engine framework according to the `For Model Deployment` section in `requirements.txt`.
|
||||
2. Start via SGlang / vLLM to get an OpenAI-format service. Here's a vLLM deployment solution; please strictly follow the startup parameters we provide:
|
||||
|
||||
- vLLM:
|
||||
|
||||
@@ -120,11 +104,9 @@ python3 -m vllm.entrypoints.openai.api_server \
|
||||
--port 8000
|
||||
```
|
||||
|
||||
- This model has the same architecture as `GLM-4.1V-9B-Thinking`. For detailed information about model deployment, you
|
||||
can also check [GLM-V](https://github.com/zai-org/GLM-V) for model deployment and usage guides.
|
||||
- This model has the same architecture as `GLM-4.1V-9B-Thinking`. For details on deployment, see the model deployment and usage guides in [GLM-V](https://github.com/zai-org/GLM-V).
|
||||
|
||||
- After successful startup, you can access the model service via `http://localhost:8000/v1`. If you deploy the model on
|
||||
a remote server, use that server's IP to access the model.
|
||||
- After successful startup, the model service will be accessible at `http://localhost:8000/v1`. If you deploy the model on a remote server, access it using that server's IP address.
|
||||
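One way to confirm the service is up is to query its model list. The following is a minimal sketch, assuming the server exposes the standard OpenAI-style `GET /models` route under the base URL; `models_url` and `list_model_ids` are hypothetical helper names, not part of this project.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-list URL for an OpenAI-compatible base URL."""
    return base_url.rstrip("/") + "/models"


def list_model_ids(base_url: str, timeout: float = 5.0) -> list[str]:
    """Return the model IDs reported by the service.

    Raises OSError (urllib.error.URLError is a subclass) if the
    service cannot be reached.
    """
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    return [item["id"] for item in payload.get("data", [])]
```

For a remote deployment, replace `localhost` in the base URL with the server's IP address, for example `list_model_ids("http://<server-ip>:8000/v1")`.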
|
||||
## Using AutoGLM
|
||||
|
||||
@@ -137,7 +119,10 @@ Set the `--base-url` and `--model` parameters according to your deployed model.
|
||||
python main.py --base-url http://localhost:8000/v1 --model "autoglm-phone-9b"
|
||||
|
||||
# Specify model endpoint
|
||||
python main.py --base-url http://localhost:8000/v1 "Find the top-rated cinema nearby and navigate me to there by foot"
|
||||
python main.py --base-url http://localhost:8000/v1 "Open Meituan and search for nearby hotpot restaurants"
|
||||
|
||||
# Use English system prompt
|
||||
python main.py --lang en --base-url http://localhost:8000/v1 "Open Chrome browser"
|
||||
|
||||
# List supported apps
|
||||
python main.py --list-apps
|
||||
@@ -159,27 +144,26 @@ model_config = ModelConfig(
|
||||
agent = PhoneAgent(model_config=model_config)
|
||||
|
||||
# Execute task
|
||||
result = agent.run("Open Taobao and search for wireless earphones")
|
||||
result = agent.run("Open Taobao and search for wireless earbuds")
|
||||
print(result)
|
||||
```
|
||||
|
||||
## Remote Debugging
|
||||
|
||||
Phone Agent supports remote ADB debugging via WiFi/network, allowing device control without USB connection.
|
||||
Phone Agent supports remote ADB debugging via WiFi/network, allowing device control without a USB connection.
|
||||
|
||||
### Configure Remote Debugging
|
||||
|
||||
#### Enable Wireless Debugging on Phone
|
||||
|
||||
Make sure the phone and computer are on the same WiFi network, as shown below:
|
||||
Ensure the phone and computer are on the same WiFi network, as shown below:
|
||||
|
||||

|
||||
|
||||
#### Use Standard ADB Commands on Computer
|
||||
|
||||
```bash
|
||||
|
||||
# Connect via WiFi, change to the IP address and port shown on your phone
|
||||
# Connect via WiFi, replace with the IP address and port shown on your phone
|
||||
adb connect 192.168.1.100:5555
|
||||
|
||||
# Verify connection
|
||||
@@ -200,7 +184,7 @@ adb connect 192.168.1.100:5555
|
||||
adb disconnect 192.168.1.100:5555
|
||||
|
||||
# Execute task on specific device
|
||||
python main.py --device-id 192.168.1.100:5555 --base-url http://localhost:8000/v1 --model "autoglm-phone-9b" "Open Douyin and browse videos"
|
||||
python main.py --device-id 192.168.1.100:5555 --base-url http://localhost:8000/v1 --model "autoglm-phone-9b" "Open TikTok and browse videos"
|
||||
```
|
||||
|
||||
### Python API Remote Connection
|
||||
@@ -231,18 +215,18 @@ conn.disconnect("192.168.1.100:5555")
|
||||
|
||||
### Remote Connection Troubleshooting
|
||||
|
||||
**Connection refused:**
|
||||
**Connection Refused:**
|
||||
|
||||
- Ensure device and computer are on the same network
|
||||
- Check if firewall is blocking port 5555
|
||||
- Ensure the device and computer are on the same network
|
||||
- Check if the firewall is blocking port 5555
|
||||
- Confirm TCP/IP mode is enabled: `adb tcpip 5555`
|
||||
|
||||
**Connection dropped:**
|
||||
**Connection Dropped:**
|
||||
|
||||
- WiFi may have disconnected, use `--connect` to reconnect
|
||||
- Some devices disable TCP/IP after restart, requiring USB to re-enable
|
||||
- WiFi may have disconnected; use `--connect` to reconnect
|
||||
- Some devices disable TCP/IP after restart; re-enable via USB
|
||||
|
||||
**Multiple devices:**
|
||||
**Multiple Devices:**
|
||||
|
||||
- Use `--device-id` to specify which device to use
|
||||
- Or use `--list-devices` to view all connected devices
|
||||
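For the connection-refused case, a quick reachability probe can narrow things down before touching ADB. This is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper, and port 5555 is simply the example value used in this README.

```python
import socket


def port_is_open(host: str, port: int = 5555, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A successful connect only shows that something is listening on the port. It does not prove the listener is the ADB daemon, so `adb connect` remains the real test.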
@@ -251,19 +235,22 @@ conn.disconnect("192.168.1.100:5555")
|
||||
|
||||
### Custom SYSTEM PROMPT
|
||||
|
||||
Directly modify the configuration file `phone_agent/config/prompts.py`
|
||||
The system provides both Chinese and English prompts, switchable via the `--lang` parameter:
|
||||
|
||||
1. You can inject system prompts to enhance the model's capabilities in specific domains
|
||||
2. You can inject app names to disable certain apps
|
||||
- `--lang cn` - Chinese prompt (default), config file: `phone_agent/config/prompts_zh.py`
|
||||
- `--lang en` - English prompt, config file: `phone_agent/config/prompts_en.py`
|
||||
|
||||
You can directly modify the corresponding config files to enhance model capabilities in specific domains or disable certain apps by injecting app names.
|
||||
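The mapping between `--lang` values and prompt files can be sketched as a small lookup. This is an illustration only: `PROMPT_FILES` and `prompt_file_for` are hypothetical names, and the project's actual loading logic may differ. The two paths are the ones listed above.

```python
from pathlib import PurePosixPath

# Language code -> prompt config file, as listed in this README.
PROMPT_FILES = {
    "cn": PurePosixPath("phone_agent/config/prompts_zh.py"),
    "en": PurePosixPath("phone_agent/config/prompts_en.py"),
}


def prompt_file_for(lang: str = "cn") -> PurePosixPath:
    """Return the prompt file to edit for a given language code."""
    if lang not in PROMPT_FILES:
        raise ValueError(
            f"unsupported lang {lang!r}; expected one of {sorted(PROMPT_FILES)}"
        )
    return PROMPT_FILES[lang]
```

Note that the language code is `cn` while the Chinese file name uses the `zh` suffix, which is easy to mix up when editing.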
|
||||
### Environment Variables
|
||||
|
||||
| Variable | Description | Default |
|
||||
|-------------------------|--------------------|----------------------------|
|
||||
| `PHONE_AGENT_BASE_URL` | Model API address | `http://localhost:8000/v1` |
|
||||
| `PHONE_AGENT_MODEL` | Model name | `autoglm-phone-9b` |
|
||||
| `PHONE_AGENT_MAX_STEPS` | Max steps per task | `100` |
|
||||
| `PHONE_AGENT_DEVICE_ID` | ADB device ID | (auto-detect) |
|
||||
| Variable | Description | Default Value |
|
||||
|---------------------------|---------------------------|------------------------------|
|
||||
| `PHONE_AGENT_BASE_URL` | Model API URL | `http://localhost:8000/v1` |
|
||||
| `PHONE_AGENT_MODEL` | Model name | `autoglm-phone-9b` |
|
||||
| `PHONE_AGENT_MAX_STEPS` | Maximum steps per task | `100` |
|
||||
| `PHONE_AGENT_DEVICE_ID` | ADB device ID | (auto-detect) |
|
||||
| `PHONE_AGENT_LANG` | Language (`cn` or `en`) | `cn` |
|
||||
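The table above can be read as a set of environment lookups with fallbacks. The sketch below shows one way to resolve them, assuming each variable is a plain string and an unset or empty device ID means auto-detect; the `Settings` class and `load_settings` function are hypothetical and not the project's own code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    base_url: str
    model: str
    max_steps: int
    device_id: Optional[str]  # None means auto-detect
    lang: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve settings from the environment, using the documented defaults."""
    return Settings(
        base_url=env.get("PHONE_AGENT_BASE_URL", "http://localhost:8000/v1"),
        model=env.get("PHONE_AGENT_MODEL", "autoglm-phone-9b"),
        max_steps=int(env.get("PHONE_AGENT_MAX_STEPS", "100")),
        device_id=env.get("PHONE_AGENT_DEVICE_ID") or None,
        lang=env.get("PHONE_AGENT_LANG", "cn"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the lookup easy to exercise without changing the real environment.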
|
||||
### Model Configuration
|
||||
|
||||
@@ -286,8 +273,9 @@ config = ModelConfig(
|
||||
from phone_agent.agent import AgentConfig
|
||||
|
||||
config = AgentConfig(
|
||||
max_steps=100, # Max steps per task
|
||||
max_steps=100, # Maximum steps per task
|
||||
device_id=None, # ADB device ID (None for auto-detect)
|
||||
lang="cn", # Language: cn (Chinese) or en (English)
|
||||
verbose=True, # Print debug info (including thinking process and actions)
|
||||
)
|
||||
```
|
||||
@@ -300,7 +288,7 @@ When `verbose=True`, the Agent outputs detailed information at each step:
|
||||
==================================================
|
||||
💭 Thinking Process:
|
||||
--------------------------------------------------
|
||||
Currently on the system home screen, need to launch the Xiaohongshu app first
|
||||
Currently on the system desktop, need to launch Xiaohongshu app first
|
||||
--------------------------------------------------
|
||||
🎯 Executing Action:
|
||||
{
|
||||
@@ -310,12 +298,12 @@ Currently on the system home screen, need to launch the Xiaohongshu app first
|
||||
}
|
||||
==================================================
|
||||
|
||||
... (continue to next step after executing action)
|
||||
... (continues to next step after executing action)
|
||||
|
||||
==================================================
|
||||
💭 Thinking Process:
|
||||
--------------------------------------------------
|
||||
Xiaohongshu is open, now need to tap the search box
|
||||
Xiaohongshu is now open, need to tap the search box
|
||||
--------------------------------------------------
|
||||
🎯 Executing Action:
|
||||
{
|
||||
@@ -334,18 +322,18 @@ This allows you to clearly see the AI's reasoning process and specific operation
|
||||
|
||||
## Supported Apps
|
||||
|
||||
Phone Agent supports 50+ mainstream Chinese apps:
|
||||
Phone Agent supports 50+ mainstream Chinese applications:
|
||||
|
||||
| Category | Apps |
|
||||
|-----------------------|-----------------------------------|
|
||||
| Social & Chat | WeChat, QQ, Weibo |
|
||||
| E-commerce | Taobao, JD.com, Pinduoduo |
|
||||
| Food & Delivery | Meituan, Ele.me, KFC |
|
||||
| Travel | Ctrip, 12306, Didi |
|
||||
| Video & Entertainment | Bilibili, Douyin, iQiyi |
|
||||
| Music & Audio | NetEase Music, QQ Music, Ximalaya |
|
||||
| Life Services | Dianping, Amap, Baidu Maps |
|
||||
| Content Communities | Xiaohongshu, Zhihu, Douban |
|
||||
| Category | Apps |
|
||||
|-------------------|-----------------------------------------|
|
||||
| Social & Messaging| WeChat, QQ, Weibo |
|
||||
| E-commerce | Taobao, JD.com, Pinduoduo |
|
||||
| Food & Delivery | Meituan, Ele.me, KFC |
|
||||
| Travel | Ctrip, 12306, Didi |
|
||||
| Video & Entertainment | Bilibili, TikTok, iQiyi |
|
||||
| Music & Audio | NetEase Music, QQ Music, Ximalaya |
|
||||
| Life Services | Dianping, Amap, Baidu Maps |
|
||||
| Content Communities| Xiaohongshu, Zhihu, Douban |
|
||||
|
||||
Run `python main.py --list-apps` to see the complete list.
|
||||
|
||||
@@ -353,18 +341,18 @@ Run `python main.py --list-apps` to see the complete list.
|
||||
|
||||
The Agent can perform the following actions:
|
||||
|
||||
| Action | Description |
|
||||
|--------------|-----------------------------------------|
|
||||
| `Launch` | Launch an app |
|
||||
| `Tap` | Tap at specified coordinates |
|
||||
| `Type` | Input text |
|
||||
| `Swipe` | Swipe the screen |
|
||||
| `Back` | Go back to previous page |
|
||||
| `Home` | Return to home screen |
|
||||
| `Long Press` | Long press |
|
||||
| `Double Tap` | Double tap |
|
||||
| `Wait` | Wait for page to load |
|
||||
| `Take_over` | Request manual takeover (login/captcha) |
|
||||
| Action | Description |
|
||||
|----------------|------------------------------------------|
|
||||
| `Launch` | Launch an app |
|
||||
| `Tap` | Tap at specified coordinates |
|
||||
| `Type` | Input text |
|
||||
| `Swipe` | Swipe the screen |
|
||||
| `Back` | Go back to previous page |
|
||||
| `Home` | Return to home screen |
|
||||
| `Long Press` | Long press |
|
||||
| `Double Tap` | Double tap |
|
||||
| `Wait` | Wait for page to load |
|
||||
| `Take_over` | Request manual takeover (login/captcha) |
|
||||
|
||||
## Custom Callbacks
|
||||
|
||||
@@ -379,7 +367,7 @@ def my_confirmation(message: str) -> bool:
|
||||
def my_takeover(message: str) -> None:
|
||||
"""Manual takeover callback"""
|
||||
print(f"Please complete manually: {message}")
|
||||
input("Press Enter to continue after completion...")
|
||||
input("Press Enter after completion...")
|
||||
|
||||
|
||||
agent = PhoneAgent(
|
||||
@@ -399,7 +387,7 @@ Check the `examples/` directory for more usage examples:
|
||||
|
||||
## Development
|
||||
|
||||
### Configure Development Environment
|
||||
### Set Up Development Environment
|
||||
|
||||
Development requires dev dependencies:
|
||||
|
||||
@@ -428,14 +416,15 @@ phone_agent/
|
||||
│ └── handler.py # Action executor
|
||||
├── config/ # Configuration
|
||||
│ ├── apps.py # Supported app mappings
|
||||
│ └── prompts.py # System prompts
|
||||
│ ├── prompts_zh.py # Chinese system prompts
|
||||
│ └── prompts_en.py # English system prompts
|
||||
└── model/ # AI model client
|
||||
└── client.py # OpenAI-compatible client
|
||||
```
|
||||
|
||||
## FAQ
|
||||
|
||||
We've listed some common issues and their solutions:
|
||||
Here are some common issues and their solutions:
|
||||
|
||||
### Device Not Found
|
||||
|
||||
@@ -455,12 +444,21 @@ adb devices
|
||||
|
||||
### Screenshot Failed (Black Screen)
|
||||
|
||||
This usually means the app is displaying a sensitive page (payment, password, banking apps). The Agent will
|
||||
automatically detect this and request manual takeover.
|
||||
This usually means the app is displaying a sensitive page (payment, password, banking apps). The Agent will automatically detect this and request manual takeover.
|
||||
|
||||
### Windows Encoding Issues
|
||||
The error message looks like `UnicodeEncodeError gbk code`.
|
||||
|
||||
Solution: Add the environment variable before running the code: `PYTHONIOENCODING=utf-8`
|
||||
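When the script is launched from another Python process rather than typed into a shell, the same variable can be set on the child's environment. This is a minimal sketch under that assumption; `run_with_utf8` is a hypothetical wrapper, not something the project provides.

```python
import os
import subprocess


def run_with_utf8(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run cmd with PYTHONIOENCODING=utf-8 added to the inherited environment."""
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    # Output is captured as bytes so the caller can decode it as UTF-8.
    return subprocess.run(cmd, env=env, capture_output=True, check=False)
```

With the variable set, a Python child process encodes its standard streams as UTF-8 instead of the console's default code page, so the captured output can be decoded with `.decode("utf-8")`.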
|
||||
### Interactive Mode Not Working in Non-TTY Environment
|
||||
The error message looks like `EOF when reading a line`.
|
||||
|
||||
Solution: Use non-interactive mode to specify tasks directly, or switch to a TTY-mode terminal application.
|
||||
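The distinction between the two modes can be made explicit with a TTY check before any call to `input()`. The sketch below is an assumption about how such a guard might look, not the project's actual behavior; `choose_mode` is a hypothetical name.

```python
import sys
from typing import Optional, TextIO


def choose_mode(task: Optional[str], stdin: TextIO = sys.stdin) -> str:
    """Pick "task" when a task is given, "interactive" only on a real TTY."""
    if task:
        return "task"
    if stdin.isatty():
        return "interactive"
    # Without a TTY, input() would fail with "EOF when reading a line".
    raise SystemExit(
        "No TTY detected; pass the task as a command-line argument instead."
    )
```

Failing early with a clear message is a design choice here: it replaces the bare `EOFError` with a hint that points at the non-interactive form of the command.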
|
||||
### Citation
|
||||
|
||||
If you find our work helpful, please cite the following paper:
|
||||
If you find our work helpful, please cite the following papers:
|
||||
|
||||
```bibtex
|
||||
@article{liu2024autoglm,
|
||||
@@ -475,4 +473,4 @@ If you find our work helpful, please cite the following paper:
|
||||
journal={arXiv preprint arXiv:2509.18119},
|
||||
year={2025}
|
||||
}
|
||||
```
|
||||
```
|
||||
|
||||
@@ -1,87 +1,101 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Phone Agent 使用示例
|
||||
Phone Agent Usage Examples / Phone Agent 使用示例
|
||||
|
||||
Demonstrates how to use Phone Agent for phone automation tasks via Python API.
|
||||
演示如何通过 Python API 使用 Phone Agent 进行手机自动化任务。
|
||||
"""
|
||||
|
||||
from phone_agent import PhoneAgent
|
||||
from phone_agent.agent import AgentConfig
|
||||
from phone_agent.config import get_messages
|
||||
from phone_agent.model import ModelConfig
|
||||
|
||||
|
||||
def example_basic_task():
|
||||
"""基础任务示例"""
|
||||
# 配置模型端点
|
||||
def example_basic_task(lang: str = "cn"):
|
||||
"""Basic task example / 基础任务示例"""
|
||||
msgs = get_messages(lang)
|
||||
|
||||
# Configure model endpoint
|
||||
model_config = ModelConfig(
|
||||
base_url="http://localhost:8000/v1",
|
||||
model_name="autoglm-phone-9b",
|
||||
temperature=0.1,
|
||||
)
|
||||
|
||||
# 配置 Agent 行为
|
||||
# Configure Agent behavior
|
||||
agent_config = AgentConfig(
|
||||
max_steps=50,
|
||||
verbose=True,
|
||||
lang=lang,
|
||||
)
|
||||
|
||||
# 创建 Agent
|
||||
# Create Agent
|
||||
agent = PhoneAgent(
|
||||
model_config=model_config,
|
||||
agent_config=agent_config,
|
||||
)
|
||||
|
||||
# 执行任务
|
||||
# Execute task
|
||||
result = agent.run("打开小红书搜索美食攻略")
|
||||
print(f"任务结果: {result}")
|
||||
print(f"{msgs['task_result']}: {result}")
|
||||
|
||||
|
||||
def example_with_callbacks():
|
||||
"""带回调的任务示例"""
|
||||
def example_with_callbacks(lang: str = "cn"):
|
||||
"""Task example with callbacks / 带回调的任务示例"""
|
||||
msgs = get_messages(lang)
|
||||
|
||||
def my_confirmation(message: str) -> bool:
|
||||
"""敏感操作确认回调"""
|
||||
print(f"\n[需要确认] {message}")
|
||||
response = input("是否继续?(y/n): ")
|
||||
"""Sensitive operation confirmation callback / 敏感操作确认回调"""
|
||||
print(f"\n[{msgs['confirmation_required']}] {message}")
|
||||
response = input(f"{msgs['continue_prompt']}: ")
|
||||
return response.lower() in ("yes", "y", "是")
|
||||
|
||||
def my_takeover(message: str) -> None:
|
||||
"""人工接管回调"""
|
||||
print(f"\n[需要人工操作] {message}")
|
||||
print("请手动完成操作...")
|
||||
input("完成后按回车继续: ")
|
||||
"""Manual takeover callback / 人工接管回调"""
|
||||
print(f"\n[{msgs['manual_operation_required']}] {message}")
|
||||
print(msgs["manual_operation_hint"])
|
||||
input(f"{msgs['press_enter_when_done']}: ")
|
||||
|
||||
# 创建带自定义回调的 Agent
|
||||
# Create Agent with custom callbacks
|
||||
agent_config = AgentConfig(lang=lang)
|
||||
agent = PhoneAgent(
|
||||
agent_config=agent_config,
|
||||
confirmation_callback=my_confirmation,
|
||||
takeover_callback=my_takeover,
|
||||
)
|
||||
|
||||
# 执行可能需要确认的任务
|
||||
# Execute task that may require confirmation
|
||||
result = agent.run("打开淘宝搜索无线耳机并加入购物车")
|
||||
print(f"任务结果: {result}")
|
||||
print(f"{msgs['task_result']}: {result}")
|
||||
|
||||
|
||||
def example_step_by_step():
|
||||
"""单步执行示例(用于调试)"""
|
||||
agent = PhoneAgent()
|
||||
def example_step_by_step(lang: str = "cn"):
|
||||
"""Step-by-step execution example (for debugging) / 单步执行示例(用于调试)"""
|
||||
msgs = get_messages(lang)
|
||||
|
||||
# 初始化任务
|
||||
agent_config = AgentConfig(lang=lang)
|
||||
agent = PhoneAgent(agent_config=agent_config)
|
||||
|
||||
# Initialize task
|
||||
result = agent.step("打开美团搜索附近的火锅店")
|
||||
print(f"步骤 1: {result.action}")
|
||||
print(f"{msgs['step']} 1: {result.action}")
|
||||
|
||||
# 如果未完成,继续执行
|
||||
# Continue if not finished
|
||||
while not result.finished and agent.step_count < 10:
|
||||
result = agent.step()
|
||||
print(f"步骤 {agent.step_count}: {result.action}")
|
||||
print(f" 思考过程: {result.thinking[:100]}...")
|
||||
print(f"{msgs['step']} {agent.step_count}: {result.action}")
|
||||
print(f" {msgs['thinking']}: {result.thinking[:100]}...")
|
||||
|
||||
print(f"\n最终结果: {result.message}")
|
||||
print(f"\n{msgs['final_result']}: {result.message}")
|
||||
|
||||
|
||||
def example_multiple_tasks():
|
||||
"""批量任务示例"""
|
||||
agent = PhoneAgent()
|
||||
def example_multiple_tasks(lang: str = "cn"):
|
||||
"""Batch task example / 批量任务示例"""
|
||||
msgs = get_messages(lang)
|
||||
|
||||
agent_config = AgentConfig(lang=lang)
|
||||
agent = PhoneAgent(agent_config=agent_config)
|
||||
|
||||
tasks = [
|
||||
"打开高德地图查看实时路况",
|
||||
@@ -91,69 +105,86 @@ def example_multiple_tasks():
|
||||
|
||||
for task in tasks:
|
||||
print(f"\n{'=' * 50}")
|
||||
print(f"任务: {task}")
|
||||
print(f"{msgs['task']}: {task}")
|
||||
print("=" * 50)
|
||||
|
||||
result = agent.run(task)
|
||||
print(f"结果: {result}")
|
||||
print(f"{msgs['result']}: {result}")
|
||||
|
||||
# 重置 Agent 状态
|
||||
# Reset Agent state
|
||||
agent.reset()
|
||||
|
||||
|
||||
def example_remote_device():
|
||||
"""远程设备示例"""
|
||||
def example_remote_device(lang: str = "cn"):
|
||||
"""Remote device example / 远程设备示例"""
|
||||
from phone_agent.adb import ADBConnection
|
||||
|
||||
# 创建连接管理器
|
||||
msgs = get_messages(lang)
|
||||
|
||||
# Create connection manager
|
||||
conn = ADBConnection()
|
||||
|
||||
# 连接远程设备
|
||||
# Connect to remote device
|
||||
success, message = conn.connect("192.168.1.100:5555")
|
||||
if not success:
|
||||
print(f"连接失败: {message}")
|
||||
print(f"{msgs['connection_failed']}: {message}")
|
||||
return
|
||||
|
||||
print(f"连接成功: {message}")
|
||||
print(f"{msgs['connection_successful']}: {message}")
|
||||
|
||||
# 创建 Agent 并指定设备
|
||||
# Create Agent with device specified
|
||||
agent_config = AgentConfig(
|
||||
device_id="192.168.1.100:5555",
|
||||
verbose=True,
|
||||
lang=lang,
|
||||
)
|
||||
|
||||
agent = PhoneAgent(agent_config=agent_config)
|
||||
|
||||
# 执行任务
|
||||
# Execute task
|
||||
result = agent.run("打开微信查看消息")
|
||||
print(f"任务结果: {result}")
|
||||
print(f"{msgs['task_result']}: {result}")
|
||||
|
||||
# 断开连接
|
||||
# Disconnect
|
||||
conn.disconnect("192.168.1.100:5555")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
print("Phone Agent 使用示例")
|
||||
import argparse
|
||||
|
||||
parser = argparse.ArgumentParser(description="Phone Agent Usage Examples")
|
||||
parser.add_argument(
|
||||
"--lang",
|
||||
type=str,
|
||||
default="cn",
|
||||
choices=["cn", "en"],
|
||||
help="Language for UI messages (cn=Chinese, en=English)",
|
||||
)
|
||||
args = parser.parse_args()
|
||||
|
||||
msgs = get_messages(args.lang)
|
||||
|
||||
print("Phone Agent Usage Examples")
|
||||
print("=" * 50)
|
||||
|
||||
# 运行基础示例
|
||||
print("\n1. 基础任务示例")
|
||||
# Run basic example
|
||||
print("\n1. Basic Task Example")
|
||||
print("-" * 30)
|
||||
example_basic_task()
|
||||
example_basic_task(args.lang)
|
||||
|
||||
# 其他示例可以取消注释运行
|
||||
# print("\n2. 带回调的任务示例")
|
||||
# Uncomment to run other examples
|
||||
# print(f"\n2. Task Example with Callbacks")
|
||||
# print("-" * 30)
|
||||
# example_with_callbacks()
|
||||
# example_with_callbacks(args.lang)
|
||||
|
||||
# print("\n3. 单步执行示例")
|
||||
# print(f"\n3. Step-by-step Example")
|
||||
# print("-" * 30)
|
||||
# example_step_by_step()
|
||||
# example_step_by_step(args.lang)
|
||||
|
||||
# print("\n4. 批量任务示例")
|
||||
# print(f"\n4. Batch Task Example")
|
||||
# print("-" * 30)
|
||||
# example_multiple_tasks()
|
||||
# example_multiple_tasks(args.lang)
|
||||
|
||||
# print("\n5. 远程设备示例")
|
||||
# print(f"\n5. Remote Device Example")
|
||||
# print("-" * 30)
|
||||
# example_remote_device()
|
||||
# example_remote_device(args.lang)
|
||||
|
||||
@@ -1,84 +1,64 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
演示 thinking 输出的示例
|
||||
Thinking Output Demo / 演示 thinking 输出的示例
|
||||
|
||||
This script demonstrates how the Agent outputs both thinking process and actions in verbose mode.
|
||||
这个脚本展示了在 verbose 模式下,Agent 会同时输出思考过程和执行动作。
|
||||
"""
|
||||
|
||||
from phone_agent import PhoneAgent
|
||||
from phone_agent.agent import AgentConfig
|
||||
from phone_agent.config import get_messages
|
||||
from phone_agent.model import ModelConfig
|
||||
|
||||
|
||||
def main():
|
||||
def main(lang: str = "cn"):
|
||||
msgs = get_messages(lang)
|
||||
|
||||
print("=" * 60)
|
||||
print("Phone Agent - Thinking 输出演示")
|
||||
print("Phone Agent - Thinking Demo")
|
||||
print("=" * 60)
|
||||
|
||||
# 配置模型
|
||||
# Configure model
|
||||
model_config = ModelConfig(
|
||||
base_url="http://localhost:8000/v1",
|
||||
model_name="autoglm-phone-9b",
|
||||
temperature=0.1,
|
||||
)
|
||||
|
||||
# 配置 Agent (verbose=True 会输出详细信息)
|
||||
# Configure Agent (verbose=True enables detailed output)
|
||||
agent_config = AgentConfig(
|
||||
max_steps=10,
|
||||
verbose=True, # 开启详细输出
|
||||
verbose=True,
|
||||
lang=lang,
|
||||
)
|
||||
|
||||
# 创建 Agent
|
||||
# Create Agent
|
||||
agent = PhoneAgent(
|
||||
model_config=model_config,
|
||||
agent_config=agent_config,
|
||||
)
|
||||
|
||||
# 执行任务
|
||||
print("\n📱 开始执行任务...\n")
|
||||
# Execute task
|
||||
print(f"\n📱 {msgs['starting_task']}...\n")
|
||||
result = agent.run("打开小红书搜索美食攻略")
|
||||
|
||||
print("\n" + "=" * 60)
|
||||
print(f"📊 最终结果: {result}")
|
||||
print(f"📊 {msgs['final_result']}: {result}")
|
||||
print("=" * 60)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
"""
|
||||
运行此脚本,你将看到如下格式的输出:
|
||||
import argparse
|
||||
|
||||
==================================================
|
||||
💭 思考过程:
|
||||
--------------------------------------------------
|
||||
当前在系统桌面,需要先启动小红书应用,然后进行搜索
|
||||
--------------------------------------------------
|
||||
🎯 执行动作:
|
||||
{
|
||||
"_metadata": "do",
|
||||
"action": "Launch",
|
||||
"app": "小红书"
|
||||
}
|
||||
==================================================
|
||||
parser = argparse.ArgumentParser(description="Phone Agent Thinking Demo")
|
||||
parser.add_argument(
|
||||
"--lang",
|
||||
type=str,
|
||||
default="cn",
|
||||
choices=["cn", "en"],
|
||||
help="Language for UI messages (cn=Chinese, en=English)",
|
||||
)
|
||||
args = parser.parse_args()
|
||||
|
||||
(执行后会继续下一步...)
|
||||
|
||||
==================================================
|
||||
💭 思考过程:
|
||||
--------------------------------------------------
|
||||
小红书已打开,现在需要点击搜索框并输入关键词
|
||||
--------------------------------------------------
|
||||
🎯 执行动作:
|
||||
{
|
||||
"_metadata": "do",
|
||||
"action": "Tap",
|
||||
"element": [500, 100]
|
||||
}
|
||||
==================================================
|
||||
|
||||
... (更多步骤)
|
||||
|
||||
🎉 ================================================
|
||||
✅ 任务完成: 已成功搜索美食攻略
|
||||
==================================================
|
||||
"""
|
||||
main()
|
||||
main(lang=args.lang)
|
||||
|
||||
10
main.py
@@ -354,6 +354,14 @@ Examples:
|
||||
"--list-apps", action="store_true", help="List supported apps and exit"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--lang",
|
||||
type=str,
|
||||
choices=["cn", "en"],
|
||||
default=os.getenv("PHONE_AGENT_LANG", "cn"),
|
||||
help="Language for system prompt (cn or en, default: cn)",
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"task",
|
||||
nargs="?",
|
||||
@@ -467,6 +475,7 @@ def main():
|
||||
max_steps=args.max_steps,
|
||||
device_id=args.device_id,
|
||||
verbose=not args.quiet,
|
||||
lang=args.lang,
|
||||
)
|
||||
|
||||
# Create agent
|
||||
@@ -482,6 +491,7 @@ def main():
|
||||
print(f"Model: {model_config.model_name}")
|
||||
print(f"Base URL: {model_config.base_url}")
|
||||
print(f"Max Steps: {agent_config.max_steps}")
|
||||
print(f"Language: {agent_config.lang}")
|
||||
|
||||
# Show device info
|
||||
devices = list_devices()
|
||||
|
||||
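Review note on the `--lang` flag added to `main.py`: the default is read from the `PHONE_AGENT_LANG` environment variable, and an explicit flag overrides it. The sketch below reproduces only that precedence. `build_parser` and `resolve_lang` are hypothetical helper names for illustration, and the environment is passed in as a plain mapping so the behaviour can be checked without touching the real process environment.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose --lang default comes from PHONE_AGENT_LANG."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="--lang precedence sketch")
    parser.add_argument(
        "--lang",
        type=str,
        choices=["cn", "en"],
        # Environment value is used only when the flag is absent.
        default=env.get("PHONE_AGENT_LANG", "cn"),
        help="Language for system prompt (cn or en, default: cn)",
    )
    return parser


def resolve_lang(argv, env=None):
    """Return the effective language for a given argv and environment."""
    return build_parser(env).parse_args(argv).lang


if __name__ == "__main__":
    print(resolve_lang([], {"PHONE_AGENT_LANG": "en"}))
    print(resolve_lang(["--lang", "cn"], {"PHONE_AGENT_LANG": "en"}))
```

One thing that may be worth checking in the real code: as far as I know, argparse applies `choices` to values given on the command line but not to the default, so an unexpected `PHONE_AGENT_LANG` value could pass through unvalidated.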
@@ -3,6 +3,7 @@
|
||||
import base64
|
||||
import os
|
||||
import subprocess
|
||||
import tempfile
|
||||
import uuid
|
||||
from dataclasses import dataclass
|
||||
from io import BytesIO
|
||||
@@ -36,7 +37,7 @@ def get_screenshot(device_id: str | None = None, timeout: int = 10) -> Screensho
|
||||
If the screenshot fails (e.g., on sensitive screens like payment pages),
|
||||
a black fallback image is returned with is_sensitive=True.
|
||||
"""
|
||||
temp_path = f"/tmp/screenshot_{uuid.uuid4()}.png"
|
||||
temp_path = os.path.join(tempfile.gettempdir(), f"screenshot_{uuid.uuid4()}.png")
|
||||
adb_prefix = _get_adb_prefix(device_id)
|
||||
|
||||
try:
|
||||
|
||||
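Review note on the `temp_path` change in `get_screenshot`: replacing the hard-coded `/tmp/` prefix with `tempfile.gettempdir()` lets the platform choose the temporary directory, which matters on Windows where `/tmp` usually does not exist. A minimal sketch of the same path construction follows; `make_screenshot_path` is a hypothetical helper name, not a function from the repository.

```python
import os
import tempfile
import uuid


def make_screenshot_path() -> str:
    """Build a unique PNG path inside the platform's temporary directory."""
    filename = f"screenshot_{uuid.uuid4()}.png"
    return os.path.join(tempfile.gettempdir(), filename)


if __name__ == "__main__":
    # The directory part differs per platform; the file name is unique per call.
    print(make_screenshot_path())
```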
@@ -8,7 +8,7 @@ from typing import Any, Callable
|
||||
from phone_agent.actions import ActionHandler
|
||||
from phone_agent.actions.handler import do, finish, parse_action
|
||||
from phone_agent.adb import get_current_app, get_screenshot
|
||||
from phone_agent.config import SYSTEM_PROMPT
|
||||
from phone_agent.config import get_messages, get_system_prompt
|
||||
from phone_agent.model import ModelClient, ModelConfig
|
||||
from phone_agent.model.client import MessageBuilder
|
||||
|
||||
@@ -19,9 +19,14 @@ class AgentConfig:
|
||||
|
||||
max_steps: int = 100
|
||||
device_id: str | None = None
|
||||
system_prompt: str = SYSTEM_PROMPT
|
||||
lang: str = "cn"
|
||||
system_prompt: str | None = None
|
||||
verbose: bool = True
|
||||
|
||||
def __post_init__(self):
|
||||
if self.system_prompt is None:
|
||||
self.system_prompt = get_system_prompt(self.lang)
|
||||
|
||||
|
||||
@dataclass
|
||||
class StepResult:
|
||||
@@ -185,13 +190,14 @@ class PhoneAgent:
|
||||
action = finish(message=response.action)
|
||||
|
||||
if self.agent_config.verbose:
|
||||
# 打印思考过程
|
||||
# Print thinking process
|
||||
msgs = get_messages(self.agent_config.lang)
|
||||
print("\n" + "=" * 50)
|
||||
print("💭 思考过程:")
|
||||
print(f"💭 {msgs['thinking']}:")
|
||||
print("-" * 50)
|
||||
print(response.thinking)
|
||||
print("-" * 50)
|
||||
print("🎯 执行动作:")
|
||||
print(f"🎯 {msgs['action']}:")
|
||||
print(json.dumps(action, ensure_ascii=False, indent=2))
|
||||
print("=" * 50 + "\n")
|
||||
|
||||
@@ -221,8 +227,11 @@ class PhoneAgent:
|
||||
finished = action.get("_metadata") == "finish" or result.should_finish
|
||||
|
||||
if finished and self.agent_config.verbose:
|
||||
msgs = get_messages(self.agent_config.lang)
|
||||
print("\n" + "🎉 " + "=" * 48)
|
||||
print(f"✅ 任务完成: {result.message or action.get('message', '完成')}")
|
||||
print(
|
||||
f"✅ {msgs['task_completed']}: {result.message or action.get('message', msgs['done'])}"
|
||||
)
|
||||
print("=" * 50 + "\n")
|
||||
|
||||
return StepResult(
|
||||
|
||||
@@ -1,6 +1,35 @@
|
||||
"""Configuration module for Phone Agent."""
|
||||
|
||||
from phone_agent.config.apps import APP_PACKAGES
|
||||
from phone_agent.config.prompts import SYSTEM_PROMPT
|
||||
from phone_agent.config.i18n import get_message, get_messages
|
||||
from phone_agent.config.prompts_en import SYSTEM_PROMPT as SYSTEM_PROMPT_EN
|
||||
from phone_agent.config.prompts_zh import SYSTEM_PROMPT as SYSTEM_PROMPT_ZH
|
||||
|
||||
__all__ = ["APP_PACKAGES", "SYSTEM_PROMPT"]
|
||||
|
||||
def get_system_prompt(lang: str = "cn") -> str:
|
||||
"""
|
||||
Get system prompt by language.
|
||||
|
||||
Args:
|
||||
lang: Language code, 'cn' for Chinese, 'en' for English.
|
||||
|
||||
Returns:
|
||||
System prompt string.
|
||||
"""
|
||||
if lang == "en":
|
||||
return SYSTEM_PROMPT_EN
|
||||
return SYSTEM_PROMPT_ZH
|
||||
|
||||
|
||||
# Default to Chinese for backward compatibility
|
||||
SYSTEM_PROMPT = SYSTEM_PROMPT_ZH
|
||||
|
||||
__all__ = [
|
||||
"APP_PACKAGES",
|
||||
"SYSTEM_PROMPT",
|
||||
"SYSTEM_PROMPT_ZH",
|
||||
"SYSTEM_PROMPT_EN",
|
||||
"get_system_prompt",
|
||||
"get_messages",
|
||||
"get_message",
|
||||
]
|
||||
|
||||
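Review note on prompt selection: taken together, the `AgentConfig` and `phone_agent/config/__init__.py` hunks mean that an explicit `system_prompt` wins, otherwise the prompt is chosen from `lang`, and any value other than `"en"` falls back to the Chinese prompt. The sketch below mirrors that logic with placeholder prompt strings. `AgentConfigSketch` and the placeholder constants are stand-ins for illustration, not the repository's classes or prompts.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder prompt texts (assumptions, not the real prompts).
SYSTEM_PROMPT_ZH = "<zh prompt placeholder>"
SYSTEM_PROMPT_EN = "<en prompt placeholder>"


def get_system_prompt(lang: str = "cn") -> str:
    """Return the English prompt for 'en', the Chinese prompt otherwise."""
    if lang == "en":
        return SYSTEM_PROMPT_EN
    return SYSTEM_PROMPT_ZH


@dataclass
class AgentConfigSketch:
    lang: str = "cn"
    system_prompt: Optional[str] = None

    def __post_init__(self):
        # Resolve the prompt lazily so a caller-supplied prompt is kept.
        if self.system_prompt is None:
            self.system_prompt = get_system_prompt(self.lang)


if __name__ == "__main__":
    print(AgentConfigSketch(lang="en").system_prompt)
    print(AgentConfigSketch(lang="en", system_prompt="custom").system_prompt)
```

Resolving the default in `__post_init__` rather than in the field default is what lets the prompt depend on another field of the same instance.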
@@ -68,6 +68,123 @@ APP_PACKAGES: dict[str, str] = {
|
||||
"星穹铁道": "com.miHoYo.hkrpg",
|
||||
"崩坏:星穹铁道": "com.miHoYo.hkrpg",
|
||||
"恋与深空": "com.papegames.lysk.cn",
|
||||
"AndroidSystemSettings": "com.android.settings",
|
||||
"Android System Settings": "com.android.settings",
|
||||
"Android System Settings": "com.android.settings",
|
||||
"Android-System-Settings": "com.android.settings",
|
||||
"Settings": "com.android.settings",
|
||||
"AudioRecorder": "com.android.soundrecorder",
|
||||
"audiorecorder": "com.android.soundrecorder",
|
||||
"Bluecoins": "com.rammigsoftware.bluecoins",
|
||||
"bluecoins": "com.rammigsoftware.bluecoins",
|
||||
"Broccoli": "com.flauschcode.broccoli",
|
||||
"broccoli": "com.flauschcode.broccoli",
|
||||
"Booking.com": "com.booking",
|
||||
"Booking": "com.booking",
|
||||
"booking.com": "com.booking",
|
||||
"booking": "com.booking",
|
||||
"BOOKING.COM": "com.booking",
|
||||
"Chrome": "com.android.chrome",
|
||||
"chrome": "com.android.chrome",
|
||||
"Google Chrome": "com.android.chrome",
|
||||
"Clock": "com.android.deskclock",
|
||||
"clock": "com.android.deskclock",
|
||||
"Contacts": "com.android.contacts",
|
||||
"contacts": "com.android.contacts",
|
||||
"Duolingo": "com.duolingo",
|
||||
"duolingo": "com.duolingo",
|
||||
"Expedia": "com.expedia.bookings",
|
||||
"expedia": "com.expedia.bookings",
|
||||
"Files": "com.android.fileexplorer",
|
||||
"files": "com.android.fileexplorer",
|
||||
"File Manager": "com.android.fileexplorer",
|
||||
"file manager": "com.android.fileexplorer",
|
||||
"gmail": "com.google.android.gm",
|
||||
"Gmail": "com.google.android.gm",
|
||||
"GoogleMail": "com.google.android.gm",
|
||||
"Google Mail": "com.google.android.gm",
|
||||
"GoogleFiles": "com.google.android.apps.nbu.files",
|
||||
"googlefiles": "com.google.android.apps.nbu.files",
|
||||
"FilesbyGoogle": "com.google.android.apps.nbu.files",
|
||||
"GoogleCalendar": "com.google.android.calendar",
|
||||
"Google-Calendar": "com.google.android.calendar",
|
||||
"Google Calendar": "com.google.android.calendar",
|
||||
"google-calendar": "com.google.android.calendar",
|
||||
"google calendar": "com.google.android.calendar",
|
||||
"GoogleChat": "com.google.android.apps.dynamite",
|
||||
"Google Chat": "com.google.android.apps.dynamite",
|
||||
"Google-Chat": "com.google.android.apps.dynamite",
|
||||
"GoogleClock": "com.google.android.deskclock",
|
||||
"Google Clock": "com.google.android.deskclock",
|
||||
"Google-Clock": "com.google.android.deskclock",
|
||||
"GoogleContacts": "com.google.android.contacts",
|
||||
"Google-Contacts": "com.google.android.contacts",
|
||||
"Google Contacts": "com.google.android.contacts",
|
||||
"google-contacts": "com.google.android.contacts",
|
||||
"google contacts": "com.google.android.contacts",
|
||||
"GoogleDocs": "com.google.android.apps.docs.editors.docs",
|
||||
"Google Docs": "com.google.android.apps.docs.editors.docs",
|
||||
"googledocs": "com.google.android.apps.docs.editors.docs",
|
||||
"google docs": "com.google.android.apps.docs.editors.docs",
|
||||
"Google Drive": "com.google.android.apps.docs",
|
||||
"Google-Drive": "com.google.android.apps.docs",
|
||||
"google drive": "com.google.android.apps.docs",
|
||||
"google-drive": "com.google.android.apps.docs",
|
||||
"GoogleDrive": "com.google.android.apps.docs",
|
||||
"Googledrive": "com.google.android.apps.docs",
|
||||
"googledrive": "com.google.android.apps.docs",
|
||||
"GoogleFit": "com.google.android.apps.fitness",
|
||||
"googlefit": "com.google.android.apps.fitness",
|
||||
"GoogleKeep": "com.google.android.keep",
|
||||
"googlekeep": "com.google.android.keep",
|
||||
"GoogleMaps": "com.google.android.apps.maps",
|
||||
"Google Maps": "com.google.android.apps.maps",
|
||||
"googlemaps": "com.google.android.apps.maps",
|
||||
"google maps": "com.google.android.apps.maps",
|
||||
"Google Play Books": "com.google.android.apps.books",
|
||||
"Google-Play-Books": "com.google.android.apps.books",
|
||||
"google play books": "com.google.android.apps.books",
|
||||
"google-play-books": "com.google.android.apps.books",
|
||||
"GooglePlayBooks": "com.google.android.apps.books",
|
||||
"googleplaybooks": "com.google.android.apps.books",
|
||||
"GooglePlayStore": "com.android.vending",
|
||||
"Google Play Store": "com.android.vending",
|
||||
"Google-Play-Store": "com.android.vending",
|
||||
"GoogleSlides": "com.google.android.apps.docs.editors.slides",
|
||||
"Google Slides": "com.google.android.apps.docs.editors.slides",
|
||||
"Google-Slides": "com.google.android.apps.docs.editors.slides",
|
||||
"GoogleTasks": "com.google.android.apps.tasks",
|
||||
"Google Tasks": "com.google.android.apps.tasks",
|
||||
"Google-Tasks": "com.google.android.apps.tasks",
|
||||
"Joplin": "net.cozic.joplin",
|
||||
"joplin": "net.cozic.joplin",
|
||||
"McDonald": "com.mcdonalds.app",
|
||||
"mcdonald": "com.mcdonalds.app",
|
||||
"Osmand": "net.osmand",
|
||||
"osmand": "net.osmand",
|
||||
"PiMusicPlayer": "com.Project100Pi.themusicplayer",
|
||||
"pimusicplayer": "com.Project100Pi.themusicplayer",
|
||||
"Quora": "com.quora.android",
|
||||
"quora": "com.quora.android",
|
||||
"Reddit": "com.reddit.frontpage",
|
||||
"reddit": "com.reddit.frontpage",
|
||||
"RetroMusic": "code.name.monkey.retromusic",
|
||||
"retromusic": "code.name.monkey.retromusic",
|
||||
"SimpleCalendarPro": "com.scientificcalculatorplus.simplecalculator.basiccalculator.mathcalc",
|
||||
"SimpleSMSMessenger": "com.simplemobiletools.smsmessenger",
|
||||
"Telegram": "org.telegram.messenger",
|
||||
"temu": "com.einnovation.temu",
|
||||
"Temu": "com.einnovation.temu",
|
||||
"Tiktok": "com.zhiliaoapp.musically",
|
||||
"tiktok": "com.zhiliaoapp.musically",
|
||||
"Twitter": "com.twitter.android",
|
||||
"twitter": "com.twitter.android",
|
||||
"X": "com.twitter.android",
|
||||
"VLC": "org.videolan.vlc",
|
||||
"WeChat": "com.tencent.mm",
|
||||
"wechat": "com.tencent.mm",
|
||||
"Whatsapp": "com.whatsapp",
|
||||
"WhatsApp": "com.whatsapp",
|
||||
}
|
||||
|
||||
|
||||
|
||||
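Review note on the new `APP_PACKAGES` entries: most of the added keys are spelling variants of one app name (case, spaces, hyphens). One possible alternative, shown here only as a sketch, is to normalize the name before lookup so that a single canonical key per app is enough. `resolve_package` and `_normalize` are hypothetical helpers, and the three-entry table is a small excerpt of the mappings in the hunk above.

```python
import re

# Small excerpt of the mapping from the diff, one canonical key per app.
APP_PACKAGES = {
    "Google Calendar": "com.google.android.calendar",
    "Chrome": "com.android.chrome",
    "WeChat": "com.tencent.mm",
}


def _normalize(name: str) -> str:
    """Drop spaces, hyphens and underscores, then fold case."""
    return re.sub(r"[\s\-_]+", "", name).casefold()


# Index built once; keys are the normalized app names.
_INDEX = {_normalize(key): value for key, value in APP_PACKAGES.items()}


def resolve_package(name: str):
    """Return the package for an app name, or None if it is unknown."""
    return _INDEX.get(_normalize(name))


if __name__ == "__main__":
    print(resolve_package("google-calendar"))
    print(resolve_package("WECHAT"))
```

The trade-off is that normalization can merge names that were meant to stay distinct, so an explicit alias table may still be the safer choice for some apps.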
73
phone_agent/config/i18n.py
Normal file
@@ -0,0 +1,73 @@
|
||||
"""Internationalization (i18n) module for Phone Agent UI messages."""
|
||||
|
||||
# Chinese messages
|
||||
MESSAGES_ZH = {
|
||||
"thinking": "思考过程",
|
||||
"action": "执行动作",
|
||||
"task_completed": "任务完成",
|
||||
"done": "完成",
|
||||
"starting_task": "开始执行任务",
|
||||
"final_result": "最终结果",
|
||||
"task_result": "任务结果",
|
||||
"confirmation_required": "需要确认",
|
||||
"continue_prompt": "是否继续?(y/n)",
|
||||
"manual_operation_required": "需要人工操作",
|
||||
"manual_operation_hint": "请手动完成操作...",
|
||||
"press_enter_when_done": "完成后按回车继续",
|
||||
"connection_failed": "连接失败",
|
||||
"connection_successful": "连接成功",
|
||||
"step": "步骤",
|
||||
"task": "任务",
|
||||
"result": "结果",
|
||||
}
|
||||
|
||||
# English messages
|
||||
MESSAGES_EN = {
|
||||
"thinking": "Thinking",
|
||||
"action": "Action",
|
||||
"task_completed": "Task Completed",
|
||||
"done": "Done",
|
||||
"starting_task": "Starting task",
|
||||
"final_result": "Final Result",
|
||||
"task_result": "Task Result",
|
||||
"confirmation_required": "Confirmation Required",
|
||||
"continue_prompt": "Continue? (y/n)",
|
||||
"manual_operation_required": "Manual Operation Required",
|
||||
"manual_operation_hint": "Please complete the operation manually...",
|
||||
"press_enter_when_done": "Press Enter when done",
|
||||
"connection_failed": "Connection Failed",
|
||||
"connection_successful": "Connection Successful",
|
||||
"step": "Step",
|
||||
"task": "Task",
|
||||
"result": "Result",
|
||||
}
|
||||
|
||||
|
||||
def get_messages(lang: str = "cn") -> dict:
|
||||
"""
|
||||
Get UI messages dictionary by language.
|
||||
|
||||
Args:
|
||||
lang: Language code, 'cn' for Chinese, 'en' for English.
|
||||
|
||||
Returns:
|
||||
Dictionary of UI messages.
|
||||
"""
|
||||
if lang == "en":
|
||||
return MESSAGES_EN
|
||||
return MESSAGES_ZH
|
||||
|
||||
|
||||
def get_message(key: str, lang: str = "cn") -> str:
|
||||
"""
|
||||
Get a single UI message by key and language.
|
||||
|
||||
Args:
|
||||
key: Message key.
|
||||
lang: Language code, 'cn' for Chinese, 'en' for English.
|
||||
|
||||
Returns:
|
||||
Message string.
|
||||
"""
|
||||
messages = get_messages(lang)
|
||||
return messages.get(key, key)
|
||||
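Review note on the fallback behaviour of `i18n.py`: `get_messages` returns the Chinese table for any language code other than `"en"`, and `get_message` returns the key itself when no translation exists. The sketch below reimplements both functions over a two-key excerpt of the tables so the fallbacks can be seen in isolation; the function bodies follow the diff above, only the tables are shortened.

```python
# Two-key excerpt of the message tables from i18n.py.
MESSAGES_ZH = {"thinking": "思考过程", "done": "完成"}
MESSAGES_EN = {"thinking": "Thinking", "done": "Done"}


def get_messages(lang: str = "cn") -> dict:
    """Return the English table for 'en', the Chinese table otherwise."""
    if lang == "en":
        return MESSAGES_EN
    return MESSAGES_ZH


def get_message(key: str, lang: str = "cn") -> str:
    """Return one message, falling back to the key when it is missing."""
    return get_messages(lang).get(key, key)


if __name__ == "__main__":
    print(get_message("thinking", "en"))
    print(get_message("no_such_key", "en"))
```

Note that callers indexing the dictionary directly, as in `msgs['thinking']`, do not get the key fallback and would raise `KeyError` on a missing key.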
74
phone_agent/config/prompts_en.py
Normal file
@@ -0,0 +1,74 @@
|
||||
"""System prompts for the AI agent."""
|
||||
from datetime import datetime
|
||||
|
||||
today = datetime.today()
|
||||
formatted_date = today.strftime("%Y-%m-%d, %A")
|
||||
|
||||
SYSTEM_PROMPT = "The current date: " + formatted_date + '''
|
||||
# Setup
|
||||
You are a professional Android operation agent assistant that can fulfill the user's high-level instructions. Given a screenshot of the Android interface at each step, you first analyze the situation, then plan the best course of action using Python-style pseudo-code.
|
||||
|
||||
# More details about the code
|
||||
Your response format must be structured as follows:
|
||||
|
||||
Think first: Use <think>...</think> to analyze the current screen, identify key elements, and determine the most efficient action.
|
||||
Provide the action: Use <answer>...</answer> to return a single line of pseudo-code representing the operation.
|
||||
|
||||
Your output should STRICTLY follow the format:
|
||||
<think>
|
||||
[Your thought]
|
||||
</think>
|
||||
<answer>
|
||||
[Your operation code]
|
||||
</answer>
|
||||
|
||||
- **Tap**
|
||||
Perform a tap action on a specified screen area. The element is a list of 2 integers, representing the coordinates of the tap point.
|
||||
**Example**:
|
||||
<answer>
|
||||
do(action="Tap", element=[x,y])
|
||||
</answer>
|
||||
- **Type**
|
||||
Enter text into the currently focused input field.
|
||||
**Example**:
|
||||
<answer>
|
||||
do(action="Type", text="Hello World")
|
||||
</answer>
|
||||
- **Swipe**
|
||||
Perform a swipe action with start point and end point.
|
||||
**Examples**:
|
||||
<answer>
|
||||
do(action="Swipe", start=[x1,y1], end=[x2,y2])
|
||||
</answer>
|
||||
- **Long Press**
|
||||
Perform a long press action on a specified screen area.
|
||||
You can add the element to the action to specify the long press area. The element is a list of 2 integers, representing the coordinates of the long press point.
|
||||
**Example**:
|
||||
<answer>
|
||||
do(action="Long Press", element=[x,y])
|
||||
</answer>
|
||||
- **Launch**
|
||||
Launch an app. Try to use launch action when you need to launch an app. Check the instruction to choose the right app before you use this action.
|
||||
**Example**:
|
||||
<answer>
|
||||
do(action="Launch", app="Settings")
|
||||
</answer>
|
||||
- **Back**
|
||||
Press the Back button to navigate to the previous screen.
|
||||
**Example**:
|
||||
<answer>
|
||||
do(action="Back")
|
||||
</answer>
|
||||
- **Finish**
|
||||
Terminate the program and optionally print a message.
|
||||
**Example**:
|
||||
<answer>
|
||||
finish(message="Task completed.")
|
||||
</answer>
|
||||
|
||||
|
||||
REMEMBER:
|
||||
- Think before you act: Always analyze the current UI and the best course of action before executing any step, and output in <think> part.
|
||||
- Only ONE LINE of action in <answer> part per response: Each step must contain exactly one line of executable code.
|
||||
- Generate execution code strictly according to format requirements.
|
||||
'''
|
||||
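Review note on the response format required by the English prompt: the model is asked to reply with a `<think>...</think>` block followed by an `<answer>...</answer>` block holding one line of pseudo-code. The sketch below shows one way such a reply could be split. `split_response` is a hypothetical helper; the repository's own `parse_action` is not shown in this diff and may work differently.

```python
import re

# DOTALL lets the thought span several lines.
_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_ANSWER = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_response(text: str):
    """Return (thinking, action) extracted from a model reply.

    Raises ValueError when no <answer> block is present, since there is
    then nothing to execute.
    """
    answer = _ANSWER.search(text)
    if answer is None:
        raise ValueError("response has no <answer> block")
    think = _THINK.search(text)
    thinking = think.group(1).strip() if think else ""
    return thinking, answer.group(1).strip()


if __name__ == "__main__":
    reply = (
        "<think>\nThe home screen is shown.\n</think>\n"
        '<answer>\ndo(action="Launch", app="Settings")\n</answer>'
    )
    print(split_response(reply))
```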
72
phone_agent/config/prompts_zh.py
Normal file
@@ -0,0 +1,72 @@
|
||||
"""System prompts for the AI agent."""
|
||||
from datetime import datetime
|
||||
|
||||
today = datetime.today()
|
||||
weekday_names = ["星期一", "星期二", "星期三", "星期四", "星期五", "星期六", "星期日"]
|
||||
weekday = weekday_names[today.weekday()]
|
||||
formatted_date = today.strftime("%Y年%m月%d日") + " " + weekday
|
||||
|
||||
SYSTEM_PROMPT = "今天的日期是: " + formatted_date + '''
|
||||
你是一个智能体分析专家,可以根据操作历史和当前状态图执行一系列操作来完成任务。
|
||||
你必须严格按照要求输出以下格式:
|
||||
<think>{think}</think>
|
||||
<answer>{action}</answer>
|
||||
|
||||
其中:
|
||||
- {think} 是对你为什么选择这个操作的简短推理说明。
|
||||
- {action} 是本次执行的具体操作指令,必须严格遵循下方定义的指令格式。
|
||||
|
||||
操作指令及其作用如下:
|
||||
- do(action="Launch", app="xxx")
|
||||
Launch是启动目标app的操作,这比通过主屏幕导航更快。此操作完成后,您将自动收到结果状态的截图。
|
||||
- do(action="Tap", element=[x,y])
|
||||
Tap是点击操作,点击屏幕上的特定点。可用此操作点击按钮、选择项目、从主屏幕打开应用程序,或与任何可点击的用户界面元素进行交互。坐标系统从左上角 (0,0) 开始到右下角(999,999)结束。此操作完成后,您将自动收到结果状态的截图。
|
||||
- do(action="Tap", element=[x,y], message="重要操作")
|
||||
基本功能同Tap,点击涉及财产、支付、隐私等敏感按钮时触发。
|
||||
- do(action="Type", text="xxx")
|
||||
Type是输入操作,在当前聚焦的输入框中输入文本。使用此操作前,请确保输入框已被聚焦(先点击它)。输入的文本将像使用键盘输入一样输入。重要提示:手机可能正在使用 ADB 键盘,该键盘不会像普通键盘那样占用屏幕空间。要确认键盘已激活,请查看屏幕底部是否显示 'ADB Keyboard {ON}' 类似的文本,或者检查输入框是否处于激活/高亮状态。不要仅仅依赖视觉上的键盘显示。自动清除文本:当你使用输入操作时,输入框中现有的任何文本(包括占位符文本和实际输入)都会在输入新文本前自动清除。你无需在输入前手动清除文本——直接使用输入操作输入所需文本即可。操作完成后,你将自动收到结果状态的截图。
|
||||
- do(action="Type_Name", text="xxx")
|
||||
Type_Name是输入人名的操作,基本功能同Type。
|
||||
- do(action="Interact")
|
||||
Interact是当有多个满足条件的选项时而触发的交互操作,询问用户如何选择。
|
||||
- do(action="Swipe", start=[x1,y1], end=[x2,y2])
|
||||
Swipe是滑动操作,通过从起始坐标拖动到结束坐标来执行滑动手势。可用于滚动内容、在屏幕之间导航、下拉通知栏以及项目栏或进行基于手势的导航。坐标系统从左上角 (0,0) 开始到右下角(999,999)结束。滑动持续时间会自动调整以实现自然的移动。此操作完成后,您将自动收到结果状态的截图。
|
||||
- do(action="Note", message="True")
|
||||
记录当前页面内容以便后续总结。
|
||||
- do(action="Call_API", instruction="xxx")
|
||||
总结或评论当前页面或已记录的内容。
|
||||
- do(action="Long Press", element=[x,y])
|
||||
Long Press是长按操作,在屏幕上的特定点长按指定时间。可用于触发上下文菜单、选择文本或激活长按交互。坐标系统从左上角 (0,0) 开始到右下角(999,999)结束。此操作完成后,您将自动收到结果状态的屏幕截图。
|
||||
- do(action="Double Tap", element=[x,y])
|
||||
Double Tap在屏幕上的特定点快速连续点按两次。使用此操作可以激活双击交互,如缩放、选择文本或打开项目。坐标系统从左上角 (0,0) 开始到右下角(999,999)结束。此操作完成后,您将自动收到结果状态的截图。
|
||||
- do(action="Take_over", message="xxx")
|
||||
Take_over是接管操作,表示在登录和验证阶段需要用户协助。
|
||||
- do(action="Back")
|
||||
导航返回到上一个屏幕或关闭当前对话框。相当于按下 Android 的返回按钮。使用此操作可以从更深的屏幕返回、关闭弹出窗口或退出当前上下文。此操作完成后,您将自动收到结果状态的截图。
|
||||
- do(action="Home")
|
||||
Home是回到系统桌面的操作,相当于按下 Android 主屏幕按钮。使用此操作可退出当前应用并返回启动器,或从已知状态启动新任务。此操作完成后,您将自动收到结果状态的截图。
|
||||
- do(action="Wait", duration="x seconds")
|
||||
等待页面加载,x为需要等待多少秒。
|
||||
- finish(message="xxx")
|
||||
finish是结束任务的操作,表示准确完整完成任务,message是终止信息。
|
||||
|
||||
必须遵循的规则:
|
||||
1. 在执行任何操作前,先检查当前app是否是目标app,如果不是,先执行 Launch。
|
||||
2. 如果进入到了无关页面,先执行 Back。如果执行Back后页面没有变化,请点击页面左上角的返回键进行返回,或者右上角的X号关闭。
|
||||
3. 如果页面未加载出内容,最多连续 Wait 三次,否则执行 Back重新进入。
|
||||
4. 如果页面显示网络问题,需要重新加载,请点击重新加载。
|
||||
5. 如果当前页面找不到目标联系人、商品、店铺等信息,可以尝试 Swipe 滑动查找。
|
||||
6. 遇到价格区间、时间区间等筛选条件,如果没有完全符合的,可以放宽要求。
|
||||
7. 在做小红书总结类任务时一定要筛选图文笔记。
|
||||
8. 购物车全选后再点击全选可以把状态设为全不选,在做购物车任务时,如果购物车里已经有商品被选中时,你需要点击全选后再点击取消全选,再去找需要购买或者删除的商品。
|
||||
9. 在做外卖任务时,如果相应店铺购物车里已经有其他商品你需要先把购物车清空再去购买用户指定的外卖。
|
||||
10. 在做点外卖任务时,如果用户需要点多个外卖,请尽量在同一店铺进行购买,如果无法找到可以下单,并说明某个商品未找到。
|
||||
11. 请严格遵循用户意图执行任务,用户的特殊要求可以执行多次搜索,滑动查找。比如(i)用户要求点一杯咖啡,要咸的,你可以直接搜索咸咖啡,或者搜索咖啡后滑动查找咸的咖啡,比如海盐咖啡。(ii)用户要找到XX群,发一条消息,你可以先搜索XX群,找不到结果后,将"群"字去掉,搜索XX重试。(iii)用户要找到宠物友好的餐厅,你可以搜索餐厅,找到筛选,找到设施,选择可带宠物,或者直接搜索可带宠物,必要时可以使用AI搜索。
|
||||
12. 在选择日期时,如果原滑动方向与预期日期越来越远,请向反方向滑动查找。
|
||||
13. 执行任务过程中如果有多个可选择的项目栏,请逐个查找每个项目栏,直到完成任务,一定不要在同一项目栏多次查找,从而陷入死循环。
|
||||
14. 在执行下一步操作前请一定要检查上一步的操作是否生效,如果点击没生效,可能因为app反应较慢,请先稍微等待一下,如果还是不生效请调整一下点击位置重试,如果仍然不生效请跳过这一步继续任务,并在finish message说明点击不生效。
|
||||
15. 在执行任务中如果遇到滑动不生效的情况,请调整一下起始点位置,增大滑动距离重试,如果还是不生效,有可能是已经滑到底了,请继续向反方向滑动,直到顶部或底部,如果仍然没有符合要求的结果,请跳过这一步继续任务,并在finish message说明但没找到要求的项目。
|
||||
16. 在做游戏任务时如果在战斗页面如果有自动战斗一定要开启自动战斗,如果多轮历史状态相似要检查自动战斗是否开启。
|
||||
17. 如果没有合适的搜索结果,可能是因为搜索页面不对,请返回到搜索页面的上一级尝试重新搜索,如果尝试三次返回上一级搜索后仍然没有符合要求的结果,执行 finish(message="原因")。
|
||||
18. 在结束任务前请一定要仔细检查任务是否完整准确的完成,如果出现错选、漏选、多选的情况,请返回之前的步骤进行纠正。
|
||||
'''
|
||||
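Review note on the coordinate system in the Chinese prompt: `Tap`, `Swipe`, `Long Press` and `Double Tap` all use a relative grid from (0,0) at the top-left to (999,999) at the bottom-right, so the coordinates have to be scaled to the device resolution before an ADB input command is issued. The sketch below shows one plausible scaling. `to_pixels` is a hypothetical helper, and the divide-by-1000 rule is an assumption, since the handler code is not part of this diff.

```python
def to_pixels(element, width: int, height: int):
    """Scale a relative [x, y] in the 0..999 grid to device pixels.

    Assumes each grid unit covers 1/1000 of the screen along its axis.
    """
    x, y = element
    if not (0 <= x <= 999 and 0 <= y <= 999):
        raise ValueError(f"coordinates out of range: {element}")
    # Integer arithmetic keeps the result inside the screen bounds.
    return x * width // 1000, y * height // 1000


if __name__ == "__main__":
    # Centre of the grid on a 1080x2400 screen.
    print(to_pixels([500, 500], 1080, 2400))
```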
134
resources/privacy_policy_en.txt
Normal file
@@ -0,0 +1,134 @@
|
||||
Part I: Safety Description of Model/Technology
|
||||
|
||||
1. AutoGLM Technical Mechanism and Deployment Flexibility
|
||||
The core functionality of AutoGLM is automated operation execution. Its working principle is as follows:
|
||||
- Instruction-Driven: Based on operation instructions issued by the user or developer.
|
||||
- Screen Understanding: Captures the screen content of the current operating environment and sends the image to a large model (which can be deployed locally or in the cloud) for analysis and understanding.
|
||||
- Operation Simulation: Simulates human interaction methods (such as clicking, swiping, inputting information, etc.) to complete tasks in the target environment.
|
||||
- Example: When instructed to book a high-speed rail ticket, AutoGLM would open the relevant application, identify the interface content, and follow the instructions to select a train, complete the order, etc., similar to manual operation. The user or developer can terminate the task at any time.
|
||||
|
||||
Key Flexibility:
|
||||
- Model Deployment: Developers can freely choose to deploy the AutoGLM model on local devices or on cloud servers.
|
||||
- Operation Execution Environment: Automated operations can be executed on local devices or on cloud-based devices, as determined by the developer based on application scenarios and requirements.
|
||||
- Data Flow: The data flow depends on the deployment choice:
|
||||
- Local Deployment (Model + Execution): Screen capture, model analysis, and operation execution are all completed on the local device. Data does not leave the device, offering the highest level of privacy.
|
||||
- Cloud Deployment (Model + Execution): Screen content needs to be transmitted from the operating environment (local or cloud device) to the cloud-based model. After analysis, the model returns instructions to the operating environment for execution. Developers must ensure the security of transmission and cloud processing.
|
||||
- Hybrid Deployment (e.g., Local Execution + Cloud Model): Screen content is captured locally, transmitted to the cloud model for analysis, and the analysis results are returned to the local environment for execution. Developers need to pay attention to data transmission security.
|
||||
|
||||
2. System Permission Usage Description (For the Operation Execution Environment)
|
||||
To ensure the normal execution of automated operations, the environment running AutoGLM operations may need to obtain the following permissions:
|
||||
- ADB (Android Debug Bridge) Permissions: Used to obtain information and simulate user interaction operations such as clicking, swiping, and inputting.
|
||||
- Storage Permissions: Used for temporary storage of necessary data, model files (if deployed locally), or logs.
|
||||
- Network Permissions: Used to access online services (e.g., calling cloud models, accessing target application services).
|
||||
- Other Specific Permissions: May be required for specific tasks (e.g., microphone for voice commands).
|
||||
|
||||
Developer Responsibilities:
|
||||
- Principle of Least Privilege: Only request permissions absolutely necessary to complete a specific task.
|
||||
- Transparent Disclosure: Clearly and explicitly inform end-users in the application or service about the purpose and necessity of each permission.
|
||||
- User Authorization: Must obtain explicit authorization from the end-user before enabling relevant permissions and functionalities in the operating environment.
|
||||
- Environment Adaptation: Ensure that the permission request and acquisition mechanisms are adapted to the chosen operation execution environment (local or cloud).
|
||||
|
||||
3. Data Processing and Privacy Protection Principles
|
||||
The AutoGLM open-source project itself does not collect user data. The responsibility for data processing and privacy protection lies with the developers who build specific applications or services based on AutoGLM. Their responsibilities vary depending on the deployment method:
|
||||
- Local Deployment (Model + Execution):
|
||||
- Developers must implement secure local data storage and processing at the application level. All data processing (screen capture, model analysis, operation execution) is completed on the end-user's local device.
|
||||
- Developers should ensure their application does not actively upload sensitive data (such as screen content or operation logs) to the developer's servers or third parties, except with the user's explicit, informed consent and where a necessary function requires it.
|
||||
- Cloud Deployment (Model and/or Execution):
|
||||
- Involves data transmission (screen content, operation instructions, model analysis results) between the operating environment and the cloud.
|
||||
- Developers must:
|
||||
- Implement strong encryption to protect all data in transit and at rest.
|
||||
- Clearly inform end-users about what data will be sent to the cloud, the purpose of transmission, storage location, and retention period, and obtain the end-user's explicit consent for data transmission and cloud processing.
|
||||
- Comply with applicable data protection regulations and provide a clear privacy policy explaining data processing practices.
|
||||
- Ensure secure configuration and access control for the cloud environment (model servers, operating environment servers).
|
||||
- General Principles (All Deployment Methods):
|
||||
- Data Minimization: Collect and process only the minimum information absolutely necessary to complete the automated task.
|
||||
- Purpose Limitation: Use data solely for the specific purpose of the automated operation to fulfill the user's instruction.
|
||||
- Security Safeguards: Developers are responsible for taking reasonable technical and administrative measures to protect the security and confidentiality of all user data they process (whether locally or in the cloud), preventing unauthorized access, use, disclosure, or loss.
|
||||
- User Control: Provide mechanisms allowing end-users to view and manage (e.g., delete) data related to them (where technically feasible and consistent with the deployment method).
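The consent and data-minimization principles above could be enforced at the single point where data leaves the device. The following is a minimal sketch, not part of AutoGLM: `ConsentRecord`, `ALLOWED_FIELDS`, and `build_cloud_payload` are hypothetical names chosen for illustration, and the allow-list contents are an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet

# Hypothetical allow-list: only fields the automated task actually needs.
ALLOWED_FIELDS: FrozenSet[str] = frozenset({"task_id", "instruction", "screenshot"})


@dataclass
class ConsentRecord:
    """Hypothetical record of what the end-user has explicitly agreed to."""
    cloud_upload: bool = False


def build_cloud_payload(
    record: Dict[str, Any],
    consent: ConsentRecord,
    allowed: FrozenSet[str] = ALLOWED_FIELDS,
) -> Dict[str, Any]:
    """Return the subset of `record` that may be sent to the cloud.

    Raises PermissionError if the user has not consented to cloud upload.
    Fields outside the allow-list are dropped (data minimization).
    """
    if not consent.cloud_upload:
        raise PermissionError("user has not consented to cloud upload")
    return {key: value for key, value in record.items() if key in allowed}
```

Routing every upload through one function of this kind keeps the consent check and the allow-list in a single auditable place. Encryption in transit and at rest would still need to be handled separately.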
|
||||
|
||||
|
||||
---
|
||||
|
||||
Part II: Usage Norms Developers/Users Should Follow
|
||||
Developers/users must always comply with applicable laws and regulations when using the AutoGLM open-source project.
|
||||
|
||||
1. Critical Operation Confirmation Mechanism
|
||||
Developers must design and implement explicit, mandatory user confirmation steps within their applications or services built on AutoGLM for the following seven categories of high-risk operations (six specific categories plus one catch-all):
|
||||
- Information Interaction and Content Dissemination: Including but not limited to sending messages, emails, posting comments, liking, sharing, etc.
|
||||
- File Handling and Permission Management: Including but not limited to creating, editing, deleting, moving files or folders, enabling or disabling any permissions, etc.
|
||||
- Transaction Orders and Disposal of Rights/Interests: Including but not limited to clearing shopping carts, submitting orders, modifying/adding shipping addresses, using coupons/points, etc.
|
||||
- Fund Transfers and Payment Settlement: Including but not limited to transfers, payments, receiving funds, recharging, withdrawals, binding/unbinding payment methods, etc.
|
||||
- Account Identity and Security Configuration: Including but not limited to changing passwords, setting/modifying security options, deleting accounts or linked accounts, deleting friends/contacts, deleting conversations/records, etc.
|
||||
- Healthcare and Legal Compliance: Including but not limited to accessing, authorizing, or disposing of medical records/health data, purchasing medication, physical or psychological testing, signing electronic agreements, etc.
|
||||
- Other High-Risk Operations: Any other operation that may significantly impact user data security, property security, account security, or reputation.
|
||||
|
||||
Requirements:
|
||||
- The confirmation step must be triggered before operation execution, clearly displaying the details of the upcoming operation.
|
||||
- Provide convenient cancel/termination mechanisms, allowing users to abort the task at any time before confirmation or during the operation process.
|
||||
- Developer Responsibility: Developers shall bear corresponding responsibility for losses caused to users due to failure to implement an effective confirmation mechanism.
|
||||
- User Responsibility: Users shall bear losses resulting from their failure to promptly terminate erroneous operations after confirmation.
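One way to meet the requirements above is to route every high-risk operation through a gate that shows the operation details and refuses to execute without explicit approval. The following is a minimal sketch under stated assumptions: `RiskCategory`, `PendingOperation`, `OperationCancelled`, and `run_with_confirmation` are hypothetical names, not an AutoGLM API, and the `confirm` callback stands in for whatever confirmation UI the application provides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RiskCategory(Enum):
    """The seven high-risk categories listed above (names are illustrative)."""
    MESSAGING = "information interaction and content dissemination"
    FILES_PERMISSIONS = "file handling and permission management"
    ORDERS = "transaction orders and disposal of rights/interests"
    PAYMENTS = "fund transfers and payment settlement"
    ACCOUNT_SECURITY = "account identity and security configuration"
    HEALTH_LEGAL = "healthcare and legal compliance"
    OTHER_HIGH_RISK = "other high-risk operations"


@dataclass(frozen=True)
class PendingOperation:
    description: str                  # details shown to the user
    category: Optional[RiskCategory]  # None means the operation is not high-risk


class OperationCancelled(Exception):
    """Raised when the user declines a high-risk operation."""


def run_with_confirmation(
    op: PendingOperation,
    execute: Callable[[], T],
    confirm: Callable[[str], bool],
) -> T:
    """Run `execute` only after the user approves a high-risk operation."""
    if op.category is not None:
        prompt = f"[{op.category.value}] {op.description}"
        # The confirmation is requested before execution, never after.
        if not confirm(prompt):
            raise OperationCancelled(op.description)
    return execute()
```

In a command-line tool, `confirm` could be as simple as `lambda prompt: input(prompt + " Proceed? [y/N] ").strip().lower() == "y"`. A mobile application would display a dialog instead.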
|
||||
|
||||
2. Obligations of Developers and Users
|
||||
Developer Obligations:
|
||||
- Transparent Disclosure: Clearly and accurately explain to end-users the functionality, working principles (especially the automated parts), data collection and processing methods (including whether the cloud is involved), potential risks, and how users can exercise control.
|
||||
- Provide Monitoring and Control: Design a user interface that allows end-users to:
|
||||
- View or understand the current status and steps of automated operations in real-time.
|
||||
- Conveniently and quickly pause or terminate any ongoing automated task.
|
||||
- Manage permissions and settings for automated operations.
|
||||
- Secure Development: Follow secure coding practices to ensure the security of the application/service itself and prevent malicious exploitation.
|
||||
- Compliance: Ensure that the developed application/service complies with all applicable laws, regulations, and industry standards, and with the terms of service of third-party platforms (e.g., the application being operated on).
|
||||
- Risk Warning: Clearly warn users in appropriate locations (e.g., feature entry points, first-time use, confirmation steps) about potential risks of using automation functions (such as misoperation, privacy risks, third-party platform policy risks).
|
||||
- Avoid Critical Dependencies: Carefully evaluate and refrain from recommending AutoGLM for handling extremely critical, high-risk operations or those with severe consequences upon error (e.g., medical device control, critical infrastructure operations, large financial transactions without human review).
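The monitoring and control obligation above (viewing the current step, and pausing or terminating a task) can be sketched as a small controller that the automation loop consults before each step. This is an illustrative sketch only: `TaskController` and its methods are hypothetical names, and a real application would connect `pause`, `resume`, and `terminate` to UI controls running on another thread.

```python
import threading
from typing import Callable, Iterable, List, Tuple


class TaskController:
    """Lets a user observe, pause, resume, or terminate an automated task."""

    def __init__(self) -> None:
        self._resume = threading.Event()
        self._resume.set()               # start in the "not paused" state
        self._stop = threading.Event()
        self.history: List[str] = []     # human-readable status log for the UI

    def pause(self) -> None:
        self._resume.clear()

    def resume(self) -> None:
        self._resume.set()

    def terminate(self) -> None:
        self._stop.set()
        self._resume.set()               # wake a paused loop so it can exit

    def run(self, steps: Iterable[Tuple[str, Callable[[], None]]]) -> bool:
        """Execute steps in order; return False if terminated early."""
        for name, action in steps:
            self._resume.wait()          # blocks while the task is paused
            if self._stop.is_set():
                self.history.append(f"terminated before: {name}")
                return False
            self.history.append(f"running: {name}")
            action()
        return True
```

Checking the stop flag between steps means a step that has already started will finish. Finer-grained cancellation would require each action to cooperate.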
|
||||
|
||||
User Obligations:
|
||||
- Understand Risks: Before using AutoGLM-based automation features, carefully read the developer's instructions, privacy policy, and risk warnings to fully understand their working principles and potential risks.
|
||||
- Grant Permissions Cautiously: Only grant necessary permissions after fully trusting the application/service developer and understanding the authorization content.
|
||||
- Active Monitoring: Maintain appropriate attention during the execution of automated tasks, especially for important operations. Utilize monitoring functions provided by the developer to understand operation progress.
|
||||
- Timely Intervention: Immediately use the provided termination function to stop the task if any operation error, abnormality, or deviation from expectation is observed.
|
||||
- Assume Responsibility: Bear responsibility for instructions issued, operations confirmed, and any losses resulting from failure to promptly monitor and stop erroneous operations.
|
||||
|
||||
3. Developer and User Code of Conduct
|
||||
It is strictly prohibited to use the AutoGLM open-source project or applications/services developed based on it to engage in the following behaviors:
|
||||
|
||||
(1) Bulk Automation and Malicious Competition
|
||||
- Any form of falsified data manipulation: faking or artificially inflating orders, votes, likes, comments, traffic, followers, play counts, downloads, etc.
|
||||
- Bulk account manipulation: bulk registration, bulk login, bulk operation of third-party platform accounts (group control, multi-instance, cloud control).
|
||||
- Disrupting market order: malicious bulk purchasing, hoarding and profiteering, snatching limited resources, bulk claiming/abusing coupons/subsidies, maliciously occupying service resources (so-called "wool-pulling", i.e., systematically exploiting promotional offers).
|
||||
- Manipulating platform rules: gaming rankings/search results, artificially influencing recommendation algorithms, artificially inflating/deflating content exposure.
|
||||
- Creating false engagement: bulk publishing, reposting, liking, collecting, following, unfollowing, etc., on social media.
|
||||
- Undermining game fairness: power-leveling services, studio operations, bulk farming of equipment/currency/experience/items.
|
||||
- Undermining fairness: bulk voting, ballot stuffing, manipulating online polls/survey results.
|
||||
|
||||
(2) False Information and Fraudulent Behavior
|
||||
- Creating misleading information: publishing/spreading false product/service reviews, false user feedback, false testimonials, false experiences.
|
||||
- Fabricating commercial data: creating false transaction records, sales figures, user engagement, positive review rates.
|
||||
- Identity fraud: impersonating others, fabricating personal information, stealing others' accounts/avatars/nicknames, forging identity documents.
|
||||
- False marketing: publishing false advertisements, conducting false promotions, exaggerating product efficacy, concealing product defects/risks.
|
||||
- Participating in fraudulent activities: online scams, false investments, pyramid schemes, illegal fundraising, fake prize wins, phishing, etc.
|
||||
- Spreading unverified information: creating or maliciously spreading fake news, rumors, unverified information.
|
||||
|
||||
(3) Harming Third-Party Services and System Security
|
||||
- Unauthorized access: using AutoGLM for data scraping (violating robots.txt or platform policies), information theft, API abuse, unauthorized penetration testing.
|
||||
- Technical sabotage: reverse engineering, cracking, modifying, injecting malicious code into third-party applications, disrupting their normal operation.
|
||||
- Resource abuse: maliciously occupying third-party server resources, sending spam requests, generating abnormal traffic, conducting DDoS attacks.
|
||||
- Violating platform rules: intentionally violating the user agreements, terms of service, or community rules of the third-party application being operated on.
|
||||
- Malicious competition: malicious negative reviews, false reporting, false complaints, commercial defamation.
|
||||
- Spreading harmful content: spreading computer viruses, trojans, malware, ransomware, spam, illegal content.
|
||||
- Infringing data rights: unauthorized large-scale commercial data collection, user information gathering, privacy snooping.
|
||||
|
||||
(4) Infringing on Others' Legitimate Rights and Interests
|
||||
- Account theft: stealing others' accounts, passwords, identity credentials for operations.
|
||||
- Online harassment and bullying: malicious harassment, threats, insults, defamation, doxxing others.
|
||||
- Privacy and secret infringement: unauthorized collection, use, or dissemination of others' personal information, private data, trade secrets.
|
||||
- Cybersquatting: registering others' trademarks, domain names, usernames, social media accounts, etc., in bad faith.
|
||||
- Harassment: malicious spamming, message bombing, forced following/subscription.
|
||||
- Harming commercial interests: industrial espionage, unfair competition, malicious poaching, theft of trade secrets.
|
||||
|
||||
(5) Resource Abuse and Damaging Project Ecosystem
|
||||
- Abusing registration resources: maliciously registering numerous accounts, fake registration.
|
||||
- Wasting computing/device resources: maliciously occupying local or cloud device resources, long-term idle occupancy, running high-energy-consumption programs unrelated to automated tasks (e.g., cryptocurrency mining).
|
||||
- Destabilizing systems: maliciously testing system performance, conducting unauthorized stress tests, frequently restarting services, exploiting technical vulnerabilities/defects for personal gain or to harm the project/platform.
|
||||
- Violating open-source licenses: violating the terms of the AutoGLM project's open-source license.
|
||||
|
||||
Consequences of Violation:
|
||||
If developers/users fail to comply with applicable laws, regulations, policies, and industry standards (including but not limited to technical specifications and security standards), or with the project's agreements (including but not limited to open-source licenses and usage notes), all resulting legal liabilities, economic losses, and other adverse consequences shall be borne solely and independently by the developers/users.
|
||||
BIN
resources/screenshot-20251209-181423.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 198 KiB |
4
setup.py
@@ -9,8 +9,8 @@ with open("README.md", "r", encoding="utf-8") as f:
|
||||
setup(
|
||||
name="phone-agent",
|
||||
version="0.1.0",
|
||||
- author="Your Name",
|
||||
- author_email="your.email@example.com",
|
||||
+ author="Zhipu AI",
|
||||
+ author_email="",
|
||||
description="AI-powered phone automation framework",
|
||||
long_description=long_description,
|
||||
long_description_content_type="text/markdown",
|
||||
|
||||