commit 7e1785e08e1fca2908ba6aefc65ce6b1adba7431
Author: zRzRzRzRzRzRzR <2448370773@qq.com>
Date: Mon Dec 8 23:54:29 2025 +0800
draft init
diff --git a/.github/ISSUE_TEMPLATE/bug_report.yaml b/.github/ISSUE_TEMPLATE/bug_report.yaml
new file mode 100644
index 0000000..d3ac443
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/bug_report.yaml
@@ -0,0 +1,72 @@
+name: "\U0001F41B Bug Report"
+description: Submit a bug report to help us improve Open-AutoGLM / 提交一个 Bug 问题报告来帮助我们改进 Open-AutoGLM
+body:
+  - type: textarea
+    id: system-info
+    attributes:
+      label: System Info / 系統信息
+      description: Your operating environment / 您的运行环境信息
+      placeholder: Includes Cuda version, Transformers version, Python version, operating system, hardware information (if you suspect a hardware problem)... / 包括Cuda版本,Transformers版本,Python版本,操作系统,硬件信息(如果您怀疑是硬件方面的问题)...
+    validations:
+      required: true
+
+  - type: textarea
+    id: who-can-help
+    attributes:
+      label: Who can help? / 谁可以帮助到您?
+      description: |
+        Your issue will be replied to more quickly if you can figure out the right person to tag with @
+        All issues are read by one of the maintainers, so if you don't know who to tag, just leave this blank and our maintainer will ping the right person.
+
+        Please tag fewer than 3 people.
+
+        如果您能找到合适的标签 @,您的问题会更快得到回复。
+        所有问题都会由我们的维护者阅读,如果您不知道该标记谁,只需留空,我们的维护人员会找到合适的开发组成员来解决问题。
+
+        标记的人数应该不超过 3 个人。
+
+        If it's not a bug in these three subsections, you may not specify the helper. Our maintainer will find the right person in the development group to solve the problem.
+
+        如果不是这三个子版块的bug,您可以不指明帮助者,我们的维护人员会找到合适的开发组成员来解决问题。
+
+      placeholder: "@Username ..."
+
+  - type: checkboxes
+    id: information-scripts-examples
+    attributes:
+      label: Information / 问题信息
+      description: 'The problem arises when using: / 问题出现在'
+      options:
+        - label: "The official example scripts / 官方的示例脚本"
+        - label: "My own modified scripts / 我自己修改的脚本和任务"
+
+  - type: textarea
+    id: reproduction
+    validations:
+      required: true
+    attributes:
+      label: Reproduction / 复现过程
+      description: |
+        Please provide a code example that reproduces the problem you encountered, preferably with a minimal reproduction unit.
+        If you have code snippets, error messages, stack traces, please provide them here as well.
+        Please format your code correctly using code tags. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
+        Do not use screenshots, as they are difficult to read and (more importantly) do not allow others to copy and paste your code.
+
+        请提供能重现您遇到的问题的代码示例,最好是最小复现单元。
+        如果您有代码片段、错误信息、堆栈跟踪,也请在此提供。
+        请使用代码标签正确格式化您的代码。请参见 https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
+        请勿使用截图,因为截图难以阅读,而且(更重要的是)不允许他人复制粘贴您的代码。
+      placeholder: |
+        Steps to reproduce the behavior/复现Bug的步骤:
+
+        1.
+        2.
+        3.
+
+  - type: textarea
+    id: expected-behavior
+    validations:
+      required: true
+    attributes:
+      label: Expected behavior / 期待表现
+      description: "A clear and concise description of what you would expect to happen. /简单描述您期望发生的事情。"
diff --git a/.github/ISSUE_TEMPLATE/feature-request.yaml b/.github/ISSUE_TEMPLATE/feature-request.yaml
new file mode 100644
index 0000000..e69a269
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/feature-request.yaml
@@ -0,0 +1,34 @@
+name: "\U0001F680 Feature request"
+description: Submit a request for a new Open-AutoGLM feature / 提交一个新的 Open-AutoGLM 的功能建议
+labels: [ "feature" ]
+body:
+  - type: textarea
+    id: feature-request
+    validations:
+      required: true
+    attributes:
+      label: Feature request / 功能建议
+      description: |
+        A brief description of the functional proposal. Links to corresponding papers and code are desirable.
+        对功能建议的简述。最好提供对应的论文和代码链接
+
+  - type: textarea
+    id: motivation
+    validations:
+      required: true
+    attributes:
+      label: Motivation / 动机
+      description: |
+        Your motivation for making the suggestion. If that motivation is related to another GitHub issue, link to it here.
+        您提出建议的动机。如果该动机与另一个 GitHub 问题有关,请在此处提供对应的链接。
+
+  - type: textarea
+    id: contribution
+    validations:
+      required: true
+    attributes:
+      label: Your contribution / 您的贡献
+      description: |
+
+        Your PR link or any other link you can help with.
+        您的PR链接或者其他您能提供帮助的链接。
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
new file mode 100644
index 0000000..af69a7e
--- /dev/null
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,40 @@
+# Contribution Guide
+
+We welcome your contributions to this repository. To ensure elegant code style and better code quality, we have prepared
+the following contribution guidelines.
+
+## What We Accept
+
++ This PR fixes a typo or improves the documentation (if this is the case, you may skip the other checks).
++ This PR fixes a specific issue — please reference the issue number in the PR description. Make sure your code strictly
+ follows the coding standards below.
++ This PR introduces a new feature — please clearly explain the necessity and implementation of the feature. Make sure
+ your code strictly follows the coding standards below.
+
+## Code Style Guide
+
+Good code style is an art. We have prepared a `pre-commit` hook to enforce consistent code
+formatting across the project. You can clean up your code following the steps below:
+
+```shell
+pre-commit run --all-files
+```
+
+If your code complies with the standards, you should not see any errors.
+
+## Naming Conventions
+
++ Please use **English** for naming; do not use Pinyin or other languages. All comments should also be in English.
++ Follow **PEP8** naming conventions strictly, and use underscores to separate words. Avoid meaningless names such as
+ `a`, `b`, `c`.
+
+## For glmv-reward Contributors
+
+Before opening a PR, please run:
+
+```bash
+cd glmv-reward/
+uv sync
+uv run poe lint
+uv run poe typecheck
+```
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..986012f
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,60 @@
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# Virtual environments
+venv/
+ENV/
+env/
+.venv/
+
+# IDE
+.idea/
+.vscode/
+*.swp
+*.swo
+*~
+
+# Testing
+.pytest_cache/
+.coverage
+htmlcov/
+.tox/
+.nox/
+
+# Type checking
+.mypy_cache/
+
+# Jupyter
+.ipynb_checkpoints/
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Project specific
+*.log
+/tmp/
+screenshots/
+
+# Keep old files during transition
+call_model.py
+app_package_name.py
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
new file mode 100644
index 0000000..b92d80c
--- /dev/null
+++ b/.pre-commit-config.yaml
@@ -0,0 +1,22 @@
+default_install_hook_types:
+  - pre-commit
+  - commit-msg
+
+default_stages:
+  - pre-commit # Run locally
+repos:
+- repo: https://github.com/astral-sh/ruff-pre-commit
+  rev: v0.11.7
+  hooks:
+    - id: ruff
+      args: [--output-format, github, --fix, --select, I]
+    - id: ruff-format
+- repo: https://github.com/crate-ci/typos
+  rev: v1.32.0
+  hooks:
+    - id: typos
+- repo: https://github.com/jackdewinter/pymarkdown
+  rev: v0.9.29
+  hooks:
+    - id: pymarkdown
+      args: [fix]
\ No newline at end of file
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..b1a313f
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,201 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to the Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..ff1d98d
--- /dev/null
+++ b/README.md
@@ -0,0 +1,451 @@
+# Open-AutoGLM
+
+
+
+
+
+ 👋 Join our WeChat community
+
+
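+## Project Overview
+
+Phone Agent is a phone-side intelligent assistant framework built on AutoGLM. It understands the phone screen multimodally and completes tasks for the user through automated operations.
+The system controls the device through ADB (Android Debug Bridge), perceives the screen with a vision-language model, and uses its planning ability to generate and execute the sequence of operations.
+The user only needs to describe the request in natural language, such as "Open Xiaohongshu and search for food", and Phone Agent parses the intent, understands the current screen, plans the next action, and completes the whole flow.
+The system also has a built-in confirmation mechanism for sensitive operations and supports human takeover for login or CAPTCHA screens. In addition, it offers remote ADB debugging, so devices can be connected over Wi-Fi or the network for flexible remote control and development.
+
+> ⚠️ This project is for research and learning purposes only. Using it to obtain information illegally, interfere with systems, or carry out any unlawful activity is strictly prohibited. Please review the [Terms of Use](resources/privacy_policy.txt) carefully.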
+
+## Model Downloads
+
+| Model | Download Links |
+|------------------|----------------------------------------------------------------------------------------------------------------------------------------------|
+| AutoGLM-Phone-9B | [🤗 Hugging Face](https://huggingface.co/zai-org/AutoGLM-Phone-9B) [🤖 ModelScope](https://modelscope.cn/models/ZhipuAI/AutoGLM-Phone-9B) |
+
+## Environment Setup
+
+### 1. Python Environment
+
+Python 3.10 or later is recommended.
+
+### 2. ADB (Android Debug Bridge)
+
+1. Download the official ADB [package](https://developer.android.com/tools/releases/platform-tools?hl=zh-cn) and extract it to a path of your choice.
+2. Configure the environment variable.
+
+- macOS: in `Terminal` or any other command-line tool, run
+
+  ```bash
+  # Assumes the extracted directory is ~/Downloads/platform-tools. Adjust the command if yours differs.
+  export PATH=${PATH}:~/Downloads/platform-tools
+  ```
+
+- Windows: see this [third-party tutorial](https://blog.csdn.net/x2584179909/article/details/108319973) for setup.
+
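+### 3. A device or emulator running Android 7.0+ with `Developer Mode` and `USB Debugging` enabled
+
+1. Enable Developer Mode: the usual way is to open `Settings - About phone - Build number` and tap it quickly about 10
+   times until a prompt says that Developer Mode is enabled. The steps vary slightly between phones; if you cannot find the option, search online for a guide for your model.
+2. Enable USB Debugging: once Developer Mode is on, `Settings - Developer options - USB debugging` becomes available. Turn it on.
+3. On some models the device must be restarted before developer option changes take effect. To test, connect the phone to the computer with a USB data cable and run `adb devices`
+   to check whether the device is listed. If it is not, the connection has failed.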
+
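+### 4. Install ADB Keyboard (for text input)
+
+Download the [APK](https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk) and install it on the Android device.
+Note that after installation you still need to enable `ADB Keyboard` under `Settings - Input method` or `Settings - Keyboard list` for it to take effect.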
+
+## Deployment Preparation
+
+### 1. Install Dependencies
+
+```bash
+pip install -r requirements.txt
+pip install -e .
+```
+
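+### 2. Configure ADB
+
+Make sure the **USB cable supports data transfer** and is not charge-only.
+
+Make sure ADB is installed and the device is connected with a **USB data cable**:
+
+```bash
+# List connected devices
+adb devices
+
+# The output should show your device, for example:
+# List of devices attached
+# emulator-5554 device
+```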
+
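+### 3. Start the Model Service
+
+1. Download the model, and install an inference engine yourself as described in the `For Model Deployment` section of `requirements.txt`.
+2. Launch it with SGLang or vLLM to get an OpenAI-format service. A vLLM deployment is provided here; please follow the launch parameters exactly: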
+
+- vLLM:
+
+```shell
+python3 -m vllm.entrypoints.openai.api_server \
+ --served-model-name autoglm-phone-9b \
+ --allowed-local-media-path / \
+ --mm-encoder-tp-mode data \
+ --mm_processor_cache_type shm \
+ --mm_processor_kwargs "{\"max_pixels\":5000000}" \
+ --max-model-len 25480 \
+ --chat-template-content-format string \
+ --limit-mm-per-prompt "{\"image\":10}" \
+ --model zai-org/AutoGLM-Phone-9B \
+ --port 8000
+```
+
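+- The model has the same architecture as `GLM-4.1V-9B-Thinking`. For details on deployment, you can also see [GLM-V](https://github.com/zai-org/GLM-V)
+  for model deployment and usage guides.
+
+- Once it is running, the model service is available at `http://localhost:8000/v1`. If you deploy the model on a remote server, use that server's IP address to reach it.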
+
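+## Using AutoGLM
+
+### Command Line
+
+Set the `--base-url` and `--model` parameters according to the model you deployed. For example:
+
+```bash
+# Interactive mode
+python main.py --base-url http://localhost:8000/v1 --model "autoglm-phone-9b"
+
+# Run one task against the given model endpoint (task: "Open Meituan and search for nearby hot pot restaurants")
+python main.py --base-url http://localhost:8000/v1 "打开美团搜索附近的火锅店"
+
+# List supported apps
+python main.py --list-apps
+```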
+
+### Python API
+
+```python
+from phone_agent import PhoneAgent
+from phone_agent.model import ModelConfig
+
+# Configure the model
+model_config = ModelConfig(
+    base_url="http://localhost:8000/v1",
+    model_name="autoglm-phone-9b",
+)
+
+# Create the agent
+agent = PhoneAgent(model_config=model_config)
+
+# Run a task ("Open Taobao and search for wireless earphones")
+result = agent.run("打开淘宝搜索无线耳机")
+print(result)
+```
+
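+## Remote Debugging
+
+Phone Agent supports remote ADB debugging over Wi-Fi or the network, so a device can be controlled without a USB connection.
+
+### Set Up Remote Debugging
+
+#### Enable Wireless Debugging on the Phone
+
+Make sure the phone and the computer are on the same Wi-Fi network, as shown in the figure.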
+
+
+
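+#### Use Standard ADB Commands on the Computer
+
+```bash
+
+# Connect over Wi-Fi; replace with the IP address and port shown on the phone
+adb connect 192.168.1.100:5555
+
+# Verify the connection
+adb devices
+# Expected output: 192.168.1.100:5555 device
+```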
+
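+### Device Management Commands
+
+```bash
+# List all connected devices
+adb devices
+
+# Connect to a remote device
+adb connect 192.168.1.100:5555
+
+# Disconnect a specific device
+adb disconnect 192.168.1.100:5555
+
+# Run a task on a specific device (task: "Open Douyin and browse videos")
+python main.py --device-id 192.168.1.100:5555 --base-url http://localhost:8000/v1 --model "autoglm-phone-9b" "打开抖音刷视频"
+```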
+
+### Remote Connection via the Python API
+
+```python
+from phone_agent.adb import ADBConnection, list_devices
+
+# Create the connection manager
+conn = ADBConnection()
+
+# Connect to a remote device
+success, message = conn.connect("192.168.1.100:5555")
+print(f"Connection status: {message}")
+
+# List connected devices
+devices = list_devices()
+for device in devices:
+    print(f"{device.device_id} - {device.connection_type.value}")
+
+# Enable TCP/IP on a USB-connected device
+success, message = conn.enable_tcpip(5555)
+ip = conn.get_device_ip()
+print(f"Device IP: {ip}")
+
+# Disconnect
+conn.disconnect("192.168.1.100:5555")
+```
+
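+### Troubleshooting Remote Connections
+
+**Connection refused:**
+
+- Make sure the device and the computer are on the same network
+- Check whether a firewall is blocking port 5555
+- Confirm that TCP/IP mode is enabled: `adb tcpip 5555`
+
+**Connection dropped:**
+
+- The Wi-Fi connection may have dropped; reconnect with `--connect`
+- Some devices disable TCP/IP after a reboot and need it re-enabled over USB
+
+**Multiple devices:**
+
+- Use `--device-id` to choose the device to use
+- Or use `--list-devices` to see all connected devices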
+
+## Configuration
+
+### Custom System Prompt
+
+Edit the configuration file `phone_agent/config/prompts.py` directly.
+
+1. Injecting content into the system prompt can strengthen the model in a specific domain.
+2. Injecting app names can be used to disable certain apps.
+
+### Environment Variables
+
+| Variable                | Description            | Default                    |
+|-------------------------|------------------------|----------------------------|
+| `PHONE_AGENT_BASE_URL`  | Model API base URL     | `http://localhost:8000/v1` |
+| `PHONE_AGENT_MODEL`     | Model name             | `autoglm-phone-9b`         |
+| `PHONE_AGENT_MAX_STEPS` | Maximum steps per task | `100`                      |
+| `PHONE_AGENT_DEVICE_ID` | ADB device ID          | (auto-detected)            |
+
+### Model Configuration
+
+```python
+from phone_agent.model import ModelConfig
+
+config = ModelConfig(
+    base_url="http://localhost:8000/v1",
+    api_key="EMPTY",  # API key (if required)
+    model_name="autoglm-phone-9b",  # Model name
+    max_tokens=3000,  # Maximum number of output tokens
+    temperature=0.1,  # Sampling temperature
+    frequency_penalty=0.2,  # Frequency penalty
+)
+```
+
+### Agent Configuration
+
+```python
+from phone_agent.agent import AgentConfig
+
+config = AgentConfig(
+    max_steps=100,  # Maximum steps per task
+    device_id=None,  # ADB device ID (None means auto-detect)
+    verbose=True,  # Print debug information (thinking process and executed actions)
+)
+```
+
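+### Verbose Mode Output
+
+When `verbose=True`, the agent prints detailed information at every step. In the sample below, `思考过程` labels the model's thinking process and `执行动作` labels the action it executes: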
+
+```
+==================================================
+💭 思考过程:
+--------------------------------------------------
+当前在系统桌面,需要先启动小红书应用
+--------------------------------------------------
+🎯 执行动作:
+{
+ "_metadata": "do",
+ "action": "Launch",
+ "app": "小红书"
+}
+==================================================
+
+... (after the action runs, the agent continues with the next step)
+
+==================================================
+💭 思考过程:
+--------------------------------------------------
+小红书已打开,现在需要点击搜索框
+--------------------------------------------------
+🎯 执行动作:
+{
+ "_metadata": "do",
+ "action": "Tap",
+ "element": [500, 100]
+}
+==================================================
+
+🎉 ================================================
+✅ 任务完成: 已成功搜索美食攻略
+==================================================
+```
+
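+This makes the AI's reasoning and the concrete operation at each step easy to follow.
+
+## Supported Apps
+
+Phone Agent supports 50+ mainstream Chinese apps:
+
+| Category              | Apps                                                                 |
+|-----------------------|----------------------------------------------------------------------|
+| Social & messaging    | 微信 (WeChat), QQ, 微博 (Weibo)                                      |
+| E-commerce            | 淘宝 (Taobao), 京东 (JD.com), 拼多多 (Pinduoduo)                     |
+| Food & delivery       | 美团 (Meituan), 饿了么 (Ele.me), 肯德基 (KFC)                        |
+| Travel                | 携程 (Ctrip), 12306, 滴滴出行 (DiDi)                                 |
+| Video & entertainment | bilibili, 抖音 (Douyin), 爱奇艺 (iQIYI)                              |
+| Music & audio         | 网易云音乐 (NetEase Cloud Music), QQ音乐 (QQ Music), 喜马拉雅 (Ximalaya) |
+| Local services        | 大众点评 (Dianping), 高德地图 (Amap), 百度地图 (Baidu Maps)          |
+| Content communities   | 小红书 (Xiaohongshu), 知乎 (Zhihu), 豆瓣 (Douban)                    |
+
+Run `python main.py --list-apps` to see the full list.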
+
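+## Available Actions
+
+The agent can perform the following actions:
+
+| Action       | Description                                   |
+|--------------|-----------------------------------------------|
+| `Launch`     | Launch an app                                 |
+| `Tap`        | Tap the given coordinates                     |
+| `Type`       | Enter text                                    |
+| `Swipe`      | Swipe the screen                              |
+| `Back`       | Go back to the previous page                  |
+| `Home`       | Return to the home screen                     |
+| `Long Press` | Long press                                    |
+| `Double Tap` | Double tap                                    |
+| `Wait`       | Wait for the page to load                     |
+| `Take_over`  | Request human takeover (login, CAPTCHA, etc.) |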
+
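+## Custom Callbacks
+
+Handle sensitive-operation confirmation and human takeover:
+
+```python
+from phone_agent import PhoneAgent
+
+
+def my_confirmation(message: str) -> bool:
+    """Confirmation callback for sensitive operations."""
+    return input(f"Confirm {message}? (y/n): ").lower() == "y"
+
+
+def my_takeover(message: str) -> None:
+    """Human takeover callback."""
+    print(f"Please complete this manually: {message}")
+    input("Press Enter to continue when done...")
+
+
+agent = PhoneAgent(
+    confirmation_callback=my_confirmation,
+    takeover_callback=my_takeover,
+)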
+```
+
+## Examples
+
+See the `examples/` directory for more usage examples:
+
+- `basic_usage.py` - basic task execution
+- Single-step debugging mode
+- Batch task execution
+- Custom callbacks
+
+## Development
+
+### Set Up the Development Environment
+
+Building on this project requires the development dependencies:
+
+```bash
+pip install -e ".[dev]"
+```
+
+### Run Tests
+
+```bash
+pytest tests/
+```
+
+### Full Project Structure
+
+```
+phone_agent/
+├── __init__.py          # Package exports
+├── agent.py             # PhoneAgent main class
+├── adb/                 # ADB utilities
+│   ├── connection.py    # Remote/local connection management
+│   ├── screenshot.py    # Screen capture
+│   ├── input.py         # Text input (ADB Keyboard)
+│   └── device.py        # Device control (tap, swipe, etc.)
+├── actions/             # Action handling
+│   └── handler.py       # Action executor
+├── config/              # Configuration
+│   ├── apps.py          # Supported app mappings
+│   └── prompts.py       # System prompts
+└── model/               # AI model client
+    └── client.py        # OpenAI-compatible client
+```
+
+## FAQ
+
+Some common problems and their solutions are listed below:
+
+### Device Not Found
+
+Try restarting the ADB server:
+
+```bash
+adb kill-server
+adb start-server
+adb devices
+```
+
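+### Text Input Does Not Work
+
+1. Make sure ADB Keyboard is installed on the device
+2. Enable it under Settings > System > Languages & input > Virtual keyboard
+3. The agent switches to ADB Keyboard automatically when it needs to type
+
+### Screenshot Fails (Black Screen)
+
+This usually means the app is showing a sensitive page (payment, password, or banking apps). The agent detects this automatically and requests human takeover.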
+
+
+### Citation
+
+If you find our work helpful, please cite the following paper:
+
+```bibtex
+@article{liu2024autoglm,
+ title={Autoglm: Autonomous foundation agents for guis},
+ author={Liu, Xiao and Qin, Bo and Liang, Dongzhu and Dong, Guang and Lai, Hanyu and Zhang, Hanchen and Zhao, Hanlin and Iong, Iat Long and Sun, Jiadai and Wang, Jiaqi and others},
+ journal={arXiv preprint arXiv:2411.00820},
+ year={2024}
+}
+```
\ No newline at end of file
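The environment-variable table in the README above lists `PHONE_AGENT_BASE_URL`, `PHONE_AGENT_MODEL`, `PHONE_AGENT_MAX_STEPS`, and `PHONE_AGENT_DEVICE_ID` together with their defaults. A minimal sketch of how such settings could be resolved follows. The `Settings` dataclass and the `load_settings` helper are hypothetical names used only for illustration; they are not part of `phone_agent`, and the real CLI may resolve these values differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the four documented settings."""

    base_url: str
    model: str
    max_steps: int
    device_id: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the README defaults."""
    return Settings(
        base_url=env.get("PHONE_AGENT_BASE_URL", "http://localhost:8000/v1"),
        model=env.get("PHONE_AGENT_MODEL", "autoglm-phone-9b"),
        max_steps=int(env.get("PHONE_AGENT_MAX_STEPS", "100")),
        # An unset or empty device ID is treated as "auto-detect".
        device_id=env.get("PHONE_AGENT_DEVICE_ID") or None,
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing the environment as a mapping keeps the helper easy to exercise with a plain dictionary instead of the real process environment.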
diff --git a/examples/basic_usage.py b/examples/basic_usage.py
new file mode 100644
index 0000000..9e44122
--- /dev/null
+++ b/examples/basic_usage.py
@@ -0,0 +1,159 @@
+#!/usr/bin/env python3
+"""
+Phone Agent usage examples.
+
+Demonstrates how to use Phone Agent through the Python API for phone automation tasks.
+"""
+
+from phone_agent import PhoneAgent
+from phone_agent.agent import AgentConfig
+from phone_agent.model import ModelConfig
+
+
+def example_basic_task():
+    """Basic task example."""
+    # Configure the model endpoint
+    model_config = ModelConfig(
+        base_url="http://localhost:8000/v1",
+        model_name="autoglm-phone-9b",
+        temperature=0.1,
+    )
+
+    # Configure agent behavior
+    agent_config = AgentConfig(
+        max_steps=50,
+        verbose=True,
+    )
+
+    # Create the agent
+    agent = PhoneAgent(
+        model_config=model_config,
+        agent_config=agent_config,
+    )
+
+    # Run a task ("Open Xiaohongshu and search for food guides")
+    result = agent.run("打开小红书搜索美食攻略")
+    print(f"Task result: {result}")
+
+
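+def example_with_callbacks():
+    """Task example with callbacks."""
+
+    def my_confirmation(message: str) -> bool:
+        """Confirmation callback for sensitive operations."""
+        print(f"\n[Confirmation required] {message}")
+        response = input("Continue? (y/n): ")
+        return response.lower() in ("yes", "y", "是")
+
+    def my_takeover(message: str) -> None:
+        """Human takeover callback."""
+        print(f"\n[Manual action required] {message}")
+        print("Please complete the operation manually...")
+        input("Press Enter to continue when done: ")
+
+    # Create an agent with custom callbacks
+    agent = PhoneAgent(
+        confirmation_callback=my_confirmation,
+        takeover_callback=my_takeover,
+    )
+
+    # Run a task that may need confirmation ("Open Taobao, search for wireless earphones, and add to cart")
+    result = agent.run("打开淘宝搜索无线耳机并加入购物车")
+    print(f"Task result: {result}")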
+
+
+def example_step_by_step():
+    """Single-step execution example (for debugging)."""
+    agent = PhoneAgent()
+
+    # Start the task ("Open Meituan and search for nearby hot pot restaurants")
+    result = agent.step("打开美团搜索附近的火锅店")
+    print(f"Step 1: {result.action}")
+
+    # Keep stepping until the task finishes or the step limit is reached
+    while not result.finished and agent.step_count < 10:
+        result = agent.step()
+        print(f"Step {agent.step_count}: {result.action}")
+        print(f"  Thinking: {result.thinking[:100]}...")
+
+    print(f"\nFinal result: {result.message}")
+
+
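+def example_multiple_tasks():
+    """Batch task example."""
+    agent = PhoneAgent()
+
+    tasks = [
+        "打开高德地图查看实时路况",  # Open Amap and check live traffic
+        "打开大众点评搜索附近的咖啡店",  # Open Dianping and search for nearby coffee shops
+        "打开bilibili搜索Python教程",  # Open bilibili and search for Python tutorials
+    ]
+
+    for task in tasks:
+        print(f"\n{'='*50}")
+        print(f"Task: {task}")
+        print('='*50)
+
+        result = agent.run(task)
+        print(f"Result: {result}")
+
+        # Reset the agent state
+        agent.reset()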
+
+
+def example_remote_device():
+    """Remote device example."""
+    from phone_agent.adb import ADBConnection
+
+    # Create the connection manager
+    conn = ADBConnection()
+
+    # Connect to the remote device
+    success, message = conn.connect("192.168.1.100:5555")
+    if not success:
+        print(f"Connection failed: {message}")
+        return
+
+    print(f"Connected: {message}")
+
+    # Create an agent bound to that device
+    agent_config = AgentConfig(
+        device_id="192.168.1.100:5555",
+        verbose=True,
+    )
+
+    agent = PhoneAgent(agent_config=agent_config)
+
+    # Run a task ("Open WeChat and check messages")
+    result = agent.run("打开微信查看消息")
+    print(f"Task result: {result}")
+
+    # Disconnect
+    conn.disconnect("192.168.1.100:5555")
+
+
+if __name__ == "__main__":
+    print("Phone Agent usage examples")
+    print("=" * 50)
+
+    # Run the basic example
+    print("\n1. Basic task example")
+    print("-" * 30)
+    example_basic_task()
+
+    # Uncomment to run the other examples
+    # print("\n2. Task example with callbacks")
+    # print("-" * 30)
+    # example_with_callbacks()
+
+    # print("\n3. Single-step execution example")
+    # print("-" * 30)
+    # example_step_by_step()
+
+    # print("\n4. Batch task example")
+    # print("-" * 30)
+    # example_multiple_tasks()
+
+    # print("\n5. Remote device example")
+    # print("-" * 30)
+    # example_remote_device()
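`example_step_by_step` in `examples/basic_usage.py` above needs a connected device and a running model server. To exercise the same loop shape offline, a stand-in agent can be used. The sketch below is only an illustration: `StubAgent`, `StepResult`, and `run_step_by_step` are hypothetical names that mimic the attributes the example reads (`action`, `thinking`, `finished`, `message`, `step_count`); they are not the `phone_agent` implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepResult:
    """Hypothetical result object with the fields the example reads."""

    action: dict
    thinking: str
    finished: bool
    message: str = ""


class StubAgent:
    """Replays a scripted list of actions instead of driving a device."""

    def __init__(self, script: List[dict]) -> None:
        self._script = script
        self.step_count = 0

    def step(self, task: Optional[str] = None) -> StepResult:
        action = self._script[self.step_count]
        self.step_count += 1
        finished = self.step_count == len(self._script)
        return StepResult(
            action=action,
            thinking=f"scripted step {self.step_count}",
            finished=finished,
            message="done" if finished else "",
        )


def run_step_by_step(agent: StubAgent, task: str, max_steps: int = 10) -> StepResult:
    """Same control flow as example_step_by_step: only the first step takes the task."""
    result = agent.step(task)
    while not result.finished and agent.step_count < max_steps:
        result = agent.step()
    return result


if __name__ == "__main__":
    script = [
        {"action": "Launch", "app": "美团"},
        {"action": "Tap", "element": [500, 100]},
        {"action": "Back"},
    ]
    agent = StubAgent(script)
    final = run_step_by_step(agent, "demo task")
    print(agent.step_count, final.message)
```

The `max_steps` guard mirrors the `agent.step_count < 10` condition in the example, so a script that never finishes still stops.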
diff --git a/examples/demo_thinking.py b/examples/demo_thinking.py
new file mode 100644
index 0000000..283cd0e
--- /dev/null
+++ b/examples/demo_thinking.py
@@ -0,0 +1,84 @@
+#!/usr/bin/env python3
+"""
+Example demonstrating thinking output.
+
+This script shows that in verbose mode the agent prints both its thinking process and the action it executes.
+"""
+
+from phone_agent import PhoneAgent
+from phone_agent.agent import AgentConfig
+from phone_agent.model import ModelConfig
+
+
+def main():
+    print("="*60)
+    print("Phone Agent - Thinking output demo")
+    print("="*60)
+
+    # Configure the model
+    model_config = ModelConfig(
+        base_url="http://localhost:8000/v1",
+        model_name="autoglm-phone-9b",
+        temperature=0.1,
+    )
+
+    # Configure the agent (verbose=True prints detailed information)
+    agent_config = AgentConfig(
+        max_steps=10,
+        verbose=True,  # Enable detailed output
+    )
+
+    # Create the agent
+    agent = PhoneAgent(
+        model_config=model_config,
+        agent_config=agent_config,
+    )
+
+    # Run the task ("Open Xiaohongshu and search for food guides")
+    print("\n📱 Starting task...\n")
+    result = agent.run("打开小红书搜索美食攻略")
+
+    print("\n" + "="*60)
+    print(f"📊 Final result: {result}")
+    print("="*60)
+
+
+if __name__ == "__main__":
+    """
+    Running this script prints output in the following format
+    (思考过程 = thinking process, 执行动作 = action to execute):
+
+    ==================================================
+    💭 思考过程:
+    --------------------------------------------------
+    当前在系统桌面,需要先启动小红书应用,然后进行搜索
+    --------------------------------------------------
+    🎯 执行动作:
+    {
+        "_metadata": "do",
+        "action": "Launch",
+        "app": "小红书"
+    }
+    ==================================================
+
+    (after the action runs, the agent continues with the next step...)
+
+    ==================================================
+    💭 思考过程:
+    --------------------------------------------------
+    小红书已打开,现在需要点击搜索框并输入关键词
+    --------------------------------------------------
+    🎯 执行动作:
+    {
+        "_metadata": "do",
+        "action": "Tap",
+        "element": [500, 100]
+    }
+    ==================================================
+
+    ... (more steps)
+
+    🎉 ================================================
+    ✅ 任务完成: 已成功搜索美食攻略
+    ==================================================
+    """
+    main()
diff --git a/main.py b/main.py
new file mode 100644
index 0000000..f240e90
--- /dev/null
+++ b/main.py
@@ -0,0 +1,528 @@
+#!/usr/bin/env python3
+"""
+Phone Agent CLI - AI-powered phone automation.
+
+Usage:
+ python main.py [OPTIONS]
+
+Environment Variables:
+ PHONE_AGENT_BASE_URL: Model API base URL (default: http://localhost:8000/v1)
+ PHONE_AGENT_MODEL: Model name (default: autoglm-phone-9b)
+ PHONE_AGENT_MAX_STEPS: Maximum steps per task (default: 100)
+ PHONE_AGENT_DEVICE_ID: ADB device ID for multi-device setups
+"""
+
+import argparse
+import os
+import shutil
+import subprocess
+import sys
+from urllib.parse import urlparse
+
+from openai import OpenAI
+
+from phone_agent import PhoneAgent
+from phone_agent.adb import ADBConnection, list_devices
+from phone_agent.agent import AgentConfig
+from phone_agent.config.apps import list_supported_apps
+from phone_agent.model import ModelConfig
+
+
+def check_system_requirements() -> bool:
+ """
+ Check system requirements before running the agent.
+
+ Checks:
+ 1. ADB tools installed
+ 2. At least one device connected
+ 3. ADB Keyboard installed on the device
+
+ Returns:
+ True if all checks pass, False otherwise.
+ """
+ print("🔍 Checking system requirements...")
+ print("-" * 50)
+
+ all_passed = True
+
+ # Check 1: ADB installed
+ print("1. Checking ADB installation...", end=" ")
+ if shutil.which("adb") is None:
+ print("❌ FAILED")
+ print(" Error: ADB is not installed or not in PATH.")
+ print(" Solution: Install Android SDK Platform Tools:")
+ print(" - macOS: brew install android-platform-tools")
+ print(" - Linux: sudo apt install android-tools-adb")
+ print(
+ " - Windows: Download from https://developer.android.com/studio/releases/platform-tools"
+ )
+ all_passed = False
+ else:
+ # Double check by running adb version
+ try:
+ result = subprocess.run(
+ ["adb", "version"], capture_output=True, text=True, timeout=10
+ )
+ if result.returncode == 0:
+ version_line = result.stdout.strip().split("\n")[0]
+ print(f"✅ OK ({version_line})")
+ else:
+ print("❌ FAILED")
+ print(" Error: ADB command failed to run.")
+ all_passed = False
+ except FileNotFoundError:
+ print("❌ FAILED")
+ print(" Error: ADB command not found.")
+ all_passed = False
+ except subprocess.TimeoutExpired:
+ print("❌ FAILED")
+ print(" Error: ADB command timed out.")
+ all_passed = False
+
+ # If ADB is not installed, skip remaining checks
+ if not all_passed:
+ print("-" * 50)
+ print("❌ System check failed. Please fix the issues above.")
+ return False
+
+ # Check 2: Device connected
+ print("2. Checking connected devices...", end=" ")
+ try:
+ result = subprocess.run(
+ ["adb", "devices"], capture_output=True, text=True, timeout=10
+ )
+ lines = result.stdout.strip().split("\n")
+ # Filter out header and empty lines, look for 'device' status
+ devices = [line for line in lines[1:] if line.strip() and "\tdevice" in line]
+
+ if not devices:
+ print("❌ FAILED")
+ print(" Error: No devices connected.")
+ print(" Solution:")
+ print(" 1. Enable USB debugging on your Android device")
+ print(" 2. Connect via USB and authorize the connection")
+ print(" 3. Or connect remotely: python main.py --connect <ip>:<port>")
+ all_passed = False
+ else:
+ device_ids = [d.split("\t")[0] for d in devices]
+ print(f"✅ OK ({len(devices)} device(s): {', '.join(device_ids)})")
+ except subprocess.TimeoutExpired:
+ print("❌ FAILED")
+ print(" Error: ADB command timed out.")
+ all_passed = False
+ except Exception as e:
+ print("❌ FAILED")
+ print(f" Error: {e}")
+ all_passed = False
+
+ # If no device connected, skip ADB Keyboard check
+ if not all_passed:
+ print("-" * 50)
+ print("❌ System check failed. Please fix the issues above.")
+ return False
+
+ # Check 3: ADB Keyboard installed
+ print("3. Checking ADB Keyboard...", end=" ")
+ try:
+ result = subprocess.run(
+ ["adb", "shell", "ime", "list", "-s"],
+ capture_output=True,
+ text=True,
+ timeout=10,
+ )
+ ime_list = result.stdout.strip()
+
+ if "com.android.adbkeyboard/.AdbIME" in ime_list:
+ print("✅ OK")
+ else:
+ print("❌ FAILED")
+ print(" Error: ADB Keyboard is not installed on the device.")
+ print(" Solution:")
+ print(" 1. Download ADB Keyboard APK from:")
+ print(
+ " https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk"
+ )
+ print(" 2. Install it on your device: adb install ADBKeyboard.apk")
+ print(
+ " 3. Enable it in Settings > System > Languages & Input > Virtual Keyboard"
+ )
+ all_passed = False
+ except subprocess.TimeoutExpired:
+ print("❌ FAILED")
+ print(" Error: ADB command timed out.")
+ all_passed = False
+ except Exception as e:
+ print("❌ FAILED")
+ print(f" Error: {e}")
+ all_passed = False
+
+ print("-" * 50)
+
+ if all_passed:
+ print("✅ All system checks passed!\n")
+ else:
+ print("❌ System check failed. Please fix the issues above.")
+
+ return all_passed
+
+
+def check_model_api(base_url: str, model_name: str) -> bool:
+ """
+ Check if the model API is accessible and the specified model exists.
+
+ Checks:
+ 1. Network connectivity to the API endpoint
+ 2. Model exists in the available models list
+
+ Args:
+ base_url: The API base URL
+ model_name: The model name to check
+
+ Returns:
+ True if all checks pass, False otherwise.
+ """
+ print("🔍 Checking model API...")
+ print("-" * 50)
+
+ all_passed = True
+
+ # Check 1: Network connectivity
+ print(f"1. Checking API connectivity ({base_url})...", end=" ")
+ try:
+ # Parse the URL to get host and port
+ parsed = urlparse(base_url)
+
+ # Create OpenAI client
+ client = OpenAI(base_url=base_url, api_key="EMPTY", timeout=10.0)
+
+ # Try to list models (this tests connectivity)
+ models_response = client.models.list()
+ available_models = [model.id for model in models_response.data]
+
+ print("✅ OK")
+
+ # Check 2: Model exists
+ print(f"2. Checking model '{model_name}'...", end=" ")
+ if model_name in available_models:
+ print("✅ OK")
+ else:
+ print("❌ FAILED")
+ print(f" Error: Model '{model_name}' not found.")
+ print(f" Available models:")
+ for m in available_models[:10]: # Show first 10 models
+ print(f" - {m}")
+ if len(available_models) > 10:
+ print(f" ... and {len(available_models) - 10} more")
+ all_passed = False
+
+ except Exception as e:
+ print("❌ FAILED")
+ error_msg = str(e)
+
+ # Provide more specific error messages
+ if "Connection refused" in error_msg or "Connection error" in error_msg:
+ print(f" Error: Cannot connect to {base_url}")
+ print(" Solution:")
+ print(" 1. Check if the model server is running")
+ print(" 2. Verify the base URL is correct")
+ print(f" 3. Try: curl {base_url}/models")
+ elif "timed out" in error_msg.lower() or "timeout" in error_msg.lower():
+ print(f" Error: Connection to {base_url} timed out")
+ print(" Solution:")
+ print(" 1. Check your network connection")
+ print(" 2. Verify the server is responding")
+ elif (
+ "Name or service not known" in error_msg
+ or "nodename nor servname" in error_msg
+ ):
+ print(f" Error: Cannot resolve hostname")
+ print(" Solution:")
+ print(" 1. Check the URL is correct")
+ print(" 2. Verify DNS settings")
+ else:
+ print(f" Error: {error_msg}")
+
+ all_passed = False
+
+ print("-" * 50)
+
+ if all_passed:
+ print("✅ Model API checks passed!\n")
+ else:
+ print("❌ Model API check failed. Please fix the issues above.")
+
+ return all_passed
+
+
+def parse_args() -> argparse.Namespace:
+ """Parse command line arguments."""
+ parser = argparse.ArgumentParser(
+ description="Phone Agent - AI-powered phone automation",
+ formatter_class=argparse.RawDescriptionHelpFormatter,
+ epilog="""
+Examples:
+ # Run with default settings
+ python main.py
+
+ # Specify model endpoint
+ python main.py --base-url http://localhost:8000/v1
+
+ # Run with specific device
+ python main.py --device-id emulator-5554
+
+ # Connect to remote device
+ python main.py --connect 192.168.1.100:5555
+
+ # List connected devices
+ python main.py --list-devices
+
+ # Enable TCP/IP on USB device and get connection info
+ python main.py --enable-tcpip
+
+ # List supported apps
+ python main.py --list-apps
+ """,
+ )
+
+ # Model options
+ parser.add_argument(
+ "--base-url",
+ type=str,
+ default=os.getenv("PHONE_AGENT_BASE_URL", "http://localhost:8000/v1"),
+ help="Model API base URL",
+ )
+
+ parser.add_argument(
+ "--model",
+ type=str,
+ default=os.getenv("PHONE_AGENT_MODEL", "autoglm-phone-9b"),
+ help="Model name",
+ )
+
+ parser.add_argument(
+ "--max-steps",
+ type=int,
+ default=int(os.getenv("PHONE_AGENT_MAX_STEPS", "100")),
+ help="Maximum steps per task",
+ )
+
+ # Device options
+ parser.add_argument(
+ "--device-id",
+ "-d",
+ type=str,
+ default=os.getenv("PHONE_AGENT_DEVICE_ID"),
+ help="ADB device ID",
+ )
+
+ parser.add_argument(
+ "--connect",
+ "-c",
+ type=str,
+ metavar="ADDRESS",
+ help="Connect to remote device (e.g., 192.168.1.100:5555)",
+ )
+
+ parser.add_argument(
+ "--disconnect",
+ type=str,
+ nargs="?",
+ const="all",
+ metavar="ADDRESS",
+ help="Disconnect from remote device (or 'all' to disconnect all)",
+ )
+
+ parser.add_argument(
+ "--list-devices", action="store_true", help="List connected devices and exit"
+ )
+
+ parser.add_argument(
+ "--enable-tcpip",
+ type=int,
+ nargs="?",
+ const=5555,
+ metavar="PORT",
+ help="Enable TCP/IP debugging on USB device (default port: 5555)",
+ )
+
+ # Other options
+ parser.add_argument(
+ "--quiet", "-q", action="store_true", help="Suppress verbose output"
+ )
+
+ parser.add_argument(
+ "--list-apps", action="store_true", help="List supported apps and exit"
+ )
+
+ parser.add_argument(
+ "task",
+ nargs="?",
+ type=str,
+ help="Task to execute (interactive mode if not provided)",
+ )
+
+ return parser.parse_args()
+
+
+def handle_device_commands(args) -> bool:
+ """
+ Handle device-related commands.
+
+ Returns:
+ True if a device command was handled (should exit), False otherwise.
+ """
+ conn = ADBConnection()
+
+ # Handle --list-devices
+ if args.list_devices:
+ devices = list_devices()
+ if not devices:
+ print("No devices connected.")
+ else:
+ print("Connected devices:")
+ print("-" * 60)
+ for device in devices:
+ status_icon = "✓" if device.status == "device" else "✗"
+ conn_type = device.connection_type.value
+ model_info = f" ({device.model})" if device.model else ""
+ print(
+ f" {status_icon} {device.device_id:<30} [{conn_type}]{model_info}"
+ )
+ return True
+
+ # Handle --connect
+ if args.connect:
+ print(f"Connecting to {args.connect}...")
+ success, message = conn.connect(args.connect)
+ print(f"{'✓' if success else '✗'} {message}")
+ if success:
+ # Set as default device
+ args.device_id = args.connect
+ return not success # Continue if connection succeeded
+
+ # Handle --disconnect
+ if args.disconnect:
+ if args.disconnect == "all":
+ print("Disconnecting all remote devices...")
+ success, message = conn.disconnect()
+ else:
+ print(f"Disconnecting from {args.disconnect}...")
+ success, message = conn.disconnect(args.disconnect)
+ print(f"{'✓' if success else '✗'} {message}")
+ return True
+
+ # Handle --enable-tcpip
+ if args.enable_tcpip:
+ port = args.enable_tcpip
+ print(f"Enabling TCP/IP debugging on port {port}...")
+
+ success, message = conn.enable_tcpip(port, args.device_id)
+ print(f"{'✓' if success else '✗'} {message}")
+
+ if success:
+ # Try to get device IP
+ ip = conn.get_device_ip(args.device_id)
+ if ip:
+ print(f"\nYou can now connect remotely using:")
+ print(f" python main.py --connect {ip}:{port}")
+ print(f"\nOr via ADB directly:")
+ print(f" adb connect {ip}:{port}")
+ else:
+ print("\nCould not determine device IP. Check device WiFi settings.")
+ return True
+
+ return False
+
+
+def main():
+ """Main entry point."""
+ args = parse_args()
+
+ # Handle --list-apps (no system check needed)
+ if args.list_apps:
+ print("Supported apps:")
+ for app in sorted(list_supported_apps()):
+ print(f" - {app}")
+ return
+
+ # Handle device commands (these may need partial system checks)
+ if handle_device_commands(args):
+ return
+
+ # Run system requirements check before proceeding
+ if not check_system_requirements():
+ sys.exit(1)
+
+ # Check model API connectivity and model availability
+ if not check_model_api(args.base_url, args.model):
+ sys.exit(1)
+
+ # Create configurations
+ model_config = ModelConfig(
+ base_url=args.base_url,
+ model_name=args.model,
+ )
+
+ agent_config = AgentConfig(
+ max_steps=args.max_steps,
+ device_id=args.device_id,
+ verbose=not args.quiet,
+ )
+
+ # Create agent
+ agent = PhoneAgent(
+ model_config=model_config,
+ agent_config=agent_config,
+ )
+
+ # Print header
+ print("=" * 50)
+ print("Phone Agent - AI-powered phone automation")
+ print("=" * 50)
+ print(f"Model: {model_config.model_name}")
+ print(f"Base URL: {model_config.base_url}")
+ print(f"Max Steps: {agent_config.max_steps}")
+
+ # Show device info
+ devices = list_devices()
+ if agent_config.device_id:
+ print(f"Device: {agent_config.device_id}")
+ elif devices:
+ print(f"Device: {devices[0].device_id} (auto-detected)")
+
+ print("=" * 50)
+
+ # Run with provided task or enter interactive mode
+ if args.task:
+ print(f"\nTask: {args.task}\n")
+ result = agent.run(args.task)
+ print(f"\nResult: {result}")
+ else:
+ # Interactive mode
+ print("\nEntering interactive mode. Type 'quit' to exit.\n")
+
+ while True:
+ try:
+ task = input("Enter your task: ").strip()
+
+ if task.lower() in ("quit", "exit", "q"):
+ print("Goodbye!")
+ break
+
+ if not task:
+ continue
+
+ print()
+ result = agent.run(task)
+ print(f"\nResult: {result}\n")
+ agent.reset()
+
+ except KeyboardInterrupt:
+ print("\n\nInterrupted. Goodbye!")
+ break
+ except Exception as e:
+ print(f"\nError: {e}\n")
+
+
+if __name__ == "__main__":
+ main()
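
Note on the optional-value flags in `parse_args`: `--enable-tcpip` and `--disconnect` both use `nargs="?"` with a `const`, so each flag has three states (absent, bare, with a value). The sketch below reproduces only those two arguments plus the positional `task`, copied from the definitions above, to show how standard `argparse` resolves each state. The variable names and sample arguments are illustrative only.

```python
import argparse

# Reduced copy of the parser: only the arguments relevant to nargs="?" behavior.
parser = argparse.ArgumentParser()
parser.add_argument("--enable-tcpip", type=int, nargs="?", const=5555, metavar="PORT")
parser.add_argument("--disconnect", type=str, nargs="?", const="all", metavar="ADDRESS")
parser.add_argument("task", nargs="?", type=str)

# Flag absent: the value is None, so the handler is skipped.
absent = parser.parse_args([])

# Bare flag: argparse substitutes the const value.
bare = parser.parse_args(["--enable-tcpip", "--disconnect"])

# Flag with an explicit value: the value is used (and converted by type=).
explicit = parser.parse_args(
    ["--enable-tcpip", "5037", "--disconnect", "192.168.1.100:5555"]
)

# Caveat: a task string placed right after a bare flag is consumed as its value.
greedy = parser.parse_args(["--disconnect", "open settings"])

print(absent.enable_tcpip, bare.enable_tcpip, explicit.enable_tcpip)
print(bare.disconnect, explicit.disconnect, greedy.disconnect, greedy.task)
```

Because of the last case, a task given on the command line is safest placed before these flags or after another option that takes no value.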
diff --git a/phone_agent/__init__.py b/phone_agent/__init__.py
new file mode 100644
index 0000000..0bb1fb2
--- /dev/null
+++ b/phone_agent/__init__.py
@@ -0,0 +1,11 @@
+"""
+Phone Agent - An AI-powered phone automation framework.
+
+This package provides tools for automating Android phone interactions
+using AI models for visual understanding and decision making.
+"""
+
+from phone_agent.agent import PhoneAgent
+
+__version__ = "0.1.0"
+__all__ = ["PhoneAgent"]
diff --git a/phone_agent/actions/__init__.py b/phone_agent/actions/__init__.py
new file mode 100644
index 0000000..ac828dc
--- /dev/null
+++ b/phone_agent/actions/__init__.py
@@ -0,0 +1,5 @@
+"""Action handling module for Phone Agent."""
+
+from phone_agent.actions.handler import ActionHandler, ActionResult
+
+__all__ = ["ActionHandler", "ActionResult"]
diff --git a/phone_agent/actions/handler.py b/phone_agent/actions/handler.py
new file mode 100644
index 0000000..13cc1a0
--- /dev/null
+++ b/phone_agent/actions/handler.py
@@ -0,0 +1,307 @@
+"""Action handler for processing AI model outputs."""
+
+import time
+from dataclasses import dataclass
+from typing import Any, Callable
+
+from phone_agent.adb import (
+ back,
+ clear_text,
+ detect_and_set_adb_keyboard,
+ double_tap,
+ home,
+ launch_app,
+ long_press,
+ restore_keyboard,
+ swipe,
+ tap,
+ type_text,
+)
+
+
+@dataclass
+class ActionResult:
+ """Result of an action execution."""
+
+ success: bool
+ should_finish: bool
+ message: str | None = None
+ requires_confirmation: bool = False
+
+
+class ActionHandler:
+ """
+ Handles execution of actions from AI model output.
+
+ Args:
+ device_id: Optional ADB device ID for multi-device setups.
+ confirmation_callback: Optional callback for sensitive action confirmation.
+ Should return True to proceed, False to cancel.
+ takeover_callback: Optional callback for takeover requests (login, captcha).
+ """
+
+ def __init__(
+ self,
+ device_id: str | None = None,
+ confirmation_callback: Callable[[str], bool] | None = None,
+ takeover_callback: Callable[[str], None] | None = None,
+ ):
+ self.device_id = device_id
+ self.confirmation_callback = confirmation_callback or self._default_confirmation
+ self.takeover_callback = takeover_callback or self._default_takeover
+
+ def execute(
+ self, action: dict[str, Any], screen_width: int, screen_height: int
+ ) -> ActionResult:
+ """
+ Execute an action from the AI model.
+
+ Args:
+ action: The action dictionary from the model.
+ screen_width: Current screen width in pixels.
+ screen_height: Current screen height in pixels.
+
+ Returns:
+ ActionResult indicating success and whether to finish.
+ """
+ action_type = action.get("_metadata")
+
+ if action_type == "finish":
+ return ActionResult(
+ success=True, should_finish=True, message=action.get("message")
+ )
+
+ if action_type != "do":
+ return ActionResult(
+ success=False,
+ should_finish=True,
+ message=f"Unknown action type: {action_type}",
+ )
+
+ action_name = action.get("action")
+ handler_method = self._get_handler(action_name)
+
+ if handler_method is None:
+ return ActionResult(
+ success=False,
+ should_finish=False,
+ message=f"Unknown action: {action_name}",
+ )
+
+ try:
+ return handler_method(action, screen_width, screen_height)
+ except Exception as e:
+ return ActionResult(
+ success=False, should_finish=False, message=f"Action failed: {e}"
+ )
+
+ def _get_handler(self, action_name: str) -> Callable | None:
+ """Get the handler method for an action."""
+ handlers = {
+ "Launch": self._handle_launch,
+ "Tap": self._handle_tap,
+ "Type": self._handle_type,
+ "Type_Name": self._handle_type,
+ "Swipe": self._handle_swipe,
+ "Back": self._handle_back,
+ "Home": self._handle_home,
+ "Double Tap": self._handle_double_tap,
+ "Long Press": self._handle_long_press,
+ "Wait": self._handle_wait,
+ "Take_over": self._handle_takeover,
+ "Note": self._handle_note,
+ "Call_API": self._handle_call_api,
+ "Interact": self._handle_interact,
+ }
+ return handlers.get(action_name)
+
+ def _convert_relative_to_absolute(
+ self, element: list[int], screen_width: int, screen_height: int
+ ) -> tuple[int, int]:
+ """Convert relative coordinates (0-1000) to absolute pixels."""
+ x = int(element[0] / 1000 * screen_width)
+ y = int(element[1] / 1000 * screen_height)
+ return x, y
+
+ def _handle_launch(self, action: dict, width: int, height: int) -> ActionResult:
+ """Handle app launch action."""
+ app_name = action.get("app")
+ if not app_name:
+ return ActionResult(False, False, "No app name specified")
+
+ success = launch_app(app_name, self.device_id)
+ if success:
+ return ActionResult(True, False)
+ return ActionResult(False, False, f"App not found: {app_name}")
+
+ def _handle_tap(self, action: dict, width: int, height: int) -> ActionResult:
+ """Handle tap action."""
+ element = action.get("element")
+ if not element:
+ return ActionResult(False, False, "No element coordinates")
+
+ x, y = self._convert_relative_to_absolute(element, width, height)
+
+ # Check for sensitive operation
+ if "message" in action:
+ if not self.confirmation_callback(action["message"]):
+ return ActionResult(
+ success=False,
+ should_finish=True,
+ message="User cancelled sensitive operation",
+ )
+
+ tap(x, y, self.device_id)
+ return ActionResult(True, False)
+
+ def _handle_type(self, action: dict, width: int, height: int) -> ActionResult:
+ """Handle text input action."""
+ text = action.get("text", "")
+
+ # Switch to ADB keyboard
+ original_ime = detect_and_set_adb_keyboard(self.device_id)
+ time.sleep(1.0)
+
+ # Clear existing text and type new text
+ clear_text(self.device_id)
+ time.sleep(1.0)
+
+ type_text(text, self.device_id)
+ time.sleep(1.0)
+
+ # Restore original keyboard
+ restore_keyboard(original_ime, self.device_id)
+ time.sleep(1.0)
+
+ return ActionResult(True, False)
+
+ def _handle_swipe(self, action: dict, width: int, height: int) -> ActionResult:
+ """Handle swipe action."""
+ start = action.get("start")
+ end = action.get("end")
+
+ if not start or not end:
+ return ActionResult(False, False, "Missing swipe coordinates")
+
+ start_x, start_y = self._convert_relative_to_absolute(start, width, height)
+ end_x, end_y = self._convert_relative_to_absolute(end, width, height)
+
+ swipe(start_x, start_y, end_x, end_y, device_id=self.device_id)
+ return ActionResult(True, False)
+
+ def _handle_back(self, action: dict, width: int, height: int) -> ActionResult:
+ """Handle back button action."""
+ back(self.device_id)
+ return ActionResult(True, False)
+
+ def _handle_home(self, action: dict, width: int, height: int) -> ActionResult:
+ """Handle home button action."""
+ home(self.device_id)
+ return ActionResult(True, False)
+
+ def _handle_double_tap(self, action: dict, width: int, height: int) -> ActionResult:
+ """Handle double tap action."""
+ element = action.get("element")
+ if not element:
+ return ActionResult(False, False, "No element coordinates")
+
+ x, y = self._convert_relative_to_absolute(element, width, height)
+ double_tap(x, y, self.device_id)
+ return ActionResult(True, False)
+
+ def _handle_long_press(self, action: dict, width: int, height: int) -> ActionResult:
+ """Handle long press action."""
+ element = action.get("element")
+ if not element:
+ return ActionResult(False, False, "No element coordinates")
+
+ x, y = self._convert_relative_to_absolute(element, width, height)
+ long_press(x, y, device_id=self.device_id)
+ return ActionResult(True, False)
+
+ def _handle_wait(self, action: dict, width: int, height: int) -> ActionResult:
+ """Handle wait action."""
+ duration_str = action.get("duration", "1 seconds")
+ try:
+ duration = float(str(duration_str).replace("seconds", "").strip())
+ except ValueError:
+ duration = 1.0
+
+ time.sleep(duration)
+ return ActionResult(True, False)
+
+ def _handle_takeover(self, action: dict, width: int, height: int) -> ActionResult:
+ """Handle takeover request (login, captcha, etc.)."""
+ message = action.get("message", "User intervention required")
+ self.takeover_callback(message)
+ return ActionResult(True, False)
+
+ def _handle_note(self, action: dict, width: int, height: int) -> ActionResult:
+ """Handle note action (placeholder for content recording)."""
+ # This action is typically used for recording page content
+ # Implementation depends on specific requirements
+ return ActionResult(True, False)
+
+ def _handle_call_api(self, action: dict, width: int, height: int) -> ActionResult:
+ """Handle API call action (placeholder for summarization)."""
+ # This action is typically used for content summarization
+ # Implementation depends on specific requirements
+ return ActionResult(True, False)
+
+ def _handle_interact(self, action: dict, width: int, height: int) -> ActionResult:
+ """Handle interaction request (user choice needed)."""
+ # This action signals that user input is needed
+ return ActionResult(True, False, message="User interaction required")
+
+ @staticmethod
+ def _default_confirmation(message: str) -> bool:
+ """Default confirmation callback using console input."""
+ response = input(f"Sensitive operation: {message}\nConfirm? (Y/N): ")
+ return response.strip().upper() == "Y"
+
+ @staticmethod
+ def _default_takeover(message: str) -> None:
+ """Default takeover callback using console input."""
+ input(f"{message}\nPress Enter after completing manual operation...")
+
+
+def parse_action(response: str) -> dict[str, Any]:
+ """
+ Parse action from model response.
+
+ Args:
+ response: Raw response string from the model.
+
+ Returns:
+ Parsed action dictionary.
+
+ Raises:
+ ValueError: If the response cannot be parsed.
+ """
+ try:
+ # Evaluate as a Python function call. Caution: eval() runs whatever code the model returns.
+ response = response.strip()
+ if response.startswith("do"):
+ action = eval(response)
+ elif response.startswith("finish"):
+ action = {
+ "_metadata": "finish",
+ "message": response.replace("finish(message=", "")[1:-2],
+ }
+ else:
+ raise ValueError(f"Failed to parse action: {response}")
+ return action
+ except Exception as e:
+ raise ValueError(f"Failed to parse action: {e}")
+
+
+def do(**kwargs) -> dict[str, Any]:
+ """Helper function for creating 'do' actions."""
+ kwargs["_metadata"] = "do"
+ return kwargs
+
+
+def finish(**kwargs) -> dict[str, Any]:
+ """Helper function for creating 'finish' actions."""
+ kwargs["_metadata"] = "finish"
+ return kwargs
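
Note on `parse_action`: the `do(...)` / `finish(...)` call syntax can also be parsed without `eval`. The sketch below is one possible approach, not the package's implementation. It assumes the model emits only keyword arguments whose values are Python literals (strings, numbers, lists), and `parse_call` is a hypothetical helper name.

```python
import ast
from typing import Any


def parse_call(response: str) -> dict[str, Any]:
    """Parse 'do(...)' or 'finish(...)' into an action dict without eval (hypothetical helper)."""
    try:
        node = ast.parse(response.strip(), mode="eval").body
    except SyntaxError as e:
        raise ValueError(f"Failed to parse action: {e}") from e

    # Accept only a plain call to do(...) or finish(...) with keyword arguments.
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
        raise ValueError(f"Failed to parse action: {response}")
    if node.func.id not in ("do", "finish") or node.args:
        raise ValueError(f"Failed to parse action: {response}")

    action: dict[str, Any] = {}
    for kw in node.keywords:
        if kw.arg is None:  # Reject **kwargs expansion
            raise ValueError(f"Failed to parse action: {response}")
        try:
            # literal_eval only accepts literals, so nested calls are rejected.
            action[kw.arg] = ast.literal_eval(kw.value)
        except ValueError as e:
            raise ValueError(f"Failed to parse action: {e}") from e

    action["_metadata"] = node.func.id
    return action


tap = parse_call('do(action="Tap", element=[500, 100])')
done = parse_call('finish(message="Search completed")')
print(tap)
print(done)
```

Under these assumptions the result has the same shape as the dictionaries built by the `do` and `finish` helpers, while an argument such as `__import__("os").system(...)` raises `ValueError` instead of being executed.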
diff --git a/phone_agent/adb/__init__.py b/phone_agent/adb/__init__.py
new file mode 100644
index 0000000..004beaf
--- /dev/null
+++ b/phone_agent/adb/__init__.py
@@ -0,0 +1,51 @@
+"""ADB utilities for Android device interaction."""
+
+from phone_agent.adb.connection import (
+ ADBConnection,
+ ConnectionType,
+ DeviceInfo,
+ list_devices,
+ quick_connect,
+)
+from phone_agent.adb.device import (
+ back,
+ double_tap,
+ get_current_app,
+ home,
+ launch_app,
+ long_press,
+ swipe,
+ tap,
+)
+from phone_agent.adb.input import (
+ clear_text,
+ detect_and_set_adb_keyboard,
+ restore_keyboard,
+ type_text,
+)
+from phone_agent.adb.screenshot import get_screenshot
+
+__all__ = [
+ # Screenshot
+ "get_screenshot",
+ # Input
+ "type_text",
+ "clear_text",
+ "detect_and_set_adb_keyboard",
+ "restore_keyboard",
+ # Device control
+ "get_current_app",
+ "tap",
+ "swipe",
+ "back",
+ "home",
+ "double_tap",
+ "long_press",
+ "launch_app",
+ # Connection management
+ "ADBConnection",
+ "DeviceInfo",
+ "ConnectionType",
+ "quick_connect",
+ "list_devices",
+]
diff --git a/phone_agent/adb/connection.py b/phone_agent/adb/connection.py
new file mode 100644
index 0000000..31858dc
--- /dev/null
+++ b/phone_agent/adb/connection.py
@@ -0,0 +1,350 @@
+"""ADB connection management for local and remote devices."""
+
+import subprocess
+import time
+from dataclasses import dataclass
+from enum import Enum
+from typing import Optional
+
+
+class ConnectionType(Enum):
+ """Type of ADB connection."""
+
+ USB = "usb"
+ WIFI = "wifi"
+ REMOTE = "remote"
+
+
+@dataclass
+class DeviceInfo:
+ """Information about a connected device."""
+
+ device_id: str
+ status: str
+ connection_type: ConnectionType
+ model: str | None = None
+ android_version: str | None = None
+
+
+class ADBConnection:
+ """
+ Manages ADB connections to Android devices.
+
+ Supports USB, WiFi, and remote TCP/IP connections.
+
+ Example:
+ >>> conn = ADBConnection()
+ >>> # Connect to remote device
+ >>> conn.connect("192.168.1.100:5555")
+ >>> # List devices
+ >>> devices = conn.list_devices()
+ >>> # Disconnect
+ >>> conn.disconnect("192.168.1.100:5555")
+ """
+
+ def __init__(self, adb_path: str = "adb"):
+ """
+ Initialize ADB connection manager.
+
+ Args:
+ adb_path: Path to ADB executable.
+ """
+ self.adb_path = adb_path
+
+ def connect(self, address: str, timeout: int = 10) -> tuple[bool, str]:
+ """
+ Connect to a remote device via TCP/IP.
+
+ Args:
+ address: Device address in format "host:port" (e.g., "192.168.1.100:5555").
+ timeout: Connection timeout in seconds.
+
+ Returns:
+ Tuple of (success, message).
+
+ Note:
+ The remote device must have TCP/IP debugging enabled.
+ With the device attached over USB, run on the host: adb tcpip 5555
+ """
+ # Validate address format
+ if ":" not in address:
+ address = f"{address}:5555" # Default ADB port
+
+ try:
+ result = subprocess.run(
+ [self.adb_path, "connect", address],
+ capture_output=True,
+ text=True,
+ timeout=timeout,
+ )
+
+ output = result.stdout + result.stderr
+
+ if "already connected" in output.lower():
+ return True, f"Already connected to {address}"
+ elif "connected" in output.lower():
+ return True, f"Connected to {address}"
+ else:
+ return False, output.strip()
+
+ except subprocess.TimeoutExpired:
+ return False, f"Connection timeout after {timeout}s"
+ except Exception as e:
+ return False, f"Connection error: {e}"
+
+ def disconnect(self, address: str | None = None) -> tuple[bool, str]:
+ """
+ Disconnect from a remote device.
+
+ Args:
+ address: Device address to disconnect. If None, disconnects all.
+
+ Returns:
+ Tuple of (success, message).
+ """
+ try:
+ cmd = [self.adb_path, "disconnect"]
+ if address:
+ cmd.append(address)
+
+ result = subprocess.run(cmd, capture_output=True, text=True, timeout=5)
+
+ output = result.stdout + result.stderr
+ return True, output.strip() or "Disconnected"
+
+ except Exception as e:
+ return False, f"Disconnect error: {e}"
+
+ def list_devices(self) -> list[DeviceInfo]:
+ """
+ List all connected devices.
+
+ Returns:
+ List of DeviceInfo objects.
+ """
+ try:
+ result = subprocess.run(
+ [self.adb_path, "devices", "-l"],
+ capture_output=True,
+ text=True,
+ timeout=5,
+ )
+
+ devices = []
+ for line in result.stdout.strip().split("\n")[1:]: # Skip header
+ if not line.strip():
+ continue
+
+ parts = line.split()
+ if len(parts) >= 2:
+ device_id = parts[0]
+ status = parts[1]
+
+ # Determine connection type
+ if ":" in device_id:
+ conn_type = ConnectionType.REMOTE
+ elif "emulator" in device_id:
+ conn_type = ConnectionType.USB # Emulator via USB
+ else:
+ conn_type = ConnectionType.USB
+
+ # Parse additional info
+ model = None
+ for part in parts[2:]:
+ if part.startswith("model:"):
+ model = part.split(":", 1)[1]
+ break
+
+ devices.append(
+ DeviceInfo(
+ device_id=device_id,
+ status=status,
+ connection_type=conn_type,
+ model=model,
+ )
+ )
+
+ return devices
+
+ except Exception as e:
+ print(f"Error listing devices: {e}")
+ return []
+
+ def get_device_info(self, device_id: str | None = None) -> DeviceInfo | None:
+ """
+ Get detailed information about a device.
+
+ Args:
+ device_id: Device ID. If None, uses first available device.
+
+ Returns:
+ DeviceInfo or None if not found.
+ """
+ devices = self.list_devices()
+
+ if not devices:
+ return None
+
+ if device_id is None:
+ return devices[0]
+
+ for device in devices:
+ if device.device_id == device_id:
+ return device
+
+ return None
+
+ def is_connected(self, device_id: str | None = None) -> bool:
+ """
+ Check if a device is connected.
+
+ Args:
+ device_id: Device ID to check. If None, checks if any device is connected.
+
+ Returns:
+ True if connected, False otherwise.
+ """
+ devices = self.list_devices()
+
+ if not devices:
+ return False
+
+ if device_id is None:
+ return any(d.status == "device" for d in devices)
+
+ return any(d.device_id == device_id and d.status == "device" for d in devices)
+
+ def enable_tcpip(
+ self, port: int = 5555, device_id: str | None = None
+ ) -> tuple[bool, str]:
+ """
+ Enable TCP/IP debugging on a USB-connected device.
+
+ This allows subsequent wireless connections to the device.
+
+ Args:
+ port: TCP port for ADB (default: 5555).
+ device_id: Device ID. If None, uses first available device.
+
+ Returns:
+ Tuple of (success, message).
+
+ Note:
+ The device must be connected via USB first.
+ After this, you can disconnect USB and connect via WiFi.
+ """
+ try:
+ cmd = [self.adb_path]
+ if device_id:
+ cmd.extend(["-s", device_id])
+ cmd.extend(["tcpip", str(port)])
+
+ result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
+
+ output = result.stdout + result.stderr
+
+ if "restarting" in output.lower() or result.returncode == 0:
+ time.sleep(2) # Wait for ADB to restart
+ return True, f"TCP/IP mode enabled on port {port}"
+ else:
+ return False, output.strip()
+
+ except Exception as e:
+ return False, f"Error enabling TCP/IP: {e}"
+
+ def get_device_ip(self, device_id: str | None = None) -> str | None:
+ """
+ Get the IP address of a connected device.
+
+ Args:
+ device_id: Device ID. If None, uses first available device.
+
+ Returns:
+ IP address string or None if not found.
+ """
+ try:
+ cmd = [self.adb_path]
+ if device_id:
+ cmd.extend(["-s", device_id])
+ cmd.extend(["shell", "ip", "route"])
+
+ result = subprocess.run(cmd, capture_output=True, text=True, timeout=5)
+
+ # Parse IP from route output
+ for line in result.stdout.split("\n"):
+ if "src" in line:
+ parts = line.split()
+ for i, part in enumerate(parts):
+ if part == "src" and i + 1 < len(parts):
+ return parts[i + 1]
+
+ # Alternative: try wlan0 interface
+ base_cmd = cmd[:-3] # Drop the trailing "shell ip route" arguments
+ result = subprocess.run(
+ base_cmd + ["shell", "ip", "addr", "show", "wlan0"],
+ capture_output=True,
+ text=True,
+ timeout=5,
+ )
+
+ for line in result.stdout.split("\n"):
+ if "inet " in line:
+ parts = line.strip().split()
+ if len(parts) >= 2:
+ return parts[1].split("/")[0]
+
+ return None
+
+ except Exception as e:
+ print(f"Error getting device IP: {e}")
+ return None
+
+ def restart_server(self) -> tuple[bool, str]:
+ """
+ Restart the ADB server.
+
+ Returns:
+ Tuple of (success, message).
+ """
+ try:
+ # Kill server
+ subprocess.run(
+ [self.adb_path, "kill-server"], capture_output=True, timeout=5
+ )
+
+ time.sleep(1)
+
+ # Start server
+ subprocess.run(
+ [self.adb_path, "start-server"], capture_output=True, timeout=5
+ )
+
+ return True, "ADB server restarted"
+
+ except Exception as e:
+ return False, f"Error restarting server: {e}"
+
+
+def quick_connect(address: str) -> tuple[bool, str]:
+ """
+ Quick helper to connect to a remote device.
+
+ Args:
+ address: Device address (e.g., "192.168.1.100" or "192.168.1.100:5555").
+
+ Returns:
+ Tuple of (success, message).
+ """
+ conn = ADBConnection()
+ return conn.connect(address)
+
+
+def list_devices() -> list[DeviceInfo]:
+ """
+ Quick helper to list connected devices.
+
+ Returns:
+ List of DeviceInfo objects.
+ """
+ conn = ADBConnection()
+ return conn.list_devices()
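Reviewer sketch (not part of the patch): `get_device_ip` above relies on two small text-parsing steps, one for `ip route` and one for `ip addr show wlan0`. The standalone functions below isolate that parsing so it can be checked without a device. The function names and the sample outputs are illustrative assumptions, not code from this commit.

```python
from __future__ import annotations


def parse_src_ip(route_output: str) -> str | None:
    """Return the token that follows the first 'src' keyword, or None."""
    for line in route_output.splitlines():
        parts = line.split()
        for i, part in enumerate(parts):
            if part == "src" and i + 1 < len(parts):
                return parts[i + 1]
    return None


def parse_inet_ip(addr_output: str) -> str | None:
    """Return the first IPv4 address listed on an 'inet' line, or None."""
    for line in addr_output.splitlines():
        parts = line.strip().split()
        if len(parts) >= 2 and parts[0] == "inet":
            # "192.168.1.42/24" -> "192.168.1.42"
            return parts[1].split("/")[0]
    return None


# Hypothetical sample outputs, shaped like typical Android shell output.
sample_route = "192.168.1.0/24 dev wlan0 proto kernel scope link src 192.168.1.42"
sample_addr = "    inet 192.168.1.42/24 brd 192.168.1.255 scope global wlan0"

route_ip = parse_src_ip(sample_route)
addr_ip = parse_inet_ip(sample_addr)
```

Keeping the parsing separate from the `subprocess.run` calls makes the fallback path testable on its own.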
diff --git a/phone_agent/adb/device.py b/phone_agent/adb/device.py
new file mode 100644
index 0000000..a210af3
--- /dev/null
+++ b/phone_agent/adb/device.py
@@ -0,0 +1,224 @@
+"""Device control utilities for Android automation."""
+
+import os
+import subprocess
+import time
+from typing import List, Optional, Tuple
+
+from phone_agent.config.apps import APP_PACKAGES
+
+
+def get_current_app(device_id: str | None = None) -> str:
+ """
+ Get the currently focused app name.
+
+ Args:
+ device_id: Optional ADB device ID for multi-device setups.
+
+ Returns:
+ The app name if recognized, otherwise "System Home".
+ """
+ adb_prefix = _get_adb_prefix(device_id)
+
+ result = subprocess.run(
+ adb_prefix + ["shell", "dumpsys", "window"], capture_output=True, text=True
+ )
+ output = result.stdout
+
+ # Parse window focus info
+ for line in output.split("\n"):
+ if "mCurrentFocus" in line or "mFocusedApp" in line:
+ for app_name, package in APP_PACKAGES.items():
+ if package in line:
+ return app_name
+
+ return "System Home"
+
+
+def tap(x: int, y: int, device_id: str | None = None, delay: float = 1.0) -> None:
+ """
+ Tap at the specified coordinates.
+
+ Args:
+ x: X coordinate.
+ y: Y coordinate.
+ device_id: Optional ADB device ID.
+ delay: Delay in seconds after tap.
+ """
+ adb_prefix = _get_adb_prefix(device_id)
+
+ subprocess.run(
+ adb_prefix + ["shell", "input", "tap", str(x), str(y)], capture_output=True
+ )
+ time.sleep(delay)
+
+
+def double_tap(
+ x: int, y: int, device_id: str | None = None, delay: float = 1.0
+) -> None:
+ """
+ Double tap at the specified coordinates.
+
+ Args:
+ x: X coordinate.
+ y: Y coordinate.
+ device_id: Optional ADB device ID.
+ delay: Delay in seconds after double tap.
+ """
+ adb_prefix = _get_adb_prefix(device_id)
+
+ subprocess.run(
+ adb_prefix + ["shell", "input", "tap", str(x), str(y)], capture_output=True
+ )
+ time.sleep(0.1)
+ subprocess.run(
+ adb_prefix + ["shell", "input", "tap", str(x), str(y)], capture_output=True
+ )
+ time.sleep(delay)
+
+
+def long_press(
+ x: int,
+ y: int,
+ duration_ms: int = 3000,
+ device_id: str | None = None,
+ delay: float = 1.0,
+) -> None:
+ """
+ Long press at the specified coordinates.
+
+ Args:
+ x: X coordinate.
+ y: Y coordinate.
+ duration_ms: Duration of press in milliseconds.
+ device_id: Optional ADB device ID.
+ delay: Delay in seconds after long press.
+ """
+ adb_prefix = _get_adb_prefix(device_id)
+
+ subprocess.run(
+ adb_prefix
+ + ["shell", "input", "swipe", str(x), str(y), str(x), str(y), str(duration_ms)],
+ capture_output=True,
+ )
+ time.sleep(delay)
+
+
+def swipe(
+ start_x: int,
+ start_y: int,
+ end_x: int,
+ end_y: int,
+ duration_ms: int | None = None,
+ device_id: str | None = None,
+ delay: float = 1.0,
+) -> None:
+ """
+ Swipe from start to end coordinates.
+
+ Args:
+ start_x: Starting X coordinate.
+ start_y: Starting Y coordinate.
+ end_x: Ending X coordinate.
+ end_y: Ending Y coordinate.
+ duration_ms: Duration of swipe in milliseconds (auto-calculated if None).
+ device_id: Optional ADB device ID.
+ delay: Delay in seconds after swipe.
+ """
+ adb_prefix = _get_adb_prefix(device_id)
+
+ if duration_ms is None:
+        # Calculate duration from the squared pixel distance
+ dist_sq = (start_x - end_x) ** 2 + (start_y - end_y) ** 2
+ duration_ms = int(dist_sq / 1000)
+ duration_ms = max(1000, min(duration_ms, 2000)) # Clamp between 1000-2000ms
+
+ subprocess.run(
+ adb_prefix
+ + [
+ "shell",
+ "input",
+ "swipe",
+ str(start_x),
+ str(start_y),
+ str(end_x),
+ str(end_y),
+ str(duration_ms),
+ ],
+ capture_output=True,
+ )
+ time.sleep(delay)
+
+
+def back(device_id: str | None = None, delay: float = 1.0) -> None:
+ """
+ Press the back button.
+
+ Args:
+ device_id: Optional ADB device ID.
+ delay: Delay in seconds after pressing back.
+ """
+ adb_prefix = _get_adb_prefix(device_id)
+
+ subprocess.run(
+ adb_prefix + ["shell", "input", "keyevent", "4"], capture_output=True
+ )
+ time.sleep(delay)
+
+
+def home(device_id: str | None = None, delay: float = 1.0) -> None:
+ """
+ Press the home button.
+
+ Args:
+ device_id: Optional ADB device ID.
+ delay: Delay in seconds after pressing home.
+ """
+ adb_prefix = _get_adb_prefix(device_id)
+
+ subprocess.run(
+ adb_prefix + ["shell", "input", "keyevent", "KEYCODE_HOME"], capture_output=True
+ )
+ time.sleep(delay)
+
+
+def launch_app(app_name: str, device_id: str | None = None, delay: float = 1.0) -> bool:
+ """
+ Launch an app by name.
+
+ Args:
+ app_name: The app name (must be in APP_PACKAGES).
+ device_id: Optional ADB device ID.
+ delay: Delay in seconds after launching.
+
+ Returns:
+ True if app was launched, False if app not found.
+ """
+ if app_name not in APP_PACKAGES:
+ return False
+
+ adb_prefix = _get_adb_prefix(device_id)
+ package = APP_PACKAGES[app_name]
+
+ subprocess.run(
+ adb_prefix
+ + [
+ "shell",
+ "monkey",
+ "-p",
+ package,
+ "-c",
+ "android.intent.category.LAUNCHER",
+ "1",
+ ],
+ capture_output=True,
+ )
+ time.sleep(delay)
+ return True
+
+
+def _get_adb_prefix(device_id: str | None) -> list:
+ """Get ADB command prefix with optional device specifier."""
+ if device_id:
+ return ["adb", "-s", device_id]
+ return ["adb"]
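Reviewer sketch (not part of the patch): the automatic swipe duration in `swipe` is derived from the squared pixel distance divided by 1000, then clamped to 1000-2000 ms. The helper below reproduces only that arithmetic so the clamping behaviour is easy to see. The helper name and its `low`/`high` parameters are assumptions for illustration.

```python
def auto_swipe_duration_ms(
    start_x: int,
    start_y: int,
    end_x: int,
    end_y: int,
    low: int = 1000,
    high: int = 2000,
) -> int:
    """Mirror the duration rule used in swipe(): squared distance / 1000, clamped."""
    dist_sq = (start_x - end_x) ** 2 + (start_y - end_y) ** 2
    duration_ms = int(dist_sq / 1000)
    return max(low, min(duration_ms, high))


short_swipe = auto_swipe_duration_ms(0, 0, 0, 100)       # 10 before clamping
medium_swipe = auto_swipe_duration_ms(0, 0, 1200, 0)     # 1440, inside the range
long_swipe = auto_swipe_duration_ms(0, 0, 3000, 0)       # 9000 before clamping
```

Because the distance is squared, any swipe shorter than 1000 px takes the 1000 ms minimum and any swipe longer than roughly 1414 px takes the 2000 ms maximum, so only a narrow band of distances gets an intermediate duration.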
diff --git a/phone_agent/adb/input.py b/phone_agent/adb/input.py
new file mode 100644
index 0000000..4c1c68c
--- /dev/null
+++ b/phone_agent/adb/input.py
@@ -0,0 +1,109 @@
+"""Input utilities for Android device text input."""
+
+import base64
+import subprocess
+from typing import Optional
+
+
+def type_text(text: str, device_id: str | None = None) -> None:
+ """
+ Type text into the currently focused input field using ADB Keyboard.
+
+ Args:
+ text: The text to type.
+ device_id: Optional ADB device ID for multi-device setups.
+
+ Note:
+ Requires ADB Keyboard to be installed on the device.
+        See: https://github.com/senzhk/ADBKeyBoard
+ """
+ adb_prefix = _get_adb_prefix(device_id)
+ encoded_text = base64.b64encode(text.encode("utf-8")).decode("utf-8")
+
+ subprocess.run(
+ adb_prefix
+ + [
+ "shell",
+ "am",
+ "broadcast",
+ "-a",
+ "ADB_INPUT_B64",
+ "--es",
+ "msg",
+ encoded_text,
+ ],
+ capture_output=True,
+ text=True,
+ )
+
+
+def clear_text(device_id: str | None = None) -> None:
+ """
+ Clear text in the currently focused input field.
+
+ Args:
+ device_id: Optional ADB device ID for multi-device setups.
+ """
+ adb_prefix = _get_adb_prefix(device_id)
+
+ subprocess.run(
+ adb_prefix + ["shell", "am", "broadcast", "-a", "ADB_CLEAR_TEXT"],
+ capture_output=True,
+ text=True,
+ )
+
+
+def detect_and_set_adb_keyboard(device_id: str | None = None) -> str:
+ """
+ Detect current keyboard and switch to ADB Keyboard if needed.
+
+ Args:
+ device_id: Optional ADB device ID for multi-device setups.
+
+ Returns:
+ The original keyboard IME identifier for later restoration.
+ """
+ adb_prefix = _get_adb_prefix(device_id)
+
+ # Get current IME
+ result = subprocess.run(
+ adb_prefix + ["shell", "settings", "get", "secure", "default_input_method"],
+ capture_output=True,
+ text=True,
+ )
+ current_ime = (result.stdout + result.stderr).strip()
+
+ # Switch to ADB Keyboard if not already set
+ if "com.android.adbkeyboard/.AdbIME" not in current_ime:
+ subprocess.run(
+ adb_prefix + ["shell", "ime", "set", "com.android.adbkeyboard/.AdbIME"],
+ capture_output=True,
+ text=True,
+ )
+
+ # Warm up the keyboard
+ type_text("", device_id)
+
+ return current_ime
+
+
+def restore_keyboard(ime: str, device_id: str | None = None) -> None:
+ """
+ Restore the original keyboard IME.
+
+ Args:
+ ime: The IME identifier to restore.
+ device_id: Optional ADB device ID for multi-device setups.
+ """
+ adb_prefix = _get_adb_prefix(device_id)
+
+ subprocess.run(
+ adb_prefix + ["shell", "ime", "set", ime], capture_output=True, text=True
+ )
+
+
+def _get_adb_prefix(device_id: str | None) -> list:
+ """Get ADB command prefix with optional device specifier."""
+ if device_id:
+ return ["adb", "-s", device_id]
+ return ["adb"]
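Reviewer sketch (not part of the patch): `type_text` sends text as a base64 payload in an `ADB_INPUT_B64` broadcast, which avoids shell-quoting problems with spaces and non-ASCII characters. The function below builds the same argument list without running it, so the encoding can be inspected. The function name and the device ID `emulator-5554` are illustrative assumptions.

```python
from __future__ import annotations

import base64


def build_type_command(text: str, device_id: str | None = None) -> list[str]:
    """Build the adb argument list that type_text() would execute."""
    prefix = ["adb", "-s", device_id] if device_id else ["adb"]
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return prefix + [
        "shell", "am", "broadcast",
        "-a", "ADB_INPUT_B64",
        "--es", "msg", encoded,
    ]


cmd = build_type_command("你好", device_id="emulator-5554")
# The payload round-trips back to the original text.
decoded = base64.b64decode(cmd[-1]).decode("utf-8")
```

Passing the command as a list, as the patch does, means `subprocess.run` never invokes a local shell, so only the device-side shell sees the (ASCII-only) payload.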
diff --git a/phone_agent/adb/screenshot.py b/phone_agent/adb/screenshot.py
new file mode 100644
index 0000000..6bf4034
--- /dev/null
+++ b/phone_agent/adb/screenshot.py
@@ -0,0 +1,108 @@
+"""Screenshot utilities for capturing Android device screen."""
+
+import base64
+import os
+import subprocess
+import uuid
+from dataclasses import dataclass
+from io import BytesIO
+from typing import Tuple
+
+from PIL import Image
+
+
+@dataclass
+class Screenshot:
+ """Represents a captured screenshot."""
+
+ base64_data: str
+ width: int
+ height: int
+ is_sensitive: bool = False
+
+
+def get_screenshot(device_id: str | None = None, timeout: int = 10) -> Screenshot:
+ """
+ Capture a screenshot from the connected Android device.
+
+ Args:
+ device_id: Optional ADB device ID for multi-device setups.
+ timeout: Timeout in seconds for screenshot operations.
+
+ Returns:
+ Screenshot object containing base64 data and dimensions.
+
+ Note:
+ If the screenshot fails (e.g., on sensitive screens like payment pages),
+ a black fallback image is returned with is_sensitive=True.
+ """
+ temp_path = f"/tmp/screenshot_{uuid.uuid4()}.png"
+ adb_prefix = _get_adb_prefix(device_id)
+
+ try:
+ # Execute screenshot command
+ result = subprocess.run(
+ adb_prefix + ["shell", "screencap", "-p", "/sdcard/tmp.png"],
+ capture_output=True,
+ text=True,
+ timeout=timeout,
+ )
+
+ # Check for screenshot failure (sensitive screen)
+ output = result.stdout + result.stderr
+ if "Status: -1" in output or "Failed" in output:
+ return _create_fallback_screenshot(is_sensitive=True)
+
+ # Pull screenshot to local temp path
+ subprocess.run(
+ adb_prefix + ["pull", "/sdcard/tmp.png", temp_path],
+ capture_output=True,
+ text=True,
+ timeout=5,
+ )
+
+ if not os.path.exists(temp_path):
+ return _create_fallback_screenshot(is_sensitive=False)
+
+ # Read and encode image
+ img = Image.open(temp_path)
+ width, height = img.size
+
+ buffered = BytesIO()
+ img.save(buffered, format="PNG")
+ base64_data = base64.b64encode(buffered.getvalue()).decode("utf-8")
+
+ # Cleanup
+ os.remove(temp_path)
+
+ return Screenshot(
+ base64_data=base64_data, width=width, height=height, is_sensitive=False
+ )
+
+ except Exception as e:
+ print(f"Screenshot error: {e}")
+ return _create_fallback_screenshot(is_sensitive=False)
+
+
+def _get_adb_prefix(device_id: str | None) -> list:
+ """Get ADB command prefix with optional device specifier."""
+ if device_id:
+ return ["adb", "-s", device_id]
+ return ["adb"]
+
+
+def _create_fallback_screenshot(is_sensitive: bool) -> Screenshot:
+ """Create a black fallback image when screenshot fails."""
+ default_width, default_height = 1080, 2400
+
+ black_img = Image.new("RGB", (default_width, default_height), color="black")
+ buffered = BytesIO()
+ black_img.save(buffered, format="PNG")
+ base64_data = base64.b64encode(buffered.getvalue()).decode("utf-8")
+
+ return Screenshot(
+ base64_data=base64_data,
+ width=default_width,
+ height=default_height,
+ is_sensitive=is_sensitive,
+ )
diff --git a/phone_agent/agent.py b/phone_agent/agent.py
new file mode 100644
index 0000000..83edb7f
--- /dev/null
+++ b/phone_agent/agent.py
@@ -0,0 +1,244 @@
+"""Main PhoneAgent class for orchestrating phone automation."""
+
+import json
+import traceback
+from dataclasses import dataclass
+from typing import Any, Callable
+
+from phone_agent.actions import ActionHandler
+from phone_agent.actions.handler import do, finish, parse_action
+from phone_agent.adb import get_current_app, get_screenshot
+from phone_agent.config import SYSTEM_PROMPT
+from phone_agent.model import ModelClient, ModelConfig
+from phone_agent.model.client import MessageBuilder
+
+
+@dataclass
+class AgentConfig:
+ """Configuration for the PhoneAgent."""
+
+ max_steps: int = 100
+ device_id: str | None = None
+ system_prompt: str = SYSTEM_PROMPT
+ verbose: bool = True
+
+
+@dataclass
+class StepResult:
+ """Result of a single agent step."""
+
+ success: bool
+ finished: bool
+ action: dict[str, Any] | None
+ thinking: str
+ message: str | None = None
+
+
+class PhoneAgent:
+ """
+ AI-powered agent for automating Android phone interactions.
+
+ The agent uses a vision-language model to understand screen content
+ and decide on actions to complete user tasks.
+
+ Args:
+ model_config: Configuration for the AI model.
+ agent_config: Configuration for the agent behavior.
+ confirmation_callback: Optional callback for sensitive action confirmation.
+ takeover_callback: Optional callback for takeover requests.
+
+ Example:
+ >>> from phone_agent import PhoneAgent
+ >>> from phone_agent.model import ModelConfig
+ >>>
+ >>> model_config = ModelConfig(base_url="http://localhost:8000/v1")
+ >>> agent = PhoneAgent(model_config)
+ >>> agent.run("Open WeChat and send a message to John")
+ """
+
+ def __init__(
+ self,
+ model_config: ModelConfig | None = None,
+ agent_config: AgentConfig | None = None,
+ confirmation_callback: Callable[[str], bool] | None = None,
+ takeover_callback: Callable[[str], None] | None = None,
+ ):
+ self.model_config = model_config or ModelConfig()
+ self.agent_config = agent_config or AgentConfig()
+
+ self.model_client = ModelClient(self.model_config)
+ self.action_handler = ActionHandler(
+ device_id=self.agent_config.device_id,
+ confirmation_callback=confirmation_callback,
+ takeover_callback=takeover_callback,
+ )
+
+ self._context: list[dict[str, Any]] = []
+ self._step_count = 0
+
+ def run(self, task: str) -> str:
+ """
+ Run the agent to complete a task.
+
+ Args:
+ task: Natural language description of the task.
+
+ Returns:
+ Final message from the agent.
+ """
+ self._context = []
+ self._step_count = 0
+
+ # First step with user prompt
+ result = self._execute_step(task, is_first=True)
+
+ if result.finished:
+ return result.message or "Task completed"
+
+ # Continue until finished or max steps reached
+ while self._step_count < self.agent_config.max_steps:
+ result = self._execute_step(is_first=False)
+
+ if result.finished:
+ return result.message or "Task completed"
+
+ return "Max steps reached"
+
+ def step(self, task: str | None = None) -> StepResult:
+ """
+ Execute a single step of the agent.
+
+ Useful for manual control or debugging.
+
+ Args:
+ task: Task description (only needed for first step).
+
+ Returns:
+ StepResult with step details.
+ """
+ is_first = len(self._context) == 0
+
+ if is_first and not task:
+ raise ValueError("Task is required for the first step")
+
+ return self._execute_step(task, is_first)
+
+ def reset(self) -> None:
+ """Reset the agent state for a new task."""
+ self._context = []
+ self._step_count = 0
+
+ def _execute_step(
+ self, user_prompt: str | None = None, is_first: bool = False
+ ) -> StepResult:
+ """Execute a single step of the agent loop."""
+ self._step_count += 1
+
+ # Capture current screen state
+ screenshot = get_screenshot(self.agent_config.device_id)
+ current_app = get_current_app(self.agent_config.device_id)
+
+ # Build messages
+ if is_first:
+ self._context.append(
+ MessageBuilder.create_system_message(self.agent_config.system_prompt)
+ )
+
+ screen_info = MessageBuilder.build_screen_info(current_app)
+ text_content = f"{user_prompt}\n\n{screen_info}"
+
+ self._context.append(
+ MessageBuilder.create_user_message(
+ text=text_content, image_base64=screenshot.base64_data
+ )
+ )
+ else:
+ screen_info = MessageBuilder.build_screen_info(current_app)
+ text_content = f"** Screen Info **\n\n{screen_info}"
+
+ self._context.append(
+ MessageBuilder.create_user_message(
+ text=text_content, image_base64=screenshot.base64_data
+ )
+ )
+
+ # Get model response
+ try:
+ response = self.model_client.request(self._context)
+ except Exception as e:
+ if self.agent_config.verbose:
+ traceback.print_exc()
+ return StepResult(
+ success=False,
+ finished=True,
+ action=None,
+ thinking="",
+ message=f"Model error: {e}",
+ )
+
+ # Parse action from response
+ try:
+ action = parse_action(response.action)
+ except ValueError:
+ if self.agent_config.verbose:
+ traceback.print_exc()
+ action = finish(message=response.action)
+
+ if self.agent_config.verbose:
+            # Print the thinking process
+            print("\n" + "=" * 50)
+            print("💭 Thinking:")
+            print("-" * 50)
+            print(response.thinking)
+            print("-" * 50)
+            print("🎯 Action:")
+ print(json.dumps(action, ensure_ascii=False, indent=2))
+ print("=" * 50 + "\n")
+
+ # Remove image from context to save space
+ self._context[-1] = MessageBuilder.remove_images_from_message(self._context[-1])
+
+ # Execute action
+ try:
+ result = self.action_handler.execute(
+ action, screenshot.width, screenshot.height
+ )
+ except Exception as e:
+ if self.agent_config.verbose:
+ traceback.print_exc()
+ result = self.action_handler.execute(
+ finish(message=str(e)), screenshot.width, screenshot.height
+ )
+
+ # Add assistant response to context
+ self._context.append(
+ MessageBuilder.create_assistant_message(
+                f"<think>{response.thinking}</think><answer>{response.action}</answer>"
+ )
+ )
+
+ # Check if finished
+ finished = action.get("_metadata") == "finish" or result.should_finish
+
+ if finished and self.agent_config.verbose:
+ print("\n" + "🎉 " + "=" * 48)
+            print(f"✅ Task completed: {result.message or action.get('message', 'done')}")
+ print("=" * 50 + "\n")
+
+ return StepResult(
+ success=result.success,
+ finished=finished,
+ action=action,
+ thinking=response.thinking,
+ message=result.message or action.get("message"),
+ )
+
+ @property
+ def context(self) -> list[dict[str, Any]]:
+ """Get the current conversation context."""
+ return self._context.copy()
+
+ @property
+ def step_count(self) -> int:
+ """Get the current step count."""
+ return self._step_count
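Reviewer sketch (not part of the patch): the control flow of `PhoneAgent.run` is one unconditional first step followed by a loop bounded by `max_steps`. The stripped-down version below keeps only that flow, with a stub step function in place of screenshots, the model, and ADB. `Step`, `run_loop`, and `fake_step` are hypothetical names introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """Minimal stand-in for StepResult: only the fields the loop reads."""

    finished: bool
    message: str | None = None


def run_loop(step_fn: Callable[[bool], Step], max_steps: int) -> str:
    """Run one first step, then continue until finished or max_steps is hit."""
    step_count = 1
    result = step_fn(True)  # first step carries the user task
    if result.finished:
        return result.message or "Task completed"

    while step_count < max_steps:
        step_count += 1
        result = step_fn(False)
        if result.finished:
            return result.message or "Task completed"

    return "Max steps reached"


calls: list[bool] = []


def fake_step(is_first: bool) -> Step:
    """Stub that finishes on its third call."""
    calls.append(is_first)
    done = len(calls) == 3
    return Step(finished=done, message="done" if done else None)


outcome = run_loop(fake_step, max_steps=10)
```

One consequence of this shape is that the first step always runs, even when `max_steps` is 0 or 1.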
diff --git a/phone_agent/config/__init__.py b/phone_agent/config/__init__.py
new file mode 100644
index 0000000..787205a
--- /dev/null
+++ b/phone_agent/config/__init__.py
@@ -0,0 +1,6 @@
+"""Configuration module for Phone Agent."""
+
+from phone_agent.config.apps import APP_PACKAGES
+from phone_agent.config.prompts import SYSTEM_PROMPT
+
+__all__ = ["APP_PACKAGES", "SYSTEM_PROMPT"]
diff --git a/phone_agent/config/apps.py b/phone_agent/config/apps.py
new file mode 100644
index 0000000..c54613f
--- /dev/null
+++ b/phone_agent/config/apps.py
@@ -0,0 +1,111 @@
+"""App name to package name mapping for supported applications."""
+
+APP_PACKAGES: dict[str, str] = {
+ # Social & Messaging
+ "微信": "com.tencent.mm",
+ "QQ": "com.tencent.mobileqq",
+ "微博": "com.sina.weibo",
+ # E-commerce
+ "淘宝": "com.taobao.taobao",
+ "京东": "com.jingdong.app.mall",
+ "拼多多": "com.xunmeng.pinduoduo",
+ "淘宝闪购": "com.taobao.taobao",
+ "京东秒送": "com.jingdong.app.mall",
+ # Lifestyle & Social
+ "小红书": "com.xingin.xhs",
+ "豆瓣": "com.douban.frodo",
+ "知乎": "com.zhihu.android",
+ # Maps & Navigation
+ "高德地图": "com.autonavi.minimap",
+ "百度地图": "com.baidu.BaiduMap",
+ # Food & Services
+ "美团": "com.sankuai.meituan",
+ "大众点评": "com.dianping.v1",
+ "饿了么": "me.ele",
+ "肯德基": "com.yek.android.kfc.activitys",
+ # Travel
+ "携程": "ctrip.android.view",
+ "铁路12306": "com.MobileTicket",
+ "12306": "com.MobileTicket",
+ "去哪儿": "com.Qunar",
+ "去哪儿旅行": "com.Qunar",
+ "滴滴出行": "com.sdu.didi.psnger",
+
+ # Video & Entertainment
+ "bilibili": "tv.danmaku.bili",
+ "抖音": "com.ss.android.ugc.aweme",
+ "快手": "com.smile.gifmaker",
+ "腾讯视频": "com.tencent.qqlive",
+ "爱奇艺": "com.qiyi.video",
+ "优酷视频": "com.youku.phone",
+ "芒果TV": "com.hunantv.imgo.activity",
+ "红果短剧": "com.phoenix.read",
+ # Music & Audio
+ "网易云音乐": "com.netease.cloudmusic",
+ "QQ音乐": "com.tencent.qqmusic",
+ "汽水音乐": "com.luna.music",
+ "喜马拉雅": "com.ximalaya.ting.android",
+ # Reading
+ "番茄小说": "com.dragon.read",
+ "番茄免费小说": "com.dragon.read",
+ "七猫免费小说": "com.kmxs.reader",
+ # Productivity
+ "飞书": "com.ss.android.lark",
+ "QQ邮箱": "com.tencent.androidqqmail",
+ # AI & Tools
+ "豆包": "com.larus.nova",
+ # Health & Fitness
+ "keep": "com.gotokeep.keep",
+ "美柚": "com.lingan.seeyou",
+ # News & Information
+ "腾讯新闻": "com.tencent.news",
+ "今日头条": "com.ss.android.article.news",
+ # Real Estate
+ "贝壳找房": "com.lianjia.beike",
+ "安居客": "com.anjuke.android.app",
+ # Finance
+ "同花顺": "com.hexin.plat.android",
+ # Games
+ "星穹铁道": "com.miHoYo.hkrpg",
+ "崩坏:星穹铁道": "com.miHoYo.hkrpg",
+ "恋与深空": "com.papegames.lysk.cn",
+}
+
+
+def get_package_name(app_name: str) -> str | None:
+ """
+ Get the package name for an app.
+
+ Args:
+ app_name: The display name of the app.
+
+ Returns:
+ The Android package name, or None if not found.
+ """
+ return APP_PACKAGES.get(app_name)
+
+
+def get_app_name(package_name: str) -> str | None:
+ """
+ Get the app name from a package name.
+
+ Args:
+ package_name: The Android package name.
+
+ Returns:
+ The display name of the app, or None if not found.
+ """
+ for name, package in APP_PACKAGES.items():
+ if package == package_name:
+ return name
+ return None
+
+
+def list_supported_apps() -> list[str]:
+ """
+ Get a list of all supported app names.
+
+ Returns:
+ List of app names.
+ """
+ return list(APP_PACKAGES.keys())
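Reviewer sketch (not part of the patch): several display names in `APP_PACKAGES` map to the same package (for example two names for `com.MobileTicket`), so `get_app_name` returns whichever alias appears first in the dictionary. The snippet below makes that behaviour explicit with a small sample mapping. `SAMPLE_PACKAGES`, `build_reverse_index`, and `names_for_package` are hypothetical names.

```python
# A three-entry excerpt in the same shape as APP_PACKAGES.
SAMPLE_PACKAGES: dict[str, str] = {
    "铁路12306": "com.MobileTicket",
    "12306": "com.MobileTicket",
    "微信": "com.tencent.mm",
}


def build_reverse_index(app_packages: dict[str, str]) -> dict[str, str]:
    """Map package -> first display name, matching get_app_name()'s linear scan."""
    index: dict[str, str] = {}
    for name, package in app_packages.items():
        index.setdefault(package, name)  # keep the first alias only
    return index


def names_for_package(app_packages: dict[str, str], package: str) -> list[str]:
    """Return every alias registered for a package, in insertion order."""
    return [name for name, pkg in app_packages.items() if pkg == package]


reverse_index = build_reverse_index(SAMPLE_PACKAGES)
aliases = names_for_package(SAMPLE_PACKAGES, "com.MobileTicket")
```

A prebuilt index turns the reverse lookup into a single dictionary access, which may matter if the lookup is called on every agent step.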
diff --git a/phone_agent/config/prompts.py b/phone_agent/config/prompts.py
new file mode 100644
index 0000000..a978107
--- /dev/null
+++ b/phone_agent/config/prompts.py
@@ -0,0 +1,70 @@
+"""System prompts for the AI agent."""
+from datetime import datetime
+
+today = datetime.today()
+formatted_date = today.strftime("%Y年%m月%d日")
+
+SYSTEM_PROMPT = "今天的日期是: " + formatted_date + '''
+你是一个智能体分析专家,可以根据操作历史和当前状态图执行一系列操作来完成任务。
+你必须严格按照要求输出以下格式:
+<think>{think}</think>
+<answer>{action}</answer>
+
+其中:
+- {think} 是对你为什么选择这个操作的简短推理说明。
+- {action} 是本次执行的具体操作指令,必须严格遵循下方定义的指令格式。
+
+操作指令及其作用如下:
+- do(action="Launch", app="xxx")
+ Launch是启动目标app的操作,这比通过主屏幕导航更快。此操作完成后,您将自动收到结果状态的截图。
+- do(action="Tap", element=[x,y])
+ Tap是点击操作,点击屏幕上的特定点。可用此操作点击按钮、选择项目、从主屏幕打开应用程序,或与任何可点击的用户界面元素进行交互。坐标系统从左上角 (0,0) 开始到右下角(999,999)结束。此操作完成后,您将自动收到结果状态的截图。
+- do(action="Tap", element=[x,y], message="重要操作")
+ 基本功能同Tap,点击涉及财产、支付、隐私等敏感按钮时触发。
+- do(action="Type", text="xxx")
+ Type是输入操作,在当前聚焦的输入框中输入文本。使用此操作前,请确保输入框已被聚焦(先点击它)。输入的文本将像使用键盘输入一样输入。重要提示:手机可能正在使用 ADB 键盘,该键盘不会像普通键盘那样占用屏幕空间。要确认键盘已激活,请查看屏幕底部是否显示 'ADB Keyboard {ON}' 类似的文本,或者检查输入框是否处于激活/高亮状态。不要仅仅依赖视觉上的键盘显示。自动清除文本:当你使用输入操作时,输入框中现有的任何文本(包括占位符文本和实际输入)都会在输入新文本前自动清除。你无需在输入前手动清除文本——直接使用输入操作输入所需文本即可。操作完成后,你将自动收到结果状态的截图。
+- do(action="Type_Name", text="xxx")
+ Type_Name是输入人名的操作,基本功能同Type。
+- do(action="Interact")
+ Interact是当有多个满足条件的选项时而触发的交互操作,询问用户如何选择。
+- do(action="Swipe", start=[x1,y1], end=[x2,y2])
+ Swipe是滑动操作,通过从起始坐标拖动到结束坐标来执行滑动手势。可用于滚动内容、在屏幕之间导航、下拉通知栏以及项目栏或进行基于手势的导航。坐标系统从左上角 (0,0) 开始到右下角(999,999)结束。滑动持续时间会自动调整以实现自然的移动。此操作完成后,您将自动收到结果状态的截图。
+- do(action="Note", message="True")
+ 记录当前页面内容以便后续总结。
+- do(action="Call_API", instruction="xxx")
+ 总结或评论当前页面或已记录的内容。
+- do(action="Long Press", element=[x,y])
+    Long Press是长按操作,在屏幕上的特定点长按指定时间。可用于触发上下文菜单、选择文本或激活长按交互。坐标系统从左上角 (0,0) 开始到右下角(999,999)结束。此操作完成后,您将自动收到结果状态的屏幕截图。
+- do(action="Double Tap", element=[x,y])
+ Double Tap在屏幕上的特定点快速连续点按两次。使用此操作可以激活双击交互,如缩放、选择文本或打开项目。坐标系统从左上角 (0,0) 开始到右下角(999,999)结束。此操作完成后,您将自动收到结果状态的截图。
+- do(action="Take_over", message="xxx")
+ Take_over是接管操作,表示在登录和验证阶段需要用户协助。
+- do(action="Back")
+ 导航返回到上一个屏幕或关闭当前对话框。相当于按下 Android 的返回按钮。使用此操作可以从更深的屏幕返回、关闭弹出窗口或退出当前上下文。此操作完成后,您将自动收到结果状态的截图。
+- do(action="Home")
+ Home是回到系统桌面的操作,相当于按下 Android 主屏幕按钮。使用此操作可退出当前应用并返回启动器,或从已知状态启动新任务。此操作完成后,您将自动收到结果状态的截图。
+- do(action="Wait", duration="x seconds")
+ 等待页面加载,x为需要等待多少秒。
+- finish(message="xxx")
+ finish是结束任务的操作,表示准确完整完成任务,message是终止信息。
+
+必须遵循的规则:
+1. 在执行任何操作前,先检查当前app是否是目标app,如果不是,先执行 Launch。
+2. 如果进入到了无关页面,先执行 Back。如果执行Back后页面没有变化,请点击页面左上角的返回键进行返回,或者右上角的X号关闭。
+3. 如果页面未加载出内容,最多连续 Wait 三次,否则执行 Back重新进入。
+4. 如果页面显示网络问题,需要重新加载,请点击重新加载。
+5. 如果当前页面找不到目标联系人、商品、店铺等信息,可以尝试 Swipe 滑动查找。
+6. 遇到价格区间、时间区间等筛选条件,如果没有完全符合的,可以放宽要求。
+7. 在做小红书总结类任务时一定要筛选图文笔记。
+8. 购物车全选后再点击全选可以把状态设为全不选,在做购物车任务时,如果购物车里已经有商品被选中时,你需要点击全选后再点击取消全选,再去找需要购买或者删除的商品。
+9. 在做外卖任务时,如果相应店铺购物车里已经有其他商品你需要先把购物车清空再去购买用户指定的外卖。
+10. 在做点外卖任务时,如果用户需要点多个外卖,请尽量在同一店铺进行购买,如果无法找到可以下单,并说明某个商品未找到。
+11. 请严格遵循用户意图执行任务,用户的特殊要求可以执行多次搜索,滑动查找。比如(i)用户要求点一杯咖啡,要咸的,你可以直接搜索咸咖啡,或者搜索咖啡后滑动查找咸的咖啡,比如海盐咖啡。(ii)用户要找到XX群,发一条消息,你可以先搜索XX群,找不到结果后,将"群"字去掉,搜索XX重试。(iii)用户要找到宠物友好的餐厅,你可以搜索餐厅,找到筛选,找到设施,选择可带宠物,或者直接搜索可带宠物,必要时可以使用AI搜索。
+12. 在选择日期时,如果原滑动方向与预期日期越来越远,请向反方向滑动查找。
+13. 执行任务过程中如果有多个可选择的项目栏,请逐个查找每个项目栏,直到完成任务,一定不要在同一项目栏多次查找,从而陷入死循环。
+14. 在执行下一步操作前请一定要检查上一步的操作是否生效,如果点击没生效,可能因为app反应较慢,请先稍微等待一下,如果还是不生效请调整一下点击位置重试,如果仍然不生效请跳过这一步继续任务,并在finish message说明点击不生效。
+15. 在执行任务中如果遇到滑动不生效的情况,请调整一下起始点位置,增大滑动距离重试,如果还是不生效,有可能是已经滑到底了,请继续向反方向滑动,直到顶部或底部,如果仍然没有符合要求的结果,请跳过这一步继续任务,并在finish message说明但没找到要求的项目。
+16. 在做游戏任务时如果在战斗页面如果有自动战斗一定要开启自动战斗,如果多轮历史状态相似要检查自动战斗是否开启。
+17. 如果没有合适的搜索结果,可能是因为搜索页面不对,请返回到搜索页面的上一级尝试重新搜索,如果尝试三次返回上一级搜索后仍然没有符合要求的结果,执行 finish(message="原因")。
+18. 在结束任务前请一定要仔细检查任务是否完整准确的完成,如果出现错选、漏选、多选的情况,请返回之前的步骤进行纠正。
+'''
\ No newline at end of file
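Reviewer sketch (not part of the patch): the system prompt defines a resolution-independent coordinate system running from (0,0) at the top left to (999,999) at the bottom right, so something has to scale those values to device pixels before `tap` or `swipe` is called. The conversion is not shown in this part of the diff. The function below is one plausible way to do it, assuming a divisor of 1000 and integer truncation.

```python
def to_pixels(
    element: list[int], screen_width: int, screen_height: int
) -> tuple[int, int]:
    """Scale a [x, y] pair in the 0-999 range to pixel coordinates.

    Assumption: the range is treated as thousandths of the screen size.
    """
    x, y = element
    return x * screen_width // 1000, y * screen_height // 1000


center = to_pixels([500, 500], 1080, 2400)
corner = to_pixels([999, 999], 1080, 2400)
```

Integer arithmetic avoids floating-point rounding surprises, and the largest input still lands strictly inside the screen bounds.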
diff --git a/phone_agent/model/__init__.py b/phone_agent/model/__init__.py
new file mode 100644
index 0000000..766cb95
--- /dev/null
+++ b/phone_agent/model/__init__.py
@@ -0,0 +1,5 @@
+"""Model client module for AI inference."""
+
+from phone_agent.model.client import ModelClient, ModelConfig
+
+__all__ = ["ModelClient", "ModelConfig"]
diff --git a/phone_agent/model/client.py b/phone_agent/model/client.py
new file mode 100644
index 0000000..e326f91
--- /dev/null
+++ b/phone_agent/model/client.py
@@ -0,0 +1,168 @@
+"""Model client for AI inference using OpenAI-compatible API."""
+
+import json
+from dataclasses import dataclass, field
+from typing import Any
+
+from openai import OpenAI
+
+
+@dataclass
+class ModelConfig:
+ """Configuration for the AI model."""
+
+ base_url: str = "http://localhost:8000/v1"
+ api_key: str = "EMPTY"
+ model_name: str = "autoglm-phone-9b"
+ max_tokens: int = 3000
+ temperature: float = 0.0
+ top_p: float = 0.85
+ frequency_penalty: float = 0.2
+ extra_body: dict[str, Any] = field(
+ default_factory=lambda: {"skip_special_tokens": False}
+ )
+
+
+@dataclass
+class ModelResponse:
+ """Response from the AI model."""
+
+ thinking: str
+ action: str
+ raw_content: str
+
+
+class ModelClient:
+ """
+ Client for interacting with OpenAI-compatible vision-language models.
+
+ Args:
+ config: Model configuration.
+ """
+
+ def __init__(self, config: ModelConfig | None = None):
+ self.config = config or ModelConfig()
+ self.client = OpenAI(base_url=self.config.base_url, api_key=self.config.api_key)
+
+ def request(self, messages: list[dict[str, Any]]) -> ModelResponse:
+ """
+ Send a request to the model.
+
+ Args:
+ messages: List of message dictionaries in OpenAI format.
+
+ Returns:
+ ModelResponse containing thinking and action.
+
+ Raises:
+ ValueError: If the response cannot be parsed.
+ """
+ response = self.client.chat.completions.create(
+ messages=messages,
+ model=self.config.model_name,
+ max_tokens=self.config.max_tokens,
+ temperature=self.config.temperature,
+ top_p=self.config.top_p,
+ frequency_penalty=self.config.frequency_penalty,
+ extra_body=self.config.extra_body,
+ )
+
+ raw_content = response.choices[0].message.content
+
+ # Parse thinking and action from response
+ thinking, action = self._parse_response(raw_content)
+
+ return ModelResponse(thinking=thinking, action=action, raw_content=raw_content)
+
+ def _parse_response(self, content: str) -> tuple[str, str]:
+ """
+ Parse the model response into thinking and action parts.
+
+ Args:
+ content: Raw response content.
+
+ Returns:
+ Tuple of (thinking, action).
+ """
+        if "<answer>" not in content:
+            return "", content
+
+        parts = content.split("<answer>", 1)
+        thinking = parts[0].replace("<think>", "").replace("</think>", "").strip()
+        action = parts[1].replace("</answer>", "").strip()
+
+ return thinking, action
+
+
+class MessageBuilder:
+ """Helper class for building conversation messages."""
+
+ @staticmethod
+ def create_system_message(content: str) -> dict[str, Any]:
+ """Create a system message."""
+ return {"role": "system", "content": content}
+
+ @staticmethod
+ def create_user_message(
+ text: str, image_base64: str | None = None
+ ) -> dict[str, Any]:
+ """
+ Create a user message with optional image.
+
+ Args:
+ text: Text content.
+ image_base64: Optional base64-encoded image.
+
+ Returns:
+ Message dictionary.
+ """
+ content = []
+
+ if image_base64:
+ content.append(
+ {
+ "type": "image_url",
+ "image_url": {"url": f"data:image/png;base64,{image_base64}"},
+ }
+ )
+
+ content.append({"type": "text", "text": text})
+
+ return {"role": "user", "content": content}
+
+ @staticmethod
+ def create_assistant_message(content: str) -> dict[str, Any]:
+ """Create an assistant message."""
+ return {"role": "assistant", "content": content}
+
+ @staticmethod
+ def remove_images_from_message(message: dict[str, Any]) -> dict[str, Any]:
+ """
+ Remove image content from a message to save context space.
+
+ Args:
+ message: Message dictionary.
+
+ Returns:
+ Message with images removed.
+ """
+ if isinstance(message.get("content"), list):
+ message["content"] = [
+ item for item in message["content"] if item.get("type") == "text"
+ ]
+ return message
+
+ @staticmethod
+ def build_screen_info(current_app: str, **extra_info) -> str:
+ """
+ Build screen info string for the model.
+
+ Args:
+ current_app: Current app name.
+ **extra_info: Additional info to include.
+
+ Returns:
+ JSON string with screen info.
+ """
+ info = {"current_app": current_app, **extra_info}
+ return json.dumps(info, ensure_ascii=False)
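Reviewer sketch (not part of the patch): `_parse_response` splits the raw model output into a reasoning part and an action part. The standalone version below assumes the model wraps them in `<think>` and `<answer>` tags; that tag naming is an assumption here, and output without an answer tag is passed through unchanged as the action.

```python
def parse_tagged_response(content: str) -> tuple[str, str]:
    """Split '<think>..</think><answer>..</answer>' into (thinking, action)."""
    if "<answer>" not in content:
        # No recognizable structure: treat the whole reply as the action.
        return "", content

    head, tail = content.split("<answer>", 1)
    thinking = head.replace("<think>", "").replace("</think>", "").strip()
    action = tail.replace("</answer>", "").strip()
    return thinking, action


tagged = '<think>Open the app</think><answer>do(action="Launch", app="微信")</answer>'
thinking, action = parse_tagged_response(tagged)
untagged = parse_tagged_response('finish(message="ok")')
```

Splitting on the opening answer tag only once keeps any later occurrence of the tag text inside the action string intact.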
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..ae9e8ce
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,14 @@
+Pillow>=12.0.0
+openai>=2.9.0
+
+# For Model Deployment
+
+# sglang>=0.5.6.post1
+# transformers>=5.0.0rc0
+# vllm>=0.12.0
+
+# Optional: for development
+# pytest>=7.0.0
+# pre-commit>=4.5.0
+# black>=23.0.0
+# mypy>=1.0.0
diff --git a/resources/WECHAT.md b/resources/WECHAT.md
new file mode 100644
index 0000000..34b8316
--- /dev/null
+++ b/resources/WECHAT.md
@@ -0,0 +1,6 @@
+
+
+
+扫码关注公众号,加入「Open-AutoGLM 交流群」
+Scan the QR code to follow the official account and join the "Open-AutoGLM Discussion Group"
+
diff --git a/resources/logo.svg b/resources/logo.svg
new file mode 100644
index 0000000..4877d7b
--- /dev/null
+++ b/resources/logo.svg
@@ -0,0 +1,18 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/resources/privacy_policy.txt b/resources/privacy_policy.txt
new file mode 100644
index 0000000..5595360
--- /dev/null
+++ b/resources/privacy_policy.txt
@@ -0,0 +1,132 @@
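+Part 1: Security Notes on the Model and Technology
+
+1. AutoGLM Technical Mechanism and Deployment Flexibility
+The core function of AutoGLM is automated operation execution. It works as follows:
+- Instruction-driven: Acts on operation instructions issued by the user or developer.
+- Screen understanding: Captures the screen content of the current operating environment and sends the image to a large model (deployable locally or in the cloud) for analysis.
+- Operation simulation: Completes tasks in the target environment by simulating human operations (such as tapping, swiping, and entering text).
+- Example: When instructed to book a high-speed rail ticket, AutoGLM opens the relevant app, recognizes the interface content, and follows the instruction to select a train and place the order, just as a person would. The user or developer can terminate the task at any time.
+
+Key flexibility:
+- Model deployment: Developers are free to deploy the AutoGLM model on a local device or on a cloud server.
+- Operation execution environment: Automated operations can run on a local device or on a cloud device, as the developer decides based on the application scenario and requirements.
+- Data flow: The data flow depends on the deployment choice:
+  - Local deployment (model + execution): Screen capture, model analysis, and operation execution all take place on the local device. Data never leaves the device, which gives the strongest privacy.
+  - Cloud deployment (model + execution): Screen content must be transmitted from the operating environment (the local machine or a cloud device) to the cloud model, and the resulting instructions are returned to the operating environment for execution. Developers must ensure the security of transmission and cloud processing.
+  - Hybrid deployment (for example, local execution + cloud model): Screen content is captured locally, sent to the cloud model for analysis, and the results are returned for local execution. Developers must pay attention to data transmission security.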
+
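+2. System Permissions (for the Execution Environment)
+To ensure that automated operations run properly, the environment running AutoGLM operations may need the following permissions:
+- ADB (Android Debug Bridge) permission: Used to retrieve information and simulate user interactions such as tapping, swiping, and text input.
+- Storage permission: Used to temporarily store necessary data, model files (if deployed locally), or logs.
+- Network permission: Used to access online services (such as calling a cloud model or accessing the target app's services).
+- Other specific permissions: May be required by particular tasks (e.g., the microphone for voice commands).
+
+Developer responsibilities:
+- Principle of least privilege: Request only the permissions necessary to complete the specific task.
+- Transparent disclosure: Clearly and explicitly explain to end users, within the application or service, the purpose and necessity of each permission.
+- User authorization: Relevant permissions and functions may be enabled in the operating environment only after the end user has given explicit authorization.
+- Environment adaptation: Ensure that the mechanism for requesting and obtaining permissions fits the chosen execution environment (local or cloud).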
+
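+3. Data Processing and Privacy Protection Principles
+The AutoGLM open-source project itself does not collect user data. Responsibility for data processing and privacy protection lies with the developers who build specific applications or services on AutoGLM, and the extent of that responsibility depends on the deployment mode:
+- Local deployment (model + execution):
+ - Developers must implement secure local storage and processing at the application level. All data processing (screen capture, model analysis, operation execution) takes place on the end user's local device.
+ - Developers should ensure that their application does not proactively upload sensitive data (such as screen content or operation records) to the developer's servers or to third parties, unless the user has given explicit, informed consent and the upload is necessary for a feature.
+- Cloud deployment (model, execution, or both):
+ - Data (screen content, operation instructions, model analysis results) is transmitted between the operating environment and the cloud.
+ - Developers must:
+ - Apply strong encryption to all data in transit and at rest.
+ - Clearly inform end users which data will be sent to the cloud, for what purpose, where it will be stored, and for how long, and obtain the end user's explicit consent to data transmission and cloud processing.
+ - Comply with applicable data protection regulations and provide a clear privacy policy describing data processing practices.
+ - Ensure secure configuration and access control for the cloud environment (model servers and execution-environment servers).
+- General principles (all deployment modes):
+ - Data minimization: Collect and process only the minimum information strictly necessary to complete the automated task.
+ - Purpose limitation: Use data only for the specific automated operation requested by the user's instruction.
+ - Security safeguards: Developers are responsible for taking reasonable technical and organizational measures to protect the security and confidentiality of all user data they process (whether locally or in the cloud) and to prevent unauthorized access, use, disclosure, or loss.
+ - User control: Provide mechanisms that let end users view and manage (e.g., delete) data related to them, where technically feasible and consistent with the deployment mode.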
+
+
+
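+Part 2: Usage Rules for Developers and Users
+
+When using the AutoGLM open-source project, developers and users must always comply with the laws, regulations, and standards applicable in the place of use, such as the Cybersecurity Law of the People's Republic of China, the Provisions on the Administration of Algorithmic Recommendation in Internet Information Services, the Provisions on the Administration of Deep Synthesis in Internet Information Services, the Interim Measures for the Administration of Generative Artificial Intelligence Services, and Cybersecurity Technology: Basic Security Requirements for Generative Artificial Intelligence Services. They must also label AI-generated synthetic content according to the application scenario and the requirements of the Measures for Labeling AI-Generated Synthetic Content and Cybersecurity Technology: Labeling Method for AI-Generated Synthetic Content (GB45438-2025), including but not limited to explicit labels and implicit labels (metadata labels and digital watermarks).
+
+1. Confirmation Mechanism for Important Operations
+
+In any application or service built on AutoGLM, developers must design and implement an explicit, mandatory user confirmation step for the following 6+1 categories of high-risk operations:
+- Information exchange and content distribution: including but not limited to sending messages or emails, posting comments, liking, and sharing.
+- File handling and permission management: including but not limited to creating, editing, deleting, or moving files or folders, and enabling or disabling any permission.
+- Orders and entitlements: including but not limited to emptying the shopping cart, submitting orders, modifying or adding shipping addresses, and using coupons or points.
+- Fund transfers and payment settlement: including but not limited to transfers, payments, collections, top-ups, withdrawals, and binding or unbinding payment methods.
+- Account identity and security settings: including but not limited to changing passwords, setting or changing security options, deleting accounts or linked accounts, deleting friends or contacts, and deleting conversations or records.
+- Healthcare and legal compliance: including but not limited to accessing, authorizing, or disposing of medical records or health data, purchasing medication, physiological or psychological tests, and signing electronic agreements.
+- Other high-risk operations: any other operation that may significantly affect the user's data security, financial security, account security, or reputation.
+
+Requirements:
+- The confirmation step must be triggered before the operation is executed and must clearly show the details of the operation about to be performed.
+- Provide a convenient cancel/terminate mechanism that lets users abort the task at any time, either before confirming or while the operation is in progress.
+- Developer responsibility: If a user suffers a loss because no effective confirmation mechanism was implemented, the developer bears the corresponding responsibility. User responsibility: Losses caused by the user failing to promptly stop an erroneous operation after confirming it are borne by the user.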
+
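+2. Obligations of Developers and Users
+
+Developer obligations:
+- Transparent disclosure: Clearly and accurately explain to end users the application's or service's functions, how it works (especially the automated parts), how data is collected and processed (including whether the cloud is involved), the potential risks, and how users can stay in control.
+- Monitoring and control: Design a user interface that lets end users:
+ - View or follow the current status and steps of automated operations in real time.
+ - Pause or terminate any running automated task easily and quickly.
+ - Manage the permissions and settings of automated operations.
+- Secure development: Follow secure coding practices and ensure the security of the application or service itself so that it cannot be maliciously exploited.
+- Compliance: Ensure that the application or service complies with all applicable laws and regulations, industry standards, and the terms of service of third-party platforms (such as the apps being operated).
+- Risk warnings: At appropriate points (such as feature entry points, first use, and confirmation steps), clearly warn users of the risks of using automation (such as erroneous operations, privacy risks, and third-party platform policy risks).
+- Avoid critical dependence: Evaluate carefully. Using AutoGLM for extremely critical or high-risk operations, or operations where an error would have very severe consequences, is not recommended (e.g., medical device control, critical infrastructure operation, or large financial transactions without human review).
+
+User obligations:
+- Understand the risks: Before using AutoGLM-based automation, read the developer's instructions, privacy policy, and risk warnings carefully, and fully understand how it works and its potential risks.
+- Authorize with care: Grant the necessary permissions only when you fully trust the developer of the application or service and understand what you are authorizing.
+- Monitor actively: Pay appropriate attention while an automated task is running, especially during important operations. Use the monitoring features provided by the developer to follow its progress.
+- Intervene promptly: If you notice an error, an anomaly, or unexpected behavior, immediately stop the task using the provided termination feature.
+- Accept responsibility: Users are solely responsible for the instructions they issue, the operations they confirm, and any loss caused by failing to monitor and stop erroneous operations in time.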
+
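+3. Code of Conduct for Developers and Users
+
+Using the AutoGLM open-source project, or any application or service built on it, for the following activities is strictly prohibited:
+(1) Bulk automation and malicious competition
+- Any form of fake data manipulation: faking orders, votes, likes, comments, traffic, followers, play counts, download counts, and so on.
+- Bulk account control: bulk registration, bulk login, or bulk operation of third-party platform accounts (group control, multi-instance, cloud control).
+- Disrupting market order: malicious panic buying, hoarding, grabbing limited resources, bulk claiming or abuse of coupons or subsidies, and maliciously occupying service resources (exploiting promotions).
+- Manipulating platform rules: faking chart positions or rankings, manipulating search results, interfering with recommendation algorithms, and artificially raising or lowering content exposure.
+- Faking activity: bulk posting, reposting, liking, bookmarking, following, unfollowing, and similar social media operations.
+- Undermining fair play in games: paid boosting, game-studio farming, and bulk farming of equipment, gold, experience, or items.
+- Undermining impartiality: bulk voting, vote rigging, and manipulating online contests or survey results.
+(2) False information and fraud
+- Creating misleading information: publishing or spreading fake product or service reviews, fake user feedback, fake testimonials, and fake experiences.
+- Falsifying business data: fabricating transaction records, sales figures, user activity, or positive-review rates.
+- Identity fraud: impersonating others, inventing personal information, misusing another person's account, avatar, or nickname, and forging identity documents.
+- False marketing: publishing false advertisements, making false claims, exaggerating product efficacy, and concealing product defects or risks.
+- Taking part in scams: online fraud, fake investments, pyramid schemes, illegal fundraising, fake prize notifications, phishing, and so on.
+- Spreading untrue information: creating or maliciously spreading fake news, rumors, or unverified information.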
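+(3) Damaging third-party services and system security
+- Unauthorized access: using AutoGLM for data scraping (in violation of robots.txt or platform policies), information theft, API abuse, or unauthorized server penetration testing.
+- Technical sabotage: reverse engineering, cracking, or modifying third-party applications, injecting malicious code into them, or interfering with their normal operation.
+- Resource abuse: maliciously occupying third-party server resources, sending junk requests, generating abnormal traffic, or carrying out DDoS attacks.
+- Violating platform rules: deliberately violating the user agreement, terms of service, or community rules of the third-party application being operated.
+- Malicious competition: malicious negative reviews, malicious reports, malicious complaints, and commercial defamation.
+- Spreading harmful content: spreading computer viruses, trojans, malware, ransomware, spam, or illegal content.
+- Infringing data rights: unauthorized large-scale commercial data collection, user information harvesting, or privacy snooping.
+(4) Infringing the lawful rights and interests of others
+- Account theft: operating with another person's stolen account, password, or identity credentials.
+- Online harassment and bullying: maliciously harassing, threatening, insulting, defaming, or doxxing others.
+- Infringing privacy and secrets: collecting, using, or spreading another person's personal information, private data, or trade secrets without authorization.
+- Malicious squatting: squatting on another party's trademarks, domain names, usernames, social media accounts, and so on.
+- Nuisance behavior: malicious flooding, spam bombing, and forced follows or subscriptions.
+- Harming business interests: commercial espionage, unfair competition, malicious poaching of staff, and theft of trade secrets.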
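+(5) Abusing resources and damaging the project ecosystem
+- Abusing registration resources: maliciously registering large numbers of accounts, or registering with false information.
+- Wasting computing or device resources: maliciously occupying local or cloud device resources, holding resources idle for long periods, or running energy-intensive programs unrelated to the automation task (such as cryptocurrency mining).
+- Undermining stability: maliciously testing system performance, running unauthorized stress tests, restarting services frequently, or exploiting technical vulnerabilities or defects for profit or to harm the project or platform.
+- Violating the open-source license: violating the terms of the AutoGLM project's open-source license.
+
+Consequences of violation:
+
+If a developer or user fails to comply with the applicable laws and regulations, policies, industry standards (including but not limited to technical specifications and security standards), or the terms of the open-source project (including but not limited to the open-source license and usage notices), all resulting legal liability, financial losses, and other adverse consequences are borne solely by that developer or user.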
\ No newline at end of file
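The confirmation requirement in Part 2, Section 1 of `privacy_policy.txt` can be sketched as a small gate that runs before any action is executed. This is only an illustrative sketch, not code from this commit: `PlannedAction`, `HIGH_RISK_CATEGORIES`, `run_with_confirmation`, and the category labels are all hypothetical names, and a real integration would classify actions and prompt the user in its own way.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical labels mirroring the "6+1" high-risk categories in the policy.
HIGH_RISK_CATEGORIES = {
    "messaging",         # sending messages, emails, comments, likes, shares
    "file_management",   # file changes and permission toggles
    "orders",            # orders, addresses, coupons
    "payments",          # transfers, payments, withdrawals
    "account_security",  # passwords, account or contact deletion
    "health_legal",      # health data, medication, e-signatures
    "other_high_risk",   # anything else with major impact on the user
}


@dataclass
class PlannedAction:
    category: str     # one of the labels above, or any low-risk label
    description: str  # human-readable details shown to the user


def requires_confirmation(action: PlannedAction) -> bool:
    """Return True when the action falls into a high-risk category."""
    return action.category in HIGH_RISK_CATEGORIES


def run_with_confirmation(
    action: PlannedAction,
    execute: Callable[[PlannedAction], None],
    confirm: Callable[[str], bool],
) -> bool:
    """Ask for confirmation before executing a high-risk action.

    Returns True if the action was executed, False if the user declined.
    """
    if requires_confirmation(action) and not confirm(action.description):
        return False  # user declined: nothing is executed
    execute(action)
    return True
```

In an interactive tool, `confirm` could be as simple as `lambda text: input(f"{text} [y/N] ").strip().lower() == "y"`, while `execute` would hand the action to whatever component performs the tap, swipe, or text input.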
diff --git a/resources/setting.png b/resources/setting.png
new file mode 100644
index 0000000..0cb7af7
Binary files /dev/null and b/resources/setting.png differ
diff --git a/resources/wechat.png b/resources/wechat.png
new file mode 100644
index 0000000..2855cb3
Binary files /dev/null and b/resources/wechat.png differ
diff --git a/setup.py b/setup.py
new file mode 100644
index 0000000..c31a5c7
--- /dev/null
+++ b/setup.py
@@ -0,0 +1,49 @@
+#!/usr/bin/env python3
+"""Setup script for Phone Agent."""
+
+from setuptools import find_packages, setup
+
+with open("README.md", "r", encoding="utf-8") as f:
+    long_description = f.read()
+
+setup(
+    name="phone-agent",
+    version="0.1.0",
+    author="Your Name",
+    author_email="your.email@example.com",
+    description="AI-powered phone automation framework",
+    long_description=long_description,
+    long_description_content_type="text/markdown",
+    url="https://github.com/yourusername/phone-agent",
+    packages=find_packages(),
+    classifiers=[
+        "Development Status :: 3 - Alpha",
+        "Intended Audience :: Developers",
+        "License :: OSI Approved :: Apache Software License",
+        "Operating System :: OS Independent",
+        "Programming Language :: Python :: 3",
+        "Programming Language :: Python :: 3.10",
+        "Programming Language :: Python :: 3.11",
+        "Programming Language :: Python :: 3.12",
+        "Topic :: Software Development :: Libraries :: Python Modules",
+        "Topic :: Scientific/Engineering :: Artificial Intelligence",
+    ],
+    python_requires=">=3.10",
+    install_requires=[
+        "Pillow>=12.0.0",
+        "openai>=2.9.0",
+    ],
+    extras_require={
+        "dev": [
+            "pytest>=7.0.0",
+            "black>=23.0.0",
+            "mypy>=1.0.0",
+            "ruff>=0.1.0",
+        ],
+    },
+    entry_points={
+        "console_scripts": [
+            "phone-agent=main:main",
+        ],
+    },
+)