Open-AutoGLM/docs/VIDEO_LEARNING.md
let5sne.win10 5b3f214e20 Add Video Learning Agent for short video platforms
Features:
- VideoLearningAgent for automated video watching on Douyin/Kuaishou/TikTok
- Web dashboard UI for video learning sessions
- Real-time progress tracking with screenshot capture
- App detection using get_current_app() for accurate recording
- Session management with pause/resume/stop controls

Technical improvements:
- Simplified video detection logic using direct app detection
- Full base64 hash for sensitive screenshot change detection
- Immediate stop when target video count is reached
- Fixed circular import issues with ModelConfig

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 22:54:57 +08:00

# Video Learning Agent
An AI-powered agent that learns from short-video platforms such as Douyin (抖音), Kuaishou (快手), and TikTok.
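## Features
### MVP Features
- **Auto-swipe**: Automatically swipes from one video to the next
- **Playback control**: Play/pause control
- **Screenshot capture**: Saves a screenshot for each video
- **Data collection**: Collects each video's description, like count, and comment count
- **Visual management**: Control sessions visually through the Web Dashboard
- **Session management**: Create, pause, resume, and stop learning sessions
- **Data export**: Export learning data as JSON or CSV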
## Quick Start
### 1. Start the Dashboard
```bash
# Start with the helper script (recommended)
scripts\run_video_learning_demo.bat  # Windows (run the batch file directly, not through python)
bash scripts/run_video_learning_demo.sh  # Linux/Mac
# Or start manually
python -m uvicorn dashboard.main:app --host 0.0.0.0 --port 8080 --reload
```
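### 2. Open the Video Learning Page
Open `http://localhost:8080/static/video-learning.html` in a browser,
or click the "Video Learning" button on the main Dashboard page.
### 3. Create a Learning Session
1. Select a device
2. Select a platform (Douyin/Kuaishou/TikTok)
3. Set the target number of videos
4. (Optional) Set a category filter
5. Set the watch duration
6. Click "Start Learning"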
## Usage Examples
### Standalone
```bash
python examples/video_learning_demo.py \
--device-id emulator-5554 \
--count 10 \
--category "美食" \
--watch-duration 3.0
```
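The four flags in the command above map naturally onto an `argparse` parser. The following is only a sketch of how such a parser could look; it is not the contents of `examples/video_learning_demo.py`, and the defaults, the `required` setting, and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the four flags shown in the command above."""
    parser = argparse.ArgumentParser(description="Video learning demo (sketch)")
    # Assumption: the device id is mandatory; the real script may auto-detect it.
    parser.add_argument("--device-id", required=True, help="device serial, e.g. emulator-5554")
    parser.add_argument("--count", type=int, default=10, help="target number of videos")
    parser.add_argument("--category", default="", help="optional category filter")
    parser.add_argument("--watch-duration", type=float, default=3.0, help="seconds to watch each video")
    return parser
```

Calling `build_parser().parse_args()` in a script would expose the values as `args.device_id`, `args.count`, `args.category`, and `args.watch_duration`.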
### Via the Dashboard
1. Open the Video Learning page
2. Configure the learning parameters
3. Click "Start Learning"
4. Watch progress in real time
### API Usage
```python
from phone_agent import VideoLearningAgent
from phone_agent.model.client import ModelConfig

# Create the model configuration
model_config = ModelConfig(
    base_url="https://open.bigmodel.cn/api/paas/v4",
    model_name="autoglm-phone-9b",
    api_key="your-api-key",
)

# Create the Video Learning Agent
agent = VideoLearningAgent(
    model_config=model_config,
    platform="douyin",
    output_dir="./video_learning_data",
)

# Start a session ("美食" is the Chinese search keyword for food)
session_id = agent.start_session(
    device_id="emulator-5554",
    target_count=10,
    category="美食",
    watch_duration=3.0,
)

# Run the task
task = """
Learn from "美食" (food) videos on Douyin:
1. Open Douyin and search for "美食"
2. Watch each video for about 3 seconds
3. Record the description, like count, and comment count
4. Swipe to the next video
5. Repeat until 10 videos have been watched
"""
success = agent.run_learning_task(task)

# Export the data
agent.export_data("json")
agent.export_data("csv")
```
## API Endpoints
### Create a Session
```http
POST /api/video-learning/sessions
Content-Type: application/json
{
  "device_id": "emulator-5554",
  "platform": "douyin",
  "target_count": 10,
  "category": "",
  "watch_duration": 3.0
}
```
### Start a Session
```http
POST /api/video-learning/sessions/{session_id}/start
```
### Control a Session
```http
POST /api/video-learning/sessions/{session_id}/control
Content-Type: application/json
{
  "action": "pause"  // one of: pause, resume, stop
}
```
### Get Session Status
```http
GET /api/video-learning/sessions/{session_id}/status
```
### List Session Videos
```http
GET /api/video-learning/sessions/{session_id}/videos
```
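The endpoints above can be driven from Python with the standard library alone. The sketch below builds one request per endpoint and keeps sending separate, so the request shapes can be inspected without a running server. It assumes the dashboard is listening on `http://localhost:8080` as in the Quick Start and that responses are JSON; the helper names (`build_request`, `create_session_request`, and so on) are illustrative and not part of the project.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the dashboard runs locally on port 8080, as in the Quick Start.
BASE_URL = "http://localhost:8080"
API_PREFIX = "/api/video-learning/sessions"
VALID_ACTIONS = ("pause", "resume", "stop")


def build_request(method: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build, but do not send, a request for one of the documented endpoints."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def create_session_request(device_id: str, platform: str, target_count: int,
                           category: str = "", watch_duration: float = 3.0) -> urllib.request.Request:
    """POST /api/video-learning/sessions with the documented body fields."""
    payload = {
        "device_id": device_id,
        "platform": platform,
        "target_count": target_count,
        "category": category,
        "watch_duration": watch_duration,
    }
    return build_request("POST", API_PREFIX, payload)


def start_request(session_id: str) -> urllib.request.Request:
    return build_request("POST", f"{API_PREFIX}/{session_id}/start")


def control_request(session_id: str, action: str) -> urllib.request.Request:
    """Reject unknown actions locally instead of relying on the server."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {VALID_ACTIONS}, got {action!r}")
    return build_request("POST", f"{API_PREFIX}/{session_id}/control", {"action": action})


def status_request(session_id: str) -> urllib.request.Request:
    return build_request("GET", f"{API_PREFIX}/{session_id}/status")


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the body (assumes the API replies with JSON)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the dashboard running, `send(create_session_request("emulator-5554", "douyin", 10))` would create a session. The response schema is not documented here, so how the session id is read from the reply is left to the caller.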
## Data Structures
### VideoRecord
```python
{
    "sequence_id": 1,
    "timestamp": "2024-01-09T10:00:00",
    "screenshot_path": "./video_learning_data/screenshots/...",
    "watch_duration": 3.0,
    "description": "Video description text",
    "likes": 1000,
    "comments": 50,
    "tags": [],
    "category": "美食"
}
```
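For typed access to these fields, the record can be mirrored as a dataclass. This is a sketch built only from the field names and example values shown above; the project's own `VideoRecord` class may define different types, defaults, or extra fields, and `from_dict` is a hypothetical helper.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Dict, List


@dataclass
class VideoRecord:
    """Mirror of the documented record fields (types inferred from the example)."""
    sequence_id: int
    timestamp: str
    screenshot_path: str
    watch_duration: float
    description: str = ""
    likes: int = 0
    comments: int = 0
    tags: List[str] = field(default_factory=list)
    category: str = ""

    @classmethod
    def from_dict(cls, data: Dict) -> "VideoRecord":
        """Build a record from a dict, ignoring keys this sketch does not know."""
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in data.items() if key in known})
```

`asdict(record)` converts a record back to a plain dict, which is convenient for JSON export.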
### LearningSession
```python
{
    "session_id": "session_20240109_100000",
    "start_time": "2024-01-09T10:00:00",
    "platform": "douyin",
    "target_category": "美食",
    "target_count": 10,
    "is_active": true,
    "is_paused": false,
    "total_videos": 10,
    "total_duration": 30.0,
    "records": [...]
}
```
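The example values suggest that `total_videos` and `total_duration` are aggregates over `records` (10 videos at 3.0 seconds each gives 30.0 seconds), though the source does not state this explicitly. Under that assumption, the sketch below recomputes the totals from a list of record dicts and flattens the records to CSV. It is not the project's `export_data` implementation, and the column selection is an assumption.

```python
import csv
import io
from typing import Dict, List, Sequence

# Assumption: these are the columns of interest; the real CSV export may differ.
DEFAULT_COLUMNS = ("sequence_id", "description", "likes", "comments", "category")


def summarize(records: List[Dict]) -> Dict:
    """Recompute the session totals from the individual records."""
    return {
        "total_videos": len(records),
        "total_duration": sum(record["watch_duration"] for record in records),
    }


def records_to_csv(records: List[Dict], columns: Sequence[str] = DEFAULT_COLUMNS) -> str:
    """Serialize records to CSV text, dropping keys that are not in `columns`."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(columns), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Comparing `summarize(session["records"])` with the stored totals is a cheap consistency check after a session has been paused and resumed.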
## Configuration
Configure the following in your `.env` file:
```bash
# Output directory for video learning data
VIDEO_LEARNING_OUTPUT_DIR=./video_learning_data
# Model parameters
PHONE_AGENT_MAX_TOKENS=3000
PHONE_AGENT_TEMPERATURE=0.0
PHONE_AGENT_TOP_P=0.85
PHONE_AGENT_FREQUENCY_PENALTY=0.2
```
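These variables can be read into a typed settings object once they are present in the process environment. The sketch below assumes the values shown above double as defaults and that something else (for example a dotenv loader or the shell) has already exported the `.env` contents; the `VideoLearningSettings` class and `load_settings` function are illustrative names, not project APIs.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VideoLearningSettings:
    output_dir: str
    max_tokens: int
    temperature: float
    top_p: float
    frequency_penalty: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> VideoLearningSettings:
    """Read the documented variables, falling back to the values shown above."""
    env = os.environ if env is None else env
    return VideoLearningSettings(
        output_dir=env.get("VIDEO_LEARNING_OUTPUT_DIR", "./video_learning_data"),
        max_tokens=int(env.get("PHONE_AGENT_MAX_TOKENS", "3000")),
        temperature=float(env.get("PHONE_AGENT_TEMPERATURE", "0.0")),
        top_p=float(env.get("PHONE_AGENT_TOP_P", "0.85")),
        frequency_penalty=float(env.get("PHONE_AGENT_FREQUENCY_PENALTY", "0.2")),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to test, and a malformed number fails early with a `ValueError` rather than at the first model call.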
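## Roadmap
### Phase 2: Advanced Analysis
- [ ] Video content feature extraction
- [ ] Common element recognition
- [ ] Video style analysis
- [ ] BGM recognition
### Phase 3: Pattern Learning
- [ ] Pattern induction across similar videos
- [ ] Creative trend analysis
- [ ] Popular element statistics
- [ ] Best-practice summaries
### Phase 4: Creation Assistance
- [ ] Script generation
- [ ] Storyboard suggestions
- [ ] Shooting guidance
- [ ] Editing suggestions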
## Architecture
```
VideoLearningAgent
├── ModelConfig (VLM configuration)
├── LearningSession (session management)
│   └── VideoRecord[] (video records)
├── Callbacks (callback functions)
│   ├── on_video_watched
│   ├── on_progress_update
│   └── on_session_complete
└── PhoneAgent (low-level operations)
    ├── Visual understanding (VLM)
    ├── Device control (ADB/HDC/iOS)
    └── Task execution
```
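To illustrate how the three callbacks in the tree could fit together, here is a small stand-in loop. Only the callback names come from the diagram; their signatures, the `LearningLoopSketch` class, and the order in which they fire are assumptions, and the real `VideoLearningAgent` may wire them differently.

```python
from typing import Callable, Dict, List, Optional


class LearningLoopSketch:
    """Stand-in for the agent loop; not the real VideoLearningAgent."""

    def __init__(
        self,
        target_count: int,
        on_video_watched: Optional[Callable[[Dict], None]] = None,
        on_progress_update: Optional[Callable[[int, int], None]] = None,
        on_session_complete: Optional[Callable[[List[Dict]], None]] = None,
    ) -> None:
        self.target_count = target_count
        self.on_video_watched = on_video_watched
        self.on_progress_update = on_progress_update
        self.on_session_complete = on_session_complete

    def run(self, videos: List[Dict]) -> List[Dict]:
        """Record videos one by one and stop as soon as the target is reached."""
        records: List[Dict] = []
        for video in videos:
            records.append(video)
            if self.on_video_watched:
                self.on_video_watched(video)
            if self.on_progress_update:
                self.on_progress_update(len(records), self.target_count)
            if len(records) >= self.target_count:
                break  # do not watch more videos than requested
        if self.on_session_complete:
            self.on_session_complete(records)
        return records
```

A dashboard could pass an `on_progress_update` that pushes `(done, target)` to the browser, while `on_session_complete` triggers the export.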
## Troubleshooting
### Problem: Device not connected
- Make sure the ADB/HDC service is running
- Check that the device is connected via USB
- Try clicking the "Refresh" button
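For Android devices, the checks above can be scripted by parsing the output of `adb devices`. The sketch below assumes `adb` is on the `PATH` and covers ADB only (not HDC or iOS); `parse_adb_devices`, `ready_devices`, and `list_adb_devices` are illustrative helper names.

```python
import subprocess
from typing import Dict, List


def parse_adb_devices(output: str) -> Dict[str, str]:
    """Map each serial to its state ("device", "offline", "unauthorized", ...)."""
    devices: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, daemon start-up notices, and the header line.
        if not line or line.startswith("*") or line.startswith("List of devices"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def ready_devices(devices: Dict[str, str]) -> List[str]:
    """Return only the serials that are online and authorized."""
    return [serial for serial, state in devices.items() if state == "device"]


def list_adb_devices() -> Dict[str, str]:
    """Run `adb devices` and parse its output (requires adb to be installed)."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True, timeout=10
    )
    return parse_adb_devices(result.stdout)
```

A serial reported as `unauthorized` usually means the USB debugging prompt on the phone has not been accepted, while an empty result points to a cable or driver problem.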
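### Problem: Task fails to start
- Check the model API configuration
- Make sure the `.env` file is configured correctly
- Check the Dashboard console logs
### Problem: Video information is not collected
- Make sure the VLM model is working properly
- Check the network connection
- Increase the watch duration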
## License
MIT License