Add Video Learning Agent for short video platforms

Features:
- VideoLearningAgent for automated video watching on Douyin/Kuaishou/TikTok
- Web dashboard UI for video learning sessions
- Real-time progress tracking with screenshot capture
- App detection using get_current_app() for accurate recording
- Session management with pause/resume/stop controls

Technical improvements:
- Simplified video detection logic using direct app detection
- Full base64 hash for sensitive screenshot change detection
- Immediate stop when target video count is reached
- Fixed circular import issues with ModelConfig

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 22:54:57 +08:00
parent 3552df23d6
commit 5b3f214e20
15 changed files with 2317 additions and 1 deletions
--- a/docs/VIDEO_LEARNING.md
+++ b/docs/VIDEO_LEARNING.md
@@ -0,0 +1,253 @@
+# Video Learning Agent
+
+AI-powered agent for learning from short video platforms like Douyin (抖音), Kuaishou (快手), and TikTok.
+
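+## Features
+
+### MVP Features
+- **Automatic swiping**: swipes from one video to the next automatically
+- **Playback control**: play/pause control
+- **Screenshot capture**: saves a screenshot for each video
+- **Data collection**: collects each video's description, like count, and comment count
+- **Visual management**: control sessions visually through the web Dashboard
+- **Session management**: create, pause, resume, and stop learning sessions
+- **Data export**: export learning data (JSON/CSV)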
+
+## Quick Start
+
+### 1. Start the Dashboard
+
+```bash
+# Start with a script (recommended)
+scripts\run_video_learning_demo.bat          # Windows (run the batch file directly, not through python)
+bash scripts/run_video_learning_demo.sh      # Linux/Mac
+
+# Or start manually
+python -m uvicorn dashboard.main:app --host 0.0.0.0 --port 8080 --reload
+```
+
+### 2. Open the Video Learning page
+
+Open your browser and go to: `http://localhost:8080/static/video-learning.html`
+
+Alternatively, click the "Video Learning" button on the main Dashboard page.
+
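+### 3. Create a learning session
+
+1. Select a device
+2. Select a platform (Douyin/Kuaishou/TikTok)
+3. Set the target number of videos
+4. (Optional) Set a category filter
+5. Set the watch duration
+6. Click "Start Learning"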
+
+## Usage Examples
+
+### Standalone
+
+```bash
+python examples/video_learning_demo.py \
+    --device-id emulator-5554 \
+    --count 10 \
+    --category "美食" \
+    --watch-duration 3.0
+```
+
+### Via the Dashboard
+
+1. Open the Video Learning page
+2. Configure the learning parameters
+3. Click Start
+4. Watch the progress in real time
+
+### API Usage
+
+```python
+from phone_agent import VideoLearningAgent
+from phone_agent.model.client import ModelConfig
+
+# Create the model configuration
+model_config = ModelConfig(
+    base_url="https://open.bigmodel.cn/api/paas/v4",
+    model_name="autoglm-phone-9b",
+    api_key="your-api-key",
+)
+
+# Create the Video Learning Agent
+agent = VideoLearningAgent(
+    model_config=model_config,
+    platform="douyin",
+    output_dir="./video_learning_data",
+)
+
+# Start a session
+session_id = agent.start_session(
+    device_id="emulator-5554",
+    target_count=10,
+    category="美食",  # "美食" means "food"
+    watch_duration=3.0,
+)
+
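+# Run the task. The prompt is kept in Chinese as written; in English it reads:
+# Learn from "美食" (food) videos on Douyin: open Douyin and search for "美食",
+# watch each video for about 3 seconds, record its description, like count, and
+# comment count, swipe to the next video, and repeat until 10 videos are watched.
+task = """
+在抖音上学习"美食"类视频：
+1. 打开抖音并搜索"美食"
+2. 观看视频，每个视频约3秒
+3. 记录描述、点赞数、评论数
+4. 滑动到下一个视频
+5. 重复直到观看完10个视频
+"""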
+
+success = agent.run_learning_task(task)
+
+# Export the data
+agent.export_data("json")
+agent.export_data("csv")
+```
+
+## API Endpoints
+
+### Create a session
+```http
+POST /api/video-learning/sessions
+Content-Type: application/json
+
+{
+    "device_id": "emulator-5554",
+    "platform": "douyin",
+    "target_count": 10,
+    "category": "美食",
+    "watch_duration": 3.0
+}
+```
+
+### Start a session
+```http
+POST /api/video-learning/sessions/{session_id}/start
+```
+
+### Control a session
+```http
+POST /api/video-learning/sessions/{session_id}/control
+Content-Type: application/json
+
+{
+    "action": "pause"  // pause, resume, stop
+}
+```
+
+### Get session status
+```http
+GET /api/video-learning/sessions/{session_id}/status
+```
+
+### List a session's videos
+```http
+GET /api/video-learning/sessions/{session_id}/videos
+```
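
As a rough illustration of how the endpoints above could be called from a script, the following sketch builds the requests with Python's standard `urllib.request`. The base URL is taken from the Quick Start port; the helper names (`build_request`, `create_session_request`, `control_request`, `status_request`, `send`) are hypothetical, and the assumption that responses are JSON is not confirmed by this document.

```python
import json
import urllib.request

# Assumption: the Dashboard runs locally on the port used in Quick Start.
BASE_URL = "http://localhost:8080"
API_PREFIX = "/api/video-learning/sessions"
VALID_ACTIONS = {"pause", "resume", "stop"}


def build_request(method, path, payload=None):
    """Build a urllib Request; a payload is sent as a JSON body."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def create_session_request(device_id, platform, target_count,
                           category=None, watch_duration=3.0):
    """Request for POST /api/video-learning/sessions."""
    payload = {
        "device_id": device_id,
        "platform": platform,
        "target_count": target_count,
        "watch_duration": watch_duration,
    }
    if category is not None:  # the category filter is optional
        payload["category"] = category
    return build_request("POST", API_PREFIX, payload)


def control_request(session_id, action):
    """Request for POST .../{session_id}/control with pause, resume, or stop."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return build_request(
        "POST", f"{API_PREFIX}/{session_id}/control", {"action": action}
    )


def status_request(session_id):
    """Request for GET .../{session_id}/status."""
    return build_request("GET", f"{API_PREFIX}/{session_id}/status")


def send(request):
    """Send a request and decode the body, assuming the server replies with JSON."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the Dashboard running, `send(create_session_request("emulator-5554", "douyin", 10, category="美食"))` would submit the same body shown under "Create a session"; the shape of the reply depends on the server.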
+
+## Data Structures
+
+### VideoRecord
+```python
+{
+    "sequence_id": 1,
+    "timestamp": "2024-01-09T10:00:00",
+    "screenshot_path": "./video_learning_data/screenshots/...",
+    "watch_duration": 3.0,
+    "description": "Video description text",
+    "likes": 1000,
+    "comments": 50,
+    "tags": [],
+    "category": "美食"
+}
+```
+
+### LearningSession
+```python
+{
+    "session_id": "session_20240109_100000",
+    "start_time": "2024-01-09T10:00:00",
+    "platform": "douyin",
+    "target_category": "美食",
+    "target_count": 10,
+    "is_active": True,
+    "is_paused": False,
+    "total_videos": 10,
+    "total_duration": 30.0,
+    "records": [...]
+}
+```
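
The two shapes above can be modeled with dataclasses. This is a minimal sketch that mirrors the documented field names only; the classes actually shipped in `phone_agent` may be defined differently, and deriving `total_videos` and `total_duration` from the records is an assumption made here for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class VideoRecord:
    sequence_id: int
    timestamp: str            # ISO 8601 string, as in the example above
    screenshot_path: str
    watch_duration: float     # seconds
    description: str = ""
    likes: int = 0
    comments: int = 0
    tags: List[str] = field(default_factory=list)
    category: Optional[str] = None


@dataclass
class LearningSession:
    session_id: str
    start_time: str
    platform: str
    target_category: Optional[str]
    target_count: int
    is_active: bool = True
    is_paused: bool = False
    records: List[VideoRecord] = field(default_factory=list)

    @property
    def total_videos(self) -> int:
        # Assumption: the total is simply the number of stored records.
        return len(self.records)

    @property
    def total_duration(self) -> float:
        # Assumption: the total is the sum of per-video watch durations.
        return sum(record.watch_duration for record in self.records)

    def to_dict(self) -> dict:
        """Return a plain dict, including the two derived totals."""
        data = asdict(self)
        data["total_videos"] = self.total_videos
        data["total_duration"] = self.total_duration
        return data

    def to_json(self) -> str:
        # ensure_ascii=False keeps Chinese text such as "美食" readable.
        return json.dumps(self.to_dict(), ensure_ascii=False, indent=2)
```

Keeping the totals as derived properties avoids them drifting out of sync with `records`; a stored-field design would work equally well if the totals must survive without the record list.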
+
+## Configuration
+
+Set these options in the `.env` file:
+
+```bash
+# Output directory for video learning data
+VIDEO_LEARNING_OUTPUT_DIR=./video_learning_data
+
+# Model parameters
+PHONE_AGENT_MAX_TOKENS=3000
+PHONE_AGENT_TEMPERATURE=0.0
+PHONE_AGENT_TOP_P=0.85
+PHONE_AGENT_FREQUENCY_PENALTY=0.2
+```
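
One way such settings might be read in Python is sketched below. It assumes the variables have already been loaded into the process environment (for example by a dotenv loader) and uses the values listed above as fallback defaults; `VideoLearningSettings` and `load_settings` are hypothetical names, not part of the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VideoLearningSettings:
    output_dir: str
    max_tokens: int
    temperature: float
    top_p: float
    frequency_penalty: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> VideoLearningSettings:
    """Read the documented variables, falling back to the values shown above."""
    env = os.environ if env is None else env
    return VideoLearningSettings(
        output_dir=env.get("VIDEO_LEARNING_OUTPUT_DIR", "./video_learning_data"),
        max_tokens=int(env.get("PHONE_AGENT_MAX_TOKENS", "3000")),
        temperature=float(env.get("PHONE_AGENT_TEMPERATURE", "0.0")),
        top_p=float(env.get("PHONE_AGENT_TOP_P", "0.85")),
        frequency_penalty=float(env.get("PHONE_AGENT_FREQUENCY_PENALTY", "0.2")),
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise in tests without touching the real environment.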
+
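+## Roadmap
+
+### Phase 2: Advanced analysis
+- [ ] Video content feature extraction
+- [ ] Common element recognition
+- [ ] Video style analysis
+- [ ] BGM recognition
+
+### Phase 3: Pattern learning
+- [ ] Pattern summarization across similar videos
+- [ ] Creative trend analysis
+- [ ] Popular element statistics
+- [ ] Best-practice summaries
+
+### Phase 4: Creation assistance
+- [ ] Script generation
+- [ ] Storyboard suggestions
+- [ ] Shooting guidance
+- [ ] Editing suggestions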
+
+## Architecture
+
+```
+VideoLearningAgent
+├── ModelConfig (VLM configuration)
+├── LearningSession (session management)
+│   └── VideoRecord[] (video records)
+├── Callbacks (callback functions)
+│   ├── on_video_watched
+│   ├── on_progress_update
+│   └── on_session_complete
+└── PhoneAgent (low-level operations)
+    ├── Visual understanding (VLM)
+    ├── Device control (ADB/HDC/iOS)
+    └── Task execution
+```
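
To make the callback layer in the diagram more concrete, here is a small sketch of how the three hooks might be fired as videos are recorded, stopping as soon as the target count is reached. Only the callback names come from the diagram; their signatures, the `Callbacks` container, and `process_records` are assumptions for illustration and may not match the real agent.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Iterable, Optional


@dataclass
class Callbacks:
    # Hypothetical signatures; the real agent may pass different arguments.
    on_video_watched: Optional[Callable[[dict], None]] = None
    on_progress_update: Optional[Callable[[int, int], None]] = None
    on_session_complete: Optional[Callable[[int], None]] = None


def process_records(records: Iterable[dict], target_count: int,
                    callbacks: Callbacks) -> int:
    """Fire callbacks per record and stop once target_count records are handled."""
    watched = 0
    # islice never pulls more than target_count items from the source.
    for watched, record in enumerate(
        islice(records, max(target_count, 0)), start=1
    ):
        if callbacks.on_video_watched:
            callbacks.on_video_watched(record)
        if callbacks.on_progress_update:
            callbacks.on_progress_update(watched, target_count)
    if callbacks.on_session_complete:
        callbacks.on_session_complete(watched)
    return watched
```

A Dashboard could, for instance, pass an `on_progress_update` that pushes `(watched, target_count)` to the browser, while a standalone script might only set `on_session_complete`.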
+
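+## Troubleshooting
+
+### Problem: Device not connected
+- Make sure the ADB/HDC service is running
+- Check that the device is connected over USB
+- Try clicking the "Refresh" button
+
+### Problem: Task fails to start
+- Check the model API configuration
+- Make sure the `.env` file is configured correctly
+- Check the Dashboard console logs
+
+### Problem: Video information not collected
+- Make sure the VLM model is working properly
+- Check the network connection
+- Increase the watch duration
+
+## License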
+
+MIT License