# Video Learning Agent

AI-powered agent for learning from short video platforms like Douyin (抖音), Kuaishou (快手), and TikTok.

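## Features

### MVP Features

- **Auto-swipe**: Swipes between videos automatically
- **Playback control**: Play/pause control
- **Screenshot capture**: Saves a screenshot for each video
- **Data collection**: Collects each video's description, like count, and comment count
- **Visual management**: Visual control through the Web Dashboard
- **Session management**: Create, pause, resume, and stop learning sessions
- **Data export**: Exports learning data (JSON/CSV)
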
## Quick Start

### 1. Start the Dashboard

```bash
# Start with the script (recommended)
# Windows (run the batch file directly; it is not a Python script):
scripts\run_video_learning_demo.bat
# Linux/Mac:
bash scripts/run_video_learning_demo.sh

# Or start manually
python -m uvicorn dashboard.main:app --host 0.0.0.0 --port 8080 --reload
```

### 2. Open the Video Learning Page

Open `http://localhost:8080/static/video-learning.html` in a browser.

Alternatively, click the "Video Learning" button on the main Dashboard page.

### 3. Create a Learning Session

1. Select a device
2. Select a platform (Douyin/Kuaishou/TikTok)
3. Set the target number of videos
4. (Optional) Set a category filter
5. Set the watch duration
6. Click "Start Learning"

## Usage Examples

### Standalone

```bash
# "美食" ("food") is the category keyword searched on the platform
python examples/video_learning_demo.py \
    --device-id emulator-5554 \
    --count 10 \
    --category "美食" \
    --watch-duration 3.0
```

### Via the Dashboard

1. Open the Video Learning page
2. Configure the learning parameters
3. Click start
4. Watch the progress in real time

### API Usage

```python
from phone_agent import VideoLearningAgent
from phone_agent.model.client import ModelConfig

# Create the model configuration
model_config = ModelConfig(
    base_url="https://open.bigmodel.cn/api/paas/v4",
    model_name="autoglm-phone-9b",
    api_key="your-api-key",
)

# Create the Video Learning Agent
agent = VideoLearningAgent(
    model_config=model_config,
    platform="douyin",
    output_dir="./video_learning_data",
)

# Start a session
session_id = agent.start_session(
    device_id="emulator-5554",
    target_count=10,
    category="美食",  # category keyword ("food")
    watch_duration=3.0,
)

# Run the task
task = """
Learn from "美食" (food) videos on Douyin:
1. Open Douyin and search for "美食"
2. Watch videos, about 3 seconds each
3. Record the description, like count, and comment count
4. Swipe to the next video
5. Repeat until 10 videos have been watched
"""

success = agent.run_learning_task(task)

# Export the data
agent.export_data("json")
agent.export_data("csv")
```

## API Endpoints

### Create a Session
```http
POST /api/video-learning/sessions
Content-Type: application/json

{
  "device_id": "emulator-5554",
  "platform": "douyin",
  "target_count": 10,
  "category": "美食",
  "watch_duration": 3.0
}
```

### Start a Session
```http
POST /api/video-learning/sessions/{session_id}/start
```

### Control a Session
```http
POST /api/video-learning/sessions/{session_id}/control
Content-Type: application/json

{
  "action": "pause"
}
```

`action` is one of `pause`, `resume`, or `stop`.

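The three control actions can be read as transitions over the `is_active` and `is_paused` flags that appear in `LearningSession`. The following is a minimal sketch of that reading, not the dashboard's actual logic: `SessionState`, `apply_action`, and the transition rules (a stopped session ignores further actions) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    """Hypothetical mirror of the session's is_active / is_paused flags."""
    is_active: bool = True
    is_paused: bool = False


def apply_action(state: SessionState, action: str) -> SessionState:
    """Return the state that results from a control action.

    Assumed rules: a stopped session ignores further actions, and
    pause/resume only toggle is_paused while the session is active.
    """
    if action not in ("pause", "resume", "stop"):
        raise ValueError(f"unknown action: {action!r}")
    if not state.is_active:
        return state  # already stopped; nothing to change
    if action == "pause":
        return SessionState(is_active=True, is_paused=True)
    if action == "resume":
        return SessionState(is_active=True, is_paused=False)
    return SessionState(is_active=False, is_paused=False)  # stop
```
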
### Get Session Status
```http
GET /api/video-learning/sessions/{session_id}/status
```

### List Session Videos
```http
GET /api/video-learning/sessions/{session_id}/videos
```

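The endpoints above can be called from any HTTP client. Here is a minimal sketch using only the Python standard library. The base URL assumes the dashboard is running on port 8080 as in the quick start, the helper names are hypothetical, and the response body is assumed to be JSON since its shape is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: port from the quick start


def build_request(method, path, payload=None):
    """Build (but do not send) a request for a video-learning endpoint."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def create_session_request(device_id, platform, target_count,
                           category=None, watch_duration=3.0):
    """POST /api/video-learning/sessions with the documented fields."""
    payload = {
        "device_id": device_id,
        "platform": platform,
        "target_count": target_count,
        "watch_duration": watch_duration,
    }
    if category is not None:  # the category filter is optional
        payload["category"] = category
    return build_request("POST", "/api/video-learning/sessions", payload)


def control_request(session_id, action):
    """POST .../control with action set to pause, resume, or stop."""
    path = f"/api/video-learning/sessions/{session_id}/control"
    return build_request("POST", path, {"action": action})


def status_request(session_id):
    """GET .../status for a session."""
    return build_request("GET", f"/api/video-learning/sessions/{session_id}/status")


def send(request):
    """Send a request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the URL and payload easy to inspect before anything touches the network, for example `send(status_request(session_id))` once a session exists.
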
## Data Structures

### VideoRecord
```python
{
    "sequence_id": 1,
    "timestamp": "2024-01-09T10:00:00",
    "screenshot_path": "./video_learning_data/screenshots/...",
    "watch_duration": 3.0,
    "description": "Video description text",
    "likes": 1000,
    "comments": 50,
    "tags": [],
    "category": "美食"
}
```

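For typed access to these fields, the record can be modeled as a dataclass. This is a sketch that mirrors the field names shown above; the default values and the `to_dict` helper are assumptions, and the class in the actual package may be defined differently.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class VideoRecord:
    """Illustrative mirror of the VideoRecord fields; defaults are assumed."""
    sequence_id: int
    timestamp: str
    screenshot_path: str
    watch_duration: float
    description: str = ""
    likes: int = 0
    comments: int = 0
    tags: List[str] = field(default_factory=list)
    category: str = ""

    def to_dict(self) -> dict:
        """Convert to a plain dict suitable for JSON serialization."""
        return asdict(self)


record = VideoRecord(
    sequence_id=1,
    timestamp="2024-01-09T10:00:00",
    screenshot_path="screenshots/video_001.png",  # hypothetical path
    watch_duration=3.0,
    likes=1000,
    comments=50,
    category="美食",
)
```
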
### LearningSession
```python
{
    "session_id": "session_20240109_100000",
    "start_time": "2024-01-09T10:00:00",
    "platform": "douyin",
    "target_category": "美食",
    "target_count": 10,
    "is_active": True,
    "is_paused": False,
    "total_videos": 10,
    "total_duration": 30.0,
    "records": [...]
}
```

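The `records` list is what a JSON or CSV export would carry. The sketch below shows one way to serialize a list of record dicts with the standard library. It is not the implementation behind `export_data`; the column order, the `;` separator used to flatten `tags`, and the function names are assumptions.

```python
import csv
import io
import json

# Column order follows the VideoRecord fields shown above.
FIELDS = ["sequence_id", "timestamp", "screenshot_path", "watch_duration",
          "description", "likes", "comments", "tags", "category"]


def records_to_json(records):
    """Serialize records as pretty-printed JSON, keeping non-ASCII text readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def records_to_csv(records):
    """Serialize records as CSV text; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        row = dict(record)
        # CSV cells are flat, so join the tag list into one string.
        row["tags"] = ";".join(record.get("tags", []))
        writer.writerow(row)
    return buffer.getvalue()
```
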
## Configuration Options

Configure these in the `.env` file:

```bash
# Output directory for video learning data
VIDEO_LEARNING_OUTPUT_DIR=./video_learning_data

# Model parameters
PHONE_AGENT_MAX_TOKENS=3000
PHONE_AGENT_TEMPERATURE=0.0
PHONE_AGENT_TOP_P=0.85
PHONE_AGENT_FREQUENCY_PENALTY=0.2
```

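As an illustration of how these variables might be consumed, the sketch below reads them from the process environment and converts them to typed values. It assumes the `.env` file has already been loaded into the environment by whatever starts the application; the `VideoLearningSettings` class, the `load_settings` function, and the use of the sample values as fallbacks are all hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass
class VideoLearningSettings:
    """Hypothetical typed view of the variables listed above."""
    output_dir: str
    max_tokens: int
    temperature: float
    top_p: float
    frequency_penalty: float


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default).

    The fallback values are the sample values from the .env snippet,
    used here as assumed defaults.
    """
    env = os.environ if env is None else env
    return VideoLearningSettings(
        output_dir=env.get("VIDEO_LEARNING_OUTPUT_DIR", "./video_learning_data"),
        max_tokens=int(env.get("PHONE_AGENT_MAX_TOKENS", "3000")),
        temperature=float(env.get("PHONE_AGENT_TEMPERATURE", "0.0")),
        top_p=float(env.get("PHONE_AGENT_TOP_P", "0.85")),
        frequency_penalty=float(env.get("PHONE_AGENT_FREQUENCY_PENALTY", "0.2")),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check in isolation, for example `load_settings({"PHONE_AGENT_MAX_TOKENS": "512"})`.
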
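## Roadmap

### Phase 2: Advanced Analysis
- [ ] Video content feature extraction
- [ ] Common element recognition
- [ ] Video style analysis
- [ ] BGM recognition

### Phase 3: Pattern Learning
- [ ] Pattern induction across videos of the same category
- [ ] Creative trend analysis
- [ ] Popular element statistics
- [ ] Best practice summaries

### Phase 4: Creation Assistance
- [ ] Script generation
- [ ] Storyboard suggestions
- [ ] Shooting guidance
- [ ] Editing suggestions
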
## Architecture

```
VideoLearningAgent
├── ModelConfig (VLM configuration)
├── LearningSession (session management)
│   └── VideoRecord[] (video records)
├── Callbacks (callback functions)
│   ├── on_video_watched
│   ├── on_progress_update
│   └── on_session_complete
└── PhoneAgent (low-level operations)
    ├── Visual understanding (VLM)
    ├── Device control (ADB/HDC/iOS)
    └── Task execution
```

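The callback layer in the tree above follows a familiar observer pattern. The sketch below shows that pattern in isolation. Only the three event names come from the diagram; the `CallbackRegistry` class, its methods, and the arguments passed to handlers are hypothetical, and the agent's real registration API may look different.

```python
from typing import Callable, Dict, List


class CallbackRegistry:
    """Hypothetical registry for the three events named in the diagram."""

    EVENTS = ("on_video_watched", "on_progress_update", "on_session_complete")

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable]] = {name: [] for name in self.EVENTS}

    def register(self, event: str, handler: Callable) -> None:
        """Attach a handler to one of the known events."""
        if event not in self._handlers:
            raise KeyError(f"unknown event: {event!r}")
        self._handlers[event].append(handler)

    def emit(self, event: str, *args) -> None:
        """Call every handler registered for the event, in registration order."""
        for handler in self._handlers[event]:
            handler(*args)


registry = CallbackRegistry()
watched = []
# Assumed payload: the record dict for the video that was just watched.
registry.register("on_video_watched", watched.append)
registry.emit("on_video_watched", {"sequence_id": 1, "likes": 1000})
```
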
## Troubleshooting

### Problem: Device not connected
- Make sure the ADB/HDC service is running
- Check that the device is connected via USB
- Try clicking the "Refresh" button

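For Android devices, the checks above can be scripted by parsing the output of `adb devices`. This is a minimal sketch that assumes `adb` is on the `PATH`; it covers ADB only, not HDC or iOS, and the function names are hypothetical.

```python
import subprocess


def parse_adb_devices(output):
    """Map each device serial to its state from `adb devices` output."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, daemon notices, and the header line.
        if not line or line.startswith("*") or line.startswith("List of devices"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def list_adb_devices():
    """Run `adb devices` and return the parsed serial-to-state mapping."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True, timeout=10
    )
    return parse_adb_devices(result.stdout)
```

A serial whose state is `device` is ready to use. A state of `unauthorized` usually means the USB debugging prompt on the phone has not been accepted, and `offline` means the device is not responding.
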
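### Problem: Task does not start
- Check the model API configuration
- Make sure the `.env` file is configured correctly
- Check the Dashboard console logs

### Problem: Video information not collected
- Make sure the VLM model is working properly
- Check the network connection
- Increase the watch duration
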
## License

MIT License