open-autoglm-phone-agent by aradotso/trending-skills
npx skills add https://github.com/aradotso/trending-skills --skill open-autoglm-phone-agent
Skill by ara.so — Daily 2026 Skills collection.
Open-AutoGLM is an open-source AI phone agent framework that enables natural language control of Android, HarmonyOS NEXT, and iOS devices. It uses the AutoGLM vision-language model (9B parameters) to perceive screen content and execute multi-step tasks like "open Meituan and search for nearby hot pot restaurants."
User Natural Language → AutoGLM VLM → Screen Perception → ADB/HDC/WebDriverAgent → Device Actions
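The pipeline above is, at its core, a perceive→decide→act loop. A minimal sketch follows; the screenshot, query_model, and execute callables are hypothetical stand-ins (injected so the loop can run without a device), not the framework's actual API:

```python
def agent_loop(task, screenshot, query_model, execute, max_steps=20):
    """Drive the device until the model emits a Finish action."""
    history = []
    for _ in range(max_steps):
        screen = screenshot()                        # perceive current UI state
        action = query_model(task, screen, history)  # VLM chooses the next action
        if action["action"] == "Finish":
            return action.get("result", "")
        execute(action)                              # act via ADB/HDC/WebDriverAgent
        history.append(action)
    return "max steps reached"

# Stubbed demo: the "model" launches an app, then finishes.
script = iter([{"action": "Launch", "app": "美团"},
               {"action": "Finish", "result": "done"}])
result = agent_loop("find hotpot nearby",
                    screenshot=lambda: b"",
                    query_model=lambda t, s, h: next(script),
                    execute=lambda a: None)
print(result)  # done
```

The real loop additionally feeds each screenshot back into the model as an image, which is why the deployment flags below cap image size and count.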
git clone https://github.com/zai-org/Open-AutoGLM.git
cd Open-AutoGLM
pip install -r requirements.txt
pip install -e .
# Android
adb devices
# Expected: emulator-5554 device
# HarmonyOS NEXT
hdc list targets
# Expected: 7001005458323933328a01bce01c2500
BigModel (ZhipuAI)
export BIGMODEL_API_KEY="your-bigmodel-api-key"
python main.py \
--base-url https://open.bigmodel.cn/api/paas/v4 \
--model "autoglm-phone" \
--apikey $BIGMODEL_API_KEY \
"打开美团搜索附近的火锅店"
ModelScope
export MODELSCOPE_API_KEY="your-modelscope-api-key"
python main.py \
--base-url https://api-inference.modelscope.cn/v1 \
--model "ZhipuAI/AutoGLM-Phone-9B" \
--apikey $MODELSCOPE_API_KEY \
"open Meituan and find nearby hotpot"
# Install vLLM (or use official Docker: docker pull vllm/vllm-openai:v0.12.0)
pip install vllm
# Start model server (strictly follow these parameters)
python3 -m vllm.entrypoints.openai.api_server \
--served-model-name autoglm-phone-9b \
--allowed-local-media-path / \
--mm-encoder-tp-mode data \
--mm_processor_cache_type shm \
--mm_processor_kwargs '{"max_pixels":5000000}' \
--max-model-len 25480 \
--chat-template-content-format string \
--limit-mm-per-prompt '{"image":10}' \
--model zai-org/AutoGLM-Phone-9B \
--port 8000
# Install SGLang or use: docker pull lmsysorg/sglang:v0.5.6.post1
# Inside container: pip install nvidia-cudnn-cu12==9.16.0.29
python3 -m sglang.launch_server \
--model-path zai-org/AutoGLM-Phone-9B \
--served-model-name autoglm-phone-9b \
--context-length 25480 \
--mm-enable-dp-encoder \
--mm-process-config '{"image":{"max_pixels":5000000}}' \
--port 8000
python scripts/check_deployment_cn.py \
--base-url http://localhost:8000/v1 \
--model autoglm-phone-9b
Expected output includes a <think>...</think> block followed by <answer>do(action="Launch", app="..."). If the chain-of-thought is very short or garbled, the model deployment has failed.
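That sanity check can also be scripted. The helper below is hypothetical (not part of the repo); it just mirrors what the deployment check looks for — a non-trivial think block followed by a do() call:

```python
import re

def looks_healthy(model_output: str) -> bool:
    """True when the reply has a non-trivial <think> block and an <answer> do() call."""
    think = re.search(r"<think>(.*?)</think>", model_output, re.DOTALL)
    answer = re.search(r"<answer>\s*do\(action=", model_output)
    # A near-empty chain-of-thought usually means a misconfigured deployment.
    return bool(think and answer and len(think.group(1).strip()) > 10)

ok = looks_healthy('<think>用户想打开美团,先在桌面找到美团图标</think>\n<answer>do(action="Launch", app="美团")')
print(ok)  # True
```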
# Android device (default)
python main.py \
--base-url http://localhost:8000/v1 \
--model autoglm-phone-9b \
"打开小红书搜索美食"
# HarmonyOS device
python main.py \
--base-url http://localhost:8000/v1 \
--model autoglm-phone-9b \
--device-type hdc \
"打开设置查看WiFi"
# Multilingual model for English apps
python main.py \
--base-url http://localhost:8000/v1 \
--model autoglm-phone-9b-multilingual \
"Open Instagram and search for travel photos"
| Parameter | Description | Default |
|---|---|---|
--base-url | Model service endpoint | Required |
--model | Model name on server | Required |
--apikey | API key for third-party services | None |
--device-type | adb (Android) or hdc (HarmonyOS) | adb
--device-id | Specific device serial number | Auto-detect
from phone_agent import PhoneAgent
from phone_agent.config import AgentConfig
config = AgentConfig(
base_url="http://localhost:8000/v1",
model="autoglm-phone-9b",
device_type="adb", # or "hdc" for HarmonyOS
)
agent = PhoneAgent(config)
# Run a task
result = agent.run("打开淘宝搜索蓝牙耳机")
print(result)
from phone_agent import PhoneAgent
from phone_agent.config import AgentConfig
import os
config = AgentConfig(
base_url=os.environ["MODEL_BASE_URL"],
model=os.environ["MODEL_NAME"],
apikey=os.environ.get("MODEL_API_KEY"),
device_type="adb",
device_id="emulator-5554", # specific device
)
agent = PhoneAgent(config)
# Task with sensitive operation confirmation
result = agent.run(
"在京东购买最便宜的蓝牙耳机",
confirm_sensitive=True # prompt user before purchase actions
)
import openai
import base64
import os
from pathlib import Path
client = openai.OpenAI(
base_url=os.environ["MODEL_BASE_URL"],
api_key=os.environ.get("MODEL_API_KEY", "dummy"),
)
# Load screenshot
screenshot_path = "screenshot.png"
with open(screenshot_path, "rb") as f:
image_b64 = base64.b64encode(f.read()).decode()
response = client.chat.completions.create(
model="autoglm-phone-9b",
messages=[
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {"url": f"data:image/png;base64,{image_b64}"},
},
{
"type": "text",
"text": "Task: 搜索附近的咖啡店\nCurrent step: Navigate to search",
},
],
}
],
)
print(response.choices[0].message.content)
# Output format: <think>...</think>\n<answer>do(action="...", ...)
import re
def parse_action(model_output: str) -> dict:
"""Parse AutoGLM model output into structured action."""
# Extract answer block
answer_match = re.search(r'<answer>(.*?)(?:</answer>|$)', model_output, re.DOTALL)
if not answer_match:
return {"action": "unknown"}
answer = answer_match.group(1).strip()
# Parse do() call
# Format: do(action="ActionName", param1="value1", param2="value2")
action_match = re.search(r'do\(action="([^"]+)"(.*?)\)', answer, re.DOTALL)
if not action_match:
return {"action": "unknown", "raw": answer}
action_name = action_match.group(1)
params_str = action_match.group(2)
# Parse parameters
params = {}
for param_match in re.finditer(r'(\w+)="([^"]*)"', params_str):
params[param_match.group(1)] = param_match.group(2)
return {"action": action_name, **params}
# Example usage
output = '<think>需要启动京东</think>\n<answer>do(action="Launch", app="京东")'
action = parse_action(output)
# {"action": "Launch", "app": "京东"}
import subprocess
def take_screenshot(device_id: str = None) -> bytes:
"""Capture current device screen."""
cmd = ["adb"]
if device_id:
cmd.extend(["-s", device_id])
cmd.extend(["exec-out", "screencap", "-p"])
result = subprocess.run(cmd, capture_output=True)
return result.stdout
def send_tap(x: int, y: int, device_id: str = None):
"""Tap at screen coordinates."""
cmd = ["adb"]
if device_id:
cmd.extend(["-s", device_id])
cmd.extend(["shell", "input", "tap", str(x), str(y)])
subprocess.run(cmd)
def send_text_adb_keyboard(text: str, device_id: str = None):
"""Send text via ADB Keyboard (must be installed and enabled)."""
cmd = ["adb"]
if device_id:
cmd.extend(["-s", device_id])
# Enable ADB keyboard first
cmd_enable = cmd + ["shell", "ime", "set", "com.android.adbkeyboard/.AdbIME"]
subprocess.run(cmd_enable)
# Send text
cmd_text = cmd + ["shell", "am", "broadcast", "-a", "ADB_INPUT_TEXT",
"--es", "msg", text]
subprocess.run(cmd_text)
def swipe(x1: int, y1: int, x2: int, y2: int, duration_ms: int = 300, device_id: str = None):
"""Swipe gesture on screen."""
cmd = ["adb"]
if device_id:
cmd.extend(["-s", device_id])
cmd.extend(["shell", "input", "swipe",
str(x1), str(y1), str(x2), str(y2), str(duration_ms)])
subprocess.run(cmd)
def press_back(device_id: str = None):
"""Press Android back button."""
cmd = ["adb"]
if device_id:
cmd.extend(["-s", device_id])
cmd.extend(["shell", "input", "keyevent", "KEYCODE_BACK"])
subprocess.run(cmd)
def launch_app(package_name: str, device_id: str = None):
"""Launch app by package name."""
cmd = ["adb"]
if device_id:
cmd.extend(["-s", device_id])
cmd.extend(["shell", "monkey", "-p", package_name, "-c",
"android.intent.category.LAUNCHER", "1"])
subprocess.run(cmd)
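To connect the parser with the helpers above, a pure function mapping a parsed action dict to the adb argv it implies keeps the dispatch logic testable without a device. This is a sketch: the pixel-coordinate Tap and the Swipe geometry are illustrative assumptions, not the framework's actual dispatch table:

```python
def action_to_adb_args(action: dict, device_id: str = None) -> list:
    """Map a parsed AutoGLM action to the adb command the helpers above would run."""
    base = ["adb"] + (["-s", device_id] if device_id else [])
    name = action.get("action")
    if name == "Tap":  # assumes the model emitted pixel x/y parameters
        return base + ["shell", "input", "tap", action["x"], action["y"]]
    if name == "Swipe":  # scroll up on an assumed 1080x2400 screen
        return base + ["shell", "input", "swipe", "540", "1600", "540", "800", "300"]
    if name == "Back":
        return base + ["shell", "input", "keyevent", "KEYCODE_BACK"]
    if name == "Home":
        return base + ["shell", "input", "keyevent", "KEYCODE_HOME"]
    raise ValueError(f"unhandled action: {name}")

print(action_to_adb_args({"action": "Back"}, "emulator-5554"))
# ['adb', '-s', 'emulator-5554', 'shell', 'input', 'keyevent', 'KEYCODE_BACK']
```

Keeping the mapping pure (no subprocess call inside) makes it easy to unit-test the agent's action handling against recorded model outputs.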
For JavaScript/TypeScript automation using AutoGLM:
// .env configuration
// MIDSCENE_MODEL_NAME=autoglm-phone
// MIDSCENE_OPENAI_BASE_URL=https://open.bigmodel.cn/api/paas/v4
// MIDSCENE_OPENAI_API_KEY=your-api-key
import { AndroidAgent } from "@midscene/android";
const agent = new AndroidAgent();
await agent.aiAction("打开微信发送消息给张三");
await agent.aiQuery("当前页面显示的消息内容是什么?");
# Connect device via USB first, then enable TCP/IP mode
adb tcpip 5555
# Get device IP address
adb shell ip addr show wlan0
# Connect wirelessly (disconnect USB after this)
adb connect 192.168.1.100:5555
# Verify connection
adb devices
# 192.168.1.100:5555 device
# Use with agent
python main.py \
--base-url http://model-server:8000/v1 \
--model autoglm-phone-9b \
--device-id "192.168.1.100:5555" \
"打开支付宝查看余额"
The AutoGLM model outputs structured actions:
| Action | Description | Example |
|---|---|---|
Launch | Open an app | do(action="Launch", app="微信") |
Tap | Tap screen element | do(action="Tap", element="搜索框") |
Type | Input text | do(action="Type", text="火锅") |
Swipe | Scroll/swipe | do(action="Swipe", direction="up")
Back | Press back button | do(action="Back")
Home | Go to home screen | do(action="Home")
Finish | Task complete | do(action="Finish", result="已完成搜索")
| Model | Use Case | Languages |
|---|---|---|
AutoGLM-Phone-9B | Chinese apps (WeChat, Taobao, Meituan) | Chinese-optimized |
AutoGLM-Phone-9B-Multilingual | International apps, mixed content | Chinese + English + others |
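Given the split above, a small helper can route tasks to the right variant by detecting CJK characters in the instruction. This is a hypothetical convenience, not part of the framework:

```python
def pick_model(task: str) -> str:
    """Route Chinese-language tasks to the Chinese-optimized model."""
    if any("\u4e00" <= ch <= "\u9fff" for ch in task):
        return "autoglm-phone-9b"
    return "autoglm-phone-9b-multilingual"

print(pick_model("打开美团搜索火锅"))   # autoglm-phone-9b
print(pick_model("Open Instagram"))     # autoglm-phone-9b-multilingual
```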
HuggingFace: zai-org/AutoGLM-Phone-9B / zai-org/AutoGLM-Phone-9B-Multilingual
ModelScope: ZhipuAI/AutoGLM-Phone-9B / ZhipuAI/AutoGLM-Phone-9B-Multilingual
# Model service
export MODEL_BASE_URL="http://localhost:8000/v1"
export MODEL_NAME="autoglm-phone-9b"
export MODEL_API_KEY="" # Required for BigModel/ModelScope APIs
# BigModel API
export BIGMODEL_API_KEY=""
export BIGMODEL_BASE_URL="https://open.bigmodel.cn/api/paas/v4"
# ModelScope API
export MODELSCOPE_API_KEY=""
export MODELSCOPE_BASE_URL="https://api-inference.modelscope.cn/v1"
# Device configuration
export ADB_DEVICE_ID="" # Leave empty for auto-detect
export HDC_DEVICE_ID="" # HarmonyOS device ID
Cause: incorrect vLLM/SGLang startup parameters. Fix: ensure --chat-template-content-format string is set (vLLM) and max_pixels is 5000000 (--mm_processor_kwargs in vLLM, --mm-process-config in SGLang). Check transformers version compatibility.
adb devices shows no devices. Fix:
adb kill-server && adb start-server
Fix: ADB Keyboard must be installed and enabled:
adb shell ime enable com.android.adbkeyboard/.AdbIME
adb shell ime set com.android.adbkeyboard/.AdbIME
Cause: the model cannot find a path to complete the task. Fix: the framework includes sensitive-operation confirmation — set confirm_sensitive=True for purchase/delete tasks. For login/CAPTCHA screens, the agent supports human takeover.
Fix: AutoGLM-Phone-9B requires ~20 GB of VRAM. Use --tensor-parallel-size 2 for multi-GPU deployment, or use a hosted API service instead.
Fix: check firewall rules. For a remote server:
# Test connectivity
curl http://YOUR_SERVER_IP:8000/v1/models
# Should return model list JSON
Fix: HarmonyOS NEXT is required (earlier HarmonyOS versions are not supported). Enable developer mode via Settings → About → Version Number (tap 10 times rapidly).
For iPhone automation, see the dedicated setup guide:
# After configuring WebDriverAgent per docs/ios_setup/ios_setup.md
python main.py \
--base-url http://localhost:8000/v1 \
--model autoglm-phone-9b-multilingual \
--device-type ios \
"Open Maps and navigate to Central Park"
Weekly Installs: 274
GitHub Stars: 10
First Seen: 6 days ago
Security Audits:
Gen Agent Trust Hub: Warn · Socket: Pass · Snyk: Warn
Installed on
cursor: 273
gemini-cli: 273
amp: 273
cline: 273
github-copilot: 273
codex: 273