open-autoglm-phone-agent by aradotso/trending-skills
npx skills add https://github.com/aradotso/trending-skills --skill open-autoglm-phone-agent
Skill by ara.so — Daily 2026 Skills collection.
Open-AutoGLM is an open-source AI phone agent framework that enables natural language control of Android, HarmonyOS NEXT, and iOS devices. It uses the AutoGLM vision-language model (9B parameters) to perceive screen content and execute multi-step tasks like "open Meituan and search for nearby hot pot restaurants."
User Natural Language → AutoGLM VLM → Screen Perception → ADB/HDC/WebDriverAgent → Device Actions
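The pipeline above is, at its core, a perceive→decide→act loop. A minimal sketch follows; the screenshot, query_model, and execute callables are hypothetical stand-ins (injected so the loop can run without a device), not the framework's actual API:

```python
def agent_loop(task, screenshot, query_model, execute, max_steps=20):
    """Drive the device until the model emits a Finish action."""
    history = []
    for _ in range(max_steps):
        screen = screenshot()                        # perceive current UI state
        action = query_model(task, screen, history)  # VLM chooses the next action
        if action["action"] == "Finish":
            return action.get("result", "")
        execute(action)                              # act via ADB/HDC/WebDriverAgent
        history.append(action)
    return "max steps reached"

# Stubbed demo: the "model" launches an app, then finishes.
script = iter([{"action": "Launch", "app": "美团"},
               {"action": "Finish", "result": "done"}])
result = agent_loop("find hotpot nearby",
                    screenshot=lambda: b"",
                    query_model=lambda t, s, h: next(script),
                    execute=lambda a: None)
print(result)  # done
```

The real loop additionally feeds each screenshot back into the model as an image, which is why the deployment flags below cap image size and count.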
git clone https://github.com/zai-org/Open-AutoGLM.git
cd Open-AutoGLM
pip install -r requirements.txt
pip install -e .
# Android
adb devices
# Expected: emulator-5554 device
# HarmonyOS NEXT
hdc list targets
# Expected: 7001005458323933328a01bce01c2500
BigModel (ZhipuAI)
export BIGMODEL_API_KEY="your-bigmodel-api-key"
python main.py \
--base-url https://open.bigmodel.cn/api/paas/v4 \
--model "autoglm-phone" \
--apikey $BIGMODEL_API_KEY \
"打开美团搜索附近的火锅店"
ModelScope
export MODELSCOPE_API_KEY="your-modelscope-api-key"
python main.py \
--base-url https://api-inference.modelscope.cn/v1 \
--model "ZhipuAI/AutoGLM-Phone-9B" \
--apikey $MODELSCOPE_API_KEY \
"open Meituan and find nearby hotpot"
# Install vLLM (or use official Docker: docker pull vllm/vllm-openai:v0.12.0)
pip install vllm
# Start model server (strictly follow these parameters)
python3 -m vllm.entrypoints.openai.api_server \
--served-model-name autoglm-phone-9b \
--allowed-local-media-path / \
--mm-encoder-tp-mode data \
--mm_processor_cache_type shm \
--mm_processor_kwargs '{"max_pixels":5000000}' \
--max-model-len 25480 \
--chat-template-content-format string \
--limit-mm-per-prompt '{"image":10}' \
--model zai-org/AutoGLM-Phone-9B \
--port 8000
# Install SGLang or use: docker pull lmsysorg/sglang:v0.5.6.post1
# Inside container: pip install nvidia-cudnn-cu12==9.16.0.29
python3 -m sglang.launch_server \
--model-path zai-org/AutoGLM-Phone-9B \
--served-model-name autoglm-phone-9b \
--context-length 25480 \
--mm-enable-dp-encoder \
--mm-process-config '{"image":{"max_pixels":5000000}}' \
--port 8000
python scripts/check_deployment_cn.py \
--base-url http://localhost:8000/v1 \
--model autoglm-phone-9b
Expected output includes a <think>...</think> block followed by <answer>do(action="Launch", app="..."). If the chain-of-thought is very short or garbled, the model deployment has failed.
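That sanity check can also be scripted. The helper below is hypothetical (not part of the repo); it just mirrors what the deployment check looks for — a non-trivial think block followed by a do() call:

```python
import re

def looks_healthy(model_output: str) -> bool:
    """True when the reply has a non-trivial <think> block and an <answer> do() call."""
    think = re.search(r"<think>(.*?)</think>", model_output, re.DOTALL)
    answer = re.search(r"<answer>\s*do\(action=", model_output)
    # A near-empty chain-of-thought usually means a misconfigured deployment.
    return bool(think and answer and len(think.group(1).strip()) > 10)

ok = looks_healthy('<think>用户想打开美团,先在桌面找到美团图标</think>\n<answer>do(action="Launch", app="美团")')
print(ok)  # True
```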
# Android device (default)
python main.py \
--base-url http://localhost:8000/v1 \
--model autoglm-phone-9b \
"打开小红书搜索美食"
# HarmonyOS device
python main.py \
--base-url http://localhost:8000/v1 \
--model autoglm-phone-9b \
--device-type hdc \
"打开设置查看WiFi"
# Multilingual model for English apps
python main.py \
--base-url http://localhost:8000/v1 \
--model autoglm-phone-9b-multilingual \
"Open Instagram and search for travel photos"
| Parameter | Description | Default |
|---|---|---|
--base-url | Model service endpoint | Required |
--model | Model name on server | Required |
--apikey | API key for third-party services | None |
--device-type | adb (Android) or hdc (HarmonyOS) | adb
--device-id | Specific device serial number | Auto-detect
from phone_agent import PhoneAgent
from phone_agent.config import AgentConfig
config = AgentConfig(
base_url="http://localhost:8000/v1",
model="autoglm-phone-9b",
device_type="adb", # or "hdc" for HarmonyOS
)
agent = PhoneAgent(config)
# Run a task
result = agent.run("打开淘宝搜索蓝牙耳机")
print(result)
from phone_agent import PhoneAgent
from phone_agent.config import AgentConfig
import os
config = AgentConfig(
base_url=os.environ["MODEL_BASE_URL"],
model=os.environ["MODEL_NAME"],
apikey=os.environ.get("MODEL_API_KEY"),
device_type="adb",
device_id="emulator-5554", # specific device
)
agent = PhoneAgent(config)
# Task with sensitive operation confirmation
result = agent.run(
"在京东购买最便宜的蓝牙耳机",
confirm_sensitive=True # prompt user before purchase actions
)
import openai
import base64
import os
from pathlib import Path
client = openai.OpenAI(
base_url=os.environ["MODEL_BASE_URL"],
api_key=os.environ.get("MODEL_API_KEY", "dummy"),
)
# Load screenshot
screenshot_path = "screenshot.png"
with open(screenshot_path, "rb") as f:
image_b64 = base64.b64encode(f.read()).decode()
response = client.chat.completions.create(
model="autoglm-phone-9b",
messages=[
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {"url": f"data:image/png;base64,{image_b64}"},
},
{
"type": "text",
"text": "Task: 搜索附近的咖啡店\nCurrent step: Navigate to search",
},
],
}
],
)
print(response.choices[0].message.content)
# Output format: <think>...</think>\n<answer>do(action="...", ...)
import re
def parse_action(model_output: str) -> dict:
"""Parse AutoGLM model output into structured action."""
# Extract answer block
answer_match = re.search(r'<answer>(.*?)(?:</answer>|$)', model_output, re.DOTALL)
if not answer_match:
return {"action": "unknown"}
answer = answer_match.group(1).strip()
# Parse do() call
# Format: do(action="ActionName", param1="value1", param2="value2")
action_match = re.search(r'do\(action="([^"]+)"(.*?)\)', answer, re.DOTALL)
if not action_match:
return {"action": "unknown", "raw": answer}
action_name = action_match.group(1)
params_str = action_match.group(2)
# Parse parameters
params = {}
for param_match in re.finditer(r'(\w+)="([^"]*)"', params_str):
params[param_match.group(1)] = param_match.group(2)
return {"action": action_name, **params}
# Example usage
output = '<think>需要启动京东</think>\n<answer>do(action="Launch", app="京东")'
action = parse_action(output)
# {"action": "Launch", "app": "京东"}
import subprocess
def take_screenshot(device_id: str = None) -> bytes:
"""Capture current device screen."""
cmd = ["adb"]
if device_id:
cmd.extend(["-s", device_id])
cmd.extend(["exec-out", "screencap", "-p"])
result = subprocess.run(cmd, capture_output=True)
return result.stdout
def send_tap(x: int, y: int, device_id: str = None):
"""Tap at screen coordinates."""
cmd = ["adb"]
if device_id:
cmd.extend(["-s", device_id])
cmd.extend(["shell", "input", "tap", str(x), str(y)])
subprocess.run(cmd)
def send_text_adb_keyboard(text: str, device_id: str = None):
"""Send text via ADB Keyboard (must be installed and enabled)."""
cmd = ["adb"]
if device_id:
cmd.extend(["-s", device_id])
# Enable ADB keyboard first
cmd_enable = cmd + ["shell", "ime", "set", "com.android.adbkeyboard/.AdbIME"]
subprocess.run(cmd_enable)
# Send text
cmd_text = cmd + ["shell", "am", "broadcast", "-a", "ADB_INPUT_TEXT",
"--es", "msg", text]
subprocess.run(cmd_text)
def swipe(x1: int, y1: int, x2: int, y2: int, duration_ms: int = 300, device_id: str = None):
"""Swipe gesture on screen."""
cmd = ["adb"]
if device_id:
cmd.extend(["-s", device_id])
cmd.extend(["shell", "input", "swipe",
str(x1), str(y1), str(x2), str(y2), str(duration_ms)])
subprocess.run(cmd)
def press_back(device_id: str = None):
"""Press Android back button."""
cmd = ["adb"]
if device_id:
cmd.extend(["-s", device_id])
cmd.extend(["shell", "input", "keyevent", "KEYCODE_BACK"])
subprocess.run(cmd)
def launch_app(package_name: str, device_id: str = None):
"""Launch app by package name."""
cmd = ["adb"]
if device_id:
cmd.extend(["-s", device_id])
cmd.extend(["shell", "monkey", "-p", package_name, "-c",
"android.intent.category.LAUNCHER", "1"])
subprocess.run(cmd)
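To connect the parser with the helpers above, a pure function mapping a parsed action dict to the adb argv it implies keeps the dispatch logic testable without a device. This is a sketch: the pixel-coordinate Tap and the Swipe geometry are illustrative assumptions, not the framework's actual dispatch table:

```python
def action_to_adb_args(action: dict, device_id: str = None) -> list:
    """Map a parsed AutoGLM action to the adb command the helpers above would run."""
    base = ["adb"] + (["-s", device_id] if device_id else [])
    name = action.get("action")
    if name == "Tap":  # assumes the model emitted pixel x/y parameters
        return base + ["shell", "input", "tap", action["x"], action["y"]]
    if name == "Swipe":  # scroll up on an assumed 1080x2400 screen
        return base + ["shell", "input", "swipe", "540", "1600", "540", "800", "300"]
    if name == "Back":
        return base + ["shell", "input", "keyevent", "KEYCODE_BACK"]
    if name == "Home":
        return base + ["shell", "input", "keyevent", "KEYCODE_HOME"]
    raise ValueError(f"unhandled action: {name}")

print(action_to_adb_args({"action": "Back"}, "emulator-5554"))
# ['adb', '-s', 'emulator-5554', 'shell', 'input', 'keyevent', 'KEYCODE_BACK']
```

Keeping the mapping pure (no subprocess call inside) makes it easy to unit-test the agent's action handling against recorded model outputs.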
For JavaScript/TypeScript automation using AutoGLM:
// .env configuration
// MIDSCENE_MODEL_NAME=autoglm-phone
// MIDSCENE_OPENAI_BASE_URL=https://open.bigmodel.cn/api/paas/v4
// MIDSCENE_OPENAI_API_KEY=your-api-key
import { AndroidAgent } from "@midscene/android";
const agent = new AndroidAgent();
await agent.aiAction("打开微信发送消息给张三");
await agent.aiQuery("当前页面显示的消息内容是什么?");
# Connect device via USB first, then enable TCP/IP mode
adb tcpip 5555
# Get device IP address
adb shell ip addr show wlan0
# Connect wirelessly (disconnect USB after this)
adb connect 192.168.1.100:5555
# Verify connection
adb devices
# 192.168.1.100:5555 device
# Use with agent
python main.py \
--base-url http://model-server:8000/v1 \
--model autoglm-phone-9b \
--device-id "192.168.1.100:5555" \
"打开支付宝查看余额"
The AutoGLM model outputs structured actions:
| Action | Description | Example |
|---|---|---|
Launch | Open an app | do(action="Launch", app="微信") |
Tap | Tap screen element | do(action="Tap", element="搜索框") |
Type | Input text | do(action="Type", text="火锅") |
Swipe | Scroll/swipe | do(action="Swipe", direction="up")
Back | Press back button | do(action="Back")
Home | Go to home screen | do(action="Home")
Finish | Task complete | do(action="Finish", result="已完成搜索")
| Model | Use Case | Languages |
|---|---|---|
AutoGLM-Phone-9B | Chinese apps (WeChat, Taobao, Meituan) | Chinese-optimized |
AutoGLM-Phone-9B-Multilingual | International apps, mixed content | Chinese + English + others |
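Given the split above, a small helper can route tasks to the right variant by detecting CJK characters in the instruction. This is a hypothetical convenience, not part of the framework:

```python
def pick_model(task: str) -> str:
    """Route Chinese-language tasks to the Chinese-optimized model."""
    if any("\u4e00" <= ch <= "\u9fff" for ch in task):
        return "autoglm-phone-9b"
    return "autoglm-phone-9b-multilingual"

print(pick_model("打开美团搜索火锅"))   # autoglm-phone-9b
print(pick_model("Open Instagram"))     # autoglm-phone-9b-multilingual
```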
HuggingFace: zai-org/AutoGLM-Phone-9B / zai-org/AutoGLM-Phone-9B-Multilingual
ModelScope: ZhipuAI/AutoGLM-Phone-9B / ZhipuAI/AutoGLM-Phone-9B-Multilingual
# Model service
export MODEL_BASE_URL="http://localhost:8000/v1"
export MODEL_NAME="autoglm-phone-9b"
export MODEL_API_KEY="" # Required for BigModel/ModelScope APIs
# BigModel API
export BIGMODEL_API_KEY=""
export BIGMODEL_BASE_URL="https://open.bigmodel.cn/api/paas/v4"
# ModelScope API
export MODELSCOPE_API_KEY=""
export MODELSCOPE_BASE_URL="https://api-inference.modelscope.cn/v1"
# Device configuration
export ADB_DEVICE_ID="" # Leave empty for auto-detect
export HDC_DEVICE_ID="" # HarmonyOS device ID
Cause: incorrect vLLM/SGLang startup parameters. Fix: ensure --chat-template-content-format string is set (vLLM) and max_pixels is 5000000 (--mm_processor_kwargs in vLLM, --mm-process-config in SGLang). Check transformers version compatibility.
adb devices shows no devices. Fix:
adb kill-server && adb start-server
Fix: ADB Keyboard must be installed and enabled:
adb shell ime enable com.android.adbkeyboard/.AdbIME
adb shell ime set com.android.adbkeyboard/.AdbIME
Cause: the model cannot find a path to complete the task. Fix: the framework includes sensitive-operation confirmation — set confirm_sensitive=True for purchase/delete tasks. For login/CAPTCHA screens, the agent supports human takeover.
Fix: AutoGLM-Phone-9B requires ~20 GB of VRAM. Use --tensor-parallel-size 2 for multi-GPU deployment, or use a hosted API service instead.
Fix: check firewall rules. For a remote server:
# Test connectivity
curl http://YOUR_SERVER_IP:8000/v1/models
# Should return model list JSON
Fix: HarmonyOS NEXT is required (earlier HarmonyOS versions are not supported). Enable developer mode via Settings → About → Version Number (tap 10 times rapidly).
For iPhone automation, see the dedicated setup guide:
# After configuring WebDriverAgent per docs/ios_setup/ios_setup.md
python main.py \
--base-url http://localhost:8000/v1 \
--model autoglm-phone-9b-multilingual \
--device-type ios \
"Open Maps and navigate to Central Park"
Weekly Installs: 274
GitHub Stars: 10
First Seen: 6 days ago
Security Audits:
Gen Agent Trust Hub: Warn · Socket: Pass · Snyk: Warn
Installed on
cursor: 273
gemini-cli: 273
amp: 273
cline: 273
github-copilot: 273
codex: 273