npx skills add https://github.com/incept5/eve-skillpacks --skill eve-deploy-debugging
Use these steps to deploy and diagnose app issues quickly.
Get the staging API URL from your admin.
Create and use a profile:
eve profile create staging --api-url https://api.eh1.incept5.dev
eve profile use staging
Never run kubectl apply, helm install, or any direct Kubernetes resource creation against shared infrastructure. All infrastructure changes go through Terraform. Use the Eve CLI (eve env, eve env deploy) to manage application deployments — the platform handles the underlying k8s resources.
# Create env if needed
eve env create staging --project proj_xxx --type persistent
# Deploy (requires --ref with 40-char SHA or a ref resolved against --repo-dir)
eve env deploy staging --ref main --repo-dir .
# When environment has a pipeline configured, the above triggers the pipeline.
# Use --direct to bypass pipeline and deploy directly:
eve env deploy staging --ref main --repo-dir . --direct
# Pass inputs to pipeline:
eve env deploy staging --ref main --repo-dir . --inputs '{"key":"value"}'
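When the `--inputs` payload is assembled in a script, serializing it with a JSON library avoids shell-quoting mistakes. A minimal sketch (the helper name and payload keys are illustrative, not part of the Eve CLI):

```python
import json
import shlex

def deploy_command(env: str, ref: str, inputs: dict) -> str:
    """Build an `eve env deploy` invocation with a safely quoted --inputs payload."""
    payload = json.dumps(inputs)  # always valid JSON, regardless of nested quotes
    return f"eve env deploy {env} --ref {ref} --repo-dir . --inputs {shlex.quote(payload)}"

# A payload containing quotes that would break naive string interpolation:
cmd = deploy_command("staging", "main", {"key": "value", "note": 'say "hi"'})
print(cmd)
```

shlex.quote keeps the JSON a single shell word, so the pipeline step receives exactly the object you serialized.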
When eve env deploy is called:

- No pipeline configured: the response includes deployment_status directly. Poll the health endpoint until ready === true.
- Pipeline configured: the response includes pipeline_run_id. Poll GET /pipelines/{name}/runs/{id} until all steps complete, then check health.

Deploy is complete when: ready === true AND active_pipeline_run === null.
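That completion check can be sketched as a polling loop. This is an illustrative sketch, not part of the Eve CLI; fetch_status is a hypothetical callable standing in for the health/pipeline-run requests:

```python
import time

def wait_for_deploy(fetch_status, timeout_s=600, interval_s=5):
    """Poll until the deploy is complete: ready is true AND no active pipeline run."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()  # e.g. combines the env health and pipeline-run responses
        if status.get("ready") is True and status.get("active_pipeline_run") is None:
            return status
        time.sleep(interval_s)
    raise TimeoutError("deploy did not complete in time")

# Stubbed status source that becomes ready on the third poll:
states = iter([
    {"ready": False, "active_pipeline_run": "run_1"},
    {"ready": True, "active_pipeline_run": "run_1"},  # pipeline still active: not done
    {"ready": True, "active_pipeline_run": None},     # done
])
result = wait_for_deploy(lambda: next(states), interval_s=0)
print(result["ready"])  # True
```

Note the middle state: ready alone is not enough while a pipeline run is still active, which is why both conditions are checked.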
eve job list --phase active
eve job follow <job-id> # Real-time SSE streaming
eve job watch <job-id> # Poll-based status updates
eve job diagnose <job-id> # Full diagnostic
eve job result <job-id> # Final result
eve job runner-logs <job-id> # Raw worker logs
# Terminal 1: Pipeline/job progress
eve job follow <job-id>
# Terminal 2: Environment health
eve env diagnose <project> <env>
# Terminal 3: System-level logs
eve system logs
eve job dep list <job-id>
eve job show <job-id> → look at blocked_by
eve env show <project> <env>
eve system orchestrator status
eve job diagnose <job-id>
eve job follow <job-id> or eve job runner-logs <job-id>
eve build diagnose <build-id>
eve secrets list --project <project_id>
eve job show <job-id> → effective_phase
eve thread messages <thread-id>
eve system pods
eve system health
eve system orchestrator status
eve system events

| Error | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Token expired | eve auth login |
| git clone failed | Missing credentials | Set github_token or ssh_key secret |
| service not provisioned | Environment not created | eve env create <env> |
| image pull backoff | Registry auth failed | If using a BYO/custom registry, verify REGISTRY_USERNAME + REGISTRY_PASSWORD; for managed apps use registry: "eve" |
| healthcheck timeout | App not starting | Check app logs; verify ports in manifest |
If a deploy pipeline fails at the build step:
eve build list --project <project_id>
eve build diagnose <build_id>
eve build logs <build_id>
eve secrets list --project <project_id> # Required for BYO/custom registry: REGISTRY_USERNAME, REGISTRY_PASSWORD
Common build failures:
- Missing REGISTRY_USERNAME and REGISTRY_PASSWORD secrets
- Wrong build.context path in the manifest
- Anything else: start from eve build diagnose

Eve publishes worker images to the configured private registry with these variants:
| Variant | Contents |
|---|---|
| base | Node.js, git, standard CLI tools |
| python | Base + Python runtime |
| rust | Base + Rust toolchain |
| java | Base + JDK |
| kotlin | Base + Kotlin compiler |
| full | All runtimes combined |
Version pinning: Use semver tags (e.g., v1.2.3) in production. Use SHA tags or :latest in development.
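A small helper can enforce that pinning convention in CI. A sketch under the stated convention; the exact tag formats (v-prefixed semver, full 40-char SHA) are assumptions:

```python
import re

SEMVER_RE = re.compile(r"^v\d+\.\d+\.\d+$")  # e.g. v1.2.3
SHA_RE = re.compile(r"^[0-9a-f]{40}$")       # full 40-char commit SHA

def tag_allowed(tag: str, production: bool) -> bool:
    """Production: semver tags only. Development: SHA tags and 'latest' are also fine."""
    if production:
        return bool(SEMVER_RE.match(tag))
    return bool(SEMVER_RE.match(tag) or SHA_RE.match(tag) or tag == "latest")

print(tag_allowed("v1.2.3", production=True))   # True
print(tag_allowed("latest", production=True))   # False
```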
Eve automatically injects these into every deployed service container:
| Variable | Purpose |
|---|---|
| EVE_API_URL | Internal cluster URL for server-to-server calls |
| EVE_PUBLIC_API_URL | Public ingress URL for browser-facing apps (when configured) |
| EVE_SSO_URL | SSO broker URL for user authentication (when configured) |
| EVE_PROJECT_ID | Current project ID |
| EVE_ORG_ID | Current organization ID |
| EVE_ENV_NAME | Current environment name |
Use EVE_API_URL for backend calls. Use EVE_PUBLIC_API_URL for browser/client-side code. Services can override any of these by defining them explicitly in their manifest environment section.
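A service can pick the right base URL from the injected variables at startup. A minimal sketch, assuming only the standard environment lookup (the fallback behavior is a design choice, not documented Eve behavior):

```python
import os

def eve_api_base(for_browser: bool = False) -> str:
    """Return the API base URL a service should use or hand out.
    Server-to-server traffic uses the internal cluster URL; anything a
    browser will call needs the public ingress URL (when configured)."""
    if for_browser:
        # Fall back to the internal URL if no public ingress is configured
        return os.environ.get("EVE_PUBLIC_API_URL") or os.environ["EVE_API_URL"]
    return os.environ["EVE_API_URL"]

# Stub values for illustration:
os.environ["EVE_API_URL"] = "http://eve-api.internal"
os.environ["EVE_PUBLIC_API_URL"] = "https://api.example.com"
print(eve_api_base())                  # internal URL
print(eve_api_base(for_browser=True))  # public URL
```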
Deployed services are exposed at {service}.{orgSlug}-{projectSlug}-{env}.{domain} (locally, the domain is lvh.me).

| Environment | How to Debug |
|---|---|
| Local (k3d) | Direct service access via ingress, eve system logs |
| Docker Compose | docker compose logs <service>, dev-only (no production use) |
| Kubernetes | Ingress-based access, kubectl -n eve logs as last resort |
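The hostname pattern above is mechanical, so scripts can derive it instead of hand-typing it. A sketch (the slug values are examples):

```python
def service_host(service: str, org: str, project: str, env: str,
                 domain: str = "lvh.me") -> str:
    """Compose the ingress hostname: {service}.{orgSlug}-{projectSlug}-{env}.{domain}."""
    return f"{service}.{org}-{project}-{env}.{domain}"

print(service_host("api", "myorg", "shop", "staging"))
# api.myorg-shop-staging.lvh.me
```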
Connect services on private networks (home lab GPUs, internal APIs, dev machines) to the Eve cluster. The platform creates K8s ExternalName services backed by Tailscale egress proxies.
# Register a private endpoint
eve endpoint add \
--name lmstudio \
--provider tailscale \
--tailscale-hostname mac-mini.tail12345.ts.net \
--port 1234 \
--org org_xxx
# List and inspect
eve endpoint list --org org_xxx
eve endpoint show lmstudio --org org_xxx
# Diagnose connectivity
eve endpoint diagnose lmstudio
# Remove
eve endpoint remove lmstudio --org org_xxx
Each endpoint gets a stable in-cluster DNS name: http://{orgSlug}-{name}.eve-tunnels.svc.cluster.local:{port}. Wire it into apps/agents via secrets:
eve secrets set LLM_BASE_URL \
"http://myorg-lmstudio.eve-tunnels.svc.cluster.local:1234/v1" \
--scope project
Diagnostics check: operator status, K8s service existence, DNS resolution, TCP connectivity, and HTTP health.
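Since the in-cluster DNS name follows a fixed pattern, the secret value can also be derived rather than hand-typed. A sketch using the lmstudio example from above:

```python
def tunnel_url(org_slug: str, name: str, port: int, path: str = "") -> str:
    """Compose the stable in-cluster URL for a registered private endpoint:
    http://{orgSlug}-{name}.eve-tunnels.svc.cluster.local:{port}"""
    return f"http://{org_slug}-{name}.eve-tunnels.svc.cluster.local:{port}{path}"

print(tunnel_url("myorg", "lmstudio", 1234, "/v1"))
# http://myorg-lmstudio.eve-tunnels.svc.cluster.local:1234/v1
```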
The default worker image is base (~800MB with Node.js, git, and all harnesses). Toolchains (Python, Rust, Java, Kotlin, media) are injected on-demand via init containers rather than bundled in a fat image.
Deployment impact: If an agent job needs toolchains, the runner pod starts init containers that copy toolchain binaries from small pre-built images. First pull adds ~5-10s; subsequent jobs on the same node use cached images.
Debugging toolchain issues:
# Check if toolchains are declared in agent config
# agents.yaml: toolchains: [python]
# If a toolchain binary is missing at runtime:
# 1. Verify agent config has the toolchain declared
# 2. Check init container logs on the runner pod
# 3. Verify toolchain images are available in the registry
To use the full image (all toolchains bundled): set EVE_WORKER_VARIANT=full or use --variant full locally.
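Step 1 of the checklist above (verify the agent config declares the toolchain) can be automated before dispatching a job. A hypothetical sketch; the agents.yaml shape follows the comment above, and the known-toolchain set is taken from the variants table:

```python
KNOWN_TOOLCHAINS = {"python", "rust", "java", "kotlin", "media"}

def missing_toolchains(agent_config: dict, required: set) -> set:
    """Return toolchains the job needs that the agent config does not declare."""
    unknown = required - KNOWN_TOOLCHAINS
    if unknown:
        raise ValueError(f"unknown toolchains requested: {sorted(unknown)}")
    declared = set(agent_config.get("toolchains", []))
    return required - declared

# agents.yaml: toolchains: [python]
cfg = {"toolchains": ["python"]}
print(missing_toolchains(cfg, {"python"}))          # set() -> nothing missing
print(missing_toolchains(cfg, {"python", "rust"}))  # {'rust'}
```

A non-empty result means the job would hit the "toolchain binary missing at runtime" failure, so it is cheaper to fail here.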
Remove environments and clean up resources:
# Undeploy services from an environment (stops pods, keeps env record)
eve env undeploy <project> <env>
# Delete the environment entirely (removes env record, managed DB, secrets)
eve env delete <project> <env>
When a managed DB is attached, eve env delete deprovisions it. Secrets scoped to the environment are cleaned up. The environment's pipeline history remains in the audit log.
For app-level cleanup, remove the project:
eve project delete <project-id>
This cascades: environments, secrets, pipeline history, and build artifacts are removed.
Production disk management for agent workspaces:
- EVE_WORKSPACE_MAX_GB — total workspace budget
- EVE_WORKSPACE_MIN_FREE_GB — free-space threshold that triggers cleanup
- EVE_SESSION_TTL_HOURS — auto-evict stale sessions

Related skills: eve-local-dev-loop, eve-auth-and-secrets, eve-manifest-authoring
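The three knobs suggest a simple eviction policy: always drop sessions past the TTL, and under disk pressure also evict oldest-first until back under budget. The heuristics below are an assumption for illustration; only the variable names come from the doc:

```python
def sessions_to_evict(sessions, free_gb, used_gb,
                      max_gb=50.0, min_free_gb=5.0, ttl_hours=24.0):
    """Pick sessions to evict given the EVE_WORKSPACE_* / EVE_SESSION_TTL_HOURS knobs.
    `sessions` is a list of (session_id, age_hours, size_gb) tuples."""
    # Stale sessions go regardless of disk pressure (EVE_SESSION_TTL_HOURS)
    evict = [s for s in sessions if s[1] > ttl_hours]
    if free_gb >= min_free_gb and used_gb <= max_gb:
        return evict
    # Under pressure: also evict fresh sessions, oldest first, until under budget
    fresh = sorted((s for s in sessions if s[1] <= ttl_hours),
                   key=lambda s: s[1], reverse=True)
    for s in fresh:
        if free_gb >= min_free_gb and used_gb <= max_gb:
            break
        evict.append(s)
        free_gb += s[2]
        used_gb -= s[2]
    return evict

sessions = [("a", 30, 2.0), ("b", 2, 1.0)]
print([s[0] for s in sessions_to_evict(sessions, free_gb=10, used_gb=20)])  # ['a']
```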
Weekly installs: 149
First seen: Feb 8, 2026
Installed on: gemini-cli (149), codex (149), claude-code (146), pi (52), opencode (32), github-copilot (32)