adk-deploy-guide by google/adk-docs
npx skills add https://github.com/google/adk-docs --skill adk-deploy-guide
Scaffolded project? Use the make commands throughout this guide — they wrap Terraform, Docker, and deployment into a tested pipeline. No scaffold? See Quick Deploy below, or the ADK deployment docs. For production infrastructure, scaffold with /adk-scaffold.
For deeper details, consult these reference files in references/:
- cloud-run.md — Scaling defaults, Dockerfile, session types, networking
- agent-engine.md — deploy.py CLI, AdkApp pattern, Terraform resource, deployment metadata, CI/CD differences
- gke.md — GKE Autopilot cluster, Terraform-managed Kubernetes resources, Workload Identity, session types, networking
- terraform-patterns.md — Custom infrastructure, IAM, state management, importing resources
- event-driven.md — Pub/Sub, Eventarc, BigQuery Remote Function triggers via custom fast_api_app.py endpoints

Observability: See the adk-observability-guide skill for Cloud Trace, prompt-response logging, BigQuery Analytics, and third-party integrations.
Choose the right deployment target based on your requirements:
| Criteria | Agent Engine | Cloud Run | GKE |
|---|---|---|---|
| Languages | Python | Python | Python (+ others via custom containers) |
| Scaling | Managed auto-scaling (configurable min/max, concurrency) | Fully configurable (min/max instances, concurrency, CPU allocation) | Full Kubernetes scaling (HPA, VPA, node auto-provisioning) |
| Networking | VPC-SC and PSC supported | Full VPC support, direct VPC egress, IAP, ingress rules | Full Kubernetes networking |
| Session state | Native VertexAiSessionService (persistent, managed) | In-memory (dev), Cloud SQL, or Agent Engine session backend | In-memory (dev), Cloud SQL, or Agent Engine session backend |
| Batch/event processing | Not supported | /invoke endpoint for Pub/Sub, Eventarc, BigQuery | Custom (Kubernetes Jobs, Pub/Sub) |
| Cost model | vCPU-hours + memory-hours (not billed when idle) | Per-instance-second + min instance costs | Node pool costs (always-on or auto-provisioned) |
| Setup complexity | Lower (managed, purpose-built for agents) | Medium (Dockerfile, Terraform, networking) | Higher (Kubernetes expertise required) |
| Best for | Managed infrastructure, minimal ops | Custom infra, event-driven workloads | Full Kubernetes control |
Ask the user which deployment target fits their needs. Each is a valid production choice with different trade-offs.
For projects without Agent Starter Pack scaffolding. No Makefile, Terraform, or Dockerfile required.
# Cloud Run
adk deploy cloud_run --project=PROJECT --region=REGION path/to/agent/
# Agent Engine
adk deploy agent_engine --project=PROJECT --region=REGION path/to/agent/
# GKE (requires existing cluster)
adk deploy gke --project=PROJECT --cluster_name=CLUSTER --region=REGION path/to/agent/
All commands support --with_ui to deploy the ADK dev UI. Cloud Run also accepts extra gcloud flags after -- (e.g., -- --no-allow-unauthenticated).
See adk deploy --help or the ADK deployment docs for full flag reference.
For CI/CD, observability, or production infrastructure, scaffold with /adk-scaffold and use the sections below.
make setup-dev-env runs terraform apply in deployment/terraform/dev/. This provisions supporting infrastructure:
- Service accounts (e.g., app_sa for the agent, used for runtime permissions)
- Any custom resources defined in deployment/terraform/dev/

This step is optional — make deploy works without it (Cloud Run creates the service on the fly via gcloud run deploy --source .). However, running it gives you proper service accounts, observability, and IAM setup.
make setup-dev-env
Note: make deploy doesn't automatically use the Terraform-created app_sa. Pass --service-account explicitly or update the Makefile.
make deploy

IMPORTANT: Never run make deploy without explicit human approval.
Best for: Production applications, teams requiring staging → production promotion.
Prerequisites:
Steps:
If prototype, first add Terraform/CI-CD files using the Agent Starter Pack CLI (see /adk-scaffold for full options):
uvx agent-starter-pack enhance . --cicd-runner github_actions -y -s
Ensure you're logged in to GitHub CLI:
gh auth login # (skip if already authenticated)
Run setup-cicd:
uvx agent-starter-pack setup-cicd \
--staging-project YOUR_STAGING_PROJECT \
--prod-project YOUR_PROD_PROJECT \
--repository-name YOUR_REPO_NAME \
--repository-owner YOUR_GITHUB_USERNAME \
--auto-approve \
--create-repository
Push code to trigger deployments
setup-cicd Flags

| Flag | Description |
|---|---|
| --staging-project | GCP project ID for staging environment |
| --prod-project | GCP project ID for production environment |
| --repository-name / --repository-owner | GitHub repository name and owner |
| --auto-approve | Skip Terraform plan confirmation prompts |
| --create-repository | Create the GitHub repo if it doesn't exist |
| --cicd-project | Separate GCP project for CI/CD infrastructure. Defaults to prod project |
| --local-state | Store Terraform state locally instead of in GCS (see references/terraform-patterns.md) |
Run uvx agent-starter-pack setup-cicd --help for the full flag reference (Cloud Build options, dev project, region, etc.).
| Runner | Pros | Cons |
|---|---|---|
| github_actions (Default) | No PAT needed, uses gh auth, WIF-based, fully automated | Requires GitHub CLI authentication |
| google_cloud_build | Native GCP integration | Requires interactive browser authorization (or PAT + app installation ID for programmatic mode) |
Both runners use Workload Identity Federation (WIF) — GitHub/Cloud Build OIDC tokens are trusted by a GCP Workload Identity Pool, which grants cicd_runner_sa impersonation. No long-lived service account keys needed. Terraform in setup-cicd creates the pool, provider, and SA bindings automatically. If auth fails, re-run terraform apply in the CI/CD Terraform directory.
The pipeline has three stages:
Staging CD: triggered on push to main. Builds container, deploys to staging, runs load tests.

Path filter: Staging CD uses paths: ['app/**'] — it only triggers when files under app/ change. The first push after setup-cicd won't trigger staging CD unless you modify something in app/. If nothing happens after pushing, this is why.
Production CD: triggered via workflow_run. Might require manual approval before deploying to production.

Approving: Go to GitHub Actions → the production workflow run → click "Review deployments" → approve the pending production environment. This is GitHub's environment protection rules, not a custom mechanism.
IMPORTANT: setup-cicd creates infrastructure but doesn't deploy automatically. Terraform configures all required GitHub secrets and variables (WIF credentials, project IDs, service accounts). Push code to trigger the pipeline:
git add . && git commit -m "Initial agent implementation"
git push origin main
To approve production deployment:
# GitHub Actions: Approve via repository Actions tab (environment protection rules)
# Cloud Build: Find pending build and approve
gcloud builds list --project=PROD_PROJECT --region=REGION --filter="status=PENDING"
gcloud builds approve BUILD_ID --project=PROD_PROJECT
For detailed infrastructure configuration (scaling defaults, Dockerfile, FastAPI endpoints, session types, networking), see references/cloud-run.md. For ADK docs on Cloud Run deployment, fetch https://google.github.io/adk-docs/deploy/cloud-run/index.md.
Agent Engine is a managed Vertex AI service for deploying Python ADK agents. Uses source-based deployment (no Dockerfile) via deploy.py and the AdkApp class.
No gcloud CLI exists for Agent Engine. Deploy via deploy.py or adk deploy agent_engine. Query via the Python vertexai.Client SDK.
Deployments can take 5-10 minutes. If make deploy times out, check if the engine was created and manually populate deployment_metadata.json with the engine resource ID (see reference for details).
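If a timed-out make deploy did create the engine, the metadata file can be recreated by hand. A minimal sketch — the deployment_metadata.json name and remote_agent_engine_id key match the testing script later in this guide, but the resource ID shown here is a placeholder you must substitute:

```python
import json

# Placeholder — substitute the real resource ID from the Cloud Console
# or the deploy logs (a reasoningEngines resource path).
engine_id = "projects/PROJECT_NUMBER/locations/us-central1/reasoningEngines/ENGINE_ID"

# deployment_metadata.json is what the testing notebook/script reads back.
with open("deployment_metadata.json", "w") as f:
    json.dump({"remote_agent_engine_id": engine_id}, f, indent=2)
```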
For detailed infrastructure configuration (deploy.py flags, AdkApp pattern, Terraform resource, deployment metadata, session/artifact services, CI/CD differences), see references/agent-engine.md. For ADK docs on Agent Engine deployment, fetch https://google.github.io/adk-docs/deploy/agent-engine/index.md.
For detailed infrastructure configuration (Terraform-managed Kubernetes resources, Workload Identity, session types, networking), see references/gke.md. For ADK docs on GKE deployment, fetch https://google.github.io/adk-docs/deploy/gke/index.md.
Scaffolded projects use two service accounts:
- app_sa (per environment) — Runtime identity for the deployed agent. Roles defined in deployment/terraform/iam.tf.
- cicd_runner_sa (CI/CD project) — CI/CD pipeline identity (GitHub Actions / Cloud Build). Lives in the CI/CD project (defaults to prod project), needs permissions in both staging and prod projects.

Check deployment/terraform/iam.tf for exact role bindings. Cross-project permissions (Cloud Run service agents, Artifact Registry access) are also configured there.
Common 403 errors:
- cicd_runner_sa missing deployment role in the target project
- Missing iam.serviceAccountUser binding on app_sa
- app_sa missing secretmanager.secretAccessor

Instead of passing sensitive keys as environment variables, use GCP Secret Manager.
# Create a secret
echo -n "YOUR_API_KEY" | gcloud secrets create MY_SECRET_NAME --data-file=-
# Update an existing secret
echo -n "NEW_API_KEY" | gcloud secrets versions add MY_SECRET_NAME --data-file=-
Grant access: For Cloud Run, grant secretmanager.secretAccessor to app_sa. For Agent Engine, grant it to the platform-managed SA (service-PROJECT_NUMBER@gcp-sa-aiplatform-re.iam.gserviceaccount.com). For GKE, grant secretmanager.secretAccessor to app_sa. Access secrets via Kubernetes Secrets or directly via the Secret Manager API with Workload Identity.
Pass secrets at deploy time (Agent Engine):
make deploy SECRETS="API_KEY=my-api-key,DB_PASS=db-password:2"
Format: ENV_VAR=SECRET_ID or ENV_VAR=SECRET_ID:VERSION (defaults to latest). Access in code via os.environ.get("API_KEY").
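The SECRETS string can be sanity-checked before deploying. A hypothetical helper — parse_secrets is not part of the Makefile or ADK, it simply mirrors the ENV_VAR=SECRET_ID[:VERSION] convention described above:

```python
def parse_secrets(spec: str) -> dict:
    """Parse 'ENV=SECRET_ID[:VERSION],...' into {env_var: (secret_id, version)}."""
    out = {}
    for entry in spec.split(","):
        env_var, _, secret = entry.partition("=")
        secret_id, _, version = secret.partition(":")
        out[env_var] = (secret_id, version or "latest")  # version defaults to latest
    return out

print(parse_secrets("API_KEY=my-api-key,DB_PASS=db-password:2"))
# → {'API_KEY': ('my-api-key', 'latest'), 'DB_PASS': ('db-password', '2')}
```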
See the adk-observability-guide skill for observability configuration (Cloud Trace, prompt-response logging, BigQuery Analytics, third-party integrations).
Option 1: Testing Notebook
jupyter notebook notebooks/adk_app_testing.ipynb
Option 2: Python Script
import json
import vertexai
with open("deployment_metadata.json") as f:
engine_id = json.load(f)["remote_agent_engine_id"]
client = vertexai.Client(location="us-central1")
agent = client.agent_engines.get(name=engine_id)
async for event in agent.async_stream_query(message="Hello!", user_id="test"):
print(event)
Option 3: Playground
make playground
Auth required by default. Cloud Run deploys with --no-allow-unauthenticated, so all requests need an Authorization: Bearer header with an identity token. Getting a 403? You're likely missing this header. To allow public access, redeploy with --allow-unauthenticated.
SERVICE_URL="https://SERVICE_NAME-PROJECT_NUMBER.REGION.run.app"
AUTH="Authorization: Bearer $(gcloud auth print-identity-token)"
# Test health endpoint
curl -H "$AUTH" "$SERVICE_URL/"
# Step 1: Create a session (required before sending messages)
curl -X POST "$SERVICE_URL/apps/app/users/test-user/sessions" \
-H "Content-Type: application/json" \
-H "$AUTH" \
-d '{}'
# → returns JSON with "id" — use this as SESSION_ID below
# Step 2: Send a message via SSE streaming
curl -X POST "$SERVICE_URL/run_sse" \
-H "Content-Type: application/json" \
-H "$AUTH" \
-d '{
"app_name": "app",
"user_id": "test-user",
"session_id": "SESSION_ID",
"new_message": {"role": "user", "parts": [{"text": "Hello!"}]}
}'
Common mistake: Using {"message": "Hello!", "user_id": "...", "session_id": "..."} returns 422 Field required. The ADK HTTP server expects the new_message/parts schema shown above, and the session must already exist.
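A small helper makes the payload shape harder to get wrong. A sketch — build_run_sse_payload is a hypothetical convenience function, not an ADK API; the field names match the curl example above:

```python
def build_run_sse_payload(app_name: str, user_id: str, session_id: str, text: str) -> dict:
    """Build a /run_sse request body in the new_message/parts shape ADK expects."""
    return {
        "app_name": app_name,
        "user_id": user_id,
        "session_id": session_id,
        # NOT {"message": text} — that shape returns 422 Field required
        "new_message": {"role": "user", "parts": [{"text": text}]},
    }

payload = build_run_sse_payload("app", "test-user", "SESSION_ID", "Hello!")
print(payload["new_message"])
# → {'role': 'user', 'parts': [{'text': 'Hello!'}]}
```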
GKE LoadBalancer services are public by default — no auth header needed (unlike Cloud Run). See references/gke.md for curl examples and endpoint details.
make load-test
See tests/load_test/README.md for configuration, default settings, and CI/CD integration details.
To expose your agent with a web UI protected by Google identity authentication:
# Deploy with IAP (built-in framework UI)
make deploy IAP=true
# Deploy with custom frontend on a different port
make deploy IAP=true PORT=5173
IAP (Identity-Aware Proxy) secures the Cloud Run service — only authorized Google accounts can access it. After deploying, grant user access via the Cloud Console IAP settings.
For Agent Engine with a custom frontend, use a decoupled deployment — deploy the frontend separately to Cloud Run or Cloud Storage, connecting to the Agent Engine backend API.
The primary rollback mechanism is git-based: fix the issue, commit, and push to main. The CI/CD pipeline will automatically build and deploy the new version through staging → production.
For immediate Cloud Run rollback without a new commit, use revision traffic shifting:
gcloud run revisions list --service=SERVICE_NAME --region=REGION
gcloud run services update-traffic SERVICE_NAME \
--to-revisions=REVISION_NAME=100 --region=REGION
Agent Engine doesn't support revision-based rollback — fix and redeploy via make deploy.
For GKE rollback, use kubectl rollout undo:
kubectl rollout undo deployment/DEPLOYMENT_NAME -n NAMESPACE
kubectl rollout status deployment/DEPLOYMENT_NAME -n NAMESPACE
For custom infrastructure patterns (Pub/Sub, BigQuery, Eventarc, Cloud SQL, IAM), consult references/terraform-patterns.md.
| Issue | Solution |
|---|---|
| Terraform state locked | terraform force-unlock -force LOCK_ID in deployment/terraform/ |
| GitHub Actions auth failed | Re-run terraform apply in CI/CD terraform dir; verify WIF pool/provider |
| Cloud Build authorization pending | Use github_actions runner instead |
| Resource already exists | terraform import (see references/terraform-patterns.md) |
| Agent Engine deploy timeout / hangs | Deployments take 5-10 min; check if engine was created (see Agent Engine Specifics) |
| Secret not available | Verify secretAccessor granted to app_sa (not the default compute SA) |
| 403 on deploy | Check deployment/terraform/iam.tf — cicd_runner_sa needs deployment + SA impersonation roles in the target project |
| 403 when testing Cloud Run | Default is --no-allow-unauthenticated; include Authorization: Bearer $(gcloud auth print-identity-token) header |
| Cold starts too slow | Set min_instance_count > 0 in Cloud Run Terraform config |
| Cloud Run 503 errors | Check resource limits (memory/CPU), increase max_instance_count, or check container crash logs |
| 403 right after granting IAM role | IAM propagation is not instant — wait a couple of minutes before retrying. Don't keep re-granting the same role |
| Resource seems missing but Terraform created it | Run terraform state list to check what Terraform actually manages. Resources created via null_resource + local-exec (e.g., BQ linked datasets) won't appear in gcloud CLI output |