npx skills add https://github.com/runpod/skills --skill flash

Write code locally, test with flash run (dev server at localhost:8888), and flash automatically provisions and deploys to remote GPUs/CPUs in the cloud. Endpoint handles everything.
pip install runpod-flash # requires Python >=3.10
# auth option 1: browser-based login (saves token locally)
flash login
# auth option 2: API key via environment variable
export RUNPOD_API_KEY=your_key
flash init my-project # scaffold a new project in ./my-project
flash run # start local dev server at localhost:8888
flash run --auto-provision # same, but pre-provision endpoints (no cold start)
flash build # package artifact for deployment (500MB limit)
flash build --exclude pkg1,pkg2 # exclude packages from the build
flash deploy # build + deploy (auto-selects env if only one)
flash deploy --env staging # build + deploy to the "staging" environment
flash deploy --app my-app --env prod # deploy a specific app to an environment
flash deploy --preview # build + launch a local preview in Docker
flash env list # list deployment environments
flash env create staging # create the "staging" environment
flash env get staging # show environment details + resources
flash env delete staging # delete environment + tear down resources
flash undeploy list # list all active endpoints
flash undeploy my-endpoint # remove a specific endpoint
One function = one endpoint with its own workers.
from runpod_flash import Endpoint, GpuGroup
@Endpoint(name="my-worker", gpu=GpuGroup.AMPERE_80, workers=5, dependencies=["torch"])
async def compute(data):
    import torch  # MUST import inside the function (cloudpickle)
    return {"sum": torch.tensor(data, device="cuda").sum().item()}

result = await compute([1, 2, 3])
Multiple HTTP routes share one pool of workers.
from runpod_flash import Endpoint, GpuGroup
api = Endpoint(name="my-api", gpu=GpuGroup.ADA_24, workers=(1, 5), dependencies=["torch"])
@api.post("/predict")
async def predict(data: list[float]):
    import torch
    return {"result": torch.tensor(data, device="cuda").sum().item()}

@api.get("/health")
async def health():
    return {"status": "ok"}
Deploy a pre-built Docker image and call it via HTTP.
from runpod_flash import Endpoint, GpuGroup, PodTemplate
server = Endpoint(
    name="my-server",
    image="my-org/my-image:latest",
    gpu=GpuGroup.AMPERE_80,
    workers=1,
    env={"HF_TOKEN": "xxx"},
    template=PodTemplate(containerDiskInGb=100),
)

# LB (load-balancer) style
result = await server.post("/v1/completions", {"prompt": "hello"})
models = await server.get("/v1/models")

# QB (queue-based) style
job = await server.run({"prompt": "hello"})
await job.wait()
print(job.output)
Connect to an existing endpoint by ID (no provisioning):
ep = Endpoint(id="abc123")
job = await ep.runsync({"input": "hello"})
print(job.output)
| Parameters | Mode |
|---|---|
| name= only | Decorator (your code) |
| image= set | Client (deploys image, then HTTP calls) |
| id= set | Client (connects to existing endpoint, no provisioning) |
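The dispatch rule in the table above can be sketched as a tiny helper. This is purely illustrative — select_mode is a hypothetical name, not part of the library:

```python
def select_mode(name=None, image=None, id=None):
    # id= or image= forces client mode; a bare name= means decorator mode
    if id is not None or image is not None:
        return "client"
    if name is not None:
        return "decorator"
    raise ValueError("either name= or id= is required")

assert select_mode(name="my-worker") == "decorator"
assert select_mode(name="my-server", image="org/image:tag") == "client"
assert select_mode(id="abc123") == "client"
```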
Endpoint(
    name="endpoint-name",  # required (unless id= is set)
    id=None,  # connect to an existing endpoint
    gpu=GpuGroup.AMPERE_80,  # single GPU type (default: ANY)
    gpu=[GpuGroup.ADA_24, GpuGroup.AMPERE_80],  # or a list for auto-select by supply
    cpu=CpuInstanceType.CPU5C_4_8,  # CPU type (mutually exclusive with gpu)
    workers=5,  # shorthand for (0, 5)
    workers=(1, 5),  # explicit (min, max)
    idle_timeout=60,  # seconds idle before scale-down (default: 60)
    dependencies=["torch"],  # pip packages for remote execution
    system_dependencies=["ffmpeg"],  # apt-get packages
    image="org/image:tag",  # pre-built Docker image (client mode)
    env={"KEY": "val"},  # environment variables
    volume=NetworkVolume(...),  # persistent storage
    gpu_count=1,  # GPUs per worker
    template=PodTemplate(containerDiskInGb=100),
    flashboot=True,  # fast cold starts
    execution_timeout_ms=0,  # max execution time (0 = unlimited)
)
- gpu= and cpu= are mutually exclusive.
- workers=5 means (0, 5); the default is (0, 1).
- idle_timeout defaults to 60 seconds.
- flashboot=True (the default) enables fast cold starts via snapshot restore.
- gpu_count sets GPUs per worker (default 1); use >1 for multi-GPU models.
- NetworkVolume(name="my-vol", size=100)  # size in GB, default 100
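The workers= shorthand is easy to mirror locally. A minimal sketch of the normalization rule (normalize_workers is a hypothetical helper, not library code):

```python
def normalize_workers(workers=1):
    # an int N is shorthand for (0, N); a tuple is taken as explicit (min, max)
    if isinstance(workers, int):
        return (0, workers)
    lo, hi = workers
    if lo > hi:
        raise ValueError("min workers cannot exceed max workers")
    return (lo, hi)

assert normalize_workers(5) == (0, 5)       # shorthand
assert normalize_workers((1, 5)) == (1, 5)  # explicit (min, max)
assert normalize_workers() == (0, 1)        # default
```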
PodTemplate(
    containerDiskInGb=64,  # container disk size (default 64)
    dockerArgs="",  # extra docker arguments
    ports="",  # exposed ports
    startScript="",  # script to run on start
)
Returned by ep.run() and ep.runsync() in client mode.
job = await ep.run({"data": [1, 2, 3]})
await job.wait(timeout=120)  # poll until done
print(job.id, job.output, job.error, job.done)
await job.cancel()
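Under the hood, wait() amounts to polling job status until it is done or a deadline passes. A stdlib-only sketch of that loop, using a FakeJob stand-in (the real Job polls the endpoint over HTTP — this is an illustration, not the library's internals):

```python
import asyncio
import time

class FakeJob:
    """Stand-in for a queued job that finishes after a short delay."""
    def __init__(self, finish_after=0.05):
        self._done_at = time.monotonic() + finish_after
        self.done = False
        self.output = None

    async def _poll(self):
        # the real client would issue an HTTP status request here
        if time.monotonic() >= self._done_at:
            self.done = True
            self.output = {"sum": 6}

    async def wait(self, timeout=120, interval=0.01):
        # poll until done, raising if the deadline passes first
        deadline = time.monotonic() + timeout
        while not self.done:
            if time.monotonic() > deadline:
                raise TimeoutError("job did not complete before timeout")
            await self._poll()
            await asyncio.sleep(interval)
        return self.output

result = asyncio.run(FakeJob().wait(timeout=1))
```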
| Enum | GPU | VRAM |
|---|---|---|
| ANY | any | varies |
| AMPERE_16 | RTX A4000 | 16GB |
| AMPERE_24 | RTX A5000/L4 | 24GB |
| AMPERE_48 | A40/A6000 | 48GB |
| AMPERE_80 | A100 | 80GB |
| ADA_24 | RTX 4090 | 24GB |
| ADA_32_PRO | RTX 5090 | 32GB |
| ADA_48_PRO | RTX 6000 Ada | 48GB |
| ADA_80_PRO | H100 PCIe (80GB) / H100 HBM3 (80GB) / H100 NVL (94GB) | 80GB+ |
| HOPPER_141 | H200 | 141GB |
| Enum | vCPU | RAM | Max Disk | Type |
|---|---|---|---|---|
| CPU3G_1_4 | 1 | 4GB | 10GB | General |
| CPU3G_2_8 | 2 | 8GB | 20GB | General |
| CPU3G_4_16 | 4 | 16GB | 40GB | General |
| CPU3G_8_32 | 8 | 32GB | 80GB | General |
| CPU3C_1_2 | 1 | 2GB | 10GB | Compute |
| CPU3C_2_4 | 2 | 4GB | 20GB | Compute |
| CPU3C_4_8 | 4 | 8GB | 40GB | Compute |
| CPU3C_8_16 | 8 | 16GB | 80GB | Compute |
| CPU5C_1_2 | 1 | 2GB | 15GB | Compute (5th gen) |
| CPU5C_2_4 | 2 | 4GB | 30GB | Compute (5th gen) |
| CPU5C_4_8 | 4 | 8GB | 60GB | Compute (5th gen) |
| CPU5C_8_16 | 8 | 16GB | 120GB | Compute (5th gen) |
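The enum names themselves encode generation, family, vCPU, and RAM (e.g. CPU5C_4_8 = 5th-gen Compute, 4 vCPU, 8GB). A small parser makes the convention explicit — illustrative only, parse_cpu_enum is not part of the library:

```python
import re

def parse_cpu_enum(name):
    # CPU<gen><G|C>_<vcpu>_<ram>: G = General, C = Compute
    m = re.fullmatch(r"CPU(\d)([GC])_(\d+)_(\d+)", name)
    if m is None:
        raise ValueError(f"not a CPU instance enum: {name}")
    gen, family, vcpu, ram = m.groups()
    return {
        "gen": int(gen),
        "family": "General" if family == "G" else "Compute",
        "vcpu": int(vcpu),
        "ram_gb": int(ram),
    }

assert parse_cpu_enum("CPU5C_4_8") == {"gen": 5, "family": "Compute", "vcpu": 4, "ram_gb": 8}
```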
from runpod_flash import Endpoint, CpuInstanceType
@Endpoint(name="cpu-work", cpu=CpuInstanceType.CPU5C_4_8, workers=5, dependencies=["pandas"])
async def process(data):
    import pandas as pd
    return pd.DataFrame(data).describe().to_dict()
from runpod_flash import Endpoint, GpuGroup, CpuInstanceType
@Endpoint(name="preprocess", cpu=CpuInstanceType.CPU5C_4_8, workers=5, dependencies=["pandas"])
async def preprocess(raw):
    import pandas as pd
    return pd.DataFrame(raw).to_dict("records")

@Endpoint(name="infer", gpu=GpuGroup.AMPERE_80, workers=5, dependencies=["torch"])
async def infer(clean):
    import torch
    t = torch.tensor([[v for v in r.values()] for r in clean], device="cuda")
    return {"predictions": t.mean(dim=1).tolist()}

async def pipeline(data):
    return await infer(await preprocess(data))
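The same chaining pattern can be exercised locally with stand-ins for the two endpoints. The stubs below simulate the CPU and GPU stages (no RunPod calls are made):

```python
import asyncio

async def preprocess(raw):
    # stand-in for the CPU endpoint: pass rows through as plain dicts
    return [dict(r) for r in raw]

async def infer(clean):
    # stand-in for the GPU endpoint: mean of each row, like t.mean(dim=1)
    return {"predictions": [sum(r.values()) / len(r) for r in clean]}

async def pipeline(data):
    return await infer(await preprocess(data))

out = asyncio.run(pipeline([{"a": 1.0, "b": 3.0}, {"a": 2.0, "b": 2.0}]))
```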
import asyncio
results = await asyncio.gather(compute(a), compute(b), compute(c))
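Because decorated functions are awaitable, fan-out is plain asyncio. A self-contained version with a local stub standing in for the remote compute endpoint (simulated latency instead of a real round-trip):

```python
import asyncio

async def compute(data):
    await asyncio.sleep(0.01)  # simulate the remote round-trip
    return {"sum": sum(data)}

async def main():
    a, b, c = [1, 2, 3], [4, 5], [6]
    # gather runs the three calls concurrently and preserves argument order
    return await asyncio.gather(compute(a), compute(b), compute(c))

results = asyncio.run(main())
```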
- Remote calls are async: always await them.
- Every package used inside a function must be listed in dependencies=[].
- image= / id= = client mode; otherwise = decorator mode.
- For GPU auto-select, pass a list (gpu=[GpuGroup.ADA_24, GpuGroup.AMPERE_80]) and set workers=5 or higher; the platform only auto-switches GPU types based on supply when max workers is at least 5.
- runsync times out at 60 seconds, and cold starts can exceed that. For a first request, use ep.runsync(data, timeout=120), or switch to ep.run() + job.wait().

Weekly Installs
129
Repository
GitHub Stars
6
First Seen
Feb 10, 2026
Security Audits
Gen Agent Trust Hub: Pass · Socket: Pass · Snyk: Pass
Installed on
gemini-cli: 127
opencode: 127
codex: 126
github-copilot: 125
amp: 123
kimi-cli: 123