Important prerequisite
Installing AI Skills requires a working proxy connection with TUN mode enabled. This directly determines whether the installation completes successfully.
modal-knowledge by josiahsiegel/claude-plugin-marketplace

```shell
npx skills add https://github.com/josiahsiegel/claude-plugin-marketplace --skill modal-knowledge
```

Comprehensive Modal.com platform knowledge covering all features, pricing, and best practices. Activate this skill when users need detailed information about Modal's serverless cloud platform.
Activate this skill when users ask about the Modal platform.
Modal is a serverless cloud platform for running Python code, optimized for AI/ML workloads.
```python
import modal

app = modal.App("app-name")

@app.function()
def basic_function(arg: str) -> str:
    return f"Result: {arg}"

@app.local_entrypoint()
def main():
    result = basic_function.remote("test")
    print(result)
```
| Parameter | Type | Description |
|---|---|---|
| image | Image | Container image configuration |
| gpu | str/list | GPU type(s): "T4", "A100", ["H100", "A100"] |
| cpu | float | CPU cores (0.125 to 64) |
| memory | int | Memory in MB (128 to 262144) |
| timeout | int | Max execution seconds |
| retries | int | Retry attempts on failure |
| secrets | list | Secrets to inject |
| volumes | dict | Volume mount points |
| schedule | Cron/Period | Scheduled execution |
| concurrency_limit | int | Max concurrent executions |
| container_idle_timeout | int | Seconds to keep containers warm |
| include_source | bool | Auto-sync source code |
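As a plain-Python illustration of the documented `cpu` and `memory` ranges in the table above, a small validator can catch out-of-range requests before deployment (this helper is hypothetical, not part of the Modal SDK):

```python
def validate_resources(cpu: float, memory_mb: int) -> None:
    """Check a resource request against the documented ranges:
    cpu 0.125-64 cores, memory 128-262144 MB. Illustrative only."""
    if not 0.125 <= cpu <= 64:
        raise ValueError(f"cpu must be in [0.125, 64], got {cpu}")
    if not 128 <= memory_mb <= 262144:
        raise ValueError(f"memory must be in [128, 262144] MB, got {memory_mb}")

validate_resources(2.0, 4096)  # a typical request: 2 cores, 4 GB
```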
| GPU | Memory | Use case | ~Cost/hr |
|---|---|---|---|
| T4 | 16 GB | Small inference | $0.59 |
| L4 | 24 GB | Medium inference | $0.80 |
| A10G | 24 GB | Inference/fine-tuning | $1.10 |
| L40S | 48 GB | Heavy inference | $1.50 |
| A100-40GB | 40 GB | Training | $2.00 |
| A100-80GB | 80 GB | Large models | $3.00 |
| H100 | 80 GB | Cutting-edge models | $5.00 |
| H200 | 141 GB | Largest models | $5.00 |
| B200 | 180+ GB | Latest generation | $6.25 |
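A back-of-envelope cost estimate can be derived from the table above. The rates below are the approximate hourly figures from the table; actual billing is usage-based and subject to change, and the helper itself is illustrative, not a Modal API:

```python
# Approximate on-demand GPU rates from the table above (USD per hour).
GPU_HOURLY_USD = {
    "T4": 0.59, "L4": 0.80, "A10G": 1.10, "L40S": 1.50,
    "A100-40GB": 2.00, "A100-80GB": 3.00, "H100": 5.00,
    "H200": 5.00, "B200": 6.25,
}

def estimate_cost(gpu: str, seconds: float, count: int = 1) -> float:
    """Rough cost for `count` GPUs of type `gpu` running for `seconds`."""
    return GPU_HOURLY_USD[gpu] / 3600 * seconds * count

# A 30-minute run on 4x A100-80GB:
print(round(estimate_cost("A100-80GB", 30 * 60, count=4), 2))  # → 6.0
```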
```python
# Single GPU
@app.function(gpu="A100")

# Specific memory variant
@app.function(gpu="A100-80GB")

# Multi-GPU
@app.function(gpu="H100:4")

# Fallbacks (tried in order)
@app.function(gpu=["H100", "A100", "any"])

# "any" = L4, A10G, or T4
@app.function(gpu="any")
```
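The fallback semantics above (try each listed GPU in order, with `"any"` matching the small-GPU pool) can be sketched locally. This resolver is a hypothetical stand-in for scheduling Modal performs server-side, not an actual SDK function:

```python
def resolve_gpu(requested, available):
    """Pick the first requested GPU that is available.
    'any' matches the small-GPU pool (L4/A10G/T4), mirroring the
    fallback semantics described above. Illustrative only."""
    ANY_POOL = ("L4", "A10G", "T4")
    wanted = [requested] if isinstance(requested, str) else requested
    for want in wanted:
        if want == "any":
            for gpu in ANY_POOL:
                if gpu in available:
                    return gpu
        elif want in available:
            return want
    return None  # nothing available

print(resolve_gpu(["H100", "A100", "any"], {"A10G", "T4"}))  # → A10G
```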
```python
# Debian slim (recommended)
modal.Image.debian_slim(python_version="3.11")
# From a Dockerfile
modal.Image.from_dockerfile("./Dockerfile")
# From a Docker registry
modal.Image.from_registry("nvidia/cuda:12.1.0-base-ubuntu22.04")

# pip (standard)
image.pip_install("torch", "transformers")
# uv (10-100x faster)
image.uv_pip_install("torch", "transformers")
# System packages
image.apt_install("ffmpeg", "libsm6")
# Shell commands
image.run_commands("apt-get update", "make install")

# Single file
image.add_local_file("./config.json", "/app/config.json")
# Directory
image.add_local_dir("./models", "/app/models")
# Python source
image.add_local_python_source("my_module")
# Environment variables
image.env({"VAR": "value"})

# Run a function at build time (e.g. to bake model weights into the image)
def download_model():
    from huggingface_hub import snapshot_download
    snapshot_download("model-name")

image.run_function(download_model, secrets=[...])
```
```python
# Create or reference a volume
vol = modal.Volume.from_name("my-vol", create_if_missing=True)

# Mount it in a function
@app.function(volumes={"/data": vol})
def func():
    # Read/write under /data
    vol.commit()  # Persist changes
```
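The key point of `vol.commit()` is that writes become durable and visible to other containers only after committing. A minimal local sketch of that staging behavior (`FakeVolume` is a made-up class for illustration, not Modal's Volume implementation):

```python
class FakeVolume:
    """Sketch of commit semantics: writes stay in a local staging area
    until commit() publishes them. Illustrative only."""
    def __init__(self):
        self.persisted = {}   # what other containers would see
        self._pending = {}    # local, uncommitted writes

    def write(self, path, data):
        self._pending[path] = data

    def commit(self):
        self.persisted.update(self._pending)
        self._pending.clear()

vol = FakeVolume()
vol.write("/data/a.txt", "hello")
print("/data/a.txt" in vol.persisted)  # → False (not committed yet)
vol.commit()
print(vol.persisted["/data/a.txt"])    # → hello
```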
```python
# From the dashboard (recommended)
modal.Secret.from_name("secret-name")
# From a dictionary
modal.Secret.from_dict({"KEY": "value"})
# From local environment variables
modal.Secret.from_local_environ(["KEY1", "KEY2"])
# From a .env file
modal.Secret.from_dotenv()

# Usage
@app.function(secrets=[modal.Secret.from_name("api-keys")])
def func():
    import os
    key = os.environ["API_KEY"]
```
```python
# Distributed dict
d = modal.Dict.from_name("cache", create_if_missing=True)
d["key"] = "value"
d.put("key", "value", ttl=3600)

# Distributed queue
q = modal.Queue.from_name("jobs", create_if_missing=True)
q.put("task")
item = q.get()
```
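To make the `ttl` parameter concrete, here is an in-process stand-in that expires keys after a time-to-live. `TTLDict` is a local illustration of the idea only; `modal.Dict` is a distributed object with its own implementation:

```python
import time

class TTLDict:
    """Local stand-in for a dict with per-key TTL, mimicking the
    d.put(key, value, ttl=...) pattern above. Illustrative only."""
    def __init__(self):
        self._data = {}

    def put(self, key, value, ttl=None):
        expires = None if ttl is None else time.monotonic() + ttl
        self._data[key] = (value, expires)

    def get(self, key, default=None):
        value, expires = self._data.get(key, (default, None))
        if expires is not None and time.monotonic() > expires:
            del self._data[key]  # lazily evict the expired entry
            return default
        return value

d = TTLDict()
d.put("key", "value", ttl=3600)
print(d.get("key"))  # → value
```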
```python
# Simple FastAPI endpoint
@app.function()
@modal.fastapi_endpoint()
def hello(name: str = "World"):
    return {"message": f"Hello, {name}!"}

# Full ASGI app (FastAPI)
from fastapi import FastAPI

web_app = FastAPI()

@web_app.post("/predict")
def predict(text: str):
    return {"result": process(text)}

@app.function()
@modal.asgi_app()
def fastapi_app():
    return web_app

# WSGI app (Flask)
from flask import Flask

flask_app = Flask(__name__)

@app.function()
@modal.wsgi_app()
def flask_endpoint():
    return flask_app

# Arbitrary server listening on a port
import subprocess

@app.function()
@modal.web_server(port=8000)
def custom_server():
    subprocess.run(["python", "-m", "http.server", "8000"])

# Custom domain
@modal.asgi_app(custom_domains=["api.example.com"])
```
```python
# Daily at 8 AM UTC
@app.function(schedule=modal.Cron("0 8 * * *"))

# With a timezone
@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))

# Fixed intervals
@app.function(schedule=modal.Period(hours=5))
@app.function(schedule=modal.Period(days=1))
```
Note: Scheduled functions only run when deployed with modal deploy, not with modal run.
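The cron strings above use the standard five fields (minute, hour, day-of-month, month, day-of-week). A rough sanity check for the simple forms used in this document can be written as follows; this hypothetical helper handles only `*` and plain numbers, not ranges or steps, and is not Modal's own validator:

```python
def is_valid_cron(expr: str) -> bool:
    """Rough check for five-field cron strings like '0 8 * * *'.
    Supports only '*' and single numbers; illustrative only."""
    fields = expr.split()
    if len(fields) != 5:
        return False
    bounds = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]
    for field, (lo, hi) in zip(fields, bounds):
        if field == "*":
            continue
        if not field.isdigit() or not lo <= int(field) <= hi:
            return False
    return True

print(is_valid_cron("0 8 * * *"))  # → True
```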
```python
# Parallel execution (up to 1000 concurrent)
results = list(func.map(items))

# Unordered (faster)
results = list(func.map(items, order_outputs=False))

# Spread argument tuples
pairs = [(1, 2), (3, 4)]
results = list(add.starmap(pairs))

# Async job (returns immediately)
call = func.spawn(data)
result = call.get()  # Fetch the result later

# Spawn many
calls = [func.spawn(item) for item in items]
results = [call.get() for call in calls]
```
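The map/spawn semantics above parallel Python's own `concurrent.futures`: `func.map` behaves like an ordered executor map, while `spawn` plus `get` behaves like submitting futures and collecting results later. A local analogue (threads stand in for Modal containers; this is an illustration, not how Modal executes remotely):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(x):
    return x * x

items = [1, 2, 3, 4]
with ThreadPoolExecutor() as pool:
    # Ordered, like func.map(items)
    ordered = list(pool.map(square, items))

    # Submit-then-collect, like func.spawn(...) / call.get()
    futures = [pool.submit(square, x) for x in items]
    # as_completed yields in completion order, like order_outputs=False
    unordered = sorted(f.result() for f in as_completed(futures))

print(ordered)    # → [1, 4, 9, 16]
print(unordered)  # → [1, 4, 9, 16]
```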
```python
@app.cls(gpu="A100", container_idle_timeout=300)
class Server:
    @modal.enter()
    def load(self):
        self.model = load_model()

    @modal.method()
    def predict(self, text):
        return self.model(text)

    @modal.exit()
    def cleanup(self):
        del self.model

    @modal.concurrent(max_inputs=100, target_inputs=80)
    @modal.method()
    def batched(self, item):
        pass
```
```shell
modal run app.py                # Run a function
modal serve app.py              # Hot-reload dev server
modal shell app.py              # Interactive shell
modal shell app.py --gpu A100   # Shell with a GPU
modal deploy app.py             # Deploy
modal app list                  # List apps
modal app logs app-name         # View logs
modal app stop app-name         # Stop an app

# Volumes
modal volume create name
modal volume list
modal volume put name local remote
modal volume get name remote local

# Secrets
modal secret create name KEY=value
modal secret list

# Environments
modal environment create staging
```
| Plan | Price | Containers | GPU concurrency |
|---|---|---|---|
| Starter | Free ($30 credits) | 100 | 10 |
| Team | $250/month | 1000 | 50 |
| Enterprise | Custom | Unlimited | Custom |
- Use @modal.enter() for model loading
- Use uv_pip_install for faster builds
- Use order_outputs=False when output order doesn't matter
- Tune container_idle_timeout to balance cost and latency
- Test with modal run before modal deploy

```python
@app.cls(gpu="A100", container_idle_timeout=300)
class LLM:
    @modal.enter()
    def load(self):
        from vllm import LLM
        self.llm = LLM(model="...")

    @modal.method()
    def generate(self, prompt):
        return self.llm.generate([prompt])
```
```python
@app.function(volumes={"/data": vol})
def process(file):
    # Process the file
    vol.commit()

# In parallel
results = list(process.map(files))
```

```python
@app.function(
    schedule=modal.Cron("0 6 * * *"),
    secrets=[modal.Secret.from_name("db")]
)
def daily_etl():
    extract()
    transform()
    load()
```
| Task | Code |
|---|---|
| Create app | app = modal.App("name") |
| Basic function | @app.function() |
| With GPU | @app.function(gpu="A100") |
| With image | @app.function(image=img) |
| Web endpoint | @modal.asgi_app() |
| Scheduled | schedule=modal.Cron("...") |
| Mount volume | volumes={"/path": vol} |
| Use secret | secrets=[modal.Secret.from_name("x")] |
| Parallel map | func.map(items) |
| Async spawn | func.spawn(arg) |
| Class pattern | @app.cls() with @modal.enter() |
Weekly installs: 64
Repository: josiahsiegel/claude-plugin-marketplace
GitHub stars: 21
First seen: Jan 24, 2026
Security audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Fail
Installed on: claude-code (51), gemini-cli (50), opencode (50), codex (47), cursor (46), github-copilot (42)