modal by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill modal
Modal is a serverless platform for running Python code in the cloud with minimal configuration. Execute functions on powerful GPUs, scale automatically to thousands of containers, and pay only for compute used.
Modal is particularly suited for AI/ML workloads, high-performance batch processing, scheduled jobs, GPU inference, and serverless APIs. Sign up for free at https://modal.com and receive $30/month in credits.
Use Modal for:
- AI/ML workloads and GPU inference
- High-performance batch processing
- Scheduled and periodic jobs
- Serverless APIs and webhooks
Modal requires authentication via API token.
# Install Modal
uv pip install modal
# Authenticate (opens browser for login)
modal token new
This creates a token stored in ~/.modal.toml. The token authenticates all Modal operations.
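To check non-interactively whether a token is already in place, a small sketch using only the standard library (the helper name is ours; the `~/.modal.toml` location comes from the text above):

```python
from pathlib import Path

def modal_token_configured(path: str = "~/.modal.toml") -> bool:
    """Return True if a Modal token file exists at the given location."""
    return Path(path).expanduser().is_file()

print(modal_token_configured())
```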
import modal
app = modal.App("test-app")
@app.function()
def hello():
    print("Modal is working!")
Run with: modal run script.py
Modal provides serverless Python execution through Functions that run in containers. Define compute requirements, dependencies, and scaling behavior declaratively.
Specify dependencies and environment for functions using Modal Images.
import modal
# Basic image with Python packages
image = (
    modal.Image.debian_slim(python_version="3.12")
    .uv_pip_install("torch", "transformers", "numpy")
)
app = modal.App("ml-app", image=image)
Common patterns:
.uv_pip_install("pandas", "scikit-learn") - add Python packages
.apt_install("ffmpeg", "git") - add system packages
modal.Image.from_registry("nvidia/cuda:12.1.0-base") - start from a registry image
.add_local_python_source("my_module") - include local Python source
See references/images.md for comprehensive image building documentation.
Define functions that run in the cloud with the @app.function() decorator.
@app.function()
def process_data(file_path: str):
    import pandas as pd
    df = pd.read_csv(file_path)
    return df.describe()
Call functions:
# From local entrypoint
@app.local_entrypoint()
def main():
    result = process_data.remote("data.csv")
    print(result)
Run with: modal run script.py
See references/functions.md for function patterns, deployment, and parameter handling.
Attach GPUs to functions for accelerated computation.
@app.function(gpu="H100")
def train_model():
    import torch
    assert torch.cuda.is_available()
    # GPU-accelerated code here
Available GPU types:
T4, L4 - Cost-effective inference
A10, A100, A100-80GB - Standard training/inference
L40S - Excellent cost/performance balance (48GB)
H100, H200 - High-performance training
B200 - Flagship performance (most powerful)
Request multiple GPUs:
@app.function(gpu="H100:8")  # 8x H100 GPUs
def train_large_model():
    pass
See references/gpu.md for GPU selection guidance, CUDA setup, and multi-GPU configuration.
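Since `gpu=` is just a string of the form `TYPE` or `TYPE:COUNT`, a hypothetical helper can build and validate it against the types listed above (`gpu_spec` is not part of Modal's API):

```python
VALID_GPUS = {"T4", "L4", "A10", "A100", "A100-80GB", "L40S", "H100", "H200", "B200"}

def gpu_spec(gpu_type: str, count: int = 1) -> str:
    """Build Modal's gpu= string, e.g. "H100" or "H100:8"."""
    if gpu_type not in VALID_GPUS:
        raise ValueError(f"unknown GPU type: {gpu_type}")
    return gpu_type if count == 1 else f"{gpu_type}:{count}"

print(gpu_spec("H100", 8))  # H100:8
```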
Request CPU cores, memory, and disk for functions.
@app.function(
    cpu=8.0,              # 8 physical cores
    memory=32768,         # 32 GiB RAM
    ephemeral_disk=10240  # 10 GiB disk
)
def memory_intensive_task():
    pass
Default allocation: 0.125 CPU cores, 128 MiB memory. Billing is based on the reservation or actual usage, whichever is higher.
See references/resources.md for resource limits and billing details.
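The units above are worth spelling out: `memory=` is in MiB, and billing takes the higher of reservation and usage. A quick sketch of that arithmetic (helper names are ours):

```python
MIB_PER_GIB = 1024

def mib(gib: float) -> int:
    """Convert GiB to the MiB units expected by memory= and ephemeral_disk=."""
    return int(gib * MIB_PER_GIB)

def billed(reserved: float, used: float) -> float:
    """Billing is based on the higher of reservation and actual usage."""
    return max(reserved, used)

print(mib(32))           # 32768, matching memory=32768 above
print(billed(8.0, 2.5))  # 8.0: the reservation dominates
```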
Modal autoscales functions from zero to thousands of containers based on demand.
Process inputs in parallel:
@app.function()
def analyze_sample(sample_id: int):
    # Process single sample
    return result

@app.local_entrypoint()
def main():
    sample_ids = range(1000)
    # Automatically parallelized across containers
    results = list(analyze_sample.map(sample_ids))
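`.map()` behaves like an ordered parallel map over inputs. A local stand-in using only the standard library conveys the semantics (the squaring workload is invented for illustration; on Modal each call would run in its own container):

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_sample(sample_id: int) -> int:
    # Stand-in for real per-sample work
    return sample_id * sample_id

# Fan out over inputs; results come back in input order, as with Function.map()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(analyze_sample, range(100)))

print(results[:4])  # [0, 1, 4, 9]
```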
Configure autoscaling:
@app.function(
    max_containers=100,   # Upper limit
    min_containers=2,     # Keep warm
    buffer_containers=5   # Idle buffer for bursts
)
def inference():
    pass
See references/scaling.md for autoscaling configuration, concurrency, and scaling limits.
Use Volumes for persistent storage across function invocations.
volume = modal.Volume.from_name("my-data", create_if_missing=True)

@app.function(volumes={"/data": volume})
def save_results(data):
    with open("/data/results.txt", "w") as f:
        f.write(data)
    volume.commit()  # Persist changes
Volumes persist data between runs, store model weights, cache datasets, and share data between functions.
See references/volumes.md for volume management, commits, and caching patterns.
Store API keys and credentials securely using Modal Secrets.
@app.function(secrets=[modal.Secret.from_name("huggingface")])
def download_model():
    import os
    token = os.environ["HF_TOKEN"]
    # Use token for authentication
Create secrets in the Modal dashboard or via the CLI:
modal secret create my-secret KEY=value API_TOKEN=xyz
See references/secrets.md for secret management and authentication patterns.
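Inside the function, secrets surface as plain environment variables. A defensive access pattern, assuming nothing beyond the standard library (`require_env` and the simulated injection are illustrative, not part of Modal):

```python
import os

def require_env(name: str) -> str:
    """Fail fast with a clear message if an expected credential is missing."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"{name} is not set; attach the Secret that defines it")
    return value

os.environ["API_TOKEN"] = "xyz"  # simulate Modal's injection for this demo
print(require_env("API_TOKEN"))  # xyz
```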
Serve HTTP endpoints, APIs, and webhooks with @modal.web_endpoint().
@app.function()
@modal.web_endpoint(method="POST")
def predict(data: dict):
    # Process the request; `model` is assumed to be loaded elsewhere
    result = model.predict(data["input"])
    return {"prediction": result}
Deploy with:
modal deploy script.py
Modal provides an HTTPS URL for the endpoint.
See references/web-endpoints.md for FastAPI integration, streaming, authentication, and WebSocket support.
Run functions on a schedule with cron expressions.
@app.function(schedule=modal.Cron("0 2 * * *"))  # Daily at 2 AM
def daily_backup():
    # Backup data
    pass

@app.function(schedule=modal.Period(hours=4))  # Every 4 hours
def refresh_cache():
    # Update cache
    pass
Scheduled functions run automatically without manual invocation.
See references/scheduled-jobs.md for cron syntax, timezone configuration, and monitoring.
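Cron expressions use five fields: minute, hour, day-of-month, month, day-of-week. A small parser sketch for sanity-checking an expression (`describe_cron` is our helper, not part of Modal):

```python
def describe_cron(expr: str) -> dict:
    """Split a 5-field cron expression into named fields."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day-of-month month day-of-week")
    names = ["minute", "hour", "day_of_month", "month", "day_of_week"]
    return dict(zip(names, fields))

print(describe_cron("0 2 * * *"))  # minute "0", hour "2": daily at 02:00
```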
import modal

# Define dependencies
image = modal.Image.debian_slim().uv_pip_install("torch", "transformers")
app = modal.App("llm-inference", image=image)

# Download model weights ahead of serving (invoke once to warm the cache)
@app.function()
def download_model():
    from transformers import AutoModel
    AutoModel.from_pretrained("bert-base-uncased")

# Serve model
@app.cls(gpu="L40S")
class Model:
    @modal.enter()
    def load_model(self):
        from transformers import pipeline
        self.pipe = pipeline("text-classification", device="cuda")

    @modal.method()
    def predict(self, text: str):
        return self.pipe(text)

@app.local_entrypoint()
def main():
    model = Model()
    result = model.predict.remote("Modal is great!")
    print(result)
@app.function(cpu=2.0, memory=4096)
def process_file(file_path: str):
    import pandas as pd
    df = pd.read_csv(file_path)
    # Process data
    return df.shape[0]

@app.local_entrypoint()
def main():
    files = ["file1.csv", "file2.csv", ...]  # 1000s of files
    # Automatically parallelized across containers
    for count in process_file.map(files):
        print(f"Processed {count} rows")
@app.function(
    gpu="A100:2",  # 2x A100 GPUs
    timeout=3600   # 1 hour timeout
)
def train_model(config: dict):
    import torch
    # Multi-GPU training code
    model = create_model(config)
    train(model)
    return metrics
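`timeout=` values like the 3600 above are in seconds; a tiny hypothetical helper avoids the magic number:

```python
from datetime import timedelta

def timeout_seconds(**kwargs) -> int:
    """Express a timeout readably, e.g. timeout_seconds(hours=1) -> 3600."""
    return int(timedelta(**kwargs).total_seconds())

print(timeout_seconds(hours=1))    # 3600
print(timeout_seconds(minutes=90)) # 5400
```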
Detailed documentation for specific features:
references/getting-started.md - Authentication, setup, basic concepts
references/images.md - Image building, dependencies, Dockerfiles
references/functions.md - Function patterns, deployment, parameters
references/gpu.md - GPU types, CUDA, multi-GPU configuration
references/resources.md - CPU, memory, disk management
references/scaling.md - Autoscaling, parallel execution, concurrency
references/volumes.md - Persistent storage, data management
references/secrets.md - Environment variables, authentication
references/web-endpoints.md - APIs, webhooks, endpoints
references/scheduled-jobs.md - Cron jobs, periodic tasks
references/examples.md - Common patterns for scientific computing

Best practices:
Pin dependencies in .uv_pip_install() for reproducible builds
Set max_containers and min_containers based on workload
Use .map() for parallel processing instead of sequential loops

Troubleshooting:
"Module not found" errors: add the package to the image with .uv_pip_install("package-name")
GPU not detected: request a GPU with @app.function(gpu="A100") and check torch.cuda.is_available()
Function timeout: raise the limit with @app.function(timeout=3600)
Volume changes not persisting: call volume.commit() after writing files

For additional help, see the Modal documentation at https://modal.com/docs or join the Modal Slack community.
Weekly Installs: 137
Repository: https://github.com/davila7/claude-code-templates
GitHub Stars: 22.6K
First Seen: Jan 21, 2026
Security Audits: Gen Agent Trust Hub: Fail; Socket: Pass; Snyk: Warn
Installed on: claude-code (115), opencode (107), cursor (99), gemini-cli (98), antigravity (92), codex (90)