stable-diffusion-image-generation by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill stable-diffusion-image-generation
Comprehensive guide to generating images with Stable Diffusion using the HuggingFace Diffusers library.
Use Stable Diffusion when:
Key features:
Use alternatives instead:
pip install diffusers transformers accelerate torch
pip install xformers # Optional: memory-efficient attention
from diffusers import DiffusionPipeline
import torch
# Load pipeline (auto-detects model type)
pipe = DiffusionPipeline.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5",
torch_dtype=torch.float16
)
pipe.to("cuda")
# Generate image
image = pipe(
"A serene mountain landscape at sunset, highly detailed",
num_inference_steps=50,
guidance_scale=7.5
).images[0]
image.save("output.png")
from diffusers import AutoPipelineForText2Image
import torch
pipe = AutoPipelineForText2Image.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16,
variant="fp16"
)
pipe.to("cuda")
# Enable memory optimization
pipe.enable_model_cpu_offload()
image = pipe(
prompt="A futuristic city with flying cars, cinematic lighting",
height=1024,
width=1024,
num_inference_steps=30
).images[0]
Diffusers is built around three core components:
Pipeline (orchestration)
├── Model (neural networks)
│   ├── UNet / Transformer (noise prediction)
│   ├── VAE (latent encoding/decoding)
│   └── Text Encoder (CLIP/T5)
└── Scheduler (denoising algorithm)
Text Prompt → Text Encoder → Text Embeddings
                                 ↓
Random Noise → [Denoising Loop] ← Scheduler
                     ↓
              Predicted Noise
                     ↓
            VAE Decoder → Final Image
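The same flow can be driven by hand, which makes the division of labor concrete. Below is a minimal sketch of the denoising loop, assuming an SD 1.5-style pipeline is already loaded as `pipe` on CUDA (as in the quickstart above). Classifier-free guidance is omitted for brevity, so calling `pipe(...)` will produce better images than this loop.
import torch
from PIL import Image

tokenizer, text_encoder = pipe.tokenizer, pipe.text_encoder
unet, vae, scheduler = pipe.unet, pipe.vae, pipe.scheduler

with torch.no_grad():
    # Text prompt -> text encoder -> text embeddings
    ids = tokenizer("A serene mountain landscape", padding="max_length",
                    max_length=tokenizer.model_max_length,
                    truncation=True, return_tensors="pt").input_ids.to("cuda")
    text_emb = text_encoder(ids)[0]

    # Random noise in latent space (64x64 latents -> 512x512 pixels)
    latents = torch.randn(1, unet.config.in_channels, 64, 64,
                          device="cuda", dtype=text_emb.dtype)
    latents = latents * scheduler.init_noise_sigma

    # Denoising loop: the UNet predicts noise, the scheduler steps backwards
    scheduler.set_timesteps(30)
    for t in scheduler.timesteps:
        latent_in = scheduler.scale_model_input(latents, t)
        noise_pred = unet(latent_in, t, encoder_hidden_states=text_emb).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample

    # VAE decoder -> final image
    decoded = vae.decode(latents / vae.config.scaling_factor).sample

img = (decoded[0] / 2 + 0.5).clamp(0, 1)  # [-1, 1] -> [0, 1]
arr = (img.permute(1, 2, 0).float().cpu().numpy() * 255).astype("uint8")
Image.fromarray(arr).save("manual.png")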
Pipelines orchestrate complete workflows:
| Pipeline | Purpose |
|---|---|
| StableDiffusionPipeline | Text-to-image (SD 1.x/2.x) |
| StableDiffusionXLPipeline | Text-to-image (SDXL) |
| StableDiffusion3Pipeline | Text-to-image (SD 3.0) |
| FluxPipeline | Text-to-image (Flux models) |
| StableDiffusionImg2ImgPipeline | Image-to-image |
| StableDiffusionInpaintPipeline | Inpainting |
Schedulers control the denoising process:
| Scheduler | Steps | Quality | Use Case |
|---|---|---|---|
| EulerDiscreteScheduler | 20-50 | Good | Default choice |
| EulerAncestralDiscreteScheduler | 20-50 | Good | More variation |
| DPMSolverMultistepScheduler | 15-25 | Excellent | Fast, high quality |
| DDIMScheduler | 50-100 | Good | Deterministic |
| LCMScheduler | 4-8 | Good | Very fast |
| UniPCMultistepScheduler | 15-25 | Excellent | Fast convergence |
from diffusers import DPMSolverMultistepScheduler
# Swap for faster generation
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
pipe.scheduler.config
)
# Now generate with fewer steps
image = pipe(prompt, num_inference_steps=20).images[0]
| Parameter | Default | Description |
|---|---|---|
| prompt | Required | Text description of desired image |
| negative_prompt | None | What to avoid in the image |
| num_inference_steps | 50 | Denoising steps (more = better quality) |
| guidance_scale | 7.5 | Prompt adherence (7-12 typical) |
| height, width | 512/1024 | Output dimensions (multiples of 8) |
| generator | None | Torch generator for reproducibility |
| num_images_per_prompt | 1 | Batch size |
import torch
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
prompt="A cat wearing a top hat",
generator=generator,
num_inference_steps=50
).images[0]
image = pipe(
prompt="Professional photo of a dog in a garden",
negative_prompt="blurry, low quality, distorted, ugly, bad anatomy",
guidance_scale=7.5
).images[0]
Transform existing images with text guidance:
from diffusers import AutoPipelineForImage2Image
from PIL import Image
pipe = AutoPipelineForImage2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5",
torch_dtype=torch.float16
).to("cuda")
init_image = Image.open("input.jpg").resize((512, 512))
image = pipe(
prompt="A watercolor painting of the scene",
image=init_image,
strength=0.75, # How much to transform (0-1)
num_inference_steps=50
).images[0]
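Because strength sets how far the output may drift from the source image, a quick sweep over a few values (a sketch reusing the img2img pipeline above) is the easiest way to pick one:
for s in (0.3, 0.5, 0.75, 0.9):
    # Higher strength = more transformation, less of the original preserved
    pipe(
        prompt="A watercolor painting of the scene",
        image=init_image,
        strength=s,
    ).images[0].save(f"strength_{s}.png")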
Fill masked regions:
from diffusers import AutoPipelineForInpainting
from PIL import Image
pipe = AutoPipelineForInpainting.from_pretrained(
"runwayml/stable-diffusion-inpainting",
torch_dtype=torch.float16
).to("cuda")
image = Image.open("photo.jpg")
mask = Image.open("mask.png") # White = inpaint region
result = pipe(
prompt="A red car parked on the street",
image=image,
mask_image=mask,
num_inference_steps=50
).images[0]
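Masks do not have to come from a file. A minimal sketch that builds a rectangular mask in code (the coordinates are placeholder assumptions):
from PIL import Image, ImageDraw

mask = Image.new("L", image.size, 0)  # black everywhere = keep as-is
# White rectangle = region to repaint (coordinates are illustrative)
ImageDraw.Draw(mask).rectangle((100, 200, 400, 450), fill=255)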
Add spatial conditioning for precise control:
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch
# Load ControlNet for edge conditioning
controlnet = ControlNetModel.from_pretrained(
"lllyasviel/control_v11p_sd15_canny",
torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5",
controlnet=controlnet,
torch_dtype=torch.float16
).to("cuda")
# Use Canny edge image as control
control_image = get_canny_image(input_image)
image = pipe(
prompt="A beautiful house in the style of Van Gogh",
image=control_image,
num_inference_steps=30
).images[0]
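Note that get_canny_image is not defined above. Here is a sketch of such a helper using the standard OpenCV preprocessing from the ControlNet examples (assumes opencv-python and numpy are installed):
import cv2
import numpy as np
from PIL import Image

def get_canny_image(image: Image.Image, low: int = 100, high: int = 200) -> Image.Image:
    edges = cv2.Canny(np.array(image), low, high)  # single-channel edge map
    edges = np.stack([edges] * 3, axis=-1)         # replicate to 3 channels for ControlNet
    return Image.fromarray(edges)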
| ControlNet | Input Type | Use Case |
|---|---|---|
| canny | Edge maps | Preserve structure |
| openpose | Pose skeletons | Human poses |
| depth | Depth maps | 3D-aware generation |
| normal | Normal maps | Surface details |
| mlsd | Line segments | Architectural lines |
| scribble | Rough sketches | Sketch-to-image |
Load fine-tuned style adapters:
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5",
torch_dtype=torch.float16
).to("cuda")
# Load LoRA weights
pipe.load_lora_weights("path/to/lora", weight_name="style.safetensors")
# Generate with LoRA style
image = pipe("A portrait in the trained style").images[0]
# Adjust LoRA strength
pipe.fuse_lora(lora_scale=0.8)
# Unload LoRA
pipe.unload_lora_weights()
# Load multiple LoRAs
pipe.load_lora_weights("lora1", adapter_name="style")
pipe.load_lora_weights("lora2", adapter_name="character")
# Set weights for each
pipe.set_adapters(["style", "character"], adapter_weights=[0.7, 0.5])
image = pipe("A portrait").images[0]
# Model CPU offload - moves models to CPU when not in use
pipe.enable_model_cpu_offload()
# Sequential CPU offload - more aggressive, slower
pipe.enable_sequential_cpu_offload()
# Reduce memory by computing attention in chunks
pipe.enable_attention_slicing()
# Or specific chunk size
pipe.enable_attention_slicing("max")
# Requires xformers package
pipe.enable_xformers_memory_efficient_attention()
# Decode latents in tiles for large images
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()
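To see what each switch actually buys you, measure peak VRAM around a short test run. A sketch using PyTorch's built-in CUDA memory statistics:
import torch

torch.cuda.reset_peak_memory_stats()
_ = pipe("test prompt", num_inference_steps=10)
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")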
# FP16 (recommended for GPU)
pipe = DiffusionPipeline.from_pretrained(
"model-id",
torch_dtype=torch.float16,
variant="fp16"
)
# BF16 (better precision, requires Ampere+ GPU)
pipe = DiffusionPipeline.from_pretrained(
"model-id",
torch_dtype=torch.bfloat16
)
from diffusers import UNet2DConditionModel, AutoencoderKL
# Load custom VAE
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
# Use with pipeline
pipe = DiffusionPipeline.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5",
vae=vae,
torch_dtype=torch.float16
)
Generate multiple images efficiently:
# Multiple prompts
prompts = [
"A cat playing piano",
"A dog reading a book",
"A bird painting a picture"
]
images = pipe(prompts, num_inference_steps=30).images
# Multiple images per prompt
images = pipe(
"A beautiful sunset",
num_images_per_prompt=4,
num_inference_steps=30
).images
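When reviewing a batch, a small contact-sheet helper is convenient. This make_grid function is a hypothetical utility, not part of Diffusers:
from PIL import Image

def make_grid(images, cols=2):
    # Tile a list of equally sized PIL images into one grid image
    w, h = images[0].size
    rows = (len(images) + cols - 1) // cols
    grid = Image.new("RGB", (cols * w, rows * h))
    for i, img in enumerate(images):
        grid.paste(img, ((i % cols) * w, (i // cols) * h))
    return grid

make_grid(images, cols=2).save("grid.png")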
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler
import torch
# 1. Load SDXL with optimizations
pipe = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16,
variant="fp16"
)
pipe.to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
# 2. Generate with quality settings
image = pipe(
prompt="A majestic lion in the savanna, golden hour lighting, 8k, detailed fur",
negative_prompt="blurry, low quality, cartoon, anime, sketch",
num_inference_steps=30,
guidance_scale=7.5,
height=1024,
width=1024
).images[0]
from diffusers import AutoPipelineForText2Image, LCMScheduler
import torch
# Use LCM for 4-8 step generation
pipe = AutoPipelineForText2Image.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16
).to("cuda")
# Load LCM LoRA for fast generation
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.fuse_lora()
# Generate in ~1 second
image = pipe(
"A beautiful landscape",
num_inference_steps=4,
guidance_scale=1.0
).images[0]
CUDA out of memory:
# Enable memory optimizations
pipe.enable_model_cpu_offload()
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()
# Or use lower precision
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
Black/noise images:
# Check VAE configuration
# Use safety checker bypass if needed
pipe.safety_checker = None
# Ensure proper dtype consistency
pipe = pipe.to(dtype=torch.float16)
Slow generation:
# Use faster scheduler
from diffusers import DPMSolverMultistepScheduler
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
# Reduce steps
image = pipe(prompt, num_inference_steps=20).images[0]
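On PyTorch 2.x, compiling the UNet is another common speedup from the Diffusers performance docs. The first call pays a one-time compilation cost; subsequent calls are faster:
import torch

pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)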