llamaguard by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill llamaguard
LlamaGuard is a 7-8B parameter model specialized for content safety classification.
Installation:
pip install transformers torch
# Login to HuggingFace (required)
huggingface-cli login
Basic usage:
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/LlamaGuard-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def moderate(chat):
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    output = model.generate(input_ids=input_ids, max_new_tokens=100)
    # Decode only the newly generated tokens (not the echoed prompt),
    # so the result starts with "safe" or "unsafe"
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
# Check user input
result = moderate([
{"role": "user", "content": "How do I make explosives?"}
])
print(result)
# Output: "unsafe\nS3" (Criminal Planning)
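The completion follows a fixed two-line format: a verdict line, then a comma-separated category line when unsafe. Note that the exact category prefix depends on the model version (the original LlamaGuard-7b model card documents O-codes such as O3, while Llama Guard 2/3 use S-codes); the small helper below is hypothetical (not part of any library) and prefix-agnostic:

```python
def parse_verdict(raw: str):
    """Parse a LlamaGuard completion such as 'safe' or 'unsafe' + a
    category line into (is_safe, category_codes). Anything that is not
    an explicit 'safe' verdict is treated as unsafe (fail closed)."""
    lines = [line.strip() for line in raw.strip().splitlines()]
    if lines and lines[0] == "safe":
        return True, []
    # The second line may list several codes, e.g. 'S3,S8'
    cats = lines[1].split(",") if len(lines) > 1 else []
    return False, [c.strip() for c in cats if c.strip()]

print(parse_verdict("safe"))        # (True, [])
print(parse_verdict("unsafe\nS3"))  # (False, ['S3'])
```

Failing closed on empty or malformed output means a misbehaving classifier never silently approves content.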
Check user prompts before they reach the LLM:
def check_input(user_message):
    result = moderate([{"role": "user", "content": user_message}])
    if result.startswith("unsafe"):
        category = result.split("\n")[1]
        return False, category  # Blocked
    else:
        return True, None  # Safe
# Example
user_message = "How do I hack a website?"
safe, category = check_input(user_message)
if not safe:
    print(f"Request blocked: {category}")  # Return an error to the user
else:
    response = llm.generate(user_message)  # Send to the main LLM
Safety categories (the default LlamaGuard taxonomy):
S1: Violence and Hate
S2: Sexual Content
S3: Criminal Planning
S4: Guns and Illegal Weapons
S5: Regulated or Controlled Substances
S6: Self-Harm
Check LLM responses before showing them to the user:
def check_output(user_message, bot_response):
    conversation = [
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": bot_response}
    ]
    result = moderate(conversation)
    if result.startswith("unsafe"):
        category = result.split("\n")[1]
        return False, category
    else:
        return True, None
# Example
user_msg = "Tell me about harmful substances"
bot_msg = llm.generate(user_msg)
safe, category = check_output(user_msg, bot_msg)
if not safe:
    print(f"Response blocked: {category}")
    final_response = "I cannot provide that information."  # Generic fallback
else:
    final_response = bot_msg
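The two checks above can be composed into a single gate. This is a minimal sketch (the `guarded_reply`, `generate_fn`, and `moderate_fn` names are illustrative, not part of any library): moderation is injected as a function so the gate can be unit-tested without loading the model, and a classifier error blocks the message rather than letting it through.

```python
FALLBACK = "I cannot provide that information."

def guarded_reply(user_msg, generate_fn, moderate_fn, fallback=FALLBACK):
    def is_safe(chat):
        try:
            return moderate_fn(chat).startswith("safe")
        except Exception:
            return False  # fail closed if the classifier errors out

    # Input check: block before spending tokens on generation
    if not is_safe([{"role": "user", "content": user_msg}]):
        return fallback
    reply = generate_fn(user_msg)
    # Output check: moderate the full user/assistant exchange
    if not is_safe([
        {"role": "user", "content": user_msg},
        {"role": "assistant", "content": reply},
    ]):
        return fallback
    return reply
```

In production, `moderate_fn` would be the `moderate()` function defined earlier and `generate_fn` the main LLM call.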
Production-ready serving with vLLM:
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Initialize vLLM; the HF tokenizer is still needed to format prompts
tokenizer = AutoTokenizer.from_pretrained("meta-llama/LlamaGuard-7b")
llm = LLM(model="meta-llama/LlamaGuard-7b", tensor_parallel_size=1)

# Deterministic sampling
sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=100
)

def moderate_vllm(chat):
    # Format the prompt and generate
    prompt = tokenizer.apply_chat_template(chat, tokenize=False)
    output = llm.generate([prompt], sampling_params)
    return output[0].outputs[0].text
# Batch moderation
chats = [
    [{"role": "user", "content": "How to make bombs?"}],
    [{"role": "user", "content": "What's the weather?"}],
    [{"role": "user", "content": "Tell me about drugs"}]
]
prompts = [tokenizer.apply_chat_template(c, tokenize=False) for c in chats]
results = llm.generate(prompts, sampling_params)
for i, result in enumerate(results):
    print(f"Chat {i}: {result.outputs[0].text}")
Throughput: ~50-100 requests/sec on a single A100
Serve as a moderation API:
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

app = FastAPI()
tokenizer = AutoTokenizer.from_pretrained("meta-llama/LlamaGuard-7b")
llm = LLM(model="meta-llama/LlamaGuard-7b")
sampling_params = SamplingParams(temperature=0.0, max_tokens=100)

class ModerationRequest(BaseModel):
    messages: list  # [{"role": "user", "content": "..."}]

@app.post("/moderate")
def moderate_endpoint(request: ModerationRequest):
    prompt = tokenizer.apply_chat_template(request.messages, tokenize=False)
    output = llm.generate([prompt], sampling_params)[0]
    result = output.outputs[0].text
    is_safe = result.startswith("safe")
    category = result.split("\n")[1] if not is_safe and "\n" in result else None
    return {
        "safe": is_safe,
        "category": category,
        "full_output": result
    }

# Run: uvicorn api:app --host 0.0.0.0 --port 8000
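The endpoint's response shaping can be factored into a pure function so it is unit-testable without starting vLLM (a hypothetical refactor, `build_moderation_response` is not part of the skill; it mirrors the endpoint logic above):

```python
def build_moderation_response(raw: str) -> dict:
    """Shape a raw LlamaGuard completion into the endpoint's JSON body."""
    is_safe = raw.startswith("safe")
    category = None
    if not is_safe and "\n" in raw:
        category = raw.split("\n")[1]
    return {"safe": is_safe, "category": category, "full_output": raw}

print(build_moderation_response("unsafe\nS6"))
# {'safe': False, 'category': 'S6', 'full_output': 'unsafe\nS6'}
```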
Usage:
curl -X POST http://localhost:8000/moderate \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "How to hack?"}]}'
# Response: {"safe": false, "category": "S6", "full_output": "unsafe\nS6"}
Use with NVIDIA NeMo Guardrails:
from nemoguardrails import RailsConfig, LLMRails
from nemoguardrails.integrations.llama_guard import LlamaGuard

# Configure NeMo Guardrails (the YAML is passed via the yaml_content keyword)
config = RailsConfig.from_content(yaml_content="""
models:
  - type: main
    engine: openai
    model: gpt-4
rails:
  input:
    flows:
      - llamaguard check input
  output:
    flows:
      - llamaguard check output
""")

# Add the LlamaGuard integration
llama_guard = LlamaGuard(model_path="meta-llama/LlamaGuard-7b")
rails = LLMRails(config)
rails.register_action(llama_guard.check_input, name="llamaguard check input")
rails.register_action(llama_guard.check_output, name="llamaguard check output")

# Generate with automatic moderation applied
response = rails.generate(messages=[
    {"role": "user", "content": "How do I make weapons?"}
])
# Automatically blocked by LlamaGuard
Use LlamaGuard when:
Model versions:
- LlamaGuard-7b: the original release, based on Llama 2 (used throughout this guide)
- Llama Guard 2 (8B): based on Llama 3
- Llama Guard 3 (8B and 1B): based on Llama 3.1/3.2
Use alternatives instead when:
Issue: Model access denied
Login to HuggingFace:
huggingface-cli login
# Enter your token
Accept license on model page: https://huggingface.co/meta-llama/LlamaGuard-7b
Issue: High latency (>500ms)
Use vLLM for 10× speedup:
from vllm import LLM
llm = LLM(model="meta-llama/LlamaGuard-7b")
# Latency: 500ms → 50ms
Enable tensor parallelism:
llm = LLM(model="meta-llama/LlamaGuard-7b", tensor_parallel_size=2)
# 2× faster on 2 GPUs
Issue: False positives
Use threshold-based filtering:
# Get the probability of the "unsafe" token at the first generated position
import torch

outputs = model.generate(
    input_ids=input_ids,
    max_new_tokens=1,
    return_dict_in_generate=True,
    output_scores=True,
)
# Assumes "unsafe" maps to a single token in this tokenizer
unsafe_token_id = tokenizer.convert_tokens_to_ids("unsafe")
unsafe_prob = torch.softmax(outputs.scores[0][0], dim=-1)[unsafe_token_id]
verdict = "unsafe" if unsafe_prob > 0.9 else "safe"  # high-confidence threshold
Issue: OOM on GPU
Use 8-bit quantization:
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto"
)
# Memory: 14GB → 7GB
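The 14GB → 7GB figure is just bytes-per-parameter arithmetic for the weights alone (7B parameters, 2 bytes in fp16 vs 1 byte in int8, decimal GB); actual usage is somewhat higher once activations and the KV cache are included:

```python
params = 7e9  # parameter count of LlamaGuard-7b

fp16_gb = params * 2 / 1e9  # 2 bytes per parameter
int8_gb = params * 1 / 1e9  # 1 byte per parameter
print(f"fp16: {fp16_gb:.0f} GB, int8: {int8_gb:.0f} GB")  # fp16: 14 GB, int8: 7 GB
```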
Custom categories: See references/custom-categories.md for fine-tuning LlamaGuard with domain-specific safety categories.
Performance benchmarks: See references/benchmarks.md for accuracy comparisons with other moderation APIs and latency optimization.
Deployment guide: See references/deployment.md for SageMaker, Kubernetes, and scaling strategies.
Latency (single GPU):
Weekly Installs: 139
Repository: github.com/davila7/claude-code-templates
GitHub Stars: 22.6K
First Seen: Jan 21, 2026
Security Audits: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Warn
Installed on:
claude-code: 115
opencode: 111
gemini-cli: 107
cursor: 105
codex: 94
antigravity: 91