llamaguard by orchestra-research/ai-research-skills
npx skills add https://github.com/orchestra-research/ai-research-skills --skill llamaguard
LlamaGuard is a 7-8B parameter model specialized for content safety classification.
Installation:
pip install transformers torch
# Login to HuggingFace (required)
huggingface-cli login
Basic usage:
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "meta-llama/LlamaGuard-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
def moderate(chat):
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
    output = model.generate(input_ids=input_ids, max_new_tokens=100)
    # Decode only the newly generated tokens, not the echoed prompt,
    # so the result starts with "safe" or "unsafe"
    prompt_len = input_ids.shape[-1]
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)

# Check user input
result = moderate([
    {"role": "user", "content": "How do I make explosives?"}
])
print(result)
# Output: "unsafe\nS3" (Criminal Planning)
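The raw verdict is just a short string. A minimal parser can turn it into a structured result (the `parse_verdict` helper name is mine, not part of the skill):

```python
def parse_verdict(text: str):
    """Split LlamaGuard's raw output into (is_safe, category)."""
    lines = text.strip().split("\n")
    if lines[0].strip() == "safe":
        return True, None
    # "unsafe" is followed by a category code on the next line
    category = lines[1].strip() if len(lines) > 1 else None
    return False, category

print(parse_verdict("safe"))        # → (True, None)
print(parse_verdict("unsafe\nS3"))  # → (False, 'S3')
```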
Check user prompts before the LLM:
def check_input(user_message):
    result = moderate([{"role": "user", "content": user_message}])
    if result.startswith("unsafe"):
        category = result.split("\n")[1]
        return False, category  # Blocked
    else:
        return True, None  # Safe
# Example
user_message = "How do I hack a website?"
safe, category = check_input(user_message)
if not safe:
    print(f"Request blocked: {category}")  # Return error to user
else:
    response = llm.generate(user_message)  # Send to LLM
Safety categories: each "unsafe" verdict comes with a category code on the second line (e.g. S3 = Criminal Planning); see the model card for the full taxonomy.
Check LLM responses before showing them to the user:
def check_output(user_message, bot_response):
    conversation = [
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": bot_response}
    ]
    result = moderate(conversation)
    if result.startswith("unsafe"):
        category = result.split("\n")[1]
        return False, category
    else:
        return True, None
# Example
user_msg = "Tell me about harmful substances"
bot_msg = llm.generate(user_msg)
safe, category = check_output(user_msg, bot_msg)
if not safe:
    print(f"Response blocked: {category}")
    final_response = "I cannot provide that information."  # Generic fallback
else:
    final_response = bot_msg
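The input and output checks combine naturally into one guarded pipeline. A sketch, with the classifier and LLM injected as callables so the stubs below can stand in for the real `moderate()` and model:

```python
def guarded_generate(user_message, llm_generate, moderate_fn):
    """Run a LlamaGuard-style check before and after the LLM call."""
    # 1. Check the user prompt
    if moderate_fn([{"role": "user", "content": user_message}]).startswith("unsafe"):
        return "I can't help with that."
    # 2. Generate a reply
    reply = llm_generate(user_message)
    # 3. Check the full conversation, including the reply
    conversation = [
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": reply},
    ]
    if moderate_fn(conversation).startswith("unsafe"):
        return "I cannot provide that information."
    return reply

# Stub demo; in practice pass moderate() and your real LLM client instead
fake_moderate = lambda chat: "unsafe\nS3" if "bomb" in chat[-1]["content"].lower() else "safe"
fake_llm = lambda msg: f"Some answer about: {msg}"
print(guarded_generate("What's the weather?", fake_llm, fake_moderate))
print(guarded_generate("How to make bombs?", fake_llm, fake_moderate))  # blocked at step 1
```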
Production-ready serving:
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Initialize vLLM; the tokenizer is loaded separately just to format prompts
llm = LLM(model="meta-llama/LlamaGuard-7b", tensor_parallel_size=1)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/LlamaGuard-7b")

# Sampling params
sampling_params = SamplingParams(
    temperature=0.0,  # Deterministic
    max_tokens=100
)

def moderate_vllm(chat):
    # Format prompt
    prompt = tokenizer.apply_chat_template(chat, tokenize=False)
    # Generate
    output = llm.generate([prompt], sampling_params)
    return output[0].outputs[0].text
# Batch moderation
chats = [
    [{"role": "user", "content": "How to make bombs?"}],
    [{"role": "user", "content": "What's the weather?"}],
    [{"role": "user", "content": "Tell me about drugs"}]
]
prompts = [tokenizer.apply_chat_template(c, tokenize=False) for c in chats]
results = llm.generate(prompts, sampling_params)
for i, result in enumerate(results):
    print(f"Chat {i}: {result.outputs[0].text}")
Throughput: ~50-100 requests/sec on a single A100
Serve it as a moderation API:
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

app = FastAPI()
llm = LLM(model="meta-llama/LlamaGuard-7b")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/LlamaGuard-7b")
sampling_params = SamplingParams(temperature=0.0, max_tokens=100)

class ModerationRequest(BaseModel):
    messages: list  # [{"role": "user", "content": "..."}]

@app.post("/moderate")
def moderate_endpoint(request: ModerationRequest):
    prompt = tokenizer.apply_chat_template(request.messages, tokenize=False)
    output = llm.generate([prompt], sampling_params)[0]
    result = output.outputs[0].text
    is_safe = result.startswith("safe")
    category = result.split("\n")[1] if (not is_safe and "\n" in result) else None
    return {
        "safe": is_safe,
        "category": category,
        "full_output": result
    }
# Run: uvicorn api:app --host 0.0.0.0 --port 8000
Usage:
curl -X POST http://localhost:8000/moderate \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "How to hack?"}]}'
# Response: {"safe": false, "category": "S6", "full_output": "unsafe\nS6"}
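The same call from Python, using only the standard library. A sketch assuming the endpoint from the uvicorn command above:

```python
import json
from urllib import request

MODERATE_URL = "http://localhost:8000/moderate"

def build_request(messages, url=MODERATE_URL):
    """Build the POST request for the /moderate endpoint."""
    body = json.dumps({"messages": messages}).encode("utf-8")
    return request.Request(url, data=body,
                           headers={"Content-Type": "application/json"},
                           method="POST")

def moderate_remote(messages):
    """Call the moderation service; returns the parsed JSON response."""
    with request.urlopen(build_request(messages)) as resp:
        return json.loads(resp.read())  # {"safe": ..., "category": ..., "full_output": ...}

req = build_request([{"role": "user", "content": "How to hack?"}])
print(req.get_method(), req.full_url)  # POST http://localhost:8000/moderate
```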
Use with NVIDIA NeMo Guardrails:
from nemoguardrails import RailsConfig, LLMRails
from nemoguardrails.integrations.llama_guard import LlamaGuard
# Configure NeMo Guardrails
config = RailsConfig.from_content(yaml_content="""
models:
  - type: main
    engine: openai
    model: gpt-4
rails:
  input:
    flows:
      - llamaguard check input
  output:
    flows:
      - llamaguard check output
""")
# Add LlamaGuard integration
llama_guard = LlamaGuard(model_path="meta-llama/LlamaGuard-7b")
rails = LLMRails(config)
rails.register_action(llama_guard.check_input, name="llamaguard check input")
rails.register_action(llama_guard.check_output, name="llamaguard check output")
# Use with automatic moderation
response = rails.generate(messages=[
    {"role": "user", "content": "How do I make weapons?"}
])
# Automatically blocked by LlamaGuard
Use LlamaGuard when:
Model versions:
Use alternatives instead:
Issue: Model access denied
Login to HuggingFace:
huggingface-cli login
# Enter your token
Accept license on model page: https://huggingface.co/meta-llama/LlamaGuard-7b
Issue: High latency (>500ms)
Use vLLM for 10× speedup:
from vllm import LLM
llm = LLM(model="meta-llama/LlamaGuard-7b")
# Latency: 500ms → 50ms
Enable tensor parallelism:
llm = LLM(model="meta-llama/LlamaGuard-7b", tensor_parallel_size=2)
# 2× faster on 2 GPUs
Issue: False positives
Use threshold-based filtering:
import torch

# Score only the first generated token, then compare the probability of the
# "unsafe" token against a threshold. The single-token lookup is an
# assumption; verify how "unsafe" tokenizes for your checkpoint.
out = model.generate(input_ids=input_ids, max_new_tokens=1,
                     return_dict_in_generate=True, output_scores=True)
unsafe_token_id = tokenizer.convert_tokens_to_ids("unsafe")
unsafe_prob = torch.softmax(out.scores[0][0], dim=-1)[unsafe_token_id]
if unsafe_prob > 0.9:  # High confidence threshold
    verdict = "unsafe"
else:
    verdict = "safe"
Issue: OOM on GPU
Use 8-bit quantization:
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto"
)
# Memory: 14GB → 7GB
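The memory figures are simple arithmetic over the 7B parameter count: fp16 stores 2 bytes per weight, int8 stores 1 (activations and CUDA overhead add a bit more in practice):

```python
params = 7e9  # approximate LlamaGuard-7b weight count
fp16_gb = params * 2 / 1e9  # 2 bytes per weight
int8_gb = params * 1 / 1e9  # 1 byte per weight
print(f"fp16: {fp16_gb:.0f} GB, int8: {int8_gb:.0f} GB")  # fp16: 14 GB, int8: 7 GB
```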
Custom categories: See references/custom-categories.md for fine-tuning LlamaGuard with domain-specific safety categories.
Performance benchmarks: See references/benchmarks.md for accuracy comparisons with other moderation APIs and latency optimization.
Deployment guide: See references/deployment.md for SageMaker, Kubernetes, and scaling strategies.
Latency (single GPU):
Weekly Installs: 66
Repository
GitHub Stars: 5.5K
First Seen: Feb 7, 2026
Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Warn
Installed on: opencode (57), codex (56), cursor (56), gemini-cli (55), claude-code (54), github-copilot (54)