# owasp-llm-top10 by mastepanoski/claude-skills
npx skills add https://github.com/mastepanoski/claude-skills --skill owasp-llm-top10
This skill enables AI agents to perform a comprehensive security assessment of Large Language Model (LLM) and Generative AI applications using the OWASP Top 10 for LLM Applications 2025, published by the OWASP GenAI Security Project.
The OWASP Top 10 for LLM Applications identifies the most critical security risks in systems that integrate large language models, covering vulnerabilities from prompt injection to unbounded resource consumption. This is the authoritative industry standard for LLM application security.
Use this skill to identify security vulnerabilities, assess risk exposure, prioritize remediation, and establish secure development practices for AI-powered applications.
Combine with "NIST AI RMF" for comprehensive risk management or "ISO 42001 AI Governance" for governance compliance.
Invoke this skill when:
When executing this audit, gather:
### LLM01: Prompt Injection
**Severity**: Critical
**Description**: Attackers manipulate LLM operations through crafted inputs, either directly or indirectly, to bypass intended functionality, access unauthorized data, or trigger unintended actions.
**Attack Vectors:**
**Impact:**
**Assessment Checklist:**
**Mitigation Strategies:**
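One common mitigation layer is strict role separation plus a heuristic pre-screen of user input. A minimal Python sketch; the patterns, helper names, and message format are illustrative, not part of the OWASP standard, and pattern matching alone cannot stop prompt injection:

```python
import re

# Illustrative heuristics for common injection phrasings. Treat this as one
# layer alongside privilege separation and output validation.
INJECTION_PATTERNS = [
    r"ignore (all|any|previous|prior) instructions",
    r"you are now",
    r"reveal your (system )?prompt",
]

def screen_user_input(text: str) -> bool:
    """Return True if the input trips a known-injection heuristic."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)

def build_messages(system_prompt: str, user_input: str) -> list[dict]:
    """Keep untrusted input strictly in the user role, never concatenated
    into the system prompt."""
    if screen_user_input(user_input):
        raise ValueError("input rejected by injection heuristic")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]
```

Keeping untrusted text out of the system role does not make injection impossible, but it limits how much authority an injected instruction can borrow.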
### LLM02: Sensitive Information Disclosure
**Severity**: Critical
**Description**: LLMs inadvertently expose confidential data including PII, proprietary algorithms, credentials, intellectual property, or internal system information through their outputs.
**Attack Vectors:**
**Impact:**
**Assessment Checklist:**
**Mitigation Strategies:**
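Output-side redaction is one typical control: scrub PII-shaped strings from responses before they reach users or logs. A hedged sketch; the regexes and placeholders are illustrative, and real deployments would pair this with a dedicated PII-detection service:

```python
import re

# Redact common PII shapes (emails, US SSNs, bearer-style API keys) from
# model output before display or logging.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"), "[API_KEY]"),
]

def redact(output: str) -> str:
    """Replace each matched PII pattern with its placeholder."""
    for pattern, placeholder in REDACTIONS:
        output = pattern.sub(placeholder, output)
    return output
```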
### LLM03: Supply Chain
**Severity**: High
**Description**: Compromised third-party components (models, datasets, libraries, plugins) introduce security risks including malware, backdoors, or biased behavior.
**Attack Vectors:**
**Impact:**
**Assessment Checklist:**
**Mitigation Strategies:**
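One supply-chain control is pinning third-party artifacts to known checksums. A sketch, assuming a SHA-256 digest was recorded when the model or dataset was vetted; function names are illustrative:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, streaming in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_artifact(path: Path, pinned_digest: str) -> None:
    """Refuse to proceed if the artifact does not match its pinned digest."""
    actual = sha256_of(path)
    if actual != pinned_digest:
        raise RuntimeError(f"checksum mismatch for {path}: got {actual}")
```

Checksum pinning catches tampering in transit or at rest; it does not vet the artifact's original contents, so it complements, rather than replaces, provenance review.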
### LLM04: Data and Model Poisoning
**Severity**: High
**Description**: Attackers manipulate training or fine-tuning data to introduce vulnerabilities, backdoors, or biases that compromise model security and reliability.
**Attack Vectors:**
**Impact:**
**Assessment Checklist:**
**Mitigation Strategies:**
### LLM05: Improper Output Handling
**Severity**: High
**Description**: Applications blindly execute or render LLM outputs without validation, enabling code injection, XSS, SQL injection, SSRF, and other attacks.
**Attack Vectors:**
**Impact:**
**Assessment Checklist:**
**Mitigation Strategies:**
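The core fix is to treat model output as untrusted input and encode it for whatever sink it reaches. A minimal Python sketch (the table and function names are hypothetical) showing HTML escaping for rendering and parameter binding for SQL:

```python
import html
import sqlite3

def render_reply(llm_output: str) -> str:
    """HTML-escape model output before rendering it in a chat UI,
    so markup in the output cannot execute as script."""
    return f"<p>{html.escape(llm_output)}</p>"

def lookup(conn: sqlite3.Connection, llm_supplied_name: str):
    """If model output must reach SQL, bind it as a parameter;
    never interpolate it into the query string."""
    return conn.execute(
        "SELECT id FROM customers WHERE name = ?", (llm_supplied_name,)
    ).fetchall()
```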
### LLM06: Excessive Agency
**Severity**: High
**Description**: AI agents possess excessive permissions and autonomous capabilities, enabling significant harm through compromised prompts, hallucinations, or malicious manipulation.
**Attack Vectors:**
**Impact:**
**Assessment Checklist:**
**Mitigation Strategies:**
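Least-privilege tool dispatch is a standard mitigation: allowlist the tools an agent may call and require human sign-off on destructive ones. A sketch; the tool names and approval callback are invented for illustration:

```python
from typing import Callable

# Only these tools may be invoked by the agent at all.
ALLOWED_TOOLS = {"search_docs", "create_ticket"}
# These require an explicit human approval before running.
NEEDS_APPROVAL = {"create_ticket"}

def dispatch(tool: str, args: dict, tools: dict[str, Callable],
             approve: Callable[[str, dict], bool]) -> object:
    """Gate every agent tool call through an allowlist and, for
    sensitive tools, a human-in-the-loop approval callback."""
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool!r} is not allowlisted")
    if tool in NEEDS_APPROVAL and not approve(tool, args):
        raise PermissionError(f"human approval denied for {tool!r}")
    return tools[tool](**args)
```

The key design choice is that the gateway, not the model, decides what runs: a compromised prompt can at most request an allowlisted action, and destructive ones still stop at a human.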
### LLM07: System Prompt Leakage
**Severity**: Medium
**Description**: System instructions intended to guide AI behavior are exposed to users or attackers, revealing internal logic, security controls, or sensitive configurations.
**Attack Vectors:**
**Impact:**
**Assessment Checklist:**
**Mitigation Strategies:**
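As a defense-in-depth check, outgoing responses can be scanned for slices of the system prompt before they are returned. A hypothetical sketch; note it only catches verbatim leakage, so the primary control remains keeping secrets out of the prompt entirely:

```python
def leaks_system_prompt(response: str, system_prompt: str,
                        window: int = 40) -> bool:
    """Return True if any `window`-character slice of the system prompt
    appears verbatim in the response."""
    if len(system_prompt) <= window:
        return system_prompt in response
    return any(
        system_prompt[i:i + window] in response
        for i in range(len(system_prompt) - window + 1)
    )
```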
### LLM08: Vector and Embedding Weaknesses
**Severity**: Medium
**Description**: Vulnerabilities in vector databases and embedding-based retrieval systems (RAG) allow poisoning, injection, or unauthorized access to stored data.
**Attack Vectors:**
**Impact:**
**Assessment Checklist:**
**Mitigation Strategies:**
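In a RAG system, document-level access controls enforced at retrieval time keep restricted content out of the prompt regardless of what the similarity search returns. A minimal sketch with an illustrative data model:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """A retrieved chunk tagged with the groups allowed to read it.
    (Illustrative model; real stores keep ACLs in document metadata.)"""
    text: str
    allowed_groups: frozenset[str]

def filter_hits(hits: list[Doc], user_groups: set[str]) -> list[Doc]:
    """Drop any hit the requesting user's groups may not see, before the
    text is ever placed in the LLM context."""
    return [d for d in hits if d.allowed_groups & user_groups]
```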
### LLM09: Misinformation
**Severity**: Medium
**Description**: LLMs generate plausible but false information (hallucinations/confabulations) that users may trust and act upon, causing harm.
**Attack Vectors:**
**Impact:**
**Assessment Checklist:**
**Mitigation Strategies:**
### LLM10: Unbounded Consumption
**Severity**: Medium
**Description**: Uncontrolled LLM usage causes denial-of-service, system crashes, or excessive operational costs through resource exhaustion.
**Attack Vectors:**
**Impact:**
**Assessment Checklist:**
**Mitigation Strategies:**
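A per-user token budget is a first line of defense against denial-of-wallet and resource-exhaustion abuse; pair it with per-request max-token caps and spend alerts. A sketch of a token-bucket limiter, with illustrative budget numbers:

```python
import time

class TokenBudget:
    """Token bucket: capacity tokens, refilled continuously at
    refill_per_sec, spent per request before calling the model."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def try_spend(self, amount: int) -> bool:
        """Refill based on elapsed time, then spend if enough remains."""
        now = time.monotonic()
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last) * self.refill_per_sec,
        )
        self.last = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False
```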
**System inventory:**
**Threat modeling:**
For each of the 10 vulnerabilities, assess:
For each vulnerability found, score using:
**Likelihood**: How likely is exploitation?
**Impact**: What is the potential damage?
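One hypothetical way to combine the two ratings into an ordinal risk level for the report (the thresholds are illustrative, not an OWASP-mandated formula):

```python
# Map the qualitative ratings to 1-3, multiply, and bucket the product.
LEVELS = {"low": 1, "medium": 2, "high": 3}

def risk_level(likelihood: str, impact: str) -> str:
    """Turn likelihood x impact into critical/high/medium/low."""
    score = LEVELS[likelihood] * LEVELS[impact]
    if score >= 9:
        return "critical"
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"
```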
Compile a comprehensive security assessment report.
Generate a comprehensive OWASP LLM security audit report:
# OWASP LLM Top 10 Security Audit Report
**Application**: [Name]
**LLM Provider/Model**: [Provider - Model]
**Date**: [Date]
**Evaluator**: [AI Agent or Human]
**OWASP LLM Top 10 Version**: 2025
---
## Executive Summary
### Overall Security Posture: [Critical / High Risk / Medium Risk / Low Risk / Secure]
**Application Type**: [Chatbot / Agent / RAG System / Content Generator / Code Assistant / Other]
**Data Sensitivity**: [Public / Internal / Confidential / Restricted]
**User Base**: [Internal / B2B / B2C / Public]
### Critical Findings
| # | Vulnerability | Severity | Status |
|---|---|---|---|
| LLM01 | Prompt Injection | Critical | [Vulnerable / Mitigated / N/A] |
| LLM02 | Sensitive Info Disclosure | Critical | [Vulnerable / Mitigated / N/A] |
| LLM03 | Supply Chain | High | [Vulnerable / Mitigated / N/A] |
| LLM04 | Data/Model Poisoning | High | [Vulnerable / Mitigated / N/A] |
| LLM05 | Improper Output Handling | High | [Vulnerable / Mitigated / N/A] |
| LLM06 | Excessive Agency | High | [Vulnerable / Mitigated / N/A] |
| LLM07 | System Prompt Leakage | Medium | [Vulnerable / Mitigated / N/A] |
| LLM08 | Vector/Embedding Weaknesses | Medium | [Vulnerable / Mitigated / N/A] |
| LLM09 | Misinformation | Medium | [Vulnerable / Mitigated / N/A] |
| LLM10 | Unbounded Consumption | Medium | [Vulnerable / Mitigated / N/A] |
### Top 3 Critical Issues
1. [Issue] - [Impact description]
2. [Issue] - [Impact description]
3. [Issue] - [Impact description]
---
## Detailed Findings
### LLM01: Prompt Injection
**Status**: [Vulnerable / Partially Mitigated / Mitigated]
**Severity**: [Critical / High / Medium / Low]
**Likelihood**: [High / Medium / Low]
**Findings:**
1. [Finding with evidence]
2. [Finding with evidence]
**Attack Scenario:**
[Description of how this could be exploited]
**Recommendations:**
1. [Specific remediation step]
2. [Specific remediation step]
**Effort**: [Low / Medium / High]
---
[Continue for LLM02 through LLM10...]
---
## Architecture Security Review
### Data Flow Analysis
[Diagram or description of data flows with trust boundaries marked]
### Attack Surface Summary
| Surface | Risk Level | Controls |
|---|---|---|
| User Input | [Level] | [Controls] |
| API Endpoints | [Level] | [Controls] |
| Vector Store | [Level] | [Controls] |
| Plugins/Tools | [Level] | [Controls] |
| Output Rendering | [Level] | [Controls] |
---
## Remediation Roadmap
### Phase 1: Critical (0-7 days)
1. [ ] [Action item with owner]
2. [ ] [Action item with owner]
### Phase 2: High Priority (7-30 days)
1. [ ] [Action item with owner]
### Phase 3: Medium Priority (30-90 days)
1. [ ] [Action item with owner]
### Phase 4: Hardening (Ongoing)
1. [ ] [Continuous improvement practices]
---
## Security Controls Matrix
| Control | Implemented | Effective | Recommendation |
|---|---|---|---|
| Input validation | [Yes/No/Partial] | [Yes/No] | [Recommendation] |
| Output sanitization | [Yes/No/Partial] | [Yes/No] | [Recommendation] |
| Rate limiting | [Yes/No/Partial] | [Yes/No] | [Recommendation] |
| Authentication | [Yes/No/Partial] | [Yes/No] | [Recommendation] |
| Authorization | [Yes/No/Partial] | [Yes/No] | [Recommendation] |
| Logging/Monitoring | [Yes/No/Partial] | [Yes/No] | [Recommendation] |
| Content filtering | [Yes/No/Partial] | [Yes/No] | [Recommendation] |
| Human-in-the-loop | [Yes/No/Partial] | [Yes/No] | [Recommendation] |
---
## Next Steps
1. [ ] Prioritize and assign critical findings
2. [ ] Implement quick wins (input validation, rate limiting)
3. [ ] Schedule penetration testing for high-risk areas
4. [ ] Establish continuous monitoring
5. [ ] Plan follow-up audit after remediation
---
## Resources
- [OWASP Top 10 for LLM Applications 2025](https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/)
- [OWASP GenAI Security Project](https://genai.owasp.org/)
- [OWASP LLM AI Security & Governance Checklist](https://owasp.org/www-project-top-10-for-large-language-model-applications/)
- [OWASP GitHub Repository](https://github.com/OWASP/www-project-top-10-for-large-language-model-applications)
---
**Audit Version**: 1.0
**Date**: [Date]
## Vulnerability Prioritization
| Priority | Vulnerabilities | Rationale |
|---|---|---|
| P0 | LLM01 (Prompt Injection), LLM02 (Data Disclosure) | Direct exploitation, high impact |
| P1 | LLM05 (Output Handling), LLM06 (Excessive Agency) | System compromise potential |
| P2 | LLM03 (Supply Chain), LLM04 (Poisoning) | Harder to exploit but severe impact |
| P3 | LLM07 (Prompt Leakage), LLM08 (Vector Weaknesses) | Enables further attacks |
| P4 | LLM09 (Misinformation), LLM10 (Unbounded Consumption) | Operational risk |
**Version History**: 1.0 - Initial release (OWASP Top 10 for LLM Applications 2025)
**Remember**: LLM security is an evolving field. New attack vectors emerge regularly. This audit provides a baseline assessment; continuous monitoring and periodic re-assessment are essential for maintaining security posture.
**Weekly Installs**: 87
**GitHub Stars**: 13
**First Seen**: Feb 5, 2026
**Security Audits**: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Pass
**Installed on**: codex (84), gemini-cli (83), github-copilot (83), opencode (83), cursor (80), kimi-cli (79)