yara-rule-authoring by trailofbits/skills
npx skills add https://github.com/trailofbits/skills --skill yara-rule-authoring
Write detection rules that catch malware without drowning in false positives.
This skill targets YARA-X, the Rust-based successor to legacy YARA. YARA-X powers VirusTotal's production systems and is the recommended implementation. See Migrating from Legacy YARA if you have existing rules.
Strings must generate good atoms — YARA extracts 4-byte subsequences for fast matching. Strings with repeated bytes, common sequences, or under 4 bytes force slow bytecode verification on too many files.
Target specific families, not categories — "Detects ransomware" catches everything and nothing. "Detects LockBit 3.0 configuration extraction routine" catches what you want.
Test against goodware before deployment — A rule that fires on Windows system files is useless. Validate against VirusTotal's goodware corpus or your own clean file set.
Short-circuit with cheap checks first — Put filesize < 10MB and uint16(0) == 0x5A4D before expensive string searches or module calls.
Metadata is documentation — Future you (and your team) need to know what this catches, why, and where the sample came from.
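To make the atom principle concrete, here is a rough Python sketch. It splits a candidate string into 4-byte windows and picks the best-scoring one. The scoring is an assumption for illustration only: it rewards distinct bytes and penalizes a hypothetical set of "filler" bytes. It is not YARA-X's actual atom-quality algorithm, and the mutex-like string is made up.

```python
from typing import Optional

# Assumed set of low-value bytes; YARA-X's real atom scoring is more detailed.
COMMON_BYTES = {0x00, 0x20, 0x90, 0xCC, 0xFF}


def atom_windows(pattern: bytes, size: int = 4) -> list[bytes]:
    """Every contiguous `size`-byte window, i.e. each candidate atom."""
    return [pattern[i:i + size] for i in range(len(pattern) - size + 1)]


def atom_score(atom: bytes) -> int:
    """Toy score: reward distinct bytes, penalize common filler bytes."""
    distinct = len(set(atom))
    common = sum(1 for b in atom if b in COMMON_BYTES)
    return distinct * 2 - common * 3


def best_atom(pattern: bytes) -> Optional[tuple[bytes, int]]:
    """Highest-scoring 4-byte window, or None if the string is under 4 bytes."""
    windows = atom_windows(pattern)
    if not windows:
        return None
    best = max(windows, key=atom_score)
    return best, atom_score(best)


for candidate in (b"abc", b"\x00" * 8, b"Gh0stMutex_v2"):
    print(candidate, best_atom(candidate))
```

A 3-byte string yields no atom at all, and a run of zero bytes only yields a low-scoring one. Both are the cases the principle above warns about.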
YARA-X is the Rust-based successor to legacy YARA: 5-10x faster regex, better errors, built-in formatter, stricter validation, new modules (crx, dex), 99% rule compatibility.
Install: brew install yara-x (macOS) or cargo install yara-x
Essential commands: yr scan, yr check, yr fmt, yr dump
YARA works on any file type. Adapt patterns to your target:
| Platform | Magic Bytes | Bad Strings | Good Strings |
|---|---|---|---|
| Windows PE | uint16(0) == 0x5A4D | API names, Windows paths | Mutex names, PDB paths |
| macOS Mach-O | uint32(0) == 0xFEEDFACE (32-bit), 0xFEEDFACF (64-bit); uint32be(0) == 0xCAFEBABE (universal) | Common Obj-C methods | Keylogger strings, persistence paths |
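| JavaScript/Node | (none needed) | require, fetch, axios | Obfuscator signatures, eval+decode chains |
| npm/pip packages | (none needed) | postinstall, dependencies | Suspicious package names, exfil URLs |
| Office docs | uint32(0) == 0x04034B50 (ZIP "PK\x03\x04") | VBA keywords | Macro auto-exec, encoded payloads |
| VS Code extensions | (none needed) | vscode.workspace | Uncommon activationEvents, hidden file access |
| Chrome extensions | Use crx module | Common Chrome APIs | Permission abuse, manifest anomalies |
| Android apps | Use dex module | Standard DEX structure | Obfuscated classes, suspicious permissions |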
No dedicated Mach-O module exists yet. Use magic byte checks + string patterns:
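Magic bytes:
```yara
// Mach-O 32-bit
uint32(0) == 0xFEEDFACE
// Mach-O 64-bit
uint32(0) == 0xFEEDFACF
// Universal (fat) binary: the header is always big-endian, so the on-disk bytes
// CA FE BA BE read as 0xBEBAFECA through little-endian uint32().
// Java .class files share this magic, so combine it with other checks.
uint32(0) == 0xCAFEBABE or uint32(0) == 0xBEBAFECA
```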
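To check which constant a uint32() comparison needs, decode the header bytes yourself. This Python sketch uses struct to mimic YARA's little-endian uint32() and big-endian uint32be(). The byte strings are the standard magic values, not bytes taken from real samples.

```python
import struct


def uint32(data: bytes, offset: int = 0) -> int:
    """Little-endian read, mirroring YARA's uint32()."""
    return struct.unpack_from("<I", data, offset)[0]


def uint32be(data: bytes, offset: int = 0) -> int:
    """Big-endian read, mirroring YARA's uint32be()."""
    return struct.unpack_from(">I", data, offset)[0]


macho64 = bytes.fromhex("cffaedfe")  # first 4 bytes of a little-endian 64-bit Mach-O
fat = bytes.fromhex("cafebabe")      # universal header, always stored big-endian
zip_local = b"PK\x03\x04"            # ZIP local file header (OOXML Office files)

print(hex(uint32(macho64)))                  # 0xfeedfacf
print(hex(uint32(fat)), hex(uint32be(fat)))  # 0xbebafeca 0xcafebabe
print(hex(uint32(zip_local)))                # 0x4034b50
```

The same reasoning explains why a ZIP-based Office check written as uint32(0) == 0x504B0304 never fires. The bytes read little-endian give 0x04034B50.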
Good indicators for macOS malware:
- Keylogging: CGEventTapCreate, kCGEventKeyDown
- SSH tunneling: ssh -D, tunnel, socks
- Persistence: ~/Library/LaunchAgents, /Library/LaunchDaemons
- Keychain access: security find-generic-password, keychain

Example pattern from Airbnb BinaryAlert:
```yara
rule SUSP_Mac_ProtonRAT
{
    strings:
        // Library indicators
        $lib1 = "SRWebSocket" ascii
        $lib2 = "SocketRocket" ascii
        // Behavioral indicators
        $behav1 = "SSH tunnel not launched" ascii
        $behav2 = "Keylogger" ascii
    condition:
        // 64-bit Mach-O, or a universal binary (big-endian fat header)
        (uint32(0) == 0xFEEDFACF or uint32be(0) == 0xCAFEBABE) and
        any of ($lib*) and any of ($behav*)
}
```
Writing a JavaScript rule?
├─ npm package?
│ ├─ Check package.json patterns
│ ├─ Look for postinstall/preinstall hooks
│ └─ Target exfil patterns: fetch + env access + credential paths
├─ Browser extension?
│ ├─ Chrome: Use crx module
│ └─ Others: Target manifest patterns, background script behaviors
├─ Standalone JS file?
│ ├─ Look for obfuscation markers: eval+atob, fromCharCode chains
│ ├─ Target unique function/variable names (often survive minification)
│ └─ Check for packed/encoded payloads
└─ Minified/webpack bundle?
├─ Target unique strings that survive bundling (URLs, magic values)
└─ Avoid function names (will be mangled)
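To illustrate the obfuscation markers in the tree above, this Python sketch counts eval(atob(...)) chains, String.fromCharCode calls, and _0x-style identifiers per kilobyte of source. The marker set and the idea of judging by density are assumptions for illustration. A real rule would express the same idea with bounded patterns and # counts.

```python
import re

# Hypothetical marker set for illustration, based on the markers listed above.
MARKERS = {
    "eval_atob": re.compile(r"eval\s*\(\s*atob\s*\("),
    "from_char_code": re.compile(r"String\.fromCharCode\s*\("),
    "hex_identifier": re.compile(r"\b_0x[0-9a-f]{4,6}\b"),
}


def marker_density(source: str) -> dict[str, float]:
    """Hits per KB of UTF-8 text for each obfuscation marker."""
    kb = max(len(source.encode("utf-8")) / 1024, 1e-9)
    return {name: len(rx.findall(source)) / kb for name, rx in MARKERS.items()}


sample = "var _0x1a2b=['x'];eval(atob('YWxlcnQoMSk='));var _0x3c4d=_0x1a2b[0];"
print(marker_density(sample))
```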
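JavaScript-specific good strings:
- { 70 a0 82 31 }: ERC-20 balanceOf(address) function selector
- { E2 80 8B E2 80 8C }: UTF-8 zero-width space followed by zero-width non-joiner
- _0x, var _0x: obfuscator-generated hex identifiers

JavaScript-specific bad strings:
- require, fetch, axios: too common
- Buffer, crypto: legitimate uses everywhere
- process.env alone: needs specific env var names

| Tool | Purpose |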
|---|---|
| yarGen | Extract candidate strings: yarGen.py -m samples/ --excludegood → validate with yr check |
| FLOSS | Extract obfuscated/stack strings: floss sample.exe (when yarGen fails) |
| yr CLI | Validate: yr check, scan: yr scan -s, inspect: yr dump -m pe |
| signature-base | Study quality examples |
| YARA-CI | Goodware corpus testing before deployment |
Master these five. Don't get distracted by tool catalogs.
When you catch yourself thinking these, stop and reconsider.
| Rationalization | Expert Response |
|---|---|
| "This generic string is unique enough" | Test against goodware first. Your intuition is wrong. |
| "yarGen gave me these strings" | yarGen suggests, you validate. Check each one manually. |
| "It works on my 10 samples" | 10 samples ≠ production. Use VirusTotal goodware corpus. |
| "One rule to catch all variants" | Causes FP floods. Target specific families. |
| "I'll make it more specific if we get FPs" | Write tight rules upfront. FPs burn trust. |
| "This hex pattern is unique" | Unique in one sample ≠ unique across malware ecosystem. |
| "Performance doesn't matter" | One slow rule slows entire ruleset. Optimize atoms. |
| "PEiD rules still work" | Obsolete. 32-bit packers aren't relevant. |
| "I'll add more conditions later" | Weak rules deployed = damage done. |
| "This is just for hunting" | Hunting rules become detection rules. Same quality bar. |
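| "The API name makes it malicious" | Legitimate software uses the same APIs. Need behavioral context. |
| "any of them is fine for these common strings" | Common strings + any = FP flood. Use any of only for individually unique strings. |
| "This regex is specific enough" | /fetch.*token/ matches most auth code. Add an exfil destination requirement. |
| "The JavaScript looks clean" | Attackers inject malicious code into legitimate files. Check for eval+decode chains. |
| "I'll use .* for flexibility" | Unbounded regex = performance disaster + memory explosion. Use .{0,30}. |
| "I'll use --relaxed-re-syntax everywhere" | Masks real bugs. Fix the regex instead of hiding problems. |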
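To see why the table pushes bounded gaps, this Python sketch runs the /fetch.*token/ example against two made-up inputs. The first is a minified benign bundle where "fetch" and "token" are unrelated. The second is an exfil-style call. Python's re is only a stand-in here: YARA-X's engine differs, but the matching scope is the same idea.

```python
import re

unbounded = re.compile(r"fetch.*token")
bounded = re.compile(r"fetch.{0,30}token")

# Hypothetical minified benign bundle: one line, "fetch" and "token" far apart.
benign = "fetch('/api/items').then(r=>r.json());" + "x=1;" * 40 + "var token=getCsrf();"
# Hypothetical exfil call: the token is sent in the same fetch.
suspicious = "fetch('https://x.example/c?t='+token)"

for name, text in (("benign", benign), ("suspicious", suspicious)):
    print(name, bool(unbounded.search(text)), bool(bounded.search(text)))
```

The unbounded pattern flags both inputs. The bounded one only fires when the two tokens sit close together.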
Is this string good enough?
├─ Less than 4 bytes?
│ └─ NO — find longer string
├─ Contains repeated bytes (0000, 9090)?
│ └─ NO — add surrounding context
├─ Is an API name (VirtualAlloc, CreateRemoteThread)?
│ └─ NO — use hex pattern of call site instead
├─ Appears in Windows system files?
│ └─ NO — too generic, find something unique
├─ Is it a common path (C:\Windows\, cmd.exe)?
│ └─ NO — find malware-specific paths
├─ Unique to this malware family?
│ └─ YES — use it
└─ Appears in other malware too?
└─ MAYBE — combine with family-specific marker
Should I require all strings or allow any?
├─ Strings are individually unique to malware?
│ └─ any of them (each alone is suspicious)
├─ Strings are common but combination is suspicious?
│ └─ all of them (require the full pattern)
├─ Strings have different confidence levels?
│ └─ Group: all of ($core_*) and any of ($variant_*)
└─ Seeing many false positives?
└─ Tighten: switch any → all, add more required strings
Lesson from production: Rules using any of ($network_*) where strings included "fetch", "axios", "http" matched virtually all web applications. Switching to require credential path AND network call AND exfil destination eliminated FPs.
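The production lesson can be reproduced in a few lines of Python. The indicator lists and both input snippets below are hypothetical, chosen only to mirror the groups named above. Plain substring checks stand in for YARA string matches.

```python
# Hypothetical indicator groups modeled on the lesson above.
NETWORK = ("fetch", "axios", "http")
CREDENTIAL_PATHS = (".npmrc", ".aws/credentials", ".ssh/id_rsa")
EXFIL_DESTINATIONS = ("discord.com/api/webhooks", "pastebin.com")


def any_of(text: str, group: tuple[str, ...]) -> bool:
    """True if at least one indicator from the group appears in the text."""
    return any(indicator in text for indicator in group)


def loose_rule(text: str) -> bool:
    # Equivalent of: any of ($network_*)
    return any_of(text, NETWORK)


def tight_rule(text: str) -> bool:
    # Credential path AND network call AND exfil destination
    return (any_of(text, CREDENTIAL_PATHS)
            and any_of(text, NETWORK)
            and any_of(text, EXFIL_DESTINATIONS))


web_app = "const res = await fetch('https://api.example.com/items');"
stealer = ("const k = fs.readFileSync(home + '/.npmrc');"
           "axios.post('https://discord.com/api/webhooks/1/x', k);")

for name, text in (("web_app", web_app), ("stealer", stealer)):
    print(name, loose_rule(text), tight_rule(text))
```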
Stop and pivot when:
yarGen returns only API names and paths → See When Strings Fail, Pivot to Structure
Can't find 3 unique strings → Probably packed. Target the unpacked version or detect the packer.
Rule matches goodware files → Strings aren't unique enough. 1-2 matches = investigate and tighten; 3-5 matches = find different indicators; 6+ matches = start over.
Performance is terrible even after optimization → Architecture problem. Split into multiple focused rules or add strict pre-filters.
Description is hard to write → The rule is too vague. If you can't explain what it catches, it catches too much.
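The goodware thresholds above amount to a small decision table. This Python sketch encodes them so a test harness could print the next step automatically. The function name and return strings are illustrative, not part of any existing tool.

```python
def goodware_triage(goodware_hits: int) -> str:
    """Map the number of goodware files a rule matched to the next step."""
    if goodware_hits < 0:
        raise ValueError("goodware_hits must be non-negative")
    if goodware_hits == 0:
        return "no goodware matches"
    if goodware_hits <= 2:
        return "investigate and tighten"
    if goodware_hits <= 5:
        return "find different indicators"
    return "start over"


for hits in (0, 2, 4, 12):
    print(hits, goodware_triage(hits))
```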
FP Investigation Flow:
│
├─ 1. Which string matched?
│ Run: yr scan -s rule.yar false_positive.exe
│
├─ 2. Is it in a legitimate library?
│ └─ Add: not $fp_vendor_string exclusion
│
├─ 3. Is it a common development pattern?
│ └─ Find more specific indicator, replace the string
│
├─ 4. Are multiple generic strings matching together?
│ └─ Tighten to require all + add unique marker
│
└─ 5. Is the malware using common techniques?
└─ Target malware-specific implementation details, not the technique
What string type should I use?
│
├─ Exact ASCII/Unicode text?
│ └─ TEXT: $s = "MutexName" ascii wide
│
├─ Specific byte sequence?
│ └─ HEX: $h = { 4D 5A 90 00 }
│
├─ Byte sequence with variation?
│ └─ HEX with wildcards: { 4D 5A ?? ?? 50 45 }
│
├─ Pattern with structure (URLs, paths)?
│ └─ BOUNDED REGEX: /https:\/\/[a-z]{5,20}\.onion/
│
└─ Unknown encoding (XOR, base64)?
└─ TEXT with modifier: $s = "config" xor(0x00-0xFF)
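The xor modifier tries each single-byte key in its range against the literal string. This Python sketch reproduces that search over a buffer, using the same 0x00-0xFF range as the tree above. The buffer and the key 0x5A are hypothetical.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """Apply a single-byte XOR key to every byte."""
    return bytes(b ^ key for b in data)


def find_xor(buffer: bytes, plaintext: bytes,
             keys: range = range(0x00, 0x100)) -> list[tuple[int, int]]:
    """Return (key, offset) for each key whose encoded form appears (first hit only)."""
    hits = []
    for key in keys:
        offset = buffer.find(xor_bytes(plaintext, key))
        if offset != -1:
            hits.append((key, offset))
    return hits


# Hypothetical buffer: "config" encoded with key 0x5A between filler bytes.
encoded = b"\x90\x91" + xor_bytes(b"config", 0x5A) + b"\x00"
print(find_xor(encoded, b"config"))
```

Key 0x00 leaves the string unchanged, so this range also matches the plaintext.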
Before writing any string-based rule:
Is the sample packed?
├─ Entropy > 7.0?
│ └─ Likely packed — find unpacked layer first
├─ Few/no readable strings?
│ └─ Likely packed — use entropy, PE structure, or packer signatures
├─ UPX/MPRESS/custom packer detected?
│ └─ Target the unpacked payload OR detect the packer itself
└─ Readable strings available?
└─ Proceed with string-based detection
Expert guidance: Don't write rules against packed layers. The packing changes; the payload doesn't.
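The "Entropy > 7.0" check above refers to Shannon entropy measured in bits per byte, on a 0 to 8 scale. This Python sketch computes that measure so you can triage a sample before writing string-based rules. YARA's math.entropy() is assumed here to report a comparable value for a given range.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant) up to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


print(shannon_entropy(b"A" * 1024))              # 0.0
print(shannon_entropy(bytes(range(256)) * 4))    # 8.0
print(shannon_entropy(os.urandom(65536)) > 7.0)  # random data looks "packed"
```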
If yarGen returns only API names and generic paths:
String extraction failed — what now?
├─ High entropy sections?
│ └─ Use math.entropy() on specific sections
├─ Unusual imports pattern?
│ └─ Use pe.imphash() for import hash clustering
├─ Consistent PE structure anomalies?
│ └─ Target section names, sizes, characteristics
├─ Metadata present?
│ └─ Target version info, timestamps, resources
└─ Nothing unique?
└─ This sample may not be detectable with YARA alone
Expert guidance: "One can try to use other file properties, such as metadata, entropy, import hashes or other data which stays constant." — Kaspersky Applied YARA Training
String selection: Mutex names are gold; C2 paths silver; error messages bronze. Stack strings are almost always unique. If you need >6 strings, you're over-fitting.
Condition design: Start with filesize <, then magic bytes, then strings, then modules. If >5 lines, split into multiple rules.
Quality signals: yarGen output needs 80% filtering. Rules matching <50% of variants are too narrow; matching goodware are too broad.
Modifier discipline:
- Don't add nocase or wide speculatively; use them only when you have confirmed evidence that case or encoding varies across samples.
- Both have real costs: nocase multiplies the atoms YARA must generate, and adding wide alongside ascii doubles the patterns to search.

Regex anchoring:
- Anchor on a distinctive literal: /mshta\.exe http:\/\/.../ not /http:\/\/.../

Loop discipline:
- Bound loops by file size: filesize < 100KB and for all i in (1..#a) : ...
- #a can reach thousands in large files, so an unbounded loop slows scanning dramatically (nested loops multiply the cost).

YARA-X tips: $_unused to suppress warnings; private $s to hide from output; yr check + yr fmt before every commit.
Should I use a module or raw bytes?
├─ Need imphash/rich header/authenticode?
│ └─ Use PE module — too complex to replicate
├─ Just checking magic bytes or simple offsets?
│ └─ Use uint16/uint32 — faster, no module overhead
├─ Checking section names/sizes?
│ └─ PE module is cleaner, but add magic bytes filter FIRST
├─ Checking Chrome extension permissions?
│ └─ Use crx module — string parsing is fragile
└─ Checking LNK target paths?
└─ Use lnk module — LNK format is complex
Expert guidance: "Avoid the magic module — use explicit hex checks instead" — Neo23x0. Apply this principle: if you can do it with uint32(), don't load a module.
Key additions from recent releases:
- private $helper = "pattern": matches but is hidden from output
- Inline suppression comments such as // suppress: slow_pattern
- Digit separators, e.g. filesize < 10_000_000, for readability
- yr fmt rules/ to standardize formatting
- yr scan --output-format ndjson for tooling integration

YARA-X provides diagnostic tools legacy YARA lacks:
Rule development cycle:
```bash
# 1. Write initial rule
# 2. Check syntax with detailed errors
yr check rule.yar
# 3. Format consistently
yr fmt -w rule.yar
# 4. Dump module output to inspect file structure (no dummy rule needed)
yr dump -m pe sample.exe --output-format yaml
# 5. Scan with timing info
time yr scan -s rule.yar corpus/
```
When to use yr dump:
YARA-X diagnostic advantage: Error messages include precise source locations. If yr check points to line 15, the issue is actually on line 15 (unlike legacy YARA).
The crx module enables detection of malicious Chrome extensions. Requires YARA-X v1.5.0+ (basic), v1.11.0+ for permhash().
Key APIs: crx.is_crx, crx.permissions, crx.permhash()
Red flags: nativeMessaging + downloads, debugger permission, content scripts on <all_urls>
```yara
import "crx"

rule SUSP_CRX_HighRiskPerms {
    condition:
        crx.is_crx and
        for any perm in crx.permissions : (perm == "debugger")
}
```
See crx-module.md for complete API reference, permission risk assessment, and example rules.
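When you already have an unpacked extension directory rather than a .crx file, you can run the same red-flag checks on manifest.json outside YARA. This Python sketch is a hypothetical triage helper, not part of the crx module. It relies only on the standard manifest fields "permissions" and "content_scripts[].matches".

```python
import json


def crx_red_flags(manifest_json: str) -> list[str]:
    """Flag the red-flag patterns listed above in a manifest.json string."""
    manifest = json.loads(manifest_json)
    perms = set(manifest.get("permissions", []))
    flags = []
    if {"nativeMessaging", "downloads"} <= perms:
        flags.append("nativeMessaging + downloads")
    if "debugger" in perms:
        flags.append("debugger permission")
    content_scripts = manifest.get("content_scripts", [])
    if any("<all_urls>" in cs.get("matches", []) for cs in content_scripts):
        flags.append("content script on <all_urls>")
    return flags


# Hypothetical manifest for illustration.
manifest = json.dumps({
    "manifest_version": 3,
    "permissions": ["nativeMessaging", "downloads", "storage"],
    "content_scripts": [{"matches": ["<all_urls>"], "js": ["inject.js"]}],
})
print(crx_red_flags(manifest))
```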
The dex module enables detection of Android malware. Requires YARA-X v1.11.0+. Not compatible with legacy YARA's dex module — API is completely different.
Key APIs: dex.is_dex, dex.contains_class(), dex.contains_method(), dex.contains_string()
Red flags: Single-letter class names (obfuscation), DexClassLoader reflection, encrypted assets
```yara
import "dex"

rule SUSP_DEX_DynamicLoading {
    condition:
        dex.is_dex and
        dex.contains_class("Ldalvik/system/DexClassLoader;")
}
```
See dex-module.md for complete API reference, obfuscation detection, and example rules.
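The module decision tree earlier recommends raw reads for simple magic checks. This Python sketch shows what the DEX header magic ("dex\n" plus a three-digit version and a NUL byte) looks like as bytes, and which uint32(0) constant it implies. It approximates dex.is_dex for illustration only; the module's own validation is assumed to be stricter.

```python
import re
import struct

# "dex\n" followed by a three-digit version and a NUL byte, e.g. b"dex\n035\x00".
DEX_MAGIC = re.compile(rb"dex\n\d{3}\x00")


def looks_like_dex(data: bytes) -> bool:
    """Raw-bytes approximation of a DEX header check (not the dex module itself)."""
    return DEX_MAGIC.fullmatch(data[:8]) is not None


header = b"dex\n035\x00" + bytes(104)  # hypothetical header: magic + zero padding
print(looks_like_dex(header))                       # True
print(hex(struct.unpack_from("<I", header, 0)[0]))  # 0xa786564, i.e. uint32(0) == 0x0A786564
```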
YARA-X has 99% rule compatibility, but enforces stricter validation.
Quick migration:
```bash
yr check --relaxed-re-syntax rules/   # Identify issues
# Fix each issue, then:
yr check rules/                       # Verify without relaxed mode
```
Common fixes:
| Issue | Legacy | YARA-X Fix |
|---|---|---|
| Literal { in regex | /{/ | /\{/ |
| Invalid escapes | \R silently literal | \\R or R |
| Base64 strings | Any length | 3+ chars required |
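| Negative indexing | @a[-1] | @a[#a] (indexes are 1-based, so @a[#a] is the last match) |
| Duplicate modifiers | Allowed | Remove duplicates |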
Note: Use --relaxed-re-syntax only as a diagnostic tool. Fix issues rather than relying on relaxed mode.
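The most common migration fix in the table is escaping literal braces. This Python sketch is a hypothetical pre-check that flags any "{" in a regex body that is neither escaped nor the start of a {n}, {n,} or {n,m} quantifier. It is a simplification (an escaped backslash before "{" is not handled) and does not replace yr check.

```python
import re

# A "{" with no preceding backslash that does not open a {n}, {n,} or {n,m} quantifier.
# Simplification: an escaped backslash before "{" (as in \\{) is not handled.
BAD_BRACE = re.compile(r"(?<!\\)\{(?!\d+(?:,\d*)?\})")


def unescaped_braces(regex_body: str) -> list[int]:
    """Offsets of literal '{' characters that should be written as '\\{'."""
    return [m.start() for m in BAD_BRACE.finditer(regex_body)]


for body in ("config{key}", r"config\{key\}", "[a-z0-9]{8,12}"):
    print(body, unescaped_braces(body))
```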
{CATEGORY}_{PLATFORM}_{FAMILY}_{VARIANT}_{DATE}
Common prefixes: MAL_ (malware), HKTL_ (hacking tool), WEBSHELL_, EXPL_, SUSP_ (suspicious), GEN_ (generic)
Platforms: Win_, Lnx_, Mac_, Android_, CRX_
Example: MAL_Win_Emotet_Loader_Jan25
See style-guide.md for full conventions, metadata requirements, and naming examples.
Every rule needs: description (starts with "Detects"), author, reference, date.
```yara
meta:
    description = "Detects Example malware via unique mutex and C2 path"
    author = "Your Name <email@example.com>"
    reference = "https://example.com/analysis"
    date = "2025-01-29"
```
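The repository ships scripts/yara_lint.py for style checks. The Python sketch below is not that script. It is a hypothetical minimal stand-in showing how the naming convention and required meta fields can be checked mechanically. The name check is deliberately loose (known category prefix plus at least two more segments) so it also accepts names like MAL_NPM_SupplyChain_Jan25.

```python
import re

# Loose check: known category prefix plus at least two more segments.
NAME_RE = re.compile(
    r"^(?:MAL|HKTL|WEBSHELL|EXPL|SUSP|GEN)_[A-Za-z0-9]+_[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*$"
)
REQUIRED_META = ("description", "author", "reference", "date")


def check_rule(name: str, meta: dict[str, str]) -> list[str]:
    """Return a list of convention problems (empty list means the rule passes)."""
    problems = []
    if not NAME_RE.match(name):
        problems.append(f"name {name!r} does not match the naming convention")
    for key in REQUIRED_META:
        if not meta.get(key):
            problems.append(f"missing meta field: {key}")
    if not meta.get("description", "").startswith("Detects"):
        problems.append("description should start with 'Detects'")
    return problems


meta = {
    "description": "Detects Example malware via unique mutex and C2 path",
    "author": "Your Name <email@example.com>",
    "reference": "https://example.com/analysis",
    "date": "2025-01-29",
}
print(check_rule("MAL_Win_Emotet_Loader_Jan25", meta))
print(check_rule("evil_rule", {}))
```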
Good: Mutex names, PDB paths, C2 paths, stack strings, configuration markers
Bad: API names, common executables, format specifiers, generic paths
See strings.md for the full decision tree and examples.
Order conditions for short-circuit:
1. filesize < 10MB (instant)
2. uint16(0) == 0x5A4D (nearly instant)
3. String matches
4. Module calls

See performance.md for detailed optimization patterns.
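To illustrate the ordering above, this Python sketch records which checks actually run. It models only boolean short-circuiting in the condition. YARA still performs its atom scan over the file before conditions are evaluated, so the benefit mainly comes from skipping later, costlier terms such as module calls and loops. The stand-in check results are hypothetical.

```python
def evaluate(filesize: int, is_pe: bool) -> tuple[bool, list[str]]:
    """Evaluate cheapest-first and return (matched, checks that actually ran)."""
    calls: list[str] = []

    def check(name: str, result: bool) -> bool:
        calls.append(name)
        return result

    matched = (check("filesize", filesize < 10 * 1024 * 1024)  # YARA's 10MB
               and check("magic", is_pe)                      # uint16(0) == 0x5A4D
               and check("strings", True)                     # stand-in for string terms
               and check("module", True))                     # stand-in for pe.* calls
    return matched, calls


print(evaluate(50 * 1024 * 1024, True))  # oversized file: stops after filesize
print(evaluate(1024, False))             # not a PE: stops after magic
```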
- Extract candidate strings: yarGen -m samples/ --excludegood
- Validate: yr check, yr fmt, linter script

See testing.md for detailed validation workflow and FP investigation.
For a comprehensive step-by-step guide covering all phases from sample collection to deployment, see rule-development.md.
| Mistake | Bad | Good |
|---|---|---|
| API names as indicators | "VirtualAlloc" | Hex pattern of call site + unique mutex |
| Unbounded regex | /https?:\/\/.*/ | /https?:\/\/[a-z0-9]{8,12}\.onion/ |
| Missing file type filter | pe.imports(...) first | uint16(0) == 0x5A4D and filesize < 10MB first |
| Short strings | "abc" (3 bytes) | "abcdef" (4+ bytes) |
| Unescaped braces (YARA-X) | /config{key}/ | /config\{key\}/ |
Quick wins: Put filesize first, avoid nocase, bounded regex {1,100}, prefer hex over regex.
Red flags: Strings <4 bytes, unbounded regex (.*), modules without file-type filter.
See performance.md for atom theory and optimization details.
| Topic | Document |
|---|---|
| Naming and metadata conventions | style-guide.md |
| Performance and atom optimization | performance.md |
| String types and judgment | strings.md |
| Testing and validation | testing.md |
| Chrome extension module (crx) | crx-module.md |
| Android DEX module (dex) | dex-module.md |
| Topic | Document |
|---|---|
| Complete rule development process | rule-development.md |
The examples/ directory contains real, attributed rules demonstrating best practices:
| Example | Demonstrates | Source |
|---|---|---|
| MAL_Win_Remcos_Jan25.yar | PE malware: graduated string counts, multiple rules per family | Elastic Security |
| MAL_Mac_ProtonRAT_Jan25.yar | macOS: Mach-O magic bytes, multi-category grouping | Airbnb BinaryAlert |
| MAL_NPM_SupplyChain_Jan25.yar | npm supply chain: real attack patterns, ERC-20 selectors | Stairwell Research |
| SUSP_JS_Obfuscation_Jan25.yar | JavaScript: obfuscator detection, density-based matching | imp0rtp3, Nils Kuhnert |
| SUSP_CRX_SuspiciousPermissions.yar | Chrome extensions: crx module, permissions | Educational |
```bash
uv run {baseDir}/scripts/yara_lint.py rule.yar      # Validate style/metadata
uv run {baseDir}/scripts/atom_analyzer.py rule.yar  # Check string quality
```
See README.md for detailed script documentation.
Before deploying any rule:
- Name follows the {CATEGORY}_{PLATFORM}_{FAMILY}_{VARIANT}_{DATE} format
- Regexes escape literal { and use valid escape sequences
- yr check passes with no errors
- yr fmt --check passes (consistent formatting)

Learn from production rules. These repositories contain well-tested, properly attributed rules:
| Repository | Focus | Maintainer |
|---|---|---|
| Neo23x0/signature-base | 17,000+ production rules, multi-platform | Florian Roth |
| Elastic/protections-artifacts | 1,000+ endpoint-tested rules | Elastic Security |
| reversinglabs/reversinglabs-yara-rules | Threat research rules | ReversingLabs |
| imp0rtp3/js-yara-rules | JavaScript/browser malware | imp0rtp3 |
| InQuest/awesome-yara | Curated index of resources | InQuest |
| Guide | Purpose |
|---|---|
| YARA Style Guide | Naming conventions, metadata, string prefixes |
| YARA Performance Guidelines | Atom optimization, regex bounds |
| Kaspersky Applied YARA Training | Expert techniques from production use |
| Tool | Purpose |
|---|---|
| yarGen | Extract candidate strings from samples |
| FLOSS | Extract obfuscated and stack strings |
| YARA-CI | Automated goodware testing |
| YaraDbg | Web-based rule debugger |
| Resource | Purpose |
|---|---|
| Apple XProtect | Production macOS rules at /System/Library/CoreServices/XProtect.bundle/ |
| objective-see | macOS malware research and samples |
| macOS Security Tools | Reference list |
Production rules often group indicators by type:
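```yara
// Illustrative rule name.
rule Example_Grouped_Indicators
{
    strings:
        // Category A: Library indicators
        $a1 = "SRWebSocket" ascii
        $a2 = "SocketRocket" ascii
        // Category B: Behavioral indicators
        $b1 = "SSH tunnel" ascii
        $b2 = "keylogger" ascii nocase
        // Category C: C2 patterns (must be referenced, or YARA rejects the rule)
        $c1 = /https:\/\/[a-z0-9]{8,16}\.onion/
    condition:
        filesize < 10MB and
        (
            (any of ($a*) and any of ($b*))  // evidence from BOTH library and behavior groups
            or ($c1 and any of ($b*))        // or a C2 pattern backed by behavior
        )
}
```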
Why this works: Different indicator types have different confidence levels. A single C2 domain might be definitive, while you need several library indicators before you can be confident. Grouping by $a*, $b*, $c* lets you express graduated requirements.
Weekly Installs: 901
GitHub Stars: 3.9K
First Seen: Jan 30, 2026
Security Audits: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Pass
Installed on: claude-code 802, opencode 790, codex 784, gemini-cli 777, cursor 762, github-copilot 745