The Agent Skills Directory
npx skills add https://code.deepline.com
Discover differential signals between Closed Won and Closed Lost accounts by extracting multi-page website content and job listings, then computing Laplace-smoothed lift scores to identify what distinguishes buyers from non-buyers.
Required: deepline enrich (Deepline CLI).
NOT required: separate API keys for exa, crustdata, etc.
Credits: Enrichment consumes Deepline credits (~6 credits/company for exa + crustdata). Discovery prompts with deeplineagent are additional paid calls. Always get user approval before running paid enrichment.
Always use Deepline CLI (deepline enrich, deepline tools, deepline playground) for enrichment, data extraction, and batch operations. Deepline provides:
Use deepline enrich for all enrichment steps. Use deepline tools execute for one-off tool calls. Use deepline playground for inspecting results. Refer to the gtm-meta-skill for Deepline command patterns and provider playbooks.
0. Discover target company (what they sell, who they sell to, differentiation)
0.5. Discover ecosystem (competitors, tech stack category, buyer personas)
1. Prepare input CSV (domain, status) — deduplicate domains that appear in both won and lost
1.5. Generate vertical-specific configs (keywords, tools, job roles)
2. Multi-page website extraction + job listings (Deepline enrichment)
3. Quality gate — verify file completeness + coverage (>80% with content)
3.5. Review generated configs (validate against enriched data)
4. Run differential analysis (scripts/analyze_signals.py)
5. Generate report (using references/report-template.md)
6. Review with signal interpretation rules (references/signal-interpretation.md)
Not all signals are equal. From actual runs across multiple verticals, signals follow a clear reliability order:
| Rank | Signal Source | Reliability | Why |
|---|---|---|---|
| 1 | Job listings (hiring for domain-related roles) | Highest | Active budget + acknowledged pain. A company hiring 3 AEs is a stronger signal than "sales" on their website. |
| 2 | Analyst validation (Gartner, Forrester mentions) | Very High | Enterprise maturity + category awareness. Typically 4-7x lift, rarely appears in lost group. |
| 3 | Compliance infrastructure (SOC2, GDPR, ISO) | High | Procurement maturity + enterprise readiness. Companies with compliance pages have formal approval processes. |
| 4 | Buyer pain language (on careers/blog pages) | High | Operational awareness of the problem — e.g., "fragmented tools" at 3-6x lift for creative ops targets. |
| 5 | Tech stack tools (niche SaaS specific to persona) | Medium | Infrastructure readiness — niche SaaS tools at 2-4x lift for vertical-specific buyers. |
| 6 | Website product/marketing content | Variable | Can indicate buyer OR competitor — source context is everything. |
When website signals fail: For B2B infrastructure tools (AR automation, billing, compliance), buyers DON'T publish their pain on public websites. A wholesale distributor talks about their products on their website, not accounts receivable challenges. For these verticals, prioritize job listings, tech stack, and firmographic signals over website keyword matching.
CRITICAL: Do this FIRST before any enrichment or config generation.
Use deeplineagent to understand what the target company sells and who they sell to:
# Example prompt
deeplineagent: "Research {{company-domain}}. Summarize what the company sells, who they sell to, what makes them different, and any example customers. Use Deepline-managed tools if needed."
Document the following:
Why this matters: The entire pipeline (exa query, keywords, tech stack, job roles) adapts based on this discovery. Skipping this step results in generic/irrelevant signals.
Use deeplineagent to discover the competitive landscape and buyer ecosystem:
Competitor Discovery:
deeplineagent: "Research the competitive landscape for {product category}. List 3-5 relevant software companies or alternatives."
Example: For a creative ops/DAM tool, ask for the main "{product category} software alternatives competitors".
Tech Stack Discovery:
deeplineagent: "Research the common software stack for {buyer persona}. Group the tools by category."
Example: For creative teams, ask for the common "creative teams software stack".
Job Role Discovery:
deeplineagent: "Research the common job titles, responsibilities, and hiring patterns for {buyer persona}. Return 10-15 role variants."
Example: For creative ops, ask for "creative operations job titles creative director content manager".
Document findings:
Create output/{{company}}-icp-input.csv:
domain,status
customer1.com,won
customer2.com,won
non-customer1.com,lost
non-customer2.com,lost
CRITICAL — Deduplicate before enrichment. If the same domain appears in both won and lost groups (same company, multiple CRM deals), Deepline may only fetch job listings once (for the first row). The duplicate domain's content is identical in both groups — including it pollutes lift scores and causes won_with_jobs to be undercounted. Always check and remove duplicate domains before running enrichment:
# Check for duplicates after building the input CSV
from collections import Counter
domain_counts = Counter(r['domain'] for r in rows)
duplicate_domains = {d for d, c in domain_counts.items() if c > 1}
if duplicate_domains:
print(f"WARNING: {len(duplicate_domains)} domains appear in both won and lost:")
for d in sorted(duplicate_domains):
print(f" {{d}}")
print("Remove these rows before enrichment — they pollute lift scores.")
If duplicates exist, remove ALL rows for those domains (not just one copy). The same company with different deal outcomes tells us nothing about what distinguishes buyers from non-buyers.
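The "remove ALL rows" rule can be sketched as a small filter. This is a minimal illustration, not part of the skill's scripts; it assumes `rows` is the list of dicts read from the input CSV:

```python
def drop_cross_group_domains(rows):
    """Drop every row whose domain appears in BOTH the won and lost
    groups; keeping either copy would pollute lift scores."""
    won = {r["domain"] for r in rows if r["status"] == "won"}
    lost = {r["domain"] for r in rows if r["status"] == "lost"}
    conflicted = won & lost
    return [r for r in rows if r["domain"] not in conflicted]

rows = [
    {"domain": "customer1.com", "status": "won"},
    {"domain": "dupe.com", "status": "won"},
    {"domain": "dupe.com", "status": "lost"},
    {"domain": "non-customer1.com", "status": "lost"},
]
clean = drop_cross_group_domains(rows)
# dupe.com is removed from both groups; the other rows survive
```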
Using the discovery from Steps 0 and 0.5, create three JSON config files.
See references/keyword-catalog.md for JSON format and generation guidance.
# Create config files in output/{{company}}/
output/{{company}}-keywords.json # keyword categories
output/{{company}}-tools.json # tech stack tools by category
output/{{company}}-job-roles.json # job role categories
Generation approach:
Keywords — Mix of:
Tech Stack — Tools from ecosystem discovery:
Job Roles — Titles from role discovery:
See references/keyword-catalog.md for multi-vertical examples (creative ops, AR automation, sales engagement, developer tools). Each example includes keywords, tools, and job roles for that vertical.
Validation: Do the generated configs match the target's vertical and buyer persona? If not, refine based on Step 0/0.5 findings.
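As a shape sketch only: the real schema and category guidance live in references/keyword-catalog.md, and the company name ("acme") and every term below are hypothetical placeholders for a creative-ops target. Writing the three config files might look like:

```python
import json
import os

os.makedirs("output", exist_ok=True)

# Hypothetical placeholder configs; see references/keyword-catalog.md
# for the actual format and per-vertical examples.
configs = {
    "acme-keywords.json": {
        "buyer_pain": ["fragmented tools", "asset versioning"],
        "compliance": ["SOC2", "GDPR"],
    },
    "acme-tools.json": {
        "creative_stack": ["Figma", "Frame.io"],
    },
    "acme-job-roles.json": {
        "creative_ops": ["Creative Operations Manager", "Creative Director"],
    },
}
for name, payload in configs.items():
    with open(os.path.join("output", name), "w") as f:
        json.dump(payload, f, indent=2)
```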
CRITICAL: Never scrape just the homepage. Use exa_search with contents.text to discover AND scrape ~8 pages per domain in a single API call.
Generate exa query dynamically based on target's product category:
# Generic query (works for most B2B SaaS selling to marketing/sales/product teams)
QUERY="company product features integrations customers security pricing careers about case-studies"
# For tools selling to back-office teams (finance, HR, legal):
# Buyers don't publish pain on marketing pages — add compliance/audit pages where signals live
QUERY="company product features integrations customers security pricing careers compliance audit regulatory about"
# For developer tools:
# Add documentation/API pages — these reveal infrastructure maturity and integration readiness
QUERY="company product features documentation api changelog github integrations security pricing careers about"
# For creative/marketing tools:
QUERY="company product features portfolio use cases creative workflow customers integrations security pricing careers about"
# For sales tools:
QUERY="company product features playbooks outbound pipeline customers integrations security pricing careers about"
Example:
deepline enrich \
--input output/{{company}}-icp-input.csv \
--output output/{{company}}-enriched.csv \
--with '{"alias":"website","tool":"exa_search","payload":{"query":"{{exa-query-from-above}}","numResults":8,"type":"auto","includeDomains":["{{domain}}"],"contents":{"text":{"maxCharacters":3000,"verbosity":"compact","includeSections":["body"]}}}}' \
  --with '{"alias":"jobs","tool":"crustdata_job_listings","payload":{"companyDomains":"{{domain}}","limit":50}}'
Why exa_search with contents (not parallel_extract)?
Get user credit approval before running. Example: "60 companies x 6 credits = ~360 credits."
CRITICAL — Verify file completeness BEFORE running analysis. deepline enrich returns control to the terminal before OS buffers fully flush to disk. Running the analysis script immediately after enrichment completes can read a partially-written file where job columns for the last N rows haven't synced yet — resulting in won_with_jobs: 0 or severely undercounted job data. Always verify:
# 1. Check row count matches input
INPUT_ROWS=$(wc -l < output/{{company}}-icp-input.csv)
OUTPUT_ROWS=$(wc -l < output/{{company}}-enriched.csv)
echo "Input: $INPUT_ROWS rows, Output: $OUTPUT_ROWS rows"
# Output should equal input (both include header)
# 2. Spot-check job data for a known won account with job listings
python3 -c "
import csv, json, sys
csv.field_size_limit(sys.maxsize)
with open('output/{{company}}-enriched.csv') as f:
rows = list(csv.DictReader(f))
won_rows = [r for r in rows if r.get('status') == 'won']
jobs_col = 'jobs' # or use column index
has_jobs = sum(1 for r in won_rows if r.get(jobs_col, '').strip() not in ('', '{}', 'null'))
print(f'Won rows with job data: {{has_jobs}}/{len(won_rows)}')
# If this is 0 and you know won accounts should have listings, wait and re-run
"
If won_with_jobs is 0 but you expect job data:
Wait a few seconds and re-run the analysis (no re-enrichment needed); the file may still be flushing. Also check whether the enriched CSV exposes the website and jobs alias column names rather than __dl_full_result__, and pass the --website-col N --jobs-col N overrides if so.

After file verification, check coverage:
If coverage is poor, re-run failed domains with --rows targeting specific rows.
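The >80% coverage gate from Step 3 can be computed directly. A minimal sketch, assuming the website content lives in a column aliased `website` (adjust to the actual column name); in practice `rows` would come from `csv.DictReader` over the enriched CSV:

```python
def content_coverage(rows, col="website"):
    """Share of rows whose enrichment column holds real content
    (empty strings, '{}' and 'null' count as failed fetches)."""
    if not rows:
        return 0.0
    filled = sum(
        1 for r in rows
        if (r.get(col) or "").strip() not in ("", "{}", "null")
    )
    return filled / len(rows)

# Illustrative in-memory rows; real usage reads the enriched CSV.
rows = [
    {"domain": "a.com", "website": '{"data": {"results": []}}'},
    {"domain": "b.com", "website": ""},  # failed fetch
]
cov = content_coverage(rows)
if cov < 0.8:
    print(f"Coverage {cov:.0%} is below the 80% gate; re-run failed domains")
```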
If customer domains came from automated extraction (CRM exports, Exa API, case study scraping) rather than a manually verified list, validate that domains actually belong to the named companies. From actual runs: up to 53% of auto-extracted customers can be false positives — competitors selling the same product, domain mismatches, and unrelated companies.
# Check for suspicious domain patterns
python3 -c "
import csv, sys
csv.field_size_limit(sys.maxsize)
with open('output/{{company}}-enriched.csv') as f:
rows = list(csv.DictReader(f))
for r in rows:
domain = r.get('domain', '')
# Flag content platforms used as source URLs, not company domains
if any(x in domain for x in ['blog.', 'medium.com', 'substack.', 'wordpress.']):
print(f'WARNING: {{domain}} looks like a content platform, not a company domain')
# Flag very short domains that might be generic
if len(domain.split('.')[0]) <= 2:
print(f'CHECK: {{domain}} — very short domain, verify it belongs to the expected company')
"
Red flags for false positives:
Before running analysis, spot-check the generated configs against enriched data:
# Sample a few enriched companies
deepline playground output/{{company}}-enriched.csv
# In playground UI, check:
# - Do website pages mention the keywords in keywords.json?
# - Do job listings mention the roles in job-roles.json?
# - Do integrations/tech stack pages mention the tools in tools.json?
Red flags:
Fix and re-generate configs if needed.
Run the analysis script with the config files:
python3 scripts/analyze_signals.py \
--input output/{{company}}-enriched.csv \
--keywords output/{{company}}-keywords.json \
--tools output/{{company}}-tools.json \
--job-roles output/{{company}}-job-roles.json \
--output output/{{company}}-analysis.json
The script auto-detects __dl_full_result__ columns for website and jobs data. Override with --website-col N --jobs-col N if needed.
What the script computes:
Lift = ((won + 0.5) / (won_total + 1)) / ((lost + 0.5) / (lost_total + 1))

Read references/report-template.md for the full report structure and quality rules.
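The Laplace-smoothed lift can be checked by hand. A sketch with hypothetical counts (a keyword in 6 of 37 won companies vs 1 of 18 lost companies), showing that the 0.5 smoothing also keeps zero lost-side counts from dividing by zero:

```python
def laplace_lift(won, won_total, lost, lost_total):
    """Laplace-smoothed lift, matching the formula above."""
    won_rate = (won + 0.5) / (won_total + 1)
    lost_rate = (lost + 0.5) / (lost_total + 1)
    return won_rate / lost_rate

# Hypothetical keyword: 6/37 won vs 1/18 lost
lift = laplace_lift(6, 37, 1, 18)       # ~2.17x
# Same keyword with zero lost-side hits: finite, not infinite
lift_zero = laplace_lift(6, 37, 0, 18)  # 6.5x
```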
Report structure overview:
Signal Strength Bar scale (use in Section 0.2):
≥10x → 🟩🟩🟩🟩🟩🟩 ≥4x → 🟩🟩🟩🟩🟩 ≥2.5x → 🟩🟩🟩🟩
≥2.0x → 🟩🟩🟩 ≥1.5x → 🟩🟩 ≥1.0x → 🟩
≥0.4x → 🟥🟥 ≥0.25x → 🟥🟥🟥 ≥0.15x → 🟥🟥🟥🟥
≥0.07x → 🟥🟥🟥🟥🟥 <0.07x → 🟥🟥🟥🟥🟥🟥
Apollo URL format (use in Section 0.3 and 0.4):
People: https://app.apollo.io/#/people?personTitles[]=Title+One&personTitles[]=Title+Two&personSeniorities[]=vp&personSeniorities[]=director&qOrganizationKeywordTags[]=vertical&organizationLocations[]=United+States&page=1
Companies: https://app.apollo.io/#/companies?qOrganizationKeywordTags[]=keyword&organizationLocations[]=United+States&organizationNumEmployeesRanges[]=201-500&page=1
Use qOrganizationKeywordTags[] for keyword filters (not hardcoded industry tag IDs).
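The bracketed array-parameter style above can be assembled programmatically. A sketch using only the parameter names shown in the examples (the title, tag, and location values here are hypothetical):

```python
from urllib.parse import urlencode

def apollo_people_url(titles, seniorities, keyword_tags, locations):
    """Build an Apollo people-search URL with repeated []-suffixed
    parameters; safe='[]' keeps the brackets literal, and quote_plus
    (urlencode's default) turns spaces into '+' as in the examples."""
    params = []
    params += [("personTitles[]", t) for t in titles]
    params += [("personSeniorities[]", s) for s in seniorities]
    params += [("qOrganizationKeywordTags[]", k) for k in keyword_tags]
    params += [("organizationLocations[]", loc) for loc in locations]
    params.append(("page", "1"))
    return "https://app.apollo.io/#/people?" + urlencode(params, safe="[]")

url = apollo_people_url(
    ["Creative Operations Manager"], ["vp", "director"],
    ["creative ops"], ["United States"],
)
```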
Key quality rules for all sections:
Raw counts always: 15% (6), not just 15%
Sample sizes in headers: Won (n=37), Lost (n=18)
Bold only lift > 2x AND count >= 3 companies — a signal in 1 company with 10x lift is less reliable than a signal in 4 companies with 3x lift
Flag n=1 signals: If a signal appears in only 1 won company, add a note: *(single company — verify before using in scoring)*. In the scoring model, give n=1 signals 0.3x weight vs n=3+ signals.
Source breakdown for ALL keyword tables: Add a Source column showing 3w / 20j / 2both format (3 website-only, 20 jobs-only, 2 from both). This is critical for distinguishing website-only signals (lower confidence) from job-listing signals (higher confidence).
| Keyword | Won (n=X) | Lost (n=Y) | Lift | Source (w/j/both) | Interpretation |
Source evidence required: After each keyword table, add exact quotes with linked sources for the top 3 keywords. The analysis script outputs evidence per keyword with company, source_type, quote, url, and page_title/job_title. Format:
> **Evidence — "keyword":**
> - [company.com](url) (page title): "...exact quote with keyword..."
> - [company.com](url) (job: "Job Title"): "...exact quote from listing..."
Niche tech stack tools: Report specific SaaS tools by category, not generic keywords. "AWS", "GitHub", "Slack" appear on most B2B sites — these aren't differentiating.
Anti-fit signals in separate section
Interpretation column required: Explains WHY each signal matters for the target company
Vendor-adjacent evidence annotation: When citing evidence quotes, annotate each with ✅ (clear buyer signal) or ⚠️ (vendor-adjacent — e.g., the company's own product/pricing page mentions the keyword because they sell something similar). This prevents treating competitor evidence as buyer evidence.
Scoring reconciliation: Section 0.5 (Lead Scoring Cheatsheet) and Section 6 (Scoring Model) MUST have matching point values. After writing both sections, cross-check every signal's point allocation. Mismatches confuse users who reference both.
Dataset caveat: If the dataset uses Lookalikes as Won, has small sample sizes, or other limitations, add a "Dataset Caveat" subsection to the Executive Summary explaining what the limitations are and how they affect interpretation.
Read references/signal-interpretation.md before writing interpretation columns. Key rules:
Website data: __dl_full_result__ column containing exa_search response.
data.results[].text — page content
data.results[].url — page URL
data.results[].title — page title
Job listings: __dl_full_result__ column containing crustdata response.
data.listings[].title — job title (NOT "job_title")
data.listings[].description — job description (NOT "job_description")
data.listings[].category — job category
data.listings[].url — listing URL
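Pulling page text and job titles out of those two JSON envelopes can be sketched as follows; this assumes the `data.results[]` / `data.listings[]` shapes described above, and the sample cells are hypothetical:

```python
import json

def extract_signals(website_cell, jobs_cell):
    """Parse the __dl_full_result__ JSON envelopes into
    (url, text) page pairs and (title, description) job pairs."""
    pages, jobs = [], []
    if website_cell:
        site = json.loads(website_cell)
        pages = [(r.get("url"), r.get("text", ""))
                 for r in site.get("data", {}).get("results", [])]
    if jobs_cell:
        listing = json.loads(jobs_cell)
        jobs = [(j.get("title"), j.get("description", ""))
                for j in listing.get("data", {}).get("listings", [])]
    return pages, jobs

# Hypothetical cell contents matching the documented shapes
website_cell = '{"data": {"results": [{"url": "https://a.com/pricing", "text": "SOC2 compliant", "title": "Pricing"}]}}'
jobs_cell = '{"data": {"listings": [{"title": "Account Executive", "description": "Own pipeline", "category": "Sales", "url": "https://a.com/jobs/1"}]}}'
pages, jobs = extract_signals(website_cell, jobs_cell)
```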
| Step | Credits per row | Total (60 companies) |
|---|---|---|
| exa_search with contents | ~5 | ~300 |
| crustdata_job_listings | ~1 | ~60 |
| Total | ~6 | ~360 |
Always get user approval before running paid enrichment steps.
Duplicate domains in both won and lost undercount won_with_jobs. Always deduplicate in Step 1.
deepline enrich returns to terminal before OS buffers flush. Run the file completeness check in Step 3 before executing analyze_signals.py. A won_with_jobs: 0 result when you expect data is the symptom; re-running the analysis (without re-enriching) fixes it.
Before writing interpretations, read references/signal-interpretation.md.
These patterns have been validated across multiple customer analyses spanning creative ops, sales engagement, AR automation, legal tech, and GTM tools. Use them as a starting point when interpreting results — but always validate against the specific target's vertical.
| Signal Pattern | Typical Lift | Validated For | What It Means |
|---|---|---|---|
| Analyst validation (Gartner, Forrester) | 4.5x-6.5x | Enterprise B2B SaaS | Company has evaluated the category, has enterprise procurement process |
| Hiring for ICP-related roles | 3.8x-5.5x | All verticals | Active budget + acknowledged pain — highest-intent signal |
| Published case studies | 3.7x | Product-led + sales-assist | Mature marketing org, values proof points, vendor-friendly |
| Compliance infrastructure (GDPR, SOC2, ISO) | 2.1x-6.5x | Enterprise tools | Formal approval processes, security reviews, higher close rates |
| Buyer pain language (e.g., "fragmented tools") | 2.9x-5.2x | Creative ops, MarTech | Operational awareness of the specific problem the target solves |
| SDK/webhook/API presence | 2.5x-3.5x | Developer-adjacent tools | Developer culture, integrates tools programmatically |
| Contact sales / sales-led GTM | 2.2x-5.5x | Enterprise sales tools | Human-led sales motion = AE-dependent = sales engagement tool buyer |
| Niche tech stack (Figma, Frame.io, NetSuite) | 1.5x-5.5x | Vertical-specific | Infrastructure readiness for the target's integration ecosystem |
| Signal Pattern | Typical Lift | What It Means |
|---|---|---|
| Consumer signals (shopper, checkout, cancel, debit) | 0.2x | B2C company, not B2B sales org |
| Retention/churn language | 0.2x-0.4x | Consumer subscription model, not enterprise buying |
| Selling same product category | 0.1x-0.3x | Competitor, not buyer — they SELL the solution |
| No job listings in 12+ months | N/A | Not growing, no hiring budget |
From actual runs, a 0-100 point model with three tiers works well:
Score thresholds: 60+ = Tier 1 immediate outreach, 35-59 = Tier 2 trigger-based, <35 = nurture or skip.
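The threshold logic above, combined with the 0.3x down-weighting of n=1 signals from the report quality rules, can be sketched like this; the signal names and point values are hypothetical:

```python
def tier(score):
    """Map a 0-100 account score to the outreach tiers above."""
    if score >= 60:
        return "Tier 1: immediate outreach"
    if score >= 35:
        return "Tier 2: trigger-based"
    return "Nurture or skip"

# Hypothetical weighted scoring: (signal, points, won-company count).
# n=1 signals get 0.3x weight per the report quality rules.
signals = [("hiring AEs", 25, 4), ("Gartner mention", 20, 1)]
score = sum(pts * (1.0 if n >= 3 else 0.3) for _, pts, n in signals)
# 25*1.0 + 20*0.3 = 31.0, which lands below the Tier 2 cutoff
```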
Weekly installs: 364. First seen: Mar 2, 2026. Installed on: codex (364), cursor (363), gemini-cli (363), github-copilot (363), amp (363), cline (363).