karpathy-jobs-bls-visualizer by aradotso/trending-skills

npx skills add https://github.com/aradotso/trending-skills --skill karpathy-jobs-bls-visualizer

Skill by ara.so — Daily 2026 Skills collection.
A research tool for visually exploring Bureau of Labor Statistics Occupational Outlook Handbook data across 342 occupations. The interactive treemap sizes rectangles by employment (area) and colors them by any chosen metric: BLS growth outlook, median pay, education requirements, or LLM-scored AI exposure. The pipeline is fully forkable — write a new prompt, re-run scoring, get a new color layer.

Live demo: karpathy.ai/jobs
# Clone the repo
git clone https://github.com/karpathy/jobs
cd jobs

# Install dependencies (uses uv)
uv sync
uv run playwright install chromium
Create a .env file with your OpenRouter API key (required only for LLM scoring):
OPENROUTER_API_KEY=your_openrouter_key_here
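Before kicking off a full 342-occupation scoring run, it can be worth confirming the key is actually visible to the process. A minimal illustrative check (the helper name is mine, not part of the repo; the real score.py may load .env differently, e.g. via python-dotenv):

```python
import os

# Illustrative pre-flight check (not part of the repo): verify the OpenRouter
# key is present in the environment before starting a long scoring run.
def has_openrouter_key(env=os.environ) -> bool:
    return bool(env.get("OPENROUTER_API_KEY"))
```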
Run these in order for a complete fresh build:

# 1. Scrape BLS pages (non-headless Playwright; BLS blocks bots)
#    Results are cached in html/ — only needed once
uv run python scrape.py
# 2. Convert the raw HTML to clean Markdown in pages/
uv run python process.py
# 3. Extract structured fields into occupations.csv
uv run python make_csv.py
# 4. Score AI exposure via LLM (uses the OpenRouter API, saves scores.json)
uv run python score.py
# 5. Merge the CSV and scores into site/data.json for the frontend
uv run python build_site_data.py
# 6. Serve the visualization locally
cd site && python -m http.server 8000
# Open http://localhost:8000
| File | Description |
|---|---|
| occupations.json | Master list of 342 occupations (title, URL, category, slug) |
| occupations.csv | Summary stats: pay, education, job count, growth projections |
| scores.json | AI exposure scores (0–10) + rationales for all 342 occupations |
| prompt.md | All data in one ~45K-token file for pasting into an LLM |
| html/ | Raw HTML pages from BLS (~40MB, source of truth) |
| pages/ | Clean Markdown versions of each occupation page |
| site/index.html | The treemap visualization (single HTML file) |
| site/data.json | Compact merged data consumed by the frontend |
| score.py | LLM scoring pipeline — fork this to write custom prompts |
The most powerful feature: write any scoring prompt, run score.py, and get a new treemap color layer.

Edit the prompt in score.py:

# score.py (simplified structure)
SYSTEM_PROMPT = """
You are evaluating occupations for exposure to humanoid robotics over the next 10 years.
Score each occupation from 0 to 10:
- 0 = no meaningful exposure (e.g., requires fine social judgment, non-physical)
- 5 = moderate exposure (some tasks automatable, but humans still central)
- 10 = high exposure (repetitive physical tasks, predictable environments)
Consider: physical task complexity, environment predictability, dexterity requirements,
cost of robot vs human, regulatory barriers.
Respond ONLY with JSON: {"score": <int 0-10>, "rationale": "<1-2 sentences>"}
"""
# The pipeline reads each occupation's Markdown from pages/,
# sends it to the LLM, and writes the results to scores.json.
# scores.json structure:
{
"software-developers": {
"score": 1,
"rationale": "Software development is digital and cognitive; humanoid robots provide no advantage."
},
"construction-laborers": {
"score": 7,
"rationale": "Physical, repetitive outdoor tasks are targets for humanoid robotics, though unstructured environments remain challenging."
}
// ... 342 occupations total
}
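The request/response handling behind this can be sketched as below. The payload shape follows OpenRouter's OpenAI-compatible chat API, but this is an illustration, not the repo's actual score.py:

```python
import json

# Illustrative sketch of score.py's core logic (the real script may differ):
# build an OpenAI-style chat payload for OpenRouter, then parse the JSON reply.
def build_request(system_prompt: str, page_markdown: str, model: str) -> dict:
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": page_markdown},
        ],
    }

def parse_reply(raw: str) -> dict:
    result = json.loads(raw)  # the prompt demands JSON-only output
    assert 0 <= result["score"] <= 10, "score out of range"
    return result
```

Sending `build_request(...)` as the POST body to OpenRouter's chat completions endpoint with the Bearer key, then feeding the model's message content to `parse_reply`, yields one `{"score", "rationale"}` record per occupation.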
Then rebuild the site data and serve the result:

uv run python build_site_data.py
cd site && python -m http.server 8000
occupations.json entry:

{
"title": "Software Developers",
"url": "https://www.bls.gov/ooh/computer-and-information-technology/software-developers.htm",
"category": "Computer and Information Technology",
"slug": "software-developers"
}
occupations.csv columns:

slug, title, category, median_pay, education, job_count, growth_percent, growth_outlook

Example row:
software-developers, Software Developers, Computer and Information Technology,
130160, Bachelor's degree, 1847900, 17, Much faster than average
site/data.json entry (merged frontend data):

{
"slug": "software-developers",
"title": "Software Developers",
"category": "Computer and Information Technology",
"median_pay": 130160,
"education": "Bachelor's degree",
"job_count": 1847900,
"growth_percent": 17,
"growth_outlook": "Much faster than average",
"ai_score": 9,
"ai_rationale": "AI is deeply transforming software development workflows..."
}
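A quick way to sanity-check records of this shape before they reach the treemap. The field list comes from the entry above; the helper itself is illustrative, not part of the repo:

```python
# Illustrative check: every treemap record needs a slug, an area driver
# (job_count), and at least one color metric (ai_score here).
REQUIRED_FIELDS = {"slug", "title", "category", "job_count", "ai_score"}

def valid_record(rec: dict) -> bool:
    return REQUIRED_FIELDS.issubset(rec)
```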
The visualization (site/index.html) is a single, self-contained HTML file built with D3.js.
| Layer | What it shows |
|---|---|
| BLS Outlook | BLS projected growth category (green = fast growth) |
| Median Pay | Annual median wage (color gradient) |
| Education | Minimum education required |
| Digital AI Exposure | LLM-scored 0–10 AI impact estimate |
<!-- In site/index.html, find the layer toggle buttons -->
<button onclick="setLayer('ai_score')">Digital AI Exposure</button>
<!-- Add a button for your new layer -->
<button onclick="setLayer('robotics_score')">Humanoid Robotics</button>
// In the getColor function, add a case for your new field:
function getColor(d, layer) {
if (layer === 'robotics_score') {
// scores 0-10, blue = low exposure, red = high
return d3.interpolateRdYlBu(1 - d.robotics_score / 10);
}
// ... 现有 case
}
Then update build_site_data.py to include your new score field in data.json.
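The merge step in build_site_data.py boils down to a slug-keyed join. A minimal sketch of attaching a hypothetical robotics_score layer (the field name, dict shape, and helper are illustrative, not the repo's actual code):

```python
# Illustrative merge: attach a new score layer to the frontend records by slug.
# "robotics_score" and the scores-dict shape are hypothetical examples.
def merge_layer(records: list, scores: dict, field: str) -> list:
    for rec in records:
        rec[field] = scores.get(rec["slug"], {}).get("score")
    return records
```

Records without a score get None, which the frontend can render as "no data" rather than a misleading zero.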
Package all 342 occupations plus aggregate stats into a single file for LLM chat:

uv run python make_prompt.py
# Produces prompt.md (~45K tokens)
# Paste into Claude, GPT-4, Gemini, etc. for data-grounded conversation
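Conceptually this just concatenates the per-occupation Markdown into one file. A minimal sketch under that assumption (the real make_prompt.py likely also adds headers and the aggregate stats):

```python
from pathlib import Path

# Illustrative sketch: join every page's Markdown into a single prompt file.
def build_prompt(pages_dir: str, out_file: str) -> int:
    parts = [p.read_text() for p in sorted(Path(pages_dir).glob("*.md"))]
    Path(out_file).write_text("\n\n---\n\n".join(parts))
    return len(parts)  # number of occupations packed
```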
BLS blocks automated bots, so scrape.py uses non-headless Playwright (a real, visible browser window):

# scrape.py key behavior
browser = await p.chromium.launch(headless=False)  # must stay visible
# Pages are saved to html/<slug>.html
# Already-scraped pages are skipped (cached)
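The caching described in those comments amounts to a file-existence check; a sketch of the idea (illustrative, not scrape.py verbatim):

```python
from pathlib import Path

# Illustrative cache check: a page is only scraped if html/<slug>.html is absent.
def needs_scrape(slug: str, out_dir: str = "html") -> bool:
    return not (Path(out_dir) / f"{slug}.html").exists()
```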
If scraping fails or is rate-limited:

- The html/ directory in the repo already contains cached pages
- Run the pipeline from process.py onward

To find occupations that are missing scores:

import json

with open("scores.json") as f:
    existing = json.load(f)
with open("occupations.json") as f:
    all_occupations = json.load(f)

# Find the gaps
missing = [o for o in all_occupations if o["slug"] not in existing]
print(f"Missing scores: {len(missing)}")
# Then run score.py with a filter for the missing slugs
Parse a single cached page:

from parse_detail import parse_occupation_page
from pathlib import Path
html = Path("html/software-developers.html").read_text()
data = parse_occupation_page(html)
print(data["median_pay"]) # 例如 130160
print(data["job_count"]) # 例如 1847900
print(data["growth_outlook"]) # 例如 "Much faster than average"
Explore the CSV with pandas:

import pandas as pd
df = pd.read_csv("occupations.csv")
# 薪酬最高的前 10 种职业
top_pay = df.nlargest(10, "median_pay")[["title", "median_pay", "growth_outlook"]]
print(top_pay)
# 筛选:快速增长 + 高薪酬
high_value = df[
(df["growth_percent"] > 10) &
(df["median_pay"] > 80000)
].sort_values("median_pay", ascending=False)
Join the AI scores onto the CSV:

import pandas as pd, json
df = pd.read_csv("occupations.csv")
with open("scores.json") as f:
scores = json.load(f)
df["ai_score"] = df["slug"].map(lambda s: scores.get(s, {}).get("score"))
df["ai_rationale"] = df["slug"].map(lambda s: scores.get(s, {}).get("rationale"))
# High AI exposure, high pay — reshaping, not disappearing
high_exposure_high_pay = df[
(df["ai_score"] >= 8) &
(df["median_pay"] > 100000)
][["title", "median_pay", "ai_score", "growth_outlook"]]
print(high_exposure_high_pay)
Troubleshooting:

playwright install fails:

uv run playwright install --with-deps chromium

BLS scraping blocked / returns empty pages:

- Confirm headless=False in scrape.py (already the default)
- The cached html/ directory in the repo can be used directly

score.py OpenRouter errors:

- Check that OPENROUTER_API_KEY is set in .env
- Change model in score.py to use a different LLM

site/data.json not updating after re-scoring:

# Always rebuild site data after changing scores.json
uv run python build_site_data.py

Treemap shows blank / no data:

- Check that site/data.json exists and is valid JSON
- Serve via python -m http.server (not file:// — CORS blocks local JSON fetch)