langsmith by supercent-io/skills-template
npx skills add https://github.com/supercent-io/skills-template --skill langsmith
Keywords: langsmith · llm tracing · llm evaluation · @traceable · langsmith evaluate

LangSmith is a framework-agnostic platform for developing, debugging, and deploying LLM applications. It provides end-to-end tracing, quality evaluation, prompt versioning, and production monitoring.

Quick reference:
- Install: pip install -U langsmith (Python) or npm install langsmith (TypeScript)
- Configure: LANGSMITH_TRACING=true, LANGSMITH_API_KEY=lsv2_...
- Instrument: @traceable decorator or wrap_openai() wrapper
- Evaluate: run evaluate() against a curated dataset; use openevals for LLM-as-judge scoring
- Setup: bash scripts/setup.sh to auto-configure the environment
- API key: get one from smith.langchain.com → Settings → API Keys
- Docs: https://docs.langchain.com/langsmith
pip install -U langsmith openai
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
export OPENAI_API_KEY="sk-..."
from langsmith import traceable
from langsmith.wrappers import wrap_openai
from openai import OpenAI
client = wrap_openai(OpenAI())
@traceable
def rag_pipeline(question: str) -> str:
"""Automatically traced in LangSmith"""
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": question}]
)
return response.choices[0].message.content
result = rag_pipeline("What is LangSmith?")
npm install langsmith openai
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
import { traceable } from "langsmith/traceable";
import { wrapOpenAI } from "langsmith/wrappers";
import { OpenAI } from "openai";
const client = wrapOpenAI(new OpenAI());
const pipeline = traceable(async (question: string): Promise<string> => {
const res = await client.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: question }],
});
return res.choices[0].message.content ?? "";
}, { name: "RAG Pipeline" });
await pipeline("What is LangSmith?");
| Concept | Description |
|---|---|
| Run | Individual operation (LLM call, tool call, retrieval). The fundamental unit. |
| Trace | All runs from a single user request, linked by trace_id. |
| Thread | Multiple traces in a conversation, linked by session_id or thread_id. |
| Project | Container grouping related traces (set via LANGSMITH_PROJECT). |
| Dataset | Collection of {inputs, outputs} examples for offline evaluation. |
| Experiment | Result set from running evaluate() against a dataset. |
| Feedback | Score/label attached to a run — numeric, categorical, or freeform. |
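Threads are created purely by metadata: attach the same session_id (or thread_id) to every call in a conversation. A minimal sketch reusing rag_pipeline from the quick start; langsmith_extra is the documented way to pass per-call metadata to a @traceable function:

import uuid

# Reuse one id across all turns of a conversation so LangSmith groups the traces
session_id = str(uuid.uuid4())

rag_pipeline("What is LangSmith?",
             langsmith_extra={"metadata": {"session_id": session_id}})
rag_pipeline("And how do threads work?",
             langsmith_extra={"metadata": {"session_id": session_id}})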
from langsmith import traceable
@traceable(
run_type="chain", # llm | chain | tool | retriever | embedding
name="My Pipeline",
tags=["production", "v2"],
metadata={"version": "2.1", "env": "prod"},
project_name="my-project"
)
def pipeline(question: str) -> str:
return generate_answer(question)
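When a function cannot be decorated (third-party or legacy code), the trace context manager from the langsmith SDK creates a run imperatively. A minimal sketch, where legacy_answer is a hypothetical existing function:

from langsmith import trace

with trace(name="Legacy Pipeline", run_type="chain",
           inputs={"question": "What is LangSmith?"}) as rt:
    answer = legacy_answer("What is LangSmith?")  # hypothetical undecoratable function
    rt.end(outputs={"answer": answer})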
import langsmith as ls
# Enable tracing for this block only
with ls.tracing_context(enabled=True, project_name="debug"):
result = chain.invoke({"input": "..."})
# Disable tracing despite LANGSMITH_TRACING=true
with ls.tracing_context(enabled=False):
result = chain.invoke({"input": "..."})
from langsmith.wrappers import wrap_openai, wrap_anthropic
from openai import OpenAI
import anthropic
openai_client = wrap_openai(OpenAI()) # All calls auto-traced
anthropic_client = wrap_anthropic(anthropic.Anthropic())
from langsmith.run_helpers import get_current_run_tree
import langsmith
@langsmith.traceable
def service_a(inputs):
rt = get_current_run_tree()
headers = rt.to_headers() # Pass to child service
return call_service_b(headers=headers)
def service_b(x, headers):
    # Resume the upstream trace: traceable runs started inside this context
    # attach to the parent run described by the incoming headers
    with langsmith.tracing_context(parent=headers):
        return process(x)  # process() should itself be @traceable
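In practice the headers travel with the RPC. A sketch of the hypothetical call_service_b helper using the requests library (the endpoint URL and payload are placeholders):

import requests

def call_service_b(headers: dict):
    # Forward the serialized trace context alongside the request;
    # service_b reads it back via tracing_context(parent=headers)
    resp = requests.post("https://service-b.internal/process",
                         json={"x": 1}, headers=headers)
    return resp.json()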
from langsmith import Client
from langsmith.wrappers import wrap_openai
from openai import OpenAI
client = Client()
oai = wrap_openai(OpenAI())
# 1. Create dataset
dataset = client.create_dataset("Geography QA")
client.create_examples(
dataset_id=dataset.id,
examples=[
{"inputs": {"q": "Capital of France?"}, "outputs": {"a": "Paris"}},
{"inputs": {"q": "Capital of Germany?"}, "outputs": {"a": "Berlin"}},
]
)
# 2. Target function
def target(inputs: dict) -> dict:
res = oai.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": inputs["q"]}]
)
return {"a": res.choices[0].message.content}
# 3. Evaluator
def exact_match(inputs, outputs, reference_outputs):
return outputs["a"].strip().lower() == reference_outputs["a"].strip().lower()
# 4. Run experiment
results = client.evaluate(
target,
data="Geography QA",
evaluators=[exact_match],
experiment_prefix="gpt-4o-mini-v1",
max_concurrency=4
)
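The returned results object can also be inspected programmatically. A minimal sketch assuming pandas is installed; the flattened column names, such as feedback.exact_match, are an assumption about how the SDK names them:

df = results.to_pandas()
print(df.head())
print(df["feedback.exact_match"].mean())  # assumed column name: feedback.<evaluator key>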
pip install -U openevals
from openevals.llm import create_llm_as_judge
from openevals.prompts import CORRECTNESS_PROMPT
judge = create_llm_as_judge(
prompt=CORRECTNESS_PROMPT,
model="openai:o3-mini",
feedback_key="correctness",
)
results = client.evaluate(target, data="my-dataset", evaluators=[judge])
| Type | When to use |
|---|---|
| Code/Heuristic | Exact match, format checks, rule-based |
| LLM-as-judge | Subjective quality, safety, reference-free |
| Human | Annotation queues, pairwise comparison |
| Pairwise | Compare two app versions |
| Online | Production traces, real traffic |
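A code/heuristic evaluator is not limited to booleans: returning a dict lets you name the feedback key and attach a comment. A minimal sketch of a rule-based conciseness check against the Geography QA setup above (the 100-word threshold is arbitrary):

def answer_is_concise(inputs: dict, outputs: dict) -> dict:
    # Rule-based check: flag answers longer than 100 words
    word_count = len(outputs["a"].split())
    return {
        "key": "conciseness",
        "score": word_count <= 100,
        "comment": f"{word_count} words",
    }

results = client.evaluate(target, data="Geography QA",
                          evaluators=[exact_match, answer_is_concise])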
from langsmith import Client
from langchain_core.prompts import ChatPromptTemplate
client = Client()
# Push a prompt
prompt = ChatPromptTemplate([
("system", "You are a helpful assistant."),
("user", "{question}"),
])
client.push_prompt("my-assistant-prompt", object=prompt)
# Pull and use
prompt = client.pull_prompt("my-assistant-prompt")
# Pull specific version:
prompt = client.pull_prompt("my-assistant-prompt:abc123")
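A pulled ChatPromptTemplate slots straight into an LCEL chain. A minimal sketch assuming langchain-openai is installed (a dependency beyond those shown above):

from langchain_openai import ChatOpenAI

chain = prompt | ChatOpenAI(model="gpt-4o-mini")
answer = chain.invoke({"question": "What is LangSmith?"})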
from langsmith import Client
import uuid
client = Client()
# Custom run ID for later feedback linking
my_run_id = str(uuid.uuid4())
result = chain.invoke({"input": "..."}, {"run_id": my_run_id})
# Attach feedback
client.create_feedback(
key="correctness",
score=1,  # numeric score (e.g. 0-1); use value= for categorical labels
run_id=my_run_id,
comment="Accurate and concise"
)
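For @traceable functions (rather than LangChain runnables), the run id can be injected the same way through langsmith_extra. A sketch reusing rag_pipeline from the quick start:

my_run_id = str(uuid.uuid4())
rag_pipeline("What is LangSmith?", langsmith_extra={"run_id": my_run_id})
client.create_feedback(run_id=my_run_id, key="user_rating", score=1)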