langsmith by supercent-io/skills-template
npx skills add https://github.com/supercent-io/skills-template --skill langsmith
Keywords: langsmith · llm tracing · llm evaluation · @traceable · langsmith evaluate

LangSmith is a framework-agnostic platform for developing, debugging, and deploying LLM applications. It provides end-to-end tracing, quality evaluation, prompt versioning, and production monitoring.

Quick reference:
- Install: pip install -U langsmith (Python) or npm install langsmith (TypeScript)
- Configure: LANGSMITH_TRACING=true, LANGSMITH_API_KEY=lsv2_...
- Instrument: @traceable decorator or wrap_openai() wrapper
- Evaluate: run evaluate() against a curated dataset; use openevals for LLM-as-judge scoring
- Setup: bash scripts/setup.sh to auto-configure the environment
- API key: get one from smith.langchain.com → Settings → API Keys
- Docs: https://docs.langchain.com/langsmith
pip install -U langsmith openai
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
export OPENAI_API_KEY="sk-..."
from langsmith import traceable
from langsmith.wrappers import wrap_openai
from openai import OpenAI
client = wrap_openai(OpenAI())
@traceable
def rag_pipeline(question: str) -> str:
"""Automatically traced in LangSmith"""
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": question}]
)
return response.choices[0].message.content
result = rag_pipeline("What is LangSmith?")
npm install langsmith openai
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_..."
import { traceable } from "langsmith/traceable";
import { wrapOpenAI } from "langsmith/wrappers";
import { OpenAI } from "openai";
const client = wrapOpenAI(new OpenAI());
const pipeline = traceable(async (question: string): Promise<string> => {
const res = await client.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: question }],
});
return res.choices[0].message.content ?? "";
}, { name: "RAG Pipeline" });
await pipeline("What is LangSmith?");
| Concept | Description |
|---|---|
| Run | Individual operation (LLM call, tool call, retrieval). The fundamental unit. |
| Trace | All runs from a single user request, linked by trace_id. |
| Thread | Multiple traces in a conversation, linked by session_id or thread_id. |
| Project | Container grouping related traces (set via LANGSMITH_PROJECT). |
| Dataset | Collection of {inputs, outputs} examples for offline evaluation. |
| Experiment | Result set from running evaluate() against a dataset. |
| Feedback | Score/label attached to a run — numeric, categorical, or freeform. |
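Threads are created purely by metadata: attach the same session_id (or thread_id) to every call in a conversation. A minimal sketch reusing rag_pipeline from the quick start; langsmith_extra is the documented way to pass per-call metadata to a @traceable function:

import uuid

# Reuse one id across all turns of a conversation so LangSmith groups the traces
session_id = str(uuid.uuid4())

rag_pipeline("What is LangSmith?",
             langsmith_extra={"metadata": {"session_id": session_id}})
rag_pipeline("And how do threads work?",
             langsmith_extra={"metadata": {"session_id": session_id}})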
from langsmith import traceable
@traceable(
run_type="chain", # llm | chain | tool | retriever | embedding
name="My Pipeline",
tags=["production", "v2"],
metadata={"version": "2.1", "env": "prod"},
project_name="my-project"
)
def pipeline(question: str) -> str:
return generate_answer(question)
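When a function cannot be decorated (third-party or legacy code), the trace context manager from the langsmith SDK creates a run imperatively. A minimal sketch, where legacy_answer is a hypothetical existing function:

from langsmith import trace

with trace(name="Legacy Pipeline", run_type="chain",
           inputs={"question": "What is LangSmith?"}) as rt:
    answer = legacy_answer("What is LangSmith?")  # hypothetical undecoratable function
    rt.end(outputs={"answer": answer})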
import langsmith as ls
# Enable tracing for this block only
with ls.tracing_context(enabled=True, project_name="debug"):
result = chain.invoke({"input": "..."})
# Disable tracing despite LANGSMITH_TRACING=true
with ls.tracing_context(enabled=False):
result = chain.invoke({"input": "..."})
from langsmith.wrappers import wrap_openai, wrap_anthropic
from openai import OpenAI
import anthropic
openai_client = wrap_openai(OpenAI()) # All calls auto-traced
anthropic_client = wrap_anthropic(anthropic.Anthropic())
from langsmith.run_helpers import get_current_run_tree
import langsmith
@langsmith.traceable
def service_a(inputs):
rt = get_current_run_tree()
headers = rt.to_headers() # Pass to child service
return call_service_b(headers=headers)
def service_b(x, headers):
    # Resume the upstream trace: traceable runs started inside this context
    # attach to the parent run described by the incoming headers
    with langsmith.tracing_context(parent=headers):
        return process(x)  # process() should itself be @traceable
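In practice the headers travel with the RPC. A sketch of the hypothetical call_service_b helper using the requests library (the endpoint URL and payload are placeholders):

import requests

def call_service_b(headers: dict):
    # Forward the serialized trace context alongside the request;
    # service_b reads it back via tracing_context(parent=headers)
    resp = requests.post("https://service-b.internal/process",
                         json={"x": 1}, headers=headers)
    return resp.json()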
from langsmith import Client
from langsmith.wrappers import wrap_openai
from openai import OpenAI
client = Client()
oai = wrap_openai(OpenAI())
# 1. Create dataset
dataset = client.create_dataset("Geography QA")
client.create_examples(
dataset_id=dataset.id,
examples=[
{"inputs": {"q": "Capital of France?"}, "outputs": {"a": "Paris"}},
{"inputs": {"q": "Capital of Germany?"}, "outputs": {"a": "Berlin"}},
]
)
# 2. Target function
def target(inputs: dict) -> dict:
res = oai.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": inputs["q"]}]
)
return {"a": res.choices[0].message.content}
# 3. Evaluator
def exact_match(inputs, outputs, reference_outputs):
return outputs["a"].strip().lower() == reference_outputs["a"].strip().lower()
# 4. Run experiment
results = client.evaluate(
target,
data="Geography QA",
evaluators=[exact_match],
experiment_prefix="gpt-4o-mini-v1",
max_concurrency=4
)
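The returned results object can also be inspected programmatically. A minimal sketch assuming pandas is installed; the flattened column names, such as feedback.exact_match, are an assumption about how the SDK names them:

df = results.to_pandas()
print(df.head())
print(df["feedback.exact_match"].mean())  # assumed column name: feedback.<evaluator key>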
pip install -U openevals
from openevals.llm import create_llm_as_judge
from openevals.prompts import CORRECTNESS_PROMPT
judge = create_llm_as_judge(
prompt=CORRECTNESS_PROMPT,
model="openai:o3-mini",
feedback_key="correctness",
)
results = client.evaluate(target, data="my-dataset", evaluators=[judge])
| Type | When to use |
|---|---|
| Code/Heuristic | Exact match, format checks, rule-based |
| LLM-as-judge | Subjective quality, safety, reference-free |
| Human | Annotation queues, pairwise comparison |
| Pairwise | Compare two app versions |
| Online | Production traces, real traffic |
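A code/heuristic evaluator is not limited to booleans: returning a dict lets you name the feedback key and attach a comment. A minimal sketch of a rule-based conciseness check against the Geography QA setup above (the 100-word threshold is arbitrary):

def answer_is_concise(inputs: dict, outputs: dict) -> dict:
    # Rule-based check: flag answers longer than 100 words
    word_count = len(outputs["a"].split())
    return {
        "key": "conciseness",
        "score": word_count <= 100,
        "comment": f"{word_count} words",
    }

results = client.evaluate(target, data="Geography QA",
                          evaluators=[exact_match, answer_is_concise])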
from langsmith import Client
from langchain_core.prompts import ChatPromptTemplate
client = Client()
# Push a prompt
prompt = ChatPromptTemplate([
("system", "You are a helpful assistant."),
("user", "{question}"),
])
client.push_prompt("my-assistant-prompt", object=prompt)
# Pull and use
prompt = client.pull_prompt("my-assistant-prompt")
# Pull specific version:
prompt = client.pull_prompt("my-assistant-prompt:abc123")
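A pulled ChatPromptTemplate slots straight into an LCEL chain. A minimal sketch assuming langchain-openai is installed (a dependency beyond those shown above):

from langchain_openai import ChatOpenAI

chain = prompt | ChatOpenAI(model="gpt-4o-mini")
answer = chain.invoke({"question": "What is LangSmith?"})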
from langsmith import Client
import uuid
client = Client()
# Custom run ID for later feedback linking
my_run_id = str(uuid.uuid4())
result = chain.invoke({"input": "..."}, {"run_id": my_run_id})
# Attach feedback
client.create_feedback(
key="correctness",
score=1,  # numeric score (e.g. 0-1); use value= for categorical labels
run_id=my_run_id,
comment="Accurate and concise"
)
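For @traceable functions (rather than LangChain runnables), the run id can be injected the same way through langsmith_extra. A sketch reusing rag_pipeline from the quick start:

my_run_id = str(uuid.uuid4())
rag_pipeline("What is LangSmith?", langsmith_extra={"run_id": my_run_id})
client.create_feedback(run_id=my_run_id, key="user_rating", score=1)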