langsmith-dataset by langchain-ai/langsmith-skills
npx skills add https://github.com/langchain-ai/langsmith-skills --skill langsmith-dataset

LANGSMITH_API_KEY=lsv2_pt_your_api_key_here  # Required
LANGSMITH_PROJECT=your-project-name          # Check this to know which project has traces
LANGSMITH_WORKSPACE_ID=your-workspace-id     # Optional: for org-scoped keys
IMPORTANT: Always check the environment variables or .env file for LANGSMITH_PROJECT before querying or interacting with LangSmith. This tells you which project contains the relevant traces and data. If the LangSmith project is not available, use your best judgement to identify the right one.
Python Dependencies

pip install langsmith

JavaScript Dependencies

npm install langsmith

CLI Tool

curl -sSL https://raw.githubusercontent.com/langchain-ai/langsmith-cli/main/scripts/install.sh | sh
- `langsmith dataset list` - List datasets in LangSmith
- `langsmith dataset get <name-or-id>` - View dataset details
- `langsmith dataset create --name <name>` - Create a new empty dataset
- `langsmith dataset delete <name-or-id>` - Delete a dataset
- `langsmith dataset export <name-or-id> <output-file>` - Export a dataset to a local JSON file
- `langsmith dataset upload <file> --name <name>` - Upload a local JSON file as a dataset
- `langsmith example list --dataset <name>` - List examples in a dataset
- `langsmith example create --dataset <name> --inputs <json>` - Add an example to a dataset
- `langsmith example delete <example-id>` - Delete an example
- `langsmith experiment list --dataset <name>` - List experiments for a dataset
- `langsmith experiment get <name>` - View experiment results
- `--limit N` - Limit the number of results
- `--yes` - Skip confirmation prompts (use with caution)

IMPORTANT - Safety Prompts:
Never pass `--yes` unless the user explicitly requests it; ask for confirmation before destructive operations instead of using `--yes` to skip confirmation prompts.

<dataset_types_overview>
Common evaluation dataset types: final response (query and final answer), single step (inputs/outputs of one node), trajectory (expected sequence of tool calls), and RAG (answer plus retrieved and cited chunks). See <dataset_structures> below for the corresponding example shapes.
</dataset_types_overview>
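When driving the CLI from code, the safety rule above can be enforced mechanically. This wrapper is a hypothetical sketch (the `run_langsmith` helper and `DESTRUCTIVE` set are not part of the CLI), assuming the `langsmith` binary from the install script is on PATH:

```python
import subprocess

# Destructive subcommands that should always keep their confirmation prompt.
DESTRUCTIVE = {"delete"}

def run_langsmith(*args: str) -> str:
    """Run a langsmith CLI command, refusing --yes on destructive commands."""
    if DESTRUCTIVE & set(args) and "--yes" in args:
        raise ValueError("refusing to skip confirmation on a destructive command")
    result = subprocess.run(
        ["langsmith", *args], capture_output=True, text=True, check=True
    )
    return result.stdout
```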
<creating_datasets>
Datasets are JSON files containing an array of examples. Each example has `inputs` and `outputs`.

Export traces first, then process them into dataset format with code:

```bash
# 1. Export traces to JSONL files
langsmith trace export ./traces --project my-project --limit 20 --full
```

<python>
```python
import json
from pathlib import Path

from langsmith import Client

client = Client()

# 2. Process traces into dataset examples
examples = []
for jsonl_file in Path("./traces").glob("*.jsonl"):
    runs = [json.loads(line) for line in jsonl_file.read_text().strip().split("\n")]
    root = next((r for r in runs if r.get("parent_run_id") is None), None)
    if root and root.get("inputs") and root.get("outputs"):
        examples.append({
            "trace_id": root.get("trace_id"),
            "inputs": root["inputs"],
            "outputs": root["outputs"],
        })

# 3. Save locally
with open("/tmp/dataset.json", "w") as f:
    json.dump(examples, f, indent=2)
```
</python>
<typescript>
```typescript
import { Client } from "langsmith";
import { readFileSync, writeFileSync, readdirSync } from "fs";
import { join } from "path";

const client = new Client();

// 2. Process traces into dataset examples
const examples: Array<{ trace_id?: string; inputs: Record<string, any>; outputs: Record<string, any> }> = [];
const files = readdirSync("./traces").filter(f => f.endsWith(".jsonl"));
for (const file of files) {
  const lines = readFileSync(join("./traces", file), "utf-8").trim().split("\n");
  const runs = lines.map(line => JSON.parse(line));
  const root = runs.find(r => r.parent_run_id == null);
  if (root?.inputs && root?.outputs) {
    examples.push({ trace_id: root.trace_id, inputs: root.inputs, outputs: root.outputs });
  }
}

// 3. Save locally
writeFileSync("/tmp/dataset.json", JSON.stringify(examples, null, 2));
```
</typescript>

```bash
# Upload the local JSON file as a dataset
langsmith dataset upload /tmp/dataset.json --name "My Evaluation Dataset"
```
<python>
```python
from langsmith import Client

client = Client()

# Create a dataset and add examples directly
dataset = client.create_dataset("My Dataset", description="Evaluation dataset")
client.create_examples(
    inputs=[{"query": "What is AI?"}, {"query": "Explain RAG"}],
    outputs=[{"answer": "AI is..."}, {"answer": "RAG is..."}],
    dataset_name="My Dataset",
)
```
</python>
<typescript>
```typescript
import { Client } from "langsmith";

const client = new Client();

// Create dataset and add examples
const dataset = await client.createDataset("My Dataset", {
  description: "Evaluation dataset",
});
await client.createExamples({
  inputs: [{ query: "What is AI?" }, { query: "Explain RAG" }],
  outputs: [{ answer: "AI is..." }, { answer: "RAG is..." }],
  datasetName: "My Dataset",
});
```
</typescript>
</creating_datasets>
<dataset_structures>
Final response:
{"trace_id": "...", "inputs": {"query": "What are the top genres?"}, "outputs": {"response": "The top genres are..."}}

Single step:
{"trace_id": "...", "inputs": {"messages": [...]}, "outputs": {"content": "..."}, "metadata": {"node_name": "model"}}

Trajectory:
{"trace_id": "...", "inputs": {"query": "..."}, "outputs": {"expected_trajectory": ["tool_a", "tool_b", "tool_c"]}}

RAG:
{"trace_id": "...", "inputs": {"question": "How do I..."}, "outputs": {"answer": "...", "retrieved_chunks": ["..."], "cited_chunks": ["..."]}}
</dataset_structures>
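As a rough illustration, an example row can be mapped to one of these types by inspecting its keys. `classify_example` is a hypothetical helper, not part of the skill or SDK:

```python
def classify_example(example: dict) -> str:
    """Guess the dataset type of an example row from its output/metadata keys."""
    outputs = example.get("outputs", {})
    if "expected_trajectory" in outputs:
        return "trajectory"
    if "retrieved_chunks" in outputs or "cited_chunks" in outputs:
        return "rag"
    if example.get("metadata", {}).get("node_name"):
        return "single_step"
    # Plain query/response pairs fall through to final response.
    return "final_response"
```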
<script_usage>
```bash
# List all datasets
langsmith dataset list

# Get dataset details
langsmith dataset get "My Dataset"

# Create an empty dataset
langsmith dataset create --name "New Dataset" --description "For evaluation"

# Upload a local JSON file
langsmith dataset upload /tmp/dataset.json --name "My Dataset"

# Export a dataset to a local file
langsmith dataset export "My Dataset" /tmp/exported.json --limit 100

# Delete a dataset
langsmith dataset delete "My Dataset"

# List examples in a dataset
langsmith example list --dataset "My Dataset" --limit 10

# Add an example
langsmith example create --dataset "My Dataset" \
  --inputs '{"query": "test"}' \
  --outputs '{"answer": "result"}'

# List experiments
langsmith experiment list --dataset "My Dataset"
langsmith experiment get "eval-v1"
```
</script_usage>
<example_workflow> Complete workflow from traces to an uploaded LangSmith dataset:

```bash
# 1. Export traces from LangSmith
langsmith trace export ./traces --project my-project --limit 20 --full

# 2. Process traces into dataset format (using the Python/JS code above)
# See the "Creating Datasets" section

# 3. Upload to LangSmith
langsmith dataset upload /tmp/final_response.json --name "Skills: Final Response"
langsmith dataset upload /tmp/trajectory.json --name "Skills: Trajectory"

# 4. Verify the upload
langsmith dataset list
langsmith dataset get "Skills: Final Response"
langsmith example list --dataset "Skills: Final Response" --limit 3

# 5. Run experiments
langsmith experiment list --dataset "Skills: Final Response"
```
</example_workflow>
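Before step 3, a quick sanity check of the generated file can catch empty or malformed datasets before they reach LangSmith. `validate_dataset` is an assumed helper, not part of the CLI:

```python
import json

def validate_dataset(path: str) -> int:
    """Return the number of examples in a dataset file, raising if malformed."""
    with open(path) as f:
        examples = json.load(f)
    if not isinstance(examples, list):
        raise ValueError("dataset must be a JSON array of examples")
    for i, ex in enumerate(examples):
        # Every example needs non-empty inputs and outputs to be useful.
        if not ex.get("inputs"):
            raise ValueError(f"example {i} has no inputs")
        if not ex.get("outputs"):
            raise ValueError(f"example {i} has no outputs")
    return len(examples)
```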
Troubleshooting:

- Empty dataset after upload: make sure the file is a JSON array of objects with an `inputs` key, then verify with `langsmith example list --dataset "Name"`.
- Export has no data: export traces with the `--full` flag to include inputs/outputs, and check that the root runs have `inputs` and `outputs` populated.
- Example count mismatch: run `langsmith dataset get "Name"` to check the remote count.