langfuse-observability by langfuse/skills
npx skills add https://github.com/langfuse/skills --skill langfuse-observability
Instrument LLM applications with Langfuse tracing, following best practices and tailored to your use case.
Check the project:
No integration yet: Set up Langfuse using a framework integration if available. Integrations capture more context automatically and require less code than manual instrumentation.
Integration exists: Audit against baseline requirements below.
Every trace should have these fundamentals:
| Requirement | Check | Why |
|---|---|---|
| Model name | Is the LLM model captured? | Enables model comparison and filtering |
| Token usage | Are input/output tokens tracked? | Enables automatic cost calculation |
| Good trace names | Are names descriptive? (chat-response, not trace-1) | Makes traces findable and filterable |
| Span hierarchy | Are multi-step operations nested properly? | Shows which step is slow or failing |
| Correct observation types | Are generations marked as generations? | Enables model-specific analytics |
| Sensitive data masked | Is PII/confidential data excluded or masked? | Prevents data leakage |
| Trace input/output | Does the trace capture the full data being processed as input, and the result as output? | Enables debugging and understanding what was processed |
Framework integrations (OpenAI, LangChain, etc.) handle model name, tokens, and observation types automatically. Prefer integrations over manual instrumentation.
Docs: https://langfuse.com/docs/tracing
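Of the baseline rows above, sensitive-data masking is the one that usually needs custom code even with a framework integration. A minimal sketch of a masking helper in plain Python (the patterns, placeholder strings, and function name are illustrative, not part of the Langfuse SDK):

```python
import re

# Illustrative masking helper; run inputs/outputs through it before tracing.
# The patterns below are examples only: extend them for your own data.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask_pii(text: str) -> str:
    """Replace emails and SSN-like numbers with placeholders."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    return SSN_RE.sub("<SSN>", text)

print(mask_pii("Reply to jane@example.com, SSN 123-45-6789"))
# "Reply to <EMAIL>, SSN <SSN>"
```

Recent Langfuse SDK versions also accept a masking callback on the client so this runs on every trace automatically; see the Langfuse masking docs for your version.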
Once baseline instrumentation is working, encourage the user to explore their traces in the Langfuse UI before adding more context:
"Your traces are now appearing in Langfuse. Take a look at a few of them—see what data is being captured, what's useful, and what's missing. This will help us decide what additional context to add."
This helps the user determine what additional instrumentation would be valuable. Infer from code when possible; ask only when unclear.
Infer from code:
| If you see in code... | Infer | Suggest |
|---|---|---|
| Conversation history, chat endpoints, message arrays | Multi-turn app | session_id |
| User authentication, user_id variables | User-aware app | user_id on traces |
| Multiple distinct endpoints/features | Multi-feature app | feature tag |
| Customer/tenant identifiers | Multi-tenant app | customer_id or tier tag |
| Feedback collection, ratings | Has user feedback | Capture as scores |
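The inference rules above can even be approximated mechanically. A toy keyword scan in plain Python (the signal lists and function name are invented for illustration; a real audit should read the code, not grep it):

```python
# Toy heuristic: map keywords found in source code to suggested
# Langfuse context fields. Signal lists are illustrative, not exhaustive.
SIGNALS = {
    "session_id": ("conversation", "chat", "messages"),
    "user_id": ("current_user", "auth", "user_id"),
    "feature tag": ("router", "endpoint"),
    "scores": ("feedback", "rating"),
}

def suggest_context(source: str) -> list[str]:
    source = source.lower()
    return [field for field, words in SIGNALS.items()
            if any(word in source for word in words)]

print(suggest_context("def chat(messages, current_user): ..."))
# ['session_id', 'user_id']
```

A match here is only a starting point for the conversation with the user, not a decision.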
Only ask when it's not obvious from the code.
Additions and their value:
| Addition | Why | Docs |
|---|---|---|
| session_id | Groups conversations together | https://langfuse.com/docs/tracing-features/sessions |
| user_id | Enables user filtering and cost attribution | https://langfuse.com/docs/tracing-features/users |
| User feedback score | Enables quality filtering and trends | https://langfuse.com/docs/scores/overview |
| feature tag | Per-feature analytics | https://langfuse.com/docs/tracing-features/tags |
| customer_tier tag | Cost/quality breakdown by segment | https://langfuse.com/docs/tracing-features/tags |
These are NOT baseline requirements—only add what's relevant based on inference or user input.
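To make session_id concrete: traces that share a session_id collapse into one conversation in the Sessions view. A plain-Python sketch of that grouping (the trace records are invented sample data, not SDK objects):

```python
from collections import defaultdict

# Invented sample traces; in Langfuse, session_id is set on each trace.
traces = [
    {"name": "chat-response", "session_id": "s-1", "output": "Hello!"},
    {"name": "chat-response", "session_id": "s-1", "output": "Sure, here's how."},
    {"name": "doc-summary",   "session_id": "s-2", "output": "TL;DR ..."},
]

sessions = defaultdict(list)
for trace in traces:
    sessions[trace["session_id"]].append(trace["output"])

# s-1 now holds the full two-turn conversation; s-2 is a separate session.
print(dict(sessions))
```

Without session_id, those first two traces would appear as unrelated entries and the conversation could not be replayed end to end.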
After adding context, point users to the relevant UI features.
Prefer these over manual instrumentation:
| Framework | Integration | Docs |
|---|---|---|
| OpenAI SDK | Drop-in replacement | https://langfuse.com/docs/integrations/openai |
| LangChain | Callback handler | https://langfuse.com/docs/integrations/langchain |
| LlamaIndex | Callback handler | https://langfuse.com/docs/integrations/llama-index |
| Vercel AI SDK | OpenTelemetry exporter | https://langfuse.com/docs/integrations/vercel-ai-sdk |
| LiteLLM | Callback or proxy | https://langfuse.com/docs/integrations/litellm |
Full list: https://langfuse.com/docs/integrations
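As a sketch of how small the drop-in path is, assuming the Python SDK and credentials supplied via environment variables (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST). This is a setup fragment, not runnable without those credentials; check the OpenAI integration docs linked above for your SDK version:

```python
# Drop-in replacement: swap the OpenAI import for Langfuse's wrapper.
# Credentials are read from the LANGFUSE_* environment variables.
# from openai import OpenAI          # before
from langfuse.openai import OpenAI   # after: generations traced automatically

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    # Trace name and session/user metadata can also be attached here;
    # the exact parameter names vary by SDK version, so consult the docs.
)
```

Model name, token usage, and the generation observation type from the baseline checklist are all captured by this one import swap.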
When suggesting additions, explain the user benefit:
"I recommend adding session_id to your traces.
Why: This groups messages from the same conversation together.
You'll be able to see full conversation flows in the Sessions view,
making it much easier to debug multi-turn interactions.
Learn more: https://langfuse.com/docs/tracing-features/sessions"
| Mistake | Problem | Fix |
|---|---|---|
| No flush() in scripts | Traces never sent | Call langfuse.flush() before exit |
| Flat traces | Can't see which step failed | Use nested spans for distinct steps |
| Generic trace names | Hard to filter | Use descriptive names: chat-response, doc-summary |
| Logging sensitive data | Data leakage risk | Mask PII before tracing |
| Manual instrumentation when integration exists | More code, less context | Use framework integration |
| Langfuse import before env vars loaded | Langfuse initializes with missing/wrong credentials | Import Langfuse AFTER loading environment variables (e.g., after load_dotenv()) |
| Wrong import order with OpenAI | Langfuse can't patch the OpenAI client | Import Langfuse and call its setup BEFORE importing the OpenAI client |
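The credentials mistake is easy to reproduce. A toy stand-in (plain Python, no Langfuse involved; the class and variable names are invented) showing why a client constructed before environment variables are loaded ends up with missing credentials:

```python
import os

os.environ.pop("TOY_SECRET_KEY", None)  # clean slate for the demo

class ToyClient:
    """Stand-in for any SDK client that reads credentials once, at init."""
    def __init__(self):
        self.secret_key = os.environ.get("TOY_SECRET_KEY")

too_early = ToyClient()                   # constructed before "dotenv" ran
os.environ["TOY_SECRET_KEY"] = "sk-123"   # simulates load_dotenv()
after_load = ToyClient()                  # constructed after env is loaded

print(too_early.secret_key)   # None: the missing-credentials failure mode
print(after_load.secret_key)  # sk-123
```

The OpenAI patching row is the same ordering problem in the other direction: the wrapper can only intercept calls on clients created after Langfuse is set up.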
Weekly Installs
213
Repository
GitHub Stars
37
First Seen
Jan 23, 2026
Security Audits
Gen Agent Trust Hub: Fail
Socket: Pass
Snyk: Pass
Installed on
codex: 185
opencode: 185
gemini-cli: 175
github-copilot: 167
claude-code: 136
kimi-cli: 135