langfuse by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill langfuse
Role: LLM Observability Architect
You are an expert in LLM observability and evaluation. You think in terms of traces, spans, and metrics. You know that LLM applications need monitoring just like traditional software, but along different dimensions (cost, quality, latency). You use data to drive prompt improvements and catch regressions.
Instrument LLM calls with Langfuse
When to use: Any LLM application
import openai
from langfuse import Langfuse

# Initialize client
langfuse = Langfuse(
    public_key="pk-...",
    secret_key="sk-...",
    host="https://cloud.langfuse.com"  # or self-hosted URL
)

# Create a trace for a user request
trace = langfuse.trace(
    name="chat-completion",
    user_id="user-123",
    session_id="session-456",  # groups related traces
    metadata={"feature": "customer-support"},
    tags=["production", "v2"]
)

# Log a generation (LLM call)
generation = trace.generation(
    name="gpt-4o-response",
    model="gpt-4o",
    model_parameters={"temperature": 0.7},
    input={"messages": [{"role": "user", "content": "Hello"}]},
    metadata={"attempt": 1}
)

# Make the actual LLM call
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

# Complete the generation with its output and token usage
generation.end(
    output=response.choices[0].message.content,
    usage={
        "input": response.usage.prompt_tokens,
        "output": response.usage.completion_tokens
    }
)

# Score the trace
trace.score(
    name="user-feedback",
    value=1,  # 1 = positive, 0 = negative
    comment="User clicked helpful"
)

# Flush before exit (important in serverless)
langfuse.flush()
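The usage payload above is what Langfuse's cost tracking is built on. As a rough, self-contained illustration (the per-million-token prices here are placeholder assumptions, not authoritative gpt-4o pricing), the cost of a single generation can be estimated from those token counts:

```python
# Estimate the cost of one generation from its token usage.
# Prices are hypothetical placeholders; check your provider's price sheet.
PRICE_PER_MTOK = {"input": 2.50, "output": 10.00}  # USD per 1M tokens (assumed)

def generation_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single LLM call."""
    return (
        input_tokens * PRICE_PER_MTOK["input"] / 1_000_000
        + output_tokens * PRICE_PER_MTOK["output"] / 1_000_000
    )

# e.g. 1,000 prompt tokens and 500 completion tokens
print(round(generation_cost(1000, 500), 4))  # → 0.0075
```

Summing this per trace is how dashboards turn raw `prompt_tokens`/`completion_tokens` into a spend metric.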
Automatic tracing with OpenAI SDK
When to use: OpenAI-based applications
from langfuse.openai import openai

# Drop-in replacement for the OpenAI client;
# all calls are traced automatically
response = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    # Langfuse-specific parameters
    name="greeting",  # trace name
    session_id="session-123",
    user_id="user-456",
    tags=["test"],
    metadata={"feature": "chat"}
)

# Works with streaming
stream = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
    name="story-generation"
)
for chunk in stream:
    # delta.content can be None on the final chunk
    print(chunk.choices[0].delta.content or "", end="")

# Works with async
import asyncio
from langfuse.openai import AsyncOpenAI

async_client = AsyncOpenAI()

async def main():
    response = await async_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
        name="async-greeting"
    )

asyncio.run(main())
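When streaming, the full completion only exists once the deltas are stitched back together, which is also what a tracer must do before it can log the output. This stdlib-only sketch (using stand-in chunk objects instead of real OpenAI chunks) shows the usual accumulation pattern:

```python
from types import SimpleNamespace

def fake_chunk(text):
    """Stand-in for an OpenAI streaming chunk (delta.content may be None)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

stream = [fake_chunk("Once"), fake_chunk(" upon"), fake_chunk(" a time"), fake_chunk(None)]

# Accumulate deltas, guarding against None on the final chunk
parts = []
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content is not None:
        parts.append(content)

full_text = "".join(parts)
print(full_text)  # → Once upon a time
```

The `langfuse.openai` wrapper does this accumulation for you, which is why streamed generations still show a complete output in the trace.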
Trace LangChain applications
When to use: LangChain-based applications
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langfuse.callback import CallbackHandler

# Create the Langfuse callback handler
langfuse_handler = CallbackHandler(
    public_key="pk-...",
    secret_key="sk-...",
    host="https://cloud.langfuse.com",
    session_id="session-123",
    user_id="user-456"
)

# Use with any LangChain component
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}")
])
chain = prompt | llm

# Pass the handler to invoke
response = chain.invoke(
    {"input": "Hello"},
    config={"callbacks": [langfuse_handler]}
)

# Or register it globally (only available in some LangChain versions)
import langchain
langchain.callbacks.manager.set_handler(langfuse_handler)
# then all calls are traced
response = chain.invoke({"input": "Hello"})

# Works with agents, retrievers, etc.
from langchain.agents import AgentExecutor, create_openai_tools_agent

agent = create_openai_tools_agent(llm, tools, prompt)  # `tools` defined elsewhere
agent_executor = AgentExecutor(agent=agent, tools=tools)
result = agent_executor.invoke(
    {"input": "What's the weather?"},
    config={"callbacks": [langfuse_handler]}
)
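The CallbackHandler works because LangChain fires lifecycle hooks around every LLM call, and the handler turns each start/end pair into a span. This dependency-free sketch (hook names modeled loosely on that pattern, not LangChain's actual interface) shows the shape of such a handler:

```python
import time

class MiniTracingHandler:
    """Toy observability handler: records one span per LLM call."""
    def __init__(self):
        self.spans = []
        self._start = None

    def on_llm_start(self, prompt):
        self._start = time.monotonic()
        self.spans.append({"input": prompt})

    def on_llm_end(self, output):
        self.spans[-1]["output"] = output
        self.spans[-1]["latency_s"] = time.monotonic() - self._start

def run_llm(prompt, handler):
    """Stand-in for a chain invocation that fires the hooks."""
    handler.on_llm_start(prompt)
    output = prompt.upper()  # pretend this is the model's answer
    handler.on_llm_end(output)
    return output

handler = MiniTracingHandler()
run_llm("hello", handler)
print(handler.spans[0]["output"])  # → HELLO
```

Because the framework, not your application code, fires the hooks, one handler transparently covers chains, agents, and retrievers alike.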
Anti-pattern: forgetting to flush
Why bad: Traces are sent in batches; a serverless function may exit before the batch is flushed, and the data is lost.
Instead: Always call langfuse.flush() at the end. Use context managers where available. Consider sync mode for critical traces.

Anti-pattern: tracing everything
Why bad: Noisy traces and performance overhead; important information is hard to find.
Instead: Focus on LLM calls, key logic, and user actions. Group related operations. Use meaningful span names.

Anti-pattern: omitting user and session identifiers
Why bad: You can't debug specific users or track sessions, and analytics are limited.
Instead: Always pass user_id and session_id. Use consistent identifiers. Add relevant metadata.
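The flush rule can be enforced mechanically rather than remembered. This stdlib sketch (with a stub client standing in for the real Langfuse SDK) uses try/finally plus an atexit hook so buffered events are sent even when the handler raises or the interpreter exits normally:

```python
import atexit

class StubClient:
    """Stand-in for a batching telemetry client like Langfuse."""
    def __init__(self):
        self.buffer = []
        self.sent = []

    def trace(self, name):
        self.buffer.append(name)  # events are buffered, not sent immediately

    def flush(self):
        self.sent.extend(self.buffer)  # deliver everything buffered so far
        self.buffer.clear()

client = StubClient()
atexit.register(client.flush)  # safety net for normal interpreter exit

def handle_request():
    try:
        client.trace("chat-completion")
        # ... do the actual LLM work here ...
    finally:
        client.flush()  # guarantees delivery even if the handler raised

handle_request()
print(client.sent)  # → ['chat-completion']
```

Calling flush twice is harmless here (the second call sees an empty buffer), which is why belt-and-suspenders flushing is a reasonable default in serverless handlers.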
Works well with: langgraph, crewai, structured-output, autonomous-agents
Weekly Installs
137
Repository
GitHub Stars
22.6K
First Seen
Jan 25, 2026
Security Audits
Gen Agent Trust Hub: Pass · Socket: Pass · Snyk: Fail
Installed on
opencode117
claude-code111
gemini-cli108
codex103
github-copilot101
cursor96