dspy-ruby by everyinc/compound-engineering-plugin
npx skills add https://github.com/everyinc/compound-engineering-plugin --skill dspy-ruby像构建软件一样构建 LLM 应用。类型安全、模块化、可测试。
DSPy.rb 将软件工程的最佳实践引入 LLM 开发。无需手动调整提示词,用 Ruby 类型定义你的需求,剩下的交给 DSPy。
DSPy.rb 是一个用于通过编程式提示构建语言模型应用的 Ruby 框架。它提供:
使用 Ruby 类型定义应用与 LLM 之间的接口:
class EmailClassifier < DSPy::Signature
description "Classify customer support emails by category and priority"
class Priority < T::Enum
enums do
Low = new('low')
Medium = new('medium')
High = new('high')
Urgent = new('urgent')
end
end
input do
const :email_content, String
const :sender, String
end
output do
const :category, String
const :priority, Priority # Type-safe enum with defined values
const :confidence, Float
end
end
从简单的构建块构建复杂的工作流:
广告位招租
在这里展示您的产品或服务
触达数万 AI 开发者,精准高效
dspy-code_act gem)为智能体创建类型安全的工具,并提供全面的 Sorbet 支持:
# Enum-based tool with automatic type conversion
class CalculatorTool < DSPy::Tools::Base
tool_name 'calculator'
tool_description 'Performs arithmetic operations with type-safe enum inputs'
class Operation < T::Enum
enums do
Add = new('add')
Subtract = new('subtract')
Multiply = new('multiply')
Divide = new('divide')
end
end
sig { params(operation: Operation, num1: Float, num2: Float).returns(T.any(Float, String)) }
def call(operation:, num1:, num2:)
case operation
when Operation::Add then num1 + num2
when Operation::Subtract then num1 - num2
when Operation::Multiply then num1 * num2
when Operation::Divide
return "Error: Division by zero" if num2 == 0
num1 / num2
end
end
end
# Multi-tool toolset with rich types
class DataToolset < DSPy::Tools::Toolset
toolset_name "data_processing"
class Format < T::Enum
enums do
JSON = new('json')
CSV = new('csv')
XML = new('xml')
end
end
tool :convert, description: "Convert data between formats"
tool :validate, description: "Validate data structure"
sig { params(data: String, from: Format, to: Format).returns(String) }
def convert(data:, from:, to:)
"Converted from #{from.serialize} to #{to.serialize}"
end
sig { params(data: String, format: Format).returns(T::Hash[String, T.any(String, Integer, T::Boolean)]) }
def validate(data:, format:)
{ valid: true, format: format.serialize, row_count: 42, message: "Data validation passed" }
end
end
DSPy.rb 使用复杂的类型鉴别来处理复杂的数据结构:
_type 字段 — DSPy 为结构体添加鉴别器字段以确保类型安全T.any() 类型通过 _type 自动区分_type 字段_type 字段使用真实数据提高准确性:
# Install
gem 'dspy'
# Configure
DSPy.configure do |c|
c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY'])
end
# Define a task
class SentimentAnalysis < DSPy::Signature
description "Analyze sentiment of text"
input do
const :text, String
end
output do
const :sentiment, String # positive, negative, neutral
const :score, Float # 0.0 to 1.0
end
end
# Use it
analyzer = DSPy::Predict.new(SentimentAnalysis)
result = analyzer.call(text: "This product is amazing!")
puts result.sentiment # => "positive"
puts result.score # => 0.92
连接 LLM 提供商的两种策略:
# Gemfile
gem 'dspy'
gem 'dspy-openai' # OpenAI, OpenRouter, Ollama
gem 'dspy-anthropic' # Claude
gem 'dspy-gemini' # Gemini
每个适配器 gem 都会引入官方的 SDK(openai、anthropic、gemini-ai)。
# Gemfile
gem 'dspy'
gem 'dspy-ruby_llm' # Routes to any provider via ruby_llm
gem 'ruby_llm'
RubyLLM 根据模型名称处理提供商路由。使用 ruby_llm/ 前缀:
DSPy.configure do |c|
c.lm = DSPy::LM.new('ruby_llm/gemini-2.5-flash', structured_outputs: true)
# c.lm = DSPy::LM.new('ruby_llm/claude-sonnet-4-20250514', structured_outputs: true)
# c.lm = DSPy::LM.new('ruby_llm/gpt-4o-mini', structured_outputs: true)
end
DSPy.rb 附带一个结构化的事件总线,用于观察运行时行为。
class MyAgent < DSPy::Module
subscribe 'lm.tokens', :track_tokens, scope: :descendants
def track_tokens(_event, attrs)
@total_tokens += attrs.fetch(:total_tokens, 0)
end
end
subscription_id = DSPy.events.subscribe('score.create') do |event, attrs|
Langfuse.export_score(attrs)
end
# Wildcards supported
DSPy.events.subscribe('llm.*') { |name, attrs| puts "[#{name}] tokens=#{attrs[:total_tokens]}" }
事件名称使用点分隔的命名空间(llm.generate、react.iteration_complete)。每个事件都包含用于过滤的模块元数据(module_path、module_leaf、module_scope.ancestry_token)。
每个 DSPy::Module 都附带 Rails 风格的生命周期钩子:
before — 在 forward 之前运行,用于设置(指标、上下文加载)around — 包装 forward,调用 yield,并允许你配对设置/清理逻辑after — 在 forward 返回后触发,用于清理或持久化class InstrumentedModule < DSPy::Module
before :setup_metrics
around :manage_context
after :log_metrics
def forward(question:)
@predictor.call(question: question)
end
private
def setup_metrics
@start_time = Time.now
end
def manage_context
load_context
result = yield
save_context
result
end
def log_metrics
duration = Time.now - @start_time
Rails.logger.info "Prediction completed in #{duration}s"
end
end
执行顺序:before → around(在 yield 之前)→ forward → around(在 yield 之后)→ after。回调从父类继承,并按注册顺序执行。
使用 fiber-本地存储临时覆盖语言模型:
fast_model = DSPy::LM.new("openai/gpt-4o-mini", api_key: ENV['OPENAI_API_KEY'])
DSPy.with_lm(fast_model) do
result = classifier.call(text: "test") # Uses fast_model inside this block
end
# Back to global LM outside the block
LM 解析层次结构:实例级 LM → Fiber-本地 LM(DSPy.with_lm)→ 全局 LM(DSPy.configure)。
使用 configure_predictor 对智能体内部进行细粒度控制:
agent = DSPy::ReAct.new(MySignature, tools: tools)
agent.configure { |c| c.lm = default_model }
agent.configure_predictor('thought_generator') { |c| c.lm = powerful_model }
使用 DSPy::Evals 系统性地测试 LLM 应用性能:
metric = DSPy::Metrics.exact_match(field: :answer, case_sensitive: false)
evaluator = DSPy::Evals.new(predictor, metric: metric)
result = evaluator.evaluate(test_examples, display_table: true)
puts "Pass Rate: #{(result.pass_rate * 100).round(1)}%"
内置指标:exact_match、contains、numeric_difference、composite_and。自定义指标返回 true/false 或带有 score: 和 feedback: 字段的 DSPy::Prediction。
使用 DSPy::Example 获取类型化的测试数据,并使用 export_scores: true 将结果推送到 Langfuse。
GEPA(遗传-帕累托反射式提示进化)使用反射驱动的指令重写:
gem 'dspy-gepa'
teleprompter = DSPy::Teleprompt::GEPA.new(
metric: metric,
reflection_lm: DSPy::ReflectionLM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY']),
feedback_map: feedback_map,
config: { max_metric_calls: 600, minibatch_size: 6 }
)
result = teleprompter.compile(program, trainset: train, valset: val)
optimized_program = result.optimized_program
指标必须返回 DSPy::Prediction.new(score:, feedback:),以便反射模型能够推理失败原因。使用 feedback_map 来定位复合模块中的各个预测器。
用 T::Struct 输入替换不透明的字符串上下文块。JSON 模式中的每个字段都有自己的 description: 注解,供 LLM 查看:
class NavigationContext < T::Struct
const :workflow_hint, T.nilable(String),
description: "Current workflow phase guidance for the agent"
const :action_log, T::Array[String], default: [],
description: "Compact one-line-per-action history of research steps taken"
const :iterations_remaining, Integer,
description: "Budget remaining. Each tool call costs 1 iteration."
end
class ToolSelectionSignature < DSPy::Signature
input do
const :query, String
const :context, NavigationContext # Structured, not an opaque string
end
output do
const :tool_name, String
const :tool_args, String, description: "JSON-encoded arguments"
end
end
优点:编译时类型安全,LLM 模式中每个字段都有描述,易于作为值对象测试,通过添加 const 声明即可扩展。
控制 DSPy 如何向 LLM 描述签名结构:
structured_outputs: trueschema_format: :baml)— 增强提示模式下减少 84% 的令牌。需要 sorbet-baml gem。schema_format: :toon, data_format: :toon)— 用于模式和数据表格导向的格式。仅限增强提示模式。BAML 和 TOON 仅适用于 structured_outputs: false。当 structured_outputs: true 时,提供商直接接收 JSON 模式。
使用 DSPy::Storage::ProgramStorage 持久化和重新加载优化后的程序:
storage = DSPy::Storage::ProgramStorage.new(storage_path: "./dspy_storage")
storage.save_program(result.optimized_program, result, metadata: { optimizer: 'MIPROv2' })
支持检查点管理、优化历史跟踪以及环境之间的导入/导出。
使用 Rails 约定组织 DSPy 组件:
app/
entities/ # T::Struct types shared across signatures
signatures/ # DSPy::Signature definitions
tools/ # DSPy::Tools::Base implementations
concerns/ # Shared tool behaviors (error handling, etc.)
modules/ # DSPy::Module orchestrators
services/ # Plain Ruby services that compose DSPy modules
config/
initializers/
dspy.rb # DSPy + provider configuration
feature_flags.rb # Model selection per role
spec/
signatures/ # Schema validation tests
tools/ # Tool unit tests
modules/ # Integration tests with VCR
vcr_cassettes/ # Recorded HTTP interactions
# config/initializers/dspy.rb
Rails.application.config.after_initialize do
next if Rails.env.test? && ENV["DSPY_ENABLE_IN_TEST"].blank?
RubyLLM.configure do |config|
config.gemini_api_key = ENV["GEMINI_API_KEY"] if ENV["GEMINI_API_KEY"].present?
config.anthropic_api_key = ENV["ANTHROPIC_API_KEY"] if ENV["ANTHROPIC_API_KEY"].present?
config.openai_api_key = ENV["OPENAI_API_KEY"] if ENV["OPENAI_API_KEY"].present?
end
model = ENV.fetch("DSPY_MODEL", "ruby_llm/gemini-2.5-flash")
DSPy.configure do |config|
config.lm = DSPy::LM.new(model, structured_outputs: true)
config.logger = Rails.logger
end
# Langfuse observability (optional)
if ENV["LANGFUSE_PUBLIC_KEY"].present? && ENV["LANGFUSE_SECRET_KEY"].present?
DSPy::Observability.configure!
end
end
为不同角色使用不同的模型(分类用快速/廉价模型,合成用强大模型):
# config/initializers/feature_flags.rb
module FeatureFlags
SELECTOR_MODEL = ENV.fetch("DSPY_SELECTOR_MODEL", "ruby_llm/gemini-2.5-flash-lite")
SYNTHESIZER_MODEL = ENV.fetch("DSPY_SYNTHESIZER_MODEL", "ruby_llm/gemini-2.5-flash")
end
然后按工具或按预测器覆盖:
class ClassifyTool < DSPy::Tools::Base
def call(query:)
predictor = DSPy::Predict.new(ClassifyQuery)
predictor.configure { |c| c.lm = DSPy::LM.new(FeatureFlags::SELECTOR_MODEL, structured_outputs: true) }
predictor.call(query: query)
end
end
优先使用类型化模式而非字符串描述。 让类型系统向 LLM 传达结构,而不是在签名描述中使用散文。
在 app/entities/ 中定义可复用的 T::Struct 和 T::Enum 类型,并在签名中引用它们:
# app/entities/search_strategy.rb
class SearchStrategy < T::Enum
enums do
SingleSearch = new("single_search")
DateDecomposition = new("date_decomposition")
end
end
# app/entities/scored_item.rb
class ScoredItem < T::Struct
const :id, String
const :score, Float, description: "Relevance score 0.0-1.0"
const :verdict, String, description: "relevant, maybe, or irrelevant"
const :reason, String, default: ""
end
使用模式(T::Struct/T::Enum) 用于:
使用字符串描述 用于:
String 的简单单字段输出description: "YYYY-MM-DD 格式")经验法则:如果你会在输出上写 case 语句,它应该是 T::Enum。如果你会在它上面调用 .each,它应该是 T::Array[SomeStruct]。
一种常见模式:工具封装 DSPy 预测,添加错误处理、模型选择和序列化:
class RerankTool < DSPy::Tools::Base
tool_name "rerank"
tool_description "Score and rank search results by relevance"
MAX_ITEMS = 200
MIN_ITEMS_FOR_LLM = 5
sig { params(query: String, items: T::Array[T::Hash[Symbol, T.untyped]]).returns(T::Hash[Symbol, T.untyped]) }
def call(query:, items: [])
return { scored_items: items, reranked: false } if items.size < MIN_ITEMS_FOR_LLM
capped_items = items.first(MAX_ITEMS)
predictor = DSPy::Predict.new(RerankSignature)
predictor.configure { |c| c.lm = DSPy::LM.new(FeatureFlags::SYNTHESIZER_MODEL, structured_outputs: true) }
result = predictor.call(query: query, items: capped_items)
{ scored_items: result.scored_items, reranked: true }
rescue => e
Rails.logger.warn "[RerankTool] LLM rerank failed: #{e.message}"
{ error: "Rerank failed: #{e.message}", scored_items: items, reranked: false }
end
end
关键模式:
configure 进行按工具模型选择module ErrorHandling
extend ActiveSupport::Concern
private
def safe_predict(signature_class, **inputs)
predictor = DSPy::Predict.new(signature_class)
yield predictor if block_given?
predictor.call(**inputs)
rescue Faraday::Error, Net::HTTPError => e
Rails.logger.error "[#{self.class.name}] API error: #{e.message}"
nil
rescue JSON::ParserError => e
Rails.logger.error "[#{self.class.name}] Invalid LLM output: #{e.message}"
nil
end
end
将操作包装在 span 中,以实现 Langfuse/OpenTelemetry 可见性:
result = DSPy::Context.with_span(
operation: "tool_selector.select",
"dspy.module" => "ToolSelector",
"tool_selector.tools" => tool_names.join(",")
) do
@predictor.call(query: query, context: context, available_tools: schemas)
end
# Gemfile
gem 'dspy-o11y'
gem 'dspy-o11y-langfuse'
# .env
LANGFUSE_PUBLIC_KEY=pk-...
LANGFUSE_SECRET_KEY=sk-...
DSPY_TELEMETRY_BATCH_SIZE=5
配置可观测性后,每个 DSPy::Predict、DSPy::ReAct 和工具调用都会自动被追踪。
将评估分数报告给 Langfuse:
DSPy.score(name: "relevance", value: 0.85, trace_id: current_trace_id)
VCR.configure do |config|
config.cassette_library_dir = "spec/vcr_cassettes"
config.hook_into :webmock
config.configure_rspec_metadata!
config.filter_sensitive_data('<GEMINI_API_KEY>') { ENV['GEMINI_API_KEY'] }
config.filter_sensitive_data('<OPENAI_API_KEY>') { ENV['OPENAI_API_KEY'] }
end
测试签名是否生成有效模式,而无需调用任何 LLM:
RSpec.describe ClassifyResearchQuery do
it "has required input fields" do
schema = described_class.input_json_schema
expect(schema[:required]).to include("query")
end
it "has typed output fields" do
schema = described_class.output_json_schema
expect(schema[:properties]).to have_key(:search_strategy)
end
end
RSpec.describe RerankTool do
let(:tool) { described_class.new }
it "skips LLM for small result sets" do
expect(DSPy::Predict).not_to receive(:new)
result = tool.call(query: "test", items: [{ id: "1" }])
expect(result[:reranked]).to be false
end
it "calls LLM for large result sets", :vcr do
items = 10.times.map { |i| { id: i.to_s, title: "Item #{i}" } }
result = tool.call(query: "relevant items", items: items)
expect(result[:reranked]).to be true
end
end
帮助用户使用 DSPy.rb 时:
T::Struct 和 T::Enum 类型定义输出结构,而非字符串描述app/entities/ — 提取共享类型,使签名保持精简predictor.configure { |c| c.lm = ... } 为每个任务选择合适的模型input_json_schema 和 output_json_schemaDSPy::Context.with_span 中以实现可观测性保持描述简洁 — 签名 description 应陈述目标,而非字段细节:
# Good — concise goal
class ParseOutline < DSPy::Signature
description 'Extract block-level structure from HTML as a flat list of skeleton sections.'
input do
const :html, String, description: 'Raw HTML to parse'
end
output do
const :sections, T::Array[Section], description: 'Block elements: headings, paragraphs, code blocks, lists'
end
end
使用默认值而非可空数组 — 为了与 OpenAI 结构化输出兼容:
# Good — works with OpenAI structured outputs
class ASTNode < T::Struct
const :children, T::Array[ASTNode], default: []
end
$defs 的递归类型DSPy.rb 支持在结构化输出中使用 JSON 模式 $defs 的递归类型:
class TreeNode < T::Struct
const :value, String
const :children, T::Array[TreeNode], default: [] # Self-reference
end
模式生成器会自动为递归类型创建 #/$defs/TreeNode 引用,与 OpenAI 和 Gemini 的结构化输出兼容。
DSPy.rb 扩展了 T::Struct,以支持字段级的 description: 关键字参数,这些参数会流向 JSON 模式:
class ASTNode < T::Struct
const :node_type, NodeType, description: 'The type of node (heading, paragraph, etc.)'
const :text, String, default: "", description: 'Text content of the node'
const :level, Integer, default: 0 # No description — field is self-explanatory
const :children, T::Array[ASTNode], default: []
end
何时使用字段描述:复杂的字段语义、枚举式字符串、受约束的值、具有模糊名称的嵌套结构体。何时跳过:自解释的字段,如 name、id、url 或布尔标志。
当前:0.34.3
每周安装量
206
仓库
GitHub Stars
10.9K
首次出现
Jan 21, 2026
安全审计
安装于
gemini-cli178
opencode177
codex176
claude-code175
cursor171
github-copilot164
Build LLM apps like you build software. Type-safe, modular, testable.
DSPy.rb brings software engineering best practices to LLM development. Instead of tweaking prompts, define what you want with Ruby types and let DSPy handle the rest.
DSPy.rb is a Ruby framework for building language model applications with programmatic prompts. It provides:
Define interfaces between your app and LLMs using Ruby types:
class EmailClassifier < DSPy::Signature
description "Classify customer support emails by category and priority"
class Priority < T::Enum
enums do
Low = new('low')
Medium = new('medium')
High = new('high')
Urgent = new('urgent')
end
end
input do
const :email_content, String
const :sender, String
end
output do
const :category, String
const :priority, Priority # Type-safe enum with defined values
const :confidence, Float
end
end
Build complex workflows from simple building blocks:
dspy-code_act gem)Create type-safe tools for agents with comprehensive Sorbet support:
# Enum-based tool with automatic type conversion
class CalculatorTool < DSPy::Tools::Base
tool_name 'calculator'
tool_description 'Performs arithmetic operations with type-safe enum inputs'
class Operation < T::Enum
enums do
Add = new('add')
Subtract = new('subtract')
Multiply = new('multiply')
Divide = new('divide')
end
end
sig { params(operation: Operation, num1: Float, num2: Float).returns(T.any(Float, String)) }
def call(operation:, num1:, num2:)
case operation
when Operation::Add then num1 + num2
when Operation::Subtract then num1 - num2
when Operation::Multiply then num1 * num2
when Operation::Divide
return "Error: Division by zero" if num2 == 0
num1 / num2
end
end
end
# Multi-tool toolset with rich types
class DataToolset < DSPy::Tools::Toolset
toolset_name "data_processing"
class Format < T::Enum
enums do
JSON = new('json')
CSV = new('csv')
XML = new('xml')
end
end
tool :convert, description: "Convert data between formats"
tool :validate, description: "Validate data structure"
sig { params(data: String, from: Format, to: Format).returns(String) }
def convert(data:, from:, to:)
"Converted from #{from.serialize} to #{to.serialize}"
end
sig { params(data: String, format: Format).returns(T::Hash[String, T.any(String, Integer, T::Boolean)]) }
def validate(data:, format:)
{ valid: true, format: format.serialize, row_count: 42, message: "Data validation passed" }
end
end
DSPy.rb uses sophisticated type discrimination for complex data structures:
_type field injection — DSPy adds discriminator fields to structs for type safetyT.any() types automatically disambiguated by _type_type fields in structs_type fields filtered during deserialization at all nesting levelsImprove accuracy with real data:
# Install
gem 'dspy'
# Configure
DSPy.configure do |c|
c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY'])
end
# Define a task
class SentimentAnalysis < DSPy::Signature
description "Analyze sentiment of text"
input do
const :text, String
end
output do
const :sentiment, String # positive, negative, neutral
const :score, Float # 0.0 to 1.0
end
end
# Use it
analyzer = DSPy::Predict.new(SentimentAnalysis)
result = analyzer.call(text: "This product is amazing!")
puts result.sentiment # => "positive"
puts result.score # => 0.92
Two strategies for connecting to LLM providers:
# Gemfile
gem 'dspy'
gem 'dspy-openai' # OpenAI, OpenRouter, Ollama
gem 'dspy-anthropic' # Claude
gem 'dspy-gemini' # Gemini
Each adapter gem pulls in the official SDK (openai, anthropic, gemini-ai).
# Gemfile
gem 'dspy'
gem 'dspy-ruby_llm' # Routes to any provider via ruby_llm
gem 'ruby_llm'
RubyLLM handles provider routing based on the model name. Use the ruby_llm/ prefix:
DSPy.configure do |c|
c.lm = DSPy::LM.new('ruby_llm/gemini-2.5-flash', structured_outputs: true)
# c.lm = DSPy::LM.new('ruby_llm/claude-sonnet-4-20250514', structured_outputs: true)
# c.lm = DSPy::LM.new('ruby_llm/gpt-4o-mini', structured_outputs: true)
end
DSPy.rb ships with a structured event bus for observing runtime behavior.
class MyAgent < DSPy::Module
subscribe 'lm.tokens', :track_tokens, scope: :descendants
def track_tokens(_event, attrs)
@total_tokens += attrs.fetch(:total_tokens, 0)
end
end
subscription_id = DSPy.events.subscribe('score.create') do |event, attrs|
Langfuse.export_score(attrs)
end
# Wildcards supported
DSPy.events.subscribe('llm.*') { |name, attrs| puts "[#{name}] tokens=#{attrs[:total_tokens]}" }
Event names use dot-separated namespaces (llm.generate, react.iteration_complete). Every event includes module metadata (module_path, module_leaf, module_scope.ancestry_token) for filtering.
Rails-style lifecycle hooks ship with every DSPy::Module:
before — Runs ahead of forward for setup (metrics, context loading)
around — Wraps forward, calls yield, and lets you pair setup/teardown logic
after — Fires after forward returns for cleanup or persistence
class InstrumentedModule < DSPy::Module before :setup_metrics around :manage_context after :log_metrics
def forward(question:) @predictor.call(question: question) end
private
def setup_metrics @start_time = Time.now end
def manage_context load_context result = yield save_context result end
def log_metrics duration = Time.now - @start_time Rails.logger.info "Prediction completed in #{duration}s" end end
Execution order: before → around (before yield) → forward → around (after yield) → after. Callbacks are inherited from parent classes and execute in registration order.
Override the language model temporarily using fiber-local storage:
fast_model = DSPy::LM.new("openai/gpt-4o-mini", api_key: ENV['OPENAI_API_KEY'])
DSPy.with_lm(fast_model) do
result = classifier.call(text: "test") # Uses fast_model inside this block
end
# Back to global LM outside the block
LM resolution hierarchy : Instance-level LM → Fiber-local LM (DSPy.with_lm) → Global LM (DSPy.configure).
Use configure_predictor for fine-grained control over agent internals:
agent = DSPy::ReAct.new(MySignature, tools: tools)
agent.configure { |c| c.lm = default_model }
agent.configure_predictor('thought_generator') { |c| c.lm = powerful_model }
Systematically test LLM application performance with DSPy::Evals:
metric = DSPy::Metrics.exact_match(field: :answer, case_sensitive: false)
evaluator = DSPy::Evals.new(predictor, metric: metric)
result = evaluator.evaluate(test_examples, display_table: true)
puts "Pass Rate: #{(result.pass_rate * 100).round(1)}%"
Built-in metrics: exact_match, contains, numeric_difference, composite_and. Custom metrics return true/false or a DSPy::Prediction with score: and feedback: fields.
Use DSPy::Example for typed test data and export_scores: true to push results to Langfuse.
GEPA (Genetic-Pareto Reflective Prompt Evolution) uses reflection-driven instruction rewrites:
gem 'dspy-gepa'
teleprompter = DSPy::Teleprompt::GEPA.new(
metric: metric,
reflection_lm: DSPy::ReflectionLM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY']),
feedback_map: feedback_map,
config: { max_metric_calls: 600, minibatch_size: 6 }
)
result = teleprompter.compile(program, trainset: train, valset: val)
optimized_program = result.optimized_program
The metric must return DSPy::Prediction.new(score:, feedback:) so the reflection model can reason about failures. Use feedback_map to target individual predictors in composite modules.
Replace opaque string context blobs with T::Struct inputs. Each field gets its own description: annotation in the JSON schema the LLM sees:
class NavigationContext < T::Struct
const :workflow_hint, T.nilable(String),
description: "Current workflow phase guidance for the agent"
const :action_log, T::Array[String], default: [],
description: "Compact one-line-per-action history of research steps taken"
const :iterations_remaining, Integer,
description: "Budget remaining. Each tool call costs 1 iteration."
end
class ToolSelectionSignature < DSPy::Signature
input do
const :query, String
const :context, NavigationContext # Structured, not an opaque string
end
output do
const :tool_name, String
const :tool_args, String, description: "JSON-encoded arguments"
end
end
Benefits: type safety at compile time, per-field descriptions in the LLM schema, easy to test as value objects, extensible by adding const declarations.
Control how DSPy describes signature structure to the LLM:
structured_outputs: trueschema_format: :baml) — 84% token reduction for Enhanced Prompting mode. Requires sorbet-baml gem.schema_format: :toon, data_format: :toon) — Table-oriented format for both schemas and data. Enhanced Prompting mode only.BAML and TOON apply only when structured_outputs: false. With structured_outputs: true, the provider receives JSON Schema directly.
Persist and reload optimized programs with DSPy::Storage::ProgramStorage:
storage = DSPy::Storage::ProgramStorage.new(storage_path: "./dspy_storage")
storage.save_program(result.optimized_program, result, metadata: { optimizer: 'MIPROv2' })
Supports checkpoint management, optimization history tracking, and import/export between environments.
Organize DSPy components using Rails conventions:
app/
entities/ # T::Struct types shared across signatures
signatures/ # DSPy::Signature definitions
tools/ # DSPy::Tools::Base implementations
concerns/ # Shared tool behaviors (error handling, etc.)
modules/ # DSPy::Module orchestrators
services/ # Plain Ruby services that compose DSPy modules
config/
initializers/
dspy.rb # DSPy + provider configuration
feature_flags.rb # Model selection per role
spec/
signatures/ # Schema validation tests
tools/ # Tool unit tests
modules/ # Integration tests with VCR
vcr_cassettes/ # Recorded HTTP interactions
# config/initializers/dspy.rb
Rails.application.config.after_initialize do
next if Rails.env.test? && ENV["DSPY_ENABLE_IN_TEST"].blank?
RubyLLM.configure do |config|
config.gemini_api_key = ENV["GEMINI_API_KEY"] if ENV["GEMINI_API_KEY"].present?
config.anthropic_api_key = ENV["ANTHROPIC_API_KEY"] if ENV["ANTHROPIC_API_KEY"].present?
config.openai_api_key = ENV["OPENAI_API_KEY"] if ENV["OPENAI_API_KEY"].present?
end
model = ENV.fetch("DSPY_MODEL", "ruby_llm/gemini-2.5-flash")
DSPy.configure do |config|
config.lm = DSPy::LM.new(model, structured_outputs: true)
config.logger = Rails.logger
end
# Langfuse observability (optional)
if ENV["LANGFUSE_PUBLIC_KEY"].present? && ENV["LANGFUSE_SECRET_KEY"].present?
DSPy::Observability.configure!
end
end
Use different models for different roles (fast/cheap for classification, powerful for synthesis):
# config/initializers/feature_flags.rb
module FeatureFlags
SELECTOR_MODEL = ENV.fetch("DSPY_SELECTOR_MODEL", "ruby_llm/gemini-2.5-flash-lite")
SYNTHESIZER_MODEL = ENV.fetch("DSPY_SYNTHESIZER_MODEL", "ruby_llm/gemini-2.5-flash")
end
Then override per-tool or per-predictor:
class ClassifyTool < DSPy::Tools::Base
def call(query:)
predictor = DSPy::Predict.new(ClassifyQuery)
predictor.configure { |c| c.lm = DSPy::LM.new(FeatureFlags::SELECTOR_MODEL, structured_outputs: true) }
predictor.call(query: query)
end
end
Prefer typed schemas over string descriptions. Let the type system communicate structure to the LLM rather than prose in the signature description.
Define reusable T::Struct and T::Enum types in app/entities/ and reference them across signatures:
# app/entities/search_strategy.rb
class SearchStrategy < T::Enum
enums do
SingleSearch = new("single_search")
DateDecomposition = new("date_decomposition")
end
end
# app/entities/scored_item.rb
class ScoredItem < T::Struct
const :id, String
const :score, Float, description: "Relevance score 0.0-1.0"
const :verdict, String, description: "relevant, maybe, or irrelevant"
const :reason, String, default: ""
end
Use schemas (T::Struct/T::Enum) for:
Use string descriptions for:
Stringdescription: "YYYY-MM-DD format")Rule of thumb : If you'd write a case statement on the output, it should be a T::Enum. If you'd call .each on it, it should be T::Array[SomeStruct].
A common pattern: tools encapsulate a DSPy prediction, adding error handling, model selection, and serialization:
class RerankTool < DSPy::Tools::Base
tool_name "rerank"
tool_description "Score and rank search results by relevance"
MAX_ITEMS = 200
MIN_ITEMS_FOR_LLM = 5
sig { params(query: String, items: T::Array[T::Hash[Symbol, T.untyped]]).returns(T::Hash[Symbol, T.untyped]) }
def call(query:, items: [])
return { scored_items: items, reranked: false } if items.size < MIN_ITEMS_FOR_LLM
capped_items = items.first(MAX_ITEMS)
predictor = DSPy::Predict.new(RerankSignature)
predictor.configure { |c| c.lm = DSPy::LM.new(FeatureFlags::SYNTHESIZER_MODEL, structured_outputs: true) }
result = predictor.call(query: query, items: capped_items)
{ scored_items: result.scored_items, reranked: true }
rescue => e
Rails.logger.warn "[RerankTool] LLM rerank failed: #{e.message}"
{ error: "Rerank failed: #{e.message}", scored_items: items, reranked: false }
end
end
Key patterns:
configuremodule ErrorHandling
extend ActiveSupport::Concern
private
def safe_predict(signature_class, **inputs)
predictor = DSPy::Predict.new(signature_class)
yield predictor if block_given?
predictor.call(**inputs)
rescue Faraday::Error, Net::HTTPError => e
Rails.logger.error "[#{self.class.name}] API error: #{e.message}"
nil
rescue JSON::ParserError => e
Rails.logger.error "[#{self.class.name}] Invalid LLM output: #{e.message}"
nil
end
end
Wrap operations in spans for Langfuse/OpenTelemetry visibility:
result = DSPy::Context.with_span(
operation: "tool_selector.select",
"dspy.module" => "ToolSelector",
"tool_selector.tools" => tool_names.join(",")
) do
@predictor.call(query: query, context: context, available_tools: schemas)
end
# Gemfile
gem 'dspy-o11y'
gem 'dspy-o11y-langfuse'
# .env
LANGFUSE_PUBLIC_KEY=pk-...
LANGFUSE_SECRET_KEY=sk-...
DSPY_TELEMETRY_BATCH_SIZE=5
Every DSPy::Predict, DSPy::ReAct, and tool call is automatically traced when observability is configured.
Report evaluation scores to Langfuse:
DSPy.score(name: "relevance", value: 0.85, trace_id: current_trace_id)
VCR.configure do |config|
config.cassette_library_dir = "spec/vcr_cassettes"
config.hook_into :webmock
config.configure_rspec_metadata!
config.filter_sensitive_data('<GEMINI_API_KEY>') { ENV['GEMINI_API_KEY'] }
config.filter_sensitive_data('<OPENAI_API_KEY>') { ENV['OPENAI_API_KEY'] }
end
Test that signatures produce valid schemas without calling any LLM:
RSpec.describe ClassifyResearchQuery do
it "has required input fields" do
schema = described_class.input_json_schema
expect(schema[:required]).to include("query")
end
it "has typed output fields" do
schema = described_class.output_json_schema
expect(schema[:properties]).to have_key(:search_strategy)
end
end
RSpec.describe RerankTool do
let(:tool) { described_class.new }
it "skips LLM for small result sets" do
expect(DSPy::Predict).not_to receive(:new)
result = tool.call(query: "test", items: [{ id: "1" }])
expect(result[:reranked]).to be false
end
it "calls LLM for large result sets", :vcr do
items = 10.times.map { |i| { id: i.to_s, title: "Item #{i}" } }
result = tool.call(query: "relevant items", items: items)
expect(result[:reranked]).to be true
end
end
When helping users with DSPy.rb:
T::Struct and T::Enum types, not string descriptionsapp/entities/ — Extract shared types so signatures stay thinpredictor.configure { |c| c.lm = ... } to pick the right model per taskinput_json_schema and output_json_schema in unit testsDSPy::Context.with_span for observabilityKeep description concise — The signature description should state the goal, not the field details:
# Good — concise goal
class ParseOutline < DSPy::Signature
description 'Extract block-level structure from HTML as a flat list of skeleton sections.'
input do
const :html, String, description: 'Raw HTML to parse'
end
output do
const :sections, T::Array[Section], description: 'Block elements: headings, paragraphs, code blocks, lists'
end
end
Use defaults over nilable arrays — For OpenAI structured outputs compatibility:
# Good — works with OpenAI structured outputs
class ASTNode < T::Struct
const :children, T::Array[ASTNode], default: []
end
$defsDSPy.rb supports recursive types in structured outputs using JSON Schema $defs:
class TreeNode < T::Struct
const :value, String
const :children, T::Array[TreeNode], default: [] # Self-reference
end
The schema generator automatically creates #/$defs/TreeNode references for recursive types, compatible with OpenAI and Gemini structured outputs.
DSPy.rb extends T::Struct to support field-level description: kwargs that flow to JSON Schema:
class ASTNode < T::Struct
const :node_type, NodeType, description: 'The type of node (heading, paragraph, etc.)'
const :text, String, default: "", description: 'Text content of the node'
const :level, Integer, default: 0 # No description — field is self-explanatory
const :children, T::Array[ASTNode], default: []
end
When to use field descriptions : complex field semantics, enum-like strings, constrained values, nested structs with ambiguous names. When to skip : self-explanatory fields like name, id, url, or boolean flags.
Current: 0.34.3
Weekly Installs
206
Repository
GitHub Stars
10.9K
First Seen
Jan 21, 2026
Security Audits
Gen Agent Trust HubFailSocketPassSnykWarn
Installed on
gemini-cli178
opencode177
codex176
claude-code175
cursor171
github-copilot164
超能力技能使用指南:AI助手技能调用优先级与工作流程详解
45,100 周安装
微信自动化监控与消息处理指南:Windows wxauto 与 macOS Accessibility API 实战
204 周安装
剪映AI视频生成自动化脚本 - 基于Playwright的Seedance 2.0模型文生图/图生视频工具
204 周安装
社交媒体内容策略指南:LinkedIn、Twitter、Instagram、TikTok、Facebook平台优化与模板
204 周安装
Python statsmodels 统计建模与计量经济学分析库 - 回归、时间序列、假设检验
204 周安装
Azure PostgreSQL 无密码身份验证配置指南:Entra ID 迁移与访问管理
34,800 周安装
阅读清单管理器:基于Zettelkasten和间隔重复的智能阅读追踪与知识管理工具