axiom-foundation-models-diag by charleswiltgen/axiom
npx skills add https://github.com/charleswiltgen/axiom --skill axiom-foundation-models-diag
Foundation Models issues manifest as context window exceeded errors, guardrail violations, slow generation, availability failures, and unexpected output.
Core principle: 80% of Foundation Models problems stem from misunderstanding model capabilities (a 3B-parameter, device-scale model without world knowledge), context limits (4096 tokens), or availability requirements—not framework bugs.
If you see ANY of these, suspect a Foundation Models misunderstanding, not framework breakage:
exceededContextWindowSize
guardrailViolation
unsupportedLanguageOrLocale
Critical distinction: Foundation Models is a device-scale model (3B parameters) optimized for summarization, extraction, and classification—NOT world knowledge or complex reasoning. Using it for the wrong task guarantees poor results.
ALWAYS run these FIRST (before changing code):
// 1. Check availability
let availability = SystemLanguageModel.default.availability
switch availability {
case .available:
print("✅ Available")
case .unavailable(let reason):
print("❌ Unavailable: \(reason)")
// Possible reasons:
// - Device not Apple Intelligence-capable
// - Region not supported
// - User not opted in
}
// Record: "Available? Yes/no, reason if not"
// 2. Check supported languages
let supported = SystemLanguageModel.default.supportedLanguages
print("Supported languages: \(supported)")
print("Current locale: \(Locale.current.language)")
if !supported.contains(Locale.current.language) {
print("⚠️ Current language not supported!")
}
// Record: "Language supported? Yes/no"
// 3. Check context usage
let session = LanguageModelSession()
// After some interactions:
print("Transcript entries: \(session.transcript.entries.count)")
// Rough estimation (not exact):
let transcriptText = session.transcript.entries
.map { $0.content }
.joined()
print("Approximate chars: \(transcriptText.count)")
print("Rough token estimate: \(transcriptText.count / 3)")
// 4096 token limit ≈ 12,000 characters
// Record: "Approaching context limit? Yes/no"
// 4. Profile with Instruments
// Run with Foundation Models Instrument template
// Check:
// - Initial model load time
// - Token counts (input/output)
// - Generation time per request
// - Areas for optimization
// Record: "Latency profile: [numbers from Instruments]"
// 5. Inspect transcript for debugging
print("Full transcript:")
for entry in session.transcript.entries {
print("Entry: \(entry.content.prefix(100))...")
}
// Record: "Any unusual entries? Repeated content?"
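Check 3's character-count heuristic can be factored into a small helper for reuse across sessions (a sketch in plain Swift; the 3-characters-per-token ratio and the 80% warning threshold are assumptions, not framework guarantees):

```swift
import Foundation

/// Rough token estimate from transcript text (~3 characters per token).
/// Heuristic for early warning only; the framework does not expose exact counts.
func approxTokenCount(for text: String) -> Int {
    text.count / 3
}

/// True once the estimate crosses `threshold` of the 4096-token window.
func isNearContextLimit(_ text: String,
                        limit: Int = 4096,
                        threshold: Double = 0.8) -> Bool {
    Double(approxTokenCount(for: text)) > Double(limit) * threshold
}
```

Call `isNearContextLimit(transcriptText)` after joining the transcript entries (as in check 3) to decide whether to condense proactively instead of waiting for exceededContextWindowSize.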
Before changing ANY code, identify ONE of these:
availability = .unavailable → Device/region/opt-in issue (not code bug)
exceededContextWindowSize → Too many tokens (condense transcript)
guardrailViolation → Content policy triggered (not model failure)
unsupportedLanguageOrLocale → Language not supported (check supported list)
Add verbose logging around each respond() call
Foundation Models problem?
│
├─ Won't start?
│ ├─ .unavailable → Availability issue
│ │ ├─ Device not capable? → Pattern 1a (device requirement)
│ │ ├─ Region restriction? → Pattern 1b (regional availability)
│ │ └─ User not opted in? → Pattern 1c (Settings check)
│ │
├─ Generation fails?
│ ├─ exceededContextWindowSize → Context limit
│ │ └─ Long conversation or verbose prompts? → Pattern 2a (condense)
│ │
│ ├─ guardrailViolation → Content policy
│ │ └─ Sensitive or inappropriate content? → Pattern 2b (handle gracefully)
│ │
│ ├─ unsupportedLanguageOrLocale → Language issue
│ │ └─ Non-English or unsupported language? → Pattern 2c (language check)
│ │
│ └─ Other error → General error handling
│ └─ Unknown error type? → Pattern 2d (catch-all)
│
├─ Output wrong?
│ ├─ Hallucinated facts → Wrong model use
│ │ └─ Asking for world knowledge? → Pattern 3a (use case mismatch)
│ │
│ ├─ Wrong structure → Parsing issue
│ │ └─ Manual JSON parsing? → Pattern 3b (use @Generable)
│ │
│ ├─ Missing data → Tool needed
│ │ └─ Need external information? → Pattern 3c (tool calling)
│ │
│ └─ Inconsistent output → Sampling issue
│ └─ Different results each time? → Pattern 3d (temperature/greedy)
│
├─ Too slow?
│ ├─ Initial delay (1-2s) → Model loading
│ │ └─ First request slow? → Pattern 4a (prewarm)
│ │
│ ├─ Long wait for results → Not streaming
│ │ └─ User waits 3-5s? → Pattern 4b (streaming)
│ │
│ ├─ Verbose schema → Token overhead
│ │ └─ Large @Generable type? → Pattern 4c (includeSchemaInPrompt)
│ │
│ └─ Complex prompt → Too much processing
│ └─ Massive prompt or task? → Pattern 4d (break down)
│
└─ UI frozen?
└─ Main thread blocked → Async issue
└─ App unresponsive during generation? → Pattern 5a (Task {})
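The error branch of this tree can be collapsed into a small lookup so triage code stays in sync with the patterns (a framework-free sketch; the string keys mirror the GenerationError case names above, and the pattern labels are the ones used below):

```swift
/// Map a generation error name to the diagnostic pattern to consult.
/// Keyed by string so the table stays testable without FoundationModels.
let errorPatterns: [String: String] = [
    "exceededContextWindowSize": "2a (condense transcript)",
    "guardrailViolation": "2b (handle gracefully)",
    "unsupportedLanguageOrLocale": "2c (language check)",
]

func pattern(forErrorNamed name: String) -> String {
    errorPatterns[name] ?? "2d (catch-all handling)"
}
```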
Symptom:
SystemLanguageModel.default.availability = .unavailable
Diagnosis:
let availability = SystemLanguageModel.default.availability
switch availability {
case .available:
print("✅ Available")
case .unavailable(let reason):
print("❌ Reason: \(reason)")
// Check if device-related
}
Fix :
// ❌ BAD - No availability UI
let session = LanguageModelSession() // Crashes on unsupported devices
// ✅ GOOD - Graceful UI
struct AIFeatureView: View {
@State private var availability = SystemLanguageModel.default.availability
var body: some View {
switch availability {
case .available:
AIContentView()
case .unavailable:
VStack {
Image(systemName: "cpu")
Text("AI features require Apple Intelligence")
.font(.headline)
Text("Available on iPhone 15 Pro and later")
.font(.caption)
.foregroundColor(.secondary)
}
}
}
}
Time cost : 5-10 minutes to add UI
Symptom :
Diagnosis : Foundation Models requires:
Fix :
// ✅ GOOD - Clear messaging
switch SystemLanguageModel.default.availability {
case .available:
// proceed
case .unavailable(let reason):
// Show region-specific message
Text("AI features not yet available in your region")
Text("Check Settings → Apple Intelligence for availability")
}
Time cost : 5 minutes
Symptom :
Diagnosis : User must opt in to Apple Intelligence in Settings
Fix :
// ✅ GOOD - Direct user to settings
switch SystemLanguageModel.default.availability {
case .available:
// proceed
case .unavailable:
VStack {
Text("Enable Apple Intelligence")
Text("Settings → Apple Intelligence → Enable")
Button("Open Settings") {
if let url = URL(string: UIApplication.openSettingsURLString) {
UIApplication.shared.open(url)
}
}
}
}
Time cost : 10 minutes
Symptom :
Error: LanguageModelSession.GenerationError.exceededContextWindowSize
Diagnosis :
Fix :
// ❌ BAD - Unhandled error
let response = try await session.respond(to: prompt)
// Crashes after ~10-15 turns
// ✅ GOOD - Condense transcript
var session = LanguageModelSession()
do {
let response = try await session.respond(to: prompt)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
// Condense and continue
session = condensedSession(from: session)
let response = try await session.respond(to: prompt)
}
func condensedSession(from previous: LanguageModelSession) -> LanguageModelSession {
let entries = previous.transcript.entries
guard entries.count > 2 else {
return LanguageModelSession(transcript: previous.transcript)
}
// Keep: first (instructions) + last (recent context)
let condensed = [entries.first!, entries.last!]
let transcript = Transcript(entries: condensed)
return LanguageModelSession(transcript: transcript)
}
Time cost : 15-20 minutes to implement condensing
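The condensedSession policy above keeps only the first and last entries; a slightly more general variant keeps the instructions plus the N most recent turns (a sketch over plain arrays so it can be tested without the framework; `keepRecent` is an illustrative parameter, not a framework API):

```swift
/// Keep the first element (instructions) plus the last `keepRecent` elements.
/// Generic over Entry so the policy is unit-testable without the framework.
func condensed<Entry>(_ entries: [Entry], keepRecent: Int = 3) -> [Entry] {
    guard entries.count > keepRecent + 1, let first = entries.first else {
        return entries  // already small enough; keep as-is
    }
    return [first] + entries.suffix(keepRecent)
}
```

condensedSession could then build its Transcript from `condensed(previous.transcript.entries)` instead of hard-coding first + last.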
Symptom :
Error: LanguageModelSession.GenerationError.guardrailViolation
Diagnosis :
Fix :
// ✅ GOOD - Graceful handling
do {
let response = try await session.respond(to: userInput)
print(response.content)
} catch LanguageModelSession.GenerationError.guardrailViolation {
// Show user-friendly message
print("I can't help with that request")
// Log for review (but don't show user input to avoid storing harmful content)
}
Time cost : 5-10 minutes
Symptom :
Error: LanguageModelSession.GenerationError.unsupportedLanguageOrLocale
Diagnosis: User input is in a language the model doesn't support
Fix :
// ❌ BAD - No language check
let response = try await session.respond(to: userInput)
// Crashes if unsupported language
// ✅ GOOD - Check first
let supported = SystemLanguageModel.default.supportedLanguages
guard supported.contains(Locale.current.language) else {
// Show disclaimer
print("Language not supported. Currently supports: \(supported)")
return
}
// Also handle errors
do {
let response = try await session.respond(to: userInput)
} catch LanguageModelSession.GenerationError.unsupportedLanguageOrLocale {
print("Please use English or another supported language")
}
Time cost : 10 minutes
Symptom : Unknown error types
Fix :
// ✅ GOOD - Comprehensive error handling
do {
let response = try await session.respond(to: prompt)
print(response.content)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
// Handle context overflow
session = condensedSession(from: session)
} catch LanguageModelSession.GenerationError.guardrailViolation {
// Handle content policy
showMessage("Cannot generate that content")
} catch LanguageModelSession.GenerationError.unsupportedLanguageOrLocale {
// Handle language issue
showMessage("Language not supported")
} catch {
// Catch-all for unexpected errors
print("Unexpected error: \(error)")
showMessage("Something went wrong. Please try again.")
}
Time cost : 10-15 minutes
Symptom :
Diagnosis : Using model for world knowledge (wrong use case)
Fix :
// ❌ BAD - Wrong use case
let prompt = "Who is the president of France?"
let response = try await session.respond(to: prompt)
// Will hallucinate or give outdated info
// ✅ GOOD - Use server LLM for world knowledge
// Foundation Models is for:
// - Summarization
// - Extraction
// - Classification
// - Content generation
// OR: Use Tool calling with external data source
struct GetFactTool: Tool {
let name = "getFact"
let description = "Fetch factual information from verified source"
@Generable
struct Arguments {
let query: String
}
func call(arguments: Arguments) async throws -> ToolOutput {
// Fetch from Wikipedia API, news API, etc.
let fact = await fetchFactFromAPI(arguments.query)
return ToolOutput(fact)
}
}
Time cost : 20-30 minutes to implement tool OR switch to appropriate AI
Symptom :
Diagnosis : Manual JSON parsing instead of @Generable
Fix :
// ❌ BAD - Manual parsing
let prompt = "Generate person as JSON"
let response = try await session.respond(to: prompt)
let data = response.content.data(using: .utf8)!
let person = try JSONDecoder().decode(Person.self, from: data) // CRASHES
// ✅ GOOD - @Generable
@Generable
struct Person {
let name: String
let age: Int
}
let response = try await session.respond(
to: "Generate a person",
generating: Person.self
)
// response.content is type-safe Person, guaranteed structure
Time cost : 10 minutes to convert to @Generable
Symptom :
Diagnosis : Need external data (weather, locations, contacts)
Fix :
// ❌ BAD - No external data
let response = try await session.respond(
to: "What's the weather in Tokyo?"
)
// Will make up weather data
// ✅ GOOD - Tool calling
import WeatherKit
struct GetWeatherTool: Tool {
let name = "getWeather"
let description = "Get current weather for a city"
@Generable
struct Arguments {
let city: String
}
func call(arguments: Arguments) async throws -> ToolOutput {
// Fetch real weather
let weather = await WeatherService.shared.weather(for: arguments.city)
return ToolOutput("Temperature: \(weather.temperature)°F")
}
}
let session = LanguageModelSession(tools: [GetWeatherTool()])
let response = try await session.respond(to: "What's the weather in Tokyo?")
// Uses real weather data
Time cost : 20-30 minutes to implement tool
Symptom :
Diagnosis : Random sampling (default behavior)
Fix :
// Default: Random sampling
let response1 = try await session.respond(to: "Write a haiku")
let response2 = try await session.respond(to: "Write a haiku")
// Different every time
// ✅ For deterministic output (testing/demos)
let response = try await session.respond(
to: "Write a haiku",
options: GenerationOptions(sampling: .greedy)
)
// Same output for same prompt (given same model version)
// ✅ For low variance
let response = try await session.respond(
to: "Classify this article",
options: GenerationOptions(temperature: 0.5)
)
// Slightly varied but focused
// ✅ For high creativity
let response = try await session.respond(
to: "Write a creative story",
options: GenerationOptions(temperature: 2.0)
)
// Very diverse output
Time cost : 2-5 minutes
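The sampling choices above can be centralized in one place so they don't drift across call sites (a sketch; the 0.5 and 2.0 values mirror the examples, while treating 1.0 as the middle default is an assumption, and the task categories are illustrative):

```swift
/// Task categories used in this section (illustrative, not an API).
enum GenerationTask {
    case classification   // low variance
    case generalText      // assumed middle ground
    case creativeWriting  // high diversity
}

/// Temperature per task, mirroring the examples above.
func temperature(for task: GenerationTask) -> Double {
    switch task {
    case .classification:  return 0.5
    case .generalText:     return 1.0
    case .creativeWriting: return 2.0
    }
}
```

A call site would then pass `GenerationOptions(temperature: temperature(for: .classification))` rather than a bare literal.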
Symptom :
Diagnosis : Model loading time
Fix :
// ❌ BAD - Load on user interaction
Button("Generate") {
Task {
let session = LanguageModelSession() // 1-2s delay here
let response = try await session.respond(to: prompt)
}
}
// ✅ GOOD - Prewarm on init
class ViewModel: ObservableObject {
private var session: LanguageModelSession?
init() {
// Prewarm before user interaction
Task {
self.session = LanguageModelSession(instructions: "...")
}
}
func generate(prompt: String) async throws -> String {
guard let session = session else {
// Fallback if not ready
self.session = LanguageModelSession()
return try await self.session!.respond(to: prompt).content
}
return try await session.respond(to: prompt).content
}
}
Time cost: 10 minutes. Latency saved: 1-2 seconds on first request.
Symptom :
Diagnosis : Not streaming long generations
Fix :
// ❌ BAD - No streaming
let response = try await session.respond(
to: "Generate 5-day itinerary",
generating: Itinerary.self
)
// User waits 4 seconds seeing nothing
// ✅ GOOD - Streaming
@Generable
struct Itinerary {
var destination: String
var days: [DayPlan]
}
let stream = session.streamResponse(
to: "Generate 5-day itinerary to Tokyo",
generating: Itinerary.self
)
for try await partial in stream {
// Update UI incrementally
self.itinerary = partial
}
// User sees destination in 0.5s, then days progressively
Time cost: 15-20 minutes. Perceived latency: 0.5s vs 4s.
Symptom :
Diagnosis : Schema re-inserted into prompt every time
Fix :
// First request - schema inserted automatically
let first = try await session.respond(
to: "Generate first person",
generating: Person.self
)
// ✅ Subsequent requests - skip schema insertion
let second = try await session.respond(
to: "Generate another person",
generating: Person.self,
options: GenerationOptions(includeSchemaInPrompt: false)
)
Time cost: 2 minutes. Latency saved: 10-20% per request.
Symptom :
Diagnosis : Prompt too complex for single generation
Fix :
// ❌ BAD - One massive prompt
let prompt = """
Generate complete 7-day itinerary with hotels, restaurants,
activities, transportation, budget, tips, and local customs
"""
// 5-8 seconds, poor quality
// ✅ GOOD - Break into steps
let overview = try await session.respond(
to: "Generate high-level 7-day plan for Tokyo"
)
var dayDetails: [DayPlan] = []
for day in 1...7 {
let detail = try await session.respond(
to: "Detail activities and restaurants for day \(day) in Tokyo",
generating: DayPlan.self
)
dayDetails.append(detail.content)
}
// Total time similar, but better quality and progressive results
Time cost: 20-30 minutes. Quality improvement: significantly better.
Symptom :
Diagnosis : Calling respond() on main thread synchronously
Fix :
// ❌ BAD - Blocking main thread
Button("Generate") {
let response = try await session.respond(to: prompt)
// UI frozen for 2-5 seconds!
}
// ✅ GOOD - Async task
Button("Generate") {
Task {
do {
let response = try await session.respond(to: prompt)
// Update UI on main thread
await MainActor.run {
self.result = response.content
}
} catch {
print("Error: \(error)")
}
}
}
Time cost: 5 minutes. UX improvement: massive (no frozen UI).
Situation : You just launched an AI-powered feature using Foundation Models. Within 2 hours:
Pressure Signals :
DO NOT fall into these traps:
"Disable the feature"
"Roll back to previous version"
"It works for me"
"Switch to ChatGPT API"
// Check error distribution
// What percentage seeing what error?
// Run this on test devices:
let availability = SystemLanguageModel.default.availability
switch availability {
case .available:
print("✅ Available")
case .unavailable(let reason):
print("❌ Unavailable: \(reason)")
}
// Hypothesis:
// - If 20% unavailable → Availability issue (device/region/opt-in)
// - If 20% getting errors → Code bug
// - If 20% seeing wrong results → Use case mismatch
Results: You discover that 20% of users have devices without Apple Intelligence support.
// Check which devices affected
// iPhone 15 Pro+ = ✅ Available
// iPhone 15 = ❌ Unavailable
// iPhone 14 = ❌ Unavailable
// Conclusion: Availability issue, not code bug
Root cause : Feature assumes all users have Apple Intelligence. 20% don't.
Verify:
// ✅ Add availability check + graceful UI
struct AIFeatureView: View {
@State private var availability = SystemLanguageModel.default.availability
var body: some View {
switch availability {
case .available:
// Show AI feature
AIContentView()
case .unavailable:
// Graceful fallback
VStack {
Image(systemName: "sparkles")
.font(.largeTitle)
.foregroundColor(.secondary)
Text("AI-Powered Features")
.font(.headline)
Text("Available on iPhone 15 Pro and later")
.font(.subheadline)
.foregroundColor(.secondary)
.multilineTextAlignment(.center)
// Offer alternative
Button("Use Standard Mode") {
// Show non-AI fallback
}
}
}
}
}
Test on multiple devices (15 min)
Submit hotfix build (5 min)
To VP of Product (immediate) :
Root cause identified:
The AI feature requires Apple Intelligence (iPhone 15 Pro+).
20% of our users have older devices. We didn't check availability.
Fix: Added availability check with graceful fallback UI.
Timeline:
- Hotfix ready: Now
- TestFlight: 10 minutes
- App Store submission: 30 minutes
- Review: 24-48 hours (requesting expedited)
Impact mitigation:
- 80% of users see working AI feature
- 20% see clear message + standard mode fallback
- No functionality lost, just graceful degradation
To Engineering Team :
Post-mortem items:
1. Add availability check to launch checklist
2. Test on non-Apple-Intelligence devices
3. Document device requirements clearly
4. Add analytics for availability status
| Symptom | Cause | Check | Pattern | Time |
|---|---|---|---|---|
| Won't start | .unavailable | SystemLanguageModel.default.availability | 1a | 5 min |
| Region issue | Not supported region | Check supported regions | 1b | 5 min |
| Not opted in | Apple Intelligence disabled | Settings check | 1c | 10 min |
| Context exceeded | >4096 tokens | Transcript length | 2a | 15 min |
| Guardrail error | Content policy | User input type | 2b | 10 min |
| Language error | Unsupported language | supportedLanguages | 2c | 10 min |
| Hallucinated output | Wrong use case | Task type check | 3a | 20 min |
| Wrong structure | No @Generable | Manual parsing? | 3b | 10 min |
| Missing data | No tool | External data needed? | 3c | 30 min |
| Inconsistent | Random sampling | Need deterministic? | 3d | 5 min |
| Initial delay | Model loading | First request slow? | 4a | 10 min |
| Long wait | No streaming | >1s generation? | 4b | 20 min |
| Schema overhead | Re-inserting schema | Subsequent requests? | 4c | 2 min |
| Complex prompt | Too much at once | >5s generation? | 4d | 30 min |
| UI frozen | Main thread | Thread check | 5a | 5 min |
Related Axiom Skills:
axiom-foundation-models — Discipline skill for anti-patterns, proper usage patterns, and pressure scenarios
axiom-foundation-models-ref — Complete API reference with all WWDC 2025 code examples
Apple Resources:
Last Updated: 2025-12-03. Version: 1.0.0. Skill Type: Diagnostic.
Weekly Installs: 100
GitHub Stars: 606
First Seen: Jan 21, 2026
Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass
Installed on: opencode (85), codex (79), claude-code (79), gemini-cli (78), cursor (77), github-copilot (73)