Apple设备端AI开发指南：Foundation Models使用技巧与常见错误规避

axiom-foundation-models by charleswiltgen/axiom

171 周安装量

767 GitHub Stars

GitHub

安装命令

npx skills add https://github.com/charleswiltgen/axiom --skill axiom-foundation-models

AI/机器学习 iOS Swift

🇨🇳中文介绍

Foundation Models — 适用于 Apple 平台的设备端 AI

何时使用此技能

在以下情况下使用：

使用 Foundation Models 实现设备端 AI 功能时
添加文本摘要、分类或提取功能时
从 LLM 响应创建结构化输出时
为外部数据集成构建工具调用模式时
流式传输生成内容以改善用户体验时
调试 Foundation Models 问题（上下文溢出、生成缓慢、输出错误）时
在 Foundation Models 与服务器端 LLM（ChatGPT、Claude 等）之间做决策时

危险信号 —— 会导致失败的错误模式

❌ 用于世界知识

失败原因：设备端模型只有 30 亿参数，专为摘要、提取、分类优化 —— 不适用于世界知识或复杂推理。

错误使用示例：

// ❌ 错误 - 询问世界知识
let session = LanguageModelSession()
let response = try await session.respond(to: "What's the capital of France?")

原因：模型会产生幻觉或给出低质量答案。它训练用于内容生成，而非百科全书式知识。

：使用服务器端 LLM（ChatGPT、Claude）处理世界知识，或通过工具调用提供事实数据。

广告位招租

在这里展示您的产品或服务

触达数万 AI 开发者，精准高效

联系我们

❌ 手动解析 JSON

失败原因：提示生成 JSON 并使用 JSONDecoder 解析会导致幻觉键、无效 JSON、缺乏类型安全。

错误使用示例：

// ❌ 错误 - 手动 JSON 解析
let prompt = "Generate a person with name and age as JSON"
let response = try await session.respond(to: prompt)
let data = response.content.data(using: .utf8)!
let person = try JSONDecoder().decode(Person.self, from: data) // 崩溃！

原因：模型可能输出 {firstName: "John"}，而你期望的是 {name: "John"}。或者输出完全无效的 JSON。

// ✅ 正确 - @Generable 保证结构
@Generable
struct Person {
    let name: String
    let age: Int
}

let response = try await session.respond(
    to: "Generate a person",
    generating: Person.self
)
// response.content 是类型安全的 Person 实例

❌ 忽略可用性检查

失败原因：Foundation Models 仅在支持区域的 Apple Intelligence 设备上运行。不进行检查会导致应用崩溃或显示错误。

错误使用示例：

// ❌ 错误 - 没有可用性检查
let session = LanguageModelSession() // 可能会失败！

// ✅ 正确 - 先检查
switch SystemLanguageModel.default.availability {
case .available:
    let session = LanguageModelSession()
    // 继续执行
case .unavailable(let reason):
    // 显示优雅的 UI："AI 功能需要 Apple Intelligence"
}

❌ 单个巨大提示

失败原因：上下文窗口为 4096 个令牌（输入 + 输出）。一个巨大的提示会触及限制，导致结果不佳。

错误使用示例：

// ❌ 错误 - 所有内容放在一个提示中
let prompt = """
    Generate a 7-day itinerary for Tokyo including hotels, restaurants,
    activities for each day, transportation details, budget breakdown...
    """
// 超出上下文，质量差

正确方法：分解为更小的任务，使用工具获取外部数据，进行多轮对话。

❌ 不处理生成错误

失败原因：必须处理三种错误，否则你的应用在生产环境中会崩溃。

do {
    let response = try await session.respond(to: prompt)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    // 多轮对话记录增长超过 4096 个令牌
    // → 压缩记录并创建新会话（见模式 5）
} catch LanguageModelSession.GenerationError.guardrailViolation {
    // 内容策略被触发
    // → 显示优雅消息："我无法处理该请求"
} catch LanguageModelSession.GenerationError.unsupportedLanguageOrLocale {
    // 用户输入了不支持的语言
    // → 显示免责声明，检查 SystemLanguageModel.default.supportedLanguages
}

强制性的第一步

在编写任何 Foundation Models 代码之前，完成以下步骤：

查看上方危险信号中的"忽略可用性检查"以了解所需模式。Foundation Models 需要启用 Apple Intelligence 的设备、支持的区域以及用户选择加入。

问自己：我的主要目标是什么？

用例	Foundation Models？	替代方案
摘要	✅ 是
提取（从文本中提取关键信息）	✅ 是
分类（对内容进行分类）	✅ 是
内容标记	✅ 是（内置适配器！）
世界知识	❌ 否	ChatGPT、Claude、Gemini
复杂推理	❌ 否	服务器端 LLM
数学计算	❌ 否	计算器、符号数学

关键：如果你的用例需要世界知识或高级推理，停止。Foundation Models 是错误的工具。

3. 设计 @Generable 模式

如果你需要结构化输出（不仅仅是纯文本）：

错误方法：提示生成"JSON"并手动解析 正确方法：定义 @Generable 类型

@Generable
struct SearchSuggestions {
    @Guide(description: "Suggested search terms", .count(4))
    var searchTerms: [String]
}

原因：约束解码保证结构。没有解析错误，没有幻觉键。

4. 考虑使用工具获取外部数据

如果你的功能需要外部信息：

天气 → WeatherKit 工具
位置 → MapKit 工具
联系人 → Contacts API 工具
日历 → EventKit 工具

不要尝试从模型获取此信息（它会幻觉）。要定义 Tool 协议实现。

5. 为长生成规划流式传输

如果生成时间 >1 秒，使用流式传输：

let stream = session.streamResponse(
    to: prompt,
    generating: Itinerary.self
)

for try await partial in stream {
    // 增量更新 UI
    self.itinerary = partial
}

原因：用户立即看到进度，感知延迟显著降低。

Need on-device AI?
│
├─ World knowledge/reasoning?
│  └─ ❌ NOT Foundation Models
│     → Use ChatGPT, Claude, Gemini, etc.
│     → Reason: 3B parameter model, not trained for encyclopedic knowledge
│
├─ Summarization?
│  └─ ✅ YES → Pattern 1 (Basic Session)
│     → Example: Summarize article, condense email
│     → Time: 10-15 minutes
│
├─ Structured extraction?
│  └─ ✅ YES → Pattern 2 (@Generable)
│     → Example: Extract name, date, amount from invoice
│     → Time: 15-20 minutes
│
├─ Content tagging?
│  └─ ✅ YES → Pattern 3 (contentTagging use case)
│     → Example: Tag article topics, extract entities
│     → Time: 10 minutes
│
├─ Need external data?
│  └─ ✅ YES → Pattern 4 (Tool calling)
│     → Example: Fetch weather, query contacts, get locations
│     → Time: 20-30 minutes
│
├─ Long generation?
│  └─ ✅ YES → Pattern 5 (Streaming)
│     → Example: Generate itinerary, create story
│     → Time: 15-20 minutes
│
└─ Dynamic schemas (runtime-defined structure)?
   └─ ✅ YES → Pattern 6 (DynamicGenerationSchema)
      → Example: Level creator, user-defined forms
      → Time: 30-40 minutes

模式 1：基础会话

使用时机：简单的文本生成、摘要或内容分析。

LanguageModelSession：

有状态 —— 保留所有交互的记录
指令与提示：
- 指令（来自开发者）：定义模型的角色，静态指导
- 提示（来自用户）：用于生成的动态输入
模型训练为优先遵循指令而非提示（安全特性）

import FoundationModels

func respond(userInput: String) async throws -> String {
    let session = LanguageModelSession(instructions: """
        You are a friendly barista in a pixel art coffee shop.
        Respond to the player's question concisely.
        """
    )
    let response = try await session.respond(to: userInput)
    return response.content
}

指令是可选的 —— 如果省略，有合理的默认值
切勿将用户输入插值到指令中 —— 安全风险（提示注入）
保持指令简洁 —— 每个令牌都会增加延迟

let session = LanguageModelSession()

// 第一轮
let first = try await session.respond(to: "Write a haiku about fishing")
print(first.content)
// "Silent waters gleam,
//  Casting lines in morning mist—
//  Hope in every cast."

// 第二轮 - 模型记住上下文
let second = try await session.respond(to: "Do another one about golf")
print(second.content)
// "Silent morning dew,
//  Caddies guide with gentle words—
//  Paths of patience tread."

// 检查完整记录
print(session.transcript)

为何有效：会话自动保留记录。模型使用之前轮次的上下文。

何时使用此模式

✅ 适用于：

简单问答
文本摘要
内容分析
单轮生成

❌ 不适用于：

结构化输出（使用模式 2）
长对话（会触及上下文限制）
需要外部数据（使用模式 4）

模式 2：@Generable 结构化输出

使用时机：你需要从模型获取结构化数据，而不仅仅是纯文本。

没有 @Generable：

// ❌ 错误 - 不可靠
let prompt = "Generate a person with name and age as JSON"
let response = try await session.respond(to: prompt)
// 可能得到：{"firstName": "John"}，而你期望的是 {"name": "John"}
// 可能得到完全无效的 JSON
// 必须手动解析，容易崩溃

解决方案：@Generable

@Generable
struct Person {
    let name: String
    let age: Int
}

let session = LanguageModelSession()
let response = try await session.respond(
    to: "Generate a person",
    generating: Person.self
)

let person = response.content // 类型安全的 Person 实例！

工作原理（约束解码）

@Generable 宏在编译时生成模式
模式自动传递给模型
模型生成受模式约束的令牌
框架将输出解析为 Swift 类型
保证结构正确性 —— 没有幻觉键，没有解析错误

"约束解码会屏蔽无效令牌。模型只能选择根据模式有效的令牌。"

支持 String、Int、Float、Double、Bool、数组、嵌套的 @Generable 类型、带关联值的枚举以及递归类型。完整列表及示例请参见 axiom-foundation-models-ref。

使用 @Guide 控制生成的值。支持描述、数值范围、数组计数和正则表达式模式：

@Generable
struct NPC {
    @Guide(description: "A full name")
    let name: String

    @Guide(.range(1...10))
    let level: Int

    @Guide(.count(3))
    let attributes: [String]
}

运行时验证：@Guide 约束在生成期间通过约束解码强制执行 —— 模型无法产生超出范围的值。但是，由于模型可能产生语义错误但结构有效的输出，因此始终要对结果进行业务逻辑验证。

完整的 @Guide 参考（范围、正则表达式、最大计数）请参见 axiom-foundation-models-ref。

属性顺序很重要

属性按声明顺序生成：

@Generable
struct Itinerary {
    var destination: String // 首先生成
    var days: [DayPlan]     // 其次生成
    var summary: String     // 最后生成
}

"你可能会发现，当摘要作为最后一个属性时，模型能生成最好的摘要。"

原因：后面的属性可以引用前面的属性。对于流式传输，将最重要的属性放在前面。

模式 3：使用 PartiallyGenerated 进行流式传输

使用时机：生成时间 >1 秒，并且你希望逐步更新 UI。

没有流式传输：

// 用户等待 3-5 秒，什么也看不到
let response = try await session.respond(to: prompt, generating: Itinerary.self)
// 然后整个结果一次性出现

用户体验：感觉缓慢，UI 冻结。

解决方案：流式传输

@Generable
struct Itinerary {
    var name: String
    var days: [DayPlan]
}

let stream = session.streamResponse(
    to: "Generate a 3-day itinerary to Mt. Fuji",
    generating: Itinerary.self
)

for try await partial in stream {
    print(partial) // 增量更新
}

PartiallyGenerated 类型

@Generable 宏会自动创建一个 PartiallyGenerated 类型，其中所有属性都是可选的（随着模型生成它们而填充）。详情请参见 axiom-foundation-models-ref。

struct ItineraryView: View {
    let session: LanguageModelSession
    @State private var itinerary: Itinerary.PartiallyGenerated?

    var body: some View {
        VStack {
            if let name = itinerary?.name {
                Text(name)
                    .font(.title)
            }

            if let days = itinerary?.days {
                ForEach(days, id: \.self) { day in
                    DayView(day: day)
                }
            }

            Button("Generate") {
                Task {
                    let stream = session.streamResponse(
                        to: "Generate 3-day itinerary to Tokyo",
                        generating: Itinerary.self
                    )

                    for try await partial in stream {
                        self.itinerary = partial
                    }
                }
            }
        }
    }
}

对于数组至关重要：

// ✅ 正确 - 稳定的标识
ForEach(days, id: \.id) { day in
    DayView(day: day)
}

// ❌ 错误 - 标识改变，动画中断
ForEach(days.indices, id: \.self) { index in
    DayView(day: days[index])
}

何时使用流式传输

行程规划
故事生成
长描述
多部分内容

简单问答（< 1 句话）
快速分类
内容标记

流式传输错误处理

优雅地处理流式传输期间的错误 —— 部分结果可能已经显示：

do {
    for try await partial in stream {
        self.itinerary = partial
    }
} catch LanguageModelSession.GenerationError.guardrailViolation {
    // 部分内容可能可见 —— 显示非破坏性错误
    self.errorMessage = "Generation stopped by content policy"
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    // 上下文过多 —— 创建新会话并重试
    session = LanguageModelSession()
}

模式 4：工具调用

使用时机：模型需要外部数据（天气、位置、联系人）来生成响应。

// ❌ 错误 - 模型会产生幻觉
let response = try await session.respond(
    to: "What's the temperature in Cupertino?"
)
// 输出："It's about 72°F"（完全是编造的！）

原因：30 亿参数的模型没有实时天气数据。

解决方案：工具调用

让模型自主调用你的代码来获取外部数据。

import FoundationModels
import WeatherKit
import CoreLocation

struct GetWeatherTool: Tool {
    let name = "getWeather"
    let description = "Retrieve latest weather for a city"

    @Generable
    struct Arguments {
        @Guide(description: "The city to fetch weather for")
        var city: String
    }

    func call(arguments: Arguments) async throws -> ToolOutput {
        let places = try await CLGeocoder().geocodeAddressString(arguments.city)
        let weather = try await WeatherService.shared.weather(for: places.first!.location!)
        let temp = weather.currentWeather.temperature.value

        return ToolOutput("\(arguments.city)'s temperature is \(temp) degrees.")
    }
}

将工具附加到会话

let session = LanguageModelSession(
    tools: [GetWeatherTool()],
    instructions: "Help user with weather forecasts."
)

let response = try await session.respond(
    to: "What's the temperature in Cupertino?"
)

print(response.content)
// "It's 71°F in Cupertino!"

识别需要天气数据
调用 GetWeatherTool
接收真实温度
将其融入自然响应中

Tool 协议：需要 name、description、@Generable Arguments 和 call() 方法
ToolOutput：返回 String（自然语言）或 GeneratedContent（结构化）
多个工具：会话接受工具数组；模型自主决定调用哪个
有状态工具：当工具需要在多次调用间维护状态时，使用 class（而非 struct）

Tool 协议参考、ToolOutput 形式、有状态工具模式以及其他示例，请参见 axiom-foundation-models-ref。

1. 使用工具初始化会话
2. 用户提示："What's Tokyo's weather?"
3. 模型分析："Need weather data"
4. 模型生成工具调用：getWeather(city: "Tokyo")
5. 框架调用你的工具的 call() 方法
6. 你的工具从 API 获取真实数据
7. 工具输出插入到记录中
8. 模型使用工具输出生成最终响应

"模型自主决定何时以及多久调用一次工具。每个请求可以调用多个工具，甚至可以并行调用。"

有效的工具名称（没有幻觉工具）
有效的参数（通过 @Generable）
结构正确性

❌ 不保证：

工具会被调用（模型可能不需要它）
特定的参数值（模型根据上下文决定）

天气数据
地图/位置查询
联系人信息
日历事件
外部 API

❌ 不用于：

数据模型中已有的信息
提示/指令中的信息
简单计算（模型可以处理这些）

模式 5：上下文管理

使用时机：可能超过 4096 令牌限制的多轮对话。

// 长对话...
for i in 1...100 {
    let response = try await session.respond(to: "Question \(i)")
    // 最终...
    // Error: exceededContextWindowSize
}

上下文窗口：4096 个令牌（输入 + 输出合计） 平均值：英语中每个令牌约 3 个字符

4096 个令牌 ≈ 12,000 个字符
≈ 2,000-3,000 个单词总数

长对话或冗长的提示/响应 → 超出限制

处理上下文溢出

基础方法：启动新会话

var session = LanguageModelSession()

do {
    let response = try await session.respond(to: prompt)
    print(response.content)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    // 新会话，无历史记录
    session = LanguageModelSession()
}

问题：丢失整个对话历史。

更好的方法：压缩记录

var session = LanguageModelSession()

do {
    let response = try await session.respond(to: prompt)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    // 带有压缩历史的新会话
    session = condensedSession(from: session)
}

func condensedSession(from previous: LanguageModelSession) -> LanguageModelSession {
    let allEntries = previous.transcript.entries
    var condensedEntries = [Transcript.Entry]()

    // 始终包含第一个条目（指令）
    if let first = allEntries.first {
        condensedEntries.append(first)

        // 包含最后一个条目（最近的上下文）
        if allEntries.count > 1, let last = allEntries.last {
            condensedEntries.append(last)
        }
    }

    let condensedTranscript = Transcript(entries: condensedEntries)
    return LanguageModelSession(transcript: condensedTranscript)
}

指令始终保留
最近的上下文保留
总令牌数大幅减少

高级策略（使用 Foundation Models 本身总结中间条目），请参见 axiom-foundation-models-ref。

防止上下文溢出

1. 保持提示简洁：

// ❌ 错误
let prompt = """
    I want you to generate a comprehensive detailed analysis of this article
    with multiple sections including summary, key points, sentiment analysis,
    main arguments, counter arguments, logical fallacies, and conclusions...
    """

// ✅ 正确
let prompt = "Summarize this article's key points"

2. 使用工具获取数据：不要将整个数据集放入提示中，而是使用工具按需获取。

3. 将复杂任务分解为步骤：

// ❌ 错误 - 一次大规模生成
let response = try await session.respond(
    to: "Create 7-day itinerary with hotels, restaurants, activities..."
)

// ✅ 正确 - 多次较小的生成
let overview = try await session.respond(to: "Create high-level 7-day plan")
for day in 1...7 {
    let details = try await session.respond(to: "Detail activities for day \(day)")
}

模式 6：采样与生成选项

使用时机：你需要控制输出的随机性/确定性。

目标	设置	用例
确定性	`GenerationOptions(sampling: .greedy)`	单元测试、演示、一致性要求高
聚焦	`GenerationOptions(temperature: 0.5)`	事实提取、分类
创造性	`GenerationOptions(temperature: 2.0)`	故事生成、头脑风暴、多样化的 NPC 对话

默认：随机采样（温度 1.0）给出平衡的结果。

注意事项：贪婪确定性仅适用于相同的模型版本。操作系统更新可能会改变输出。

完整的 GenerationOptions API 参考请参见 axiom-foundation-models-ref。

场景 1："直接使用 ChatGPT API"

背景：你正在实现一个新的 AI 功能。产品经理建议使用 ChatGPT API 以获取"更好的结果"。

👔 权威：产品经理级别比你高
💸 现有集成：团队已经在其他功能中使用 OpenAI
⏰ 速度："ChatGPT 是经过验证的，Foundation Models 是新的"

合理化陷阱：

"产品经理最懂"
"ChatGPT 给出更好的答案"
"用现有代码实现更快"

隐私侵犯：用户数据发送到外部服务器
- 医疗笔记、财务文档、个人消息
- 违反用户对设备端隐私的期望
- 潜在的 GDPR/隐私法问题
成本：每次 API 调用都要花钱
- Foundation Models 免费
- 扩展到数百万用户 = 巨额成本
离线不可用：需要互联网
- 飞行模式、信号差 → 功能失效
- Foundation Models 离线工作
延迟：网络往返增加 500-2000 毫秒
- Foundation Models：设备端，<100 毫秒启动

何时 ChatGPT 是合适的：

需要世界知识（例如"法国总统是谁？"）
复杂推理（多步逻辑、数学证明）
非常长的上下文（>4096 个令牌）

强制性回应：

"我理解 ChatGPT 在某些任务上能提供出色的结果。然而，
对于此功能，Foundation Models 是正确的选择，原因有三点至关重要：

1. **隐私**：此功能处理[医疗笔记/财务数据/个人内容]。
   用户期望这些数据保留在设备上。发送到外部 API 违反了这种信任，
   并可能带来合规性问题。

2. **成本**：在规模上，ChatGPT API 调用每 1000 次请求花费 $X。Foundation Models
   是免费的。对于 Y 百万用户，这相当于每年可以避免 $Z 的成本。

3. **离线能力**：Foundation Models 无需互联网即可工作。处于飞行模式
   或信号差的用户仍能获得完整功能。

**何时使用 ChatGPT**：如果此功能需要世界知识或复杂推理，
ChatGPT 将是正确的选择。但这是[摘要/提取/分类]，
而这正是 Foundation Models 优化的方向。

**时间估算**：Foundation Models 实现：15-20 分钟。
ChatGPT 的隐私合规审查：2-4 周。"

节省的时间：隐私合规审查与正确实现：2-4 周 vs 20 分钟

场景 2："手动解析 JSON"

背景：队友建议提示生成 JSON，并用 JSONDecoder 解析。声称这"简单且熟悉"。

⏰ 截止日期：2 天内发布
📚 熟悉度："大家都懂 JSON"
🔧 现有代码：已经有 JSON 解析工具

合理化陷阱：

"JSON 是标准"
"我们到处都在解析 JSON"
"比学习新 API 更快"

幻觉键：模型输出 {firstName: "John"}，而你期望的是 {name: "John"}
- JSONDecoder 崩溃：keyNotFound
- 没有编译时安全性
无效 JSON：模型可能输出：

Here's the person: {name: "John", age: 30}
- 不是有效的 JSON（前言文本）
- 解析失败
没有类型安全：手动字符串解析，容易出错

// ❌ 错误 - 会失败
let prompt = "Generate a person with name and age as JSON"
let response = try await session.respond(to: prompt)

// 模型输出：{"firstName": "John Smith", "years": 30}
// 你的代码期望：{"name": ..., "age": ...}
// 崩溃：keyNotFound(name)

调试时间：2-4 小时寻找边界情况，编写解析技巧

// ✅ 正确 - 15 分钟，保证有效
@Generable
struct Person {
    let name: String
    let age: Int
}

let response = try await session.respond(
    to: "Generate a person",
    generating: Person.self
)
// response.content 是类型安全的 Person，始终有效

强制性回应：

"我理解 JSON 解析感觉更熟悉，但对于 LLM 输出，@Generable 在技术上
更优越，原因有三点：

1. **约束解码保证结构**：模型只能生成有效的 Person
   实例。不可能得到错误的键、无效的 JSON 或缺失的字段。

2. **无需解析代码**：框架自动处理解析。解析错误的可能性为零。

3. **编译时安全性**：如果我们更改 Person 结构体，编译器会捕获所有问题。
   手动 JSON 解析 = 运行时崩溃。

**实际成本**：手动 JSON 方法会遇到边界情况。调试 'keyNotFound' 崩溃
需要 2-4 小时。@Generable 实现需要 15 分钟，并且没有解析错误。

**类比**：这就像为新代码选择 Swift 而非 Objective-C。两者都有效，但
Swift 的类型安全性防止了整类错误。"

节省的时间：4-8 小时调试 vs 15 分钟正确实现

场景 3："一个大提示"

背景：功能需要从发票中提取名称、日期、金额、类别。队友建议一个提示："提取所有信息。"

🏗️ 架构："一个 API 调用更简单"
⏰ 速度："为什么要搞复杂？"
📉 复杂性："更多提示 = 更多代码"

合理化陷阱：

"越简单越好"
"一个提示意味着更少的代码"
"模型足够智能"

上下文溢出：复杂的提示 + 大发票 → 超出 4096 个令牌
结果差：模型试图一次做太多事情，质量下降
生成缓慢：一个巨大的响应需要 5-8 秒
全有或全无：如果一个字段失败，整个生成失败

更好的方法：分解为任务 + 使用工具

// ❌ 错误 - 一个巨大的提示
let prompt = """
    Extract from this invoice:
    - Vendor name
    - Invoice date
    - Total amount
    - Line items (description, quantity, price each)
    - Payment terms
    - Due date
    - Tax amount
    ...
    """
// 4 秒，质量差，可能超出上下文

// ✅ 正确 - 使用聚焦提示的结构化提取
@Generable
struct InvoiceBasics {
    let vendor: String
    let date: String
    let amount: Double
}

let basics = try await session.respond(
    to: "Extract vendor, date, and amount",
    generating: InvoiceBasics.self
) // 0.5 秒，axiom-高质量

@Generable
struct LineItem {
    let description: String
    let quantity: Int
    let price: Double
}

let items = try await session.respond(
    to: "Extract line items",
    generating: [LineItem].self
) // 1 秒，axiom-高质量

// 总计：1.5 秒，质量更好，优雅的部分失败处理

强制性回应：

"我理解一个简单 API 调用的吸引力。然而，这个特定任务需要
不同的方法：

1. **上下文限制**：发票 + 复杂的提取提示很可能超过 4096 令牌
   限制。多个聚焦的提示则远低于限制。

2. **更好的质量**：模型在聚焦任务上表现更好。'提取供应商名称'
   能达到 95%+ 的准确率。'提取所有内容'只有 60-70%。

3. **更快的感知性能**：多个提示配合流式传输可以逐步显示
   结果。用户在 0.5 秒内看到供应商名称，而不是等待 5 秒才看到所有内容。

4. **优雅降级**：如果行项目失败，我们仍然有基本信息。全有或全无
   的方法意味着完全失败。

**实现**：分解为 3-4 个聚焦提取需要 30 分钟。一个大提示
需要 2-3 小时来调试为什么它会触及上下文限制并产生差的结果。"

节省的时间：2-3 小时调试 vs 30 分钟正确设计

预热会话：在初始化时创建 LanguageModelSession，而不是在用户点击按钮时。节省首次生成的 1-2 秒。
includeSchemaInPrompt: false：对于使用相同 @Generable 类型的后续请求，在 GenerationOptions 中设置此项，以减少 10-20% 的令牌数。
流式传输的属性顺序：将最重要的属性放在 @Generable 结构体的前面。用户在 0.2 秒内看到标题，而不是等待 2.5 秒的完整生成。
Foundation Models 仪器：使用 Instruments > Foundation Models 模板来分析延迟、查看令牌计数并识别优化机会。

每个优化的代码示例请参见 axiom-foundation-models-ref。

在发布 Foundation Models 功能之前：

创建会话前检查了可用性
对结构化输出使用 @Generable（而非手动 JSON）
处理上下文溢出（exceededContextWindowSize）
处理护栏违规（guardrailViolation）
处理不支持的语言（unsupportedLanguageOrLocale）
对长生成（>1 秒）使用流式传输
不阻塞 UI（使用 Task {} 进行异步操作）
对外部数据使用工具（不提示天气/位置）
如果对延迟敏感，预热会话

指令简洁（不冗长）
从不将用户输入插值到指令中
属性顺序针对流式传输 UX 进行了优化
使用适当的温度/采样
在真实设备上测试（不仅仅是模拟器）
使用 Instruments 进行了性能分析（Foundation Models 模板）
错误处理显示优雅的 UI 消息
测试了离线情况（飞行模式）
测试了长对话（上下文处理）

不用于世界知识

🇺🇸English

Foundation Models — On-Device AI for Apple Platforms

When to Use This Skill

Use when:

Implementing on-device AI features with Foundation Models
Adding text summarization, classification, or extraction capabilities
Creating structured output from LLM responses
Building tool-calling patterns for external data integration
Streaming generated content for better UX
Debugging Foundation Models issues (context overflow, slow generation, wrong output)
Deciding between Foundation Models vs server LLMs (ChatGPT, Claude, etc.)

Related Skills

Use axiom-foundation-models-diag for systematic troubleshooting (context exceeded, guardrail violations, availability problems)
Use axiom-foundation-models-ref for complete API reference with all WWDC code examples

Red Flags — Anti-Patterns That Will Fail

❌ Using for World Knowledge

Why it fails : The on-device model is 3 billion parameters, optimized for summarization, extraction, classification — NOT world knowledge or complex reasoning.

Example of wrong use :

// ❌ BAD - Asking for world knowledge
let session = LanguageModelSession()
let response = try await session.respond(to: "What's the capital of France?")

Why : Model will hallucinate or give low-quality answers. It's trained for content generation, not encyclopedic knowledge.

Correct approach : Use server LLMs (ChatGPT, Claude) for world knowledge, or provide factual data through Tool calling.

❌ Blocking Main Thread

Why it fails : session.respond() is async but if called synchronously on main thread, freezes UI for seconds.

Example of wrong use :

// ❌ BAD - Blocking main thread
Button("Generate") {
    let response = try await session.respond(to: prompt) // UI frozen!
}

Why : Generation takes 1-5 seconds. User sees frozen app, bad reviews follow.

Correct approach :

// ✅ GOOD - Async on background
Button("Generate") {
    Task {
        let response = try await session.respond(to: prompt)
        // Update UI with response
    }
}

❌ Manual JSON Parsing

Why it fails : Prompting for JSON and parsing with JSONDecoder leads to hallucinated keys, invalid JSON, no type safety.

Example of wrong use :

// ❌ BAD - Manual JSON parsing
let prompt = "Generate a person with name and age as JSON"
let response = try await session.respond(to: prompt)
let data = response.content.data(using: .utf8)!
let person = try JSONDecoder().decode(Person.self, from: data) // CRASHES!

Why : Model might output {firstName: "John"} when you expect {name: "John"}. Or invalid JSON entirely.

Correct approach :

// ✅ GOOD - @Generable guarantees structure
@Generable
struct Person {
    let name: String
    let age: Int
}

let response = try await session.respond(
    to: "Generate a person",
    generating: Person.self
)
// response.content is type-safe Person instance

❌ Ignoring Availability Check

Why it fails : Foundation Models only runs on Apple Intelligence devices in supported regions. App crashes or shows errors without check.

Example of wrong use :

// ❌ BAD - No availability check
let session = LanguageModelSession() // Might fail!

Correct approach :

// ✅ GOOD - Check first
switch SystemLanguageModel.default.availability {
case .available:
    let session = LanguageModelSession()
    // proceed
case .unavailable(let reason):
    // Show graceful UI: "AI features require Apple Intelligence"
}

❌ Single Huge Prompt

Why it fails : 4096 token context window (input + output). One massive prompt hits limit, gives poor results.

Example of wrong use :

// ❌ BAD - Everything in one prompt
let prompt = """
    Generate a 7-day itinerary for Tokyo including hotels, restaurants,
    activities for each day, transportation details, budget breakdown...
    """
// Exceeds context, poor quality

Correct approach : Break into smaller tasks, use tools for external data, multi-turn conversation.

❌ Not Handling Generation Errors

Why it fails : Three errors MUST be handled or your app will crash in production.

do {
    let response = try await session.respond(to: prompt)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    // Multi-turn transcript grew beyond 4096 tokens
    // → Condense transcript and create new session (see Pattern 5)
} catch LanguageModelSession.GenerationError.guardrailViolation {
    // Content policy triggered
    // → Show graceful message: "I can't help with that request"
} catch LanguageModelSession.GenerationError.unsupportedLanguageOrLocale {
    // User input in unsupported language
    // → Show disclaimer, check SystemLanguageModel.default.supportedLanguages
}

Mandatory First Steps

Before writing any Foundation Models code, complete these steps:

1. Check Availability

See "Ignoring Availability Check" in Red Flags above for the required pattern. Foundation Models requires Apple Intelligence-enabled device, supported region, and user opt-in.

2. Identify Use Case

Ask yourself : What is my primary goal?

Use Case	Foundation Models?	Alternative
Summarization	✅ YES
Extraction (key info from text)	✅ YES
Classification (categorize content)	✅ YES
Content tagging	✅ YES (built-in adapter!)
World knowledge	❌ NO	ChatGPT, Claude, Gemini
Complex reasoning	❌ NO	Server LLMs
Mathematical computation	❌ NO	Calculator, symbolic math

Critical : If your use case requires world knowledge or advanced reasoning, stop. Foundation Models is the wrong tool.

3. Design @Generable Schema

If you need structured output (not just plain text):

Bad approach : Prompt for "JSON" and parse manually Good approach : Define @Generable type

@Generable
struct SearchSuggestions {
    @Guide(description: "Suggested search terms", .count(4))
    var searchTerms: [String]
}

Why : Constrained decoding guarantees structure. No parsing errors, no hallucinated keys.

4. Consider Tools for External Data

If your feature needs external information:

Weather → WeatherKit tool
Locations → MapKit tool
Contacts → Contacts API tool
Calendar → EventKit tool

Don't try to get this information from the model (it will hallucinate). Do define Tool protocol implementations.

5. Plan Streaming for Long Generations

If generation takes >1 second, use streaming:

let stream = session.streamResponse(
    to: prompt,
    generating: Itinerary.self
)

for try await partial in stream {
    // Update UI incrementally
    self.itinerary = partial
}

Why : Users see progress immediately, perceived latency drops dramatically.

Decision Tree

Need on-device AI?
│
├─ World knowledge/reasoning?
│  └─ ❌ NOT Foundation Models
│     → Use ChatGPT, Claude, Gemini, etc.
│     → Reason: 3B parameter model, not trained for encyclopedic knowledge
│
├─ Summarization?
│  └─ ✅ YES → Pattern 1 (Basic Session)
│     → Example: Summarize article, condense email
│     → Time: 10-15 minutes
│
├─ Structured extraction?
│  └─ ✅ YES → Pattern 2 (@Generable)
│     → Example: Extract name, date, amount from invoice
│     → Time: 15-20 minutes
│
├─ Content tagging?
│  └─ ✅ YES → Pattern 3 (contentTagging use case)
│     → Example: Tag article topics, extract entities
│     → Time: 10 minutes
│
├─ Need external data?
│  └─ ✅ YES → Pattern 4 (Tool calling)
│     → Example: Fetch weather, query contacts, get locations
│     → Time: 20-30 minutes
│
├─ Long generation?
│  └─ ✅ YES → Pattern 5 (Streaming)
│     → Example: Generate itinerary, create story
│     → Time: 15-20 minutes
│
└─ Dynamic schemas (runtime-defined structure)?
   └─ ✅ YES → Pattern 6 (DynamicGenerationSchema)
      → Example: Level creator, user-defined forms
      → Time: 30-40 minutes

Pattern 1: Basic Session

Use when : Simple text generation, summarization, or content analysis.

Core Concepts

LanguageModelSession :

Stateful — retains transcript of all interactions
Instructions vs prompts:
- Instructions (from developer): Define model's role, static guidance
- Prompts (from user): Dynamic input for generation
Model trained to obey instructions over prompts (security feature)

Implementation

import FoundationModels

func respond(userInput: String) async throws -> String {
    let session = LanguageModelSession(instructions: """
        You are a friendly barista in a pixel art coffee shop.
        Respond to the player's question concisely.
        """
    )
    let response = try await session.respond(to: userInput)
    return response.content
}

Key Points

Instructions are optional — Reasonable defaults if omitted
Never interpolate user input into instructions — Security risk (prompt injection)
Keep instructions concise — Each token adds latency

Multi-Turn Interactions

let session = LanguageModelSession()

// First turn
let first = try await session.respond(to: "Write a haiku about fishing")
print(first.content)
// "Silent waters gleam,
//  Casting lines in morning mist—
//  Hope in every cast."

// Second turn - model remembers context
let second = try await session.respond(to: "Do another one about golf")
print(second.content)
// "Silent morning dew,
//  Caddies guide with gentle words—
//  Paths of patience tread."

// Inspect full transcript
print(session.transcript)

Why this works : Session retains transcript automatically. Model uses context from previous turns.

When to Use This Pattern

✅ Good for :

Simple Q&A
Text summarization
Content analysis
Single-turn generation

❌ Not good for :

Structured output (use Pattern 2)
Long conversations (will hit context limit)
External data needs (use Pattern 4)

Pattern 2: @Generable Structured Output

Use when : You need structured data from model, not just plain text.

The Problem

Without @Generable:

// ❌ BAD - Unreliable
let prompt = "Generate a person with name and age as JSON"
let response = try await session.respond(to: prompt)
// Might get: {"firstName": "John"} when you expect {"name": "John"}
// Might get invalid JSON entirely
// Must parse manually, prone to crashes

The Solution: @Generable

@Generable
struct Person {
    let name: String
    let age: Int
}

let session = LanguageModelSession()
let response = try await session.respond(
    to: "Generate a person",
    generating: Person.self
)

let person = response.content // Type-safe Person instance!

How It Works (Constrained Decoding)

@Generable macro generates schema at compile-time
Schema passed to model automatically
Model generates tokens constrained by schema
Framework parses output into Swift type
Guaranteed structural correctness — No hallucinated keys, no parsing errors

"Constrained decoding masks out invalid tokens. Model can only pick tokens valid according to schema."

Supported Types

Supports String, Int, Float, Double, Bool, arrays, nested @Generable types, enums with associated values, and recursive types. See axiom-foundation-models-ref for complete list with examples.

@Guide Constraints

Control generated values with @Guide. Supports descriptions, numeric ranges, array counts, and regex patterns:

@Generable
struct NPC {
    @Guide(description: "A full name")
    let name: String

    @Guide(.range(1...10))
    let level: Int

    @Guide(.count(3))
    let attributes: [String]
}

Runtime validation : @Guide constraints are enforced during generation via constrained decoding — the model cannot produce out-of-range values. However, always validate business logic on the result since the model may produce semantically wrong but structurally valid output.

See axiom-foundation-models-ref for complete @Guide reference (ranges, regex, maximum counts).

Property Order Matters

Properties generated in declaration order :

@Generable
struct Itinerary {
    var destination: String // Generated first
    var days: [DayPlan]     // Generated second
    var summary: String     // Generated last
}

"You may find model produces best summaries when they're last property."

Why : Later properties can reference earlier ones. Put most important properties first for streaming.

Pattern 3: Streaming with PartiallyGenerated

Use when : Generation takes >1 second and you want progressive UI updates.

The Problem

Without streaming:

// User waits 3-5 seconds seeing nothing
let response = try await session.respond(to: prompt, generating: Itinerary.self)
// Then entire result appears at once

User experience : Feels slow, frozen UI.

The Solution: Streaming

@Generable
struct Itinerary {
    var name: String
    var days: [DayPlan]
}

let stream = session.streamResponse(
    to: "Generate a 3-day itinerary to Mt. Fuji",
    generating: Itinerary.self
)

for try await partial in stream {
    print(partial) // Incrementally updated
}

PartiallyGenerated Type

@Generable macro automatically creates a PartiallyGenerated type where all properties are optional (they fill in as the model generates them). See axiom-foundation-models-ref for details.

SwiftUI Integration

struct ItineraryView: View {
    let session: LanguageModelSession
    @State private var itinerary: Itinerary.PartiallyGenerated?

    var body: some View {
        VStack {
            if let name = itinerary?.name {
                Text(name)
                    .font(.title)
            }

            if let days = itinerary?.days {
                ForEach(days, id: \.self) { day in
                    DayView(day: day)
                }
            }

            Button("Generate") {
                Task {
                    let stream = session.streamResponse(
                        to: "Generate 3-day itinerary to Tokyo",
                        generating: Itinerary.self
                    )

                    for try await partial in stream {
                        self.itinerary = partial
                    }
                }
            }
        }
    }
}

View Identity

Critical for arrays :

// ✅ GOOD - Stable identity
ForEach(days, id: \.id) { day in
    DayView(day: day)
}

// ❌ BAD - Identity changes, animations break
ForEach(days.indices, id: \.self) { index in
    DayView(day: days[index])
}

When to Use Streaming

✅ Use for :

Itineraries
Stories
Long descriptions
Multi-section content

❌ Skip for :

Simple Q&A (< 1 sentence)
Quick classification
Content tagging

Streaming Error Handling

Handle errors during streaming gracefully — partial results may already be displayed:

do {
    for try await partial in stream {
        self.itinerary = partial
    }
} catch LanguageModelSession.GenerationError.guardrailViolation {
    // Partial content may be visible — show non-disruptive error
    self.errorMessage = "Generation stopped by content policy"
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    // Too much context — create fresh session and retry
    session = LanguageModelSession()
}

Pattern 4: Tool Calling

Use when : Model needs external data (weather, locations, contacts) to generate response.

The Problem

// ❌ BAD - Model will hallucinate
let response = try await session.respond(
    to: "What's the temperature in Cupertino?"
)
// Output: "It's about 72°F" (completely made up!)

Why : 3B parameter model doesn't have real-time weather data.

The Solution: Tool Calling

Let model autonomously call your code to fetch external data.

import FoundationModels
import WeatherKit
import CoreLocation

struct GetWeatherTool: Tool {
    let name = "getWeather"
    let description = "Retrieve latest weather for a city"

    @Generable
    struct Arguments {
        @Guide(description: "The city to fetch weather for")
        var city: String
    }

    func call(arguments: Arguments) async throws -> ToolOutput {
        let places = try await CLGeocoder().geocodeAddressString(arguments.city)
        let weather = try await WeatherService.shared.weather(for: places.first!.location!)
        let temp = weather.currentWeather.temperature.value

        return ToolOutput("\(arguments.city)'s temperature is \(temp) degrees.")
    }
}

Attaching Tool to Session

let session = LanguageModelSession(
    tools: [GetWeatherTool()],
    instructions: "Help user with weather forecasts."
)

let response = try await session.respond(
    to: "What's the temperature in Cupertino?"
)

print(response.content)
// "It's 71°F in Cupertino!"

Model autonomously :

Recognizes it needs weather data
Calls GetWeatherTool
Receives real temperature
Incorporates into natural response

Key Concepts

Tool protocol : Requires name, description, @Generable Arguments, and call() method
ToolOutput : Return String (natural language) or GeneratedContent (structured)
Multiple tools : Session accepts array of tools; model autonomously decides which to call
Stateful tools : Use class (not struct) when tools need to maintain state across calls

See axiom-foundation-models-ref for Tool protocol reference, ToolOutput forms, stateful tool patterns, and additional examples.

Tool Calling Flow

1. Session initialized with tools
2. User prompt: "What's Tokyo's weather?"
3. Model analyzes: "Need weather data"
4. Model generates tool call: getWeather(city: "Tokyo")
5. Framework calls your tool's call() method
6. Your tool fetches real data from API
7. Tool output inserted into transcript
8. Model generates final response using tool output

"Model decides autonomously when and how often to call tools. Can call multiple tools per request, even in parallel."

Tool Calling Guarantees

✅ Guaranteed :

Valid tool names (no hallucinated tools)
Valid arguments (via @Generable)
Structural correctness

❌ Not guaranteed :

Tool will be called (model might not need it)
Specific argument values (model decides based on context)

When to Use Tools

✅ Use for :

Weather data
Map/location queries
Contact information
Calendar events
External APIs

❌ Don't use for :

Data model already has
Information in prompt/instructions
Simple calculations (model can do these)

Pattern 5: Context Management

Use when : Multi-turn conversations that might exceed 4096 token limit.

The Problem

// Long conversation...
for i in 1...100 {
    let response = try await session.respond(to: "Question \(i)")
    // Eventually...
    // Error: exceededContextWindowSize
}

Context window : 4096 tokens (input + output combined) Average : ~3 characters per token in English

Rough calculation :

4096 tokens ≈ 12,000 characters
≈ 2,000-3,000 words total

Long conversation or verbose prompts/responses → Exceed limit

Handling Context Overflow

Basic: Start fresh session

var session = LanguageModelSession()

do {
    let response = try await session.respond(to: prompt)
    print(response.content)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    // New session, no history
    session = LanguageModelSession()
}

Problem : Loses entire conversation history.

Better: Condense Transcript

var session = LanguageModelSession()

do {
    let response = try await session.respond(to: prompt)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    // New session with condensed history
    session = condensedSession(from: session)
}

func condensedSession(from previous: LanguageModelSession) -> LanguageModelSession {
    let allEntries = previous.transcript.entries
    var condensedEntries = [Transcript.Entry]()

    // Always include first entry (instructions)
    if let first = allEntries.first {
        condensedEntries.append(first)

        // Include last entry (most recent context)
        if allEntries.count > 1, let last = allEntries.last {
            condensedEntries.append(last)
        }
    }

    let condensedTranscript = Transcript(entries: condensedEntries)
    return LanguageModelSession(transcript: condensedTranscript)
}

Why this works :

Instructions always preserved
Recent context retained
Total tokens drastically reduced

For advanced strategies (summarizing middle entries with Foundation Models itself), see axiom-foundation-models-ref.

Preventing Context Overflow

1. Keep prompts concise :

// ❌ BAD
let prompt = """
    I want you to generate a comprehensive detailed analysis of this article
    with multiple sections including summary, key points, sentiment analysis,
    main arguments, counter arguments, logical fallacies, and conclusions...
    """

// ✅ GOOD
let prompt = "Summarize this article's key points"

2. Use tools for data : Instead of putting entire dataset in prompt, use tools to fetch on-demand.

3. Break complex tasks into steps :

// ❌ BAD - One massive generation
let response = try await session.respond(
    to: "Create 7-day itinerary with hotels, restaurants, activities..."
)

// ✅ GOOD - Multiple smaller generations
let overview = try await session.respond(to: "Create high-level 7-day plan")
for day in 1...7 {
    let details = try await session.respond(to: "Detail activities for day \(day)")
}

Pattern 6: Sampling & Generation Options

Use when : You need control over output randomness/determinism.

When to Adjust Sampling

Goal	Setting	Use Cases
Deterministic	`GenerationOptions(sampling: .greedy)`	Unit tests, demos, consistency-critical
Focused	`GenerationOptions(temperature: 0.5)`	Fact extraction, classification
Creative	`GenerationOptions(temperature: 2.0)`	Story generation, brainstorming, varied NPC dialog

Default : Random sampling (temperature 1.0) gives balanced results.

Caveat : Greedy determinism only holds for same model version. OS updates may change output.

See axiom-foundation-models-ref for complete GenerationOptions API reference.

Pressure Scenarios

Scenario 1: "Just Use ChatGPT API"

Context : You're implementing a new AI feature. PM suggests using ChatGPT API for "better results."

Pressure signals :

👔 Authority : PM outranks you
💸 Existing integration : Team already uses OpenAI for other features
⏰ Speed : "ChatGPT is proven, Foundation Models is new"

Rationalization traps :

"PM knows best"
"ChatGPT gives better answers"
"Faster to implement with existing code"

Why this fails :

Privacy violation : User data sent to external server
- Medical notes, financial docs, personal messages
- Violates user expectation of on-device privacy
- Potential GDPR/privacy law issues
Cost : Every API call costs money
- Foundation Models is free
- Scale to millions of users = massive costs
Offline unavailable : Requires internet
- Airplane mode, poor signal → feature broken
- Foundation Models works offline
Latency : Network round-trip adds 500-2000ms
- Foundation Models: On-device, <100ms startup

When ChatGPT IS appropriate :

World knowledge required (e.g. "Who is the president of France?")
Complex reasoning (multi-step logic, math proofs)
Very long context (>4096 tokens)

Mandatory response :

"I understand ChatGPT delivers great results for certain tasks. However,
for this feature, Foundation Models is the right choice for three critical reasons:

1. **Privacy**: This feature processes [medical notes/financial data/personal content].
   Users expect this data stays on-device. Sending to external API violates that trust
   and may have compliance issues.

2. **Cost**: At scale, ChatGPT API calls cost $X per 1000 requests. Foundation Models
   is free. For Y million users, that's $Z annually we can avoid.

3. **Offline capability**: Foundation Models works without internet. Users in airplane
   mode or with poor signal still get full functionality.

**When to use ChatGPT**: If this feature required world knowledge or complex reasoning,
ChatGPT would be the right choice. But this is [summarization/extraction/classification],
which is exactly what Foundation Models is optimized for.

**Time estimate**: Foundation Models implementation: 15-20 minutes.
Privacy compliance review for ChatGPT: 2-4 weeks."

Time saved : Privacy compliance review vs correct implementation: 2-4 weeks vs 20 minutes

Scenario 2: "Parse JSON Manually"

Context : Teammate suggests prompting for JSON, parsing with JSONDecoder. Claims it's "simple and familiar."

Pressure signals :

⏰ Deadline : Ship in 2 days
📚 Familiarity : "Everyone knows JSON"
🔧 Existing code : Already have JSON parsing utilities

Rationalization traps :

"JSON is standard"
"We parse JSON everywhere already"
"Faster than learning new API"

Why this fails :

Hallucinated keys : Model outputs {firstName: "John"} when you expect {name: "John"}
- JSONDecoder crashes: keyNotFound
- No compile-time safety
Invalid JSON : Model might output:

Here's the person: {name: "John", age: 30}
- Not valid JSON (preamble text)
- Parsing fails
No type safety : Manual string parsing, prone to errors

Real-world example :

// ❌ BAD - Will fail
let prompt = "Generate a person with name and age as JSON"
let response = try await session.respond(to: prompt)

// Model outputs: {"firstName": "John Smith", "years": 30}
// Your code expects: {"name": ..., "age": ...}
// CRASH: keyNotFound(name)

Debugging time : 2-4 hours finding edge cases, writing parsing hacks

Correct approach :

// ✅ GOOD - 15 minutes, guaranteed to work
@Generable
struct Person {
    let name: String
    let age: Int
}

let response = try await session.respond(
    to: "Generate a person",
    generating: Person.self
)
// response.content is type-safe Person, always valid

Mandatory response :

"I understand JSON parsing feels familiar, but for LLM output, @Generable is objectively
better for three technical reasons:

1. **Constrained decoding guarantees structure**: Model can ONLY generate valid Person
   instances. Impossible to get wrong keys, invalid JSON, or missing fields.

2. **No parsing code needed**: Framework handles parsing automatically. Zero chance of
   parsing bugs.

3. **Compile-time safety**: If we change Person struct, compiler catches all issues.
   Manual JSON parsing = runtime crashes.

**Real cost**: Manual JSON approach will hit edge cases. Debugging 'keyNotFound' crashes
takes 2-4 hours. @Generable implementation takes 15 minutes and has zero parsing bugs.

**Analogy**: This is like choosing Swift over Objective-C for new code. Both work, but
Swift's type safety prevents entire categories of bugs."

Time saved : 4-8 hours debugging vs 15 minutes correct implementation

Scenario 3: "One Big Prompt"

Context : Feature requires extracting name, date, amount, category from invoice. Teammate suggests one prompt: "Extract all information."

Pressure signals :

🏗️ Architecture : "Simpler with one API call"
⏰ Speed : "Why make it complicated?"
📉 Complexity : "More prompts = more code"

Rationalization traps :

"Simpler is better"
"One prompt means less code"
"Model is smart enough"

Why this fails :

Context overflow : Complex prompt + large invoice → Exceeds 4096 tokens
Poor results : Model tries to do too much at once, quality suffers
Slow generation : One massive response takes 5-8 seconds
All-or-nothing : If one field fails, entire generation fails

Better approach : Break into tasks + use tools

// ❌ BAD - One massive prompt
let prompt = """
    Extract from this invoice:
    - Vendor name
    - Invoice date
    - Total amount
    - Line items (description, quantity, price each)
    - Payment terms
    - Due date
    - Tax amount
    ...
    """
// 4 seconds, poor quality, might exceed context

// ✅ GOOD - Structured extraction with focused prompts
@Generable
struct InvoiceBasics {
    let vendor: String
    let date: String
    let amount: Double
}

let basics = try await session.respond(
    to: "Extract vendor, date, and amount",
    generating: InvoiceBasics.self
) // 0.5 seconds, axiom-high quality

@Generable
struct LineItem {
    let description: String
    let quantity: Int
    let price: Double
}

let items = try await session.respond(
    to: "Extract line items",
    generating: [LineItem].self
) // 1 second, axiom-high quality

// Total: 1.5 seconds, better quality, graceful partial failures

Mandatory response :

"I understand the appeal of one simple API call. However, this specific task requires
a different approach:

1. **Context limits**: Invoice + complex extraction prompt will likely exceed 4096 token
   limit. Multiple focused prompts stay well under limit.

2. **Better quality**: Model performs better with focused tasks. 'Extract vendor name'
   gets 95%+ accuracy. 'Extract everything' gets 60-70%.

3. **Faster perceived performance**: Multiple prompts with streaming show progressive
   results. Users see vendor name in 0.5s, not waiting 5s for everything.

4. **Graceful degradation**: If line items fail, we still have basics. All-or-nothing
   approach means total failure.

**Implementation**: Breaking into 3-4 focused extractions takes 30 minutes. One big
prompt takes 2-3 hours debugging why it hits context limit and produces poor results."

Time saved : 2-3 hours debugging vs 30 minutes proper design

Performance Optimization

Key Optimizations

Prewarm session : Create LanguageModelSession at init, not when user taps button. Saves 1-2 seconds off first generation.
includeSchemaInPrompt: false : For subsequent requests with the same @Generable type, set this in GenerationOptions to reduce token count by 10-20%.
Property order for streaming : Put most important properties first in @Generable structs. User sees title in 0.2s instead of waiting 2.5s for full generation.
Foundation Models Instrument : Use Instruments > Foundation Models template to profile latency, see token counts, and identify optimization opportunities.

See axiom-foundation-models-ref for code examples of each optimization.

Checklist

Before shipping Foundation Models features:

Required Checks

Availability checked before creating session
Using @Generable for structured output (not manual JSON)
Handling context overflow (exceededContextWindowSize)
Handling guardrail violations (guardrailViolation)
Handling unsupported language (unsupportedLanguageOrLocale)
Streaming for long generations (>1 second)
Not blocking UI (using Task {} for async)
Tools for external data (not prompting for weather/locations)
Prewarmed session if latency-sensitive

Best Practices

Instructions are concise (not verbose)
Never interpolating user input into instructions
Property order optimized for streaming UX
Using appropriate temperature/sampling
Tested on real device (not just simulator)
Profiled with Instruments (Foundation Models template)
Error handling shows graceful UI messages
Tested offline (airplane mode)
Tested with long conversations (context handling)

Model Capability

Not using for world knowledge
Not using for complex reasoning
Use case is: summarization, extraction, classification, or generation
Have fallback if unavailable (show message, disable feature)

Resources

WWDC : 286, 259, 301

Skills : axiom-foundation-models-diag, axiom-foundation-models-ref

Last Updated : 2025-12-03 Version : 1.0.0 Target : iOS 26+, macOS 26+, iPadOS 26+, axiom-visionOS 26+

Weekly Installs

145

Repository

charleswiltgen/axiom

GitHub Stars

674

First Seen

Jan 21, 2026

Security Audits

Gen Agent Trust HubPass SocketPass SnykPass

Installed on

opencode129

codex124

gemini-cli120

claude-code119

cursor116

github-copilot113

AI 代码实施计划编写技能 | 自动化开发任务分解与 TDD 流程规划工具

50,900 周安装

Apple设备端AI开发指南：Foundation Models使用技巧与常见错误规避

🇨🇳中文介绍

Foundation Models — 适用于 Apple 平台的设备端 AI

何时使用此技能

相关技能

危险信号 —— 会导致失败的错误模式

❌ 用于世界知识

相关 Skills

❌ 阻塞主线程

❌ 手动解析 JSON

❌ 忽略可用性检查

❌ 单个巨大提示

❌ 不处理生成错误

强制性的第一步

1. 检查可用性

2. 确定用例

3. 设计 @Generable 模式

4. 考虑使用工具获取外部数据

5. 为长生成规划流式传输

决策树

模式 1：基础会话

核心概念

实现

关键点

多轮交互

何时使用此模式

模式 2：@Generable 结构化输出

问题

解决方案：@Generable

工作原理（约束解码）

支持的类型

@Guide 约束

属性顺序很重要

模式 3：使用 PartiallyGenerated 进行流式传输

问题

解决方案：流式传输

PartiallyGenerated 类型

SwiftUI 集成

视图标识

何时使用流式传输

流式传输错误处理

模式 4：工具调用

问题

解决方案：工具调用

将工具附加到会话

关键概念

工具调用流程

工具调用保证

何时使用工具

模式 5：上下文管理

问题

处理上下文溢出

基础方法：启动新会话

更好的方法：压缩记录

防止上下文溢出

模式 6：采样与生成选项

何时调整采样

压力场景

场景 1："直接使用 ChatGPT API"

场景 2："手动解析 JSON"

场景 3："一个大提示"

性能优化

关键优化

清单

必需检查

最佳实践

模型能力

🇺🇸English

Foundation Models — On-Device AI for Apple Platforms

When to Use This Skill

Related Skills

Red Flags — Anti-Patterns That Will Fail

❌ Using for World Knowledge

❌ Blocking Main Thread

❌ Manual JSON Parsing

❌ Ignoring Availability Check

❌ Single Huge Prompt

❌ Not Handling Generation Errors

Mandatory First Steps

1. Check Availability