Apify Actor Output Schema Generator - Automatically create dataset_schema.json and output_schema.json

apify-generate-output-schema by apify/agent-skills

245 weekly installs

1,700 GitHub Stars

Install command

npx skills add https://github.com/apify/agent-skills --skill apify-generate-output-schema

Development, Automation, Data processing


Generate Actor Output Schema

You are generating output schema files for an Apify Actor. The output schema tells Apify Console how to display run results. You will analyze the Actor's source code, create dataset_schema.json, output_schema.json, and key_value_store_schema.json (if the Actor uses a key-value store), and update actor.json.

Core Principles

Analyze code first: Read the Actor's source to understand what data it actually pushes to the dataset; never guess
Every field is nullable: APIs and websites are unpredictable, so always set "nullable": true
Anonymize examples: Never use real user IDs, usernames, or personal data in examples
Verify against code: If TypeScript types exist, cross-check the schema against both the type definition AND the code that produces the values
Reuse existing patterns: Before generating schemas, check whether other Actors in the same repository already have output schemas, and match their structure, naming conventions, description style, and formatting
Don't reinvent the wheel: Reuse existing type definitions, interfaces, and utilities from the codebase instead of creating duplicate definitions

Phase 1: Discover Actor Structure

Goal: Locate the Actor and understand its output

Initial request: $ARGUMENTS

Actions:

1. Create a todo list covering all phases
2. Find the .actor/ directory containing actor.json
3. Read actor.json to understand the Actor's configuration
4. Check whether dataset_schema.json, output_schema.json, and key_value_store_schema.json already exist
5. Search for existing schemas in the repository: look for other .actor/ directories or schema files (e.g., **/dataset_schema.json, **/output_schema.json, **/key_value_store_schema.json) to learn the repo's conventions, and match their description style, field naming, example formatting, and overall structure

Present the findings to the user: list all discovered dataset output fields, key-value store keys, their types, and where they come from.
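
The file-system part of the actions above (finding .actor/ directories and spotting existing schema files to learn conventions from) can be sketched in Python as shown below. This is a minimal sketch that assumes a local checkout of the repository. The helper names `find_actor_dirs`, `existing_schemas`, and `repo_schema_files` are hypothetical and are not part of the Apify CLI or SDK.

```python
from pathlib import Path

# The three schema files this skill may generate inside an .actor/ directory.
SCHEMA_NAMES = ("dataset_schema.json", "output_schema.json", "key_value_store_schema.json")


def find_actor_dirs(repo_root: str) -> list[Path]:
    """Return every .actor/ directory under repo_root that contains an actor.json."""
    return sorted(p.parent for p in Path(repo_root).rglob(".actor/actor.json"))


def existing_schemas(actor_dir: Path) -> dict[str, bool]:
    """Report which schema files already exist in a single .actor/ directory."""
    return {name: (actor_dir / name).is_file() for name in SCHEMA_NAMES}


def repo_schema_files(repo_root: str) -> list[Path]:
    """Collect schema files anywhere in the repo so their conventions can be matched."""
    found: list[Path] = []
    for name in SCHEMA_NAMES:
        found.extend(Path(repo_root).rglob(name))
    return sorted(found)


if __name__ == "__main__":
    for actor_dir in find_actor_dirs("."):
        print(actor_dir, existing_schemas(actor_dir))
    print("Schemas to learn conventions from:", repo_schema_files("."))
```

The output only helps you choose which files to read. Matching description style and formatting still requires reading those files yourself.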

Phase 2: Generate `dataset_schema.json`

Goal: Create a complete dataset schema with field definitions and display views

File structure

{
    "actorSpecification": 1,
    "fields": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            // ALL output fields here — every field the Actor can produce,
            // not just the ones shown in the overview view
        },
        "required": [],
        "additionalProperties": true
    },
    "views": {
        "overview": {
            "title": "Overview",
            "description": "Most important fields at a glance",
            "transformation": {
                "fields": [
                    // 8-12 most important field names
                ]
            },
            "display": {
                "component": "table",
                "properties": {
                    // Display config for each overview field
                }
            }
        }
    }
}

Consistency with existing schemas

If existing output schemas were found in the repository during Phase 1 (step 5), follow their conventions:

Match the description writing style (sentence case vs. lowercase, period vs. no period, etc.)
Match the field naming convention (camelCase vs. snake_case) — this must also match the actual keys produced by the Actor code
Match the example value style (e.g., date formats, URL patterns, placeholder names)
Match the view structure (number of fields in overview, display format choices)
Match the JSON formatting (indentation, property ordering, spacing) — all schemas in the same repository must use identical formatting, including standalone Actors

When the Actor code already has well-defined TypeScript interfaces or Python type classes, derive fields directly from those types rather than re-analyzing pushData/push_data calls from scratch. The type definition is the canonical source.

Hard rules (no exceptions)

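Rule	Detail
All fields in `properties`	The `fields.properties` object must contain every field the Actor can output, not just the fields shown in the overview view. The views section selects a subset for display; the `properties` section must be the complete superset
`"nullable": true`	On every field, because APIs are unpredictable
`"additionalProperties": true`	On the top-level `fields` object AND on every nested object within `properties`. This is the most commonly missed rule: it must appear at both levels
`"required": []`	Always an empty array, on the top-level `fields` object AND on every nested object within `properties`
Anonymized examples	No real user IDs, usernames, or content
`"type"` required with `"nullable"`	AJV rejects a field that has `nullable` without `type`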

Warning: these are the most common mistakes.

Only including fields that appear in the overview view. The fields.properties must list ALL output fields, even if they are not in the views section.

Only adding "required": [] and "additionalProperties": true on nested object-type properties but forgetting them on the top-level fields object. Both levels need them.

Note: nullable is an Apify-specific extension to JSON Schema draft-07. It is intentional and correct.
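
The hard rules above are mechanical enough to check before the file is written. The sketch below is a hypothetical helper, not Apify's meta-validator. It walks the top-level `fields` object and every nested object field. It reports missing `"additionalProperties": true`, non-empty `"required"`, missing `"nullable": true`, `nullable` without `type`, and overview fields that are absent from `fields.properties`.

```python
from typing import Any


def check_object(node: dict[str, Any], path: str, errors: list[str]) -> None:
    """Check one object level (top-level fields or a nested object field) and recurse."""
    if node.get("additionalProperties") is not True:
        errors.append(f'{path}: missing "additionalProperties": true')
    if node.get("required") != []:
        errors.append(f'{path}: "required" must be []')
    for name, field in node.get("properties", {}).items():
        field_path = f"{path}.properties.{name}"
        if field.get("nullable") is not True:
            errors.append(f'{field_path}: missing "nullable": true')
        if "type" not in field:
            errors.append(f'{field_path}: "type" is required alongside "nullable"')
        # Nested objects need the same settings as the top level.
        if field.get("type") == "object" or "properties" in field:
            check_object(field, field_path, errors)


def validate_dataset_schema(schema: dict[str, Any]) -> list[str]:
    """Return a list of hard-rule violations; an empty list means the checks passed."""
    errors: list[str] = []
    fields = schema.get("fields", {})
    check_object(fields, "fields", errors)
    known = fields.get("properties", {})
    for view_name, view in schema.get("views", {}).items():
        for name in view.get("transformation", {}).get("fields", []):
            if name not in known:
                errors.append(f"views.{view_name}: {name!r} is missing from fields.properties")
    return errors
```

An empty result only means these specific rules hold. Field completeness against the source code still needs the manual review in Phase 6.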

Field type patterns

String field:

"title": {
    "type": "string",
    "description": "Title of the scraped item",
    "nullable": true,
    "example": "Example Item Title"
}

Number field:

"viewCount": {
    "type": "number",
    "description": "Number of views",
    "nullable": true,
    "example": 15000
}

Boolean field:

"isVerified": {
    "type": "boolean",
    "description": "Whether the account is verified",
    "nullable": true,
    "example": true
}

Array field:

"hashtags": {
    "type": "array",
    "description": "Hashtags associated with the item",
    "items": { "type": "string" },
    "nullable": true,
    "example": ["#example", "#demo"]
}

Nested object field:

"authorInfo": {
    "type": "object",
    "description": "Information about the author",
    "properties": {
        "name": { "type": "string", "nullable": true },
        "url": { "type": "string", "nullable": true }
    },
    "required": [],
    "additionalProperties": true,
    "nullable": true,
    "example": { "name": "Example Author", "url": "https://example.com/author" }
}

Enum field:

"contentType": {
    "type": "string",
    "description": "Type of content",
    "enum": ["article", "video", "image"],
    "nullable": true,
    "example": "article"
}

Union type (e.g., TypeScript ObjectType | string):

"metadata": {
    "type": ["object", "string"],
    "description": "Structured metadata object, or error string if unavailable",
    "nullable": true,
    "example": { "key": "value" }
}

Anonymized example values

Use realistic but generic values. Follow platform ID format conventions:

Field type	Example approach
IDs	Match platform format and length (e.g., 11 chars for YouTube video IDs)
Usernames	`"exampleuser"`, `"sampleuser123"`
Display names	`"Example Channel"`, `"Sample Author"`
URLs	Use platform's standard URL format with fake IDs
Dates	`"2025-01-15T12:00:00.000Z"` (ISO 8601)
Text content	Generic descriptive text, e.g., `"This is an example description."`
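
To match a platform's ID format without using a real ID, a seeded generator produces a value that looks right and stays identical between runs. This is a sketch. It assumes YouTube video IDs are 11 characters drawn from `A-Z`, `a-z`, `0-9`, `-`, and `_`, so verify the actual format of your platform first. `fake_id` is a hypothetical helper.

```python
import random
import string

# Assumed alphabet for YouTube-style video IDs (URL-safe base64 characters).
YOUTUBE_ID_ALPHABET = string.ascii_letters + string.digits + "-_"


def fake_id(length: int = 11, alphabet: str = YOUTUBE_ID_ALPHABET, seed: int = 42) -> str:
    """Return a deterministic fake ID with the given length and alphabet."""
    rng = random.Random(seed)  # local RNG so other code's random state is untouched
    return "".join(rng.choice(alphabet) for _ in range(length))


video_id = fake_id()
example_url = f"https://www.youtube.com/watch?v={video_id}"
```

A fixed seed keeps the `example` value stable across regenerations, which avoids noisy diffs in the schema file.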

Views section

transformation.fields: List 8–12 most important field names (order = column order in UI)
display.properties: One entry per overview field with label and format
Available formats: "text", "number", "date", "link", "boolean", "image", "array", "object"

Pick fields that give users the most useful at-a-glance summary of the data.
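
The view rules above (8–12 fields, tuple order becoming column order, one display entry per field) can be enforced while building the `views` object. A minimal sketch follows. `overview_view` is a hypothetical helper, and the title and description strings are copied from the file-structure template.

```python
from typing import Any

# Available display formats.
ALLOWED_FORMATS = {"text", "number", "date", "link", "boolean", "image", "array", "object"}


def overview_view(columns: list[tuple[str, str, str]]) -> dict[str, Any]:
    """Build views.overview from (field, label, format) tuples; tuple order = column order."""
    if not 8 <= len(columns) <= 12:
        raise ValueError(f"overview should list 8-12 fields, got {len(columns)}")
    display: dict[str, Any] = {}
    for field, label, fmt in columns:
        if fmt not in ALLOWED_FORMATS:
            raise ValueError(f"unknown display format {fmt!r} for {field!r}")
        display[field] = {"label": label, "format": fmt}
    return {
        "overview": {
            "title": "Overview",
            "description": "Most important fields at a glance",
            "transformation": {"fields": [field for field, _, _ in columns]},
            "display": {"component": "table", "properties": display},
        }
    }
```

Every field passed in must also exist in `fields.properties`. The builder does not check that, so run a schema-level check afterwards.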

Phase 3: Generate `key_value_store_schema.json` (if applicable)

Goal: Define key-value store collections if the Actor stores data in the key-value store

Skip this phase if no Actor.setValue() / Actor.set_value() calls were found in Phase 1 (beyond the default INPUT key).

File structure

{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "<Descriptive title — what the key-value store contains>",
    "description": "<One sentence describing the stored data>",
    "collections": {
        "<collectionName>": {
            "title": "<Human-readable title>",
            "description": "<What this collection contains>",
            "keyPrefix": "<prefix->"
        }
    }
}

How to identify collections

Group the discovered setValue / set_value calls by key pattern:

Fixed keys (e.g., "RESULTS", "summary") — use "key" (exact match)
Dynamic keys with a prefix (e.g., "screenshot-${id}", f"image-{name}") — use "keyPrefix"

Each group becomes a collection.
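
The grouping rule above can be sketched as follows. A key pattern with no placeholder becomes a `"key"` collection. A pattern containing `${...}` (JavaScript template literal) or `{...}` (Python f-string) becomes a `"keyPrefix"` collection built from the static text before the placeholder. `group_collections` and `collection_name` are hypothetical, and the generated titles are only placeholders to edit by hand.

```python
import re

# A placeholder starts at "${" (JS template literal) or "{" (Python f-string).
PLACEHOLDER = re.compile(r"\$?\{")


def collection_name(text: str) -> str:
    """Turn a key or prefix into a collection name, e.g. "screenshot-" -> "screenshot"."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def group_collections(key_patterns: list[str]) -> dict[str, dict[str, str]]:
    """Group setValue/set_value key patterns into key-value store collections."""
    collections: dict[str, dict[str, str]] = {}
    for pattern in key_patterns:
        match = PLACEHOLDER.search(pattern)
        if match is None:
            # Fixed key: exact match via "key".
            entry = {"key": pattern}
            name = collection_name(pattern)
        else:
            prefix = pattern[: match.start()]
            if not prefix:
                raise ValueError(f"dynamic key {pattern!r} has no static prefix")
            entry = {"keyPrefix": prefix}
            name = collection_name(prefix)
        title = name.replace("-", " ").capitalize()  # placeholder title; edit by hand
        collections.setdefault(name, {"title": title, **entry})
    return collections
```

Repeated patterns with the same prefix collapse into one collection. If a fixed key and a prefix map to the same name, the first one wins, so rename such collections manually.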

Collection properties

Property	Required	Description
`title`	Yes	Shown in UI tabs
`description`	No	Shown in UI tooltips
`key`	Conditional	Exact key for single-key collections (use `key` OR `keyPrefix`, not both)
`keyPrefix`	Conditional	Prefix for multi-key collections (use OR , not both)
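
The structural rules in the collection properties table can be checked the same way. In this sketch, `validate_kv_schema` is a hypothetical helper. The version check assumes `"actorKeyValueStoreSchemaVersion": 1`, as in the examples. It does not inspect `contentTypes` or `jsonSchema` contents.

```python
from typing import Any


def validate_kv_schema(schema: dict[str, Any]) -> list[str]:
    """Check the collection rules from the table above; an empty list means they pass."""
    errors: list[str] = []
    if schema.get("actorKeyValueStoreSchemaVersion") != 1:
        errors.append('"actorKeyValueStoreSchemaVersion" should be 1')
    for name, collection in schema.get("collections", {}).items():
        if not collection.get("title"):
            errors.append(f'collections.{name}: missing required "title"')
        # Exactly one of "key" / "keyPrefix" must be present.
        if ("key" in collection) == ("keyPrefix" in collection):
            errors.append(f'collections.{name}: use either "key" or "keyPrefix", not both or neither')
    return errors
```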

Examples

Single file output (e.g., a report):

{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "Analysis Results",
    "description": "Key-value store containing analysis output",
    "collections": {
        "report": {
            "title": "Report",
            "description": "Final analysis report",
            "key": "REPORT",
            "contentTypes": ["application/json"]
        }
    }
}

Multiple files with prefix (e.g., screenshots):

{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "Scraped Files",
    "description": "Key-value store containing downloaded files and screenshots",
    "collections": {
        "screenshots": {
            "title": "Screenshots",
            "description": "Page screenshots captured during scraping",
            "keyPrefix": "screenshot-",
            "contentTypes": ["image/png", "image/jpeg"]
        },
        "documents": {
            "title": "Documents",
            "description": "Downloaded document files",
            "keyPrefix": "doc-",
            "contentTypes": ["application/pdf", "text/html"]
        }
    }
}

Phase 4: Generate `output_schema.json`

Goal: Create the output schema that tells Apify Console where to find results

For most Actors that push data to a dataset, this is a minimal file:

{
    "actorOutputSchemaVersion": 1,
    "title": "<Descriptive title — what the Actor returns>",
    "description": "<One sentence describing the output data>",
    "properties": {
        "dataset": {
            "type": "string",
            "title": "Results",
            "description": "Dataset containing all scraped data",
            "template": "{{links.apiDefaultDatasetUrl}}/items"
        }
    }
}

Critical: Each property entry must include "type": "string". This is an Apify-specific convention: the Apify meta-validator rejects properties without it, and it also rejects "type": "object". Only "string" is valid here.

If key_value_store_schema.json was generated in Phase 3, add a second property:

"files": {
    "type": "string",
    "title": "Files",
    "description": "Key-value store containing downloaded files",
    "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys"
}
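
Since output_schema.json is small and mostly fixed, it can be assembled from the two snippets above. In this minimal sketch, `build_output_schema` is hypothetical and the title and description arguments are placeholders. Every property it emits carries `"type": "string"`.

```python
import json
from typing import Any


def build_output_schema(title: str, description: str, has_kv_store: bool) -> dict[str, Any]:
    """Assemble output_schema.json; every property uses "type": "string"."""
    properties: dict[str, Any] = {
        "dataset": {
            "type": "string",
            "title": "Results",
            "description": "Dataset containing all scraped data",
            "template": "{{links.apiDefaultDatasetUrl}}/items",
        }
    }
    if has_kv_store:
        # Only when key_value_store_schema.json was generated in Phase 3.
        properties["files"] = {
            "type": "string",
            "title": "Files",
            "description": "Key-value store containing downloaded files",
            "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys",
        }
    return {
        "actorOutputSchemaVersion": 1,
        "title": title,
        "description": description,
        "properties": properties,
    }


schema = build_output_schema("Scraped items", "Items scraped by the Actor", has_kv_store=True)
print(json.dumps(schema, indent=4))
```

The `{{...}}` placeholders are plain strings here. Apify Console resolves them at display time.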

Available template variables

{{links.apiDefaultDatasetUrl}} — API URL of default dataset
{{links.apiDefaultKeyValueStoreUrl}} — API URL of default key-value store
{{links.publicRunUrl}} — Public run URL
{{links.consoleRunUrl}} — Console run URL
{{links.apiRunUrl}} — API run URL
{{links.containerRunUrl}} — URL of webserver running inside the run
{{run.defaultDatasetId}} — ID of the default dataset
{{run.defaultKeyValueStoreId}} — ID of the default key-value store

Phase 5: Update `actor.json`

Goal: Wire the schema files into the Actor configuration

Actions:

Read the current actor.json

Add or update the storages.dataset reference:

"storages": {
    "dataset": "./dataset_schema.json"
}

If key_value_store_schema.json was generated, add the reference:

"storages": {
    "dataset": "./dataset_schema.json",
    "keyValueStore": "./key_value_store_schema.json"
}

Add or update the output reference:
```
"output": "./output_schema.json"
```
If actor.json has inline storages.dataset or storages.keyValueStore objects (not string paths), migrate their content into the respective schema files and replace the inline objects with file path strings
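
The actor.json updates above can be sketched in Python. This version migrates an inline `storages` object into its schema file only when that file does not exist yet. If both an inline object and a generated file exist, it leaves that entry untouched and reports it for a manual merge, so no content is lost. `wire_actor_json` and `dump` are hypothetical helpers. The 4-space indent is an assumption, so match the repository's own formatting.

```python
import json
from pathlib import Path
from typing import Any

SCHEMA_FILES = {"dataset": "dataset_schema.json", "keyValueStore": "key_value_store_schema.json"}


def dump(data: Any) -> str:
    # 4-space indent as in the examples; match the repository's own formatting.
    return json.dumps(data, indent=4, ensure_ascii=False) + "\n"


def wire_actor_json(actor_dir: str, has_kv_store: bool) -> tuple[dict[str, Any], list[str]]:
    """Point actor.json at the schema files; return the config and files needing a manual merge."""
    base = Path(actor_dir)
    config = json.loads((base / "actor.json").read_text(encoding="utf-8"))
    storages = config.setdefault("storages", {})
    needs_merge: list[str] = []
    for kind, filename in SCHEMA_FILES.items():
        current = storages.get(kind)
        if isinstance(current, dict):
            target = base / filename
            if target.exists():
                # Inline object AND generated file: leave this entry alone, merge by hand.
                needs_merge.append(filename)
                continue
            target.write_text(dump(current), encoding="utf-8")  # migrate inline object
        elif kind == "keyValueStore" and not has_kv_store:
            continue  # no key-value store schema was generated
        storages[kind] = f"./{filename}"
    config["output"] = "./output_schema.json"
    (base / "actor.json").write_text(dump(config), encoding="utf-8")
    return config, needs_merge
```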

Phase 6: Review and Validate

Goal: Ensure correctness and completeness

Checklist:

Every output field from the source code is in dataset_schema.json fields.properties: not just the overview view fields, but ALL fields the Actor can produce
Every field has "nullable": true
The top-level fields object has both "additionalProperties": true and "required": []
Every nested object within properties also has "additionalProperties": true and "required": []
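
Two checklist items concern the wiring between files rather than the schema contents: actor.json must reference every generated file, and every output_schema.json property needs `"type": "string"`. Both can be cross-checked on disk. `check_wiring` is a hypothetical helper, and it assumes the schema files sit next to actor.json as in Phase 5.

```python
import json
from pathlib import Path


def check_wiring(actor_dir: str) -> list[str]:
    """Cross-check actor.json references and output_schema.json property types."""
    base = Path(actor_dir)
    errors: list[str] = []
    config = json.loads((base / "actor.json").read_text(encoding="utf-8"))
    refs = dict(config.get("storages", {}))
    refs["output"] = config.get("output")
    for name, ref in refs.items():
        if not isinstance(ref, str):
            errors.append(f"actor.json: {name} should be a file path string, got {ref!r}")
        elif not (base / ref).is_file():
            errors.append(f"actor.json: {name} points to a missing file {ref}")
    if (base / "key_value_store_schema.json").is_file() and "keyValueStore" not in refs:
        errors.append("actor.json: key_value_store_schema.json exists but storages.keyValueStore is not set")
    output_file = base / "output_schema.json"
    if output_file.is_file():
        output = json.loads(output_file.read_text(encoding="utf-8"))
        for prop_name, prop in output.get("properties", {}).items():
            if prop.get("type") != "string":
                errors.append(f'output_schema.json: properties.{prop_name} needs "type": "string"')
    return errors
```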

Present the generated schemas to the user for review before writing them.

Phase 7: Summary

Goal: Document what was created

Report:

Files created or updated
Number of fields in the dataset schema
Number of collections in the key-value store schema (if generated)
Fields selected for the overview view
Any fields that need user clarification (ambiguous types, unclear nullability)
Suggested next steps (test locally with apify run, verify output tab in Console)


Phase 1 actions (continued, steps 6–10)

6. Find all places where data is pushed to the dataset:
   JavaScript/TypeScript: Search for Actor.pushData(, dataset.pushData(, Dataset.pushData(
   Python: Search for Actor.push_data(, dataset.push_data(, Dataset.push_data(
7. Find all places where data is stored in the key-value store:
   JavaScript/TypeScript: Search for Actor.setValue(, keyValueStore.setValue(, KeyValueStore.setValue(
   Python: Search for Actor.set_value(, key_value_store.set_value(, KeyValueStore.set_value(
8. Find output type definitions and reuse them directly instead of recreating them from scratch:
   TypeScript: Look for output type interfaces/types (e.g., in src/types/, src/types/output.ts). If an interface or type already defines the output shape, derive the schema fields from it; do not create a parallel definition
   Python: Look for TypedDict, dataclass, or Pydantic model definitions. Use the existing field names, types, and docstrings as the source of truth
9. Check for existing shared schema utilities or helper functions in the codebase that handle schema generation or validation, and reuse them rather than creating new logic
10. If inline storages.dataset or storages.keyValueStore config exists in actor.json, note it for migration
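
The pushData/push_data and setValue/set_value searches above can be sketched as a rough, line-based scanner. It records lines that push to the dataset and string-literal keys passed to setValue/set_value, skipping the default INPUT key. Calls whose key is a variable are flagged for manual review. This is an approximation, not a parser: calls split across lines are missed or flagged as unresolved. `scan_source` and `scan_tree` are hypothetical helpers.

```python
import re
from pathlib import Path

# Call sites listed in the steps above (JavaScript/TypeScript and Python spellings).
PUSH_CALL = re.compile(r"\b(?:Actor|dataset|Dataset)\.(?:pushData|push_data)\(")
SET_VALUE_CALL = re.compile(r"\b(?:Actor|keyValueStore|KeyValueStore|key_value_store)\.(?:setValue|set_value)\(")
# A literal first argument: '...', "...", `...`, or a Python f-string.
KEY_LITERAL = re.compile(r"""\s*f?(['"`])(.*?)\1""")
SOURCE_SUFFIXES = {".js", ".mjs", ".ts", ".py"}


def scan_source(text: str) -> dict[str, list]:
    """Find dataset pushes and key-value store keys in one source file (line-based)."""
    result: dict[str, list] = {"push_lines": [], "keys": [], "unresolved_key_lines": []}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if PUSH_CALL.search(line):
            result["push_lines"].append(lineno)
        for call in SET_VALUE_CALL.finditer(line):
            literal = KEY_LITERAL.match(line, call.end())
            if literal is None:
                result["unresolved_key_lines"].append(lineno)  # key is a variable: inspect by hand
            elif literal.group(2) != "INPUT":  # the default INPUT key is not a collection
                result["keys"].append((lineno, literal.group(2)))
    return result


def scan_tree(root: str) -> dict[str, dict[str, list]]:
    """Scan every JS/TS/Python file under root, skipping dependency folders."""
    found: dict[str, dict[str, list]] = {}
    for path in sorted(Path(root).rglob("*")):
        skipped = {"node_modules", ".venv"} & set(path.relative_to(root).parts)
        if path.suffix in SOURCE_SUFFIXES and path.is_file() and not skipped:
            found[str(path)] = scan_source(path.read_text(encoding="utf-8", errors="replace"))
    return found
```

The collected key patterns, such as `screenshot-${id}`, are the input for grouping key-value store collections in Phase 3.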

Phase 6 checklist (continued)

Every field has a "description" and an "example"
All example values are anonymized
"type" is present on every field that has "nullable"
Views list the 8–12 most useful fields with correct display formats
output_schema.json has "type": "string" on every property
If a key-value store is used: key_value_store_schema.json has collections matching all setValue/set_value calls
If a key-value store is used: each collection uses either key or keyPrefix (not both)
actor.json references all generated schema files
Schema field names match the actual keys in the code (camelCase/snake_case consistency)
If existing schemas were found in the repo, the new schema follows their conventions (description style, example format, view structure)
Schema fields are derived from existing type definitions (interfaces, TypedDicts, dataclasses) where available, with no duplicated or divergent field definitions
