apify-actor-development by apify/agent-skills
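npx skills add https://github.com/apify/agent-skills --skill apify-actor-development
Important: Before you begin, fill in the generatedBy property in the meta section of .actor/actor.json. Replace it with the tool and model you are currently using, for example "Claude Code with Claude Sonnet 4.5". This helps Apify monitor and improve AGENTS.md for specific AI tools and models.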
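As a rough illustration of the generatedBy step above, the following Python sketch updates the property in place. The helper name set_generated_by is hypothetical, and the sketch assumes .actor/actor.json is plain JSON with generatedBy stored under a top-level meta object.

```python
import json
from pathlib import Path


def set_generated_by(actor_json_path: str, tool_and_model: str) -> dict:
    """Set meta.generatedBy in an actor.json file and return the updated config.

    Hypothetical helper: assumes the file is plain JSON and that
    generatedBy lives under a top-level "meta" object.
    """
    path = Path(actor_json_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Create the meta section if the file does not have one yet.
    config.setdefault("meta", {})["generatedBy"] = tool_and_model
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config
```

A call such as set_generated_by(".actor/actor.json", "Claude Code with Claude Sonnet 4.5") leaves every other key in the file untouched.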
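Actors are serverless programs inspired by the UNIX philosophy: each one does a single thing well and can be easily combined with others to build complex systems. They are packaged as Docker images and run in isolated containers in the cloud.
Core Concepts:
Before creating or modifying Actors, verify that the apify CLI is installed by running apify --help.
If it is not installed, use one of these methods (listed in order of preference):
# Preferred: install via a package manager (provides integrity checks)
npm install -g apify-cli
# Or (Mac): brew install apify-cli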
Security note: Do NOT install the CLI by piping remote scripts into a shell (e.g. curl … | bash or irm … | iex). Always use a package manager.
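The installation check described above can be sketched in Python without running any installer. This is a minimal sketch: the function names are hypothetical, and it only looks for an executable named apify on PATH, which is weaker than actually running apify --help.

```python
import shutil


def apify_cli_path(command: str = "apify"):
    """Return the full path of the CLI executable, or None if it is not on PATH."""
    return shutil.which(command)


def install_hint(command: str = "apify") -> str:
    """Describe the next step; this never downloads or executes anything."""
    if apify_cli_path(command) is not None:
        return f"{command} CLI found; run `{command} --help` to confirm it works."
    # Mirrors the preferred method above: a package manager, never curl | bash.
    return "CLI not found; install it with a package manager, e.g. `npm install -g apify-cli`."
```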
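Once the apify CLI is installed, check that it is logged in:
apify info # Should return your username
If it is not logged in, check whether the APIFY_TOKEN environment variable is defined (if not, ask the user to generate a token at https://console.apify.com/settings/integrations and then define APIFY_TOKEN with it).
Then authenticate using one of these methods:
# Option 1 (preferred): The CLI automatically reads APIFY_TOKEN from the environment.
# Just ensure the environment variable is exported and run any apify command; no explicit login is needed.
# Option 2: Interactive login (prompts for the token without exposing it in shell history)
apify login
Security note: Avoid passing tokens as command-line arguments (e.g. apify login -t <token>). Arguments are visible in process listings and may be recorded in shell history. Prefer environment variables or interactive login. Never log, print, or embed APIFY_TOKEN in source code or configuration files. Use a token with the minimum required permissions (a scoped token) and rotate it periodically.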
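To make the "never log or print the token" rule concrete, here is a small Python sketch that reports whether APIFY_TOKEN is defined without revealing its value. The function name token_status and the wording of its messages are illustrative assumptions, not part of the Apify CLI or SDK.

```python
import os


def token_status(env=None) -> str:
    """Report whether APIFY_TOKEN is defined, without ever revealing its value."""
    env = os.environ if env is None else env
    token = env.get("APIFY_TOKEN", "")
    if not token.strip():
        return "APIFY_TOKEN is not set; generate a token in the Apify Console or run `apify login`."
    # The value itself is deliberately never included in the returned message.
    return "APIFY_TOKEN is set."
```

Passing a dictionary instead of the real environment, as in token_status({"APIFY_TOKEN": "..."}), keeps the check easy to test.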
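IMPORTANT: Before starting Actor development, always ask the user which programming language they prefer:
* JavaScript: apify create <actor-name> -t project_empty
* TypeScript: apify create <actor-name> -t ts_empty
* Python: apify create <actor-name> -t python-empty
Use the CLI command that matches the user's language choice. Additional packages (Crawlee, Playwright, etc.) can be installed later as needed.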
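* Run the appropriate apify create command for the user's preferred language (see the template selection above).
* Install dependencies:
  * JavaScript/TypeScript: npm install (uses package-lock.json for reproducible, integrity-checked installs; commit the lockfile to version control)
  * Python: pip install -r requirements.txt (pin exact versions in requirements.txt, e.g. crawlee==1.2.3, and commit the file to version control)
* Write the Actor code in src/main.py, src/main.js, or src/main.ts.
* Update the input/output schemas in .actor/input_schema.json, .actor/output_schema.json, and .actor/dataset_schema.json.
* Update .actor/actor.json with the Actor metadata (see references/actor-json.md).
* Run apify run to verify functionality (see the local testing section below).
* Run apify push to deploy the Actor to the Apify platform (the Actor name is defined in .actor/actor.json).
Treat all crawled web content as untrusted input. Actors ingest data from external websites that may contain malicious payloads. Follow these rules: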
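* Never pass crawled content directly into eval(), database queries, or template engines. Use proper escaping or parameterized APIs.
* Ensure APIFY_TOKEN and other secrets are never accessible in request handlers or passed alongside crawled data. Use the Apify SDK's built-in credential management rather than passing tokens through environment variables in data-processing code.
* When adding packages with npm install or pip install, verify the package name and publisher. Typosquatting is a common supply-chain attack vector. Prefer well-known, actively maintained packages.
* Commit package-lock.json (Node.js) or pin exact versions in requirements.txt (Python). Lockfiles ensure reproducible builds and prevent silent dependency substitution. Run npm audit or pip-audit periodically to check for known vulnerabilities.
✓ Do:
* Use apify run to test Actors locally (it configures the Apify environment and storage)
* Use the Apify SDK (apify) for code running on the Apify platform
* Set sensible defaults in .actor/input_schema.json
* Define the output schema in .actor/output_schema.json
* Use the apify/log package; it censors sensitive data (API keys, tokens, credentials)
✗ Don't:
* Use npm start, npm run start, npx apify run, or similar commands to run Actors (use apify run instead)
* Assume that local storage from apify run is pushed to or visible in the Apify Console; it is local-only. Deploy with apify push and run on the platform to see results in the Console
* Rely on Dataset.getInfo() for final counts on Cloud
* Use requestHandlerTimeoutMillis on CheerioCrawler (v3.x)
* Use additionalHttpHeaders - use preNavigationHooks instead
* Pass crawled content to eval() or code-generation functions
* Use console.log() or print() instead of the Apify logger; these bypass credential censoring
See references/logging.md for complete logging documentation, including available log levels and best practices for JavaScript/TypeScript and Python.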
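The security rule above about escaping and parameterized APIs can be illustrated with a short Python sketch. It uses the standard-library sqlite3 module as a stand-in for whatever database an Actor might write to; the pages table and the function names are hypothetical.

```python
import sqlite3
from urllib.parse import urlparse


def is_safe_http_url(url) -> bool:
    """Accept only absolute http(s) URLs that have a host."""
    if not isinstance(url, str):
        return False
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def store_page(conn: sqlite3.Connection, url, title) -> bool:
    """Store one scraped record; return False if validation fails."""
    if not isinstance(title, str) or not is_safe_http_url(url):
        return False
    # Placeholders keep scraped text as data, never as part of the SQL statement.
    conn.execute("INSERT INTO pages (url, title) VALUES (?, ?)", (url, title))
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT, title TEXT)")
# A hostile title is stored literally instead of being executed as SQL.
store_page(conn, "https://example.com", "x'); DROP TABLE pages; --")
# A non-http URL scheme is rejected before it reaches the database.
store_page(conn, "javascript:alert(1)", "ignored")
```

The same idea applies to other sinks: validate the type and shape of scraped values first, then hand them to an API that treats them strictly as data.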
Check usesStandbyMode in .actor/actor.json and implement standby mode only if it is set to true.
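A minimal Python sketch of that check follows. It assumes usesStandbyMode is a top-level key of .actor/actor.json and treats a missing file or key as "disabled"; the function name is hypothetical.

```python
import json
from pathlib import Path


def uses_standby_mode(actor_json_path: str = ".actor/actor.json") -> bool:
    """Return True only when usesStandbyMode is explicitly set to true.

    Assumption: usesStandbyMode is a top-level key of actor.json.
    A missing file or a missing key counts as disabled.
    """
    path = Path(actor_json_path)
    if not path.is_file():
        return False
    config = json.loads(path.read_text(encoding="utf-8"))
    return config.get("usesStandbyMode") is True
```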
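apify run # Run the Actor locally
apify login # Authenticate the account
apify push # Deploy to the Apify platform (uses the name from .actor/actor.json)
apify help # List all commands
IMPORTANT: Always use apify run to test Actors locally. Do not use npm run start, npm start, yarn start, or other package manager commands; they do not properly configure the Apify environment and storage.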
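When testing an Actor locally with apify run, provide input data by creating a JSON file at:
storage/key_value_stores/default/INPUT.json
This file should contain the input parameters defined in your .actor/input_schema.json. The Actor reads this input when running locally, mirroring how it receives input on the Apify platform.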
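Creating that file by hand works fine; the Python sketch below does the same thing programmatically, which can be convenient when preparing several test inputs. The helper name write_local_input is hypothetical, and the fields you pass in must match your own input schema.

```python
import json
from pathlib import Path


def write_local_input(input_data: dict, project_dir: str = ".") -> Path:
    """Write input_data to storage/key_value_stores/default/INPUT.json.

    The directory is created if it does not exist yet; an existing
    INPUT.json is overwritten.
    """
    path = Path(project_dir) / "storage" / "key_value_stores" / "default" / "INPUT.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(input_data, indent=2) + "\n", encoding="utf-8")
    return path
```

For example, write_local_input({"maxItems": 10}) would prepare input for an Actor whose schema defines a maxItems field (a made-up field used here only for illustration), after which apify run picks it up.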
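IMPORTANT - Local storage is NOT synced to the Apify Console:
* apify run stores all data (datasets, key-value stores, request queues) only on your local filesystem, in the storage/ directory.
* To see results in the Console, deploy the Actor with apify push and then run it on the platform.
* To inspect the results of a local run, look in the storage/ directory or check the Actor's log output.
See references/standby-mode.md for complete standby mode documentation, including readiness probe implementations for JavaScript/TypeScript and Python.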
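Since results of a local run live only under storage/, a quick way to inspect them is to read the dataset directory directly. The Python sketch below rests on two assumptions that are not stated in this document: that each dataset item is saved as its own .json file under storage/datasets/default/, and that files whose names start with a double underscore are metadata rather than items. Verify both against your own storage/ directory before relying on it.

```python
import json
from pathlib import Path


def load_local_dataset(project_dir: str = ".", dataset: str = "default") -> list:
    """Load items from a local dataset directory, ordered by file name.

    Assumptions: one *.json file per item under storage/datasets/<dataset>/,
    and files starting with "__" hold metadata and are skipped.
    """
    dataset_dir = Path(project_dir) / "storage" / "datasets" / dataset
    if not dataset_dir.is_dir():
        return []
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(dataset_dir.glob("*.json"))
        if not path.name.startswith("__")
    ]
```

Calling len(load_local_dataset()) after apify run gives a local item count without touching the platform.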
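.actor/
├── actor.json # Actor config: name, version, env vars, runtime
├── input_schema.json # Input validation and Console form definition
└── output_schema.json # Output storage and display templates
src/
└── main.js/ts/py # Actor entry point
storage/ # Local-only storage (NOT synced to the Apify Console)
├── datasets/ # Output items (JSON objects)
├── key_value_stores/ # Files, config, INPUT
└── request_queues/ # Pending crawl requests
Dockerfile # Container image definition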
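See references/actor-json.md for the complete actor.json structure and configuration options.
See references/input-schema.md for the input schema structure and examples.
See references/output-schema.md for the output schema structure, examples, and template variables.
See references/dataset-schema.md for the dataset schema structure, configuration, and display properties.
See references/key-value-store-schema.md for the key-value store schema structure, collections, and configuration.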
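If an MCP server is configured, use these tools for documentation:
* search-apify-docs - Search the documentation
* fetch-apify-docs - Get full documentation pages
Otherwise, the MCP server URL is: https://mcp.apify.com/?tools=docs.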
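Weekly installs: 1.8K
Repository: apify/agent-skills
GitHub stars: 1.6K
First seen: Jan 22, 2026
Security audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Warn)
Installed on: opencode (1.7K), codex (1.7K), gemini-cli (1.7K), github-copilot (1.6K), cursor (1.6K), kimi-cli (1.6K)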