web-reader by answerzhao/agent-skills
npx skills add https://github.com/answerzhao/agent-skills --skill web-reader

此技能指导使用 z-ai-web-dev-sdk 包实现网页读取和内容提取功能,使应用程序能够以编程方式获取和处理网页内容。
技能位置 : {project_path}/skills/web-reader
此技能位于您项目的上述路径中。
参考脚本 : 示例测试脚本位于 {技能位置}/scripts/ 目录中,用于快速测试和参考。请参阅 {技能位置}/scripts/web-reader.ts 以获取工作示例。
Web Reader 允许您构建能够从网页提取内容、检索文章元数据和处理 HTML 内容的应用程序。该 API 自动处理内容提取,从任何网页 URL 提供干净、结构化的数据。
重要提示 : z-ai-web-dev-sdk 必须仅在后端代码中使用。切勿在客户端代码中使用它。
z-ai-web-dev-sdk 包已安装。请按照以下示例所示导入它。
对于简单的网页内容提取,您可以使用 z-ai CLI 而无需编写代码。这非常适合快速内容抓取、测试 URL 或简单的自动化任务。
# 从网页提取内容
z-ai function --name "page_reader" --args '{"url": "https://example.com"}'
# 使用短选项
z-ai function -n page_reader -a '{"url": "https://www.example.com/article"}'
广告位招租
在这里展示您的产品或服务
触达数万 AI 开发者,精准高效
# 将提取的内容保存到 JSON 文件
z-ai function \
-n page_reader \
-a '{"url": "https://news.example.com/article"}' \
-o page_content.json
# 提取并保存博客文章
z-ai function \
-n page_reader \
-a '{"url": "https://blog.example.com/post/123"}' \
-o blog_post.json
# 提取新闻文章
z-ai function \
-n page_reader \
-a '{"url": "https://news.site.com/breaking-news"}' \
-o news.json
# 读取文档页面
z-ai function \
-n page_reader \
-a '{"url": "https://docs.example.com/getting-started"}' \
-o docs.json
# 抓取博客内容
z-ai function \
-n page_reader \
-a '{"url": "https://techblog.com/ai-trends-2024"}' \
-o blog.json
# 提取研究文章
z-ai function \
-n page_reader \
-a '{"url": "https://research.org/papers/quantum-computing"}' \
-o research.json
--name, -n: 必需 - 函数名称(使用 "page_reader")
--args, -a: 必需 - JSON 参数对象,包含:
  url (字符串,必需): 要读取的网页 URL
--output, -o <path>: 可选 - 输出文件路径(JSON 格式)

CLI 返回一个包含以下内容的 JSON 对象:
title: 页面标题
html: 主要内容 HTML
text: 纯文本内容
publish_time: 发布时间戳(如果可用)
url: 原始 URL
metadata: 额外的页面元数据
{
"title": "Introduction to Machine Learning",
"html": "<article><h1>Introduction to Machine Learning</h1><p>Machine learning is...</p></article>",
"text": "Introduction to Machine Learning\n\nMachine learning is...",
"publish_time": "2024-01-15T10:30:00Z",
"url": "https://example.com/ml-intro",
"metadata": {
"author": "John Doe",
"description": "A comprehensive guide to ML"
}
}
# 创建一个简单的脚本来处理多个 URL
for url in \
  "https://site1.com/article1" \
  "https://site2.com/article2" \
  "https://site3.com/article3"
do
  # Use the MD5 digest of the URL as a filesystem-safe output name.
  # (Fix: the computed $filename was previously unused — the output path
  # was the garbled literal "$(unknown).json".)
  filename=$(printf '%s' "$url" | md5sum | cut -d' ' -f1)
  z-ai function -n page_reader -a "{\"url\": \"$url\"}" -o "${filename}.json"
done
使用 CLI 适用于:
使用 SDK 适用于:
Web Reader 使用 page_reader 函数来:
import ZAI from 'z-ai-web-dev-sdk';
/**
 * Reads a single web page via the page_reader function and logs a summary.
 * @param {string} url - Page URL to read.
 * @returns {Promise<object>} The page payload (title, url, html, ...).
 * @throws Re-throws any SDK error after logging it.
 */
async function readWebPage(url) {
  try {
    const client = await ZAI.create();
    const { data } = await client.functions.invoke('page_reader', { url: url });
    console.log('Title:', data.title);
    console.log('URL:', data.url);
    console.log('Published:', data.publishedTime);
    console.log('HTML Content:', data.html);
    console.log('Tokens Used:', data.usage.tokens);
    return data;
  } catch (error) {
    console.error('Page reading failed:', error.message);
    throw error;
  }
}
// Usage (top-level await requires an ES-module context)
const pageData = await readWebPage('https://example.com/article');
console.log('Page title:', pageData.title);
import ZAI from 'z-ai-web-dev-sdk';
/**
 * Fetches a page and reduces its HTML to plain text (naive tag strip).
 * @param {string} url - Page URL to read.
 * @returns {Promise<{title: string, text: string, url: string, publishedTime: string}>}
 */
async function extractArticleText(url) {
  const client = await ZAI.create();
  const { data } = await client.functions.invoke('page_reader', { url: url });
  // Basic HTML-to-text: drop tags, then collapse runs of whitespace.
  const plainText = data.html
    .replace(/<[^>]*>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
  return {
    title: data.title,
    text: plainText,
    url: data.url,
    publishedTime: data.publishedTime
  };
}
// 用法
const article = await extractArticleText('https://news.example.com/story');
console.log(article.title);
console.log(article.text.substring(0, 200) + '...');
import ZAI from 'z-ai-web-dev-sdk';
/**
 * Reads several pages sequentially without ever throwing: each URL yields
 * a {url, success, data|error} record in input order.
 * @param {string[]} urls - Page URLs to read.
 */
async function readMultiplePages(urls) {
  const client = await ZAI.create();
  const outcomes = [];
  for (const url of urls) {
    try {
      const response = await client.functions.invoke('page_reader', { url: url });
      outcomes.push({ url: url, success: true, data: response.data });
    } catch (error) {
      outcomes.push({ url: url, success: false, error: error.message });
    }
  }
  return outcomes;
}
// 用法
const urls = [
'https://example.com/article1',
'https://example.com/article2',
'https://example.com/article3'
];
const pages = await readMultiplePages(urls);
pages.forEach(page => {
if (page.success) {
console.log(`✓ ${page.data.title}`);
} else {
console.log(`✗ ${page.url}: ${page.error}`);
}
});
import ZAI from 'z-ai-web-dev-sdk';
/**
 * Analyzes web pages fetched via the z-ai page_reader function,
 * with an in-memory cache keyed by URL.
 */
class WebContentAnalyzer {
  constructor() {
    this.cache = new Map();
  }
  /** Creates the SDK client; must be called before any fetch. */
  async initialize() {
    this.zai = await ZAI.create();
  }
  /**
   * Fetches a page, serving it from the cache when possible.
   * @param {string} url - Page URL to read.
   * @param {boolean} useCache - Whether to read and write the cache.
   */
  async readPage(url, useCache = true) {
    if (useCache && this.cache.has(url)) {
      console.log('Returning cached result for:', url);
      return this.cache.get(url);
    }
    const response = await this.zai.functions.invoke('page_reader', { url: url });
    if (useCache) {
      this.cache.set(url, response.data);
    }
    return response.data;
  }
  /** Returns title/url/date plus size statistics for a page. */
  async getPageMetadata(url) {
    const page = await this.readPage(url);
    return {
      title: page.title,
      url: page.url,
      publishedTime: page.publishedTime,
      contentLength: page.html.length,
      wordCount: this.estimateWordCount(page.html)
    };
  }
  /** Rough word count: strip tags, then count whitespace-separated tokens. */
  estimateWordCount(html) {
    return html
      .replace(/<[^>]*>/g, ' ')
      .split(/\s+/)
      .filter((token) => token.length > 0)
      .length;
  }
  /** Fetches two pages concurrently and returns side-by-side stats. */
  async comparePages(url1, url2) {
    const [first, second] = await Promise.all([
      this.readPage(url1),
      this.readPage(url2)
    ]);
    const summarize = (page) => ({
      title: page.title,
      wordCount: this.estimateWordCount(page.html),
      published: page.publishedTime
    });
    return { page1: summarize(first), page2: summarize(second) };
  }
  /** Drops all cached pages. */
  clearCache() {
    this.cache.clear();
  }
}
// Usage (top-level await requires an ES-module context)
const analyzer = new WebContentAnalyzer();
await analyzer.initialize();
const metadata = await analyzer.getPageMetadata('https://example.com/article');
console.log('Article Metadata:', metadata);
const comparison = await analyzer.comparePages(
  'https://example.com/article1',
  'https://example.com/article2'
);
console.log('Comparison:', comparison);
import ZAI from 'z-ai-web-dev-sdk';
/**
 * Fetches articles from a list of URLs via page_reader and offers
 * in-memory sorting and keyword search over the results.
 */
class FeedReader {
  constructor() {
    this.articles = [];
  }
  /** Creates the SDK client; call once before fetching. */
  async initialize() {
    this.zai = await ZAI.create();
  }
  /**
   * Fetches each URL sequentially; failures are logged and skipped.
   * Replaces and returns the stored article list.
   */
  async fetchArticlesFromUrls(urls) {
    const collected = [];
    for (const url of urls) {
      try {
        const response = await this.zai.functions.invoke('page_reader', { url: url });
        collected.push({
          title: response.data.title,
          url: response.data.url,
          publishedTime: response.data.publishedTime,
          content: response.data.html,
          fetchedAt: new Date().toISOString()
        });
        console.log(`Fetched: ${response.data.title}`);
      } catch (error) {
        console.error(`Failed to fetch ${url}:`, error.message);
      }
    }
    this.articles = collected;
    return collected;
  }
  /** Newest articles first, by publish time (fetch time as fallback). */
  getRecentArticles(limit = 10) {
    const byDateDesc = (a, b) =>
      new Date(b.publishedTime || b.fetchedAt) - new Date(a.publishedTime || a.fetchedAt);
    return this.articles.sort(byDateDesc).slice(0, limit);
  }
  /** Case-insensitive keyword match against title and body. */
  searchArticles(keyword) {
    const needle = keyword.toLowerCase();
    return this.articles.filter(
      (article) => `${article.title} ${article.content}`.toLowerCase().includes(needle)
    );
  }
}
// 用法
const reader = new FeedReader();
await reader.initialize();
const feedUrls = [
'https://example.com/article1',
'https://example.com/article2',
'https://example.com/article3'
];
await reader.fetchArticlesFromUrls(feedUrls);
const recent = reader.getRecentArticles(5);
console.log('Recent articles:', recent.map(a => a.title));
import ZAI from 'z-ai-web-dev-sdk';
/**
 * Fetches each source URL and builds a summary: per-source word counts and
 * excerpts plus a running total. Failed fetches are logged and skipped.
 * @param {string[]} urls - Source URLs.
 * @param {{delay?: number}} options - Optional pause (ms) between fetches.
 */
async function aggregateContent(urls, options = {}) {
  const client = await ZAI.create();
  const summary = {
    sources: [],
    totalWords: 0,
    aggregatedAt: new Date().toISOString()
  };
  for (const url of urls) {
    try {
      const response = await client.functions.invoke('page_reader', { url: url });
      const text = response.data.html.replace(/<[^>]*>/g, ' ');
      const wordCount = text.split(/\s+/).filter((w) => w.length > 0).length;
      summary.sources.push({
        title: response.data.title,
        url: response.data.url,
        publishedTime: response.data.publishedTime,
        wordCount: wordCount,
        excerpt: text.substring(0, 200).trim() + '...'
      });
      summary.totalWords += wordCount;
      // Optional politeness delay between requests.
      if (options.delay) {
        await new Promise((resolve) => setTimeout(resolve, options.delay));
      }
    } catch (error) {
      console.error(`Failed to fetch ${url}:`, error.message);
    }
  }
  return summary;
}
// 用法
const sources = [
'https://example.com/news1',
'https://example.com/news2',
'https://example.com/news3'
];
const aggregated = await aggregateContent(sources, { delay: 1000 });
console.log(`Aggregated ${aggregated.sources.length} sources`);
console.log(`Total words: ${aggregated.totalWords}`);
import ZAI from 'z-ai-web-dev-sdk';
/**
 * Fetches a page once and runs it through a list of named processors,
 * collecting each processor's output (or null when it fails).
 */
class ScrapingPipeline {
  constructor() {
    this.processors = [];
  }
  /** Creates the SDK client; call once before scrape. */
  async initialize() {
    this.zai = await ZAI.create();
  }
  /** Registers a processor; `processorFn` receives the raw page data. */
  addProcessor(name, processorFn) {
    this.processors.push({ name, fn: processorFn });
  }
  /** Fetches `url` and applies every registered processor in order. */
  async scrape(url) {
    const response = await this.zai.functions.invoke('page_reader', { url: url });
    let data = {
      raw: response.data,
      processed: {}
    };
    for (const { name, fn } of this.processors) {
      try {
        data.processed[name] = await fn(data.raw);
        console.log(`✓ Processed with ${name}`);
      } catch (error) {
        // A failing processor never aborts the pipeline.
        console.error(`✗ Failed ${name}:`, error.message);
        data.processed[name] = null;
      }
    }
    return data;
  }
}
// Processor functions for ScrapingPipeline

/** Collects every absolute http(s) link in the page HTML, de-duplicated. */
function extractLinks(pageData) {
  const found = new Set();
  const pattern = /href=["'](https?:\/\/[^"']+)["']/g;
  for (const hit of pageData.html.matchAll(pattern)) {
    found.add(hit[1]);
  }
  return [...found];
}

/** Collects absolute image URLs (common raster formats), de-duplicated. */
function extractImages(pageData) {
  const found = new Set();
  const pattern = /src=["'](https?:\/\/[^"']+\.(jpg|jpeg|png|gif|webp))["']/gi;
  for (const hit of pageData.html.matchAll(pattern)) {
    found.add(hit[1]);
  }
  return [...found];
}

/** Strips scripts, styles and tags, collapsing whitespace to single spaces. */
function extractPlainText(pageData) {
  const withoutEmbedded = pageData.html
    .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '')
    .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, '');
  return withoutEmbedded
    .replace(/<[^>]*>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}
// 用法
const pipeline = new ScrapingPipeline();
await pipeline.initialize();
pipeline.addProcessor('links', extractLinks);
pipeline.addProcessor('images', extractImages);
pipeline.addProcessor('plainText', extractPlainText);
const result = await pipeline.scrape('https://example.com/article');
console.log('Links found:', result.processed.links.length);
console.log('Images found:', result.processed.images.length);
console.log('Text length:', result.processed.plainText.length);
{
code: 200,
status: 200,
data: {
title: "Article Title",
url: "https://example.com/article",
html: "<div>Article content...</div>",
publishedTime: "2025-01-15T10:30:00Z",
usage: {
tokens: 1500
}
},
meta: {
usage: {
tokens: 1500
}
}
}
| 字段 | 类型 | 描述 |
|---|---|---|
code | 数字 | 响应状态码 |
status | 数字 | HTTP 状态码 |
data.title | 字符串 | 页面标题 |
data.url | 字符串 | 页面 URL |
data.html | 字符串 | 提取的 HTML 内容 |
data.publishedTime | 字符串 | 发布日期(可选) |
data.usage.tokens | 数字 | 处理使用的令牌数 |
meta.usage.tokens | 数字 | 使用的总令牌数 |
/**
 * Reads a page defensively: validates the URL, checks the response status
 * and verifies essential fields, returning a {success, data|error} result
 * instead of throwing.
 * @param {string} url - Absolute http(s) URL to read.
 * @returns {Promise<{success: true, data: object} | {success: false, error: string}>}
 */
async function safeReadPage(url) {
  try {
    // Validate the URL before creating the SDK client. Parsing with `new URL`
    // (instead of the previous startsWith('http') prefix check) also rejects
    // look-alike schemes such as "httpfoo://".
    let parsed;
    try {
      parsed = new URL(url);
    } catch {
      throw new Error('Invalid URL format');
    }
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
      throw new Error('Invalid URL format');
    }
    const zai = await ZAI.create();
    const result = await zai.functions.invoke('page_reader', {
      url: url
    });
    // 检查响应状态
    if (result.code !== 200) {
      throw new Error(`Failed to fetch page: ${result.code}`);
    }
    // 验证基本数据
    if (!result.data.html || !result.data.title) {
      throw new Error('Incomplete page data received');
    }
    return {
      success: true,
      data: result.data
    };
  } catch (error) {
    console.error('Page reading error:', error);
    return {
      success: false,
      error: error.message
    };
  }
}
/**
 * Wraps page_reader calls with a simple sliding-window rate limit
 * (at most `requestsPerMinute` requests in any 60-second window).
 */
class RateLimitedReader {
  constructor(requestsPerMinute = 10) {
    this.requestsPerMinute = requestsPerMinute;
    this.requestTimes = [];
  }
  /** Creates the SDK client; call once before readPage. */
  async initialize() {
    this.zai = await ZAI.create();
  }
  /** Reads one page, sleeping first if the window is full. */
  async readPage(url) {
    await this.waitForRateLimit();
    const response = await this.zai.functions.invoke('page_reader', { url: url });
    this.requestTimes.push(Date.now());
    return response.data;
  }
  /** Prunes stale timestamps and sleeps until a request slot frees up. */
  async waitForRateLimit() {
    const now = Date.now();
    const windowStart = now - 60000;
    // Keep only requests made inside the current one-minute window.
    this.requestTimes = this.requestTimes.filter((stamp) => stamp > windowStart);
    if (this.requestTimes.length < this.requestsPerMinute) {
      return;
    }
    const [oldest] = this.requestTimes;
    const delay = 60000 - (now - oldest);
    if (delay > 0) {
      console.log(`Rate limit reached. Waiting ${delay}ms...`);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
// 用法
const reader = new RateLimitedReader(10); // 每分钟 10 个请求
await reader.initialize();
const urls = ['https://example.com/1', 'https://example.com/2'];
for (const url of urls) {
const data = await reader.readPage(url);
console.log('Fetched:', data.title);
}
import ZAI from 'z-ai-web-dev-sdk';
/**
 * page_reader client with a TTL-based in-memory cache keyed by URL.
 */
class CachedWebReader {
  constructor(cacheDuration = 3600000) { // TTL in ms; default one hour
    this.cache = new Map();
    this.cacheDuration = cacheDuration;
  }
  /** Creates the SDK client; call once before readPage. */
  async initialize() {
    this.zai = await ZAI.create();
  }
  /**
   * Returns page data for `url`, re-fetching when the cached entry is
   * missing, expired, or `forceRefresh` is set.
   */
  async readPage(url, forceRefresh = false) {
    const entry = this.cache.get(url);
    const isFresh = entry && Date.now() - entry.timestamp < this.cacheDuration;
    if (isFresh && !forceRefresh) {
      console.log('Returning cached content for:', url);
      return entry.data;
    }
    const response = await this.zai.functions.invoke('page_reader', { url: url });
    this.cache.set(url, { data: response.data, timestamp: Date.now() });
    return response.data;
  }
  /** Empties the cache. */
  clearCache() {
    this.cache.clear();
  }
  /** Reports how many URLs are cached and which ones. */
  getCacheStats() {
    return {
      size: this.cache.size,
      entries: Array.from(this.cache.keys())
    };
  }
}
// Usage
const reader = new CachedWebReader(3600000); // 1-hour cache TTL
await reader.initialize();
const data1 = await reader.readPage('https://example.com'); // fresh fetch
const data2 = await reader.readPage('https://example.com'); // served from cache
const data3 = await reader.readPage('https://example.com', true); // forced refresh
import ZAI from 'z-ai-web-dev-sdk';
/**
 * Reads pages in batches of `concurrency`, collecting per-URL outcomes.
 * Each entry is {url, success, data|error}; a failed fetch never rejects.
 * @param {string[]} urls - Page URLs to read.
 * @param {number} concurrency - Max simultaneous requests per batch.
 */
async function readPagesInParallel(urls, concurrency = 3) {
  const client = await ZAI.create();
  const collected = [];
  // Turn a single fetch into an always-fulfilling outcome record.
  const readOne = (url) =>
    client.functions.invoke('page_reader', { url })
      .then((response) => ({ url: url, success: true, data: response.data }))
      .catch((error) => ({ url: url, success: false, error: error.message }));
  for (let start = 0; start < urls.length; start += concurrency) {
    const batch = urls.slice(start, start + concurrency);
    const settled = await Promise.allSettled(batch.map(readOne));
    // Every promise fulfills (errors were caught above), so .value is safe.
    collected.push(...settled.map((entry) => entry.value));
    console.log(`Completed batch ${Math.floor(start / concurrency) + 1}`);
  }
  return collected;
}
// 用法
const urls = [
'https://example.com/1',
'https://example.com/2',
'https://example.com/3',
'https://example.com/4',
'https://example.com/5'
];
const results = await readPagesInParallel(urls, 2); // 2 个并发请求
results.forEach(result => {
if (result.success) {
console.log(`✓ ${result.data.title}`);
} else {
console.log(`✗ ${result.url}: ${result.error}`);
}
});
import ZAI from 'z-ai-web-dev-sdk';
/**
 * Static helpers for post-processing HTML returned by page_reader.
 */
class ContentProcessor {
  /** Removes scripts, styles and HTML comments, keeping the markup. */
  static extractMainContent(html) {
    let content = html
      .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '')
      .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, '')
      .replace(/<!--[\s\S]*?-->/g, '');
    return content;
  }
  /**
   * Converts HTML to plain text: turns <br>/<p> into line breaks, strips
   * tags, decodes common HTML entities and collapses whitespace.
   *
   * BUG FIX: the entity replacements previously matched already-decoded
   * characters (e.g. `.replace(/&/g, '&')`), making them no-ops; they now
   * match the actual entities. `&amp;` is decoded last so that input like
   * "&amp;lt;" is not double-decoded.
   */
  static htmlToPlainText(html) {
    return html
      .replace(/<br\s*\/?>/gi, '\n')
      .replace(/<\/p>/gi, '\n\n')
      .replace(/<[^>]*>/g, '')
      .replace(/&nbsp;/g, ' ')
      .replace(/&lt;/g, '<')
      .replace(/&gt;/g, '>')
      .replace(/&quot;/g, '"')
      .replace(/&#39;/g, "'")
      .replace(/&amp;/g, '&')
      .replace(/\s+/g, ' ')
      .trim();
  }
  /** Pulls description/keywords/author out of <meta> tags, when present. */
  static extractMetadata(html) {
    const metadata = {};
    // Meta description
    const descMatch = html.match(/<meta\s+name=["']description["']\s+content=["']([^"']+)["']/i);
    if (descMatch) metadata.description = descMatch[1];
    // Comma-separated keywords
    const keywordsMatch = html.match(/<meta\s+name=["']keywords["']\s+content=["']([^"']+)["']/i);
    if (keywordsMatch) metadata.keywords = keywordsMatch[1].split(',').map(k => k.trim());
    // Author
    const authorMatch = html.match(/<meta\s+name=["']author["']\s+content=["']([^"']+)["']/i);
    if (authorMatch) metadata.author = authorMatch[1];
    return metadata;
  }
}
// Usage: combine page_reader with the ContentProcessor helpers above.
/**
 * Fetches a page and returns cleaned variants of its content plus metadata.
 * @param {string} url - Page URL to read.
 */
async function processWebPage(url) {
  const zai = await ZAI.create();
  const result = await zai.functions.invoke('page_reader', { url });
  return {
    title: result.data.title,
    url: result.data.url,
    mainContent: ContentProcessor.extractMainContent(result.data.html),
    plainText: ContentProcessor.htmlToPlainText(result.data.html),
    metadata: ContentProcessor.extractMetadata(result.data.html),
    publishedTime: result.data.publishedTime
  };
}
const processed = await processWebPage('https://example.com/article');
console.log('Processed content:', processed.title);
import express from 'express';
import ZAI from 'z-ai-web-dev-sdk';
// Create the HTTP server and enable JSON request-body parsing.
const app = express();
app.use(express.json());

// Shared SDK client; created once at startup (see initZAI / bottom of file).
let zaiInstance;
async function initZAI() {
  zaiInstance = await ZAI.create();
}

// POST /api/read-page — reads a single page. Body: { url: string }.
app.post('/api/read-page', async (req, res) => {
  try {
    const { url } = req.body;
    if (!url) {
      return res.status(400).json({
        error: 'URL is required'
      });
    }
    const result = await zaiInstance.functions.invoke('page_reader', {
      url: url
    });
    res.json({
      success: true,
      data: {
        title: result.data.title,
        url: result.data.url,
        content: result.data.html,
        publishedTime: result.data.publishedTime,
        tokensUsed: result.data.usage.tokens
      }
    });
  } catch (error) {
    // Any SDK or runtime failure maps to a 500 with the error message.
    res.status(500).json({
      success: false,
      error: error.message
    });
  }
});

// POST /api/read-multiple — reads many pages concurrently.
// Body: { urls: string[] }. Each entry reports success per URL.
app.post('/api/read-multiple', async (req, res) => {
  try {
    const { urls } = req.body;
    if (!urls || !Array.isArray(urls)) {
      return res.status(400).json({
        error: 'URLs array is required'
      });
    }
    const results = await Promise.allSettled(
      urls.map(url =>
        zaiInstance.functions.invoke('page_reader', { url })
          .then(result => ({
            url: url,
            success: true,
            data: result.data
          }))
          .catch(error => ({
            url: url,
            success: false,
            error: error.message
          }))
      )
    );
    res.json({
      success: true,
      // Every promise fulfills (errors are caught above), so .value is safe.
      results: results.map(r => r.value)
    });
  } catch (error) {
    res.status(500).json({
      success: false,
      error: error.message
    });
  }
});

// Only start listening after the SDK client is ready, so the route
// handlers can rely on zaiInstance being initialized.
initZAI().then(() => {
  app.listen(3000, () => {
    console.log('Web reader API running on port 3000');
  });
});
import ZAI from 'z-ai-web-dev-sdk';
import cron from 'node-cron';
/**
 * Periodically fetches configured URLs on cron schedules, keeping the
 * most recent 100 fetch results in memory.
 */
class ScheduledFetcher {
  constructor() {
    this.urls = [];
    this.results = [];
  }
  /** Creates the SDK client; call once before start(). */
  async initialize() {
    this.zai = await ZAI.create();
  }
  /** Registers a URL with a cron schedule expression. */
  addUrl(url, schedule) {
    this.urls.push({ url, schedule });
  }
  /** Fetches one URL, returning a success or failure record (never throws). */
  async fetchContent(url) {
    try {
      const response = await this.zai.functions.invoke('page_reader', { url: url });
      return {
        url: url,
        success: true,
        title: response.data.title,
        content: response.data.html,
        fetchedAt: new Date().toISOString()
      };
    } catch (error) {
      return {
        url: url,
        success: false,
        error: error.message,
        fetchedAt: new Date().toISOString()
      };
    }
  }
  /** Schedules repeated fetches of `url` and records each outcome. */
  startScheduledFetch(url, schedule) {
    cron.schedule(schedule, async () => {
      console.log(`Fetching ${url}...`);
      const outcome = await this.fetchContent(url);
      this.results.push(outcome);
      // Bound memory: retain only the latest 100 results.
      if (this.results.length > 100) {
        this.results = this.results.slice(-100);
      }
      console.log(`Fetched: ${outcome.success ? outcome.title : outcome.error}`);
    });
  }
  /** Starts all registered schedules. */
  start() {
    for (const { url, schedule } of this.urls) {
      this.startScheduledFetch(url, schedule);
    }
  }
  /** Returns the retained fetch results (most recent last). */
  getResults() {
    return this.results;
  }
}
// Usage (top-level await requires an ES-module context)
const fetcher = new ScheduledFetcher();
await fetcher.initialize();
// Fetch every hour, on the hour
fetcher.addUrl('https://example.com/news', '0 * * * *');
// Fetch daily at midnight
fetcher.addUrl('https://example.com/daily', '0 0 * * *');
fetcher.start();
console.log('Scheduled fetching started');
问题 : "SDK 必须用于后端"
问题 : 无法获取页面(404、403 等)
问题 : 内容不完整或缺失
问题 : 令牌使用量高
问题 : 响应时间慢
问题 : HTML 内容为空
每周安装量
1.4K
仓库
GitHub 星标数
24
首次出现时间
2026年1月23日
安全审计
安装于
opencode1.2K
gemini-cli1.2K
codex1.2K
github-copilot1.2K
kimi-cli1.2K
amp1.2K
This skill guides the implementation of web page reading and content extraction functionality using the z-ai-web-dev-sdk package, enabling applications to fetch and process web page content programmatically.
Skill Location : {project_path}/skills/web-reader
This skill is located at the above path in your project.
Reference Scripts : Example test scripts are available in the {Skill Location}/scripts/ directory for quick testing and reference. See {Skill Location}/scripts/web-reader.ts for a working example.
Web Reader allows you to build applications that can extract content from web pages, retrieve article metadata, and process HTML content. The API automatically handles content extraction, providing clean, structured data from any web URL.
IMPORTANT : z-ai-web-dev-sdk MUST be used in backend code only. Never use it in client-side code.
The z-ai-web-dev-sdk package is already installed. Import it as shown in the examples below.
For simple web page content extraction, you can use the z-ai CLI instead of writing code. This is ideal for quick content scraping, testing URLs, or simple automation tasks.
# Extract content from a web page
z-ai function --name "page_reader" --args '{"url": "https://example.com"}'
# Using short options
z-ai function -n page_reader -a '{"url": "https://www.example.com/article"}'
# Save extracted content to JSON file
z-ai function \
-n page_reader \
-a '{"url": "https://news.example.com/article"}' \
-o page_content.json
# Extract and save blog post
z-ai function \
-n page_reader \
-a '{"url": "https://blog.example.com/post/123"}' \
-o blog_post.json
# Extract news article
z-ai function \
-n page_reader \
-a '{"url": "https://news.site.com/breaking-news"}' \
-o news.json
# Read documentation page
z-ai function \
-n page_reader \
-a '{"url": "https://docs.example.com/getting-started"}' \
-o docs.json
# Scrape blog content
z-ai function \
-n page_reader \
-a '{"url": "https://techblog.com/ai-trends-2024"}' \
-o blog.json
# Extract research article
z-ai function \
-n page_reader \
-a '{"url": "https://research.org/papers/quantum-computing"}' \
-o research.json
--name, -n: Required - Function name (use "page_reader")
--args, -a: Required - JSON arguments object with:
  url (string, required): The URL of the web page to read
--output, -o <path>: Optional - Output file path (JSON format)

The CLI returns a JSON object containing:
title: Page title
html: Main content HTML
text: Plain text content
publish_time: Publication timestamp (if available)
url: Original URL
metadata: Additional page metadata
{
"title": "Introduction to Machine Learning",
"html": "<article><h1>Introduction to Machine Learning</h1><p>Machine learning is...</p></article>",
"text": "Introduction to Machine Learning\n\nMachine learning is...",
"publish_time": "2024-01-15T10:30:00Z",
"url": "https://example.com/ml-intro",
"metadata": {
"author": "John Doe",
"description": "A comprehensive guide to ML"
}
}
# Create a simple script to process multiple URLs
for url in \
  "https://site1.com/article1" \
  "https://site2.com/article2" \
  "https://site3.com/article3"
do
  # Use the MD5 digest of the URL as a filesystem-safe output name.
  # (Fix: the computed $filename was previously unused — the output path
  # was the garbled literal "$(unknown).json".)
  filename=$(printf '%s' "$url" | md5sum | cut -d' ' -f1)
  z-ai function -n page_reader -a "{\"url\": \"$url\"}" -o "${filename}.json"
done
Use CLI for:
Use SDK for:
The Web Reader uses the page_reader function to:
import ZAI from 'z-ai-web-dev-sdk';
async function readWebPage(url) {
try {
const zai = await ZAI.create();
const result = await zai.functions.invoke('page_reader', {
url: url
});
console.log('Title:', result.data.title);
console.log('URL:', result.data.url);
console.log('Published:', result.data.publishedTime);
console.log('HTML Content:', result.data.html);
console.log('Tokens Used:', result.data.usage.tokens);
return result.data;
} catch (error) {
console.error('Page reading failed:', error.message);
throw error;
}
}
// Usage
const pageData = await readWebPage('https://example.com/article');
console.log('Page title:', pageData.title);
import ZAI from 'z-ai-web-dev-sdk';
async function extractArticleText(url) {
const zai = await ZAI.create();
const result = await zai.functions.invoke('page_reader', {
url: url
});
// Convert HTML to plain text (basic approach)
const plainText = result.data.html
.replace(/<[^>]*>/g, ' ')
.replace(/\s+/g, ' ')
.trim();
return {
title: result.data.title,
text: plainText,
url: result.data.url,
publishedTime: result.data.publishedTime
};
}
// Usage
const article = await extractArticleText('https://news.example.com/story');
console.log(article.title);
console.log(article.text.substring(0, 200) + '...');
import ZAI from 'z-ai-web-dev-sdk';
async function readMultiplePages(urls) {
const zai = await ZAI.create();
const results = [];
for (const url of urls) {
try {
const result = await zai.functions.invoke('page_reader', {
url: url
});
results.push({
url: url,
success: true,
data: result.data
});
} catch (error) {
results.push({
url: url,
success: false,
error: error.message
});
}
}
return results;
}
// Usage
const urls = [
'https://example.com/article1',
'https://example.com/article2',
'https://example.com/article3'
];
const pages = await readMultiplePages(urls);
pages.forEach(page => {
if (page.success) {
console.log(`✓ ${page.data.title}`);
} else {
console.log(`✗ ${page.url}: ${page.error}`);
}
});
import ZAI from 'z-ai-web-dev-sdk';
class WebContentAnalyzer {
constructor() {
this.cache = new Map();
}
async initialize() {
this.zai = await ZAI.create();
}
async readPage(url, useCache = true) {
// Check cache
if (useCache && this.cache.has(url)) {
console.log('Returning cached result for:', url);
return this.cache.get(url);
}
// Fetch fresh content
const result = await this.zai.functions.invoke('page_reader', {
url: url
});
// Cache the result
if (useCache) {
this.cache.set(url, result.data);
}
return result.data;
}
async getPageMetadata(url) {
const data = await this.readPage(url);
return {
title: data.title,
url: data.url,
publishedTime: data.publishedTime,
contentLength: data.html.length,
wordCount: this.estimateWordCount(data.html)
};
}
estimateWordCount(html) {
const text = html.replace(/<[^>]*>/g, ' ');
const words = text.split(/\s+/).filter(word => word.length > 0);
return words.length;
}
async comparePages(url1, url2) {
const [page1, page2] = await Promise.all([
this.readPage(url1),
this.readPage(url2)
]);
return {
page1: {
title: page1.title,
wordCount: this.estimateWordCount(page1.html),
published: page1.publishedTime
},
page2: {
title: page2.title,
wordCount: this.estimateWordCount(page2.html),
published: page2.publishedTime
}
};
}
clearCache() {
this.cache.clear();
}
}
// Usage
const analyzer = new WebContentAnalyzer();
await analyzer.initialize();
const metadata = await analyzer.getPageMetadata('https://example.com/article');
console.log('Article Metadata:', metadata);
const comparison = await analyzer.comparePages(
'https://example.com/article1',
'https://example.com/article2'
);
console.log('Comparison:', comparison);
import ZAI from 'z-ai-web-dev-sdk';
class FeedReader {
constructor() {
this.articles = [];
}
async initialize() {
this.zai = await ZAI.create();
}
async fetchArticlesFromUrls(urls) {
const articles = [];
for (const url of urls) {
try {
const result = await this.zai.functions.invoke('page_reader', {
url: url
});
articles.push({
title: result.data.title,
url: result.data.url,
publishedTime: result.data.publishedTime,
content: result.data.html,
fetchedAt: new Date().toISOString()
});
console.log(`Fetched: ${result.data.title}`);
} catch (error) {
console.error(`Failed to fetch ${url}:`, error.message);
}
}
this.articles = articles;
return articles;
}
getRecentArticles(limit = 10) {
return this.articles
.sort((a, b) => {
const dateA = new Date(a.publishedTime || a.fetchedAt);
const dateB = new Date(b.publishedTime || b.fetchedAt);
return dateB - dateA;
})
.slice(0, limit);
}
searchArticles(keyword) {
return this.articles.filter(article => {
const searchText = `${article.title} ${article.content}`.toLowerCase();
return searchText.includes(keyword.toLowerCase());
});
}
}
// Usage
const reader = new FeedReader();
await reader.initialize();
const feedUrls = [
'https://example.com/article1',
'https://example.com/article2',
'https://example.com/article3'
];
await reader.fetchArticlesFromUrls(feedUrls);
const recent = reader.getRecentArticles(5);
console.log('Recent articles:', recent.map(a => a.title));
import ZAI from 'z-ai-web-dev-sdk';
async function aggregateContent(urls, options = {}) {
const zai = await ZAI.create();
const aggregated = {
sources: [],
totalWords: 0,
aggregatedAt: new Date().toISOString()
};
for (const url of urls) {
try {
const result = await zai.functions.invoke('page_reader', {
url: url
});
const text = result.data.html.replace(/<[^>]*>/g, ' ');
const wordCount = text.split(/\s+/).filter(w => w.length > 0).length;
aggregated.sources.push({
title: result.data.title,
url: result.data.url,
publishedTime: result.data.publishedTime,
wordCount: wordCount,
excerpt: text.substring(0, 200).trim() + '...'
});
aggregated.totalWords += wordCount;
if (options.delay) {
await new Promise(resolve => setTimeout(resolve, options.delay));
}
} catch (error) {
console.error(`Failed to fetch ${url}:`, error.message);
}
}
return aggregated;
}
// Usage
const sources = [
'https://example.com/news1',
'https://example.com/news2',
'https://example.com/news3'
];
const aggregated = await aggregateContent(sources, { delay: 1000 });
console.log(`Aggregated ${aggregated.sources.length} sources`);
console.log(`Total words: ${aggregated.totalWords}`);
import ZAI from 'z-ai-web-dev-sdk';
class ScrapingPipeline {
constructor() {
this.processors = [];
}
async initialize() {
this.zai = await ZAI.create();
}
addProcessor(name, processorFn) {
this.processors.push({ name, fn: processorFn });
}
async scrape(url) {
// Fetch the page
const result = await this.zai.functions.invoke('page_reader', {
url: url
});
let data = {
raw: result.data,
processed: {}
};
// Run through processors
for (const processor of this.processors) {
try {
data.processed[processor.name] = await processor.fn(data.raw);
console.log(`✓ Processed with ${processor.name}`);
} catch (error) {
console.error(`✗ Failed ${processor.name}:`, error.message);
data.processed[processor.name] = null;
}
}
return data;
}
}
// Processor functions for ScrapingPipeline

/** Collects every absolute http(s) link in the page HTML, de-duplicated. */
function extractLinks(pageData) {
  const found = new Set();
  const pattern = /href=["'](https?:\/\/[^"']+)["']/g;
  for (const hit of pageData.html.matchAll(pattern)) {
    found.add(hit[1]);
  }
  return [...found];
}

/** Collects absolute image URLs (common raster formats), de-duplicated. */
function extractImages(pageData) {
  const found = new Set();
  const pattern = /src=["'](https?:\/\/[^"']+\.(jpg|jpeg|png|gif|webp))["']/gi;
  for (const hit of pageData.html.matchAll(pattern)) {
    found.add(hit[1]);
  }
  return [...found];
}

/** Strips scripts, styles and tags, collapsing whitespace to single spaces. */
function extractPlainText(pageData) {
  const withoutEmbedded = pageData.html
    .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '')
    .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, '');
  return withoutEmbedded
    .replace(/<[^>]*>/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}
// Usage
const pipeline = new ScrapingPipeline();
await pipeline.initialize();
pipeline.addProcessor('links', extractLinks);
pipeline.addProcessor('images', extractImages);
pipeline.addProcessor('plainText', extractPlainText);
const result = await pipeline.scrape('https://example.com/article');
console.log('Links found:', result.processed.links.length);
console.log('Images found:', result.processed.images.length);
console.log('Text length:', result.processed.plainText.length);
{
code: 200,
status: 200,
data: {
title: "Article Title",
url: "https://example.com/article",
html: "<div>Article content...</div>",
publishedTime: "2025-01-15T10:30:00Z",
usage: {
tokens: 1500
}
},
meta: {
usage: {
tokens: 1500
}
}
}
| Field | Type | Description |
|---|---|---|
code | number | Response status code |
status | number | HTTP status code |
data.title | string | Page title |
data.url | string | Page URL |
data.html | string | Extracted HTML content |
/**
 * Reads a page defensively: validates the URL, checks the response status
 * and verifies essential fields, returning a {success, data|error} result
 * instead of throwing.
 * @param {string} url - Absolute http(s) URL to read.
 * @returns {Promise<{success: true, data: object} | {success: false, error: string}>}
 */
async function safeReadPage(url) {
  try {
    // Validate the URL before creating the SDK client. Parsing with `new URL`
    // (instead of the previous startsWith('http') prefix check) also rejects
    // look-alike schemes such as "httpfoo://".
    let parsed;
    try {
      parsed = new URL(url);
    } catch {
      throw new Error('Invalid URL format');
    }
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
      throw new Error('Invalid URL format');
    }
    const zai = await ZAI.create();
    const result = await zai.functions.invoke('page_reader', {
      url: url
    });
    // Check response status
    if (result.code !== 200) {
      throw new Error(`Failed to fetch page: ${result.code}`);
    }
    // Verify essential data
    if (!result.data.html || !result.data.title) {
      throw new Error('Incomplete page data received');
    }
    return {
      success: true,
      data: result.data
    };
  } catch (error) {
    console.error('Page reading error:', error);
    return {
      success: false,
      error: error.message
    };
  }
}
/**
 * Wraps the `page_reader` function with a simple sliding-window rate limit:
 * at most `requestsPerMinute` requests are started in any 60-second window.
 * Failed requests count against the limit too.
 */
class RateLimitedReader {
  constructor(requestsPerMinute = 10) {
    this.requestsPerMinute = requestsPerMinute;
    this.requestTimes = []; // start timestamps (ms) of recent requests, oldest first
  }
  async initialize() {
    this.zai = await ZAI.create();
  }
  /**
   * Fetch a page, waiting first if the rate-limit window is full.
   * @param {string} url - Page URL to read.
   * @returns {Promise<object>} The `data` payload from page_reader.
   */
  async readPage(url) {
    await this.waitForRateLimit();
    // Record the attempt before awaiting the result so that failed
    // requests also consume budget (fix: the old code only counted
    // requests that succeeded).
    this.requestTimes.push(Date.now());
    const result = await this.zai.functions.invoke('page_reader', {
      url: url
    });
    return result.data;
  }
  /**
   * Block until starting a new request keeps us within the limit.
   * Loops because a single sleep may not free a slot if the window is
   * still full when we wake up (fix: the old code waited only once).
   */
  async waitForRateLimit() {
    for (;;) {
      const now = Date.now();
      const oneMinuteAgo = now - 60000;
      // Drop timestamps that have aged out of the 60s window.
      this.requestTimes = this.requestTimes.filter(time => time > oneMinuteAgo);
      if (this.requestTimes.length < this.requestsPerMinute) {
        return;
      }
      // Wait until the oldest recorded request leaves the window.
      const waitTime = 60000 - (now - this.requestTimes[0]);
      if (waitTime > 0) {
        console.log(`Rate limit reached. Waiting ${waitTime}ms...`);
        await new Promise(resolve => setTimeout(resolve, waitTime));
      }
    }
  }
}
// Usage
const limitedReader = new RateLimitedReader(10); // 10 requests per minute
await limitedReader.initialize();
const pageUrls = ['https://example.com/1', 'https://example.com/2'];
for (const pageUrl of pageUrls) {
  const page = await limitedReader.readPage(pageUrl);
  console.log('Fetched:', page.title);
}
import ZAI from 'z-ai-web-dev-sdk';
/**
 * Memoizing wrapper around the `page_reader` function: reads are kept in
 * memory and reused until they are older than `cacheDuration` (ms).
 */
class CachedWebReader {
  constructor(cacheDuration = 3600000) { // 1 hour default
    this.cache = new Map(); // url -> { data, timestamp }
    this.cacheDuration = cacheDuration;
  }
  async initialize() {
    this.zai = await ZAI.create();
  }
  /**
   * Read a page, serving the cached copy when it is still fresh.
   * @param {string} url - Page URL to read.
   * @param {boolean} [forceRefresh=false] - Bypass the cache when true.
   */
  async readPage(url, forceRefresh = false) {
    const entry = this.cache.get(url);
    const isFresh = entry && (Date.now() - entry.timestamp) < this.cacheDuration;
    if (isFresh && !forceRefresh) {
      console.log('Returning cached content for:', url);
      return entry.data;
    }
    // Fetch fresh content and overwrite any stale entry.
    const result = await this.zai.functions.invoke('page_reader', { url: url });
    this.cache.set(url, { data: result.data, timestamp: Date.now() });
    return result.data;
  }
  // Drop every cached entry.
  clearCache() {
    this.cache.clear();
  }
  // Report how many URLs are cached and which ones.
  getCacheStats() {
    return {
      size: this.cache.size,
      entries: Array.from(this.cache.keys())
    };
  }
}
// Usage
const cachingReader = new CachedWebReader(3600000); // 1 hour cache
await cachingReader.initialize();
const firstRead = await cachingReader.readPage('https://example.com');       // Fresh fetch
const secondRead = await cachingReader.readPage('https://example.com');      // From cache
const thirdRead = await cachingReader.readPage('https://example.com', true); // Force refresh
import ZAI from 'z-ai-web-dev-sdk';
/**
 * Read several pages with bounded concurrency. URLs are processed in
 * batches of `concurrency`; each entry of the returned array is either
 * { url, success: true, data } or { url, success: false, error }.
 */
async function readPagesInParallel(urls, concurrency = 3) {
  const zai = await ZAI.create();
  const results = [];
  let batchNumber = 0;
  for (let start = 0; start < urls.length; start += concurrency) {
    const batch = urls.slice(start, start + concurrency);
    const settled = await Promise.allSettled(
      batch.map(async (url) => {
        // Per-URL try/catch so one failure never rejects the batch.
        try {
          const result = await zai.functions.invoke('page_reader', { url });
          return { url: url, success: true, data: result.data };
        } catch (error) {
          return { url: url, success: false, error: error.message };
        }
      })
    );
    // Every promise fulfills (errors are caught above), so .value is safe.
    results.push(...settled.map(outcome => outcome.value));
    batchNumber += 1;
    console.log(`Completed batch ${batchNumber}`);
  }
  return results;
}
// Usage
const targetUrls = [
  'https://example.com/1',
  'https://example.com/2',
  'https://example.com/3',
  'https://example.com/4',
  'https://example.com/5'
];
const fetchOutcomes = await readPagesInParallel(targetUrls, 2); // 2 concurrent requests
for (const outcome of fetchOutcomes) {
  if (outcome.success) {
    console.log(`✓ ${outcome.data.title}`);
  } else {
    console.log(`✗ ${outcome.url}: ${outcome.error}`);
  }
}
import ZAI from 'z-ai-web-dev-sdk';
/**
 * Static helpers for post-processing HTML returned by page_reader.
 */
class ContentProcessor {
  /**
   * Strip <script> and <style> elements and HTML comments, keeping the
   * remaining markup intact.
   * @param {string} html - Raw page HTML.
   * @returns {string} HTML without scripts, styles, or comments.
   */
  static extractMainContent(html) {
    // Remove scripts, styles, and comments
    let content = html
      .replace(/<script[^>]*>[\s\S]*?<\/script>/gi, '')
      .replace(/<style[^>]*>[\s\S]*?<\/style>/gi, '')
      .replace(/<!--[\s\S]*?-->/g, '');
    return content;
  }
  /**
   * Convert HTML to whitespace-normalized plain text, decoding the most
   * common character entities.
   * @param {string} html - Raw page HTML.
   * @returns {string} Plain text.
   */
  static htmlToPlainText(html) {
    return html
      .replace(/<br\s*\/?>/gi, '\n')
      .replace(/<\/p>/gi, '\n\n')
      .replace(/<[^>]*>/g, '')
      // Decode common HTML entities (fix: the previous replacements were
      // self-referential no-ops, e.g. replacing '&' with '&').
      .replace(/&nbsp;/g, ' ')
      .replace(/&lt;/g, '<')
      .replace(/&gt;/g, '>')
      .replace(/&quot;/g, '"')
      // Decode &amp; last so '&amp;lt;' is not double-decoded into '<'.
      .replace(/&amp;/g, '&')
      .replace(/\s+/g, ' ')
      .trim();
  }
  /**
   * Pull description, keywords, and author out of <meta> tags.
   * @param {string} html - Raw page HTML.
   * @returns {{description?: string, keywords?: string[], author?: string}}
   */
  static extractMetadata(html) {
    const metadata = {};
    // Extract meta description
    const descMatch = html.match(/<meta\s+name=["']description["']\s+content=["']([^"']+)["']/i);
    if (descMatch) metadata.description = descMatch[1];
    // Extract keywords
    const keywordsMatch = html.match(/<meta\s+name=["']keywords["']\s+content=["']([^"']+)["']/i);
    if (keywordsMatch) metadata.keywords = keywordsMatch[1].split(',').map(k => k.trim());
    // Extract author
    const authorMatch = html.match(/<meta\s+name=["']author["']\s+content=["']([^"']+)["']/i);
    if (authorMatch) metadata.author = authorMatch[1];
    return metadata;
  }
}
// Usage: fetch one page and run every ContentProcessor helper over it.
async function processWebPage(url) {
  const zai = await ZAI.create();
  const { data } = await zai.functions.invoke('page_reader', { url });
  return {
    title: data.title,
    url: data.url,
    mainContent: ContentProcessor.extractMainContent(data.html),
    plainText: ContentProcessor.htmlToPlainText(data.html),
    metadata: ContentProcessor.extractMetadata(data.html),
    publishedTime: data.publishedTime
  };
}
const processed = await processWebPage('https://example.com/article');
console.log('Processed content:', processed.title);
import express from 'express';
import ZAI from 'z-ai-web-dev-sdk';
// Express app exposing page_reader over HTTP; expects JSON request bodies.
const app = express();
app.use(express.json());
// Single shared SDK client, created once at startup (see initZAI().then(...)
// at the bottom of this example) and reused by every route handler.
let zaiInstance;
async function initZAI() {
  zaiInstance = await ZAI.create();
}
// POST /api/read-page { url } -> extracted content for a single page.
app.post('/api/read-page', async (req, res) => {
  try {
    const { url } = req.body;
    if (!url) {
      return res.status(400).json({
        error: 'URL is required'
      });
    }
    const result = await zaiInstance.functions.invoke('page_reader', {
      url: url
    });
    // Surface upstream failures instead of reporting success with missing
    // fields (fix: the old handler ignored result.code entirely).
    if (result.code !== 200) {
      return res.status(502).json({
        success: false,
        error: `Failed to fetch page: ${result.code}`
      });
    }
    res.json({
      success: true,
      data: {
        title: result.data.title,
        url: result.data.url,
        content: result.data.html,
        publishedTime: result.data.publishedTime,
        // usage may be absent on some responses; don't crash the handler.
        tokensUsed: result.data.usage?.tokens
      }
    });
  } catch (error) {
    res.status(500).json({
      success: false,
      error: error.message
    });
  }
});
// POST /api/read-multiple { urls: [...] } -> per-URL success/error results.
app.post('/api/read-multiple', async (req, res) => {
  try {
    const { urls } = req.body;
    if (!urls || !Array.isArray(urls)) {
      return res.status(400).json({
        error: 'URLs array is required'
      });
    }
    const settled = await Promise.allSettled(
      urls.map(async (url) => {
        // Per-URL try/catch so one failure never rejects the whole batch.
        try {
          const result = await zaiInstance.functions.invoke('page_reader', { url });
          return { url: url, success: true, data: result.data };
        } catch (error) {
          return { url: url, success: false, error: error.message };
        }
      })
    );
    res.json({
      success: true,
      results: settled.map(outcome => outcome.value)
    });
  } catch (error) {
    res.status(500).json({
      success: false,
      error: error.message
    });
  }
});
// Create the SDK client first, then start listening. Exit loudly if the
// client cannot be created (fix: the old code left the initZAI() rejection
// unhandled, silently failing startup).
initZAI()
  .then(() => {
    app.listen(3000, () => {
      console.log('Web reader API running on port 3000');
    });
  })
  .catch((error) => {
    console.error('Failed to initialize ZAI client:', error);
    process.exit(1);
  });
import ZAI from 'z-ai-web-dev-sdk';
import cron from 'node-cron';
/**
 * Fetches a set of URLs on cron schedules via the `page_reader` function,
 * keeping a rolling buffer of the last 100 fetch results.
 */
class ScheduledFetcher {
  constructor() {
    this.urls = [];    // [{ url, schedule }] registered before start()
    this.results = []; // most recent fetch outcomes, oldest first
  }
  async initialize() {
    this.zai = await ZAI.create();
  }
  // Register a URL with a cron expression; takes effect on start().
  addUrl(url, schedule) {
    this.urls.push({ url, schedule });
  }
  /**
   * Fetch one URL; never throws — failures are folded into the outcome.
   * @param {string} url - Page URL to fetch.
   * @returns {Promise<object>} { url, success, title/content | error, fetchedAt }
   */
  async fetchContent(url) {
    try {
      const response = await this.zai.functions.invoke('page_reader', { url: url });
      return {
        url: url,
        success: true,
        title: response.data.title,
        content: response.data.html,
        fetchedAt: new Date().toISOString()
      };
    } catch (error) {
      return {
        url: url,
        success: false,
        error: error.message,
        fetchedAt: new Date().toISOString()
      };
    }
  }
  // Install one cron job that fetches the URL and records the outcome.
  startScheduledFetch(url, schedule) {
    cron.schedule(schedule, async () => {
      console.log(`Fetching ${url}...`);
      const outcome = await this.fetchContent(url);
      this.results.push(outcome);
      // Keep only last 100 results
      if (this.results.length > 100) {
        this.results = this.results.slice(-100);
      }
      console.log(`Fetched: ${outcome.success ? outcome.title : outcome.error}`);
    });
  }
  // Kick off every registered schedule.
  start() {
    for (const entry of this.urls) {
      this.startScheduledFetch(entry.url, entry.schedule);
    }
  }
  getResults() {
    return this.results;
  }
}
// Usage
const scheduledFetcher = new ScheduledFetcher();
await scheduledFetcher.initialize();
// Fetch every hour
scheduledFetcher.addUrl('https://example.com/news', '0 * * * *');
// Fetch every day at midnight
scheduledFetcher.addUrl('https://example.com/daily', '0 0 * * *');
scheduledFetcher.start();
console.log('Scheduled fetching started');
Issue : "SDK must be used in backend"
Issue : Failed to fetch page (404, 403, etc.)
Issue : Incomplete or missing content
Issue : High token usage
Issue : Slow response times
Issue : Empty HTML content
Weekly Installs
1.4K
Repository
GitHub Stars
24
First Seen
Jan 23, 2026
Security Audits
Gen Agent Trust Hub: Fail · Socket: Pass · Snyk: Warn
Installed on
opencode1.2K
gemini-cli1.2K
codex1.2K
github-copilot1.2K
kimi-cli1.2K
amp1.2K
React 组合模式指南:Vercel 组件架构最佳实践,提升代码可维护性
102,200 周安装
AI智能体长期记忆系统 - 精英级架构,融合6种方法,永不丢失上下文
1,200 周安装
AI新闻播客制作技能:实时新闻转对话式播客脚本与音频生成
1,200 周安装
Word文档处理器:DOCX创建、编辑、分析与修订痕迹处理全指南 | 自动化办公解决方案
1,200 周安装
React Router 框架模式指南:全栈开发、文件路由、数据加载与渲染策略
1,200 周安装
Nano Banana AI 图像生成工具:使用 Gemini 3 Pro 生成与编辑高分辨率图像
1,200 周安装
SVG Logo Designer - AI 驱动的专业矢量标识设计工具,生成可缩放品牌标识
1,200 周安装
| data.publishedTime | string | Publication date (optional) |
| data.usage.tokens | number | Tokens used for processing |
| meta.usage.tokens | number | Total tokens used |