PPTX creation, editing, and analysis guide: processing the XML contents of .pptx files with Python scripts

pptx by aiskillstore/marketplace

Install command

npx skills add https://github.com/aiskillstore/marketplace --skill pptx

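PPTX creation, editing, and analysis

Overview

A user may ask you to create, edit, or analyze the contents of a .pptx file. A .pptx file is essentially a ZIP archive containing XML files and other resources that you can read or edit. Different tools and workflows are available for different tasks.

Reading and analyzing content

Text extraction

If you only need to read the text contents of a presentation, convert the document to Markdown:

# Convert document to Markdown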
python -m markitdown path-to-file.pptx

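Raw XML access

You need raw XML access for: comments, speaker notes, slide layouts, animations, design elements, and complex formatting. For any of these features, unpack the presentation and read its raw XML contents.

Unpacking a file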

python ooxml/scripts/unpack.py <office_file> <output_dir>

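Note: The unpack.py script is located at skills/pptx/ooxml/scripts/unpack.py relative to the project root. If the script doesn't exist at this path, use find . -name "unpack.py" to locate it.

Key file structures

ppt/presentation.xml - Main presentation metadata and slide references
ppt/slides/slide{N}.xml - Individual slide contents (slide1.xml, slide2.xml, etc.)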

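Creating a new PowerPoint presentation without a template

When creating a new PowerPoint presentation from scratch, use the html2pptx workflow to convert HTML slides to PowerPoint with accurate positioning.

Design Principles

CRITICAL: Before creating any presentation, analyze the content and choose appropriate design elements:

Consider the subject matter: What is this presentation about? What tone, industry, or mood does it suggest?
Check for branding: If the user mentions a company/organization, consider their brand colors and identity
Match palette to content: Select colors that reflect the subject
State your approach: Explain your design choices before writing code

Requirements:

✅ State your content-informed design approach BEFORE writing code
✅ Use web-safe fonts only: Arial, Helvetica, Times New Roman, Georgia, Courier New, Verdana, Tahoma, Trebuchet MS, Impact
✅ Create clear visual hierarchy through size, weight, and color
✅ Ensure readability: strong contrast, appropriately sized text, clean alignment
✅ Be consistent: repeat patterns, spacing, and visual language across slides

Color Palette Selection

Choosing colors creatively:

Think beyond defaults: What colors genuinely match this specific topic? Avoid autopilot choices.
Consider multiple angles: Topic, industry, mood, energy level, target audience, brand identity (if mentioned)
Be adventurous: Try unexpected combinations - a healthcare presentation doesn't have to be green, finance doesn't have to be navy
Build your palette: Pick 3-5 colors that work together (dominant colors + supporting tones + accent)
Ensure contrast: Text must be clearly readable on backgrounds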

Example color palettes (use these to spark creativity - choose one, adapt it, or create your own):

Classic Blue: Deep navy (#1C2833), slate gray (#2E4053), silver (#AAB7B8), off-white (#F4F6F6)
Teal & Coral: Teal (#5EA8A7), deep teal (#277884), coral (#FE4447), white (#FFFFFF)
Bold Red: Red (#C0392B), bright red (#E74C3C), orange (#F39C12), yellow (#F1C40F), green (#2ECC71)
Warm Blush: Mauve (#A49393), blush (#EED6D3), rose (#E8B4B8), cream (#FAF7F2)
Burgundy Luxury: Burgundy (#5D1D2E), crimson (#951233), rust (#C15937), gold (#997929)
Deep Purple & Emerald: Purple (#B165FB), dark blue (#181B24), emerald (#40695B), white (#FFFFFF)
Cream & Forest Green: Cream (#FFE1C7), forest green (#40695B), white (#FCFCFC)
Pink & Purple: Pink (#F8275B), coral (#FF574A), rose (#FF737D), purple (#3D2F68)
Lime & Plum: Lime (#C5DE82), plum (#7C3A5F), coral (#FD8C6E), blue-gray (#98ACB5)
Black & Gold: Gold (#BF9A4A), black (#000000), cream (#F4F6F6)
Sage & Terracotta: Sage (#87A96B), terracotta (#E07A5F), cream (#F4F1DE), charcoal (#2C2C2C)
Charcoal & Red: Charcoal (#292929), red (#E33737), light gray (#CCCBCB)
Vibrant Orange: Orange (#F96D00), light gray (#F2F2F2), charcoal (#222831)
Forest Green: Black (#191A19), green (#4E9F3D), dark green (#1E5128), white (#FFFFFF)
Retro Rainbow: Purple (#722880), pink (#D72D51), orange (#EB5C18), amber (#F08800), gold (#DEB600)
Vintage Earthy: Mustard (#E3B448), sage (#CBD18F), forest green (#3A6B35), cream (#F4F1DE)
Coastal Rose: Old rose (#AD7670), beaver (#B49886), eggshell (#F3ECDC), ash gray (#BFD5BE)
Orange & Turquoise: Light orange (#FC993E), grayish turquoise (#667C6F), white (#FCFCFC)

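Visual Details Options

Geometric Patterns:

Diagonal section dividers instead of horizontal
Asymmetric column widths (30/70, 40/60, 25/75)
Rotated text headers at 90° or 270°
Circular/hexagonal frames for images
Triangular accent shapes in corners
Overlapping shapes for depth

Border & Frame Treatments:

Thick single-color borders (10-20pt) on one side only
Double-line borders with contrasting colors
Corner brackets instead of full frames
L-shaped borders (top+left or bottom+right)
Underline accents beneath headers (3-5pt thick)

Typography Treatments:

Extreme size contrast (72pt headlines vs 11pt body)
All-caps headers with wide letter spacing
Numbered sections in oversized display type
Monospace (Courier New) for data/stats/technical content
Condensed fonts (Arial Narrow) for dense information
Outlined text for emphasis

Chart & Data Styling:

Monochrome charts with single accent color for key data
Horizontal bar charts instead of vertical
Dot plots instead of bar charts
Minimal gridlines or none at all
Data labels directly on elements (no legends)
Oversized numbers for key metrics

Layout Innovations:

Full-bleed images with text overlays
Sidebar column (20-30% width) for navigation/context
Modular grid systems (3×3, 4×4 blocks)
Z-pattern or F-pattern content flow
Floating text boxes over colored shapes
Magazine-style multi-column layouts

Background Treatments:

Solid color blocks occupying 40-60% of slide
Gradient fills (vertical or diagonal only)
Split backgrounds (two colors, diagonal or vertical)
Edge-to-edge color bands
Negative space as a design element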

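Layout Tips

When creating slides with charts or tables:

Two-column layout (PREFERRED): Use a header spanning the full width, then two columns below - text/bullets in one column and the featured content in the other. This provides better balance and makes charts/tables more readable. Use flexbox with unequal column widths (e.g., 40%/60% split) to optimize space for each content type.
Full-slide layout: Let the featured content (chart/table) take up the entire slide for maximum impact and readability
NEVER vertically stack: Do not place charts/tables below text in a single column - this causes poor readability and layout issues

Workflow

MANDATORY - READ ENTIRE FILE: Read html2pptx.md completely from start to finish. NEVER set any range limits when reading this file. Read the full file content for detailed syntax, critical formatting rules, and best practices before proceeding with presentation creation.
Create an HTML file for each slide with proper dimensions (e.g., 720pt × 405pt for 16:9)
- Use <p>, <h1>-<h6>, <ul>, <ol> for all text content
- Use class="placeholder" for areas where charts/tables will be added (render with gray background for visibility)
- CRITICAL: Rasterize gradients and icons as PNG images FIRST using Sharp, then reference in HTML
- LAYOUT: For slides with charts/tables/images, use either full-slide layout or two-column layout for better readability
Create and run a JavaScript file using the html2pptx.js library to convert HTML slides to PowerPoint and save the presentation
- Use the html2pptx() function to process each HTML file
- Add charts and tables to placeholder areas using PptxGenJS API
- Save the presentation using pptx.writeFile()
Visual validation: Generate thumbnails and inspect for layout issues
- Create thumbnail grid: python scripts/thumbnail.py output.pptx workspace/thumbnails --cols 4
- Read and carefully examine the thumbnail image for:
  - Text cutoff: Text being cut off by header bars, shapes, or slide edges
  - Text overlap: Text overlapping with other text or shapes
  - Positioning issues: Content too close to slide boundaries or other elements
  - Contrast issues: Insufficient contrast between text and backgrounds
- If issues found, adjust HTML margins/spacing/colors and regenerate the presentation
- Repeat until all slides are visually correct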

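Editing an existing PowerPoint presentation

When editing slides in an existing PowerPoint presentation, you need to work with the raw Office Open XML (OOXML) format. This involves unpacking the .pptx file, editing the XML content, and repacking it.

Workflow

MANDATORY - READ ENTIRE FILE: Read ooxml.md (~500 lines) completely from start to finish. NEVER set any range limits when reading this file. Read the full file content for detailed guidance on OOXML structure and editing workflows before any presentation editing.
Unpack the presentation: python ooxml/scripts/unpack.py <office_file> <output_dir>
Edit the XML files (primarily ppt/slides/slide{N}.xml and related files)
CRITICAL: Validate immediately after each edit and fix any validation errors before proceeding: python ooxml/scripts/validate.py <dir> --original <file>
Pack the final presentation: python ooxml/scripts/pack.py <input_directory> <office_file>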

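Creating a new PowerPoint presentation using a template

When you need to create a presentation that follows an existing template's design, first duplicate and rearrange the template slides, then replace the placeholder content.

Workflow

1. Extract template text AND create visual thumbnail grid:

 * Extract text: `python -m markitdown template.pptx > template-content.md`

 * Read `template-content.md`: Read the entire file to understand the contents of the template presentation. **NEVER set any range limits when reading this file.**
 * Create thumbnail grids: `python scripts/thumbnail.py template.pptx`
 * See the Creating Thumbnail Grids section for more details

2. Analyze template and save inventory to a file:

 * **Visual Analysis**: Review thumbnail grid(s) to understand slide layouts, design patterns, and visual structure
 * Create and save a template inventory file at `template-inventory.md` containing:
       
       # Template Inventory Analysis
       **Total Slides: [count]**
       **IMPORTANT: Slides are 0-indexed (first slide = 0, last slide = count-1)**
       
       ## [Category Name]
       - Slide 0: [Layout code if available] - Description/purpose
       - Slide 1: [Layout code] - Description/purpose
       - Slide 2: [Layout code] - Description/purpose
       [... EVERY slide must be listed individually with its index ...]
       

 * **Using the thumbnail grid**: Reference the visual thumbnails to identify:
   * Layout patterns (title slides, content layouts, section dividers)
   * Image placeholder locations and counts
   * Design consistency across slide groups
   * Visual hierarchy and structure
 * This inventory file is **REQUIRED** for selecting appropriate templates in the next step

3. Create presentation outline based on template inventory:

 * Review available templates from step 2.
 * Choose an intro or title template for the first slide. This should be one of the first templates.
 * Choose safe, text-based layouts for the other slides.
 * **CRITICAL: Match layout structure to actual content**:
   * Single-column layouts: Use for unified narrative or single topic
   * Two-column layouts: Use **ONLY** when you have exactly 2 distinct items/concepts
   * Three-column layouts: Use **ONLY** when you have exactly 3 distinct items/concepts
   * Image + text layouts: Use **ONLY** when you have actual images to insert
   * Quote layouts: Use **ONLY** for actual quotes from people (with attribution), never for emphasis
   * Never use layouts with more placeholders than you have content
   * If you have 2 items, don't force them into a 3-column layout
   * If you have 4+ items, consider breaking into multiple slides or using a list format
 * Count your actual content pieces **BEFORE** selecting the layout
 * Verify each placeholder in the chosen layout will be filled with meaningful content
 * Select the single **best** layout option for each content section.
 * Save `outline.md` with content AND template mapping that leverages available designs
 * Example template mapping:
       
       # Template slides to use (0-based indexing)
       # WARNING: Verify indices are within range! Template with 73 slides has indices 0-72
       # Mapping: slide numbers from outline -> template slide indices
       template_mapping = [
           0,   # Use slide 0 (Title/Cover)
           34,  # Use slide 34 (B1: Title and body)
           34,  # Use slide 34 again (duplicate for second B1)
           50,  # Use slide 50 (E1: Quote)
           54,  # Use slide 54 (F2: Closing + Text)
       ]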

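4. Duplicate, reorder, and delete slides using rearrange.py:

 * Use the `scripts/rearrange.py` script to create a new presentation with slides in the desired order:
       
       python scripts/rearrange.py template.pptx working.pptx 0,34,34,50,54
       

 * The script handles duplicating repeated slides, deleting unused slides, and reordering automatically
 * Slide indices are 0-based (first slide is 0, second is 1, etc.)
 * The same slide index can appear multiple times to duplicate that slide

5. Extract ALL text using the inventory.py script:

 * **Run inventory extraction**: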
       
       python scripts/inventory.py working.pptx text-inventory.json
       

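 * **Read text-inventory.json**: Read the entire text-inventory.json file to understand all shapes and their properties. **NEVER set any range limits when reading this file.**

 * The inventory JSON structure:
       
       {
           "slide-0": {
             "shape-0": {
               "placeholder_type": "TITLE",  // or null for non-placeholders
               "left": 1.5,                  // position in inches
               "top": 2.0,
               "width": 7.5,
               "height": 1.2,
               "paragraphs": [
                 {
                   "text": "Paragraph text",
                   // Optional properties (only included when non-default):
                   "bullet": true,           // explicit bullet detected
                   "level": 0,               // only included when bullet is true
                   "alignment": "CENTER",    // CENTER, RIGHT (not LEFT)
                   "space_before": 10.0,     // space before paragraph in points
                   "space_after": 6.0,       // space after paragraph in points
                   "line_spacing": 22.4,     // line spacing in points
                   "font_name": "Arial",     // from first run
                   "font_size": 14.0,        // in points
                   "bold": true,
                   "italic": false,
                   "underline": false,
                   "color": "FF0000"         // RGB color
                 }
               ]
             }
           }
         }
       

 * Key features:

   * **Slides**: Named as "slide-0", "slide-1", etc.
   * **Shapes**: Ordered by visual position (top-to-bottom, left-to-right) as "shape-0", "shape-1", etc.
   * **Placeholder types**: TITLE, CENTER_TITLE, SUBTITLE, BODY, OBJECT, or null
   * **Default font size**: `default_font_size` in points extracted from layout placeholders (when available)
   * **Slide numbers are filtered**: Shapes with SLIDE_NUMBER placeholder type are automatically excluded from inventory
   * **Bullets**: When `bullet: true`, `level` is always included (even if 0)
   * **Spacing**: `space_before`, `space_after`, and `line_spacing` in points (only included when set)
   * **Colors**: `color` for RGB (e.g., "FF0000"), `theme_color` for theme colors (e.g., "DARK_1")
   * **Properties**: Only non-default values are included in the output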

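6. Generate replacement text and save the data to a JSON file, based on the text inventory from the previous step:

 * **CRITICAL**: First verify which shapes exist in the inventory - only reference shapes that are actually present
 * **VALIDATION**: The replace.py script will validate that all shapes in your replacement JSON exist in the inventory
   * If you reference a non-existent shape, you'll get an error showing available shapes
   * If you reference a non-existent slide, you'll get an error indicating the slide doesn't exist
   * All validation errors are shown at once before the script exits
 * **IMPORTANT**: The replace.py script uses inventory.py internally to identify **ALL** text shapes
 * **AUTOMATIC CLEARING**: **ALL** text shapes from the inventory will be cleared unless you provide "paragraphs" for them
 * Add a "paragraphs" field to shapes that need content (not "replacement_paragraphs")
 * Shapes without "paragraphs" in the replacement JSON will have their text cleared automatically
 * Paragraphs with bullets will be automatically left aligned. Don't set the `alignment` property when `"bullet": true`
 * Generate appropriate replacement content for placeholder text
 * Use shape size to determine appropriate content length
 * **CRITICAL**: Include paragraph properties from the original inventory - don't just provide text
 * **IMPORTANT**: When bullet: true, do **NOT** include bullet symbols (•, -, *) in text - they're added automatically
 * **ESSENTIAL FORMATTING RULES**:
   * Headers/titles should typically have `"bold": true`
   * List items should have `"bullet": true, "level": 0` (level is required when bullet is true)
   * Preserve any alignment properties (e.g., `"alignment": "CENTER"` for centered text)
   * Include font properties when different from default (e.g., `"font_size": 14.0`, `"font_name": "Lora"`)
   * Colors: Use `"color": "FF0000"` for RGB or `"theme_color": "DARK_1"` for theme colors
   * The replacement script expects **properly formatted paragraphs**, not just text strings
   * **Overlapping shapes**: Prefer shapes with larger default_font_size or more appropriate placeholder_type
 * Save the updated inventory with replacements to `replacement-text.json`
 * **WARNING**: Different template layouts have different shape counts - always check the actual inventory before creating replacements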

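Example paragraphs field showing proper formatting:

"paragraphs": [
  {
    "text": "New presentation title text",
    "alignment": "CENTER",
    "bold": true
  },
  {
    "text": "Section Header",
    "bold": true
  },
  {
    "text": "First bullet point without bullet symbol",
    "bullet": true,
    "level": 0
  },
  {
    "text": "Red colored text",
    "color": "FF0000"
  },
  {
    "text": "Theme colored text",
    "theme_color": "DARK_1"
  },
  {
    "text": "Regular paragraph text without special formatting"
  }
]

Shapes not listed in the replacement JSON are automatically cleared:

{
  "slide-0": {
    "shape-0": {
      "paragraphs": [...] // This shape gets new text
    }
    // shape-1 and shape-2 from inventory will be cleared automatically
  }
}

Common formatting patterns for presentations:

 * Title slides: Bold text, sometimes centered
 * Section headers within slides: Bold text
 * Bullet lists: Each item needs `"bullet": true, "level": 0`
 * Body text: Usually no special properties needed
 * Quotes: May have special alignment or font properties

7. Apply replacements using the replace.py script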

     python scripts/replace.py working.pptx replacement-text.json output.pptx

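The script will:

 * First extract the inventory of **ALL** text shapes using functions from inventory.py
 * Validate that all shapes in the replacement JSON exist in the inventory
 * Clear text from **ALL** shapes identified in the inventory
 * Apply new text only to shapes with "paragraphs" defined in the replacement JSON
 * Preserve formatting by applying paragraph properties from the JSON
 * Handle bullets, alignment, font properties, and colors automatically
 * Save the updated presentation

Example validation errors:

ERROR: Invalid shapes in replacement JSON:
  - Shape 'shape-99' not found on 'slide-0'. Available shapes: shape-0, shape-1, shape-4
  - Slide 'slide-999' not found in inventory


ERROR: Replacement text made overflow worse in these shapes:
  - slide-0/shape-2: overflow worsened by 1.25" (was 0.00", now 1.25")

Creating Thumbnail Grids

To create visual thumbnail grids of PowerPoint slides for quick analysis and reference: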

python scripts/thumbnail.py template.pptx [output_prefix]

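Features:

Creates: thumbnails.jpg (or thumbnails-1.jpg, thumbnails-2.jpg, etc. for large decks)
Default: 5 columns, max 30 slides per grid (5×6)
Custom prefix: python scripts/thumbnail.py template.pptx my-grid
- Note: The output prefix should include the path if you want output in a specific directory (e.g., workspace/my-grid)
Adjust columns: --cols 4 (range: 3-6, affects slides per grid)
Grid limits: 3 cols = 12 slides/grid, 4 cols = 20, 5 cols = 30, 6 cols = 42
Slides are zero-indexed (Slide 0, Slide 1, etc.)

Use cases:

Template analysis: Quickly understand slide layouts and design patterns
Content review: Visual overview of entire presentation
Navigation reference: Find specific slides by their visual appearance
Quality check: Verify all slides are properly formatted

Examples:

# Basic usage
python scripts/thumbnail.py presentation.pptx

# Combine options: custom name, columns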
python scripts/thumbnail.py template.pptx analysis --cols 4

Converting Slides to Images

To visually analyze PowerPoint slides, convert them to images using a two-step process:

Convert PPTX to PDF:

soffice --headless --convert-to pdf template.pptx

Convert PDF pages to JPEG images:

pdftoppm -jpeg -r 150 template.pdf slide

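This creates files like slide-1.jpg, slide-2.jpg, etc.

Options:

-r 150: Sets resolution to 150 DPI (adjust for quality/size balance)
-jpeg: Output JPEG format (use -png for PNG if preferred)
-f N: First page to convert (e.g., -f 2 starts from page 2)
-l N: Last page to convert (e.g., -l 5 stops at page 5)
slide: Prefix for output files

Example for specific range:

pdftoppm -jpeg -r 150 -f 2 -l 5 template.pdf slide  # Converts only pages 2-5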

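Code Style Guidelines

IMPORTANT: When generating code for PPTX operations:

Write concise code
Avoid verbose variable names and redundant operations
Avoid unnecessary print statements

Dependencies

Required dependencies (should already be installed):

markitdown: pip install "markitdown[pptx]" (for text extraction from presentations)
pptxgenjs: npm install -g pptxgenjs (for creating presentations via html2pptx)
playwright: npm install -g playwright (for HTML rendering in html2pptx)
react-icons: npm install -g react-icons react react-dom (for icons)
sharp: npm install -g sharp (for SVG rasterization and image processing)
LibreOffice: sudo apt-get install libreoffice (for PDF conversion)
Poppler: sudo apt-get install poppler-utils (for pdftoppm to convert PDF to images)
defusedxml: pip install defusedxml (for secure XML parsing)

January 20, 2026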

🇺🇸English

PPTX creation, editing, and analysis

Overview

A user may ask you to create, edit, or analyze the contents of a .pptx file. A .pptx file is essentially a ZIP archive containing XML files and other resources that you can read or edit. You have different tools and workflows available for different tasks.

Reading and analyzing content

Text extraction

If you just need to read the text contents of a presentation, you should convert the document to markdown:

# Convert document to markdown
python -m markitdown path-to-file.pptx

Raw XML access

You need raw XML access for: comments, speaker notes, slide layouts, animations, design elements, and complex formatting. For any of these features, you'll need to unpack a presentation and read its raw XML contents.

Unpacking a file

python ooxml/scripts/unpack.py <office_file> <output_dir>

Note: The unpack.py script is located at skills/pptx/ooxml/scripts/unpack.py relative to the project root. If the script doesn't exist at this path, use find . -name "unpack.py" to locate it.

Key file structures

ppt/presentation.xml - Main presentation metadata and slide references
ppt/slides/slide{N}.xml - Individual slide contents (slide1.xml, slide2.xml, etc.)
ppt/notesSlides/notesSlide{N}.xml - Speaker notes for each slide
ppt/comments/modernComment_*.xml - Comments for specific slides
ppt/slideLayouts/ - Layout templates for slides
ppt/slideMasters/ - Master slide templates
ppt/theme/ - Theme and styling information
ppt/media/ - Images and other media files

Typography and color extraction

When given an example design to emulate: Always analyze the presentation's typography and colors first using the methods below:

Read theme file: Check ppt/theme/theme1.xml for colors (<a:clrScheme>) and fonts (<a:fontScheme>)
Sample slide content: Examine ppt/slides/slide1.xml for actual font usage (<a:rPr>) and colors
Search for patterns: Use grep to find color (<a:solidFill>, <a:srgbClr>) and font references across all XML files
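
As a rough illustration of the first method, the sketch below pulls sRGB colors and Latin typefaces out of a theme document with the standard library's ElementTree. `SAMPLE_THEME` is an invented, heavily trimmed theme, and `theme_colors` / `theme_fonts` are hypothetical helper names; real themes contain more slots, and some slots use <a:sysClr> instead of <a:srgbClr>, which this sketch simply skips. For files from untrusted sources, a hardened parser such as defusedxml is the safer choice.

```python
import xml.etree.ElementTree as ET

# DrawingML main namespace, bound to the "a:" prefix in theme files.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"

# Invented, trimmed-down theme used only for this example.
SAMPLE_THEME = f"""<a:theme xmlns:a="{A_NS}" name="Sample">
  <a:themeElements>
    <a:clrScheme name="Sample">
      <a:dk1><a:srgbClr val="1C2833"/></a:dk1>
      <a:lt1><a:srgbClr val="F4F6F6"/></a:lt1>
      <a:accent1><a:srgbClr val="2E4053"/></a:accent1>
    </a:clrScheme>
    <a:fontScheme name="Sample">
      <a:majorFont><a:latin typeface="Georgia"/></a:majorFont>
      <a:minorFont><a:latin typeface="Arial"/></a:minorFont>
    </a:fontScheme>
  </a:themeElements>
</a:theme>"""


def theme_colors(theme_xml: str) -> dict:
    """Map color-scheme slot names (dk1, lt1, accent1, ...) to hex RGB values."""
    scheme = ET.fromstring(theme_xml).find(f".//{{{A_NS}}}clrScheme")
    colors = {}
    if scheme is None:
        return colors
    for slot in scheme:
        rgb = slot.find(f"{{{A_NS}}}srgbClr")
        if rgb is not None:  # slots defined via <a:sysClr> are skipped here
            colors[slot.tag.split("}")[1]] = rgb.get("val")
    return colors


def theme_fonts(theme_xml: str) -> dict:
    """Return the Latin typeface for the major (heading) and minor (body) fonts."""
    root = ET.fromstring(theme_xml)
    fonts = {}
    for kind in ("majorFont", "minorFont"):
        latin = root.find(f".//{{{A_NS}}}{kind}/{{{A_NS}}}latin")
        if latin is not None:
            fonts[kind] = latin.get("typeface")
    return fonts


if __name__ == "__main__":
    print(theme_colors(SAMPLE_THEME))
    print(theme_fonts(SAMPLE_THEME))
```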

Creating a new PowerPoint presentation without a template

When creating a new PowerPoint presentation from scratch, use the html2pptx workflow to convert HTML slides to PowerPoint with accurate positioning.

Design Principles

CRITICAL: Before creating any presentation, analyze the content and choose appropriate design elements:

Consider the subject matter: What is this presentation about? What tone, industry, or mood does it suggest?
Check for branding: If the user mentions a company/organization, consider their brand colors and identity
Match palette to content: Select colors that reflect the subject
State your approach: Explain your design choices before writing code

Requirements:

✅ State your content-informed design approach BEFORE writing code
✅ Use web-safe fonts only: Arial, Helvetica, Times New Roman, Georgia, Courier New, Verdana, Tahoma, Trebuchet MS, Impact
✅ Create clear visual hierarchy through size, weight, and color
✅ Ensure readability: strong contrast, appropriately sized text, clean alignment
✅ Be consistent: repeat patterns, spacing, and visual language across slides

Color Palette Selection

Choosing colors creatively:

Think beyond defaults: What colors genuinely match this specific topic? Avoid autopilot choices.
Consider multiple angles: Topic, industry, mood, energy level, target audience, brand identity (if mentioned)
Be adventurous: Try unexpected combinations - a healthcare presentation doesn't have to be green, finance doesn't have to be navy
Build your palette: Pick 3-5 colors that work together (dominant colors + supporting tones + accent)
Ensure contrast: Text must be clearly readable on backgrounds
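
One way to make "clearly readable" checkable is a numeric contrast ratio. The guide itself does not name a metric; the sketch below assumes the WCAG 2.x definition (relative luminance of linearized sRGB channels, ratio between 1 and 21), where 4.5:1 is the commonly cited minimum for normal-size text. The function names are hypothetical.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light."""
    s = channel / 255
    return s / 12.92 if s <= 0.04045 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as 'RRGGBB' or '#RRGGBB'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors; the order of arguments does not matter."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    # Deep navy text on off-white, both taken from the Classic Blue palette.
    print(round(contrast_ratio("#1C2833", "#F4F6F6"), 2))
```

A check like this can flag weak text/background pairs before any slides are rendered; it complements the visual thumbnail review rather than replacing it.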

Example color palettes (use these to spark creativity - choose one, adapt it, or create your own):

Classic Blue: Deep navy (#1C2833), slate gray (#2E4053), silver (#AAB7B8), off-white (#F4F6F6)
Teal & Coral: Teal (#5EA8A7), deep teal (#277884), coral (#FE4447), white (#FFFFFF)
Bold Red: Red (#C0392B), bright red (#E74C3C), orange (#F39C12), yellow (#F1C40F), green (#2ECC71)
Warm Blush: Mauve (#A49393), blush (#EED6D3), rose (#E8B4B8), cream (#FAF7F2)
Burgundy Luxury: Burgundy (#5D1D2E), crimson (#951233), rust (#C15937), gold (#997929)
Deep Purple & Emerald: Purple (#B165FB), dark blue (#181B24), emerald (#40695B), white (#FFFFFF)
Cream & Forest Green: Cream (#FFE1C7), forest green (#40695B), white (#FCFCFC)
Pink & Purple: Pink (#F8275B), coral (#FF574A), rose (#FF737D), purple (#3D2F68)
Lime & Plum: Lime (#C5DE82), plum (#7C3A5F), coral (#FD8C6E), blue-gray (#98ACB5)
Black & Gold: Gold (#BF9A4A), black (#000000), cream (#F4F6F6)
Sage & Terracotta: Sage (#87A96B), terracotta (#E07A5F), cream (#F4F1DE), charcoal (#2C2C2C)
Charcoal & Red: Charcoal (#292929), red (#E33737), light gray (#CCCBCB)
Vibrant Orange: Orange (#F96D00), light gray (#F2F2F2), charcoal (#222831)
Forest Green: Black (#191A19), green (#4E9F3D), dark green (#1E5128), white (#FFFFFF)
Retro Rainbow: Purple (#722880), pink (#D72D51), orange (#EB5C18), amber (#F08800), gold (#DEB600)
Vintage Earthy: Mustard (#E3B448), sage (#CBD18F), forest green (#3A6B35), cream (#F4F1DE)
Coastal Rose: Old rose (#AD7670), beaver (#B49886), eggshell (#F3ECDC), ash gray (#BFD5BE)

Visual Details Options

Geometric Patterns:

Diagonal section dividers instead of horizontal
Asymmetric column widths (30/70, 40/60, 25/75)
Rotated text headers at 90° or 270°
Circular/hexagonal frames for images
Triangular accent shapes in corners
Overlapping shapes for depth

Border & Frame Treatments:

Thick single-color borders (10-20pt) on one side only
Double-line borders with contrasting colors
Corner brackets instead of full frames
L-shaped borders (top+left or bottom+right)
Underline accents beneath headers (3-5pt thick)

Typography Treatments:

Extreme size contrast (72pt headlines vs 11pt body)
All-caps headers with wide letter spacing
Numbered sections in oversized display type
Monospace (Courier New) for data/stats/technical content
Condensed fonts (Arial Narrow) for dense information
Outlined text for emphasis

Chart & Data Styling:

Monochrome charts with single accent color for key data
Horizontal bar charts instead of vertical
Dot plots instead of bar charts
Minimal gridlines or none at all
Data labels directly on elements (no legends)
Oversized numbers for key metrics

Layout Innovations:

Full-bleed images with text overlays
Sidebar column (20-30% width) for navigation/context
Modular grid systems (3×3, 4×4 blocks)
Z-pattern or F-pattern content flow
Floating text boxes over colored shapes
Magazine-style multi-column layouts

Background Treatments:

Solid color blocks occupying 40-60% of slide
Gradient fills (vertical or diagonal only)
Split backgrounds (two colors, diagonal or vertical)
Edge-to-edge color bands
Negative space as a design element

Layout Tips

When creating slides with charts or tables:

Two-column layout (PREFERRED): Use a header spanning the full width, then two columns below - text/bullets in one column and the featured content in the other. This provides better balance and makes charts/tables more readable. Use flexbox with unequal column widths (e.g., 40%/60% split) to optimize space for each content type.
Full-slide layout: Let the featured content (chart/table) take up the entire slide for maximum impact and readability
NEVER vertically stack: Do not place charts/tables below text in a single column - this causes poor readability and layout issues

Workflow

MANDATORY - READ ENTIRE FILE: Read html2pptx.md completely from start to finish. NEVER set any range limits when reading this file. Read the full file content for detailed syntax, critical formatting rules, and best practices before proceeding with presentation creation.
Create an HTML file for each slide with proper dimensions (e.g., 720pt × 405pt for 16:9)
- Use <p>, <h1>-<h6>, <ul>, <ol> for all text content
- Use class="placeholder" for areas where charts/tables will be added (render with gray background for visibility)
- CRITICAL: Rasterize gradients and icons as PNG images FIRST using Sharp, then reference in HTML
- LAYOUT: For slides with charts/tables/images, use either full-slide layout or two-column layout for better readability
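
The slide dimensions and column splits mentioned above come down to simple arithmetic, since one inch is 72 points. The sketch below checks that 720pt × 405pt is a 16:9 slide and works out a 40%/60% two-column split; the 30pt margin and 20pt gutter are assumed values chosen for illustration, not numbers from html2pptx.md, and the function names are hypothetical.

```python
from fractions import Fraction

PT_PER_INCH = 72  # typographic points per inch


def pt_to_in(points: float) -> float:
    """Convert points to inches."""
    return points / PT_PER_INCH


def aspect_ratio(width_pt: int, height_pt: int) -> tuple:
    """Reduce width:height to lowest terms, e.g. (16, 9)."""
    ratio = Fraction(width_pt, height_pt)
    return ratio.numerator, ratio.denominator


def column_widths(slide_width_pt: float, margin_pt: float,
                  gutter_pt: float, left_share: float) -> tuple:
    """Split the usable width (slide minus side margins and gutter) into two columns."""
    usable = slide_width_pt - 2 * margin_pt - gutter_pt
    left = usable * left_share
    return left, usable - left


if __name__ == "__main__":
    print(pt_to_in(720), pt_to_in(405))       # slide size in inches
    print(aspect_ratio(720, 405))             # reduced aspect ratio
    print(column_widths(720, 30, 20, 0.4))    # assumed margin and gutter
```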

Editing an existing PowerPoint presentation

When editing slides in an existing PowerPoint presentation, you need to work with the raw Office Open XML (OOXML) format. This involves unpacking the .pptx file, editing the XML content, and repacking it.

Workflow

MANDATORY - READ ENTIRE FILE: Read ooxml.md (~500 lines) completely from start to finish. NEVER set any range limits when reading this file. Read the full file content for detailed guidance on OOXML structure and editing workflows before any presentation editing.
Unpack the presentation: python ooxml/scripts/unpack.py <office_file> <output_dir>
Edit the XML files (primarily ppt/slides/slide{N}.xml and related files)
CRITICAL: Validate immediately after each edit and fix any validation errors before proceeding: python ooxml/scripts/validate.py <dir> --original <file>
Pack the final presentation: python ooxml/scripts/pack.py <input_directory> <office_file>

Creating a new PowerPoint presentation using a template

When you need to create a presentation that follows an existing template's design, first duplicate and rearrange the template slides, then replace the placeholder content.

Workflow

Extract template text AND create visual thumbnail grid:
- Extract text: python -m markitdown template.pptx > template-content.md
- Read template-content.md: Read the entire file to understand the contents of the template presentation. NEVER set any range limits when reading this file.
- Create thumbnail grids: python scripts/thumbnail.py template.pptx
- See the Creating Thumbnail Grids section for more details

Analyze template and save inventory to a file:

Visual Analysis: Review thumbnail grid(s) to understand slide layouts, design patterns, and visual structure

Create and save a template inventory file at template-inventory.md containing:

# Template Inventory Analysis
**Total Slides: [count]**
**IMPORTANT: Slides are 0-indexed (first slide = 0, last slide = count-1)**

## [Category Name]
- Slide 0: [Layout code if available] - Description/purpose
- Slide 1: [Layout code] - Description/purpose
- Slide 2: [Layout code] - Description/purpose
[... EVERY slide must be listed individually with its index ...]

Example paragraphs field showing proper formatting:

"paragraphs": [
  {
    "text": "New presentation title text",
    "alignment": "CENTER",
    "bold": true
  },
  {
    "text": "Section Header",
    "bold": true
  },
  {
    "text": "First bullet point without bullet symbol",
    "bullet": true,
    "level": 0
  },
  {
    "text": "Red colored text",
    "color": "FF0000"
  },
  {
    "text": "Theme colored text",
    "theme_color": "DARK_1"
  },
  {
    "text": "Regular paragraph text without special formatting"
  }
]

Shapes not listed in the replacement JSON are automatically cleared:

{
  "slide-0": {
    "shape-0": {
      "paragraphs": [...] // This shape gets new text
    }
    // shape-1 and shape-2 from inventory will be cleared automatically
  }
}
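
The clearing and validation behavior described here can be previewed before running the real script. The sketch below is not replace.py's actual code; it is a hypothetical pre-check that assumes the inventory and the replacement data are already loaded as plain dictionaries, and it models its error wording on the sample validation messages shown in this guide.

```python
def check_replacements(inventory: dict, replacements: dict) -> list:
    """Collect every reference to a slide or shape that the inventory lacks."""
    errors = []
    for slide_id, shapes in replacements.items():
        if slide_id not in inventory:
            errors.append(f"Slide '{slide_id}' not found in inventory")
            continue
        for shape_id in shapes:
            if shape_id not in inventory[slide_id]:
                available = ", ".join(inventory[slide_id])
                errors.append(
                    f"Shape '{shape_id}' not found on '{slide_id}'. "
                    f"Available shapes: {available}"
                )
    return errors


def shapes_to_be_cleared(inventory: dict, replacements: dict) -> list:
    """List inventory shapes that have no "paragraphs" in the replacement data."""
    cleared = []
    for slide_id, shapes in inventory.items():
        for shape_id in shapes:
            entry = replacements.get(slide_id, {}).get(shape_id, {})
            if "paragraphs" not in entry:
                cleared.append(f"{slide_id}/{shape_id}")
    return cleared


if __name__ == "__main__":
    inventory = {"slide-0": {"shape-0": {}, "shape-1": {}, "shape-2": {}}}
    replacements = {"slide-0": {"shape-0": {"paragraphs": [{"text": "New title"}]}}}
    print(check_replacements(inventory, replacements))
    print(shapes_to_be_cleared(inventory, replacements))
```

Listing the shapes that would be cleared makes it easier to notice text that was meant to be kept but was left out of the replacement file.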

Common formatting patterns for presentations:

 * Title slides: Bold text, sometimes centered
 * Section headers within slides: Bold text
 * Bullet lists: Each item needs `"bullet": true, "level": 0`
 * Body text: Usually no special properties needed
 * Quotes: May have special alignment or font properties

7. Apply replacements using the replace.py script

     python scripts/replace.py working.pptx replacement-text.json output.pptx

The script will:

 * First extract the inventory of ALL text shapes using functions from inventory.py
 * Validate that all shapes in the replacement JSON exist in the inventory
 * Clear text from ALL shapes identified in the inventory
 * Apply new text only to shapes with "paragraphs" defined in the replacement JSON
 * Preserve formatting by applying paragraph properties from the JSON
 * Handle bullets, alignment, font properties, and colors automatically
 * Save the updated presentation

Example validation errors:

ERROR: Invalid shapes in replacement JSON:
  - Shape 'shape-99' not found on 'slide-0'. Available shapes: shape-0, shape-1, shape-4
  - Slide 'slide-999' not found in inventory


ERROR: Replacement text made overflow worse in these shapes:
  - slide-0/shape-2: overflow worsened by 1.25" (was 0.00", now 1.25")

Creating Thumbnail Grids

To create visual thumbnail grids of PowerPoint slides for quick analysis and reference:

python scripts/thumbnail.py template.pptx [output_prefix]

Features:

Creates: thumbnails.jpg (or thumbnails-1.jpg, thumbnails-2.jpg, etc. for large decks)
Default: 5 columns, max 30 slides per grid (5×6)
Custom prefix: python scripts/thumbnail.py template.pptx my-grid
- Note: The output prefix should include the path if you want output in a specific directory (e.g., workspace/my-grid)
Adjust columns: --cols 4 (range: 3-6, affects slides per grid)
Grid limits: 3 cols = 12 slides/grid, 4 cols = 20, 5 cols = 30, 6 cols = 42
Slides are zero-indexed (Slide 0, Slide 1, etc.)
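
The grid limits listed above all fit the pattern cols × (cols + 1). That formula is inferred from the four listed values and is not confirmed against thumbnail.py itself, so treat the sketch below as an estimate of how many grid images to expect; the function names are hypothetical.

```python
import math


def slides_per_grid(cols: int) -> int:
    """Slides per grid image, assuming cols x (cols + 1) as the listed limits suggest."""
    if not 3 <= cols <= 6:
        raise ValueError("cols must be between 3 and 6")
    return cols * (cols + 1)


def grid_count(total_slides: int, cols: int = 5) -> int:
    """Estimated number of thumbnail grid files for a deck of the given size."""
    return math.ceil(total_slides / slides_per_grid(cols))


if __name__ == "__main__":
    for cols in range(3, 7):
        print(cols, slides_per_grid(cols), grid_count(73, cols))
```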

Use cases:

Template analysis: Quickly understand slide layouts and design patterns
Content review: Visual overview of entire presentation
Navigation reference: Find specific slides by their visual appearance
Quality check: Verify all slides are properly formatted

Examples:

# Basic usage
python scripts/thumbnail.py presentation.pptx

# Combine options: custom name, columns
python scripts/thumbnail.py template.pptx analysis --cols 4

Converting Slides to Images

To visually analyze PowerPoint slides, convert them to images using a two-step process:

Convert PPTX to PDF:

soffice --headless --convert-to pdf template.pptx

Convert PDF pages to JPEG images:

pdftoppm -jpeg -r 150 template.pdf slide

This creates files like slide-1.jpg, slide-2.jpg, etc.

Options:

-r 150: Sets resolution to 150 DPI (adjust for quality/size balance)
-jpeg: Output JPEG format (use -png for PNG if preferred)
-f N: First page to convert (e.g., -f 2 starts from page 2)
-l N: Last page to convert (e.g., -l 5 stops at page 5)
slide: Prefix for output files

Example for specific range:

pdftoppm -jpeg -r 150 -f 2 -l 5 template.pdf slide  # Converts only pages 2-5

Code Style Guidelines

IMPORTANT: When generating code for PPTX operations:

Write concise code
Avoid verbose variable names and redundant operations
Avoid unnecessary print statements

Dependencies

Required dependencies (should already be installed):

markitdown: pip install "markitdown[pptx]" (for text extraction from presentations)
pptxgenjs: npm install -g pptxgenjs (for creating presentations via html2pptx)
playwright: npm install -g playwright (for HTML rendering in html2pptx)
react-icons: npm install -g react-icons react react-dom (for icons)
sharp: npm install -g sharp (for SVG rasterization and image processing)
LibreOffice: sudo apt-get install libreoffice (for PDF conversion)
Poppler: sudo apt-get install poppler-utils (for pdftoppm to convert PDF to images)

Repository

aiskillstore/marketplace

GitHub Stars

203

First Seen

Jan 20, 2026

Orange & Turquoise: Light orange (#FC993E), grayish turquoise (#667C6F), white (#FCFCFC)

Create and run a JavaScript file using the html2pptx.js library to convert HTML slides to PowerPoint and save the presentation

Use the html2pptx() function to process each HTML file
Add charts and tables to placeholder areas using PptxGenJS API
Save the presentation using pptx.writeFile()

Visual validation: Generate thumbnails and inspect for layout issues

Create thumbnail grid: python scripts/thumbnail.py output.pptx workspace/thumbnails --cols 4
Read and carefully examine the thumbnail image for:
- Text cutoff: Text being cut off by header bars, shapes, or slide edges
- Text overlap: Text overlapping with other text or shapes
- Positioning issues: Content too close to slide boundaries or other elements
- Contrast issues: Insufficient contrast between text and backgrounds
If issues are found, adjust HTML margins/spacing/colors and regenerate the presentation
Repeat until all slides are visually correct

Using the thumbnail grid: Reference the visual thumbnails to identify:

Layout patterns (title slides, content layouts, section dividers)
Image placeholder locations and counts
Design consistency across slide groups
Visual hierarchy and structure

This inventory file is REQUIRED for selecting appropriate templates in the next step

Create presentation outline based on template inventory:

Review available templates from step 2.
Choose an intro or title template for the first slide. This should be one of the first templates.
Choose safe, text-based layouts for the other slides.
CRITICAL: Match layout structure to actual content:
- Single-column layouts: Use for unified narrative or single topic
- Two-column layouts: Use ONLY when you have exactly 2 distinct items/concepts
- Three-column layouts: Use ONLY when you have exactly 3 distinct items/concepts
- Image + text layouts: Use ONLY when you have actual images to insert
- Quote layouts: Use ONLY for actual quotes from people (with attribution), never for emphasis
- Never use layouts with more placeholders than you have content
- If you have 2 items, don't force them into a 3-column layout
- If you have 4+ items, consider breaking into multiple slides or using a list format
Count your actual content pieces BEFORE selecting the layout
Verify each placeholder in the chosen layout will be filled with meaningful content
Select one option representing the best layout for each content section.
Save outline.md with content AND template mapping that leverages available designs

Example template mapping:

# Template slides to use (0-based indexing)
# WARNING: Verify indices are within range! Template with 73 slides has indices 0-72
# Mapping: slide numbers from outline -> template slide indices
template_mapping = [
    0,   # Use slide 0 (Title/Cover)
    34,  # Use slide 34 (B1: Title and body)
    34,  # Use slide 34 again (duplicate for second B1)
    50,  # Use slide 50 (E1: Quote)
    54,  # Use slide 54 (F2: Closing + Text)
]
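
The range warning in the mapping comments is easy to enforce in code before calling rearrange.py. The sketch below validates a mapping against the slide count and builds the comma-separated index argument used in this guide's rearrange.py example; `rearrange_argument` is a hypothetical helper, not part of the skill's scripts.

```python
def rearrange_argument(template_mapping: list, slide_count: int) -> str:
    """Validate 0-based template indices and join them for the rearrange.py call."""
    bad = [i for i in template_mapping if not 0 <= i < slide_count]
    if bad:
        raise ValueError(f"indices out of range 0-{slide_count - 1}: {bad}")
    return ",".join(str(i) for i in template_mapping)


if __name__ == "__main__":
    mapping = [0, 34, 34, 50, 54]
    # A 73-slide template has valid indices 0-72.
    print(rearrange_argument(mapping, 73))
```

Failing early on an out-of-range index is cheaper than discovering a missing slide after the text replacement step.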

Duplicate, reorder, and delete slides using rearrange.py:

Use the scripts/rearrange.py script to create a new presentation with slides in the desired order:
```
python scripts/rearrange.py template.pptx working.pptx 0,34,34,50,54
```
The script handles duplicating repeated slides, deleting unused slides, and reordering automatically
Slide indices are 0-based (first slide is 0, second is 1, etc.)
The same slide index can appear multiple times to duplicate that slide

Extract ALL text using the inventory.py script:

Run inventory extraction:

python scripts/inventory.py working.pptx text-inventory.json

Read text-inventory.json: Read the entire text-inventory.json file to understand all shapes and their properties. NEVER set any range limits when reading this file.

The inventory JSON structure:

{
    "slide-0": {
      "shape-0": {
        "placeholder_type": "TITLE",  // or null for non-placeholders
        "left": 1.5,                  // position in inches
        "top": 2.0,
        "width": 7.5,
        "height": 1.2,
        "paragraphs": [
          {
            "text": "Paragraph text",
            // Optional properties (only included when non-default):
            "bullet": true,           // explicit bullet detected
            "level": 0,               // only included when bullet is true
            "alignment": "CENTER",    // CENTER, RIGHT (not LEFT)
            "space_before": 10.0,     // space before paragraph in points
            "space_after": 6.0,       // space after paragraph in points
            "line_spacing": 22.4,     // line spacing in points
            "font_name": "Arial",     // from first run
            "font_size": 14.0,        // in points
            "bold": true,
            "italic": false,
            "underline": false,
            "color": "FF0000"         // RGB color
          }
        ]
      }
    }
}

Key features:
- Slides: Named as "slide-0", "slide-1", etc.
- Shapes: Ordered by visual position (top-to-bottom, left-to-right) as "shape-0", "shape-1", etc.
- Placeholder types: TITLE, CENTER_TITLE, SUBTITLE, BODY, OBJECT, or null
- Default font size: default_font_size in points, extracted from layout placeholders (when available)
- Slide numbers are filtered: Shapes with the SLIDE_NUMBER placeholder type are automatically excluded from the inventory
- Bullets: When bullet is true, level is always included (even if 0)
- Spacing: space_before, space_after, and line_spacing in points (only included when set)
- Colors: color for RGB values (e.g., "FF0000"), theme_color for theme colors (e.g., "DARK_1")
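Once the inventory has been read in full, a compact summary helps when planning replacements. The sketch below relies only on the JSON structure shown in this section. `summarize_inventory` is a hypothetical helper, and the inline sample stands in for a real text-inventory.json file.

```python
import json


def summarize_inventory(inventory):
    """Yield (slide_key, shape_key, placeholder_type, joined_text) per shape."""
    for slide_key, shapes in inventory.items():
        for shape_key, shape in shapes.items():
            paragraphs = shape.get("paragraphs", [])
            text = " / ".join(p.get("text", "") for p in paragraphs)
            yield slide_key, shape_key, shape.get("placeholder_type"), text


# Inline sample; with a real file, load it using json.load on the opened file.
sample = json.loads("""
{
  "slide-0": {
    "shape-0": {
      "placeholder_type": "TITLE",
      "left": 1.5, "top": 2.0, "width": 7.5, "height": 1.2,
      "paragraphs": [{"text": "Paragraph text", "bold": true}]
    }
  }
}
""")

for row in summarize_inventory(sample):
    print(row)
```

Using `.get` for optional keys matters because the inventory only includes properties that differ from their defaults.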

Generate replacement text and save the data to a JSON file. Base the replacements on the text inventory from the previous step:

CRITICAL: First verify which shapes exist in the inventory, and only reference shapes that are actually present
VALIDATION: The replace.py script will validate that all shapes in your replacement JSON exist in the inventory
- If you reference a non-existent shape, you'll get an error showing available shapes
- If you reference a non-existent slide, you'll get an error indicating the slide doesn't exist
- All validation errors are shown at once before the script exits
IMPORTANT: The replace.py script uses inventory.py internally to identify ALL text shapes
AUTOMATIC CLEARING: ALL text shapes from the inventory will be cleared unless you provide "paragraphs" for them
Add a "paragraphs" field to shapes that need content (not "replacement_paragraphs")
Shapes without "paragraphs" in the replacement JSON will have their text cleared automatically
Paragraphs with bullets will be automatically left aligned. Don't set the alignment property when "bullet": true
Generate appropriate replacement content for placeholder text
Use shape size to determine appropriate content length
CRITICAL: Include paragraph properties from the original inventory, not just the text
IMPORTANT: When bullet: true, do NOT include bullet symbols (•, -, *) in text, because they're added automatically
ESSENTIAL FORMATTING RULES:
- Headers/titles should typically have "bold": true
- List items should have "bullet": true, "level": 0 (level is required when bullet is true)
- Preserve any alignment properties (e.g., "alignment": "CENTER" for centered text)
- Include font properties when different from default (e.g., "font_size": 14.0, "font_name": "Lora")
- Colors: Use "color": "FF0000" for RGB or "theme_color": "DARK_1" for theme colors
- The replacement script expects properly formatted paragraphs, not just text strings
- Overlapping shapes: Prefer shapes with larger default_font_size or more appropriate placeholder_type
Save the updated inventory with replacements to replacement-text.json
WARNING: Different template layouts have different shape counts, so always check the actual inventory before creating replacements
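Several of these rules can be checked before replace.py is called. The sketch below is an illustration under stated assumptions, not the script's own validation: `check_replacements` is a hypothetical helper that reports references to missing slides or shapes, bulleted paragraphs without a level, alignment set on bulleted paragraphs, and bullet symbols left in bulleted text.

```python
BULLET_SYMBOLS = ("•", "-", "*")


def check_replacements(inventory, replacements):
    """Return every problem found so they can all be reported at once."""
    problems = []
    for slide_key, shapes in replacements.items():
        if slide_key not in inventory:
            problems.append(f"{slide_key}: slide does not exist in the inventory")
            continue
        for shape_key, shape in shapes.items():
            where = f"{slide_key}/{shape_key}"
            if shape_key not in inventory[slide_key]:
                available = ", ".join(sorted(inventory[slide_key]))
                problems.append(f"{where}: shape not found (available: {available})")
                continue
            for n, para in enumerate(shape.get("paragraphs", [])):
                if not isinstance(para, dict):
                    problems.append(f"{where} paragraph {n}: expected an object")
                    continue
                if para.get("bullet"):
                    if "level" not in para:
                        problems.append(f"{where} paragraph {n}: bullet needs a level")
                    if "alignment" in para:
                        problems.append(f"{where} paragraph {n}: alignment set on a bullet")
                    # Heuristic: flags any bulleted text that starts with a bullet symbol
                    if para.get("text", "").lstrip().startswith(BULLET_SYMBOLS):
                        problems.append(f"{where} paragraph {n}: bullet symbol in text")
    return problems


inventory = {"slide-0": {"shape-0": {"placeholder_type": "TITLE"},
                         "shape-1": {"placeholder_type": "BODY"}}}
replacements = {
    "slide-0": {
        "shape-0": {"paragraphs": [{"text": "Quarterly review", "bold": True}]},
        "shape-1": {"paragraphs": [{"text": "- Revenue grew", "bullet": True, "level": 0}]},
        "shape-7": {"paragraphs": [{"text": "Orphan"}]},
    }
}
for problem in check_replacements(inventory, replacements):
    print(problem)
```

The bullet-symbol check is deliberately simple and would also flag bulleted text that legitimately begins with a hyphen, such as a negative number, so treat its output as a prompt to review rather than a hard failure.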

defusedxml: pip install defusedxml (for secure XML parsing)

Properties: Only non-default values are included in the output
