docx by anthropics/skills
npx skills add https://github.com/anthropics/skills --skill docx
A .docx file is a ZIP archive containing XML files.
| Task | Approach |
|---|---|
| Read/analyze content | pandoc or unpack for raw XML |
| Create new document | Use docx-js - see Creating New Documents below |
| Edit existing document | Unpack → edit XML → repack - see Editing Existing Documents below |
Legacy .doc files must be converted before editing:
python scripts/office/soffice.py --headless --convert-to docx document.doc
# Text extraction with tracked changes
pandoc --track-changes=all document.docx -o output.md
# Raw XML access
python scripts/office/unpack.py document.docx unpacked/
# Convert to PDF, then render each page as a JPEG at 150 DPI
python scripts/office/soffice.py --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page
To produce a clean document with all tracked changes accepted (requires LibreOffice):
python scripts/accept_changes.py input.docx output.docx
Generate .docx files with JavaScript, then validate. Install: npm install -g docx
const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
InternalHyperlink, Bookmark, FootnoteReferenceRun, PositionalTab,
PositionalTabAlignment, PositionalTabRelativeTo, PositionalTabLeader,
TabStopType, TabStopPosition, Column, SectionType,
TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
VerticalAlign, PageNumber, PageBreak } = require('docx');
const fs = require('fs'); // needed for writeFileSync below

const doc = new Document({ sections: [{ children: [/* content */] }] });
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));
After creating the file, validate it. If validation fails, unpack, fix the XML, and repack.
python scripts/office/validate.py doc.docx
// CRITICAL: docx-js defaults to A4, not US Letter
// Always set page size explicitly for consistent results
sections: [{
properties: {
page: {
size: {
width: 12240, // 8.5 inches in DXA
height: 15840 // 11 inches in DXA
},
margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } // 1 inch margins
}
},
children: [/* content */]
}]
Common page sizes (DXA units, 1440 DXA = 1 inch):
| Paper | Width | Height | Content Width (1" margins) |
|---|---|---|---|
| US Letter | 12,240 | 15,840 | 9,360 |
| A4 (default) | 11,906 | 16,838 | 9,026 |
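The content widths in the table are plain arithmetic: page width minus the left and right margins. A minimal sketch of that calculation follows; `inchesToDxa` and `contentWidthDxa` are hypothetical helper names used for illustration and are not part of docx-js.

```javascript
// Hypothetical helpers (not part of docx-js) for DXA arithmetic.
const DXA_PER_INCH = 1440;

// Convert inches to DXA, rounded to a whole unit.
function inchesToDxa(inches) {
  return Math.round(inches * DXA_PER_INCH);
}

// Usable width between the left and right margins, in DXA.
// Margins default to 1 inch each.
function contentWidthDxa(pageWidth, marginLeft = DXA_PER_INCH, marginRight = DXA_PER_INCH) {
  return pageWidth - marginLeft - marginRight;
}

console.log(inchesToDxa(8.5));       // 12240 (US Letter width)
console.log(contentWidthDxa(12240)); // 9360  (US Letter, 1" margins)
console.log(contentWidthDxa(11906)); // 9026  (A4, 1" margins)
```

The result is the value to use for a full-width table.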
Landscape orientation: docx-js swaps width/height internally, so pass portrait dimensions and let it handle the swap:
size: {
width: 12240, // Pass SHORT edge as width
height: 15840, // Pass LONG edge as height
orientation: PageOrientation.LANDSCAPE // docx-js swaps them in the XML
},
// Content width = 15840 - left margin - right margin (uses the long edge)
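Because the dimensions are passed in portrait order even for landscape pages, it is easy to compute the content width from the wrong edge. The sketch below shows one way to keep that straight; `sectionContentWidth` and its `landscape` flag are hypothetical and only model the arithmetic, not any docx-js call.

```javascript
// Hypothetical helper: content width (DXA) for a section whose size is given
// in portrait order (short edge as width, long edge as height).
function sectionContentWidth(size, margin) {
  const shortEdge = Math.min(size.width, size.height);
  const longEdge = Math.max(size.width, size.height);
  // In landscape the long edge becomes the horizontal dimension.
  const horizontal = size.landscape ? longEdge : shortEdge;
  return horizontal - margin.left - margin.right;
}

const oneInchMargins = { left: 1440, right: 1440 };
console.log(sectionContentWidth({ width: 12240, height: 15840, landscape: true }, oneInchMargins));  // 12960
console.log(sectionContentWidth({ width: 12240, height: 15840, landscape: false }, oneInchMargins)); // 9360
```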
Use Arial as the default font (universally supported). Keep titles black for readability.
const doc = new Document({
styles: {
default: { document: { run: { font: "Arial", size: 24 } } }, // 12pt default
paragraphStyles: [
// IMPORTANT: Use exact IDs to override built-in styles
{ id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true,
run: { size: 32, bold: true, font: "Arial" },
paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 } }, // outlineLevel required for TOC
{ id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true,
run: { size: 28, bold: true, font: "Arial" },
paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 } },
]
},
sections: [{
children: [
new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("Title")] }),
]
}]
});
// ❌ WRONG - never manually insert bullet characters
new Paragraph({ children: [new TextRun("• Item")] }) // BAD
new Paragraph({ children: [new TextRun("\u2022 Item")] }) // BAD
// ✅ CORRECT - use numbering config with LevelFormat.BULLET
const doc = new Document({
numbering: {
config: [
{ reference: "bullets",
levels: [{ level: 0, format: LevelFormat.BULLET, text: "•", alignment: AlignmentType.LEFT,
style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
{ reference: "numbers",
levels: [{ level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT,
style: { paragraph: { indent: { left: 720, hanging: 360 } } } }] },
]
},
sections: [{
children: [
new Paragraph({ numbering: { reference: "bullets", level: 0 },
children: [new TextRun("Bullet item")] }),
new Paragraph({ numbering: { reference: "numbers", level: 0 },
children: [new TextRun("Numbered item")] }),
]
}]
});
// ⚠️ Each reference creates INDEPENDENT numbering
// Same reference = continues (1,2,3 then 4,5,6)
// Different reference = restarts (1,2,3 then 1,2,3)
CRITICAL: Tables need dual widths - set both columnWidths on the table AND width on each cell. Without both, tables render incorrectly on some platforms.
// CRITICAL: Always set table width for consistent rendering
// CRITICAL: Use ShadingType.CLEAR (not SOLID) to prevent black backgrounds
const border = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" };
const borders = { top: border, bottom: border, left: border, right: border };
new Table({
width: { size: 9360, type: WidthType.DXA }, // Always use DXA (percentages break in Google Docs)
columnWidths: [4680, 4680], // Must sum to table width (DXA: 1440 = 1 inch)
rows: [
new TableRow({
children: [
new TableCell({
borders,
width: { size: 4680, type: WidthType.DXA }, // Also set on each cell
shading: { fill: "D5E8F0", type: ShadingType.CLEAR }, // CLEAR not SOLID
margins: { top: 80, bottom: 80, left: 120, right: 120 }, // Cell padding (internal, not added to width)
children: [new Paragraph({ children: [new TextRun("Cell")] })]
})
]
})
]
})
Table width calculation:
Always use WidthType.DXA — WidthType.PERCENTAGE breaks in Google Docs.
// Table width = sum of columnWidths = content width
// US Letter with 1" margins: 12240 - 2880 = 9360 DXA
width: { size: 9360, type: WidthType.DXA },
columnWidths: [7000, 2360] // Must sum to table width
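Since the column widths must sum exactly to the table width, rounding errors matter when columns are derived from proportions. A possible approach, sketched with a hypothetical `splitColumnWidths` helper (not a docx-js function), is to floor each width and give the remainder to the last column:

```javascript
// Hypothetical helper: split a table width (DXA) into whole-number column
// widths proportional to `ratios`, guaranteeing the sum equals the table width.
function splitColumnWidths(tableWidth, ratios) {
  const total = ratios.reduce((sum, r) => sum + r, 0);
  const widths = ratios.map(r => Math.floor((tableWidth * r) / total));
  // Give any rounding remainder to the last column so the sum is exact.
  const used = widths.reduce((sum, w) => sum + w, 0);
  widths[widths.length - 1] += tableWidth - used;
  return widths;
}

console.log(splitColumnWidths(9360, [1, 1]));    // 4680 and 4680
console.log(splitColumnWidths(9360, [1, 1, 1])); // 3120, 3120, 3120
console.log(splitColumnWidths(9026, [2, 1]));    // 6017 and 3009 (A4, sums to 9026)
```

The same numbers would then go into both columnWidths on the table and width on each cell.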
Width rules:
- Always use WidthType.DXA, never WidthType.PERCENTAGE (incompatible with Google Docs)
- Table width must equal the sum of columnWidths
- Cell width must match the corresponding columnWidth
- Cell margins are internal padding: they reduce the content area and do not add to cell width

// CRITICAL: type parameter is REQUIRED
new Paragraph({
children: [new ImageRun({
type: "png", // Required: png, jpg, jpeg, gif, bmp, svg
data: fs.readFileSync("image.png"),
transformation: { width: 200, height: 150 },
altText: { title: "Title", description: "Desc", name: "Name" } // All three required
})]
})
// CRITICAL: PageBreak must be inside a Paragraph
new Paragraph({ children: [new PageBreak()] })
// Or use pageBreakBefore
new Paragraph({ pageBreakBefore: true, children: [new TextRun("New page")] })
// External link
new Paragraph({
children: [new ExternalHyperlink({
children: [new TextRun({ text: "Click here", style: "Hyperlink" })],
link: "https://example.com",
})]
})
// Internal link (bookmark + reference)
// 1. Create bookmark at destination
new Paragraph({ heading: HeadingLevel.HEADING_1, children: [
new Bookmark({ id: "chapter1", children: [new TextRun("Chapter 1")] }),
]})
// 2. Link to it
new Paragraph({ children: [new InternalHyperlink({
children: [new TextRun({ text: "See Chapter 1", style: "Hyperlink" })],
anchor: "chapter1",
})]})
const doc = new Document({
footnotes: {
1: { children: [new Paragraph("Source: Annual Report 2024")] },
2: { children: [new Paragraph("See appendix for methodology")] },
},
sections: [{
children: [new Paragraph({
children: [
new TextRun("Revenue grew 15%"),
new FootnoteReferenceRun(1),
new TextRun(" using adjusted metrics"),
new FootnoteReferenceRun(2),
],
})]
}]
});
// Right-align text on same line (e.g., date opposite a title)
new Paragraph({
children: [
new TextRun("Company Name"),
new TextRun("\tJanuary 2025"),
],
tabStops: [{ type: TabStopType.RIGHT, position: TabStopPosition.MAX }],
})
// Dot leader (e.g., TOC-style)
new Paragraph({
children: [
new TextRun("Introduction"),
new TextRun({ children: [
new PositionalTab({
alignment: PositionalTabAlignment.RIGHT,
relativeTo: PositionalTabRelativeTo.MARGIN,
leader: PositionalTabLeader.DOT,
}),
"3",
]}),
],
})
// Equal-width columns
sections: [{
properties: {
column: {
count: 2, // number of columns
space: 720, // gap between columns in DXA (720 = 0.5 inch)
equalWidth: true,
separate: true, // vertical line between columns
},
},
children: [/* content flows naturally across columns */]
}]
// Custom-width columns (equalWidth must be false)
sections: [{
properties: {
column: {
equalWidth: false,
children: [
new Column({ width: 5400, space: 720 }),
new Column({ width: 3240 }),
],
},
},
children: [/* content */]
}]
Force a column break with a new section using type: SectionType.NEXT_COLUMN.
// CRITICAL: Headings must use HeadingLevel ONLY - no custom styles
new TableOfContents("Table of Contents", { hyperlink: true, headingStyleRange: "1-3" })
sections: [{
properties: {
page: { margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 } } // 1440 = 1 inch
},
headers: {
default: new Header({ children: [new Paragraph({ children: [new TextRun("Header")] })] })
},
footers: {
default: new Footer({ children: [new Paragraph({
children: [new TextRun("Page "), new TextRun({ children: [PageNumber.CURRENT] })]
})] })
},
children: [/* content */]
}]
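Critical rules for docx-js:
- Landscape: pass the short edge as width, long edge as height, and set orientation: PageOrientation.LANDSCAPE
- Never use \n - use separate Paragraph elements
- Never insert bullet characters manually - use LevelFormat.BULLET with numbering config
- ImageRun requires type - always specify png/jpg/etc
- Always set table width with DXA - never use WidthType.PERCENTAGE (breaks in Google Docs)

Follow all 3 steps in order.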
python scripts/office/unpack.py document.docx unpacked/
Extracts XML, pretty-prints, merges adjacent runs, and converts smart quotes to XML entities (&#x201C; etc.) so they survive editing. Use --merge-runs false to skip run merging.
Edit files in unpacked/word/. See XML Reference below for patterns.
Use "Claude" as the author for tracked changes and comments, unless the user explicitly requests use of a different name.
Use the Edit tool directly for string replacement. Do not write Python scripts. Scripts introduce unnecessary complexity. The Edit tool shows exactly what is being replaced.
CRITICAL: Use smart quotes for new content. When adding text with apostrophes or quotes, use XML entities to produce smart quotes:
<!-- Use these entities for professional typography -->
<w:t>Here&#x2019;s a quote: &#x201C;Hello&#x201D;</w:t>
| Entity | Character |
|---|---|
| &#x2018; | ‘ (left single) |
| &#x2019; | ’ (right single / apostrophe) |
| &#x201C; | “ (left double) |
| &#x201D; | ” (right double) |
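When new text is prepared outside the XML file, the four characters in the table above can be converted to their numeric entities mechanically. The following is a minimal sketch; `encodeSmartQuotes` is a hypothetical helper and not one of the skill's scripts.

```javascript
// Hypothetical helper: replace typographic quotes with numeric XML entities
// so they survive plain-text editing of the unpacked XML.
const SMART_QUOTE_ENTITIES = {
  "\u2018": "&#x2018;", // left single
  "\u2019": "&#x2019;", // right single / apostrophe
  "\u201C": "&#x201C;", // left double
  "\u201D": "&#x201D;", // right double
};

function encodeSmartQuotes(text) {
  return text.replace(/[\u2018\u2019\u201C\u201D]/g, ch => SMART_QUOTE_ENTITIES[ch]);
}

console.log(encodeSmartQuotes("Here\u2019s a quote: \u201CHello\u201D"));
// Here&#x2019;s a quote: &#x201C;Hello&#x201D;
```

This only handles the four quote characters; other XML escaping (such as & to &amp;) would still need to be done separately.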
Adding comments: Use comment.py to handle boilerplate across multiple XML files (text must be pre-escaped XML):
python scripts/comment.py unpacked/ 0 "Comment text with &amp; and &#x2019;"
python scripts/comment.py unpacked/ 1 "Reply text" --parent 0 # reply to comment 0
python scripts/comment.py unpacked/ 0 "Text" --author "Custom Author" # custom author name
Then add markers to document.xml (see Comments in XML Reference).
python scripts/office/pack.py unpacked/ output.docx --original document.docx
Validates with auto-repair, condenses XML, and creates DOCX. Use --validate false to skip.
Auto-repair will fix:
- durableId >= 0x7FFFFFFF (regenerates valid ID)
- Missing xml:space="preserve" on <w:t> with whitespace

Auto-repair won't fix:

Common pitfalls:
- Replace entire <w:r> elements: When adding tracked changes, replace the whole <w:r>...</w:r> block with <w:del>...<w:ins>... as siblings. Don't inject tracked change tags inside a run.
- Preserve <w:rPr> formatting: Copy the original run's <w:rPr> block into your tracked change runs to maintain bold, font size, etc.

Schema compliance:
- Element order in <w:pPr>: <w:pStyle>, <w:numPr>, <w:spacing>, <w:ind>, <w:jc>, <w:rPr> last
- Whitespace: add xml:space="preserve" to <w:t> with leading/trailing spaces
- RSIDs must be 8-digit hex (e.g., 00AB1234)

Insertion:
<w:ins w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:t>inserted text</w:t></w:r>
</w:ins>
Deletion:
<w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
Inside <w:del>: use <w:delText> instead of <w:t>, and <w:delInstrText> instead of <w:instrText>.
Minimal edits - only mark what changes:
<!-- Change "30 days" to "60 days" -->
<w:r><w:t>The term is </w:t></w:r>
<w:del w:id="1" w:author="Claude" w:date="...">
<w:r><w:delText>30</w:delText></w:r>
</w:del>
<w:ins w:id="2" w:author="Claude" w:date="...">
<w:r><w:t>60</w:t></w:r>
</w:ins>
<w:r><w:t> days.</w:t></w:r>
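The minimal-edit pattern above can also be expressed as a small string builder, which makes the sibling structure explicit. This is only a sketch under stated assumptions: `textElement` and `trackedReplace` are hypothetical names, the input text is assumed to be XML-escaped already, and the sketch does not copy the original run's <w:rPr>, which a real edit would need to do.

```javascript
// Hypothetical helpers that emit the tracked-change pattern as a string.
// Assumes all text passed in is already XML-escaped.

// <w:t> / <w:delText> with leading or trailing whitespace gets xml:space="preserve".
function textElement(tag, text) {
  const preserve = /^\s|\s$/.test(text) ? ' xml:space="preserve"' : "";
  return `<w:${tag}${preserve}>${text}</w:${tag}>`;
}

// Replace `oldText` with `newText`, leaving `before` and `after` unmarked.
// The deletion uses `id`, the insertion uses `id + 1`.
function trackedReplace({ before, oldText, newText, after, id, author = "Claude", date }) {
  const attrs = n => `w:id="${n}" w:author="${author}" w:date="${date}"`;
  return [
    `<w:r>${textElement("t", before)}</w:r>`,
    `<w:del ${attrs(id)}><w:r>${textElement("delText", oldText)}</w:r></w:del>`,
    `<w:ins ${attrs(id + 1)}><w:r>${textElement("t", newText)}</w:r></w:ins>`,
    `<w:r>${textElement("t", after)}</w:r>`,
  ].join("\n");
}

console.log(trackedReplace({
  before: "The term is ", oldText: "30", newText: "60", after: " days.",
  id: 1, date: "2025-01-01T00:00:00Z",
}));
```

Only the changed token ends up inside <w:del> and <w:ins>; the surrounding text stays in ordinary runs.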
Deleting entire paragraphs/list items - when removing ALL content from a paragraph, also mark the paragraph mark as deleted so it merges with the next paragraph. Add <w:del/> inside <w:pPr><w:rPr>:
<w:p>
<w:pPr>
<w:numPr>...</w:numPr> <!-- list numbering if present -->
<w:rPr>
<w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z"/>
</w:rPr>
</w:pPr>
<w:del w:id="2" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:delText>Entire paragraph content being deleted...</w:delText></w:r>
</w:del>
</w:p>
Without the <w:del/> in <w:pPr><w:rPr>, accepting changes leaves an empty paragraph/list item.
Rejecting another author's insertion - nest deletion inside their insertion:
<w:ins w:author="Jane" w:id="5">
<w:del w:author="Claude" w:id="10">
<w:r><w:delText>their inserted text</w:delText></w:r>
</w:del>
</w:ins>
Restoring another author's deletion - add insertion after (don't modify their deletion):
<w:del w:author="Jane" w:id="5">
<w:r><w:delText>deleted text</w:delText></w:r>
</w:del>
<w:ins w:author="Claude" w:id="10">
<w:r><w:t>deleted text</w:t></w:r>
</w:ins>
After running comment.py (see Step 2), add markers to document.xml. For replies, use --parent flag and nest markers inside the parent's.
CRITICAL: <w:commentRangeStart> and <w:commentRangeEnd> are siblings of <w:r>, never inside <w:r>.
<!-- Comment markers are direct children of w:p, never inside w:r -->
<w:commentRangeStart w:id="0"/>
<w:del w:id="1" w:author="Claude" w:date="2025-01-01T00:00:00Z">
<w:r><w:delText>deleted</w:delText></w:r>
</w:del>
<w:r><w:t> more text</w:t></w:r>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>
<!-- Comment 0 with reply 1 nested inside -->
<w:commentRangeStart w:id="0"/>
<w:commentRangeStart w:id="1"/>
<w:r><w:t>text</w:t></w:r>
<w:commentRangeEnd w:id="1"/>
<w:commentRangeEnd w:id="0"/>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="0"/></w:r>
<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="1"/></w:r>
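The marker ordering in the reply example (starts in order, ends in reverse, then one reference run per comment) can be sketched as a string builder. `wrapWithComments` is a hypothetical helper for illustration; it assumes the IDs were already created with comment.py and that the first ID is the parent comment.

```javascript
// Hypothetical helper: wrap already-built run XML in comment markers.
// `ids` lists the parent comment first, followed by any replies.
function wrapWithComments(runsXml, ids) {
  const starts = ids.map(id => `<w:commentRangeStart w:id="${id}"/>`);
  // Replies are nested inside the parent, so range ends close in reverse order.
  const ends = [...ids].reverse().map(id => `<w:commentRangeEnd w:id="${id}"/>`);
  const refs = ids.map(id =>
    `<w:r><w:rPr><w:rStyle w:val="CommentReference"/></w:rPr><w:commentReference w:id="${id}"/></w:r>`);
  // Markers are siblings of the runs, never placed inside <w:r>.
  return [...starts, runsXml, ...ends, ...refs].join("\n");
}

console.log(wrapWithComments("<w:r><w:t>text</w:t></w:r>", [0, 1]));
```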
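To add an image:
1. Add the image file to word/media/
2. Add a relationship to word/_rels/document.xml.rels:
<Relationship Id="rId5" Type=".../image" Target="media/image1.png"/>
3. Add the content type to [Content_Types].xml:
<Default Extension="png" ContentType="image/png"/>
4. Reference it in document.xml: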
<w:drawing>
<wp:inline>
<wp:extent cx="914400" cy="914400"/> <!-- EMUs: 914400 = 1 inch -->
<a:graphic>
<a:graphicData uri=".../picture">
<pic:pic>
<pic:blipFill><a:blip r:embed="rId5"/></pic:blipFill>
</pic:pic>
</a:graphicData>
</a:graphic>
</wp:inline>
</w:drawing>
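The cx and cy values on <wp:extent> are in EMUs, so image sizes given in inches or pixels need converting first. A small sketch follows; the helper names are hypothetical, and the pixel conversion assumes a 96 DPI image, which may not match a given file.

```javascript
// Hypothetical helpers for <wp:extent> values. 914400 EMU = 1 inch.
const EMU_PER_INCH = 914400;

function inchesToEmu(inches) {
  return Math.round(inches * EMU_PER_INCH);
}

// Assumption: 96 DPI unless the caller says otherwise.
function pixelsToEmu(pixels, dpi = 96) {
  return Math.round((pixels * EMU_PER_INCH) / dpi);
}

function extentElement(widthPx, heightPx, dpi = 96) {
  return `<wp:extent cx="${pixelsToEmu(widthPx, dpi)}" cy="${pixelsToEmu(heightPx, dpi)}"/>`;
}

console.log(inchesToEmu(1));          // 914400
console.log(extentElement(200, 150)); // <wp:extent cx="1905000" cy="1428750"/>
```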
Dependencies:
- docx: npm install -g docx (new documents)
- LibreOffice: PDF conversion (auto-configured for sandboxed environments via scripts/office/soffice.py)
- pdftoppm for images
Additional rules for docx-js tables and headings:
- Tables need dual widths: columnWidths array AND cell width, both must match
- Always add cell margins: use margins: { top: 80, bottom: 80, left: 120, right: 120 } for readable padding
- Use ShadingType.CLEAR - never SOLID for table shading
- For divider lines, use border: { bottom: { style: BorderStyle.SINGLE, size: 6, color: "2E75B6", space: 1 } } on a Paragraph instead of a table. For two-column footers, use tab stops (see Tab Stops section), not tables
- Include outlineLevel - required for TOC (0 for H1, 1 for H2, etc.)