xlsx by appautomaton/document-skills
npx skills add https://github.com/appautomaton/document-skills --skill xlsx
Unless otherwise stated by the user or existing template
A user may ask you to create, edit, or analyze the contents of an .xlsx file. You have different tools and workflows available for different tasks.
LibreOffice Required for Formula Recalculation: You can assume LibreOffice is installed for recalculating formula values using the recalc.py script. The script automatically configures LibreOffice on first run.
For data analysis, visualization, and basic operations, use pandas, which provides powerful data manipulation capabilities:
import pandas as pd
# Read Excel
df = pd.read_excel('file.xlsx') # Default: first sheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None) # All sheets as dict
# Analyze
df.head() # Preview data
df.info() # Column info
df.describe() # Statistics
# Write Excel
df.to_excel('output.xlsx', index=False)
Always use Excel formulas instead of calculating values in Python and hardcoding them. This ensures the spreadsheet remains dynamic and updateable.
# Bad: Calculating in Python and hardcoding result
total = df['Sales'].sum()
sheet['B10'] = total # Hardcodes 5000
# Bad: Computing growth rate in Python
growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue']
sheet['C5'] = growth # Hardcodes 0.15
# Bad: Python calculation for average
avg = sum(values) / len(values)
sheet['D20'] = avg # Hardcodes 42.5
# Good: Let Excel calculate the sum
sheet['B10'] = '=SUM(B2:B9)'
# Good: Growth rate as Excel formula
sheet['C5'] = '=(C4-C2)/C2'
# Good: Average using Excel function
sheet['D20'] = '=AVERAGE(D2:D19)'
This applies to ALL calculations - totals, percentages, ratios, differences, etc. The spreadsheet should be able to recalculate when source data changes.
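As a small illustration of keeping calculations in the sheet, the sketch below builds the formula strings from the data range instead of hardcoding cell addresses. The sample data, the Total and Growth labels, and the row layout are assumptions made for this example only.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Header plus three illustrative data rows (rows 2..4).
ws.append(["Month", "Revenue"])
for month, revenue in [("Jan", 1000), ("Feb", 1150), ("Mar", 1300)]:
    ws.append([month, revenue])

first_row, last_row = 2, ws.max_row
total_row = last_row + 1

# Write formulas, not computed numbers, so Excel recalculates when data changes.
ws[f"A{total_row}"] = "Total"
ws[f"B{total_row}"] = f"=SUM(B{first_row}:B{last_row})"
ws[f"A{total_row + 1}"] = "Growth"
ws[f"B{total_row + 1}"] = f"=(B{last_row}-B{first_row})/B{first_row}"

print(ws[f"B{total_row}"].value)
print(ws[f"B{total_row + 1}"].value)
```

Because the ranges are derived from ws.max_row, appending more data rows before writing the formulas keeps the totals aligned with the data.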
1. Choose tool: pandas for data, openpyxl for formulas/formatting
2. Create/Load: Create new workbook or load existing file
3. Modify: Add/edit data, formulas, and formatting
4. Save: Write to file
5. Recalculate formulas (MANDATORY IF USING FORMULAS): Use the recalc.py script
   python recalc.py output.xlsx
6. Verify and fix any errors:
   - If status is errors_found, check error_summary for specific error types and locations
   - #REF!: Invalid cell references
   - #DIV/0!: Division by zero
   - #VALUE!: Wrong data type in formula
   - #NAME?: Unrecognized formula name
# Using openpyxl for formulas and formatting
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment
wb = Workbook()
sheet = wb.active
# Add data
sheet['A1'] = 'Hello'
sheet['B1'] = 'World'
sheet.append(['Row', 'of', 'data'])
# Add formula
sheet['B2'] = '=SUM(A1:A10)'
# Formatting
sheet['A1'].font = Font(bold=True, color='FF0000')
sheet['A1'].fill = PatternFill('solid', start_color='FFFF00')
sheet['A1'].alignment = Alignment(horizontal='center')
# Column width
sheet.column_dimensions['A'].width = 20
wb.save('output.xlsx')
# Using openpyxl to preserve formulas and formatting
from openpyxl import load_workbook
# Load existing file
wb = load_workbook('existing.xlsx')
sheet = wb.active # or wb['SheetName'] for specific sheet
# Working with multiple sheets
for sheet_name in wb.sheetnames:
    ws = wb[sheet_name]  # separate name so `sheet` above is not overwritten
    print(f"Sheet: {sheet_name}, rows: {ws.max_row}")
# Modify cells
sheet['A1'] = 'New Value'
sheet.insert_rows(2) # Insert row at position 2
sheet.delete_cols(3) # Delete column 3
# Add new sheet
new_sheet = wb.create_sheet('NewSheet')
new_sheet['A1'] = 'Data'
wb.save('modified.xlsx')
Excel files created or modified by openpyxl contain formulas as strings but not calculated values. Use the provided recalc.py script to recalculate formulas:
python recalc.py <excel_file> [timeout_seconds]
Example:
python recalc.py output.xlsx 30
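To see why the recalculation step is needed, the following sketch saves a formula with openpyxl and reads it back both ways. It works entirely in memory; the cell addresses and values are made up for illustration.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a tiny workbook with one formula and save it to memory.
wb = Workbook()
ws = wb.active
ws["A1"] = 2
ws["A2"] = 3
ws["A3"] = "=SUM(A1:A2)"
buffer = BytesIO()
wb.save(buffer)
saved_bytes = buffer.getvalue()

# Default load: the formula text is preserved.
formula_text = load_workbook(BytesIO(saved_bytes)).active["A3"].value

# data_only=True load: returns the cached result, which openpyxl never wrote.
cached_value = load_workbook(BytesIO(saved_bytes), data_only=True).active["A3"].value

print(formula_text)
print(cached_value)
```

The second read comes back empty because no spreadsheet application has calculated the workbook yet; running recalc.py on a saved file is what fills those cached values in.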
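The script configures LibreOffice on first run, recalculates the formulas in the workbook, and reports any formula errors as JSON.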
Quick checks to ensure formulas work correctly:
- NaN handling: check for null values with pd.notna()
- Division by zero: check denominators before using / in formulas (#DIV/0!)
The script returns JSON with error details:
{
  "status": "success",       // or "errors_found"
  "total_errors": 0,         // Total error count
  "total_formulas": 42,      // Number of formulas in file
  "error_summary": {         // Only present if errors found
    "#REF!": {
      "count": 2,
      "locations": ["Sheet1!B5", "Sheet1!C10"]
    }
  }
}
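A report like the one above could be consumed along these lines. This is a sketch, not part of the skill: run_recalc and describe_errors are hypothetical helpers, and it assumes recalc.py is in the current directory and prints its JSON report to stdout.

```python
import json
import subprocess
import sys


def run_recalc(path, timeout_seconds=30):
    """Run recalc.py on a workbook and return its parsed JSON report.

    Assumption: recalc.py is in the working directory and writes JSON to stdout.
    """
    result = subprocess.run(
        [sys.executable, "recalc.py", path, str(timeout_seconds)],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def describe_errors(report):
    """Return one readable line per error type, or an empty list on success."""
    if report.get("status") != "errors_found":
        return []
    lines = []
    for error_type, detail in report.get("error_summary", {}).items():
        locations = ", ".join(detail["locations"])
        lines.append(f"{error_type} x{detail['count']}: {locations}")
    return lines


# Sample report shaped like the JSON above (values are illustrative).
sample_report = {
    "status": "errors_found",
    "total_errors": 2,
    "total_formulas": 42,
    "error_summary": {
        "#REF!": {"count": 2, "locations": ["Sheet1!B5", "Sheet1!C10"]}
    },
}
for line in describe_errors(sample_report):
    print(line)
```

In practice the dictionary passed to describe_errors would come from run_recalc("output.xlsx"); the sample dictionary is used here so the snippet runs without LibreOffice.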
- Use data_only=True to read calculated values: load_workbook('file.xlsx', data_only=True)
- Warning: if a workbook is opened with data_only=True and saved, formulas are replaced with values and permanently lost
- For large files, use read_only=True for reading or write_only=True for writing
- Read a column as text: pd.read_excel('file.xlsx', dtype={'id': str})
- Read only selected columns: pd.read_excel('file.xlsx', usecols=['A', 'C', 'E'])
- Parse a column as dates: pd.read_excel('file.xlsx', parse_dates=['date_column'])
IMPORTANT: When generating Python code for Excel operations:
For Excel files themselves:
Process all sheets efficiently with ExcelFile:
import pandas as pd
excel_file = pd.ExcelFile("workbook.xlsx")
for sheet_name in excel_file.sheet_names:
    df = pd.read_excel(excel_file, sheet_name=sheet_name)
    print(f"{sheet_name}: {len(df)} rows")
import pandas as pd
df = pd.read_excel("sales_data.xlsx")
pivot = pd.pivot_table(
    df,
    values="sales",
    index="region",
    columns="product",
    aggfunc="sum",
    fill_value=0
)
pivot.to_excel("pivot_report.xlsx")
df = pd.read_excel("sales.xlsx")
# Group and sum
sales_by_region = df.groupby("region")["sales"].sum()
# Multiple aggregations
summary = df.groupby("region").agg({
    "sales": "sum",
    "quantity": "mean",
    "profit": ["min", "max"]
})
# Simple filter
high_sales = df[df["sales"] > 10000]
# Multiple conditions
filtered = df[(df["region"] == "West") & (df["sales"] > 5000)]
# Calculate new columns
df["profit_margin"] = (df["revenue"] - df["cost"]) / df["revenue"]
# Sort
df_sorted = df.sort_values("sales", ascending=False)
import pandas as pd
df = pd.read_excel("messy_data.xlsx")
# Remove duplicates
df = df.drop_duplicates()
# Handle missing values (these are alternatives; pick the one that fits the data)
df = df.fillna(0) # Fill with value
df = df.dropna() # Drop rows with missing values
df = df.dropna(subset=["important_col"]) # Drop only if specific column is null
# Remove whitespace from strings
df["name"] = df["name"].str.strip()
# Convert data types
df["date"] = pd.to_datetime(df["date"])
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
# Save cleaned data
df.to_excel("cleaned_data.xlsx", index=False)
import pandas as pd
# Concatenate files vertically (stack rows)
df1 = pd.read_excel("sales_q1.xlsx")
df2 = pd.read_excel("sales_q2.xlsx")
combined = pd.concat([df1, df2], ignore_index=True)
# Merge on common column (like SQL JOIN)
customers = pd.read_excel("customers.xlsx")
sales = pd.read_excel("sales.xlsx")
merged = pd.merge(sales, customers, on="customer_id", how="left")
merged.to_excel("merged_data.xlsx", index=False)
Generate charts from Excel data using matplotlib:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel("data.xlsx")
# Bar chart
df.plot(x="category", y="value", kind="bar")
plt.title("Sales by Category")
plt.xlabel("Category")
plt.ylabel("Sales")
plt.tight_layout()
plt.savefig("bar_chart.png")
plt.close()
# Pie chart
df.set_index("category")["value"].plot(kind="pie", autopct="%1.1f%%")
plt.title("Market Share")
plt.ylabel("")
plt.savefig("pie_chart.png")
plt.close()
# Line chart
df.plot(x="date", y="revenue", kind="line")
plt.savefig("trend.png")
plt.close()
Apply formatting programmatically based on cell values:
import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import PatternFill, Font
df = pd.DataFrame({
"Product": ["A", "B", "C"],
"Sales": [100, 200, 150]
})
df.to_excel("formatted.xlsx", index=False)
wb = load_workbook("formatted.xlsx")
ws = wb.active
# Define fills
red_fill = PatternFill(start_color="FF0000", end_color="FF0000", fill_type="solid")
green_fill = PatternFill(start_color="00FF00", end_color="00FF00", fill_type="solid")
# Apply fills based on each cell's current value (static styling, not an Excel conditional-formatting rule)
for row in range(2, len(df) + 2):
    cell = ws[f"B{row}"]
    if cell.value < 150:
        cell.fill = red_fill
    else:
        cell.fill = green_fill
# Bold headers
for cell in ws[1]:
    cell.font = Font(bold=True)
wb.save("formatted.xlsx")
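The fills above are computed once in Python, so they do not change if the values are edited later. If the highlighting should follow the data, one option is openpyxl's conditional-formatting rules, sketched here with the same sample values and the same 150 threshold. The workbook is built in memory and the range is an assumption for this example.

```python
from openpyxl import Workbook
from openpyxl.formatting.rule import CellIsRule
from openpyxl.styles import PatternFill

wb = Workbook()
ws = wb.active
ws.append(["Product", "Sales"])
for row in [("A", 100), ("B", 200), ("C", 150)]:
    ws.append(row)

red_fill = PatternFill(start_color="FF0000", end_color="FF0000", fill_type="solid")
green_fill = PatternFill(start_color="00FF00", end_color="00FF00", fill_type="solid")

# Rules are stored in the file and evaluated by the spreadsheet application.
data_range = f"B2:B{ws.max_row}"
ws.conditional_formatting.add(
    data_range,
    CellIsRule(operator="lessThan", formula=["150"], fill=red_fill),
)
ws.conditional_formatting.add(
    data_range,
    CellIsRule(operator="greaterThanOrEqual", formula=["150"], fill=green_fill),
)
```

With rules like these, the colors are re-evaluated whenever a Sales value changes, which matches the guidance to keep the spreadsheet dynamic.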
For large Excel files:
import pandas as pd
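# Read only specific columns (a list matches header names; use usecols="A,C,E" for Excel column letters)
df = pd.read_excel("large.xlsx", usecols=["A", "C", "E"])
# read_excel has no chunksize parameter, so page through the rows with skiprows/nrows
# (each call reopens the file; process() is a placeholder for your own handler)
chunk_size = 10000
start = 0
while True:
    chunk = pd.read_excel("huge.xlsx", skiprows=range(1, start + 1), nrows=chunk_size)
    if chunk.empty:
        break
    process(chunk)
    start += chunk_size
# Specify dtypes to avoid inference overhead
df = pd.read_excel("data.xlsx", dtype={"id": str, "amount": float})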
# For openpyxl with large files
from openpyxl import load_workbook
wb = load_workbook("large.xlsx", read_only=True) # Read-only mode
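For the openpyxl side, a minimal sketch of streaming in both directions: write-only mode to produce rows without holding the sheet in memory, then read-only mode to iterate over them. The sheet name, column names, and row count are invented for the example, and an in-memory buffer stands in for a file on disk.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Write rows one at a time; write-only workbooks have no default sheet.
out_wb = Workbook(write_only=True)
out_ws = out_wb.create_sheet("Data")
out_ws.append(["id", "amount"])
for i in range(1, 1001):
    out_ws.append([i, i * 2])
buffer = BytesIO()
out_wb.save(buffer)

# Stream the rows back in read-only mode and aggregate as we go.
in_wb = load_workbook(BytesIO(buffer.getvalue()), read_only=True)
in_ws = in_wb["Data"]
total = 0
row_count = 0
for row_id, amount in in_ws.iter_rows(min_row=2, values_only=True):
    total += amount
    row_count += 1
in_wb.close()  # read-only workbooks keep the file open until closed

print(row_count, total)
```

Passing values_only=True yields plain tuples instead of cell objects, which keeps per-row overhead low when only the data is needed.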
import pandas as pd
df = pd.DataFrame({"Product": ["Widget A", "Widget B"], "Sales": [100, 200]})
# The context manager saves and closes the file on exit
with pd.ExcelWriter("output.xlsx", engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Sales", index=False)
    worksheet = writer.sheets["Sales"]
    # Size each column to its longest cell value, plus a little padding
    for column in worksheet.columns:
        column_letter = column[0].column_letter
        max_length = max(
            (len(str(cell.value)) for cell in column if cell.value is not None),
            default=0,
        )
        worksheet.column_dimensions[column_letter].width = max_length + 2
Weekly Installs: 132
Repository: appautomaton/document-skills
GitHub Stars: 29
First Seen: Feb 7, 2026
Security Audits: Gen Agent Trust Hub (Warn), Socket (Pass), Snyk (Pass)
Installed on: opencode (122), gemini-cli (116), codex (116), cursor (116), github-copilot (110), kimi-cli (106)