vaex by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill vaex
Vaex is a high-performance Python library for lazy, out-of-core DataFrames, built to process and visualize tabular datasets that are too large to fit into RAM. It can compute statistics over more than a billion rows per second, which makes interactive exploration and analysis practical on datasets with billions of rows.
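To make "out-of-core" concrete, here is a minimal standard-library sketch of the general idea: a statistic is accumulated chunk by chunk, so only one chunk is in memory at a time. This is not Vaex's implementation; the `chunks` generator and `streaming_mean` function are hypothetical names that stand in for reading slices of a large file.

```python
from typing import Iterable, Iterator, List


def chunks(n_rows: int, chunk_size: int) -> Iterator[List[float]]:
    """Yield the values 0..n_rows-1 in fixed-size chunks.

    Stands in for slices read from a file too large to load at once.
    """
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        yield [float(i) for i in range(start, stop)]


def streaming_mean(chunk_iter: Iterable[List[float]]) -> float:
    """Compute a mean while holding only one chunk in memory at a time."""
    total = 0.0
    count = 0
    for chunk in chunk_iter:
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("mean of an empty dataset is undefined")
    return total / count


mean_value = streaming_mean(chunks(n_rows=1_000_000, chunk_size=100_000))
print(mean_value)  # 499999.5
```

Peak memory here depends on `chunk_size`, not on the total number of rows, which is the property that lets an out-of-core library work on data larger than RAM.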
Use Vaex when:
Vaex provides six primary capability areas, each documented in detail in the references directory:
Load and create Vaex DataFrames from various sources, including files (HDF5, CSV, Arrow, Parquet), pandas DataFrames, NumPy arrays, and dictionaries. See references/core_dataframes.md for details.
Perform filtering, create virtual columns, use expressions, and aggregate data without loading everything into memory. See references/data_processing.md for details.
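As a rough illustration of aggregating without loading everything into memory, the sketch below computes a grouped sum by streaming over chunks of `(category, value)` pairs. It uses only the standard library and is not how Vaex implements `groupby`; the `grouped_sum` function and the sample data are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Chunk = List[Tuple[str, float]]


def grouped_sum(chunk_iter: Iterable[Chunk]) -> Dict[str, float]:
    """Sum values per category, keeping only running totals in memory."""
    totals: Dict[str, float] = defaultdict(float)
    for chunk in chunk_iter:
        for category, value in chunk:
            totals[category] += value
    return dict(totals)


# Three small chunks stand in for slices of a much larger table.
sample_chunks: List[Chunk] = [
    [("a", 1.0), ("b", 2.0)],
    [("a", 3.0), ("c", 4.0)],
    [("b", 5.0)],
]
totals = grouped_sum(sample_chunks)
print(totals)  # {'a': 4.0, 'b': 7.0, 'c': 4.0}
```

The state that has to stay in memory grows with the number of distinct groups rather than the number of rows.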
Leverage Vaex's lazy evaluation, caching strategies, and memory-efficient operations. See references/performance.md for details, including delay=True for batching operations.
Create interactive visualizations of large datasets, including heatmaps, histograms, and scatter plots. See references/visualization.md for details.
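Plotting very large datasets usually means plotting aggregates rather than individual points. The sketch below shows one common approach, counting points on a fixed 2D grid so that the result can be drawn as a heatmap. It is a standard-library illustration under that assumption, not Vaex's plotting code; `bin_counts_2d` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def bin_counts_2d(
    xs: Sequence[float],
    ys: Sequence[float],
    x_limits: Tuple[float, float],
    y_limits: Tuple[float, float],
    shape: Tuple[int, int],
) -> List[List[int]]:
    """Count points per grid cell; points outside the limits are skipped."""
    (x_min, x_max), (y_min, y_max) = x_limits, y_limits
    n_x, n_y = shape
    grid = [[0] * n_y for _ in range(n_x)]
    for x, y in zip(xs, ys):
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        # Map each coordinate to a cell index, clamped against rounding error.
        i = min(int((x - x_min) / (x_max - x_min) * n_x), n_x - 1)
        j = min(int((y - y_min) / (y_max - y_min) * n_y), n_y - 1)
        grid[i][j] += 1
    return grid


xs = [0.1, 0.2, 0.6, 0.9, 1.5]
ys = [0.1, 0.7, 0.6, 0.9, 0.5]
grid = bin_counts_2d(xs, ys, (0.0, 1.0), (0.0, 1.0), (2, 2))
print(grid)  # [[1, 1], [0, 2]]
```

The grid has a fixed size regardless of how many rows are binned, so the image to render stays small even when the input does not.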
Build ML pipelines with transformers, encoders, and integration with scikit-learn, XGBoost, and other frameworks. See references/machine_learning.md for details.
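One way a transformer can fit on data that does not fit in memory is to learn its parameters in a single streaming pass and then return a cheap per-value function to apply later. The sketch below does this for standard scaling using only the standard library. It is an illustration of the pattern, not the API of Vaex's ML module; `fit_standard_scaler` is a hypothetical name.

```python
import math
from typing import Callable, Iterable, List


def fit_standard_scaler(chunk_iter: Iterable[List[float]]) -> Callable[[float], float]:
    """Learn mean and population std in one pass; return the scaling function."""
    count = 0
    total = 0.0
    total_sq = 0.0
    for chunk in chunk_iter:
        for value in chunk:
            count += 1
            total += value
            total_sq += value * value
    if count == 0:
        raise ValueError("cannot fit a scaler on an empty dataset")
    mean = total / count
    variance = total_sq / count - mean * mean
    std = math.sqrt(max(variance, 0.0))
    if std == 0.0:
        raise ValueError("a constant column cannot be scaled")

    def transform(value: float) -> float:
        # Applied on demand, so no scaled copy of the data is stored.
        return (value - mean) / std

    return transform


scale = fit_standard_scaler([[2.0, 4.0], [4.0, 4.0], [5.0, 5.0], [7.0, 9.0]])
scaled = [scale(v) for v in (5.0, 9.0)]
print(scaled)  # [0.0, 2.0]
```

The sum-of-squares formula is used here for brevity; it can lose precision on data with a large mean and a small spread, where a running (Welford-style) update is the safer choice.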
Efficiently read and write data in various formats with optimal performance. See references/io_operations.md for details.
For most Vaex tasks, follow this pattern:
import vaex
# 1. Open or create DataFrame
df = vaex.open('large_file.hdf5') # or .csv, .arrow, .parquet
# OR
df = vaex.from_pandas(pandas_df)
# 2. Explore the data
print(df) # Shows first/last rows and column info
df.describe() # Statistical summary
# 3. Create virtual columns (no memory overhead)
df['new_column'] = df.x ** 2 + df.y
# 4. Filter with selections
df_filtered = df[df.age > 25]
# 5. Compute statistics (fast, lazy evaluation)
mean_val = df.x.mean()
stats = df.groupby('category').agg({'value': 'sum'})
# 6. Visualize
df.plot1d(df.x, limits=[0, 100])
df.plot(df.x, df.y, limits='99.7%')
# 7. Export if needed
df.export_hdf5('output.hdf5')
The reference files contain detailed information about each capability area. Load references into context based on the specific task:
Start with references/core_dataframes.md and references/data_processing.md
references/performance.md
references/visualization.md
references/machine_learning.md
references/io_operations.md
Use delay=True when performing multiple calculations
Profile with df.stat() to understand memory usage and optimize operations
import vaex
# Open a large CSV (pass convert=True or chunk_size=... to read it in chunks rather than all at once)
df = vaex.from_csv('large_file.csv')
# Export to HDF5 for faster future access
df.export_hdf5('large_file.hdf5')
# Future loads are instant
df = vaex.open('large_file.hdf5')
# Use delay=True to batch multiple operations
mean_x = df.x.mean(delay=True)
std_y = df.y.std(delay=True)
sum_z = df.z.sum(delay=True)
# Execute all scheduled operations together, then read each result
df.execute()
results = [mean_x.get(), std_y.get(), sum_z.get()]
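The reason batching with delay=True helps is that several aggregations can share one pass over the data instead of each triggering its own. The standard-library sketch below illustrates that idea with a toy scheduler. It is not Vaex's scheduler; `PassScheduler` and its methods are hypothetical.

```python
from typing import Callable, Dict, List

Step = Callable[[float, float], float]


class PassScheduler:
    """Toy scheduler: collects reductions and runs them in one data pass."""

    def __init__(self, values: List[float]) -> None:
        self.values = values
        self.pending: Dict[str, Step] = {}
        self.passes = 0  # number of full passes made over the data

    def schedule(self, name: str, step: Step) -> None:
        """Register a reduction; nothing is computed yet."""
        self.pending[name] = step

    def execute(self) -> Dict[str, float]:
        """Run every pending reduction while reading the data once."""
        accumulators = {name: 0.0 for name in self.pending}
        self.passes += 1
        for value in self.values:
            for name, step in self.pending.items():
                accumulators[name] = step(accumulators[name], value)
        self.pending = {}
        return accumulators


scheduler = PassScheduler([1.0, 2.0, 3.0, 4.0])
scheduler.schedule("sum", lambda acc, v: acc + v)
scheduler.schedule("sum_of_squares", lambda acc, v: acc + v * v)
scheduler.schedule("count", lambda acc, v: acc + 1)
results = scheduler.execute()
mean = results["sum"] / results["count"]
print(results, mean, scheduler.passes)
```

Three reductions complete after a single pass. When the data lives on disk, the saved passes are where most of the time goes.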
# No memory overhead - computed on the fly
df['age_squared'] = df.age ** 2
df['full_name'] = df.first_name + ' ' + df.last_name
df['is_adult'] = df.age >= 18
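To show why a virtual column adds no memory overhead, the sketch below models a table in which real columns hold data while virtual columns hold only a function that derives a value from a row. It is a standard-library illustration of the concept, not Vaex's expression system; `LazyTable` and its methods are hypothetical.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, float]


class LazyTable:
    """Toy table: virtual columns store an expression, not computed values."""

    def __init__(self, columns: Dict[str, List[float]]) -> None:
        self.columns = columns
        self.virtual: Dict[str, Callable[[Row], Any]] = {}
        self.n_rows = len(next(iter(columns.values())))

    def add_virtual_column(self, name: str, expression: Callable[[Row], Any]) -> None:
        """Record the expression only; no per-row values are materialized."""
        self.virtual[name] = expression

    def evaluate(self, name: str) -> List[Any]:
        """Return a column's values, computing virtual ones on the fly."""
        if name in self.columns:
            return list(self.columns[name])
        expression = self.virtual[name]
        out = []
        for i in range(self.n_rows):
            row = {key: values[i] for key, values in self.columns.items()}
            out.append(expression(row))
        return out


table = LazyTable({"age": [15.0, 18.0, 40.0]})
table.add_virtual_column("age_squared", lambda row: row["age"] ** 2)
table.add_virtual_column("is_adult", lambda row: row["age"] >= 18)

stored_names = sorted(table.columns)
age_squared = table.evaluate("age_squared")
is_adult = table.evaluate("is_adult")
print(stored_names, age_squared, is_adult)
```

Only `age` is stored. The two derived columns cost one function object each until something asks for their values.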
This skill includes reference documentation in the references/ directory:
core_dataframes.md - DataFrame creation, loading, and basic structure
data_processing.md - Filtering, expressions, aggregations, and transformations
performance.md - Optimization strategies and lazy evaluation
visualization.md - Plotting and interactive visualizations
machine_learning.md - ML pipelines and model integration
io_operations.md - File formats and data import/export
Weekly Installs: 116
Repository: davila7/claude-code-templates
GitHub Stars: 22.6K
First Seen: Jan 21, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Warn)
Installed on: claude-code (101), opencode (90), cursor (87), gemini-cli (86), antigravity (83), codex (75)