data-engineering-data-pipeline by sickn33/antigravity-awesome-skills
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill data-engineering-data-pipeline
You are a data pipeline architecture expert specializing in designing scalable, reliable, and cost-effective pipelines for batch and streaming data processing.
$ARGUMENTS
Topics covered:
Ingestion: Batch, Streaming
Orchestration: Airflow, Prefect
Data quality: Great Expectations, dbt Tests
Storage: Delta Lake, Apache Iceberg
Operations: Monitoring, Cost Optimization
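```python
# Batch ingestion with validation
# BatchDataIngester, DeltaLakeManager and DataQualityFramework are assumed to be the
# helper modules from this skill's reference implementation, not published packages.
from batch_ingestion import BatchDataIngester
from storage.delta_lake_manager import DeltaLakeManager
from data_quality.expectations_suite import DataQualityFramework

ingester = BatchDataIngester(config={})

# High-water mark from the previous successful run (placeholder value: load it
# from wherever your pipeline persists its state)
last_run_timestamp = '2024-01-01T00:00:00'

# Extract incrementally: only rows whose updated_at is newer than last_watermark
df = ingester.extract_from_database(
    connection_string='postgresql://host:5432/db',
    query='SELECT * FROM orders',
    watermark_column='updated_at',
    last_watermark=last_run_timestamp
)

# Validate: required fields must be present and 'id' must be int64
schema = {'required_fields': ['id', 'user_id'], 'dtypes': {'id': 'int64'}}
df = ingester.validate_and_clean(df, schema)

# Data quality checks against the 'orders_suite' expectation suite
dq = DataQualityFramework()
result = dq.validate_dataframe(df, suite_name='orders_suite', data_asset_name='orders')
# Assumption: result is a dict with a boolean 'success' key; stop before writing bad data
if not result.get('success', False):
    raise ValueError(f"Data quality checks failed for 'orders': {result}")

# Append to a Delta Lake table partitioned by order_date
delta_mgr = DeltaLakeManager(storage_path='s3://lake')
delta_mgr.create_or_update_table(
    df=df,
    table_name='orders',
    partition_columns=['order_date'],
    mode='append'
)

# Persist records rejected during validation to a dead-letter location
ingester.save_dead_letter_queue('s3://lake/dlq/orders')
```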
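The `extract_from_database` call in the ingestion example loads only rows whose `updated_at` is newer than the last watermark. The following minimal, self-contained sketch shows that watermark pattern. It assumes an in-memory SQLite table and ISO-8601 timestamp strings. `extract_incremental` is a hypothetical helper, not part of this skill or any library.

```python
import sqlite3


def extract_incremental(conn, table, watermark_column, last_watermark):
    """Return rows changed since last_watermark plus the new high-water mark.

    Sketch only: assumes watermark_column holds ISO-8601 strings, which sort
    lexicographically in time order.
    """
    # Table/column names are interpolated, so they must come from trusted config;
    # the watermark value itself is passed as a bound parameter.
    query = (
        f"SELECT * FROM {table} WHERE {watermark_column} > ? "
        f"ORDER BY {watermark_column}"
    )
    cursor = conn.execute(query, (last_watermark,))
    columns = [d[0] for d in cursor.description]
    rows = [dict(zip(columns, r)) for r in cursor.fetchall()]
    # Advance the watermark only to the max value actually seen, so an empty
    # batch leaves it unchanged and the next run resumes from the same point.
    new_watermark = max((r[watermark_column] for r in rows), default=last_watermark)
    return rows, new_watermark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, user_id INTEGER, updated_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 10, "2024-01-01T00:00:00"),
    (2, 11, "2024-01-02T00:00:00"),
    (3, 12, "2024-01-03T00:00:00"),
])
rows, wm = extract_incremental(conn, "orders", "updated_at", "2024-01-01T00:00:00")
print([r["id"] for r in rows], wm)  # [2, 3] 2024-01-03T00:00:00
```

A strict `>` avoids reloading the boundary row. However, it can miss rows committed later with exactly the same timestamp. A common alternative is `>=` combined with an idempotent merge on the primary key.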
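`validate_and_clean` and `save_dead_letter_queue` split a batch into two groups: rows that pass the schema, and rows set aside for later inspection. The plain-Python sketch below shows that split under these assumptions:

- Records are dicts.
- The hypothetical `DTYPE_CHECKS` table maps the schema's pandas-style dtype names to Python types.
- The dead-letter queue is a local JSON Lines file rather than the `s3://lake/dlq/orders` prefix. Writing to S3 would need a client such as boto3.

`validate_records` and `write_dead_letter_queue` are illustrative names, not this skill's API.

```python
import json
import os
import tempfile

# Hypothetical mapping from the schema's pandas-style dtype names to Python types.
DTYPE_CHECKS = {"int64": int, "float64": float, "string": str}


def validate_records(records, schema):
    """Split records into (valid, rejected); each rejected entry carries its errors."""
    valid, rejected = [], []
    for record in records:
        errors = []
        for field in schema.get("required_fields", []):
            if record.get(field) is None:
                errors.append(f"missing required field '{field}'")
        for field, dtype in schema.get("dtypes", {}).items():
            value = record.get(field)
            expected = DTYPE_CHECKS[dtype]
            # bool is a subclass of int in Python, so booleans are rejected explicitly.
            if value is not None and (not isinstance(value, expected) or isinstance(value, bool)):
                errors.append(f"field '{field}' is not {dtype}")
        if errors:
            rejected.append({"record": record, "errors": errors})
        else:
            valid.append(record)
    return valid, rejected


def write_dead_letter_queue(rejected, path):
    """Append rejected rows as JSON Lines so they can be inspected and replayed."""
    with open(path, "a", encoding="utf-8") as fh:
        for item in rejected:
            fh.write(json.dumps(item) + "\n")


schema = {"required_fields": ["id", "user_id"], "dtypes": {"id": "int64"}}
records = [
    {"id": 1, "user_id": 10},
    {"id": "2", "user_id": 11},  # wrong type for id
    {"id": 3, "user_id": None},  # missing user_id
]
valid, rejected = validate_records(records, schema)
dlq_path = os.path.join(tempfile.mkdtemp(), "orders_dlq.jsonl")
write_dead_letter_queue(rejected, dlq_path)
```

Keeping the rejection reasons next to each record lets the dead-letter file be replayed once the upstream issue is fixed.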
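Passing `partition_columns=['order_date']` presumably makes each date's files land in their own directory. Delta Lake lays out partitioned tables with Hive-style `column=value` directories. The sketch below reproduces only that directory layout, assuming JSON Lines files in a local directory. It keeps no transaction log, so it is not a substitute for Delta Lake. `append_partitioned` is a hypothetical helper.

```python
import json
import os
import tempfile
import uuid
from collections import defaultdict


def append_partitioned(records, base_path, partition_columns):
    """Append records under Hive-style directories such as order_date=2024-01-01/.

    Sketch only: writes JSON Lines, not Parquet, and keeps no transaction log.
    """
    # Group rows by their partition key, e.g. (("order_date", "2024-01-01"),)
    groups = defaultdict(list)
    for record in records:
        key = tuple((col, record[col]) for col in partition_columns)
        groups[key].append(record)
    written = []
    for key, rows in groups.items():
        partition_dir = os.path.join(base_path, *(f"{col}={val}" for col, val in key))
        os.makedirs(partition_dir, exist_ok=True)
        # A fresh file name per write makes this an append: existing files are never overwritten.
        file_path = os.path.join(partition_dir, f"part-{uuid.uuid4().hex}.jsonl")
        with open(file_path, "w", encoding="utf-8") as fh:
            for row in rows:
                fh.write(json.dumps(row) + "\n")
        written.append(file_path)
    return written


base = tempfile.mkdtemp()
orders = [
    {"id": 1, "order_date": "2024-01-01"},
    {"id": 2, "order_date": "2024-01-01"},
    {"id": 3, "order_date": "2024-01-02"},
]
paths = append_partitioned(orders, base, ["order_date"])
print(sorted(os.listdir(base)))  # ['order_date=2024-01-01', 'order_date=2024-01-02']
```

Partitioning by a column that queries usually filter on, such as a date, lets readers skip whole directories. A high-cardinality column would instead produce many small files.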
Weekly Installs: 203
Repository: sickn33/antigravity-awesome-skills
GitHub Stars: 27.4K
First Seen: Jan 28, 2026
Security Audits: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Pass
Installed on: opencode (190), gemini-cli (186), codex (186), github-copilot (183), cursor (182), claude-code (168)