AWS Cost Optimization and FinOps Workflows: A Complete Guide to Systematically Reducing Cloud Spend

aws-cost-finops by ahmedasmar/devops-claude-skills

85 weekly installs

105 GitHub Stars

Install command

npx skills add https://github.com/ahmedasmar/devops-claude-skills --skill aws-cost-finops

AWS, Cloud Services, DevOps

AWS Cost Optimization & FinOps

Systematic workflows for AWS cost optimization and financial operations management.

When to Use This Skill

Use this skill when you need to:

Find cost savings: Identify unused resources, rightsizing opportunities, or commitment discounts
Analyze spending: Understand cost trends, detect anomalies, or break down costs
Optimize architecture: Choose cost-effective services, storage tiers, or instance types
Implement FinOps: Set up governance, tagging, budgets, or monthly reviews
Make purchase decisions: Evaluate Reserved Instances, Savings Plans, or Spot instances
Troubleshoot costs: Investigate unexpected bills or cost spikes
Plan budgets: Forecast costs or evaluate the impact of new projects

Cost Optimization Workflow

Follow this systematic approach for AWS cost optimization:

┌─────────────────────────────────────────────┐
│ 1. DISCOVER                                 │
│    What are we spending money on?           │
│    Run: find_unused_resources.py            │
│    Run: cost_anomaly_detector.py            │
└─────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────┐
│ 2. ANALYZE                                  │
│    Where are the optimization opportunities?│
│    Run: rightsizing_analyzer.py             │
│    Run: detect_old_generations.py           │
│    Run: spot_recommendations.py             │
│    Run: analyze_ri_recommendations.py       │
└─────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────┐
│ 3. PRIORITIZE                               │
│    What should we optimize first?           │
│    - Quick wins (low risk, high savings)    │
│    - Low-hanging fruit (easy to implement)  │
│    - Strategic improvements                 │
└─────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────┐
│ 4. IMPLEMENT                                │
│    Execute optimization actions             │
│    - Delete unused resources                │
│    - Rightsize instances                    │
│    - Purchase commitments                   │
│    - Migrate to new generations             │
└─────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────┐
│ 5. MONITOR                                  │
│    Verify savings and track metrics         │
│    - Monthly cost reviews                   │
│    - Tag compliance monitoring              │
│    - Budget variance tracking               │
└─────────────────────────────────────────────┘

Core Workflows

Workflow 1: Monthly Cost Optimization Review

Frequency: Run monthly (first week of each month)

Step 1: Find Unused Resources

# Scan for waste across all resources
python3 scripts/find_unused_resources.py

# Expected output:
# - Unattached EBS volumes
# - Old snapshots
# - Unused Elastic IPs
# - Idle NAT Gateways
# - Idle EC2 instances
# - Unused load balancers
# - Estimated monthly savings
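
As a rough illustration of the first item in that list (this is not the skill's actual script), the sketch below filters a list shaped like the `Volumes` field of boto3's `describe_volumes` response for volumes in the `available` state, which is how EC2 reports a volume that is not attached to any instance. The per-GB prices and the function name are assumptions for illustration; substitute your region's rates.

```python
# Assumed per-GB-month prices (placeholders; check your region's pricing).
ASSUMED_PRICE_PER_GB = {"gp2": 0.10, "gp3": 0.08}
DEFAULT_PRICE_PER_GB = 0.10  # fallback for other volume types (assumption)


def find_unattached_volumes(volumes):
    """Return (volume_id, size_gib, est_monthly_cost) for unattached volumes.

    `volumes` is a list of dicts shaped like the "Volumes" entries returned
    by boto3's ec2.describe_volumes(); state "available" means unattached.
    """
    findings = []
    for vol in volumes:
        if vol.get("State") != "available":
            continue
        price = ASSUMED_PRICE_PER_GB.get(vol.get("VolumeType"), DEFAULT_PRICE_PER_GB)
        findings.append((vol["VolumeId"], vol["Size"], round(vol["Size"] * price, 2)))
    return findings


# Hand-written sample data standing in for a real API response.
sample = [
    {"VolumeId": "vol-aaa", "State": "available", "Size": 100, "VolumeType": "gp2"},
    {"VolumeId": "vol-bbb", "State": "in-use", "Size": 50, "VolumeType": "gp3"},
    {"VolumeId": "vol-ccc", "State": "available", "Size": 200, "VolumeType": "gp3"},
]
unattached = find_unattached_volumes(sample)
total = round(sum(cost for _, _, cost in unattached), 2)
print(unattached, total)
```

The output is a review list only; nothing is deleted, which matches the guide's advice to verify with the resource owner first.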

Step 2: Analyze Cost Anomalies

# Detect unusual spending patterns
python3 scripts/cost_anomaly_detector.py --days 30

# Expected output:
# - Cost spikes and anomalies
# - Top cost drivers
# - Period-over-period comparison
# - 30-day forecast
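
One simple way to flag a cost spike, shown here as a sketch rather than the detector's real logic, is a z-score over daily totals. The daily figures could come from Cost Explorer's `get_cost_and_usage` with daily granularity; the function name and the threshold of 2.0 are assumptions.

```python
from statistics import mean, pstdev


def detect_cost_spikes(daily_costs, threshold=2.0):
    """Return (day_index, cost) pairs whose z-score exceeds `threshold`."""
    avg = mean(daily_costs)
    spread = pstdev(daily_costs)
    if spread == 0:
        return []  # perfectly flat spend: nothing to flag
    return [
        (day, cost)
        for day, cost in enumerate(daily_costs)
        if (cost - avg) / spread > threshold
    ]


# Nine ordinary days followed by one expensive day (made-up numbers).
costs = [100.0] * 9 + [400.0]
spikes = detect_cost_spikes(costs)
print(spikes)
```

A z-score is easy to explain in a review meeting, but it assumes fairly steady spend; workloads with strong weekly cycles may need a per-weekday baseline instead.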

Step 3: Identify Rightsizing Opportunities

# Find oversized instances
python3 scripts/rightsizing_analyzer.py --days 30

# Expected output:
# - EC2 instances with low utilization
# - RDS instances with low utilization
# - Recommended smaller instance types
# - Estimated savings
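
The core of a rightsizing check can be sketched as a threshold test on utilization. The thresholds, field names, and function name below are assumptions for illustration, not values taken from `rightsizing_analyzer.py`; in practice the CPU figures would come from CloudWatch metrics over the chosen window.

```python
def flag_oversized(instances, avg_cpu_max=20.0, peak_cpu_max=50.0):
    """Return IDs of instances whose average AND peak CPU are both low.

    Requiring a low peak as well as a low average avoids downsizing
    instances that are idle most of the day but busy in short bursts.
    """
    return [
        inst["id"]
        for inst in instances
        if inst["avg_cpu"] < avg_cpu_max and inst["peak_cpu"] < peak_cpu_max
    ]


# Made-up utilization figures (percent CPU over the analysis window).
fleet = [
    {"id": "i-1", "avg_cpu": 5.0, "peak_cpu": 20.0},   # idle: candidate
    {"id": "i-2", "avg_cpu": 10.0, "peak_cpu": 90.0},  # bursty: keep
    {"id": "i-3", "avg_cpu": 60.0, "peak_cpu": 95.0},  # busy: keep
]
candidates = flag_oversized(fleet)
print(candidates)
```

CPU alone is an incomplete signal; memory, network, and disk throughput should be checked before resizing.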

Step 4: Generate Monthly Report

# Use the template to compile findings
cp assets/templates/monthly_cost_report.md reports/$(date +%Y-%m)-cost-report.md

# Fill in:
# - Findings from scripts
# - Action items
# - Team cost breakdowns
# - Optimization wins

Step 5: Team Review Meeting

Present findings to engineering teams
Assign optimization tasks
Track action items to completion

Workflow 2: Commitment Purchase Analysis (RI/Savings Plans)

When: Quarterly, or once usage patterns stabilize

Step 1: Analyze Current Usage

# Identify workloads suitable for commitments
python3 scripts/analyze_ri_recommendations.py --days 60

# Looks for:
# - EC2 instances running consistently for 60+ days
# - RDS instances with stable usage
# - Calculates ROI for 1yr vs 3yr commitments

Step 2: Review Recommendations

Evaluate each recommendation:

✅ Good candidate if:
  - Running 24/7 for 60+ days
  - Workload is stable and predictable
  - No plans to change architecture
  - Savings > 30%

❌ Poor candidate if:
  - Workload is variable or experimental
  - Architecture changes planned
  - Instance type may change
  - Dev/test environment
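
The "running 24/7" criterion follows from simple arithmetic: a commitment is billed for every hour of the term, while On-Demand is billed only for hours actually used, so a commitment with discount d pays off only when the instance runs more than a fraction 1 - d of the time. A minimal sketch of that break-even, with hypothetical function names:

```python
def break_even_utilization(discount):
    """Fraction of hours an instance must run for a commitment to pay off."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def commitment_saves_money(hours_used, hours_in_term, discount):
    """Compare costs in units of the On-Demand hourly rate."""
    on_demand_cost = hours_used                      # pay only for hours used
    committed_cost = hours_in_term * (1 - discount)  # pay for every hour
    return committed_cost < on_demand_cost


print(break_even_utilization(0.5))             # half the hours must be used
print(commitment_saves_money(730, 730, 0.3))   # always-on instance
print(commitment_saves_money(200, 730, 0.3))   # mostly-off dev box
```

This is why dev/test environments that are shut down overnight are usually poor candidates even when the quoted discount looks attractive.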

Step 3: Choose Commitment Type

Reserved Instances:

Standard RI: Highest RI discount (about 63%), least flexibility
Convertible RI: Moderate discount (about 54%), can be exchanged for a different instance type
Best for: Specific instance types, stable workloads

Savings Plans:

Compute SP: Flexible across instance families, sizes, and regions (up to 66% savings)
EC2 Instance SP: Flexible across sizes within one instance family in one region (up to 72% savings)
Best for: Variable workloads within constraints

Decision Matrix:

Known instance type, won't change → Standard RI
May need to change types → Convertible RI or Compute SP
Variable workloads → Compute Savings Plan
Maximum flexibility → Compute Savings Plan

Step 4: Purchase and Track

Purchase through AWS Console or CLI
Tag commitments with purchase date and owner
Monitor utilization monthly
Aim for >90% utilization
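
The 90% target can be tracked with a small check like the one below. It is a sketch with hypothetical names; the hour counts would typically come from Cost Explorer (for example its reservation or Savings Plans utilization reports).

```python
def commitment_utilization(used_hours, purchased_hours):
    """Share of purchased commitment hours that were actually consumed."""
    if purchased_hours <= 0:
        raise ValueError("purchased_hours must be positive")
    return min(used_hours / purchased_hours, 1.0)


def needs_attention(used_hours, purchased_hours, target=0.90):
    """True when utilization falls below the target from this guide."""
    return commitment_utilization(used_hours, purchased_hours) < target


print(commitment_utilization(657, 730))  # 0.9
print(needs_attention(500, 730))         # True: well under 90%
```

Sustained utilization below the target suggests the commitment was oversized or the workload changed.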

Reference: See references/best_practices.md for detailed commitment strategies

Workflow 3: Instance Generation Migration

When: During architecture reviews or optimization sprints

Step 1: Detect Old Instances

# Find outdated instance generations
python3 scripts/detect_old_generations.py

# Identifies:
# - t2 → t3 migrations (10% savings)
# - m4 → m5 → m6i migrations
# - Intel → Graviton opportunities (20% savings)

Step 2: Prioritize Migrations

Quick Wins (Low Risk):

t2 → t3: Near drop-in replacement (the AMI needs ENA and NVMe drivers), about 10% savings
m4 → m5: Better performance, about 5% savings
gp2 → gp3: No downtime, up to 20% savings

Medium Effort (Test Required):

x86 → Graviton (ARM64): 20% savings
- Requires ARM64 compatibility testing
- Most modern frameworks support ARM64
- Test in staging first
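
Detecting an old generation largely comes down to parsing the instance type string. The sketch below covers only the two family upgrades named in this guide; the mapping table and function name are illustrative assumptions, not the contents of `detect_old_generations.py`.

```python
# Family upgrades named in this guide; extend as needed (illustrative only).
ASSUMED_UPGRADES = {"t2": "t3", "m4": "m5"}


def suggest_upgrade(instance_type):
    """Return the same size in a newer family, or None if no mapping exists."""
    family, _, size = instance_type.partition(".")
    new_family = ASSUMED_UPGRADES.get(family)
    if new_family is None or not size:
        return None
    return f"{new_family}.{size}"


print(suggest_upgrade("t2.micro"))  # t3.micro
print(suggest_upgrade("m4.large"))  # m5.large
print(suggest_upgrade("c5.large"))  # None: not in the table
```

A same-size suggestion is only a starting point: not every size exists in every family, so confirm the target type is offered in your region before migrating.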

Step 3: Execute Migration

For EC2 (x86 to x86):

Stop instance
Change instance type
Start instance
Verify application
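
The stop, change, start sequence above could be scripted roughly as follows. This is a sketch that assumes an EBS-backed instance and takes the EC2 client as a parameter so it can be tried against a stub; `resize_instance` is a hypothetical helper name, while the client methods are standard boto3 EC2 calls.

```python
def resize_instance(ec2, instance_id, new_type):
    """Stop an EBS-backed instance, change its type, and start it again.

    `ec2` is expected to behave like boto3.client("ec2").
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    # The instance type can only be changed while the instance is stopped.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.modify_instance_attribute(
        InstanceId=instance_id, InstanceType={"Value": new_type}
    )
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])


# Example call (requires boto3 and AWS credentials; the ID is a placeholder):
# import boto3
# resize_instance(boto3.client("ec2"), "i-0123456789abcdef0", "t3.medium")
```

Application verification is deliberately left out; it depends on the workload and should follow the restart. Note that a public IP that is not an Elastic IP changes across a stop/start cycle.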

For Graviton Migration:

Create ARM64 AMI or Docker image
Launch new Graviton instance
Test thoroughly
Cut over traffic
Terminate old instance

Step 4: Validate Savings

Monitor new costs in Cost Explorer
Verify performance is acceptable
Document migration for other teams

Reference: See references/best_practices.md → Compute Optimization

Workflow 4: Spot Instance Evaluation

When: For fault-tolerant workloads or Auto Scaling Groups

Step 1: Identify Candidates

# Analyze workloads for Spot suitability
python3 scripts/spot_recommendations.py

# Evaluates:
# - Instances in Auto Scaling Groups (good candidates)
# - Dev/test/staging environments
# - Batch processing workloads
# - CI/CD and build servers

Step 2: Assess Suitability

Excellent for Spot:

Stateless applications
Batch jobs
CI/CD pipelines
Data processing
Auto Scaling Groups

NOT suitable for Spot:

Databases (without replicas)
Stateful applications
Real-time services
Mission-critical workloads

Step 3: Implementation Strategy

Option 1: Fargate Spot (Easiest)

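# ECS: requiresCompatibilities belongs to the task definition;
# capacityProviderStrategy is set on the service (or at run-task time)
requiresCompatibilities:
  - FARGATE
capacityProviderStrategy:
  - capacityProvider: FARGATE_SPOT
    weight: 70  # ~70% of tasks on Spot
  - capacityProvider: FARGATE
    weight: 30  # ~30% of tasks On-Demand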

Option 2: EC2 Auto Scaling with Spot

# Mixed instances policy
MixedInstancesPolicy:
  InstancesDistribution:
    OnDemandBaseCapacity: 2
    OnDemandPercentageAboveBaseCapacity: 30
    SpotAllocationStrategy: capacity-optimized
  LaunchTemplate:
    Overrides:
      - InstanceType: m5.large
      - InstanceType: m5a.large
      - InstanceType: m5n.large
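
To see what the two On-Demand settings in that policy mean for a given group size, the split can be worked out by hand. The sketch below is an approximation with a hypothetical function name; in particular, rounding the On-Demand share up is an assumption of this example.

```python
import math


def split_capacity(desired, on_demand_base, on_demand_pct_above_base):
    """Return (on_demand, spot) instance counts for a desired capacity."""
    base = min(desired, on_demand_base)
    above_base = desired - base
    # Rounding up the On-Demand share is an assumption of this sketch.
    extra_on_demand = math.ceil(above_base * on_demand_pct_above_base / 100)
    on_demand = base + extra_on_demand
    return on_demand, desired - on_demand


# With the policy above (base 2, 30% above base) and 12 desired instances:
# 2 base + 30% of the remaining 10 = 5 On-Demand, 7 Spot.
print(split_capacity(12, 2, 30))
```

The base capacity guarantees a floor of On-Demand instances that survives a Spot shortage, which is why small groups can end up entirely On-Demand.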

Option 3: EC2 Spot Fleet

# Create Spot Fleet with diverse instance types
aws ec2 request-spot-fleet --spot-fleet-request-config file://spot-fleet.json

Step 4: Implement Interruption Handling

# Handle the 2-minute interruption notice
# Instance metadata: http://169.254.169.254/latest/meta-data/spot/instance-action

# In the application:
1. Poll for the interruption notice
2. Shut down gracefully (save state)
3. Drain connections
4. Exit
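
The polling step can be sketched with the standard library alone. The example assumes IMDSv2 (a session token is fetched first) and treats an HTTP 404 from the `spot/instance-action` path as "no interruption scheduled"; the function names and polling interval are hypothetical, and the shutdown work itself is left to the `on_interrupt` callback.

```python
import json
import time
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254"


def fetch_instance_action(timeout=2):
    """Return the parsed interruption notice, or None if none is scheduled."""
    token_req = urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    req = urllib.request.Request(
        f"{IMDS}/latest/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled
            return None
        raise


def wait_for_interruption(on_interrupt, fetch=fetch_instance_action,
                          interval=5.0, max_polls=None):
    """Poll until a notice appears, then call on_interrupt(notice)."""
    polls = 0
    while max_polls is None or polls < max_polls:
        notice = fetch()
        if notice is not None:
            on_interrupt(notice)  # save state, drain connections, exit
            return notice
        polls += 1
        time.sleep(interval)
    return None
```

Passing `fetch` as a parameter keeps the loop testable off EC2; on an instance the default fetcher is used. The polling interval should stay well under the two-minute notice window.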

Reference: See references/best_practices.md → Compute Optimization → Spot Instances

Quick Reference: Cost Optimization Scripts

All Scripts Location

ls scripts/
# find_unused_resources.py
# analyze_ri_recommendations.py
# detect_old_generations.py
# spot_recommendations.py
# rightsizing_analyzer.py
# cost_anomaly_detector.py

Script Usage Patterns

Monthly Review (Run all):

python3 scripts/find_unused_resources.py
python3 scripts/cost_anomaly_detector.py --days 30
python3 scripts/rightsizing_analyzer.py --days 30

Quarterly Optimization:

python3 scripts/analyze_ri_recommendations.py --days 60
python3 scripts/detect_old_generations.py
python3 scripts/spot_recommendations.py

Specific Region Only:

python3 scripts/find_unused_resources.py --region us-east-1
python3 scripts/rightsizing_analyzer.py --region us-west-2

Named AWS Profile:

python3 scripts/find_unused_resources.py --profile production
python3 scripts/cost_anomaly_detector.py --profile production --days 60

Script Requirements

# Install dependencies
pip install boto3 tabulate

# AWS credentials required
# Configure via: aws configure
# Or use: --profile PROFILE_NAME

Service-Specific Optimization

Compute Optimization

Key Actions:

Migrate to Graviton (about 20% savings)
Use Spot for fault-tolerant workloads (roughly 70% savings)
Purchase RIs for stable workloads (40-65% savings)
Right-size oversized instances

Reference: references/best_practices.md → Compute Optimization

Storage Optimization

Key Actions:

Convert gp2 → gp3 (up to 20% savings)
Implement S3 lifecycle policies (50-95% savings on objects moved to colder storage classes)
Delete old snapshots
Use S3 Intelligent-Tiering

Reference: references/best_practices.md → Storage Optimization

Network Optimization

Key Actions:

Replace NAT Gateways with VPC Endpoints where traffic goes mainly to AWS services (save $25-30/month each)
Use CloudFront to reduce data transfer costs
Colocate resources in the same AZ when possible

Reference: references/best_practices.md → Network Optimization

Database Optimization

Key Actions:

Right-size RDS instances
Use gp3 storage (up to 20% cheaper than gp2)
Evaluate Aurora Serverless for variable workloads
Purchase RDS Reserved Instances

Reference: references/best_practices.md → Database Optimization

Service Alternatives Decision Guide

Need help choosing between services?

Question: "Should I use EC2, Lambda, or Fargate?"
Answer: See references/service_alternatives.md → Compute Alternatives

Question: "Which S3 storage class should I use?"
Answer: See references/service_alternatives.md → Storage Alternatives

Question: "Should I use RDS or Aurora?"
Answer: See references/service_alternatives.md → Database Alternatives

Question: "NAT Gateway vs VPC Endpoint vs NAT Instance?"
Answer: See references/service_alternatives.md → Networking Alternatives

FinOps Governance & Process

Setting Up FinOps

Phase 1: Foundation (Month 1)

Enable Cost Explorer
Set up AWS Budgets
Define tagging strategy
Activate cost allocation tags

Phase 2: Visibility (Months 2-3)

Implement tagging enforcement
Run optimization scripts
Set up monthly reviews
Create team cost reports
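
Tagging enforcement usually starts with a report of which resources are missing required tags. The sketch below assumes a hypothetical three-tag policy and the `[{"Key": ..., "Value": ...}]` tag shape that many AWS APIs return; the policy and function name are illustrative, not part of the skill.

```python
REQUIRED_TAGS = {"Owner", "Team", "Environment"}  # hypothetical policy


def missing_tags(resource_tags, required=REQUIRED_TAGS):
    """Return the sorted required tag keys absent from a resource.

    A tag with an empty value is treated as missing, since it cannot be
    used for cost allocation.
    """
    present = {t["Key"] for t in resource_tags if t.get("Value")}
    return sorted(set(required) - present)


tags = [{"Key": "Owner", "Value": "alice"}, {"Key": "Team", "Value": ""}]
print(missing_tags(tags))  # ['Environment', 'Team']
```

Running a check like this across resources gives the tag-compliance percentage that the monitoring stage of the workflow tracks.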

Phase 3: Culture (Ongoing)

Cost metrics in engineering KPIs
Cost review in architecture decisions
Regular optimization sprints
FinOps champions in each team

Full Guide: See references/finops_governance.md

Monthly Review Process

Week 1: Data Collection

Run all optimization scripts
Export Cost & Usage Reports
Compile findings

Week 2: Analysis

Identify trends
Find opportunities
Prioritize actions

Week 3: Team Reviews

Present to engineering teams
Discuss optimizations
Assign action items

Week 4: Executive Reporting

Create executive summary
Forecast next quarter
Report optimization wins

Template: See assets/templates/monthly_cost_report.md

Detailed Process: See references/finops_governance.md → Monthly Review Process

Cost Optimization Checklist

Quick Wins (Do First)

Delete unattached EBS volumes
Delete old EBS snapshots (>90 days)
Release unused Elastic IPs
Convert gp2 → gp3 volumes
Stop/terminate idle EC2 instances
Enable S3 Intelligent-Tiering
Set up AWS Budgets and alerts

Medium Effort (This Quarter)

Right-size oversized instances
Migrate to newer instance generations
Purchase Reserved Instances for stable workloads
Implement S3 lifecycle policies
Replace NAT Gateways with VPC Endpoints (where applicable)
Enable automated resource scheduling (dev/test)
Implement tagging strategy and enforcement

Strategic Initiatives (Ongoing)

Migrate to Graviton instances
Implement Spot for fault-tolerant workloads
Establish monthly cost review process
Set up cost allocation by team
Implement chargeback/showback model
Create FinOps culture and practices

Troubleshooting Cost Issues

"My bill suddenly increased"

Run cost anomaly detection:

python3 scripts/cost_anomaly_detector.py --days 30

Check Cost Explorer for a breakdown by service
Review CloudTrail for resource creation events
Check for Auto Scaling events
Verify that no Reserved Instances or Savings Plans have expired

"I need to reduce costs by X%"

Follow the optimization workflow:

Run all discovery scripts
Calculate total potential savings
Prioritize by: Savings Amount × (1 / Effort)
Focus on quick wins first
Implement strategic changes for long-term
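
The prioritization formula above, Savings Amount × (1 / Effort), can be applied directly as a sort key. In this sketch the effort scale (1 = trivial, 5 = major project), the field names, and the sample figures are all assumptions for illustration.

```python
def rank_opportunities(opportunities):
    """Sort by monthly_savings / effort, highest score first."""
    return sorted(
        opportunities,
        key=lambda o: o["monthly_savings"] / o["effort"],
        reverse=True,
    )


# Made-up backlog; effort uses an assumed 1 (trivial) to 5 (major) scale.
backlog = [
    {"name": "Delete unattached volumes", "monthly_savings": 300, "effort": 1},
    {"name": "Migrate to Graviton", "monthly_savings": 1000, "effort": 5},
    {"name": "Rightsize instances", "monthly_savings": 800, "effort": 2},
]
ranked = [o["name"] for o in rank_opportunities(backlog)]
print(ranked)
```

Here the Graviton migration has the largest absolute savings but ranks last once effort is considered, which is the behaviour the "quick wins first" advice relies on.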

"How do I know if Reserved Instances make sense?"

Run RI analysis:

python3 scripts/analyze_ri_recommendations.py --days 60

Look for:

Instances running 60+ days consistently
Workloads that won't change
Savings > 30%

"Which resources can I safely delete?"

Run unused resource finder:

python3 scripts/find_unused_resources.py

Usually safe to delete:

Unattached EBS volumes (after verifying they are no longer needed)
Snapshots older than 90 days (if backups exist elsewhere)
Unused Elastic IPs (after verifying they are not referenced in DNS)
EC2 instances stopped for more than 30 days (after confirming they are abandoned)

Always verify with resource owner before deletion!
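
The age-based criteria above reduce to a date comparison. The sketch below assumes snapshot records shaped like boto3's `describe_snapshots` entries, where `StartTime` is a timezone-aware datetime; the function name is hypothetical, and the result is a review list, not a deletion.

```python
from datetime import datetime, timedelta, timezone


def snapshots_older_than(snapshots, days=90, now=None):
    """Return IDs of snapshots whose StartTime is more than `days` old."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


# Fixed "now" so the example is reproducible.
now = datetime(2026, 6, 1, tzinfo=timezone.utc)
snaps = [
    {"SnapshotId": "snap-old", "StartTime": datetime(2026, 1, 1, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-new", "StartTime": datetime(2026, 5, 20, tzinfo=timezone.utc)},
]
stale = snapshots_older_than(snaps, days=90, now=now)
print(stale)  # ['snap-old']
```

Age alone does not prove a snapshot is disposable; snapshots backing a registered AMI, for instance, should be excluded before anything is removed.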

Best Practices Summary

Tag Everything: Consistent tagging enables cost allocation and accountability
Monitor Continuously: Weekly script runs catch waste early
Review Monthly: Regular reviews prevent cost drift
Right-size Proactively: Don't wait for cost issues to optimize
Use Commitments Wisely: RIs/SPs for stable workloads only
Test Before Migrating: Especially for Graviton or Spot
Automate Cleanup: Scheduled shutdown of dev/test resources
Share Wins: Celebrate cost savings to build FinOps culture

Additional Resources

Detailed References:

references/best_practices.md: Comprehensive optimization strategies
references/service_alternatives.md: Cost-effective service selection
references/finops_governance.md: Organizational FinOps practices

Templates:

assets/templates/monthly_cost_report.md: Monthly reporting template

Scripts:

All scripts are in the scripts/ directory; run any of them with --help for usage

AWS Documentation:

AWS Cost Explorer: https://aws.amazon.com/aws-cost-management/aws-cost-explorer/
AWS Budgets: https://aws.amazon.com/aws-cost-management/aws-budgets/
FinOps Foundation: https://www.finops.org


First Seen

Jan 23, 2026

Security Audits

Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Warn

Installed on

opencode: 55
gemini-cli: 51
codex: 49
claude-code: 49
github-copilot: 46
cursor: 44
