azure-resource-health-diagnose by github/awesome-copilot
npx skills add https://github.com/github/awesome-copilot --skill azure-resource-health-diagnose
This workflow analyzes a specific Azure resource to assess its health status, diagnose potential issues using logs and telemetry data, and develop a comprehensive remediation plan for any problems discovered.
Prefer Azure MCP tools (`azmcp-*`) over direct Azure CLI when available.

**Action**: Retrieve diagnostic and troubleshooting best practices
**Tools**: Azure MCP best practices tool
**Process**:

**Action**: Locate and identify the target Azure resource
**Tools**: Azure MCP tools + Azure CLI fallback
**Process**:
Resource Lookup:
* Use `azmcp-subscription-list` to search across all subscriptions
* Use `az resource list --name <resource-name>` to find matching resources

Resource Type Detection:
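
The Azure CLI fallback for the lookup above can be sketched in Python as follows. This is a minimal sketch, not part of the skill itself: the function names are hypothetical, and it assumes an authenticated `az` binary on `PATH`. The `type` field of each match (for example `Microsoft.Web/sites`) is what a resource type detection step would read.

```python
import json
import subprocess
from typing import Any


def build_lookup_command(resource_name: str) -> list[str]:
    """Return the argv for the Azure CLI fallback lookup (hypothetical helper)."""
    return ["az", "resource", "list", "--name", resource_name, "--output", "json"]


def summarize_matches(cli_json: str) -> list[dict[str, Any]]:
    """Reduce `az resource list` JSON output to the fields needed for type detection."""
    return [
        {
            "id": item.get("id"),
            "name": item.get("name"),
            "type": item.get("type"),
            "resource_group": item.get("resourceGroup"),
            "location": item.get("location"),
        }
        for item in json.loads(cli_json)
    ]


def find_resource(resource_name: str) -> list[dict[str, Any]]:
    """Run the lookup; raises CalledProcessError if the CLI call fails."""
    result = subprocess.run(
        build_lookup_command(resource_name),
        capture_output=True,
        text=True,
        check=True,
    )
    return summarize_matches(result.stdout)
```

If more than one resource matches the name, the caller still has to pick one, for example by resource group.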
**Action**: Evaluate current resource health and availability
**Tools**: Azure MCP monitoring tools + Azure CLI
**Process**:

Basic Health Check:

Service-Specific Health Indicators:
**Action**: Analyze logs and telemetry to identify issues and patterns
**Tools**: Azure MCP monitoring tools for Log Analytics queries
**Process**:

Find Monitoring Sources:
* Use `azmcp-monitor-workspace-list` to identify Log Analytics workspaces
* Use `azmcp-monitor-table-list` to identify relevant log tables

Execute Diagnostic Queries: Use `azmcp-monitor-log-query` with targeted KQL queries based on resource type:
General Error Analysis:

    // Recent errors and exceptions
    union isfuzzy=true
        AzureDiagnostics,
        AppServiceHTTPLogs,
        AppServiceAppLogs,
        AzureActivity
    | where TimeGenerated > ago(24h)
    | where Level == "Error" or ResultType != "Success"
    | summarize ErrorCount=count() by Resource, ResultType, bin(TimeGenerated, 1h)
    | order by TimeGenerated desc

Performance Analysis:

    // Performance degradation patterns
    Perf
    | where TimeGenerated > ago(7d)
    | where ObjectName == "Processor" and CounterName == "% Processor Time"
    | summarize avg(CounterValue) by Computer, bin(TimeGenerated, 1h)
    | where avg_CounterValue > 80

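
The performance query returns one row per computer for every hour whose average CPU exceeded 80%. A single hot hour may be a spike, so one way to separate spikes from sustained degradation is to group consecutive hours, as in the sketch below. The function name, the `(computer, hour_start)` row shape, and the three-hour default are illustrative assumptions, not part of the skill.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sustained_high_cpu(rows, min_hours=3):
    """Return {computer: [(first_hour, last_hour), ...]} for runs of consecutive hot hours.

    rows: iterable of (computer, hour_start) pairs, already filtered to hourly
    averages above the threshold (assumed shape of the query result).
    """
    by_computer = defaultdict(list)
    for computer, hour in rows:
        by_computer[computer].append(hour)

    one_hour = timedelta(hours=1)
    min_span = timedelta(hours=min_hours - 1)
    runs = {}
    for computer, hours in by_computer.items():
        hours.sort()
        found = []
        start = prev = hours[0]
        for hour in hours[1:]:
            if hour - prev != one_hour:
                # Gap in the sequence: close the current run if it is long enough.
                if prev - start >= min_span:
                    found.append((start, prev))
                start = hour
            prev = hour
        if prev - start >= min_span:
            found.append((start, prev))
        if found:
            runs[computer] = found
    return runs
```

Computers that appear only in isolated hours are dropped from the result, which keeps the later issue list focused on sustained load.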
Application-Specific Queries:

    // Application Insights - Failed requests
    requests
    | where timestamp > ago(24h)
    | where success == false
    | summarize FailureCount=count() by resultCode, bin(timestamp, 1h)
    | order by timestamp desc

    // Database - Connection failures
    AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.SQL"
    | where Category == "SQLSecurityAuditEvents"
    | where action_name_s == "CONNECTION_FAILED"
    | summarize ConnectionFailures=count() by bin(TimeGenerated, 1h)

Pattern Recognition:
* Identify recurring error patterns or anomalies
* Correlate errors with deployment times or configuration changes
* Analyze performance trends and degradation patterns
* Look for dependency failures or external service issues
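
Correlating errors with deployment times, as listed above, might look like the following sketch. It assumes hourly error buckets such as those produced by the `summarize ... bin(TimeGenerated, 1h)` queries and a separately obtained list of deployment timestamps; the function name, the two-hour window, and the source of the deployment times are all assumptions.

```python
from datetime import datetime, timedelta


def errors_after_deployments(error_buckets, deployments,
                             window=timedelta(hours=2),
                             bucket=timedelta(hours=1)):
    """Sum the error counts of every bucket that overlaps the window after each deployment.

    error_buckets: list of (bucket_start, error_count) pairs.
    deployments: list of deployment timestamps.
    """
    correlated = {}
    for deployed_at in deployments:
        window_end = deployed_at + window
        correlated[deployed_at] = sum(
            count
            for bucket_start, count in error_buckets
            # A bucket overlaps the window if it starts before the window ends
            # and ends after the deployment happened.
            if bucket_start < window_end and bucket_start + bucket > deployed_at
        )
    return correlated
```

A deployment whose window holds far more errors than the surrounding hours is a candidate root cause, not proof of one; the counts only narrow down where to look.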
**Action**: Categorize identified issues and determine root causes
**Process**:

Issue Classification:

Root Cause Analysis:

Impact Assessment:
**Action**: Create a comprehensive plan to address identified issues
**Process**:

Immediate Actions (Critical issues):

Short-term Fixes (High/Medium issues):

Long-term Improvements (All issues):

Implementation Steps:
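
The mapping in this step (critical issues to immediate actions, high and medium issues to short-term fixes, and all issues to long-term improvements) can be sketched as a simple grouping. The `Issue` type and the function name are hypothetical, and the lowercase severity labels are an assumed convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    title: str
    severity: str  # assumed labels: "critical", "high", "medium", "low"


def build_remediation_plan(issues):
    """Group issues into the three remediation phases named in this step."""
    return {
        "immediate": [i for i in issues if i.severity == "critical"],
        "short_term": [i for i in issues if i.severity in ("high", "medium")],
        # Long-term improvements are scoped to all issues, so nothing is filtered out.
        "long_term": list(issues),
    }
```

Low-severity items appear only in the long-term phase, which matches their description as informational.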
**Action**: Present findings and get approval for remediation actions
**Process**:

Display Health Assessment Summary:
🏥 Azure Resource Health Assessment
📊 Resource Overview:
• Resource: [Name] ([Type])
• Status: [Healthy/Warning/Critical]
• Location: [Region]
• Last Analyzed: [Timestamp]
🚨 Issues Identified:
• Critical: X issues requiring immediate attention
• High: Y issues affecting performance/reliability
• Medium: Z issues for optimization
• Low: N informational items
🔍 Top Issues:
1. [Issue Type]: [Description] - Impact: [High/Medium/Low]
2. [Issue Type]: [Description] - Impact: [High/Medium/Low]
3. [Issue Type]: [Description] - Impact: [High/Medium/Low]
🛠️ Remediation Plan:
• Immediate Actions: X items
• Short-term Fixes: Y items
• Long-term Improvements: Z items
• Estimated Resolution Time: [Timeline]
❓ Proceed with detailed remediation plan? (y/n)
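
As an illustration, the issue counts in the summary above could be filled in from a list of severity labels. This is a minimal sketch under that assumption; the function name is hypothetical and the wording of each line is copied from the template.

```python
from collections import Counter


def render_issue_counts(severities):
    """Render the 'Issues Identified' lines of the summary from severity labels."""
    counts = Counter(label.lower() for label in severities)
    return "\n".join([
        "🚨 Issues Identified:",
        f"• Critical: {counts['critical']} issues requiring immediate attention",
        f"• High: {counts['high']} issues affecting performance/reliability",
        f"• Medium: {counts['medium']} issues for optimization",
        f"• Low: {counts['low']} informational items",
    ])
```

A `Counter` returns 0 for labels that never occur, so every line is rendered even when a severity level has no issues.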
Generate Detailed Report:
# Azure Resource Health Report: [Resource Name]
**Generated**: [Timestamp]
**Resource**: [Full Resource ID]
**Overall Health**: [Status with color indicator]
## 🔍 Executive Summary
[Brief overview of health status and key findings]
## 📊 Health Metrics
- **Availability**: X% over last 24h
- **Performance**: [Average response time/throughput]
- **Error Rate**: X% over last 24h
- **Resource Utilization**: [CPU/Memory/Storage percentages]
## 🚨 Issues Identified
### Critical Issues
- **[Issue 1]**: [Description]
- **Root Cause**: [Analysis]
- **Impact**: [Business impact]
- **Immediate Action**: [Required steps]
### High Priority Issues
- **[Issue 2]**: [Description]
- **Root Cause**: [Analysis]
- **Impact**: [Performance/reliability impact]
- **Recommended Fix**: [Solution steps]
## 🛠️ Remediation Plan
### Phase 1: Immediate Actions (0-2 hours)
```bash
# Critical fixes to restore service
[Azure CLI commands with explanations]
```
### Phase 2: Short-term Fixes
```bash
# Performance and reliability improvements
[Azure CLI commands with explanations]
```
### Phase 3: Long-term Improvements
```bash
# Architectural and preventive measures
[Azure CLI commands and configuration changes]
```
## Monitoring Recommendations
* **Alerts to Configure**: [List of recommended alerts]
* **Dashboards to Create**: [Monitoring dashboard suggestions]
* **Regular Health Checks**: [Recommended frequency and scope]

## Validation Steps
* Verify issue resolution through logs
* Confirm performance improvements
* Test application functionality
* Update monitoring and alerting
* Document lessons learned

## Prevention Measures
* [Recommendations to prevent similar issues]
* [Process improvements]
* [Monitoring enhancements]
Weekly Installs: 7.3K
Repository: github/awesome-copilot
GitHub Stars: 26.7K
First Seen: Feb 25, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: codex 7.2K, gemini-cli 7.2K, opencode 7.2K, cursor 7.2K, github-copilot 7.1K, kimi-cli 7.1K