npx skills add https://github.com/openshift/hypershift --skill 'Debug Cluster'
This skill provides structured debugging workflows for common HyperShift hosted-cluster issues.
This skill automatically applies when:
For provider-specific issues and detailed troubleshooting steps, refer to the corresponding subskills. The main skill below provides provider-agnostic debugging workflows; when you encounter a provider-specific issue, consult the relevant subskill for detailed resolution steps.
In the commands that follow, <hc-namespace> is the namespace containing the HostedCluster resource (e.g., default or clusters), and <hcp-namespace> is the hosted control plane namespace, formed from the HC namespace and cluster name (e.g., clusters-<cluster-name>).

When a hosted cluster is stuck in the deleting state, follow this systematic debugging process:
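The commands throughout this workflow use the <hc-namespace> and <hcp-namespace> placeholders. As a minimal sketch, assuming the default convention in which the HCP namespace is formed from the HC namespace and the cluster name:

```shell
# Derive the HCP namespace from the HC namespace and cluster name.
# The values below are placeholders for illustration only.
HC_NAMESPACE=clusters     # namespace holding the HostedCluster resource
CLUSTER_NAME=example      # hypothetical cluster name
HCP_NAMESPACE="${HC_NAMESPACE}-${CLUSTER_NAME}"
echo "$HCP_NAMESPACE"
# -> clusters-example
```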
Verify that NodePool deletion is progressing:
# Check NodePool resources in HC namespace
kubectl get nodepool -n <hc-namespace>
# Check CAPI cluster resource status in HCP namespace
kubectl get cluster -n <hcp-namespace> -o yaml
# Check CAPI provider pod logs
kubectl logs -n <hcp-namespace> deployment/capi-provider
# Check CAPI machines status in HCP namespace
kubectl get machines -n <hcp-namespace>
kubectl describe machines -n <hcp-namespace>
# Review HyperShift operator logs for NodePool issues
kubectl logs -n hypershift deployment/operator --tail=100 | grep -i nodepool
kubectl logs -n hypershift deployment/operator --tail=100 | grep -i <cluster-name>
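The two grep passes above can be collapsed into a single case-insensitive pattern. The sketch below demonstrates the filter on invented sample log lines rather than real operator output:

```shell
# Combined filter: match lines mentioning either "nodepool" or the cluster
# name in one pass. CLUSTER_NAME and the log lines are made up for the demo.
CLUSTER_NAME=example
printf '%s\n' \
  'reconciling NodePool example-pool' \
  'unrelated controller message' \
  'cluster example deletion blocked by machines' \
  | grep -icE "nodepool|${CLUSTER_NAME}"
# -> 2  (counts the two matching lines)
```

Against a live cluster, replace the printf with the kubectl logs command shown above.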
What to look for:
Verify that the HCP resource and its pods are being cleaned up:
# Check HCP resource status
kubectl get hostedcontrolplane -n <hcp-namespace> -o yaml
# Check pods in HCP namespace
kubectl get pods -n <hcp-namespace>
# Check for stuck pods
kubectl get pods -n <hcp-namespace> --field-selector=status.phase!=Running
# Review control-plane-operator logs
kubectl logs -n <hcp-namespace> deployment/control-plane-operator --tail=100
What to look for:
Investigate why the HCP namespace isn't being removed:
# Check namespace status
kubectl get namespace <hcp-namespace> -o yaml
# List all remaining resources in namespace
kubectl api-resources --verbs=list --namespaced -o name | \
xargs -n 1 kubectl get --show-kind --ignore-not-found -n <hcp-namespace>
# Check for resources with finalizers
kubectl get all -n <hcp-namespace> -o json | \
jq '.items[] | select(.metadata.finalizers != null) | {kind: .kind, name: .metadata.name, finalizers: .metadata.finalizers}'
# Review HO logs for namespace cleanup
kubectl logs -n hypershift deployment/operator --tail=100 | grep -i namespace
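The jq finalizer filter above can be exercised offline to confirm it selects only resources that still carry finalizers. The resource names and finalizer strings below are made up for illustration:

```shell
# Hypothetical resource list: one pod still holds a finalizer, one does not.
sample='{"items":[{"kind":"Pod","metadata":{"name":"etcd-0","finalizers":["example.com/block-deletion"]}},{"kind":"Pod","metadata":{"name":"kube-apiserver-0"}}]}'
# Same jq filter as above: keep only items with finalizers set.
echo "$sample" | jq -c '.items[] | select(.metadata.finalizers != null) | {kind: .kind, name: .metadata.name, finalizers: .metadata.finalizers}'
# -> {"kind":"Pod","name":"etcd-0","finalizers":["example.com/block-deletion"]}
```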
What to look for:
Final check on the HostedCluster resource itself:
# Check HostedCluster status
kubectl get hostedcluster -n <hc-namespace> <cluster-name> -o yaml
# Check HostedCluster finalizers
kubectl get hostedcluster -n <hc-namespace> <cluster-name> -o jsonpath='{.metadata.finalizers}'
# Review HO logs for HostedCluster deletion
kubectl logs -n hypershift deployment/operator --tail=200 | grep -i "hostedcluster.*<cluster-name>"
What to look for:
When investigating cluster deletion issues, watch for conditions that report only partial progress, for example:

InfrastructureReady: 1 of 2 completed

When HyperShift CRDs are accidentally deleted or corrupted (e.g., after running hypershift destroy infra), you'll need to reinstall HyperShift.
⚠️ WARNING: HyperShift reinstallation should be a last resort. Only proceed if CRDs are genuinely missing or corrupted and cannot be recovered through other means. Reinstallation will cause downtime and may impact existing hosted-clusters.
Commands such as kubectl get hostedclusters fail with:

error: the server doesn't have a resource type "hostedclusters"

# Check if critical HyperShift CRDs exist
kubectl get crd hostedclusters.hypershift.openshift.io
kubectl get crd nodepools.hypershift.openshift.io
# Count HyperShift CRDs (should be ~9)
kubectl get crd | grep hypershift | wc -l
# Count CAPI CRDs (should be ~50)
kubectl get crd | grep cluster.x-k8s.io | wc -l
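Those rough counts can be wrapped in a small sanity-check helper. The function name and thresholds below are assumptions for illustration, not part of HyperShift itself:

```shell
# Hypothetical helper: given observed CRD counts, decide whether the install
# looks intact. Thresholds mirror the rough numbers above (~9 HyperShift
# CRDs, ~50 CAPI CRDs) and are assumptions, not exact requirements.
crds_look_healthy() {
  local hypershift_count="$1" capi_count="$2"
  [ "$hypershift_count" -ge 9 ] && [ "$capi_count" -ge 50 ]
}

# Against a live cluster this would be driven by the commands above:
#   crds_look_healthy "$(kubectl get crd | grep -c hypershift)" \
#                     "$(kubectl get crd | grep -c cluster.x-k8s.io)" \
#     || echo "CRDs appear incomplete - consider reinstalling"

if crds_look_healthy 9 52; then echo healthy; else echo incomplete; fi
# -> healthy
```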
🤖 AI Assistant Note: When this scenario is encountered, Claude should guide and suggest the reinstallation steps to the user but NEVER execute the reinstallation commands itself. The user must explicitly run these commands. Provide clear instructions and explanations, but do not use the Bash tool to perform the actual reinstallation.
# You'll need these for reinstallation:
# - OIDC storage provider configuration (provider-specific, see below)
# - Provider credentials (if applicable)
# - Any custom configuration flags used in original installation
Provider-specific parameters:
hypershift install render | kubectl delete -f -
This renders the complete set of HyperShift installation manifests and deletes them from the cluster, removing the existing (potentially broken) installation.
hypershift install \
[provider-specific-flags] \
--enable-defaulting-webhook true
Add any other flags that were part of your original installation.
Provider-specific installation:
# Check operator is running
kubectl get deploy -n hypershift
kubectl get pods -n hypershift
# Verify CRDs are installed
kubectl get crd | grep hostedcluster
kubectl get crd | grep nodepool
kubectl get crd | grep cluster.x-k8s.io | wc -l
# Test API accessibility
kubectl get hostedclusters -A
# Check operator logs for errors
kubectl logs -n hypershift deployment/operator --tail=50
# Verify controllers are running
kubectl logs -n hypershift deployment/operator --tail=100 | grep "Starting workers"
Expected Results After Reinstallation:
kubectl get hostedclusters -A returns successfully (even if no clusters exist).

Important Notes:
# List all finalizers on a resource
kubectl get <resource-type> <name> -n <namespace> -o jsonpath='{.metadata.finalizers}'
# Remove a specific finalizer (use with caution!)
kubectl patch <resource-type> <name> -n <namespace> -p '{"metadata":{"finalizers":null}}' --type=merge
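The patch above clears all finalizers at once, which is often more than you want. A more surgical sketch, assuming jq is available, filters out a single finalizer and replaces the object; the finalizer names used below are hypothetical:

```shell
# Against a live cluster, the surgical removal would look like:
#
#   kubectl get <resource-type> <name> -n <namespace> -o json \
#     | jq 'del(.metadata.finalizers[] | select(. == "<finalizer-to-drop>"))' \
#     | kubectl replace -f -
#
# Offline demonstration of the jq step on sample metadata (finalizer
# strings are invented): only the matching entry is removed.
sample='{"metadata":{"finalizers":["hypershift.openshift.io/finalizer","example.com/keep-me"]}}'
echo "$sample" | jq -c 'del(.metadata.finalizers[] | select(. == "hypershift.openshift.io/finalizer"))'
# -> {"metadata":{"finalizers":["example.com/keep-me"]}}
```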
# HyperShift operator logs with context
kubectl logs -n hypershift deployment/operator --tail=500 --timestamps
# Control plane operator logs
kubectl logs -n <hcp-namespace> deployment/control-plane-operator --tail=500 --timestamps
# Follow logs in real-time
kubectl logs -n hypershift deployment/operator -f
# Get events for a specific resource
kubectl describe <resource-type> <name> -n <namespace>
# Get all events in a namespace, sorted by time
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
# Check resource conditions
kubectl get <resource-type> <name> -n <namespace> -o jsonpath='{.status.conditions}' | jq .
# Check specific condition
kubectl get hostedcluster <name> -n <namespace> -o jsonpath='{.status.conditions[?(@.type=="Available")]}'
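The same condition lookup can be expressed with jq. The sketch below runs against an invented status payload rather than a live cluster (the reason string is made up):

```shell
# Hypothetical HostedCluster status with two conditions.
status='{"status":{"conditions":[{"type":"Available","status":"False","reason":"WaitingForAvailableMachines"},{"type":"Progressing","status":"True"}]}}'
# Select the Available condition, mirroring the jsonpath filter above.
echo "$status" | jq -c '.status.conditions[] | select(.type == "Available")'
# -> {"type":"Available","status":"False","reason":"WaitingForAvailableMachines"}
```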
Key directories in the HyperShift repository:

hypershift-operator/controllers/hostedcluster/
control-plane-operator/controllers/
api/hypershift/v1beta1/
test/e2e/