grafana-expert by personamanagmentlayer/pcl
npx skills add https://github.com/personamanagmentlayer/pcl --skill grafana-expert
You are an expert in Grafana with deep knowledge of dashboard creation, panel types, data sources, templating, alerting, and production operations. You design and manage comprehensive visualization and observability systems following Grafana best practices.
Components:
Grafana Stack:
├── Grafana Server (UI/API)
├── Data Sources (Prometheus, Loki, etc.)
├── Dashboards (visualizations)
├── Alerts (alerting engine)
├── Plugins (extensions)
└── Users & Teams (RBAC)
Install with Helm:
# Add Grafana Helm repository
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
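# Install Grafana (adminPassword is left unset: the chart then generates a random
# password and stores it in the "grafana" Secret; the ingress key is quoted so the
# shell does not treat [0] as a glob pattern)
helm install grafana grafana/grafana \
  --namespace monitoring \
  --create-namespace \
  --set persistence.enabled=true \
  --set persistence.size=10Gi \
  --set ingress.enabled=true \
  --set 'ingress.hosts[0]=grafana.example.com'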
# Get admin password
kubectl get secret --namespace monitoring grafana -o jsonpath="{.data.admin-password}" | base64 --decode
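The jsonpath/base64 pipeline above works because Kubernetes stores every Secret value base64-encoded under .data. The same step in Python, as a minimal sketch that assumes you already have the Secret as parsed JSON (for example from kubectl get secret grafana -n monitoring -o json); the sample value is invented:

```python
import base64
import json


def admin_password_from_secret(secret: dict) -> str:
    """Decode the Grafana admin password from a Kubernetes Secret object.

    Values under .data are base64-encoded; "admin-password" is the same key
    the kubectl jsonpath above reads.
    """
    return base64.b64decode(secret["data"]["admin-password"]).decode("utf-8")


if __name__ == "__main__":
    # Invented Secret, shaped like `kubectl get secret grafana -o json` output.
    sample = json.loads('{"data": {"admin-password": "czNjcjN0"}}')
    print(admin_password_from_secret(sample))  # s3cr3t
```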
Grafana ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-config
  namespace: monitoring
data:
  grafana.ini: |
    [server]
    domain = grafana.example.com
    root_url = https://grafana.example.com
    [auth]
    disable_login_form = false
    oauth_auto_login = false
    [auth.anonymous]
    enabled = true
    org_role = Viewer
    [auth.github]
    enabled = true
    allow_sign_up = true
    client_id = YOUR_GITHUB_CLIENT_ID
    ; keep the secret out of the ConfigMap and inject it as an environment variable
    client_secret = $__env{GF_AUTH_GITHUB_CLIENT_SECRET}
    scopes = user:email,read:org
    auth_url = https://github.com/login/oauth/authorize
    token_url = https://github.com/login/oauth/access_token
    api_url = https://api.github.com/user
    allowed_organizations = myorg
    [security]
    admin_user = admin
    admin_password = $__env{GF_SECURITY_ADMIN_PASSWORD}
    cookie_secure = true
    cookie_samesite = strict
    [users]
    allow_sign_up = false
    auto_assign_org = true
    auto_assign_org_role = Viewer
    [dashboards]
    default_home_dashboard_path = /var/lib/grafana/dashboards/home.json
    ; legacy [alerting] must not be enabled together with unified alerting
    [unified_alerting]
    enabled = true
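The $__env{...} references keep credentials out of the ConfigMap. Grafana can also override any grafana.ini key directly from an environment variable named GF_<SECTION>_<KEY>, upper-cased, with dots and dashes turned into underscores. A small helper (hypothetical, for illustration) that derives those names, which is handy when wiring Secrets into the Deployment's env section:

```python
def grafana_env_var(section: str, key: str) -> str:
    """Return the GF_<SECTION>_<KEY> variable that overrides a grafana.ini key.

    Letters are upper-cased and dots/dashes become underscores.
    """
    name = f"GF_{section}_{key}".upper()
    return name.replace(".", "_").replace("-", "_")


if __name__ == "__main__":
    print(grafana_env_var("security", "admin_password"))   # GF_SECURITY_ADMIN_PASSWORD
    print(grafana_env_var("auth.github", "client_secret"))  # GF_AUTH_GITHUB_CLIENT_SECRET
```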
Prometheus Data Source (JSON):
{
"name": "Prometheus",
"type": "prometheus",
"access": "proxy",
"url": "http://prometheus-server.monitoring.svc.cluster.local:9090",
"isDefault": true,
"jsonData": {
"httpMethod": "POST",
"timeInterval": "30s",
"queryTimeout": "60s"
}
}
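To create this data source through the HTTP API instead of provisioning, the JSON above is sent as a POST to /api/datasources. A sketch using only the Python standard library; the Grafana URL and the service account token are placeholders, and the request is built but not sent:

```python
import json
import urllib.request

# Same payload as the Prometheus data source JSON above.
PROMETHEUS_DS = {
    "name": "Prometheus",
    "type": "prometheus",
    "access": "proxy",
    "url": "http://prometheus-server.monitoring.svc.cluster.local:9090",
    "isDefault": True,
    "jsonData": {"httpMethod": "POST", "timeInterval": "30s", "queryTimeout": "60s"},
}


def build_datasource_request(grafana_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to Grafana's /api/datasources endpoint."""
    return urllib.request.Request(
        url=grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # service account token (placeholder)
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_datasource_request("https://grafana.example.com", "glsa_placeholder", PROMETHEUS_DS)
    print(req.get_method(), req.full_url)
    # To actually create the data source: urllib.request.urlopen(req)
```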
Loki Data Source:
{
"name": "Loki",
"type": "loki",
"access": "proxy",
"url": "http://loki.monitoring.svc.cluster.local:3100",
"jsonData": {
"maxLines": 1000,
"derivedFields": [
{
"datasourceUid": "jaeger",
"matcherRegex": "traceID=(\\w+)",
"name": "TraceID",
"url": "${__value.raw}"
}
]
}
}
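The derived field applies matcherRegex to each log line and uses its single capture group as the link value that ${__value.raw} refers to. A Python approximation of that extraction (the JSON string "traceID=(\\w+)" is the regex traceID=(\w+); the log line is made up):

```python
import re

# Same pattern as "matcherRegex" in the Loki data source above.
TRACE_ID_PATTERN = re.compile(r"traceID=(\w+)")


def extract_trace_id(log_line: str):
    """Return the captured trace ID (what ${__value.raw} resolves to), or None."""
    match = TRACE_ID_PATTERN.search(log_line)
    return match.group(1) if match else None


if __name__ == "__main__":
    line = 'level=error msg="upstream timeout" traceID=4bf92f3577b34da6 duration=5.2s'
    print(extract_trace_id(line))  # 4bf92f3577b34da6
```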
Data Source as ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: monitoring
data:
  datasources.yaml: |
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://prometheus-server:9090
        isDefault: true
        editable: true
        jsonData:
          timeInterval: 30s
          queryTimeout: 60s
      - name: Loki
        type: loki
        access: proxy
        url: http://loki:3100
        editable: true
        jsonData:
          maxLines: 1000
      - name: Tempo
        type: tempo
        access: proxy
        url: http://tempo:3100
        editable: true
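Before shipping a provisioning file like this, a quick pre-flight check can catch two mistakes: more than one data source marked isDefault (Grafana rejects a provisioning file that marks more than one default per organization) and duplicate names (data sources are matched by name, so duplicates make it unclear which entry wins). A sketch that assumes all entries belong to one organization, as in the ConfigMap above:

```python
from collections import Counter


def check_datasources(datasources: list) -> list:
    """Return human-readable problems found in a list of data source definitions.

    Assumes every entry belongs to the same organization.
    """
    problems = []
    name_counts = Counter(ds["name"] for ds in datasources)
    problems += [f"duplicate name: {n}" for n, c in name_counts.items() if c > 1]
    defaults = [ds["name"] for ds in datasources if ds.get("isDefault")]
    if len(defaults) > 1:
        problems.append("more than one isDefault: " + ", ".join(defaults))
    return problems


if __name__ == "__main__":
    # Same entries as the ConfigMap above, reduced to the fields checked here.
    provisioned = [
        {"name": "Prometheus", "isDefault": True},
        {"name": "Loki"},
        {"name": "Tempo"},
    ]
    print(check_datasources(provisioned))  # []
```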
Complete Dashboard Example:
{
"dashboard": {
"title": "Application Performance Monitoring",
"tags": ["production", "api"],
"timezone": "browser",
"editable": true,
"graphTooltip": 1,
"time": {
"from": "now-6h",
"to": "now"
},
"refresh": "30s",
"templating": {
"list": [
{
"name": "namespace",
"type": "query",
"datasource": "Prometheus",
"query": "label_values(kube_pod_info, namespace)",
"refresh": 1,
"multi": false,
"includeAll": false
},
{
"name": "pod",
"type": "query",
"datasource": "Prometheus",
"query": "label_values(kube_pod_info{namespace=\"$namespace\"}, pod)",
"refresh": 2,
"multi": true,
"includeAll": true
}
]
},
"panels": [
{
"id": 1,
"type": "stat",
"title": "Request Rate",
"gridPos": {"h": 4, "w": 6, "x": 0, "y": 0},
"targets": [
{
"expr": "sum(rate(http_requests_total{namespace=\"$namespace\"}[5m]))",
"legendFormat": "RPS"
}
],
"options": {
"reduceOptions": {
"values": false,
"calcs": ["lastNotNull"]
},
"orientation": "auto",
"textMode": "auto",
"colorMode": "value",
"graphMode": "area"
},
"fieldConfig": {
"defaults": {
"unit": "reqps",
"decimals": 2,
"thresholds": {
"mode": "absolute",
"steps": [
{"value": null, "color": "green"},
{"value": 100, "color": "yellow"},
{"value": 500, "color": "red"}
]
}
}
}
},
{
"id": 2,
"type": "timeseries",
"title": "Request Rate Over Time",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 4},
"targets": [
{
"expr": "sum(rate(http_requests_total{namespace=\"$namespace\"}[5m])) by (pod)",
"legendFormat": "{{pod}}"
}
],
"options": {
"legend": {
"showLegend": true,
"displayMode": "table",
"placement": "bottom",
"calcs": ["lastNotNull", "mean", "max"]
}
},
"fieldConfig": {
"defaults": {
"unit": "reqps",
"custom": {
"axisLabel": "Requests/sec",
"drawStyle": "line",
"lineWidth": 2,
"fillOpacity": 10
}
}
}
},
{
"id": 3,
"type": "timeseries",
"title": "Latency (P95)",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 4},
"targets": [
{
"expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{namespace=\"$namespace\"}[5m])) by (le, pod))",
"legendFormat": "{{pod}}"
}
],
"options": {
"tooltip": {
"mode": "multi"
},
"legend": {
"displayMode": "table",
"placement": "bottom",
"calcs": ["last", "mean", "max"]
}
},
"fieldConfig": {
"defaults": {
"unit": "s",
"custom": {
"drawStyle": "line",
"lineInterpolation": "smooth",
"fillOpacity": 10,
"spanNulls": true
},
"thresholds": {
"mode": "absolute",
"steps": [
{"value": null, "color": "green"},
{"value": 0.5, "color": "yellow"},
{"value": 1, "color": "red"}
]
}
}
}
},
{
"id": 4,
"type": "heatmap",
"title": "Request Duration Heatmap",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 12},
"targets": [
{
"expr": "sum(rate(http_request_duration_seconds_bucket{namespace=\"$namespace\"}[5m])) by (le)",
"format": "heatmap",
"legendFormat": "{{le}}"
}
],
"options": {
"calculate": false,
"cellGap": 2,
"color": {
"mode": "scheme",
"scheme": "Spectral"
},
"yAxis": {
"decimals": 2,
"unit": "s"
}
}
},
{
"id": 5,
"type": "gauge",
"title": "Error Rate",
"gridPos": {"h": 8, "w": 6, "x": 12, "y": 12},
"targets": [
{
"expr": "sum(rate(http_requests_total{namespace=\"$namespace\",status=~\"5..\"}[5m])) / sum(rate(http_requests_total{namespace=\"$namespace\"}[5m])) * 100"
}
],
"options": {
"showThresholdLabels": true,
"showThresholdMarkers": true
},
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"thresholds": {
"mode": "absolute",
"steps": [
{"value": null, "color": "green"},
{"value": 1, "color": "yellow"},
{"value": 5, "color": "red"}
]
}
}
}
},
{
"id": 6,
"type": "table",
"title": "Top 10 Endpoints by Request Rate",
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 20},
"targets": [
{
"expr": "topk(10, sum(rate(http_requests_total{namespace=\"$namespace\"}[1h])) by (endpoint))",
"format": "table",
"instant": true
}
],
"transformations": [
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true
},
"renameByName": {
"endpoint": "Endpoint",
"Value": "Requests/sec"
}
}
}
],
"options": {
"showHeader": true,
"sortBy": [
{
"displayName": "Requests/sec",
"desc": true
}
]
}
},
{
"id": 7,
"type": "logs",
"title": "Application Logs",
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 20},
"datasource": "Loki",
"targets": [
{
"expr": "{namespace=\"$namespace\", pod=~\"$pod\"} |~ \"(?i)error\"",
"refId": "A"
}
],
"options": {
"showTime": true,
"showLabels": false,
"showCommonLabels": true,
"wrapLogMessage": false,
"prettifyLogMessage": false,
"enableLogDetails": true,
"dedupStrategy": "none",
"sortOrder": "Descending"
}
}
]
}
}
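Panels are placed on a 24-column grid through gridPos (x, y, w, h). Grafana does not reject overlapping positions; its layout engine may move panels to resolve them, so a hand-written dashboard can render differently than intended. A sketch of a layout check, run here against the gridPos values of the example dashboard above:

```python
from itertools import combinations

GRID_COLUMNS = 24  # width of the Grafana dashboard grid

# Panel ids and gridPos values copied from the example dashboard.
EXAMPLE_PANELS = [
    {"id": 1, "gridPos": {"x": 0, "y": 0, "w": 6, "h": 4}},
    {"id": 2, "gridPos": {"x": 0, "y": 4, "w": 12, "h": 8}},
    {"id": 3, "gridPos": {"x": 12, "y": 4, "w": 12, "h": 8}},
    {"id": 4, "gridPos": {"x": 0, "y": 12, "w": 12, "h": 8}},
    {"id": 5, "gridPos": {"x": 12, "y": 12, "w": 6, "h": 8}},
    {"id": 6, "gridPos": {"x": 0, "y": 20, "w": 12, "h": 8}},
    {"id": 7, "gridPos": {"x": 12, "y": 20, "w": 12, "h": 8}},
]


def layout_problems(panels: list) -> list:
    """Report panels that extend past the grid or overlap another panel."""
    problems = []
    for p in panels:
        g = p["gridPos"]
        if g["x"] + g["w"] > GRID_COLUMNS:
            problems.append(f"panel {p['id']} extends past column {GRID_COLUMNS}")
    for a, b in combinations(panels, 2):
        ga, gb = a["gridPos"], b["gridPos"]
        # Two rectangles overlap only when they overlap on both axes.
        overlap_x = ga["x"] < gb["x"] + gb["w"] and gb["x"] < ga["x"] + ga["w"]
        overlap_y = ga["y"] < gb["y"] + gb["h"] and gb["y"] < ga["y"] + ga["h"]
        if overlap_x and overlap_y:
            problems.append(f"panels {a['id']} and {b['id']} overlap")
    return problems


if __name__ == "__main__":
    print(layout_problems(EXAMPLE_PANELS))  # []
```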
Time Series Panel:
{
"type": "timeseries",
"title": "CPU Usage (cores)",
"targets": [
{
"expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"$namespace\", container!=\"\"}[5m])) by (pod)"
}
],
"fieldConfig": {
"defaults": {
"unit": "short",
"custom": {
"drawStyle": "line",
"lineInterpolation": "smooth",
"barAlignment": 0,
"fillOpacity": 10,
"gradientMode": "none",
"spanNulls": false,
"showPoints": "never",
"pointSize": 5,
"stacking": {
"mode": "none",
"group": "A"
}
}
}
}
}
Stat Panel:
{
"type": "stat",
"title": "Total Requests (selected time range)",
"targets": [
{
"expr": "sum(increase(http_requests_total{namespace=\"$namespace\"}[$__range]))"
}
],
"options": {
"reduceOptions": {
"values": false,
"calcs": ["lastNotNull"]
},
"graphMode": "area",
"colorMode": "value",
"textMode": "auto"
},
"fieldConfig": {
"defaults": {
"unit": "short",
"decimals": 0
}
}
}
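The stat panel reduces each series to one number using reduceOptions.calcs; lastNotNull (shown as "Last *" in the UI) takes the newest value that is neither null nor NaN. An illustrative Python version of that reducer, not Grafana's own code:

```python
import math
from typing import Optional, Sequence


def last_not_null(values: Sequence[Optional[float]]) -> Optional[float]:
    """Return the newest value that is neither None nor NaN, else None."""
    for value in reversed(values):
        if value is not None and not math.isnan(value):
            return value
    return None


if __name__ == "__main__":
    # A scrape gap (None) and a trailing NaN are both skipped.
    print(last_not_null([120.0, 131.0, None, float("nan")]))  # 131.0
```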
Gauge Panel:
{
"type": "gauge",
"title": "Memory Usage",
"targets": [
{
"expr": "sum(container_memory_working_set_bytes{namespace=\"$namespace\", container!=\"\"}) / sum(container_spec_memory_limit_bytes{namespace=\"$namespace\", container!=\"\"}) * 100"
}
],
"options": {
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100,
"thresholds": {
"mode": "absolute",
"steps": [
{"value": null, "color": "green"},
{"value": 70, "color": "yellow"},
{"value": 85, "color": "red"}
]
}
}
}
}
Bar Gauge:
{
"type": "bargauge",
"title": "CPU Usage by Namespace (cores)",
"targets": [
{
"expr": "sum(rate(container_cpu_usage_seconds_total{container!=\"\"}[5m])) by (namespace)"
}
],
"options": {
"displayMode": "gradient",
"orientation": "horizontal",
"showUnfilled": true
},
"fieldConfig": {
"defaults": {
"unit": "short"
}
}
}
Query Variable:
{
"name": "namespace",
"type": "query",
"datasource": "Prometheus",
"query": "label_values(kube_pod_info, namespace)",
"regex": "",
"refresh": 1,
"multi": false,
"includeAll": false,
"allValue": ".*",
"sort": 1
}
Custom Variable:
{
"name": "environment",
"type": "custom",
"query": "production,staging,development",
"multi": false,
"includeAll": false
}
Interval Variable:
{
"name": "interval",
"type": "interval",
"query": "1m,5m,10m,30m,1h",
"auto": true,
"auto_count": 30,
"auto_min": "10s"
}
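With auto enabled, the interval variable divides the dashboard time range into roughly auto_count buckets and never goes below auto_min. The sketch below approximates that behaviour; the rounding ladder is a simplification chosen for illustration, not Grafana's exact rounding table:

```python
# Simplified rounding ladder in seconds (an assumption, not Grafana's table).
NICE_STEPS = [1, 5, 10, 15, 30, 60, 120, 300, 600, 900, 1800, 3600, 7200, 21600, 43200, 86400]


def approx_auto_interval(range_seconds: float, auto_count: int = 30,
                         auto_min_seconds: float = 10) -> float:
    """Approximate the "auto" value of an interval variable, in seconds."""
    raw = range_seconds / auto_count
    # Round up to the next step on the ladder (capped at the largest step).
    rounded = next((step for step in NICE_STEPS if step >= raw), NICE_STEPS[-1])
    # auto_min is a floor for very short time ranges.
    return max(rounded, auto_min_seconds)


if __name__ == "__main__":
    print(approx_auto_interval(6 * 3600))  # 900, i.e. 15m for a 6h range
```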
Chained Variables:
{
"name": "pod",
"type": "query",
"datasource": "Prometheus",
"query": "label_values(kube_pod_info{namespace=\"$namespace\"}, pod)",
"refresh": 2,
"multi": true,
"includeAll": true
}
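Because the pod variable is multi-value with includeAll, queries must match it with =~ rather than =: when several pods are selected, Grafana substitutes a regex alternation. A simplified model of that substitution (real Grafana also escapes regex metacharacters inside each value, which this sketch omits; the pod names are invented):

```python
def format_multi_value(values: list) -> str:
    """Approximate Grafana's Prometheus formatting of a multi-value selection.

    One value is inserted as-is; several become a regex alternation.
    """
    if len(values) == 1:
        return values[0]
    return "(" + "|".join(values) + ")"


def interpolate(expr: str, name: str, values: list) -> str:
    """Replace $name in a query with the formatted selection (naive string replace)."""
    return expr.replace("$" + name, format_multi_value(values))


if __name__ == "__main__":
    query = 'up{pod=~"$pod"}'
    print(interpolate(query, "pod", ["api-7f9c", "api-x2lq"]))
    # up{pod=~"(api-7f9c|api-x2lq)"}
```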
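Alert Rule (Grafana-managed, unified alerting):
Body for POST /api/v1/provisioning/alert-rules. Dashboard variables such as $namespace are not available in alert queries, so the rule groups by namespace and uses that label in the summary. folderUID and ruleGroup are placeholders, the 1m evaluation interval is a property of the rule group, and notification policies route the alert to a contact point by its labels.
{
"title": "High CPU Usage",
"folderUID": "platform-alerts",
"ruleGroup": "platform-resources",
"orgID": 1,
"condition": "C",
"for": "5m",
"noDataState": "NoData",
"execErrState": "Alerting",
"labels": {
"severity": "warning",
"team": "platform"
},
"annotations": {
"summary": "CPU usage is above 80% for namespace {{ $labels.namespace }}"
},
"data": [
{
"refId": "A",
"datasourceUid": "prometheus",
"relativeTimeRange": {"from": 600, "to": 0},
"model": {
"refId": "A",
"expr": "sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=\"\"}[5m])) * 100"
}
},
{
"refId": "B",
"datasourceUid": "__expr__",
"relativeTimeRange": {"from": 0, "to": 0},
"model": {"refId": "B", "type": "reduce", "expression": "A", "reducer": "last"}
},
{
"refId": "C",
"datasourceUid": "__expr__",
"relativeTimeRange": {"from": 0, "to": 0},
"model": {
"refId": "C",
"type": "threshold",
"expression": "B",
"conditions": [{"evaluator": {"type": "gt", "params": [80]}}]
}
}
]
}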
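The for duration delays notification: a rule whose condition is breached first goes to Pending and only becomes Firing once it has stayed breached for at least that long. A simplified replay of that state machine, assuming a 1m evaluation interval and for set to 5m; it ignores error and no-data handling and any keep-firing settings:

```python
from datetime import timedelta


def replay(breaches: list, interval: timedelta = timedelta(minutes=1),
           for_duration: timedelta = timedelta(minutes=5)) -> list:
    """Return the rule state after each evaluation (True = threshold breached).

    Simplified model: Normal -> Pending on the first breach, Pending -> Firing
    once the breach has lasted at least for_duration, back to Normal on recovery.
    """
    states = []
    first_breach = None  # index of the evaluation that started the current breach
    for i, breached in enumerate(breaches):
        if not breached:
            first_breach = None
            states.append("Normal")
            continue
        if first_breach is None:
            first_breach = i
        held_for = (i - first_breach) * interval
        states.append("Firing" if held_for >= for_duration else "Pending")
    return states


if __name__ == "__main__":
    # Six consecutive breaches 1m apart: Pending from 0m to 4m, Firing at 5m.
    print(replay([True] * 6))
```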
Contact Point (Slack):
{
"name": "Slack Alerts",
"type": "slack",
"uid": "slack-channel",
"settings": {
"url": "https://hooks.slack.com/services/XXX/YYY/ZZZ",
"recipient": "#alerts",
"mentionUsers": "platform-team",
"mentionChannel": "here"
}
}
Dashboard Provider ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-provider
  namespace: monitoring
data:
  dashboards.yaml: |
    apiVersion: 1
    providers:
      # Provider paths are kept disjoint: the file provider also scans
      # subdirectories, so a parent path would load kubernetes/ twice.
      - name: 'default'
        orgId: 1
        folder: ''
        type: file
        disableDeletion: false
        updateIntervalSeconds: 10
        allowUiUpdates: true
        options:
          path: /var/lib/grafana/dashboards/general
      - name: 'kubernetes'
        orgId: 1
        folder: 'Kubernetes'
        type: file
        disableDeletion: true
        updateIntervalSeconds: 10
        allowUiUpdates: false
        options:
          path: /var/lib/grafana/dashboards/kubernetes
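Dashboards for these providers are plain JSON files in the configured path. A sketch that writes one; it assumes the provider wants the bare dashboard model, so it unwraps the {"dashboard": ...} envelope used by the HTTP API, and it requires a uid (a design choice here, so /d/<uid>/ links stay stable across reloads). The function name and the <uid>.json file naming are hypothetical:

```python
import json
import tempfile
from pathlib import Path


def write_provisioned_dashboard(payload: dict, directory: str) -> Path:
    """Write a dashboard model as <uid>.json into a file-provider directory.

    Unwraps an HTTP-API style {"dashboard": {...}} envelope if present.
    """
    dashboard = payload.get("dashboard", payload)
    if not dashboard.get("uid"):
        # Design choice for this sketch: insist on a stable uid.
        raise ValueError("dashboard needs a stable 'uid'")
    target = Path(directory) / f"{dashboard['uid']}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(dashboard, indent=2), encoding="utf-8")
    return target


if __name__ == "__main__":
    # A temporary directory stands in for /var/lib/grafana/dashboards/kubernetes.
    out_dir = tempfile.mkdtemp()
    path = write_provisioned_dashboard({"dashboard": {"uid": "apm", "title": "APM"}}, out_dir)
    print(path.name)  # apm.json
```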
// Query with variables
{
"expr": "sum(rate(http_requests_total{namespace=\"$namespace\", pod=~\"$pod\"}[$__rate_interval])) by (pod)"
}
// Dashboard refresh
{
"refresh": "30s" // Production
// "refresh": "1m" // Development
}
# Better than fixed interval
rate(http_requests_total[$__rate_interval])
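Grafana defines $__rate_interval as max($__interval + scrape interval, 4 * scrape interval), where the scrape interval is the query's Min step if set, otherwise the data source's timeInterval (15s when unset). This keeps at least four samples inside every rate() window. Worked in Python for the 30s timeInterval configured on the Prometheus data source above:

```python
def rate_interval(interval_s: float, scrape_interval_s: float = 15) -> float:
    """$__rate_interval in seconds: max($__interval + scrape, 4 * scrape)."""
    return max(interval_s + scrape_interval_s, 4 * scrape_interval_s)


if __name__ == "__main__":
    # timeInterval is 30s in the Prometheus data source configuration.
    for interval in (15, 30, 300):
        print(interval, "->", rate_interval(interval, scrape_interval_s=30))
    # 15 -> 120, 30 -> 120, 300 -> 330
```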
Dashboards/
├── Kubernetes/
│ ├── Cluster Overview
│ └── Pod Monitoring
├── Applications/
│ ├── API Performance
│ └── Database Metrics
└── Infrastructure/
├── Node Metrics
└── Network Traffic
{
"annotations": {
"list": [
{
"datasource": "Prometheus",
"enable": true,
"expr": "ALERTS{alertstate=\"firing\"}",
"iconColor": "red",
"name": "Alerts",
"tagKeys": "alertname,severity"
}
]
}
}
{
"thresholds": {
"mode": "absolute",
"steps": [
{"value": null, "color": "green"},
{"value": 70, "color": "yellow"},
{"value": 90, "color": "red"}
]
}
}
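With mode absolute, each step applies from its value upward and the first step (value null) is the base color. A sketch of that lookup, assuming the steps are sorted ascending as in the example:

```python
STEPS = [
    {"value": None, "color": "green"},
    {"value": 70, "color": "yellow"},
    {"value": 90, "color": "red"},
]


def threshold_color(value: float, steps: list) -> str:
    """Color for an absolute-threshold value; steps must be sorted ascending."""
    color = steps[0]["color"]  # base step (value null) applies below the first threshold
    for step in steps[1:]:
        if value >= step["value"]:
            color = step["color"]
    return color


if __name__ == "__main__":
    for v in (42, 70, 89.9, 95):
        print(v, threshold_color(v, STEPS))
```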
{
"links": [
{
"title": "Related Dashboard",
"url": "/d/xyz/other-dashboard?var-namespace=$namespace",
"type": "link",
"icon": "dashboard"
}
]
}
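Grafana dashboard URLs carry variable values as var-<name> query parameters and the time range as from/to, which is what the link above relies on. A sketch that builds such a URL with proper encoding; the uid xyz and slug other-dashboard are the placeholders from the link:

```python
from urllib.parse import urlencode


def dashboard_link(uid: str, slug: str, variables: dict,
                   time_from: str = "now-6h", time_to: str = "now") -> str:
    """Build /d/<uid>/<slug>?var-<name>=...&from=...&to=...; list values repeat the parameter."""
    params = {f"var-{name}": value for name, value in variables.items()}
    params.update({"from": time_from, "to": time_to})
    return f"/d/{uid}/{slug}?{urlencode(params, doseq=True)}"


if __name__ == "__main__":
    print(dashboard_link("xyz", "other-dashboard", {"namespace": "production"}))
    # /d/xyz/other-dashboard?var-namespace=production&from=now-6h&to=now
```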
1. Too Many Panels:
# BAD: 50+ panels
# GOOD: 10-15 focused panels per dashboard
2. No Variables:
// BAD: Hardcoded namespace
{
"expr": "sum(rate(http_requests_total{namespace=\"production\"}[5m]))"
}
// GOOD: Use variables
{
"expr": "sum(rate(http_requests_total{namespace=\"$namespace\"}[5m]))"
}
3. Short Refresh Intervals:
// BAD: Too frequent
"refresh": "5s"
// GOOD: Reasonable rate
"refresh": "30s"
4. No Units:
// GOOD: Always specify units
{
"unit": "bytes",
"decimals": 2
}
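The four anti-patterns above can be checked automatically before a dashboard is committed. A minimal linter sketch; the limits (15 panels, 30s refresh) come from this list, while the panel types exempt from the unit check and the hardcoded-namespace heuristic are assumptions chosen for illustration, not Grafana rules:

```python
import re

MAX_PANELS = 15                             # upper end of the 10-15 guideline
MIN_REFRESH_SECONDS = 30                    # the "reasonable rate" above
NO_UNIT_TYPES = {"logs", "table", "text"}   # assumption: exempt from the unit check
_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def refresh_seconds(refresh: str) -> int:
    """Convert a refresh string such as "30s" or "1m" to seconds."""
    return int(refresh[:-1]) * _SECONDS[refresh[-1]]


def lint_dashboard(dashboard: dict) -> list:
    """Return findings for the anti-patterns listed above."""
    findings = []
    panels = dashboard.get("panels", [])
    if len(panels) > MAX_PANELS:
        findings.append(f"{len(panels)} panels, more than {MAX_PANELS}")
    refresh = dashboard.get("refresh")
    if refresh and refresh_seconds(refresh) < MIN_REFRESH_SECONDS:
        findings.append(f"refresh {refresh} is shorter than {MIN_REFRESH_SECONDS}s")
    for panel in panels:
        title = panel.get("title", "untitled")
        defaults = panel.get("fieldConfig", {}).get("defaults", {})
        if panel.get("type") not in NO_UNIT_TYPES and "unit" not in defaults:
            findings.append(f"'{title}': no unit")
        for target in panel.get("targets", []):
            # Heuristic: a namespace matcher whose value is a literal, not a $variable.
            if re.search(r'namespace=~?"[^$"]', target.get("expr", "")):
                findings.append(f"'{title}': hardcoded namespace")
    return findings
```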
When creating Grafana dashboards:
Always design dashboards that are clear, actionable, and maintainable.
Weekly Installs: 100
Repository
GitHub Stars: 11
First Seen: Jan 24, 2026
Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Fail
Installed on: opencode 90, gemini-cli 88, codex 87, cursor 82, github-copilot 79, amp 76