loki by julianobarbosa/claude-code-skills
npx skills add https://github.com/julianobarbosa/claude-code-skills --skill loki
Comprehensive guide for Grafana Loki - the cost-effective, horizontally-scalable log aggregation system inspired by Prometheus.
Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system. Its main components are:
| Component | Purpose |
|---|---|
| Distributor | Validates requests, preprocesses labels, routes to ingesters |
| Ingester | Buffers logs in memory, compresses into chunks, writes to storage |
| Querier | Executes LogQL queries from ingesters and storage |
| Query Frontend | Accelerates queries via splitting, caching, scheduling |
| Query Scheduler | Manages per-tenant query queues for fairness |
| Index Gateway | Serves index queries for TSDB stores |
| Compactor | Merges index files, manages retention, handles deletion |
| Ruler | Evaluates alerting and recording rules |
Write Path: Log Source → Distributor → Ingester → Object Storage (Chunks + Indexes)
Read Path: Query → Query Frontend → Query Scheduler → Querier → Ingesters + Storage
Monolithic mode runs every component in a single process (-target=all).
deploymentMode: SimpleScalable
write:
  replicas: 3   # Distributor + Ingester
read:
  replicas: 2   # Query Frontend + Querier
backend:
  replicas: 2   # Compactor + Index Gateway + Query Scheduler + Ruler
deploymentMode: Distributed
ingester:
  replicas: 3
  zoneAwareReplication:
    enabled: true
distributor:
  replicas: 3
querier:
  replicas: 3
queryFrontend:
  replicas: 2
queryScheduler:
  replicas: 2
compactor:
  replicas: 1
indexGateway:
  replicas: 2
Recommended: TSDB with Schema v13
loki:
  schemaConfig:
    configs:
      - from: "2024-04-01"
        store: tsdb
        object_store: azure   # or s3, gcs
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
loki:
  storage:
    type: azure
    bucketNames:
      chunks: loki-chunks
      ruler: loki-ruler
      admin: loki-admin
    azure:
      accountName: <storage-account-name>
      # Option 1: User-Assigned Managed Identity (Recommended)
      useManagedIdentity: true
      useFederatedToken: false
      userAssignedId: <identity-client-id>
      # Option 2: Account Key (Dev only)
      # accountKey: <account-key>
      requestTimeout: 30s
loki:
  storage:
    type: s3
    bucketNames:
      chunks: my-loki-chunks-2024
      ruler: my-loki-ruler-2024
      admin: my-loki-admin-2024
    s3:
      endpoint: s3.us-east-1.amazonaws.com
      region: us-east-1
      # Use IAM roles or access keys
      accessKeyId: <access-key>
      secretAccessKey: <secret-key>
      s3ForcePathStyle: false
loki:
  storage:
    type: gcs
    bucketNames:
      chunks: my-loki-gcs-bucket
    gcs:
      bucketName: my-loki-gcs-bucket
      # Uses Workload Identity or service account
loki:
  ingester:
    chunk_encoding: snappy       # Recommended (fast + efficient)
    chunk_target_size: 1572864   # ~1.5MB compressed
    max_chunk_age: 2h            # Max time before flush
    chunk_idle_period: 30m       # Flush idle chunks
    flush_check_period: 30s
    flush_op_timeout: 10m
| Setting | Recommended | Purpose |
|---|---|---|
| chunk_encoding | snappy | Best speed-to-compression balance |
| chunk_target_size | 1.5MB | Target compressed chunk size |
| max_chunk_age | 2h | Limits memory and data loss exposure |
| chunk_idle_period | 30m | Flushes inactive streams |
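To make the interaction of these three settings concrete, here is a minimal sketch of a flush decision. It is a simplified model written for illustration, not Loki's ingester code: the `ChunkState` type and the `flush_reason` function are hypothetical names, and the thresholds are the recommended values from the table above.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds copied from the settings table above.
CHUNK_TARGET_SIZE = 1_572_864       # bytes (~1.5MB compressed)
MAX_CHUNK_AGE_S = 2 * 60 * 60       # 2h
CHUNK_IDLE_PERIOD_S = 30 * 60       # 30m


@dataclass
class ChunkState:
    """Hypothetical bookkeeping for one in-memory chunk (not a Loki type)."""
    compressed_bytes: int
    created_at: float       # seconds
    last_append_at: float   # seconds


def flush_reason(chunk: ChunkState, now: float) -> Optional[str]:
    """Return why the chunk would be flushed, or None to keep buffering."""
    if chunk.compressed_bytes >= CHUNK_TARGET_SIZE:
        return "target_size"
    if now - chunk.created_at >= MAX_CHUNK_AGE_S:
        return "max_age"
    if now - chunk.last_append_at >= CHUNK_IDLE_PERIOD_S:
        return "idle"
    return None
```

In this model a busy stream is flushed on size, a slow but steady stream on age, and a stream that stopped logging on idleness, which is why lowering max_chunk_age or chunk_idle_period trades more, smaller chunks for less data held in memory.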
loki:
  limits_config:
    # Retention
    retention_period: 744h   # 31 days
    # Ingestion limits
    ingestion_rate_mb: 50
    ingestion_burst_size_mb: 100
    per_stream_rate_limit: 3MB
    per_stream_rate_limit_burst: 15MB
    # Query limits
    max_query_series: 10000
    max_query_lookback: 720h
    max_entries_limit_per_query: 10000
    # Required for OTLP
    allow_structured_metadata: true
    volume_enabled: true
    # Sample rejection
    reject_old_samples: true
    reject_old_samples_max_age: 168h   # 7 days
    max_label_names_per_series: 25
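The limits above are written in hours, which makes it easy to mistype a retention window. The following sketch converts between days and hour-based duration strings so the values can be sanity-checked. The helper names are hypothetical, and it deliberately handles only the hour-only form used in this configuration.

```python
def days_to_hours_duration(days: int) -> str:
    """Format a whole number of days as an hour-based duration, e.g. 31 -> '744h'."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return f"{days * 24}h"


def hours_duration_to_days(value: str) -> float:
    """Parse an hour-only duration such as '744h' and return the number of days."""
    if not value.endswith("h"):
        raise ValueError("only hour-based durations like '744h' are handled here")
    return int(value[:-1]) / 24


print(days_to_hours_duration(31))       # retention_period
print(hours_duration_to_days("720h"))   # max_query_lookback in days
```

With the values shown, max_query_lookback (720h, 30 days) is one day shorter than retention_period (744h, 31 days), so queries cannot reach the final day of retained data.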
loki:
  compactor:
    retention_enabled: true
    retention_delete_delay: 2h
    retention_delete_worker_count: 50
    compaction_interval: 10m
    delete_request_store: azure   # Match your storage type
Recommended: Separate Memcached instances
# Helm values for Loki caching
memcached:
  # Results cache
  frontend:
    replicas: 3
    memcached:
      maxItemMemory: 1024   # 1GB
      maxItemSize: 5m
      connectionLimit: 1024
  # Chunks cache
  chunks:
    replicas: 3
    memcached:
      maxItemMemory: 4096   # 4GB
      maxItemSize: 2m
      connectionLimit: 1024
# Enable caching in Loki config
loki:
  chunk_store_config:
    chunk_cache_config:
      memcached_client:
        host: loki-memcached-chunks.monitoring.svc
        service: memcached-client
# Stream selector
{job="api-server"}
# Multiple labels
{job="api-server", env="prod"}
# Label matchers
{namespace=~".*-prod"} # Regex match
{level!="debug"} # Not equal
# Filter expressions
{job="api-server"} |= "error"      # Contains
{job="api-server"} != "debug"      # Does not contain
{job="api-server"} |~ "err.*"      # Regex match
{job="api-server"} !~ "debug.*"    # Regex does not match
# JSON parsing
{job="api-server"} | json
# Extract specific fields
{job="api-server"} | json | line_format "{{.message}}"
# Label extraction
{job="api-server"} | logfmt | level="error"
# Pattern matching
{job="api-server"} | pattern "<ip> - - [<_>] \"<method> <path>\"" | method="POST"
# Count logs per minute
count_over_time({job="api-server"}[1m])
# Rate of errors
rate({job="api-server"} |= "error" [5m])
# Bytes rate
bytes_rate({job="api-server"}[5m])
# Sum by label
sum by (namespace) (rate({job="api-server"}[5m]))
# Top 10 by volume
topk(10, sum by (namespace) (bytes_rate({namespace=~".+"}[5m])))
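When queries like the ones above are generated by a script rather than typed by hand, the label values need to be quoted and escaped consistently. The sketch below builds a stream selector and the error-rate query from a dictionary of labels. The function names are hypothetical, and it covers only equality matchers and the `|=` line filter.

```python
from typing import Mapping


def quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_selector(labels: Mapping[str, str]) -> str:
    """Render equality matchers as a LogQL stream selector."""
    if not labels:
        raise ValueError("a stream selector needs at least one matcher")
    return "{" + ", ".join(f"{k}={quote(v)}" for k, v in labels.items()) + "}"


def error_rate_query(labels: Mapping[str, str], needle: str = "error",
                     window: str = "5m") -> str:
    """Build a rate() query over lines that contain `needle`."""
    return f"rate({build_selector(labels)} |= {quote(needle)} [{window}])"


print(build_selector({"job": "api-server", "env": "prod"}))
print(error_rate_query({"job": "api-server"}))
```

Rejecting an empty label set mirrors the requirement that a selector contain at least one matcher.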
OpenTelemetry Collector Config:
exporters:
  otlphttp:
    endpoint: http://loki-gateway:3100/otlp
    headers:
      X-Scope-OrgID: "my-tenant"
service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlphttp]
Loki Config:
loki:
  limits_config:
    allow_structured_metadata: true   # Required for OTLP
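OTLP resource attributes are stored as Loki labels, with dots replaced by underscores:
| OTLP Attribute | Loki Label |
|---|---|
| service.name | service_name |
| service.namespace | service_namespace |
| k8s.pod.name | k8s_pod_name |
| k8s.namespace.name | k8s_namespace_name |
| cloud.region | cloud_region |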
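The naming pattern in the table can be expressed as a one-line conversion. This is a sketch of the pattern visible in the table only (dots become underscores); the function name is hypothetical and it does not claim to reproduce every normalization rule Loki applies.

```python
def otlp_attribute_to_label(name: str) -> str:
    """Map an OTLP attribute name to the label form shown in the table."""
    return name.replace(".", "_")


for attribute in ("service.name", "k8s.pod.name", "cloud.region"):
    print(attribute, "->", otlp_attribute_to_label(attribute))
```

This is handy when writing LogQL selectors for OTLP-ingested logs, for example `{service_name="checkout"}` for logs whose `service.name` is `checkout` (an illustrative value).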
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install loki grafana/loki \
--namespace monitoring \
--values values.yaml
deploymentMode: Distributed
loki:
  auth_enabled: true
  schemaConfig:
    configs:
      - from: "2024-04-01"
        store: tsdb
        object_store: azure
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  storage:
    type: azure
    azure:
      accountName: mystorageaccount
      useManagedIdentity: true
      userAssignedId: <client-id>
    bucketNames:
      chunks: loki-chunks
      ruler: loki-ruler
      admin: loki-admin
  limits_config:
    retention_period: 2160h   # 90 days
    allow_structured_metadata: true
ingester:
  replicas: 3
  zoneAwareReplication:
    enabled: true
  resources:
    requests:
      cpu: 2
      memory: 8Gi
    limits:
      cpu: 4
      memory: 16Gi
querier:
  replicas: 3
  maxUnavailable: 2
queryFrontend:
  replicas: 2
distributor:
  replicas: 3
compactor:
  replicas: 1
indexGateway:
  replicas: 2
  maxUnavailable: 1
# Gateway for external access
gateway:
  service:
    type: LoadBalancer
# Monitoring
monitoring:
  serviceMonitor:
    enabled: true
1. Create Identity:
az identity create \
--name loki-identity \
--resource-group <rg>
IDENTITY_CLIENT_ID=$(az identity show --name loki-identity --resource-group <rg> --query clientId -o tsv)
IDENTITY_PRINCIPAL_ID=$(az identity show --name loki-identity --resource-group <rg> --query principalId -o tsv)
2. Assign to Node Pool:
az vmss identity assign \
--resource-group <aks-node-rg> \
--name <vmss-name> \
--identities /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/loki-identity
3. Grant Storage Permission:
az role assignment create \
--role "Storage Blob Data Contributor" \
--assignee-object-id $IDENTITY_PRINCIPAL_ID \
--scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>
4. Configure Loki:
loki:
  storage:
    azure:
      useManagedIdentity: true
      userAssignedId: <IDENTITY_CLIENT_ID>
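loki:
  auth_enabled: true
# Query with tenant header (-G with --data-urlencode keeps curl from
# treating the braces in the LogQL selector as URL globbing)
curl -G -H "X-Scope-OrgID: tenant-a" \
  "http://loki:3100/loki/api/v1/query" \
  --data-urlencode 'query={job="app"}'
# Multi-tenant queries (if enabled)
# X-Scope-OrgID: tenant-a|tenant-b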
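The same request can be prepared from a script. The sketch below only builds the URL and headers and does not send anything, so it runs without a Loki instance. The helper names are hypothetical; the host `http://loki:3100` and the tenant names are the ones used in the example above.

```python
from urllib.parse import urlencode


def tenant_header(*tenants: str) -> dict:
    """Build the X-Scope-OrgID header; several tenants are joined with '|'."""
    if not tenants:
        raise ValueError("at least one tenant is required")
    return {"X-Scope-OrgID": "|".join(tenants)}


def instant_query_url(base_url: str, logql: str) -> str:
    """Return the instant-query URL with the LogQL expression percent-encoded."""
    params = urlencode({"query": logql})
    return f"{base_url.rstrip('/')}/loki/api/v1/query?{params}"


url = instant_query_url("http://loki:3100", '{job="app"}')
headers = tenant_header("tenant-a", "tenant-b")
print(url)
print(headers)
```

Percent-encoding the query matters because braces, quotes and `=` in a LogQL selector are not safe to place in a URL verbatim.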
1. Container Not Found (Azure)
# Create required containers
az storage container create --name loki-chunks --account-name <storage>
az storage container create --name loki-ruler --account-name <storage>
az storage container create --name loki-admin --account-name <storage>
2. Authorization Failure (Azure)
# Verify RBAC assignment
az role assignment list --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>
# Assign if missing
az role assignment create \
--role "Storage Blob Data Contributor" \
--assignee-object-id <principal-id> \
--scope <storage-scope>
# Restart pod to refresh token
kubectl delete pod -n monitoring <ingester-pod>
3. Ingester OOM
# Increase memory limits
ingester:
  resources:
    limits:
      memory: 16Gi
4. Query Timeout
loki:
  querier:
    query_timeout: 5m
    max_concurrent: 8
  query_scheduler:
    max_outstanding_requests_per_tenant: 2048
# Check pod status
kubectl get pods -n monitoring -l app.kubernetes.io/name=loki
# Check ingester logs
kubectl logs -n monitoring -l app.kubernetes.io/component=ingester --tail=100
# Check compactor logs
kubectl logs -n monitoring -l app.kubernetes.io/component=compactor --tail=100
# Verify readiness
kubectl exec -it <loki-pod> -n monitoring -- wget -qO- http://localhost:3100/ready
# Check configuration
kubectl exec -it <loki-pod> -n monitoring -- cat /etc/loki/config/config.yaml
# Push logs
POST /loki/api/v1/push
# OTLP logs
POST /otlp/v1/logs
# Instant query
GET /loki/api/v1/query?query={job="app"}&time=<timestamp>
# Range query
GET /loki/api/v1/query_range?query={job="app"}&start=<start>&end=<end>
# Labels
GET /loki/api/v1/labels
GET /loki/api/v1/label/<name>/values
# Series
GET /loki/api/v1/series
# Tail (WebSocket)
GET /loki/api/v1/tail?query={job="app"}
GET /ready
GET /metrics
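As an illustration of what the push endpoint expects, the sketch below assembles a JSON body for POST /loki/api/v1/push: a list of streams, each with a label set and `[timestamp, line]` pairs where the timestamp is a string of Unix nanoseconds. The function name and the sample labels are hypothetical, and the payload is only built, not sent.

```python
import json
import time
from typing import Mapping, Optional, Sequence


def build_push_payload(labels: Mapping[str, str], lines: Sequence[str],
                       now_ns: Optional[int] = None) -> dict:
    """Build a push request body containing a single stream."""
    base = time.time_ns() if now_ns is None else now_ns
    # Offset each line by 1ns so entries keep their order within the stream.
    values = [[str(base + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


payload = build_push_payload({"job": "app"}, ["started", "ready"])
print(json.dumps(payload, indent=2))
```

The body would be sent with `Content-Type: application/json`, plus an `X-Scope-OrgID` header when auth_enabled is true.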
Weekly Installs: 66
GitHub Stars: 44
First Seen: Jan 24, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: codex (56), opencode (56), cursor (55), gemini-cli (55), github-copilot (53), claude-code (49)