Provides production-grade Kubernetes patterns and kubectl debugging commands for workloads, probes, RBAC, ConfigMap/Secret management, and autoscaling. Helps write, review, and debug K8s YAML.
Slash command: /everything-claude-code:kubernetes-patterns
Production-grade Kubernetes patterns for reliably deploying, managing, and debugging workloads.
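Same as When to Activate above. This alias satisfies the repository's skill-format convention. Use this skill whenever you write, review, or debug Kubernetes YAML and workloads.

This skill provides copy-pasteable, production-grade YAML patterns and kubectl debugging commands organized by task:

- Deployment with security context, rolling update strategy, all three probe types, resource limits, and environment variables injected from a ConfigMap/Secret.
- Probes: the failureThreshold × periodSeconds math.
- Configuration: envFrom, file-mount, and external secrets guidance.
- Jobs and CronJobs: restartPolicy patterns for one-off and scheduled workloads.

Full runnable examples are in the sections below. Quick reference: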
| Task | Jump to |
|---|---|
| Full production Deployment YAML | Core Workload Patterns |
| Probe configuration | Probes |
| RBAC least-privilege setup | RBAC |
| Debugging CrashLoopBackOff | kubectl Debugging Cheatsheet |
| Autoscaling | HPA |
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    app: my-app
    version: "1.0.0"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # allow 1 extra pod during an update
      maxUnavailable: 0  # never drop below the desired count
  template:
    metadata:
      labels:
        app: my-app
        version: "1.0.0"
    spec:
      # Pod-level security context
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        fsGroup: 1001
      # Graceful shutdown
      terminationGracePeriodSeconds: 30
      containers:
        - name: my-app
          image: ghcr.io/org/my-app:1.0.0  # never use :latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
              protocol: TCP
          # Always set both requests and limits
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"
          # Container-level security context
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          # Probes (see the Probes section below)
          startupProbe:
            httpGet:
              path: /health
              port: 8080
            failureThreshold: 30
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 0
            periodSeconds: 30
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 2
          # Inject environment variables from a ConfigMap and a Secret
          envFrom:
            - configMapRef:
                name: my-app-config
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: my-app-secrets
                  key: db-password
          # Writable tmp directory, needed with readOnlyRootFilesystem: true
          volumeMounts:
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: tmp
          emptyDir: {}
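
The maxSurge and maxUnavailable values in the Deployment above bound how many pods exist while a rollout is in progress. A rough sketch of that arithmetic follows; the helper name rollout_bounds is hypothetical, and it only handles absolute numbers, not the percentage form Kubernetes also accepts.

```shell
# Sketch: pod-count bounds during a RollingUpdate (absolute values only).
# rollout_bounds is a hypothetical helper, not part of kubectl.
rollout_bounds() {
  # $1 = replicas, $2 = maxSurge, $3 = maxUnavailable
  echo "max_total=$(( $1 + $2 )) min_available=$(( $1 - $3 ))"
}

# Values from the Deployment above: replicas 3, maxSurge 1, maxUnavailable 0
rollout_bounds 3 1 0   # prints: max_total=4 min_available=3
```

With these settings the rollout briefly runs a fourth pod and never serves from fewer than three, so the node pool needs spare capacity for one extra pod.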
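Knowing when to use each probe type is critical:

| Probe | Action on failure | Use for |
|---|---|---|
| startupProbe | Kills the container if it does not start within the threshold | Slow-starting apps (JVM, Python) |
| livenessProbe | Restarts the container | Deadlock / stuck-process detection |
| readinessProbe | Removes the pod from Service endpoints | Temporary unavailability (DB reconnect) |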
# Correct pattern: startupProbe covers the slow start,
# then liveness/readiness take over
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30  # 30 * 5s = up to 150s to start
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 30
  failureThreshold: 3   # 3 * 30s = restart after 90s
readinessProbe:
  httpGet:
    path: /ready        # separate endpoint: checks DB, cache, etc.
    port: 8080
  periodSeconds: 10
  failureThreshold: 2

# Wrong: initialDelaySeconds without a startupProbe
# If the app needs 60s to start, configure a startupProbe instead
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60  # bad: arbitrary wait, racy
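
The failureThreshold × periodSeconds comments above can be reproduced with a one-line calculation. This is a minimal sketch: probe_budget_seconds is a hypothetical helper name, and the result is an approximation that ignores initialDelaySeconds and timeoutSeconds.

```shell
# Sketch: approximate time a probe tolerates consecutive failures before acting.
# probe_budget_seconds is a hypothetical helper, not a kubectl feature.
probe_budget_seconds() {
  # $1 = failureThreshold, $2 = periodSeconds
  echo $(( $1 * $2 ))
}

probe_budget_seconds 30 5    # startupProbe:   150 (max startup time)
probe_budget_seconds 3 30    # livenessProbe:  90  (time until restart)
probe_budget_seconds 2 10    # readinessProbe: 20  (time until removed from endpoints)
```

Size the startup budget from the slowest cold start observed, not the average, since exceeding it kills the container.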
# ClusterIP (default): internal traffic only
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-namespace
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  type: ClusterIP

# LoadBalancer: external traffic (cloud providers)
spec:
  type: LoadBalancer
  ports:
    - port: 443
      targetPort: 8080
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: my-namespace
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - myapp.example.com
      secretName: my-app-tls
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-config
  namespace: my-namespace
data:
  LOG_LEVEL: "info"
  APP_ENV: "production"
  MAX_CONNECTIONS: "100"
  # Complex config, mounted as a file
  app.yaml: |
    server:
      port: 8080
      timeout: 30s

# Mount the ConfigMap as a file
# (volumes goes in the pod spec, volumeMounts in the container spec)
volumes:
  - name: config
    configMap:
      name: my-app-config
      items:
        - key: app.yaml
          path: app.yaml
volumeMounts:
  - name: config
    mountPath: /etc/app
    readOnly: true
# Create a secret from a literal (CLI; keep the source value in Vault/SOPS)
kubectl create secret generic my-app-secrets \
  --from-literal=db-password='s3cr3t' \
  --namespace=my-namespace \
  --dry-run=client -o yaml | kubectl apply -f -

apiVersion: v1
kind: Secret
metadata:
  name: my-app-secrets
  namespace: my-namespace
type: Opaque
# Values are base64-encoded (not encrypted; for real encryption use Sealed Secrets or ESO)
data:
  db-password: czNjcjN0  # base64 of 's3cr3t'

Important: plain Kubernetes Secrets are only base64-encoded. They are not encrypted at rest unless the cluster is configured for encryption at rest. In production, use Sealed Secrets or the External Secrets Operator.
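
To see why base64 offers no protection, the value in the manifest above can be round-tripped with standard tools. This sketch assumes a base64 binary that accepts -d for decoding (GNU coreutils and BusyBox do).

```shell
# Sketch: base64 is an encoding, not encryption. Anyone who can read
# the Secret manifest can recover the plaintext without a key.
encoded=$(printf '%s' 's3cr3t' | base64)
echo "$encoded"    # czNjcjN0

decoded=$(printf '%s' "$encoded" | base64 -d)
echo "$decoded"    # s3cr3t
```

Use printf rather than echo when encoding, so a trailing newline does not end up inside the secret value.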
resources:
  requests:          # used by the scheduler to place the pod
    cpu: "100m"      # 100 millicores = 0.1 CPU
    memory: "128Mi"
  limits:            # above this, CPU is throttled and memory overuse gets the container OOM-killed
    cpu: "500m"
    memory: "256Mi"

Rules of thumb:

| Workload type | CPU request | Memory request | Notes |
|---|---|---|---|
| Web API | 100–250m | 128–256Mi | Set limits to 2-4x requests |
| Worker/consumer | 250–500m | 256–512Mi | Set memory limit = request for predictability |
| JVM app | 500m–1 | 512Mi–2Gi | Leave headroom for JVM overhead on top of -Xmx |
| Sidecar | 10–50m | 32–64Mi | Keep it minimal |

# Wrong: no requests or limits; unpredictable scheduling and OOM evictions
containers:
  - name: app
    image: myapp:latest
    # no resources block: dangerous in production

# Wrong: limits without requests; requests default to the limits and over-reserve capacity
resources:
  limits:
    cpu: "2"
    memory: "1Gi"
  # requests omitted: they default to the limits
Two patterns, depending on whether the app calls the Kubernetes API:

If the app does not call the Kubernetes API, disable token automounting on the ServiceAccount. No Role/RoleBinding is needed.

# ServiceAccount with the token disabled: the safest default
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: my-namespace
automountServiceAccountToken: false  # do not inject a K8s API token into pods

# Reference it from the Deployment: no token, no API access
spec:
  template:
    spec:
      serviceAccountName: my-app-sa
      automountServiceAccountToken: false  # belt and braces: also set at pod level
If the app does call the Kubernetes API, enable the token and grant only the permissions it actually needs.

# 1. ServiceAccount: enable the token for this SA
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: my-namespace
automountServiceAccountToken: true  # token needed: the app calls the K8s API

# 2. Role: grant only what the app needs (namespace-scoped)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-app-role
  namespace: my-namespace
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]     # read-only, specific resource type
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["my-app-secrets"]   # restricted by name to one secret
    verbs: ["get"]

# 3. Bind the Role to the ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-rolebinding
  namespace: my-namespace
subjects:
  - kind: ServiceAccount
    name: my-app-sa
    namespace: my-namespace
roleRef:
  kind: Role
  apiGroup: rbac.authorization.k8s.io
  name: my-app-role

# 4. Reference the SA from the Deployment
spec:
  template:
    spec:
      serviceAccountName: my-app-sa
      # automountServiceAccountToken is inherited from the SA (true), so the token is injected
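
One way to confirm the Role grants no more than intended is to impersonate the ServiceAccount with kubectl auth can-i. The sketch below is an assumption-laden illustration: sa_subject and check_sa_access are hypothetical helper names, and the checks need a cluster plus permission to impersonate, so check_sa_access is defined but not called here.

```shell
# Sketch: verify least privilege by impersonating the ServiceAccount.
# sa_subject and check_sa_access are hypothetical helpers.
sa_subject() {
  # $1 = namespace, $2 = ServiceAccount name
  echo "system:serviceaccount:$1:$2"
}

check_sa_access() {
  # $1 = namespace, $2 = ServiceAccount name
  # Requires kubectl, cluster access, and impersonation rights.
  subject=$(sa_subject "$1" "$2")
  # Should be allowed by the Role above:
  kubectl auth can-i get secret/my-app-secrets --as="$subject" -n "$1"
  kubectl auth can-i list configmaps --as="$subject" -n "$1"
  # Should be denied by the Role above:
  kubectl auth can-i list secrets --as="$subject" -n "$1"
  kubectl auth can-i delete configmaps --as="$subject" -n "$1"
}

sa_subject my-namespace my-app-sa   # system:serviceaccount:my-namespace:my-app-sa
```

Run check_sa_access my-namespace my-app-sa against a real cluster; each kubectl auth can-i call answers yes or no.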
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
  namespace: my-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2   # at least 2 for HA
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # scale up when average CPU exceeds 70%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
The HPA requires resources.requests to be set on every container, because it computes utilization as current usage / request.
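
The scaling decision itself follows the formula in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). Below is a simplified sketch using integer arithmetic; hpa_desired_replicas is a hypothetical helper name, and the sketch leaves out the controller's tolerance band, the minReplicas/maxReplicas clamp, and stabilization windows.

```shell
# Sketch: core HPA replica calculation, simplified.
# hpa_desired_replicas is a hypothetical helper, not part of kubectl.
hpa_desired_replicas() {
  # $1 = current replicas
  # $2 = current average utilization (%)
  # $3 = target average utilization (%)
  # Integer ceiling of ($1 * $2) / $3
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

hpa_desired_replicas 3 90 70   # 4: 3 pods at 90% CPU against a 70% target
hpa_desired_replicas 4 35 70   # 2: 4 pods at 35% CPU against a 70% target
```

When several metrics are configured, as with CPU and memory above, the controller computes a recommendation per metric and uses the largest.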
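Limits how many pods can be taken down at once by voluntary disruptions such as node drains. Rolling updates are governed by the Deployment's maxUnavailable, not by the PDB: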
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: my-namespace
spec:
  minAvailable: 2  # or use maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app
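
How many evictions the PDB permits at a given moment is the number of healthy pods minus minAvailable, floored at zero. A minimal sketch of that calculation follows; pdb_allowed_disruptions is a hypothetical helper name.

```shell
# Sketch: evictions a PDB with minAvailable permits right now.
# pdb_allowed_disruptions is a hypothetical helper, not part of kubectl.
pdb_allowed_disruptions() {
  # $1 = currently healthy pods, $2 = minAvailable
  allowed=$(( $1 - $2 ))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

pdb_allowed_disruptions 3 2   # 1: one pod may be evicted during a drain
pdb_allowed_disruptions 2 2   # 0: every eviction is refused
```

The second case matters when minAvailable: 2 is combined with an HPA at minReplicas: 2: once the workload scales down to two pods, node drains block until it scales up again. Using maxUnavailable: 1 instead avoids that.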
# Create the namespace that the quota will apply to
kubectl create namespace my-namespace

# Apply a ResourceQuota to cap what the namespace can consume
kubectl apply -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: my-namespace-quota
  namespace: my-namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 4Gi
    limits.cpu: "8"
    limits.memory: 8Gi
    pods: "20"
EOF
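
As a sanity check, the quota above can be compared with the per-pod resources of the example Deployment (100m/128Mi requests, 500m/256Mi limits) to see how many such pods fit. This is a worked-arithmetic sketch: quantities are converted by hand to millicores and Mi, min_of is a hypothetical helper, and it assumes no other workloads share the namespace.

```shell
# Sketch: how many copies of the example pod fit under the quota above.
# Quantities are pre-converted by hand: CPU in millicores, memory in Mi.
min_of() {
  m=$1
  shift
  for v in "$@"; do
    if [ "$v" -lt "$m" ]; then
      m=$v
    fi
  done
  echo "$m"
}

by_cpu_requests=$(( 4000 / 100 ))   # requests.cpu 4      vs 100m per pod
by_cpu_limits=$(( 8000 / 500 ))     # limits.cpu 8        vs 500m per pod
by_mem_requests=$(( 4096 / 128 ))   # requests.memory 4Gi vs 128Mi per pod
by_mem_limits=$(( 8192 / 256 ))     # limits.memory 8Gi   vs 256Mi per pod
by_pod_count=20                     # pods: "20"

max_pods=$(min_of "$by_cpu_requests" "$by_cpu_limits" "$by_mem_requests" "$by_mem_limits" "$by_pod_count")
echo "$max_pods"   # 16
```

Under these assumptions limits.cpu is the binding constraint. Note also that once a quota covers requests or limits for a resource, pods that omit that value are rejected at admission unless a LimitRange supplies defaults.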
# One-off Job (DB migration, data processing)
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  namespace: my-namespace
spec:
  backoffLimit: 3                 # retry at most 3 times on failure
  ttlSecondsAfterFinished: 3600   # auto-delete 1h after completion
  template:
    spec:
      restartPolicy: OnFailure    # Jobs use Never or OnFailure, never Always
      containers:
        - name: migrate
          image: ghcr.io/org/my-app:1.0.0
          command: ["python", "manage.py", "migrate"]
          resources:
            requests:
              cpu: "100m"
              memory: "256Mi"

# CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-job
  namespace: my-namespace
spec:
  schedule: "0 2 * * *"           # every day at 02:00
  concurrencyPolicy: Forbid       # skip a run if the previous one is still running
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleanup
              image: ghcr.io/org/cleanup:1.0.0
              resources:
                requests:
                  cpu: "50m"
                  memory: "64Mi"
# --- Pod status and logs ---
kubectl get pods -n my-namespace
kubectl get pods -n my-namespace -o wide                 # show node assignment
kubectl describe pod <pod-name> -n my-namespace          # events and status details
kubectl logs <pod-name> -n my-namespace                  # current logs
kubectl logs <pod-name> -n my-namespace --previous       # logs from the previous (crashed) container
kubectl logs <pod-name> -n my-namespace -c <container>   # multi-container pods

# --- Exec into a running container ---
kubectl exec -it <pod-name> -n my-namespace -- sh
kubectl exec -it <pod-name> -n my-namespace -- bash      # if the image ships bash

# --- Resource usage (requires metrics-server) ---
kubectl top pods -n my-namespace
kubectl top nodes

# --- Deployment operations ---
kubectl rollout status deployment/my-app -n my-namespace
kubectl rollout history deployment/my-app -n my-namespace
kubectl rollout undo deployment/my-app -n my-namespace   # roll back to the previous revision
kubectl rollout undo deployment/my-app --to-revision=2 -n my-namespace

# --- Manual scaling ---
kubectl scale deployment my-app --replicas=5 -n my-namespace

# --- Events (cluster-level issues) ---
kubectl get events -n my-namespace --sort-by='.lastTimestamp'

# --- port-forward for local debugging ---
kubectl port-forward pod/<pod-name> 8080:8080 -n my-namespace
kubectl port-forward svc/my-app 8080:80 -n my-namespace

# --- Validate YAML with dry-run ---
kubectl apply -f deployment.yaml --dry-run=client
kubectl apply -f deployment.yaml --dry-run=server        # validate against the live cluster
# CrashLoopBackOff: the container keeps crashing
kubectl logs <pod-name> --previous -n my-namespace   # logs from the crashed container
kubectl describe pod <pod-name> -n my-namespace      # check the exit code and OOMKilled

# ImagePullBackOff: the image cannot be pulled
kubectl describe pod <pod-name> -n my-namespace      # check the Events section
# Causes: wrong image tag, missing imagePullSecret, private registry

# Pending pod: not scheduled
kubectl describe pod <pod-name> -n my-namespace
# Causes: insufficient resources, no node matching the node selector, taint/toleration mismatch

# OOMKilled: out of memory
# Raise the memory limit and check for memory leaks
kubectl describe pod <pod-name> -n my-namespace | grep -A5 "Last State"
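
When reading the exit code from kubectl describe, codes above 128 conventionally mean the process was terminated by a signal, with the signal number equal to the code minus 128. A small sketch of that decoding follows; exit_code_signal is a hypothetical helper name.

```shell
# Sketch: decode a container exit code into the terminating signal number.
# exit_code_signal is a hypothetical helper; prints 0 when no signal is implied.
exit_code_signal() {
  # $1 = container exit code
  if [ "$1" -gt 128 ]; then
    echo $(( $1 - 128 ))
  else
    echo 0
  fi
}

exit_code_signal 137   # 9  (SIGKILL; typical of OOMKilled)
exit_code_signal 143   # 15 (SIGTERM; e.g. a normal shutdown request)
exit_code_signal 1     # 0  (application error, not a signal)
```

Exit code 137 alone does not prove an out-of-memory kill, since any SIGKILL produces it. Confirm with the Reason field under Last State.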
# Bad: the :latest tag makes deployments non-deterministic
image: myapp:latest
# Good: pin to a specific version tag or, for true immutability, a digest
image: ghcr.io/org/myapp:1.4.2
# or
image: ghcr.io/org/myapp@sha256:abc123...
# ---
# Bad: running as root
securityContext: {}  # falls back to the image's user, which is often root
# Good: non-root with an explicit UID
securityContext:
  runAsNonRoot: true
  runAsUser: 1001
# ---
# Bad: no resource limits; one pod can starve the whole node
containers:
  - name: app
    image: myapp:1.0.0
    # no resources defined
# Good: always set requests and limits
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"
# ---
# Bad: plaintext secrets in a ConfigMap
apiVersion: v1
kind: ConfigMap
data:
  DB_PASSWORD: "mysecretpassword"  # never; use a Secret or an external secrets manager
# ---
# Bad: granting cluster-admin to an application service account
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
roleRef:
  kind: ClusterRole
  name: cluster-admin  # gives your app god-mode
# ---
# Bad: a PDB with minAvailable: 0 protects nothing
spec:
  minAvailable: 0
# ---
# Bad: restartPolicy: Always on a Job (the API server rejects the manifest)
spec:
  restartPolicy: Always  # Jobs must use OnFailure or Never
Production checklist:

- runAsNonRoot: true, with runAsUser set
- readOnlyRootFilesystem: true, with emptyDir for writable paths
- allowPrivilegeEscalation: false
- capabilities.drop: [ALL]
- Dedicated ServiceAccount instead of default
- automountServiceAccountToken: false
- Namespaced Role (ClusterRole only when needed)
- minReplicas: 2+
- RollingUpdate strategy with maxUnavailable: 0
- /health (liveness) and /ready (readiness) endpoints
- Labels: app, version, environment

Related skills:

- docker-patterns: multi-stage Dockerfiles and image security
- deployment-patterns: CI/CD pipelines, rollback strategies, health check endpoints
- security-review: broader security hardening context
- git-workflow: GitOps integration with K8s (ArgoCD / Flux patterns)

npx claudepluginhub aaione/everything-claude-code-zh