Kubernetes troubleshooting
Latest revision as of 11:55, 28 February 2024
- Kubernetes troubleshooting steps
Commands
- kubectl logs your_pod
- kubectl get events -A
- kubectl describe pod your_pod
- kubectl describe nodes, review conditions:
- kubectl top
- kubectl cluster-info dump
- Tools: K9s and crictl
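The commands above can be chained into a first-pass triage. The following is a minimal sketch rather than an established tool: `triage_commands` and `run_triage` are hypothetical helper names, and `your_pod` is the placeholder used in the list. It only assembles the commands from this page and optionally runs them, assuming `kubectl` is on the PATH and a cluster is reachable. `kubectl top node` also assumes the metrics API is available in the cluster.

```python
import subprocess


def triage_commands(pod, namespace=None):
    """Return the kubectl commands listed above as argv lists (hypothetical helper)."""
    ns = ["-n", namespace] if namespace else []
    return [
        ["kubectl", "logs", pod] + ns,
        ["kubectl", "get", "events", "-A"],
        ["kubectl", "describe", "pod", pod] + ns,
        ["kubectl", "describe", "nodes"],
        ["kubectl", "top", "node"],  # assumes the metrics API is installed
    ]


def run_triage(pod, namespace=None):
    """Run each command in order and print its output; needs kubectl and a cluster."""
    for argv in triage_commands(pod, namespace):
        print("$ " + " ".join(argv))
        result = subprocess.run(argv, capture_output=True, text=True)
        print(result.stdout or result.stderr)


if __name__ == "__main__":
    # Only prints the commands; call run_triage("your_pod") against a real cluster.
    for argv in triage_commands("your_pod"):
        print(" ".join(argv))
```

Building the commands as argument lists instead of shell strings avoids quoting problems when pod names come from other tooling.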
Events
- FailedScheduling: Insufficient cpu, Insufficient memory, timed out waiting for the condition, unbound immediate PersistentVolumeClaims
- kubectl get events, OOMKilling, FailedKillPod, SuccessfulDelete, SuccessfulCreate, NoPods, Warning, Critical, NodeSysctlChange, FailedAttachVolume, FailedMount, UnAvailableLoadBalancer, FailedCreatePodSandBox, InvalidDiskCapacity, Scheduled, NetworkNotReady, Evict, Killing, SuccessfulReconcilied, FailedToUpdateEndpointSlices, BackendNotFound, FailedScheduling, ProvisioningFailed
- Kubernetes node events: NodeNotSchedulable, NodeAllocatableEnforced, NodeHasNoDiskPressure, DiskPressure, NodeHasSufficientMemory, NodeHasSufficientPID, RegisteredNode, InvalidDiskCapacity, Starting, NodeReady, RemovingNode
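With many event reasons in play, it can help to group only the Warning events by reason. The sketch below assumes the input is the JSON printed by `kubectl get events -A -o json`; `warnings_by_reason` is a hypothetical helper name, and `SAMPLE` is made-up input for illustration, not real cluster output.

```python
import json
from collections import defaultdict


def warnings_by_reason(events_json):
    """Group Warning events by reason from `kubectl get events -A -o json` output."""
    grouped = defaultdict(list)
    for item in json.loads(events_json).get("items", []):
        if item.get("type") != "Warning":
            continue  # skip Normal events such as Scheduled
        obj = item.get("involvedObject", {})
        target = f'{obj.get("namespace", "")}/{obj.get("name", "")}'
        grouped[item.get("reason", "Unknown")].append((target, item.get("message", "")))
    return dict(grouped)


# Made-up sample input for illustration only.
SAMPLE = json.dumps({"items": [
    {"type": "Warning", "reason": "FailedScheduling",
     "message": "0/3 nodes are available: 3 Insufficient cpu.",
     "involvedObject": {"namespace": "default", "name": "your_pod"}},
    {"type": "Normal", "reason": "Scheduled",
     "message": "Successfully assigned default/other_pod to node-1",
     "involvedObject": {"namespace": "default", "name": "other_pod"}},
]})

if __name__ == "__main__":
    for reason, hits in warnings_by_reason(SAMPLE).items():
        print(reason, hits)
```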
Kubernetes components
- Kubernetes scheduling
- Kubernetes Pod Topology Spread Constraints
- Kubernetes pod affinity and anti affinity
- Karpenter
- ttlSecondsUntilExpired, controller.node Triggering termination for expired node after 168h0m0s .../...
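The log line above reports the expiry as `168h0m0s`, while `ttlSecondsUntilExpired` is expressed in seconds. As a small worked conversion, assuming the logged duration is the configured TTL, the sketch below parses the hours/minutes/seconds string and prints the equivalent number of seconds and days. `parse_go_duration` is a hypothetical helper that handles only this `h`/`m`/`s` form.

```python
import re
from datetime import timedelta


def parse_go_duration(text):
    """Parse an hours/minutes/seconds string such as '168h0m0s' into a timedelta."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if not match or not text:
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


ttl = parse_go_duration("168h0m0s")
print(int(ttl.total_seconds()))  # 604800
print(ttl.days)                  # 7
```

Under that assumption, a node expiring after `168h0m0s` corresponds to a TTL of 604800 seconds, or 7 days.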
Log
- Karpenter logs
- Kubelet logs
- /var/log/kubelet.log
Related
- Readiness, Liveness, Readiness probe errored
- Reason: ProbeWarning
- Kubernetes Pod Disruptions
- Unable to connect to the server, ~/.kube/config
- DiskPressure
- CalculateExpectedPodCountFailed
- aws eks create-cluster --logging
- Node-pressure Eviction
- karpenter.sh/do-not-evict: true
- NodeNotReady
- kubectl-node-shell
- kubectl exec
- kubectl attach
- EKS troubleshooting
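For items such as DiskPressure and NodeNotReady, the node conditions can be checked in bulk instead of reading `kubectl describe nodes` by eye. The sketch below assumes the input is the JSON printed by `kubectl get nodes -o json`. `unhealthy_conditions` is a hypothetical helper name, `SAMPLE_NODES` is made-up input, and treating every non-Ready condition as healthy only when its status is "False" is a simplifying assumption.

```python
import json


def unhealthy_conditions(nodes_json):
    """Return {node name: [condition types]} for conditions in an unhealthy state.

    Assumption: Ready is healthy when "True"; any other condition type
    (for example DiskPressure or MemoryPressure) is healthy when "False".
    """
    report = {}
    for node in json.loads(nodes_json).get("items", []):
        bad = []
        for cond in node.get("status", {}).get("conditions", []):
            healthy = "True" if cond.get("type") == "Ready" else "False"
            if cond.get("status") != healthy:
                bad.append(cond.get("type"))
        if bad:
            report[node["metadata"]["name"]] = bad
    return report


# Made-up sample input for illustration only.
SAMPLE_NODES = json.dumps({"items": [
    {"metadata": {"name": "node-1"},
     "status": {"conditions": [{"type": "DiskPressure", "status": "False"},
                               {"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "node-2"},
     "status": {"conditions": [{"type": "DiskPressure", "status": "True"},
                               {"type": "Ready", "status": "False"}]}},
]})

if __name__ == "__main__":
    print(unhealthy_conditions(SAMPLE_NODES))
```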
Activities
- Review: https://learnk8s.io/troubleshooting-deployments
- Kubernetes debugging with an ephemeral debug container: kubectl debug
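As an illustration of the ephemeral debug container activity, the sketch below assembles a `kubectl debug` invocation. `debug_command` is a hypothetical helper name, and the `busybox` image and the container name `app` are placeholder assumptions. The sketch only builds the argument list, so it can be inspected before being run against a cluster.

```python
def debug_command(pod, image="busybox", target=None, namespace=None):
    """Build a `kubectl debug` argv that attaches an ephemeral container to a pod."""
    argv = ["kubectl", "debug", "-it", pod, f"--image={image}"]
    if target:
        # --target points the debug container at an existing container in the pod
        argv.append(f"--target={target}")
    if namespace:
        argv += ["-n", namespace]
    return argv


if __name__ == "__main__":
    print(" ".join(debug_command("your_pod", target="app")))
```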
See also
- K8s troubleshooting: kubectl logs, kubectl top, kubectl get events -A, kubectl describe pod, Liveness, Readiness, Kubernetes events, Pulling image, OOMKilled, ProbeWarning, Reason, FailedScheduling, ErrImagePull, ImagePullBackOff, Kubelet conditions: MemoryPressure, DiskPressure, KubeletHasSufficientPID, KubeletReady, kubectl [ debug | attach | exec ], kubectl cluster-info dump, SimKube, KWOK