Difference between revisions of "Kubernetes troubleshooting"
(→Log) |
(44 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
− | * | + | * [[Kubernetes troubleshooting steps]] |
+ | == Commands == | ||
* <code>[[kubectl logs]] [[your_pod]]</code> | * <code>[[kubectl logs]] [[your_pod]]</code> | ||
* <code>[[kubectl get events -A]]</code> | * <code>[[kubectl get events -A]]</code> | ||
* <code>[[kubectl describe pod]] your_pod</code> | * <code>[[kubectl describe pod]] your_pod</code> | ||
− | * <code>[[kubectl describe nodes]]</code>, review [[kubectl describe nodes (conditions:)|conditions:]] | + | * <code>[[kubectl describe nodes]]</code>, review <code>[[kubectl describe nodes (conditions:)|conditions:]]</code> |
+ | * <code>[[kubectl top]]</code> | ||
+ | * <code>[[kubectl cluster-info dump]]</code> | ||
− | * <code>[[ | + | * Tools: <code>[[K9s]]</code> and <code>[[crictl]]</code> |
+ | == [[Kubernetes events|Events]] == | ||
* {{FailedScheduling}} | * {{FailedScheduling}} | ||
+ | * {{kubectl get events}} | ||
+ | * {{Kubernetes nodes events}} | ||
+ | == [[Kubernetes components]] == | ||
[[Load Balancer]] | [[Load Balancer]] | ||
− | * [[UnAvailableLoadBalancer]] | + | * <code>[[UnAvailableLoadBalancer]]</code> |
− | * [[ | + | [[Kubelet]] |
+ | * <code>[[PLEG is not healthy]]</code> | ||
+ | * <code>[[/var/log/kubelet.log]]</code> | ||
+ | |||
+ | [[Scheduling]] | ||
+ | * [[Kubernetes scheduling]] | ||
+ | * [[Kubernetes Pod Topology Spread Constraints]] | ||
+ | * [[Kubernetes pod affinity and anti affinity]] | ||
+ | * [[Karpenter]] | ||
+ | * <code>[[ttlSecondsUntilExpired]]</code>, <code>[[controller.node]] [[Triggering termination for expired node after]] 168h0m0s .../...</code> | ||
+ | |||
+ | [[etcd]] | ||
+ | |||
+ | == Log == | ||
+ | * <code>[[Karpenter logs]]</code> | ||
+ | * <code>[[Kubelet logs]]</code> | ||
+ | * <code>[[/var/log/kubelet.log]]</code> | ||
== Related == | == Related == | ||
− | * [[ | + | * [[Readiness]], [[Liveness]], <code>[[Readiness probe errored]]</code> |
− | + | * <code>[[Reason]]: [[ProbeWarning]]</code> | |
− | * <code> | ||
− | |||
− | |||
* [[Kubernetes Pod Disruptions]] | * [[Kubernetes Pod Disruptions]] | ||
* <code>[[Unable to connect to the server]], [[~/.kube/config]]</code> | * <code>[[Unable to connect to the server]], [[~/.kube/config]]</code> | ||
* <code>[[DiskPressure]]</code> | * <code>[[DiskPressure]]</code> | ||
− | * <code>[[ | + | * <code>[[CalculateExpectedPodCountFailed]]</code> |
+ | * <code>[[aws eks create-cluster --logging]]</code> | ||
+ | * <code>[[Node-pressure Eviction]]</code> | ||
+ | * <code>[[karpenter.sh/do-not-evict: true]]</code> | ||
+ | * <code>[[NodeNotReady]]</code> | ||
+ | * <code>[[kubectl-node-shell]]</code> | ||
+ | * <code>[[kubectl exec]]</code> | ||
+ | * <code>[[kubectl attach]]</code> | ||
+ | * [[EKS troubleshooting]] | ||
+ | |||
+ | == Activities == | ||
+ | * Review: https://learnk8s.io/troubleshooting-deployments | ||
+ | * [[Kubernetes debugging with an ephemeral debug container]]: <code>[[kubectl debug]]</code> | ||
== See also == | == See also == |
Latest revision as of 11:55, 28 February 2024
Commands
- kubectl logs your_pod
- kubectl get events -A
- kubectl describe pod your_pod
- kubectl describe nodes, review conditions:
- kubectl top
- kubectl cluster-info dump
- Tools: K9s and crictl
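The commands above can be chained into a first-pass triage for a single pod. The following is a minimal sketch, not a definitive procedure: the function names `run` and `triage_pod`, the `DRY_RUN` variable, and the default namespace are assumptions made for illustration. `kubectl top` only returns data when a metrics source such as metrics-server is installed in the cluster.

```shell
#!/bin/sh
# First-pass triage for one pod, using the commands listed in this section.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

triage_pod() {
    pod="$1"            # pod name, e.g. your_pod
    ns="${2:-default}"  # namespace (assumption: falls back to "default")

    run kubectl logs "$pod" -n "$ns"          # container output
    run kubectl describe pod "$pod" -n "$ns"  # status, probes, recent events
    run kubectl get events -A                 # cluster-wide events
    run kubectl describe nodes                # review the conditions: block
    run kubectl top pod "$pod" -n "$ns"       # CPU/memory usage
}
```

Calling `DRY_RUN=1; triage_pod your_pod` prints the five commands without contacting a cluster. `kubectl cluster-info dump` is left out of the sketch because its output is very large; it is usually better run on its own and redirected to a file.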
Events
- FailedScheduling: Insufficient cpu, Insufficient memory, timed out waiting for the condition, unbound immediate PersistentVolumeClaims
- kubectl get events, OOMKilling, FailedKillPod, SuccessfulDelete, SuccessfulCreate, NoPods, Warning, Critical, NodeSysctlChange, FailedAttachVolume, FailedMount, UnAvailableLoadBalancer, FailedCreatePodSandBox, InvalidDiskCapacity, Scheduled, NetworkNotReady, Evict, Killing, SuccessfulReconcilied, FailedToUpdateEndpointSlices, BackendNotFound, FailedScheduling, ProvisioningFailed
- Kubernetes node events: NodeNotSchedulable, NodeAllocatableEnforced, NodeHasNoDiskPressure, DiskPressure, NodeHasSufficientMemory, NodeHasSufficientPID, RegisteredNode, InvalidDiskCapacity, Starting, NodeReady, RemovingNode
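To narrow the event list down to the FailedScheduling messages mentioned above, events can be filtered by reason and each message mapped to a first thing to check. This is a sketch under assumptions: the function names are hypothetical, and the hints are illustrative starting points rather than a complete diagnosis.

```shell
# List only FailedScheduling events across all namespaces.
# Event objects support --field-selector on the "reason" field.
failed_scheduling_events() {
    kubectl get events -A --field-selector reason=FailedScheduling
}

# Map a FailedScheduling message to a suggested first check.
# The hint texts are assumptions for illustration, not kubectl output.
scheduling_hint() {
    case "$1" in
        *"Insufficient cpu"*)
            echo "compare pod CPU requests with node allocatable CPU" ;;
        *"Insufficient memory"*)
            echo "compare pod memory requests with node allocatable memory" ;;
        *"unbound immediate PersistentVolumeClaims"*)
            echo "check that the pod's PVCs are Bound (kubectl get pvc)" ;;
        *"timed out waiting for the condition"*)
            echo "read neighbouring events to find the operation that timed out" ;;
        *)
            echo "no hint; read the full event message" ;;
    esac
}
```

For example, `scheduling_hint "0/3 nodes are available: 3 Insufficient cpu."` returns the CPU hint.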
Kubernetes components
Load Balancer
- UnAvailableLoadBalancer
Kubelet
- PLEG is not healthy
- /var/log/kubelet.log
Scheduling
- Kubernetes scheduling
- Kubernetes Pod Topology Spread Constraints
- Kubernetes pod affinity and anti affinity
- Karpenter
- ttlSecondsUntilExpired, controller.node Triggering termination for expired node after 168h0m0s .../...
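The Karpenter log line above reports the expiry as a duration (168h0m0s), while `ttlSecondsUntilExpired` is expressed in seconds. A small conversion helps when matching the two. The helper name `hours_to_seconds` is hypothetical, and it is an assumption that the logged duration corresponds directly to the configured `ttlSecondsUntilExpired` value.

```shell
# Convert whole hours to seconds, e.g. to compare a logged duration
# with a ttlSecondsUntilExpired setting.
hours_to_seconds() {
    echo $(( $1 * 3600 ))
}

hours_to_seconds 168   # prints 604800 (7 days)
```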
etcd
Log
- Karpenter logs
- Kubelet logs
- /var/log/kubelet.log
Related
- Readiness, Liveness, Readiness probe errored
- Reason: ProbeWarning
- Kubernetes Pod Disruptions
- Unable to connect to the server, ~/.kube/config
- DiskPressure
- CalculateExpectedPodCountFailed
- aws eks create-cluster --logging
- Node-pressure Eviction
- karpenter.sh/do-not-evict: true
- NodeNotReady
- kubectl-node-shell
- kubectl exec
- kubectl attach
- EKS troubleshooting
Activities
- Review: https://learnk8s.io/troubleshooting-deployments
- Kubernetes debugging with an ephemeral debug container: kubectl debug
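As a sketch of the ephemeral debug container approach, the helper below builds a `kubectl debug` command line and prints it for review instead of executing it. The function name `debug_cmd`, the container name passed as the target, and the choice of `busybox` as the debug image are assumptions for illustration; `-it`, `--image` and `--target` are existing `kubectl debug` flags.

```shell
# Build a kubectl debug command line for an ephemeral container.
# The command is printed, not executed, so it can be checked first.
debug_cmd() {
    pod="$1"               # pod to debug, e.g. your_pod
    target="$2"            # container whose process namespace is shared
    image="${3:-busybox}"  # debug image (assumption: busybox is sufficient)
    echo "kubectl debug -it $pod --image=$image --target=$target"
}
```

For example, `debug_cmd your_pod app` prints `kubectl debug -it your_pod --image=busybox --target=app`, which can then be run against the cluster.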
See also
- K8s troubleshooting: kubectl logs, kubectl top, kubectl get events -A, kubectl describe pod, Liveness, Readiness, Kubernetes events, Pulling image, OOMKilled, ProbeWarning, Reason, FailedScheduling, errImagePull, ImagePullBackOff, Kubelet conditions: MemoryPressure, DiskPressure, KubeletHasSufficientPID, KubeletReady, kubectl [ debug | attach | exec ], kubectl cluster-info dump, SimKube, KWOK