k8s: get events from specific node

I want to know how to get the events associated with a specific node.
In my case my k8s cluster is made up of 3 worker nodes (node1, node2, node3). I want to get a list of all the events occurring on node2.
I know I can get namespace-specific events with:
kubectl get event --namespace default
Is there a way/option to get something like:
kubectl get event --nodename node2

This should work:
kubectl get events --all-namespaces | grep -i node01
This command gives me pod-scheduled events too:
master $ kubectl get events --all-namespaces | grep -i node01
default   46s   Normal   Scheduled                 pod/nginx-dashrath   Successfully assigned default/nginx-dashrath to node01
default   10m   Normal   Scheduled                 pod/nginx            Successfully assigned default/nginx to node01
default   11m   Normal   NodeHasSufficientMemory   node/node01          Node node01 status is now: NodeHasSufficientMemory

This is what works:
$ kubectl get events --all-namespaces -o wide | grep -i node01
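If you'd rather avoid grep, events also support field selectors. A sketch that filters on the involved object; this catches node-lifecycle events such as NodeHasSufficientMemory, but not Scheduled events, whose involved object is the pod:
# List events whose involved object is the node itself
kubectl get events --all-namespaces --field-selector involvedObject.kind=Node,involvedObject.name=node01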

Related

Error when getting IngressClass nginx: "nginx" not found

I'm using Kubernetes version 1.19.16 on a bare-metal Ubuntu 18.04 LTS server. When I tried to deploy the nginx-ingress yaml files, it always failed with the errors below.
The following steps were used to deploy nginx-ingress:
$ git clone https://github.com/nginxinc/kubernetes-ingress.git
cd kubernetes-ingress/deployments
kubernetes-ingress/deployments$ git branch
* main
$ kubectl apply -f common/ns-and-sa.yaml
$ kubectl apply -f rbac/rbac.yaml
$ kubectl apply -f rbac/ap-rbac.yaml
$ kubectl apply -f common/default-server-secret.yaml
$ kubectl apply -f common/nginx-config.yaml
$ kubectl apply -f deployment/nginx-ingress.yaml
deployment.apps/nginx-ingress created
$ kubectl get pods -n nginx-ingress -o wide
NAME                             READY   STATUS   RESTARTS   AGE   IP           NODE         NOMINATED NODE   READINESS GATES
nginx-ingress-75c4bd64bd-mm52x   0/1     Error    2          21s   10.244.1.5   k8s-master   <none>           <none>
$ kubectl -n nginx-ingress get all
NAME                                 READY   STATUS             RESTARTS   AGE
pod/nginx-ingress-75c4bd64bd-mm52x   0/1     CrashLoopBackOff   12         38m

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx-ingress   0/1     1            0           38m

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-ingress-75c4bd64bd   1         1         0       38m
$ kubectl logs nginx-ingress-75c4bd64bd-mm52x -n nginx-ingress
W1003 04:53:02.833073 1 flags.go:273] Ignoring unhandled arguments: []
I1003 04:53:02.833154 1 flags.go:190] Starting NGINX Ingress Controller Version=2.3.1 PlusFlag=false
I1003 04:53:02.833158 1 flags.go:191] Commit=a8742472b9ddf27433b6b1de49d250aa9a7cb47e Date=2022-09-16T08:09:31Z DirtyState=false Arch=linux/amd64 Go=go1.18.5
I1003 04:53:02.844374 1 main.go:210] Kubernetes version: 1.19.16
F1003 04:53:02.846604 1 main.go:225] Error when getting IngressClass nginx: ingressclasses.networking.k8s.io "nginx" not found
$ kubectl describe pods nginx-ingress-75c4bd64bd-mm52x -n nginx-ingress
Events:
Type     Reason     Age                  From                Message
----     ------     ----                 ----                -------
Normal   Scheduled  3m6s                 default-scheduler   Successfully assigned nginx-ingress/nginx-ingress-75c4bd64bd-mm52x to k8s-worker-1
Normal   Pulled     87s (x5 over 3m5s)   kubelet             Container image "nginx/nginx-ingress:2.3.1" already present on machine
Normal   Created    87s (x5 over 3m5s)   kubelet             Created container nginx-ingress
Normal   Started    87s (x5 over 3m5s)   kubelet             Started container nginx-ingress
Warning  BackOff    75s (x10 over 3m3s)  kubelet             Back-off restarting failed container
NGINX Ingress controller deployment file link for reference.
As I'm using the kubernetes-ingress.git repository's main branch, I'm not sure whether the main branch is compatible with my Kubernetes version.
Can anyone share some pointers to solve this?
I think you missed creating the "nginx" IngressClass; that is why the controller is not able to find it: https://github.com/nginxinc/kubernetes-ingress/blob/main/deployments/common/ingress-class.yaml#L4
kubectl apply -f common/ingress-class.yaml
You can follow the steps from this document: https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-manifests/
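For reference, the IngressClass manifest in that repository looks roughly like this (a sketch; check common/ingress-class.yaml in your checkout for the exact contents):
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: nginx
spec:
  controller: nginx.org/ingress-controller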

Failed to move past 1 pod has unbound immediate PersistentVolumeClaims

I am new to Kubernetes and am trying to get Apache Airflow working using Helm charts. After almost a week of struggling I am nowhere; I cannot even get the example provided in the Apache Airflow documentation working. I use Pop!_OS 20.04 and microk8s.
When I run these commands:
kubectl create namespace airflow
helm repo add apache-airflow https://airflow.apache.org
helm install airflow apache-airflow/airflow --namespace airflow
The helm installation times out after five minutes.
kubectl get pods -n airflow
shows this list:
NAME                                   READY   STATUS     RESTARTS   AGE
airflow-postgresql-0                   0/1     Pending    0          4m8s
airflow-redis-0                        0/1     Pending    0          4m8s
airflow-worker-0                       0/2     Pending    0          4m8s
airflow-scheduler-565d8587fd-vm8h7     0/2     Init:0/1   0          4m8s
airflow-triggerer-7f4477dcb6-nlhg8     0/1     Init:0/1   0          4m8s
airflow-webserver-684c5d94d9-qhhv2     0/1     Init:0/1   0          4m8s
airflow-run-airflow-migrations-rzm59   1/1     Running    0          4m8s
airflow-statsd-84f4f9898-sltw9         1/1     Running    0          4m8s
airflow-flower-7c87f95f46-qqqqx        0/1     Running    4          4m8s
Then when I run the below command:
kubectl describe pod airflow-postgresql-0 -n airflow
I get the below (trimmed down to the events):
Events:
Type     Reason            Age                From               Message
----     ------            ----               ----               -------
Warning  FailedScheduling  58s (x2 over 58s)  default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
Then I deleted the namespace using the following command:
kubectl delete ns airflow
At this point, the termination of the pods gets stuck. Then I bring up the proxy in another terminal:
kubectl proxy
Then I issued the following command to force-delete the namespace and all its pods and resources:
kubectl get ns airflow -o json | jq '.spec.finalizers=[]' | curl -X PUT http://localhost:8001/api/v1/namespaces/airflow/finalize -H "Content-Type: application/json" --data @-
Then I deleted the PVC's using the following command:
kubectl delete pvc --force --grace-period=0 --all -n airflow
This got stuck again, so I had to issue another command to force the deletion:
kubectl patch pvc data-airflow-postgresql-0 -p '{"metadata":{"finalizers":null}}' -n airflow
The PVCs got terminated at this point, and these two commands return nothing:
kubectl get pvc -n airflow
kubectl get all -n airflow
Then I restarted the machine and executed the helm install again (using the first and last commands in the first section of this question), but got the same result.
I then executed the following command (using the suggestions I found here):
kubectl describe pvc -n airflow
I got the following output (I am posting the event portion of the PostgreSQL PVC):
Type    Reason         Age                   From                         Message
----    ------         ----                  ----                         -------
Normal  FailedBinding  2m58s (x42 over 13m)  persistentvolume-controller  no persistent volumes available for this claim and no storage class is set
So my assumption is that I need to provide a storage class as part of the values.yaml.
Is my understanding right? How do I provide the required values in values.yaml?
If you installed with helm, you can uninstall with helm delete airflow -n airflow.
Here's a way to install Airflow for testing purposes using default values:
Generate the manifest: helm template airflow apache-airflow/airflow -n airflow > airflow.yaml
Open airflow.yaml with your favorite editor and replace every "volumeClaimTemplates" with an emptyDir volume. Example:
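A minimal sketch of that edit, assuming the claim template is named data and requests 8Gi (check the generated airflow.yaml for the actual names and sizes):

Before, inside a StatefulSet spec:

  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 8Gi

After, drop that block and add a matching emptyDir volume to the pod template instead:

  template:
    spec:
      volumes:
        - name: data
          emptyDir: {}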
Create the namespace and install:
kubectl create namespace airflow
kubectl apply -f airflow.yaml --namespace airflow
You can copy files out from the pods if needed.
To delete: kubectl delete -f airflow.yaml --namespace airflow.
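On the storage-class part of the question: the FailedBinding event says no storage class is set, and microk8s does not provide a default provisioner out of the box. One option (a sketch based on the microk8s addon system, not anything from the chart) is to enable the built-in hostpath provisioner so the chart's PVCs have a default StorageClass to bind to:
# Enable the hostpath storage addon (named "storage" on older microk8s releases)
microk8s enable hostpath-storage
# Verify a (default) StorageClass now exists
kubectl get storageclass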

kubectl status.phase=Running returns wrong results

When I run:
kubectl get pods --field-selector=status.phase=Running
I see:
NAME READY STATUS RESTARTS AGE
k8s-fbd7b 2/2 Running 0 5m5s
testm-45gfg 1/2 Error 0 22h
I don't understand why this command gives me pods that are in the Error status.
According to the K8s API, there is no such thing as STATUS=Error.
How can I get only the pods that are in this Error status?
When I run:
kubectl get pods --field-selector=status.phase=Failed
It tells me that there are no pods in that status.
Using the kubectl get pods --field-selector=status.phase=Failed command you can display all Pods in the Failed phase.
Failed means that all containers in the Pod have terminated, and at least one container has terminated in failure (see: Pod phase):
Failed - All containers in the Pod have terminated, and at least one container has terminated in failure. That is, the container either exited with non-zero status or was terminated by the system.
In your example, both Pods are in the Running phase because at least one container is still running in each of these Pods:
Running - The Pod has been bound to a node, and all of the containers have been created. At least one container is still running, or is in the process of starting or restarting.
You can check the current phase of Pods using the following command:
$ kubectl get pod -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
Let's check how this command works:
$ kubectl get pods
NAME    READY   STATUS
app-1   1/2     Error
app-2   0/1     Error
$ kubectl get pod -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
app-1 Running
app-2 Failed
As you can see, only the app-2 Pod is in the Failed phase. There is still one container running in the app-1 Pod, so this Pod is in the Running phase.
To list all pods with the Error status, you can simply use:
$ kubectl get pods -A | grep Error
default app-1 1/2 Error
default app-2 0/1 Error
Additionally, it's worth mentioning that you can check the state of all containers in Pods:
$ kubectl get pod -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state}{"\n"}{end}'
app-1 {"terminated":{"containerID":"containerd://f208e2a1ff08c5ce2acf3a33da05603c1947107e398d2f5fbf6f35d8b273ac71","exitCode":2,"finishedAt":"2021-08-11T14:07:21Z","reason":"Error","startedAt":"2021-08-11T14:07:21Z"}} {"running":{"startedAt":"2021-08-11T14:07:21Z"}}
app-2 {"terminated":{"containerID":"containerd://7a66cbbf73985efaaf348ec2f7a14d8e5bf22f891bd655c4b64692005eb0439b","exitCode":2,"finishedAt":"2021-08-11T14:08:50Z","reason":"Error","startedAt":"2021-08-11T14:08:50Z"}}
You can simply grep the Error pods using:
kubectl get pods --all-namespaces | grep Error
To remove all Error pods from the cluster:
kubectl delete pod `kubectl get pods --namespace <yournamespace> | awk '$3 == "Error" {print $1}'` --namespace <yournamespace>
Most Pod failures return explicit error states that can be observed in the status field.
Error:
Your pod crashed; it was scheduled on a node successfully but crashed after that. To debug it further you can use different methods or commands:
kubectl describe pod <pod-name> -n <namespace>
https://kubernetes.io/docs/tasks/debug-application-cluster/debug-pod-replication-controller/#my-pod-is-crashing-or-otherwise-unhealthy
Here is an overkill go-template based attempt:
kubectl get pods -o go-template='{{range $index, $element := .items}}{{range .status.containerStatuses}}{{range .state }}{{if .reason }}{{if (eq .reason "Error") }}{{$element.metadata.name}} {{$element.metadata.namespace}}{{"\n"}}{{end}}{{end}}{{end}}{{end}}{{end}}'
job1-stn45 default
My pod status:
k get pod
NAME                         READY   STATUS             RESTARTS   AGE
foo                          1/1     Running            1          2d11h
nginx-0                      1/1     Running            3          5d10h
nginx-2                      1/1     Running            3          5d10h
nginx-1                      1/1     Running            3          5d10h
job1-stn45                   0/1     Error              0          113m
update-test-27145740-82z7s   0/1     ImagePullBackOff   0          96m
update-test-27145500-7f2l9   0/1     ImagePullBackOff   0          5h36m

Kubernetes ALL workloads fail when deploying a single update

After I update the backend code (pushing the update to gcr.io), I delete the pod, and usually a new pod spins up.
But today the whole cluster just broke down. I really cannot comprehend what is happening here (I did not touch any of the other items).
I am really looking in the dark here. Where do I start looking?
I see that the logs show:
0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
when I look this up:
kubectl describe node | grep -i taint
Taints: node.kubernetes.io/unreachable:NoSchedule
Taints: node.kubernetes.io/unreachable:NoSchedule
But I have no clue what this is or how they even get there.
EDIT:
It looks like I need to remove the taints, but I am not able to (taint not found?)
kubectl taint nodes --all node-role.kubernetes.io/unreachable-
taint "node-role.kubernetes.io/unreachable" not found
taint "node-role.kubernetes.io/unreachable" not found
Likely a problem with the nodes. Debug with some of these (sample):
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready master 1d v1.14.2
k8s-node1 NotReady <none> 1d v1.14.2
k8s-node2 NotReady <none> 1d v1.14.2 <-- Does it say NotReady?
$ kubectl describe node k8s-node1
...
# Do you see something like this? What's the event message?
MemoryPressure...
DiskPressure...
PIDPressure...
Check if the kubelet is running on every node (it might be crashing and restarting)
ssh k8s-node1
# ps -Af | grep kubelet
# systemctl status kubelet
# journalctl -xeu kubelet
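As a side note on the failed untaint in the question: the key shown by kubectl describe node is node.kubernetes.io/unreachable, not node-role.kubernetes.io/unreachable, so a removal command matching that key and effect would look like the sketch below. The node controller re-adds this taint while a node stays NotReady, so fixing the kubelet is the real cure.
# Remove the unreachable taint from all nodes (key:effect followed by a trailing '-')
kubectl taint nodes --all node.kubernetes.io/unreachable:NoSchedule-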
Nuclear option:
If you are using a node pool, delete your nodes and let the autoscaler restart brand new nodes.
Related question/answer.
✌️

JSONPath to list all nodes in ready state except the ones which are tainted?

I want to list all nodes which are in ready state except the ones which have any kind of taint on them. How can I achieve this using jsonpath ?
I tried the statement below, taken from the k8s docs, but it doesn't print what I want. I am looking for output such as: node01 node02. There is no master node in the output, as it has a taint on it. The kind of taint is not really significant here.
JSONPATH='{range .items[*]}{@.metadata.name}:{range @.status.conditions[*]}{@.type}={@.status};{end}{end}' \
&& kubectl get nodes -o jsonpath="$JSONPATH" | grep "Ready=True"
I have successfully listed my nodes that are ready and not tainted using jq.
Here you have all the nodes:
$ kubectl get nodes
gke-standard-cluster-1-default-pool-9c101360-9lvw   Ready   <none>   31s   v1.13.11-gke.9
gke-standard-cluster-1-default-pool-9c101360-fdhr   Ready   <none>   30s   v1.13.11-gke.9
gke-standard-cluster-1-default-pool-9c101360-gq9c   Ready   <none>   31s   v1.13.11-gke.9
Here I have tainted one node:
$ kubectl taint node gke-standard-cluster-1-default-pool-9c101360-9lvw key=value:NoSchedule
node/gke-standard-cluster-1-default-pool-9c101360-9lvw tainted
And finally a command that list the not tainted and ready nodes:
$ kubectl get nodes -o json | jq -r '.items[] | select(.spec.taints|not) | select(.status.conditions[].reason=="KubeletReady" and .status.conditions[].status=="True") | .metadata.name'
gke-standard-cluster-1-default-pool-9c101360-fdhr
gke-standard-cluster-1-default-pool-9c101360-gq9c
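A note on that jq filter: the two .status.conditions[] streams inside select() can pair values from different conditions. A tighter variant of the same idea, using jq's any() over the Ready condition (an adjustment, not the original author's command):
$ kubectl get nodes -o json | jq -r '.items[] | select(.spec.taints|not) | select(any(.status.conditions[]; .type=="Ready" and .status=="True")) | .metadata.name'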
You can get it using -o jsonpath and awk
$ kubectl get nodes
NAME           STATUS                     ROLES    AGE   VERSION
controlplane   Ready                      master   27m   v1.19.0
node01         Ready,SchedulingDisabled   <none>   26m   v1.19.0
node02         Ready                      <none>   26m   v1.19.0
node03         Ready                      <none>   26m   v1.19.0
controlplane and node01 are Ready but have NoSchedule taints.
To list all the nodes with name, Ready status, and taints:
$ kubectl get nodes -o jsonpath='{range .items[*]} {.metadata.name} {" "} {.status.conditions[?(@.type=="Ready")].status} {" "} {.spec.taints} {"\n"} {end}'
controlplane True [{"effect":"NoSchedule","key":"node-role.kubernetes.io/master"}]
node01 True [{"effect":"NoSchedule","key":"node.kubernetes.io/unschedulable","timeAdded":"2021-04-03T12:22:56Z"}]
node02 True
node03 True
Use awk to print the Ready nodes that do not have NoSchedule taints:
$ kubectl get nodes -o jsonpath='{range .items[*]} {.metadata.name} {" "} {.status.conditions[?(@.type=="Ready")].status} {" "} {.spec.taints} {"\n"} {end}' | awk '$2=="True"' | awk '$3 !~/"NoSchedule"/ { print $1}'
node02
node03