Airflow is receiving incorrect POD status from Kubernetes - kubernetes

We are using Airflow to schedule Spark jobs on Kubernetes. Recently, I encountered the following scenario:
Airflow received a 404 error with the message "pods pod-name not found".
I manually checked that the Pod was actually working fine at that time. In fact, I was able to collect its logs using kubectl logs -f -n namespace podname.
As a result, Airflow created another Pod to run the same job, which resulted in a race condition.
Airflow is using the Kubernetes Python client's read_namespaced_pod() API:
def read_pod(self, pod):
    """Read POD information"""
    try:
        return self._client.read_namespaced_pod(pod.metadata.name, pod.metadata.namespace)
    except BaseHTTPError as e:
        raise AirflowException(
            'There was an error reading the kubernetes API: {}'.format(e)
        )
I believe read_namespaced_pod() calls the Kubernetes API. In order to investigate this further, I would like to check the logs of the Kubernetes API server.
Can you please share steps to check what is happening on the Kubernetes side?
Note: Kubernetes version is 1.18 and Airflow version is 1.10.10.

Answering the question from the perspective of logs/troubleshooting:
I believe read_namespaced_pod() calls the Kubernetes API. In order to investigate this further, I would like to check the logs of the Kubernetes API server.
Yes, you are correct, this function calls the Kubernetes API. You can check the logs of the Kubernetes API server by running:
$ kubectl logs -n kube-system KUBERNETES_API_SERVER_POD_NAME
I would also consider checking the kube-controller-manager:
$ kubectl logs -n kube-system KUBERNETES_CONTROLLER_MANAGER_POD_NAME
Example output of the controller manager:
I0413 12:33:12.840270 1 event.go:291] "Event occurred" object="default/nginx-6799fc88d8" kind="ReplicaSet" apiVersion="apps/v1" type="Normal" reason="SuccessfulCreate" message="Created pod: nginx-6799fc88d8-kchp7"
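When scanning these logs for a specific Pod, it can help to split each line into its parts. The following is a minimal sketch, assuming the lines follow the usual klog header layout seen in the example above (severity letter, month and day, time, thread id, source file and line, then the message). The function name parse_klog_line is hypothetical, and the simple key="value" extraction does not handle escaped quotes inside values.

```python
import re
from typing import Optional

# klog header: severity letter, mmdd, time with fraction, thread id, file:line, "] message"
KLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>\d+) "
    r"(?P<source>[^\s:]+:\d+)\] "
    r"(?P<message>.*)$"
)
# Structured fields in the message look like key="value"
KEY_VALUE = re.compile(r'(\w+)="([^"]*)"')


def parse_klog_line(line: str) -> Optional[dict]:
    """Split one klog-style line into header parts and key="value" fields.

    Returns None when the line does not match the assumed layout.
    """
    match = KLOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["fields"] = dict(KEY_VALUE.findall(record["message"]))
    return record
```

Feeding it the output of kubectl logs line by line and filtering on record["fields"].get("object") or on the message text is one way to find events that mention the Pod Airflow reported as missing.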
A side note!
The commands above only work if the kube-apiserver and kube-controller-manager Pods are visible to you; managed Kubernetes services typically do not expose them.
Can you please share steps to check what is happening on the Kubernetes side?
This question is about the basics of troubleshooting and checking logs.
For that, you can use the following commands (in addition to the ones mentioned earlier):
$ kubectl get RESOURCE RESOURCE_NAME
example: $ kubectl get pod airflow-pod-name
(add -o yaml for more information)
$ kubectl describe RESOURCE RESOURCE_NAME
example: $ kubectl describe pod airflow-pod-name
$ kubectl logs POD_NAME
example: $ kubectl logs airflow-pod-name
Additional resources:
Kubernetes.io: Docs: Concepts: Cluster administration: Logging Architecture
Kubernetes.io: Docs: Tasks: Debug application cluster: Debug cluster

Related

What are "snapshot logs" and how they differ from "standard(?) logs" in Kubernetes?

I was looking for a way to stream the logs of all pods of a specific deployment of mine.
So, some days ago I've found this SO answer giving me a magical command:
kubectl logs -f deployment/<my-deployment> --all-containers=true
However, I've just discovered, after a lot of time debugging, that this command actually shows the logs of just one pod, and not all of the deployment.
So I went to Kubectl's official documentation and found nothing relevant on the topic, just the following phrase above the example that uses the deployment, as a kind of selector, for log streaming:
...
# Show logs from a kubelet with an expired serving certificate
kubectl logs --insecure-skip-tls-verify-backend nginx
# Return snapshot logs from first container of a job named hello
kubectl logs job/hello
# Return snapshot logs from container nginx-1 of a deployment named nginx
kubectl logs deployment/nginx -c nginx-1
So why is it that the first example says "Show logs" while the other two say "Return snapshot logs"?
Is it because of this "snapshot" that I can't retrieve logs from all the pods of the deployment?
I've searched a lot for more in-depth documentation on streaming logs with kubectl but couldn't find any.
To return the logs of all pods of a deployment, you can use the same label selector as the deployment. Retrieve the deployment's selector with kubectl get deployment <name> -o jsonpath='{.spec.selector}' --namespace <name>, then retrieve the logs using that selector: kubectl logs --selector <key1=value1,key2=value2> --namespace <name>.
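The selector printed by the first command is a structured object, while --selector expects a comma-separated key=value string. A minimal sketch of that conversion follows, assuming the deployment is fetched with -o json and that its selector uses only matchLabels (matchExpressions are ignored here). The function name selector_from_deployment is hypothetical.

```python
import json


def selector_from_deployment(deployment_json: str) -> str:
    """Build a --selector value from `kubectl get deployment <name> -o json` output.

    Assumption: only .spec.selector.matchLabels is considered.
    """
    deployment = json.loads(deployment_json)
    match_labels = deployment["spec"]["selector"].get("matchLabels", {})
    # Sort the labels so the resulting string is stable between runs
    return ",".join(f"{key}={value}" for key, value in sorted(match_labels.items()))
```

For a deployment whose matchLabels are app: nginx and tier: web, this yields app=nginx,tier=web, which can then be passed to kubectl logs --selector.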

Kubectl using command to get cluster status

I need to create a shell script which examines the cluster status.
I saw that kubectl describe nodes provides lots of data.
I can output it to JSON and then parse it, but maybe that's just overkill.
Is there a simple way to get the status of the cluster with a kubectl command? Just whether it's up or down.
The least expensive way to check if you can reach the API server is kubectl version. In addition kubectl cluster-info gives you some more info.
In addition to Michael's answer: those commands only tell you about the API server (master) and internal services such as KubeDNS, not about the nodes.
It depends on your need and definition of "status" here. You could run kubectl cluster-info followed by kubectl get nodes and check the STATUS column for all nodes using parsing tools like awk, jq or kubectl's own -o jsonpath option to verify that all nodes are ready.
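As an illustration of the node check, here is a minimal sketch that parses the output of kubectl get nodes -o json and lists the nodes that are not ready. It assumes the standard Node layout, where each item carries a Ready entry in .status.conditions. The function name not_ready_nodes is hypothetical.

```python
import json


def not_ready_nodes(nodes_json: str) -> list:
    """Return the names of nodes whose Ready condition is not "True".

    Expects the JSON printed by `kubectl get nodes -o json`.
    """
    failing = []
    for node in json.loads(nodes_json)["items"]:
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            failing.append(node["metadata"]["name"])
    return failing
```

An empty result means every node reported Ready; a script could exit non-zero otherwise.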
The command below displays the health of the scheduler, controller manager and etcd:
kubectl get cs
The command below lists Kubernetes core components such as etcd, the controller manager, the scheduler, kube-proxy, CoreDNS and the network plugin. All of those pods should be running for Kubernetes to be considered healthy.
kubectl get pod -n kube-system
Finally, deploy one front-end and one back-end Pod and verify the inter-pod communication to ensure that the cluster is up and working correctly.
Below are the commands to get the cluster status, depending on your requirements:
To find out where the Kubernetes master, CoreDNS and kubernetes-dashboard are running, use
kubectl cluster-info
To get detailed information to further debug and diagnose cluster problems, use kubectl cluster-info dump
To get only the health status of the control-plane components (scheduler, controller manager, etcd), use kubectl get componentstatus or kubectl get cs
To show detailed information about a node, use kubectl describe node <node>

Automatic restart of a Kubernetes pod

I have a Kubernetes cluster on Google Cloud Platform. The Kubernetes cluster contains a deployment which has one pod. The pod has two containers. I have observed that the pod has been replaced by a new pod and the entire data is wiped out. I am not able to identify the reason behind it.
I have tried the below two commands:
kubectl logs [podname] -c [containername] --previous
Result: previous terminated container [containername] in pod [podname] not found
kubectl get pods
Result: I see that the number of restarts for my pod equals 0.
Is there anything I could do to get the logs from my old pod?
Try the command below to see the pod info:
kubectl describe po
The chances of retrieving this information are slim, but try the following:
1) If you know the ID of the failed container, try to find its old logs here:
/var/lib/docker/containers/<container id>/<container id>-json.log
2) Look at the kubelet's logs:
journalctl -u kubelet
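If the -json.log file from step 1 still exists, it is not plain text: with Docker's json-file logging driver, each line is a JSON object with log, stream and time keys. A minimal sketch for reading it follows, assuming the node uses that driver. The function name read_json_log is hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple


def read_json_log(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (time, stream, text) for each entry of a json-file container log."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        entry = json.loads(raw)
        # Each "log" value normally ends with the newline the container wrote
        yield entry["time"], entry["stream"], entry["log"].rstrip("\n")
```

Passing an open file handle for the log path as lines prints the old container output in order; reading that directory usually requires root on the node.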

IBM Cloud Private console not coming up after installation

I have installed IBM Cloud Private with 3 nodes. The master, proxy, worker and management roles are configured on all the nodes. I also added the vSphere cloud provider configuration to config.yaml before the installation.
The installation was successful and I got the URL for the console, http://proxy_vip:8443, but I cannot access the console. Port 8443 is not listening.
When I checked the pod status I got the output below.
I found this issue while running kubectl -s 127.0.0.1:8888 -n kube-system get pods. The other pods are running.
Try deleting the POD using kubectl delete pod icp-router -n kube-system. It should reinitialize the POD.
The admin console will be available at https://master_ip:8443/console. If the port isn't listening, then you can confirm the health of the icp-router pod(s):
kubectl -n kube-system get pods -o wide | grep icp-router
The output will show you the pod which is used to serve access to the web console. If it's not running or in a bad state, then your web console may not be accessible. If you can post logs from the container, then it may provide more insight into what's going on within your cluster:
kubectl -n kube-system logs icp-router-[XXXXX]
After an ICP 2.1.0 installation, if the pods are in CrashLoopBackOff and the kubectl logs or docker logs command shows an 'Illegal instruction (core dumped)' error, check your CPU information with cat /proc/cpuinfo and ensure your CPU has the 'sse4_2' flag.
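The flag check can be scripted across nodes. The following is a minimal sketch, assuming the x86 layout of /proc/cpuinfo in which each processor entry has a "flags" line listing space-separated flags. The function name cpu_has_flag is hypothetical.

```python
def cpu_has_flag(cpuinfo_text: str, flag: str = "sse4_2") -> bool:
    """Return True if every "flags" line in the /proc/cpuinfo text lists the flag.

    Returns False when no "flags" line is present at all.
    """
    flag_lines = [
        line.split(":", 1)[1].split()
        for line in cpuinfo_text.splitlines()
        if line.startswith("flags") and ":" in line
    ]
    # Compare whole flag names so that e.g. sse4_1 does not count as sse4_2
    return bool(flag_lines) and all(flag in flags for flags in flag_lines)
```

On a node, the text can be obtained by reading /proc/cpuinfo and passing its contents to the function.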

Error while creating pods in Kubernetes

I have installed Kubernetes on an Ubuntu server using the instructions here. I am trying to create pods using kubectl run hello-minikube --image=gcr.io/google_containers/echoserver:1.4 --hostport=8000 --port=8080 as listed in the example. However, when I run kubectl get pod, the status of the pod is Pending. I then ran kubectl describe pod for debugging and saw the message:
FailedScheduling pod (hello-minikube-3383150820-1r4f7) failed to fit in any node fit failure on node (minikubevm): PodFitsHostPorts.
I then tried to delete this pod with kubectl delete pod hello-minikube-3383150820-1r4f7, but when I run kubectl get pod again I see another pod with the prefix "hello-minikube-3383150820-" that I haven't created. Does anyone know how to fix this problem? Thank you in advance.
The PodFitsHostPorts predicate is failing because you have something else on your nodes using port 8000. You might be able to find what it is by running kubectl describe svc.
kubectl run creates a deployment object (you can see it with kubectl describe deployments) which makes sure that you always keep the intended number of replicas of the pod running (in this case 1). When you delete the pod, the deployment controller automatically creates another for you. If you want to delete the deployment and the pods it keeps creating, you can run kubectl delete deployments hello-minikube.
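The pod name itself hints at which deployment owns it. Pods created through a deployment are typically named <deployment>-<replica set hash>-<random suffix>, as in hello-minikube-3383150820-1r4f7 above. The sketch below relies on that naming pattern only as a heuristic; the pod's ownerReferences in kubectl get pod -o yaml are the authoritative source. The function name guess_deployment_name is hypothetical.

```python
def guess_deployment_name(pod_name: str) -> str:
    """Guess the owning deployment by dropping the last two dash-separated parts.

    Heuristic only: assumes the <deployment>-<hash>-<suffix> naming pattern.
    """
    parts = pod_name.rsplit("-", 2)
    if len(parts) != 3 or not parts[0]:
        raise ValueError(f"{pod_name!r} does not look like a deployment-managed pod")
    return parts[0]
```

For the pod in the question this returns hello-minikube, the name to pass to kubectl delete deployments.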