Postgres pod suddenly disappeared after update in gcloud - kubernetes

We changed the Kubernetes node version because of this message, and because, for some reason, pods could not be scheduled.
Before this message, however, there was a postgres pod running.
As you can see, the pod is gone. Why did this happen?
I cannot seem to get it back. When I try kubectl get events, it reports that no resources were found. Is there any way to revive the postgres container, or to find out why it is down? What could I do? kubectl logs postgres doesn't seem to work either.
What I want to find out is where this postgres pod was running (like the location path), whether the configuration of this pod is still available, or whether it is lost forever. If the pod is dead, can I still access its "graveyard" (that is, the database data), or was this cleaned up?
Update
Okay, so it turns out this pod wasn't managed by a controller, which is why there was no trace of it after it died. But why is there no log information saying that this pod was killed?

Judging by your pod's name, it wasn't provisioned using a Deployment or a ReplicaSet (if it had been, like your other pods, its name would have a random ID suffix).
More than likely, it's a standalone pod, which means that once the node is gone, the pod is gone.
You could try kubectl get pods --show-all, but it is unlikely to show the pod.
If your database had a persistent volume, you may still be able to retrieve the data by reattaching it to a new postgres pod.
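A minimal sketch of that reattachment, assuming the old pod's data lives in a PersistentVolumeClaim called postgres-pvc (a hypothetical name; kubectl get pvc lists the real ones) and that the volume was mounted at /var/lib/postgresql/data, the long-standing data directory of the official postgres image. The script builds a replacement Pod manifest that mounts the existing claim and prints it as JSON, which kubectl apply -f - accepts on stdin. The image tag postgres:12 is also hypothetical: use the same major version the old pod ran, because a Postgres data directory is tied to its major version.

```python
import json


def postgres_pod_with_existing_claim(claim_name: str, image: str) -> dict:
    """Build a standalone Pod manifest that reattaches an existing PVC.

    claim_name and image are assumptions: they must match the claim that
    held the old pod's data and the Postgres major version it ran.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "postgres-recovery", "labels": {"app": "postgres"}},
        "spec": {
            "containers": [
                {
                    "name": "postgres",
                    "image": image,
                    "ports": [{"containerPort": 5432}],
                    # Assumed mount path; change it if the old pod used another one.
                    "volumeMounts": [
                        {"name": "data", "mountPath": "/var/lib/postgresql/data"}
                    ],
                }
            ],
            # Reuse the existing claim instead of provisioning a new volume.
            "volumes": [
                {"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}
            ],
        },
    }


if __name__ == "__main__":
    # Pipe this output into `kubectl apply -f -`.
    print(json.dumps(postgres_pod_with_existing_claim("postgres-pvc", "postgres:12"), indent=2))
```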
In the future, you might consider setting a termination message path (terminationMessagePath) for your containers, and also ensuring that all pods are managed by a ReplicaSet or Deployment with persistent volumes attached.
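One way to follow both suggestions, sketched in Python. It assumes a hypothetical claim name and image tag and the /var/lib/postgresql/data mount path. The container runs inside a Deployment, so it is recreated elsewhere if its node disappears. It also sets terminationMessagePath (shown with its default value) and terminationMessagePolicy FallbackToLogsOnError. With that policy, if the termination message file is empty and the container exited with an error, the kubelet uses the tail of the container's log as the message. The message should then be visible in the container's lastState.terminated.message after a restart of that pod's container.

```python
import json


def postgres_deployment(claim_name: str, image: str) -> dict:
    """Wrap a postgres container in a single-replica Deployment (a sketch)."""
    container = {
        "name": "postgres",
        "image": image,
        # File the container can write a short exit reason to (this is the default path).
        "terminationMessagePath": "/dev/termination-log",
        # If that file is empty and the container failed, use the end of its log instead.
        "terminationMessagePolicy": "FallbackToLogsOnError",
        "volumeMounts": [{"name": "data", "mountPath": "/var/lib/postgresql/data"}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "postgres"},
        "spec": {
            "replicas": 1,
            # Stop the old pod before starting the new one so they never share the volume.
            "strategy": {"type": "Recreate"},
            "selector": {"matchLabels": {"app": "postgres"}},
            "template": {
                "metadata": {"labels": {"app": "postgres"}},
                "spec": {
                    "containers": [container],
                    "volumes": [
                        {"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(postgres_deployment("postgres-pvc", "postgres:12"), indent=2))
```

The Recreate strategy is a design choice for this sketch: a ReadWriteOnce volume can only be mounted read-write by a single node, so the old pod should be gone before the new one starts.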

Related

Auto delete CrashLoopBackOff pods in a deployment

In my Kubernetes cluster, there are multiple deployments in one namespace.
For one specific deployment, pods must not be allowed to remain in the CrashLoopBackOff state.
So basically, when any pod gets into this state, I want it to be deleted, and a new pod to be created afterwards, which the ReplicaSet already handles.
I tried writing a custom controller, expecting the SharedInformer to notify me about the Pod's state so that I could delete the pod from that loop.
However, this introduces a dependency on the pod in which the custom controller runs.
I also searched for an option that could be configured in the manifest itself, but could not find one.
I am pretty new to Kubernetes, so I need help implementing this behaviour.
Firstly, you should address the reason why the pod has entered the CrashLoopBackOff state rather than just delete it. If you do this, you'll potentially just recreate the problem again and you'll be deleting pods repeatedly. For example, if your pod is trying to access an external DB and that DB is down, it'll CrashLoop, and deleting and restarting the pod won't help fix that.
Secondly, if you want to do this deleting in an automated manner, an easy way would be to run a CronJob resource that goes through your deployment and deletes the CrashLooped pods. You could set the cronjob to run once an hour or whatever schedule you wish.
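The pod selection such a CronJob would perform can be sketched in Python. The sketch assumes the job's container can run kubectl get pods -o json, which requires a service account allowed to list and delete pods. The function returns the names of pods where any container, including init containers, is waiting with the reason CrashLoopBackOff. Choosing the namespace, the label selector for your deployment, and how the JSON is fed in is left to you.

```python
from typing import List


def crashlooping_pods(pod_list: dict) -> List[str]:
    """Return names of pods that have a container waiting in CrashLoopBackOff.

    pod_list is the parsed JSON of `kubectl get pods -o json`.
    """
    names = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # Check init containers too; they can also crash-loop.
        container_statuses = (status.get("initContainerStatuses") or []) + (
            status.get("containerStatuses") or []
        )
        for cs in container_statuses:
            waiting = (cs.get("state") or {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                names.append(pod["metadata"]["name"])
                break  # one match is enough for this pod
    return names
```

The job would then run kubectl delete pod for each returned name, and the deployment's ReplicaSet creates the replacements.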
Deleting the pod and waiting for a new one is effectively the same as restarting the deployment or pod.
Kubernetes already restarts the failing container of a CrashLoopBackOff pod automatically; you can see this in the RESTARTS count:
NAME       READY   STATUS             RESTARTS   AGE
te-pod-1   0/1     CrashLoopBackOff   2          1m44s
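The RESTARTS column above is derived from status.containerStatuses[].restartCount, and each container's lastState.terminated records why its previous instance stopped (for example Error or OOMKilled). A small Python sketch that reads both from the JSON of kubectl get pod <name> -o json:

```python
from typing import List, Optional, Tuple


def restart_summary(pod: dict) -> List[Tuple[str, int, Optional[str], Optional[int]]]:
    """List (container, restartCount, last termination reason, exit code).

    pod is the parsed JSON of `kubectl get pod <name> -o json`.
    """
    rows = []
    for cs in pod.get("status", {}).get("containerStatuses") or []:
        # lastState.terminated describes the previous instance of the container.
        terminated = (cs.get("lastState") or {}).get("terminated") or {}
        rows.append(
            (cs["name"], cs.get("restartCount", 0), terminated.get("reason"), terminated.get("exitCode"))
        )
    return rows
```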
These restarts already give you what you described: "when any pod gets to this state, I would want it to be deleted and later a new pod to be created which is already handled by the ReplicaSet."
If you want to get rid of the crashing pod entirely, rather than have a new pod come up, you have to roll back the deployment.
If the problem lies in your ReplicaSet's pod spec, deleting and restarting the pod is useless: no matter how many times you do it, the pod will keep crashing until you check the logs and debug the real issue in the ReplicaSet (Deployment).

Duplicate pods / Pods being created without an existing deployment

I'm running into an issue managing my Kubernetes pods.
I had a deployment which I removed, and then I created a new one. The pod tied to the old deployment shut down, and a new one came up when I created the new deployment, as expected.
However, once I changed the deployment, a second pod began running. I tried kubectl delete pod pod-id, but it would just recreate itself.
I went through the same process again, and now I'm stuck with 3 pods and no deployment. I removed the deployment completely, but when I try to delete the pods they keep recreating themselves. This is an issue because I am exhausting the resources available on my Kubernetes cluster.
Does anyone know how to force-remove these pods? I do not understand how they can recreate themselves if there's no deployment to go by.
The root cause could be an existing Deployment, ReplicaSet, DaemonSet, StatefulSet, or a static pod. Check whether any of these exist in the affected namespace using kubectl get <RESOURCE-TYPE>.
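Another way to find the culprit is to ask the pod itself. Every pod created by a controller carries metadata.ownerReferences, and the mirror pod of a static pod carries the kubernetes.io/config.mirror annotation. The Python sketch below walks the controller references upward. It assumes you have collected the namespace's ReplicaSets, Deployments, StatefulSets and DaemonSets as JSON (for example with kubectl get replicasets,deployments,statefulsets,daemonsets -o json). The StaticPod marker it returns is a made-up label, not a Kubernetes kind.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (kind, name)


def index_objects(items: List[dict]) -> Dict[Key, dict]:
    """Index the items of a `kubectl get ... -o json` List by (kind, name)."""
    return {(item["kind"], item["metadata"]["name"]): item for item in items}


def controller_chain(pod: dict, objects: Dict[Key, dict]) -> List[Key]:
    """Return the controllers that own a pod, nearest first.

    An empty list means the pod has no controller owner reference.
    """
    metadata = pod.get("metadata", {})
    annotations = metadata.get("annotations") or {}
    if "kubernetes.io/config.mirror" in annotations:
        # Mirror of a static pod: the kubelet recreates it from a manifest
        # file on the node. "StaticPod" is a made-up marker, not a kind.
        return [("StaticPod", metadata.get("name", ""))]

    chain: List[Key] = []
    current: Optional[dict] = pod
    while current is not None:
        owners = current.get("metadata", {}).get("ownerReferences") or []
        controller = next((o for o in owners if o.get("controller")), None)
        if controller is None:
            break
        key = (controller["kind"], controller["name"])
        if key in chain:  # defensive: stop on a malformed reference cycle
            break
        chain.append(key)
        # None when the owner was not collected (e.g. another type or a Node).
        current = objects.get(key)
    return chain
```

The last entry of the chain is the object to delete or scale down; deleting only the pods leaves that owner in place, and it will keep recreating them.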
I've had this happen after running kubectl rollout restart deployment while a pod was already in an error or creating state; explicitly deleting the second pod only resulted in a new one getting scheduled (trick birthday candle situation).
I find that almost any time I have an issue like this, it can be fixed by simply setting the deployment's replicas to zero, applying, then restoring the replicas to their original value.

Kubernetes: view logs of crashed Airflow worker pod

Pods on our k8s cluster are scheduled with Airflow's KubernetesExecutor, which runs each Task in a new pod.
I have one such Task whose pod crashes instantly (after 1 or 2 seconds), and of course I want to see its logs.
This seems hard. As soon as the pod crashes, it gets deleted, along with the ability to retrieve the crash logs. I have already tried all of the following:
kubectl logs -f <pod> -p: cannot be used since these pods are uniquely named (courtesy of KubernetesExecutor).
kubectl logs -l label_name=label_value: I struggle to apply the labels to the pod (if this is a known way of working, I'm happy to try further).
A shared NFS volume is mounted on all pods at a fixed log directory. The failing pod, however, does not log to this folder.
If I am really quick, I can run kubectl logs -f -l dag_id=sample_dag --all-containers (the dag_id label is added by Airflow) between the pod starting and crashing, and I see Error from server (BadRequest): container "base" in pod "my_pod" is waiting to start: ContainerCreating. This might give me a clue, but:
these are only the last log lines
this is really backwards
I'm basically looking for the canonical way of retrieving logs from transient pods.
You need to enable remote logging. The example below uses S3. In airflow.cfg, set the following:
remote_logging = True
remote_log_conn_id = my_s3_conn
remote_base_log_folder = s3://airflow/logs
The my_s3_conn connection can be created in the Airflow UI under Admin > Connections. In the Conn Type dropdown, select S3.
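With the KubernetesExecutor, each worker pod needs these settings too. One option, assuming your worker pods take their configuration from environment variables, is Airflow's AIRFLOW__{SECTION}__{KEY} naming convention. The sketch assumes Airflow 2.x, where the remote logging options sit in the [logging] section (Airflow 1.10 kept them in [core]). It reuses the connection ID and bucket path from the example above.

```python
from typing import Dict


def airflow_env(section: str, options: Dict[str, str]) -> Dict[str, str]:
    """Map airflow.cfg options to AIRFLOW__{SECTION}__{KEY} environment variables."""
    return {f"AIRFLOW__{section.upper()}__{key.upper()}": value for key, value in options.items()}


remote_logging = airflow_env(
    "logging",  # assumption: Airflow 2.x; use "core" on Airflow 1.10
    {
        "remote_logging": "True",
        "remote_log_conn_id": "my_s3_conn",
        "remote_base_log_folder": "s3://airflow/logs",
    },
)

if __name__ == "__main__":
    for name, value in remote_logging.items():
        print(f"{name}={value}")
```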

On what basis does the restart count in Kubernetes increase?

I have a Kubernetes cluster that is running fine. It has 4 workers and 1 master, with the dashboard for viewing status. After it had been running for some time, I looked at a pod's restart count and it was 8. I immediately ran the describe command to get any events, but there were no events for that pod. However, when I checked the container logs, I found that the node itself had been powered down and up 4 times, but I don't know why there were no events.
On another node, while looking at the restart count, I got a "Sandbox changed" event, which probably means the node was powered down for some time, so the master lost its connection to it and incremented the restart count by 2.
I want to know how we can get logs or debugging information about these restarts, to find out why they happened.
Whenever a pod is recreated, does it get a new name? If so, how can we get the events of the previous pod?
Does the "Sandbox changed" event actually mean that the master lost its connection?
Step by step:
I'd check the kubelet and Docker daemon logs; these restarts should appear there, hopefully with more information about what causes them.
Yes, a pod's name is unique, so it changes every time a pod is destroyed and recreated. You can try to find the pod with kubectl get po -a. Another option is to list all events with kubectl get events and then filter them to find your pod's events.
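The filtering can be done directly with kubectl get events --field-selector involvedObject.name=<pod>, or, as sketched below, in Python over the output of kubectl get events -o json. Keep in mind that the API server only keeps events for a limited time (one hour by default, set by kube-apiserver --event-ttl), so events about an old pod may already be gone.

```python
from typing import List


def events_for(event_list: dict, pod_name: str) -> List[str]:
    """Return 'reason: message' lines for events about one pod, oldest first.

    event_list is the parsed JSON of `kubectl get events -o json`.
    """
    matching = [
        e
        for e in event_list.get("items", [])
        if e.get("involvedObject", {}).get("kind") == "Pod"
        and e.get("involvedObject", {}).get("name") == pod_name
    ]
    # RFC 3339 UTC timestamps; comparing them as strings gives chronological
    # order at one-second resolution. Newer events may only set eventTime.
    matching.sort(key=lambda e: e.get("lastTimestamp") or e.get("eventTime") or "")
    return [f'{e.get("reason")}: {e.get("message")}' for e in matching]
```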
I've seen this error before, and in my case it meant a problem with Docker daemon networking. But I searched a bit on Google and saw many other possible causes. Again, try to analyse the Docker daemon and kubelet logs, and also dmesg. If you have doubts, please add a link to the logs in your question and I'll try to help.

Kubernetes pods are restarting with a new ID

The pods I am working with are managed by Kubernetes. When I use the docker restart command to restart a pod, sometimes the pod gets a new ID and sometimes it keeps the old one. When the pod gets a new ID, its state first goes from Running -> Error -> CrashLoopBackOff. Can anyone tell me why this is happening? Also, how frequently does Kubernetes do health checks?
Kubernetes currently does not use the docker restart command for many reasons (e.g., preserving the logs of older containers). Kubelet, the daemon on the node, creates a new container if the existing container terminated. In any case, users should not perform container lifecycle operations (e.g., stop, restart) on kubernetes-managed containers directly using docker, as it could cause unexpected behaviors.
EDIT: If you want Kubernetes to restart your container automatically, set restartPolicy in your pod spec to "Always" or "OnFailure". For more details, see http://kubernetes.io/docs/user-guide/pod-states/
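A Python sketch of where that field goes. restartPolicy sits at the pod spec level, not per container. It accepts Always, OnFailure or Never, and defaults to Always. With Always the kubelet starts a new container whenever one exits; with OnFailure, only after a non-zero exit. Pods managed by a Deployment must use Always. The container passed in below is a hypothetical example.

```python
VALID_RESTART_POLICIES = {"Always", "OnFailure", "Never"}


def pod_spec(container: dict, restart_policy: str = "Always") -> dict:
    """Build a minimal pod spec with an explicit, validated restartPolicy."""
    if restart_policy not in VALID_RESTART_POLICIES:
        # The API server would reject other values (the check is case-sensitive).
        raise ValueError(f"restartPolicy must be one of {sorted(VALID_RESTART_POLICIES)}")
    # The policy applies to every container in the pod.
    return {"restartPolicy": restart_policy, "containers": [container]}


if __name__ == "__main__":
    print(pod_spec({"name": "app", "image": "busybox"}, "OnFailure"))
```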