Kubernetes Deployment update crashes ReplicaSet and creates too many Pods - kubernetes

Using Kubernetes, I deploy an app to Google Cloud Container Engine on a cluster with 3 small instances.
On a first-time deploy, all goes well using:
kubectl create -f deployment.yaml
And:
kubectl create -f service.yaml
Then I change the image in my deployment.yaml and update it like so:
kubectl apply -f deployment.yaml
After the update, a couple of things happen:
Kubernetes updates its Pods correctly, ending up with 3 updated instances.
Shortly after this, another ReplicaSet is created (?)
Also, twice the number of Pods (2 × 3 = 6) is suddenly present, where half of them have a status of Running, and the other half Unknown.
So I inspected my Pods and came across this error:
FailedSync Error syncing pod, skipping: network is not ready: [Kubenet does not have netConfig. This is most likely due to lack of PodCIDR]
Also I can't use the dashboard anymore using kubectl proxy. The page shows:
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "no endpoints available for service \"kubernetes-dashboard\"",
"reason": "ServiceUnavailable",
"code": 503
}
So I decided to delete all Pods forcefully:
kubectl delete pod <pod-name> --grace-period=0 --force
Then, three Pods are triggered for creation, since three replicas are defined in my deployment.yaml. But upon inspecting my Pods using kubectl describe pods/<pod-name>, I see:
no nodes available to schedule pods
I have no idea where this all went wrong. In essence, all I did was update the image of a deployment.
Any ideas?

I've run into similar issues on Kubernetes. According to your reply to my comment on your question (see above):
I noticed that this happens only when I deploy to a micro instance on Google Cloud, which simply has insufficient resources to handle the deployment. Scaling up the initial resources (CPU, Memory) resolved my issue
It seems to me like what's happening here is that the Linux kernel's OOM killer ends up killing the kubelet, which in turn makes the Node useless to the cluster (and its status becomes "Unknown").
A real solution to this problem (to prevent an entire node from dropping out of service) is to add resource limits. Make sure you're not just adding requests; add limits too, because you want your services -- rather than the K8s system services -- to be killed so that they can be rescheduled appropriately (if possible).
Also, inside the cluster settings (specifically in the Node Pool -- select from https://console.cloud.google.com/kubernetes/list), there is a box you can check for "Automatic Node Repair" that would at least partially remediate this problem rather than giving you an undefined amount of downtime.
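As a minimal sketch of the limits suggestion, assuming a hypothetical Deployment named my-app (adjust the name and values to fit your workload):

kubectl set resources deployment my-app --requests=cpu=100m,memory=128Mi --limits=cpu=250m,memory=256Mi

With a memory limit in place, a container that exceeds it is OOM-killed and restarted on its own, instead of the kernel picking a victim (possibly the kubelet) on an overcommitted node.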

If your intention is just to update the image, try kubectl set image instead. That at least works for me.
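For example, assuming a hypothetical Deployment named my-app with a container named my-container:

kubectl set image deployment/my-app my-container=gcr.io/my-project/my-image:v2

This triggers the same rolling update, but sidesteps the client-side merge logic that kubectl apply performs.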
Googling kubectl apply turns up a number of known issues. See this issue for example, or this one.
You did not post which version of Kubernetes you deployed, but if you can, try upgrading your cluster to the latest version to see if the issue persists.

Related

deployed a service on k8s but not showing any pods even when it failed

I have deployed a k8s service, however it's not showing any pods. This is what I see
kubectl get deployments
It should create in the default namespace
kubectl get nodes (this shows me nothing)
How do I troubleshoot a failed deployment? The test-control-plane node is the one deployed by kind; that is the k8s cluster I'm using.
kubectl get nodes
If the above command shows nothing, it means there are no Nodes in your cluster, so where will your workload run?
You need at least one worker node in the K8s cluster so the Deployment can schedule Pods on it and run the application.
You can check the worker nodes using the same command
kubectl get nodes
You can debug further and check the reason for the issue using
kubectl describe deployment <name of your deployment>
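If kind only gave you a control-plane node, one way to get a worker node is to recreate the cluster from a config file. A sketch, based on kind's documented multi-node configuration:

cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
EOF

After this, kubectl get nodes should list both a control-plane and a worker node.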
To find out what really went wrong, first follow the steps described by Harsh Manvar in his answer. Perhaps that information can help you find the problem. If not, check the logs of your deployment: list your pods, see which ones did not start properly, then check their logs.
You can also use kubectl describe on pods to see in more detail what went wrong. Since you are using kind, I am including a list of known errors for you.
You can also see this visual guide on troubleshooting Kubernetes deployments and 5 Tips for Troubleshooting Kubernetes Deployments.
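The debugging flow described above might look like this (pod names are placeholders):

kubectl get pods                      # find pods that are not Running/Ready
kubectl describe pod <pod-name>       # check the Events section for scheduling or image errors
kubectl logs <pod-name>               # application output, if the container started
kubectl logs <pod-name> --previous    # output of a crashed previous container, if any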

Heapster status stuck in Container Creating or Pending status

I am new to Kubernetes and have been working with it for the past month.
When creating the cluster setup, sometimes I see that Heapster gets stuck in ContainerCreating or Pending status. When this happens, the only way I have found here is to reinstall everything from scratch, which has solved our problem. Later, when I run Heapster, it runs without any problem. But I think this is not the optimal solution every time. So please help with solving this issue when it occurs again.
The Heapster image is pulled from GitHub for our use. Right now the cluster is running fine, so I could not send a screenshot of Heapster failing with its status stuck in ContainerCreating or Pending.
Please suggest an alternative solution in case the problem occurs again.
Thanks in advance for your time.
A pod stuck in Pending state can mean more than one thing. Next time it happens, you should run kubectl get pods and then kubectl describe pod <pod-name>. However, since it works sometimes, the most likely cause is that the cluster doesn't have enough resources on any of its nodes to schedule the pod. If the cluster is low on remaining resources, you should get an indication of this from kubectl top nodes and kubectl describe nodes. (Or with GKE, if you are on Google Cloud, you often get a low-resource warning in the web UI console.)
(Or, if on Azure, be wary of https://github.com/Azure/ACS/issues/29 )
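As a sketch, the resource check described above could look like this (pod name is a placeholder):

kubectl describe pod <pod-name>    # look for FailedScheduling events, e.g. "Insufficient cpu"
kubectl top nodes                  # requires Heapster/metrics to be running
kubectl describe nodes             # see the "Allocated resources" section per node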

Postgres pod suddenly disappeared after update in gcloud

We changed the Kubernetes node version because of this message, and because, for some reason, pods were unable to be scheduled.
Before this message, however, there was a postgres pod running.
As you can see, the pod is gone for some reason. Why is that?
I cannot seem to get it back. When I try kubectl get events, I get that no resources were found. Is there any way to revive the postgres container, or get information about why it went down? What could I do? kubectl logs postgres doesn't seem to work either.
What I want to know is where this postgres pod was running (like the location path), whether the configuration of this pod is still available, or whether it is lost forever. If the pod is dead, can I still access its "graveyard" (that is, the database data), or was this cleaned up?
Update
Okay, so it turns out this pod wasn't managed by a controller, so that's why when it died there were no traces of it. But why is there no log information that this pod was killed?
Judging by the name your pod has, it wasn't provisioned using a Deployment or a ReplicaSet (if it was, like your other pods, it'd have a random id appended to its name).
More than likely, it was a standalone pod, which means once the node is gone, the pod is gone.
It might be possible to use kubectl get pods --show-all, but it's unlikely.
If your database has a persistent volume, you may still be able to retrieve the data by reattaching that to a new postgres pod.
In future, you might consider setting the termination message and message path, and also ensuring all pods are managed by a ReplicaSet or Deployment with persistent volumes attached.
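As a sketch of the reattachment idea, a replacement pod could mount the surviving claim like this. The claim name postgres-data and the image tag are hypothetical -- check kubectl get pvc for the actual claim name:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: postgres-recovery
spec:
  containers:
  - name: postgres
    image: postgres:9.6
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: postgres-data
EOF

If the volume still exists, the database files should be back under /var/lib/postgresql/data in the new pod.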

pods keep creating themselves even after I deleted all deployments

I am running k8s on AWS, and I updated the nginx deployment -- which normally works fine -- but this time, the nginx deployment won't show up in kubectl get deployments.
I want to kill all the pods related to nginx, but they keep recreating themselves. I deleted all deployments with kubectl delete --all deployments; the other pods got terminated, but not nginx.
I have no idea how to stop the pods from recreating.
Any idea where to start?
Check the Deployments, replication controllers and replica sets, and remove them:
kubectl get deploy,rc,rs
In modern Kubernetes, there is also an annotation kubernetes.io/created-by on the Pod showing its "owner", as seen here, but I can't lay my hands on the documentation link right now. However, I found a pastebin containing a concrete example of the contents of the annotation.
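As a sketch, you can inspect a pod's owner directly (via ownerReferences on newer clusters, or the created-by annotation on older ones); the pod name is a placeholder:

kubectl get pod <pod-name> -o jsonpath='{.metadata.ownerReferences[*].kind}/{.metadata.ownerReferences[*].name}'
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations.kubernetes\.io/created-by}'

An nginx pod that keeps coming back should show a ReplicaSet (or replication controller) as its owner; deleting that owner, and its owning Deployment if it has one, is what stops the recreation.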

Google (Stackdriver) Logging fails after Kubernetes rolling-update

When performing a kubectl rolling-update of a replication controller in Kubernetes (Google Container Engine), the Google (Stackdriver) Logging agent doesn't pick up the newly deployed pod. The Log is stuck at the last message produced from the old pod.
Consequently, the logs for the replication controller are out of date until we do a manual restart (i.e. kubectl scale and kubectl delete) of the pod, after which the logs are updated again.
Can anybody else confirm that behaviour? Is there a workaround?
I can try to reproduce the behavior, but first, can you try running kubectl logs <pod-name> on the newly created pod after doing the rolling-update, to verify that the new version of your app is producing logs at all?
This sounds more likely to be an application problem than an infrastructure problem, but if you can confirm that it is an infra problem I'd love to get to the bottom of it.
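A quick way to run that check (pod name is a placeholder):

kubectl get pods                   # note the name of the newly created pod
kubectl logs <new-pod-name>        # confirm the new app version is writing logs at all
kubectl logs <new-pod-name> -f     # follow the stream, to compare against what Stackdriver shows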