kubectl get nodes hangs when I delete a node externally - kubernetes

Been experimenting with Kubernetes/Rancher and encountered some unexpected behavior. Today I'm deliberately putting on my chaos monkey hat and learning how things behave when stuff fails.
Here's what I've done:
1) Using the Rancher UI, stand up a 3 node cluster on Digital Ocean
Success -- a few mins later I have a 3 node cluster, visible in Rancher.
2) Using the Rancher UI, I deleted a node in a 'happy' scenario where I push the appropriate node delete button using Rancher.
Some minutes later, I have a 2 node cluster. Great.
3) Using the Digital Ocean admin UI, I delete a node in an 'oops' scenario as if a sysadmin accidentally deleted a node.
Back on the ranch (sorry), I click here to view the state of the cluster:
Unfortunately after three minutes, I'm getting a gateway timeout
Detailed timeouts in Chrome network inspector
Here's what kubectl says:
$ kubectl get nodes
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get nodes)
So, question is, what happened here?
I was under the impression Kubernetes was 'self healing' and even if this node I deleted was the etcd leader, it would eventually recover. Been around 2 hours -- do I just need to wait more?

Related

Synchronize and rollback independent deployments in kubernetes

I have k8s setup that contains 2 deployments: client and server deployed from different images. Both deployments have replica sets inside, liveness and readiness probes defined. The client communicates with the server via k8s' service.
Currently, the deployment scripts for both client and server are separated (separate yaml files applied via kustomization). Rollback works correctly for both parts independently but let's consider the following scenario:
1. deployment is starting
2. both deployment configurations are applied
3. k8s master starts replacing pods of server and client
4. server pods start correctly so new replica set has all the new pods up and running
5. client pods have an issue, so the old replica set is still running
In many cases it's not a problem, because client and server work independently, but there are situations when breaking change to the server API is released and both client and server must be updated. In that case if any of these two fails then both should be rolled back (doesn't matter which one fails - both needs to be rolled back to be in sync).
Is there a way to achieve that in k8s? I spent quite a lot of time searching for some solution but everything I found so far describes deployments/rollbacks of one thing at the time and that doesn't solve the issue above.
The problem here is something covered in Blue/Green deployments.
Here is a good reference of Blue/Green deployments with k8s.
The basic idea is, you deploy the new version (Green deployment) while keeping the previous version (Blue deployment) up and running and only allow traffic to the new version (Green deployment) when everything went fine.

Kubernetes pod/containers running but not listed with 'kubectl get pods'?

I have an issue that, at face value, appears to indicate that I have two deployments running in parallel within my kube cluster, but 'kubectl get pods' only shows one deployment.
My deployment is composed of a pod with two containers. One of the containers runs a golang application that creates an http API endpoint, and the other runs Telegraf to read metrics from the API endpoint and push them to InfluxDB. When writing the data to Influx I tag the data with the source host as the name of the pod. I use Grafana to plot the metrics and I can clearly see incoming streaming data coming from two hosts (e.g. I can set a "WHERE host=" query clause as either "application-pod-name-231620957-7n32f" and "application-pod-name-1931165991-x154c").
Based on the above, I'm fairly certain that two deployments of the pod are running, each with the two containers (one providing application metrics and the other with telegraf sending metrics to InfluxDB).
However, kube seems to think that one of the deployments doesn't exist. As mentioned, "kubectl get pods" doesn't display the 2nd pod name in any way shape or form. Only one of them.
Has anyone seen this? Any ideas on further troubleshooting? I've attempted to use the pod name (that I have within telegraf) to query more information using kubectl but always get the response that the pod doesn't exist... but it must exist! It's sending live data!
We had been experiencing issues with a node within the cluster. Specifically, the node was experiencing GC failures and communications into the cluster from that node was broken. Due to these failures, someone on our team performed a 'kubectl delete' on the node from within the cluster. By doing so the node continued running, but also the kubelet running on the node remained in a broken state, and so the node couldn't re-auto-register itself into the cluster. This node happened to be running the 2nd pod, and the pods running on the node continued running without issue. In our case, the node was running on AWS, in which case the way to avoid this situation is to reboot the node either from the AWS console or AWS API.

What happens when the Kubernetes master fails?

I've been trying to figure out what happens when the Kubernetes master fails in a cluster that only has one master. Do web requests still get routed to pods if this happens, or does the entire system just shut down?
According to the OpenShift 3 documentation, which is built on top of Kubernetes, (https://docs.openshift.com/enterprise/3.2/architecture/infrastructure_components/kubernetes_infrastructure.html), if a master fails, nodes continue to function properly, but the system looses its ability to manage pods. Is this the same for vanilla Kubernetes?
In typical setups, the master nodes run both the API and etcd and are either largely or fully responsible for managing the underlying cloud infrastructure. When they are offline or degraded, the API will be offline or degraded.
In the event that they, etcd, or the API are fully offline, the cluster ceases to be a cluster and is instead a bunch of ad-hoc nodes for this period. The cluster will not be able to respond to node failures, create new resources, move pods to new nodes, etc. Until both:
Enough etcd instances are back online to form a quorum and make progress (for a visual explanation of how this works and what these terms mean, see this page).
At least one API server can service requests
In a partially degraded state, the API server may be able to respond to requests that only read data.
However, in any case, life for applications will continue as normal unless nodes are rebooted, or there is a dramatic failure of some sort during this time, because TCP/ UDP services, load balancers, DNS, the dashboard, etc. Should all continue to function for at least some time. Eventually, these things will all fail on different timescales. In single master setups or complete API failure, DNS failure will probably happen first as caches expire (on the order of minutes, though the exact timing is configurable, see the coredns cache plugin documentation). This is a good reason to consider a multi-master setup–DNS and service routing can continue to function indefinitely in a degraded state, even if etcd can no longer make progress.
There are actions that you could take as an operator which would accelerate failures, especially in a fully degraded state. For instance, rebooting a node would cause DNS queries and in fact probably all pod and service networking functionality until at least one master comes back online. Restarting DNS pods or kube-proxy would also be bad.
If you'd like to test this out yourself, I recommend kubeadm-dind-cluster, kind or, for more exotic setups, kubeadm on VMs or bare metal. Note: kubectl proxy will not work during API failure, as that routes traffic through the master(s).
Kubernetes cluster without a master is like a company running without a Manager.
No one else can instruct the workers(k8s components) other than the Manager(master node)(even you, the owner of the cluster, can only instruct the Manager)
Everything works as usual. Until the work is finished or something stopped them.(because the master node died after assigning the works)
As there is no Manager to re-assign any work for them, the workers will wait and wait until the Manager comes back.
The best practice is to assign multiple managers(master) to your cluster.
Although your data plane and running applications does not immediately starts breaking but there are several scenarios where cluster admins will wish they had multi-master setup. Key to understanding the impact would be understanding which all components talk to master for what and how and more importantly when will they fail if master fails.
Although your application pods running on data plane will not get immediately impacted but imagine a very possible scenario - your traffic suddenly surged and your horizontal pod autoscaler kicked in. The autoscaling would not work as Metrics Server collects resource metrics from Kubelets and exposes them in Kubernetes apiserver through Metrics API for use by Horizontal Pod Autoscaler and vertical pod autoscaler ( but your API server is already dead).If your pod memory shoots up because of high load then it will eventually lead to getting killed by k8s OOM killer. If any of the pods die, then since controller manager and scheduler talks to API Server to watch for current state of pods so they too will fail. In short a new pod will not be scheduled and your application may stop responding.
One thing to highlight is that Kubernetes system components communicate only with the API server. They don’t
talk to each other directly and so their functionality themselves could fail I guess. Unavailable master plane can mean several things - failure of any or all of these components - API server,etcd, kube scheduler, controller manager or worst the entire node had crashed.
If API server is unavailable - no one can use kubectl as generally all commands talk to API server ( meaning you cannot connect to cluster, cannot login into any pods to check anything on container file system. You will not be able to see application logs unless you have any additional centralized log management system).
If etcd database failed or got corrupted - your entire cluster state data is gone and the admins would want to restore it from backups as early as possible.
In short - a failed single master control plane although may not immediately impact traffic serving capability but cannot be relied on for serving your traffic.

GKE cluster v1.2.4 - route recreation

I've created a 3 nodes GKE cluster which is running fine but I noticed a few times that my components are not able to reach the API Server during 3 or 4 minutes.
I recently had the same problem again on a fresh new cluster so I decided to look a bit closer. In the Compute Engine Operations section I noticed at the same time that the 3 routes had been removed and recreated 4 minutes later... This task had been scheduled by a #cloudservices.gserviceaccount.com address so from the cluster directly I suppose.
What is causing this behavior, forcing the routes to be deleted and recreated randomly ?
The apiserver may become unreachable if it gets temporarily overloaded or if it is upgraded or repaired. This should be unrelated to routes being removed and recreated, although it's possible that the node manager does not behave correctly when it is restarted.

How to restart unresponsive kubernetes master in GKE

The kubernetes master in one of my GKE clusters became unresponsive last night following the infrastructure issue in us-central1-a.
Whenever I run "kubectl get pods" in the default namespace I get the following error message:
Error from server: an error on the server has prevented the request from succeeding
If I run "kubectl get pods --namespace=kube-system", I only see the kube-proxy and the fluentd-logging daemon.
I have trying scaling the cluster down to 0 and then scaling it back up. I have also tried downgrading and upgrading the cluster but that seems to apply only to the nodes (not the master). Is there any GKE/K8S API command to issue a restart to the kubernetes master?
There is not a command that will allow you to restart the Kubernetes master in GKE (since the master is considered a part of the managed service). There is automated infrastructure (and then an oncall engineer from Google) that is responsible for restarting the master if it is unhealthy.
In this particular cases, restarting the master had no effect on restoring it to normal behavior because Google Compute Engine Incident #16011 caused an outage on 2016-06-28 for GKE masters running in us-central1-a (even though that isn't indicated on the Google Cloud Status Dashboard). During the incident, many masters were unavailable.
If you had tried to create a GCE cluster using kube-up.sh during that time, you would have similarly seen that it would be unable to create a functional master VM due to the SSD Persistent disk latency issues.
I'm trying to have at least one version to upgrade ready, if you trying to upgrade the master, it will restart and work within few minutes. Otherwise you should wait around 3 days while Google team will reboot it. On e-mail/phone, then won't help you. And unless you have payed support (transition to which taking few days), they won't give a bird.