How to detect GKE autoupgrading a node in Stackdriver logs - kubernetes

We have a GKE cluster with auto-upgrading nodes. We recently noticed a node become unschedulable and eventually get deleted, and we suspect it was being upgraded automatically for us. Is there a way to confirm (or rule out) in Stackdriver that this was indeed what was happening?

You can use the following advanced logs queries with Cloud Logging (previously Stackdriver) to detect upgrades to node pools:
protoPayload.methodName="google.container.internal.ClusterManagerInternal.UpdateClusterInternal"
resource.type="gke_nodepool"
and master:
protoPayload.methodName="google.container.internal.ClusterManagerInternal.UpdateClusterInternal"
resource.type="gke_cluster"
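If you prefer the command line, the same node pool query can be run with gcloud logging read. This is only a sketch; the project ID is a placeholder and the limit is arbitrary:
gcloud logging read \
  'resource.type="gke_nodepool" AND protoPayload.methodName="google.container.internal.ClusterManagerInternal.UpdateClusterInternal"' \
  --project=[PROJECT_ID] --limit=20 --format=json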
Additionally, you can control when the updates are applied with Maintenance Windows (as the user aurelius mentioned).

I think your question has already been answered in the comments. Just as an addition: automatic upgrades occur at regular intervals at the discretion of the GKE team. To get more control, you can create a Maintenance Window as explained here. This is basically a time frame of your choosing in which automatic upgrades should occur.
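As a sketch of what that looks like (cluster name and zone are placeholders), a daily maintenance window starting at 03:00 UTC can be configured with:
gcloud container clusters update [CLUSTER_NAME] \
  --zone=[ZONE] \
  --maintenance-window=03:00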

Related

Two versions of fluentd fighting over port in my cluster

Somehow, I have two versions of fluentd running in my cluster.
They end up fighting over the same port: they just keep cranking away, trying to start up on that port, and it saturates all the CPU in the cluster.
unexpected error error_class=Errno::EADDRINUSE error="Address already in use - bind(2) for 0.0.0.0:24231
/opt/google-fluentd/embedded/lib/ruby/2.6.0/socket.rb:201:in 'bind'
I've tried deleting the daemon sets and deployments, but they just keep coming back. I've also tried SSHing into the machines and killing the process on that port. Nothing seems to work.
Obviously, I only want one version of fluentd to run (and I'm not even sure which one).
I seem to have fixed it. I went to the GCP dashboard's cluster edit page, where the Kubernetes Engine Monitoring dropdown was blank. It seems not even the dropdown could decide what to display here.
It seems the automated agent, or whatever it is, seriously messed up here and had two versions of the logging and monitoring system running, fighting over a port and crushing the CPU on every machine in the cluster. On top of that, I couldn't delete the daemon sets, pods, or deployments. It seems Google treats these as special somehow, maybe with some kind of automated agent; I don't know.
From the dropdown, I just selected System and workload logging and monitoring, saved, and it applied the changes.
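For reference, I believe the equivalent change can also be made from the CLI; this is only my best guess at what the dashboard did under the hood, with cluster name and zone as placeholders:
gcloud container clusters update [CLUSTER_NAME] \
  --zone=[ZONE] \
  --logging-service=logging.googleapis.com/kubernetes \
  --monitoring-service=monitoring.googleapis.com/kubernetes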
Everything is looking good so far, but this whole event has me worried; I didn't do anything. This just... happened.
This is a dev cluster, but if it was a production cluster...

GKE Node Upgrade "Out of Resources"

I had left my GKE cluster running 3 minor versions behind the latest and decided to finally upgrade. The master upgrade went well, but then my node upgrades kept failing. I used the Cloud Shell console to manually start an upgrade and view the output, which said something along the lines of "Zone X is out of resources, try Y instead." Unfortunately, I can't just spin up a new node pool in a new zone and have my pipeline work, because I am using GitLab's AutoDevOps pipeline, which makes certain assumptions about node pool naming and such that I can't find any way to override. I also don't want to potentially lose the data stored in my persistent volumes if I end up needing to re-create everything in a new node pool.
I just solved this issue but couldn't find any questions posed on this particular problem, so I wanted to post the answer here in case someone else comes looking for it.
My particular problem was that I had a non-autoscaling node pool with a single node. For my purposes, that's enough for the application stack to run smoothly, and I don't want to incur unforeseen charges from additional nodes automatically being added to the pool. However, this meant that the upgrade apparently had to share resources with everything else running on that node, and there weren't enough to go around. The solution was simple: add more nodes temporarily.
Because this is specifically GKE, I was able to use a beta feature called "surge upgrade", which allows you to set the maximum number of "surge" nodes to add when performing an upgrade. Once this was enabled, I started the upgrade process again and it temporarily added an extra node, performed the upgrade, and then scaled back down to a single node.
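For anyone wanting to reproduce this, surge upgrades can be enabled on the node pool before retrying the upgrade. This is a sketch with placeholder names, using the beta command track the feature lived under when I used it:
gcloud beta container node-pools update [NODE_POOL_NAME] \
  --cluster=[CLUSTER_NAME] \
  --zone=[ZONE] \
  --max-surge-upgrade=1 \
  --max-unavailable-upgrade=0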
If you aren't on GKE, or don't wish to (or can't) use a beta feature, then simply resize the node pool that contains the node(s) needing the upgrade. I would add a single node unless you are positive you need more.
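The temporary resize itself is a one-liner; again a sketch with placeholder names, and the node count is just an example:
gcloud container clusters resize [CLUSTER_NAME] \
  --node-pool=[NODE_POOL_NAME] \
  --zone=[ZONE] \
  --num-nodes=2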

(How) do node pool autoupgrades in GKE actually work?

We have a fairly large kubernetes deployment on GKE, and we wanted to make our life a little easier by enabling auto-upgrades. The documentation on the topic tells you how to enable it, but not how it actually works.
We enabled the feature on a test cluster, but no nodes were ever upgraded (although the UI kept nagging us that "upgrades are available").
The docs say it would be updated to the "latest stable" version and that it occurs "at regular intervals at the discretion of the GKE team" - neither of which is terribly helpful.
The UI always says: "Next auto-upgrade: Not scheduled"
Has someone used this feature in production and can shed some light on what it'll actually do?
What I did:
I enabled the feature on the nodepools (not the cluster itself)
I set up a maintenance window
Cluster version was 1.11.7-gke.3
Nodepools had version 1.11.5-gke.X
The newest available version was 1.11.7-gke.6
What I expected:
The nodepool would be updated to either 1.11.7-gke.3 (the default cluster version) or 1.11.7-gke.6 (the most recent version)
The update would happen in the next maintenance window
The update would otherwise work like a "manual" update
What actually happened:
Nothing
The nodepools remained on 1.11.5-gke.X for more than a week
My question
Is the nodepool version supposed to update?
If so, at what time?
If so, to what version?
I'll finally answer this myself. The auto-upgrade does work, though it took several days to a week until the version was upgraded.
There is no indication of the planned upgrade date, or any feedback other than the version updating.
It will upgrade to the current master version of the cluster.
Addition: it still doesn't work reliably, and there is still no way to debug it when it doesn't. One piece of information I got was that the mechanism does not work if you initially provided a specific version for the node pool. As it is not possible to deduce the inner workings of the auto-upgrades, we had to resort to manually checking the status again.
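For what it's worth, "manually checking the status" boiled down to comparing the master and node pool versions from the CLI; a sketch with placeholder names:
gcloud container clusters describe [CLUSTER_NAME] --zone=[ZONE] \
  --format="value(currentMasterVersion)"
gcloud container node-pools describe [NODE_POOL_NAME] \
  --cluster=[CLUSTER_NAME] --zone=[ZONE] --format="value(version)"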
I wanted to share two other possibilities as to why a node-pool may not be auto-upgrading or scheduled to upgrade.
One of our projects was having a similar issue: the master version had auto-upgraded to 1.14.10-gke.27, but our node pool stayed stuck at 1.14.10-gke.24 for over a month.
Reaching a node quota
The node-pool upgrade might be failing due to a node quota (although I'm not sure the web console would say Next auto-upgrade: Not scheduled). From the node upgrades documentation, it suggests we can run the following to view any failed upgrade operations:
gcloud container operations list --filter="STATUS=DONE AND TYPE=UPGRADE_NODES AND targetLink:https://container.googleapis.com/v1/projects/[PROJECT_ID]/zones/[ZONE]/clusters/[CLUSTER_NAME]"
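If a failed operation shows up in that list, describing it should surface the error detail; a sketch, assuming you take the operation ID and zone from the list output:
gcloud container operations describe [OPERATION_ID] --zone=[ZONE]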
Automatic node upgrades are for minor+ versions only
After exhausting my troubleshooting steps, I reached out to GCP Support and opened a case (Case 23113272 for anyone working at Google). They told me the following:
Automatic node upgrade:
The node version does not necessarily upgrade automatically. Let me explain: there are three kinds of upgrades for a node - minor versions (1.X), patch releases (1.X.Y), and security updates and bug fixes (1.X.Y-gke.N). Please take a look at this documentation [2]; automatic node upgrade works from a minor version onward, and in your case the upgrade was a security update, which can't upgrade automatically.
I responded and they confirmed that automatic node upgrades will only happen for minor versions and above. I have requested that they submit a request to update their documentation, because (at the time of this response) this is not outlined anywhere in their node auto-upgrade documentation.
This feature replaces the VMs (Kubernetes nodes) in your node pool running the "old" Kubernetes version with VMs running the "new" version.
The node pool "upgrade" operation is done in a rolling fashion: It's not like GKE deletes all your VMs and recreates them simultaneously (except when you have only 1 node in your cluster). By default, the nodes are replaced with newer nodes one-by-one (although this might change).
GKE internally uses mostly the features of managed instance groups to manage operations on node pools.
You can find documentation on how to schedule node upgrades by specifying certain "maintenance windows" so you are impacted minimally. (This article also gives a bit more insights on how upgrades happen.)
That said, you can disable auto-upgrades and upgrade your cluster manually (although this is not recommended). Some GKE users have thousands of nodes, so for them upgrading VMs one by one is not feasible.
For that, GKE offers an option that lets you choose "how many nodes are upgraded at a time":
gcloud container clusters upgrade \
--concurrent-node-count=CONCURRENT_NODE_COUNT
Documentation of this flag says:
The number of nodes to upgrade concurrently. Valid values are [1, 20]. It is a recommended best practice to set this value to no higher than 3% of your cluster size.

How is the preemption notice handled?

I'm currently running on AWS and use kube-aws/kube-spot-termination-notice-handler to intercept an AWS spot termination notice and gracefully evict the pods.
I'm reading this GKE documentation page and I see:
Preemptible instances terminate after 30 seconds upon receiving a preemption notice.
Going into the Compute Engine documentation, I see that an ACPI G2 Soft Off signal is sent 30 seconds before the termination happens, but this issue suggests that the kubelet itself doesn't handle it.
So, how does GKE handle preemption? Will the node do a drain/cordon operation or does it just do a hard shutdown?
Yes, you are right; so far there is no built-in way to handle the ACPI G2 Soft Off signal.
Notice that while a normal preemptible instance supports shutdown scripts (where you could introduce some kind of logic to perform a cordon/drain), this is not the case when they are Kubernetes nodes:
Currently, preemptible VMs do not support shutdown scripts.
You can perform some tests, but quoting again from the documentation:
You can simulate an instance preemption by stopping the instance.
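A minimal way to run that test, assuming you know the underlying instance name and zone of the node:
gcloud compute instances stop [INSTANCE_NAME] --zone=[ZONE]
kubectl get nodes --watch
The second command just lets you watch whether the node ever gets cordoned or drained before it disappears.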
And so far, if you stop the instance, even if it is a Kubernetes node, no action is taken to cordon/drain and gracefully remove the node from the cluster.
However, this feature is still in beta, so it is at an early stage of its life, and it is currently a matter of discussion whether and how to introduce such behavior.
Disclaimer: I work for Google Cloud Platform Support.
More recent and relevant answer
There's a GitHub project (not mine) that catches this ACPI signal and has the node cordon and drain itself, then restart itself. In our tests this results in a much cleaner preemptible experience; it's almost unnoticeable with highly available deployments on your cluster.
See: https://github.com/GoogleCloudPlatform/k8s-node-termination-handler
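To illustrate what that handler automates, the cordon and drain it performs on the node are roughly equivalent to the following; this is only a sketch with a placeholder node name, and the exact flags the project uses may differ:
kubectl cordon [NODE_NAME]
kubectl drain [NODE_NAME] --ignore-daemonsets --force --grace-period=30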

Windows OS Update/Patch handling - best practices for SF today

I'm aware that SF doesn't yet automatically handle OS upgrades/patching in any way like Cloud Services do. I eagerly await that feature when it is ready. But for now I am curious what I should expect by default.
Since SF uses Scale Sets and standard Windows VMs, should I expect that the instances will have the default Windows Update settings and thus will reboot automatically every so often as updates are applied? I believe the defaults are to install updates automatically and reboot during the defined maintenance window (3am?), is that correct?
If that is true, can I expect that SF will gracefully handle the reboot? By that I mean any services running on it are shut down and the load balancer is notified to stop sending requests to any externally visible endpoints on that host?
But taking that a step further, if all of the above happens to be true, is there anything preventing all the nodes in my cluster from hitting the maintenance window and rebooting at the same time? That would seem catastrophic to me.
Given all that, what is the best practice and general advice for handling Windows Updates in SF today?
You're correct that there could be catastrophic results if you just turn on Windows Update and let it go. There will be no coordination when the nodes reboot, and you could lose part or all of your application or cluster if the nodes cause the Service Fabric services to lose quorum.
The only safe approach is to install the patches/updates on a single node at a time and don't move to the next node until the cluster is healthy. This can be scripted to make it easier or worst case can be done manually.
There may be another approach that has to do with adding node types, but it is not yet tested, so I don't want to give details until we know it works.