Pods assigned to node with no role - kubernetes

I am currently doing some scenarios on Katacoda and I am curious about the node roles.
My cluster has 2 nodes and one has the role Master and the other has no role.
NAME STATUS ROLES AGE VERSION
controlplane Ready master 5m11s v1.18.0
node01 Ready <none> 4m40s v1.18.0
Now when I look at the pods in my cluster, it seems like all three pods are assigned to node01
kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
random-logger-7687d48b59-jftl9 1/1 Running 0 26s 10.244.1.5 node01 <none> <none>
random-logger-7687d48b59-mwz4h 1/1 Running 0 26s 10.244.1.4 node01 <none> <none>
random-logger-7687d48b59-vgbc8 1/1 Running 0 117s 10.244.1.3 node01 <none> <none>
My understanding is that the scheduler puts pods on nodes that are assigned the worker role, so why are they being scheduled on a node with no role? Is a node with no role in the cluster assumed to be a worker node?

Use kubectl describe node my-node to see the details of the node.
A node role is just a label with the format node-role.kubernetes.io/<role>.
You can add this label yourself with kubectl label.
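For example, to give node01 from the output above a worker role (the value part of the label is arbitrary; only the key matters for the ROLES column):
kubectl label node node01 node-role.kubernetes.io/worker=worker
kubectl get nodes
After this, kubectl get nodes should show worker under ROLES for node01.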

Related

How to make k8s imagePullPolicy = never work?

I have followed the instructions on this blog to create a simple container image and deploy it in a k8s cluster.
However, in my case the pods do not run:
student@master:~$ k get pod -o wide -l app=hello-python --field-selector spec.nodeName=master
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
hello-python-58547cf485-7l8dg 0/1 ErrImageNeverPull 0 2m26s 192.168.219.126 master <none> <none>
hello-python-598c594dc5-4c9zd 0/1 ErrImageNeverPull 0 2m26s 192.168.219.67 master <none> <none>
student@master:~$ sudo podman images hello-python
REPOSITORY TAG IMAGE ID CREATED SIZE
localhost/hello-python latest 11cf1e5a86b1 50 minutes ago 941 MB
student@master:~$ hostname
master
student@master:~$
I understand why it may not work on the worker node, but why does it not work on the master node, i.e. the same node where the image is cached?
student@master:~$ k describe pod hello-python-58547cf485-7l8dg | grep -A 10 'Events:'
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 10m default-scheduler Successfully assigned default/hello-python-58547cf485-7l8dg to master
Warning Failed 8m7s (x12 over 10m) kubelet Error: ErrImageNeverPull
Warning ErrImageNeverPull 4m59s (x27 over 10m) kubelet Container image "localhost/hello-python:latest" is not present with pull policy of Never
student@master:~$
My question is: how do I make the pod run on the master node with imagePullPolicy: Never, given that the image in question is available on the master node, as the podman images output attests?
EDIT 1
I am using a k8s cluster running on two VMs deployed in GCE. It was set up with a script provided in the context of the Linux Foundation Kubernetes Developer course LFD259.
EDIT 2
The master node is allowed to run workloads - this is how the LFD259 course sets it up. For example:
student@master:~$ k create deployment xyz --image=httpd
deployment.apps/xyz created
student@master:~$ k get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
xyz-6c6bd4cd89-qn4zr 1/1 Running 0 5m37s 192.168.171.66 worker <none> <none>
student@master:~$
student@master:~$ k scale deployment xyz --replicas=10
deployment.apps/xyz scaled
student@master:~$ k get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
xyz-6c6bd4cd89-c2xv4 1/1 Running 0 73s 192.168.219.71 master <none> <none>
xyz-6c6bd4cd89-g89k2 0/1 ContainerCreating 0 73s <none> master <none> <none>
xyz-6c6bd4cd89-jfftl 0/1 ContainerCreating 0 73s <none> worker <none> <none>
xyz-6c6bd4cd89-kbdnq 1/1 Running 0 73s 192.168.219.106 master <none> <none>
xyz-6c6bd4cd89-nm6rt 0/1 ContainerCreating 0 73s <none> worker <none> <none>
xyz-6c6bd4cd89-qn4zr 1/1 Running 0 7m22s 192.168.171.66 worker <none> <none>
xyz-6c6bd4cd89-vts6x 1/1 Running 0 73s 192.168.171.84 worker <none> <none>
xyz-6c6bd4cd89-wd2ls 1/1 Running 0 73s 192.168.171.127 worker <none> <none>
xyz-6c6bd4cd89-wv4jn 0/1 ContainerCreating 0 73s <none> worker <none> <none>
xyz-6c6bd4cd89-xvtlm 0/1 ContainerCreating 0 73s <none> master <none> <none>
student@master:~$
It depends on how you've set up your Kubernetes cluster. I assume you've installed it with kubeadm; by default the master is then not schedulable for workloads. As I understand it, the image you're talking about only exists on the master node. If that's the case, you can't start a pod with that image, since it only exists on a node which doesn't allow workloads by default.
If you were to copy the image to the worker node, your given command should work.
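One rough sketch of how such a copy could be done, assuming the nodes use containerd as their container runtime and you have SSH access to the worker (hostnames and paths here are only illustrative):
podman save -o hello-python.tar localhost/hello-python:latest
scp hello-python.tar worker:/tmp/hello-python.tar
ssh worker sudo ctr -n k8s.io images import /tmp/hello-python.tar
Importing into the k8s.io containerd namespace is what makes the image visible to the kubelet on that node.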
However, if you want to make your master node schedulable instead, remove its taint (you may need to amend the key if it differs from yours):
kubectl taint nodes --all node-role.kubernetes.io/control-plane-

How to communicate with a DaemonSet pod from another pod on another node?

I have a DaemonSet configured to run on all nodes.
Every pod listens on port 34567. I want another pod on a different node to communicate with these pods. How can I achieve that?
Find the target Pod's IP address as shown below
controlplane $ k get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-fb8b8dccf-42pq8 1/1 Running 1 5m43s 10.88.0.4 node01 <none> <none>
coredns-fb8b8dccf-f9n5x 1/1 Running 1 5m43s 10.88.0.3 node01 <none> <none>
etcd-controlplane 1/1 Running 0 4m38s 172.17.0.23 controlplane <none> <none>
katacoda-cloud-provider-74dc75cf99-2jrpt 1/1 Running 3 5m42s 10.88.0.2 node01 <none> <none>
kube-apiserver-controlplane 1/1 Running 0 4m33s 172.17.0.23 controlplane <none> <none>
kube-controller-manager-controlplane 1/1 Running 0 4m45s 172.17.0.23 controlplane <none> <none>
kube-keepalived-vip-smkdc 1/1 Running 0 5m27s 172.17.0.26 node01 <none> <none>
kube-proxy-8sxkt 1/1 Running 0 5m27s 172.17.0.26 node01 <none> <none>
kube-proxy-jdcqc 1/1 Running 0 5m43s 172.17.0.23 controlplane <none> <none>
kube-scheduler-controlplane 1/1 Running 0 4m47s 172.17.0.23 controlplane <none> <none>
weave-net-8cxqg 2/2 Running 1 5m27s 172.17.0.26 node01 <none> <none>
weave-net-s4tcj 2/2 Running 1 5m43s 172.17.0.23 controlplane <none> <none>
Next "exec" into the originating pod - kube-proxy-8sxkt in my example
kubectl -n kube-system exec -it kube-proxy-8sxkt sh
Next, you will use the destination pod's IP and port (10256 - my example) number to connect. Please note that you may have to install curl/telnet if your originating container's image does not include the application
# curl telnet://172.17.0.23:10256
HTTP/1.1 400 Bad Request
Content-Type: text/plain; charset=utf-8
Connection: close
You can call the pod via its IP.
Note: this IP can only be reached from inside the k8s cluster.
The Pod's IP is an option you can use, but keep in mind that Pod IPs may change from time to time due to deployment and scaling changes.
I would suggest exposing the DaemonSet with a Service of type NodePort if you have a fixed number of nodes and not much autoscaling; a sketch of such a Service follows below.
If you want to reach the pod on a specific node, you can use that node's IP together with the NodePort service:
NodeIP:NodePort
Read more at : https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport
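A minimal sketch of such a NodePort Service, assuming the DaemonSet's pods carry the label app: my-daemon (the Service name, label and nodePort value are illustrative placeholders):
apiVersion: v1
kind: Service
metadata:
  name: my-daemon-nodeport
spec:
  type: NodePort
  selector:
    app: my-daemon        # must match the labels on the DaemonSet's pods
  ports:
  - port: 34567           # port of the Service inside the cluster
    targetPort: 34567     # port the DaemonSet pods listen on
    nodePort: 30567       # port opened on every node (must be in the 30000-32767 range)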
If you don't need to reach a specific pod and any replica of the DaemonSet will do, you can use the service name to connect pods with each other:
my-svc.my-namespace.svc.cluster-domain.example
Read more about the service and POD DNS
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
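For that DNS-based route, a plain ClusterIP Service in front of the DaemonSet is enough; again, the Service name, namespace and label below are illustrative assumptions:
apiVersion: v1
kind: Service
metadata:
  name: my-daemon-svc
  namespace: my-namespace
spec:
  selector:
    app: my-daemon        # matches the DaemonSet's pod labels
  ports:
  - port: 34567
    targetPort: 34567
Other pods can then reach the DaemonSet at my-daemon-svc.my-namespace.svc.cluster.local:34567; note that traffic is load-balanced across all of the DaemonSet's pods, not necessarily the one on the caller's node.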

Rook Ceph broken on Kubernetes?

Using Ceph v1.14.10 and Rook v1.3.8 on k8s 1.16 on-premise. After 10 days without any trouble, we decided to drain some nodes. Since then, all the moved pods can't attach to their PVs any more; it looks like the Ceph cluster is broken:
My ConfigMap rook-ceph-mon-endpoints is referencing 2 missing mon pod IPs:
csi-cluster-config-json: '[{"clusterID":"rook-ceph","monitors":["10.115.0.129:6789","10.115.0.4:6789","10.115.0.132:6789"]}]'
But
kubectl -n rook-ceph get pod -l app=rook-ceph-mon -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rook-ceph-mon-e-56b849775-4g5wg 1/1 Running 0 6h42m 10.115.0.2 XXXX <none> <none>
rook-ceph-mon-h-fc486fb5c-8mvng 1/1 Running 0 6h42m 10.115.0.134 XXXX <none> <none>
rook-ceph-mon-i-65666fcff4-4ft49 1/1 Running 0 30h 10.115.0.132 XXXX <none> <none>
Is this normal, or must I run some kind of "reconciliation" task to update the CM with the new mon pod IPs?
(could be related to https://github.com/rook/rook/issues/2262)
I had to manually update:
secret rook-ceph-config
cm rook-ceph-mon-endpoints
cm rook-ceph-csi-config
As @travisn said:
The operator owns updating that configmap and secret. It's not expected to update them manually unless there is some disaster recovery situation as described at https://rook.github.io/docs/rook/v1.4/ceph-disaster-recovery.html.
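For reference, the manual edits amount to something like the following (a sketch only; as the quote above says, this is really only appropriate in a disaster-recovery situation, since the operator normally owns these objects):
kubectl -n rook-ceph edit secret rook-ceph-config
kubectl -n rook-ceph edit configmap rook-ceph-mon-endpoints
kubectl -n rook-ceph edit configmap rook-ceph-csi-config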

How to scale a kubernetes cluster while limiting costs on GCP

We have a GKE cluster set up on google cloud platform.
We have an activity that requires 'bursts' of computing power.
Imagine that we usually do 100 computations an hour on average, then suddenly we need to be able to process 100000 in less than two minutes. However, most of the time, everything is close to idle.
We do not want to pay for idle servers 99% of the time, and want to scale the cluster depending on actual use (no data persistence needed, servers can be deleted afterwards). I looked up the Kubernetes documentation on autoscaling: adding more pods with HPA and adding more nodes with the cluster autoscaler.
However it doesn't seem like any of these solutions would actually reduce our costs or improve performances, because they do not seem to scale past the GCP plan:
Imagine that we have a Google plan with 8 CPUs. My understanding is that if we add more nodes with the cluster autoscaler, then instead of having e.g. 2 nodes using 4 CPUs each we will have 4 nodes using 2 CPUs each, but the total available computing power will still be 8 CPUs.
The same reasoning goes for HPA with more pods instead of more nodes.
If we have the 8 CPU payment plan but only use 4 of them, my understanding is we still get billed for 8, so scaling down is not really useful.
What we want is for autoscaling to change our payment plan temporarily (imagine going from n1-standard-8 to n1-standard-16) and get actual new computing power.
I can't believe we are the only ones with this use case but I cannot find any documentation on this anywhere! Did I misunderstand something ?
TL;DR:
Create a small persistent node pool.
Create a powerful node pool that can be scaled to zero (and cease billing) while not in use.
Tools used:
GKE’s Cluster Autoscaling, Node selector, Anti-affinity rules and Taints and tolerations.
GKE Pricing:
From GKE Pricing:
Starting June 6, 2020, GKE will charge a cluster management fee of $0.10 per cluster per hour. The following conditions apply to the cluster management fee:
One zonal cluster per billing account is free.
The fee is flat, irrespective of cluster size and topology.
Billing is computed on a per-second basis for each cluster. The total amount is rounded to the nearest cent, at the end of each month.
From Pricing for Worker Nodes:
GKE uses Compute Engine instances for worker nodes in the cluster. You are billed for each of those instances according to Compute Engine's pricing, until the nodes are deleted. Compute Engine resources are billed on a per-second basis with a one-minute minimum usage cost.
Enter Cluster Autoscaler:
automatically resize your GKE cluster’s node pools based on the demands of your workloads. When demand is high, cluster autoscaler adds nodes to the node pool. When demand is low, cluster autoscaler scales back down to a minimum size that you designate. This can increase the availability of your workloads when you need it, while controlling costs.
Cluster Autoscaler cannot scale the entire cluster to zero; at least one node must always be available in the cluster to run system pods.
Since you already have a persistent workload, this won't be a problem. What we will do is create a new node pool:
A node pool is a group of nodes within a cluster that all have the same configuration. Every cluster has at least one default node pool, but you can add other node pools as needed.
For this example I'll create two node pools:
A default node pool with a fixed size of one node with a small instance size (emulating the cluster you already have).
A second node pool with more compute power to run the jobs (I'll call it power-pool).
Choose a machine type with the power you need to run your AI jobs; for this example I'll use an n1-standard-8.
This power-pool will have autoscaling set to a maximum of 4 nodes and a minimum of 0 nodes.
If you'd like to add GPUs, you can check this great guide: Scale to almost zero + GPUs.
Taints and Tolerations:
Only the jobs related to the AI workload will run on the power-pool; for that, use a node selector in the job pods to make sure they run on the power-pool nodes.
Set an anti-affinity rule to ensure that two of your training pods cannot be scheduled on the same node (optimizing the price-performance ratio; this is optional depending on your workload).
Add a taint to the power-pool to prevent other workloads (and system resources) from being scheduled on the autoscalable pool.
Add the tolerations to the AI jobs to let them run on those nodes.
Reproduction:
Create the Cluster with the persistent default-pool:
PROJECT_ID="YOUR_PROJECT_ID"
GCP_ZONE="CLUSTER_ZONE"
GKE_CLUSTER_NAME="CLUSTER_NAME"
AUTOSCALE_POOL="power-pool"
gcloud container clusters create ${GKE_CLUSTER_NAME} \
--machine-type="n1-standard-1" \
--num-nodes=1 \
--zone=${GCP_ZONE} \
--project=${PROJECT_ID}
Create the auto-scale pool:
gcloud container node-pools create ${AUTOSCALE_POOL} \
--cluster=${GKE_CLUSTER_NAME} \
--machine-type=n1-standard-8 \
--node-labels=load=on-demand \
--node-taints=reserved-pool=true:NoSchedule \
--enable-autoscaling \
--min-nodes=0 \
--max-nodes=4 \
--zone=${GCP_ZONE} \
--project=${PROJECT_ID}
Note about parameters:
--node-labels=load=on-demand: Add a label to the nodes in the power pool to allow selecting them in our AI job using a node selector.
--node-taints=reserved-pool=true:NoSchedule: Add a taint to the nodes to prevent any other workload from accidentally being scheduled in this node pool.
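You can double-check that the pool carries the label and taint (even while it still has zero nodes) with something like the following; once a node from the pool is up, the kubectl line will list it:
gcloud container node-pools describe ${AUTOSCALE_POOL} \
--cluster=${GKE_CLUSTER_NAME} \
--zone=${GCP_ZONE} \
--project=${PROJECT_ID}
kubectl get nodes -l load=on-demand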
Here you can see the two pools we created, the static pool with 1 node and the autoscalable pool with 0-4 nodes.
Since we don't have any workload running on the autoscalable node pool, it shows 0 nodes running (and there is no charge while no node is in execution).
Now we'll create a job that creates 4 parallel pods that run for 5 minutes.
This job will have the following parameters to differentiate from normal pods:
parallelism: 4: to use all 4 nodes to enhance performance
nodeSelector.load: on-demand: to assign to the nodes with that label.
podAntiAffinity: to declare that we do not want two pods with the same label app: greedy-app running on the same node (optional).
tolerations: to match the toleration to the taint that we attached to the nodes, so these pods are allowed to be scheduled on those nodes.
apiVersion: batch/v1
kind: Job
metadata:
  name: greedy-job
spec:
  parallelism: 4
  template:
    metadata:
      name: greedy-job
      labels:
        app: greedy-app
    spec:
      containers:
      - name: busybox
        image: busybox
        args:
        - sleep
        - "300"
      nodeSelector:
        load: on-demand
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - greedy-app
            topologyKey: "kubernetes.io/hostname"
      tolerations:
      - key: reserved-pool
        operator: Equal
        value: "true"
        effect: NoSchedule
      restartPolicy: OnFailure
Now that our cluster is in standby, we will apply the job yaml we just created (I'll call it greedyjob.yaml). This job runs four processes in parallel and completes after about 5 minutes.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 42m v1.14.10-gke.27
$ kubectl get pods
No resources found in default namespace.
$ kubectl apply -f greedyjob.yaml
job.batch/greedy-job created
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
greedy-job-2xbvx 0/1 Pending 0 11s
greedy-job-72j8r 0/1 Pending 0 11s
greedy-job-9dfdt 0/1 Pending 0 11s
greedy-job-wqct9 0/1 Pending 0 11s
Our job was applied, but it is pending; let's see what's going on in those pods:
$ kubectl describe pod greedy-job-2xbvx
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 28s (x2 over 28s) default-scheduler 0/1 nodes are available: 1 node(s) didn't match node selector.
Normal TriggeredScaleUp 23s cluster-autoscaler pod triggered scale-up: [{https://content.googleapis.com/compute/v1/projects/owilliam/zones/us-central1-b/instanceGroups/gke-autoscale-to-zero-clus-power-pool-564148fd-grp 0->1 (max: 4)}]
The pod can't be scheduled on the current node due to the rules we defined, so this triggers a scale-up routine on our power-pool. This is a very dynamic process: after about 90 seconds the first node is up and running:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
greedy-job-2xbvx 0/1 Pending 0 93s
greedy-job-72j8r 0/1 ContainerCreating 0 93s
greedy-job-9dfdt 0/1 Pending 0 93s
greedy-job-wqct9 0/1 Pending 0 93s
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 44m v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-qxkw Ready <none> 11s v1.14.10-gke.27
Since we set pod anti-affinity rules, the second pod can't be scheduled on the node that was just brought up, which triggers the next scale-up. Take a look at the events on the second pod:
$ k describe pod greedy-job-2xbvx
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal TriggeredScaleUp 2m45s cluster-autoscaler pod triggered scale-up: [{https://content.googleapis.com/compute/v1/projects/owilliam/zones/us-central1-b/instanceGroups/gke-autoscale-to-zero-clus-power-pool-564148fd-grp 0->1 (max: 4)}]
Warning FailedScheduling 93s (x3 over 2m50s) default-scheduler 0/1 nodes are available: 1 node(s) didn't match node selector.
Warning FailedScheduling 79s (x3 over 83s) default-scheduler 0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) had taints that the pod didn't tolerate.
Normal TriggeredScaleUp 62s cluster-autoscaler pod triggered scale-up: [{https://content.googleapis.com/compute/v1/projects/owilliam/zones/us-central1-b/instanceGroups/gke-autoscale-to-zero-clus-power-pool-564148fd-grp 1->2 (max: 4)}]
Warning FailedScheduling 3s (x3 over 68s) default-scheduler 0/2 nodes are available: 1 node(s) didn't match node selector, 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules.
The same process repeats until all requirements are satisfied:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
greedy-job-2xbvx 0/1 Pending 0 3m39s
greedy-job-72j8r 1/1 Running 0 3m39s
greedy-job-9dfdt 0/1 Pending 0 3m39s
greedy-job-wqct9 1/1 Running 0 3m39s
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 46m v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-qxkw Ready <none> 2m16s v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-sf6q Ready <none> 28s v1.14.10-gke.27
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
greedy-job-2xbvx 0/1 Pending 0 5m19s
greedy-job-72j8r 1/1 Running 0 5m19s
greedy-job-9dfdt 1/1 Running 0 5m19s
greedy-job-wqct9 1/1 Running 0 5m19s
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 48m v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-39m2 Ready <none> 63s v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-qxkw Ready <none> 4m8s v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-sf6q Ready <none> 2m20s v1.14.10-gke.27
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
greedy-job-2xbvx 1/1 Running 0 6m12s
greedy-job-72j8r 1/1 Running 0 6m12s
greedy-job-9dfdt 1/1 Running 0 6m12s
greedy-job-wqct9 1/1 Running 0 6m12s
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 48m v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-39m2 Ready <none> 113s v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-ggxv Ready <none> 26s v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-qxkw Ready <none> 4m58s v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-sf6q Ready <none> 3m10s v1.14.10-gke.27
Here we can see that all nodes are now up and running (and thus being billed per second).
Now all pods are running; after a few minutes the jobs complete their tasks:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
greedy-job-2xbvx 1/1 Running 0 7m22s
greedy-job-72j8r 0/1 Completed 0 7m22s
greedy-job-9dfdt 1/1 Running 0 7m22s
greedy-job-wqct9 1/1 Running 0 7m22s
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
greedy-job-2xbvx 0/1 Completed 0 11m
greedy-job-72j8r 0/1 Completed 0 11m
greedy-job-9dfdt 0/1 Completed 0 11m
greedy-job-wqct9 0/1 Completed 0 11m
Once the task is completed, the autoscaler starts downsizing the cluster.
You can learn more about the rules for this process here: GKE Cluster AutoScaler
$ while true; do kubectl get nodes ; sleep 60; done
NAME STATUS ROLES AGE VERSION
gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 54m v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-39m2 Ready <none> 7m26s v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-ggxv Ready <none> 5m59s v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-qxkw Ready <none> 10m v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-sf6q Ready <none> 8m43s v1.14.10-gke.27
NAME STATUS ROLES AGE VERSION
gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 62m v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-39m2 Ready <none> 15m v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-ggxv Ready <none> 14m v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-qxkw Ready <none> 18m v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-sf6q NotReady <none> 16m v1.14.10-gke.27
Once the conditions are met, the autoscaler flags the nodes as NotReady and starts removing them:
NAME STATUS ROLES AGE VERSION
gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 64m v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-39m2 NotReady <none> 17m v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-ggxv NotReady <none> 16m v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-qxkw Ready <none> 20m v1.14.10-gke.27
NAME STATUS ROLES AGE VERSION
gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 65m v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-39m2 NotReady <none> 18m v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-ggxv NotReady <none> 17m v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-qxkw NotReady <none> 21m v1.14.10-gke.27
NAME STATUS ROLES AGE VERSION
gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 66m v1.14.10-gke.27
gke-autoscale-to-zero-clus-power-pool-564148fd-ggxv NotReady <none> 18m v1.14.10-gke.27
NAME STATUS ROLES AGE VERSION
gke-autoscale-to-zero-cl-default-pool-9f6d80d3-x9lb Ready <none> 67m v1.14.10-gke.27
Here is the confirmation that the nodes were removed from GKE and from the VM list (remember that every node is a virtual machine billed as Compute Engine):
Compute Engine: (note that gke-cluster-1-default-pool is from another cluster, I added it to the screenshot to show you that there is no other node from cluster gke-autoscale-to-zero other than the default persistent one.)
GKE:
Final Thoughts:
When scaling down, the cluster autoscaler respects scheduling and eviction rules set on Pods; these restrictions can prevent a node from being deleted by the autoscaler if it contains a Pod matching any of the conditions listed in the GKE Cluster Autoscaler documentation linked above.
An application's PodDisruptionBudget can also prevent autoscaling: if deleting nodes would cause the budget to be exceeded, the cluster does not scale down.
Note that the process is really fast: in our example it took around 90 seconds to scale up a node and about 5 minutes to finish downscaling a standby node, providing a HUGE improvement in your billing.
Preemptible VMs can reduce your billing even further, but you will have to consider the kind of workload you are running:
Preemptible VMs are Compute Engine VM instances that last a maximum of 24 hours and provide no availability guarantees. Preemptible VMs are priced lower than standard Compute Engine VMs and offer the same machine types and options.
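If that trade-off is acceptable for your jobs, the burst pool above could be created as preemptible simply by adding the --preemptible flag (a sketch, reusing the same variables as before):
gcloud container node-pools create ${AUTOSCALE_POOL} \
--cluster=${GKE_CLUSTER_NAME} \
--machine-type=n1-standard-8 \
--preemptible \
--node-labels=load=on-demand \
--node-taints=reserved-pool=true:NoSchedule \
--enable-autoscaling \
--min-nodes=0 \
--max-nodes=4 \
--zone=${GCP_ZONE} \
--project=${PROJECT_ID}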
I know you are still considering the best architecture for your app.
Using App Engine or AI Platform would be an optimal solution as well, but since you are currently running your workload on GKE I wanted to show you an example as requested.
If you have any further questions let me know in the comments.

Why is the pod running on the master nodes?

My kubernetes cluster looks as follow:
k get nodes
NAME STATUS ROLES AGE VERSION
k8s-1 Ready master 2d22h v1.16.2
k8s-2 Ready master 2d22h v1.16.2
k8s-3 Ready master 2d22h v1.16.2
k8s-4 Ready master 2d22h v1.16.2
k8s-5 Ready <none> 2d22h v1.16.2
k8s-6 Ready <none> 2d22h v1.16.2
k8s-7 Ready <none> 2d22h v1.16.2
As you can see, the cluster consists of 4 masters and 3 worker nodes.
These are the running pods:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default greeter-service-v1-8d97f9bcd-2hf4x 2/2 Running 0 47h 10.233.69.7 k8s-6 <none> <none>
default greeter-service-v1-8d97f9bcd-gnsvp 2/2 Running 0 47h 10.233.65.3 k8s-2 <none> <none>
default greeter-service-v1-8d97f9bcd-lkt6p 2/2 Running 0 47h 10.233.68.9 k8s-7 <none> <none>
default helloweb-77c9476f6d-7f76v 2/2 Running 0 47h 10.233.64.3 k8s-1 <none> <none>
default helloweb-77c9476f6d-pj494 2/2 Running 0 47h 10.233.69.8 k8s-6 <none> <none>
default helloweb-77c9476f6d-tnqfb 2/2 Running 0 47h 10.233.70.7 k8s-5 <none> <none>
Why are the pods greeter-service-v1-8d97f9bcd-gnsvp and helloweb-77c9476f6d-7f76v running on the master?
By default, there is no restriction on Pods being scheduled on a master unless the node carries a taint like node-role.kubernetes.io/master:NoSchedule.
You can verify whether there is any taint on the master node using
kubectl describe node k8s-1
or
kubectl get node k8s-1 -o jsonpath='{.spec.taints[*]}' && echo
If you want to add such a taint, use:
kubectl taint node k8s-1 node-role.kubernetes.io/master="":NoSchedule
After adding the taint, no new pods will be scheduled on this node unless they have a matching toleration in their Pod spec.
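For example, a Pod that should still be allowed onto the tainted master would carry a toleration like this (a minimal sketch; the pod name and image are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo
spec:
  containers:
  - name: app
    image: nginx
  tolerations:
  - key: "node-role.kubernetes.io/master"
    operator: "Exists"
    effect: "NoSchedule"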
Read more about Taints and Tolerations here