List nodes under nodePool - kubernetes

I am trying to re-deploy a Jenkins pod on Kubernetes.
After I tried it, I got an error and the pod is not initializing.
When I describe the pod, I can see:
Warning FailedScheduling 46s default-scheduler 0/12 nodes are available: 12 node(s) didn't match node selector.
Normal NotTriggerScaleUp 7s (x4 over 38s) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 node(s) didn't match node selector
Also, I see that there is a node selector defined: Node-Selectors: nodePool=default
I have a Kubernetes deployment called jenkins where I can see that this value is defined.
I am not sure what the nodePool value should be, since I do not know how to list all the nodePools that I have available.
I can list all nodes using kubectl get nodes, but I do not see any info about nodePool there.
Any advice on how to do this?

Please check the labels using kubectl get nodes --show-labels. It looks like your deployment has a node selector that doesn't match any node's labels.
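For example, a quick way to check which nodes (if any) carry that label (the nodePool key comes from your node selector; <node-name> is a placeholder):
# show every label on every node
kubectl get nodes --show-labels
# or show just the nodePool label as its own column
kubectl get nodes -L nodePool
# if no node has the label, you can add it so the selector matches
kubectl label nodes <node-name> nodePool=default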

Related

kubevirt had volume node affinity conflict

I am trying to set up kubevirt following https://kubevirt.io/2019/How-To-Import-VM-into-Kubevirt.html and https://github.com/kubevirt/containerized-data-importer
The instructions are not fully complete, so I had to create the PV manually.
The error I get is:
0/1 nodes are available: 1 node(s) had volume node affinity conflict. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
I have tried this https://www.datree.io/resources/kubernetes-troubleshooting-fixing-persistentvolumeclaims-error#anchor0 but I still have the same issue.
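For reference, a sketch of how the conflict can be inspected: the error means the PV's spec.nodeAffinity does not match the labels of the only node, so compare the two directly (<pv-name> and <pvc-name> are placeholders):
# show the PV, including its spec.nodeAffinity node selector terms
kubectl get pv <pv-name> -o yaml
# show which PV the claim is bound to
kubectl get pvc <pvc-name> -o wide
# compare the affinity terms against the node's labels
kubectl get nodes --show-labels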

GKE autoscaler not scaling down the nodes

I have created a Google Kubernetes Engine cluster with autoscaling enabled and minimum and maximum node counts set. A few days ago I deployed a couple of servers to production, which increased the node count as expected, but when I deleted those deployments I expected the cluster to scale the nodes back down. I waited more than an hour, but it still did not scale down.
All my other pods are controlled by ReplicaSets, since I deployed them with kind: Deployment.
All my StatefulSet pods use PVCs as volumes.
I'm not sure what prevented the nodes from scaling down, so I manually scaled them for now. Since I made the changes manually, I cannot get the autoscaler logs anymore.
Does anyone know what could be the issue here?
GKE version is 1.16.15-gke.4300
As mentioned in this link
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node
I'm not using any local storage.
My pods don't have a PodDisruptionBudget (I don't know what that is).
Pods are created by Deployments (Helm charts).
The only thing is that I don't have the "cluster-autoscaler.kubernetes.io/safe-to-evict": "true" annotation. Is this a must? (A quick way to check these points is sketched below.)
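For reference, a sketch of the checks (<pod-name> and <namespace> are placeholders):
# list existing PodDisruptionBudgets in all namespaces
kubectl get pdb --all-namespaces
# dump a pod's annotations to see whether safe-to-evict is set
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.metadata.annotations}'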
I have tested Cluster Autoscaler on my GKE cluster. It works a bit differently than you expected.
Background
You can enable autoscaling using a command or enable it during cluster creation, as described in this documentation.
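For example, a sketch of enabling autoscaling on an existing node pool with gcloud (the cluster name, zone and the 1-6 range are placeholders):
gcloud container clusters update <cluster-name> \
  --enable-autoscaling --min-nodes=1 --max-nodes=6 \
  --zone=<zone> --node-pool=default-pool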
In the Cluster Autoscaler documentation you can find various information such as operation criteria, limitations, etc.
As I mentioned in the comment section, per Cluster Autoscaler - Frequently Asked Questions, scale-down won't work for a node that runs pods in one of the situations below:
Pods with restrictive PodDisruptionBudget.
Kube-system pods that:
are not run on the node by default, *
don't have a pod disruption budget set or their PDB is too restrictive (since CA 0.6).
Pods that are not backed by a controller object (so not created by deployment, replica set, job, statefulset etc). *
Pods with local storage. *
Pods that cannot be moved elsewhere due to various constraints (lack of resources, non-matching node selectors or affinity, matching anti-affinity, etc)
Pods that have the following annotation set:
"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
For my tests I used 6 nodes, with an autoscaling range of 1-6, and an nginx application with requests cpu: 200m and memory: 128Mi.
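A minimal sketch of such a test Deployment (only the resource requests, the name and the replica count match the test output below; the rest is illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 16
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: 200m
            memory: 128Mi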
As the OP mentioned they are not able to provide autoscaler logs, I will paste my logs from Logs Explorer. A description of how to obtain them is in the Viewing cluster autoscaler events documentation.
In those logs you should search for noScaleDown events. You will find several pieces of information there, but the most important is:
reason: {
parameters: [
0: "kube-dns-66d6b7c877-hddgs"
]
messageId: "no.scale.down.node.pod.kube.system.unmovable"
As described in NoScaleDown node-level reasons for "no.scale.down.node.pod.kube.system.unmovable":
Pod is blocking scale down because it's a non-daemonset, non-mirrored, non-pdb-assigned kube-system pod. See the Kubernetes Cluster Autoscaler FAQ for more details.
Solution
If you want Cluster Autoscaler to scale down nodes running such kube-system pods on GKE, you have to create a PodDisruptionBudget for them. How to create one can be found in How to set PDBs to enable CA to move kube-system pods?
kubectl create poddisruptionbudget <pdb name> --namespace=kube-system --selector app=<app name> --max-unavailable 1
where you have to specify the correct selector, and whether to use --max-unavailable or --min-available depends on your needs. For more details, please read the Specifying a PodDisruptionBudget documentation.
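For reference, a minimal sketch of the equivalent PDB written as a manifest (the name and selector label are placeholders; check your pods' actual labels with kubectl get pods -n kube-system --show-labels, and use apiVersion policy/v1beta1 on clusters older than v1.21):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: <pdb name>
  namespace: kube-system
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: <app name>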
Tests
$ kubectl get deploy,nodes
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nginx-deployment 16/16 16 16 66m
NAME STATUS ROLES AGE VERSION
node/gke-cluster-1-default-pool-6d42fa0a-1ckn Ready <none> 11m v1.16.15-gke.6000
node/gke-cluster-1-default-pool-6d42fa0a-2j4j Ready <none> 11m v1.16.15-gke.6000
node/gke-cluster-1-default-pool-6d42fa0a-388n Ready <none> 3h33m v1.16.15-gke.6000
node/gke-cluster-1-default-pool-6d42fa0a-5x35 Ready <none> 3h33m v1.16.15-gke.6000
node/gke-cluster-1-default-pool-6d42fa0a-pdfk Ready <none> 3h33m v1.16.15-gke.6000
node/gke-cluster-1-default-pool-6d42fa0a-wqtm Ready <none> 11m v1.16.15-gke.6000
$ kubectl get pdb -A
NAMESPACE NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
kube-system kubedns 1 N/A 1 43m
Scale down the deployment
$ kubectl scale deploy nginx-deployment --replicas=2
deployment.apps/nginx-deployment scaled
After a while (~10-15 minutes), in the event viewer you will find a Decision event, and inside it the information that the node was deleted.
...
scaleDown: {
nodesToBeRemoved: [
0: {
node: {
mig: {
zone: "europe-west2-c"
nodepool: "default-pool"
name: "gke-cluster-1-default-pool-6d42fa0a-grp"
}
name: "gke-cluster-1-default-pool-6d42fa0a-wqtm"
The number of nodes has decreased:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-cluster-1-default-pool-6d42fa0a-2j4j Ready <none> 30m v1.16.15-gke.6000
gke-cluster-1-default-pool-6d42fa0a-388n Ready <none> 3h51m v1.16.15-gke.6000
gke-cluster-1-default-pool-6d42fa0a-5x35 Ready <none> 3h51m v1.16.15-gke.6000
gke-cluster-1-default-pool-6d42fa0a-pdfk Ready <none> 3h51m v1.16.15-gke.6000
Another place where you can confirm it's scaling down is kubectl get events --sort-by='.metadata.creationTimestamp'
Output:
5m16s Normal NodeNotReady node/gke-cluster-1-default-pool-6d42fa0a-wqtm Node gke-cluster-1-default-pool-6d42fa0a-wqtm status is now: NodeNotReady
4m56s Normal NodeNotReady node/gke-cluster-1-default-pool-6d42fa0a-1ckn Node gke-cluster-1-default-pool-6d42fa0a-1ckn status is now: NodeNotReady
4m Normal Deleting node gke-cluster-1-default-pool-6d42fa0a-wqtm because it does not exist in the cloud provider node/gke-cluster-1-default-pool-6d42fa0a-wqtm Node gke-cluster-1-default-pool-6d42fa0a-wqtm event: DeletingNode
3m55s Normal RemovingNode node/gke-cluster-1-default-pool-6d42fa0a-wqtm Node gke-cluster-1-default-pool-6d42fa0a-wqtm event: Removing Node gke-cluster-1-default-pool-6d42fa0a-wqtm from Controller
3m50s Normal Deleting node gke-cluster-1-default-pool-6d42fa0a-1ckn because it does not exist in the cloud provider node/gke-cluster-1-default-pool-6d42fa0a-1ckn Node gke-cluster-1-default-pool-6d42fa0a-1ckn event: DeletingNode
3m45s Normal RemovingNode node/gke-cluster-1-default-pool-6d42fa0a-1ckn Node gke-cluster-1-default-pool-6d42fa0a-1ckn event: Removing Node gke-cluster-1-default-pool-6d42fa0a-1ckn from Controller
Conclusion
By default, kube-system pods prevent CA from removing nodes on which they are running. Users can manually add PDBs for the kube-system pods that can be safely rescheduled elsewhere. It can be achieved using:
kubectl create poddisruptionbudget <pdb name> --namespace=kube-system --selector app=<app name> --max-unavailable 1
A list of possible reasons why CA won't scale down can be found in Cluster Autoscaler - Frequently Asked Questions.
To verify which pods could still block CA downscale, you can use Autoscaler Events.

Can kubernetes provide a verbose description of its scheduling decisions?

When scheduling a Kubernetes Job or Pod, if the Pod can't be placed, the explanation available from kubectl describe pods PODNAME looks like:
Warning FailedScheduling <unknown> default-scheduler 0/172 nodes are available:
1 Insufficient pods, 1 node(s) were unschedulable, 11 Insufficient memory,
30 Insufficient cpu, 32 node(s) didn't match node selector, 97 Insufficient nvidia.com/gpu.
That's useful but a little too vague. I'd like more detail than that.
Specifically, can I list all nodes along with the reason the pod wasn't scheduled onto each particular node?
I was recently changing labels and the node selector and want to determine if I made a mistake somewhere in that process or if the nodes I need really are just busy.
You can find more details related to problems with scheduling a particular Pod in the kube-scheduler logs. If you set up your cluster with the kubeadm tool, kube-scheduler, like the other key components of the cluster, is deployed as a system Pod. You can list such Pods with the following command:
kubectl get pods -n kube-system
which will show you among others your kube-scheduler Pod:
NAME READY STATUS RESTARTS AGE
kube-scheduler-master-ubuntu-18-04 1/1 Running 0 2m37s
Then you can check its logs. In my example the command will look as follows:
kubectl logs kube-scheduler-master-ubuntu-18-04 -n kube-system
You should find there the information you need.
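You can also filter the raw scheduling events directly, without reading the logs (a sketch; <pod-name> is a placeholder):
# all FailedScheduling events across the cluster
kubectl get events --all-namespaces --field-selector reason=FailedScheduling
# events for one particular pod
kubectl get events --field-selector involvedObject.name=<pod-name>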
One more thing...
If you've already verified it, just ignore this tip
Let's start from the beginning...
I've just created a simple job from the example you can find here:
kubectl apply -f https://k8s.io/examples/controllers/job.yaml
job.batch/pi created
If I run:
kubectl get jobs
it shows me:
NAME COMPLETIONS DURATION AGE
pi 0/1 17m 17m
Hmm... completions 0/1? Something definitely went wrong. Let's check it.
kubectl describe job pi
tells me basically nothing. In its events I can see only:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 18m job-controller Created pod: pi-zxp4p
as if everything went well... but we already know it didn't. So let's investigate further. As you probably know, the job-controller creates Pods that run to completion to perform a certain task. From the perspective of the job-controller everything went well (we've just seen it in its events):
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 23m job-controller Created pod: pi-zxp4p
It did its part of the task and reported that everything went fine. But that's only part of the whole task. It left the actual scheduling of the Pod to the kube-scheduler; being just a job-controller, it isn't responsible for (and doesn't even have enough privileges to) schedule the Pod on a particular node. If we run:
kubectl get pods
we can see one Pod in a Pending state:
NAME READY STATUS RESTARTS AGE
pi-zxp4p 0/1 Pending 0 30m
Let's describe it:
kubectl describe pod pi-zxp4p
In events we can see some very important and specific info:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 20s (x24 over 33m) default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
so now we know the actual reason why our Pod couldn't be scheduled.
Pay attention to different fields of the event:
From: default-scheduler - it means that the message originated from our kube-scheduler.
Type: Warning, which isn't as important as Critical or Error, so chances are it may not appear in the kube-scheduler logs if the scheduler was started with the default log verbosity.
You can read here that:
As per the comments, the practical default level is V(2). Developers
and QE environments may wish to run at V(3) or V(4). If you wish to
change the log level, you can pass in -v=X where X is the desired
maximum level to log.
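If you do want those Warning-level scheduling messages in the logs of a kubeadm cluster, a sketch of how the verbosity can be raised (the manifest path below is the kubeadm default; adjust it to your setup):
# kube-scheduler is a static Pod, so edit its manifest on the control-plane node;
# the kubelet restarts the Pod automatically when the file changes
sudo vi /etc/kubernetes/manifests/kube-scheduler.yaml
# in spec.containers[0].command, add the flag:
#   - --v=4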

AWS has per node Pod IP restrictions, pods are stuck at ContainerCreating state

As we all know, AWS has a per-node Pod IP restriction, and Kubernetes doesn't take this into account while scheduling; pods get scheduled onto nodes where no more pod IPs can be allocated, and they get stuck in the ContainerCreating state, as follows:
Normal Scheduled 114s default-scheduler Successfully assigned default/whoami-deployment-9f9c86c4f-r4flx to ip-192-168-15-248.ec2.internal
Warning FailedCreatePodSandBox 111s kubelet, ip-192-168-15-248.ec2.internal Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8d4b5f98f9b600ad9ec486f994fa2f9223d5224842df7f78802616f014b52970" network for pod "whoami-deployment-9f9c86c4f-r4flx": NetworkPlugin cni failed to set up pod "whoami-deployment-9f9c86c4f-r4flx_default" network: add cmd: failed to assign an IP address to container
Normal SandboxChanged 86s (x12 over 109s) kubelet, ip-192-168-15-248.ec2.internal Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 61s (x4 over 76s) kubelet, ip-192-168-15-248.ec2.internal (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "e2a3c54ba7d9a33a45248f7c276f4a2d5b0c8ba6c3deb5184392156b35638553" network for pod "whoami-deployment-9f9c86c4f-r4flx": NetworkPlugin cni failed to set up pod "whoami-deployment-9f9c86c4f-r4flx_default" network: add cmd: failed to assign an IP address to container
So I tried to overcome the issue by tainting the nodes with key=value:NoSchedule (see the sketch below), so that the default scheduler doesn't schedule pods onto nodes which have already reached the pod IP limit, and I deleted all pods which were stuck in the ContainerCreating state. I was hoping that this would stop the scheduler from placing any more pods on the tainted nodes, and that's what happened. But since the pods were no longer being scheduled, I was also hoping the cluster-autoscaler would scale the ASG so my pods would run on new nodes, and that's what didn't happen.
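For reference, a sketch of the taint commands used here (key=value and <node-name> are placeholders):
# prevent new pods without a matching toleration from being scheduled on the node
kubectl taint nodes <node-name> key=value:NoSchedule
# remove the taint again (note the trailing "-")
kubectl taint nodes <node-name> key=value:NoSchedule-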
When I describe the pod I see:
Warning FailedScheduling 40s (x5 over 58s) default-scheduler 0/5 nodes are available: 5 node(s) had taints that the pod didn't tolerate.
Normal NotTriggerScaleUp 5s (x6 over 56s) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 node(s) had taints that the pod didn't tolerate
When I look at cluster-autoscaler logs I see:
I1108 16:30:47.521026 1 event.go:209] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"whoami-deployment-9f9c86c4f-x5h4d", UID:"158cc806-0245-11ea-a67a-0efb4254edc4", APIVersion:"v1", ResourceVersion:"2483839", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 node(s) had taints that the pod didn't tolerate
Now, I tried an alternative way to mark my nodes unschedulable, by removing the above NoSchedule taint and patching the nodes with:
kubectl patch nodes node1.internal -p '{"spec": {"unschedulable": true}}'
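(For reference, this patch sets the same spec.unschedulable field that kubectl cordon does, so the equivalent is simply:)
kubectl cordon node1.internal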
And these are the logs I see in cluster-autoscaler:
I1109 10:47:50.894680 1 static_autoscaler.go:138] Starting main loop
W1109 10:47:50.894719 1 static_autoscaler.go:562] Cluster has no ready nodes.
I1109 10:47:50.901157 1 event.go:209] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"7c949105-0153-11ea-9a39-12e5fc698b6e", APIVersion:"v1", ResourceVersion:"2629645", FieldPath:""}): type: 'Warning' reason: 'ClusterUnhealthy' Cluster has no ready nodes.
So, my idea of overcoming the issue made no sense. How shall I overcome this?
Kubernetes version: 1.14
Cluster Autoscaler: 1.14.6
Let me know if you guys need more details.

istio-pilot on minikube is always in pending state

The istio-pilot pod on my minikube Kubernetes cluster is always in the Pending state. I increased CPU to 4 and memory to 8GB, but the status of the istio-pilot pod is still Pending.
Is a specific change required to run istio on minikube other than the ones mentioned in the documentation?
Resolved the issue. I'm running minikube with VirtualBox, and starting minikube with higher memory and CPU does not take effect until minikube is deleted and started with the new parameters. Without this it was resulting in Insufficient memory.
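A sketch of the commands (the values match those mentioned above; add your --vm-driver flag if needed):
# the settings only take effect on a freshly created VM
minikube delete
minikube start --cpus=4 --memory=8192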
I saw istio-pilot in 1.1 rc3 consume a lot of CPU, and it was in the Pending state due to the following message in kubectl describe <istio-pilot pod name> -n=istio-system:
Warning FailedScheduling 1m (x25 over 3m) default-scheduler 0/2 nodes are available:
1 Insufficient cpu, 1 node(s) had taints that the pod didn't tolerate.
I was able to reduce it by passing --set pilot.resources.requests.cpu=30m when installing istio using helm.
https://github.com/istio/istio/blob/1.1.0-rc.3/install/kubernetes/helm/istio/charts/pilot/values.yaml#L16
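For reference, a sketch of passing that override, assuming the Helm 2 style install from the Istio 1.1 docs (the chart path is relative to the Istio release directory and is an assumption):
helm install install/kubernetes/helm/istio --name istio --namespace istio-system \
  --set pilot.resources.requests.cpu=30m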