How to reserve certain worker nodes for a namespace - kubernetes

I would like to reserve some worker nodes for a namespace. I have seen the notes on Stack Overflow and Medium:
How to assign a namespace to certain nodes?
https://medium.com/@alejandro.ramirez.ch/reserving-a-kubernetes-node-for-specific-nodes-e75dc8297076
I understand we can use taints and nodeSelector to achieve that.
My question is: if people get to know the details of the nodeSelector or taint, how can we prevent them from deploying pods onto these dedicated worker nodes?
Thank you.

To accomplish what you need, you basically have to use taints.
Let's suppose you have a Kubernetes cluster with one Master and 2 Worker nodes:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
knode01 Ready <none> 8d v1.16.2
knode02 Ready <none> 8d v1.16.2
kubemaster Ready master 8d v1.16.2
As an example, I'll set up knode01 as prod and knode02 as dev.
$ kubectl taint nodes knode01 key=prod:NoSchedule
$ kubectl taint nodes knode02 key=dev:NoSchedule
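To double-check that the taints are in place (just a quick verification step), you can describe the nodes and filter for taints:
$ kubectl describe node knode01 knode02 | grep -i taints
You should see key=prod:NoSchedule on knode01 and key=dev:NoSchedule on knode02.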
To run a pod on these nodes, we have to specify a toleration in the spec section of your YAML file:
apiVersion: v1
kind: Pod
metadata:
  name: pod1
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "dev"
    effect: "NoSchedule"
This pod (pod1) will always run on knode02, because that node is set up as dev. If we want to run it on prod, the tolerations should look like this:
tolerations:
- key: "key"
  operator: "Equal"
  value: "prod"
  effect: "NoSchedule"
Since we have only 2 nodes and both are dedicated to either prod or dev, if we try to run a pod without specifying any tolerations, the pod will stay in a Pending state:
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod0 1/1 Running 0 21m 192.168.25.156 knode01 <none> <none>
pod1 1/1 Running 0 20m 192.168.32.83 knode02 <none> <none>
pod2 1/1 Running 0 18m 192.168.25.157 knode01 <none> <none>
pod3 1/1 Running 0 17m 192.168.32.84 knode02 <none> <none>
shell-demo 0/1 Pending 0 16m <none> <none> <none> <none>
To remove a taint:
$ kubectl taint nodes knode02 key:NoSchedule-

This is how it can be done:
Add a new label, say ns=reserved, to the specific worker nodes.
Add a taint and matching tolerations to target specific pods onto these worker nodes.
Define RBAC roles and role bindings in that namespace to control what other users can do; a minimal sketch is shown below.
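As a hedged sketch of that last point (the namespace prod-ns, the Role name workload-deployer, and the group prod-team are placeholders, not values from the question), a Role and RoleBinding could look like this:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: prod-ns
  name: workload-deployer
rules:
- apiGroups: ["", "apps", "batch"]
  resources: ["pods", "deployments", "jobs"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: prod-ns
  name: workload-deployer-binding
subjects:
- kind: Group
  name: prod-team
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: workload-deployer
  apiGroup: rbac.authorization.k8s.io
Users outside prod-team get no rights in prod-ns, so even if they know the taint key and value they cannot create pods in that namespace, while the taints and tolerations keep workloads from other namespaces off the reserved nodes.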

Related

Control job deletions

We have cronjob monitoring in our cluster. If a pod has not appeared in 24 hours, it means the cronjob hasn't run and we need to alert. But sometimes, due to garbage collection, the pod is deleted (even though the job completed successfully). How can we keep all pods and avoid garbage collection? I know about finalizers, but it looks like they don't work in this case.
Posting this as an answer since it explains one reason why this can happen.
Answer
Cloud Kubernetes clusters have node autoscaling policies, and node pools can also be scaled up/down manually.
A CronJob creates a Job for each run, which in turn creates a corresponding pod. Pods are assigned to specific nodes, so if a node with pods assigned to it is removed due to autoscaling or manual scaling, those pods are gone. The Jobs, however, are preserved, since they are stored in etcd.
There are two fields which control the number of jobs kept in the history:
.spec.successfulJobsHistoryLimit - set to 3 by default
.spec.failedJobsHistoryLimit - set to 1 by default
If either is set to 0, the corresponding jobs are removed right after they finish.
Jobs History Limits
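For example (a minimal fragment, not a complete manifest), keeping no history at all, which also removes the pods as soon as the jobs are cleaned up, would look like this:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: test-cronjob
spec:
  schedule: "*/2 * * * *"
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0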
How it happens in practice
I have a GCP GKE cluster with two nodes:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-cluster-xxxx Ready <none> 15h v1.21.3-gke.2001
gke-cluster-yyyy Ready <none> 3d20h v1.21.3-gke.2001
cronjob.yaml for testing:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: test-cronjob
spec:
  schedule: "*/2 * * * *"
  successfulJobsHistoryLimit: 5
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: test
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
Pods created:
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-cronjob-27253914-mxnzg 0/1 Completed 0 8m59s 10.24.0.22 gke-cluster-4-xxxx <none> <none>
test-cronjob-27253916-88cjn 0/1 Completed 0 6m59s 10.24.0.25 gke-cluster-4-xxxx <none> <none>
test-cronjob-27253918-hdcg9 0/1 Completed 0 4m59s 10.24.0.29 gke-cluster-4-xxxx <none> <none>
test-cronjob-27253920-shnnp 0/1 Completed 0 2m59s 10.24.1.15 gke-cluster-4-yyyy <none> <none>
test-cronjob-27253922-cw5gp 0/1 Completed 0 59s 10.24.1.18 gke-cluster-4-yyyy <none> <none>
Scaling down one node:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-cluster-4-xxxx NotReady,SchedulingDisabled <none> 16h v1.21.3-gke.2001
gke-cluster-4-yyyy Ready <none> 3d21h v1.21.3-gke.2001
And getting pods now:
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-cronjob-27253920-shnnp 0/1 Completed 0 7m47s 10.24.1.15 gke-cluster-4-yyyy <none> <none>
test-cronjob-27253922-cw5gp 0/1 Completed 0 5m47s 10.24.1.18 gke-cluster-4-yyyy <none> <none>
Previously completed pods on the first node are gone now.
Jobs are still in place:
$ kubectl get jobs
NAME COMPLETIONS DURATION AGE
test-cronjob-27253914 1/1 1s 13m
test-cronjob-27253916 1/1 2s 11m
test-cronjob-27253918 1/1 1s 9m55s
test-cronjob-27253920 1/1 34s 7m55s
test-cronjob-27253922 1/1 2s 5m55s
How it can be solved
Changing the monitoring alert to look at job completion is a much more precise method, and it is independent of any cluster node scaling actions.
E.g. I can still retrieve the result of job test-cronjob-27253916 even though its corresponding pod has been deleted:
$ kubectl get job test-cronjob-27253916 -o jsonpath='{.status.succeeded}'
1
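If your monitoring needs to look at all jobs at once, a similar read-only query (a sketch using kubectl's custom-columns output) lists each job with its completion status and time:
$ kubectl get jobs -o custom-columns=NAME:.metadata.name,SUCCEEDED:.status.succeeded,COMPLETED:.status.completionTime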
Useful links:
Jobs history limits
Garbage collection in kubernetes
TTL controller for finished resources

Pods affinity to different nodes in StatefulSet

I am creating StatefulSets and I want the pods within one StatefulSet to be distributed across different nodes of the k8s cluster. In my case, one StatefulSet is one database replica set.
sts.Spec.Template.Labels["mydb.io/replicaset-uuid"] = replicasetUUID.String()
sts.Spec.Template.Spec.Affinity.PodAntiAffinity = &corev1.PodAntiAffinity{
    RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{
        {
            LabelSelector: &metav1.LabelSelector{
                MatchExpressions: []metav1.LabelSelectorRequirement{
                    {
                        Key:      "mydb.io/replicaset-uuid",
                        Operator: metav1.LabelSelectorOpIn,
                        Values:   []string{replicasetUUID.String()},
                    },
                },
            },
            TopologyKey: "kubernetes.io/hostname",
        },
    },
}
However, with these settings I get the opposite: storage-0-0 and storage-0-1 belong to the same replica set and end up on the same node.
Moreover, they carry exactly the same mydb.io/replicaset-uuid label:
$ kubectl -n mydb get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
storage-0-0 1/1 Running 0 40m x.x.x.x kubernetes-cluster-x-main-0 <none> <none>
storage-0-1 1/1 Running 0 39m x.x.x.x kubernetes-cluster-x-main-0 <none> <none>
storage-1-0 1/1 Running 0 40m x.x.x.x kubernetes-cluster-x-slave-0 <none> <none>
storage-1-1 1/1 Running 0 40m x.x.x.x kubernetes-cluster-x-slave-0 <none> <none>
mydb-operator-58c9bfbb9b-7djml 1/1 Running 0 46m x.x.x.x kubernetes-cluster-x-slave-0 <none> <none>
It works correctly, as @jesmart wrote in the comment:
The setup described in the question works correctly; I had just pointed to the wrong image for the application.
I suggest using a podAntiAffinity rule in the StatefulSet definition to deploy your application so that no two instances are located on the same host.
Reference: An example of a pod that uses pod affinity
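For reference, the rule that the Go snippet above builds would look roughly like this in plain StatefulSet YAML (a sketch; <replicaset-uuid> is a placeholder for the value the operator generates):
spec:
  template:
    metadata:
      labels:
        mydb.io/replicaset-uuid: "<replicaset-uuid>"
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: mydb.io/replicaset-uuid
                operator: In
                values:
                - "<replicaset-uuid>"
            topologyKey: kubernetes.io/hostname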

Hashicorp Consul, Agent/Client access

I am trying to set up Consul on Kubernetes via the Helm chart, https://www.consul.io/docs/k8s/helm
Based on my pre-Kubernetes knowledge: services access Consul via a Consul Agent running on each host and listening on the host's IP.
Now I have deployed the Helm chart to the Kubernetes cluster. First, a terminology question: Consul Agent vs Client in this setup? I presume they are the same thing.
Now, the setup:
Helm chart config (Terraform fragment), nothing specific to the clients/agents and their service:
global:
  name: "consul"
  datacenter: "${var.consul_config.datacenter}"
server:
  storage: "${var.consul_config.storage}"
  connect: false
syncCatalog:
  enabled: true
  default: true
  k8sAllowNamespaces: ['*']
  k8sDenyNamespaces: [${join(",", var.consul_config.k8sDenyNamespaces)}]
The client/agent pods are a DaemonSet, not in host network mode:
kubectl get pods
NAME READY STATUS RESTARTS AGE
consul-8l587 1/1 Running 0 11h
consul-cfd8z 1/1 Running 0 11h
consul-server-0 1/1 Running 0 11h
consul-server-1 1/1 Running 0 11h
consul-server-2 1/1 Running 0 11h
consul-sync-catalog-8b688ff9b-klqrv 1/1 Running 0 11h
consul-vrmtp 1/1 Running 0 11h
Services
kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
consul ExternalName <none> consul.service.consul <none> 11h
consul-dns ClusterIP 172.20.124.238 <none> 53/TCP,53/UDP 11h
consul-server ClusterIP None <none> 8500/TCP,8301/TCP,8301/UDP,8302/TCP,8302/UDP,8300/TCP,8600/TCP,8600/UDP 11h
consul-ui ClusterIP 172.20.131.29 <none> 80/TCP 11h
Question 1: Where is the service that targets the client (agent) pods, but not the server pods? Did I miss it in the Helm chart?
My plan, since I am not going to use host (Kubernetes node) networking, is:
Find the client/agent service or make my own, so it can be used by Consul's consumers. E.g., this is the service address I would specify for the Consul Template init pod in the config-consuming application:
kubectl get pods --selector app=consul,component=client,release=consul
consul-8l587 1/1 Running 0 11h
consul-cfd8z 1/1 Running 0 11h
consul-vrmtp 1/1 Running 0 11h
Optional: add topologyKeys to the agent service, so each consumer does not cross the host boundary.
Question 2: Is this the right approach? Or is it done differently for Consul Kubernetes deployments?
You can use the Kubernetes downward API to inject the IP of the host as an environment variable into your pod.
apiVersion: v1
kind: Pod
metadata:
  name: consul-example
spec:
  containers:
  - name: example
    image: 'consul:latest'
    env:
    - name: HOST_IP
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP
    command:
    - '/bin/sh'
    - '-ec'
    - |
      export CONSUL_HTTP_ADDR="${HOST_IP}:8500"
      consul kv put hello world
  restartPolicy: Never
See https://www.consul.io/docs/k8s/installation/install#accessing-the-consul-http-api for more info.
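As a quick check (assuming the pod name from the example above), the pod's logs should show whether the kv write against the local agent succeeded:
$ kubectl logs consul-example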

Horizontal Pod Autoscaler replicas based on the amount of the nodes in the cluster

I'm looking for a solution that will scale pods out automatically when nodes join the cluster and scale them back in when nodes are deleted.
We are running a WebApp on the nodes, and this requires graceful pod eviction/termination when a node is scheduled to be disconnected.
I was checking the option of using a DaemonSet, but since we are using kops for the cluster rolling update, it ignores DaemonSet evictions (the flag "--ignore-daemionset" is not supported).
As a result the WebApp "dies" with the node, which is not acceptable for our application.
The ability of the HorizontalPodAutoscaler to override the number of replicas set in the Deployment YAML could solve the problem.
I want to find a way to change minReplicas/maxReplicas in the HorizontalPodAutoscaler YAML dynamically, based on the number of nodes in the cluster.
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: MyWebApp
  minReplicas: "Num of nodes in the cluster"
  maxReplicas: "Num of nodes in the cluster"
Any ideas on how to get the number of nodes and update the HorizontalPodAutoscaler accordingly? Or any other solutions to the problem?
Have you tried using the nodeSelector spec in the DaemonSet YAML?
If you have a nodeSelector set in the YAML and, just before draining, you remove the corresponding label from the node, the DaemonSet should scale down gracefully. The same works in reverse: when you add a new node to the cluster, label it with the custom value and the DaemonSet will scale up.
This works for me, so you can try it and confirm it with kops.
First: label all your nodes with a custom label that you will always have on your cluster.
Example:
kubectl label nodes k8s-master-1 mylabel=allow_demon_set
kubectl label nodes k8s-node-1 mylabel=allow_demon_set
kubectl label nodes k8s-node-2 mylabel=allow_demon_set
kubectl label nodes k8s-node-3 mylabel=allow_demon_set
Then add a nodeSelector to your DaemonSet YAML.
Example.yaml used below (note the added nodeSelector field):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      nodeSelector:
        mylabel: allow_demon_set
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd-elasticsearch
        image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
So nodes are labeled as below
$ kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8s-master-1 Ready master 9d v1.17.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master-1,kubernetes.io/os=linux,mylabel=allow_demon_set,node-role.kubernetes.io/master=
k8s-node-1 Ready <none> 9d v1.17.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux,mylabel=allow_demon_set
k8s-node-2 Ready <none> 9d v1.17.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-2,kubernetes.io/os=linux,mylabel=allow_demon_set
k8s-node-3 Ready <none> 9d v1.17.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-3,kubernetes.io/os=linux,mylabel=allow_demon_set
Once you have the correct YAML, start the DaemonSet with it:
$ kubectl create -f Example.yaml
$ kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/fluentd-elasticsearch-jrgl6 1/1 Running 0 20s 10.244.3.19 k8s-node-3 <none> <none>
pod/fluentd-elasticsearch-rgcm2 1/1 Running 0 20s 10.244.0.6 k8s-master-1 <none> <none>
pod/fluentd-elasticsearch-wccr9 1/1 Running 0 20s 10.244.1.14 k8s-node-1 <none> <none>
pod/fluentd-elasticsearch-wxq5v 1/1 Running 0 20s 10.244.2.33 k8s-node-2 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 9d <none>
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
daemonset.apps/fluentd-elasticsearch 4 4 4 4 4 mylabel=allow_demon_set 20s fluentd-elasticsearch quay.io/fluentd_elasticsearch/fluentd:v2.5.2 name=fluentd-elasticsearch
Then, before draining a node, we can just remove the custom label from it; the pods should terminate gracefully, and then we can drain the node.
$ kubectl label nodes k8s-node-3 mylabel-
Check the DaemonSet and it should have scaled down:
ubuntu@k8s-kube-client:~$ kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/fluentd-elasticsearch-jrgl6 0/1 Terminating 0 2m36s 10.244.3.19 k8s-node-3 <none> <none>
pod/fluentd-elasticsearch-rgcm2 1/1 Running 0 2m36s 10.244.0.6 k8s-master-1 <none> <none>
pod/fluentd-elasticsearch-wccr9 1/1 Running 0 2m36s 10.244.1.14 k8s-node-1 <none> <none>
pod/fluentd-elasticsearch-wxq5v 1/1 Running 0 2m36s 10.244.2.33 k8s-node-2 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 9d <none>
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
daemonset.apps/fluentd-elasticsearch 3 3 3 3 3 mylabel=allow_demon_set 2m36s fluentd-elasticsearch quay.io/fluentd_elasticsearch/fluentd:v2.5.2 name=fluentd-elasticsearch
Now, when a new node is added to the cluster, label it with the same custom label and the DaemonSet will scale up:
$ kubectl label nodes k8s-node-3 mylabel=allow_demon_set
ubuntu@k8s-kube-client:~$ kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/fluentd-elasticsearch-22rsj 1/1 Running 0 2s 10.244.3.20 k8s-node-3 <none> <none>
pod/fluentd-elasticsearch-rgcm2 1/1 Running 0 5m28s 10.244.0.6 k8s-master-1 <none> <none>
pod/fluentd-elasticsearch-wccr9 1/1 Running 0 5m28s 10.244.1.14 k8s-node-1 <none> <none>
pod/fluentd-elasticsearch-wxq5v 1/1 Running 0 5m28s 10.244.2.33 k8s-node-2 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 9d <none>
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
daemonset.apps/fluentd-elasticsearch 4 4 4 4 4 mylabel=allow_demon_set 5m28s fluentd-elasticsearch quay.io/fluentd_elasticsearch/fluentd:v2.5.2 name=fluentd-elasticsearch
Kindly confirm if this is what you want to do and whether it works with kops.
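If you would rather stick with the HorizontalPodAutoscaler approach from the question, a rough sketch (the HPA name my-webapp-hpa is a placeholder, and this is untested with kops) is a small script, run periodically or from a CronJob, that counts the nodes and patches the HPA:
$ NODES=$(kubectl get nodes --no-headers | wc -l)
$ kubectl patch hpa my-webapp-hpa --patch "{\"spec\": {\"minReplicas\": $NODES, \"maxReplicas\": $NODES}}"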

Kubernetes coredns pods stuck in Pending status. Cannot start the dashboard

I am building a Kubernetes cluster following this tutorial, and I am having trouble accessing the Kubernetes dashboard. I already created another question about it, which you can see here, but while digging into my cluster I started to think that the problem might be somewhere else, and that's why I am creating a new question.
I start my master by running the following commands:
> kubeadm reset
> kubeadm init --apiserver-advertise-address=[MASTER_IP] > file.txt
> tail -2 file.txt > join.sh # I keep this file for later
> kubectl apply -f https://git.io/weave-kube/
> kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 2m46s
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 2m46s
etcd-kubemaster 1/1 Running 0 93s
kube-apiserver-kubemaster 1/1 Running 0 93s
kube-controller-manager-kubemaster 1/1 Running 0 113s
kube-proxy-lxhvs 1/1 Running 0 2m46s
kube-scheduler-kubemaster 1/1 Running 0 93s
Here we can see that I have two coredns pods stuck in the Pending state forever, and when I run the command:
> kubectl -n kube-system describe pod coredns-fb8b8dccf-kb2zq
I can see the following Warning in the Events part:
FailedScheduling: 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
Since it is a Warning and not an Error, and since, as a Kubernetes newbie, taints did not mean much to me, I tried to connect a node to the master (using the previously saved command):
> cat join.sh
kubeadm join [MASTER_IP]:6443 --token [TOKEN] \
--discovery-token-ca-cert-hash sha256:[ANOTHER_TOKEN]
> ssh [USER]@[WORKER_IP] 'bash' < join.sh
This node has joined the cluster.
On the master, I check that the node is connected:
> kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubemaster NotReady master 13m v1.14.1
kubeslave1 NotReady <none> 31s v1.14.1
And I check my pods:
> kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 14m
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 14m
etcd-kubemaster 1/1 Running 0 13m
kube-apiserver-kubemaster 1/1 Running 0 13m
kube-controller-manager-kubemaster 1/1 Running 0 13m
kube-proxy-lxhvs 1/1 Running 0 14m
kube-proxy-xllx4 0/1 ContainerCreating 0 2m16s
kube-scheduler-kubemaster 1/1 Running 0 13m
We can see that another kube-proxy pod has been created and is stuck in ContainerCreating status.
And when I do a describe again:
kubectl -n kube-system describe pod kube-proxy-xllx4
I can see multiple identical Warnings in the Events part:
FailedCreatePodSandBox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.1": Get https://k8s.gcr.io/v1/_ping: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:43133->[::1]:53: read: connection refused
Here are my repositories:
docker image ls
REPOSITORY TAG
k8s.gcr.io/kube-proxy v1.14.1
k8s.gcr.io/kube-apiserver v1.14.1
k8s.gcr.io/kube-controller-manager v1.14.1
k8s.gcr.io/kube-scheduler v1.14.1
k8s.gcr.io/coredns 1.3.1
k8s.gcr.io/etcd 3.3.10
k8s.gcr.io/pause 3.1
And so, for the dashboard part, I tried to start it with the command
> kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/aio/deploy/recommended/kubernetes-dashboard.yaml
But the dashboard pod is stuck in Pending state.
kubectl -n kube-system get pod
NAME READY STATUS RESTARTS AGE
coredns-fb8b8dccf-kb2zq 0/1 Pending 0 40m
coredns-fb8b8dccf-nnc5n 0/1 Pending 0 40m
etcd-kubemaster 1/1 Running 0 38m
kube-apiserver-kubemaster 1/1 Running 0 38m
kube-controller-manager-kubemaster 1/1 Running 0 39m
kube-proxy-lxhvs 1/1 Running 0 40m
kube-proxy-xllx4 0/1 ContainerCreating 0 27m
kube-scheduler-kubemaster 1/1 Running 0 38m
kubernetes-dashboard-5f7b999d65-qn8qn 1/1 Pending 0 8s
So, even though my problem originally was that I cannot access my dashboard, I guess that the real problem is deeper than that.
I know I just put a lot of information here, but I am a k8s beginner and I am completely lost on this.
I experienced this issue with coredns pods stuck in Pending when setting up my own cluster, and I resolved it by adding a pod network.
It looks like, because no network addon is installed, the nodes are tainted as not-ready. Installing the addon removes the taints and the pods are able to schedule. In my case, adding flannel fixed the issue.
EDIT: There is a note about this in the official k8s documentation - Create cluster with kubeadm:
The network must be deployed before any applications. Also, CoreDNS
will not start up before a network is installed. kubeadm only
supports Container Network Interface (CNI) based networks (and does
not support kubenet).
Actually, it is the opposite of a deep or serious issue; this is a trivial one. Whenever you see a pod stuck in the Pending state, it means the scheduler is having a hard time scheduling the pod, mostly because there are not enough resources on the node.
In your case, the node has a taint and your pod doesn't have the matching toleration. What you have to do is describe the node and get the taint:
kubectl describe node | grep -i taints
Note: you might have more than one taint, so you might want to do kubectl describe no NODE, since with grep you will only see the first taint.
Once you get the taint, which will be something like hello=world:NoSchedule (meaning key=value:effect), you will have to add a tolerations section to your Deployment. This is an example Deployment so you can see what it should look like:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 10
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        ports:
        - containerPort: 80
          name: http
      tolerations:
      - effect: NoExecute #NoSchedule, PreferNoSchedule
        key: node
        operator: Equal
        value: not-ready
        tolerationSeconds: 3600
As you can see, there is a tolerations section in the YAML. So if I had a node with the node=not-ready:NoExecute taint, no pod would be able to run on that node unless it had this toleration.
You can also remove the taint if you don't need it. To remove a taint, describe the node, get the key of the taint, and run:
kubectl taint node NODE key-
Hope it makes sense. Just add this section to your Deployment and it will work.
Set up the flannel network tool.
Running commands:
$ sysctl net.bridge.bridge-nf-call-iptables=1
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml
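Once the network addon is deployed, the coredns pods should move from Pending to Running, which you can watch with:
$ kubectl -n kube-system get pods -w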