Horizontal Pod Autoscaler replicas based on the number of nodes in the cluster - kubernetes

I'm looking for a solution that will scale pods out automatically when nodes join the cluster and scale them back in when nodes are removed.
We are running a WebApp on the nodes, and this requires graceful pod eviction/termination when a node is scheduled to be disconnected.
I was checking the option of using a DaemonSet, but since we use kops for the cluster rolling update, it ignores DaemonSet evictions (the "--ignore-daemonsets" flag is not supported).
As a result, the WebApp "dies" with the node, which is not acceptable for our application.
The ability of a HorizontalPodAutoscaler to override the number of replicas set in the Deployment YAML could solve the problem.
I want to find a way to change minReplicas/maxReplicas in the HorizontalPodAutoscaler YAML dynamically, based on the number of nodes in the cluster:
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: MyWebApp
  minReplicas: "Num of nodes in the cluster"
  maxReplicas: "Num of nodes in the cluster"
Any ideas on how to get the number of nodes and update the HorizontalPodAutoscaler YAML in the cluster accordingly? Or any other solutions for the problem?
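For example, I imagine something along these lines, run on a schedule with cluster access (a rough sketch only, not a tested solution; "my-web-app-hpa" is a placeholder for the real HPA name):
# count the nodes and patch the HPA so min/max replicas match the node count
NODES=$(kubectl get nodes --no-headers | wc -l)
kubectl patch hpa my-web-app-hpa --patch "{\"spec\":{\"minReplicas\":${NODES},\"maxReplicas\":${NODES}}}"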

Have you tried using the nodeSelector spec in the DaemonSet YAML?
If you set a nodeSelector in the YAML, then just before draining a node you can remove the nodeSelector label value from that node and the DaemonSet should scale down gracefully. Likewise, when you add a new node to the cluster, label it with the custom value and the DaemonSet will scale up.
This works for me, so you can try it and confirm with kops.
First: label all your nodes with a custom label that you will always have on your cluster.
Example:
kubectl label nodes k8s-master-1 mylabel=allow_demon_set
kubectl label nodes k8s-node-1 mylabel=allow_demon_set
kubectl label nodes k8s-node-2 mylabel=allow_demon_set
kubectl label nodes k8s-node-3 mylabel=allow_demon_set
Then add a nodeSelector to your DaemonSet YAML.
Example.yaml is used as below; note the added nodeSelector field:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      nodeSelector:
        mylabel: allow_demon_set
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd-elasticsearch
        image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
So the nodes are labeled as below:
$ kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8s-master-1 Ready master 9d v1.17.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master-1,kubernetes.io/os=linux,mylabel=allow_demon_set,node-role.kubernetes.io/master=
k8s-node-1 Ready <none> 9d v1.17.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-1,kubernetes.io/os=linux,mylabel=allow_demon_set
k8s-node-2 Ready <none> 9d v1.17.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-2,kubernetes.io/os=linux,mylabel=allow_demon_set
k8s-node-3 Ready <none> 9d v1.17.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node-3,kubernetes.io/os=linux,mylabel=allow_demon_set
Once you have the correct YAML, start the DaemonSet using it:
$ kubectl create -f Example.yaml
$ kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/fluentd-elasticsearch-jrgl6 1/1 Running 0 20s 10.244.3.19 k8s-node-3 <none> <none>
pod/fluentd-elasticsearch-rgcm2 1/1 Running 0 20s 10.244.0.6 k8s-master-1 <none> <none>
pod/fluentd-elasticsearch-wccr9 1/1 Running 0 20s 10.244.1.14 k8s-node-1 <none> <none>
pod/fluentd-elasticsearch-wxq5v 1/1 Running 0 20s 10.244.2.33 k8s-node-2 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 9d <none>
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
daemonset.apps/fluentd-elasticsearch 4 4 4 4 4 mylabel=allow_demon_set 20s fluentd-elasticsearch quay.io/fluentd_elasticsearch/fluentd:v2.5.2 name=fluentd-elasticsearch
Then, before draining a node, we can simply remove the custom label from it; the pod should scale down gracefully, and then we can drain the node.
$ kubectl label nodes k8s-node-3 mylabel-
Check the DaemonSet and it should have scaled down:
ubuntu@k8s-kube-client:~$ kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/fluentd-elasticsearch-jrgl6 0/1 Terminating 0 2m36s 10.244.3.19 k8s-node-3 <none> <none>
pod/fluentd-elasticsearch-rgcm2 1/1 Running 0 2m36s 10.244.0.6 k8s-master-1 <none> <none>
pod/fluentd-elasticsearch-wccr9 1/1 Running 0 2m36s 10.244.1.14 k8s-node-1 <none> <none>
pod/fluentd-elasticsearch-wxq5v 1/1 Running 0 2m36s 10.244.2.33 k8s-node-2 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 9d <none>
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
daemonset.apps/fluentd-elasticsearch 3 3 3 3 3 mylabel=allow_demon_set 2m36s fluentd-elasticsearch quay.io/fluentd_elasticsearch/fluentd:v2.5.2 name=fluentd-elasticsearch
When a new node is added to the cluster, label it with the same custom label and the DaemonSet will scale up again.
$ kubectl label nodes k8s-node-3 mylabel=allow_demon_set
ubuntu@k8s-kube-client:~$ kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/fluentd-elasticsearch-22rsj 1/1 Running 0 2s 10.244.3.20 k8s-node-3 <none> <none>
pod/fluentd-elasticsearch-rgcm2 1/1 Running 0 5m28s 10.244.0.6 k8s-master-1 <none> <none>
pod/fluentd-elasticsearch-wccr9 1/1 Running 0 5m28s 10.244.1.14 k8s-node-1 <none> <none>
pod/fluentd-elasticsearch-wxq5v 1/1 Running 0 5m28s 10.244.2.33 k8s-node-2 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 9d <none>
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
daemonset.apps/fluentd-elasticsearch 4 4 4 4 4 mylabel=allow_demon_set 5m28s fluentd-elasticsearch quay.io/fluentd_elasticsearch/fluentd:v2.5.2 name=fluentd-elasticsearch
Kindly confirm whether this is what you want to do and whether it works with kops.

Related

Kubernetes daemonset creating two pods instead of one (expected)

I have the following local 2-node kubernetes cluster:
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
srl1 Ready control-plane,master 2d18h v1.21.2 xxx.xxx.12.58 <none> Ubuntu 20.04.2 LTS 5.4.0-80-generic docker://20.10.7
srl2 Ready <none> 2d18h v1.21.3 xxx.xxx.80.72 <none> Ubuntu 18.04.2 LTS 5.4.0-80-generic docker://20.10.2
I am trying to deploy an application using a cluster-creation Python script (https://github.com/hydro-project/cluster/blob/master/hydro/cluster/create_cluster.py).
When it creates a routing node using apps_client.create_namespaced_daemon_set(namespace=util.NAMESPACE, body=yml), it is expected to create a single pod from the routing-ds.yaml file (given below) and assign it to the routing DaemonSet (kind). However, as you can see, it is creating two routing pods, one on every physical node, instead of one. (FYI: my master node can schedule pods.)
akazad@srl1:~/hydro-project/cluster$ kubectl get all -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default pod/management-pod 1/1 Running 0 25m 192.168.190.77 srl2 <none> <none>
default pod/monitoring-pod 1/1 Running 0 25m 192.168.120.71 srl1 <none> <none>
default pod/routing-nodes-9q7dr 1/1 Running 0 24m xxx.xxx.12.58 srl1 <none> <none>
default pod/routing-nodes-kfbnv 1/1 Running 0 24m xxx.xxx.80.72 srl2 <none> <none>
kube-system pod/calico-kube-controllers-7676785684-tpz7q 1/1 Running 0 2d19h 192.168.120.65 srl1 <none> <none>
kube-system pod/calico-node-lnxtb 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/calico-node-mdvpd 1/1 Running 0 2d19h xxx.xxx.80.72 srl2 <none> <none>
kube-system pod/coredns-558bd4d5db-vfghf 1/1 Running 0 2d19h 192.168.120.66 srl1 <none> <none>
kube-system pod/coredns-558bd4d5db-x7jhj 1/1 Running 0 2d19h xxx.xxx.120.67 srl1 <none> <none>
kube-system pod/etcd-srl1 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/kube-apiserver-srl1 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/kube-controller-manager-srl1 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/kube-proxy-l8fds 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
kube-system pod/kube-proxy-szrng 1/1 Running 0 2d19h xxx.xxx.80.72 srl2 <none> <none>
kube-system pod/kube-scheduler-srl1 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
metallb-system pod/controller-6b78bff7d9-t7gjr 1/1 Running 0 2d19h 192.168.190.65 srl2 <none> <none>
metallb-system pod/speaker-qsqnc 1/1 Running 0 2d19h xxx.xxx.12.58 srl1 <none> <none>
metallb-system pod/speaker-s4pp8 1/1 Running 0 2d19h xxx.xxx.80.72 srl2 <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 2d19h <none>
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 2d19h k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
default daemonset.apps/routing-nodes 2 2 2 2 2 <none> 24m routing-container akazad1/srlanna:v2 role=routing
kube-system daemonset.apps/calico-node 2 2 2 2 2 kubernetes.io/os=linux 2d19h calico-node calico/node:v3.14.2 k8s-app=calico-node
kube-system daemonset.apps/kube-proxy 2 2 2 2 2 kubernetes.io/os=linux 2d19h kube-proxy k8s.gcr.io/kube-proxy:v1.21.3 k8s-app=kube-proxy
metallb-system daemonset.apps/speaker 2 2 2 2 2 kubernetes.io/os=linux 2d19h speaker quay.io/metallb/speaker:v0.10.2 app=metallb,component=speaker
However, when it creates a pod directly from management-pod.yaml (given below), it creates just one, as expected.
Why is the DaemonSet creating two pods instead of one?
Code segment where it is supposed to create a DaemonSet of the routing-node kind:
for i in range(len(kinds)):
    kind = kinds[i]

    # Create should only be true when the DaemonSet is being created for the
    # first time -- i.e., when this is called from create_cluster. After that,
    # we can basically ignore this because the DaemonSet will take care of
    # adding pods to created nodes.
    if create:
        fname = 'yaml/ds/%s-ds.yml' % kind
        yml = util.load_yaml(fname, prefix)

        for container in yml['spec']['template']['spec']['containers']:
            env = container['env']
            util.replace_yaml_val(env, 'ROUTING_IPS', route_str)
            util.replace_yaml_val(env, 'ROUTE_ADDR', route_addr)
            util.replace_yaml_val(env, 'SCHED_IPS', sched_str)
            util.replace_yaml_val(env, 'FUNCTION_ADDR', function_addr)
            util.replace_yaml_val(env, 'MON_IPS', mon_str)
            util.replace_yaml_val(env, 'MGMT_IP', management_ip)
            util.replace_yaml_val(env, 'SEED_IP', seed_ip)

        apps_client.create_namespaced_daemon_set(namespace=util.NAMESPACE,
                                                 body=yml)

    # Wait until all pods of this kind are running
    res = []
    while len(res) != expected_counts[i]:
        res = util.get_pod_ips(client, 'role='+kind, is_running=True)

    pods = client.list_namespaced_pod(namespace=util.NAMESPACE,
                                      label_selector='role=' + kind).items

    created_pods = get_current_pod_container_pairs(pods)
I have removed the nodeSelector from all the YAML files, as I am running it on a bare-metal cluster.
1. routing-ds.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: routing-nodes
  labels:
    role: routing
spec:
  selector:
    matchLabels:
      role: routing
  template:
    metadata:
      labels:
        role: routing
    spec:
      #nodeSelector:
      #  role: routing
      hostNetwork: true
      containers:
      - name: routing-container
        image: akazad1/srlanna:v2
        env:
        - name: SERVER_TYPE
          value: r
        - name: MON_IPS
          value: MON_IPS_DUMMY
        - name: REPO_ORG
          value: hydro-project
        - name: REPO_BRANCH
          value: master
2. management-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: management-pod
  labels:
    role: management
spec:
  restartPolicy: Never
  containers:
  - name: management-container
    image: hydroproject/management
    env:
    #- name: AWS_ACCESS_KEY_ID
    #  value: ACCESS_KEY_ID_DUMMY
    #- name: AWS_SECRET_ACCESS_KEY
    #  value: SECRET_KEY_DUMMY
    #- name: KOPS_STATE_STORE
    #  value: KOPS_BUCKET_DUMMY
    - name: HYDRO_CLUSTER_NAME
      value: CLUSTER_NAME
    - name: REPO_ORG
      value: hydro-project
    - name: REPO_BRANCH
      value: master
    - name: ANNA_REPO_ORG
      value: hydro-project
    - name: ANNA_REPO_BRANCH
      value: master
  # nodeSelector:
  #   role: general
Maybe there is a misunderstanding: you have to use kind: Deployment if you want to manage the number of replicas (pods: 1, 2, 3, ... n) on Kubernetes.
A DaemonSet's behavior is to run one pod on each available node in the cluster.
Since your cluster has two nodes, the DaemonSet runs a pod on each of them. If you add another node, the DaemonSet will automatically create a pod on that node as well.
kind: Pod
creates just a single pod, which is its default behavior.
The following are some of the Kubernetes Objects:
pods
ReplicationController (Manages Pods)
Deployment (Manages Pods)
StatefulSets
DaemonSets
You can read more at : https://chkrishna.medium.com/kubernetes-objects-e0a8b93b5cdc
Official document : https://kubernetes.io/docs/concepts/overview/working-with-objects/kubernetes-objects/
If you want to manage pods with a controller, kind: Deployment is best: you can scale the replicas up and down, and you can also set the replica count (1, 2, 3, ...) in the YAML so that that many pods run in the cluster.
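For illustration, a minimal sketch of managing the replica count with a Deployment instead of a DaemonSet (the name and image are reused from the question purely as an example; the --replicas flag assumes a reasonably recent kubectl):
# create a Deployment with one replica, then scale it up and back down
kubectl create deployment routing-nodes --image=akazad1/srlanna:v2 --replicas=1
kubectl scale deployment routing-nodes --replicas=3
kubectl scale deployment routing-nodes --replicas=1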

kubectl not communicating with minikube

Learning kubernetes with kubectl and minikube locally. I can see this via kubectl:
> kubectl get all -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/mongodb-deployment-8f6675bc5-tjwsb 1/1 Running 0 20s 10.1.43.14 chris-x1 <none> <none>
pod/mongo-express-78fcf796b8-9gbsd 1/1 Running 0 20s 10.1.43.15 chris-x1 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 29m <none>
service/mongo-express-service LoadBalancer 10.152.183.254 <pending> 8081:30000/TCP 20s app=mongo-express
service/mongodb-service ClusterIP 10.152.183.115 <none> 27017/TCP 20s app=mongodb
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/mongodb-deployment 1/1 1 1 20s mongodb mongo app=mongodb
deployment.apps/mongo-express 1/1 1 1 21s mongo-express mongo-express app=mongo-express
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicaset.apps/mongodb-deployment-8f6675bc5 1 1 1 20s mongodb mongo app=mongodb,pod-template-hash=8f6675bc5
replicaset.apps/mongo-express-78fcf796b8 1 1 1 21s mongo-express mongo-express app=mongo-express,pod-template-hash=78fcf796b8
But when I launch the minikube dashboard I don't see any pods, deployments, services, etc. It's like they are running off of different clusters. If I paste the YAML configs directly into the minikube dashboard, then I can see everything. So strange... Why?
I can use the minikube kubectl command, but that doesn't seem like how this is supposed to work.
Running Ubuntu 20, kubectl 1.20, and minikube 1.17.
Turns out my kubectl context was set to the microk8s-cluster instead of minikube.
chris@chris-x1 /v/w/p/k8s> kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://127.0.0.1:16443
  name: microk8s-cluster
contexts:
- context:
    cluster: microk8s-cluster
    user: admin
  name: microk8s
current-context: microk8s
kind: Config
preferences: {}
users:
- name: admin
  user:
    token: REDACTED
So I needed to follow these steps to access a dashboard: https://microk8s.io/docs/addon-dashboard
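For reference, switching kubectl back to the minikube cluster is just a context change (a sketch, assuming minikube has already created a context named "minikube" in the kubeconfig):
kubectl config get-contexts          # list the available contexts and the current one
kubectl config use-context minikube  # point kubectl at the minikube cluster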

Hashicorp Consul, Agent/Client access

I am trying to set up Consul on Kubernetes via the Helm chart, https://www.consul.io/docs/k8s/helm
Based on my pre-Kubernetes knowledge: services access Consul via a Consul agent running on each host and listening on the host's IP.
Now I have deployed the Helm chart to the Kubernetes cluster. First, a terminology misunderstanding: Consul Agent vs. Client in this setup? I presume they are the same.
Now, the setup:
Helm chart config (Terraform fragment), nothing specific to the clients/agents and their service:
global:
  name: "consul"
  datacenter: "${var.consul_config.datacenter}"

server:
  storage: "${var.consul_config.storage}"
  connect: false

syncCatalog:
  enabled: true
  default: true
  k8sAllowNamespaces: ['*']
  k8sDenyNamespaces: [${join(",", var.consul_config.k8sDenyNamespaces)}]
The client/agent pods are a DaemonSet, not in host-network mode:
kubectl get pods
NAME READY STATUS RESTARTS AGE
consul-8l587 1/1 Running 0 11h
consul-cfd8z 1/1 Running 0 11h
consul-server-0 1/1 Running 0 11h
consul-server-1 1/1 Running 0 11h
consul-server-2 1/1 Running 0 11h
consul-sync-catalog-8b688ff9b-klqrv 1/1 Running 0 11h
consul-vrmtp 1/1 Running 0 11h
Services
kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
consul ExternalName <none> consul.service.consul <none> 11h
consul-dns ClusterIP 172.20.124.238 <none> 53/TCP,53/UDP 11h
consul-server ClusterIP None <none> 8500/TCP,8301/TCP,8301/UDP,8302/TCP,8302/UDP,8300/TCP,8600/TCP,8600/UDP 11h
consul-ui ClusterIP 172.20.131.29 <none> 80/TCP 11h
Question 1: Where is a service that targets the client (agent) pods, but not the server pods? Did I miss it in the Helm chart?
My plan, since I am not going to use host (Kubernetes node) networking, is:
Find the client/agent service or make my own, so that Consul's consumers can use it. E.g., this is the service address I will specify for the Consul Template init pod, in the config of the consuming application.
kubectl get pods --selector app=consul,component=client,release=consul
consul-8l587 1/1 Running 0 11h
consul-cfd8z 1/1 Running 0 11h
consul-vrmtp 1/1 Running 0 11h
Optional: add topologyKeys to the agent service, so each consumer does not cross a host boundary.
Question 2: Is this the right approach? Or is it different for Consul Kubernetes deployments?
You can use the Kubernetes downward API to inject the IP of the host as an environment variable into your pod.
apiVersion: v1
kind: Pod
metadata:
  name: consul-example
spec:
  containers:
    - name: example
      image: 'consul:latest'
      env:
        - name: HOST_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
      command:
        - '/bin/sh'
        - '-ec'
        - |
          export CONSUL_HTTP_ADDR="${HOST_IP}:8500"
          consul kv put hello world
  restartPolicy: Never
See https://www.consul.io/docs/k8s/installation/install#accessing-the-consul-http-api for more info.
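To try the example, a quick check might look like this (a sketch; it assumes the manifest above is saved as consul-example.yaml and that the client agents actually expose port 8500 on each host, e.g. via a hostPort):
kubectl apply -f consul-example.yaml
kubectl logs consul-example   # should show the result of the "consul kv put" against the host-local agent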

k3s - can't access from one pod to another if pods on different master nodes (HighAvailability setup)

k3s - can't access from one pod to another if pods on different nodes
Update:
I've narrowed the issue down - it's pods that are on other master nodes that can't communicate with those on the original master
pods on rpi4-server1 - the original cluster - can communicate with pods on rpi-worker01 and rpi3-worker02
pods on rpi4-server2 are unable to communicate with the others
I'm trying to run a high-availability cluster with the embedded DB, using flannel / vxlan.
I'm trying to set up a project with 5 services in k3s.
When all of the pods are contained on a single node, they work together fine.
As soon as I add other nodes into the system and pods are deployed to them, the links seem to break.
In troubleshooting I've exec'd into one of the pods and tried to curl another. When they are on the same node this works, if the second service is on another node it doesn't.
I'm sure this is something simple that I'm missing, but I can't work it out! Help appreciated.
Key details:
Using k3s and native traefik
Two rpi4s as servers (High Availability) and two rpi3s as worker nodes
metallb as loadbalancer
Two services, blah-interface and blah-svc, are configured as LoadBalancer to allow external access. The others, blah-server, n34 and test-apis, are NodePort to support debugging, but only really need internal access.
Info on nodes, pods and services....
pi@rpi4-server1:~/Projects/test_demo_2020/test_kube_config/testchart/templates $ sudo kubectl get nodes --all-namespaces -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
rpi4-server1 Ready master 11h v1.17.0+k3s.1 192.168.0.140 <none> Raspbian GNU/Linux 10 (buster) 4.19.75-v7l+ docker://19.3.5
rpi-worker01 Ready,SchedulingDisabled <none> 10h v1.17.0+k3s.1 192.168.0.41 <none> Raspbian GNU/Linux 10 (buster) 4.19.66-v7+ containerd://1.3.0-k3s.5
rpi3-worker02 Ready,SchedulingDisabled <none> 10h v1.17.0+k3s.1 192.168.0.142 <none> Raspbian GNU/Linux 10 (buster) 4.19.75-v7+ containerd://1.3.0-k3s.5
rpi4-server2 Ready master 10h v1.17.0+k3s.1 192.168.0.143 <none> Raspbian GNU/Linux 10 (buster) 4.19.75-v7l+ docker://19.3.5
pi@rpi4-server1:~/Projects/test_demo_2020/test_kube_config/testchart/templates $ sudo kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system helm-install-traefik-l2z6l 0/1 Completed 2 11h 10.42.0.2 rpi4-server1 <none> <none>
test-demo n34-5c7b9475cb-zjlgl 1/1 Running 1 4h30m 10.42.0.32 rpi4-server1 <none> <none>
kube-system metrics-server-6d684c7b5-5wgf9 1/1 Running 3 11h 10.42.0.26 rpi4-server1 <none> <none>
metallb-system speaker-62rkm 0/1 Pending 0 99m <none> rpi-worker01 <none> <none>
metallb-system speaker-2shzq 0/1 Pending 0 99m <none> rpi3-worker02 <none> <none>
metallb-system speaker-2mcnt 1/1 Running 0 99m 192.168.0.143 rpi4-server2 <none> <none>
metallb-system speaker-v8j9g 1/1 Running 0 99m 192.168.0.140 rpi4-server1 <none> <none>
metallb-system controller-65895b47d4-pgcs6 1/1 Running 0 90m 10.42.0.49 rpi4-server1 <none> <none>
test-demo blah-server-858ccd7788-mnf67 1/1 Running 0 64m 10.42.0.50 rpi4-server1 <none> <none>
default nginx2-6f4f6f76fc-n2kbq 1/1 Running 0 22m 10.42.0.52 rpi4-server1 <none> <none>
test-demo blah-interface-587fc66bf9-qftv6 1/1 Running 0 22m 10.42.0.53 rpi4-server1 <none> <none>
test-demo blah-svc-6f8f68f46-gqcbw 1/1 Running 0 21m 10.42.0.54 rpi4-server1 <none> <none>
kube-system coredns-d798c9dd-hdwn5 1/1 Running 1 11h 10.42.0.27 rpi4-server1 <none> <none>
kube-system local-path-provisioner-58fb86bdfd-tjh7r 1/1 Running 31 11h 10.42.0.28 rpi4-server1 <none> <none>
kube-system traefik-6787cddb4b-tgq6j 1/1 Running 0 4h50m 10.42.1.23 rpi4-server2 <none> <none>
default testdemo2020-testchart-6f8d44b496-2hcfc 1/1 Running 1 6h31m 10.42.0.29 rpi4-server1 <none> <none>
test-demo test-apis-75bb68dcd7-d8rrp 1/1 Running 0 7m13s 10.42.1.29 rpi4-server2 <none> <none>
pi@rpi4-server1:~/Projects/test_demo_2020/test_kube_config/testchart/templates $ sudo kubectl get svc --all-namespaces -o wide
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 11h <none>
kube-system kube-dns ClusterIP 10.43.0.10 <none> 53/UDP,53/TCP,9153/TCP 11h k8s-app=kube-dns
kube-system metrics-server ClusterIP 10.43.74.118 <none> 443/TCP 11h k8s-app=metrics-server
kube-system traefik-prometheus ClusterIP 10.43.78.135 <none> 9100/TCP 11h app=traefik,release=traefik
test-demo blah-server NodePort 10.43.224.128 <none> 5055:31211/TCP 10h io.kompose.service=blah-server
default testdemo2020-testchart ClusterIP 10.43.91.7 <none> 80/TCP 10h app.kubernetes.io/instance=testdemo2020,app.kubernetes.io/name=testchart
test-demo traf-dashboard NodePort 10.43.60.155 <none> 8080:30808/TCP 10h io.kompose.service=traf-dashboard
test-demo test-apis NodePort 10.43.248.59 <none> 8075:31423/TCP 7h11m io.kompose.service=test-apis
kube-system traefik LoadBalancer 10.43.168.18 192.168.0.240 80:30688/TCP,443:31263/TCP 11h app=traefik,release=traefik
default nginx2 LoadBalancer 10.43.249.123 192.168.0.241 80:30497/TCP 92m app=nginx2
test-demo n34 NodePort 10.43.171.206 <none> 7474:30474/TCP,7687:32051/TCP 72m io.kompose.service=n34
test-demo blah-interface LoadBalancer 10.43.149.158 192.168.0.242 80:30634/TCP 66m io.kompose.service=blah-interface
test-demo blah-svc LoadBalancer 10.43.19.242 192.168.0.243 5005:30005/TCP,5006:31904/TCP,5002:30685/TCP 51m io.kompose.service=blah-svc
Hi, your issue could be related to the following.
I configured the network under /etc/systemd/network/eth0.network (the filename may differ in your case, since I am using Arch Linux on all the Pis):
[Match]
Name=eth0

[Network]
Address=x.x.x.x/24   # ip of node
Gateway=x.x.x.x      # ip of gateway router
Domains=default.svc.cluster.local svc.cluster.local cluster.local
DNS=10.x.x.x x.x.x.x # k3s dns ip, ip of gateway router
After that I removed the 10.x.x.x routes with ip route del 10.x.x.x dev [flannel|cni0] on every node and restarted them.
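To confirm whether cross-node pod traffic works after the change, a simple check might look like this (a sketch; the pod names are made up and the node names are taken from the question):
# pin one test pod to each master node, then ping across them
kubectl run net-test-a --image=busybox --restart=Never --overrides='{"apiVersion":"v1","spec":{"nodeName":"rpi4-server1"}}' -- sleep 3600
kubectl run net-test-b --image=busybox --restart=Never --overrides='{"apiVersion":"v1","spec":{"nodeName":"rpi4-server2"}}' -- sleep 3600
kubectl get pods -o wide                       # note the pod IP of net-test-b
kubectl exec net-test-a -- ping -c 3 <pod IP of net-test-b>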

How to reserve certain worker nodes for a namespace

I would like to reserve some worker nodes for a namespace. I have seen these notes on Stack Overflow and Medium:
How to assign a namespace to certain nodes?
https://medium.com/@alejandro.ramirez.ch/reserving-a-kubernetes-node-for-specific-nodes-e75dc8297076
I understand we can use taints and nodeSelector to achieve that.
My question is: if people get to know the details of the nodeSelector or taint, how can we prevent them from deploying pods onto these dedicated worker nodes?
thank you
To accomplish what you need, basically you have to use taints.
Let's suppose you have a Kubernetes cluster with one Master and 2 Worker nodes:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
knode01 Ready <none> 8d v1.16.2
knode02 Ready <none> 8d v1.16.2
kubemaster Ready master 8d v1.16.2
As an example, I'll set up knode01 as prod and knode02 as dev.
$ kubectl taint nodes knode01 key=prod:NoSchedule
$ kubectl taint nodes knode02 key=dev:NoSchedule
To run a pod on these nodes, we have to specify a toleration in the spec section of your YAML file:
apiVersion: v1
kind: Pod
metadata:
  name: pod1
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "dev"
    effect: "NoSchedule"
This pod (pod1) will always run on knode02 because it is set up as dev. If we want to run it on prod, our tolerations should look like this:
tolerations:
- key: "key"
  operator: "Equal"
  value: "prod"
  effect: "NoSchedule"
Since we have only 2 nodes and both are dedicated to either prod or dev, if we try to run a pod without specifying tolerations, the pod will stay in a Pending state:
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod0 1/1 Running 0 21m 192.168.25.156 knode01 <none> <none>
pod1 1/1 Running 0 20m 192.168.32.83 knode02 <none> <none>
pod2 1/1 Running 0 18m 192.168.25.157 knode01 <none> <none>
pod3 1/1 Running 0 17m 192.168.32.84 knode02 <none> <none>
shell-demo 0/1 Pending 0 16m <none> <none> <none> <none>
To remove a taint:
$ kubectl taint nodes knode02 key:NoSchedule-
This is how it can be done:
Add a new label, say ns=reserved, to a specific worker node.
Add a taint to that node and matching tolerations to the pods you want to target onto it (a command sketch follows below).
Define RBAC roles and role bindings in that namespace to control what other users can do.
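For the first two steps, a minimal command sketch (the node name knode01 is reused from the earlier example; pods intended for the reserved namespace would then need a matching nodeSelector and toleration):
kubectl label nodes knode01 ns=reserved
kubectl taint nodes knode01 ns=reserved:NoSchedule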