How to deregister a kubernetes node from a kubernetes cluster - kubernetes

I have a node mistakenly registered on a cluster B while it is actually serving for cluster A.
Here 'registered on a cluster B' means I can see the node from kubectl get node from cluster B.
I want to deregister this node from cluster B, but keep the node intact.
I know regular process to delete a node is:
kubectl drain node xxx
kubectl delete node xxx
# on node
kubeadm reset
But I do not want pods on the node from cluster A to be deleted or transfered. And I want to make sure the node would not self-register to cluster B afterwards.
To be clear, let's say, cluster A has Pod A on the node, cluster B has Pod B on the node as well, I want to delete node from cluster B, but keep Pod A intact. (By the way, can I see Pod A from cluster B?)
Thank you in advance!

To deregister the node without removing any pod you run below command
kubectl delete node nodename
After this is done the node will not appear in kubectl get nodes
For the node to not self register again stop the kubelet process on that node by logging into that node and using below command.
systemctl stop kubelet

As this case has been already clarified I decided to publish a Community Wiki answer based on the following comment:
#mario nvm, I thought different clusters in one node affect each
other, actually they do not, they just share container runtime which
is more like 'read-only', and they have different kubelets of
themselves listening on different port. – Li Ziyan Aug 17 at 5:29
to make it clear also for other users what was actually the issue here and how it has been solved or simply clarified.
So if you design your infrastructure in such a way that you use one physical (or virtual) machine as Node for more than one kubernetes clusters (which I believe is not very common case) the infrastructure looks as follows:
Components that are shared:
physical (or virtual) node
common container runtime environment (e.g. docker)
Components that are separate:
two separate kubelets. Although they are running on the same physical/virtual node they are configured to listen on different ports and are registered within two master Nodes (or more specifically two different kube-apiservers being part of two different kubernetes control planes)
two logically separate, independent kubernetes Nodes which, although they are configured on the same physical node/host, are logically completely separate kubernetes Nodes, being part of two completely different kubernetes clusters that don't interfere with each other in any way.
I hope it helps to clarify possible confusion about this question and maybe help someone in case they have similar doubts.

Related

Running other non-cluster containers on k8s node

I have a k8s cluster that runs the main workload and has a lot of nodes.
I also have a node (I call it the special node) that some of special container are running on that that is NOT part of the cluster. The node has access to some resources that are required for those special containers.
I want to be able to manage containers on the special node along with the cluster, and make it possible to access them inside the cluster, so the idea is to add the node to the cluster as a worker node and taint it to prevent normal workloads to be scheduled on it, and add tolerations on the pods running special containers.
The idea looks fine, but there may be a problem. There will be some other containers and non-container daemons and services running on the special node that are not managed by the cluster (they belong to other activities that have to be separated from the cluster). I'm not sure that will be a problem, but I have not seen running non-cluster containers along with pod containers on a worker node before, and I could not find a similar question on the web about that.
So please enlighten me, is it ok to have non-cluster containers and other daemon services on a worker node? Does is require some cautions, or I'm just worrying too much?
Ahmad from the above description, I could understand that you are trying to deploy a kubernetes cluster using kudeadm or minikube or any other similar kind of solution. In this you have some servers and in those servers one is having some special functionality like GPU etc., for deploying your special pods you can use node selector and I hope you are already doing this.
Coming to running separate container runtime on one of these nodes you need to consider two points mainly
This can be done and if you didn’t integrated the container runtime with
kubernetes it will be one more software that is running on your server
let’s say you used kubeadm on all the nodes and you want to run docker
containers this will be separate provided you have drafted a proper
architecture and configured separate isolated virtual network
accordingly.
Now comes the storage part, you need to create separate storage volumes
for kubernetes and container runtime separately because if any one
software gets failed or corrupted it should not affect the second one and
also for providing the isolation.
If you maintain proper isolation starting from storage to network then you can run both kubernetes and container runtime separately however it is not a suggested way of implementation for production environments.

What is minikube config specifying?

According to the minikube handbook the configuration commands are used to "Configure your cluster". But what does that mean?
If I set cpus and memory then are these the max values the cluster as a whole can ever consume?
Are these the values it will reserve on the host machine in preparation for use?
Are these the values that are handed to the control plane container/VM and now I have to specify more resources when making a worker node?
What if I want to add another machine (VM or bare metal) and add its resources in the form of a worker node to the cluster? From the looks of it I would have to delete that cluster, change the configuration, then start a new cluster with the new configuration. That doesn't seem scalable.
Thanks for the help in advance.
Answering the question:
If I set cpus and memory then are these the max values the cluster as a whole can ever consume?
In short. It will be a limit for the whole resource (either a VM, a container, etc. depending on a --driver used). It will be used for the underlying OS, Kubernetes components and the workload that you are trying to run on it.
Are these the values it will reserve on the host machine in preparation for use?
I'd reckon this would be related to the --driver you are using and how its handling the resources. I personally doubt it's reserving the 100% of CPU and memory you've passed in the $ minikube start and I'm more inclined to the idea that it uses how much it needs during specific operations.
Are these the values that are handed to the control plane container/VM and now I have to specify more resources when making a worker node?
By default, when you create a minikube instance with: $ minikube start ... you will create a single node cluster capable of being a control-plane node and a worker node simultaneously. You will be able to run your workloads (like an nginx-deployment without adding additional node).
You can add a node to your minikube ecosystem with just: $ minikube node add. This will make another node marked as a worker (with no control-plane components). You can read more about it here:
Minikube.sigs.k8s.io: Docs: Tutorials: Multi node
What if I want to add another machine (VM or bare metal) and add its resources in the form of a worker node to the cluster? From the looks of it I would have to delete that cluster, change the configuration, then start a new cluster with the new configuration. That doesn't seem scalable.
As said previously, you don't need to delete the minikube cluster to add another node. You can run $ minikube node add to add a node on a minikube host. There are also options to delete/stop/start nodes.
Personally speaking if the workload that you are trying to run requires multiple nodes, I would try to consider other Kubernetes cluster built on top/with:
Kubeadm
Kubespray
Microk8s
This would allow you to have more flexibility on where you want to create your Kubernetes cluster (as far as I know, minikube works within a single host (like your laptop for example)).
A side note!
There is an answer (written more than 2 years ago) which shows the way to add a Kubernetes cluster node to a minikube here :
Stackoverflow.com: Answer: How do I get the minikube nodes in a local cluster
Additional resources:
Kubernetes.io: Docs: Setup: Production environment: Tools: Kubeadm: Create cluster kubeadm
Github.com: Kubernetes sigs: Kubespray
Microk8s.io

How does kube-proxy behave when it can't reach the master?

From what I've read about Kubernetes, if the master(s) die, the workers should still be able to function as normal (https://stackoverflow.com/a/39173007/281469), although no new scheduling will occur.
However, I've found this to not be the case when the master can also schedule worker pods. Take a 2-node cluster, where one node is a master and the other a worker, and the master has the taints removed:
If I shut down the master and docker exec into one of the containers on the worker I can see that:
nc -zv ip-of-pod 80
succeeds, but
nc -zv ip-of-service 80
fails half of the time. The Kubernetes version is v1.15.10, using iptables mode for kube-proxy.
I'm guessing that since the kube-proxy on the worker node can't connect to the apiserver, it will not remove the master node from the iptables rules.
Questions:
Is it expected behaviour that kube-proxy won't stop routing to pods on master nodes, or is there something "broken"?
Are any workarounds available for this kind of setup to allow the worker nodes to still function correctly?
I realise the best thing to do is separate the CP nodes but that's not viable for what I'm working on at the moment.
Is it expected behaviour that kube-proxy won't stop routing to pods on
master nodes, or is there something "broken"?
Are any workarounds
available for this kind of setup to allow the worker nodes to still
function correctly?
The cluster master plays the role of decision maker for the various activities in cluster's nodes. This can include scheduling workloads, managing the workloads' lifecycle, scaling etc.. Each node is managed by the master components and contains the services necessary to run pods. The services on a node typically includes the kube-proxy, container runtime and kubelet.
The kube-proxy component enforces network rules on nodes and helps kubernetes in managing the connectivity among Pods and Services. Also, the kube-proxy, acts as an egress-based load-balancing controller which keeps monitoring the the kubernetes API server and continually updates node's iptables subsystem based on it.
In simple terms, the master node only is aware of everything and is in charge of creating the list of routing rules as well based on node addition or deletion etc. kube-proxy plays a kind of enforcer whereby it takes charge of checking with master, syncing the information and enforcing the rules on the list.
If the master node(API server) is down, the cluster will not be able to respond to API commands or deploy nodes. If another master node is not available, there shall be no one else available who can instruct the worker nodes on change in work allocation and hence they shall continue to execute the operations that were earlier scheduled by the master until the time the master node is back and gives different instructions. Inline to it, kube-proxy shall also be unable to get the latest rules by sync up with master, however it shall not stop routing and shall continue to handle the networking and routing functionalities (uses the earlier iptable rules that were determined before the master node went down) that shall allow network communication to your pods provided all pods in worker nodes are still up and running.
Single master node based architecture is not a preferred deployment architecture for production. Considering that resilience and reliability is one of the major business goal of kubernetes, it is recommended as a best practice to have HA cluster based architecture to avoid single point of failure.
Once you remove taints, kubernetes scheduler don't need any tolerations to schedule pods on your master node. So it is as good as your worker node with control plane components running on it and you can also run your workload pods on this node (although its not a recommended practice).
Kube-proxy (https://kubernetes.io/docs/concepts/overview/components/#kube-proxy) is the component deployed on all the nodes of cluster and it handles the networking and routing connection to your pods. So, even if your master node is down kube-proxy still works fine on the worker node and it will route traffic to your pods running on worker node.
If all your pods are running in worker nodes (which are still up and running), then kube-proxy will continue to route traffic to your pods even via service.
There is nothing inherent in Kubernetes that would cause this. The master node role is just for humans, and if you've removed the taints then the nodes are just normal nodes. That said, remember that usual rules about scheduling and resource requests apply so if your pods don't all fit then things wouldn't be scheduled. It's possible your Kubernetes deploy system set up more specialized firewall rules or similar around the control plane nodes, but that would be dependent on that system.

kubernetes - can we create 2 node master-only cluster with High availability

I am new to the Kubernetes and cluster.
I would like to bring up an High Availability Master Only Kubernetes Cluster(Need Not to!).
I have the 2 Instances/Servers running Kubernetes daemon, and running different kind of pods on both the Nodes.
Now I would like to somehow create the cluster and if the one of the host(2) down, then all the pods from that host(2) should move to the another host(1).
once the host(2) comes up. the pods should float back.
Please let me know if there is any way i can achieve this?
Since your requirement is to have a 2 node master-only cluster and also have HA capabilities then unfortunately there is no straightforward way to achieve it.
Reason being that a 2 node master-only cluster deployed by kubeadm has only 2 etcd pods (one on each node). This gives you no fault tolerance. Meaning if one of the nodes goes down, etcd cluster would lose quorum and the remaining k8s master won't be able to operate.
Now, if you were ok with having an external etcd cluster where you can maintain an odd number of etcd members then yes, you can have a 2 node k8s cluster and still have HA capabilities.
It is possible that master node serves also as a worker node however it is not advisable on production environments, mainly for performance reasons.
By default, kubeadm configures master node so that no workload can be run on it and only regular nodes, added later would be able to handle it. But you can easily override this default behaviour.
In order to enable workload to be scheduled also on master node you need to remove from it the following taint, which is added by default:
kubectl taint nodes --all node-role.kubernetes.io/master-
To install and configure multi-master kubernetes cluster you can follow this tutorial. It describes scenario with 3 master nodes but you can easily customize it to your needs.

Redistribute pods after adding a node in Kubernetes

What should I do with pods after adding a node to the Kubernetes cluster?
I mean, ideally I want some of them to be stopped and started on the newly added node. Do I have to manually pick some for stopping and hope that they'll be scheduled for restarting on the newly added node?
I don't care about affinity, just semi-even distribution.
Maybe there's a way to always have the number of pods be equal to the number of nodes?
For the sake of having an example:
I'm using juju to provision small Kubernetes cluster on AWS. One master and two workers. This is just a playground.
My application is apache serving PHP and static files. So I have a deployment, a service of type NodePort and an ingress using nginx-ingress-controller.
I've turned off one of the worker instances and my application pods were recreated on the one that remained working.
I then started the instance back, master picked it up and started nginx ingress controller there. But when I tried deleting my application pods, they were recreated on the instance that kept running, and not on the one that was restarted.
Not sure if it's important, but I don't have any DNS setup. Just added IP of one of the instances to /etc/hosts with host value from my ingress.
descheduler, a kuberenets incubator project could be helpful. Following is the introduction
As Kubernetes clusters are very dynamic and their state change over time, there may be desired to move already running pods to some other nodes for various reasons:
Some nodes are under or over utilized.
The original scheduling decision does not hold true any more, as taints or labels are added to or removed from nodes, pod/node affinity requirements are not satisfied any more.
Some nodes failed and their pods moved to other nodes.
New nodes are added to clusters.
There is automatic redistribution in Kubernetes when you add a new node. You can force a redistribution of single pods by deleting them and having a host based antiaffinity policy in place​. Otherwise Kubernetes will prefer using the new node for scheduling and thus achieve a redistribution over time.
What are your reasons for a manual triggered redistribution​?