How do I create a policy to run a container on every node, except the master unless there is only one node? - kubernetes

In the Kubernetes Book, it says that it's poor form to run pods on the master node.
Following this advice, I'd like to create a policy that runs a pod on all nodes, except the master if there is more than one node. However, to simplify testing and to work in single-node environments, I'd also like to run my pod on the master node if there is just a single node in the entire system.
I've been looking around, and can't figure out how to express this policy. I see that DaemonSets have affinities and anti-affinities. I considered labeling the master node and adding an anti-affinity for that label. However, I didn't see how to require that at least one pod always comes up (to ensure that things work in single-node environments). Please let me know if I'm misunderstanding something. Thanks!

How about something like this:
During node provisioning, assign a particular label to each node that should run the job. In a single node cluster, this would be the master. In a multi-node environment, it would be every node except the master(s).
Create a DaemonSet that tolerates the master taint:
tolerations:
- key: node-role.kubernetes.io/master
  effect: NoSchedule
As described in that doc you linked, use .spec.template.spec.nodeSelector to select only nodes with your special label. (node selector docs).
How you assign the special label to nodes is probably a fairly manual process heavily dependent on how you are actually deploying your clusters, but that is the general plan I would follow.
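Put together, the plan above might look roughly like the manifest below. It is only a sketch under assumptions: the label run-my-daemon=true, the name my-daemon, and the image are placeholders that do not come from the question. The second toleration covers the node-role.kubernetes.io/control-plane taint that newer clusters use in place of the master taint.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-daemon                  # hypothetical name
spec:
  selector:
    matchLabels:
      app: my-daemon
  template:
    metadata:
      labels:
        app: my-daemon
    spec:
      # Only nodes carrying the special label get a pod.
      nodeSelector:
        run-my-daemon: "true"      # hypothetical label, assigned at provisioning time
      # Allow scheduling onto a tainted master so the single-node case works.
      tolerations:
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - name: my-daemon
          image: registry.example.com/my-daemon:1.0   # placeholder image
```

In a multi-node cluster the master simply never gets the label, so the toleration has no effect there.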
EDIT: Or I believe it may be simplest to just remove the master node taint from your single-node cluster. I believe most simple distributions like minikube will come this way by default.

Related

Can a pod run on multiple nodes?

I have one Kubernetes master and three Kubernetes nodes. I made one pod, which is running on a specific node. I want to run that pod on 2 nodes. How can I achieve this? Does the replica concept help me? If yes, how?
Yes, you can assign pods to one or more nodes of your cluster, and here are some options to achieve this:
nodeSelector
nodeSelector is the simplest recommended form of node selection constraint. nodeSelector is a field of PodSpec. It specifies a map of key-value pairs. For the pod to be eligible to run on a node, the node must have each of the indicated key-value pairs as labels (it can have additional labels as well). The most common usage is one key-value pair.
affinity and anti-affinity
Node affinity is conceptually similar to nodeSelector -- it allows you to constrain which nodes your pod is eligible to be scheduled on, based on labels on the node.
nodeSelector provides a very simple way to constrain pods to nodes with particular labels. The affinity/anti-affinity feature greatly expands the types of constraints you can express. The key enhancements are:
The affinity/anti-affinity language is more expressive. The language offers more matching rules besides exact matches created with a logical AND operation;
you can indicate that the rule is "soft"/"preference" rather than a hard requirement, so if the scheduler can't satisfy it, the pod will still be scheduled;
you can constrain against labels on other pods running on the node (or other topological domain), rather than against labels on the node itself, which allows rules about which pods can and cannot be co-located
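As a rough illustration of the hard and soft rules described above, a pod spec could combine a required node affinity with a preferred one. The pod name, the disktype and zone labels, and their values are assumptions made up for this sketch.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-example           # hypothetical name
spec:
  affinity:
    nodeAffinity:
      # Hard requirement: the pod stays Pending if no node matches.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype      # hypothetical node label
                operator: In
                values: ["ssd"]
      # Soft preference: the scheduler tries to honor it, but schedules anyway if it cannot.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: zone          # hypothetical node label
                operator: In
                values: ["zone-a"]
  containers:
    - name: app
      image: nginx
```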
DaemonSet
A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.
Some typical uses of a DaemonSet are:
running a cluster storage daemon on every node
running a logs collection daemon on every node
running a node monitoring daemon on every node
Please check this link to read more about how to assign pods to nodes.
It's not good practice to run bare pods directly on the nodes, as nodes and pods can crash at any time. It's better to use the K8S controllers, as mentioned in the K8S documentation here.
K8S supports multiple controllers, and the appropriate one can be chosen depending on the requirement. From the OP alone it's difficult to say which controller to use.
You can use a DaemonSet if you want to run a pod on each node.
From what I see, you are trying to deploy a pod to specific nodes; it's better to let the scheduler decide where the pod should be deployed based on the available resources.
That holds up best in worst-case scenarios.
I mean in case of node failures.
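For the original question (one pod on two nodes), one possible shape is a Deployment with two replicas and a pod anti-affinity rule that keeps the replicas on different nodes. This is a sketch only: the name two-node-app, the app label, and the nginx image are placeholders, not taken from the question.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: two-node-app               # hypothetical name
spec:
  replicas: 2                      # two copies of the pod
  selector:
    matchLabels:
      app: two-node-app
  template:
    metadata:
      labels:
        app: two-node-app
    spec:
      affinity:
        podAntiAffinity:
          # Never place two pods with this label on the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: two-node-app
              topologyKey: kubernetes.io/hostname
      containers:
        - name: app
          image: nginx
```

The scheduler still picks which two nodes to use, and the Deployment recreates a replica elsewhere if a node fails.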

Should server node be on different server than agent nodes, and how to achieve that?

I need advice on k3s architecture. I would like to create a small cluster with one master and 3 agent nodes, but in my opinion the master node should be on a separate server so that it has its resources to itself. However, I can't find --disable-agent in the k3s documentation anymore, and I read that it was buggy so they removed it. So I am wondering: how can I have a server-only setup on one node, and is it good practice at all?
Keeping the master node separate is a typical Kubernetes architecture, since the master runs all the vital components (API server, controller manager, etcd and scheduler) needed to manage your cluster. So it is a good idea to have it running on its own node. (In K8s this is the default, although it is possible to schedule pods on the master node if you untaint it.)
Here's a good article about a multi-node k3s cluster that relates to your desired state.
An alternative would be the solution suggested in this GitHub issue related to --disable-agent: taint the master with the NoExecute effect.

How to deregister a kubernetes node from a kubernetes cluster

I have a node mistakenly registered on a cluster B while it is actually serving for cluster A.
Here 'registered on a cluster B' means I can see the node from kubectl get node from cluster B.
I want to deregister this node from cluster B, but keep the node intact.
I know regular process to delete a node is:
kubectl drain xxx
kubectl delete node xxx
# on node
kubeadm reset
But I do not want the pods that cluster A runs on the node to be deleted or transferred. And I want to make sure the node will not self-register to cluster B afterwards.
To be clear, let's say, cluster A has Pod A on the node, cluster B has Pod B on the node as well, I want to delete node from cluster B, but keep Pod A intact. (By the way, can I see Pod A from cluster B?)
Thank you in advance!
To deregister the node without removing any pods, run the command below:
kubectl delete node nodename
After this is done, the node will no longer appear in kubectl get nodes.
To keep the node from self-registering again, log into that node and stop its kubelet process with the command below:
systemctl stop kubelet
As this case has already been clarified, I decided to publish a Community Wiki answer based on the following comment:
@mario nvm, I thought different clusters in one node affect each other, actually they do not, they just share container runtime which is more like 'read-only', and they have different kubelets of themselves listening on different port. – Li Ziyan Aug 17 at 5:29
The aim is to make clear for other users what the issue actually was and how it was resolved, or simply clarified.
So if you design your infrastructure in such a way that one physical (or virtual) machine serves as a Node for more than one Kubernetes cluster (which I believe is not a very common case), the infrastructure looks as follows:
Components that are shared:
physical (or virtual) node
common container runtime environment (e.g. docker)
Components that are separate:
two separate kubelets. Although they run on the same physical/virtual node, they are configured to listen on different ports and are registered with two different masters (or, more specifically, two different kube-apiservers belonging to two different Kubernetes control planes)
two logically separate, independent Kubernetes Nodes which, although configured on the same physical host, belong to two completely different Kubernetes clusters that don't interfere with each other in any way.
I hope it helps to clarify possible confusion about this question and maybe help someone in case they have similar doubts.

Expandable single node K8s cluster

I am searching for a solution that lets me set up a single-node K8s cluster and add nodes to it later if needed.
I am aware of solutions such as minikube and microk8s, but they are not expandable. I am trying k3s at the moment precisely because it offers this feature, but I have some problems with storage and other things that I am still working on.
Now my questions:
What other solution for this exists?
What are the disadvantages if I untaint the master node and run everything there (for a long period and not just for test)?
You can use kubeadm to set up a single-node "cluster". Then you can use the join command to add more nodes.
You can expand a k3s cluster via k3sup join. Here is a guide.
Key Kubernetes services such as kube-apiserver and kube-scheduler should be available and running smoothly at all times on master nodes. Therefore, it is essential to have dedicated resources for the master nodes and to keep non-critical workloads from interfering with the master services.
What are the disadvantages if I untaint the master node and run everything there (for a long period and not just for test)?
Failure of the worker will of course bring down your applications. When you recover it or spin up another one, K8s will recover your apps for you.
Failure of the master will not directly affect your running systems, only the cluster's ability to manage itself and its self-healing capabilities (which will affect uptime at some point).
I am searching for a solution that enables me to set up a single node K8s cluster and if I needed I add nodes to it later.
To the best of my knowledge, there is no such thing as a production-ready single-node k8s cluster.
For something small and simple you can check Rancher.
What other solution for this exists?
kubeadm allows you to install everything on a single node. Install kubeadm on the node, "kubeadm init", install a pod network, then remove the master taint.
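As an alternative to removing the taint after the fact, kubeadm can be told not to apply it in the first place through a configuration file. The fragment below is a sketch that assumes the v1beta3 kubeadm config API; the right apiVersion depends on your kubeadm release, and the file name is made up.

```yaml
# kubeadm-config.yaml (hypothetical file name)
apiVersion: kubeadm.k8s.io/v1beta3   # assumption: adjust to the version your kubeadm supports
kind: InitConfiguration
nodeRegistration:
  # An explicitly empty list registers the node without the default control-plane taint.
  taints: []
```

It would be passed as "kubeadm init --config kubeadm-config.yaml". Installing a pod network is still a separate step.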
Another solution you may be interested in is Kubespray.
Some "honorable mentions" are:
Charmed Kubernetes by Canonical allows you to do everything on one node; however, it needs to be quite a big node, so it may not fit here (but is still worth mentioning).
If you don't really require all the k8s power (with only one small node), then Nomad could be an alternative.
Let me know if that helps.

what is the benefit of the taint model over node selector

I am learning Kubernetes and ran into a conceptual question: what is the benefit of the new taint model over a simple node selector?
The documentation talks about a use case where a group of devs might have exclusive right for a set of pods by a taint like dedicated=groupA:NoSchedule. But I thought we could do the same thing with a simple nodeSelector.
To be more specific, what is the role of the effect on this taint? Why not simply a label, like the rest of Kubernetes?
A node selector affects a single pod template, asking the scheduler to place it on a set of nodes. A NoSchedule taint affects all pods, asking the scheduler to keep every pod that does not tolerate the taint off that node.
A node selector is useful when the pod needs something from the node. For example, requesting a node that has a GPU. A node taint is useful when the node needs to be reserved for special workloads. For example, a node that should only be running pods that will use the GPU (so the GPU node isn't filled with pods that aren't using it).
Sometimes they are useful together as in the example above, too. You want the node to only have pods that use the GPU, and you want the pod that needs a GPU to be scheduled to a GPU node. In that case you may want to taint the node with dedicated=gpu:NoSchedule and add both a taint toleration and node selector to the pod template.
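A pod template for that combined case might look like the sketch below. The toleration matches the dedicated=gpu:NoSchedule taint from the example above; the node label hardware=gpu, the pod name, and the image are assumptions added for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job                    # hypothetical name
spec:
  # Node selector: the pod needs something from the node (a GPU).
  nodeSelector:
    hardware: gpu                  # hypothetical label set on the GPU nodes
  # Toleration: the pod is allowed onto nodes reserved by the taint.
  tolerations:
    - key: dedicated
      operator: Equal
      value: gpu
      effect: NoSchedule
  containers:
    - name: worker
      image: registry.example.com/gpu-worker:1.0   # placeholder image
```

The toleration alone would only permit the pod on the GPU node, not steer it there, which is why the node selector is still needed.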