I am trying to use Cilium Egress Gateway Policy in my K8s cluster. I want to apply policy on all pods scheduled on Node X. How can I do that?
Using the podSelector field, I can pick pods which matchLabels. Theer is also a special label io.kubernetes.pod.namespace to select pods in a namespace. But I don't know how to filter for the pod's scheduled node (spec.nodeName).
Another possible solution is that I write a daemon set, which will get all pods on the node, and then call api-server to add a label for nodeName. But I need guidance how to write such daemon set, or if it's even secure to have the api-server credentials on the node.
Related
In Kubernetes, is possible to specify, at the Node level, which deployments it should run? That's kind of different from Node/PodAffinity, since would be possible to create a new node with the specified set of deployments running from the beginning, instead of wait for the scheduler to place new pods on that Node.
This would look like templating a VM if you are using some managed Kubernetes service, where you can specify the # of instances and it will be new nodes on your cluster (that will come up with that set of workloads that you defined). Would be that possible or is not the right Kubernetes' mindset?
In Kubernetes it's always the scheduler that assigns Pods to nodes. You can't somehow manually launch Pods on a node (outside of Kubernetes) and at the same time let them be a part of the Kubernetes cluster. The way to go is to always define your deployment via the Kubernetes API server and then let the scheduler assign the Pods to the available nodes.
However, you can influence how the scheduler assigns Pods to nodes. In case you want to define at the node level which types of Pods can run on a specific node, you can use taints and tolerations: define taints on your nodes and tolerations on your Pods so that only a specific set of Pods can run on a given node.
so i naturally run nvidia-docker and the k8s-device-plugin as a daemonset. as not all my kubernetes worker nodes have gpus, i use a nodeSelector in the daemonset to run just on nodes that i've labeled with accelerator=nvidia.
in another case, i also do the same for ingress-nginx: i label a few nodes that i want and run it as a daemonset. i then have an external (f5) load balancer that holds the VIP to the relevant DNS records for the ingress endpoints (yeah, i know there's a f5 ingress available - its on the todo list).
i've noticed that many users state that daemonsets should only be used for pods that should be running on ALL workers. is there anything inherently bad with my restriction of running daemonsets on a subset of nodes?
It's a valid use case. You can restrict the daemonset to run on the nodes that you want by using node selectors.
In Kelsey Hightower's Kubernetes Up and Running, he gives two commands :
kubectl get daemonSets --namespace=kube-system kube-proxy
and
kubectl get deployments --namespace=kube-system kube-dns
Why does one use daemonSets and the other deployments?
And what's the difference?
Kubernetes deployments manage stateless services running on your cluster (as opposed to for example StatefulSets which manage stateful services). Their purpose is to keep a set of identical pods running and upgrade them in a controlled way. For example, you define how many replicas(pods) of your app you want to run in the deployment definition and kubernetes will make that many replicas of your application spread over nodes. If you say 5 replica's over 3 nodes, then some nodes will have more than one replica of your app running.
DaemonSets manage groups of replicated Pods. However, DaemonSets attempt to adhere to a one-Pod-per-node model, either across the entire cluster or a subset of nodes. A Daemonset will not run more than one replica per node. Another advantage of using a Daemonset is that, if you add a node to the cluster, then the Daemonset will automatically spawn a pod on that node, which a deployment will not do.
DaemonSets are useful for deploying ongoing background tasks that you need to run on all or certain nodes, and which do not require user intervention. Examples of such tasks include storage daemons like ceph, log collection daemons like fluentd, and node monitoring daemons like collectd
Lets take the example you mentioned in your question: why iskube-dns a deployment andkube-proxy a daemonset?
The reason behind that is that kube-proxy is needed on every node in the cluster to run IP tables, so that every node can access every pod no matter on which node it resides. Hence, when we make kube-proxy a daemonset and another node is added to the cluster at a later time, kube-proxy is automatically spawned on that node.
Kube-dns responsibility is to discover a service IP using its name and only one replica of kube-dns is enough to resolve the service name to its IP. Hence we make kube-dns a deployment, because we don't need kube-dns on every node.
I have added podAntiAffinity to my DeploymentConfig template.
However, pods are being scheduled on nodes that I expected would be excluded by the rules.
How can I view logs of the kubernetes scheduler to understand why it chose the node it did for a given pod?
PodAntiAffinity has more to do with other pods than nodes specifically. That is, PodAntiAffinity specifies which nodes to exclude based on what pods are already scheduled on that node. And even here you can make it a requirement vs. just a preference. To directly pick the node on which a pod is/is not scheduled, you want to use NodeAffinity. The guide.
I have kubernetes running on version 1.5 with two nodes and one master nodes. I would like to deploy fluentd as a daemon set onto all nodes, but the master node (the master node spams warning messages as it can't find logs). How can I avoid deploying to the master node?
So to make a pod not schedule on a master node you need to add the following
nodeSelector:
kubernetes.io/role: node
This will make the pod schedule on only nodes. The above example shows the default label for node in kops provisioned cluster. Please very the key value if you have have provisioned the cluster from a different provider
You can use a label for your slave nodes and use that label in a selector for the daemon set, which will only deploy on the nodes that have that label.
Inversely, you can define a negative selector to assign the daemon set to pods that don't have a label. In your case, the pod that doesn't have the master's label.
You're looking for the Taints and Tolerations features. Using these you can define that given node in "tainted" in particular way preventing pods scheduling on this node unless they have a toleration matching that taint.