How do I debug kubernetes scheduling? - kubernetes

I have added podAntiAffinity to my DeploymentConfig template.
However, pods are being scheduled on nodes that I expected would be excluded by the rules.
How can I view logs of the kubernetes scheduler to understand why it chose the node it did for a given pod?

PodAntiAffinity has more to do with other pods than nodes specifically. That is, PodAntiAffinity specifies which nodes to exclude based on what pods are already scheduled on that node. And even here you can make it a requirement vs. just a preference. To directly pick the node on which a pod is/is not scheduled, you want to use NodeAffinity. The guide.

Related

Google cloud system kube-system namespaced deployed daemonsets not working

We are having problem with several deployments in our cluster that do not seem to be working. But I am a bit apprehensive in touching these, since they are part of the kube-system namespace. I am also unsure as what the correct approach to getting them into an OK state is.
I currently have two daemonsets that have warnings with the message
DaemonSet has no nodes selected
See images below. Does anyone have any idea what the correct approach is?
A DaemonSet is creating a pod in each node of your Kubernetes cluster.
If the Kubernetes scheduler cannot schedule any pod, there are several possibilities:
Pod spec has a too high memory requests resource for the memory node capacity, look at the value of spec.containers[].resources.requests.memory
The nodes may have a taint, so DaemonSet declaration must have a toleration (kubernetes documentation about taint and toleration)
The pod spec may have a nodeSelector field (kubernetes documentation about node selector)
The pod spec may have an enforced node affinity or anti-affinity (kubernetes documentation about node affinity)
If Pod Security Policies are enabled on the cluster, a security policy may be blocking access to a resource that the pod needs to run
There are not the only solutions possible. More generally, a good start would be to look at the events associated to the daemon set:
> kubectl describe daemonsets NAME_OF_YOUR_DAEMON_SET

Difference between daemonsets and deployments

In Kelsey Hightower's Kubernetes Up and Running, he gives two commands :
kubectl get daemonSets --namespace=kube-system kube-proxy
and
kubectl get deployments --namespace=kube-system kube-dns
Why does one use daemonSets and the other deployments?
And what's the difference?
Kubernetes deployments manage stateless services running on your cluster (as opposed to for example StatefulSets which manage stateful services). Their purpose is to keep a set of identical pods running and upgrade them in a controlled way. For example, you define how many replicas(pods) of your app you want to run in the deployment definition and kubernetes will make that many replicas of your application spread over nodes. If you say 5 replica's over 3 nodes, then some nodes will have more than one replica of your app running.
DaemonSets manage groups of replicated Pods. However, DaemonSets attempt to adhere to a one-Pod-per-node model, either across the entire cluster or a subset of nodes. A Daemonset will not run more than one replica per node. Another advantage of using a Daemonset is that, if you add a node to the cluster, then the Daemonset will automatically spawn a pod on that node, which a deployment will not do.
DaemonSets are useful for deploying ongoing background tasks that you need to run on all or certain nodes, and which do not require user intervention. Examples of such tasks include storage daemons like ceph, log collection daemons like fluentd, and node monitoring daemons like collectd
Lets take the example you mentioned in your question: why iskube-dns a deployment andkube-proxy a daemonset?
The reason behind that is that kube-proxy is needed on every node in the cluster to run IP tables, so that every node can access every pod no matter on which node it resides. Hence, when we make kube-proxy a daemonset and another node is added to the cluster at a later time, kube-proxy is automatically spawned on that node.
Kube-dns responsibility is to discover a service IP using its name and only one replica of kube-dns is enough to resolve the service name to its IP. Hence we make kube-dns a deployment, because we don't need kube-dns on every node.

Can you tell kubernetes to start one pod before another?

Can I add some config so that my daemon pods start before other pods can be scheduled or nodes are designated as ready?
Adding post edit:
These are 2 different pods altogether, the daemonset is a downstream dependency to any pods that might get scheduled on the host.
There's no such a thing as Pod hierarchy in Kubernetes between multiple separate types of pods. Meaning belonging to different Deployments, Statefulsets, Daemonsets, etc. In other words, there is no notion of a master pod and children pods. If you like to create your custom hierarchy you can build your own tooling around, for example waiting for the status of all pods in a DaemonSet to start or create a new Pod or Kubernetes workload resource.
The closest in terms of pod dependency in K8s is StatefulSets.
As per the docs:
For a StatefulSet with N replicas, when Pods are being deployed, they are created sequentially, in order from {0..N-1}.

How to convert Daemonsets to kind Deployment

I have already deployed pods using Daemonsets with nodeselector. My requirements is I need to use kind Deployment but at the same time I would want to retain Daemonsets functionality
.I have nodeselector defined so that same pod should be installed in labelled node.
How to achieve your help is appreciated.
My requirements is pod should be placed automatically based on nodeselector but with kind Deployment
In otherwords
Using Replication controller when I schedule 2 (two) replicas of a pod I expect 1 (one) replica each in each Nodes (VMs). Instead I find both replicas are created in same node This will make 1 Node a single point of failure which I need to avoid.
I have labelled two nodes properly. And I could see both pods spawned on single node. How to achieve both pods always schedule on both nodes?
Look into affinity and anti-affinity, specifically, inter-pod affinity and anti-affinity.
From official documentation:
Inter-pod affinity and anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled based on labels on pods that are already running on the node rather than based on labels on nodes. The rules are of the form “this pod should (or, in the case of anti-affinity, should not) run in an X if that X is already running one or more pods that meet rule Y”.

What's the purpose of Kubernetes DaemonSet when replication controllers have node anti-affinity

DaemonSet is a Kubernetes beta resource that can ensure that exactly one pod is scheduled to a group of nodes. The group of nodes is all nodes by default, but can be limited to a subset using nodeSelector or the Alpha feature of node affinity/anti-affinity.
It seems that DaemonSet functionality can be achieved with replication controllers/replica sets with proper node affinity and anti-affinity.
Am I missing something? If that's correct should DaemonSet be deprecated before it even leaves Beta?
As you said, DaemonSet guarantees one pod per node for a subset of the nodes in the cluster. If you use ReplicaSet instead, you need to
use the node affinity/anti-affinity and/or node selector to control the set of nodes to run on (similar to how DaemonSet does it).
use inter-pod anti-affinity to spread the pods across the nodes.
make sure the number of pods > number of node in the set, so that every node has one pod scheduled.
However, ensuring (3) is a chore as the set of nodes can change over time. With DaemonSet, you don't have to worry about that, nor would you need to create extra, unschedulable pods. On top of that, DaemonSet does not rely on the scheduler to assign its pods, which makes it useful for cluster bootstrap (see How Daemon Pods are scheduled).
See the "Alternative to DaemonSet" section in the DaemonSet doc for more comparisons. DaemonSet is still the easiest way to run a per-node daemon without external tools.