Kubernetes' High Availability Leader Lease

I have a question regarding Kubernetes' leader/follower lease management for the kube-controller-manager and the kube-scheduler: as far as I understand, Kubernetes tracks the current leader in an Endpoints object in the kube-system namespace.
You can get the leader via
$ kubectl get endpoints -n kube-system
NAME                      ENDPOINTS   AGE
kube-controller-manager   <none>      20m
kube-scheduler            <none>      20m
then e.g.
$ kubectl describe endpoints kube-scheduler -n kube-system
Name: kube-scheduler
Namespace: kube-system
Annotations: control-plane.alpha.kubernetes.io/leader={"holderIdentity":"controller-0", ...}
The current leader is the holderIdentity of the control-plane.alpha.kubernetes.io/leader annotation.
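For reference, the full value of that annotation is a JSON leader-election record; a typical one looks roughly like this (field names from the leader-election record, values illustrative):
control-plane.alpha.kubernetes.io/leader={
  "holderIdentity": "controller-0",
  "leaseDurationSeconds": 15,
  "acquireTime": "2018-01-01T00:00:00Z",
  "renewTime": "2018-01-01T00:05:00Z",
  "leaderTransitions": 1
}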
My question:
Lease management like acquiring leases, renewing leases, time to live, etc. is implemented in leaderelection.go on top of Kubernetes Endpoints. Is there a specific reason lease management is not implemented directly on Etcd with "out-of-the-box" Etcd primitives like Etcd's compare and swap operation and time to live on objects?
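For context, the lease behaviour I am asking about is configured on both components via flags along these lines (the durations shown are the usual defaults, listed here only for illustration):
kube-scheduler \
  --leader-elect=true \
  --leader-elect-lease-duration=15s \
  --leader-elect-renew-deadline=10s \
  --leader-elect-retry-period=2s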
Edit(s): added mention of etcd's compare-and-swap operation and time-to-live.

A few reasons:
etcd might be running outside the Kubernetes network, which adds network latency
etcd could be busy/loaded and therefore slow
the etcd cluster is likely to have fewer members than the Kubernetes control plane has nodes, making it less reliable

For security reasons, only the API server should have access to etcd. Keep in mind that if etcd were used for leader leases by convention, custom controllers and operators using leader election would also need access to etcd, which would be inadvisable given how critical the data stored in etcd is.
Ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#securing-etcd-clusters
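To illustrate the point: a custom controller doing leader election through the API server only needs a narrow RBAC grant on its lock object rather than any etcd credentials. A sketch, with illustrative names:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-controller-leader-election   # illustrative name
  namespace: my-namespace               # illustrative namespace
rules:
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]                 # the lock object type, not etcd itself
  verbs: ["get", "create", "update"]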

Related

Kubernetes v1.20 endpoints resource cannot view controller manager or scheduler

I want to check the leader election of the basic components, but the information displayed differs between Kubernetes versions installed from binaries.
Was this information removed in Kubernetes v1.20+? Or is there another way to view the leader election of the basic components?
The following Kubernetes configuration parameters are identical; only the binary executables were replaced.
Kubernetes v1.20.8 or Kubernetes v1.20.2
$ kubectl get endpoints -n kube-system
No resources found in kube-system namespace.
Kubernetes v1.19.12
$ kubectl get endpoints -n kube-system
NAME                      ENDPOINTS   AGE
kube-controller-manager   <none>      9m12s
kube-scheduler            <none>      9m13s
I found the cause of the problem: the difference between the two versions is the default value of --leader-elect-resource-lock.
Kubernetes v1.20.8 or Kubernetes v1.20.2
--leader-elect-resource-lock string The type of resource object that is used for locking during leader election. Supported options are 'endpoints', 'configmaps', 'leases', 'endpointsleases' and 'configmapsleases'. (default "leases")
Kubernetes v1.19.12
--leader-elect-resource-lock string The type of resource object that is used for locking during leader election. Supported options are 'endpoints', 'configmaps', 'leases', 'endpointsleases' and 'configmapsleases'. (default "endpointsleases")
When I don't set --leader-elect-resource-lock on the controller-manager or the scheduler in v1.20.8, the default value is leases,
so I can use the following command to view the component leaders.
$ kubectl get leases -n kube-system
NAME                      HOLDER                                          AGE
kube-controller-manager   master01_dec12376-f89e-4721-92c5-a20267a483b8   45h
kube-scheduler            master02_c0c373aa-1642-474d-9dbd-ec41c4da089d   45h
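You can also inspect a single Lease to see the holder and the renew time; the output below is abbreviated and the timestamps are illustrative:
$ kubectl get lease kube-scheduler -n kube-system -o yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  holderIdentity: master02_c0c373aa-1642-474d-9dbd-ec41c4da089d
  leaseDurationSeconds: 15
  renewTime: "2021-07-20T10:00:00.000000Z"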

kubectl get pod status always ContainerCreating

k8s version: 1.12.1
I created a pod via the API on a node and it was allocated an IP (through flanneld). When I ran kubectl describe pod, I could not see the pod IP, and there was no such IP in etcd storage.
Only a few minutes later could the IP be obtained, and then kubectl get pod showed STATUS Running.
Has anyone ever encountered this problem?
As MatthiasSommer mentioned in a comment, creating a pod can take a while.
If the pod stays in ContainerCreating status for a longer time, you can check what is stopping it from changing to Running with:
kubectl describe pod <pod_name>
Why can creating a pod take a long time?
It depends on what is included in the manifest: the pod may need namespaces, storage volumes, secrets, resource assignments, ConfigMaps, etc.
kube-apiserver validates and configures data for the API objects.
kube-scheduler needs to check and collect resource requirements, constraints, etc., and assign the pod to a node.
kubelet runs on each node and ensures that all containers fulfill the pod specification and are healthy.
kube-proxy also runs on each node and is responsible for the pod's networking.
As you can see, there are many requests, validations, and syncs, so it can take a while before a pod fulfills all requirements.
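In addition to kubectl describe, pulling the pod's events often shows exactly which step (image pull, volume mount, network setup) is still in progress; the pod name and namespace below are placeholders:
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod_name> --sort-by=.metadata.creationTimestamp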

Kubernetes pods in pending state for indefinite time?

I'm using DigitalOcean's managed Kubernetes cluster service and have deployed 9 nodes in the cluster, but when I try to deploy Kafka/ZooKeeper pods, a few pods get deployed and the others remain in Pending state. I've tried doing
kubectl describe pods podname -n namespace
and it shows that the pods are not getting assigned to any node.
Check whether your Deployment/StatefulSet has nodeSelectors and/or node/pod affinity rules that might prevent it from being scheduled.
It would also be helpful to see more of the pod describe output, since it might give more details.
There is a message in your screenshot about PersistentVolumeClaims, so I would also check the status of the PVC objects to see whether they are bound; see the commands below.
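For example, these checks usually reveal whether it is a scheduling constraint or an unbound volume claim (namespace and object names are placeholders):
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>
kubectl get statefulset <name> -n <namespace> -o yaml | grep -A5 -E 'nodeSelector|affinity'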
good luck

Difference between daemonsets and deployments

In Kelsey Hightower's Kubernetes: Up and Running, he gives two commands:
kubectl get daemonSets --namespace=kube-system kube-proxy
and
kubectl get deployments --namespace=kube-system kube-dns
Why does one use daemonSets and the other deployments?
And what's the difference?
Kubernetes Deployments manage stateless services running on your cluster (as opposed to, for example, StatefulSets, which manage stateful services). Their purpose is to keep a set of identical pods running and upgrade them in a controlled way. For example, you define how many replicas (pods) of your app you want to run in the deployment definition, and Kubernetes will make that many replicas of your application spread over nodes. If you ask for 5 replicas over 3 nodes, some nodes will have more than one replica of your app running.
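A minimal Deployment sketch showing the replicas field (all names and the image are illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # illustrative name
spec:
  replicas: 5                  # 5 pods, spread across the available nodes
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:1.0      # illustrative image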
DaemonSets manage groups of replicated Pods. However, DaemonSets attempt to adhere to a one-Pod-per-node model, either across the entire cluster or a subset of nodes. A Daemonset will not run more than one replica per node. Another advantage of using a Daemonset is that, if you add a node to the cluster, then the Daemonset will automatically spawn a pod on that node, which a deployment will not do.
DaemonSets are useful for deploying ongoing background tasks that you need to run on all or certain nodes, and which do not require user intervention. Examples of such tasks include storage daemons like ceph, log collection daemons like fluentd, and node monitoring daemons like collectd.
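A minimal DaemonSet sketch for such a background task; note that there is no replicas field, because the controller runs exactly one pod per matching node (all names and the image are illustrative):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector          # illustrative name
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
      - name: fluentd
        image: fluentd:v1.14   # illustrative image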
Let's take the example you mentioned in your question: why is kube-dns a Deployment and kube-proxy a DaemonSet?
The reason is that kube-proxy is needed on every node in the cluster to program iptables rules, so that every node can reach every pod no matter which node it resides on. Hence, when we make kube-proxy a DaemonSet and another node is added to the cluster at a later time, kube-proxy is automatically spawned on that node.
Kube-dns's responsibility is to resolve a service name to its IP, and one replica of kube-dns is enough for that. Hence we make kube-dns a Deployment, because we don't need kube-dns on every node.
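You can see this difference on a running cluster; on newer clusters the DNS Deployment is usually named coredns rather than kube-dns:
kubectl get daemonset kube-proxy -n kube-system
kubectl get deployment kube-dns -n kube-system    # or: kubectl get deployment coredns -n kube-system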

Are there issues with running user pods on a Kubernetes master node?

Many of the run-throughs for deploying Kubernetes master nodes suggest you use --register-schedulable=false to prevent user pods from being scheduled to the master node (e.g. https://coreos.com/kubernetes/docs/latest/deploy-master.html). On a very small Kubernetes cluster it seems somewhat wasteful of compute resources to effectively prevent an entire node from being used for pod scheduling unless absolutely essential.
The answer to this question (Will (can) Kubernetes run Docker containers on the master node(s)?) suggests that it is indeed possible to run user pods on a master node - but doesn't address whether there are any issues associated with allowing this.
The only information I've been able to find to date that suggests there might be issues with allowing this is that pods on master nodes appear to communicate insecurely (see http://kubernetes.io/docs/admin/master-node-communication/ and https://github.com/kubernetes/kubernetes/issues/13598). I assume this could potentially allow a rogue pod running on a master node to access/hijack Kubernetes functionality not normally accessible to pods on non-master nodes. Probably not a big deal if you are only running pods/containers developed internally, although I guess there's always the possibility of someone hacking access to a pod/container and thereby gaining access to the master node.
Does this sound like a viable potential risk associated with this scenario (allowing user pods to run on a Kubernetes master node)? Are there any other potential issues associated with such a setup?
Running pods on the master node is definitely possible.
The security risk you mention is one issue, but if you configure service accounts, it isn't actually much different for all deployed pods to have secure remote access to the apiserver vs. insecure local access.
Another issue is resource contention. If you run a rogue pod on your master node that disrupts the master components, it can destabilize your entire cluster. Clearly this is a concern for production deployments, but if you are looking to maximize utilization of a small number of nodes in a development / experimentation environment, then it should be fine to run a couple of extra pods on the master.
Finally, you need to make sure the master node has a sufficiently large pod cidr allocated to it. In some deployments, the master only gets a /30 which isn't going to allow you to run very many pods.
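If you want to check what was allocated, the node's pod CIDR is visible on the Node object (the node name is a placeholder):
kubectl get node <master-node-name> -o jsonpath='{.spec.podCIDR}'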
Kubernetes and some Kubernetes distributions now have what is called a taint.
A taint can decide whether the master is allowed to run a pod or not.
Although running pods on the master node is not best practice, it is possible to do so.
The Kubernetes documentation explains taints, and I believe this is also related to the scheduler.
In Kubernetes or K3s, we can check whether the nodes have taints set by describing the nodes:
# kubectl describe nodes | grep Taints
Taints: node.kubernetes.io/unreachable:NoExecute
Taints: node.kubernetes.io/unreachable:NoSchedule
Taints: node.kubernetes.io/unreachable:NoExecute
Taints: <none>
NoSchedule: Pods that do not tolerate this taint are not scheduled on the node.
PreferNoSchedule: Kubernetes avoids scheduling Pods that do not tolerate this taint onto the node.
NoExecute: Pod is evicted from the node if it is already running on the node, and is not scheduled onto the node if it is not yet running on the node.
(Source: the Kubernetes documentation on taints and tolerations.)
If you want to check a specific node, whether master or agent, just name the node:
# kubectl describe nodes agent3 | grep Taints
Taints: <none>
# kubectl describe nodes master | grep Taints
Taints: <none>
This is how you apply a taint to your nodes:
kubectl taint nodes agent1 key1=value1:NoSchedule
kubectl taint nodes agent2 key1=value1:NoExecute
When a node is not running, Kubernetes automatically adds taints such as node.kubernetes.io/unreachable with NoSchedule or NoExecute (as in the output above), so make sure to check that your nodes are healthy before checking the taints.
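For completeness, a pod that should still be scheduled onto a node tainted as above needs a matching toleration in its spec. A sketch for the key1=value1:NoSchedule taint from the example (the pod name and image are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo        # illustrative name
spec:
  containers:
  - name: app
    image: nginx               # illustrative image
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoSchedule"       # matches the taint applied to agent1 above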
@robert has given a clear answer. I'm just trying to explain it metaphorically with a real-world example.
Your company's manager may be a better coder, but if he starts coding, the manager-type work will stall or become less efficient, because he can only handle one thing efficiently at a time. That puts your entire company at risk.
To operate efficiently, hire more devs to do the coding and don't make your manager code (so that you get the work you are paying him for).