Multi-master setup and worker nodes - kubernetes

Do worker nodes in a multi-master setup talk to the apiserver on the master nodes via the load-balancer? It seems like the cluster is aware of the active apiserver endpoints via the endpoint reconciler, so I would think the logical and HA way is for the worker nodes to talk to the active endpoints they know of. But according to the official documentation/diagram (https://kubernetes.io/docs/admin/high-availability/building/), the worker nodes go through the load-balancer. Doesn't this mean that if, for whatever reason, the load-balancer becomes unavailable, your worker nodes will also malfunction?

When your kubelet starts, it needs to connect to the apiserver. The location of the apiserver is provided as a configuration option and in most cases will be a non-changing domain name pointing to a load balancer. You cannot rely on a ClusterIP-based Service for core Kubernetes components like the kubelet or kube-proxy, as you would essentially be running yourself into a chicken-and-egg situation and introducing additional dependencies.
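As an illustration, a kubelet's kubeconfig typically looks something like the excerpt below, with the server field pointing at the stable load-balancer address rather than at any single master (the cluster name and DNS name are hypothetical; kubeadm-style clusters keep this file at /etc/kubernetes/kubelet.conf):
# excerpt of a kubelet kubeconfig -- hypothetical cluster name and LB address
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: <base64-encoded CA bundle>
    server: https://api.my-cluster.example.com:6443   # load balancer DNS name, not a single master IP
  name: my-cluster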
Any reasonable environment should have a dependable load balancer, and if it is down, odds are that quite a lot of other things are down too (also keep in mind that in many cases Kubernetes will survive temporary inaccessibility of the control plane).

Related

In which cases do we need container networks in kubernetes while we already have kubernetes Service?

Why do we need point-to-point connections between pods while we have workload abstractions and networking mechanisms (Service/kube-proxy/Ingress etc.) over them?
What is the default CNI?
REDACTED: I was confused about this question because I felt like I hadn't installed any of the popular CNI plugins when I was installing Kubernetes. It turns out Kubernetes defaults to kubenet.
Btw, I see a lot of overlapping features between Istio and container networks. IMO they could achieve identical objectives. Is the only difference that Istio is high-level while CNI is low-level and more efficient?
REDACTED: Interestingly, Istio has its own CNI.
Kubernetes networking has some requirements:
pods on a node can communicate with all pods on all nodes without NAT
agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node
pods in the host network of a node can communicate with all pods on all nodes without NAT
CNI (Container Network Interface) defines a standard interface, and all implementations (Calico, Flannel, etc.) must follow it.
It exists to solve Kubernetes networking at this level.
A Service is different: it supplies a stable virtual address that proxies to the pods, since pods are ephemeral and their IPs change, while the Service's address stays the same.
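A quick, hedged way to see the difference from inside the cluster (the pod name, pod IP and Service name below are made up, and the client image must contain curl):
# direct pod-to-pod traffic: every pod has its own cluster-routable IP (no NAT)
kubectl exec client-pod -- curl -s http://10.244.1.5:8080
# via the Service: a stable virtual name/IP that keeps working when backend pods are replaced
kubectl exec client-pod -- curl -s http://backend.default.svc.cluster.local:8080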
Istio is another thing again: it turns the connections between microservices into infrastructure and pulls that concern out of the business code (think Spring Cloud).
Why do we need point-to-point connections between pods while we have workload abstractions and networking mechanisms (Service/kube-proxy/Ingress etc.) over them?
In general, you will find everything about networking in a cluster in this documentation. You can find more information about pod networking:
Every Pod gets its own IP address. This means you do not need to explicitly create links between Pods and you almost never need to deal with mapping container ports to host ports. This creates a clean, backwards-compatible model where Pods can be treated much like VMs or physical hosts from the perspectives of port allocation, naming, service discovery, load balancing, application configuration, and migration.
Kubernetes imposes the following fundamental requirements on any networking implementation (barring any intentional network segmentation policies):
pods on a node can communicate with all pods on all nodes without NAT
agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node
Note: For those platforms that support Pods running in the host network (e.g. Linux):
pods in the host network of a node can communicate with all pods on all nodes without NAT
Then you are asking:
What is the default CNI?
There is no single default CNI in a Kubernetes cluster. It depends on which distribution you are using and where and how you set up the cluster, etc. As you can see from this doc about implementing the networking model, there are many CNI plugins available for Kubernetes.
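If you want to find out what a particular cluster is actually using, two hedged starting points (node access and plugin names vary by setup):
# on a node: the CNI configuration picked up by the kubelet/container runtime
ls /etc/cni/net.d/
# in the cluster: most CNI plugins (Calico, Flannel, ...) run as DaemonSets in kube-system
kubectl get daemonsets -n kube-system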
Istio is a completely different tool for something else. You can't compare them like that. Istio is a service mesh tool.
Istio extends Kubernetes to establish a programmable, application-aware network using the powerful Envoy service proxy. Working with both Kubernetes and traditional workloads, Istio brings standard, universal traffic management, telemetry, and security to complex deployments.

How does kube-proxy behave when it can't reach the master?

From what I've read about Kubernetes, if the master(s) die, the workers should still be able to function as normal (https://stackoverflow.com/a/39173007/281469), although no new scheduling will occur.
However, I've found this to not be the case when the master can also schedule worker pods. Take a 2-node cluster, where one node is a master and the other a worker, and the master has the taints removed:
If I shut down the master and docker exec into one of the containers on the worker I can see that:
nc -zv ip-of-pod 80
succeeds, but
nc -zv ip-of-service 80
fails half of the time. The Kubernetes version is v1.15.10, using iptables mode for kube-proxy.
I'm guessing that since the kube-proxy on the worker node can't connect to the apiserver, it will not remove the master node from the iptables rules.
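One way to check that guess on the worker is to dump the nat table that kube-proxy's iptables mode programs ("ip-of-service" is the same placeholder as above; the KUBE-* chains are the ones kube-proxy maintains):
# dispatch rule for the service's cluster IP
sudo iptables-save -t nat | grep ip-of-service
# then follow the matching KUBE-SVC-... chain to see which pod endpoints (KUBE-SEP-... chains) are still listed
sudo iptables-save -t nat | grep KUBE-SVC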
Questions:
Is it expected behaviour that kube-proxy won't stop routing to pods on master nodes, or is there something "broken"?
Are any workarounds available for this kind of setup to allow the worker nodes to still function correctly?
I realise the best thing to do is separate the CP nodes but that's not viable for what I'm working on at the moment.
The cluster master plays the role of decision maker for the various activities on the cluster's nodes: scheduling workloads, managing the workloads' lifecycle, scaling, etc. Each node is managed by the master components and contains the services necessary to run pods. The services on a node typically include kube-proxy, a container runtime and the kubelet.
The kube-proxy component enforces network rules on nodes and helps Kubernetes manage the connectivity among Pods and Services. Kube-proxy also acts as an egress-based load-balancing controller: it keeps monitoring the Kubernetes API server and continually updates the node's iptables subsystem based on what it sees.
In simple terms, only the master node is aware of everything, and it is in charge of building the list of routing rules as nodes are added or removed. Kube-proxy plays the role of an enforcer: it checks with the master, syncs the information, and enforces the rules on that list.
If the master node (API server) is down, the cluster will not be able to respond to API commands or schedule new workloads. If no other master node is available, nobody can instruct the worker nodes to change their work allocation, so they continue to execute the operations that were scheduled earlier, until the master comes back and gives different instructions. In line with that, kube-proxy will be unable to sync the latest rules from the master, but it does not stop routing: it continues to handle networking using the iptables rules that were in place before the master went down, which keeps network communication to your pods working, provided all the pods on the worker nodes are still up and running.
A single-master architecture is not a preferred deployment architecture for production. Considering that resilience and reliability are major goals of Kubernetes, it is recommended as a best practice to use an HA cluster architecture and avoid a single point of failure.
Once you remove the taints, the Kubernetes scheduler doesn't need any tolerations to schedule pods on your master node. So it is as good as a worker node with control plane components running on it, and you can also run your workload pods on this node (although that's not a recommended practice).
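For reference, the taint is typically removed with something like the command below (the node name is hypothetical; on the 1.15-era clusters discussed here the key is node-role.kubernetes.io/master, while newer releases use node-role.kubernetes.io/control-plane):
# the trailing "-" removes the taint from the node
kubectl taint nodes my-master-node node-role.kubernetes.io/master-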
Kube-proxy (https://kubernetes.io/docs/concepts/overview/components/#kube-proxy) is a component deployed on all nodes of the cluster, and it handles the networking and routing of connections to your pods. So even if your master node is down, kube-proxy still works fine on the worker node and will route traffic to the pods running there.
If all your pods are running in worker nodes (which are still up and running), then kube-proxy will continue to route traffic to your pods even via service.
There is nothing inherent in Kubernetes that would cause this. The master node role is just for humans, and if you've removed the taints then the nodes are just normal nodes. That said, remember that usual rules about scheduling and resource requests apply so if your pods don't all fit then things wouldn't be scheduled. It's possible your Kubernetes deploy system set up more specialized firewall rules or similar around the control plane nodes, but that would be dependent on that system.

how does kubernetes guarantee reliability of kube proxy and kubelet?

If kube-proxy is down, the pods on a Kubernetes node will not be able to communicate with the external world. Does Kubernetes do anything special to guarantee the reliability of kube-proxy?
Similarly, how does Kubernetes guarantee reliability of kubelet?
It guarantees their reliability by:
Having multiple nodes: If one kubelet crashes, one node goes down. Similarly, every node runs a kube-proxy instance, so losing one node means losing only the kube-proxy instance on that node. Kubernetes is designed to handle node failures. And if you designed the app running on Kubernetes to be scalable, you will not run it as a single instance but as multiple instances; kube-scheduler will distribute your workload across multiple nodes, which means your application will still be accessible (a sketch of such a multi-replica Deployment is shown at the end of this answer).
Supporting a Highly-Available Setup: If you set up your Kubernetes cluster in High-Availability mode properly, there won't be one master node, but multiple. This means, you can even tolerate losing some master nodes. The managed Kubernetes offerings of the cloud providers are always highly-available.
These are the first 2 things that come to my mind. However, this is a broad question, so I can go into details if you elaborate what you mean by "reliability" a bit.
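As a sketch of the first point above, a Deployment that runs several replicas and asks the scheduler to prefer spreading them across nodes (all names, labels and the image are hypothetical):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                     # enough replicas to survive the loss of a single node
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      # prefer to schedule replicas on different nodes
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: web
              topologyKey: kubernetes.io/hostname
      containers:
      - name: web
        image: nginx:1.25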

Kubernetes Load balancing and Proxy

I am quite new with Kubernetes and I have a few questions regarding REST API request proxy and load balancing.
I have one master and two worker nodes, with some of the Services on one worker node and a few on the other worker node.
At the beginning I had just one worker node, and I accessed my pods using the worker node IP and the Service NodePort. After adding another worker node to the cluster, Kubernetes "redistributed" my pods to both worker nodes.
Now I can again access my pods using both worker node IPs and the Service NodePorts. This is a bit confusing to me: how can I reach the REST APIs of pods that are not on the worker node whose IP address I use?
Also, since I have two worker nodes now, how should load balancing be done properly over both worker nodes? I know that I can set the Service type to LoadBalancer, but is that enough?
Thank you for your answers!
How can I reach the REST APIs of pods that are not on the worker node whose IP address I use?
It is better to think of exposing your Services to the outer world, rather than pods, and consequently to avoid thinking in terms of the IP addresses of the nodes that pods are running on. The answer to this question depends on your setup. Many configurations are possible depending on the actual complexity and speed/availability requirements, but the basic setup boils down to:
If you are running in a supported cloud environment, then setting up a load-balanced ingress would expose it to the outer world without much fuss.
If, however, you are running on bare metal, then you have to provide your own ingress (a simple nginx or Apache proxy pod would suffice) and point its upstream to your Service name (or its FQDN if it is in another namespace), thus exposing all pods behind the Service to the outer world, regardless of the nodes they actually run on, and leaving the load balancing to the Kubernetes Service.
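As a small sketch of why any node IP works with the NodePort you already use: kube-proxy on every node forwards node-IP:nodePort traffic to the backing pods, wherever they run (names and ports below are hypothetical):
apiVersion: v1
kind: Service
metadata:
  name: my-api
spec:
  type: NodePort
  selector:
    app: my-api          # matches pods on any node
  ports:
  - port: 80             # service port inside the cluster
    targetPort: 8080     # container port of the backing pods
    nodePort: 30080      # reachable on every node's IP, regardless of where the pods run
The ingress options above then simply sit in front of this same Service-level load balancing.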
How should load balancing be done properly over both worker nodes?
This is a bit more complex a topic: with a uniform distribution of your pods across the nodes, you can make do with an external load balancer that is oblivious to pod placement. For us, leaving load balancing to the Kubernetes Service proved more accurate, since more often than not you end up with two pods running on the same node (whenever the number of pods is larger than the number of nodes), in which case an external load balancer cannot balance uniformly, while the Kubernetes Service layer can.

Where do services live in Kubernetes?

I am learning Kubernetes and currently deep diving into high availability, and while I understand that I can set up a highly available control plane (API server, controllers, scheduler) with local (or remote) etcds, as well as a highly available set of minions (through Kubernetes itself), I am still not sure where in this concept Services are located.
If they live in the control plane: Good I can set them up to be highly available.
If they live on a certain node: Ok, but what happens if the node goes down or becomes unavailable in any other way?
As I understand it, Services are needed to expose my pods to the internet as well as for load balancing. So without an HA Service, I risk that my application won't be reachable (even though it might be super highly available in every other aspect of the system).
A Kubernetes Service is another REST object in the k8s cluster. There are the following types of Services, and each one of them serves a different purpose in the cluster.
ClusterIP
NodePort
LoadBalancer
Headless
Fundamental purposes of Services
Providing a single point of entry to the pods
Load balancing across the pods
Inter-pod communication
Providing stability, since pods can die and restart with a different IP
and more
These Objects are stored in etcd as it is the single source of truth in the cluster.
Kube-proxy is responsible for realising these objects on each node. It uses selectors and labels.
For instance, each Pod object has labels, and the Service object has selectors to match those labels. Furthermore, each matching Pod contributes endpoints, so kube-proxy essentially maps these endpoints (IP:port) to the Service (IP:port). Kube-proxy uses iptables rules to do this magic.
Kube-proxy is deployed as a DaemonSet, so it runs on every cluster node, and all instances stay in sync by watching the API server (which is backed by etcd).
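A minimal sketch of that label/selector matching (names, labels and ports are hypothetical):
apiVersion: v1
kind: Pod
metadata:
  name: backend-1
  labels:
    app: backend          # label the Service selector below matches on
spec:
  containers:
  - name: app
    image: nginx:1.25
    ports:
    - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  selector:
    app: backend          # selects every pod carrying this label
  ports:
  - port: 80              # stable Service port
    targetPort: 80        # pod port that kube-proxy forwards to
The IP:port pairs of the matching pods become the Service's Endpoints, which kube-proxy then turns into iptables rules on every node.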
You can think of a Service as an internal (and in some cases external) load balancer. The definition is stored in the Kubernetes API server, yet the fact that it exists there means nothing if something does not implement it. The most common component that works with Services is kube-proxy, which implements Services on nodes using iptables (meaning that every node has every Service implemented in its local iptables rules), but there are also, e.g., Ingress Controller implementations that use the Service concept from the API to find endpoints and direct traffic to them, effectively skipping the iptables implementation. Finally, there are service mesh solutions like Linkerd or Istio that can leverage Service definitions on their own.
Services load balance between pods in most implementations, meaning that as long as you have one backing pod alive (and with enough capacity) your "service" will respond (so you get HA as well, especially if you implement readiness/liveness probes that, among other things, remove unhealthy pods from Services).
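For the readiness part, a hedged example of what such probes look like in a pod spec (image, path and port are made up); while the readiness check fails, the pod is removed from the Service's endpoints:
apiVersion: v1
kind: Pod
metadata:
  name: probed-app
  labels:
    app: probed-app
spec:
  containers:
  - name: app
    image: my-app:1.0        # hypothetical image exposing a /healthz endpoint
    ports:
    - containerPort: 8080
    readinessProbe:          # while failing, the pod is taken out of Service endpoints
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 5
    livenessProbe:           # if this keeps failing, the kubelet restarts the container
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10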
Kubernetes Service documentation provides pretty good insight on that