I was asked about this and couldn't find info about it online: which algorithm does Kubernetes use to route traffic to the pods behind a ReplicaSet or Deployment (I guess they behave the same)?
Let's say I have a ReplicaSet with 5 pod replicas in my Kubernetes cluster. How does the cluster pick which pod a new request goes to? Does it use round-robin? I couldn't find info about it.
The algorithm used to pick which pod handles a request depends on the kube-proxy mode that is running.
In Kubernetes 1.0, the proxy worked in a mode called userspace, and the default algorithm was round robin.
In 1.2, the iptables proxy mode was added; because of iptables limitations it picks a backend at random rather than strictly round robin.
In 1.8.0-beta, IP Virtual Server (IPVS) mode was introduced; it allows many more algorithm options (see the configuration sketch after this list), such as:
RoundRobin;
WeightedRoundRobin;
LeastConnection;
WeightedLeastConnection;
LocalityBasedLeastConnection;
LocalityBasedLeastConnectionWithReplication;
SourceHashing;
DestinationHashing;
ShortestExpectedDelay;
NeverQueue.
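In kube-proxy these are selected by their IPVS short names via the --ipvs-scheduler flag or the config file (rr for round robin, wlc for weighted least connection, and so on). A minimal sketch of such a config file, assuming the kubeproxy.config.k8s.io/v1alpha1 API; adjust the scheduler to whichever algorithm you need:

    # Sketch only: kube-proxy config selecting IPVS mode with round robin.
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    mode: "ipvs"
    ipvs:
      scheduler: "rr"   # IPVS short name; e.g. "wlc", "sh", "nq" for other algorithms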
References:
https://kubernetes.io/docs/concepts/services-networking/service/#virtual-ips-and-service-proxies
https://sookocheff.com/post/kubernetes/understanding-kubernetes-networking-model/
Related
Why do we need point-to-point connections between pods when we have workload abstractions and networking mechanisms (Service/kube-proxy/Ingress etc.) on top of them?
What is the default CNI?
REDACTED: I was confused about this question because I felt like I hadn't installed any of the popular CNI plugins when I was installing Kubernetes. It turns out Kubernetes defaults to kubenet.
Btw, I see a lot of overlapping features between Istio and container networks. IMO they could achieve identical objectives. The only difference is that Istio is high-level and CNI is low-level and more efficient. Is that correct?
REDACTED: Interestingly, Istio has its own CNI.
Kubernetes networking has some requirements:
pods on a node can communicate with all pods on all nodes without NAT
agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node
pods in the host network of a node can communicate with all pods on all nodes without NAT
CNI (Container Network Interface) defines a standard interface, and all implementations (Calico, Flannel, etc.) need to follow it.
So CNI plugins exist to implement the Kubernetes networking model.
A Service is different: it supplies a virtual address that proxies to the pods, since pods are ephemeral and their IPs change, while the Service's address stays stable.
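A minimal sketch of that idea (the name web and the ports are just placeholders): the Service gets a stable virtual IP and DNS name, and traffic is proxied to whichever pods currently match the selector.

    # Hypothetical example: a stable front for ephemeral pods.
    apiVersion: v1
    kind: Service
    metadata:
      name: web              # stable DNS name: web.<namespace>.svc
    spec:
      selector:
        app: web             # routes to whatever pods carry this label right now
      ports:
        - port: 80           # port on the Service's virtual IP
          targetPort: 8080   # container port on the pods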
Istio is yet another thing: it turns the connections between microservices into infrastructure and pulls that concern out of the business code (think of Spring Cloud).
Why do we need point-to-point connections between pods when we have workload abstractions and networking mechanisms (Service/kube-proxy/Ingress etc.) on top of them?
In general, you will find everything about networking in a cluster in this documentation. You can find more information about pod networking:
Every Pod gets its own IP address. This means you do not need to explicitly create links between Pods and you almost never need to deal with mapping container ports to host ports. This creates a clean, backwards-compatible model where Pods can be treated much like VMs or physical hosts from the perspectives of port allocation, naming, service discovery, load balancing, application configuration, and migration.
Kubernetes imposes the following fundamental requirements on any networking implementation (barring any intentional network segmentation policies):
pods on a node can communicate with all pods on all nodes without NAT
agents on a node (e.g. system daemons, kubelet) can communicate with all pods on that node
Note: For those platforms that support Pods running in the host network (e.g. Linux):
pods in the host network of a node can communicate with all pods on all nodes without NAT
Then you are asking:
What is the default CNI?
There is no single default CNI in a Kubernetes cluster. It depends on the type of cluster and on where and how you set it up. As you can see in this doc about implementing the networking model, there are many CNIs available for Kubernetes.
Istio is a completely different tool for something else. You can't compare them like that. Istio is a service mesh tool.
Istio extends Kubernetes to establish a programmable, application-aware network using the powerful Envoy service proxy. Working with both Kubernetes and traditional workloads, Istio brings standard, universal traffic management, telemetry, and security to complex deployments.
I made a Deployment and scaled it out to 2 replicas, and I created a Service to forward traffic to it.
I found that kube-proxy uses iptables for forwarding from the Service to the Pods, but the load-balancing strategy of iptables is RANDOM.
How can I force my Service to forward requests to the 2 pods using a round-robin strategy without switching kube-proxy to userspace or ipvs mode?
You cannot; the scheduling strategies are only supported in ipvs mode. The option is even called --ipvs-scheduler.
You cannot.
But if you really don't want to change the --proxy-mode flag on kube-proxy, you can use a third-party proxy/load balancer (like HAProxy) and point it at your application. This is usually not the best option, as you need to make sure it's deployed with HA, and it also adds complexity to your deployment.
The official kubernetes docs clearly state that kube-proxy "will not scale to very large clusters with thousands of Services", however when a LoadBalancer type Service is created on GKE the externalTrafficPolicy is set to Cluster by default (meaning that each request will be load-balanced by kube-proxy anyway in addition to external load balancing). As it is explained for example in this video from Next '17, this is to avoid traffic imbalance (as Google's external load balancers are not capable of asking a cluster how many pods of a given service are on each node).
Hence the question: does it mean that:
a) by default GKE cannot be used for "very large clusters with thousands of Services", and to do so I need to risk traffic imbalances by setting externalTrafficPolicy to Local
b) ...or the information about poor scalability of kube-proxy is incorrect or outdated
c) ...or something else that I couldn't come up with
Thanks!
The "will not scale to very large clusters with thousands of Services" quote refers to the userspace proxy, which was the default mode a long time ago, before the full iptables-based implementation landed. So that statement is largely outdated, but...
iptables mode has its own issues that come with scale (extremely large iptables rule chains take a long time to update), which is one of the reasons the IPVS work made it into kube-proxy. You would have to run at a really hardcore scale to hit performance problems with kube-proxy.
According to the official Kubernetes documentation about externalTrafficPolicy, the answer is a).
The Cluster option obscures the client source IP and may cause a second hop to another node, but should give good overall load spreading, while the Local option preserves the client source IP and avoids a second hop for LoadBalancer and NodePort type Services, but risks potentially imbalanced traffic spreading.
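For completeness, this is what option a) looks like in a manifest (the Service name and ports are placeholders). With Local, traffic stays on the node it lands on, so make sure the pods are spread across nodes:

    # Sketch: LoadBalancer Service that avoids the kube-proxy second hop.
    apiVersion: v1
    kind: Service
    metadata:
      name: my-service              # placeholder
    spec:
      type: LoadBalancer
      externalTrafficPolicy: Local  # default is Cluster
      selector:
        app: my-app
      ports:
        - port: 80
          targetPort: 8080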
I am new to K8S and I am trying to understand the exact role of kube-proxy running on each node in a cluster. The documentation mentions that "kube-proxy reflects services as defined in the Kubernetes API on each node and can do simple TCP, UDP, and SCTP stream forwarding or round robin TCP, UDP, and SCTP forwarding across a set of backends". For this to be true, each kube-proxy will need to have complete information about all the services running in the cluster as it is the responsibility of the kube-proxy to provide access to any service which is demanded by an application running on a pod (on that respective node). So does that mean that all the kube-proxies inside a K8S cluster (running on each node) are mirror images? If so, why is a kube-proxy present on each node instead of a centralized one for entire cluster?
Link to the K8S documentation on proxies: https://kubernetes.io/docs/concepts/cluster-administration/proxies/
So does that mean that all the kube-proxies inside a K8S cluster (running on each node) are mirror images?
Yes, they are all instances of the same image and watch the same cluster state.
If so, why is a kube-proxy present on each node instead of a centralized one for entire cluster?
kube-proxy uses the operating system's packet filtering layer if one is available, such as iptables or IPVS; otherwise, kube-proxy forwards the traffic itself.
kube-proxy is itself a Kubernetes controller: it watches the desired state of the cluster (Services and Endpoints) and makes changes on its own node, e.g. by managing iptables rules (when running in iptables mode).
For this to be true, each kube-proxy will need to have complete information about all the services running in the cluster .....
There are the following flags to tune this behaviour of kube-proxy (a config-file equivalent is sketched after the list):
--iptables-min-sync-period duration
The minimum interval of how often the iptables rules can be refreshed as endpoints and services change (e.g. '5s', '1m', '2h22m').
--iptables-sync-period duration Default: 30s
The maximum interval of how often iptables rules are refreshed (e.g. '5s', '1m', '2h22m'). Must be greater than 0.
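If kube-proxy is driven by a config file instead of flags, the equivalent settings look roughly like this (values are only illustrative):

    # Sketch: config-file equivalents of the flags above (kubeproxy.config.k8s.io/v1alpha1).
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    mode: "iptables"
    iptables:
      syncPeriod: 30s     # maximum interval between rule refreshes
      minSyncPeriod: 5s   # minimum interval between rule refreshes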
IMO, decisions about connections (forwarding, accepting) between pods should be made by node components rather than by a central control plane component. Besides, the K8s control plane (api-server, etcd) keeps the desired and current state of the cluster, so every controller can reconcile according to its configured behaviour.
In preparation for HIPAA compliance, we are transitioning our Kubernetes cluster to use secure endpoints across the fleet (between all pods). Since the cluster is composed of about 8-10 services currently using HTTP connections, it would be super useful to have this taken care of by Kubernetes.
The specific attack vector we'd like to address with this is packet sniffing between nodes (physical servers).
This question breaks down into two parts:
Does Kubernetes encrypt the traffic between pods and nodes by default?
If not, is there a way to configure it to do so?
Many thanks!
Actually the correct answer is "it depends". I would split the cluster into 2 separate networks.
Control Plane Network
This is the physical network, or the underlay network in other words.
k8s control-plane elements - kube-apiserver, kube-controller-manager, kube-scheduler, kube-proxy, kubelet - talk to each other in various ways. Except for a few endpoints (eg. metrics), it is possible to configure encryption on all endpoints.
If you're also pentesting, then kubelet authn/authz should be switched on too. Otherwise, the encryption doesn't prevent unauthorized access to the kubelet. This endpoint (at port 10250) can be hijacked with ease.
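A sketch of that hardening in the kubelet config file (kubelet.config.k8s.io/v1beta1); the exact values depend on your cluster setup:

    # Sketch: turn off anonymous access to the kubelet API and delegate
    # authentication/authorization to the API server.
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    authentication:
      anonymous:
        enabled: false
      webhook:
        enabled: true
    authorization:
      mode: Webhook   # instead of AlwaysAllow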
Cluster Network
The cluster network is the one used by the Pods, which is also referred to as the overlay network. Encryption is left to the 3rd-party overlay plugin to implement; failing that, the app has to implement it.
The Weave overlay supports encryption. The service mesh linkerd that #lukas-eichler suggested can also achieve this, but on a different networking layer.
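For example, Weave Net's encryption is switched on by giving the weave container a password. A sketch, assuming the stock weave-net DaemonSet and a Secret you create yourself (the Secret name and key below are placeholders):

    # Sketch: env entry added to the 'weave' container of the weave-net DaemonSet;
    # the referenced Secret must be created beforehand.
    env:
      - name: WEAVE_PASSWORD
        valueFrom:
          secretKeyRef:
            name: weave-passwd   # placeholder Secret name
            key: weave-passwd    # placeholder key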
The replies here seem to be outdated. As of 2021-04-28 at least the following components seem to be able to provide an encrypted networking layer to Kubernetes:
Istio
Weave
linkerd
cilium
Calico (via WireGuard; see the sketch below)
(the list above was compiled by consulting the respective projects' home pages)
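As one concrete example from the list above, Calico's WireGuard encryption is toggled through its FelixConfiguration resource. A sketch, assuming Calico is already installed and the nodes have WireGuard support:

    # Sketch: enable WireGuard encryption for pod-to-pod traffic in Calico.
    apiVersion: projectcalico.org/v3
    kind: FelixConfiguration
    metadata:
      name: default
    spec:
      wireguardEnabled: true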
Does Kubernetes encrypt the traffic between pods and nodes by default?
Kubernetes does not encrypt any traffic.
There are service meshes like linkerd that allow you to easily introduce HTTPS communication between your HTTP services.
You would run an instance of the service mesh on each node, and all services would talk to the service mesh. The communication inside the service mesh would be encrypted.
Example:
your service -http-> service mesh on localhost -https-> service mesh on the remote node -http-> remote service on localhost.
When you run the service mesh proxy in the same pod as your service, the localhost communication runs on a private virtual network interface that no other pod can access.
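A sketch of how that sidecar typically gets there with current linkerd's automatic proxy injection (Deployment name and image are placeholders; the annotation is what asks the mesh to add the proxy container):

    # Sketch: pod template annotated for linkerd sidecar injection.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-service                 # placeholder
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: my-service
      template:
        metadata:
          labels:
            app: my-service
          annotations:
            linkerd.io/inject: enabled   # mesh injects the encrypting proxy sidecar
        spec:
          containers:
            - name: app
              image: example.com/my-service:latest   # placeholder image
              ports:
                - containerPort: 8080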
No, Kubernetes does not encrypt traffic by default.
I haven't personally tried it, but the description of the Calico software-defined network seems oriented toward what you are describing, with the additional benefit of already being Kubernetes-friendly.
I thought that Calico did native encryption, but based on this GitHub issue it seems they recommend using a solution like IPsec, just as you would for a traditional host.