Kubernetes Service IP entry in iptables

I deployed a pod using a replication controller with replicas set to 3. The cluster has 5 nodes. I created a service (type NodePort) for the pod. Now kube-proxy adds an entry for the service into the iptables rules of all 5 nodes. Would that not be overhead if there were 50 nodes in the cluster?

This is not overhead. Every node needs to be able to reach the service even if it does not host any of that service's pods (i.e. it may host pods that connect to that service).
That said, in some very large clusters the performance of iptables updates has been reported to be poor (mind that this is at a very, very large scale). If that is the case, you might prefer to look into solutions like Linkerd (https://linkerd.io/) or Istio (https://istio.io/)
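For illustration, a minimal NodePort Service of the kind described in the question might look like the sketch below (the name, labels and ports are placeholders, not taken from the original setup); kube-proxy turns this single object into iptables rules on every node:

    apiVersion: v1
    kind: Service
    metadata:
      name: my-nginx              # hypothetical service name
    spec:
      type: NodePort
      selector:
        app: my-nginx             # must match the labels on the 3 replicated pods
      ports:
        - port: 80                # ClusterIP port inside the cluster
          targetPort: 80          # container port
          nodePort: 30080         # opened by kube-proxy on every node, not only the nodes hosting pods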

Related

EKS-provisioned LoadBalancers reference all nodes in the cluster. If the pods reside on only 1 or 2 nodes, is this efficient as the ELB traffic increases?

In Kubernetes (on AWS EKS), when I create a service of type LoadBalancer, the resulting EC2 LoadBalancer is associated with all nodes (instances) in the EKS cluster, even though the selector in the service only matches pods running on 1 or 2 of these nodes (i.e. a much smaller subset of nodes).
I am keen to understand whether this will remain efficient as the volume of traffic increases.
I could not find any advice on this topic and am keen to understand if this is the correct approach.
This can introduce additional SNAT if the request arrives at a node where the pod is not running, and it also does not preserve the source IP of the request. You can change externalTrafficPolicy to Local, which associates only the nodes that actually run matching pods with the LoadBalancer.
You can get more information from the following links.
Preserve source IP
EKS load balancer support
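As a rough sketch of the suggestion above (the service name, labels and ports are placeholders):

    apiVersion: v1
    kind: Service
    metadata:
      name: web                        # hypothetical
    spec:
      type: LoadBalancer
      externalTrafficPolicy: Local     # only nodes running a matching pod receive traffic; client source IP is preserved
      selector:
        app: web                       # must match the pod labels
      ports:
        - port: 80
          targetPort: 8080             # hypothetical container port

With Local, nodes that have no matching pods fail the load balancer's health check (served on the service's healthCheckNodePort) and are taken out of rotation.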
On EKS, if you are using the AWS VPC CNI (which is the default for EKS), you can use the aws-alb-ingress-controller to create ELBs and ALBs.
When creating the load balancer you can use the annotation below; traffic is then routed only to your pods.
alb.ingress.kubernetes.io/target-type: ip
Reference:
https://github.com/aws/amazon-vpc-cni-k8s
https://github.com/kubernetes-sigs/aws-alb-ingress-controller
https://kubernetes-sigs.github.io/aws-alb-ingress-controller/guide/ingress/annotation/#target-type
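As a hedged example, the annotation is typically set on the Ingress object handled by the aws-alb-ingress-controller; the names, path and port here are placeholders:

    apiVersion: extensions/v1beta1                      # API version commonly used with this controller at the time
    kind: Ingress
    metadata:
      name: web-ingress                                 # hypothetical
      annotations:
        kubernetes.io/ingress.class: alb
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/target-type: ip       # register pod IPs directly as ALB targets
    spec:
      rules:
        - http:
            paths:
              - path: /*
                backend:
                  serviceName: web                      # hypothetical backend service
                  servicePort: 80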

Bare-Metal K8s: How to preserve the source IP of the client and direct traffic to the nginx replica on the current server

I would like to ask for some assistance:
The entry point to the cluster for HTTP/HTTPS is NGINX (quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.0) running as a DaemonSet.
I want to achieve 2 things:
preserve the source IP of the client
direct traffic to the nginx replica on the current server (so if a request is sent to server A, listed as an externalIP address, the nginx instance on node A should handle it)
Questions:
How is this possible?
Is it possible without NodePort? The control plane can be started with a custom --service-node-port-range so I could add NodePorts for 80 and 443, but that looks a little bit like a hack (after reading about the intended usage of NodePort).
I was considering using MetalLB, but the layer 2 configuration would create a bottleneck (there is high traffic on the cluster). I am not sure whether BGP mode would solve this problem.
Kubernetes v1.15
Bare-metal
Ubuntu 18.04
Docker (18.9) and WeaveNet (2.6)
You can preserve the source IP of the client by setting externalTrafficPolicy to Local; this proxies requests only to local endpoints. This is explained in Source IP for Services with Type=NodePort.
You should also have a look at Using Source IP.
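A minimal sketch of such a Service in front of the ingress DaemonSet (the name, namespace, labels and nodePort values are assumptions and would have to match your actual nginx-ingress deployment):

    apiVersion: v1
    kind: Service
    metadata:
      name: ingress-nginx                        # hypothetical
      namespace: ingress-nginx                   # hypothetical
    spec:
      type: NodePort
      externalTrafficPolicy: Local               # keep traffic on the node it arrived at; preserves the client source IP
      selector:
        app.kubernetes.io/name: ingress-nginx    # must match the DaemonSet's pod labels
      ports:
        - name: http
          port: 80
          targetPort: 80
          nodePort: 30080                        # exposing 80/443 directly would need a custom --service-node-port-range
        - name: https
          port: 443
          targetPort: 443
          nodePort: 30443

Because the controller runs as a DaemonSet, every node has a local endpoint, so with the Local policy a request that reaches node A is handled by the nginx pod on node A.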
In the case of MetalLB:
MetalLB respects the service’s externalTrafficPolicy option, and implements two different announcement modes depending on what policy you select. If you’re familiar with Google Cloud’s Kubernetes load balancers, you can probably skip this section: MetalLB’s behaviors and tradeoffs are identical.
“Local” traffic policy
With the Local traffic policy, nodes will only attract traffic if they are running one or more of the service’s pods locally. The BGP routers will load-balance incoming traffic only across those nodes that are currently hosting the service. On each node, the traffic is forwarded only to local pods by kube-proxy, there is no “horizontal” traffic flow between nodes.
This policy provides the most efficient flow of traffic to your service. Furthermore, because kube-proxy doesn’t need to send traffic between cluster nodes, your pods can see the real source IP address of incoming connections.
The downside of this policy is that it treats each cluster node as one “unit” of load-balancing, regardless of how many of the service’s pods are running on that node. This may result in traffic imbalances to your pods.
For example, if your service has 2 pods running on node A and one pod running on node B, the Local traffic policy will send 50% of the service’s traffic to each node. Node A will split the traffic it receives evenly between its two pods, so the final per-pod load distribution is 25% for each of node A’s pods, and 50% for node B’s pod. In contrast, if you used the Cluster traffic policy, each pod would receive 33% of the overall traffic.
In general, when using the Local traffic policy, it’s recommended to finely control the mapping of your pods to nodes, for example using node anti-affinity, so that an even traffic split across nodes translates to an even traffic split across pods.
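For reference, the "node anti-affinity" mentioned in the quote is usually expressed as pod anti-affinity on the hostname topology key; a hedged fragment of the pod template (spec.template.spec), with a placeholder label:

    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web                          # hypothetical label shared by the service's pods
            topologyKey: kubernetes.io/hostname   # schedule at most one matching pod per node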
You also need to take into account the limitations of the BGP routing protocol when using MetalLB.
Please also have a look at this blog post Using MetalLb with Kind.

Kubernetes: How to enable inter-pod DNS within the same Deployment?

I am new to Kubernetes, and I am trying to make inter-pod communication over DNS work.
Pods in my cluster are spawned by Deployments. My problem is that every Pod reports its hostname to ZooKeeper, and the pods use the hostnames found in ZooKeeper to ping their peers. This always fails because the peers' hostnames cannot be resolved from other pods.
The only solution for now is to manually add each pod's hostname to the peers' /etc/hosts files, but this approach will not scale to large clusters.
A DNS solution for inter-pod communication that keeps records for newly created pods and removes the records of dead pods would be great.
Thanks in advance.
One solution I found was to add hostname and subdomain under spec.template.spec; then communication between pods over hostnames works.
However, this solution is fairly clumsy, because I cannot set the replicas of a Deployment to more than 1, or I would end up with more than one pod with the same hostname in the cluster. If I had 10 slaves with the same function in the cluster, I would need to create 10 Deployments.
Any better solutions?
You need to use a service definition pointing to your pods
https://kubernetes.io/docs/concepts/services-networking/service/
With that you have a load-balanced proxy for inter-pod communication, and the internal DNS in Kubernetes tracks the Service instead of each pod, no matter the state of the pods.
If that simple solution does not fit your needs, you can replace kube-dns, the default internal DNS, with CoreDNS.
https://coredns.io/
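A minimal sketch of such a Service definition (all names and ports are placeholders; the selector must match the labels in the Deployment's pod template):

    apiVersion: v1
    kind: Service
    metadata:
      name: worker                  # hypothetical; pods can then reach the group at worker.<namespace>.svc.cluster.local
    spec:
      selector:
        app: worker                 # must match the Deployment's pod labels
      ports:
        - port: 2181                # hypothetical application port
          targetPort: 2181

Setting clusterIP: None turns this into a headless Service, in which case the DNS name resolves to the individual pod IPs instead of a single virtual IP.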

Pod to Pod communication for pods within the same Deployment

I have a Kubernetes deployment that has 3 replicas. It starts 3 pods which are distributed across a given cluster. I would like to know how to reliably get one pod to contact another pod within the same ReplicaSet.
The deployment above is already wrapped in a Kubernetes Service, but Services do not cover my use case. I need each instance of my container (each Pod) to start up a local in-memory cache and have these caches communicate/sync with the cache instances running on the other Pods. This is how I see a simple distributed cache working for my service. Pod to pod communication within the same cluster is allowed as per the Kubernetes network model, but I cannot see a reliable way to address one pod from another pod.
I believe I could use a StatefulSet; however, I don't want to lose the ClusterIP assigned to the Service, which is required by the Ingress for load balancing.
Of course you can use a StatefulSet, and the Ingress does not need the ClusterIP assigned to the Service, since it uses the endpoints, so a headless Service is fine.
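A hedged sketch of that combination, with placeholder names: a headless Service gives each StatefulSet pod a stable DNS name the caches can use to find each other, while a normal ClusterIP Service (not shown) can still sit behind the Ingress:

    apiVersion: v1
    kind: Service
    metadata:
      name: cache-headless           # hypothetical
    spec:
      clusterIP: None                # headless: DNS returns the individual pod IPs
      selector:
        app: cache
      ports:
        - port: 6379                 # hypothetical cache port
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: cache                    # hypothetical
    spec:
      serviceName: cache-headless    # ties the pods' DNS names to the headless Service
      replicas: 3
      selector:
        matchLabels:
          app: cache
      template:
        metadata:
          labels:
            app: cache
        spec:
          containers:
            - name: cache
              image: example/cache:latest   # placeholder image
              ports:
                - containerPort: 6379

Each pod is then reachable as cache-0.cache-headless.<namespace>.svc.cluster.local, cache-1.cache-headless..., and so on.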

Kubernetes NodePort routing logic

I have a Kubernetes setup that contains 4 minions (node1, node2, node3, node4). I created a service that exposes port 80 on NodePort 30010. There are 4 nginx pods that accept traffic from this service; however, the distribution of pods among the nodes may vary. For example, node 1 has 2 pods, node 2 has 1 pod, and node 3 has 1 pod, while node 4 has no pods deployed. My requirement is that whenever I send a request to node1:30010 it should hit only the 2 pods on node 1 and not any other pods. Traffic should be routed to other nodes if and only if there is no pod on the local node; for example, node 4 would have to route requests arriving at node4:30010 to other nodes because it has no suitable pod deployed on it. Can I achieve this by changing the kube-proxy configuration?
As far as I'm aware, no. Hitting node1:30010 will pass traffic to the service, and the service will then round-robin it across all of its endpoints.
Kubernetes is designed as a layer of abstraction above nodes, so you don't have to worry about where traffic is being sent; trying to control which node the traffic goes to works against that idea.
Could you explain your end goal? If your different pods serve different responses, then you may want to create more services; or if you are worried about latency and want to serve traffic from the node closest to the user, you may want to look at federating your cluster.