Kubernetes Load balancing among pods - kubernetes

I made a deployment and scaled out to 2 replicas.
And I made a service to forward it.
I found that kube-proxy uses iptables for forwarding from Service to Pod.
But I am confused about which one is actually responsible for load balancing.
Service or Kube-proxy?

Actually it's iptables that's responsible for load-balancing.
First, you set up a Service. Meanwhile, kube-proxy watches the API server for new Services and sets up iptables rules for them. Finally, when somebody tries to access the Service IP, iptables forwards the request to an actual pod IP according to those rules.
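The load-balancing step can be illustrated with a minimal sketch. In iptables mode, kube-proxy typically writes one rule per pod using the statistic match with cascading probabilities (1/n for the first rule, 1/(n-1) for the next, and 1 for the last), so each pod ends up with an equal share. The function names below are hypothetical and only model the arithmetic, not the actual rules.

```python
from fractions import Fraction


def cascade_probabilities(n_pods):
    """Per-rule match probabilities: 1/n, 1/(n-1), ..., 1/1."""
    return [Fraction(1, n_pods - i) for i in range(n_pods)]


def effective_shares(rule_probs):
    """Overall chance that each rule is the one that matches.

    A rule is only evaluated if every earlier rule did not match.
    """
    shares = []
    not_matched_yet = Fraction(1)
    for p in rule_probs:
        shares.append(not_matched_yet * p)
        not_matched_yet *= 1 - p
    return shares


probs = cascade_probabilities(2)
print(probs)                    # rule probabilities for 2 replicas
print(effective_shares(probs))  # each replica's overall share
```

With 2 replicas the first rule matches half of the time and the second rule catches everything else, so both pods get the same share of new connections.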


difference between Kubeproxy and service

I read in an article that I can access pods through kube-proxy, so what is the role of a Kubernetes Service here? What is the difference between kube-proxy and a Service? Finally,
is kube-proxy part of a Service?
As far as I understand:
Service is a Kubernetes object that has a stable name and stable IP and sits in front of a set of pods. All requests sent to the pods should go to the service.
Kube-proxy is a networking component running on every cluster node (it is typically deployed as a DaemonSet). It implements the low-level rules that allow communication to pods from inside as well as outside the Kubernetes cluster. In that sense, kube-proxy is the component that makes a Service work.
So when a user tries to reach an application deployed on Kubernetes, the request first reaches the Service and is then forwarded to one of the underlying pods. This is done using the rules that kube-proxy created.
For more detail, refer to this video: Kube proxy, and this blog: Closer look at Kube proxy
From my understanding
If you are only accessing the pod ports directly inside the cluster, then there is no Service involved; you need Service objects to expose your pods outside of your cluster.
A Service exposes your pods outside of your cluster and provides a stable virtual IP address. A controller keeps track of the pods that are associated with the Service. kube-proxy, in contrast, is a daemon that runs on each node, watches the Service resources defined in the cluster, and manages the rules that send requests to a Service's backend pods.
kube-proxy watches Services so that it can change the iptables rules when Service objects change. Hence they are separate entities.
We can discuss this for a while, but let's make a long story short.
Requests come to Service
Then Service passes it to Kube-Proxy
Kube-Proxy decides to which Pod this request go
How requests are forwarded from Service to Pod
Kube Proxy forwards the request
Responsible for maintaining a list of Service IPs and corresponding Pod IPs
Check this section for more details...
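The "list of Service IPs and corresponding Pod IPs" can be pictured as a small lookup table that is rebuilt whenever a Service's pods change. This is only a toy model with hypothetical names and addresses; the real kube-proxy programs kernel rules rather than holding a Python dictionary.

```python
import random


class ServiceTable:
    """Toy model of the Service -> pod mapping kube-proxy keeps in sync."""

    def __init__(self):
        self._backends = {}  # (service_ip, port) -> list of (pod_ip, port)

    def sync(self, service, pod_endpoints):
        """Replace the backends of a Service, as happens when pods come and go."""
        self._backends[service] = list(pod_endpoints)

    def pick_backend(self, service, rng=random):
        """Choose one pod for a new connection to the Service address."""
        backends = self._backends.get(service)
        if not backends:
            return None  # no ready pods behind this Service
        return rng.choice(backends)


table = ServiceTable()
svc = ("10.96.0.10", 80)  # hypothetical ClusterIP and port
table.sync(svc, [("10.244.1.5", 8080), ("10.244.2.7", 8080)])
print(table.pick_backend(svc))
```

The point of the model is that clients only ever see the stable Service address, while the set of pods behind it can be swapped out by a later sync.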

How do Kubernetes NodePort services with Service.spec.externalTrafficPolicy=Local route traffic?

There seem to be two contradictory explanations of how NodePort services route traffic. Services can route traffic to one of the two, not both:
Nodes (through the kube-proxy): According to kubectl explain Service.spec.externalTrafficPolicy and this article that adds more detail, packets arriving at NodePort services with Service.spec.externalTrafficPolicy=Local set get routed to a kube-proxy, which then routes the packets to the corresponding pods it is running.
This kube-proxy networking documentation further supports this theory, adding that endpoints add a rule to the service's iptables that forwards traffic to nodes through the kube-proxy.
Pods: Services update their iptables rules from endpoints, which contain the IP addresses of the pods they can route to. Furthermore, if you remove your service's label selectors and edit the endpoints, you can change where your traffic is routed.
If one of these is right, then I must be misunderstanding something.
If services route to nodes, then why can I edit endpoints without breaking the iptables rules?
If services route to pods, then why would services go through the trouble of routing to nodes when Service.spec.externalTrafficPolicy is set?
A Service is a virtual address/port managed by kube-proxy. Services forward traffic to their associated endpoints, which are usually pods but as you mentioned, can be set to any destination IP/Port.
A NodePort Service doesn't change the endpoint side of the service; the NodePort lets external traffic into the Service via a port on a node.
Breakdown of a Service
kube-proxy can use 3 methods to implement the forwarding of a service from Node to destination: a userspace proxy, iptables, or IPVS.
Most clusters use iptables, which is what is described below. I use the term "forward" instead of "route" because services use Network Address Translation (or the proxy) to "forward" traffic rather than standard network routing.
The service ClusterIP is a virtual entity managed by kube-proxy. This address/port combination is available on every node in the cluster and forwards any local (pod) service traffic to the endpoints IP and port.
                                      / Pod (remote node)
Pod -- ClusterIP/Port -- KUBE-SVC-NAT -- Pod
                                      \ Pod (remote node)
A service with a NodePort is the same as above, with the addition of a way to forward external traffic into the cluster via a Node. kube-proxy manages an additional rule to watch for external traffic and forward it into the same service rules.
Ext -- NodePort       \                 / Pod (remote node)
                       KUBE-SVC-NAT -- Pod
Pod -- ClusterIP/Port /                 \ Pod (remote node)
The externalTrafficPolicy=Local setting makes a NodePort service use only a local Pod to service the incoming traffic. This avoids a network hop which removes the need to rewrite the source of the packet (via NAT). This results in the real network IP arriving at the pod servicing the connection, rather than one of the cluster nodes being the source IP.
Ext -- NodePort       \                   Pod (remote node)
                       KUBE-SVC-NAT -- Pod (local)
Pod -- ClusterIP/Port /                   Pod (remote node)
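The difference between the two policies can be sketched as a filter over the endpoint list. This is a simplified model with made-up node names and pod IPs, and it assumes the documented behaviour that a node with no local pod does not forward the traffic under Local.

```python
def eligible_endpoints(endpoints, receiving_node, policy="Cluster"):
    """Return the pod IPs a NodePort connection may be forwarded to.

    endpoints: list of (pod_ip, node_name) pairs backing the Service.
    policy: "Cluster" (any pod) or "Local" (only pods on the receiving node).
    """
    if policy == "Cluster":
        return [ip for ip, _ in endpoints]
    if policy == "Local":
        return [ip for ip, node in endpoints if node == receiving_node]
    raise ValueError(f"unknown externalTrafficPolicy: {policy}")


# Hypothetical endpoints: two pods on node01, one on node03.
endpoints = [
    ("10.244.1.5", "node01"),
    ("10.244.1.6", "node01"),
    ("10.244.3.9", "node03"),
]
print(eligible_endpoints(endpoints, "node03", "Cluster"))  # all three pods
print(eligible_endpoints(endpoints, "node03", "Local"))    # only the pod on node03
print(eligible_endpoints(endpoints, "node02", "Local"))    # empty: no local pod
```

The empty result for node02 is why, under Local, an external load balancer should only send traffic to nodes that actually run a backing pod.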
I recommend attempting to trace a connection from source to destination for a service or nodeport on a host. It requires a bit of iptables knowledge but I think it's worthwhile
To list all the services ip/ports that will be forwarded:
iptables -vnL -t nat KUBE-SERVICES
To list all the nodeports that will be forwarded:
iptables -vnL -t nat KUBE-NODEPORTS
Once you have the rule you can jump through KUBE-SVC-XXX "target" rules in the full output.
iptables -vnL -t nat | less
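Following the jumps by hand amounts to a small graph walk. The sketch below uses a hypothetical, heavily simplified rule table (chain name to jump targets); the chain suffixes and pod addresses are placeholders, not real output, and the KUBE-SEP naming is an assumption based on what kube-proxy commonly generates in iptables mode.

```python
def trace(chains, start):
    """Return every path from `start` to a final destination.

    chains maps a chain name to the list of targets its rules jump to.
    A target that is not itself a chain is treated as a destination.
    """
    paths = []

    def walk(chain, path):
        for target in chains.get(chain, []):
            if target in chains:
                walk(target, path + [target])
            else:
                paths.append(path + [target])

    walk(start, [start])
    return paths


# Placeholder chain names and addresses, for illustration only.
chains = {
    "KUBE-SERVICES": ["KUBE-SVC-XXX"],
    "KUBE-NODEPORTS": ["KUBE-SVC-XXX"],
    "KUBE-SVC-XXX": ["KUBE-SEP-AAA", "KUBE-SEP-BBB"],
    "KUBE-SEP-AAA": ["DNAT to 10.244.1.5:8080"],
    "KUBE-SEP-BBB": ["DNAT to 10.244.3.9:8080"],
}
for path in trace(chains, "KUBE-NODEPORTS"):
    print(" -> ".join(path))
```

Both the ClusterIP entry point and the NodePort entry point end up in the same per-service chain, which matches the diagrams above.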
externalTrafficPolicy: Cluster is not used on ClusterIP Services; try removing it and applying again, and it will work.

How does kube-proxy configure services of type nodePort?

When creating a kubernetes service of type nodePort, kube-proxy configures each worker node to listen on a particular port.
How does kube-proxy (in the iptables proxy mode) actually configure this? Is it just done using iptables which opens a port? (not sure if that is even possible)
kube-proxy uses iptables and netfilter rules to forward traffic from NodePorts to pods. Mark Betz's article series on Kubernetes networking is a good read.

Access nodeport via kube-proxy from another machine

I have kubernetes cluster (node01-03).
There is a service with nodeport to access a pod (nodeport 31000).
The pod is running on node03.
I can access the service with http://node03:31000 from any host. On every node I can access the service like http://[name_of_the_node]:31000. But I cannot access the service the following way: http://node01:31000 even though there is a listener (kube-proxy) on node01 at port 31000. The iptables rules look okay to me. Is this how it's intended to work ? If not, how can I further troubleshoot?
NodePort is exposed on every node in the cluster. https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport clearly says:
each Node will proxy that port (the same port number on every Node) into your Service
So, from both inside and outside the cluster, the service can be accessed using NodeIP:NodePort on any node in the cluster and kube-proxy will route using iptables to the right node that has the backend pod.
However, if the service is accessed using NodeIP:NodePort from outside the cluster, we need to first make sure that NodeIP is reachable from where we are hitting NodeIP:NodePort.
If NodeIP:NodePort cannot be accessed on a node that is not running the backend pod, it may be caused by the default DROP rule on the FORWARD chain (which in turn is caused by Docker 1.13 for security reasons). Here is more info about it. Also see step 8 here. A solution for this is to add the following rule on the node:
iptables -A FORWARD -j ACCEPT
The k8s issue for this is here and the fix is here (the fix should be there in k8s 1.9).
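Before changing firewall rules, it can help to confirm which nodes actually accept connections on the NodePort. The following is a minimal sketch using only a plain TCP connect; the helper names are hypothetical, and a successful connect only shows that the port answers, not that the pod behind it is healthy.

```python
import socket


def nodeport_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_nodes(hosts, port, timeout=2.0):
    """Map each node name to whether its NodePort accepted a connection."""
    return {host: nodeport_reachable(host, port, timeout) for host in hosts}
```

For the cluster in the question, calling check_nodes(["node01", "node02", "node03"], 31000) from the external machine would show which nodes forward the port. If only the node running the pod answers, the FORWARD chain policy described above is a likely suspect.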
Three other options to enable external access to a service are:
ExternalIPs: https://kubernetes.io/docs/concepts/services-networking/service/#external-ips
LoadBalancer with an external, cloud-provider's load-balancer: https://kubernetes.io/docs/concepts/services-networking/service/#type-loadbalancer
Ingress: https://kubernetes.io/docs/concepts/services-networking/ingress/
If you are accessing pods from within the Kubernetes cluster, you don't need to use the NodePort. Use the Service's own port instead (the Service forwards it to the targetPort on the pod). Say podA needs to access podB through a Service called serviceB. Assuming HTTP, all you need is http://serviceB:port/

Routing traffic to kubernetes cluster

I have a question related to Kubernetes networking.
I have a microservice (say numcruncherpod) running in a pod which is serving requests via port 9000, and I have created a corresponding Service of type NodePort (numcrunchersvc). The node port on which this service is exposed is 30900.
My cluster has 3 nodes with following IPs:,
I will be routing the traffic to my cluster via reverse proxy (nginx). As I understand in nginx I need to specify IPs of all these cluster nodes to route the traffic to the cluster, is my understanding correct ?
My worry is since nginx won't have knowledge of cluster it might not be a good judge to decide the cluster node to which the traffic should be sent to. So is there a better way to route the traffic to my kubernetes cluster ?
PS: I am not running the cluster on any cloud platform.
This answer is a little late, and a little long, so I ask for forgiveness before I begin. :)
For people not running kubernetes clusters on Cloud Providers there are 4 distinct options for exposing services running inside the cluster to the world outside.
1. Service of type: NodePort. This is the simplest option. Kubernetes assigns your service a port from the node port range (30000-32767 by default) unless you specify one. Every node in the cluster listens for traffic to this particular port and then forwards that traffic to any one of the pods backing that service. This is usually handled by kube-proxy, which leverages iptables and, in that mode, picks a backing pod at random for each new connection. Since the UX for this setup is not pretty, people often add an external "proxy" server, such as HAProxy, Nginx or httpd, to listen to traffic on a single IP and forward it to one of these backends. This is the setup you, OP, described.
2. A step up from this would be using a Service with externalIPs set (strictly a Service field rather than a type). This works much like the NodePort service, except it also gets Kubernetes to add an additional rule on all Kubernetes nodes that says "All traffic that arrives for destination IP == must also be forwarded to the pods". This basically allows you to specify any arbitrary IP as the "external IP" for the service. As long as traffic destined for that IP reaches one of the nodes in the cluster, it will be routed to the correct pod. Getting that traffic to any of the nodes, however, is your responsibility as the cluster administrator. The advantage here is that you no longer have to run an haproxy/nginx setup if you specify the IP of one of the physical interfaces of one of your nodes (for example one of your master nodes). Additionally, you cut down the number of hops by one.
3. Service of type: LoadBalancer. This service type brings bare-metal clusters to parity with cloud providers. A fully functioning load-balancer provider is able to select an IP from a pre-defined pool, automatically assign it to your service and advertise it to the network, assuming it is configured correctly. This is the most "seamless" experience you'll have when it comes to Kubernetes networking on bare metal. Most LoadBalancer provider implementations use BGP to talk and advertise to an upstream L3 router. MetalLB and kube-router are the two FOSS projects that fit this niche.
4. Kubernetes Ingress. If your requirement is limited to L7 applications, such as REST APIs, HTTP microservices etc., you can set up a single Ingress provider (nginx is one such provider) and then configure Ingress resources for all your microservices, instead of exposing each service on its own. You deploy your Ingress provider and make sure it has an externally available and routable IP (you can pin it to a master node, and use the physical interface IP for that node, for example). The advantage of using Ingress over services is that Ingress objects understand HTTP microservices natively, and you can do smarter health checking, routing and management.
Often people combine one of (1), (2), (3) with (4), since the first 3 are L4 (TCP/UDP) and (4) is L7. So things like URL path/domain based routing, SSL termination etc. are handled by the ingress provider, while IP lifecycle management and routing are taken care of by the service layer.
For your use case, the ideal setup would involve:
A deployment for your microservice, with health endpoints on your pod
An Ingress provider, so that you can tweak/customize your routing/load-balancing as well as use for SSL termination, domain matching etc.
(optional): Use a LoadBalancer provider to front your Ingress provider, so that you don't have to manually configure your Ingress's networking.
Correct. You can route traffic to any or all of the K8 minions. The K8 network layer will forward to the appropriate minion if necessary.
If you are running only a single pod for example, nginx will most likely round-robin the requests. When the requests hit a minion which does not have the pod running on it, the request will be forwarded to the minion that does have the pod running.
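If you run 3 pods, one on each minion, the request arrives at whatever minion nginx sends it to. With the default externalTrafficPolicy (Cluster), kube-proxy may still hand it to a pod on another minion; only with externalTrafficPolicy=Local is it always served by a pod on the receiving minion.
If you run more than one pod on each minion, nginx round-robins the requests across the minions, and kube-proxy then spreads them across the pods (at random in iptables mode) rather than strictly per minion.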
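A rough simulation of this two-stage distribution, under stated assumptions: nginx is modelled as strict round-robin over the nodes, and the cluster side as picking any backing pod at random (the default behaviour in iptables mode with externalTrafficPolicy: Cluster). Node and pod names are hypothetical.

```python
import itertools
import random
from collections import Counter


def simulate(nodes, pods, n_requests, seed=0):
    """Count which node received and which pod served each request."""
    rng = random.Random(seed)
    next_node = itertools.cycle(nodes)  # external proxy: round-robin over nodes
    received, served = Counter(), Counter()
    for _ in range(n_requests):
        received[next(next_node)] += 1
        served[rng.choice(pods)] += 1   # cluster side: any pod, chosen at random
    return received, served


received, served = simulate(["node01", "node02", "node03"], ["pod-a", "pod-b"], 600)
print(received)  # even split across nodes
print(served)    # roughly even split across pods
```

The node that receives a request and the pod that serves it are chosen independently in this model, which is why nginx does not need any knowledge of where the pods run.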