I have deployed my Kubernetes cluster on EKS. I have an ingress-nginx which is exposed via load balancer to route traffic to different services. In ingress-nginx first request goes to auth service for authentication and if it is a valid request then I allow it move forward.
Let say the request is in Service 1 and now from there, it wants to communicate to Service 2. So if I somehow want my request to go directly to ingress not via load balancer and then from ingress to service 2.
Is is possible to do so?
Will it help in improving performance as I bypassed load balancer?
As the request is not moving through load balancer so load balancing won't take place, is it a serious concern?
1/ Is it possible: short answer, no.
There are edge cases, that would require for someone to create another Ingress object exposing Service2 in the first place. Then, you could trick the Ingress into routing you to some service that might not otherwise be reachable (if the DNS doesn't exist, some VIP was not yet exposed, ...)
There's no real issue with external clients bypassing the ELB, as long as they can not join all ports on your nodes, just the ones bound by your ingress controller.
2/ Bypassing the loadbalancer: won't change much in terms of performance.
If we're talking about a TCP loadbalancer, getting it away would help track real client IPs, though. Figuring out how to change it for an HTTP loadbalancer may be better -- though not always easy.
3/ Removing the LoadBalancer: if you have several nodes hosting replicas of your incress controller, then you would still be able to do some kind of DNS-based loadbalancing. Though for sure, it's not the same as having a real LB.
In AWS, you could find a middle ground setting up health-check based Route53 Records: set one for each node hosting an ingress controller, create another regrouping all healthy ingress nodes, then change your existing ingress FQDN records so they'ld all point to your new route53 name. You'ld be able to do TCP/HTTP checks against EC2 instances IPs, that's usually good enough. But again: DNS loadbalancing can suffer from outdated browser caches, some ISP not refreshing zones, ... LB is the real thing.
Related
From k8s docs and along with other answers I can find, it shows load balancer(LB) before the ingress. However I am confused that after matching the ingress rule, there can be still multiple containers that backed the selected services. Does LB happen again here for selecting one container to route to?
https://kubernetes.io/docs/concepts/services-networking/ingress/#what-is-ingress
As you can see from the picture you posted, the Ingress choose a Service (based on a Rule) and not directly a Pod. Then, the Service may (or may not) have more than one Pod behind.
The default Service type for Kubernetes is called ClusterIP. It receives a virtual IP, which then redirects requests to one of the Pods served behind. On each node of the cluster, runs a kube-proxy which is responsible for implementing this virtual ip mechanic.
So, yes, load balancing happens again after a Service is selected.. if that service selects more than one Pod. Which backend (Pod) is choosen depends on how kube-proxy is configured and is usually either round robin or just random.
There is also a way to create a Service without a virtual IP. Such services, called headless Services, directly uses DNS to redirect requests to the different backends.. but they are not the default because it is better to use proxies than try to load balance with DNS.. which may have side effects (depending on who makes requests)
You can find a lot of info regarding how Services work in the docs.
Is my understanding of the following workflow correct:
When a request goes to the Load Balancer, it will also go through the Ingress Object (essentially a map of exactly how to process the incoming request).
This request is then forwarded to an Ingress Controller to fulfil (the request ultimately gets sent to the appropriate Pod/Service).
But what happens when there is only one Ingress Controller? It seems to me that the purpose of the load balancer will be defeated as all the requests will go to EKS Worker Node - 1?
And additionally, let's say that Pod A and Pod B in EKS Worker Node - 1 is occupied/down, will the Ingress controller forward that request to EKS Worker Node - 2?
Is my assumptions correct? And should you always have multiple ingress controllers across different nodes?
I'm confused as I don't understand how these two components work together? Which component is actually balancing the load?
To your main question, having multiple ingress controller replicas for redundancy in case of node failure is very common and probably a good thing to do on any production setup.
As to how it works: there's two modes for Load Balancer services depending on the "external traffic policy". In the default mode, the NLB doesn't talk directly to Nginx, it talk to the kube-proxy mesh on every node which then routes the Nginx pods as needed, even if they are on a different node. With traffic policy "Local", the NLB will bypass the kube-proxy mesh and only talks to nodes that have at least 1 Nginx pod, so in your example the NLB would not talk to node 2 at all.
The extra hop in the default mode improves resiliency and has smoother balancing but at a cost of hiding the client IP (moot here since the NLB does that anyway) and introducing a small bit of latency from the extra packet hops, usually only a few milliseconds.
I've been looking into Kubernetes networking, more specifically, how to serve HTTPS users the most efficient.
I was watching this talk: https://www.youtube.com/watch?v=0Omvgd7Hg1I and from 22:18 he explains what the problem is with a load balancer that is not pod aware. Now, how they solve this in kubernetes is by letting the nodes also act as a 'router' and letting the node pass the request on to another node. (explained at 22:46). This does not seem very efficient, but when looking around SoundCloud (https://developers.soundcloud.com/blog/how-soundcloud-uses-haproxy-with-kubernetes-for-user-facing-traffic) actually seems to do something similar to this but with NodePorts. They say that the overhead costs less than creating a better load balancer.
From what I have read an option might be using an ingress controller. Making sure that there is not more than one ingress controller per node, and routing the traffic to the specific nodes that have an ingress controller. That way there will not be any traffic re-routing needed. However, this does add another layer of routing.
This information is all from 2017, so my question is: is there any pod aware load balancer out there, or is there some other method that does not involve sending the http request and response over the network twice?
Thank you in advance,
Hendrik
EDIT:
A bit more information about my use case:
There is a bare-metal setup with kubernetes. The firewall load balances the incomming data between two HAProxy instances. These HAProxy instances do ssl termination and forward the traffic to a few sites. This includes an exchange setup, a few internal IIS sites and a nginx server for a static web app. The idea is to transform the app servers into kubernetes.
Now my main problem is how to get the requests from HAProxy into kubernetes. I see a few options:
Use the SoundCloud setup. The infrastructure could stay almost the same, the HAProxy server can still operate the way they do now.
I could use an ingress controller on EACH node in the kubernetes cluster and have the firewall load balance between the nodes. I believe it is possible to forward traffic from the ingress controller to server outside the cluster, e.g. exchange.
Some magic load balancer that I do not know about that is pod aware and able to operate outside of the kubernetes cluster.
Option 1 and 2 are relatively simple and quite close in how they work, but they do come with a performance penalty. This is the case when the node that the requests gets forwarded to by the firewall does not have the required pod running, or if another pod is doing less work. The request will get forwarded to another node, thus, using the network twice.
Is this just the price you pay when using Kubernetes, or is there something that I am missing?
How traffic heads to pods depend on whether a managed cluster is used.
Almost all cloud providers can forward traffic in a cloud-native way in their managed K8s clusters. First, you can a managed cluster with some special network settings (e.g. vpc-native cluster of GKE). Then, the only thing you need to do is to create a LoadBalancer typed Service to expose your workload. You can also create Ingresses for your L7 workloads, they are going to be handled by provided IngressControllers (e.g. ALB of AWS).
In an on-premise cluster without any cloud provider(OpenStack or vSphere), the only way to expose workloads is NodePort typed Service. It doesn't mean you can't improve it.
If your cluster is behind reverse proxies (the SoundCloud case), setting externalTrafficPolicy: Local to Services could break traffic forwarding among work nodes. When traffic received through NodePorts, they are forwarded to local Pods or dropped if Pods reside on other nodes. Reserve proxy will mark these NodePort as unhealthy in the backend health check and reject to forward traffic to them. Another choice is to use topology-aware service routing. In this case, local Pods have priorities and traffic is still forwarded between node when no local Pods matched.
For IngressController in on-prem clusters, it is a little different. You may have some work nodes that have EIP or public IP. To expose HTTP(S) services, an IngressController usually deployed on those work nodes through DaemeaSet and HostNetwork such that clients access the IngressController via the well-known ports and EIP of nodes. These work nodes regularly don't accept other workloads (e.g. infra node in OpenShift) and one more forward on the Pod network is needed. You can also deploy the IngressController on all work nodes as well as other workloads, so traffic could be forwarded to a closer Pod if the IngressController supports topology-aware service routing although it can now.
Hope it helps!
I'm trying to understand the concepts of ingress and ingress controllers in kubernetes. But I'm not so sure what the end product should look like. Here is what I don't fully understand:
Given I'm having a running Kubernetes cluster somewhere with a master node which runes the control plane and the etcd database. Besides that I'm having like 3 worker nodes - each of the worker nodes has a public IPv4 address with a corresponding DNS A record (worker{1,2,3}.domain.tld) and I've full control over my DNS server. I want that my users access my web application via www.domain.tld. So I point the the www CNAME to one of the worker nodes (I saw that my ingress controller i.e. got scheduled to worker1 one so I point it to worker1.domain.tld).
Now when I schedule a workload consisting of 2 frontend pods and 1 database pod with 1 service for the frontend and 1 service for the database. From what've understand right now, I need an ingress controller pointing to the frontend service to achieve some kind of load balancing. Two questions here:
Isn't running the ingress controller only on one worker node pointless to internally load balance two the two frontend pods via its service? Is it best practice to run an ingress controller on every worker node in the cluster?
For whatever reason the worker which runs the ingress controller dies and it gets rescheduled to another worker. So the ingress point will get be at another IPv4 address, right? From a user perspective which tries to access the frontend via www.domain.tld, this DNS entry has to be updated, right? How so? Do I need to run a specific kubernetes-aware DNS server somewhere? I don't understand the connection between the DNS server and the kubernetes cluster.
Bonus question: If I run more ingress controllers replicas (spread across multiple workers) do I do a DNS-round robin based approach here with multiple IPv4 addresses bound to one DNS entry? Or what's the best solution to achieve HA. I rather not want to use load balancing IP addresses where the worker share the same IP address.
Given I'm having a running Kubernetes cluster somewhere with a master
node which runes the control plane and the etcd database. Besides that
I'm having like 3 worker nodes - each of the worker nodes has a public
IPv4 address with a corresponding DNS A record
(worker{1,2,3}.domain.tld) and I've full control over my DNS server. I
want that my users access my web application via www.domain.tld. So I
point the the www CNAME to one of the worker nodes (I saw that my
ingress controller i.e. got scheduled to worker1 one so I point it to
worker1.domain.tld).
Now when I schedule a workload consisting of 2 frontend pods and 1
database pod with 1 service for the frontend and 1 service for the
database. From what've understand right now, I need an ingress
controller pointing to the frontend service to achieve some kind of
load balancing. Two questions here:
Isn't running the ingress controller only on one worker node pointless to internally load balance two the two frontend pods via its
service? Is it best practice to run an ingress controller on every
worker node in the cluster?
Yes, it's a good practice. Having multiple pods for the load balancer is important to ensure high availability. For example, if you run the ingress-nginx controller, you should probably deploy it to multiple nodes.
For whatever reason the worker which runs the ingress controller dies and it gets rescheduled to another worker. So the ingress point
will get be at another IPv4 address, right? From a user perspective
which tries to access the frontend via www.domain.tld, this DNS entry
has to be updated, right? How so? Do I need to run a specific
kubernetes-aware DNS server somewhere? I don't understand the
connection between the DNS server and the kubernetes cluster.
Yes, the IP will change. And yes, this needs to be updated in your DNS server.
There are a few ways to handle this:
assume clients will deal with outages. you can list all load balancer nodes in round-robin and assume clients will fallback. this works with some protocols, but mostly implies timeouts and problems and should generally not be used, especially since you still need to update the records by hand when k8s figures it will create/remove LB entries
configure an external DNS server automatically. this can be done with the external-dns project which can sync against most of the popular DNS servers, including standard RFC2136 dynamic updates but also cloud providers like Amazon, Google, Azure, etc.
Bonus question: If I run more ingress controllers replicas (spread
across multiple workers) do I do a DNS-round robin based approach here
with multiple IPv4 addresses bound to one DNS entry? Or what's the
best solution to achieve HA. I rather not want to use load balancing
IP addresses where the worker share the same IP address.
Yes, you should basically do DNS round-robin. I would assume external-dns would do the right thing here as well.
Another alternative is to do some sort of ECMP. This can be accomplished by having both load balancers "announce" the same IP space. That is an advanced configuration, however, which may not be necessary. There are interesting tradeoffs between BGP/ECMP and DNS updates, see this dropbox engineering post for a deeper discussion about those.
Finally, note that CoreDNS is looking at implementing public DNS records which could resolve this natively in Kubernetes, without external resources.
Isn't running the ingress controller only on one worker node pointless to internally load balance two the two frontend pods via its service? Is it best practice to run an ingress controller on every worker node in the cluster?
A quantity of replicas of the ingress will not affect the quality of load balancing. But for HA you can run more than 1 replica of the controller.
For whatever reason the worker which runs the ingress controller dies and it gets rescheduled to another worker. So the ingress point will get be at another IPv4 address, right? From a user perspective which tries to access the frontend via www.domain.tld, this DNS entry has to be updated, right? How so? Do I need to run a specific kubernetes-aware DNS server somewhere? I don't understand the connection between the DNS server and the kubernetes cluster.
Right, it will be on another IPv4. Yes, DNS should be updated for that. There are no standard tools for that included in Kubernetes. Yes, you need to run external DNS and somehow manage records on it manually (by some tools or scripts).
DNS server inside a Kubernetes cluster and your external DNS server are totally different things. DNS server inside the cluster provides resolving only inside the cluster for service discovery. Kubernetes does not know anything about access from external networks to the cluster, at least on bare-metal. In a cloud, it can manage some staff like load-balancers to automate external access management.
I run more ingress controllers replicas (spread across multiple workers) do I do a DNS-round robin based approach here with multiple IPv4 addresses bound to one DNS entry? Or what's the best solution to achieve HA.
DNS round-robin works in that case, but if one of the nodes is down, your clients will get a problem with connecting to that node, so you need to find some way to move/remove IP of that node.
The solutions for HA provided by #jjo is not the worst way to achieve what you want if you can prepare an environment for that. If not, you should choose something else, but the best practice is using a Load Balancer provided by an infrastructure. Will it be based on several dedicated servers, or load balancing IPs, or something else - it does not matter.
The behavior you describe is actually a LoadBalancer (a Service with type=LoadBalancer in Kubernetes), which is "naturally" provided when you're running Kubernetes on top of a cloud provider.
From your description, it looks like your cluster is on bare-metal (either true or virtual metal), a possible approach (that has worked for me) will be:
Deploy https://github.com/google/metallb
this is where your external IP will "live" (HA'd), via the speaker-xxx pods deployed as DaemonSet to each worker node
depending on your extn L2/L3 setup, you'll need to choose between L3 (BGP) or L2 (ARP) modes
fyi I've successfully used L2 mode + simple proxyarp at the border router
Deploy nginx-ingress controller, with its Service as type=LoadBalancer
this will make metallb to "land" (actually: L3 or L2 "advertise" ...) the assigned IP to the nodes
fyi I successfully tested it together with kube-router using --advertise-loadbalancer-ip as CNI, the effect will be that e.g. <LB_IP>:80 will be redirected to the ingress-nginx Service NodePort
Point your DNS to ingress-nginx LB IP, i.e. what's shown by:
kubectl get svc --namespace=ingress-nginx ingress-nginx -ojsonpath='{.status.loadBalancer.ingress[].ip}{"\n"}'
fyi you can also quickly test it using fake DNSing with http://A.B.C.D.xip.io/ (A.B.C.D being your public IP addr)
Here is a Kubernetes DNS add-ons Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services allowing to handle DNS record updates for ingress LoadBalancers. It allows to keep DNS record up to date according to Ingress controller config.
Made my way into kubernetes through GKE, currently trying out via kubeadm on bare metal.
In the later environment, there is no need of any specific load balancer; using nginx-ingress and ingresses let one serve service to the www.
Oppositely, on gke, using the same nginx-ingress, or using the gke provided l7, you always end up with a billed load balancer.
What's the reason about that, as it seemed not to be ultimately needed ?
(Reposting my comment above)
In general, when one is receiving traffic from the outside world, that traffic is being sent to one or more non-ACLd public IP addresses.
If you run k8s on bare metals, those BMs can have public IPs, and you can just run ingress on one or more of them.
A managed k8s environment, however, for security reasons, will not permit nodes to have public IPs.
Instead, managed load balancers are allowed to have public IPs. Those are configured to know the private node IPs hosting ingress for your cluster and will direct traffic accordingly.
Kubernetes services have few types, each building up on previous one : ClusterIP, NodePort and LoadBalancer. Only the last one will provision LoadBalancer in a cloud environment, so you can avoid it on GKE without fuzz. The question is, what then? Because, in best case you end up with an Ingress (I assume we expose ingress as in your question), that is available on volatile IPs (nodes can be rolled at any time and new ones will get new IPs) and high ports given by NodePort service. Meaning that not only you have no fixed IP to use, but also you would need to open something like http://:31978, which obviously is crap. Hence, in cloud, you have a simple solution of putting a cloud load balancer in front of it with LoadBalancer service type. This LB will ingest the traffic on port 80/443 and forward it to correct backing service/pods.