How to automatically update the Service `spec.externalIPs` when a Kubernetes worker is drained/down? - kubernetes

I'm hosting a Kubernetes cluster on VMs/VPS from a random cloud provider not providing any Kubernetes things at all, meaning with a dedicated public IP address and to allow the trafic coming to the worker nodes, I'm defining my Service with the spec.externalIPs with the fixed list of IP addresses.
I'm looking for a way to get that list updated when a node is drained/down automatically.
I had a look at the existing operators from https://operatorhub.io/ but I haven't found any that seem to cover my use case.
The idea would that when the event of a node passing to NotReady is emitted, the Service is updated with the Nodes being Ready.
Is there any operator that could allow doing that?

After some time working on this, I finally figured out that this is not possible, at least today, there's no known operator or what so ever that could update the field with the IP addresses.
And even if it was the case, there would be delays to update the DNS records.
What I've done instead is to buy another VPS, installing HAproxy in order to proxy the Kubernetes API trafic to the master nodes, and the web trafic (both 80 and 443) to the Kubernetes worker nodes.
HAproxy monitors the nodes, and add/remove nodes automagically and in a very quick way.
With this, you just need one DNS record, pointing to the Load Balancer (or VIP of the Load Balancers in order to avoid SPOF), and HAproxy will do the rest!

Related

Having 1 outgoing IP for kubernetes egress traffic

Current set-up
Cluster specs: Managed Kubernetes on Digital Ocean
Goal
My pods are accessing some websites but I want to use a proxy first.
Problem
The proxy I need to use is only taking 1 IP address in an "allow-list".
My cluster is using different nodes, with node-autoscaler so I have multiple and changing IP addresses.
Solutions I am thinking about
Setting-up a proxy (squid? nginx?) outside of the cluster (Currently not working when I access an HTTPS website)
Istio could let me set-up a gateway? (No knowledge of Istio)
Use GCP managed K8s, and follow the answers on Kubernetes cluster outgoing traffic IP. But all our stack is on Digital Ocean and the pricing is better there.
I am curious to know what is the best practice, easiest solution or if anyone experienced such use-case before :)
Best
You could set up all your traffic to go through istio-egressgateway.
Then you could manipulate the istio-egressgateway to always be deployed on the same node of the cluster, and whitelist that IP address.
Pros: super easy. BUT. If you are not using Istio already, to set up Istio just for this is may be killing a mosquito with a bazooka.
Cons: Need to make sure the node doesn't change the IP address. Otherwise the istio-egressgateway itself might not get deployed (if you do not have the labels added to the new node), and you will need to reconfigure everything for the new node (new IP address). Another con might be the fact that if the traffic goes up, there is an HPA, which will deploy more replicas of the gateway, and all of them will be deployed on the same node. So, if you are going to have lots of traffic, may be it would be a good idea to isolate one node, just for this purpose.
Another option would be as you are suggesting; a proxy. I would recommend an Envoy proxy directly. I mean, Istio is going to be using Envoy anyways right? So, just get the proxy directly, put it in a pod, do the same thing as I mentioned before; node affinity, so it will always run on the same node, so it will go out with the same IP.
Pros: You are not installing entire service mesh control plane for one tiny thing.
Cons: Same as before, as you still have the issue of the node IP change if something goes wrong, plus you will need to manage your own Deployment object, HPA, configure the Envoy proxy, etc. instead of using Istio objects (like Gateway and a VirtualService).
Finally, I see a third option; to set up a NAT gateway outside the cluster, and configure your traffic to go through it.
Pros: You won't have to configure any kubernetes object, therefor there will be no need to set up any node affinity, therefor there will be no node overwhelming or IP change. Plus you can remove the external IP addresses from your cluster, so it will be more secure (unless you have other workloads that need to reach internet directly). Also , probably having a single node configured as NAT will be more resilient then a kubernetes pod, running in a node.
Cons: May be a little bit more complicate to set up?
And there is this general Con, that you can whitelist only 1 IP address, so you will always have a single point of failure. Even NAT gateway; it still can fail.
The GCP static IP won't help you. What is suggesting the other post is to reserve an IP address, so you can re-use it always. But it's not that you will have that IP address automatically added to a random node that goes down. Human intervention is needed. I don't think you can have one specific node to have a static IP address, and if it goes down, the new created node will pick the same IP. That service, to my knowledge, doesn't exist.
Now, GCP does offer a very resilient NAT gateway. It is managed by Google, so shouldn't fail. Not cheap though.

Q: Efficient Kubernetes load balancing

I've been looking into Kubernetes networking, more specifically, how to serve HTTPS users the most efficient.
I was watching this talk: https://www.youtube.com/watch?v=0Omvgd7Hg1I and from 22:18 he explains what the problem is with a load balancer that is not pod aware. Now, how they solve this in kubernetes is by letting the nodes also act as a 'router' and letting the node pass the request on to another node. (explained at 22:46). This does not seem very efficient, but when looking around SoundCloud (https://developers.soundcloud.com/blog/how-soundcloud-uses-haproxy-with-kubernetes-for-user-facing-traffic) actually seems to do something similar to this but with NodePorts. They say that the overhead costs less than creating a better load balancer.
From what I have read an option might be using an ingress controller. Making sure that there is not more than one ingress controller per node, and routing the traffic to the specific nodes that have an ingress controller. That way there will not be any traffic re-routing needed. However, this does add another layer of routing.
This information is all from 2017, so my question is: is there any pod aware load balancer out there, or is there some other method that does not involve sending the http request and response over the network twice?
Thank you in advance,
Hendrik
EDIT:
A bit more information about my use case:
There is a bare-metal setup with kubernetes. The firewall load balances the incomming data between two HAProxy instances. These HAProxy instances do ssl termination and forward the traffic to a few sites. This includes an exchange setup, a few internal IIS sites and a nginx server for a static web app. The idea is to transform the app servers into kubernetes.
Now my main problem is how to get the requests from HAProxy into kubernetes. I see a few options:
Use the SoundCloud setup. The infrastructure could stay almost the same, the HAProxy server can still operate the way they do now.
I could use an ingress controller on EACH node in the kubernetes cluster and have the firewall load balance between the nodes. I believe it is possible to forward traffic from the ingress controller to server outside the cluster, e.g. exchange.
Some magic load balancer that I do not know about that is pod aware and able to operate outside of the kubernetes cluster.
Option 1 and 2 are relatively simple and quite close in how they work, but they do come with a performance penalty. This is the case when the node that the requests gets forwarded to by the firewall does not have the required pod running, or if another pod is doing less work. The request will get forwarded to another node, thus, using the network twice.
Is this just the price you pay when using Kubernetes, or is there something that I am missing?
How traffic heads to pods depend on whether a managed cluster is used.
Almost all cloud providers can forward traffic in a cloud-native way in their managed K8s clusters. First, you can a managed cluster with some special network settings (e.g. vpc-native cluster of GKE). Then, the only thing you need to do is to create a LoadBalancer typed Service to expose your workload. You can also create Ingresses for your L7 workloads, they are going to be handled by provided IngressControllers (e.g. ALB of AWS).
In an on-premise cluster without any cloud provider(OpenStack or vSphere), the only way to expose workloads is NodePort typed Service. It doesn't mean you can't improve it.
If your cluster is behind reverse proxies (the SoundCloud case), setting externalTrafficPolicy: Local to Services could break traffic forwarding among work nodes. When traffic received through NodePorts, they are forwarded to local Pods or dropped if Pods reside on other nodes. Reserve proxy will mark these NodePort as unhealthy in the backend health check and reject to forward traffic to them. Another choice is to use topology-aware service routing. In this case, local Pods have priorities and traffic is still forwarded between node when no local Pods matched.
For IngressController in on-prem clusters, it is a little different. You may have some work nodes that have EIP or public IP. To expose HTTP(S) services, an IngressController usually deployed on those work nodes through DaemeaSet and HostNetwork such that clients access the IngressController via the well-known ports and EIP of nodes. These work nodes regularly don't accept other workloads (e.g. infra node in OpenShift) and one more forward on the Pod network is needed. You can also deploy the IngressController on all work nodes as well as other workloads, so traffic could be forwarded to a closer Pod if the IngressController supports topology-aware service routing although it can now.
Hope it helps!

Unique external IP per kubernetes pod

I need to scale my application so that it won't get banned for passing request rate-limit of a site it uses frequently (which allow up to X requests per minute per IP).
I meant to use kubernetes and split the requests between multiple workers, but I saw that all the pods get the same external IP.
so what can I do?
I used kubernetes DaemonSet to attach pod to each node, and instead of scaling by changing deployment, I'm scaling by adding new nodes.
If you run in cloud you can create worker nodes with Public IP addresses. Then your pods will use node's public IP address. And then you can somehow distribute your pods across nodes using multiple replicas or DaemonSet.
do not worry a bout getting one external IP because if you have 3 worker and one master like below
worker1 192.168.1.10
worker2 192.168.1.11
worker3 192.168.1.12
master 192.168.1.13
and forexample if you expose nginx on 30000 port the kubernetes open this port in every nod and you can access it by
curl 192.168.1.10:30000
curl 192.168.1.11:30000
curl 192.168.1.12:30000
curl 192.168.1.13:30000
and if you want to every worker have one pod you can use DaemonSet or you can use label to the node that you want
This probably has less to do with your Kubernetes implementation and more to do with your network setup. It would depend on the source of the "exernal IP" you're referencing: is it given to you by your ISP? If you google "what is my ip", does it match the single IP you're talking about? If so, then you would need to negotiate with your ISP for additional external IPs.
Worth Noting that #JamesJJ is correct. Using additional IPs to 'trick' the API into allowing more connections is most likely a violation of that site's TOS and may result in your access getting terminated.

How to setup up DNS and ingress-controllers for a public facing web app?

I'm trying to understand the concepts of ingress and ingress controllers in kubernetes. But I'm not so sure what the end product should look like. Here is what I don't fully understand:
Given I'm having a running Kubernetes cluster somewhere with a master node which runes the control plane and the etcd database. Besides that I'm having like 3 worker nodes - each of the worker nodes has a public IPv4 address with a corresponding DNS A record (worker{1,2,3}.domain.tld) and I've full control over my DNS server. I want that my users access my web application via www.domain.tld. So I point the the www CNAME to one of the worker nodes (I saw that my ingress controller i.e. got scheduled to worker1 one so I point it to worker1.domain.tld).
Now when I schedule a workload consisting of 2 frontend pods and 1 database pod with 1 service for the frontend and 1 service for the database. From what've understand right now, I need an ingress controller pointing to the frontend service to achieve some kind of load balancing. Two questions here:
Isn't running the ingress controller only on one worker node pointless to internally load balance two the two frontend pods via its service? Is it best practice to run an ingress controller on every worker node in the cluster?
For whatever reason the worker which runs the ingress controller dies and it gets rescheduled to another worker. So the ingress point will get be at another IPv4 address, right? From a user perspective which tries to access the frontend via www.domain.tld, this DNS entry has to be updated, right? How so? Do I need to run a specific kubernetes-aware DNS server somewhere? I don't understand the connection between the DNS server and the kubernetes cluster.
Bonus question: If I run more ingress controllers replicas (spread across multiple workers) do I do a DNS-round robin based approach here with multiple IPv4 addresses bound to one DNS entry? Or what's the best solution to achieve HA. I rather not want to use load balancing IP addresses where the worker share the same IP address.
Given I'm having a running Kubernetes cluster somewhere with a master
node which runes the control plane and the etcd database. Besides that
I'm having like 3 worker nodes - each of the worker nodes has a public
IPv4 address with a corresponding DNS A record
(worker{1,2,3}.domain.tld) and I've full control over my DNS server. I
want that my users access my web application via www.domain.tld. So I
point the the www CNAME to one of the worker nodes (I saw that my
ingress controller i.e. got scheduled to worker1 one so I point it to
worker1.domain.tld).
Now when I schedule a workload consisting of 2 frontend pods and 1
database pod with 1 service for the frontend and 1 service for the
database. From what've understand right now, I need an ingress
controller pointing to the frontend service to achieve some kind of
load balancing. Two questions here:
Isn't running the ingress controller only on one worker node pointless to internally load balance two the two frontend pods via its
service? Is it best practice to run an ingress controller on every
worker node in the cluster?
Yes, it's a good practice. Having multiple pods for the load balancer is important to ensure high availability. For example, if you run the ingress-nginx controller, you should probably deploy it to multiple nodes.
For whatever reason the worker which runs the ingress controller dies and it gets rescheduled to another worker. So the ingress point
will get be at another IPv4 address, right? From a user perspective
which tries to access the frontend via www.domain.tld, this DNS entry
has to be updated, right? How so? Do I need to run a specific
kubernetes-aware DNS server somewhere? I don't understand the
connection between the DNS server and the kubernetes cluster.
Yes, the IP will change. And yes, this needs to be updated in your DNS server.
There are a few ways to handle this:
assume clients will deal with outages. you can list all load balancer nodes in round-robin and assume clients will fallback. this works with some protocols, but mostly implies timeouts and problems and should generally not be used, especially since you still need to update the records by hand when k8s figures it will create/remove LB entries
configure an external DNS server automatically. this can be done with the external-dns project which can sync against most of the popular DNS servers, including standard RFC2136 dynamic updates but also cloud providers like Amazon, Google, Azure, etc.
Bonus question: If I run more ingress controllers replicas (spread
across multiple workers) do I do a DNS-round robin based approach here
with multiple IPv4 addresses bound to one DNS entry? Or what's the
best solution to achieve HA. I rather not want to use load balancing
IP addresses where the worker share the same IP address.
Yes, you should basically do DNS round-robin. I would assume external-dns would do the right thing here as well.
Another alternative is to do some sort of ECMP. This can be accomplished by having both load balancers "announce" the same IP space. That is an advanced configuration, however, which may not be necessary. There are interesting tradeoffs between BGP/ECMP and DNS updates, see this dropbox engineering post for a deeper discussion about those.
Finally, note that CoreDNS is looking at implementing public DNS records which could resolve this natively in Kubernetes, without external resources.
Isn't running the ingress controller only on one worker node pointless to internally load balance two the two frontend pods via its service? Is it best practice to run an ingress controller on every worker node in the cluster?
A quantity of replicas of the ingress will not affect the quality of load balancing. But for HA you can run more than 1 replica of the controller.
For whatever reason the worker which runs the ingress controller dies and it gets rescheduled to another worker. So the ingress point will get be at another IPv4 address, right? From a user perspective which tries to access the frontend via www.domain.tld, this DNS entry has to be updated, right? How so? Do I need to run a specific kubernetes-aware DNS server somewhere? I don't understand the connection between the DNS server and the kubernetes cluster.
Right, it will be on another IPv4. Yes, DNS should be updated for that. There are no standard tools for that included in Kubernetes. Yes, you need to run external DNS and somehow manage records on it manually (by some tools or scripts).
DNS server inside a Kubernetes cluster and your external DNS server are totally different things. DNS server inside the cluster provides resolving only inside the cluster for service discovery. Kubernetes does not know anything about access from external networks to the cluster, at least on bare-metal. In a cloud, it can manage some staff like load-balancers to automate external access management.
I run more ingress controllers replicas (spread across multiple workers) do I do a DNS-round robin based approach here with multiple IPv4 addresses bound to one DNS entry? Or what's the best solution to achieve HA.
DNS round-robin works in that case, but if one of the nodes is down, your clients will get a problem with connecting to that node, so you need to find some way to move/remove IP of that node.
The solutions for HA provided by #jjo is not the worst way to achieve what you want if you can prepare an environment for that. If not, you should choose something else, but the best practice is using a Load Balancer provided by an infrastructure. Will it be based on several dedicated servers, or load balancing IPs, or something else - it does not matter.
The behavior you describe is actually a LoadBalancer (a Service with type=LoadBalancer in Kubernetes), which is "naturally" provided when you're running Kubernetes on top of a cloud provider.
From your description, it looks like your cluster is on bare-metal (either true or virtual metal), a possible approach (that has worked for me) will be:
Deploy https://github.com/google/metallb
this is where your external IP will "live" (HA'd), via the speaker-xxx pods deployed as DaemonSet to each worker node
depending on your extn L2/L3 setup, you'll need to choose between L3 (BGP) or L2 (ARP) modes
fyi I've successfully used L2 mode + simple proxyarp at the border router
Deploy nginx-ingress controller, with its Service as type=LoadBalancer
this will make metallb to "land" (actually: L3 or L2 "advertise" ...) the assigned IP to the nodes
fyi I successfully tested it together with kube-router using --advertise-loadbalancer-ip as CNI, the effect will be that e.g. <LB_IP>:80 will be redirected to the ingress-nginx Service NodePort
Point your DNS to ingress-nginx LB IP, i.e. what's shown by:
kubectl get svc --namespace=ingress-nginx ingress-nginx -ojsonpath='{.status.loadBalancer.ingress[].ip}{"\n"}'
fyi you can also quickly test it using fake DNSing with http://A.B.C.D.xip.io/ (A.B.C.D being your public IP addr)
Here is a Kubernetes DNS add-ons Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services allowing to handle DNS record updates for ingress LoadBalancers. It allows to keep DNS record up to date according to Ingress controller config.

kube2sky in kubernetes with multiple api servers

It a Kubernetes cluster where everything is highly available, the DNS is a key piece of the system, everything relies on the DNS.
The pod kube2sky has a parameter "-kube_master_url" where, afaik, you can only specify one api server node.
You might have multiple api servers for redundancy behing a service, but if the one that kube2sky is using gets down, the whole DNS system gets down too, hence, the highly availabily of the cluster is gone.
For other pods, you can use the internal DNS name of the api server service, but in this case, you can't since this is the actual DNS service.
Any idea how to solve this issue?
In its standard configuration, kube2sky doesn't actually rely on having a single apiserver IP address to use. Instead, it uses the virtual IP of the kubernetes service that gets auto-created in every cluster, and which the kube-proxy sets up iptables rules for. It's briefly explained in the docs on github.
Also, it's recommended that replicated masters are put behind a load balancer in such high-availability configurations to avoid problems like this with client tools.