Having 1 outgoing IP for kubernetes egress traffic - kubernetes

Current set-up
Cluster specs: Managed Kubernetes on Digital Ocean
Goal
My pods are accessing some websites but I want to use a proxy first.
Problem
The proxy I need to use is only taking 1 IP address in an "allow-list".
My cluster is using different nodes, with node-autoscaler so I have multiple and changing IP addresses.
Solutions I am thinking about
Setting-up a proxy (squid? nginx?) outside of the cluster (Currently not working when I access an HTTPS website)
Istio could let me set-up a gateway? (No knowledge of Istio)
Use GCP managed K8s, and follow the answers on Kubernetes cluster outgoing traffic IP. But all our stack is on Digital Ocean and the pricing is better there.
I am curious to know what is the best practice, easiest solution or if anyone experienced such use-case before :)
Best

You could set up all your traffic to go through istio-egressgateway.
Then you could manipulate the istio-egressgateway to always be deployed on the same node of the cluster, and whitelist that IP address.
Pros: super easy. BUT. If you are not using Istio already, to set up Istio just for this is may be killing a mosquito with a bazooka.
Cons: Need to make sure the node doesn't change the IP address. Otherwise the istio-egressgateway itself might not get deployed (if you do not have the labels added to the new node), and you will need to reconfigure everything for the new node (new IP address). Another con might be the fact that if the traffic goes up, there is an HPA, which will deploy more replicas of the gateway, and all of them will be deployed on the same node. So, if you are going to have lots of traffic, may be it would be a good idea to isolate one node, just for this purpose.
Another option would be as you are suggesting; a proxy. I would recommend an Envoy proxy directly. I mean, Istio is going to be using Envoy anyways right? So, just get the proxy directly, put it in a pod, do the same thing as I mentioned before; node affinity, so it will always run on the same node, so it will go out with the same IP.
Pros: You are not installing entire service mesh control plane for one tiny thing.
Cons: Same as before, as you still have the issue of the node IP change if something goes wrong, plus you will need to manage your own Deployment object, HPA, configure the Envoy proxy, etc. instead of using Istio objects (like Gateway and a VirtualService).
Finally, I see a third option; to set up a NAT gateway outside the cluster, and configure your traffic to go through it.
Pros: You won't have to configure any kubernetes object, therefor there will be no need to set up any node affinity, therefor there will be no node overwhelming or IP change. Plus you can remove the external IP addresses from your cluster, so it will be more secure (unless you have other workloads that need to reach internet directly). Also , probably having a single node configured as NAT will be more resilient then a kubernetes pod, running in a node.
Cons: May be a little bit more complicate to set up?
And there is this general Con, that you can whitelist only 1 IP address, so you will always have a single point of failure. Even NAT gateway; it still can fail.
The GCP static IP won't help you. What is suggesting the other post is to reserve an IP address, so you can re-use it always. But it's not that you will have that IP address automatically added to a random node that goes down. Human intervention is needed. I don't think you can have one specific node to have a static IP address, and if it goes down, the new created node will pick the same IP. That service, to my knowledge, doesn't exist.
Now, GCP does offer a very resilient NAT gateway. It is managed by Google, so shouldn't fail. Not cheap though.

Related

How to automatically update the Service `spec.externalIPs` when a Kubernetes worker is drained/down?

I'm hosting a Kubernetes cluster on VMs/VPS from a random cloud provider not providing any Kubernetes things at all, meaning with a dedicated public IP address and to allow the trafic coming to the worker nodes, I'm defining my Service with the spec.externalIPs with the fixed list of IP addresses.
I'm looking for a way to get that list updated when a node is drained/down automatically.
I had a look at the existing operators from https://operatorhub.io/ but I haven't found any that seem to cover my use case.
The idea would that when the event of a node passing to NotReady is emitted, the Service is updated with the Nodes being Ready.
Is there any operator that could allow doing that?
After some time working on this, I finally figured out that this is not possible, at least today, there's no known operator or what so ever that could update the field with the IP addresses.
And even if it was the case, there would be delays to update the DNS records.
What I've done instead is to buy another VPS, installing HAproxy in order to proxy the Kubernetes API trafic to the master nodes, and the web trafic (both 80 and 443) to the Kubernetes worker nodes.
HAproxy monitors the nodes, and add/remove nodes automagically and in a very quick way.
With this, you just need one DNS record, pointing to the Load Balancer (or VIP of the Load Balancers in order to avoid SPOF), and HAproxy will do the rest!

Q: Efficient Kubernetes load balancing

I've been looking into Kubernetes networking, more specifically, how to serve HTTPS users the most efficient.
I was watching this talk: https://www.youtube.com/watch?v=0Omvgd7Hg1I and from 22:18 he explains what the problem is with a load balancer that is not pod aware. Now, how they solve this in kubernetes is by letting the nodes also act as a 'router' and letting the node pass the request on to another node. (explained at 22:46). This does not seem very efficient, but when looking around SoundCloud (https://developers.soundcloud.com/blog/how-soundcloud-uses-haproxy-with-kubernetes-for-user-facing-traffic) actually seems to do something similar to this but with NodePorts. They say that the overhead costs less than creating a better load balancer.
From what I have read an option might be using an ingress controller. Making sure that there is not more than one ingress controller per node, and routing the traffic to the specific nodes that have an ingress controller. That way there will not be any traffic re-routing needed. However, this does add another layer of routing.
This information is all from 2017, so my question is: is there any pod aware load balancer out there, or is there some other method that does not involve sending the http request and response over the network twice?
Thank you in advance,
Hendrik
EDIT:
A bit more information about my use case:
There is a bare-metal setup with kubernetes. The firewall load balances the incomming data between two HAProxy instances. These HAProxy instances do ssl termination and forward the traffic to a few sites. This includes an exchange setup, a few internal IIS sites and a nginx server for a static web app. The idea is to transform the app servers into kubernetes.
Now my main problem is how to get the requests from HAProxy into kubernetes. I see a few options:
Use the SoundCloud setup. The infrastructure could stay almost the same, the HAProxy server can still operate the way they do now.
I could use an ingress controller on EACH node in the kubernetes cluster and have the firewall load balance between the nodes. I believe it is possible to forward traffic from the ingress controller to server outside the cluster, e.g. exchange.
Some magic load balancer that I do not know about that is pod aware and able to operate outside of the kubernetes cluster.
Option 1 and 2 are relatively simple and quite close in how they work, but they do come with a performance penalty. This is the case when the node that the requests gets forwarded to by the firewall does not have the required pod running, or if another pod is doing less work. The request will get forwarded to another node, thus, using the network twice.
Is this just the price you pay when using Kubernetes, or is there something that I am missing?
How traffic heads to pods depend on whether a managed cluster is used.
Almost all cloud providers can forward traffic in a cloud-native way in their managed K8s clusters. First, you can a managed cluster with some special network settings (e.g. vpc-native cluster of GKE). Then, the only thing you need to do is to create a LoadBalancer typed Service to expose your workload. You can also create Ingresses for your L7 workloads, they are going to be handled by provided IngressControllers (e.g. ALB of AWS).
In an on-premise cluster without any cloud provider(OpenStack or vSphere), the only way to expose workloads is NodePort typed Service. It doesn't mean you can't improve it.
If your cluster is behind reverse proxies (the SoundCloud case), setting externalTrafficPolicy: Local to Services could break traffic forwarding among work nodes. When traffic received through NodePorts, they are forwarded to local Pods or dropped if Pods reside on other nodes. Reserve proxy will mark these NodePort as unhealthy in the backend health check and reject to forward traffic to them. Another choice is to use topology-aware service routing. In this case, local Pods have priorities and traffic is still forwarded between node when no local Pods matched.
For IngressController in on-prem clusters, it is a little different. You may have some work nodes that have EIP or public IP. To expose HTTP(S) services, an IngressController usually deployed on those work nodes through DaemeaSet and HostNetwork such that clients access the IngressController via the well-known ports and EIP of nodes. These work nodes regularly don't accept other workloads (e.g. infra node in OpenShift) and one more forward on the Pod network is needed. You can also deploy the IngressController on all work nodes as well as other workloads, so traffic could be forwarded to a closer Pod if the IngressController supports topology-aware service routing although it can now.
Hope it helps!

How to assign a single static source IP address for all pods of a service or deployment in kubernetes?

Consider a microservice X which is containerized and deployed in a kubernetes cluster. X communicates with a Payment Gateway PG. However, the payment gateway requires a static IP for services contacting it as it maintains a whitelist of IP addresses which are authorized to access the payment gateway. One way for X to contact PG is through a third party proxy server like QuotaGuard which will provide a static IP address to service X which can be whitelisted by the Payment Gateway.
However, is there an inbuilt mechanism in kubernetes which can enable a service deployed in a kube-cluster to obtain a static IP address?
there's no mechanism in Kubernetes for this yet.
other possible solutions:
if nodes of the cluster are in a private network behind a NAT then just add your network's default gateway to the PG's whitelist.
if whitelist can accept a cidr apart from single IPs (like 86.34.0.0/24 for example) then add your cluster's network cidr to the whitelist
If every node of the cluster has a public IP and you can't add a cidr to the whitelist then it gets more complicated:
a naive way would be to add ever node's IP to the whitelist, but it doesn't scale above tiny clusters few just few nodes.
if you have access to administrating your network, then even though nodes have pubic IPs, you can setup a NAT for the network anyway that targets only packets with PG's IP as a destination.
if you don't have administrative access to the network, then another way is to allocate a machine with a static IP somewhere and make it act as a proxy using iptables NAT similarly like above again. This introduces a single point of failure though. In order to make it highly available, you could deploy it on a kubernetes cluster again with few (2-3) replicas (this can be the same cluster where X is running: see below). The replicas instead of using their node's IP to communicate with PG would share a VIP using keepalived that would be added to PG's whitelist. (you can have a look at easy-keepalived and either try to use it directly or learn from it how it does things). This requires high privileges on the cluster: you need be able to grant to pods of your proxy NET_ADMIN and NET_RAW capabilities in order for them to be able to add iptables rules and setup a VIP.
update:
While waiting for builds and deployments during last few days, I've polished my old VIP-iptables scripts that I used to use as a replacement for external load-balancers on bare-metal clusters, so now they can be used as well to provide egress VIP as described in the last point of my original answer. You can give them a try: https://github.com/morgwai/kevip
There are two answers to this question: for the pod IP itself, it depends on your CNI plugin. Some allow it with special pod annotations. However most CNI plugins also involve a NAT when talking to the internet so the pod IP being static on the internal network is kind of moot, what you care about is the public IP the connection ends up coming from. So the second answer is "it depends on how your node networking and NAT is set up". This is usually up to the tool you used to deploy Kubernetes (or OpenShift in your case I guess). With Kops it's pretty easy to tweak the VPC routing table.

custom outgoing network path for kubernetes pod

say I have 2 sites, S1,S2; each with at least a kubernetes worker. The two sites are geographically apart and have different public IPs on the nodes/workers.
Does kubernetes offer any existing mechanisms to route outgoing internet traffic from a pod/container in S1, via S2 ?
The goal is to be able to use the public IP(s) in S2 for pods in S1.
If k8s-federation is a requisite for a solution; then that is fine.
Kubernetes doesn't have any input on this, it's up to your network design and structure, just like it would be with traditional VMs or whatnot. That said, this sounds like a very bad network design given what you described so I would be surprised if it was easy to set up. Calico runs normal BGP under the hood so you can probably set up two ASes and force one to route via the other.
With calico you can introduce a non default IPPool which does not do NAT for outgoing traffic. Then you can annotate the pods and/or namespaces you want to use that IPPool.
At that point it leaves you with non-working outgoing traffic, since your cluster IPs are "leaking" to the outside world. But a return path is not known by any upstream routers between your cluster and the internet.
You have to let your router, which should be between your cluster and the internet, know about the cluster ranges. You can use the global BGPPeer concept from calico. Once you've done that, als set up BGP on your router. (Use [1]) for more info.
From there you should have all the flexibility on your router to route it differently based on the non default IPPool's subnet and/or first tunnel it, e.g., to the questioner's 'S2'
Note that unless you are truly using public IP space, i.e., non RFC-1918 IPs (plus some others), you should introduce NATing somewhere yourself now, you can choose to do so in S1 or S2, if you opt for the latter, than that site also needs to know about the return path back to your router.
This is not really a cloud-native solution since you're just moving the problem from kubernetes to "old-school" domain of policy based routing on fixed subnets -- which is not really what the questioner asked for since he implied that there is also a kubernetes process in 'S2'. In the possible solution above, a k8s process is not needed in S2.
This is what #coderanger's custom outgoing network path for kubernetes pod was suggesting

Why is it bad to use Pod IPs to communicate within a Kubernetes cluster?

So I'm setting up a NATS cluster at work in OpenShift. I can easily get things to work by having each NATS server instance broadcast its Pod IP to the cluster. The guy I talked to at work strongly advised against using the Pod IP and suggested using the Pod name. In the email, he said something about if a pod restarted. But like I tried deleting the pod and the new Pod IP was in the list of connect urls for NATS and it worked fine. I know Kubernetes has DNS and you can use the headless service but it seems somewhat flaky to me. The Pod IP works.
I believe "the guy at work" has a point, to a certain extent, but it's hard to tell to which extent it's cargo-culting and what is half knowledge. The point being: the pod IPs are not stable, that is, every time a pod gets re-launched (on the same node or somewhere else, doesn't matter) it will get a new IP from the pod CIDR-range assigned.
Now, services provide stability by introducing a virtual IP (VIP): this acts as a cluster-internal mini-load balancer sitting in front of pods and yes, the recommended way to talk to pods, in the general case, is via services. Otherwise, you'd need to keep track of the pod IPs out-of-band, no bueno.
Bottom-line: if NATS manages that for you, keeps track and maps pod IPs then fine, use it, no harm done.
While the answer from Michael is mostly true, it is important to understand there is no 100% guarantee that a service IP (aka ClusterIP) service will not change it's IP. There is a specific case of service recreation (delete/create) that will cause service IP change.
That said, the situation is somewhat different for services that have their own means of autodiscovery and/or clustering. Usually it will not be fine or enough to have a single regular service. They need to connect to seed, or discover all nodes etc. One of the means that you might use here are headless services, which return, under given name a full list of all, direct pod IPs.
Mind that using headles service has its tiny quirks as well, ie. not all software re-resolves DNS over time after initial startup, so you might end up with cached endpoints that become obsolete over time.
You might also want to leverage StatefulSets capability to retain a deterministic name (aka network identity) for each pod (ie. mypod-1, mypod-2 etc.) which, combined with headless Service, will give you static per pod names to use.
I do think that using only pod IPs will probably lead to some issues at one edge case or another, so you should at least use one of the above solutions for cluster discovery/registration. For actual communication during and after the pod was registered in the cluster, use of pod IPs can actually be for the best.