Proxy outgoing traffic of Kubernetes cluster through a static IP - kubernetes

I am trying to build a service that needs to be connected to a socket over the internet without downtime. The service will be reading and publishing info to a message queue, messages should be published only once and in the order received.
For this reason I thought of deploying it into Kubernetes where I can automatically have multiple replicas in case one process fails, i.e. just one process (pod) should be running all time, not multiple pods publishing the same messages to the queue.
These requests need to be routed through a proxy with a static IP, otherwise I cannot connect to the socket. I understand this may not be a standard use case as a reverse proxy as it is normally use with load balancers such as Nginx.
How is it possible to build this kind of forward proxy in Kubernetes?
I will be deploying this on Google Container Engine.

Assuming you're happy to use Terraform, you can use this:
https://github.com/GoogleCloudPlatform/terraform-google-nat-gateway
However, there's one caveat and that is it may inbound traffic to other clusters in that same region/zone.

Is the LoadBalancer that you need?
kubernetes create external loadbalancer,you can see this doc.

Related

Q: Efficient Kubernetes load balancing

I've been looking into Kubernetes networking, more specifically, how to serve HTTPS users the most efficient.
I was watching this talk: https://www.youtube.com/watch?v=0Omvgd7Hg1I and from 22:18 he explains what the problem is with a load balancer that is not pod aware. Now, how they solve this in kubernetes is by letting the nodes also act as a 'router' and letting the node pass the request on to another node. (explained at 22:46). This does not seem very efficient, but when looking around SoundCloud (https://developers.soundcloud.com/blog/how-soundcloud-uses-haproxy-with-kubernetes-for-user-facing-traffic) actually seems to do something similar to this but with NodePorts. They say that the overhead costs less than creating a better load balancer.
From what I have read an option might be using an ingress controller. Making sure that there is not more than one ingress controller per node, and routing the traffic to the specific nodes that have an ingress controller. That way there will not be any traffic re-routing needed. However, this does add another layer of routing.
This information is all from 2017, so my question is: is there any pod aware load balancer out there, or is there some other method that does not involve sending the http request and response over the network twice?
Thank you in advance,
Hendrik
EDIT:
A bit more information about my use case:
There is a bare-metal setup with kubernetes. The firewall load balances the incomming data between two HAProxy instances. These HAProxy instances do ssl termination and forward the traffic to a few sites. This includes an exchange setup, a few internal IIS sites and a nginx server for a static web app. The idea is to transform the app servers into kubernetes.
Now my main problem is how to get the requests from HAProxy into kubernetes. I see a few options:
Use the SoundCloud setup. The infrastructure could stay almost the same, the HAProxy server can still operate the way they do now.
I could use an ingress controller on EACH node in the kubernetes cluster and have the firewall load balance between the nodes. I believe it is possible to forward traffic from the ingress controller to server outside the cluster, e.g. exchange.
Some magic load balancer that I do not know about that is pod aware and able to operate outside of the kubernetes cluster.
Option 1 and 2 are relatively simple and quite close in how they work, but they do come with a performance penalty. This is the case when the node that the requests gets forwarded to by the firewall does not have the required pod running, or if another pod is doing less work. The request will get forwarded to another node, thus, using the network twice.
Is this just the price you pay when using Kubernetes, or is there something that I am missing?
How traffic heads to pods depend on whether a managed cluster is used.
Almost all cloud providers can forward traffic in a cloud-native way in their managed K8s clusters. First, you can a managed cluster with some special network settings (e.g. vpc-native cluster of GKE). Then, the only thing you need to do is to create a LoadBalancer typed Service to expose your workload. You can also create Ingresses for your L7 workloads, they are going to be handled by provided IngressControllers (e.g. ALB of AWS).
In an on-premise cluster without any cloud provider(OpenStack or vSphere), the only way to expose workloads is NodePort typed Service. It doesn't mean you can't improve it.
If your cluster is behind reverse proxies (the SoundCloud case), setting externalTrafficPolicy: Local to Services could break traffic forwarding among work nodes. When traffic received through NodePorts, they are forwarded to local Pods or dropped if Pods reside on other nodes. Reserve proxy will mark these NodePort as unhealthy in the backend health check and reject to forward traffic to them. Another choice is to use topology-aware service routing. In this case, local Pods have priorities and traffic is still forwarded between node when no local Pods matched.
For IngressController in on-prem clusters, it is a little different. You may have some work nodes that have EIP or public IP. To expose HTTP(S) services, an IngressController usually deployed on those work nodes through DaemeaSet and HostNetwork such that clients access the IngressController via the well-known ports and EIP of nodes. These work nodes regularly don't accept other workloads (e.g. infra node in OpenShift) and one more forward on the Pod network is needed. You can also deploy the IngressController on all work nodes as well as other workloads, so traffic could be forwarded to a closer Pod if the IngressController supports topology-aware service routing although it can now.
Hope it helps!

Within a Kubernetes cluster catch outgoing requests from a Pod and redirect to a different target

I have a cluster with 3 nodes. In each node i have a frontend application running in a Pod and backend application running in a separate Pod.
I send data from the frontend application to the backend application, to do this i utilise the Cluster IP Service and k8 dns resource.
I also have a function in my frontend where i send data to a separate service unrelated to my k8s cluster. I send this data using a standard AJAX request to a url with a payload i.e http://my-seperate-service-unrelated-tok8.com.
All of this works correctly and the cluster operates as i want. - i have this cluster deployed to GKE. 

I now want to run this cluster local using minikube, which i have been able to do, however, when i am running locally i do not want to send data to my external service - instead i want to forward it to either a new Pod i will create or just not send it.


The problem here is i need a proxy to intercept outgoing network traffic, check if the outgoing request is the request i am looking for and if it is then redirect it.
I understand each node running in a cluster has a kube-proxy service running within the node - which is used to forward traffic to the relevant services in the cluster. 

I would like to either extend this service, or create a new proxy service where i can listen for outgoing traffic to a specific url and redirect it. 

Is this possible to do in a k8 cluster? I assume there is a Service i can create to listen for all outgoing requests and redirect specific requests based on rules i set. 

I wasn’t sure if k8 clusters have a Service already configured i can simply add to - that’s why i thought of the kube-proxy, would anyone be able to advice on this?

I wanted to add this proxy so i don’t have to change my code when its ran locally in minikube or deployed to GKE.


Any help is greatly appreciated. Thanks!
I did a tool that help you to forward a service to another service,local port, service from other cluster, etc...
This way you can have exactly your same urls, ports and code... but the underlying services gets "replaced", if I understand correctly this is what you are looking for.
Here is a quick example of an stage service being replaced with my local 3000 port
This is the repository with more info and examples: linker-tool
If you are interested let me know if you need help or have any question.

Synchronize HTTP requests between several service instances in Kubernetes

We have a service with multiple replicas which works with storage without transactions and blocking approaches. So we need somehow to synchronize concurrent requests between multiple instances by some "sharding" key. Right now we host this service in Kubernetes environment as a ReplicaSet.
Don't you know any simple out-of-the-box approaches on how to do this to not implement it from scratch?
Here are several of our ideas on how to do this:
Deploy the service as a StatefulSet and implement some proxy API which will route traffic to the specific pod in this StatefulSet by sharding key from the HTTP request. In this scenario, all requests which should be synchronized will be handled by one instance and it wouldn't be a problem to handle this case.
Deploy the service as a StatefulSet and implement some custom logic in the same service to re-route traffic to the specific instance (or process on this exact instance). As I understand it's not possible to have abstract implementation and it would work only in Kubernetes environment.
Somehow expose each pod IP outside the cluster and implement routing logic on the client-side.
Just implement synchronization between instances through some third-party service like Redis.
I would like to try to route traffic to the specific pods. If you know standard approaches how to handle this case I'll be much appreciated.
Thank you a lot in advance!
Another approach would be to put a messaging queue (like Kafka and RabbitMq) in front of your service.
Then your pods will subscribe to the MQ topic/stream. The pod will decide if it should process the message or not.
Also, try looking into service meshes like Istio or Linkerd.
They might have an OOTB solution for your use-case, although I wasn't able to find one.
Remember that Network Policy is not traffic routing !
Pods are intended to be stateless and indistinguishable from one another, pod-networking.
I recommend to Istio. It has special component which is responsible or routing- Envoy. It is a high-performance proxy developed in C++ to mediate all inbound and outbound traffic for all services in the service mesh.
Useful article: istio-envoy-proxy.
Istio documentation: istio-documentation.
Useful Istio explaination https://www.youtube.com/watch?v=e2kowI0fAz0.
But you should be able to create a Deployment per customer group, and a Service per Deployment. The Ingress nginx should be able to be told to map incoming requests by whatever attributes are relevant to specific customer group Services.
Other solution is to use kube-router.
Kube-router can be run as an agent or a Pod (via DaemonSet) on each node and leverages standard Linux technologies iptables, ipvs/lvs, ipset, iproute2.
Kube-router uses IPVS/LVS technology built in Linux to provide L4 load balancing. Each ClusterIP, NodePort, and LoadBalancer Kubernetes Service type is configured as an IPVS virtual service. Each Service Endpoint is configured as real server to the virtual service. The standard ipvsadm tool can be used to verify the configuration and monitor the active connections.
How it works: service-proxy.

How can I do port discovery with Kubernetes service discovery?

I have an HPC cluster application where I am looking to replace MPI and our internal cluster management software with a combination of Kubernetes and some middleware, most likely ZMQ or RabbitMQ.
I'm trying to design how best to do peer discovery on this system using Kubernetes' service discovery.
I know Kubernetes can provide a DNS name for a given service, and that's great, but is there a way to also dynamically discover ports?
For example, assuming I replaced the MPI middleware with ZeroMQ, I would need a way for ranks (processes on the cluster) to find each other. I know I could simply have the ranks issue service creation messages to the Kubernetes discovery mechanism and get a hostname like myapp_mypid_rank_42 fairly easily, but how would I handle the port?
If possible, it would be great if I could just do:
zmqSocket.connect("tcp://myapp_mypid_rank_42");
but I don't think that would work since I have no port number information from DNS.
How can I have Kubernetes service discovery also provide a port in as simple a manner as possible to allow ranks in the cluster to discover each other?
Note: The registering process knows its port and can register it with the K8s service discovery daemon. The problem is a quick and easy way to get that port number back for the processes that want it. The question I'm asking is whether or not there is a mechanism as simple as a DNS host name, or will I need to explicitly query both hostname and port number from the k8s daemon rather than simply building a hostname based on some agreed upon rule (like building a string from myapp_mypid_myrank)?
Turns out the best way to do this is with a DNS SRV record:
https://kubernetes.io/docs/concepts/services-networking/service/#discovering-services
https://en.wikipedia.org/wiki/SRV_record
A DNS SRV record provides both a hostname/IP and a port for a given request.
Luckily, Kubernetes service discovery supports SRV records and provides them on the cluster's DNS.
I think in the most usual case you should know the port number to access your services.
But if it is useful, Kubernetes add some environment variables to every pod to ease autodiscovery of all services. For example {SVCNAME}_SERVICE_HOST and {SVCNAME}_SERVICE_PORT. Docs here

Access a specific pod from external

We have an old service discovery system that requires processes to register its ip:port during startup. On a kubernetes cluster, we exposed a service that enables NodePort. The processes within container can register to the old system with their Pod Ip:port + HostIp. For the clients within the same kubernetes cluster, they should be able to connect to the right process via specific Pod Ip:port. For an external client, it knows the HostIp+NodePort and the specific Pod Ip:port, is there an efficient way to route the client’s request to the specific Pod? Running a proxy on each node to route the traffic (nodeport -> pod) seems inefficient due to additional proxy layer.
I guess you mean you don't want to add a Service of type NodePort as for your case that seems like an additional proxy layer. I can see how it is an additional layer in your case. Typically Kubernetes would be doing the orchestration and the Service would be part of the service-discovery mechanism. It sounds like you could use hostPort. But if you do go this route you should be aware it's not suggested practice as Kubernetes is intended for orchestration.