In Kubernetes, how do I implement session affinity using an Ingress?

I'd like to implement a sticky-session Ingress controller. Cookies or IP hashing would both be fine; I'm happy as long as the same client is generally routed to the same pod.
What I'm stuck on: it seems like the Kubernetes service model means my connections are going to be proxied randomly no matter what. I can configure my Ingress controller with session affinity, but as soon as the connection gets past the controller and hits a Service, kube-proxy is just going to route me randomly. There's the sessionAffinity: ClientIP flag on Services, but that doesn't help me -- the client IP will always be the internal IP of the Ingress pod.
Am I missing something? Is this possible given Kubernetes' current architecture?

An ingress controller can completely bypass kube-proxy. The haproxy controller, for example, does this and goes straight to endpoints. However, it doesn't use the Ingress in the typical sense.
You could do the same with the nginx controller; all you need is to look up the endpoints and insert them in place of the DNS name it currently uses (i.e., swap this line for a pointer to an upstream that contains the endpoints).
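For context, the Endpoints object such a controller reads looks roughly like this (the addresses are illustrative); pointing the nginx upstream at these pod IPs instead of the Service DNS name is what bypassing kube-proxy means here:

    apiVersion: v1
    kind: Endpoints
    metadata:
      name: my-app            # matches the Service name (placeholder)
    subsets:
    - addresses:
      - ip: 10.244.1.5        # illustrative pod IPs
      - ip: 10.244.2.7
      ports:
      - port: 8080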

I evaluated the haproxy controller but could not get it running reliably with session affinity. After some research I discovered the Nginx Ingress Controller, which since version 0.61 also includes the nginx-sticky-module-ng module, and it has now been running reliably for a couple of days in our test environment. I created a Gist that sets up the required Kubernetes pieces, since some important configuration is a bit hard to locate in the existing documentation. Good luck!
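On newer versions of the community nginx ingress controller, cookie-based affinity can also be switched on with annotations alone; a minimal sketch, with placeholder host and Service names:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: sticky-ingress
      annotations:
        nginx.ingress.kubernetes.io/affinity: "cookie"
        nginx.ingress.kubernetes.io/session-cookie-name: "route"
        nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"
    spec:
      ingressClassName: nginx
      rules:
      - host: app.example.com         # placeholder host
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app          # placeholder Service
                port:
                  number: 80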

Related

Service-to-Service Communication in Kubernetes

I have deployed my Kubernetes cluster on EKS. I have an ingress-nginx which is exposed via a load balancer to route traffic to different services. In ingress-nginx, the first request goes to an auth service for authentication, and if it is a valid request then I allow it to move forward.
Let's say the request is in Service 1 and from there it wants to communicate with Service 2. Can I somehow make the request go directly to the ingress, not via the load balancer, and then from the ingress to Service 2?
Is it possible to do so?
Will it help in improving performance, since I bypassed the load balancer?
As the request is not moving through the load balancer, load balancing won't take place. Is that a serious concern?
1/ Is it possible: short answer, no.
There are edge cases that would require someone to create another Ingress object exposing Service 2 in the first place. Then, you could trick the Ingress into routing you to some service that might not otherwise be reachable (if the DNS doesn't exist, some VIP was not yet exposed, ...).
There's no real issue with external clients bypassing the ELB, as long as they cannot reach all ports on your nodes, just the ones bound by your ingress controller.
2/ Bypassing the load balancer: won't change much in terms of performance.
If we're talking about a TCP load balancer, getting rid of it would help track real client IPs, though. Figuring out how to switch to an HTTP load balancer may be a better move -- though not always an easy one.
3/ Removing the LoadBalancer: if you have several nodes hosting replicas of your ingress controller, you would still be able to do some kind of DNS-based load balancing. Though, for sure, it's not the same as having a real LB.
In AWS, you could find a middle ground by setting up health-check based Route53 records: set one for each node hosting an ingress controller, create another regrouping all healthy ingress nodes, then change your existing ingress FQDN records so they'd all point to your new Route53 name. You'd be able to do TCP/HTTP checks against EC2 instance IPs, which is usually good enough. But again: DNS load balancing can suffer from outdated browser caches, some ISPs not refreshing zones, ... an LB is the real thing.

How to allow only one connection per pod using nginx ingress controller

My Kubernetes cluster uses a ReplicaSet to run N similar pods. Each pod can only handle one websocket connection due to resource limitations. My cluster uses an nginx ingress controller.
Is there any way to make nginx dispatch only one incoming websocket connection per pod and, if no pod is available, refuse the incoming connection?
I'm not super familiar with the Kubernetes Nginx ingress setup, but assuming it exposes some of the Nginx configuration options for setting up groups of servers, the upstream server directive has a parameter called max_conns that will let you limit the number of connections to a given server. Assuming there's a mapping in the ingress controller, it should be possible to set max_conns=1 for each server that's created and added to the Nginx configuration under the hood.
http://nginx.org/en/docs/http/ngx_http_upstream_module.html#server
Edit: a little cursory research suggests this is indeed possible. It looks like you can specify this in a ConfigMap as nginx.org/max-conns according to the master list of parameters here: https://github.com/nginxinc/kubernetes-ingress/blob/master/docs/configmap-and-annotations.md
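If that parameter list applies to your controller version, the annotation form would look something like this untested sketch (host and Service names are placeholders):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: ws-ingress
      annotations:
        nginx.org/max-conns: "1"   # limit each upstream pod to one connection
    spec:
      rules:
      - host: ws.example.com       # placeholder host
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ws-server    # placeholder Service
                port:
                  number: 8080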
You could use a readinessProbe with a relatively low periodSeconds and, obviously, {success,failure}Threshold set to 1, in order to mark the Pod as available or unavailable as fast as possible.
Basically, you could set up a script or a simple HTTP endpoint that returns a failed status code once a connection has been established: the Pod's endpoint will then be removed from the Service endpoints list and will not be selected by the Ingress Controller.
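A minimal sketch of that approach, assuming a hypothetical /busy endpoint on the server that returns a non-2xx status while the pod already holds a websocket connection:

    apiVersion: v1
    kind: Pod
    metadata:
      name: ws-server
    spec:
      containers:
      - name: server
        image: my-ws-server:latest   # hypothetical image
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /busy              # hypothetical endpoint: fails while a connection is held
            port: 8080
          periodSeconds: 1           # poll as often as possible
          successThreshold: 1
          failureThreshold: 1        # drop out of the endpoints list after one failure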
Just keep in mind that this solution could be affected by race conditions, but it's the simplest one: a better solution could be a Service Mesh, but that means additional complexity.

How to expose multiple services on the same port in Kubernetes using OpenStack

I have a Kubernetes cluster on a private cloud based on the OpenStack. My service is required to be exposed on a specific port. I am able to do this using NodePort. However, if I try to create another service similar to the first one, I am not able to expose it since I have to use the same port and it is already occupied by the first one.
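For reference, this is roughly how my first service is exposed; a second Service requesting the same nodePort is rejected, since node ports are allocated cluster-wide (names are placeholders):

    apiVersion: v1
    kind: Service
    metadata:
      name: service-one
    spec:
      type: NodePort
      selector:
        app: one             # placeholder pod labels
      ports:
      - port: 80
        nodePort: 30080      # fixed, cluster-wide port; no other Service may reuse it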
I've noticed that I can use LoadBalancer in public clouds for this, but I assume this is not possible in OpenStack?
I also tried to use the Ingress Controller of Kubernetes but it did not work. However, I am not sure whether I went about it the correct way.
Is there any way other than LoadBalancer or Ingress to do this? (My first assumption was that if I dedicated my pods to specific nodes, then I should be able to expose each of the services on the same port on different nodes, but this approach also did not work.)
Please let me know if you have any thoughts on this.
You have to set up the OpenStack Cloud Provider: basically, this Deployment will watch for LoadBalancer Services and will provision an {internal,external} IP address you can use to interact with your application, even at L4 and not only at L7 like many Ingress Controller resources.
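Once the cloud provider is in place (assuming, for example, OpenStack's Octavia LBaaS behind it), each service can then be exposed on its own port with a plain LoadBalancer Service; a minimal sketch with placeholder names:

    apiVersion: v1
    kind: Service
    metadata:
      name: my-app-lb
    spec:
      type: LoadBalancer      # the cloud provider provisions the external IP
      selector:
        app: my-app           # placeholder pod labels
      ports:
      - port: 80              # each Service gets its own load balancer IP,
        targetPort: 8080      # so the same port can be reused across services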
If you want to expose everything on one port then, to the best of my knowledge, the only answer is an ingress controller. The two most famous ones are Nginx and Traefik. I agree that setting up an ingress controller can be difficult, and I had problems with them before too, but you have to solve them one by one.
Another thing you can do is build your own ingress controller. What I mean is to use a reverse proxy such as Nginx, configure it to reroute the traffic based on your topology, and then just expose this reverse proxy so all the traffic goes through it. But you should do this only if you need something very customized.
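To make the ingress controller option concrete, here is a sketch of a single Ingress fanning one external HTTP port out to two Services by host name (hosts and Service names are placeholders):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: fanout
    spec:
      ingressClassName: nginx    # assumes the community nginx controller
      rules:
      - host: one.example.com    # placeholder host
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: service-one   # placeholder Service
                port:
                  number: 80
      - host: two.example.com    # placeholder host
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: service-two   # placeholder Service
                port:
                  number: 80

Both hosts arrive on the same port (the controller's 80/443), and the controller routes to the right Service based on the Host header.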

Load-balancing logic in the Service resource

In my Kubernetes cluster, there is a lot of inter-service communication. I have also enabled the horizontal pod autoscaler, and we use a Service resource for all of these services.
I need to understand how the Service resource load-balances requests across the pods.
I did read about sessionAffinity, but it only supports ClientIP if you are not using an Ingress resource.
I want to know whether the Service can load-balance based on the load (in terms of CPU, memory, etc.) on a particular pod. In short, does the sessionAffinity config support anything other than ClientIP? I don't want to bring in an Ingress resource, as these are not external-facing requests; this is inter-service communication.
Thanks in advance.
In short, does sessionAffinity config support anything other than ClientIP
No, it does not. See the model definition here for v1.SessionAffinityConfig. Nor will it, I suspect; I already feel that sessionAffinity is out of a Service's scope, and I'm surprised it exists at all.
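For reference, this sketch shows the full extent of what the API exposes; ClientIP plus a timeout is everything sessionAffinityConfig can express:

    apiVersion: v1
    kind: Service
    metadata:
      name: backend
    spec:
      selector:
        app: backend            # placeholder labels
      ports:
      - port: 80
      sessionAffinity: ClientIP # the only non-default value ("None" is the default)
      sessionAffinityConfig:
        clientIP:
          timeoutSeconds: 10800 # the default; the sole tunable the API offers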
You're going to want to use some kind of layer in front of your Service if you want more control over your connections. There are plenty of service meshes that might solve your problem (see Istio or Linkerd).
You could also roll your own solution with nginx and send your requests for that service to the nginx pod.

Expose each pod in a statefulset to the internet without a custom proxy

I have a StatefulSet with pods server-0, server-1, etc. I want to expose them directly to the internet with URLs like server-0.mydomain.com or like mydomain.com/server-0.
I want to be able to scale the StatefulSet and automatically be able to access the new pods from the internet. For example, if I scale it up to include a server-2, I want mydomain.com/server-2 to route requests to the new pod when it's ready. I don't want to have to also scale some other resource or create another Service to achieve that effect.
I could achieve this with a custom proxy service that just checks the request path and forwards to the correct pod internally, but this seems error-prone and wasteful.
Is there a way to cause an Ingress to automatically route to different pods within a StatefulSet, or some other built-in technique that would avoid custom code?
I don't think you can do it. Being part of the same StatefulSet, all pods up to pod-x are targeted by the same Service. As you can't define which pod is going to get a request, you can't force pod-1.yourapp.com or yourapp.com/pod-1 to be sent to pod-1. It will be sent to the Service, and the Service might send it to pod-4.
Even if you could, you would need to dynamically update your ingress rules, which can easily cause minutes of downtime.
With a custom proxy, I see it as impossible too. Note that it would basically need to replace the Service behind the pods. If your ingress controller knows that it needs to deliver a packet to a Service, now you have to force it to deliver to your proxy instead. But how?
A Kubernetes service is a set of iptables (or IPVS) rules that will redirect a packet with the ServiceIP as a destination address to ONE OF THE PODS that have the same label.
from Kubernetes Services documentation
The service installs iptables rules which select a backend Pod. By default, the choice of backend is random.
Which refers to the fact that a service is not able to distinguish between different pods in the same set.
Forcing the selection of a specific Pod out of the set, whether by changing the iptables rules (fairly simple) or by adding any type of proxy, is problematic:
let's say you configured pod-1 and pod-2 (1.1.1.1 and 1.1.1.2 respectively), and you configured iptables rules to DNAT requests with destination pod-1.myserver.com to 1.1.1.1, and the same for pod-2. (You may ask why the IP: simply because it's the only way to distinguish between these pods.)
This approach will fail whenever a pod restarts. Say pod-1 failed: Kubernetes won't bring it back with the same IP (and, outside a StatefulSet, not with the same name either) and will update its own iptables accordingly. As a result, all the packets going toward 1.1.1.1 will be dropped until you update the proxy or the iptables again.
In fact, that's one of the reasons why we use a Service to access pods instead of accessing them directly: the Pod IP can change, but the Service IP won't.
However, since this very specific part of Kubernetes has been my work for the last 4 months, I have developed a Python script that edits the iptables to pick a specific pod. My conclusion from that work is that it's costly and time-consuming, and it forces the server to go offline for a couple of seconds whenever the pods change. You can take a look at the code; it definitely works, but it's not recommended.
This is really a Kubernetes problem, and the solution is changing the source code of kube-proxy, which is my current work.
I suggest you read my answer explaining how kubernetes services exactly work in this question: Which service is doing load balancing between kubernetes nodes?