My Kubernetes cluster uses a ReplicaSet to run N similar pods. Each pod can only handle one websocket connection due to resource limitations. The cluster uses an NGINX ingress controller.
Is there any way to make nginx dispatch only one incoming websocket connection per pod and, if no pod is available, refuse the incoming connection?
I'm not super familiar with the Kubernetes Nginx ingress setup, but assuming it exposes some of the Nginx configuration options for setting up groups of servers, the server directive has a parameter called max_conns that lets you limit the number of connections to a given server. Assuming there's a mapping in the ingress controller, it should be possible to set max_conns=1 for each server that gets created and added to the Nginx configuration under the hood.
http://nginx.org/en/docs/http/ngx_http_upstream_module.html#server
Edit: after a little cursory research, it looks like this is indeed possible. You can apparently set this via the nginx.org/max-conns annotation, according to the master list of ConfigMap keys and annotations here: https://github.com/nginxinc/kubernetes-ingress/blob/master/docs/configmap-and-annotations.md
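If the controller maps that annotation through to max_conns the way the doc suggests, a minimal (untested) sketch of an Ingress using it might look like the following; the host, service name and port are placeholders, and the nginx.org/* annotations are specific to the nginxinc/kubernetes-ingress controller:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: websocket-ingress
  annotations:
    # Limit each upstream pod to a single connection (maps to nginx's max_conns)
    nginx.org/max-conns: "1"
    # Tell the controller which backing services speak websockets
    nginx.org/websocket-services: "websocket-svc"
spec:
  rules:
    - host: ws.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: websocket-svc
                port:
                  number: 80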
You could use a readinessProbe with a relatively low periodSeconds and, obviously, {success,failure}Threshold set to 1, so that the Pod is released (or withheld) as fast as possible.
Basically, you could set up a script or a simple HTTP endpoint that returns a failure status code once a connection has been established: the Pod is then removed from the Service endpoints list and will not be selected by the Ingress Controller.
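A minimal sketch of such a probe (the /readyz path and port are placeholders for whatever endpoint your app exposes; it should start returning a non-2xx status as soon as a websocket connection is established):

# container spec snippet: the Pod drops out of the Service endpoints as soon as
# its readiness endpoint starts failing, so no second connection is routed to it
readinessProbe:
  httpGet:
    path: /readyz        # placeholder: your app's "am I still free?" endpoint
    port: 8080           # placeholder port
  periodSeconds: 1       # check as often as possible
  successThreshold: 1
  failureThreshold: 1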
Just keep in mind that this solution can be affected by race conditions, but it's the simplest one: a better solution could be a service mesh, but that means additional complexity.
Related
I'm trying to figure out which tools from the GKE stack I should apply to my use case, which is dynamic deployment of a stateful application with dynamic HTTP endpoints.
Stateful in my case means that I don't want any replicas and load-balancing (because the app doesn't scale horizontally at all). I understand though that in k8s/gke nomenclature I'm still going to be using a 'load-balancer' even though it'll act as a reverse proxy and not actually balance any load.
The use case is as follows. I have some web app where I can request for a 'new instance' and in return I get a dynamically generated url (e.g. http://random-uuid-1.acme.io). This domain should point to a newly spawned, single instance of a container (Pod) hosting some web application. Again, if I request another 'new instance', I'll get a http://random-uuid-2.acme.io which will point to another (separate), newly spawned instance of the same app.
So far I've figured out the following setup. Every time I request a 'new instance' I do the following:
create a new Pod with a dynamic name app-${uuid} that exposes an HTTP port
create a new Service with NodePort that "exposes" the Pod's HTTP port to the cluster
create or update (if it exists) the Ingress by adding a new HTTP rule specifying that domain X should point at NodePort service X
The Ingress mentioned above uses a LoadBalancer as its controller, which is an automated process in GKE.
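For reference, each appended rule in that shared Ingress ends up looking roughly like this (host and service name are the dynamically generated ones; exact API version details aside):

# one host rule per spawned instance, appended to the shared Ingress
- host: random-uuid-1.acme.io
  http:
    paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-random-uuid-1     # the per-instance NodePort Service
            port:
              number: 80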
A few issues that I've already encountered which you might be able to help me out with:
While the Pod and NodePort Service are separate resources per app, the Ingress is shared. I am thus not able to simply create/delete a resource; I'm also forced to keep track of what has been added to the Ingress so I can later append to or delete from its YAML, which is definitely not the way to do it (i.e. editing YAMLs). Instead, I'd rather have something like an Ingress that monitors a specific namespace and creates rules automatically based on Pod labels. Say I have 3 Pods labelled app-1, app-2 and app-3; I want the Ingress to automatically watch all Pods in my namespace and create rules based on those labels (i.e. app-1.acme.io -> reverse proxy to Pod app-1).
Updating the Ingress with a new HTTP rule takes around a minute before traffic is allowed into the Pod; until then I keep getting 404s even though both the Ingress and the LoadBalancer look 'ready'. I can't figure out what I should watch/wait for to get a clear signal that the Ingress Controller is ready to accept traffic for the newly spawned app.
What would be good practice for managing such a cluster, where you can't statically define Pod/Service manifests because you are creating them dynamically (with different names, endpoints or rules)? You surely don't want to create and maintain a bunch of YAMLs for every application you spawn. I would imagine something similar to consul-template in the Consul world, but for k8s?
I participated in a similar project, and our decision was to use a Kubernetes client library to spawn instances. The instances were managed by a simple web application, which took some customisation parameters, saved them into its database, then created an instance. Because of the database, there was no problem keeping track of what had been created so far. By querying the database we were able to tell whether a given deployment already existed, and to update/delete any associated resources.
Each instance consisted of:
a deployment (single or multi-replica, depending on the instance);
a ClusterIP Service (no reason to reserve a machine port with NodePort);
an ingress object for shared ingress controller;
and some shared configMaps.
We also used external-dns and cert-manager, one to manage DNS records and the other to issue SSL certificates for the ingress. With this setup it took about 10 minutes to deploy a new instance. The pod and the ingress were ready in seconds, but we had to wait for the certificate, and its readiness depended on whether the issuer's DNS had picked up our new record. This could be avoided with a wildcard domain, but we had to use many different domains, so that wasn't an option in our case.
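As a rough illustration, each instance's ingress looked something like this; the issuer name, hosts and secret names are placeholders, and the cert-manager annotation plus the fact that external-dns picks up the rule's host assume a fairly default setup of both tools:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: instance-random-uuid-1
  annotations:
    # cert-manager issues a certificate for the TLS host below (placeholder issuer name)
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  rules:
    # external-dns watches Ingress hosts and creates the matching DNS record
    - host: random-uuid-1.acme.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: instance-random-uuid-1   # the per-instance ClusterIP Service
                port:
                  number: 80
  tls:
    - hosts:
        - random-uuid-1.acme.io
      secretName: instance-random-uuid-1-tls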
Other than that, you might consider writing a Helm chart and using the helm list command to find existing instances and manage them, though this is a rather 'manual' solution. If you want this functionality to be part of your application, it's better to use a Kubernetes client library.
I have a pretty strange use-case in which we need to have multiple concurrent connections to the same ReplicaSet but each one of them needs to go to a different instance.
Is there any way to set a label based on the ordinal index of an instance in the replica set?
In that way I could map different ports to different instances.
Thanks in advance
Assuming HTTP-based cookies are something you're looking for, you can use an ingress controller and so-called sticky sessions / session affinity.
The ingress controller responds to the first request with a Set-Cookie header; the value of the cookie maps to a specific pod replica. When subsequent requests come back, the client browser attaches the cookie, and the ingress controller is therefore able to route the traffic to the same pod replica.
Basically, by setting the appropriate annotations on the Ingress object, you make sure that subsequent requests are still sent to the same pod. Nginx has a good example and documentation for those annotations.
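With the community kubernetes/ingress-nginx controller, for example, the relevant annotations look roughly like this (cookie name, host and service are placeholders):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sticky-ingress
  annotations:
    # cookie-based session affinity: the controller sets this cookie and uses it
    # to route subsequent requests to the same pod
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "route"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app-svc
                port:
                  number: 80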
Alternatively, you can use Traefik as the ingress controller. It also supports sticky sessions (they're just defined not at the Ingress level but in Traefik's IngressRoute).
I have a StatefulSet with pods server-0, server-1, etc. I want to expose them directly to the internet with URLs like server-0.mydomain.com or like mydomain.com/server-0.
I want to be able to scale the StatefulSet and automatically be able to access the new pods from the internet. For example, if I scale it up to include a server-2, I want mydomain.com/server-2 to route requests to the new pod when it's ready. I don't want to have to also scale some other resource or create another Service to achieve that effect.
I could achieve this with a custom proxy service that just checks the request path and forwards to the correct pod internally, but this seems error-prone and wasteful.
Is there a way to cause an Ingress to automatically route to different pods within a StatefulSet, or some other built-in technique that would avoid custom code?
I don't think you can do it. Being part of the same StatefulSet, all pods up to pod-x are targeted by one Service. As you can't define which pod is going to get a request, you can't force "pod-1.yourapp.com" or "yourapp.com/pod-1" to be sent to pod-1. It will be sent to the Service, and the Service might send it to pod-4.
And even if you could, you would need to dynamically update your Ingress rules, which can easily cause minutes of downtime.
With a custom proxy, I see it as impossible too. Note that it would basically need to replace the Service in front of the pods. If your ingress controller knows it needs to deliver a packet to a Service, you now have to force it to deliver to your proxy instead. But how?
A Kubernetes service is a set of iptables (or IPVS) rules that will redirect a packet with the ServiceIP as a destination address to ONE OF THE PODS that have the same label.
From the Kubernetes Services documentation:
The service installs iptables rules which select a backend Pod. By default, the choice of backend is random.
This refers to the fact that a Service is not able to distinguish between different pods in the same set.
Forcing the selection of a specific Pod out of the set, whether by changing the iptables rules (fairly simple) or by adding any type of proxy, is problematic:
Let's say you have pod-1 and pod-2 (1.1.1.1 and 1.1.1.2 respectively), and you configure iptables rules to DNAT requests with destination pod-1.myserver.com to 1.1.1.1, and the same for pod-2. (You may ask why the IP: it's simply because it's the only way to distinguish between these pods.)
This approach will fail whenever a pod restarts. Say pod-1 fails: Kubernetes won't recreate the same pod with the same IP and name, but will instead create pod-3 with a different IP and update its own iptables rules accordingly. As a result, all packets going toward 1.1.1.1 will be dropped until you update your proxy or iptables rules again.
In fact, that's one of the reasons we use a Service to access pods instead of accessing them directly: a Pod IP can change, whereas the Service IP won't.
However, since this very specific part of Kubernetes has been my work for the last 4 months, I developed a Python script to edit the iptables rules and choose a specific pod. My conclusion from that work is that it's costly and time-consuming, and it forces the server to go offline for a couple of seconds whenever the pods change. You can take a look at the code; it definitely works, but it's not recommended.
This is fundamentally a Kubernetes problem, and the solution would be changing the source code of kube-proxy, which is my current work.
I suggest you read my answer explaining exactly how Kubernetes Services work in this question: Which service is doing load balancing between kubernetes nodes?
We're currently trying out Traefik and are considering using it as the ingress controller for our internal Kubernetes cluster.
Now I wonder whether it's possible to use Traefik to load-balance the kube-apiserver. We have an HA setup with 3 masters.
How would I proceed here?
Basically I just want to load-balance the API requests from all nodes in the cluster across the 3 masters.
Should I just run traefik outside the cluster?
I'm trying to wrap my head around this... I'm having a hard time understanding how this could work together with Traefik as an ingress controller.
Thanks for any input, much appreciated!
One way to achieve this is to use the file provider and create a static setup pointing at your API server nodes; something like this (untested):
[file]

[backends]
  [backends.backend1]
    [backends.backend1.servers]
      [backends.backend1.servers.server1]
        url = "http://apiserver1:80"
        weight = 1
      [backends.backend1.servers.server2]
        url = "http://apiserver2:80"
        weight = 1
      [backends.backend1.servers.server3]
        url = "http://apiserver3:80"
        weight = 1

[frontends]
  [frontends.frontend1]
    entryPoints = ["http"]
    backend = "backend1"
    passHostHeader = true
    [frontends.frontend1.routes]
      [frontends.frontend1.routes.route1]
        rule = "Host:apiserver"
(This assumes a simple HTTP-only setup; HTTPS would need some extra setup.)
When Traefik is given this piece of configuration (and whatever else you need to do either via the TOML file or CLI parameters), it will round-robin requests with an apiserver Host header across the three nodes.
Another at least potential option is to create a Service object capturing your API server nodes and another Ingress object referencing that Service and mapping the desirable URL path and host to your API server. This would give you more flexibility as the Service should adjust to changes to your API server automatically, which might be interesting when things like rolling upgrades come into play. One aggravating point though might be that Traefik needs to speak to the API server to process Ingresses and Services (and Endpoints, for that matter), which it cannot if the API server is unavailable. You might need some kind of HA setup or be willing to sustain a certain non-availability. (FWIW, Traefik should recover from temporary downtimes on its own.)
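As a rough, untested sketch of that second option, one concrete variant is to point the Ingress at the built-in kubernetes Service in the default namespace, whose Endpoints the API servers maintain themselves; the host below is a placeholder, and TLS/client-certificate handling is left out entirely:

# The Ingress sits in the default namespace because its backend Service must
# live in the same namespace; "kubernetes" is the built-in API server Service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: apiserver
  namespace: default
spec:
  rules:
    - host: apiserver.example.com      # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kubernetes       # Endpoints track the API server nodes automatically
                port:
                  number: 443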
Whether you want to run Traefik in-cluster or out-of-cluster is up to you. The former is definitely easier to set up if you want to process API objects, as you won't have to pass in API server configuration parameters, though the same restrictions about API server connectivity apply if you want to go down the Ingress/Service route. With the file provider approach, you don't need to worry about that: it is perfectly possible to run Traefik inside Kubernetes without using the Kubernetes provider.
I'd like to implement a sticky-session Ingress controller. Cookies or IP hashing would both be fine; I'm happy as long as the same client is generally routed to the same pod.
What I'm stuck on: it seems like the Kubernetes service model means my connections are going to be proxied randomly no matter what. I can configure my Ingress controller with session affinity, but as soon as the connection gets past that and hits a Service, kube-proxy is just going to route me randomly. There's the sessionAffinity: ClientIP flag on Services, but that doesn't help me -- the client IP will always be the internal IP of the Ingress pod.
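For reference, that flag looks like this on a Service (placeholder names); it keys on the source IP, which by the time kube-proxy sees it is the Ingress pod's IP rather than the real client's:

apiVersion: v1
kind: Service
metadata:
  name: my-app            # placeholder
spec:
  selector:
    app: my-app
  ports:
    - port: 80
  # affinity by source IP: every request arriving via the Ingress pod hashes
  # to the same backend, so it doesn't give per-client stickiness here
  sessionAffinity: ClientIP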
Am I missing something? Is this possible given Kubernetes' current architecture?
An ingress controller can completely bypass kube-proxy. The haproxy controller, for example, does this and goes straight to the Endpoints. However, it doesn't use the Ingress in the typical sense.
You could do the same with the nginx controller; all you need is to look up the Endpoints and insert them instead of the DNS name it currently uses (i.e. swap this line for a pointer to an upstream that contains the endpoints).
I evaluated the haproxy controller but could not get it running reliably with session affinity. After some research I discovered the Nginx Ingress Controller, which since version 0.61 also includes the nginx-sticky-module-ng module, and it has now been running reliably for a couple of days in our test environment. I created a Gist that sets up the required Kubernetes pieces, since some important configuration is a bit hard to locate in the existing documentation. Good luck!