Kubernetes: how best to autoscale nodes containing websocket connections?

Is there support for autoscaling nodes where the pods only hold websocket connections used for push notifications back to the client? I suspect we would hit connection limits before either CPU or memory limits. Please correct me if others have had different experiences here.
The main issue I see is the persistent nature of the connections: pods that hold active websockets must remain intact when down-scaling, and their tenancy is not relocatable.
So my questions here are these:
is this support available? Would we want to make these StatefulSets? I am not even sure which model works best here.
would we want to use Kubernetes Services to route incoming websocket connections to the worker nodes? If so, how would we configure kube-proxy to skip worker nodes whose connection limits have been reached, so that they do not get new connection requests?
how do we autoscale based on a configurable limit on the number of connections maintained by a pod, and how do we scale down without destroying any nodes that still have active connections? (See the HPA sketch below.)
Thanks in advance for all tips/pointers, especially any advice on how to best ask these questions.
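One possible direction for the connection-count question, as a minimal sketch rather than a definitive answer: an autoscaling/v2 HorizontalPodAutoscaler driven by a custom Pods metric. The metric name websocket_connections, the workload names and the target value are all assumptions; exposing such a metric requires a metrics adapter (e.g. prometheus-adapter), and draining pods that still hold connections would additionally need graceful-termination handling in the application.

```yaml
# Hypothetical sketch: scale websocket workers on average connections per pod.
# Assumes a custom metric "websocket_connections" is exposed through a metrics
# adapter (e.g. prometheus-adapter); all names and values are illustrative only.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: websocket-workers
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet        # or Deployment, depending on the model chosen
    name: websocket-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: websocket_connections   # assumed custom metric name
        target:
          type: AverageValue
          averageValue: "5000"          # assumed per-pod connection budget
```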

Related

involuntary disruptions / SIGKILL handling in microservice following saga pattern

Should I engineer my microservice to handle involuntary disruptions like hardware failure?
Are these disruptions frequent enough to be worth handling in a service running on an AWS-managed EKS cluster?
Should I consider a design change in the service to handle an unexpected SIGKILL, for example persisting the data at each step, or would that be considered over-engineering?
What standard way would you suggest for handling these involuntary disruptions if it is
a) a RESTful service that typically responds in 1 s (and follows the saga pattern)?
b) a service that processes a large 1 GB file in 1 hour?
There are a couple of ways to handle those disruptions. As mentioned here:
Here are some ways to mitigate involuntary disruptions:
Ensure your pod requests the resources it needs.
Replicate your application if you need higher availability. (Learn about running replicated stateless and stateful applications.)
For even higher availability when running replicated applications, spread applications across racks (using anti-affinity) or across zones
(if using a multi-zone cluster.)
The frequency of voluntary disruptions varies.
So:
if your budget allows it, spread your app across zones or racks; you can use node affinity to schedule Pods on certain nodes (see the sketch after this list),
make sure to configure replicas; this ensures that when one Pod receives SIGKILL, the load is automatically directed to another Pod. You can read more about this here.
consider using DaemonSets, which ensure each Node runs a copy of a Pod.
use Deployments for stateless apps and StatefulSets for stateful.
the last thing you can do is write your app to be disruption tolerant.
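As a concrete illustration of the replication and anti-affinity points above, here is a minimal sketch of a Deployment whose replicas are spread across zones; the labels, replica count, image and resource sizes are illustrative assumptions:

```yaml
# Minimal sketch: a replicated Deployment spread across zones so a single
# involuntary disruption does not take out every replica.
# The app label, image and replica count are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: saga-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: saga-service
  template:
    metadata:
      labels:
        app: saga-service
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: saga-service
                topologyKey: topology.kubernetes.io/zone   # spread across zones
      containers:
        - name: saga-service
          image: example.com/saga-service:latest   # placeholder image
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
```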
I hope I cleared things up a little bit for you; feel free to ask more questions.

Is performance testing multiple deployment stacks in one Kubernetes cluster a valid test?

We have a deployment stack with about 20 microservices/pods. Each deployment goes to its own namespace. To make sure that CPU and memory are guaranteed for each pod and not shared, we set the request amounts equal to the limit amounts. Now we sometimes need to deploy more stacks into the same performance cluster, e.g. to test different releases of the same stack. The question is whether having more than one deployment in one cluster can invalidate the test results due to shared network or some other reason.
Initially we were thinking of creating one cluster for each performance test to make sure it is isolated and the test results are correct, but creating a new cluster and maintaining it is very costly. We also thought about making sure each deployment goes to one node, to avoid load testing on one stack impacting the others, but I'm not sure if that really helps. Please share your knowledge on this, as Kubernetes is almost new to us.
If the containers are running on the same underlying hosts then bleed-through is always possible. If you put all pods into the Guaranteed QoS class (i.e. requests == limits) then it at least reduces the bleed-through to a minimum. Running things on one cluster is generally fine, but if you want to truly reduce the crosstalk to zero then you would need dedicated workload nodes for each stack.
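For reference, a pod is placed in the Guaranteed QoS class when every container sets requests equal to limits for both CPU and memory; a minimal sketch, where the image name and sizes are placeholders:

```yaml
# Minimal sketch of a Guaranteed-QoS pod: requests == limits for every
# container, for both CPU and memory. Image and sizes are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: perf-test-service
spec:
  containers:
    - name: service
      image: example.com/service:latest
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "1"
          memory: "2Gi"
```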

Request buffering in Kubernetes clusters

This is a purely theoretical question. A standard Kubernetes cluster is given, with autoscaling in place. If memory goes above a certain targetMemUtilizationPercentage, then a new pod is started and it takes on part of the flow of requests coming to the contained service. minReplicas is set to 1 and maxReplicas is set to 5.
What happens when the number of pods that are online reaches the maximum (5 in our case) and requests from clients are still coming towards the node? Are these requests buffered somewhere or are they discarded? Can I take any actions to avoid request loss?
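For reference, the setup described in the question roughly corresponds to an HPA like the following sketch; the workload names and the utilization target are assumptions standing in for targetMemUtilizationPercentage:

```yaml
# Sketch of the autoscaling setup described in the question: scale on average
# memory utilization between 1 and 5 replicas. Names and values are illustrative.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: request-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: request-service
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # stands in for targetMemUtilizationPercentage
```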
Natively, Kubernetes does not support message-queue buffering. Depending on the scenario and setup you use, your requests will most likely time out. To manage them efficiently you'll need a custom component running inside the Kubernetes cluster.
In such situations it is very common to use a message broker, which ensures that communication between microservices is reliable and stable, that the messages are managed and monitored within the system, and that messages don't get lost.
RabbitMQ, Kafka and Redis appear to be the most popular, but choosing the right one will depend heavily on your requirements and the features needed.
It is worth noting that, since Kubernetes essentially runs on Linux, Linux itself also manages/limits requests coming in on a socket. You may want to read more about it here.
Another thing is that if you have pod limits set, or are short on resources, it is likely that pods will be restarted or the cluster will become unstable. Usually you can prevent this by configuring some kind of "circuit breaker" to limit the number of requests that can reach the backend without overloading it. If the number of requests goes beyond the circuit-breaker threshold, the excess requests will be dropped.
It is better to drop some requests than to have a cascading failure.
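One way to implement such a request cap at the edge, assuming the NGINX ingress controller is in use, is a rate-limit annotation on the Ingress; the host, service name and limit value below are illustrative:

```yaml
# Sketch: cap requests per second at the ingress so excess load is rejected
# instead of overwhelming the backend. Assumes the NGINX ingress controller;
# host, service name and the limit value are illustrative.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: request-service
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "50"   # requests per second per client IP
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: request-service
                port:
                  number: 80
```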
I managed to test this scenario and got 503 Service Unavailable and 403 Forbidden on the requests that did not get processed.
Knative Serving actually does exactly this. https://github.com/knative/serving/
It buffers requests and informs autoscaling decisions based on in-flight request counts. It can also enforce a per-Pod maximum on in-flight requests and hold on to requests until newly scaled-up Pods come up; Knative then proxies the requests to them through a container named queue-proxy that runs as a sidecar of its workload type called "Service".
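A minimal sketch of what that looks like in practice, assuming Knative Serving is installed; the service name, image and concurrency value are illustrative:

```yaml
# Sketch of a Knative Service that limits in-flight requests per pod; the
# queue-proxy sidecar buffers requests while new pods scale up.
# Image and the concurrency value are illustrative assumptions.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: request-service
spec:
  template:
    spec:
      containerConcurrency: 10   # hard cap on concurrent requests per pod
      containers:
        - image: example.com/request-service:latest
```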

Running one pod per node with deterministic hostnames

I have what I believe is a simple goal, but I can't figure out how to get Kubernetes to play ball.
For my particular application, I am trying to deploy a number of replicas of a docker image that is a worker for another service. This system uses the hostname of the worker to distinguish between workers that are running at the same time.
I would like to be able to deploy a cluster where every node runs a worker for this service.
The problem is that the master also keeps track of every worker that ever worked for it, and displays these in a status dashboard. The intent is that you spin up a fixed number of workers by hand and leave it that way. I would like to be able to resize my cluster and have the number of workers change accordingly.
This seems like a perfect application for DaemonSet, except that then the hostnames are randomly generated and the master ends up tracking many orphaned hostnames.
An alternative might be StatefulSet, which gives us deterministic hostnames, but I can't find a way to force it to scale to one pod per node.
The system I am running is open source and I am looking into changing how it identifies workers to avoid this mess, but I was wondering if there was any sensible way to dynamically scale a StatefulSet to the number of nodes in the cluster. Or any way to achieve similar functionality.
One way is to use a nodeSelector, but I totally agree with @Markus: the more correct and advanced way is to use anti-affinity. This is a really powerful and at the same time simple solution that prevents pods with the same labels from being scheduled onto one node.
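A minimal sketch of the anti-affinity approach applied to the StatefulSet idea from the question; the labels, image and replica count are illustrative, and the replica count still has to be kept in line with the node count by hand or by external tooling:

```yaml
# Sketch: a StatefulSet with deterministic hostnames (worker-0, worker-1, ...)
# and required pod anti-affinity so no two workers land on the same node.
# Labels, image and replica count are illustrative assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: worker
spec:
  serviceName: worker
  replicas: 3
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: worker
              topologyKey: kubernetes.io/hostname   # one worker per node
      containers:
        - name: worker
          image: example.com/worker:latest
```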

Colocating related containers on nodes to avoid the cost of network accesses

I'm still new to Kubernetes so please excuse if this is a silly question.
I'm architecting a system which includes:
an MQTT broker
a set of (containerized) microservices that publish and subscribe to it
a Redis cache that the microservices read and write to.
We will certainly need multiplicity of all of these components as we scale.
There is a natural division in the multiplicity of each of these things: they each pertain to a set of intersections in a city. A publishing or subscribing microservice will handle 1 or more intersections. The MQTT broker instance and the Redis instance each could be set up to handle n intersections.
I am wondering if it makes sense to try to avoid unnecessary network hops in Kubernetes by trying to divide things up by intersection and put all containers related to a given set of intersections on one node. Would this mean putting them all on a single pod, or is there another way?
(By the way, there will still be other publishers and subscribers that need to access the MQTT broker that are not intersection-specific.)
This is more of an opinion question.
Would this mean putting them all on a single pod, or is there another way?
I would certainly avoid putting them all in one Pod. In theory, you can put anything in a single pod, but the general practice is to add lightweight sidecars that handle a very specific function.
IMO an MQTT broker, a Redis datastore and a subscribe/publish app seem like a lot to put in a single pod.
Possible Disadvantages:
Harder to debug because you may not know where the failure comes from.
A publish/subscribe app is generally more of a stateless application, while MQTT and Redis would be stateful. Deployments are recommended for stateless services and StatefulSets are recommended for stateful services.
Maybe networking latency. But you can use Node Affinity and Pod Affinity to mitigate that.
Possible Advantages:
All services sharing the same IP/Context.
Too much clutter in a pod.
It would be cleaner if you had:
Deployment for your sub/pub app.
StatefulSet with its own storage for your Redis server.
StatefulSet with its own storage for your MQTT broker.
Each one of these workload resources creates separate pods, and you can scale them up and down independently.
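If the network hops between the pub/sub app and Redis turn out to matter, pod affinity can co-locate the separate workloads on the same node; a minimal sketch, where the labels and image are illustrative assumptions:

```yaml
# Sketch: keep the pub/sub app on the same node as its Redis pod to cut
# network hops, while still running them as separate workloads.
# Labels and image are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: intersection-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: intersection-worker
  template:
    metadata:
      labels:
        app: intersection-worker
    spec:
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: redis   # assumed label on the Redis StatefulSet pods
                topologyKey: kubernetes.io/hostname
      containers:
        - name: worker
          image: example.com/intersection-worker:latest
```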