How to notify a Pod in a StatefulSet about other Pods in Kubernetes

I was reading the tutorial on deploying a Cassandra ring and ZooKeeper with StatefulSets. What I don't understand is this: if I decide to add another replica to the StatefulSet, how do I notify the other Pods that there is a new one? What are the best practices for this? In my custom application, I want one Pod to be able to redirect a request to another Pod when the request doesn't belong to it (i.e. it doesn't have the data).

Well, it seems like you want to run a clustered application inside Kubernetes. That is not something Kubernetes is directly responsible for. Cluster coordination for a given solution should be handled within that solution, so a response to a "how to" question cannot be generic.
Most software out there has some kind of coordination, discovery and registration mechanism, be it preconfigured members, an external discovery catalog/DB or some form of network broadcasting.
StatefulSet helps a lot here by retaining a stable network identity for each Pod behind the Service and by keeping storage attached, so you can, for example, always point new replicas to register with the first replica (or preferably one of the first two, in case no. 1 is the one that restarted). As I wrote above, though, this depends heavily on the capabilities of the solution you want to deploy.
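As a rough illustration of the stable network identity mentioned above: each Pod of a StatefulSet governed by a headless Service gets a DNS name of the form <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>. The sketch below is only one way an application could use that. The names "cassandra" and "default", the default cluster domain "cluster.local", and the idea that the application knows the replica count from its own configuration are all assumptions made for the example.

```python
import socket


def peer_hostnames(statefulset, service, namespace, replicas,
                   cluster_domain="cluster.local"):
    """Return the stable DNS name of every Pod ordinal 0..replicas-1."""
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


def seed_peers(my_hostname, peers, max_seeds=2):
    """Pick the first few peers other than this Pod to register with."""
    my_pod_name = my_hostname.split(".")[0]
    others = [p for p in peers if p.split(".")[0] != my_pod_name]
    return others[:max_seeds]


def live_peer_ips(service, namespace, cluster_domain="cluster.local"):
    """Resolve the headless Service name to the IPs of its backing Pods.

    This only works from inside a cluster; elsewhere the lookup fails
    and an empty list is returned.
    """
    name = f"{service}.{namespace}.svc.{cluster_domain}"
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})
```

A new replica could call seed_peers() with its own hostname to find one of the first two members to register with, and existing members could re-resolve the headless Service periodically to notice that a Pod was added. How membership is actually announced still depends on the clustered software itself.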

Related

What is a common strategy for synchronized communication between replicas of the same Pod?

Let's say we have the following apps:
API app: Responsible for serving the user requests.
Backend app: Responsible for handling user requests that are long-running tasks. It writes progress updates to the database (Postgres) and a distributed cache (Redis).
Both apps are scalable services. A single Backend app replica handles multiple tenants (customers here), but each customer is assigned to exactly one Backend replica.
I have a use case where the API layer needs to connect to the specific replica that is handling a given customer. Is there a common pattern for this?
A few strategies I have in mind:
Pub/Sub, probably using Redis: The problem is that we want a guaranteed synchronous response.
gRPC: Using the Pod IP to connect to a specific Pod is not a standard approach.
Creating a Service at runtime by adding labels to the replicas and using those. -- Looks promising
Please let me know if there is a common pattern, an example architecture or a standard way of doing this.
Note: [The above is a simulation of a production use case; names and the actual use case have been changed.]
You should aim to keep your services stateless. In a Kubernetes environment there is no telling when one Pod might be replaced by another, for example due to worker node maintenance.
If you have long-running tasks that cannot be completed within the configured grace period for Pod shutdown during a worker node drain/evacuation, you need to implement some kind of persistent work queue, as you are considering in option 1. I suggest you look into the saga pattern.
Another pattern we usually employ is to let the worker service write the current state of the job to the database and let the client poll the status every few seconds. This does, however, require some way of handling half-finished jobs abandoned by Pods that are forced to shut down.
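The polling pattern described above can be sketched roughly as follows. This is not a definitive implementation: the JobStore class is a hypothetical in-memory stand-in for the database table, and the heartbeat timestamp used to detect abandoned jobs is an assumption of this sketch rather than something prescribed by the pattern.

```python
import time


class JobStore:
    """In-memory stand-in for the database table that holds job state."""

    def __init__(self, stale_after=30.0, clock=time.monotonic):
        self.stale_after = stale_after  # seconds without a heartbeat
        self.clock = clock
        self.jobs = {}  # job_id -> dict with status, progress, worker, heartbeat

    def claim(self, job_id, worker):
        """A worker Pod takes (or retakes) a job and starts from scratch."""
        self.jobs[job_id] = {"status": "running", "progress": 0,
                             "worker": worker, "heartbeat": self.clock()}

    def report(self, job_id, progress):
        """The worker records progress; this doubles as a heartbeat."""
        job = self.jobs[job_id]
        job["progress"] = progress
        job["heartbeat"] = self.clock()
        if progress >= 100:
            job["status"] = "done"

    def status(self, job_id):
        """What the client sees each time it polls."""
        job = self.jobs[job_id]
        return job["status"], job["progress"]

    def requeue_abandoned(self):
        """Mark running jobs with no recent heartbeat as pending again."""
        now = self.clock()
        requeued = []
        for job_id, job in self.jobs.items():
            stale = now - job["heartbeat"] > self.stale_after
            if job["status"] == "running" and stale:
                job["status"] = "pending"
                job["worker"] = None
                requeued.append(job_id)
        return requeued
```

The client only ever calls status(), so it does not need to know which replica is doing the work. Something, for instance a periodic task in any replica, has to call requeue_abandoned() so that a job left behind by a terminated Pod gets picked up again.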

Running one pod per node with deterministic hostnames

I have what I believe is a simple goal, but I can't figure out how to get Kubernetes to play ball.
For my particular application, I am trying to deploy a number of replicas of a docker image that is a worker for another service. This system uses the hostname of the worker to distinguish between workers that are running at the same time.
I would like to be able to deploy a cluster where every node runs a worker for this service.
The problem is that the master also keeps track of every worker that ever worked for it, and displays these in a status dashboard. The intent is that you spin up a fixed number of workers by hand and leave it that way. I would like to be able to resize my cluster and have the number of workers change accordingly.
This seems like a perfect application for DaemonSet, except that then the hostnames are randomly generated and the master ends up tracking many orphaned hostnames.
An alternative might be StatefulSet, which gives us deterministic hostnames, but I can't find a way to force it to scale to one pod per node.
The system I am running is open source and I am looking into changing how it identifies workers to avoid this mess, but I was wondering if there was any sensible way to dynamically scale a StatefulSet to the number of nodes in the cluster. Or any way to achieve similar functionality.
One way is to use nodeSelector, but I totally agree with @Markus: the more correct and advanced way is to use anti-affinity. It is a powerful yet simple way to prevent Pods with the same labels from being scheduled onto the same node.
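To make the anti-affinity suggestion concrete, here is a minimal sketch that builds the relevant part of a Pod template as a plain Python dict and prints it as JSON. The field names follow the Kubernetes Pod spec; the "app" label key, the label value "worker" and the image name are placeholders chosen for the example.

```python
import json


def anti_affinity_pod_spec(app_label, image):
    """Build a Pod template spec that forbids two Pods carrying the same
    app label from being scheduled onto the same node."""
    return {
        "affinity": {
            "podAntiAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "labelSelector": {"matchLabels": {"app": app_label}},
                        # One Pod per distinct value of this node label,
                        # i.e. one per node.
                        "topologyKey": "kubernetes.io/hostname",
                    }
                ]
            }
        },
        "containers": [{"name": app_label, "image": image}],
    }


if __name__ == "__main__":
    spec = anti_affinity_pod_spec("worker", "example/worker:latest")
    print(json.dumps(spec, indent=2))
```

Note that this only caps the StatefulSet at one worker per node. It does not by itself set the replica count to the number of nodes, so surplus replicas would stay Pending until a node is added.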

Colocating related containers on nodes to avoid the cost of network accesses

I'm still new to Kubernetes so please excuse if this is a silly question.
I'm architecting a system which includes:
an MQTT broker
a set of (containerized) microservices that publish and subscribe to it
a Redis cache that the microservices read and write to.
We will certainly need multiplicity of all of these components as we scale.
There is a natural division in the multiplicity of each of these things: they each pertain to a set of intersections in a city. A publishing or subscribing microservice will handle 1 or more intersections. The MQTT broker instance and the Redis instance each could be set up to handle n intersections.
I am wondering if it makes sense to try to avoid unnecessary network hops in Kubernetes by trying to divide things up by intersection and put all containers related to a given set of intersections on one node. Would this mean putting them all on a single pod, or is there another way?
(By the way, there will still be other publishers and subscribers that need to access the MQTT broker that are not intersection-specific.)
This is more of an opinion question.
Would this mean putting them all on a single pod, or is there another way?
I would certainly avoid putting them all in one Pod. In theory, you can put anything in a single pod, but the general practice is to add lightweight sidecars that handle a very specific function.
IMO an MQTT broker, a Redis datastore and a subscribe/publish app seem like a lot to put in a single pod.
Possible Disadvantages:
Harder to debug because you may not know where the failure comes from.
A publisher/subscriber is generally more of a stateless application, while MQTT and Redis would be stateful. Deployments are recommended for stateless services and StatefulSets for stateful services.
Too much clutter in a pod.
Possible Advantages:
All services sharing the same IP/Context.
Maybe lower networking latency. But with separate pods you can use Node Affinity and Pod Affinity to mitigate that.
It would be cleaner if you had:
Deployment for your sub/pub app.
StatefulSet with its own storage for your Redis server.
StatefulSet with its own storage for your MQTT.
Each one of these workload resources would create separate pods and you can scale independently up/down.
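As a hedged sketch of the Pod Affinity idea mentioned in this answer, the helper below builds an affinity stanza that asks the scheduler to prefer nodes already running Pods with given labels, for example the Redis instance serving the same set of intersections. The field names follow the Kubernetes Pod spec; the helper name and the labels "app" and "intersection-group" are hypothetical.

```python
def colocate_with(target_labels, weight=100):
    """Return an affinity stanza that prefers nodes already running
    Pods whose labels match target_labels."""
    if not 1 <= weight <= 100:
        raise ValueError("weight must be between 1 and 100")
    return {
        "podAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": weight,
                    "podAffinityTerm": {
                        "labelSelector": {"matchLabels": dict(target_labels)},
                        # Treat each node as its own topology domain.
                        "topologyKey": "kubernetes.io/hostname",
                    },
                }
            ]
        }
    }


# Example: the pub/sub Deployment for the "north" intersections asks to
# run next to the Redis Pod labelled for the same group.
affinity = colocate_with({"app": "redis", "intersection-group": "north"})
```

Using the "preferred" form keeps the Pods schedulable when the target node is full. The "required" form would enforce colocation at the cost of Pods staying Pending when it cannot be satisfied.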

Kubernetes: single Pod with many containers, or many Pods with a single container

I have a rather theoretical question that I can't answer with the resources I found online. The question is: what is the rule for deciding how to group containers into Pods? Let me explain with an example.
I've these microservices:
Authentication
Authorization
Serving content
(plus) OpenResty to forward the calls from one to the other and orchestrate the flow. (Is there a way to do this natively in K8s? It seems to have services based on nginx+lua, but I'm not sure how that works.)
For the sake of the example I avoid Databases and co, I assume they are external and not managed by kubernetes
Now, what's the correct way here LEFT or RIGHT of the image?
LEFT: this seems easier to get working, since everything runs on "localhost". The downside is that it loses some of the benefit of microservices. For example, if auth becomes slow and needs more instances, I have to duplicate the whole pod and not just that service.
RIGHT: this seems a bit more complex, since Services are needed to expose each Pod to the other Pods. Yet here I could duplicate auth as needed without duplicating the other containers. On the other hand, I'll have a lot of pods, since each pod is basically a container.
It is generally recommended to keep different services in different pods or, better, in Deployments that will scale independently. The reasons are what is generally discussed as the benefits of a microservices architecture:
A more loose coupling allowing the different services to be developed independently in their own languages/technologies,
be deployed and updated independently and
also to scale independently.
The exception is what is considered a "helper application" that assists a "primary application". Examples given in the k8s docs are data pullers, data pushers and proxies. In those cases a shared file system or exchange via the loopback network interface can help with performance-critical use cases. For example, a data puller can be a sidecar container for an nginx container, pulling a website to serve from a Git repository.
Right image, each in its own pod. Multiple containers in a pod should really only be used when they are tightly coupled or needed to support the main container, such as a data loader.
Separate pods allow each service to be updated and deployed independently. They also allow for more efficient scaling. In the future, you may need 2 or 3 content pods but still only one authorization pod. If they are all together in the same pod, you have no choice but to scale them all.
The right image is the better option: easier management, upgrades and scaling.
Choose the structure on the right. The architecture on the left is tightly coupled, which makes it hard to scale an individual module according to actual business needs.

Application startup and shutdown based on authenticated user activity

There are applications and services in enterprises that do not need to run all the time and that have a limited user base (say a handful of people).
These applications can be shut down and started either based on scheduling or even better user activity. So, we are talking about on-demand service (say wrapped by a container) and node start-up and shut down.
First, the reason I mention authenticated user activity is that it makes sense to start up and shut down on that basis (i.e. not based on lower-level network traffic). One can imagine corporate SSO (say OAuth 2 based) being involved.
So, my question is whether anyone has attempted to implement what I have described using Consul or Kubernetes?
In the case of Consul, the key-value store could be used to give "Micro" class (i.e. small user base) applications a TTL: each time an authenticated user requests access to a given "Micro" class application, its TTL is updated. During the TTL window we want to check the health of the node(s), containers and services; outside the window we don't (since we want to save on opex).
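The TTL idea can be sketched independently of Consul. In the snippet below a plain dict stands in for the key-value store, and the class name, method names and the "reports" application are hypothetical. It only shows the bookkeeping that decides whether an application should be up, not how containers or nodes are actually started and stopped.

```python
import time


class ActivityWindow:
    """Tracks, per application, until when it should stay up."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.expires_at = {}  # app name -> expiry timestamp

    def touch(self, app):
        """Call on each authenticated request for app; renews its TTL."""
        self.expires_at[app] = self.clock() + self.ttl_seconds

    def is_active(self, app):
        """True while the current time is inside the app's TTL window."""
        return self.clock() < self.expires_at.get(app, float("-inf"))

    def desired_replicas(self, app, when_active=1):
        """0 outside the TTL window, when_active inside it."""
        return when_active if self.is_active(app) else 0


# Example: a 10-minute window renewed by an authenticated request.
window = ActivityWindow(ttl_seconds=600)
window.touch("reports")
replicas = window.desired_replicas("reports")
```

A controlling process would compare desired_replicas() with what is currently running and start or stop the service accordingly. Health checks would only be performed while is_active() is true.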
This question is similar to this autoscaling question, however different in the sense that this use case is about scaling from 0 nodes and then down to 0 based on an authenticated user base (most likely using SSO).
In the case of Kubernetes, the Horizontal Pod Autoscaling documentation lists the exact use case described under Next steps (i.e. the feature is on the backlog and may be implemented after v1.1 of Kubernetes). The cited feature description (Unidling proposal) is as follows:
Scale the number of pods starting from 0. All pods can be turned-off, and then turned-on when there is a demand for them. When a request to service with no pods arrives, kube-proxy will generate an event for autoscaler to create a new pod.
So basically, it may be possible to do what I've described in future using Kubernetes, but it is not possible right now. This in itself does not address the requirement to only scale from 0 based on authenticated user activity.
As a cluster-agnostic aside, on-demand container activation based on systemd is also worth noting. This solution will of course not scale back down to 0 without a controlling process.