Is there a cloud-native friendly method to select a master among the replicas? - kubernetes

Is there a way in Kubernetes to control the upgrade order of pods when a Deployment or StatefulSet has two or more replicas, where one pod is the master and the others are not?
To be specific, my requirement is that when an upgrade is triggered on the Deployment/StatefulSet, the master is upgraded last among the replicas.

The only thing that's built into Kubernetes is the automatic sequential naming of StatefulSet pods.
If you have a StatefulSet, one of its pods is guaranteed to be named statefulsetname-0. That pod can declare itself the "master" for whatever purposes this is required. A pod can easily determine (by looking at its hostname(1)) whether it is the "master", and if it isn't, it can also easily determine which pod is. Updates happen by default in numerically reverse order, so statefulsetname-0 will be upgraded last, which matches your requirement.
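For illustration, a minimal StatefulSet sketch that relies on this ordering might look like the following (the name, labels and image are placeholders, not something from your setup):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: myapp                  # pods will be named myapp-0, myapp-1, myapp-2
spec:
  serviceName: myapp           # headless Service the StatefulSet attaches to
  replicas: 3
  updateStrategy:
    type: RollingUpdate        # default strategy: replace pods from the highest ordinal down to 0
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:1.2.3     # bumping this triggers the reverse-ordinal rollout

With this, myapp-2 is replaced first and myapp-0 last, so a self-declared "master" in pod 0 is upgraded at the end.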
StatefulSets have other properties, which you may or may not want. It's impossible for another pod to take over as the master if the first one fails; startup and shutdown happen in a fairly rigid order; and if any part of your infrastructure is unstable, you may not be able to reliably scale the StatefulSet.
If you don't want a StatefulSet, you can implement your own leader election in a couple of ways (use a service like ZooKeeper or etcd that can help you maintain this state across the cluster; bring in a library for a leader-election algorithm like Raft). Kubernetes doesn't provide this on its own. The cluster will also be unaware of the "must upgrade the leader last" requirement, but if the current leader is terminated, another pod can take over the leadership.

The easiest way is probably to run the master in one Deployment/StatefulSet and the followers in another. This approach ensures the change persists and lets you make use of the built-in update strategies in Kubernetes.
Since Kubernetes does not differentiate pods by their containers or by any role specific to your application architecture (such as "master"), it is better to manage your own Deployments when you need a specific sequence that is outside Deployment/StatefulSet control. You can patch individual pods, but that change will not persist across a rollout restart.
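As a rough sketch of that split (all names, labels and the image are made up for illustration), you could run two Deployments side by side and upgrade the follower Deployment before the master one:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-master
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
      role: master
  template:
    metadata:
      labels:
        app: myapp
        role: master
    spec:
      containers:
      - name: app
        image: myapp:1.2.3
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-followers
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
      role: follower
  template:
    metadata:
      labels:
        app: myapp
        role: follower
    spec:
      containers:
      - name: app
        image: myapp:1.2.3

A Service selecting only app: myapp would still load-balance across both groups, while the role label lets you roll the new image out to myapp-followers first and myapp-master afterwards.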

Related

In Kubernetes, what is the real purpose of replicasets?

I am aware of the hierarchical order of k8s resources. In brief,
service: a service is what exposes the application to the outside world (or within the cluster). (The service types like ClusterIP, NodePort, Ingress are not so relevant to this question.)
deployment: a deployment is what is responsible for keeping a set of pods running.
replicaset: a replica set is what a deployment in turn relies on to keep the set of pods running.
pod: a pod consists of a container or a group of containers.
container: the actual required application is run inside the container.
The thing I want to emphasise in this question is why we have replicasets. Why doesn't the deployment directly handle, or take responsibility for, keeping the required number of pods running, instead of relying on a replicaset for this?
If k8s is designed this way, there should definitely be some benefit to having replicasets, and this is what I want to explore/understand in depth.
Both essentially serve the same purpose. Deployments are a higher abstraction and, as the name suggests, deal with creating, maintaining and upgrading the deployment (the collection of pods) as a whole.
ReplicationControllers or ReplicaSets, on the other hand, have the primary responsibility of maintaining a set of identical replicas (which you can achieve declaratively using deployments too, but internally a deployment creates a replicaset to enable this).
More specifically, when you perform a "rolling" update of your deployment, such as updating the image version, the deployment internally creates a new replicaset and performs the rollout. During the rollout you can see two replicasets for the same deployment.
So in other words, the Deployment needs the lower-level "encapsulation" of ReplicaSets to achieve this.
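To make the relationship concrete, here is a minimal Deployment sketch (the name and image are arbitrary). Applying it creates a ReplicaSet named roughly nginx-deployment-<pod-template-hash>, and changing the image later creates a second ReplicaSet that is scaled up while the old one is scaled down:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25      # changing this version triggers a rollout through a new ReplicaSet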

Persistent Kafka transactional-id across restarts on Kubernetes

I am trying to achieve exactly-once delivery on Kafka using Spring-Kafka on Kubernetes.
As far as I understood, the transactional-id must be set on the producer and it should be the same across restarts, as stated here https://stackoverflow.com/a/52304789/3467733.
The problem arises when using this semantic on Kubernetes: how can you get a consistent ID?
To solve this problem I implemented a Spring Boot application, let's call it the "Replicas counter", that checks, through the Kubernetes API, how many pods there are with the same name as the caller, so that I have a counter for every pod replica.
For example, suppose I want to deploy a Pod, let's call it APP-1.
This app does the following:
It performs a GET to the Replicas-Counter, passing the pod name as a parameter.
The Replicas-Counter calls the Kubernetes API in order to check how many pods there are with that pod name, adds 1, and returns the result to the caller. I also need to count not-ready pods (think about a first deploy: the pods couldn't get an ID if I wasn't counting not-ready pods).
APP-1 gets the ID and will use it as the transactional-id.
But, as you can see, a problem could arise when performing rolling updates, for example:
Suppose we have 3 pods:
At the beginning we have:
app-1: transactional-id-1
app-2: transactional-id-2
app-3: transactional-id-3
So, during a rolling update we would have:
old-app-1: transactional-id-1
old-app-2: transactional-id-2
old-app-3: transactional-id-3
new-app-3: transactional-id-4 (Not ready, waiting to be ready)
New-app-3 becomes ready, so Kubernetes brings down old-app-3, and the rolling update continues:
old-app-1: transactional-id-1
old-app-2: transactional-id-2
new-app-3: transactional-id-4
new-app-2: transactional-id-4 (Not ready, waiting to be ready)
As you can see, I now have two pods with the same transactional-id.
As far as I understood, these IDs have to be the same across restarts and unique.
How can I implement something that gives me consistent IDs? Has anyone dealt with this problem?
The problem with these IDs only concerns Kubernetes Deployments, not StatefulSets, as the latter have a stable identifier as their name. I don't want to convert all Deployments to StatefulSets to solve this problem, as I don't think that is the correct way to handle this scenario.
The only way to guarantee the uniqueness of pods is to use a StatefulSet.
A StatefulSet will keep the desired number of replicas alive, but every time a pod dies it is replaced by a pod with the same hostname and configuration. That prevents the data loss you are worried about.
The Service for a StatefulSet must be headless: since each pod is unique, you need certain traffic to be able to reach certain pods.
Every pod requires a PVC (in order to store its data and be recreated from that data whenever the pod is deleted).
Here is a great article describing why a StatefulSet should be used in a similar case.
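As a rough sketch of that setup (the names, the image and the environment-variable wiring are assumptions, not an exact recipe), a headless Service plus a StatefulSet could look like this; the stable pod name (kafka-producer-0, kafka-producer-1, ...) survives restarts, can be exposed to the application via the Downward API, and can serve as the transactional-id prefix:

apiVersion: v1
kind: Service
metadata:
  name: kafka-producer
spec:
  clusterIP: None              # headless Service, required by the StatefulSet
  selector:
    app: kafka-producer
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka-producer
spec:
  serviceName: kafka-producer
  replicas: 3
  selector:
    matchLabels:
      app: kafka-producer
  template:
    metadata:
      labels:
        app: kafka-producer
    spec:
      containers:
      - name: app
        image: my-producer:1.0.0       # placeholder image
        env:
        - name: POD_NAME               # e.g. kafka-producer-0, stable across restarts
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        # the application can then derive its transactional-id from POD_NAME,
        # e.g. via spring.kafka.producer.transaction-id-prefix

Because the pod name is stable, the transactional-id stays the same across restarts without any external counter service.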

How to permanently delete pods inside the "default" namespace, when deleted pods come back because of a "replication controller"

"How to permanently delete pods inside "default" namespace? As when I delete pods, they are coming back because of "replication controller".
As this is in a Default namespace, I am sure that we can delete it permanently. Any idea how to do it ?
I'd like to add an update to what was already said in the previous answer.
Basically, in Kubernetes you have several abstraction layers. As you can read in the documentation:
A Pod is the basic execution unit of a Kubernetes application – the smallest and simplest unit in the Kubernetes object model that you create or deploy. A Pod represents processes running on your cluster.
It is rarely deployed as a separate entity. In most cases it is part of a higher-level object such as a Deployment or ReplicationController. I would advise you to familiarize yourself with the general concept of controllers, especially Deployments, as they are currently the recommended way of setting up replication [source]:
Note: A Deployment that configures a ReplicaSet is now the recommended
way to set up replication.
As you can read further:
A ReplicationController ensures that a specified number of pod replicas are running at any one time. In other words, a ReplicationController makes sure that a pod or a homogeneous set of pods is always up and available.
This also applies to the situation where certain pods are deleted by the user. The replication controller doesn't care why the pods were deleted; its role is just to make sure they are always up and running. It's a very simple concept: when you don't want certain pods to exist any more, you must get rid of the higher-level object that manages them and keeps them available.
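To see which higher-level object owns a given pod, you can inspect its metadata (for example with kubectl get pod <pod-name> -o yaml); the owner shows up under ownerReferences, roughly like this (the names here are only illustrative):

metadata:
  name: my-app-x7k2p
  namespace: default
  ownerReferences:
  - apiVersion: v1
    kind: ReplicationController   # this is the object you actually have to delete
    name: my-app
    controller: true

Deleting that ReplicationController (or Deployment/ReplicaSet, depending on what you find there) removes the pods permanently instead of having them recreated.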
Read about Replication Controllers, then delete the ReplicationController.
It can't "ensure that a specified number of pod replicas are running" when it's dead.
kubectl delete replicationcontroller <name>

Is There a Way To Control a DaemonSet's Rolling Update Order In Kubernetes?

I have three DaemonSet pods, each containing a Hadoop ResourceManager container. One of the three is the active node, and the other two are standby nodes.
So there are two questions:
Is there a way to let Kubernetes know whether the Hadoop ResourceManager inside a pod is an active node or a standby node?
I want to control the rolling update so that the standby nodes are updated first and the active node last, to reduce the number of active-node switchovers, which carry risk.
Consider the following: Deployments, DaemonSets and ReplicaSets are abstractions meant to manage a uniform group of objects.
In your specific case, although you're running the same application, you can't say it's a uniform group of objects, because you have two types: active and standby objects.
There is no way to tell Kubernetes which is which if they're grouped into what is supposed to be a uniform set of objects.
As suggested by #wolmi, having them in a Deployment instead of a DaemonSet still leaves you with the issue that, because of the aforementioned logic, deployment strategies can't identify individual objects to control when they're updated.
My suggestion would be that, in addition to using a Deployment with node affinity to ensure a highly available environment, you separate active and standby objects into different Deployments/Services and base your rolling update strategy on that scenario.
This will ensure that you're updating the standby nodes first, removing the risk of updating the active nodes before the other.
I think this is not the best way to do that. I totally understand that you use a DaemonSet to be sure Hadoop exists in an HA environment with one pod per node, but you can achieve the same scenario using a Deployment and affinity parameters, more concretely pod anti-affinity; then you can be sure only one Hadoop pod exists per Kubernetes node.
With that new approach, you can use a replication controller to control the rolling update. Some resources from the documentation:
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
https://kubernetes.io/docs/tasks/run-application/rolling-update-replication-controller/
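A rough sketch of that anti-affinity idea (all names, labels and the image are placeholders) could look like this; the required podAntiAffinity rule on kubernetes.io/hostname keeps the scheduler from placing two of these pods on the same node:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hadoop-resourcemanager
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hadoop-rm
  template:
    metadata:
      labels:
        app: hadoop-rm
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: hadoop-rm
            topologyKey: kubernetes.io/hostname   # at most one of these pods per node
      containers:
      - name: resourcemanager
        image: hadoop-rm:3.3.6                    # placeholder image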

Force Kubernetes Pod shutdown before starting a new one in case of disruption

I'm trying to set up a stateful Apache Flink application in Kubernetes and I need to save the current state in case of a disruption, such as someone deleting the pod or it being rescheduled due to cluster resizing.
I added a preStop hook to the container that accomplishes this behaviour, but when I delete a pod using kubectl delete pod it spins up a new Pod before the old one terminates.
Guides such as this one use the Recreate update strategy to make sure only one pod runs at a time. This works fine in case of updating a deployment, but it does not cover disruptions like I described above. I also tried to set spec.strategy.rollingUpdate.maxSurge to 0 but that made no difference.
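For reference, the relevant part of the Deployment I'm using looks roughly like this (names abbreviated, the preStop command is a placeholder for my state-saving script):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-app
spec:
  replicas: 1
  strategy:
    type: Recreate             # old pod is killed before the new one starts, but only on Deployment updates
  selector:
    matchLabels:
      app: flink-app
  template:
    metadata:
      labels:
        app: flink-app
    spec:
      containers:
      - name: flink
        image: flink-app:1.0.0
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "/opt/flink/save-state.sh"]   # placeholder state-saving hook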
Is it possible to configure my Deployment in such a way that no pod ever starts before another one is terminated, or do I need to switch to StatefulSets?
I agree with #Cosmic Ossifrage that StatefulSets make it easy to achieve your goal. Each pod in a StatefulSet has a unique, persistent identity and a stable hostname that Kubernetes maintains regardless of where it is scheduled.
Pods in a StatefulSet are therefore deployed in sequential order and terminated in reverse ordinal order, and the StatefulSet controller acts on one pod at a time, waiting for the previous one to be completely deleted before proceeding.