I have a stateful service in Service Fabric with multiple partitions and replicas. I have configured it so that both the Primary and ActiveSecondary replicas expose their endpoints. The purpose is so that I can leverage the secondary replicas for read operations.
The problem I'm having is that, inside the service, I want to be able to tell whether it is a Primary or an ActiveSecondary, because some shared defaulting code must not run on the secondary replicas (inserting defaults into the ReliableStateManager throws on secondaries).
Can I determine the Replica Role at runtime?
You can override OnChangeRoleAsync and check the ReplicaRole parameter. Note that:
The role can change during the lifetime of a service (e.g. secondary promoted to primary)
RunAsync is only executed on primary replicas (and will be cancelled if the role changes), so you can safely place your initialization code there
For more advanced scenarios, you can also check the Partition's ReadStatus and WriteStatus
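As a sketch, the override could cache the role for the shared code to consult. The class and member names below are illustrative, not a prescribed pattern:

```csharp
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class MyStatefulService : StatefulService
{
    // Illustrative: cache the current role so shared code can skip
    // write paths (e.g. inserting defaults) on secondary replicas.
    private volatile ReplicaRole _currentRole = ReplicaRole.Unknown;

    public MyStatefulService(StatefulServiceContext context) : base(context) { }

    protected override Task OnChangeRoleAsync(ReplicaRole newRole, CancellationToken cancellationToken)
    {
        _currentRole = newRole;  // Primary, ActiveSecondary, IdleSecondary, ...
        return base.OnChangeRoleAsync(newRole, cancellationToken);
    }

    private bool IsPrimary => _currentRole == ReplicaRole.Primary;
}
```

Remember that the role can change at any time, so code that reads the cached value should tolerate it becoming stale between the check and the use.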
Related
In Kubernetes, I have a statefulset with a number of replicas.
I've set the updateStrategy to RollingUpdate.
I've set podManagementPolicy to Parallel.
My statefulset instances do not have a persistent volume claim -- I use the statefulset as a way to allocate ordinals 0..(N-1) to pods in a deterministic manner.
The main reason for this is to keep availability for new requests while rolling out software updates (freshly built containers), while still allowing each container, and other services in the cluster, to "know" its ordinal.
The behavior I want, when doing a rolling update, is for the previous statefulset pods to linger while there are still long-running requests processing on them, but I want new traffic to go to the new pods in the statefulset (mapped by the ordinal) without a temporary outage.
Unfortunately, I don't see a way of doing this -- what am I missing?
Because I don't use volume claims, you might think I could use deployments instead, but I really do need each of the pods to have a deterministic ordinal that:
is unique at the point of dispatching new service requests (incoming HTTP requests, including public ingresses)
is discoverable by the pod itself
is persistent for the duration of the pod lifetime
is contiguous from 0 .. (N-1)
The second-best option I can think of is using something like zookeeper or etcd to separately manage this property, using some of the traditional long-poll or leader-election mechanisms, but given that kubernetes already knows (or can know) about all the necessary bits, AND kubernetes service mapping knows how to steer incoming requests from old instances to new instances, that seems more redundant and complicated than necessary, so I'd like to avoid that.
I assume that you need this for a stateful workload, i.e. a workload that requires writes. Otherwise you can use Deployments with multiple pods online for your shards. A key feature of StatefulSets is that they provide unique, stable network identities for the instances.
The behavior I want, when doing a rolling update, is for the previous statefulset pods to linger while there are still long-running requests processing on them, but I want new traffic to go to the new pods in the statefulset.
This behavior is supported by Kubernetes pods. But you also need to implement support for it in your application.
New traffic will not be sent to your "old" pods.
A SIGTERM signal will be sent to the pod; your application may want to listen for this and take some action (e.g. finish in-flight requests).
After a configurable "termination grace period", your pod will get killed.
See Kubernetes best practices: terminating with grace for more info about pod termination.
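If your long-running requests need more than the default 30-second grace period, you can raise it in the pod spec. A minimal sketch (all names here are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app
spec:
  serviceName: my-app
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 300  # default is 30
      containers:
        - name: app
          image: my-app:latest
```

The application still has to honor SIGTERM: stop accepting new work and finish draining before the grace period elapses, or it will be killed.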
Be aware that you should connect to services instead of directly to pods for this to work. E.g. you need to create headless services for the replicas in a StatefulSet.
If your clients are connecting to a specific headless service, e.g. N, this means that it will not be available for some time during upgrades. You need to decide whether your clients should retry their connections during this time period or connect to another headless service if N is not available.
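For reference, a headless service is an ordinary Service with clusterIP: None; it gives each StatefulSet pod a stable DNS name like my-app-0.my-app.default.svc. A minimal sketch (names are made up):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  clusterIP: None   # headless: DNS resolves to individual pod IPs
  selector:
    app: my-app
  ports:
    - port: 80
```

This is the service you would reference in the StatefulSet's serviceName field.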
If you are in a case where you need:
stateful workload (e.g. support for write operations)
want high availability for your instances
then you need a form of distributed system that does some form of replication/synchronization, e.g. using Raft, or a product that implements this. Such a system is most easily deployed as a StatefulSet.
You may be able to do this using Container Lifecycle Hooks, specifically the preStop hook.
We use this to drain connections from our Varnish service before it terminates.
However, you would need to implement (or find) a script to do the draining.
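As a sketch, a preStop drain hook might look like the following; the image and script path are hypothetical placeholders for your own drain logic:

```yaml
spec:
  containers:
    - name: varnish
      image: varnish:stable
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "/usr/local/bin/drain.sh"]
```

Kubernetes runs the preStop hook before sending SIGTERM, and the termination grace period covers the hook plus the shutdown, so size the grace period accordingly.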
I want to deploy my (stateful) application in Kubernetes with 3 replicas just for high availability. Therefore only one instance of my application should get all the requests; the other replicas are just for HA (in case the master goes down).
I know things like Redis or MySQL can be deployed but they themselves provide the master-slave architecture. https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/
How can this be achieved in kubernetes for any other simple application?
You need to put the failover logic somewhere, either on the server side or the client side. Either the client talks to instance 1 by default and, if it is not up, fails over to instance 2, and so on; or you have an agent/proxy on the server side which does this routing for you. In that case, the proxy contains the logic for checking whether instance 1 is up or down, etc.
Usually, for stateful applications, failover is not as simple as connecting to the other instance when the primary is down. It might involve reconciling state, making sure the other replica has an up-to-date state or has a quorum, etc., depending on the application. So there is no "one size fits all" solution for all stateful applications. You need to build and choose the model appropriate for your application.
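To illustrate the client-side variant, here is a minimal sketch in Python; the endpoint names and the probe are hypothetical placeholders for a real health check (e.g. a TCP connect or an HTTP /healthz request):

```python
def first_healthy(endpoints, probe):
    """Return the first endpoint for which probe() succeeds, else None."""
    for ep in endpoints:
        try:
            if probe(ep):
                return ep
        except Exception:
            continue  # treat errors as "instance down" and fail over
    return None

# Example with a fake probe: instance-1 is down, instance-2 is up.
status = {"instance-1": False, "instance-2": True, "instance-3": True}
active = first_healthy(["instance-1", "instance-2", "instance-3"],
                       lambda ep: status[ep])
print(active)  # instance-2
```

Real clients also need retries and re-probing, since the healthy instance can change at any moment; this only shows where the ordering logic lives.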
I have been learning Kubernetes for a few weeks and now I am trying to figure out the right way to connect a web server to a statefulset correctly.
Let's say I deployed a master-slave Postgres StatefulSet and now I want to connect my web server to it. With a ClusterIP service, requests would be load balanced across the master and the slaves for both reads (SELECT) and writes (UPDATE, INSERT, DELETE), right? But I can't do that, because write requests must be handled by the master. On the other hand, if I point my web server at the master using the headless service (which gives a DNS entry for each pod), I get no load balancing at all and every request is handled by one instance: the master. So how am I supposed to connect them correctly, so that reads are load balanced across all replicas while writes are forwarded to the master?
Should I use two endpoints in the web server, one configured for writes and one for reads?
Or maybe I am using headless services and statefulsets the wrong way since I am new to Kubernetes?
Well, your thinking is correct - the master should be read-write and replicas should be read only. How to configure it properly? There are different possible approaches.
The first approach is the one you are thinking about: set up two services - one for accessing the primary instance, the other for accessing the replica instances. A good example is Kubegres:
In this example, Kubegres created 2 Kubernetes Headless services (of default type ClusterIP) using the name defined in YAML (e.g. "mypostgres"):
a Kubernetes service "mypostgres" allowing to access to the Primary PostgreSql instances
a Kubernetes service "mypostgres-replica" allowing to access to the Replica PostgreSql instances
Then you will have two endpoints:
Consequently, a client app running inside a Kubernetes cluster, would use the hostname "mypostgres" to connect to the Primary PostgreSql for read and write requests, and optionally it can also use the hostname "mypostgres-replica" to connect to any of the available Replica PostgreSql for read requests.
Check this starting guide for more details.
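To illustrate, a client can route statements between the two hostnames from the example above. This Python sketch keys off the first SQL keyword; the verb list is a simplification (real routing is usually done by the driver, an ORM, or a proxy):

```python
# Hostnames taken from the Kubegres example: "mypostgres" (primary),
# "mypostgres-replica" (read replicas).
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP"}

def host_for(query: str) -> str:
    """Pick the primary service for writes, the replica service for reads."""
    verb = query.lstrip().split(None, 1)[0].upper()
    return "mypostgres" if verb in WRITE_VERBS else "mypostgres-replica"

print(host_for("SELECT * FROM users"))          # mypostgres-replica
print(host_for("INSERT INTO users VALUES (1)")) # mypostgres
```

In practice the web server would simply hold two connection pools, one per hostname, and the application code would choose the pool explicitly rather than parsing SQL.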
It's worth noting that many database solutions use this approach - another example is MySQL. Here is a good article in the Kubernetes documentation about setting up MySQL using a StatefulSet.
Another approach is to use some middleware component which will act as a gatekeeper to the cluster, for example Pg-Pool:
Pg pool is a middleware component that sits in front of the Postgres servers and acts as a gatekeeper to the cluster.
It mainly serves two purposes: Load balancing & Limiting the requests.
Load Balancing: Pg pool takes connection requests and queries. It analyzes the query to decide where the query should be sent.
Read-only queries can be handled by read replicas. Write operations can only be handled by the primary server. In this way, it load balances the cluster.
Limits the requests: Like any other system, Postgres has a limit on no. of concurrent connections it can handle gracefully.
Pg-pool limits the no. of connections it takes up and queues up the remaining. Thus, gracefully handling the overload.
Then you will have one endpoint for all operations - the Pg-Pool service. Check this article for more details, including the whole setup process.
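For a rough idea of what that side looks like, here is a hypothetical pgpool.conf fragment; the hostnames reuse the service names from the first approach, and all values are assumptions to adapt:

```
# Two backends: the primary and the read-replica service.
backend_hostname0 = 'mypostgres'          # primary (read-write)
backend_port0 = 5432
backend_weight0 = 1
backend_hostname1 = 'mypostgres-replica'  # replicas (read-only)
backend_port1 = 5432
backend_weight1 = 1

load_balance_mode = on     # send read-only queries to replicas
num_init_children = 32     # cap on concurrent client connections
```

Clients then connect only to the Pg-Pool service, which dispatches each query to the appropriate backend.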
I need to run my application with "at most once" semantics. It is absolutely crucial that only one instance of my app is running at any given time, or none at all.
At first I was using the "Deployment" resource type with a single replica, but then I realized that during a network partition we might inadvertently be running more than one instance.
I stumbled upon StatefulSets while searching for at-most-once semantics in Kubernetes. On reading further, the examples dealt with cases where the containers needed a persistent volume, and typically these containers were running with more than one replica. My application does not even use any volumes.
I also read about tolerations that kill the pod if the node is unreachable. Given that tolerations can handle the pod-unreachable case, is a StatefulSet overkill for my use case?
I am justifying the use of a StatefulSet because, even in the window before the unreachable node's toleration seconds are reached and its kubelet realizes it is cut off from the network and kills the processes, Kubernetes can spin up another instance - and I believe a StatefulSet prevents this corner case too.
Am I right? Is there any other approach to achieve this?
To quote a Kubernetes doc:
...StatefulSets maintain a sticky, stable identity for their Pods... Guaranteeing an identity for each Pod helps avoid split-brain side effects in the case when a node becomes unreachable (network partition).
As described in the same doc, StatefulSet Pods on a Node are marked as "Unknown" and aren't rescheduled unless forcefully deleted when a node becomes unreachable. Something to consider for proper recovery, if going this route.
So, yes - StatefulSet may be more suitable for the given use case than Deployment.
In my opinion, it won't be an overkill to use StatefulSet - choose the Kubernetes object that works best for your use case.
StatefulSets are not the solution for at-most-once semantics - they are typically used for deploying stateful applications like databases, which use the persistent identity of their pods to cluster among themselves.
We have faced similar issues to what you mentioned - we had implicitly assumed that an old pod would be fully deleted before the new instance was brought up.
One option is to use a combination of preStop hooks and init containers:
A preStop hook does the necessary cleanup (say, deleting an app-specific etcd key).
An init container waits until the etcd key disappears (with an upper bound).
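A sketch of that handshake as a pod spec; the etcd key, image, and endpoints are all assumptions you would replace with your own:

```yaml
spec:
  initContainers:
    - name: wait-for-old-instance
      image: bitnami/etcd:latest
      # Block startup until the previous instance has released the
      # (hypothetical) key /locks/my-app; add a timeout in real use.
      command: ["/bin/sh", "-c",
                "until [ -z \"$(etcdctl get --print-value-only /locks/my-app)\" ]; do sleep 2; done"]
  containers:
    - name: app
      image: my-app:latest
      lifecycle:
        preStop:
          exec:
            # Release the key so the replacement pod can proceed.
            command: ["/bin/sh", "-c", "etcdctl del /locks/my-app"]
```

Note the caveat from the answer above: if the node dies before preStop runs, the key is never deleted, which is why the wait needs an upper bound (or a lease/TTL on the key).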
References:
https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods
https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
One alternative is to try anti-affinity settings, but I am not very sure about this one.
It is absolutely crucial that only one instance of my app is running at any given time.
Use a leader election pattern to guarantee at most one active replica. If you use more than one replica together with leader election, the other replicas are standbys, which covers the network-partition situations. This is how the components in the Kubernetes control plane solve this problem when only one active instance is needed.
Leader election algorithms in Kubernetes usually work by taking a lock (e.g. in etcd) with a timeout. Only the instance that holds the lock is active. When the lock times out, the leader election algorithm either extends the lock timeout or elects a new leader. The details depend on the implementation, but there is a guarantee that there is at most one leader - the active instance.
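The lock-with-timeout idea can be sketched in a few lines of Python. This is an in-process illustration only; a real system stores the lease in etcd or a Kubernetes Lease object so all replicas see it:

```python
import time

class Lease:
    """Minimal lease-based lock with a timeout: at most one live holder."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, candidate, now=None):
        """Acquire or renew the lease; fails if someone else holds it."""
        now = time.monotonic() if now is None else now
        if self.holder is None or now >= self.expires_at or self.holder == candidate:
            self.holder = candidate          # become (or stay) the leader
            self.expires_at = now + self.ttl # extend the timeout
            return True
        return False  # another replica holds a live lease

lease = Lease(ttl_seconds=10)
print(lease.try_acquire("replica-a", now=0))   # True: becomes leader
print(lease.try_acquire("replica-b", now=5))   # False: lease still live
print(lease.try_acquire("replica-b", now=15))  # True: lease expired
```

The active replica must keep renewing before the TTL expires; if it is partitioned away, it loses the lease and a standby takes over, so there is never more than one live holder.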
See e.g. Simple Leader Election with Kubernetes, which also describes how to solve this with a sidecar container.
If your application is stateless, you should use Deployment and not StatefulSet. It can appear that StatefulSet is a way to solve at most one instance in a situation with a network partition, but that is mostly in case of a stateful replicated application like e.g. a cache or database cluster even though it may solve your specific situation as well.
My Kubernetes cluster has 3 pods for Postgres. I have configured a persistent volume outside of the cluster, on a separate virtual machine. Now, as per the Kubernetes design, multiple pods will be responding to clients' read/write requests. Are there any deadlocks or multiple-write issues that can occur between multiple Postgres pods?
You would need a leader election system between them. There can be only one active primary in a Postgres cluster at a time (give or take some very niche cases). I would recommend https://github.com/zalando-incubator/postgres-operator instead.
I agree with the previous answer. In the case you've described, it is better to use a Postgres cluster where only one instance acts as primary and the others act as secondaries. When the primary fails, one of the secondaries becomes the new primary, and when the failed primary comes back it is added as a secondary of the new primary. Leader election is responsible for promoting a secondary to primary. That's how the cluster is managed.
Besides the previous option, you can use KubeDB for Kubernetes.