How to invalidate a kazoo lease on ZooKeeper when the client holding the lease disconnects or crashes

I am following the kazoo lease recipe. A client creates the znode "/db_leases/hourly_cleanup" and acquires the lease. When that client disconnects from ZooKeeper or crashes, I want another client that is trying to acquire the same lease to be able to acquire it. Since the znode is not removed, the other client fails to acquire the lease even though no other client is holding it. How can I make sure the znode "/db_leases/hourly_cleanup" is removed when the client holding the lease exits or crashes?

This is by design: kazoo's lease recipe stores the lease in a regular (non-ephemeral) znode, so the lease is not released when the holder's session ends. The other client will not be able to acquire the lease until it expires, i.e. once the time specified by the duration parameter has elapsed.
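If you need the lease released as soon as the holder's session ends, rather than after a fixed duration, the usual ZooKeeper approach is an ephemeral znode, which the server deletes automatically when the session expires. In kazoo that is the Lock recipe rather than the lease recipe; as a hedged sketch, the same idea with Curator's InterProcessMutex (Java, to match the other snippets on this page; the connect string and lock path are placeholders):

import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class EphemeralLockSketch {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3)); // placeholder connect string
        client.start();

        // InterProcessMutex is built on ephemeral sequential znodes, so if the
        // holder crashes, ZooKeeper removes the lock node when the session
        // expires and a waiting client can acquire the lock.
        InterProcessMutex lock = new InterProcessMutex(client, "/db_leases/hourly_cleanup_lock"); // placeholder path
        if (lock.acquire(30, TimeUnit.SECONDS)) {
            try {
                // ... perform the hourly cleanup ...
            } finally {
                lock.release();
            }
        }
        client.close();
    }
}

The trade-off is the one the lease recipe deliberately avoids: with an ephemeral node, a transient network blip that expires the session also releases the lock, whereas a duration-based lease survives short disconnects.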

Related

When does the Curator library retry?

I was under the impression that the Curator library would retry all ZooKeeper operations, even if the session is lost. I was simulating a case where I had created a node and then set some data on that node. Then, while retrieving the data, I killed the session. I see that Curator is able to reconnect, but I thought it would also retry and get the data, which was not the case. Is there any documentation on when exactly, and for which operations, Curator does a retry?
Code that watches the node:
// Async DSL: checkExists() sets a watch on the node; event() yields a stage
// that completes when the watch fires, and get(...) blocks up to the job timeout.
getAsyncCuratorFramework(curatorFramework)
        .watched()
        .checkExists()
        .forPath(fullNodePath)
        .event()
        .toCompletableFuture()
        .get(jobTimeoutDO.getDuration(), jobTimeoutDO.getTimeUnit());
Now I am simulating a test where I am watching an ephemeral node for the node-delete event, and in between I schedule the following call:
KillSession.kill
Since the session was killed, the node is removed, and Curator tries to establish the connection again. All of this works fine and as expected. But I also thought that Curator would retry and watch the node again; of course, if the node did not exist it might throw an exception, but I do create the node again.
Just wanted to confirm: in the above scenario, will Curator not retry? BTW, it throws the following exception:
AsyncEventException
But I also thought that the curator will retry and watch the node again
That's not how retries work. Retries in Curator retry individual ZooKeeper operations; they are not a high-level feature and will not reset watches for you. What you are looking for is one of Curator's high-level recipes that manages a ZNode. Have a look at PersistentNode or NodeCache.
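For instance, here is a minimal sketch with PersistentNode (connect string and path are placeholders): the recipe re-creates the znode after a session kill, which is the re-create/re-watch behavior the retry policy alone does not provide.

import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.nodes.PersistentNode;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class PersistentNodeSketch {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3)); // placeholder
        client.start();

        // PersistentNode keeps this ephemeral znode alive: if the session is
        // killed and the server deletes the node, the recipe re-creates it as
        // soon as the client reconnects.
        PersistentNode node = new PersistentNode(
                client, CreateMode.EPHEMERAL, false, "/my/watched/node", new byte[0]); // placeholder path
        node.start();
        node.waitForInitialCreate(10, TimeUnit.SECONDS);

        // ... the application runs; the node survives session kills ...

        node.close();   // deletes the znode
        client.close();
    }
}

NodeCache (superseded by CuratorCache in Curator 5.x) plays the complementary role on the observer side: it keeps a local copy of the node and re-sets its watches across reconnects.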

Proper way to elect a leader from a set of candidate pods in Kubernetes

To perform the leader election, Kubernetes documentation suggests deploying a sidecar in the set of candidate pods.
https://kubernetes.io/blog/2016/01/simple-leader-election-with-kubernetes/
The sidecar elects a leader with the following steps (a sketch of the resulting acquire/renew loop follows the list):
1. If a valid leader does not exist, every sidecar tries to update a Kubernetes Endpoints object atomically. Only one of the sidecars can successfully update it.
2. The sidecar that updates the endpoint assumes leadership for a specified time duration.
3. The current leader updates the endpoint again before the duration elapses in order to retain the leadership.
4. Other sidecars will not try to update the endpoint while a valid leader exists.
5. If the current leader does not update the endpoint within the time duration, the other sidecars consider the leadership revoked and all go back to step 1.
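As a hedged illustration of those steps (not the sidecar's actual code), the loop below sketches the acquire/renew cycle in Java; readLease and tryCompareAndSwap are hypothetical helpers standing in for reading the Endpoints annotation and updating it with a resourceVersion precondition:

import java.time.Duration;
import java.time.Instant;

public class LeaderLoopSketch {
    // Hypothetical lease record stored as an annotation on the Endpoints object.
    record Lease(String holder, Instant renewedAt) {}

    static final Duration LEASE_DURATION = Duration.ofSeconds(15);
    static final String MY_ID = System.getenv("HOSTNAME"); // pod name

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            Lease current = readLease(); // hypothetical: GET the annotation
            boolean expired = current == null
                    || current.renewedAt().plus(LEASE_DURATION).isBefore(Instant.now());
            boolean iAmLeader = current != null && MY_ID.equals(current.holder());
            if (expired || iAmLeader) {
                // Steps 1-3: take over an expired lease, or renew our own.
                // The write must be a compare-and-swap (resourceVersion check)
                // so that only one sidecar's update succeeds.
                tryCompareAndSwap(current, new Lease(MY_ID, Instant.now())); // hypothetical CAS
            }
            // Steps 4-5: everyone re-checks well before the lease can expire.
            Thread.sleep(LEASE_DURATION.dividedBy(3).toMillis());
        }
    }

    static Lease readLease() { return null; }                       // hypothetical
    static boolean tryCompareAndSwap(Lease expected, Lease next) {  // hypothetical
        return true;
    }
}

Note how the sketch reproduces the flaw discussed below: between the moment another sidecar takes over an expired lease and the moment the old leader next reads it, both can believe they are the leader.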
There are a few issues with this method.
It is possible to run 2 leaders simultaneously for a brief period of time.
Example:
If the current leader hangs and cannot update the endpoint in time, one of the other sidecars acquires the leadership. But it takes some time for the previous leader to realize that its leadership has been revoked. For this brief period, the two leaders can corrupt a shared resource or data.
This is also mentioned in the sidecar's source code:
This implementation does not guarantee that only one client is acting as a leader (a.k.a. fencing).
Moreover, the source code of this sidecar is retired/archived, so it is not in active development.
https://github.com/kubernetes-retired/contrib/tree/master/election
So, what is the proper method to elect a leader with Kubernetes?

Is there a way to delete an ephemeral node only after a client has been disconnected for some time?

Our cluster nodes take actions on the deletion of some ephemeral nodes, but we're having network issues at a customer site that lead to the deletion of the ephemeral nodes of some clients, although those clients are still up and running.
I agree that the network issues should be solved but it doesn't look like we can do that at the moment.
So, is there a way to configure ZooKeeper to delete the ephemeral node of a disconnected client only if it stays disconnected for X amount of time?
We use Apache Curator as a Zookeeper client.
Our Zookeeper version is 3.4.6.
You can tune ZooKeeper's session timeout to get this behavior: the server deletes a session's ephemeral nodes only after it has not received any heartbeat from the client for the session timeout duration, so a longer timeout keeps ephemeral nodes alive across longer disconnects. Note that the timeout the client requests is negotiated with the server, which by default bounds it between 2x and 20x the tickTime.
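Since you are on Curator, a minimal sketch (connect string and values are placeholders):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class SessionTimeoutSketch {
    public static void main(String[] args) {
        // Request a 60s session timeout; the server will negotiate it down if
        // it exceeds maxSessionTimeout (20 * tickTime unless configured otherwise).
        CuratorFramework client = CuratorFrameworkFactory.builder()
                .connectString("zk1:2181,zk2:2181,zk3:2181") // placeholder
                .sessionTimeoutMs(60_000)
                .connectionTimeoutMs(15_000)
                .retryPolicy(new ExponentialBackoffRetry(1000, 3))
                .build();
        client.start();
        // Ephemeral nodes created through this client now survive up to ~60s
        // of silence before the server deletes them.
    }
}

To raise the ceiling itself, set maxSessionTimeout (and/or tickTime) in the server's zoo.cfg.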

Are Zookeeper ephemeral nodes written to disk?

Are Zookeeper ephemeral nodes written to disk?
I know normal Zookeeper nodes are written to disk before Zookeeper acks the write to the client.
However, ephemeral nodes only last for the duration of the client session, so if the ZooKeeper servers have all crashed, then by definition the client session is broken. There would then be no need to write to disk, because the ephemeral nodes would not be recreated when the ensemble restarts. So theoretically it seems like ephemeral nodes only need to be stored in memory.
Is this how it's implemented?
I ran into this question myself, and noticed that it had been answered on the Zookeeper mailing list, and I'm posting it here for anyone who finds this question.
In short, yes, ephemeral nodes are indeed written to disk. As a result, a client session can persist even if the entire ZooKeeper ensemble is down. To quote Patrick Hunt's answer from the mailing list:
Ephemeral znodes are treated just like persistent znodes in the sense that a quorum of nodes need to agree to any change. As such the znode is written to the transaction log.

A client session ends either when a client closes its session explicitly or the ZK quorum leader decides that the session has expired (which is based on the negotiated session timeout). Only while a leader is active can a session be expired (or closed for that matter). When you shutdown an ensemble the sessions are maintained. If you were to, for example, shut down an ensemble for an hour and then restart it the sessions would still be active. The clock would "reset" when the new leader was elected. If the client session is still active the session would continue, any ephemeral znodes would still exist.

Zookeeper client node did not reconnect after session expired

I intercepted the connection between the ZooKeeper server and the client node using a custom TCP monitor (similar to TCPMon). I stopped the TCPMon and restarted it. When I restarted the TCPMon after the session expiration, the client node was notified ("session has expired") but did not reconnect. How can I fix this?
Once you get a session-expired event, you need to close the ZooKeeper handle and re-create it.
From the ZooKeeper Programmer's Guide: "once a ZooKeeper object is closed or receives a fatal event (SESSION_EXPIRED and AUTH_FAILED), the ZooKeeper object becomes invalid".
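A minimal sketch of that pattern with the raw ZooKeeper client (connect string and timeout are placeholders):

import java.io.IOException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ReconnectingClient implements Watcher {
    private static final String CONNECT = "localhost:2181"; // placeholder
    private volatile ZooKeeper zk;

    public ReconnectingClient() throws IOException {
        connect();
    }

    private void connect() throws IOException {
        // Each new ZooKeeper instance starts a brand-new session.
        zk = new ZooKeeper(CONNECT, 30_000, this);
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getState() == Event.KeeperState.Expired) {
            // The expired handle is permanently invalid: close it and build a new one.
            try {
                zk.close();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            try {
                connect();
                // Watches and ephemeral nodes from the old session are gone
                // and must be re-established by the application here.
            } catch (IOException e) {
                // In real code: retry with backoff.
            }
        }
    }
}

Note that Curator does exactly this for you: on SESSION_EXPIRED it discards the old handle and creates a new one, which is one reason to prefer it over the raw client.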