Is it possible to avoid nested RetryLoop.callWithRetry calls so I get a consistent timeout? - apache-zookeeper

I've configured a reasonable timeout using BoundedExponentialBackoffRetry, and generally it works as I'd expect if ZK is down when I make a call like "create.forPath". But if ZK is unavailable when I call acquire on an InterProcessReadWriteLock, it takes far longer before it finally times out.
I call acquire, which is wrapped in "RetryLoop.callWithRetry", and that goes on to call findProtectedNodeInForeground, which is also wrapped in "RetryLoop.callWithRetry". If I've configured the BoundedExponentialBackoffRetry to retry 20 times, the inner loop retries 20 times for every one of the 20 outer retry attempts, so it retries 400 times.
We really need a consistent timeout after which we fail. Have I done anything wrong, or is there any way around this? If not, I guess I'll call the troublesome methods in a new thread that I can kill after my own timeout, as sketched below.
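For reference, this is the kind of fallback I have in mind - a minimal sketch only, assuming that interrupting the acquiring thread is acceptable (the helper class and method names are illustrative):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.apache.curator.framework.recipes.locks.InterProcessReadWriteLock;

public class AcquireWithTimeout {
    // Hypothetical helper: run acquire() on a worker thread and enforce our own
    // overall timeout, interrupting the worker if the deadline passes.
    static boolean tryAcquireReadLock(InterProcessReadWriteLock lock, long timeout, TimeUnit unit)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> pending = executor.submit(() -> {
            lock.readLock().acquire(); // may retry internally far longer than we want
            return null;
        });
        try {
            pending.get(timeout, unit); // bound the total wait ourselves
            return true;
        } catch (TimeoutException e) {
            pending.cancel(true);       // interrupt the acquiring thread
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}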
Here is the sample code to recreate it. I set breakpoints at the lines following the comments, bring ZK down, then let it continue and take the stack trace whilst it's retrying.
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessReadWriteLock;
import org.apache.curator.retry.BoundedExponentialBackoffRetry;

public class GoCurator {

    public static void main(String[] args) throws Exception {
        CuratorFramework cf = CuratorFrameworkFactory.newClient(
                "localhost:2181",
                new BoundedExponentialBackoffRetry(200, 10000, 20)
        );
        cf.start();

        String root = "/myRoot";
        if (cf.checkExists().forPath(root) == null) {
            // Stacktrace A showing what happens if ZK is down for this call
            cf.create().forPath(root);
        }

        InterProcessReadWriteLock lock = new InterProcessReadWriteLock(cf, "/grant/myLock");
        // See stacktrace B showing the nested re-try if ZK is down for this call
        lock.readLock().acquire();
        lock.readLock().release();

        System.out.println("done");
    }
}
Stacktrace A (if ZK is down when I'm calling create().forPath). This shows the single retry loop, so it exits after the correct number of attempts:
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Object.java:-1)
at java.lang.Object.wait(Object.java:502)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1499)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1487)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2617)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:242)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:231)
at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:228)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:219)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:41)
at com.gebatech.curator.GoCurator.main(GoCurator.java:25)
Stacktrace B (if ZK is down when I call InterProcessReadWriteLock#readLock#acquire). This shows the nested retry loop, so it doesn't exit until after 20*20 attempts.
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Unsafe.java:-1)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
at org.apache.curator.CuratorZookeeperClient.internalBlockUntilConnectedOrTimedOut(CuratorZookeeperClient.java:434)
at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:56)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
at org.apache.curator.framework.imps.CreateBuilderImpl.findProtectedNodeInForeground(CreateBuilderImpl.java:1239)
at org.apache.curator.framework.imps.CreateBuilderImpl.access$1700(CreateBuilderImpl.java:51)
at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1167)
at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1156)
at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1153)
at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:607)
at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:597)
at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:575)
at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:51)
at org.apache.curator.framework.recipes.locks.StandardLockInternalsDriver.createsTheLock(StandardLockInternalsDriver.java:54)
at org.apache.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:225)
at org.apache.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:237)
at org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:89)
at com.gebatech.curator.GoCurator.main(GoCurator.java:29)

This turns out to be a real, longstanding problem with how Curator uses retries. I have a fix and a PR ready here: https://github.com/apache/curator/pull/346 - I'd appreciate more eyes on it.

Related

ThreadPoolExecutor AbortPolicy not throwing exception

I'm using ThreadPoolExecutor where the default RejectedExecutionHandler is AbortPolicy.
As per my understanding, AbortPolicy will throw a rejection exception once the queue is full and I'm still pushing entries to it.
This is what I'm using to process something.
CompletableFuture<Void> processTask(Executor executorB) {
    return CompletableFuture.runAsync(
            () -> {
                doSomething();
            }, executorB);
}

void doSomething() {
    try {
        Thread.sleep(1000000L);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
    System.out.println("doing something");
}
I have another executor (let's say executor A), which is calling the processTask method in a loop 10 times (1 sec delay).
For executorB, the queue size is 3 and the max and core pool sizes are 1. When it is called for the first time, it will go into the Thread.sleep. As executorA is continuously sending messages, I should get the rejection exception after the 4th message, but I'm not seeing it anywhere.
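For reference, a rough sketch of the executorB setup described above (the exact constructor arguments aren't shown in my code, so these values are assumptions based on the description):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ExecutorBSetup {
    public static void main(String[] args) {
        // Assumed construction: core pool = 1, max pool = 1, bounded queue of
        // capacity 3, and the default RejectedExecutionHandler (AbortPolicy).
        ThreadPoolExecutor executorB = new ThreadPoolExecutor(
                1,                            // core pool size
                1,                            // maximum pool size
                0L, TimeUnit.MILLISECONDS,    // keep-alive for excess idle threads
                new ArrayBlockingQueue<>(3)); // bounded work queue
        System.out.println("Remaining queue capacity: " + executorB.getQueue().remainingCapacity());
        executorB.shutdown();
    }
}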
The interesting thing is that the log "doing something" came 4 times, which means the tasks submitted after the queue got full were rejected (1 handled by the first call, and 3 were in the queue).
Can someone explain to me why I'm not seeing any exceptions?

Checking health of Kafka Stream Threads

When some of the stream threads die (because of an exception for example), I don't want to continue, but restart the process.
In order to do this, I need to identify this state.
I know I can use kafkaStream.state(), but it checks the state of the KafkaStreams instance as a whole, meaning that if only one StreamThread has died, it will not be discovered by kafkaStream.state().
What is the best way for me to know in the code that all StreamThreads are alive and working?
Update: added a timeout to KafkaStreams#close(), as it can otherwise cause a deadlock, as Matthias stated in the comment.
If you want to detect whether any StreamThread has died, you can use KafkaStreams#setUncaughtExceptionHandler() to stop streaming and exit the app:
kafkaStreams.setUncaughtExceptionHandler((t, e) -> {
    logger.error("Exiting ", e);
    kafkaStreams.close(Duration.ofSeconds(10)); // close with a timeout to avoid deadlock
    System.exit(1); // exit with error code so container can restart this app
});
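As a complement, if you also want to actively check thread health rather than only react to an uncaught exception, one option is to poll the per-thread state periodically. This is a sketch only and assumes a Kafka Streams version that exposes KafkaStreams#localThreadsMetadata():

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.processor.ThreadMetadata;

public class StreamThreadHealth {
    // Hypothetical health check: returns false if any stream thread has died,
    // so the caller can close the KafkaStreams instance and exit.
    public static boolean allThreadsAlive(KafkaStreams kafkaStreams) {
        for (ThreadMetadata metadata : kafkaStreams.localThreadsMetadata()) {
            if ("DEAD".equals(metadata.threadState())) {
                return false;
            }
        }
        return true;
    }
}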

MS-MPI MPI_Barrier: sometimes hangs indefinitely, sometimes doesn't

I'm using the MPI.NET library, and I've recently moved my application to a bigger cluster (more COMPUTE-NODES). I've started seeing various collective functions hang indefinitely, but only sometimes. About half the time a job will complete, the rest of the time it'll hang. I've seen it happen with Scatter, Broadcast, and Barrier.
I've put a MPI.Communicator.world.Barrier() call (MPI.NET) at the start of the application, and created trace logs (using the MPIEXEC.exe /trace switch).
C# code snippet:
static void Main(string[] args)
{
    var hostName = System.Environment.MachineName;
    Logger.Trace($"Program.Main entered on {hostName}");
    string[] mpiArgs = null;
    MPI.Environment myEnvironment = null;
    try
    {
        Logger.Trace($"Trying to instantiate MPI.Environment on {hostName}. Is currently initialized? {MPI.Environment.Initialized}");
        myEnvironment = new MPI.Environment(ref mpiArgs);
        Logger.Trace($"Is currently initialized? {MPI.Environment.Initialized}. {hostName} is waiting at Barrier... ");
        Communicator.world.Barrier(); // CODE HANGS HERE!
        Logger.Trace($"{hostName} is past Barrier");
    }
    catch (Exception envEx)
    {
        Logger.Error(envEx, "Could not instantiate MPI.Environment object");
    }
    // rest of implementation here...
}
I can see the msmpi.dll's MPI_Barrier function being called in the log, and I can see messages being sent and received thereafter for a passing and a failing example. For the passing example, messages are sent/received and then the MPI_Barrier function Leave is logged.
For the failing example it looks like one (or more) of the send messages is lost - it is never received by the target. Am I correct in thinking that messages lost within the MPI_Barrier call will mean that the processes never synchronize and therefore all get stuck at the Communicator.world.Barrier() call?
What could be causing this to happen intermittently? Could poor network performance between the COMPUTE-NODES be a cause?
I'm running MS HPC Pack 2008 R2, so the version of MS-MPI is pretty old, v2.0.
EDIT - Additional information
If I keep a task running within the same node, then this issue does not happen. For example, if I run a task using 8 cores on one node it's fine, but if I use 9 cores across two nodes I'll see this issue ~50% of the time.
Also, we have two clusters in use and this only happens on one of them. They are both virtualized environments, but appear to be set up identically.

Code with a potential deadlock

// down = acquire the resource
// up = release the resource
typedef int semaphore;
semaphore resource_1;
semaphore resource_2;

void process_A(void) {
    down(&resource_1);
    down(&resource_2);
    use_both_resources();
    up(&resource_2);
    up(&resource_1);
}

void process_B(void) {
    down(&resource_2);
    down(&resource_1);
    use_both_resources();
    up(&resource_1);
    up(&resource_2);
}
Why does this code cause a deadlock?
If we change the code of process_B so that both processes ask for the resources in the same order, as follows:
void process_B(void) {
    down(&resource_1);
    down(&resource_2);
    use_both_resources();
    up(&resource_2);
    up(&resource_1);
}
Then there is no deadlock.
Why so?
Imagine that process A is running, tries to get resource_1, and gets it.
Now process B takes control and tries to get resource_2, and gets it. Then process B tries to get resource_1 and does not get it, because it belongs to process A. So process B goes to sleep.
Process A gets control again and tries to get resource_2, but it belongs to process B. Now it goes to sleep too.
At this point, process A is waiting for resource_2 and process B is waiting for resource_1.
If you change the order, process B will never lock resource_2 unless it gets resource_1 first, and the same holds for process A.
They will never be deadlocked.
A necessary condition for a deadlock is a cycle of resource acquisitions. The first example constructs such a cycle: 1 -> 2 -> 1. The second example acquires the resources in a fixed order, which makes a cycle, and hence a deadlock, impossible.

SQLConnection Pooling - Handling InvalidOperationExceptions

I am designing a highly concurrent CCR application in which it is imperative that I do not block or send a thread to sleep.
I am hitting SqlConnection pool issues - specifically, I'm getting InvalidOperationExceptions when trying to call SqlConnection.Open.
I can potentially retry a hand full of times, but this isn't really solving the problem.
The ideal solution for me would be a method of periodically re-checking the connection for availability that doesn't require tying up a thread.
Any ideas?
[Update]
Here is a related problem/solution posted at another forum
The solution requires a manually managed connection pool. I'd rather have a solution that is more dynamic, i.e. one that kicks in when needed.
Harry, I've run into this as well, also whilst using the CCR. My experience was that having completely decoupled my dispatcher threads from blocking on any I/O, I could consume and process work items much faster than the SqlConnection pool could cope with. Once the maximum-pool-limit was hit, I ran into the sort of errors you are seeing.
The simplest solution is to pre-allocate a number of non-pooled asynchronous SqlConnection objects and post them to some central Port<SqlConnection> object. Then whenever you need to execute a command, do so within an iterator with something like this:
public IEnumerator<ITask> Execute(SqlCommand someCmd)
{
    // Assume that 'connPort' has been posted with some open
    // connection objects.
    try
    {
        // Wait for a connection to become available and assign
        // it to the command.
        yield return connPort.Receive(item => someCmd.Connection = item);

        // Wait for the async command to complete.
        var iarPort = new Port<IAsyncResult>();
        var iar = someCmd.BeginExecuteNonQuery(iarPort.Post, null);
        yield return iarPort.Receive();

        // Process the response.
        var rc = someCmd.EndExecuteNonQuery(iar);
        // ...
    }
    finally
    {
        // Put the connection back in the 'connPort' pool
        // when we're done.
        if (someCmd.Connection != null)
            connPort.Post(someCmd.Connection);
    }
}
The nice thing about using the CCR is that it is trivial to add the following features to this basic piece of code.
Timeout - just make the initial receive (for an available connection) a 'Choice' with a timeout port.
Adjust the pool size dynamically. To increase the size of the pool, just post a new open SqlConnection to 'connPort'. To decrease the size of the pool, yield a receive on the connPort, and then close the received connection and throw it away.
Yes, connections are kept open and out of the connection pool. In the above example, the port is the pool.