zookeeper queue delay? - apache-zookeeper

What would you guys suggest as a good way to implement a queue in ZooKeeper that can delay a job without blocking a worker?
For reference, see the delayed job option in beanstalkd.

What you need is to build a barrier using ZooKeeper.
I assume the delay time is set by another process, called the master.
The master first creates a node, say /work/flag, with the data "false".
The worker then gets and watches the node /work/flag. The watcher is called back asynchronously, so the worker can do other things in the meantime and is not blocked.
When the time comes, the master sets the /work/flag data to "true", which triggers a ZOO_CHANGED_EVENT.
The worker receives the ZOO_CHANGED_EVENT callback for /work/flag. It can then read /work/flag, check whether it is "true", and decide whether to continue the workflow.
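
The flow above can be sketched without a ZooKeeper server. The example below is an in-process stand-in only: `FlagNode`, `run_demo` and the timer-based master are hypothetical names made up for illustration, not part of any ZooKeeper client. It shows the shape of the pattern: the worker reads the flag once, leaves a one-shot watch, and keeps working until the callback fires.

```python
import threading


class FlagNode:
    """In-process stand-in for a znode such as /work/flag (not a ZooKeeper client)."""

    def __init__(self, data):
        self._data = data
        self._watchers = []
        self._lock = threading.Lock()

    def get(self, watch=None):
        # Like a read with a watch: return the data and register a one-shot callback.
        with self._lock:
            if watch is not None:
                self._watchers.append(watch)
            return self._data

    def set(self, data):
        # Change the data, then fire each registered watch exactly once.
        with self._lock:
            self._data = data
            watchers, self._watchers = self._watchers, []
        for callback in watchers:
            callback()


def run_demo(delay_seconds=0.05):
    flag = FlagNode("false")
    log = []
    released = threading.Event()

    def on_change():
        # Worker re-reads the flag when notified and continues only if it is "true".
        if flag.get() == "true":
            log.append("delayed job started")
            released.set()

    # Worker: read once, leave a watch, then carry on with other work.
    if flag.get(watch=on_change) == "false":
        log.append("worker doing other work")

    # Master: flip the flag when the delay expires.
    timer = threading.Timer(delay_seconds, flag.set, args=("true",))
    timer.start()
    released.wait(timeout=5)
    timer.join()
    return log
```

With a real ZooKeeper client the watch would be registered through that client's own API, and because watches are one-shot the worker would need to set a new one if the flag is still "false" after a change.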

Related

How to properly handle race condition caused by retry worker

In one of our services we had some connection issues and we are getting random timeouts (we think the client library is the cause; it is one of the caching services). We decided to handle this by putting failed writes in a queue and retrying on a separate worker until we solve the underlying issue.
However, there is a problem case. Let's say we want to put the value "A" in the cache, but the write fails, so we put it in the queue to retry. In the meantime the user fires a delete request for that data, and the delete call completes without a timeout (no error, but also no record to delete). Then our retry worker writes the data to the cache, even though it is supposed to be deleted.
How would we handle this scenario? I first thought we could raise an error if the delete doesn't delete anything, but that has many complications of its own and could even end in an endless retry.
The issue appears to arise because you perform the actual action on the main thread and only go through the queue and the worker thread for the retry when it fails.
If you also perform the actual action on the worker thread, through the queue, the issue is resolved.
A second solution is to track all the keys that are in the queue for retry. If an action arrives for a key that is already in the queue, queue that action as well. For example, the delete should be queued because the retry for A is already queued.
The second solution is a little inefficient.
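
A minimal sketch of the second solution, assuming a plain dict stands in for the cache service and a `fail` flag simulates the timeout. `KeyedRetryQueue` and its methods are hypothetical names for illustration. Any operation on a key that already has queued work is queued behind it, so the delete can no longer overtake the pending write.

```python
from collections import deque


class KeyedRetryQueue:
    """Queues an operation when its key already has queued work."""

    def __init__(self, cache):
        self.cache = cache   # plain dict standing in for the cache service
        self.pending = {}    # key -> deque of queued (op, value) pairs

    def submit(self, op, key, value=None, fail=False):
        # `fail` simulates a timeout on the direct call.
        # If earlier work for this key is still queued, queue behind it to keep order.
        if key in self.pending or fail:
            self.pending.setdefault(key, deque()).append((op, value))
            return "queued"
        self._apply(op, key, value)
        return "applied"

    def _apply(self, op, key, value):
        if op == "set":
            self.cache[key] = value
        elif op == "delete":
            self.cache.pop(key, None)

    def drain(self):
        # Retry worker: replay each key's operations in the order they arrived.
        for key in list(self.pending):
            ops = self.pending.pop(key)
            while ops:
                op, value = ops.popleft()
                self._apply(op, key, value)
```

In the scenario from the question, the failed write of "A" and the later delete both end up queued for the same key, and replaying them in order leaves the key absent from the cache.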

Resume Cadence Workflow based on signal without blocking the thread

We want to build a workflow which contains the steps below, in this order:
1. Execute some synchronous activities.
2. Trigger an external operation via a Kafka event.
3. Listen to the Kafka events for the result of the operation.
4. Execute some other activities based on the result.
Kafka may contain events not related to workflow, so we need a separate workflow to filter the events for that particular workflow.
Using cadence I'm planning to split it into two workflows
Workflow1 : 1 -> 2 -> wait for signal -> 4
Workflow2 : 3 -> Call workflow1.signal
Is it possible to wait for a signal in workflow1 without actually blocking the thread, so that the thread can process another workflow in the meantime?
I think there is some misunderstanding of how Temporal/Cadence works. There is no requirement to avoid blocking a thread for other workflows to be able to make progress. A worker instance has no problem dealing with such a situation.
So I would recommend blocking the thread in workflow one to wait for the signal, as it is the simplest way to meet your business requirements.
As a side note I don't understand why you need a second workflow. There is no need to have a workflow to filter Kafka events. You can do it directly in a Kafka consumer that signals the first workflow.
I have some experience writing Kafka/Kinesis consumers (not working with Cadence yet, but I plan to do so soon). My feeling is that you only need one consumer thread, blocked and waiting for new events from the Kafka stream. This consumer can live anywhere as long as it can talk to your Cadence system to send a signal to a workflow. If each Kafka message (after filtering out the unrelated ones) can be designed to contain all the information the consumer needs to decide which workflow to signal, it will be very simple. If you have no control over what is in the message (it sounds like you have an existing stream), it is a little tricky. Your consumer may need to look up which workflow to call based on some other identifier in the message.
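
The consumer-side filtering described above might look roughly like this. It is a sketch under assumptions: the message fields `operation_id` and `result`, the lookup table, and the `send_signal` callable are all hypothetical, and no Kafka or Cadence client is involved. In practice `send_signal` would wrap whatever call your Cadence client uses to signal a workflow.

```python
def route_events(events, workflow_for_operation, send_signal):
    """Filter a stream of events and signal the workflow waiting on each one."""
    signalled = []
    for event in events:
        # Hypothetical lookup: map an identifier in the message to a workflow ID.
        workflow_id = workflow_for_operation.get(event.get("operation_id"))
        if workflow_id is None:
            continue  # event is unrelated to any waiting workflow
        send_signal(workflow_id, event["result"])
        signalled.append(workflow_id)
    return signalled
```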

How to return the Celery task to the queue that it could take another worker?

Sometimes a worker takes a job but does not process it, and the job needs to be returned to the queue so that another worker can pick it up and handle it. How can this be implemented?
If you're referring to mid-execution crashes, you can turn on the CELERY_ACKS_LATE configuration.
Keep in mind that it may have some side effects as described here
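
As a configuration sketch: `CELERY_ACKS_LATE` is the older uppercase name, and from Celery 4 onwards the same setting is written in lowercase as `task_acks_late`. The second setting below, `task_reject_on_worker_lost`, is not mentioned in the answer and is included only as a related option to evaluate. The fragment assumes a plain configuration module loaded by your Celery app.

```python
# Acknowledge the message only after the task finishes, so a task that was
# interrupted by a lost worker can be delivered again to another worker.
task_acks_late = True

# Related option (an addition to the answer above): requeue the message when
# the child process running the task exits abruptly. This can cause a task
# that always crashes to be redelivered repeatedly, so enable it with care.
task_reject_on_worker_lost = True
```

Because a task may then run more than once, tasks used with late acknowledgement should be idempotent.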

Zookeeper priority queue

My problem description is as follows:
I have n state-based database crawlers that run indefinitely.
How it currently works:
We are using a single machine for crawling.
We have three levels of priority queue: High, Medium and Low.
At the start, all database jobs are put into the lowest-level queue.
A worker reads a job from the queue and performs the operation.
After finishing the job, it reschedules it with a delay of 5 minutes.
Solution I found
For the priority queue I can use:
http://zookeeper.apache.org/doc/r3.2.2/recipes.html#sc_recipes_priorityQueues
The problems I am still trying to solve are:
1. How to reschedule a job in the queue with a future schedule time. Is there a way to do that in ZooKeeper?
2. Canceling an already started job. Suppose a user changes his database authentication details. I want to stop the already running job for that database and restart it with the new details. What I thought is that when a worker starts a job, it subscribes to changes on that job's znode, and if something happens it stops the job and reschedules it.
3. Infinite queue. What I thought is that after finishing a job, the worker removes it from the queue and re-adds it with a future schedule time. (The implementation depends on point 1.)
Is this the correct way of running a task indefinitely?
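
To make the remove-and-re-add idea concrete, here is a single-process sketch of a priority queue with future schedule times. It is not a ZooKeeper recipe: `DelayedPriorityQueue`, the numeric priority constants and the explicit `now` argument are assumptions made for illustration. Jobs stay hidden until their ready time, and among the ready jobs the highest priority is served first.

```python
import heapq
import itertools

HIGH, MEDIUM, LOW = 0, 1, 2   # smaller number means higher priority
RESCHEDULE_DELAY = 5 * 60     # seconds, matching the 5 minute delay above


class DelayedPriorityQueue:
    """In-memory queue: jobs become visible at ready_at, then go by priority."""

    def __init__(self):
        self._delayed = []                 # heap ordered by ready time
        self._ready = []                   # heap ordered by priority
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def put(self, job, priority, ready_at):
        heapq.heappush(self._delayed, (ready_at, next(self._counter), priority, job))

    def pop(self, now):
        # Promote every job whose ready time has passed, then take the best priority.
        while self._delayed and self._delayed[0][0] <= now:
            _, seq, priority, job = heapq.heappop(self._delayed)
            heapq.heappush(self._ready, (priority, seq, job))
        if not self._ready:
            return None
        priority, _, job = heapq.heappop(self._ready)
        return job, priority
```

A worker loop would call `pop(now)`, run the job, and then call `put(job, priority, now + RESCHEDULE_DELAY)`, which is the remove-and-re-add cycle from point 3.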

MPI Task Scheduling

I want to develop a task scheduler using MPI where there is a single master processor and there are worker/client processors. Each worker has all the data it needs to compute, but gets the index to work on from the master. After the computation the worker returns some data to the master. The problem is that some processes will be fast and some will be slow.
If I run a loop so that at each iteration the master sends and receives (blocking or non-blocking) data, then it can't proceed to the next step until it has received data from the current worker for the previous index assigned to it. The bottom line is that if a worker takes too long to compute, it becomes the limiting factor, and the master can't move on to assign an index to the next worker even if non-blocking techniques are used. Is it possible to skip assigning to a worker and move on to the next?
I'm beginning to think that MPI might not be the right paradigm for this. Would Python be a nice platform for task scheduling?
This is absolutely possible using MPI_Irecv() and MPI_Test(). All the master process needs to do is post a non-blocking receive for each worker process, then in a loop test each one for incoming data. If a process is done, send it a new index, post a new non-blocking receive for it, and continue.
One MPI_Irecv for each process is one solution. This has the downside of needing to cancel the unmatched MPI_Irecv calls when the work is complete.
MPI_ANY_SOURCE is an alternate path. This allows the manager process to have a single MPI_Irecv outstanding at any given time, and the "next" process to call MPI_Send will be matched with MPI_ANY_SOURCE. This has the downside of several ranks blocking in MPI_Send when there is no additional work to be done. Some kind of "nothing more to do" signal needs to be worked out, so the ranks can exit cleanly.
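
The scheduling idea in these answers (hand the next index to whichever worker finishes first) can be illustrated without MPI. The sketch below is an analogue using Python threads from the standard library, not MPI code: `compute` and `schedule` are hypothetical names, and waiting with `FIRST_COMPLETED` plays the role that a receive from MPI_ANY_SOURCE plays for the manager.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def compute(index):
    # Stand-in for a worker's computation; even indices are deliberately slower.
    time.sleep(0.02 if index % 2 == 0 else 0.001)
    return index, index * index


def schedule(indices, n_workers=3):
    results = {}
    todo = iter(indices)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Give every worker one index to start with.
        outstanding = {pool.submit(compute, i) for _, i in zip(range(n_workers), todo)}
        while outstanding:
            # Wake as soon as any worker finishes, whichever one it is.
            done, outstanding = wait(outstanding, return_when=FIRST_COMPLETED)
            for future in done:
                index, value = future.result()
                results[index] = value
                nxt = next(todo, None)
                if nxt is not None:
                    # Hand the freed worker the next index without waiting on slow ones.
                    outstanding.add(pool.submit(compute, nxt))
    return results
```

Here the loop ends naturally when no indices remain and nothing is outstanding, which corresponds to the "nothing more to do" signal that an MPI version would have to send explicitly.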