How to set Akka actors run only for specific time period? - scala

I have a big task,which i break down into smaller task and analyse them. I have a basic model.
Master,worker and listener .
Master creates the tasks,give them to worker actors. Once an worker actor completes,it asks for another task from the master. Once all task is completed ,they inform the listener. They usually take around less than 2 minutes to complete 1000 tasks.
Now,Some time the time taken for some tasks might be more than others. I want to set timer for each task,and if a task takes more time,then worker task should be aborted by the master and the task has to be resubmitted later as new one. How to implement this? I can calculate the time taken by a worker task,but how Master actor keeps tab on time taken by all worker actors in real time?

One way of handling this would be for each worker, on receipt of a task to start on, sets a timeout before changing state to process the task, eg:
context.setReceiveTimeout(5 minutes) // for the '5 minutes' notation - import scala.concurrent.duration._
If the timeout is received, the worker can abort the task (or whatever other action you deem appropriate - eg. kill itself, or pass a notification message back to the master). Don't forget to cancel the timeout (set duration = Duration.Undefined) if the task is completed or the like.

Related

Whether the workflow worker in uber-cadence has control of the number of coroutines?

If the workflow executes for a long time (for example, the workflow executes sleep), will a large number of coroutines be generated?
Cadence or Temporal workflow only needs a worker to generate the next steps to execute. When it is blocked waiting for an external event like a timer it doesn't consume any worker resources. So a single worker can process a practically unlimited number of workflows given that it can keep up with their execution rate.
As an optimization workflows are cached on a worker. But any of them can be kicked out of cache at any time without affecting their correctness.

Polling for external state transitions in Cadence workflows

I have a Cadence workflow where I need to poll an external AWS API until a particular resource transitions, which might take some amount of time. I assume I should make each individual 'checkStatus' request an Activity, and have the workflow perform the sleep/check loop. However, that means that I may have an unbounded number of activity calls in my workflow history. Is that worrisome? Is there a better way to accomplish this?
It depends on how frequently you want to poll.
For infrequent polls (every minute or slower) use the server side retry. Specify a RetryPolicy (or RetryOptions for Java) when invoking the activity. In the RetryPolicy specify an exponential coefficient of 1 and an initial interval of the poll frequency. Then fail the activity in case the polled resource is not ready and the server is going to retry it up to the specified retry policy expiration interval.
For very frequent polls of every few seconds or faster the solution is to implement the polling inside an activity implementation as a loop that polls and then sleeps for the poll interval. To ensure that the polling activity is restarted in a timely manner in case of worker failure/restart the activity has to heartbeat on every iteration. Use an appropriate RetryPolicy for the restarts of such failed activity.
In a rare case when the polling requires a periodic execution of a sequence of activities or activity arguments should change between retries a child workflow can be used. The trick is that a parent is not aware about a child calling continue as new. It only gets notified when a child completes or fails. So if a child executes the sequence of activities in a loop and calls continue as new periodically the parent is not affected until the child completes.

Zookeeper priority queue

My problem description is follows:
I have n state based database infinite crawlers:
Currently how it is happening:
We are using single machine for crawling.
We have three level of priority queue. High, Medium and LOW.
At starting all Database job are put into lower level queue.
Worker reads a job from queue and do operation.
After finishing job it reschedule it with a delay of 5 minutes.
Solution I found
For Priority Queue I can use:
-
http://zookeeper.apache.org/doc/r3.2.2/recipes.html#sc_recipes_priorityQueues
Problem solution I am still searching are:
How to reschedule a job in queue with future schedule time. Is there
a way to do that in zookeeper ?
Canceling a already started job. Suppose user change his database
authentication details. I want to stop already running job for that
database and restart with new details.
What I thought is while starting a worker It will subscribe for that
it's znode changes and if something happen, It will stop that job and
reschedule it.
Infinite Queue
What I thought is that after finishing it will remove it from queue and
readd it with future schdule time. (It implementation depend on point 1)
Is it correct way of doing this task infinite task?

End Celery worker task on, time limit, job stage or instruction from client

I'm new to celery and I would appreciate a little help with a design pattern(or example code) for a worker I have yet to write.
Below is a description of the desired characteristics of the worker.
The worker will run a task that collects data from an endless source, a generator.
The worker task will run forever feeding from the generator unless it is directed to stop.
The worker task should stop gracefully on the occurrence of any one of the following triggers.
It exceeds an execution time limit in seconds.
It exceeds a number of iterations of the endless generator loop.
The client sends a message instructing the worker task to finish immediately.
Below is some sudo code for how I believe I need to handle trigger scenarios 1 and 2.
What I don't know is how I send the 'finish immediately' signal from the client and how it is received and executed in the worker task.
Any advice or sample code would be appreciated.
from celery.task import task
from celery.exceptions import SoftTimeLimitExceeded
COUNTLIMIT = # some value sent to the worker task by the client
#task()
def getData():
try:
for count, data in enumerate(endlessGeneratorThing()):
# process data here
if count > COUNTLIMIT: # Handle trigger scenario 2
clean_up_task_nicely()
break
except SoftTimeLimitExceeded: # Handle trigger scenario 1
clean_up_task_nicely()
My understanding of revoke is that it only revokes a task prior to its execution. For (3), I think what you want to do is use an AbortableTask, which provides a cooperative way to end a task:
http://docs.celeryproject.org/en/latest/reference/celery.contrib.abortable.html
On the client end you are able to call task.abort(), on the task end, you are able to poll task.is_aborted()

I want to have a queue that push task to worker (celeryd) depend on interval time setting

I 'm working of project that use celery, rabbitmq. I want to have right to control interval that queue push task to worker(celeryd).
It sounds like you're looking for this documentation on Periodic Tasks.
Essentially, you configure and run celerybeat, which fires off task executions at intervals.
Word of warning:
If it's undesirable to be running your task multiple times concurrently, I'd suggest you follow a task locking recipe. If your workers are busy or offline, you may end up with a backlog of periodic tasks.