Celery: execute all tasks in a queue at a certain time

I have a task that opens a database connection each time it runs, and it can be executed around 1000 times a day, so I don't want this task to be executed instantly when it is pushed to the queue.
Instead, I want all tasks in this queue to wait until a certain time (e.g. 01:00 AM) and then start executing one by one.
Other than that, I have some routing and priorities that I still want to be applied:
CELERY_TASK_ROUTES = {
    'report_app.tasks.*': {'queue': 'create_report_queue', 'priority': 1},
    'link_app.tasks.*': {'queue': 'add_link_queue', 'priority': 2},
}
I use RabbitMQ as the broker.

The simplest way to do this is to use the task ETA. The ETA (estimated time of arrival) lets you set the earliest date and time at which your task will be executed.
Example:
from datetime import datetime, timedelta
tomorrow = datetime.utcnow() + timedelta(days=1)
add.apply_async((2, 2), eta=tomorrow, queue="report_queue")
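
To match the asker's setup more directly, here is a minimal sketch that computes the next 01:00 AM (using UTC, like the example above) and passes it as the ETA, together with the queue and priority from the routing config; create_report and report_id are placeholder names for illustration:

from datetime import datetime, timedelta

now = datetime.utcnow()
next_run = now.replace(hour=1, minute=0, second=0, microsecond=0)
if next_run <= now:
    # 01:00 AM has already passed today, so schedule for tomorrow
    next_run += timedelta(days=1)

# create_report and report_id are hypothetical names for this sketch
create_report.apply_async((report_id,), eta=next_run, queue="create_report_queue", priority=1)

Note that ETA tasks are still delivered to workers and held there until the ETA is reached, so for very large backlogs a scheduled job that drains the queue at 01:00 may scale better.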

Related

What is a good use case for a delay task in Uber Cadence?

I want to implement a delayed task and found a Cadence cron example. How do I use Cadence to implement a delayed task?
Cron is for periodic execution of some functionality.
If you need to delay a task, you can call sleep at the beginning of the workflow and then call an activity that executes the task.
Cadence supports both activity and workflow delaying.
Activity delay can be achieved with the Workflow.Sleep API.
Workflow delay can be achieved with the DelayStart option. See https://github.com/uber-go/cadence-client/blob/e66e2d4da8def80e7a5730b824a2de7a28f5c050/internal/client.go#L415
For regular workflows, this will delay execution by that many seconds before starting.
For cron workflows, this will delay ONLY the FIRST execution. For example, say you want to set up an hourly cron workflow but you want it to start running next Monday at 9 AM. You can pass a delayStart value in seconds that lands between 8 AM and 9 AM on that Monday, so the workflow starts at 9 AM since that is the next scheduled run.

How to achieve an uncertain score rule in OptaPlanner?

I'm using OptaPlanner to develop a system similar to the MeetingScheduling example: it assigns tasks to machines and determines their start times. I created a class, TaskAssignment, as the planning entity, with the fields "machine" and "startTimeGrain" as the planning variables.
But my use case has a constraint that doesn't exist in MeetingScheduling, and I don't know how to implement it. In some cases there is a preparation time before a task starts. That is, if TaskA and TaskB are contiguous tasks on the same machine (TaskA is the previous task of TaskB), TaskB cannot start until TaskA has finished, and there may be a preparation time between them: after TaskA finishes, the machine has to wait a while before TaskB starts, and how long it has to wait is not fixed; it depends on the previous task.
Possibly like following:
TaskA -> TaskB: TaskB's preparation time is 5 mins.
TaskC -> TaskB: TaskB's preparation time is 15 mins.
TaskC -> TaskA: TaskA's preparation time is 0 min.
So I look up the preparation time for a task based on its previous task (reading it from a list) and calculate the interval between the two tasks. If the interval is less than the preparation time, I use the interval minus the preparation time as the penalty score.
When I run the planner, the rule throws a Score Corruption exception. I found that the reason is that both the interval and the preparation time are uncertain.
The interval depends on the previous task's end time and the task's own start time; the start time is a planning variable, so it is uncertain.
The preparation time comes from a list in each task, and which entry applies depends on the previous task. Since start times keep changing during planning, the applicable preparation time keeps changing too, so it is also uncertain.
In this case, is there any way to achieve this?
Many thanks.
Here is my rule, which triggers the score corruption exception:
rule "Make sure interval large than preparation time"
salience 1
when
$currentTA : TaskAssignment(
$machine: machine != null,
startingTimeGrain != null,
$lack : getIntervalLack() < 0L // in getIntervalLack(), interval minus preparation time
)
then
scoreHolder.addHardConstraintMatch(kcontext, $lack);
end
The exception message:
Exception in thread "main" java.lang.IllegalStateException: Score corruption: the workingScore (-17hard/0medium/0soft) is not the uncorruptedScore (-20hard/0medium/0soft) after completedAction ([TaskAssignment-5 {Machine-1([023]) -> Machine-1([023])}, TaskAssignment-5 {TimeGrain-2 -> TimeGrain-2}]):
The corrupted scoreDirector has no ConstraintMatch(s) which are in excess.
The corrupted scoreDirector has 1 ConstraintMatch(s) which are missing:
com.esquel.configuration/Make sure interval large than preparation time/[TaskAssignment-4]=-3hard/0medium/0soft
Check your score constraints.
at org.optaplanner.core.impl.score.director.AbstractScoreDirector.assertWorkingScoreFromScratch(AbstractScoreDirector.java:496)
at org.optaplanner.core.impl.solver.scope.DefaultSolverScope.assertWorkingScoreFromScratch(DefaultSolverScope.java:132)
at org.optaplanner.core.impl.phase.scope.AbstractPhaseScope.assertWorkingScoreFromScratch(AbstractPhaseScope.java:167)
at org.optaplanner.core.impl.constructionheuristic.decider.ConstructionHeuristicDecider.processMove(ConstructionHeuristicDecider.java:140)
at org.optaplanner.core.impl.constructionheuristic.decider.ConstructionHeuristicDecider.doMove(ConstructionHeuristicDecider.java:126)
at org.optaplanner.core.impl.constructionheuristic.decider.ConstructionHeuristicDecider.decideNextStep(ConstructionHeuristicDecider.java:99)
at org.optaplanner.core.impl.constructionheuristic.DefaultConstructionHeuristicPhase.solve(DefaultConstructionHeuristicPhase.java:74)
at org.optaplanner.core.impl.solver.AbstractSolver.runPhases(AbstractSolver.java:87)
at org.optaplanner.core.impl.solver.DefaultSolver.solve(DefaultSolver.java:167)
at com.esquel.main.App.startPlan(App.java:94)
at com.esquel.main.App.main(App.java:43)
If it's a hard constraint, I'd make it built in and do it with a shadow variable:
I'd probably pre-calculate the task dependencies, so taskB has a reference to its potentialPrecedingTasks (taskA and taskC in your example). Then I'd use the "chained through time" pattern (see the docs) to determine the order in which the tasks get executed. Based on that order, the starting time is a shadow variable that is actualPrecedingTask.endingTime + lookUpPreparationTime(precedingTask, thisTask). See the arrivalTime listener in the VRP example; same principle.
If it's a soft constraint, I'd still have that same shadow variable, but call it desiredStartingTime and add a soft constraint to check that the real startingTime is equal to or higher than the desiredStartingTime.
Background summary: Some orders are sent to the production workshop; each order is split into a sequence of tasks by the process routing, and the tasks of an order must be executed in that sequence. Each task can only be executed on a particular machine. There may be a preparation time before a task starts; whether the preparation time exists, and how long it is, depends on which task comes before it on the same machine.
The following are the main constraints that are hard to implement:
Hard constraints:
1. A task must be executed on its particular machine.
2. Tasks in an order must be executed in a particular sequence (the tasks of an order come from the order's processes, and usually they need to be executed on different machines).
3. The first task of an order has an earliest start time: the time the order arrived at the production workshop.
4. Some tasks in an order may have a requested start time, meaning that once the previous task finishes, the next task has to start within a certain period. For example, if TaskA is the previous task of TaskB in the same order, TaskB has to start within 16 hours after TaskA finishes.
5. A task may have a preparation time, which depends on its previous task on the same machine (usually, the same process from different orders is assigned to the same machine). If a task has a preparation time, it has to start after that preparation time has elapsed. In other words, there is an interval between these tasks.
Soft constraints:
1. All tasks should be executed as soon as possible.
2. Minimize the preparation time; because tasks end up in different positions, the relationships between tasks differ, and so do the preparation times.
So there are two "chains" in the solution during planning. OptaPlanner generates a chain for the tasks on the same machine. The other "chain" comes from the order, and in that "chain" tasks are assigned to different machines. The two "chains" are hung together.
I call the chain within a machine (generated by OptaPlanner) the "Machine chain", and the "chain" within an order the "Order chain".
Now you can see that, because the two "chains" are hung together, a task is a node in both a Machine chain and an Order chain.
I tried the "chained through time" pattern, and an undoMove corruption appeared. I think the reason is that when I update a task in a Machine chain, the following tasks in the same Machine chain are updated too; those tasks are also nodes of Order chains, so a chain reaction breaks out.
I think my case looks like the Project Job Scheduling example, but the difference is that the two "chains" in that example are never hung together.
So I tried the simple pattern, but I can't escape the Score Corruption exception.

Where in a Celery chain are arguments passed?

1) Celery chain.
On the doc I read this:
Here’s a simple chain, the first task executes passing its return value to the next task in the chain, and so on.
>>> from celery import chain
>>> # 2 + 2 + 4 + 8
>>> res = chain(add.s(2, 2), add.s(4), add.s(8))()
>>> res.get()
16
But where exactly is a chain item's result passed to the next chain item? On the Celery server side, or is it passed back to my app, which then passes it to the next chain item?
This is important to me because my results are quite big to be passing back to the app, and I want all of this messaging to happen on the Celery server side.
2) Celery group.
>>> g = group(add.s(i) for i in xrange(10))
>>> g(10).get()
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Can I be sure that these tasks will be executed together as much as possible? Will Celery give priority to a certain group once the first task of the group has started executing?
For example, I have 100 requests and each request runs a group of tasks, and I don't want tasks from different groups to be mixed with each other. The first request to start being processed could be the last one completed, while its last tasks wait for free workers that are busy with tasks from other requests. It seems better if a group of tasks is executed together as much as possible.
I will really appreciate it if you can help me.
1. Celery Chain
Results are passed on the Celery side using a message-passing broker such as RabbitMQ. Results are stored using the result backend (explicitly required for chord execution). You can verify this by running your Celery worker with log level 'INFO' and observing how the tasks are invoked.
Celery maintains a dependency graph once you invoke tasks, so it knows exactly how to chain your tasks.
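For illustration only, a minimal sketch of an app configured with RabbitMQ as the broker plus a result backend, so chained results can be stored and fetched; the URLs and names are placeholders:

from celery import Celery

# the broker carries the task messages; the result backend stores return values
app = Celery('proj', broker='amqp://guest@localhost//', backend='rpc://')

@app.task
def add(x, y):
    return x + y

With this in place, chain(add.s(2, 2), add.s(4))() runs entirely on the workers, and only the final res.get() pulls a result back into your app.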
Also consider callbacks, where you link two different tasks:
http://docs.celeryproject.org/en/latest/userguide/canvas.html#callbacks
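For example, a sketch using the add task from the docs: a callback attached with link= receives the parent task's return value prepended to its own arguments.

add.apply_async((2, 2), link=add.s(16))  # the linked task runs add(4, 16) on the worker once add(2, 2) finishes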
2. Celery Group
When you call tasks in a group, Celery executes (invokes) them in parallel. The Celery worker will try to pick them up depending on the workload it can handle. If you invoke more tasks than your worker can handle, it is certainly possible that your first few tasks will get executed first and the worker will pick up the rest gradually.
If you have a very large number of tasks to be invoked in parallel, it is better to invoke them in chunks of a certain pool size, for example as sketched below.
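For instance, a sketch using Celery's built-in chunks() helper, which splits 100 add calls into 10 chunk tasks of 10 calls each, so the workers process grouped batches instead of 100 individual messages:

# 100 (x, y) argument pairs split into chunks of 10
res = add.chunks(list(zip(range(100), range(100))), 10)()
res.get()  # a list of 10 lists with the individual results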
You can also set the priority of tasks, as mentioned in another answer.
Completion of the tasks in a group depends on how much time each task takes. Celery tries to do fair task scheduling as much as possible.

How to set Akka actors to run only for a specific time period?

I have a big task, which I break down into smaller tasks and analyse. I have a basic model:
Master, worker, and listener.
The master creates the tasks and gives them to the worker actors. Once a worker actor completes a task, it asks the master for another one. Once all tasks are completed, they inform the listener. They usually take less than 2 minutes to complete 1000 tasks.
Now, sometimes some tasks take more time than others. I want to set a timer for each task, and if a task takes too long, the worker's task should be aborted by the master and the task resubmitted later as a new one. How do I implement this? I can calculate the time taken by a worker task, but how does the master actor keep tabs on the time taken by all the worker actors in real time?
One way of handling this would be for each worker, on receipt of a task to start on, to set a receive timeout before changing state to process the task, e.g.:
context.setReceiveTimeout(5 minutes) // for the '5 minutes' notation - import scala.concurrent.duration._
If the timeout is received, the worker can abort the task (or take whatever other action you deem appropriate, e.g. kill itself or pass a notification message back to the master). Don't forget to cancel the timeout (set the duration to Duration.Undefined) once the task is completed.

Back-to-back mapReduce jobs in MongoDB

I want to run back-to-back map-reduce jobs every hour, automatically.
So every hour:
startCollection---mapReduce1--->map1ResultCollection---mapReduce2--->map2ResultCollection
I want the mapReduce2 job only to start when the mapReduce1 job is finished.
How do I know when the mapReduce1 job is done and mapReduce2 is good to go?