Is it possible to introduce dependent tasks in emacs org mode?
Suppose I have three tasks, Development, Test, and Deploy, which should be done one after another. I scheduled the first one with SCHEDULED: and DEADLINE: and want the second to be scheduled automatically after the first one is finished (e.g. by specifying an offset from the first task's deadline and the duration of the second task). Or can it at least warn me that tasks overlap? Also, if I move the schedule date of one task, the following tasks should be moved accordingly.
Check out org-depend, in the contrib directory of the org-mode distribution.
Org-depend is documented on Worg: http://orgmode.org/worg/org-contrib/org-depend.html
I need to create schedulers that execute jobs (class files) at specified intervals. For now, I'm using Quartz Scheduler, which triggers the jobs at defined intervals measured from the time each job is triggered.
For example, suppose I give a cron expression that runs every hour starting at 9 in the morning. My first run will be at 9, my second run at 10, and so on.
If my job takes 20 minutes to execute, this approach is not very efficient.
What I need is to schedule a job to run one hour after the completion time of the previous run.
For example, suppose my hourly job is triggered at 9 and the first run takes 20 minutes. The next run should then trigger at 10:20 instead of 10 (i.e., one hour after the completion of the previous run).
I need to know whether Quartz provides a way to achieve this, or whether I need to implement some other logic.
If anyone could help me out with this, it would be very helpful.
You can easily achieve this by job-chaining your job executions. There are various approaches you can choose from:
(1) Implement a Quartz JobListener and re-fire your job in its jobWasExecuted method, which Quartz invokes whenever a job finishes executing.
(2) Look at the Quartz JobChainingJobListener that you can use to implement simple job chaining scenarios. Please note that the functionality of this listener is very limited as it does not allow you to insert delays between job executions, there is no support for conditions that must be met before target jobs are executed etc. But you can use it as a good starting point to implement (1).
(3) Use QuartzDesk (our commercial product) or any other product that allows you to create job chains while externalizing and managing all job dependencies outside of your application. A job chain can have multiple target jobs that can be executed immediately, with a fixed delay, or at an arbitrary time in the future produced by a JavaScript expression. It also allows you to implement somewhat more sophisticated workflows, such as firing a target job when multiple source jobs complete their execution. The attached screenshots show what a simple job chain that re-executes Job1 with a 1 minute delay upon Job1's completion (with any job execution status) looks like.
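Independently of which approach you pick, the timing rule being asked for ("fire again a fixed delay after the previous run finished") can be checked with a few lines of arithmetic. This is only a language-neutral sketch in Python, not Quartz code; the function name and the example date are made up for illustration.

```python
from datetime import datetime, timedelta


def fire_times_after_completion(first_fire, run_durations, delay):
    """Return the fire time of each run when every run starts `delay`
    after the previous run finished (not after it started)."""
    fire_times = []
    next_fire = first_fire
    for duration in run_durations:
        fire_times.append(next_fire)
        finished_at = next_fire + duration
        next_fire = finished_at + delay
    # Fire time of the run that follows the last listed duration.
    fire_times.append(next_fire)
    return fire_times


# The example from the question: triggered at 9:00, first run takes 20 minutes.
times = fire_times_after_completion(
    first_fire=datetime(2024, 1, 1, 9, 0),   # arbitrary example date
    run_durations=[timedelta(minutes=20)],
    delay=timedelta(hours=1),
)
print([t.strftime("%H:%M") for t in times])
```

In a listener-based solution such as (1), the equivalent of `finished_at + delay` would be computed when the job-completed callback runs and used as the start time of the next trigger.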
Update for the bounty
I'd like a solution that does not involve a monitoring thread, if possible.
I know I can view scheduled and active tasks using the Inspect class of my app's Control.
i = myapp.control.inspect()
currently_running = i.active()
scheduled = i.scheduled()
But I could not find any function to show already finished tasks. I know that this information must be at least temporarily accessible, because I can look up a finished task by its task_id:
>>> r = my_task.AsyncResult(task_id=' ... ')
>>> r.state
u'SUCCESS'
How can I get a complete list of scheduled, active and finished tasks? Or possibly a list of all tasks at once?
Celery Flower shows tasks (active, finished, reserved, etc.) in real time. It lets you filter tasks by time, worker, and type.
https://github.com/mher/flower
One option not requiring a monitoring thread is a Celery on_success handler (using bootsteps feature in 3.1+) - this would need to write relevant info to your own datastore.
You need to create a custom task class to do this. This on_failure example gives an idea.
A possibly better option, needing less code, is to use a task_success signal in a similar way, recording the info you need for later.
The Flower option is probably simpler, as you are querying info already maintained by Flower when tasks complete - see this answer.
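For the task_success route, a rough sketch of the recording handler might look like the following. The `finished_tasks` dict and the `fake_task` object are stand-ins invented for this example so it runs without a broker; the sketch assumes the signal passes the task as `sender` and the return value as `result`, and that the task id is available as `sender.request.id`.

```python
from types import SimpleNamespace

# Hypothetical in-process store; a real deployment would write to a database
# or cache shared by all workers.
finished_tasks = {}


def record_success(sender=None, result=None, **kwargs):
    """Remember the id, name and result of a task that just succeeded."""
    finished_tasks[sender.request.id] = {"name": sender.name, "result": result}


# With Celery installed, the handler would be registered like this:
#   from celery.signals import task_success
#   task_success.connect(record_success)

# Stand-in for the task object passed as `sender` (assumption for the demo).
fake_task = SimpleNamespace(name="tasks.add", request=SimpleNamespace(id="abc123"))
record_success(sender=fake_task, result=4)
print(finished_tasks)
```

Note that the handler runs in the worker process, so the store has to live somewhere your own code can query it later.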
I'm looking for recommended solution to work around celerybeat being a single point of failure for celery/rabbitmq deployment. I didn't find anything that made sense so far, by searching the web.
In my case, once a day timed scheduler kicks off a series of jobs that could run for half a day or longer. Since there can only be one celerybeat instance, if something happens to it or the server that it's running on, critical jobs will not be run.
I'm hoping there is already a working solution for this, as I can't be the only one who needs reliable (clustered or the like) scheduler. I don't want to resort to some sort of database-backed scheduler, if I don't have to.
There is an open issue in celery github repo about this. Don't know if they are working on it though.
As a workaround you could add a lock for tasks so that only 1 instance of specific PeriodicTask will run at a time.
Something like:
if not cache.add('My-unique-lock-name', True, timeout=lock_timeout):
    return
Figuring out the lock timeout is, well, tricky. We're using 0.9 * the task's run_every seconds, in case different celerybeats try to run the tasks at different times.
The 0.9 just leaves some margin (e.g. when celery is a little behind schedule once and is then back on schedule, the lock could otherwise still be active).
Then you can run a celerybeat instance on every machine. Each task will be queued once per celerybeat instance, but only one of those queued copies will actually run to completion.
Tasks will still respect run_every this way. In the worst case, tasks will run every 0.9 * run_every seconds.
One issue with this approach: if tasks were queued but not processed at the scheduled time (for example because the queue processors were unavailable), the lock may be placed at the wrong time, possibly causing the next task to simply not run. To work around this you would need some kind of mechanism to detect whether a task is more or less on time.
Still, this shouldn't be a common situation when using in production.
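To make the locking idea concrete, here is a small self-contained sketch. `InMemoryCache` is a made-up stand-in for a shared cache whose `add()` only succeeds when the key is absent (the behaviour the snippet above relies on), and `run_exclusively` and the lock name are hypothetical. A fake clock is injected so the example does not need to sleep.

```python
import time


class InMemoryCache:
    """Stand-in for a shared cache: add() succeeds only if the key is
    absent or its previous entry has expired."""

    def __init__(self, clock=time.monotonic):
        self._expiry = {}
        self._clock = clock

    def add(self, key, value, timeout):
        now = self._clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False  # someone else still holds the lock
        self._expiry[key] = now + timeout
        return True


def run_exclusively(cache, lock_name, run_every, work):
    """Run `work` only if no other copy took the lock in this period."""
    lock_timeout = 0.9 * run_every  # margin suggested in the answer
    if not cache.add(lock_name, True, timeout=lock_timeout):
        return False
    work()
    return True


fake_now = [0.0]
cache = InMemoryCache(clock=lambda: fake_now[0])
runs = []

# Two celerybeat instances queue the same task for the same period.
first = run_exclusively(cache, "nightly-report", 60, lambda: runs.append("beat-1"))
second = run_exclusively(cache, "nightly-report", 60, lambda: runs.append("beat-2"))

# One period later the lock (54 s) has expired, so the task runs again.
fake_now[0] = 60.0
third = run_exclusively(cache, "nightly-report", 60, lambda: runs.append("beat-1"))
print(first, second, third, runs)
```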
Another solution is to subclass the celerybeat Scheduler and override its tick method. Then, on every tick, acquire a lock before processing tasks. This ensures that celerybeats with the same periodic tasks won't queue the same tasks multiple times. Only one celerybeat per tick (the one that wins the race) will queue tasks. If one celerybeat goes down, another one will win the race on the next tick.
This of course can be used in combination with the first solution.
Of course, for this to work the cache backend needs to be replicated and/or shared across all servers.
It's an old question but I hope it helps someone.
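A sketch of the tick-override pattern, written so it runs without Celery installed: `StubScheduler` stands in for the real scheduler class, and it is an assumption here that the real `tick()` returns the number of seconds to wait before the next tick. `SharedCache`, `LockedTickMixin` and the lock key are hypothetical names, and lock expiry is simulated by hand.

```python
class SharedCache:
    """Stand-in for a cache shared by all servers; expiry is manual here."""

    def __init__(self):
        self._keys = set()

    def add(self, key, value, timeout):
        if key in self._keys:
            return False
        self._keys.add(key)
        return True

    def expire(self, key):
        self._keys.discard(key)


class StubScheduler:
    """Stand-in for the scheduler base class (assumed to expose tick())."""
    max_interval = 5.0

    def __init__(self):
        self.queued = []

    def tick(self):
        self.queued.append("periodic-task")
        return self.max_interval


class LockedTickMixin:
    """Only the instance that wins the per-tick lock queues tasks."""
    lock_key = "beat-tick-lock"
    cache = None  # must be set to a cache shared by all instances

    def tick(self):
        if not self.cache.add(self.lock_key, True, timeout=self.max_interval):
            return self.max_interval  # lost the race; try again next tick
        return super().tick()


class LockedScheduler(LockedTickMixin, StubScheduler):
    pass


shared = SharedCache()
LockedScheduler.cache = shared
beat_a, beat_b = LockedScheduler(), LockedScheduler()

beat_a.tick()
beat_b.tick()                             # loses the race, queues nothing
shared.expire(LockedScheduler.lock_key)   # lock times out; beat_a is "down"
beat_b.tick()                             # beat_b now wins and takes over
print(beat_a.queued, beat_b.queued)
```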
I'm working on a project that uses celery and rabbitmq. I want to be able to control the interval at which the queue pushes tasks to the worker (celeryd).
It sounds like you're looking for this documentation on Periodic Tasks.
Essentially, you configure and run celerybeat, which fires off task executions at intervals.
Word of warning:
If it's undesirable to be running your task multiple times concurrently, I'd suggest you follow a task locking recipe. If your workers are busy or offline, you may end up with a backlog of periodic tasks.
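As a rough illustration of what such a configuration looks like, here is a schedule entry as a plain Python dict. The entry name, the task name `tasks.refresh` and the 30-second interval are made up; in recent Celery versions this dict would be assigned to `app.conf.beat_schedule` (older versions used the `CELERYBEAT_SCHEDULE` setting).

```python
from datetime import timedelta

# Hypothetical schedule: run tasks.refresh every 30 seconds.
beat_schedule = {
    "refresh-every-30-seconds": {
        "task": "tasks.refresh",             # assumed task name
        "schedule": timedelta(seconds=30),   # the interval you want to control
        "args": (),
    },
}

# With a Celery app object this would be applied as:
#   app.conf.beat_schedule = beat_schedule
print(beat_schedule["refresh-every-30-seconds"]["schedule"].total_seconds())
```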
I am trying to use beanstalk to queue a large number of periodic tasks (for example, tasks that need to be processed every N minutes). For each task, if the last queued job has not been completed (not reserved, I mean) when the current job is about to be added, the last queued job should be replaced with the current job. In other words, only the latest queued job of a task should be processed.
How can I achieve that using beanstalk?
The idea I have right now is:
For each task, use memcached to store its latest timestamp (set when adding a job to the queue).
Every time the worker reserves a job successfully, it first checks the timestamp for this task in memcached.
If the job's timestamp is the same as the timestamp in memcached, process the job.
Otherwise, skip the job and delete it from the queue.
So is there a better way to do this? Please give your suggestions, thanks.
I found a memcache/beanstalk combination to be the best solution as well, for an implementation where I didn't want a newer but identical job entering a queue.
Until 'named jobs' are done and the software released, that may be one of the better solutions.
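The timestamp check described in the question can be sketched as follows. A dict and a deque stand in for memcached and the beanstalk tube so the example runs on its own; `enqueue`, `work_one` and the task id are hypothetical names, and a real worker would use reserve/delete calls from a beanstalk client instead of `popleft()`.

```python
from collections import deque

latest = {}      # stand-in for memcached: task id -> newest job timestamp
queue = deque()  # stand-in for a beanstalk tube


def enqueue(task_id, timestamp):
    """Queue a job and remember it as the newest one for its task."""
    latest[task_id] = timestamp
    queue.append((task_id, timestamp))


def work_one(handler):
    """Take one job off the queue; process it only if it is the newest
    job for its task. The job is removed from the queue either way."""
    task_id, timestamp = queue.popleft()
    if latest.get(task_id) == timestamp:
        handler(task_id, timestamp)
        return True
    return False  # stale job: skipped


processed = []
enqueue("sync-feed", 100)
enqueue("sync-feed", 160)  # supersedes the job queued at 100
results = [work_one(lambda t, ts: processed.append((t, ts))) for _ in range(2)]
print(results, processed)
```

The stale job still travels through the queue, so this trades a little wasted reserve/delete work for not needing to find and remove the old job when the new one is added.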