I need to schedule a task for the first day of each month. Up until now, I have been using this:
system.scheduler.schedule(0.microseconds, 30.days, schedulerActor, "update")
But as you may have guessed, this sometimes ends up running the task twice in a month (March) or not at all (February). Is there a better way to schedule the task for the first day of each month using the Akka Scheduler?
The built-in Akka scheduler is more of a delayer than a true scheduler. I would recommend using akka-quartz-scheduler. This module lets you actually schedule tasks to run when you want.
The usage is simple. Some config:
akka {
  quartz {
    schedules {
      YourScheduleName {
        description = "A cron job that fires off every first of the month at 5AM"
        expression = "0 0 5 1 1/1 ? *"
      }
    }
  }
}
And then in the code:
case object Tick
val yourActor = system.actorOf(Props[YourActor])
QuartzSchedulerExtension(system).schedule("YourScheduleName", yourActor, Tick)
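For completeness, the receiving actor only needs to handle the Tick message; a minimal sketch (YourActor stands in for whatever actor does the monthly work) could be:
import akka.actor.Actor

class YourActor extends Actor {
  def receive = {
    case Tick => // run the first-of-the-month job here
  }
}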
I am creating multiple futures and I am expecting only one to achieve the desired goal.
How can I cancel all other futures from within a future?
This is how I create futures:
jobs = days_to_scan.map{ |day|
  Concurrent::Future.execute do
    sleep_time = day.to_f / days_to_scan.count.to_f * seconds_to_complete.to_f
    sleep(sleep_time)
    if GoogleAPI.new.api_call(@adwords, ad_seeder, visitor, day)
      # How to cancel other futures here?
    end
  end
}
I might be late to the party but I'm gonna reply anyway as other people might stumble upon this question.
So what you probably want is to force-shutdown the thread pool as soon as one Future finishes:
class DailyJobs
  def call
    thread_pool = ::Concurrent::CachedThreadPool.new
    jobs = days_to_scan.map{ |day|
      Concurrent::Future.execute(executor: thread_pool) do
        sleep_time = day.to_f / days_to_scan.count.to_f * seconds_to_complete.to_f
        sleep(sleep_time)
        if GoogleAPI.new.api_call(@adwords, ad_seeder, visitor, day)
          # One Future succeeded: force-stop the others by killing the shared pool
          thread_pool.kill
        end
      end
    }
  end
end
The thing is: killing a thread pool is not really recommended and might have unpredictable results.
A better approach is to track when a Future is done and have the other Futures bail out early:
class DailyJobs
  def call
    status = ::Concurrent::AtomicBoolean.new(false)
    days_to_scan.map{ |day|
      Concurrent::Future.execute do
        next if status.true? # Bail out early so this Future does nothing
        sleep_time = day.to_f / days_to_scan.count.to_f * seconds_to_complete.to_f
        sleep(sleep_time)
        if GoogleAPI.new.api_call(@adwords, ad_seeder, visitor, day)
          # Do your thing
          status.value = true # This lets the remaining Futures know that at least one already completed
        end
      end
    }
  end
end
It is worth noting that if this is a Rails application, you probably want to wrap your Future in the Rails executor to avoid autoloading and deadlock issues. I wrote about it here.
Okay, I could implement it as:
# wait until one job has achieved the goal, or until all jobs have finished
while jobs.select{ |job| job.value == success_value }.count == 0 && jobs.select{ |job| [:rejected, :fulfilled].include?(job.state) }.count != jobs.count
  sleep(0.1)
end

# cancel the other jobs
jobs.each{ |job| job.cancel unless (job.state == :fulfilled && job.value == success_value) }
I want to accomplish something like this:
results = []
for i in range(N):
    data = generate_data_slowly()
    res = tasks.process_data.apply_async(data)
    results.append(res)

celery.collect(results).then(tasks.combine_processed_data())
i.e. launch asynchronous tasks over a long period of time, then schedule a dependent task that will only be executed once all the earlier tasks are complete.
I've looked at things like chain and chord, but it seems like they only work if you can construct your task graph completely upfront.
For anyone interested, I ended up using this snippet:
@app.task(bind=True, max_retries=None)
def wait_for(self, task_id_or_ids):
    try:
        ready = app.AsyncResult(task_id_or_ids).ready()
    except TypeError:
        ready = all(app.AsyncResult(task_id).ready()
                    for task_id in task_id_or_ids)
    if not ready:
        self.retry(countdown=2**self.request.retries)
And writing the workflow something like this:
task_ids = []
for i in range(N):
    task = (generate_data_slowly.si(i) |
            process_data.si(i))
    task_id = task.delay().task_id
    task_ids.append(task_id)

final_task = (wait_for.si(task_ids) |
              combine_processed_data.si())
final_task.delay()
That way combine_processed_data effectively runs only after all the earlier tasks, since wait_for keeps retrying until every task in task_ids is complete.
The solution depends entirely on how and where the data are collected. Roughly, given that generate_data_slowly and tasks.process_data run one after the other, a better approach would be to join both in one task (or a chain) and to group them.
chord will allow you to add a callback to that group.
The simplest example would be:
from celery import chord

@app.task
def getnprocess_data():
    data = generate_data_slowly()
    return whatever_process_data_does(data)

header = [getnprocess_data.s() for i in range(N)]
callback = combine_processed_data.s()

chord(header)(callback).get()
Technically I could install cron on the machine and curl the URL, but I'm trying to avoid that. Is there any way to accomplish this?
The reason I want to avoid cron is so I can easily change the schedule or stop it completely, without also having to SSH into the machine to do so.
Take a look at: https://github.com/enragedginger/akka-quartz-scheduler.
Refer to http://quartz-scheduler.org/api/2.1.7/org/quartz/CronExpression.html for valid CronExpressions and examples.
An example taken from the docs: a schedule called Every30Seconds which, aptly, fires off every 30 seconds:
akka {
  quartz {
    schedules {
      Every30Seconds {
        description = "A cron job that fires off every 30 seconds"
        expression = "*/30 * * ? * *"
        calendar = "OnlyBusinessHours"
      }
    }
  }
}
You can integrate this into your Play! application (probably in your Global object).
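For example, a minimal sketch of that wiring, assuming Play 2.x with a GlobalSettings object (the CurlActor class and Tick message are made up for illustration, and the schedule name has to match the config above):
import akka.actor.{Actor, Props}
import com.typesafe.akka.extension.quartz.QuartzSchedulerExtension
import play.api.{Application, GlobalSettings}
import play.api.libs.concurrent.Akka

case object Tick

class CurlActor extends Actor {
  def receive = {
    case Tick => // hit your URL / do the scheduled work here
  }
}

object Global extends GlobalSettings {
  override def onStart(app: Application): Unit = {
    super.onStart(app)
    val system = Akka.system(app)
    val receiver = system.actorOf(Props[CurlActor])
    // "Every30Seconds" must match the schedule name defined in the configuration
    QuartzSchedulerExtension(system).schedule("Every30Seconds", receiver, Tick)
  }
}
Starting and stopping the job then becomes a configuration or code change in the application itself, with no cron entry on the machine.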
You can use the Akka scheduler.
import scala.concurrent.duration._
import play.api.libs.concurrent.Akka
import play.api.libs.concurrent.Execution.Implicits.defaultContext

val scheduler = Akka.system(app).scheduler
scheduler.schedule(0.seconds, 1.hour) {
  // run this block every hour
}
The first parameter is a delay, so if you wanted to delay to a specific time you could easily calculate the target time with some simple date arithmetic.
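For example, a rough sketch of that arithmetic using java.time (the scheduler value and the implicit execution context come from the snippet above; running every day at 03:00 is just an illustrative choice):
import java.time.{LocalTime, ZoneId, ZonedDateTime}
import scala.concurrent.duration._

// Delay from now until the next occurrence of the given local wall-clock time.
def delayUntil(timeOfDay: LocalTime): FiniteDuration = {
  val zone = ZoneId.systemDefault()
  val now = ZonedDateTime.now(zone)
  val todayRun = now.toLocalDate.atTime(timeOfDay).atZone(zone)
  val nextRun = if (todayRun.isAfter(now)) todayRun else todayRun.plusDays(1)
  java.time.Duration.between(now, nextRun).toMillis.millis
}

// e.g. run the block once a day, starting at the next 03:00
scheduler.schedule(delayUntil(LocalTime.of(3, 0)), 24.hours) {
  // the recurring work goes here
}
Note that a fixed 24-hour interval will drift across daylight-saving changes, which is exactly the kind of case where the quartz-based answer above is the safer choice.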
Check out https://github.com/philcali/cronish
Some example code from README.md:
val payroll = task {
  println("You have just been paid... Finally!")
}
// Yes... that's how you run it
payroll executes "every last Friday in every month"
val greetings = job (println("Hello there")) describedAs "General Greetings"
// give a delayed start
val delayed = greetings runs "every day at 7:30" in 5.seconds
// give an exact time to start
val exact = greetings runs "every day at noon" starting now + 1.week
// resets a job to its definition
val reseted = exact.reset()
reseted starting now + 1.day
How mature is Chronos? Is it a viable alternative to a scheduler like celery-beat?
Right now our scheduling implements a periodic "heartbeat" task that checks for "outstanding" events and fires them if they are overdue. We are using python-dateutil's rrule for defining this.
We are looking at alternatives to this approach, and Chronos seems a very attractive alternative: 1) it would remove the need for a heartbeat scheduling task, 2) it supports RESTful submission of events in ISO 8601 format, 3) it has a useful management interface, and 4) it scales.
The crucial requirement is that scheduling needs to be configurable on the fly from the web interface. This is why we can't use celery-beat's built-in scheduling out of the box.
Are we going to shoot ourselves in the foot by switching over to Chronos?
This SO question has solutions to your dynamic periodic task problem. The snippet below is not the accepted answer at the moment:
from datetime import datetime

from django.db import models
from djcelery.models import PeriodicTask, IntervalSchedule


class TaskScheduler(models.Model):
    periodic_task = models.ForeignKey(PeriodicTask)

    @staticmethod
    def schedule_every(task_name, period, every, args=None, kwargs=None):
        """Schedules a task by name every "every" "period". So an example call would be:
        TaskScheduler.schedule_every('mycustomtask', 'seconds', 30, [1, 2, 3])
        That would schedule your custom task to run every 30 seconds with the arguments 1, 2 and 3 passed to the actual task.
        """
        permissible_periods = ['days', 'hours', 'minutes', 'seconds']
        if period not in permissible_periods:
            raise Exception('Invalid period specified')
        # create the periodic task and the interval
        ptask_name = "%s_%s" % (task_name, datetime.now())  # create some name for the periodic task
        interval_schedules = IntervalSchedule.objects.filter(period=period, every=every)
        if interval_schedules:  # just check if interval schedules like that already exist and reuse them
            interval_schedule = interval_schedules[0]
        else:  # create a brand new interval schedule
            interval_schedule = IntervalSchedule()
            interval_schedule.every = every  # should check to make sure this is a positive int
            interval_schedule.period = period
            interval_schedule.save()
        ptask = PeriodicTask(name=ptask_name, task=task_name, interval=interval_schedule)
        if args:
            ptask.args = args
        if kwargs:
            ptask.kwargs = kwargs
        ptask.save()
        return TaskScheduler.objects.create(periodic_task=ptask)

    def stop(self):
        """Pauses the task."""
        ptask = self.periodic_task
        ptask.enabled = False
        ptask.save()

    def start(self):
        """Starts the task."""
        ptask = self.periodic_task
        ptask.enabled = True
        ptask.save()

    def terminate(self):
        self.stop()
        ptask = self.periodic_task
        self.delete()
        ptask.delete()
I haven't used djcelery yet, but it supposedly has an admin interface for dynamic periodic tasks.
I have created DNN scheduled task on my website to generate a report of all users created since the last run of the task. I want to do this so that the report can be configured to generate daily, weekly, monthly or any other duration, just by changing the properties of the scheduled task in DNN.
My problem is that I am not sure how to get the "last run date" of a task inside my dll. It is not clear if this is possible, and if it is, then which property of the ScheduleHistoryItem object I should use.
(DNN v5.6.2)
Yes, it is possible. Once you have pulled the list of ScheduleHistoryItems you want via the SchedulingProvider.Instance().GetScheduleHistory function, you can sort the list with the built-in ScheduleHistorySortStartDate IComparer. The function below returns the last ScheduleHistoryItem that ran; you can then check the EndDate property of the result to determine when the task last completed.
public DotNetNuke.Services.Scheduling.ScheduleHistoryItem GetLastScheduleHistoryItem(int ScheduleId = -1)
{
    System.Collections.ArrayList scheduleHistory = DotNetNuke.Services.Scheduling.SchedulingProvider.Instance().GetScheduleHistory(ScheduleId);
    if (scheduleHistory != null)
    {
        scheduleHistory.Sort(new DotNetNuke.Services.Scheduling.ScheduleHistorySortStartDate()); // Sort the returned results by the start date
        if (scheduleHistory.Count > 0)
            return (DotNetNuke.Services.Scheduling.ScheduleHistoryItem)scheduleHistory[0];
    }
    return null;
}