Setting a time limit on a specific task with Celery

I have a task in Celery that could potentially run for 10,000 seconds while operating normally. However, all the rest of my tasks should finish in under one second. How can I set a time limit for the intentionally long-running task without changing the time limit on the short-running tasks?

You can set task time limits (hard and/or soft) either when defining a task or when calling it:
from celery.exceptions import SoftTimeLimitExceeded

# a soft time limit is required for SoftTimeLimitExceeded to be raised;
# the hard time_limit simply kills the worker process
@celery.task(time_limit=20, soft_time_limit=15)
def mytask():
    try:
        return do_work()
    except SoftTimeLimitExceeded:
        cleanup_in_a_hurry()
or
mytask.apply_async(args=[], kwargs={}, time_limit=30, soft_time_limit=10)

And here is an example using the decorator for a specific task on Celery 3.1.23, with soft_time_limit=10000:
@task(bind=True, default_retry_delay=30, max_retries=3, soft_time_limit=10000)
def process_task(self, task_instance):
    """Task processing."""
    pass
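If you'd rather keep the limits out of the task code, Celery's configuration annotations can also set them per task name. A minimal sketch in the 3.x-style settings module; the task name and numbers here are illustrative, not from the question:

# celeryconfig.py -- 3.x-style settings; task name and values are illustrative
CELERY_ANNOTATIONS = {
    'app.tasks.process_task': {
        'soft_time_limit': 10000,  # raise SoftTimeLimitExceeded after ~2.8 h
        'time_limit': 10100,       # hard-kill shortly after the soft limit
    },
}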


Dash - Run callback on the server side

Good morning,
I have created a callback in Dash that does the job of a scheduler.
Every 10 minutes (with the help of an Interval component), my callback runs to fetch data from a server and update the CSV file that I use in my app.
The problem is that my callback runs only while I have the webpage open. As soon as I close the page, the scheduler stops, and it runs again when I open the page again.
As the data update process can sometimes be long, I want the scheduler to always run and fetch the data every 10 minutes.
I assume that a callback is a client-side process, right? So how can I make it run on the server side?
Thank you,
Dash is probably not the right solution for this. I think it would make more sense to set up the Python code you need for this job in a simple .py script file, and set a cron job to run that script every 10 min.
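For example, a minimal sketch, assuming the fetch logic lives in a hypothetical fetch_data.py; the URL and paths are placeholders:

# fetch_data.py -- URL and paths are placeholders
import pandas as pd
import requests

resp = requests.get('https://example.com/api/data')  # hypothetical endpoint
resp.raise_for_status()
pd.DataFrame(resp.json()).to_csv('/srv/app/data.csv', index=False)

# crontab entry to run it every 10 minutes:
# */10 * * * * /usr/bin/python3 /srv/app/fetch_data.py

The Dash app then only reads the CSV; the refresh happens entirely server-side, whether or not a browser is open.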
Thank you @coralvanda for the help.
I finally wrote a Python script in my container A that restarts container B every 10 minutes. Container B fetches the data.
It does the job:
import schedule
import time
import docker

def worker_restart():
    # Restart the worker container, which fetches the data on startup
    client = docker.from_env()
    container = client.containers.get('container_worker')
    container.restart()

schedule.every(10).minutes.do(worker_restart)

while True:
    schedule.run_pending()
    time.sleep(1)

How to configure workflow runtime (Informatica)?

I would like the workflow to run for at most 20 minutes. If the running process does not complete within 20 minutes, the workflow should be ended immediately. However, I can only find the Timer task, which is used for starting a process after the indicated time, which is not what I'm looking for. Does anyone know how to specify the duration for the workflow?
[Updated to cover the complete scenario and address the issues raised by Koushik Sinharoy in the comments.]
You can achieve it using the Timer task:
1. Link one to the Start task and set it to run for 20 minutes, in parallel with your session.
2. Add a Decision task with "Treat the input links as" set to OR. This will trigger the decision whenever any of the preceding tasks ends, so either your session completes or the timer runs out (whichever happens first).
3. Set the Decision condition to $s_your_session.Status = SUCCEEDED.
4. Link the Decision task to a Control task.
5. Set the Control task to "Fail parent".
6. Add the condition $Decision.Condition = False to the link between the Decision task and the Control task.
This should be the result:
Start--->s_your_session--\
     \                    > Decision [OR] ---(False)---> Control Task [Fail parent]
      \-->timer----------/
Thanks Koushik Sinharoy for the remarks below!

django-celery PeriodicTask and eta field

I have a Django project in combination with Celery, and I need to be able to schedule tasks dynamically, at some point in the future, with recurrence or not. I also need the ability to delete/edit already scheduled tasks.
To achieve this, I started out using django-celery with the DatabaseScheduler to store PeriodicTasks (with expiration) in the database, more or less as described here.
This way, if I close my app and start it again, my schedules are still there.
My problem still remains, though, since I cannot use eta to schedule a task at some point in the future. Is it possible to dynamically schedule a task with eta somehow?
A second question is whether I can schedule a once-off task, e.g. schedule it to run at 2015-05-15 15:50:00 (that is why I'm trying to use eta).
Finally, I will be scheduling some thousands of notifications. Is celery beat capable of handling this number of scheduled tasks, some of them once-off while others are periodic? Or do I have to go with a more advanced solution such as APScheduler?
Thank you
I faced the same problem yesterday. My ugly temporary solution is:
# tasks.py
from djcelery.models import PeriodicTask, IntervalSchedule
from datetime import timedelta, datetime
from django.utils.timezone import now
...

@app.task
def schedule_periodic_task(task='app.tasks.task', task_args=[], task_kwargs={},
                           interval=(1, 'minute'),
                           expires=now()+timedelta(days=365*100)):
    PeriodicTask.objects.filter(name=task+str(task_args)+str(task_kwargs)).delete()
    task = PeriodicTask.objects.create(
        name=task+str(task_args)+str(task_kwargs), task=task,
        args=str(task_args),
        kwargs=str(task_kwargs),
        interval=IntervalSchedule.objects.get_or_create(
            every=interval[0],
            period=interval[1])[0],
        expires=expires,
    )
    task.save()
So, if you want to schedule a periodic task with eta, you should:
# anywhere.py
schedule_periodic_task.apply_async(
    kwargs={'task': 'grabber.tasks.grab_events',
            'task_args': [instance.xbet_id], 'task_kwargs': {},
            'interval': (10, 'seconds'),
            'expires': instance.start + timedelta(hours=3)},
    eta=instance.start,
)
That is, you schedule a task with eta, and that task creates the periodic task. The ugly parts:
- dealing with the raw task name
- the strange period format (n, 'interval')
Please let me know if you design a prettier solution.
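As for the once-off case in the second question: plain Celery already accepts eta on apply_async, so no PeriodicTask is needed. A minimal sketch; the task and arguments are hypothetical:

# one-off task at a fixed time; notify_user is a hypothetical task
from datetime import datetime
from app.tasks import notify_user

user_id = 42  # example argument
notify_user.apply_async(args=[user_id], eta=datetime(2015, 5, 15, 15, 50))

Keep in mind that a task with a far-future eta sits with the broker/worker until it is due, which is one reason beat-style schedulers are usually preferred for long horizons.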

How big is the delay of the Akka scheduler?

I wrote an actor that uses the scheduler to assign a timestamp to a field every 5 milliseconds.
In testing, I found that (System.currentTimeMillis() - timeField) is at least 35.
If I instead use the scheduleAtFixedRate() method on a scheduled thread pool from Executors, the result is correct.
So does the scheduler delay have a minimum value?
Answer: the default tick duration is 100 milliseconds.
ScheduledThreadPoolExecutor uses System.nanoTime():
http://fuseyism.com/classpath/doc/java/util/concurrent/ScheduledThreadPoolExecutor-source.html
See comparison here:
System.currentTimeMillis vs System.nanoTime
Akka has fairly extensive documentation.
Here's an excerpt:
"The default implementation of Scheduler used by Akka is based on the Netty HashedWheelTimer. It does not execute tasks at the exact time, but on every tick, it will run everything that is overdue. The accuracy of the default Scheduler can be modified by the “ticks-per-wheel” and “tick-duration” configuration properties. For more information, see: HashedWheelTimers."

tasknumber in MATLAB distributed jobs?

I'm running a distributed job on a cluster. I need to execute a script that sends me an email when the last task finishes (rather, when all the tasks are complete). I have my script ready, but I'm not sure how to go about detecting task completion. Is there a task ID analogous to labindex?
The reason I want to build this email feature into the job is so that I can just quit MATLAB after submission and collect my data when it's done. That way I won't waste resources pinging it often to get its state.
jobMgr = findResource(parameters for your cluster's job manager...);
job = createJob(jobMgr);
set(job, 'JobData', yourdata);
set(job, 'MaximumNumberOfWorkers', yourmaxworkers);
set(job, 'PathDependencies', yourpathdeps);
set(job, 'FileDependencies', yourfiledeps);
set(job, 'Timeout', yourtimeout);
for m = 1:numjobs
    task(m) = createTask(job, @parallelfoo, 1, {m});
    % Calls taskFinish when the task completes
    set(task(m), 'FinishedFcn', {@taskFinish, m});
end
Elsewhere, you'll have defined a function taskFinish that gets a callback when each task completes.
function taskFinish(taskObj, eventData, tasknum)
    disp(['Task ' num2str(tasknum) ' completed']);
end
Note, this code was written for the original release of the Distributed Computing Toolbox (which was subsequently renamed the Parallel Computing Toolbox), so it's possible that there are more elegant ways of accomplishing what you're trying to do. This gets the job done though, with one caveat -- my understanding is that this callback functionality only works if you're running the MATLAB job manager on your cluster (not one of the third party MPI job managers such as TORQUE).
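To get the email itself, one option is to send it from the taskFinish callback once every task has reported in. A minimal sketch; the SMTP server, addresses, and task count are placeholders:

function taskFinish(taskObj, eventData, tasknum)
    % Count completions across callback invocations
    persistent numDone
    if isempty(numDone), numDone = 0; end
    numDone = numDone + 1;
    disp(['Task ' num2str(tasknum) ' completed']);
    numTasksExpected = 100;  % placeholder: match your numjobs
    if numDone == numTasksExpected
        setpref('Internet', 'SMTP_Server', 'smtp.example.com');  % placeholder host
        setpref('Internet', 'E_mail', 'cluster@example.com');    % placeholder sender
        sendmail('me@example.com', 'MATLAB job finished', 'All tasks completed.');
    end
end

Note the same caveat applies: the callback runs in the client session, so this works only while that session stays open.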
I'm not sure if this answers much of your question, but you can get at the Task ID when running on the workers like so:
t = getCurrentTask();
tid = t.ID;
However, note that most schedulers execute tasks in an arbitrary order...