Main Issue
I'm testing how to handle certain task failures, for example a 'TimeLimitExceeded' exception, which instantly kills the task and is not catchable (yes, I'm aware of 'SoftTimeLimit', but it doesn't fit my needs).
First Approach
This is my tasks.py (The worker runs with a --time-limit flag):
import logging
import time

from celery import Celery

app = Celery('tasks', broker='pyamqp://guest@localhost//')
logger = logging.getLogger(__name__)


def my_fail(task, exc, req_id, req_args, req_kwargs, einfo, *ext_args, **kwargs):
    # Failure handler: log the arguments the failed task was called with
    logger.info("args: %r", req_args)
    logger.info("kw: %r", req_kwargs)


@app.task(on_failure=my_fail)
def sum(x, y, delay=0, **kw):
    result = x + y
    if result == 4:
        raise Exception("Some Error")
    time.sleep(delay)
    return x + y
The main idea is that, when a task fails, I can perform some handling based on the args/kwargs of the task.
For example if I run sum.delay(3, 1, foo="bar") the Exception("Some Error") is raised and the following is logged:
[2019-06-30 17:21:45,120: INFO/Worker-1] args: (3, 1)
[2019-06-30 17:21:45,121: INFO/Worker-1] kw: {'foo': 'bar'}
[2019-06-30 17:21:45,122: ERROR/MainProcess] Task tasks.sum[9e9de032-1469-44e7-8932-4c490fcee2e3] raised unexpected: Exception('Some Error',)
Traceback (most recent call last):
File "/home/apernin/.virtualenvs/dr/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/home/apernin/.virtualenvs/dr/local/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
return self.run(*args, **kwargs)
File "/home/apernin/test/tasks.py", line 89, in sum
raise Exception("Some Error")
Exception: Some Error
Note the args/kwargs are printed by my on-failure handler.
Now if I run sum.delay(3, 2, delay=7), the time limit is triggered:
[2019-06-30 17:23:15,244: INFO/MainProcess] Received task: tasks.sum[8c81398b-4378-401d-a674-a3bd3418ccde]
[2019-06-30 17:23:21,070: ERROR/MainProcess] Task tasks.sum[8c81398b-4378-401d-a674-a3bd3418ccde] raised unexpected: TimeLimitExceeded(5.0,)
Traceback (most recent call last):
File "/home/apernin/.virtualenvs/dr/local/lib/python2.7/site-packages/billiard/pool.py", line 645, in on_hard_timeout
raise TimeLimitExceeded(job._timeout)
TimeLimitExceeded: TimeLimitExceeded(5.0,)
[2019-06-30 17:23:21,071: ERROR/MainProcess] Hard time limit (5.0s) exceeded for tasks.sum[8c81398b-4378-401d-a674-a3bd3418ccde]
[2019-06-30 17:23:21,629: ERROR/MainProcess] Process 'Worker-1' pid:15472 exited with 'signal 15 (SIGTERM)'
Note the args/kwargs are not printed, because the on-failure handler is not executed. This is somewhat to be expected given the nature of Celery's hard time limit.
Second Approach
My second approach is to use an event listener.
from celery import Celery

def my_monitor(app):
    state = app.events.State()

    def announce_failed_tasks(event):
        state.event(event)
        # task name is sent only with -received event, and state
        # will keep track of this for us.
        task = state.tasks.get(event['uuid'])

    with app.connection() as connection:
        recv = app.events.Receiver(connection, handlers={
            'task-failed': announce_failed_tasks,
        })
        recv.capture(limit=None, timeout=None, wakeup=True)

if __name__ == '__main__':
    app = Celery(broker='amqp://guest@localhost//')
    my_monitor(app)
The only info I was able to retrieve was the task uuid; I wasn't able to retrieve the name, args or kwargs of the task (the task object has these attributes, but they are all None).
Question
Is there a way to either:
Make the on_failure handler run in case of a hard time limit?
Retrieve the tasks args/kwargs of a task with a task-failed event listener?
Thanks in advance
First, the timeout is handled by the worker (the MainProcess), and it is not treated the same way as failures that happen inside the task, such as raised exceptions. This is why the log shows TimeLimitExceeded being raised by the MainProcess. So, unfortunately, you can't rely on the same logic.
However, your second approach will prove useful in tracking down what is going on.
I have developed (in-house) a Celery monitoring tool that grabs all the events and populates a database with them, so that later we can do all sorts of analytics (for example average and worst running times, frequency of failures, etc.).
In order to get the details you need alongside the data given by the task-failed event, you also need to record the task-received event data (for example by storing it in a dictionary). This information contains the args, the task name, and all sorts of other useful information you may need. You relate the two events by the task UUID.
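The correlation described above can be sketched without a broker as follows. This is a minimal illustration, not the in-house tool: FailedTaskTracker and its method names are hypothetical, and the simulated event dictionaries carry only a few of the fields Celery sends with task-received (uuid, name, args, kwargs) and task-failed (uuid, exception).

```python
class FailedTaskTracker:
    """Remember task-received data so task-failed events can be enriched."""

    def __init__(self):
        # uuid -> details captured from the task-received event
        self.received = {}
        # enriched records, one per failed task
        self.failures = []

    def on_task_received(self, event):
        self.received[event['uuid']] = {
            'name': event.get('name'),
            'args': event.get('args'),
            'kwargs': event.get('kwargs'),
        }

    def on_task_failed(self, event):
        # pop the entry so failed tasks do not accumulate in the dictionary
        details = self.received.pop(event['uuid'], {})
        record = {'uuid': event['uuid'], 'exception': event.get('exception')}
        record.update(details)
        self.failures.append(record)
        return record


# Simulated events; a real monitor would register these two methods as the
# 'task-received' and 'task-failed' handlers of app.events.Receiver.
tracker = FailedTaskTracker()
tracker.on_task_received(
    {'uuid': 'abc', 'name': 'tasks.sum', 'args': '(3, 2)', 'kwargs': "{'delay': 7}"}
)
failed = tracker.on_task_failed({'uuid': 'abc', 'exception': 'TimeLimitExceeded(5.0,)'})
print(failed['name'], failed['args'], failed['kwargs'])
```

In a long-running monitor, entries for tasks that finish successfully would also have to be removed (for example in a task-succeeded handler), and the worker has to be started with events enabled (the -E option) for any of these events to be sent.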
Related
In my pytest test I am not doing much: I just run uiautomator dump and pull the created file to a local path. Still, it gives me the warning below:
copporcomm_test/access_try.py::test_unconfig_deamon_is_running_android
/usr/lib/python3/dist-packages/_pytest/threadexception.py:73: PytestUnhandledThreadExceptionWarning: Exception in thread Thread-1
...
warnings.warn(pytest.PytestUnhandledThreadExceptionWarning(msg))
When I comment out the line where I run uiautomator dump, the warning doesn't occur.
Any clue what the problem could be?
My small test case:
def test_android(adb):
    adb_output = adb.shell(command="uiautomator dump", assert_ok=True, timeout=10)
    assert "UI hierchary dumped to" in adb_output
    adb.pull(on_device_path="/sdcard/window_dump.xml", local_path="ui.xml", timeout=10)
    print("Pulled file!")
Of course I could catch the warning but I want also to understand why it happens.
After the comment from Teejay Bruno I modified the code to inspect the threads, and I realized that the active thread count is already 4 at the start.
The printout below is from a run with the thread1 and thread2 lines commented out (in test_android()).
START: Current active thread count: 4
Updated test case using threading:
import threading
import pytest

def thread1_sub(adb):
    print("Create xml file with current activity layout.")
    adb_output = adb.shell(command="uiautomator dump", assert_ok=True, timeout=300)
    assert "UI hierchary dumped to" in adb_output

def thread2_sub(adb):
    print("Pull created xml file to local path.")
    adb.pull(on_device_path="/sdcard/window_dump.xml", local_path="ui.xml", timeout=10)

def test_android(adb):
    thread1 = threading.Thread(target=thread1_sub, args=(adb,), name="Thread1")
    thread2 = threading.Thread(target=thread2_sub, args=(adb,), name="Thread2")
    print("START: Current active thread count: ", threading.active_count())
    thread1.start()
    thread2.start()
    thread2.join()
Now by using threading everything works just fine and all threads are handled.
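Note that the test above starts both threads at once and joins only the second one, so the pull can run before the dump has finished, and an exception raised inside either thread never reaches the test itself. A minimal sketch of one alternative, assuming the dump must complete before the pull: submit each step through concurrent.futures, where result() waits for the step and re-raises its exception in the calling thread. FakeAdb is a hypothetical stand-in for the adb fixture so that the example runs on its own; whether this removes the original warning depends on where Thread-1 is created, which the post does not show.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeAdb:
    """Hypothetical stand-in for the adb fixture used in the test."""

    def __init__(self):
        self.calls = []

    def shell(self, command, assert_ok=True, timeout=10):
        self.calls.append(('shell', command))
        return "UI hierchary dumped to: /sdcard/window_dump.xml"

    def pull(self, on_device_path, local_path, timeout=10):
        self.calls.append(('pull', on_device_path))


def dump_layout(adb):
    output = adb.shell(command="uiautomator dump", assert_ok=True, timeout=300)
    # the misspelling matches the string the original test asserts on
    assert "UI hierchary dumped to" in output


def pull_layout(adb):
    adb.pull(on_device_path="/sdcard/window_dump.xml", local_path="ui.xml", timeout=10)


def run_steps(adb):
    with ThreadPoolExecutor(max_workers=1) as pool:
        # result() blocks until the step is done and re-raises its exception
        pool.submit(dump_layout, adb).result()
        pool.submit(pull_layout, adb).result()


adb = FakeAdb()
run_steps(adb)
print(adb.calls)
```

With this shape, a failing assertion in dump_layout fails the test directly instead of being reported as an unhandled thread exception.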
Hi all,
I have a question regarding Celery. Let’s suppose I have the following Celery tasks:
@celery_app.task
def add(x, y):
    return x + y

@celery_app.task
def task_no(n):
    return f'Finished task {n}.'

@celery_app.task
def add_bunch():
    return chord([add.si(1, 1), add.si(2, 2)])(task_no.si('1'))

@celery_app.task
def do_it_all():
    chain(
        add_bunch.si(),
        task_no.si('2')
    ).apply_async()
If I run do_it_all(), I get the following output:
[INFO/MainProcess] Received task: lumi_translation.celery_tasks.add_bunch[d40dc179-602d-4414-9fbd-ee8d62fe7604]
[INFO/ForkPoolWorker-1] Task lumi_translation.celery_tasks.add_bunch[d40dc179-602d-4414-9fbd-ee8d62fe7604] succeeded in 0.01651039347052574s: <AsyncResult: d5564664-1e6f-445f-a172-442fef547422>
[INFO/MainProcess] Received task: lumi_translation.celery_tasks.add[fbc7288a-1f76-447a-ac2b-906ddaa6c00c]
[INFO/ForkPoolWorker-1] Task lumi_translation.celery_tasks.add[fbc7288a-1f76-447a-ac2b-906ddaa6c00c] succeeded in 0.0005592871457338333s: 2
[INFO/MainProcess] Received task: lumi_translation.celery_tasks.add[472d6142-355d-466b-8ee4-0d8cc7e1d96e]
[INFO/ForkPoolWorker-1] Task lumi_translation.celery_tasks.add[472d6142-355d-466b-8ee4-0d8cc7e1d96e] succeeded in 0.0012424923479557037s: 4
[INFO/MainProcess] Received task: lumi_translation.celery_tasks.task_no[faa013e7-42c5-4321-b132-e749169810ee]
[INFO/ForkPoolWorker-1] Task lumi_translation.celery_tasks.task_no[faa013e7-42c5-4321-b132-e749169810ee] succeeded in 0.0003700973466038704s: 'Finished task 2.'
[INFO/MainProcess] Received task: lumi_translation.celery_tasks.task_no[d5564664-1e6f-445f-a172-442fef547422]
[INFO/ForkPoolWorker-1] Task lumi_translation.celery_tasks.task_no[d5564664-1e6f-445f-a172-442fef547422] succeeded in 0.0003337441012263298s: 'Finished task 1.'
The add_bunch task reports success even though its child tasks have not finished; hence, task 2 finishes before task 1. Is there a way to make the add_bunch task report success only when all the child tasks have finished successfully? In the above example, is there a way to make sure task 1 finishes before task 2?
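add_bunch() does almost nothing itself: it dispatches the chord and immediately returns its AsyncResult (visible as the task's result in the log above). That always succeeds, unless, of course, it can't allocate any more memory, so add_bunch is marked as succeeded long before the chord's tasks have run.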
One of my coworkers showed me a workaround to do this. He said that he had spent a really unreasonable amount of time figuring it out. I am putting it out here so it will save someone else the trouble.
A workable rewrite of the add_bunch() task is:
@celery_app.task(bind=True)
def add_bunch(self):
    self.replace(
        chord(header=[add.si(1, 1), add.si(2, 2)], body=task_no.si('1'))
    )
I have a working Celery 3.1 app which logs some sensitive info. Ideally I would like to have the same log, but without the result part.
Currently it looks like:
worker_1 | [2019-12-10 13:46:40,052: INFO/MainProcess] Task xxxxx succeeded in 13.19569299298746s: yyyyyyy
I would like to have:
worker_1 | [2019-12-10 13:46:40,052: INFO/MainProcess] Task xxxxx succeeded in 13.19569299298746s
How to do that?
Edit:
It seems that this could do the job: https://docs.celeryproject.org/en/3.1/reference/celery.worker.job.html#celery.worker.job.Request.success_msg but I have no idea how to actually use it.
Just in case it's useful to anyone in the near future: I found that in Celery 4.4 the success_msg in the Request class has been moved to the application tracer.
Luckily, it seems this can be easily overridden in your Django app's celery.py like so:
from celery.app import trace
trace.LOG_SUCCESS = """\
Task %(name)s[%(id)s] succeeded in %(runtime)ss\
"""
You can change it to anything you like of course, this just removes the return value portion. Full context here.
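To see why dropping the placeholder is harmless, here is a small stand-alone sketch using only the standard logging module. It assumes the tracer logs the message with a single dictionary of fields, so a key that the format string does not mention is simply never rendered; the field values and the logger name demo.trace are made up for illustration.

```python
import io
import logging

# Format without the return-value placeholder, as in the override above
LOG_SUCCESS = "Task %(name)s[%(id)s] succeeded in %(runtime)ss"

# Capture log output in memory so the rendered message can be inspected
stream = io.StringIO()
logger = logging.getLogger("demo.trace")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False

# Made-up field values; 'return_value' is supplied but never rendered
fields = {
    "name": "tasks.add",
    "id": "1234",
    "runtime": 0.5,
    "return_value": "sensitive data",
}
logger.info(LOG_SUCCESS, fields)

message = stream.getvalue().strip()
print(message)
```

The printed line contains the task name, id and runtime only, which is the shape of log line the question asks for.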
You need to override the success message being sent, remove the return_value format from there.
For that you need to override the Request class, as described here.
You can also override the logging config as mentioned here.
worker_1 | [2019-12-10 13:46:40,052: INFO/MainProcess] Task xxxxx succeeded in 13.19569299298746s: yyyyyyy
yyyyyyy is the value that your function returns. To remove it, simply change what the task returns; in your case a bare return will do.
Spring Batch jobs can be started from the command line by telling the JVM to run CommandLineJobRunner. According to the JavaDoc, running the same command with the added parameter of -stop will stop the Job:
The arguments to this class can be provided on the command line
(separated by spaces), or through stdin (separated by new line). They
are as follows:
jobPath jobIdentifier (jobParameters)*
The command line options are as follows:
jobPath: the xml application context containing a Job
-restart: (optional) to restart the last failed execution
-stop: (optional) to stop a running execution
-abandon: (optional) to abandon a stopped execution
-next: (optional) to start the next in a sequence according to the JobParametersIncrementer in the Job
jobIdentifier: the name of the job or the id of a job execution (for -stop, -abandon or -restart).
jobParameters: 0 to many parameters that will be used to launch a job specified in the form of key=value pairs.
However, on the JavaDoc for the main() method the -stop parameter is not specified. Looking through the code on docjar.com I can't see any use of the -stop parameter where I would expect it to be.
I suspect that it is possible to stop a batch that has been started from the command line but only if the batches being run are backed by a non-transient jobRepository? If running a batch on the command line that only stores its data in HSQL (ie in memory) there is no way to stop the job other than CTRL-C etc?
The stop command is implemented; see the source for CommandLineJobRunner, line 300+:
if (opts.contains("-stop")) {
    List<JobExecution> jobExecutions = getRunningJobExecutions(jobIdentifier);
    if (jobExecutions == null) {
        throw new JobExecutionNotRunningException("No running execution found for job=" + jobIdentifier);
    }
    for (JobExecution jobExecution : jobExecutions) {
        jobExecution.setStatus(BatchStatus.STOPPING);
        jobRepository.update(jobExecution);
    }
    return exitCodeMapper.intValue(ExitStatus.COMPLETED.getExitCode());
}
The stop switch will work, but it will only stop the job after the currently executing step completes. It won't kill the job immediately.
Under some conditions, I want to make a celery task fail from within that task. I tried the following:
from celery.task import task
from celery import states

@task()
def run_simulation():
    if some_condition:
        run_simulation.update_state(state=states.FAILURE)
        return False
However, the task still reports to have succeeded:
Task sim.tasks.run_simulation[9235e3a7-c6d2-4219-bbc7-acf65c816e65]
succeeded in 1.17847704887s: False
It seems that the state can only be modified while the task is running and once it is completed - celery changes the state to whatever it deems is the outcome (refer to this question). Is there any way, without failing the task by raising an exception, to make celery return that the task has failed?
To mark a task as failed without raising an exception, update the task state to FAILURE and then raise an Ignore exception, because returning any value will record the task as successful. An example:
from celery import Celery, states
from celery.exceptions import Ignore

app = Celery('tasks', broker='amqp://guest@localhost//')

@app.task(bind=True)
def run_simulation(self):
    if some_condition:
        # manually update the task state
        self.update_state(
            state=states.FAILURE,
            meta='REASON FOR FAILURE'
        )
        # ignore the task so no other state is recorded
        raise Ignore()
But the best way is to raise an exception from your task. You can create a custom exception to track these failures:
class TaskFailure(Exception):
    pass
And raise this exception from your task:
if some_condition:
    raise TaskFailure('Failure reason')
I'd like to further expand on Pierre's answer as I've encountered some issues using the suggested solution.
To allow custom fields when updating a task's state to states.FAILURE, it is important to also mock some attributes that a FAILURE state would have (notice exc_type and exc_message)
While the solution will terminate the task, any attempt to query the state (For example - to fetch the 'REASON FOR FAILURE' value) will fail.
Below is a snippet for reference I took from:
https://www.distributedpython.com/2018/09/28/celery-task-states/
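import traceback

from celery import states
from celery.exceptions import Ignore

@app.task(bind=True)
def task(self):
    try:
        raise ValueError('Some error')
    except Exception as ex:
        # mimic the fields a real FAILURE result carries, plus custom ones
        self.update_state(
            state=states.FAILURE,
            meta={
                'exc_type': type(ex).__name__,
                'exc_message': traceback.format_exc().split('\n'),
                'custom': '...'
            })
        raise Ignore()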
I got an interesting reply on this question from Ask Solem, where he proposes an 'after_return' handler to solve the issue. This might be an interesting option for the future.
In the meantime I solved the issue by simply returning a string 'FAILURE' from the task when I want to make it fail and then checking for that as follows:
from celery.result import AsyncResult

result = AsyncResult(task_id)
if result.state == 'FAILURE' or (result.state == 'SUCCESS' and result.get() == 'FAILURE'):
    # Failure processing task
    ...
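The check above can be wrapped in a small helper so that callers do not repeat the sentinel comparison. This is only a sketch: FAILURE_SENTINEL and task_failed are hypothetical names, and the function takes the state and the return value as plain arguments (in practice result.state and result.get()) so that it runs without a broker.

```python
# Hypothetical sentinel: the string the task returns to signal a "soft" failure
FAILURE_SENTINEL = 'FAILURE'


def task_failed(state, value=None):
    """Return True for a real failure or for a success carrying the sentinel."""
    if state == 'FAILURE':
        return True
    return state == 'SUCCESS' and value == FAILURE_SENTINEL


print(task_failed('FAILURE'))
print(task_failed('SUCCESS', 'FAILURE'))
print(task_failed('SUCCESS', 42))
```

One drawback of the sentinel approach is that any task that legitimately returns the string 'FAILURE' would be misclassified, so the sentinel value should be chosen with that in mind.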