Celery task STARTED permanently (not retried) - celery

We use a Celery worker in a Docker container. If the container is killed (for example, the container is changed and brought back up) we need the task to be retried. My task currently looks like this:
@app.task(bind=True, max_retries=3, default_retry_delay=5, task_acks_late=True)
def build(self, config, import_data):
    build_chain = chain(
        build_dataset_docstore.s(config, import_data),
        build_index.s(),
        assemble_bundle.s()
    ).on_error(handle_chain_error.s())
    return build_chain

@app.task(bind=True, max_retries=3, default_retry_delay=5, task_acks_late=True)
def build_dataset_docstore(self, config, import_data):
    # do lots of stuff

@app.task(bind=True, max_retries=3, default_retry_delay=5, task_acks_late=True)
def build_index(self, config, import_data):
    # do lots of stuff

@app.task(bind=True, max_retries=3, default_retry_delay=5, task_acks_late=True)
def assemble_bundle(self, config, import_data):
    # do lots of stuff
To imitate the container being restarted (worker being killed) I'm running the following script:
SLEEP_FOR=1
echo "-> Killing worker"
docker-compose -f docker/docker-compose-dev.yml kill worker
echo "-> Waiting $SLEEP_FOR seconds"
sleep $SLEEP_FOR
echo "-> Bringing worker back to life"
docker-compose -f docker/docker-compose-dev.yml start worker
Looking in Flower I see the task is STARTED... cool, but...
Why isn't it retried?
Do I need to handle this circumstance manually?
If so, what is the correct way to do this?
EDIT:
from the docs:
If the worker won’t shutdown after considerate time, for being stuck in an infinite-loop or similar, you can use the KILL signal to force terminate the worker: but be aware that currently executing tasks will be lost (i.e., unless the tasks have the acks_late option set).
I'm using the acks late option, so why isn't this retrying?

The issue here seems to be task_acks_late (https://docs.celeryproject.org/en/latest/userguide/configuration.html#task-acks-late), which I assume is a setting for the Celery app, being passed as a parameter on the task.
I updated task_acks_late to acks_late and added reject_on_worker_lost, and this now functions as expected.
Thus:
@app.task(bind=True, max_retries=3, default_retry_delay=5, acks_late=True, reject_on_worker_lost=True)
def assemble_bundle(self, config, import_data):
    # do lots of stuff
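For reference, task_acks_late and task_reject_on_worker_lost are the app-level configuration equivalents of the per-task arguments used above; a minimal sketch, assuming the Celery app instance is called app:
# App-wide equivalent of the per-task acks_late / reject_on_worker_lost arguments
# (assumes a Celery app instance named `app`).
app.conf.task_acks_late = True
app.conf.task_reject_on_worker_lost = True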

Related

Periodic Task not running using Celery

I have set up Celery to run a periodic task every 10 seconds that sends a POST request to my Django REST Framework API.
When I run the Celery worker it picks up the task correctly:
[tasks]
. FutureForex.celery.debug_task
. arbitrager.tasks.arb_data_post_request
When I run the beat nothing more gets logged and the POST request is not executed:
[2021-12-10 16:51:24,696: INFO/MainProcess] beat: Starting...
My tasks.py contains the following:
from celery import Celery
from celery.schedules import crontab
from celery.utils.log import get_task_logger
import requests
app = Celery()
logger = get_task_logger(__name__)
@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Calls arb_data_post_request every 10 seconds.
    sender.add_periodic_task(10.0, arb_data_post_request.s(), name='arb_data_post_request')

@app.task
def arb_data_post_request():
    """
    Post request to the Django framework to pull the exchange data and save to the database
    :return:
    """
    request = requests.post('http://127.0.0.1:8000/arbitrager/data/')
    logger.info(request.text)
I believe Celery is installed and set up correctly, as it finds the task. I can provide any settings if required, though.
Any ideas as to why it doesn't kick off the task on the scheduled 10-second interval would be appreciated.
Thanks,
Saul

Is there a function in celery for finding waiting messages in a queue? [duplicate]

How can I retrieve a list of tasks in a queue that are yet to be processed?
EDIT: See other answers for getting a list of tasks in the queue.
You should look here:
Celery Guide - Inspecting Workers
Basically this:
my_app = Celery(...)
# Inspect all nodes.
i = my_app.control.inspect()
# Show the items that have an ETA or are scheduled for later processing
i.scheduled()
# Show tasks that are currently active.
i.active()
# Show tasks that have been claimed by workers
i.reserved()
Use whichever of these matches what you want to see.
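Each of these inspect calls returns a dictionary keyed by worker node name (or None if no worker replied), so reading the results looks roughly like this sketch:
# Sketch: each inspect() method returns {worker_name: [task_info, ...]} or None.
reserved = i.reserved() or {}
for worker, tasks in reserved.items():
    print(f"{worker}: {len(tasks)} reserved task(s)")
    for task in tasks:
        print("  ", task['name'], task['id'])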
If you are using Celery + Django, the simplest way to inspect tasks is to run commands directly from your terminal, in your virtual environment or using a full path to celery:
Doc: http://docs.celeryproject.org/en/latest/userguide/workers.html?highlight=revoke#inspecting-workers
$ celery inspect reserved
$ celery inspect active
$ celery inspect registered
$ celery inspect scheduled
Also, if you are using Celery + RabbitMQ, you can inspect the list of queues using the following command:
More info: https://linux.die.net/man/1/rabbitmqctl
$ sudo rabbitmqctl list_queues
If you are using RabbitMQ, use this in a terminal:
sudo rabbitmqctl list_queues
It will print a list of queues with the number of pending tasks. For example:
Listing queues ...
0b27d8c59fba4974893ec22d478a7093 0
0e0a2da9828a48bc86fe993b210d984f 0
10#torob2.celery.pidbox 0
11926b79e30a4f0a9d95df61b6f402f7 0
15c036ad25884b82839495fb29bd6395 1
celerey_mail_worker#torob2.celery.pidbox 0
celery 166
celeryev.795ec5bb-a919-46a8-80c6-5d91d2fcf2aa 0
celeryev.faa4da32-a225-4f6c-be3b-d8814856d1b6 0
The number in the right column is the number of tasks in the queue. In the output above, the celery queue has 166 pending tasks.
If you don't use prioritized tasks, this is actually pretty simple if you're using Redis. To get the task counts:
redis-cli -h HOST -p PORT -n DATABASE_NUMBER llen QUEUE_NAME
But prioritized tasks use different keys in Redis, so the full picture is slightly more complicated: you need to query Redis for every priority of task. In Python (and from the Flower project), this looks like:
import redis
from django.conf import settings  # assumed: Redis connection details live in Django settings

PRIORITY_SEP = '\x06\x16'
DEFAULT_PRIORITY_STEPS = [0, 3, 6, 9]

def make_queue_name_for_pri(queue, pri):
    """Make a queue name for redis

    Celery uses PRIORITY_SEP to separate different priorities of tasks into
    different queues in Redis. Each queue-priority combination becomes a key in
    redis with names like:

     - batch1\x06\x163 <-- P3 queue named batch1

    There's more information about this in Github, but it doesn't look like it
    will change any time soon:

     - https://github.com/celery/kombu/issues/422

    In that ticket the code below, from the Flower project, is referenced:

     - https://github.com/mher/flower/blob/master/flower/utils/broker.py#L135

    :param queue: The name of the queue to make a name for.
    :param pri: The priority to make a name with.
    :return: A name for the queue-priority pair.
    """
    if pri not in DEFAULT_PRIORITY_STEPS:
        raise ValueError('Priority not in priority steps')
    return '{0}{1}{2}'.format(*((queue, PRIORITY_SEP, pri) if pri else
                                (queue, '', '')))

def get_queue_length(queue_name='celery'):
    """Get the number of tasks in a celery queue.

    :param queue_name: The name of the queue you want to inspect.
    :return: the number of items in the queue.
    """
    priority_names = [make_queue_name_for_pri(queue_name, pri) for pri in
                      DEFAULT_PRIORITY_STEPS]
    r = redis.StrictRedis(
        host=settings.REDIS_HOST,
        port=settings.REDIS_PORT,
        db=settings.REDIS_DATABASES['CELERY'],
    )
    return sum([r.llen(x) for x in priority_names])
If you want to get an actual task, you can use something like:
redis-cli -h HOST -p PORT -n DATABASE_NUMBER lrange QUEUE_NAME 0 -1
From there you'll have to deserialize the returned list. In my case I was able to accomplish this with something like:
import base64, json, pickle
import redis
from django.conf import settings  # assumed: Redis connection details live in Django settings

r = redis.StrictRedis(
    host=settings.REDIS_HOST,
    port=settings.REDIS_PORT,
    db=settings.REDIS_DATABASES['CELERY'],
)
l = r.lrange('celery', 0, -1)
# note: base64.decodestring was removed in Python 3.9; use base64.decodebytes there
pickle.loads(base64.decodestring(json.loads(l[0])['body']))
Just be warned that deserialization can take a moment, and you'll need to adjust the commands above to work with various priorities.
To retrieve tasks from the broker, use this:
from amqplib import client_0_8 as amqp

conn = amqp.Connection(host="localhost:5672", userid="guest",
                       password="guest", virtual_host="/", insist=False)
chan = conn.channel()
name, jobs, consumers = chan.queue_declare(queue="queue_name", passive=True)
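Here jobs is the count of messages currently waiting in the queue; a possible follow-up (just a sketch) is:
# `jobs` holds the number of messages currently waiting in "queue_name".
print("%d messages waiting" % jobs)
chan.close()
conn.close()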
A copy-paste solution for Redis with json serialization:
def get_celery_queue_items(queue_name):
    import base64
    import json

    # Get a configured instance of a celery app:
    from yourproject.celery import app as celery_app

    with celery_app.pool.acquire(block=True) as conn:
        tasks = conn.default_channel.client.lrange(queue_name, 0, -1)
        decoded_tasks = []

    for task in tasks:
        j = json.loads(task)
        body = json.loads(base64.b64decode(j['body']))
        decoded_tasks.append(body)

    return decoded_tasks
It works with Django. Just don't forget to change yourproject.celery.
This worked for me in my application:
def get_celery_queue_active_jobs(queue_name):
    connection = <CELERY_APP_INSTANCE>.connection()

    try:
        channel = connection.channel()
        name, jobs, consumers = channel.queue_declare(queue=queue_name, passive=True)
        active_jobs = []

        def dump_message(message):
            active_jobs.append(message.properties['application_headers']['task'])

        channel.basic_consume(queue=queue_name, callback=dump_message)

        for job in range(jobs):
            connection.drain_events()

        return active_jobs
    finally:
        connection.close()
active_jobs will be a list of strings that correspond to tasks in the queue.
Don't forget to swap out CELERY_APP_INSTANCE with your own.
Thanks to @ashish for pointing me in the right direction with his answer here: https://stackoverflow.com/a/19465670/9843399
The celery inspect module appears to only be aware of the tasks from the workers' perspective. If you want to view the messages that are in the queue (yet to be pulled by the workers) I suggest using pyrabbit, which can interface with the RabbitMQ HTTP API to retrieve all kinds of information from the queue.
An example can be found here:
Retrieve queue length with Celery (RabbitMQ, Django)
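For instance, a minimal pyrabbit sketch, assuming the RabbitMQ management plugin is enabled on port 15672 and the default guest credentials, the "/" vhost and the default celery queue are in use:
from pyrabbit.api import Client

# Assumptions: management plugin on localhost:15672, guest/guest credentials,
# default "/" vhost and the default "celery" queue.
client = Client('localhost:15672', 'guest', 'guest')
waiting = client.get_queue_depth('/', 'celery')
print("%d messages waiting in the celery queue" % waiting)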
I think the only way to get the tasks that are waiting is to keep a list of tasks you started and let the task remove itself from the list when it's started.
With rabbitmqctl and list_queues you can get an overview of how many tasks are waiting, but not the tasks themselves: http://www.rabbitmq.com/man/rabbitmqctl.1.man.html
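A rough sketch of that bookkeeping approach (not from the answer; the broker URL, key name and task are illustrative), using a Redis set as the shared list:
import uuid

import redis
from celery import Celery

app = Celery('example', broker='redis://localhost:6379/0')  # illustrative broker URL
r = redis.Redis()

@app.task(bind=True)
def my_task(self):
    # Remove the task from the "waiting" set as soon as it starts running.
    r.srem('waiting_task_ids', self.request.id)
    ...  # do the actual work

def submit_my_task():
    # Record the id before sending, so the set only ever contains waiting tasks.
    task_id = str(uuid.uuid4())
    r.sadd('waiting_task_ids', task_id)
    return my_task.apply_async(task_id=task_id)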
If what you want includes tasks that are being processed but not finished yet, you can keep a list of your tasks and check their states:
from tasks import add
result = add.delay(4, 4)
result.ready() # True if finished
Or you let Celery store the results with CELERY_RESULT_BACKEND and check which of your tasks are not in there.
As far as I know, Celery does not provide an API for examining tasks that are waiting in the queue. This is broker-specific. If you use Redis as a broker, for example, then examining tasks that are waiting in the celery (default) queue is as simple as:
connect to the broker
list items in the celery list (with the LRANGE command, for example)
Keep in mind that these are tasks WAITING to be picked by available workers. Your cluster may have some tasks running - those will not be in this list as they have already been picked.
The process of retrieving tasks in a particular queue is broker-specific.
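A minimal sketch of those two steps with redis-py, assuming a local Redis broker, the default celery queue and Celery's default JSON message protocol (where the task name and id live in the message headers):
import json

import redis

r = redis.Redis(host='localhost', port=6379, db=0)
for raw_message in r.lrange('celery', 0, -1):
    message = json.loads(raw_message)
    print(message['headers']['task'], message['headers']['id'])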
I've come to the conclusion that the best way to get the number of jobs on a queue is to use rabbitmqctl, as has been suggested several times here. To allow any chosen user to run the command with sudo I followed the instructions here (I did skip editing the profile part, as I don't mind typing in sudo before the command).
I also grabbed jamesc's grep and cut snippet and wrapped it up in subprocess calls.
from subprocess import Popen, PIPE

p1 = Popen(["sudo", "rabbitmqctl", "list_queues", "-p", "[name of your virtual host]"], stdout=PIPE)
p2 = Popen(["grep", "-e", r"^celery\s"], stdin=p1.stdout, stdout=PIPE)
p3 = Popen(["cut", "-f2"], stdin=p2.stdout, stdout=PIPE)
p1.stdout.close()
p2.stdout.close()
print("number of jobs on queue: %i" % int(p3.communicate()[0]))
If you control the code of the tasks then you can work around the problem by letting a task trigger a trivial retry the first time it executes, then checking inspect().reserved(). The retry registers the task with the result backend, and Celery can see that. The task must accept self or context as its first parameter so we can access the retry count.
@task(bind=True)
def mytask(self):
    if self.request.retries == 0:
        raise self.retry(exc=MyTrivialError(), countdown=1)
    ...
This solution is broker-agnostic, i.e. you don't have to worry about whether you are using RabbitMQ or Redis to store the tasks.
EDIT: after testing I've found this to be only a partial solution. The size of reserved is limited to the prefetch setting for the worker.
from celery.task.control import inspect  # legacy import path; on newer Celery use app.control.inspect()

def key_in_list(k, l):
    return bool([True for i in l if k in i.values()])

def check_task(task_id):
    task_value_dict = inspect().active().values()
    for task_list in task_value_dict:
        if key_in_list(task_id, task_list):
            return True
    return False
With subprocess.run:
import subprocess
import re

def count_active_tasks():  # wrapped in a function (name is illustrative) so the return statement is valid
    active_process_txt = subprocess.run(['celery', '-A', 'my_proj', 'inspect', 'active'],
                                        stdout=subprocess.PIPE).stdout.decode('utf-8')
    return len(re.findall(r'worker_pid', active_process_txt))
Be careful to replace my_proj with your_proj.
To get the number of tasks in a queue you can use the flower library; here is a simplified example:
from flower.utils.broker import Broker
from django.conf import settings
def get_queue_length(queue):
    broker = Broker(settings.CELERY_BROKER_URL)
    queues_result = broker.queues([queue])
    return queues_result.result()[0]['messages']

Celery - handle WorkerLostError exception with Task.retry()

I'm using celery 4.4.7
Some of my tasks are using too much memory and are getting killed with signal 9 (SIGKILL). I would like to retry them later since I'm running with concurrency on the machine and they might run OK again.
However, as far as I understand, you can't catch a WorkerLostError exception thrown within a task, i.e. this won't work as I expect:
from billiard.exceptions import WorkerLostError

@celery_app.task(acks_late=True, max_retries=2, autoretry_for=(WorkerLostError,))
def some_task():
    # task code
I also don't want to use task_reject_on_worker_lost, as it requeues the tasks and max_retries is not applied.
What would be the best approach to handle my use case?
Thanks in advance for your time :)
Gal

Using ForkJoinPool in Scala

In code:
val executor = new ForkJoinPool()
executor.execute(new Runnable {
  def run = println("This task is run asynchronously")
})
Thread.sleep(10000)
This code prints: This task is run asynchronously
But if I remove Thread.sleep(10000), the program doesn't print anything.
I then learnt that this is because sleep prevents the daemon threads in the ForkJoinPool from being terminated before they call the run method on the Runnable object.
So, a few questions:
Does it mean the threads started by ForkJoinPool are all daemon threads? And why is it so?
How does sleep help here?
Answers:
Yes, because you are using the default thread factory and that is how it is configured. You can provide a custom thread factory to it if you wish, and you may configure the threads to be non-daemon.
Sleep helps because it prevents your program from exiting for long enough for the thread pool threads to find your task and execute it.

PoisonPill not killing the Akka actor system as intended

I'm trying to get a child actor in my Scala 2.11 app (that uses Akka) to send a PoisonPill to the top-level actor (Master) and shut down the entire JVM process.
I uploaded this repo so you can reproduce what I'm seeing. Just clone it and run ./gradlew run.
But essentially we have the Driver:
object Driver extends App {
  println("Starting upp the app...")
  lazy val cortex = ActorSystem("cortex")
  cortex.registerOnTermination {
    System.exit(0)
  }
  val master = cortex.actorOf(Props[Master], name = "Master")
  println("About to fire a StartUp message at Master...")
  master ! StartUp
  println("Fired! Actor system is spinning up...")
}
and then in the Child:
class Child extends Actor {
  override def receive = {
    case MakeItHappen =>
      println("Child will make it happen!")
      context.parent ! PoisonPill
  }
}
When I run ./gradlew run I see the following output:
./gradlew run
Starting a Gradle Daemon (subsequent builds will be faster)
:compileJava NO-SOURCE
:compileScala UP-TO-DATE
:processResources NO-SOURCE
:classes UP-TO-DATE
:run
Starting upp the app...
About to fire a StartUp message at Master...
Fired! Actor system is spinning up...
Stage Director has received a command to start up the actor system!
Child will make it happen!
<==========---> 80% EXECUTING
> :run
However, the JVM process never shuts down. I would have expected the PoisonPill to kill the Master (as well as the Child), shut down the actor system, and then the System.exit(0) call (which I registered upon actor system termination) to kick in and exit the JVM process.
Can anyone tell me what's going on here?
You're stopping the Master, but you're not shutting down the ActorSystem. Even though Master is a top-level actor (i.e., an actor created by system.actorOf), stopping it doesn't terminate the ActorSystem (note that Master itself has a parent, so it's not the "highest" actor in the actor hierarchy).
The normal way to terminate an ActorSystem is to call terminate. If you want the entire system to shut down when the Master stops, then you could override the Master's postStop hook and call context.system.terminate() there. Calling system.terminate() in an actor's lifecycle hook is risky, however. For example, if an exception is thrown in the actor and the actor's supervision strategy stops it, you probably don't intend to shut down the entire system at that point. Such a design is not very resilient.
Check out this pattern for shutting down an actor system "at the right time."