Celery flower Persistent data - celery

I want to keep the task history, that is, if my service is stopped and started again, I can still see the previous tasks.
Apparently I have to set this up in Flower. I tried the following, but it doesn't work. Have you had any experience with this?
celery -A tasks flower --persistent=True --db=flower.db --state_save_interval=5
I also used the result backend and save the results in Mongo, but I have a problem displaying them so that, after a restart, I can see the previous ones again.

Related

Django celery delete specific tasks

We know we have some badly formed tasks in the Celery queue that are crashing our workers. Is there an easy way to manually delete them?
We don't want to flush the whole queue, as there might be some important emails still to be sent...
Use Celery Flower to manage and monitor the Celery tasks. Refer to https://flower.readthedocs.io/en/latest/
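If the badly formed tasks can be identified programmatically, another option is to revoke just those rather than flushing the whole queue. A hedged sketch, assuming your Celery app instance is importable (the module path and task name below are placeholders):

# Revoke only the badly formed tasks, leaving everything else (e.g. pending emails) untouched.
from tasks import app  # hypothetical module; import your own Celery app instance

BAD_TASK_NAME = "myapp.tasks.broken_task"  # placeholder name of the badly formed task

inspector = app.control.inspect()
# reserved() lists tasks that workers have prefetched but not yet started.
for worker, tasks in (inspector.reserved() or {}).items():
    for t in tasks:
        if t["name"] == BAD_TASK_NAME:
            # terminate=True also kills the task if a worker has already started it.
            app.control.revoke(t["id"], terminate=True)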

How to have celery expire results when using a database backend

I'm not sure I understand how result_expires works.
I read:
result_expires
Default: Expire after 1 day.
Time (in seconds, or a timedelta object) for when after stored task tombstones will be deleted.
A built-in periodic task will delete the results after this time (celery.backend_cleanup), assuming that celery beat is enabled. The task runs daily at 4am.
...
When using the database backend, celery beat must be running for the results to be expired.
(from here: http://docs.celeryproject.org/en/latest/userguide/configuration.html#std:setting-result_expires)
So, in order for this to work, I have to actually do something like this:
python -m celery -A myapp beat -l info --detach
?
Is that what the documentation is referring to by "celery beat is enabled"? Or, rather than executing this manually, there is some configuration that needs to be set which would cause celery beat to be called automatically?
Re: celery beat--you are correct. If you use a database backend, you have to run celery beat as you posted in your original post. By default celery beat sets up a daily task that will delete older results from the results database. If you are using a redis results backend, you do not have to run celery beat. How you choose to run celery beat is up to you, personally, we do it via systemd.
If you want to configure the default expiration time to be something other than the default 1 day, you can use the result_expires setting in celery to set the number of seconds after a result is recorded that it should be deleted. e.g., 1800 for 30 minutes.
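Putting the two pieces together, a minimal configuration sketch (the database URL is a placeholder; result_expires shortens the default one-day retention):

# celeryconfig.py -- sketch of a database result backend with a 30-minute retention.
result_backend = "db+postgresql://user:pass@localhost/celery"  # placeholder SQLAlchemy URL
result_expires = 1800  # seconds; celery.backend_cleanup deletes older results

With this in place, running celery beat (however you choose to run it, e.g. via systemd) is what actually triggers the daily cleanup task.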

Airflow: what do `airflow webserver`, `airflow scheduler` and `airflow worker` exactly do?

I've been working with Airflow for a while now; it was set up by a colleague. Lately I have run into several errors, which require me to understand more in depth how to fix certain things within Airflow.
I do understand what the 3 processes are, I just don't understand the underlying things that happen when I run them. What exactly happens when I run one of the commands? Can I somewhere see afterwards that they are running? And if I run one of these commands, does this overwrite older webservers/schedulers/workers or add a new one?
Moreover, if I for example run airflow webserver, the screen shows some of the things that are happening. Can I simply get out of this by pressing CTRL + C? Because when I do this, it says things like Worker exiting and Shutting down: Master. Does this mean I'm shutting everything down? How else should I get out of the webserver screen then?
Each process does what they are built to do while they are running (webserver provides a UI, scheduler determines when things need to be run, and workers actually run the tasks).
I think your confusion is that you may be seeing them as commands that tell some sort of "Airflow service" to do something, but they are each standalone commands that start the processes to do stuff. I.e., starting from nothing, you run airflow scheduler: now you have a scheduler running. Run airflow webserver: now you have a webserver running. When you run airflow webserver, it is starting a Python Flask app. While that process is running, the webserver is running; if you kill the command, it goes down.
All three have to be running for Airflow as a whole to work (assuming you are using an executor that needs workers). You should only ever have one scheduler running, but if you were to run two processes of airflow webserver (ignoring port conflicts), you would then have two separate HTTP servers running using the same metadata database. Workers are a little different in that you may want multiple worker processes running so you can execute more tasks concurrently. So if you create multiple airflow worker processes, you'll end up with multiple processes taking jobs from the queue, executing them, and updating the task instance with the status of the task.
When you run any of these commands you'll see the stdout and stderr output in console. If you are running them as a daemon or background process, you can check what processes are running on the server.
If you ctrl+c, you are sending a signal to kill the process. Ideally, for a production Airflow cluster, you should have some supervisor monitoring the processes and ensuring that they are always running. Locally, you can either run the commands in the foreground of separate shells, minimize them and just keep them running when you need them, or run them as background daemons with the -D argument, i.e. airflow webserver -D.
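To the "can I see afterwards that they are running?" part: if the processes were started with -D or in another shell, you can just look for them on the host. A small sketch, assuming a Linux host with ps available:

# List the Airflow component processes currently running on this host.
import subprocess

def airflow_processes():
    out = subprocess.run(["ps", "-eo", "pid,cmd"], capture_output=True, text=True).stdout
    keywords = ("airflow webserver", "airflow scheduler", "airflow worker")
    return [line for line in out.splitlines() if any(k in line for k in keywords)]

if __name__ == "__main__":
    for proc in airflow_processes():
        print(proc)

The exact command lines may differ depending on how Airflow was started, so adjust the keywords as needed.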

Gracefully update running celery pod in Kubernetes

I have a Kubernetes cluster running Django, Celery, RabbitMq and Celery Beat. I have several periodic tasks spaced out throughout the day (so as to keep server load down). There are only a few hours when no tasks are running, and I want to limit my rolling-updates to those times, without having to track it manually. So I'm looking for a solution that will allow me to fire off a script or task of some sort that will monitor the Celery server, and trigger a rolling update once there's a window in which no tasks are actively running. There are two possible ways I thought of doing this, but I'm not sure which is best, nor how to implement either one.
Run a script (bash or otherwise) that checks up on the Celery server every few minutes and initiates the rolling update if the server is inactive (a rough sketch of this check appears after these options).
Increment the Celery app name before each update (in the Beat run command, the Celery run command, and in the celery.py config file), create a new Celery pod, rolling-update the Beat pod, and then delete the old Celery pod 12 hours later (a reasonable time span for all running tasks to finish).
Any thoughts would be greatly appreciated.
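For the first option, a rough sketch of the idle check, assuming your Celery app is importable and the rollout is triggered with kubectl (the module path and deployment name are placeholders):

# Poll Celery and trigger the rolling update only once no tasks are active,
# reserved or scheduled on any worker.
import subprocess
import time

from myproject.celery import app  # hypothetical path to your Celery app

def cluster_is_idle() -> bool:
    inspector = app.control.inspect()
    buckets = [inspector.active() or {}, inspector.reserved() or {}, inspector.scheduled() or {}]
    return not any(tasks for bucket in buckets for tasks in bucket.values())

while not cluster_is_idle():
    time.sleep(300)  # check every five minutes

subprocess.run(["kubectl", "rollout", "restart", "deployment/celery-worker"], check=True)

The second option avoids the polling but means running two Celery deployments side by side during the overlap window.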

celery flower gives unknown worker

I'm using Celery with a Redis backend.
I am passing CELERY_RESULT_BACKEND with the correct Redis URL, and also the broker URL, to the Celery app config.
I start Flower by giving the path to my Celery app with -A and also set the --inspect_timeout=30 argument to allow for a slow response from the worker. I get Unknown worker 'celery#' when clicking on the worker in the UI.
Any ideas how to get this working?
There is a Refresh button on the dashboard page. It refreshes workers by resending the inspect command. If you launch workers after Flower, just refresh the workers.
The Refresh button is a choice under the drop-down menu that has Shut Down selected by default. In order to refresh, you have to select a worker (or all of them) first, as described here:
https://github.com/mher/flower/issues/395
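For what it's worth, the Refresh action essentially re-runs a broker inspect. The same check can be done outside Flower to confirm the worker is reachable at all; a hedged sketch, assuming the Celery app is importable as app from tasks:

# Ping the workers over the broker, mirroring flower's --inspect_timeout=30.
from tasks import app  # hypothetical module path; use your own Celery app

replies = app.control.inspect(timeout=30).ping() or {}
for worker_name, reply in replies.items():
    print(worker_name, reply)  # e.g. celery@hostname {'ok': 'pong'}

If nothing is printed, the worker is not reachable over the broker and Flower will not be able to inspect it either.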