Is there any way to prevent Celery from doing apply_async if a task with the provided task_id already exists?

I have been experiencing problems when Celery starts two tasks with the same id in parallel.
We can prevent this by checking whether Celery already has a task with the specified id and, if so, not sending the task, but that approach is not a clean one.
Are there any ideas on how to do this more elegantly?

It is explained in the Ensuring a task is only executed one at a time section of the Celery documentation.
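The pattern there boils down to an atomic lock keyed on the task's identity, acquired before queuing. A minimal sketch of that idea, assuming Redis is available for the lock (the key naming, TTL and broker URL are made up for illustration):

import redis
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")  # assumed broker URL
locks = redis.Redis()  # any shared atomic store works; Redis is an assumption here

@app.task(bind=True)
def do_work(self, payload):
    try:
        return payload.upper()  # stand-in for the real work
    finally:
        locks.delete(f"task-lock:{self.request.id}")  # release so the id can be reused

def send_once(task_id, payload):
    # SET with nx=True is atomic: only the first caller for a given id succeeds,
    # and the TTL keeps the lock from leaking if the task is never executed.
    if locks.set(f"task-lock:{task_id}", "1", nx=True, ex=600):
        return do_work.apply_async(args=(payload,), task_id=task_id)
    return None  # a task with this id is already queued or running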

Related

Detecting outstanding celery tasks

This is purely for a non-eager pytest mode of operation. I want to know when Celery has "caught up" with all the outstanding work. Is there any way to find that information? My testing config has a celery_session_app and a single celery_session_worker in its own thread.
Check the number of entries in the RabbitMQ queue. This has problems because of prefetch. I can set the prefetch to 1 and maybe solve it that way, but I worry about race conditions. (I'm testing chords, and some Celery tasks queue other Celery tasks.)
Add a task to the "end" of the list and then .wait() on it to finish. This has problems for tasks that queue other tasks, because the queue is being extended in the other thread: I can be at the end of the list when queued, but that position quickly moves forward as tasks are queued behind it. I can work around this with .apply_async(countdown=3), but that is pretty much the definition of a race condition: I might need countdown=4, or I might need nothing, and either way some number of seconds is wasted on every test.
Use signals (somehow). But what I really need is a worker_is_bored signal, which does not exist and would suffer from the same kind of race conditions mentioned above: tasks queueing tasks could make it flash "bored" and then go right back to "busy".
time.sleep(N), but what should N be? (I'm running pytest -n 10, so how busy the machine is during tests is non-trivial.) And this wastes time, like countdown= above.
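For what it's worth, a minimal sketch of a polling variant of the first idea, using Celery's inspect API instead of the broker's queue depth (the consecutive-idle threshold is an arbitrary assumption, and the snapshot race described above still applies):

import time

def wait_until_idle(app, timeout=30.0, poll=0.5):
    # Poll until active, reserved and scheduled are all empty for several
    # consecutive snapshots; tasks queueing tasks can still slip between polls.
    deadline = time.monotonic() + timeout
    idle_polls = 0
    while time.monotonic() < deadline:
        i = app.control.inspect()
        busy = any((i.active() or {}).values()) or \
               any((i.reserved() or {}).values()) or \
               any((i.scheduled() or {}).values())
        idle_polls = 0 if busy else idle_polls + 1
        if idle_polls >= 3:  # assumed threshold to damp the flash-"bored" problem
            return
        time.sleep(poll)
    raise TimeoutError("workers never went idle")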

Methods of testing nested celery tasks

To simplify the question, suppose we have a celery task which calls other celery tasks.
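For concreteness, a minimal sketch of the shape I mean (the task names are made up):

from celery import shared_task

@shared_task
def child(x):
    return x * 2

@shared_task
def parent(x):
    child.delay(x)  # the nested/queued task; parent does not wait for its result
    return "parent done"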
My understanding is that there is no way to "patch" (as in unit-test mock) the task that is being queued within the task you want to test, and therefore you have no means of controlling that nested/queued task internally.
Also, my understanding is that the only way to verify the outcome of the task you want to test is to do integration tests and verify the external effects of the whole flow you just executed.
Therefore it would seem the best solution is to avoid queuing tasks within tasks as much as possible, and when you do so, to keep it as basic as possible?
Can someone confirm this or provide alternative views on the topic?
Thank you

Give an entire Celery chain priority over new tasks

I want to launch a chain of Celery tasks, and have them all execute before any newly arriving tasks do. I'll have a single worker process handling all tasks.
I guess the easiest thing to do would be to not make them a chain at all, but instead launch a single task that synchronously calls a sequence of functions. But I'd like to take advantage of Celery retries, allowing each task to be retried a different number of times.
What's the best way to do this?
If you have a single worker running a single process, then, as far as I can tell from working with Celery (this is not explicitly documented), you should get the behavior you want.
If you want to use multiple worker processes, then you may need to set CELERYD_PREFETCH_MULTIPLIER to 1.
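As a sketch of that setup (broker URL and retry counts are placeholders; in newer Celery the setting is spelled worker_prefetch_multiplier):

from celery import Celery, chain

app = Celery("tasks", broker="redis://localhost:6379/0")  # assumed broker
app.conf.CELERYD_PREFETCH_MULTIPLIER = 1  # worker_prefetch_multiplier in Celery 4+

@app.task(bind=True, max_retries=5)
def step_one(self, x):
    return x + 1

@app.task(bind=True, max_retries=2)
def step_two(self, x):
    return x * 2

# Each link keeps its own retry policy; run the worker with a single process
# (e.g. celery worker --concurrency=1) so the chain finishes before new tasks.
chain(step_one.s(1), step_two.s()).apply_async()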

How can I create a Scheduled Task that will run every Second in MarkLogic?

MarkLogic Scheduled Tasks cannot be configured to run at an interval less than a minute.
Is there any way I can execute an XQuery module at an interval of 1 second?
NOTE:
Consider the situation where the Task Server is fully loaded: I need to make sure that the every-second task gets a Task Server thread whenever it needs one.
Please let me know if there is anything in MarkLogic that can be used to achieve this.
Wanting rapid-fire scheduled tasks may be a hint that the design needs rethinking.
Even running a task once a minute can be risky, and needs careful thought to manage the possibilities of overlapping tasks and runaway tasks. If the application design calls for a scheduled task to run once a second, I would raise that as a potentially serious problem. Back up a few steps, and if necessary ask a new question about the higher-level problem that led to looking at scheduled tasks.
There was a sub-question about managing queue priority for tasks. Task priorities can handle some of that. There are two priorities: normal and higher. The Task Server empties the higher-priority queue first, then the normal queue. But each queue is still a simple queue, and there's no way to change priorities after a task has been spawned. So if you always queue tasks with priority=higher, then they'll all be in the higher priority queue and they'll all run in order. You can play some games with techniques like using server fields as signals to already-running tasks. But wanting to reorder tasks within a queue could be another hint that the design needs rethinking.
If, after careful thought about all the pitfalls and dangers, I decided I needed a rapid-fire task of some kind, I would probably do it using external requests. Pick any scripting language and write a simple while loop with an HTTP request to the MarkLogic cluster. Even so, spend some time thinking about overlapping requests and locking. What happens if the request times out on the client side? Will it keep running on the server? Will that lead to overlapping requests and require deadlock resolution? Could it lead to runaway resource consumption?
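As a sketch of that external driver, in Python (the endpoint, credentials and timeout are placeholders; note the module may keep running server-side even after a client timeout):

import time
import requests

ENDPOINT = "http://marklogic-host:8010/run-task.xqy"  # hypothetical app server URL

while True:
    started = time.monotonic()
    try:
        # a short timeout bounds how long a hung request can stall the loop
        requests.post(ENDPOINT, timeout=5, auth=("user", "password"))
    except requests.RequestException as exc:
        print(f"request failed: {exc}")
    # sleep only for the remainder of the one-second interval
    time.sleep(max(0.0, 1.0 - (time.monotonic() - started)))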
Avoid any ideas that use xdmp:sleep. That will tie up a Task Server thread during the sleep period, and then you'll have two problems.

Celery: list all tasks, scheduled, active *and* finished

Update for the bounty
I'd like a solution that does not involve a monitoring thread, if possible.
I know I can view scheduled and active tasks using the Inspect class of my app's Control.
i = myapp.control.inspect()
currently_running = i.active()
scheduled = i.scheduled()
But I could not find any function to show already-finished tasks. I know that this information must be at least temporarily accessible, because I can look up a finished task by its task_id:
>>> r = mytask.AsyncResult(task_id=' ... ')
>>> r.state
u'SUCCESS'
How can I get a complete list of scheduled, active and finished tasks? Or possibly a list of all tasks at once?
Celery Flower shows tasks (active, finished, reserved, etc.) in real time. It lets you filter tasks by time, worker and type.
https://github.com/mher/flower
One option not requiring a monitoring thread is a Celery on_success handler; this would need to write the relevant info to your own datastore.
You need to create a custom task class to do this. This on_failure example gives an idea.
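A minimal sketch of such a custom task class, recording completions in a stand-in store (replace the list with your own datastore; the broker URL is a placeholder):

from celery import Celery, Task

app = Celery("tasks", broker="redis://localhost:6379/0")  # assumed broker

class RecordingTask(Task):
    finished = []  # stand-in for a real datastore

    def on_success(self, retval, task_id, args, kwargs):
        self.finished.append({"task_id": task_id, "state": "SUCCESS", "result": retval})

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        self.finished.append({"task_id": task_id, "state": "FAILURE", "error": str(exc)})

@app.task(base=RecordingTask)
def add(x, y):
    return x + y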
Possibly better option, needing less code, is to use a task_success signal in a similar way, recording the info you need later.
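A sketch of the signal variant (what the handler records is up to you; here it just prints):

from celery.signals import task_success

@task_success.connect
def record_success(sender=None, result=None, **kwargs):
    # sender is the task instance; persist whatever you need to your own store
    print(f"{sender.name} finished with result {result!r}")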
The Flower option is probably simpler, as you are querying info already maintained by Flower when tasks complete - see this answer.