To simplify the question, suppose we have a celery task which calls other celery tasks.
My understanding is that there no way to "patch" (as in unit test mock) the task that is being queued within the task you want to test, and therefore you have no means of controlling that nested/queued task internally.
Also, my understanding is that the only way to verify the outcome of this [task you want to test], is to do integration tests, and verify the external effects of the whole flow you just executed.
Therefore it would seem the best solution is avoid as much as possible queuing tasks within tasks and when you do so, to keep it as basic as possible?
Can someone confirm this or provide alternative views on the topic?
Thank you
Related
This is purely for a non-eager pytest mode of operation. I want to know when celery has "caught" up with all the outstanding work. Is there any way to find that information? My testing config has a celery_session_app and a single celery_session_worker in it's own thread.
Check the number of entries in the Rabbit queue. This has problems because of pre-fetch. I can set prefetch to 1 and maybe solve it that way but I worry about race conditions. (I'm testing chords and some celery tasks queue other celery tasks)
Add a task to the "end" of the list and then .wait() on it to finish. This has problems for tasks that queue other tasks because the queue is being extended in the other thread so I can be at the end of the list when queued, but that quickly moves forward as tasks are queued behind it. I can work around this using .apply_async(countdown=3) but this is pretty much the definition of a race condition and I might need countdown=4 or I might need nothing and that is some number of seconds wasted on a test regardless.
Use signals (somehow). But what I really need is a worker_is_bored which does not exist and suffers from the same kind of race conditions mentioned above. Tasks queueing tasks could make it flash "bored" and right back to "busy".
time.sleep(N) but what should N be. (i'm running pytest -n 10 so how busy the machine is during tests, is non-trivial). And this wastes time like countdown= above.
I have been expecting problems when celery has started 2 tasks with the same id in parallel.
We can prevent this by checking if the celery already has task with the specified id and do not send the task, however, the way is not a really beautiful one.
Are there any ideas on how to make it in a beauty manner?
It is explained in the Ensuring a task is only executed one at a time section of the Celery documentation.
I want to launch a chain of Celery tasks, and have them all execute before any newly arriving tasks do. I'll have a single worker process handling all tasks.
I guess the easiest thing to do would be to not make them a chain at all, but instead launch a single task that synchronously calls a sequence of functions. But I'd like to take advantage of Celery retries, allowing each task to be retried a different number of times.
What's the best way to do this?
If you have a single worker running a single process then as far as I can tell from working with celery (this is not explicitly documented) you should get the behavior you want.
If you want to use multiple worker processes then you may need to set CELERYD_PREFETCH_MULTIPLIER to 1.
MarkLogic Scheduled Tasks cannot be configured to run at an interval less than a minute.
Is there any way I can execute an XQuery module at an interval of 1 second?
NOTE:
Considering the situation where the Task Server is fully loaded and I need to make sure that the secondly scheduled task gets the Task Server thread whenever it needs.
Please let me know if there is anything in MarkLogic that can be used to achieve this.
Wanting rapid-fire scheduled tasks may be a hint that the design needs rethinking.
Even running a task once a minute can be risky, and needs careful thought to manage the possibilities of overlapping tasks and runaway tasks. If the application design calls for a scheduled task to run once a second, I would raise that as a potentially serious problem. Back up a few steps, and if necessary ask a new question about the higher-level problem that led to looking at scheduled tasks.
There was a sub-question about managing queue priority for tasks. Task priorities can handle some of that. There are two priorities: normal and higher. The Task Server empties the higher-priority queue first, then the normal queue. But each queue is still a simple queue, and there's no way to change priorities after a task has been spawned. So if you always queue tasks with priority=higher, then they'll all be in the higher priority queue and they'll all run in order. You can play some games with techniques like using server fields as signals to already-running tasks. But wanting to reorder tasks within a queue could be another hint that the design needs rethinking.
If, after careful thought about all the pitfalls and dangers, I decided I needed a rapid-fire task of some kind.... I would probably do it using external requests. Pick any scripting language and write a simple while loop with an HTTP request to the MarkLogic cluster. Even so, spend some time thinking about overlapping requests and locking. What happens if the request times out on the client side? Will it keep running on the server? Will that lead to overlapping requests and require deadlock resolution? Could it lead to runaway resource consumption?
Avoid any ideas that use xdmp:sleep. That will tie up a Task Server thread during the sleep period, and then you'll have two problems.
I'm planning a mechanism whose usage scenarios would be like cron's. It's a clock-ish mechanism that attempts task execution at prespecified times. Cron doesn't seem suitable, because these tasks trigger Scala method calls and the queue stored on a cloud database.
I imagine it like this: every x minutes, tasks' execution dates are retrieved from the database, and compared against current time, if the task is over-due it is executed and removed from queue.
My question is: how do I run the aforementioned check every x minutes on a distributed environment?
All advice encouraged.
I think the Akka scheduler might be what you are looking for. Here's a link to the Akka documentation and here's another link describing how to use Akka in Play.
Update: as Viktor Klang points out Akka is not a scheduler, however it does allow you to run a task periodically. I've used it in this mode quite successfully.
The best known library for this is Quartz Scheduler.