How to use canvas without importing the task module in the Celery client?

Suppose that we have a caller A and a callee B. B implements some function and has heavy dependencies.
Typically, I don't want to import these dependencies in A to keep it lightweight.
Previously, I could happily use send_task to call B by name.
Now I have more complex logic and want to orchestrate tasks with canvas. Following the user guide:
signature('tasks.add', args=(2, 2), countdown=10)
I get a NotRegistered error.
How can I register or call the task by name?

Without seeing your code, we can't actually help you. So I'll answer your question generically using the docs:
For reference, here is the example snippet from celery:
from celery import Celery

app = Celery('tasks', broker='pyamqp://guest@localhost//')

@app.task
def add(x, y):
    return x + y
Note that the signature name is tasks.add because the module (and the application name in the Celery constructor) is tasks and the function is named add. The task is registered because it lives in a tasks.py file that the worker imports. So, to call a task by name we have to make sure of three things:
1. Is the task registered properly? We can check by looking at the output when a Celery worker starts: it prints a list of registered tasks. If the task is not there, make sure it is in a tasks.py file or a tasks module that Celery knows to look for.
2. What is my application name? We can find this by looking at the call to the Celery constructor. It is also what you pass to the -A parameter when starting a worker.
3. What is my task name? If no name is specified in the @app.task decorator, use the function name; otherwise use the name specified in the decorator.
After ensuring (1), concatenate (2) and (3), and put a . in between them to get the task name to call.
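To keep the client lightweight, the same name-based approach works with canvas: build signatures against a bare app instance instead of importing the task module. A minimal sketch, assuming the worker has the task registered under the name tasks.add (the broker URL is just the tutorial default):

from celery import Celery, chain

# Client-side app: only the broker URL is needed, no task code is imported.
app = Celery(broker='pyamqp://guest@localhost//')

# app.signature() builds a signature by name, bound to this app, so it can be
# composed with canvas primitives and sent without a local task definition.
workflow = chain(
    app.signature('tasks.add', args=(2, 2)),
    app.signature('tasks.add', args=(4,)),  # receives the previous result as its first argument
)
result = workflow.apply_async()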

Related

Celery setup and teardown tasks

I am trying to use celery to parallelise the evaluation of a function with different parameters.
Here is pseudo-code of what I am trying to achieve, which assumes that there is a function called evaluate decorated with @app.task:
# 0. Setup cluster, celery or whatever parallelisation backend
pass

# 1. Prepare each node to simulate, this means sending some files
for node in mycluster:
    # send files to node
    pass

# 2. Evaluation phase
gen = Generator()  # a Generator object creates parameter vectors that need to be evaluated
while not gen.finished():
    par_list = gen.generate()
    asyncs = []
    for p in par_list:
        asyncs.append(evaluate.delay(p))
    results = [-1 for _ in par_list]
    for i, pending in enumerate(asyncs):
        if not pending.ready():
            pending.wait()
        if pending.successful():
            results[i] = pending.get()
        else:
            pass  # manage error
    # send results to generator so that it generates a new set of parameters later
    gen.tell(results)

# 3. Teardown phase
for node in mycluster:
    # tell node to delete files
    pass
The problem with this approach is that if my main application is running and has already passed the setup phase, a node that connects afterwards will never go through the setup phase. Similarly, the teardown phase will not be executed if a node disconnects.
A couple of solutions come to mind:
Instead of using a setup phase, chain two functions so that each node does setup | evaluate | teardown for each iteration of the "2. evaluation phase" loop. The problem here is that sending files through the message queue is something that I would like to avoid as much as possible.
Configure the workers to have a setup and teardown task so that they are automatically ready when they connect. I tried using bootsteps.StartStopStep, but I am not sure if this is the right way to go.
Set up a distributed file system so that there is no need to prepare and delete files before and after the evaluations.
The concrete question here is: what's the recommended approach for this kind of task? I am sure this is not a convoluted use case, and maybe one of you can provide some guidance on how I should approach it.
I'm not sure this is a worker issue - remember you may have a host of workers on a node. This sounds more like a node initialization issue. Why not have a job (a systemd unit, an init script, whatever) that runs before the Celery workers and copies the files over? Similarly, in reverse, for teardown.
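If you do end up keeping the setup inside the worker itself (the second idea above), Celery's worker lifecycle signals are a lighter-weight alternative to bootsteps. A rough sketch, where the file-handling helpers are placeholders you would implement yourself:

from celery import Celery
from celery.signals import worker_ready, worker_shutdown

app = Celery('tasks', broker='pyamqp://guest@localhost//')

def copy_simulation_files():
    pass  # placeholder: pull the input files onto this node (rsync, scp, object store, ...)

def delete_simulation_files():
    pass  # placeholder: remove the files copied above

@worker_ready.connect
def setup_node(sender=None, **kwargs):
    # runs once the worker has finished starting up and is ready to accept tasks
    copy_simulation_files()

@worker_shutdown.connect
def teardown_node(sender=None, **kwargs):
    # runs when the worker shuts down cleanly
    delete_simulation_files()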

Questions about Celery

I'm trying to learn what Celery is & does and I've gone through some basic tutorials like first steps with Celery and this one, and I have a few questions about Celery:
In the (first steps) tutorial, you start a Celery worker and then you can basically just open an interpreter & call the task defined as:
>>> from tasks import add
>>> add.delay(4, 4)
So my questions are:
What's happening here? add() is a method that we wrote in the tasks file and we're calling add.delay(4,4). So we're calling a method over a method!?
How does this work? How does the 'delay' method get added to 'add'?
Does the Celery worker do the work of adding 4+4? As opposed to the work being done by the caller of that method? - like it would have been if I had just defined a method called add in the interpreter and just executed
add(4,4)
If the answer to 3 is yes, then how does Celery know it has to do some work? All we're doing is - importing a method from the module we wrote and call that method. How does control get passed to the Celery worker?
Also, while answering #4, it'd also be great if you could tell me how you know this. I'd be very curious to know if these things are documented somewhere that I'm missing/failing to understand it, and how I could have known the answer. Thanks much in advance!
What's happening here? add() is a method that we wrote in the tasks file and we're calling add.delay(4,4). So we're calling a method over a method!?
Everything is an object in Python. Everything has properties. Functions/methods also have properties. For example:
def foo(): pass
print(foo.__name__)
This is nothing special syntax-wise.
How does this work? How does the delay method get added to add?
The @app.task decorator does that.
Does the Celery worker do the work of adding 4+4? As opposed to the work being done by the caller of that method?
Yes, the worker does that. Otherwise this would be pretty nonsensical. You're passing two arguments (4 and 4) to the Celery system which passes them on to the worker, which does the actual work, in this case addition.
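To make that concrete, calling the tutorial task and fetching its result looks roughly like this (it assumes a worker is running and a result backend is configured, as in the first-steps tutorial):

from tasks import add

result = add.delay(4, 4)       # returns immediately with an AsyncResult handle
print(result.ready())          # False until a worker has picked the task up and finished it
print(result.get(timeout=10))  # blocks until the worker's answer (8) comes back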
If the answer to 3 is yes, then how does Celery know it has to do some work? All we're doing is - importing a method from the module we wrote and call that method. How does control get passed to the Celery worker?
Again, the @app.task decorator abstracts a lot of magic here. This decorator registers the function with the celery worker pool. It also adds magic properties to the same method that allow you to call that function in the celery worker pool, namely delay. Imagine this instead:
def foo(): pass
celery.register_worker('foo', foo)
celery.call('foo')
The decorator is essentially just doing that, just without you having to repeatedly write foo in various ways. It's using the function itself as identifier for you, purely as syntactic sugar so you don't have to distinguish much between foo() and 'foo' in your code.
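A toy sketch of that idea (this is not Celery's real internals, just the register-and-add-sugar pattern the decorator implements):

_registry = {}

class FakeTask:
    def __init__(self, func):
        self.func = func
        _registry[func.__name__] = func   # "register the function with the pool"

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)  # plain local call: add(4, 4)

    def delay(self, *args, **kwargs):
        # real Celery would serialise the call and send it to a broker here;
        # this toy version just looks the function up by name and runs it
        return _registry[self.func.__name__](*args, **kwargs)

def task(func):
    return FakeTask(func)

@task
def add(x, y):
    return x + y

print(add(4, 4))        # executed directly, in this process
print(add.delay(4, 4))  # dispatched "by name" through the registry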

How to define a new setting and set its value for a task?

I've a myTask task that invokes other tasks as follows:
def myTask = Task <<= (Task1, Task2, Task3) map {(_,_,_)=>;}
Task1, Task2, Task3 take a tcWebApp config variable that is a directory.
tcWebApp := file("../tomcat")
Everything works fine.
What I need now is to create another task myTask2 that'd be similar to myTask, but I'd like to invoke this task with another directory set for the tcWebApp setting, i.e. the setting should have another value for the task. Is it possible?
I've tried something like
tcWebApp in myTask2 := file("newDir")
but it didn't work. Please advise.
When you write:
`tcWebApp in myTask2` := ...
It doesn't mean "while myTask2 is executing, tcWebApp has the following value," as you want it to. What it does mean is, "if anyone asks myTask2 what value it has for tcWebApp, it will reply as follows." It doesn't have any effect on the global value of tcWebApp; and if nobody ever asks myTask2 what its value for tcWebApp is, then setting it in that task has no effect at all. So Task1 will continue to use the global value of tcWebApp.
I found some related questions on Stack Overflow:
Using SBT, how do you execute a task with a different Setting[T] value at runtime?
Here Daniel Sobral writes "From what I understand from your question, you want the setting to be different for a dependency depending on what is depending on it. This doesn't make sense -- a dependency either is satisfied or it isn't, and what depends on it doesn't come into the equation." As I understand it, that is the answer to your question.
In order to work around this, instead of attempting to reuse Task1 and Task2 as tasks, reuse the code inside them instead. Have Task1 and Task2 invoke ordinary methods that you define, and then have myTask2 call those same methods, passing them different parameters. In other words don't try to solve your problem with settings; solve it by means of ordinary Scala code.
How to change setting inside SBT command?
Or, here's another approach you could take. If you make myTask2 a command rather than a task, you can do what you want. See http://www.scala-sbt.org/release/docs/Extending/Commands.html which says "a command can look at or modify other sbt settings".

django - difference between signals and celery

This may be a lame question, but I am really confused by these two. I know signals are used to perform some task when something has happened. But what about Celery? The documentation says:
Celery is an asynchronous task queue/job queue based on distributed message passing.
Will someone please explain what Celery is? What's the difference between the two, and when should each be used? It will be much appreciated! Thank you.
First of all, Django signals are synchronous. For example, suppose you have a signal handler for the pre_save action of SomeModel, and you have, say, a view function like this:
def some_view(request):
    # do some stuff
    SomeModel(some_field="Hello").save()
    # do other stuff
The timeline of your code will be like so:
1. Do some stuff.
2. Do the SomeModel(some_field="Hello").save() call:
   - do some Django stuff;
   - execute the pre_save signal handler just before the actual save;
   - do the actual save to the DB.
3. Do other stuff.
Signals run in the same instance of the Python interpreter (in other words, in the same OS process).
Celery provides asynchronous tasks. Those tasks MAY be called synchronously, like Django signals (e.g. celery_task(arg1="Hello")), but the common case is an async call:
celery_task.delay(arg1="Hello")
This is not a simple function call. The function will be executed in another Python process (a Celery worker). After this call you can decide: do you want to wait for the result of this function, keep going with your code, or do something trickier?
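A small sketch of the difference (the broker URL is the tutorial default; the task body is arbitrary):

from celery import Celery

app = Celery('tasks', broker='pyamqp://guest@localhost//')

@app.task
def celery_task(arg1):
    return arg1.upper()

# Synchronous call: runs right here, in this process, much like a signal handler would.
print(celery_task("Hello"))

# Asynchronous call: the arguments are serialised and sent to the broker;
# a separate worker process picks the message up and runs the task there.
async_result = celery_task.delay(arg1="Hello")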
Celery is very handy when you want to do background or scheduled tasks such as resizing images, decoding videos, updating a Facebook status, etc.

Django Celery Workflow Chain Pause/Resume

Is there any way to pause/resume a running workflow created using chains from celery 3.0?
Basically, we have two different types of tasks in our system: interactive and non-interactive ones. We have all the parameters for the non-interactive ones, but the interactive ones need user input. Note that for the interactive tasks, we can only ask for user input once all the previous tasks in the chain have been completed, as their results will affect the interactive tasks (i.e. we cannot ask for user input before creating the actual chain).
Any suggestion on how to approach this? Really at a loss here..
Current ideas:
Create two subclasses of Task (from celery import Task). Add an extra instance (class member) variable to the Interactive task subclass that is set to false by default and represents that some user input is still needed. Somehow have access to the instance of the Task, and set it to true from outside the celery worker (Though I have looked this up quite a bit and it doesn't seem possible to have access to Task objects directly from another module)
Partition the chain into multiple chains delimited by interactive jobs. Have some mechanism outside the Celery worker detect when a chain has reached its end and trigger the interactive task's client-side component. Once the user has entered all the data, collect it and start a new chain with the interactive task at its head.
We have implemented something like your second idea in our project & it works fine. Here is the gist of the implementation.
Add a new field status to your model and override the save method.
models.py:
from django.db import models

class My_Model(models.Model):
    # some fields
    status = models.IntegerField(default=0)

    def save(self, *args, **kwargs):
        super(My_Model, self).save(*args, **kwargs)
        from .functions import custom_func
        custom_func(self.status)
tasks.py
@celery.task()
def non_interactive_task():
    # do something
    pass

@celery.task()
def interactive_task():
    # do something
    pass
functions.py
def custom_func(status):
    # Change status after the non-interactive task is completed.
    # Based on status, start the interactive task.
    pass
Pass the status variable to the template, where it is used to display the UI element for the user to enter information. When the user enters the required info, change the status. Saving the model calls custom_func, which triggers your interactive_task.
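For completeness, a rough sketch of what custom_func could look like; the status value and the decision to pass no arguments to the task are assumptions, not part of the original answer:

# functions.py (hypothetical sketch)
from .tasks import interactive_task

WAITING_FOR_INPUT = 1  # assumed status value meaning "ready for the user's input"

def custom_func(status):
    # called from My_Model.save(); once the non-interactive part of the
    # workflow has marked the record as waiting, kick off the interactive task
    if status == WAITING_FOR_INPUT:
        interactive_task.delay()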